Model Resolution in Palantir Finance: avoiding N2

February 2nd, 2009 | Andy


N2, with N = 8

One of the big challenges in Palantir Finance comes when integrating data from multiple data providers. When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers. Each data provider defines a set of “models” that it supports. These models can be things like equities, currencies, futures, options, or even new types that the providers themselves define.

The major challenge occurs when multiple providers define models that represent the same real-world entity. Provider A might know about Google, have basic open/high/low/close data for the stock, and know its ticker, country, and ISIN. Provider B might also provide a Google model, have balance sheet data, and know its country, exchange, and ISIN. We want to expose only one Google model to the user, however, and so we need a means of resolving the two Googles together – recognizing that they’re the same instrument – and adding just one equity to the system that encompasses both.

Resolution logic can be fairly complicated. For equities, for example, there are several different ways in which resolution can take place. If two equities have identical ISINs, we can be pretty confident they match, since those identifiers are declared as globally unique. If two equities have the same ticker and the same country of exchange, we might also consider that a match, though perhaps of weaker quality. Two models resolve to each other if any form of resolution considers them equal (with errors being thrown if other forms of resolution contradict the form that considers them equal…i.e. provider A and provider B agree on an instrument’s ISIN but disagree on its ticker).

Read on for the details of how we solve this seemingly n2 problem with a linear solution.

Given N models across providers of a given asset class (say, equities), there are N2 potential checks that I need to do to properly “resolve” all models, since any model can resolve to any other model in the system (and I potentially do want to attempt to resolve a model from provider A to other models from provider A to do error checking, since I may consider it invalid for a provider to provide the same model twice). Obviously we would like to do better than this, and we can, assuming that most models do not resolve to each other.

Envision the set of all (model, provider) pairs as the set of nodes on a graph. Two models from different providers that resolve to each other can be represented by an edge between two nodes in the graph. If the number of providers k is small relative to N, the number of resolution forms for a given asset class is small, and our data is valid, we can come up with an algorithm that solves our problem in N time as follows:

  1. For every form of resolution, ask the data providers for all the data necessary for resolution to take place. For ticker/country resolution, with our data provider interfaces, this gives us a map from every (model, provider) pair to its ticker and country.
  2. We can then invert this map, giving us a map from (ticker, country) pairs to a set of (model, provider) pairs. Note that the values in the inverted map do have to be sets, since there can be multiple (model, provider) pairs with the same ticker and country (indeed, this is expected if ANY models can be resolved between providers).
  3. Then, for every model, for each resolution form, we can look up the relevant properties for that model, and then look up in the inverse map any models that are equivalent to it. This tells us what edges to add to our (model, provider) graph.

We’re essentially building up an in-memory, inverted index of the relevant data each model is giving us. The amortized O(1) lookups that the hashtable-backed maps provide allows us to trade the O(N2) complexity for something more like O(N).


N checks, rather than N2 (assuming that k is trivial compared to N).

Once we’ve done this for every model each connected component of our graph should correspond to one model to be added into our final system. Since the connected components of a graph can be computed in time linear to the number of nodes, we can compute all the final models in linear time. And what is nice is that the maps give us the ability to quickly post-process our data to look for errors. If any two models in a given connected component come from the same provider, this is an error (either the provider has incorrect data, or it is modeling the data improperly). If two models from two different providers resolve, but have conflicting data for a given resolution form, this is also an error. Note that since providers do not have to provide data for every resolution form, it is possible that k models from different providers that resolve together do not form a k-clique on the graph.

Writing data providers is not always easy. There are many data sources out there that are messy, and properly modeling real world data in code can be quite challenging. That’s why it is important to come up with sound, efficient resolution logic that fails noisily, and tells the engineer building the provider when they are and are not playing nicely with the rest of the system.

2 Responses to “Model Resolution in Palantir Finance: avoiding N2”

  1. Ed Says:

    I think you may find that you are missing a layer in your model. There are companies and there are securities. A company may have multiple securities. Price data is a property of a security. A Balance Sheet is a property of a company. In your example, Google the company has a balance sheet that it reports for a given fiscal period. Google’s capital structure may include multiple classes of stock plus preferred stock, warrants, etc. as equity and various bonds and notes as debt instruments. Each of those securities will have prices that are determined by the exchange (if any) that they trade on.

  2. Andy Says:

    Hey Ed,

    You’re correct in that it’s often incorrect to resolve two securities just because they’re associated with the same company. The actual fields we cited as being resolved on are just examples, and the proper fields to resolve likely depend on what your dataset looks like. We’ve made the modeling and resolution layer pluggable within our platform, so you can define the properties you wish to resolve on, and the system will act based on what’s been defined.

Leave a Reply


Palantir