Data Model Change Eventing

May 27th, 2009 | DerekC

One of the early architectural challenges that we faced in building the Palantir Finance product was coming up with a good design for firing events from data models to their listeners. There are many different concepts in our product such as charts, portfolios, and indices which are all maintained by different developers. Initially, each developer had their own system for firing events when a data model changed. This quickly became a drag on development as tools became more integrated because we had to learn each others’ event methodologies and translate between the different systems.

The solution was to select a single event firing system. We wanted something that was easy-to-use yet powerful enough to express all the changes that might be made to a data model. Java’s Property Change Support (PCS) was a good fit because it can support arbitrary events in a very lightweight fashion.
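As a rough illustration of what PCS-based eventing looks like, here is a minimal sketch built on `java.beans.PropertyChangeSupport`. The `PortfolioModel` class and its `name` property are hypothetical and are not taken from the Palantir Finance codebase; only the `java.beans` calls are standard.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical data model; the class and property names are illustrative only.
class PortfolioModel {
    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
    private String name = "";

    void addPropertyChangeListener(PropertyChangeListener listener) {
        pcs.addPropertyChangeListener(listener);
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        pcs.removePropertyChangeListener(listener);
    }

    String getName() {
        return name;
    }

    void setName(String newName) {
        String oldName = this.name;
        this.name = newName;
        // PropertyChangeSupport skips the event when old and new values
        // are equal and non-null, so listeners only see real changes.
        pcs.firePropertyChange("name", oldName, newName);
    }
}

class PortfolioModelDemo {
    public static void main(String[] args) {
        PortfolioModel model = new PortfolioModel();
        model.addPropertyChangeListener(evt ->
            System.out.println(evt.getPropertyName() + ": "
                + evt.getOldValue() + " -> " + evt.getNewValue()));
        model.setName("Tech holdings");
        model.setName("Tech holdings"); // same value again: no event is fired
    }
}
```

Because every model fires the same event type through the same mechanism, a listener written for one model can be attached to any other without translation code.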

Read on for details of our implementation…

Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.

May 22nd, 2009 | Bob

[Image: fake clearance screen]

At Palantir, we work in Silicon Valley, read High Scalability, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading: in design discussions, it’s received knowledge that “disk is cheap” or “CPU is cheap.” For a web company with a deployment in a commercial data center (or its own data center), this received knowledge is correct. But for a company that ships distributed systems instead of hosting them, and for whom the deployment environment is the kind of locked-down server room in which classified data can reside, these assumptions couldn’t be more false.

At Palantir, we are almost never able to host our customers’ data – typically, as the data is very sensitive, we are not even allowed to see it! Our customers’ highly sensitive data has to reside in a Sensitive Compartmented Information Facility, or SCIF – a building which has been built to be resistant to attempts to access the information within, whether through active or passive measures. The network inside a SCIF is physically separated – “airgapped” – from the public Internet to prevent information leakage. As the entire rationale for such facilities is to prevent information leakage, moving information into or out of one is a tightly regulated process, almost always requiring a human to be in the loop.

Model-View-Adapter

April 20th, 2009 | Kevin

I used to think I understood MVC. In undergraduate CS programs, MVC is taught as an off-the-shelf pattern, explained once and then ready for use in the real world. Wikipedia also makes it seem pretty simple:

Model–View–Controller (MVC) is an architectural pattern used in software engineering. Successful use of the pattern isolates business logic from user interface considerations, resulting in an application where it is easier to modify either the visual appearance of the application or the underlying business rules without affecting the other. In MVC, the model represents the information (the data) of the application; the view corresponds to elements of the user interface such as text, checkbox items, and so forth; and the controller manages the communication of data and the business rules used to manipulate the data to and from the model.

They go on to show the classic triangle diagram and how it’s baked into various GUI and web frameworks. There’s only one clause in the entire article that hints at something deeper: “Though MVC comes in different flavors…”

Different flavors indeed. In fact MVC is not just a pattern but a whole family of patterns: MVC, MVA, MVP, PAC, Model-Delegate…. It very quickly gets very hairy.

In this article I want to describe one of MVC’s lesser-known variants, the Model-View-Adapter (MVA) pattern, and talk about its advantages over traditional MVC in the context of a Java Swing application.


The Pokémon Problem: a new anti-pattern

March 19th, 2009 | John C

[Image: Gotta catch 'em all!]

It’s always fun to release a new piece of jargon into the wild. I’ve run into a number of bugs in our codebase that are caused by an anti-pattern I’d like to dub The Pokémon Problem.

Much like the game of Whac-a-Mole, this is a class of bugs where fixing every occurrence does not prevent the bug from returning in new code: it is easy for later code changes to re-introduce an instance of the bug into the code base. Even if you “catch ’em all”, nothing prevents someone else from introducing new Pokémon bugs later.

Not only is this bug easy to re-introduce, but it sometimes can be hard to find all currently existing instances of this pattern. Although tools like Eclipse make it easier to track down all the places that code is called, sometimes you’re looking for things that happen in a certain sequence (which tools like Eclipse don’t do a good job of searching for) and dynamic invocation mechanisms like Java Reflection can sometimes make it impossible to be exhaustive. This type of bug is also resistant to automated refactoring: changing the protocol of dealing with this corner of your code will require you to track down all places it was touched and manually refactor them. It generally signals a failure to use sufficient separation of concerns.

In general, this anti-pattern is a result of APIs that require the caller to be responsible for state management of resources that the API owns. This can include things like an object that requires the caller to have run an initialization method before calling any other method on the object. These bugs get even more insidious when a failure to do things in the right order does not cause a hard failure (like throwing an exception) but instead creates some sort of subtle corruption that may not be noticed or cause subsequent calls to fail unexpectedly.
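To make the shape of the problem concrete, here is a small hypothetical sketch. `FragileConnection` and `SafeConnection` are invented names, and the factory-method fix shown is one common mitigation, not necessarily one of the strategies the full post goes on to describe.

```java
// Anti-pattern: every caller must remember to call open() before query().
class FragileConnection {
    private boolean opened;

    void open() {
        opened = true;
    }

    String query(String q) {
        // Forgetting open() does not fail hard; it quietly returns an
        // empty result, which is the subtle-corruption case described above.
        return opened ? "result for " + q : "";
    }
}

// One possible fix: make the uninitialized state impossible to observe.
final class SafeConnection {
    private SafeConnection() {
        // All required setup happens here, owned by the API itself.
    }

    // The only way to obtain an instance is already initialized.
    static SafeConnection open() {
        return new SafeConnection();
    }

    String query(String q) {
        return "result for " + q;
    }
}

class PokemonProblemDemo {
    public static void main(String[] args) {
        FragileConnection fragile = new FragileConnection();
        // Bug: open() was never called, yet nothing complains.
        System.out.println("fragile: [" + fragile.query("x") + "]");

        SafeConnection safe = SafeConnection.open();
        System.out.println("safe: [" + safe.query("x") + "]");
    }
}
```

With the second design there is no call sequence for a new caller to get wrong, so there are no further instances to catch.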

Read on for some strategies on dealing with the Pokémon problem.

Palantir Config Server: lining up the ducks

March 6th, 2009 | Khan

At Palantir, we build distributed software. When deployed at a customer site, our platform consists of several servers running on, and distributed across, a cluster of machines. When I first joined the company, deploying and managing our platform was tedious and time-consuming. Need to install servers? One by one, log in to the machines where they need to go, lay down their requisite files, and manually configure them so that they can work together. Have to bring down a deployment for scheduled maintenance? One by one, and in the correct order, log in to the machines where the servers reside and shut them down. Want to change the private keys and certificates used to secure communication between servers? Well, you get the point.

From a customer perspective, the complexity associated with the administration of distributed software represents a significant challenge. Not providing tools to help reduce that complexity impacted the overall usability of our platform. Furthermore, from a Palantir perspective, a non-trivial portion of our resources were being devoted to deploying and managing instances of our platform, both externally (by Forward Deployed Engineers working directly with our customers) and internally (by development, QA and support staff working to maintain and improve our product). Could we be more efficient? No doubt. Given our intense focus on customer satisfaction and the desire to grow / scale our business, action was necessary.

To see how we solved this problem, read on.

Palantir Monitoring Server: where build beats buy

February 23rd, 2009 | Eric W.

[Image: graph of CPU usage over time]

Distributed systems are complex. Getting them right is hard, and when things don’t go right, it can be difficult to understand what went wrong. In an environment like ours, a good monitoring system isn’t just nice to have; it’s a critical component necessary for understanding behavior and diagnosing problems.

We had three primary goals for the initial monitoring system: graphing of time-series data, alerting on event triggers, and notifications to users. Furthermore, as a product company, we had a design goal of a simple, intuitive (yet powerful and flexible) solution.

Before starting, we did a quick survey of existing open-source packages. Unfortunately, nothing we found quite fit our needs, given our specific requirements of security, protocol, licensing, and integrability into our product. Given that, we made the decision to forge ahead and build our own; we try not to re-invent the wheel but it seemed to make sense here.

For an in-depth look at the architecture of the Monitoring Server and components we used to build it, read on…


Model Resolution in Palantir Finance: avoiding N²

February 2nd, 2009 | Andy


[Image: N², with N = 8]

One of the big challenges in Palantir Finance comes when integrating data from multiple data providers. When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers. Each data provider defines a set of “models” that it supports. These models can be things like equities, currencies, futures, options, or even new types that the providers themselves define.

The major challenge occurs when multiple providers define models that represent the same real-world entity. Provider A might know about Google, have basic open/high/low/close data for the stock, and know its ticker, country, and ISIN. Provider B might also provide a Google model, have balance sheet data, and know its country, exchange, and ISIN. We want to expose only one Google model to the user, however, and so we need a means of resolving the two Googles together – recognizing that they’re the same instrument – and adding just one equity to the system that encompasses both.

Resolution logic can be fairly complicated. For equities, for example, there are several different ways in which resolution can take place. If two equities have identical ISINs, we can be pretty confident they match, since those identifiers are declared as globally unique. If two equities have the same ticker and the same country of exchange, we might also consider that a match, though perhaps of weaker quality. Two models resolve to each other if any form of resolution considers them equal (with errors being thrown if other forms of resolution contradict the form that considers them equal…i.e. provider A and provider B agree on an instrument’s ISIN but disagree on its ticker).
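To see why pairwise comparison is avoidable, consider the following sketch. It is an assumption on my part, not the approach the full post describes: it indexes each record by its resolution keys in a hash map and merges records that share a key using a union-find structure, so each record is visited once instead of being compared against every other record. `ProviderEquity`, `EquityResolver`, and the key formats are hypothetical, and the contradiction checks mentioned above are left out for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical provider record; null means the provider lacks that field.
final class ProviderEquity {
    final String provider, isin, ticker, country;

    ProviderEquity(String provider, String isin, String ticker, String country) {
        this.provider = provider;
        this.isin = isin;
        this.ticker = ticker;
        this.country = country;
    }
}

final class EquityResolver {
    /** Returns one group id per record; equal ids mean "same instrument". */
    static int[] resolve(List<ProviderEquity> records) {
        int[] parent = new int[records.size()];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i;
        }

        // Maps each resolution key to the first record that carried it.
        Map<String, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < records.size(); i++) {
            ProviderEquity r = records.get(i);
            if (r.isin != null) {
                link(parent, firstSeen, "isin:" + r.isin, i);
            }
            if (r.ticker != null && r.country != null) {
                link(parent, firstSeen, "tc:" + r.ticker + "|" + r.country, i);
            }
        }

        int[] groups = new int[parent.length];
        for (int i = 0; i < parent.length; i++) {
            groups[i] = find(parent, i);
        }
        return groups;
    }

    // Merges record i with whichever record first carried the same key.
    private static void link(int[] parent, Map<String, Integer> firstSeen,
                             String key, int i) {
        Integer other = firstSeen.putIfAbsent(key, i);
        if (other != null) {
            parent[find(parent, other)] = find(parent, i);
        }
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving keeps trees shallow
            i = parent[i];
        }
        return i;
    }
}
```

Calling `EquityResolver.resolve` on records from two providers that share an ISIN, plus a third that shares only ticker and country with the first, puts all three in one group. A production version would also have to detect disagreements, such as matching ISINs with conflicting tickers, and raise an error.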

Read on for the details of how we solve this seemingly N² problem with a linear solution.

Using Palantir to implement the TARP

January 22nd, 2009 | AlexF

We talk often with our contacts in finance and intelligence, and an increasingly common subject is the U.S. Government’s Troubled Asset Relief Program (TARP, run by the Treasury Department). Our friends see the large problems facing the TARP and the Federal Reserve, and have been asking how our technology can help.

Some of the problems are out of our hands, but many others are solvable with the proper analytics. Taking a closer look at the task before TARP, we noticed that many challenges mirror those facing the intelligence community:

  • Entity and relationship data is scattered across many sources in a wide variety of formats; some are structured, some are unstructured.
  • Entity structure and relationships are not always known upfront, so the solution must adapt to new data structures on the fly.
  • It is costly, time-consuming, and unnecessary to impose one structure on the entire industry.
  • Scalability is a must: millions of mortgages have been securitized into hundreds of thousands of entities.
  • Sensitive, private data requires sophisticated access control and knowledge management — understanding who is accessing which data, what the organization knows, when it was known, and how it was discovered.
  • Specialists from different fields and geographical regions must be able to collaborate effectively.

Palantir’s technology already solves these problems for the intelligence community. Our dynamic ontology makes it easy to import TARP data and entities, so we’ve created a short video using Palantir that shows the power of our approach. We analyze individual mortgage loans, mortgage-backed securities comprising these loans, and institutions holding tranches of the securities:

For more detail on the similarities, click the link to see a detailed breakdown of intelligence vs. TARP workflows.


In the spirit of the season: The Family Giving Tree

December 18th, 2008 | Ari

Palantir is an intense place to work. There are people here around the clock (since developers set their own schedules) and folks and equipment arriving and leaving all the time. We’re a very focused bunch, trying to change the world as fast as we can by creating a whole new class of tools.

However, we’re not just people who build software; we’re sons and daughters, mothers and fathers, and citizens of our community. As we headed into the holidays, Palantir employees decided to give something back: we signed up with a local organization called The Family Giving Tree, a now-national charity that started as an MBA project out of San Jose State University.

The Family Giving Tree is unique in that it allows children to request the presents that they want. In this way, rather than putting money into a black box of a charity, you purchase the gift itself and donate that.

The people of Palantir Technologies purchased over 100 gifts, fulfilling the holiday wishes of the children who asked for them, and made cash donations that will buy gifts for at least 40 more.

Happy Holidays, everyone! We’ll be back next year with more technical articles and information about Palantir.

VisWeek 2008: awards and workflow

December 12th, 2008 | Ari

As we mentioned in an earlier post, Palantir was recently invited to the IEEE’s VisWeek in Dayton, Ohio, and was honored to be invited to participate in the VAST Interactive Challenge as part of VisWeek.

After winning an award for Interactive Visual Analytic Environment, Palantir was one of three teams selected to participate in the interactive session from 73 VAST Challenge entries. For the challenge, we were given a completely new set of data to analyze. We had 30 minutes to import 3 disparate datasets into Palantir, 30 minutes to train an analyst that had never used Palantir, and then 2 hours for the analyst to explore the data.

The data for the challenge came from three different sources, with a set of questions to answer for each set of data. There was an infectious outbreak, a Wikipedia edit war, and an abduction from a city park. Over the three challenges, there were over 100,000 datapoints to analyze. All of the data revolved around a fictitious town in Florida, Barracuda Springs, and was linked to the fictitious cult that was the center of the 2008 VAST Challenge. While two members of our team were importing the three datasets, the third team member was working with our analyst (each of the three teams was given an analyst from a nearby analytical organization). In 30 minutes, our analyst was able to learn how to conduct relational, temporal, geospatial, and statistical analysis in Palantir. After the 30 minutes of training, she was able to easily navigate the Palantir workspace and solve all three challenges. Below is her work (hit the link to check it out).

Her conclusion was that Palantir was “viciously good software” and that she would be asking her boss if they could acquire Palantir for their work. Hit the link below to see screenshots and explanations for one of the challenge workflows.

We really enjoyed the VAST Challenge, and our experience at VisWeek. There were a lot of outstanding papers, posters, and speakers at VisWeek, and we were inspired by many fantastic visualizations that might soon make their way into Palantir’s Finance and Government Platforms. We are also looking forward to the 2009 VAST Challenge!
