Archive for the ‘coding’ Category

A rigorous friction model for human-computer symbiosis

June 2nd, 2010 | Asher

This is a response to Ari’s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible… let’s take a look:

We are attempting to understand the total analytic capability for a given task a of a human-computer team. Analytic capability in this case probably means:

eq1(1)

Where A is the answer to the analytic problem in question and tA is the time needed to arrive at the answer based on the inputs available. In the case of chess, A could be the optimum next move given all previous information and tA would be how long it takes to decide on this move.

Read on for a look at how this generalizes in human-computer symbiotic systems.
Read the rest of this entry »

Fun with jMock

November 22nd, 2009 | Steve

Here at Palantir, a lot of our automatic tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much like a production environment. This makes intuitive sense because it’s a faithful approximation of how the system will run in the field.

However, there are some disadvantages to this:

  • Full-pass tests don’t always localize the problem. Tests on a client class might fail even if it was the service that behaved incorrectly.
  • These full-pass tests are relatively slow. Client code is running against an actual remote service. If a client is being tested, the server code still has to do work — sometimes a lot of work — even if that isn’t the focus of the test.
  • The constraints of the test are loose. Full-chain tests can mostly only see whether the operation finished correctly. It’s much harder to figure out whether the operation was done efficiently and without making unnecessary service calls.
  • They’re very little setup flexibility. If you want an RPC to return a specific value, you have little choice but to have your test get the service into a state where it can return that value. This is easy in some cases, but prohibitively difficult in others.
  • Client tests are forced to share any non-determinism leaked from the service. For example, under real conditions, a request to call A might respond before call B, and sometimes the other way around. This can result in flaky tests or tests that don’t always simulate the conditions you want to exercise.

What’s to be done? Fortunately, there’s an option that handles these cases elegantly. We also test with jMock, a library that dynamically generates mock objects from arbitrary interfaces. These mock objects can be configured to check that particular methods are called with particular inputs a particular number of times, and then give prescribed responses.

Hit the link to see a concrete example of jMock in action.
Read the rest of this entry »

Palantir: search with a twist (part two: realtime indexing and security)

October 27th, 2009 | Ari

magnifying glass

[A number of weeks ago, we published a post on the search technology used by Palantir. That post covered raising the memory efficiency of a couple of operations. This is part two of that series.]

The most familiar use of search engines is to index documents made available on the Internet via the hypertext transfer protocol. Forgotten names like AltaVista, names not-yet-really-learned like Bing, and, of course, Google come to mind.

This one, massive use case has a couple of properties that I’d like to highlight:

  • Asynchronous indexing and querying – web search engines tend to use crawlers and indexers to build up an index of the web. After each crawl is finished, the new index is brought online for use by the query engine.
  • Lack of access controls – all the data in the index is available to any query. In fact, most queries are (from the standpoint of the index) completely anonymous.

Palantir: not a web search engine

Search technology is just one part of what makes up a Palantir system. For us, it’s a way to quickly retrieve Palantir objects in a Palantir system, it’s not the whole of the application.

I’d like to highlight a couple of differences from the web search engine case. A Palantir system needs the following properties:

  • Realtime indexing and querying – we need information to be available immediately as it changes in the system.
  • Leak-proof access controls – we need the search engine to help us make sure that we don’t have information leaking across access control boundaries.

Hit the link to read more about these topics.
Read the rest of this entry »

Palantir: search with a twist (part one: memory efficiency)

August 13th, 2009 | Ari

magnifying glass

A Palantir cluster seamlessly integrates many pieces of proven technology. One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and many, many documents as results. We want to leverage the inverted index capabilities of Lucene, but our data access patterns are a bit different than the typical use case: we need things like pervasive range-querying, different types of relevance, and dynamic views of the data based on security constraints. So in building our data platform, we’ve run into some interesting challenges that are pretty unique in the information retrieval realm, specifically:

  1. Raising memory efficiency
  2. Real-time indexing
  3. Preventing information leaks across access boundaries in an efficient manner

I’ll cover (1) in this post and (2) and (3) in a later post, due out in about two weeks.

Hit the link and we’ll delve into this topic.
Read the rest of this entry »

JavaInvoke allows you to spawn additional Java VMs during testing

July 28th, 2009 | Ari

junit success

Here at Palantir we use test-driven development (or TDD for short). Integrated tools like Eclipse and JUnit simplify writing and running unit tests. However, once you need to test a broader swath of functionality, it’s time to write functional, integration, and system tests. While technically not ‘unit testing’, the testing framework that JUnit provides is basically the same infrastructure that you want to leverage for writing these more involved types of testing.

When you’re developing enterprise software, functional testing often means getting your clients to talk to your servers. For the main Palantir Government product, we integrate the process of bringing the server up and down with the Ant scripts that run our automated unit tests: our testing tasks bring up the server, run the test suite, and then kill the server. This works great and produces nice results.

When I started working on our authentication server, the pattern that we had used before didn’t work for me. While the Palantir Government tests ran with a single, static configuration file, I needed to run the authentication server with multiple configurations in the course of running through the all the different functional tests. I determined that I needed a way to programmatically bring the server up and down for testing. In JUnit parlance, I needed a way to programmatically launch the server component as part of my setup() function for my unit tests and stop it in my teardown().

With my itch-to-scratch firmly in hand (or some other mixed metaphor), I set out to figure out how to invoke new Java processes from inside a unit test. The solution I came up with (with source code and examples) after the jump.
Read the rest of this entry »

The MultiSnake Challenge

July 6th, 2009 | Nick

multisnake game

“Freaking lag!” It had started to become a common refrain around the developer pit. Listed as a project on a candidate’s resume, MultiSnake was a game that we had started to play during our coding breaks. The game was really quite fun — it was easy to play, games were short, and its multi-player nature fostered great competition. The only real drawback was that we seemed to experience network lag. There was nothing more infuriating than having your long snake die by running straight into a completely avoidable wall because the game lagged and didn’t respond to your keyboard commands in time. During one of our particularly lag-heavy games, someone yelled out a gripe that would change our MultiSnaking days for good: “Man, we could totally write this game ourselves, in our app.”

Read the rest of this entry »

Data Model Change Eventing

May 27th, 2009 | DerekC

One of the early architectural challenges that we faced in building the Palantir Finance product was coming up with a good design for firing events from data models to their listeners. There are many different concepts in our product such as charts, portfolios, and indices which are all maintained by different developers. Initially, each developer had their own system for firing events when a data model changed. This quickly became a drag on development as tools became more integrated because we had to learn each others’ event methodologies and translate between the different systems.

The solution was to select a single event firing system. We wanted something that was easy-to-use yet powerful enough to express all the changes that might be made to a data model. Java’s Property Change Support (PCS) was a good fit because it can support arbitrary events in a very lightweight fashion.

Read on for details of our implementation…
Read the rest of this entry »

Model-View-Adapter

April 20th, 2009 | Kevin

I used to think I understood MVC. In undergraduate CS programs, MVC is taught as an off-the-shelf pattern, explained once and then ready for use in the real world. Wikipedia also makes it seem pretty simple:

Model–View–Controller (MVC) is an architectural pattern used in software engineering. Successful use of the pattern isolates business logic from user interface considerations, resulting in an application where it is easier to modify either the visual appearance of the application or the underlying business rules without affecting the other. In MVC, the model represents the information (the data) of the application; the view corresponds to elements of the user interface such as text, checkbox items, and so forth; and the controller manages the communication of data and the business rules used to manipulate the data to and from the model.

They go on to show the classic triangle diagram and how it’s baked into various GUI and web frameworks. There’s only one clause in the entire article that hints at something deeper: “Though MVC comes in different flavors…”

Different flavors indeed. In fact MVC is not just a pattern but a whole family of patterns: MVC, MVA, MVP, PAC, Model-Delegate…. It very quickly gets very hairy.

In this article I want to describe one of MVC’s lesser-known variants, the Model-View-Adapter (MVA) pattern, and talk about its advantages over traditional MVC in the context of a Java Swing application.

Read the rest of this entry »

The Pokémon Problem: a new anti-pattern

March 19th, 2009 | John C

Gotta catch 'em all!

It’s always fun to release a new piece of jargon into the wild. I’ve run into a number of bugs in our codebase that caused by an anti-pattern I’d like to dub The Pokémon Problem.

Much like the game of Whac-a-Mole, this is a class of bugs where fixing every occurrence does not prevent the bug from returning in new code: it is easy for code delta to result in an instance of the bug being re-introduced into the code base. Even if you “catch ‘em all“, nothing prevents someone else from introducing new Pokémon bugs later.

Not only is this bug easy to re-introduce, but it sometimes can be hard to find all currently existing instances of this pattern. Although tools like Eclipse make it easier to track down all the places that code is called, sometimes you’re looking for things that happen in a certain sequence (which tools like Eclipse don’t do a good job of searching for) and dynamic invocation mechanisms like Java Reflection can sometimes make it impossible to be exhaustive. This type of bug is also resistant to automated refactoring: changing the protocol of dealing with this corner of your code will require you to track down all places it was touched and manually refactor them. It generally signals a failure to use sufficient separation of concerns.

In general, this anti-pattern is a result of APIs that require the caller to be responsible for state management of resources that the API owns. This can include things like an object that requires the caller to have run an initialization method before calling any other method on the object. These bugs get even more insidious when a failure to do things in the right order does not cause a hard failure (like throwing an exception) but instead creates some sort of subtle corruption that may not be noticed or cause subsequent calls to fail unexpectedly.

Read on for some strategies on dealing with the Pokémon problem.
Read the rest of this entry »

Model Resolution in Palantir Finance: avoiding N2

February 2nd, 2009 | Andy


N2, with N = 8

One of the big challenges in Palantir Finance comes when integrating data from multiple data providers. When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers. Each data provider defines a set of “models” that it supports. These models can be things like equities, currencies, futures, options, or even new types that the providers themselves define.

The major challenge occurs when multiple providers define models that represent the same real-world entity. Provider A might know about Google, have basic open/high/low/close data for the stock, and know its ticker, country, and ISIN. Provider B might also provide a Google model, have balance sheet data, and know its country, exchange, and ISIN. We want to expose only one Google model to the user, however, and so we need a means of resolving the two Googles together – recognizing that they’re the same instrument – and adding just one equity to the system that encompasses both.

Resolution logic can be fairly complicated. For equities, for example, there are several different ways in which resolution can take place. If two equities have identical ISINs, we can be pretty confident they match, since those identifiers are declared as globally unique. If two equities have the same ticker and the same country of exchange, we might also consider that a match, though perhaps of weaker quality. Two models resolve to each other if any form of resolution considers them equal (with errors being thrown if other forms of resolution contradict the form that considers them equal…i.e. provider A and provider B agree on an instrument’s ISIN but disagree on its ticker).

Read on for the details of how we solve this seemingly n2 problem with a linear solution.
Read the rest of this entry »


Palantir