Archive for the ‘software engineering’ Category

JavaInvoke allows you to spawn additional Java VMs during testing

July 28th, 2009 | Ari Gesher

junit success

Here at Palantir we use test-driven development (or TDD for short). Integrated tools like Eclipse and JUnit simplify writing and running unit tests. However, once you need to test a broader swath of functionality, it’s time to write functional, integration, and system tests. While these are technically not ‘unit tests’, the framework that JUnit provides is the same infrastructure you want to leverage for writing these more involved kinds of tests.

When you’re developing enterprise software, functional testing often means getting your clients to talk to your servers. For the main Palantir Government product, we integrate the process of bringing the server up and down with the Ant scripts that run our automated unit tests: our testing tasks bring up the server, run the test suite, and then kill the server. This works great and produces nice results.

When I started working on our authentication server, the pattern that we had used before didn’t work for me. While the Palantir Government tests ran with a single, static configuration file, I needed to run the authentication server with multiple configurations in the course of running through all the different functional tests. I determined that I needed a way to programmatically bring the server up and down for testing. In JUnit parlance, I needed a way to programmatically launch the server component as part of the setUp() method for my unit tests and stop it in tearDown().
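
As a rough illustration of the idea (and not the JavaInvoke code itself, which is behind the jump), here is a minimal sketch of starting and stopping a child JVM from test code. The ChildJvm helper, its method names, and the com.example.AuthServer class are all hypothetical, and the sketch assumes a modern JDK (Java 8 or later) rather than the JDK available when this post was written.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not the JavaInvoke API): runs a main class in a child JVM. */
final class ChildJvm {
    private ChildJvm() {}

    /** Builds a command line that reuses the current JVM's java binary and classpath. */
    static List<String> command(String mainClass, String... args) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        Collections.addAll(cmd, args);
        return cmd;
    }

    /** Starts the child JVM; its stdout and stderr are forwarded to the parent's. */
    static Process start(String mainClass, String... args) throws IOException {
        return new ProcessBuilder(command(mainClass, args)).inheritIO().start();
    }

    /** Asks the child to exit, then kills it if it has not exited within ten seconds. */
    static void stop(Process process) throws InterruptedException {
        process.destroy();
        if (!process.waitFor(10, TimeUnit.SECONDS)) {
            process.destroyForcibly();
        }
    }

    public static void main(String[] args) {
        // Only prints the command; com.example.AuthServer is a made-up class name.
        System.out.println(String.join(" ",
                command("com.example.AuthServer", "--config", "ldap.properties")));
    }
}
```

In a JUnit test, setUp() would call ChildJvm.start(...) with the configuration that the test needs and keep the returned Process in a field, and tearDown() would pass that Process to ChildJvm.stop(...). A real harness would also need to wait until the server is actually accepting connections before the test body runs.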

With my itch-to-scratch firmly in hand (or some other mixed metaphor), I set out to figure out how to invoke new Java processes from inside a unit test. The solution I came up with (with source code and examples) is after the jump.
Read the rest of this entry »

Data Model Change Eventing

May 27th, 2009 | Derek Cicerone

One of the early architectural challenges that we faced in building the Palantir Finance product was coming up with a good design for firing events from data models to their listeners. Our product has many different concepts, such as charts, portfolios, and indices, which are all maintained by different developers. Initially, each developer had their own system for firing events when a data model changed. This quickly became a drag on development as tools became more integrated, because we had to learn each other’s event methodologies and translate between the different systems.

The solution was to select a single event firing system. We wanted something that was easy to use yet powerful enough to express all the changes that might be made to a data model. Java’s Property Change Support (PCS) was a good fit because it can support arbitrary events in a very lightweight fashion.
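
For readers who have not used it, here is a minimal sketch of a data model that fires events through java.beans.PropertyChangeSupport. The PortfolioModel class and its "name" property are hypothetical and are not taken from the Palantir Finance codebase; the full post describes the actual implementation.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

/** Hypothetical data model; the class and property names are illustrative only. */
final class PortfolioModel {
    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
    private String name = "";

    void addPropertyChangeListener(PropertyChangeListener listener) {
        pcs.addPropertyChangeListener(listener);
    }

    void removePropertyChangeListener(PropertyChangeListener listener) {
        pcs.removePropertyChangeListener(listener);
    }

    String getName() {
        return name;
    }

    void setName(String newName) {
        String oldName = this.name;
        this.name = newName;
        // No event is fired when the old and new values are equal and non-null.
        pcs.firePropertyChange("name", oldName, newName);
    }

    public static void main(String[] args) {
        PortfolioModel model = new PortfolioModel();
        model.addPropertyChangeListener(evt -> System.out.println(
                evt.getPropertyName() + " changed to " + evt.getNewValue()));
        model.setName("Tech");
        model.setName("Tech"); // same value again, so the listener is not called
    }
}
```

Because every model exposes the same listener interface, a tool that observes charts, portfolios, and indices only has to understand one event type.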

Read on for details of our implementation…
Read the rest of this entry »

The Pokémon Problem: a new anti-pattern

March 19th, 2009 | John Carrino

Gotta catch 'em all!

It’s always fun to release a new piece of jargon into the wild. I’ve run into a number of bugs in our codebase that are caused by an anti-pattern I’d like to dub The Pokémon Problem.

Much like the game of Whac-a-Mole, this is a class of bugs where fixing every occurrence does not prevent the bug from returning in new code: any later code change can easily re-introduce an instance of the bug into the codebase. Even if you “catch ’em all”, nothing prevents someone else from introducing new Pokémon bugs later.

Not only is this bug easy to re-introduce, but it can also be hard to find all existing instances of the pattern. Tools like Eclipse make it easier to track down every place a piece of code is called, but sometimes you’re looking for things that happen in a certain sequence, which those tools don’t search for well, and dynamic invocation mechanisms like Java Reflection can make an exhaustive search impossible. This type of bug is also resistant to automated refactoring: changing the protocol for dealing with this corner of your code requires you to track down every place it is touched and refactor each one manually. It generally signals insufficient separation of concerns.

In general, this anti-pattern results from APIs that make the caller responsible for managing the state of resources that the API owns. One example is an object that requires the caller to run an initialization method before calling any other method on it. These bugs get even more insidious when doing things in the wrong order does not cause a hard failure (like throwing an exception) but instead creates some subtle corruption that either goes unnoticed or causes subsequent calls to fail unexpectedly.
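
To make the initialization example concrete, here is a small sketch. FragileCache and SafeCache are hypothetical classes invented for this illustration, and the factory-method fix shown is just one common remedy, not necessarily one of the strategies covered in the full post.

```java
import java.util.HashMap;
import java.util.Map;

final class PokemonProblemDemo {
    public static void main(String[] args) {
        FragileCache fragile = new FragileCache();
        try {
            fragile.put("pikachu", "electric"); // the caller forgot init()
        } catch (NullPointerException e) {
            System.out.println("fragile API failed: caller skipped init()");
        }

        SafeCache safe = SafeCache.create(); // there is no separate init step to forget
        safe.put("pikachu", "electric");
        System.out.println(safe.get("pikachu"));
    }
}

/** Anti-pattern: every caller must remember to call init() before anything else. */
final class FragileCache {
    private Map<String, String> entries; // stays null until init() runs

    void init() {
        entries = new HashMap<>();
    }

    void put(String key, String value) {
        entries.put(key, value); // throws NullPointerException if init() was skipped
    }
}

/** One possible fix: the only way to get an instance returns one that is ready to use. */
final class SafeCache {
    private final Map<String, String> entries = new HashMap<>();

    private SafeCache() {}

    static SafeCache create() {
        return new SafeCache();
    }

    void put(String key, String value) {
        entries.put(key, value);
    }

    String get(String key) {
        return entries.get(key);
    }
}
```

With FragileCache, every new call site is another chance to skip init(). With SafeCache, the class owns that state, so there is nothing for callers to get wrong.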

Read on for some strategies on dealing with the Pokémon problem.
Read the rest of this entry »

Palantir Monitoring Server: where build beats buy

February 23rd, 2009 | Eric Wong

Graph of CPU usage over time

Distributed systems are complex. Getting them right is hard, and when things don’t go right, it can be difficult to understand what went wrong. In an environment like ours, a good monitoring system isn’t just nice to have; it’s a critical component necessary for understanding behavior and diagnosing problems.

We had three primary goals for the initial monitoring system: graphing of time-series data, alerting on event triggers, and notifications to users. Furthermore, as a product company, we had a design goal of a simple, intuitive (yet powerful and flexible) solution.

Before starting, we did a quick survey of existing open-source packages. Unfortunately, nothing we found quite fit our needs, given our specific requirements for security, protocol, licensing, and ease of integration with our product. Given that, we made the decision to forge ahead and build our own; we try not to re-invent the wheel, but it seemed to make sense here.

For an in-depth look at the architecture of the Monitoring Server and components we used to build it, read on…

Read the rest of this entry »

Deploying a distributed system

October 7th, 2008 | Bob McGrew

Distributed systems diagram

At Palantir, we write software that gets deployed at each client, integrated across their sensitive data sets, and maintained and administered by that client’s in-house admins. Most deployed enterprise software is run on a single beefy box: consider wikis, blogging systems, bug tracking systems, or practically any client/server or web client software used today. On the other hand, most enterprise software that runs as a distributed system is hosted: Salesforce.com, Google Apps, or any approach that sells software as a service. What’s fairly unusual about our software is that it’s deployed as a distributed system at each client.

Distributed systems are hard to build and hard to maintain. As long as that distributed system is built and maintained in-house, however, you have a number of advantages:

  • The administrators are full-time product experts who are focused on the mission of keeping your system available and responsive.
  • The development organization can build internal tools for the administrators that only have to be “good enough”, and the developers can step in if necessary.
  • It’s easy to get feedback on how the system performs, because there are no sensitivity, privacy, or legal constraints.
  • A single, large deployment allows you to optimize your hardware purchasing and amortize installation headaches across a large number of machines.

This is all great, of course, and if you can host and maintain your distributed system yourself, I’d highly recommend it. Sometimes, however, it’s just not possible. At Palantir, the client data we work with is so sensitive that even we cannot see it, except under very strictly controlled circumstances. It’s also so large that the bandwidth limitations of pushing it into a system hosted by us would be prohibitive.

So suppose that you have to deploy your distributed system in a customer datacenter with external parties maintaining the system. What do you need to consider? In this post, I’ll go into a number of key points that we have faced and addressed at Palantir.

Read the rest of this entry »

