Fun with jMock

November 22nd, 2009 | Steve

Here at Palantir, a lot of our automated tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much as it would in production. This makes intuitive sense: it’s a faithful approximation of how the system will run in the field.

However, there are some disadvantages to this:

  • Full-chain tests don’t always localize the problem. Tests on a client class might fail even if it was the service that behaved incorrectly.
  • Full-chain tests are relatively slow. Client code runs against an actual remote service, so when a client is being tested, the server still has to do work (sometimes a lot of work) even if that isn’t the focus of the test.
  • The constraints of the test are loose. Full-chain tests can mostly only see whether the operation finished correctly. It’s much harder to figure out whether the operation was done efficiently and without making unnecessary service calls.
  • There’s very little setup flexibility. If you want an RPC to return a specific value, you have little choice but to get the service into a state where it returns that value. This is easy in some cases, but prohibitively difficult in others.
  • Client tests inherit any non-determinism leaked from the service. For example, under real conditions call A might return before call B, and sometimes the other way around. This can result in flaky tests, or in tests that don’t always simulate the conditions you want to exercise.

What’s to be done? Fortunately, there’s an option that handles these cases elegantly. We also test with jMock, a library that dynamically generates mock objects from arbitrary interfaces. These mock objects can be configured to check that particular methods are called with particular inputs a particular number of times, and then give prescribed responses.
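
jMock builds these mocks for you. To show the underlying mechanism without guessing at jMock’s API, here is a minimal sketch that uses only the JDK’s java.lang.reflect.Proxy: it generates a mock for an arbitrary interface, answers with prescribed responses, fails on any call it wasn’t told to expect, and records calls so a test can check how many times a method ran with particular arguments. QuoteService, PortfolioValuer, and SimpleMock are hypothetical names invented for this illustration; this is not how jMock is implemented.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote service interface that a client class depends on.
interface QuoteService {
    double price(String symbol);
}

// Hypothetical client under test: values a portfolio from per-symbol quotes.
final class PortfolioValuer {
    private final QuoteService quotes;

    PortfolioValuer(QuoteService quotes) {
        this.quotes = quotes;
    }

    double value(Map<String, Integer> holdings) {
        double total = 0;
        for (Map.Entry<String, Integer> h : holdings.entrySet()) {
            total += quotes.price(h.getKey()) * h.getValue();
        }
        return total;
    }
}

// Minimal dynamic mock built on java.lang.reflect.Proxy (not jMock's API).
final class SimpleMock implements InvocationHandler {
    private final Map<List<Object>, Object> responses = new HashMap<>();
    private final List<List<Object>> calls = new ArrayList<>();

    // Generate an implementation of any interface, backed by this handler.
    <T> T mock(Class<T> iface) {
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, this));
    }

    // Prescribe the value returned when `method` is called with exactly `args`.
    void willReturn(Object value, String method, Object... args) {
        responses.put(key(method, args), value);
    }

    // How many recorded calls were made to `method` with exactly `args`.
    int timesCalled(String method, Object... args) {
        List<Object> k = key(method, args);
        int n = 0;
        for (List<Object> call : calls) {
            if (call.equals(k)) n++;
        }
        return n;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        if (method.getDeclaringClass() == Object.class) {
            // Give equals/hashCode/toString identity semantics instead of mocking them.
            switch (method.getName()) {
                case "equals": return proxy == args[0];
                case "hashCode": return System.identityHashCode(proxy);
                default: return "SimpleMock proxy";
            }
        }
        List<Object> k = key(method.getName(), args == null ? new Object[0] : args);
        calls.add(k);
        if (!responses.containsKey(k)) {
            throw new AssertionError("unexpected call: " + k);
        }
        return responses.get(k);
    }

    private static List<Object> key(String method, Object[] args) {
        List<Object> k = new ArrayList<>();
        k.add(method);
        k.addAll(Arrays.asList(args));
        return k;
    }
}
```

A test prescribes responses, runs the client, and then asserts on both the result and the number of service calls, with no server running and no ordering non-determinism. jMock expresses the same constraints declaratively and reports unexpected or unmet calls for you.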

Hit the link to see a concrete example of jMock in action.
Read the rest of this entry »

Palantir: like an operating system for data analysis

November 6th, 2009 | Ari

If you’ve taken the time to peruse the Palantir Government analysis blog, you’ve seen numerous examples of Palantir Government applied to interesting problems, recorded as screen captures of our analysis desktop client. Together they showcase useful, meaningful, and compelling visual and semantic tools being used to analyze a wide range of datasets.

What enabled this analysis? Aside from the obvious hard work of our UI and analysis tools teams, it’s the flexibility and power of the Palantir data platform. More than just a scalable datastore, the Palantir data platforms act as robust and clean abstractions on top of data.

One of the early architecture decisions that we made when building both Palantir Government and Palantir Finance was to separate the respective data platforms from the end-user applications used to actually perform analysis. More than just following the client-server model, this separation made the data servers in both products into generic intelligence infrastructure for analytic problems, with our clients acting as analysis applications on top of those platforms.

And so, one way to look at our data platform is as an operating system for analytic applications. In this post we’ll explore the history of operating systems, understand why they’re so important, and see how the Palantir data servers have the same potential to revolutionize the writing of analysis software that operating systems had for the writing of general-purpose programs.

Read the rest of this entry »

Palantir: search with a twist (part two: realtime indexing and security)

October 27th, 2009 | Ari


[A number of weeks ago, we published a post on the search technology used by Palantir. That post covered raising the memory efficiency of a couple of operations. This is part two of that series.]

The most familiar use of search engines is to index documents made available on the Internet via the hypertext transfer protocol. Forgotten names like AltaVista, names not-yet-really-learned like Bing, and, of course, Google come to mind.

This one massive use case has a couple of properties that I’d like to highlight:

  • Asynchronous indexing and querying – web search engines tend to use crawlers and indexers to build up an index of the web. After each crawl is finished, the new index is brought online for use by the query engine.
  • Lack of access controls – all the data in the index is available to any query. In fact, most queries are (from the standpoint of the index) completely anonymous.
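
The crawl-then-swap pattern in the first bullet can be sketched in a few lines. This is a hypothetical toy, not any real engine’s design: the "index" is just a word-to-URL map, a complete new one is built off to the side, and a single atomic reference swap publishes it. Queries never see a half-built index, but they also never see documents that arrived after the last build, and every query sees every document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical batch-built index: rebuilt from a full crawl, then swapped in atomically.
final class BatchIndex {
    private final AtomicReference<Map<String, Set<String>>> live =
            new AtomicReference<>(Collections.<String, Set<String>>emptyMap());

    // Build a complete new index from crawled pages (url -> text), then publish it.
    void rebuild(Map<String, String> crawledPages) {
        Map<String, Set<String>> fresh = new HashMap<>();
        for (Map.Entry<String, String> page : crawledPages.entrySet()) {
            for (String word : page.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) continue;
                fresh.computeIfAbsent(word, w -> new HashSet<>()).add(page.getKey());
            }
        }
        // Queries switch from the old index to the new one in a single step.
        live.set(Collections.unmodifiableMap(fresh));
    }

    // Anonymous query: everything in the live index is visible to every caller.
    Set<String> search(String word) {
        return live.get().getOrDefault(word.toLowerCase(), Collections.emptySet());
    }
}
```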

Palantir: not a web search engine

Search technology is just one part of what makes up a Palantir system. For us, it’s a way to quickly retrieve Palantir objects in a Palantir system; it’s not the whole of the application.

I’d like to highlight a couple of differences from the web search engine case. A Palantir system needs the following properties:

  • Realtime indexing and querying – we need information to be available immediately as it changes in the system.
  • Leak-proof access controls – we need the search engine to help us make sure that we don’t have information leaking across access control boundaries.
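
For contrast with the web case, here is a toy sketch of both properties together. It is a hypothetical illustration, not Palantir’s implementation, and the group-based access model is an assumption: documents become searchable as soon as they are indexed, and every hit is checked against the document’s allowed groups before it is returned, failing closed when no access list exists.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical realtime, access-controlled index (not Palantir's implementation).
final class SecureIndex {
    // word -> ids of documents containing it; updated in place, no rebuild step.
    private final Map<String, Set<String>> postings = new ConcurrentHashMap<>();
    // document id -> groups allowed to see it.
    private final Map<String, Set<String>> acl = new ConcurrentHashMap<>();

    // Index a document immediately; it is searchable as soon as this returns.
    // The ACL is stored first so a document is never findable without one.
    void index(String docId, String text, Set<String> allowedGroups) {
        acl.put(docId, new HashSet<>(allowedGroups));
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            postings.computeIfAbsent(word, w -> ConcurrentHashMap.newKeySet()).add(docId);
        }
    }

    // Return only hits the caller may see: the caller must share at least one
    // group with the document. A missing ACL means "visible to nobody".
    Set<String> search(String word, Set<String> callerGroups) {
        Set<String> hits = new HashSet<>();
        for (String docId : postings.getOrDefault(word.toLowerCase(), Collections.emptySet())) {
            Set<String> allowed = acl.getOrDefault(docId, Collections.emptySet());
            if (!Collections.disjoint(allowed, callerGroups)) hits.add(docId);
        }
        return hits;
    }
}
```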

Hit the link to read more about these topics.
Read the rest of this entry »

The Palantir Technologies Demo Reel: screenshots, round 3

September 29th, 2009 | Ari

Software engineering is a craft that blends science and art. This fact is easy to overlook as the artistic aspects are often eclipsed by discussions of the science and technology behind what we do.

This is not one of those times: the art in software engineering is most evident when building compelling visual interfaces, something Palantir knows a thing or two about.

A demo reel is an industry term in the movie business — a short reel that acts as a portfolio when applying for jobs, a highlight reel of the author’s visual career. We’re not in the movie business, we’re in the software business. We do, however, use moving pictures to tell stories, stories backed by data — this is our demo reel: two-and-a-half minutes of data visualization and user interface eye-candy (It has pounding music — you may want to put on headphones or turn down your speakers.):

The movie will take a few seconds to load. It’s 800×600, so expanding to full-screen is suggested. We’ve done our best to create a streamable-yet-good-looking video. The compression artifacts are there, but shouldn’t be too distracting. In a real Palantir client, there are no compression artifacts and everything looks even better than it does here.

The Palantir family of products is much more than just pretty pictures; we have the underlying intelligence infrastructure to make those realtime animations possible and (more importantly) meaningful. That said, we sure do think they’re pretty.

By the way, if you’re interested in the progression of our interfaces, this is not the first time we’ve posted eye candy: we posted a set of updated screenshots a little over a year ago; think of this as the next installment in the series.

And yes, it’s really all Java Swing.

Palantir Finance Applied to Log4J Data

August 26th, 2009 | Andrew C.

In a previous post, Eric W. covered how we analyze polled system health information. Now we’ll look at pushed information, in the form of logging events.

Use Cases & Constraints

We decided on three kinds of questions we wanted to answer:

  • What is the health of the deployment?
    • Example: What errors have occurred in the last 24 hours?
  • Which parts of the platform are our users engaged with?
    • Example: How much time do users spend in each application?
  • How is our server performing over time?
    • Example: What is the average wait on a search query?

The chief constraint was that we had to build on Log4J. We already use Log4J all over the project, so converting our logging was out of the question. Besides, Log4J provides a guideline for the kind of metadata our events should support, and it makes it easy to record events to a database.

That left us with two problems to solve: how to store structured data with a Log4J message, and how to analyze the collected data.
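
The approach used for the first problem is behind the link, so the following is only one possible approach, stated as an assumption: encode the structured fields as URL-encoded key=value pairs inside the message string, so that any appender (including one writing to a database) stores them unchanged, and parse them back out at analysis time. StructuredMessage is a hypothetical, JDK-only helper and does not rely on any Log4J API.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: round-trips structured fields through a plain message string.
final class StructuredMessage {
    private StructuredMessage() {}

    // {event=search, waitMs=42} -> "event=search&waitMs=42".
    // Keys and values are URL-encoded, so '&' and '=' inside them cannot break parsing.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Inverse of encode(); assumes the message was produced by encode().
    static Map<String, String> decode(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (message.isEmpty()) return fields;
        for (String pair : message.split("&")) {
            int eq = pair.indexOf('=');
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            fields.put(key, value);
        }
        return fields;
    }
}
```

With Log4J 1.x, the encoded string would simply be passed as the message, for example logger.info(StructuredMessage.encode(fields)).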

Analysis is the easy part: just use Palantir! After all, a sequence of logging events has a lot in common with a time series. The rest is explained below.
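
To make the time-series analogy concrete, here is a minimal sketch (an assumption-laden illustration, not the Palantir Finance implementation): events of one type are bucketed by hour and a numeric field is averaged per bucket, which answers a question like "What is the average wait on a search query?" over time. LogEvent and its fields are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical logging event carrying one numeric measurement.
final class LogEvent {
    final long timestampMs;
    final String type;
    final double waitMs;

    LogEvent(long timestampMs, String type, double waitMs) {
        this.timestampMs = timestampMs;
        this.type = type;
        this.waitMs = waitMs;
    }
}

final class EventSeries {
    private static final long HOUR_MS = 60L * 60 * 1000;

    // Average waitMs of events of the given type, keyed by each hour's start time (ms).
    static TreeMap<Long, Double> hourlyAverage(List<LogEvent> events, String type) {
        TreeMap<Long, double[]> acc = new TreeMap<>(); // bucket -> {sum, count}
        for (LogEvent e : events) {
            if (!e.type.equals(type)) continue;
            long bucket = Math.floorDiv(e.timestampMs, HOUR_MS) * HOUR_MS;
            double[] sumCount = acc.computeIfAbsent(bucket, b -> new double[2]);
            sumCount[0] += e.waitMs;
            sumCount[1] += 1;
        }
        TreeMap<Long, Double> series = new TreeMap<>();
        for (Map.Entry<Long, double[]> en : acc.entrySet()) {
            series.put(en.getKey(), en.getValue()[0] / en.getValue()[1]);
        }
        return series;
    }
}
```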

Read the rest of this entry »

VizWeek 2009: Awards and Workflow

August 24th, 2009 | Ari

We put up a post last year on the 2008 VAST Grand Challenge. Well, the IEEE VAST Challenge 2009 is over and the awards are in. We had another strong year, scoring two awards:

  • Grand Challenge: Analyst’s Tool Choice (Of 48 submissions, only 3 Grand Challenge awards were given)
  • Intuitive Traffic Visualization and Video Description of the Analysis Process

Some background on the event: three years ago, the IEEE began an annual symposium called VAST (Visual Analytics Science and Technology). The VAST symposium focuses on fundamental research contributions and real-world applications of visual analytics. As part of the symposium, the VAST Challenge has teams compete to deliver analytic solutions against a synthetic but realistic dataset.

A selection of choice quotes from the judges:

  • An award for “highly usable integrated exploration environment”, “efficient analytic exploration platform” or something along these lines would be appropriate.
  • Survey Question: How much novelty do you see in this submission (data processing, visualization, interaction, hypothesis generation or evaluation, overall process, etc.)? Answer: More so than novelty was the extremely efficient solution approach to this challenge, much more so than other solutions.
  • The submission shows two things very clearly: One, it shows the analytical process as being a multi-faceted, simultaneous processing of different information that is quite common among analysts. Two, it shows how multiple perspectives can be displayed on a single monitor, enabling the analyst to visualize what his mind is analyzing. Outstanding!

Our submission

And finally, our submission to the Grand Challenge. Here we have our overview video, with a link to the full video below:

For an in-depth look at the data and techniques used to make this a reality, check out our full submission in Finding a Mole: Cyber Counter Intelligence on the Palantir Analysis Blog.

Palantir: search with a twist (part one: memory efficiency)

August 13th, 2009 | Ari


A Palantir cluster seamlessly integrates many pieces of proven technology. One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures), where each query has a few search terms and matches many, many documents. We want to leverage Lucene’s inverted index capabilities, but our data access patterns differ from the typical use case: we need pervasive range querying, different types of relevance, and dynamic views of the data based on security constraints. So in building our data platform, we’ve run into some interesting challenges that are fairly unusual in the information retrieval realm, specifically:

  1. Raising memory efficiency
  2. Real-time indexing
  3. Preventing information leaks across access boundaries in an efficient manner

I’ll cover (1) in this post and (2) and (3) in a later post, due out in about two weeks.
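
As background on why range querying fits an inverted index at all: Lucene keeps its term dictionary in sorted order, so a range query can visit one contiguous slice of terms and union their postings. The sketch below uses a TreeMap as a stand-in for that dictionary; it is neither Lucene’s API nor Palantir’s customization, and the zero-padded date terms are an assumption chosen so that lexicographic order matches chronological order.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index whose term dictionary is kept sorted (TreeMap stands in for it).
final class SortedTermIndex {
    private final NavigableMap<String, Set<Integer>> terms = new TreeMap<>();

    void add(String term, int docId) {
        terms.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    // Inclusive range query: visit only the contiguous slice [lower, upper] of the
    // sorted dictionary and union the postings lists of the terms found there.
    Set<Integer> range(String lower, String upper) {
        Set<Integer> docs = new TreeSet<>();
        for (Set<Integer> postings : terms.subMap(lower, true, upper, true).values()) {
            docs.addAll(postings);
        }
        return docs;
    }
}
```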

Hit the link and we’ll delve into this topic.
Read the rest of this entry »

JavaInvoke allows you to spawn additional Java VMs during testing

July 28th, 2009 | Ari


Here at Palantir we use test-driven development (TDD). Integrated tools like Eclipse and JUnit simplify writing and running unit tests. However, once you need to test a broader swath of functionality, it’s time to write functional, integration, and system tests. These aren’t technically unit tests, but the framework JUnit provides is essentially the same infrastructure you want to leverage when writing them.

When you’re developing enterprise software, functional testing often means getting your clients to talk to your servers. For the main Palantir Government product, we integrate the process of bringing the server up and down with the Ant scripts that run our automated unit tests: our testing tasks bring up the server, run the test suite, and then kill the server. This works great and produces nice results.

When I started working on our authentication server, the pattern that we had used before didn’t work for me. While the Palantir Government tests ran with a single, static configuration file, I needed to run the authentication server with multiple configurations over the course of all the different functional tests. I needed a way to bring the server up and down programmatically; in JUnit parlance, a way to launch the server component in my tests’ setUp() method and stop it in tearDown().

With my itch-to-scratch firmly in hand (or some other mixed metaphor), I set out to figure out how to invoke new Java processes from inside a unit test. The solution I came up with, with source code and examples, is after the jump.
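
The JavaInvoke source is behind the link. As a separate, minimal sketch of the general technique (not JavaInvoke’s API), a test can start a second JVM with ProcessBuilder, reusing the running JVM’s own java executable and classpath, and stop it again afterwards. JvmLauncher, and the server class and system property in the usage note, are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for starting a child JVM from a test (not the JavaInvoke API).
final class JvmLauncher {
    private JvmLauncher() {}

    // Path to the java executable of the JVM running this code.
    static String javaExecutable() {
        return System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
    }

    // Builds: java <jvmArgs> -cp <current classpath> <mainClass> <appArgs>
    static List<String> command(List<String> jvmArgs, String mainClass, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExecutable());
        cmd.addAll(jvmArgs);
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.addAll(appArgs);
        return cmd;
    }

    // Starts the process; the child's stdout/stderr go to the test's console.
    static Process start(List<String> command) throws IOException {
        return new ProcessBuilder(command).inheritIO().start();
    }

    // Stops the child and waits for it to exit, so the next test starts clean.
    static void stop(Process process) throws InterruptedException {
        process.destroy();
        process.waitFor();
    }
}
```

In a JUnit 3 test case, setUp() might call JvmLauncher.start(JvmLauncher.command(List.of("-Dconfig=test-a.properties"), "com.example.AuthServer", List.of())) with a different configuration per test class, and tearDown() would pass the returned Process to JvmLauncher.stop(). A real harness would also wait until the server accepts connections before the tests run.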
Read the rest of this entry »

The MultiSnake Challenge

July 6th, 2009 | Nick


“Freaking lag!” It had started to become a common refrain around the developer pit. Listed as a project on a candidate’s resume, MultiSnake was a game that we had started to play during our coding breaks. The game was really quite fun — it was easy to play, games were short, and its multi-player nature fostered great competition. The only real drawback was that we seemed to experience network lag. There was nothing more infuriating than having your long snake die by running straight into a completely avoidable wall because the game lagged and didn’t respond to your keyboard commands in time. During one of our particularly lag-heavy games, someone yelled out a gripe that would change our MultiSnaking days for good: “Man, we could totally write this game ourselves, in our app.”

Read the rest of this entry »

Data Model Change Eventing

May 27th, 2009 | DerekC

One of the early architectural challenges that we faced in building the Palantir Finance product was coming up with a good design for firing events from data models to their listeners. Our product has many different concepts, such as charts, portfolios, and indices, each maintained by different developers. Initially, each developer had their own system for firing events when a data model changed. As the tools became more integrated, this quickly became a drag on development, because we had to learn each other’s event methodologies and translate between the different systems.

The solution was to select a single event-firing system. We wanted something that was easy to use yet powerful enough to express all the changes that might be made to a data model. Java’s PropertyChangeSupport (PCS) was a good fit because it can support arbitrary events in a very lightweight fashion.
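
For readers who haven’t used it, java.beans.PropertyChangeSupport does the listener bookkeeping for a model: the model delegates listener registration to it and calls firePropertyChange whenever a field changes. A minimal sketch; ChartModel and its "title" property are hypothetical and not taken from Palantir Finance.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical data model that fires events through PropertyChangeSupport.
final class ChartModel {
    public static final String TITLE = "title";

    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
    private String title = "";

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        pcs.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        pcs.removePropertyChangeListener(listener);
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String newTitle) {
        String old = this.title;
        this.title = newTitle;
        // Listeners receive a PropertyChangeEvent(source, "title", old, new);
        // PCS fires nothing when old and new are equal and non-null.
        pcs.firePropertyChange(TITLE, old, newTitle);
    }
}
```

Because every model exposes the same add/remove methods and the same event type, a single listener implementation can observe charts, portfolios, and indices alike.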

Read on for details of our implementation…
Read the rest of this entry »
