Archive for the ‘software engineering’ Category

Introducing Palantir’s first open source releases

December 14th, 2011 | Ari Gesher

Palantir Technologies Open Source

We’re big fans of open source. Libraries from Apache, Google, and various projects hosted on SourceForge.net make up a significant fraction of the third-party code we use to build our products.

We’re proud to be making our first set of open source releases with these two projects: Cinch and Sysmon.

We think it’s the right thing to do, to add our voice to the chorus of developers making software available to freely use, modify, and distribute. These two projects represent our first dip into the open source water – we’re just getting started. As time and other interests allow, we’ll be making other projects available to the dev community.

We’ve chosen the Apache License, Version 2.0 to make our contributions as free from encumberance as possible – our hope is that many people will find them useful and build on top of them just as we have with our own software.

The Projects

code editor showing Cinch annotations

Cinch – Cinch makes MVC in Swing easy

Cinch is a Java library for simplifying certain types of GUI code. When developing Swing applications it’s easy to fall into the trap of not separating out Models and Controllers. It’s all too easy to just store the state of that boolean in the checkbox itself, or that String in the JTextField. The design goal behind Cinch was to make it easier to apply MVC than to not by reducing much of the typical Swing friction and boilerplate. Cinch uses Java annotations to reflectively wire up Models, Views, and Controllers.

Already in heavy use inside the Palantir Government product, Cinch changes GUI development in Java to be similar to iOS and OS X’s Cocoa, where annotations are used to bind controls to fields.

Graph of CPU usage over time

Sysmon – A lightweight platform monitoring tool for Java VMs

Sysmon is a lightweight platform monitoring tool. It was designed to gather performance data (CPU, disks, network, etc.) from the host running the Java VM. This data is gathered, packaged, and published via Java Management Extensions (JMX) for access using the JMX APIs and standard tools (such as jconsole). Sysmon can be run as a standalone daemon or as a library to add platform monitoring to any application.

Originally built as component in our Palantir cluster monitoring server, this project should be helpful in scenarios where you need to get data off a host platform and into a VM.

Let us know how we’re doing

We’d love to hear from you on how we’re doing. Aside from the normal outlets to communicate about the projects themselves (see the mailing lists and issue trackers for each project), please feel free to email me directly, Ari Gesher, as the curator of these projects.

The Coding Interview

October 3rd, 2011 | Allen Chang

Einstein Coding Interview Joke Image

Note: this part is part two of our series on doing your best in interviews. Part one: “How to Rock an Algorithms Interview”.

Here at Palantir algorithms are important, but code is our lifeblood. We live and die by the quality of the code we ship. It’s no surprise, then, that coding ability is what we stress the most in our interview process. A candidate can get by with mediocre algorithm skills (depending on the role), but no one can skimp on coding.

Suppose you’re confident in your ability to write great software. Your task in a coding interview (of which there will be several) is to show the interviewers that you in fact do have the programming chops — that you’re an experienced coder who knows how to write solid, production-quality code.

This is easier said than done. After all, coding in your favorite IDE from the comfort of $familiar_place is very different from coding on a whiteboard (on a problem you’re totally unfamiliar with) in a pressure-filled 45-minute interview. We realize that the interview environment is not the real world, and we adjust our expectations accordingly. Nonetheless, there are a number of things you can do to put your best foot forward during the interview.

First, though, we’d like to give you a sense for what we look for during a coding interview. Most important is the ability to write clean and correct code — it’s not enough just to be correct. A lot of people will be interacting with your code once you’re on the job, so it should be readable, maintainable, and extensible where appropriate. If your solution is clean and correct, and you produced it in a reasonable amount of time without a lot of help, you’re in good shape. But even if you stumble a bit, there are other ways to demonstrate your ability. As you work, we also watch for debugging ability, problem-solving and analytical skills, creativity, and an understanding of the ecosystem that surrounds production code.

With our evaluation criteria in mind, here are some suggestions we hope will help you perform at your very best.

Read the rest of this entry »

Tech Talk: the Hedgehog Programming Language

June 6th, 2011 | Ari Gesher

A few months back, Kevin introduced us to the Hedgehog Programming language – (here’s the post if you missed it).

The Palantir Finance programming language — Hedgehog as we know it — is an interpreted, statically typed, object-oriented language. With a syntax that’s based loosely on Java, it mixes roughly Java-style semantics and a few idiosyncrasies that make it a really interesting case study in language design. It’s built to be extremely efficient for batch operations on time series, which is the heavy lifting in financial analysis.

In this video, Eugene and Dave, two of the engineers that work on the language and platform features needed to support it, give a talk that goes into a number of areas around the Hedgehog language, including why we needed to build a language, how it makes the platform more powerful, how we built dev tools into the UI to make debugging easier, and a bunch of the nitty-gritty features that go into the strange (but fitting) beast that is the Hedgehog Language.

As a final note: this is one of things that I love about working at Palantir Technologies. We study a problem pretty hard before we decide that we need to re-invent the wheel – and then when we do, we go all out. It’s one of the benefits of working with the incredibly talented and motivated folks here. When someone says, “well, we need to build a programming language. No, we’re sure,” we just roll up our sleeves and do it. We can add it to the list of: JMX monitoring system, refined Lucene search engine, speeding up Map-Reduce-like systems to interactive time, and implementing our own GIS platform.

Decorator Pattern: Implementing decorators using forwarding classes

March 1st, 2011 | Allen Chang

A forwarding class is an abstract base class which makes it easier to implement decorators for a particular interface. A forwarding class simply forwards all calls it receives to some delegate; a decorator can then be implemented by extending the forwarding class and overriding the relevant methods. Here’s an example implementation of a forwarding class for the Collection<E> interface (from Google Collections):

public abstract class ForwardingCollection<E> extends ForwardingObject implements Collection<E> {
  @Override protected abstract Collection<E> delegate();

  public boolean add(E element) {
    return delegate().add(element);
  }

  // ... (more overridden methods) ...
}

Things worth noting:

  • This class overrides the delegate() method to return a more specific type.
  • The class contains no instance variables and does not have to be marked Serializable.

Now, here’s an example of how to implement a decorator using a forwarding class (also from Google Collections):

  public class ConstrainedCollection<E> extends ForwardingCollection<E> {
    private final Collection<E> delegate;
    private final Constraint<? super E> constraint;

    public ConstrainedCollection(Collection<E> delegate, Constraint constraint) {
      this.delegate = checkNotNull(delegate);
      this.constraint = checkNotNull(constraint);
    }
    @Override protected Collection<E> delegate() {
      return delegate;
    }
    @Override public boolean add(E element) {
      constraint.checkElement(element);
      return super.add(element);
    }
    @Override public boolean addAll(Collection<? extends E> elements) {
      return super.addAll(checkElements(elements, constraint));
    }
  }

This class implements a collection that checks a constraint on all elements added to the collection. Note that writing the decorator is easy given the forwarding class — only the methods relevant to the decorator need to be overridden.

The Hedgehog Programming Language

February 2nd, 2011 | Kevin Simler

One thing about being a developer on the Palantir Finance product that doesn’t get nearly enough publicity is the fact that we have our own programming language. I’m pretty excited about it so let me repeat, with emphasis: we have our own programming language. Yeah, it’s awesome. All those late hours you spent in the lab working on your final project in compilers: turns out they’re actually good for something other than getting into grad school.

Building this language ourselves — as opposed to, say, using an existing language that already just works — wasn’t an easy decision. In fact, it wasn’t even a single decision. We wracked our collective brain dozens of times trying to think of a better approach. But every which way we sliced it, the problems we needed to solve always pointed to building our own language. I still question this decision sometimes, but on the whole I’m very happy with how things have turned out.

Read the rest of this entry »

A rigorous friction model for human-computer symbiosis

June 2nd, 2010 | Asher Sinensky

This is a response to Ari’s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible… let’s take a look:

We are attempting to understand the total analytic capability for a given task a of a human-computer team. Analytic capability in this case probably means:

eq1(1)

Where A is the answer to the analytic problem in question and tA is the time needed to arrive at the answer based on the inputs available. In the case of chess, A could be the optimum next move given all previous information and tA would be how long it takes to decide on this move.

Read on for a look at how this generalizes in human-computer symbiotic systems.
Read the rest of this entry »

Friction in Human-Computer Symbiosis: Kasparov on Chess

March 8th, 2010 | Ari Gesher

As we build our platforms and applications following a human-computer symbiosis approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.

One of the areas that we’re interested is in the overall friction of analysis systems. The systems that we build are built on commodity hardware — we’re not building faster computers and yet we can deliver orders-of-magnitude better performance on analysis tasks than existing solutions. How do we do this? By building software in such a way that it reduces the friction experienced at the boundaries between the computing power, the analyst, and the source data.

Chess as analysis laboratory

Chess is, at its heart, a predictive venture. The player attempts to anticipate their opponent’s moves, planning their own moves accordingly, with the straightforward goal of finding a sequence of piece moves that force checkmate.

This game is, in its ideal form, analysis. (The moves made are the logical extension of the analysis.) The data are clean, the problem is well-defined and everyone plays by the same rules. There are even well-defined metrics for ranking chess players by skill — a better chess player is a better chess-game analyst.

In the realm of evaluation of analysis systems, this is as about as good as it gets in terms of designing controlled experiments to study the relative strengths of different analysis systems.

Garry Kasparov, widely considered to be the greatest chess player of all time, recently wrote a review of Diego Rasskin Gutman’s book, Chess Metaphors: Artificial Intelligence and the Human Mind.

The review is excellent and covers a lot of ground. However, one particular anecdote stood out as a very interesting example of human-computer symbiosis (emphasis added):

In 2005, the online chess-playing site Playchess.com hosted what it called a “freestyle” chess tournament in which anyone could compete in teams with other players or computers. Normally, “anti-cheating” algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less “intelligent” than the playing programs they detect.)

Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.

The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

After the jump, we look at this finding in a more generalized way and map it onto the Palantir approach.
Read the rest of this entry »

Fun with jMock

November 22nd, 2009 | Steve Downing

Here at Palantir, a lot of our automatic tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much like a production environment. This makes intuitive sense because it’s a faithful approximation of how the system will run in the field.

However, there are some disadvantages to this:

  • Full-pass tests don’t always localize the problem. Tests on a client class might fail even if it was the service that behaved incorrectly.
  • These full-pass tests are relatively slow. Client code is running against an actual remote service. If a client is being tested, the server code still has to do work — sometimes a lot of work — even if that isn’t the focus of the test.
  • The constraints of the test are loose. Full-chain tests can mostly only see whether the operation finished correctly. It’s much harder to figure out whether the operation was done efficiently and without making unnecessary service calls.
  • They’re very little setup flexibility. If you want an RPC to return a specific value, you have little choice but to have your test get the service into a state where it can return that value. This is easy in some cases, but prohibitively difficult in others.
  • Client tests are forced to share any non-determinism leaked from the service. For example, under real conditions, a request to call A might respond before call B, and sometimes the other way around. This can result in flaky tests or tests that don’t always simulate the conditions you want to exercise.

What’s to be done? Fortunately, there’s an option that handles these cases elegantly. We also test with jMock, a library that dynamically generates mock objects from arbitrary interfaces. These mock objects can be configured to check that particular methods are called with particular inputs a particular number of times, and then give prescribed responses.

Hit the link to see a concrete example of jMock in action.
Read the rest of this entry »

Palantir: search with a twist (part one: memory efficiency)

August 13th, 2009 | Ari Gesher

magnifying glass

A Palantir cluster seamlessly integrates many pieces of proven technology. One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and many, many documents as results. We want to leverage the inverted index capabilities of Lucene, but our data access patterns are a bit different than the typical use case: we need things like pervasive range-querying, different types of relevance, and dynamic views of the data based on security constraints. So in building our data platform, we’ve run into some interesting challenges that are pretty unique in the information retrieval realm, specifically:

  1. Raising memory efficiency
  2. Real-time indexing
  3. Preventing information leaks across access boundaries in an efficient manner

I’ll cover (1) in this post and (2) and (3) in a later post, due out in about two weeks. (Note: part 2 is available here)

Hit the link and we’ll delve into this topic.
Read the rest of this entry »

JavaInvoke allows you to spawn additional Java VMs during testing

July 28th, 2009 | Ari Gesher

junit success

Here at Palantir we use test-driven development (or TDD for short). Integrated tools like Eclipse and JUnit simplify writing and running unit tests. However, once you need to test a broader swath of functionality, it’s time to write functional, integration, and system tests. While technically not ‘unit testing’, the testing framework that JUnit provides is basically the same infrastructure that you want to leverage for writing these more involved types of testing.

When you’re developing enterprise software, functional testing often means getting your clients to talk to your servers. For the main Palantir Government product, we integrate the process of bringing the server up and down with the Ant scripts that run our automated unit tests: our testing tasks bring up the server, run the test suite, and then kill the server. This works great and produces nice results.

When I started working on our authentication server, the pattern that we had used before didn’t work for me. While the Palantir Government tests ran with a single, static configuration file, I needed to run the authentication server with multiple configurations in the course of running through the all the different functional tests. I determined that I needed a way to programmatically bring the server up and down for testing. In JUnit parlance, I needed a way to programmatically launch the server component as part of my setup() function for my unit tests and stop it in my teardown().

With my itch-to-scratch firmly in hand (or some other mixed metaphor), I set out to figure out how to invoke new Java processes from inside a unit test. The solution I came up with (with source code and examples) after the jump.
Read the rest of this entry »


Palantir