Archive for August, 2009

Palantir Finance Applied to Log4J Data

August 26th, 2009 | Andrew C.

In a previous post, Eric W. covered how we analyze polled system health information. Now we’ll look at pushed information, in the form of logging events.

Use Cases & Constraints

We decided on three kinds of questions we wanted to answer:

  • What is the health of the deployment?
    • Example: What errors have occurred in the last 24 hours?
  • Which parts of the platform are our users engaged with?
    • Example: How much time do users spend in each application?
  • How is our server performing over time?
    • Example: What is the average wait on a search query?

The chief constraint was that we build our platform on Log4J. We already use Log4J all over the project, so switching to a different logging framework was out of the question. Besides, Log4J provides a guideline for the kind of metadata our events should support, and it makes it easy to record events to a database.

That left us with two problems to solve: how to store structured data with a Log4J message, and how to analyze the collected data.

Analysis is the easy part: just use Palantir! After all, a sequence of logging events has a lot in common with a time series. The rest is explained below.
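To make the time-series comparison concrete, here is a minimal sketch that turns a list of logging events into an hourly count series, which is one way to approach a question like "What errors have occurred in the last 24 hours?". The `LogEvent` and `LogSeries` classes are hypothetical names invented for this illustration; they are not Log4J types and not the approach described in the full entry.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical event type for illustration only: a timestamp and a level.
// This is not Log4J's own event class.
final class LogEvent {
    final long timestampMillis;
    final String level;

    LogEvent(long timestampMillis, String level) {
        this.timestampMillis = timestampMillis;
        this.level = level;
    }
}

final class LogSeries {
    static final long HOUR_MILLIS = 60L * 60L * 1000L;

    // Counts events of the given level per one-hour bucket.
    // Each key is the start of the bucket in epoch milliseconds,
    // so the sorted map reads as a time series.
    static TreeMap<Long, Integer> hourlyCounts(List<LogEvent> events, String level) {
        TreeMap<Long, Integer> series = new TreeMap<>();
        for (LogEvent e : events) {
            if (!e.level.equals(level)) {
                continue;
            }
            long bucket = Math.floorDiv(e.timestampMillis, HOUR_MILLIS) * HOUR_MILLIS;
            series.merge(bucket, 1, Integer::sum);
        }
        return series;
    }
}
```

Calling `LogSeries.hourlyCounts(events, "ERROR")` yields a sorted map from hour start to error count, which can then be charted or compared like any other series.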

Read the rest of this entry »

VizWeek 2009: Awards and Workflow

August 24th, 2009 | Ari

We put up a post last year on the 2008 VAST Grand Challenge. Well, the IEEE VAST Challenge 2009 is over and the awards are in. We had another strong year, scoring two awards:

  • Grand Challenge: Analyst’s Tool Choice (Of 48 submissions, only 3 Grand Challenge awards were given)
  • Intuitive Traffic Visualization and Video Description of the Analysis Process

Some background on the event: three years ago, the IEEE began an annual conference called VAST (Visual Analytics Science and Technology). The VAST symposium focuses on the fundamental research contributions and real-world application of visual analytics. As a part of the conference, the VAST Challenge allows teams to compete on delivering analytic solutions against a synthetic but realistic dataset.

A selection of choice quotes from the judges:

  • An award for “highly usable integrated exploration environment”, “efficient analytic exploration platform” or something along these lines would be appropriate.
  • Survey Question: How much novelty do you see in this submission (data processing, visualization, interaction, hypothesis generation or evaluation, overall process, etc.)? Answer: More so than novelty was the extremely efficient solution approach to this challenge, much more so than other solutions.
  • The submission shows two things very clearly: One, it shows the analytical process as being a multi-faceted, simultaneous processing of different information that is quite common among analysts. Two, it shows how multiple perspectives can be displayed on a single monitor, enabling the analyst to visualize what his mind is analyzing. Outstanding!

Our submission

And finally, our submission to the Grand Challenge. Here we have our overview video, with a link to the full video below:

For an in-depth look at the data and techniques used to make this a reality, check out our full submission in Finding a Mole: Cyber Counter Intelligence on the Palantir Analysis Blog.

Palantir: search with a twist (part one: memory efficiency)

August 13th, 2009 | Ari


A Palantir cluster seamlessly integrates many pieces of proven technology. One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and many, many documents as results. We want to leverage the inverted index capabilities of Lucene, but our data access patterns are a bit different from the typical use case: we need things like pervasive range-querying, different types of relevance, and dynamic views of the data based on security constraints. So in building our data platform, we’ve run into some interesting challenges that are fairly unusual in the information retrieval realm, specifically:

  1. Raising memory efficiency
  2. Real-time indexing
  3. Preventing information leaks across access boundaries in an efficient manner

I’ll cover (1) in this post and (2) and (3) in a later post, due out in about two weeks.
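For readers unfamiliar with the terms, the sketch below illustrates what a range query over an inverted index looks like in the simplest possible form: a sorted map from a numeric value to the set of document IDs carrying that value. The `RangeIndex` class is a hypothetical toy written for this illustration. It is not how Lucene stores or queries its index, and it is not our customized version.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index for one numeric field (illustration only, not Lucene).
final class RangeIndex {
    // Sorted map: field value -> IDs of the documents that contain it.
    private final TreeMap<Long, Set<Integer>> postings = new TreeMap<>();

    void add(int docId, long value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // Returns the IDs of all documents whose value lies in [lo, hi].
    Set<Integer> range(long lo, long hi) {
        Set<Integer> hits = new TreeSet<>();
        if (lo > hi) {
            return hits; // empty range; subMap would reject these bounds
        }
        for (Set<Integer> ids : postings.subMap(lo, true, hi, true).values()) {
            hits.addAll(ids);
        }
        return hits;
    }
}
```

Because the keys are sorted, a range lookup only walks the entries between the two bounds. The result still has to merge one posting set per distinct value in the range, which hints at why pervasive range queries put pressure on an index designed for a handful of search terms.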

Hit the link and we’ll delve into this topic.
Read the rest of this entry »

