Archive for the ‘problemspace-government’ Category

Help! Is there a doctor in the network???

July 23rd, 2010 | Ari

Cyber security is a hot topic, especially in national security circles. Over the past two years, the world has witnessed a number of high-profile incidents that share three important traits:

  • they were targeted attacks, carried out against specific institutions
  • they were politically motivated and, although the evidence is inconclusive, appear to be state-sponsored
  • they used multi-step, multi-vector attacks and managed to evade existing security countermeasures

This deviates from the types of attacks that IT-centric approaches have sought to defend networks against. Traditional approaches neutralize the perceived threats against a network with a host of countermeasures: firewalls, malware scanners, automated network vulnerability scanning, patch policies, and intrusion detection systems. The network defenses can learn new tricks when the administrators update the signatures, or, for certain types of data, employ a Bayesian inference strategy (as has been employed to fight spam). This approach does a good job of protecting against untargeted attacks as well as weak targeted attacks.
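To make the Bayesian idea concrete, here is a minimal sketch of the kind of naive Bayes classifier used against spam. It is an illustration only: the class name, the labels "bad" and "ok", and the toy training data are all assumptions, and a real countermeasure would train on far more data and far better features.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration; assumes both labels have been trained
    at least once before classify() is called.
    """

    LABELS = ("bad", "ok")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, label, tokens):
        # Update the evidence for one labeled example.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)

    def score(self, label, tokens):
        # log P(label) + sum over tokens of log P(token | label)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total = sum(self.word_counts[label].values())
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            logp += math.log(
                (self.word_counts[label][token] + 1) / (total + len(vocab))
            )
        return logp

    def classify(self, tokens):
        return max(self.LABELS, key=lambda label: self.score(label, tokens))


clf = NaiveBayes()
clf.train("bad", ["cheap", "pills", "click", "now"])
clf.train("bad", ["click", "here", "free", "pills"])
clf.train("ok", ["meeting", "notes", "attached"])
clf.train("ok", ["lunch", "meeting", "tomorrow"])

verdict_spam = clf.classify(["free", "pills", "now"])
verdict_ham = clf.classify(["meeting", "tomorrow"])
```

The classifier only knows what its training data looks like, which is why this style of defense handles bulk, untargeted traffic well and struggles with an attacker who crafts input specifically to look ordinary.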

Full network defense requires human analysts looking at anomalies at a level above the automated countermeasures. Check out the rest of this post to take a look at how human-driven, computer-aided analysis is a game changer in cyber security.

Read the rest of this entry »

A rigorous friction model for human-computer symbiosis

June 2nd, 2010 | Asher

This is a response to Ari’s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed, and I was wondering whether some further refinements were possible… let’s take a look:

We are attempting to understand the total analytic capability of a human-computer team for a given task a. Analytic capability in this case probably means:

[Equation (1)]

Where A is the answer to the analytic problem in question and tA is the time needed to arrive at the answer based on the inputs available. In the case of chess, A could be the optimum next move given all previous information and tA would be how long it takes to decide on this move.

Read on for a look at how this generalizes in human-computer symbiotic systems.
Read the rest of this entry »

Haiti: effective recovery through analysis

April 5th, 2010 | Ari

[Editor's Note: an edited version of this post first appeared on O'Reilly's Radar blog.]

The prologue was an earthquake of unexpected magnitude and location that left 250,000 dead.

As computer scientists and technologists, we’re used to dealing with large numbers in the abstract. Expressed in human terms, the mind-boggling numbers of 250,000 dead, 300,000 injured and over 1 million people left homeless are hard to comprehend.

Hit the link to read more about how effective data management and analysis are crucial to recovery efforts and see specific examples of data about the situation in Haiti modeled in Palantir Government.
Read the rest of this entry »

Friction in Human-Computer Symbiosis: Kasparov on Chess

March 8th, 2010 | Ari

As we build our platforms and applications following a human-computer symbiosis approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.

One of the areas that we’re interested in is the overall friction of analysis systems. The systems that we build run on commodity hardware; we’re not building faster computers, and yet we can deliver orders-of-magnitude better performance on analysis tasks than existing solutions. How do we do this? By building software in such a way that it reduces the friction experienced at the boundaries between the computing power, the analyst, and the source data.

Chess as analysis laboratory

Chess is, at its heart, a predictive venture. The player attempts to anticipate their opponent’s moves, planning their own moves accordingly, with the straightforward goal of finding a sequence of moves that forces checkmate.

This game is, in its ideal form, analysis. (The moves made are the logical extension of the analysis.) The data are clean, the problem is well-defined and everyone plays by the same rules. There are even well-defined metrics for ranking chess players by skill — a better chess player is a better chess-game analyst.

In the realm of evaluating analysis systems, this is about as good as it gets for designing controlled experiments to study the relative strengths of different analysis systems.

Garry Kasparov, widely considered to be the greatest chess player of all time, recently wrote a review of Diego Rasskin-Gutman’s book, Chess Metaphors: Artificial Intelligence and the Human Mind.

The review is excellent and covers a lot of ground. However, one particular anecdote stood out as a very interesting example of human-computer symbiosis (emphasis added):

In 2005, the online chess-playing site Playchess.com hosted what it called a “freestyle” chess tournament in which anyone could compete in teams with other players or computers. Normally, “anti-cheating” algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less “intelligent” than the playing programs they detect.)

Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.

The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.

After the jump, we look at this finding in a more generalized way and map it onto the Palantir approach.
Read the rest of this entry »

Palantir: like an operating system for data analysis

November 6th, 2009 | Ari

If you’ve taken the time to peruse the Palantir Government analysis blog, you’ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client. It’s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide range of datasets.

What enabled this analysis? Aside from the obvious hard work of our UI and analysis tools teams, it’s the flexibility and power of the Palantir data platform. More than just a scalable datastore, the Palantir data platforms act as robust and clean abstractions on top of data.

One of the early architecture decisions that we made when building both Palantir Government and Palantir Finance was to separate the respective data platforms from the end-user applications used to actually perform analysis. More than just following the client-server model, this separation made the data servers in both products into generic intelligence infrastructure for analytic problems, with our clients acting as analysis applications on top of those platforms.

And so, one way to look at our data platform is as an operating system for analytic applications. In this post we’ll explore the history of operating systems, understand why they’re so important, and see how the Palantir data servers offer the same potential to revolutionize the writing of analysis software that operating systems offered for the writing of general-purpose programs.

Read the rest of this entry »

Palantir: search with a twist (part two: realtime indexing and security)

October 27th, 2009 | Ari

[A number of weeks ago, we published a post on the search technology used by Palantir. That post covered raising the memory efficiency of a couple of operations. This is part two of that series.]

The most familiar use of search engines is to index documents made available on the Internet via the hypertext transfer protocol. Forgotten names like AltaVista, names not-yet-really-learned like Bing, and, of course, Google come to mind.

This one, massive use case has a couple of properties that I’d like to highlight:

  • Asynchronous indexing and querying – web search engines tend to use crawlers and indexers to build up an index of the web. After each crawl is finished, the new index is brought online for use by the query engine.
  • Lack of access controls – all the data in the index is available to any query. In fact, most queries are (from the standpoint of the index) completely anonymous.

Palantir: not a web search engine

Search technology is just one part of what makes up a Palantir system. For us, it’s a way to quickly retrieve Palantir objects in a Palantir system; it’s not the whole of the application.

I’d like to highlight a couple of differences from the web search engine case. A Palantir system needs the following properties:

  • Realtime indexing and querying – we need information to be available immediately as it changes in the system.
  • Leak-proof access controls – we need the search engine to help us make sure that we don’t have information leaking across access control boundaries.
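As a rough illustration of these two properties, here is a minimal in-memory sketch, not a description of Palantir's actual search engine. The class `SecureIndex`, its methods, the document ids, and the group names are all hypothetical. Documents become searchable as soon as `put` returns, and the access check runs inside the index, before any result reaches the caller.

```python
from collections import defaultdict


class SecureIndex:
    """Toy inverted index with per-document access control lists.

    Hypothetical sketch: whitespace tokenization, AND-only queries,
    and group-based read permissions.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.terms = {}                   # doc id -> terms currently indexed for it
        self.acl = {}                     # doc id -> groups allowed to read it

    def put(self, doc_id, text, allowed_groups):
        # Drop stale postings first so an updated document never matches old text.
        for term in self.terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        self.terms[doc_id] = terms
        self.acl[doc_id] = frozenset(allowed_groups)

    def search(self, query, user_groups):
        terms = query.lower().split()
        if not terms:
            return []
        # Documents that contain every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Filter by access control before anything leaves the index.
        groups = set(user_groups)
        return sorted(d for d in hits if self.acl[d] & groups)


index = SecureIndex()
index.put("doc-1", "wire transfer report", {"finance"})
index.put("doc-2", "wire tap report", {"investigations"})

finance_hits = index.search("wire report", {"finance"})

# An update is visible to the very next query; there is no separate crawl.
index.put("doc-1", "quarterly summary", {"finance"})
after_update = index.search("wire report", {"finance"})
```

Filtering results is only part of the problem. A production system also has to keep restricted documents from showing up indirectly, for example through hit counts or ranking, which this sketch does not attempt.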

Hit the link to read more about these topics.
Read the rest of this entry »

VizWeek 2009: Awards and Workflow

August 24th, 2009 | Ari

We put up a post last year on the 2008 VAST Grand Challenge. Well, the IEEE VAST Challenge 2009 is over and the awards are in. We had another strong year, scoring two awards:

  • Grand Challenge: Analyst’s Tool Choice (Of 48 submissions, only 3 Grand Challenge awards were given)
  • Intuitive Traffic Visualization and Video Description of the Analysis Process

Some background on the event: three years ago, the IEEE began an annual conference called VAST (Visual Analytics Science and Technology). The VAST symposium focuses on the fundamental research contributions and real-world application of visual analytics. As a part of the conference, the VAST Challenge allows teams to compete on delivering analytic solutions against a synthetic dataset modeled on a real-world scenario.

A selection of choice quotes from the judges:

  • An award for “highly usable integrated exploration environment”, “efficient analytic exploration platform” or something along these lines would be appropriate.
  • Survey Question: How much novelty do you see in this submission (data processing, visualization, interaction, hypothesis generation or evaluation, overall process, etc.)? Answer: More so than novelty was the extremely efficient solution approach to this challenge, much more so than other solutions.
  • The submission shows two things very clearly: One, it shows the analytical process as being a multi-faceted, simultaneous processing of different information that is quite common among analysts. Two, it shows how multiple perspectives can be displayed on a single monitor, enabling the analyst to visualize what his mind is analyzing. Outstanding!

Our submission

And finally, our submission to the Grand Challenge. Here we have our overview video, with a link to the full video below:

For an in-depth look at the data and techniques used to make this a reality, check out our full submission in Finding a Mole: Cyber Counter Intelligence on the Palantir Analysis Blog.

Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.

May 22nd, 2009 | Bob

At Palantir, we work in Silicon Valley, read High Scalability, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that “Disk is cheap” or “CPU is cheap.” For a web company with a deployment in a commercial data center (or its own data center), this received knowledge is correct. But for a company that ships distributed systems instead of hosting them, and for whom the deployment environment is the kind of locked-down server room in which classified data can reside, these assumptions couldn’t be more false.

At Palantir, we are almost never able to host our customers’ data – typically, as the data is very sensitive, we are not even allowed to see it! Our customers’ highly sensitive data has to reside in a Sensitive Compartmented Information Facility, or SCIF – a facility built to resist attempts to access the information within, whether through active or passive measures. The network inside a SCIF is physically separated – “airgapped” – from the public Internet to prevent information leakage. As the entire rationale for such facilities is to prevent information leakage, moving information into or out of one is a tightly regulated process, almost always requiring a human to be in the loop.
Read the rest of this entry »

Palantir Config Server: lining up the ducks

March 6th, 2009 | Khan

At Palantir, we build distributed software. When deployed at a customer site, our platform consists of several servers running on, and distributed across, a cluster of machines. When I first joined the company, deploying and managing our platform was tedious and time-consuming. Need to install servers? One by one, log in to the machines where they need to go, lay down their requisite files and manually configure them such that they can work together. Have to bring down a deployment for scheduled maintenance? One by one, and in the correct order, log in to the machines where the servers reside and shut them down. Want to change the private keys and certificates used to secure communication between servers? Well, you get the point.

From a customer perspective, the complexity associated with the administration of distributed software represents a significant challenge. The lack of tools to help reduce that complexity hurt the overall usability of our platform. Furthermore, from a Palantir perspective, a non-trivial portion of our resources was being devoted to deploying and managing instances of our platform, both externally (by Forward Deployed Engineers working directly with our customers) and internally (by development, QA and support staff working to maintain and improve our product). Could we be more efficient? No doubt. Given our intense focus on customer satisfaction and the desire to grow / scale our business, action was necessary.

To see how we solved this problem, read on.
Read the rest of this entry »

Palantir Monitoring Server: where build beats buy

February 23rd, 2009 | Eric W.

[Figure: graph of CPU usage over time]

Distributed systems are complex. Getting them right is hard, and when things don’t go right, it can be difficult to understand what went wrong. In an environment like ours, a good monitoring system isn’t just nice to have; it’s a critical component necessary for understanding behavior and diagnosing problems.

We had three primary goals for the initial monitoring system: graphing of time-series data, alerting on event triggers, and notifications to users. Furthermore, as a product company, we had a design goal of a simple, intuitive (yet powerful and flexible) solution.
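To show how those three goals fit together, here is a small sketch. It is an assumption-laden illustration, not the Monitoring Server's design: the `Monitor` class, its parameters, and the sample values are hypothetical. Samples are kept as a time series, a trigger fires when every sample in the window exceeds a threshold, and a notification is sent once per breach rather than once per sample.

```python
from collections import deque


class Monitor:
    """Toy threshold alert over a sliding window of time-series samples."""

    def __init__(self, threshold, window, notify):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # most recent (timestamp, value) pairs
        self.notify = notify                 # callable that delivers a message
        self.alerting = False

    def record(self, timestamp, value):
        self.samples.append((timestamp, value))
        window_full = len(self.samples) == self.samples.maxlen
        breached = window_full and all(
            v > self.threshold for _, v in self.samples
        )
        # Notify only on the transition into the alerting state.
        if breached and not self.alerting:
            self.notify(
                f"value above {self.threshold} for "
                f"{len(self.samples)} consecutive samples (t={timestamp})"
            )
        self.alerting = breached


messages = []
cpu = Monitor(threshold=90, window=3, notify=messages.append)
for t, value in enumerate([50, 95, 96, 97, 98, 40, 99]):
    cpu.record(t, value)
```

Requiring a full window of high readings, and notifying only on the transition, keeps a single spike or a long-running breach from flooding users with messages.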

Before starting, we did a quick survey of existing open-source packages. Unfortunately, nothing we found quite fit our needs, given our specific requirements of security, protocol, licensing, and integrability into our product. Given that, we made the decision to forge ahead and build our own; we try not to re-invent the wheel but it seemed to make sense here.

For an in-depth look at the architecture of the Monitoring Server and components we used to build it, read on…

Read the rest of this entry »

