<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; palantir</title>
	<atom:link href="http:///category/palantir/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Introducing Palantir&#8217;s first open source releases</title>
		<link>http://blog.palantirtech.com/2011/12/14/introducing-palantirs-first-open-source-releases/</link>
		<comments>http://blog.palantirtech.com/2011/12/14/introducing-palantirs-first-open-source-releases/#comments</comments>
		<pubDate>Wed, 14 Dec 2011 17:28:31 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Java Links]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[swing]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1956</guid>
		<description><![CDATA[We&#8217;re big fans of open source. Libraries from Apache, Google, and various projects hosted on SourceForge.net make up a significant fraction of the third-party code we use to build our products. We&#8217;re proud to be making our first set of open source releases with these two projects: Cinch and Sysmon. We think it&#8217;s the right [...]]]></description>
			<content:encoded><![CDATA[<div style='float: left; text-align:right; margin-left:15px; margin-right: 20px; margin-bottom: 10px; margin-top: 10px;'><img src="/wp-content/uploads/2011/12/palantir-ptoss.png" alt="Palantir Technologies Open Source" title="Palantir Technologies Open Source" width='85px'/></div>
<p>We&#8217;re big fans of <a href="http://www.opensource.org/">open source</a>. Libraries from <a href="http://apache.org/">Apache</a>, <a href="https://code.google.com/p/guava-libraries/">Google</a>, and various projects hosted on <a href="http://sourceforge.net/">SourceForge.net</a> make up a significant fraction of the third-party code we use to build our products.</p>
<p>We&#8217;re proud to be making our first set of open source releases with these two projects: <a href="http://github.com/palantir/Cinch">Cinch</a> and <a href="http://github.com/palantir/Sysmon">Sysmon</a>.</p>
<p>We think it&#8217;s the right thing to do, to add our voice to the chorus of developers making software available to freely use, modify, and distribute. These two projects represent our first dip into the open source water &#8211; we&#8217;re just getting started.  As time and other interests allow, we&#8217;ll be making other projects available to the dev community.</p>
<p>We&#8217;ve chosen the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a> to make our contributions as free from encumbrance as possible &#8211; our hope is that many people will find them useful and build on top of them just as we have with our own software.</p>
<h2>The Projects</h2>
<div style='float: right; text-align:right; margin-left:15px; width: 253px;margin-bottom: 10px; margin-top: 10px'><img src="/wp-content/uploads/2011/12/cinch-screenshot.png" alt="code editor showing Cinch annotations" title="code editor showing Cinch annotations" width='233'/></div>
<h3><a href="http://github.com/palantir/Cinch">Cinch</a> &#8211; Cinch makes <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC</a> in Swing easy</h3>
<p>Cinch is a Java library for simplifying certain types of GUI code. When developing Swing applications it&#8217;s easy to fall into the trap of not separating out <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">Models and Controllers</a>. It&#8217;s all too easy to just store the state of that boolean in the checkbox itself, or that String in the JTextField. The design goal behind Cinch was to make it easier to apply MVC than to not by reducing much of the typical Swing friction and boilerplate. Cinch uses Java annotations to reflectively wire up Models, Views, and Controllers.</p>
<p>Already in heavy use inside the Palantir Government product, Cinch changes GUI development in Java to be similar to iOS and OS X&#8217;s <a href="https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CocoaBindings/Concepts/WhatAreBindings.html#//apple_ref/doc/uid/20002372-CJBEJBHH">Cocoa, where annotations are used to bind controls to fields</a>.</p>
<div style='float: right; text-align:right; margin-left:15px; width: 253px'><img src="http://blog.palantir.com/wp-content/uploads/2009/02/monitoringserverscreenshot-badge.png" alt="Graph of CPU usage over time" title="Graph of CPU usage over time" width="233" height="188"/></div>
<h3><a href="http://github.com/palantir/Sysmon">Sysmon</a> &#8211; A lightweight platform monitoring tool for Java VMs</h3>
<p>Sysmon is a lightweight platform monitoring tool. It was designed to gather performance data (CPU, disks, network, etc.) from the host running the Java VM. This data is gathered, packaged, and published via Java Management Extensions (<a href="http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html">JMX</a>) for access using the JMX APIs and standard tools (such as <a href="http://download.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html">jconsole</a>). Sysmon can be run as a standalone daemon or as a library to add platform monitoring to any application.   </p>
<p>Originally built as a component in our <a href="http://blog.palantir.com/2009/02/23/palantir-monitoring-server-where-build-beats-buy/">Palantir cluster monitoring server</a>, this project should be helpful in scenarios where you need to get data off a host platform and into a VM.</p>
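<p>For readers new to JMX, here is a minimal sketch of what &#8220;access using the JMX APIs&#8221; looks like. It does not use Sysmon at all: it reads one attribute from the operating system bean that every JVM already registers, using only standard <code>java.lang.management</code> and <code>javax.management</code> classes. The class name <code>PlatformJmxSketch</code> is made up for this illustration, and the beans Sysmon itself publishes may be named and shaped differently.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example class; it only touches the JVM's built-in platform beans.
class PlatformJmxSketch {
    // Reads the processor count through the generic JMX attribute API,
    // the same route an external tool such as jconsole would take.
    static int availableProcessorsViaJmx() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName osBean = new ObjectName(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME);
            return (Integer) server.getAttribute(osBean, "AvailableProcessors");
        } catch (Exception e) {
            throw new IllegalStateException("could not read the platform MBean", e);
        }
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println("processors (via JMX): " + availableProcessorsViaJmx());
        // getSystemLoadAverage() returns a negative value where the platform cannot report it.
        System.out.println("load average: " + os.getSystemLoadAverage());
    }
}
```

<p>The built-in bean exposes only a handful of host-level values, which is the gap a tool like Sysmon is meant to fill.</p>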
<h2>Let us know how we&#8217;re doing</h2>
<p>We&#8217;d love to hear from you on how we&#8217;re doing.  Aside from the normal outlets to communicate about the projects themselves (see the mailing lists and issue trackers for each project), please feel free to email me directly, <a href='mailto:agesher@palantir.com'>Ari Gesher</a>, as the curator of these projects.  </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/12/14/introducing-palantirs-first-open-source-releases/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to Rock a Systems Design Interview</title>
		<link>http://blog.palantirtech.com/2011/10/28/how-to-rock-a-systems-design-interview/</link>
		<comments>http://blog.palantirtech.com/2011/10/28/how-to-rock-a-systems-design-interview/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 15:00:41 +0000</pubDate>
		<dc:creator>John Carrino</dc:creator>
				<category><![CDATA[development process]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[interviewing]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[softwarephilosophy]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1937</guid>
		<description><![CDATA[Comic courtesy of XKCD, via Creative Commons License Note: this is the third installment in our series on doing your best in interviews. Previously: &#8220;How to Rock an Algorithms Interview&#8221; and &#8220;The Coding Interview&#8221;. One interview that candidates often struggle with is the systems design interview. Even if you know your algorithms and write clean code, that [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center'><a href='https://www.xkcd.com/754/'><img style='width: 100%' src='/wp-content/uploads/2011/10/dependencies.png' alt='Compiler design dependency comic, originally from http://www.xkcd.com/754/' title='Comic originally from http://www.xkcd.com/754/' /></a>
<div style='text-align: right; font-size: 0.6em; margin-bottom: 1em;'>Comic courtesy of <a href='http://www.xkcd.com/754/'>XKCD</a>, via Creative Commons License</div>
</div>
<p>
<span style='font-size: 0.7em'><em>Note: this is the third installment in our series on doing your best in interviews.  Previously: <a href="/2011/09/26/how-to-rock-an-algorithms-interview/" title="How to Rock an Algorithms Interview" target="_blank">&#8220;How to Rock an Algorithms Interview&#8221;</a> and <a href="/2011/10/03/the-coding-interview/" title="The Coding Interview" target="_blank">&#8220;The Coding Interview&#8221;</a>.</em></span>
</p>
<p>One interview that candidates often struggle with is the systems design interview. Even if you know your algorithms and write clean code, that code needs to run on a computer somewhere &mdash; and then things quickly get complicated. A truly unbelievable amount of complexity lies beneath something as simple as <a href="https://plus.google.com/112218872649456413744/posts/dfydM2Cnepe">visiting Google in your browser</a>. While most of that complexity is abstracted away from the end user, as a system designer you have to face it head on, and the more you can handle, the better.</p>
<p>At Palantir, many of our teams give a systems design interview along with an <a href="http://blog.palantir.com/2011/09/26/how-to-rock-an-algorithms-interview/">algorithms interview</a> and a couple of <a href="http://blog.palantir.com/2011/10/03/the-coding-interview/">coding interviews</a>. We don’t expect anyone to be an expert at all three disciplines (although some are). We’re looking for generalists with depth &mdash; people who are good at most things, and great at some. If systems design isn&#8217;t your strength, that’s okay, but you should at least be able to talk and reason competently about a complex system.</p>
<p>Read on to learn about what we&#8217;re looking for and how you can prepare.</p>
<p><span id="more-1937"></span></p>
<h2>We’re measuring three things</h2>
<p>Nominally, this interview appears to require knowledge of <strong>systems</strong> and a knack for <strong>design</strong> &mdash; and it does. What makes it interesting, though, and sets it apart from a coding or an algorithms interview, is that whatever solution you come up with during the interview is just a side effect. What we actually care about is the process. </p>
<p>In other words, the systems design interview is all about <strong>communication</strong>. </p>
<p>This reflects what actually working at Palantir is like. As engineers we have a tremendous amount of freedom. We aren’t asked to implement fully-specced features. Instead we take ownership of <em>open-ended problems</em>, and it’s our job to come up with the best solution to each. We need people we can trust to do the right thing without a lot of supervision &mdash; people who can own large projects and take them consistently in the right direction. Invariably, this means being able to communicate effectively with the people around you. Working on <a href="http://blog.palantir.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/">problems with huge scope</a> isn&#8217;t something you can do in a vacuum.</p>
<h2>It&#8217;s an open-ended conversation</h2>
<p>Usually we’ll start by asking you to design a system that performs a given task. The prompt will be simple, but don’t be fooled &mdash; these problems are wide and bottomless, and the point of the interview is to see how much volume you can cover in 45 minutes.</p>
<p>For the most part, you’ll be steering the conversation. It’s up to you to understand the problem. That might mean asking questions, sketching diagrams on the board, and bouncing ideas off your interviewer. Do you know the constraints? What kind of inputs does your system need to handle? You have to get a sense for the scope of the problem before you start exploring the space of possible solutions. And remember, there is no single right answer to a real-world problem. Everything is a tradeoff.</p>
<h2>Topics</h2>
<p>Systems are complex, and when you’re designing a system you’re grappling with its full complexity. Given this, there are many topics you should be familiar with, such as:</p>
<ul>
<li><b>Concurrency.</b> Do you understand threads, deadlock, and starvation? Do you know how to parallelize algorithms? Do you understand consistency and coherence?</li>
<li><b>Networking.</b> Do you roughly understand <a href='https://secure.wikimedia.org/wikipedia/en/wiki/Inter-process_communication'>IPC</a> and <a href='https://secure.wikimedia.org/wikipedia/en/wiki/Internet_Protocol_Suite'>TCP/IP</a>? Do you know the difference between throughput and latency, and when each is the relevant factor?</li>
<li><b>Abstraction.</b> You should understand the systems you’re building upon. Do you know roughly how an OS, file system, and database work? Do you know about the various levels of caching in a modern OS?</li>
<li><b>Real-World Performance.</b> You should be familiar with the <a href="http://everythingisdata.wordpress.com/2009/10/17/numbers-everyone-should-know/">speed of everything</a> your computer can do, including the relative performance of RAM, disk, SSD and your network.</li>
<li><b>Estimation.</b> Estimation, especially in the form of a back-of-the-envelope calculation, is important because it helps you narrow down the list of possible solutions to only the ones that are feasible. Then you have only a few prototypes or micro-benchmarks to write.</li>
<li><b>Availability and Reliability.</b> Are you thinking about how things can fail, especially in a <a href="https://secure.wikimedia.org/wikipedia/en/wiki/Fallacies_of_Distributed_Computing">distributed environment</a>? Do you know how to design a system to cope with network failures? Do you understand durability?</li>
</ul>
<p>Remember, we&#8217;re not looking for mastery of all these topics. We&#8217;re looking for <em>familiarity</em>. We just want to make sure you have a good lay of the land, so you know which questions to ask and when to consult an expert.</p>
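<p>To make the estimation topic concrete, here is a small back-of-the-envelope sketch. The three cost constants are assumed round figures of the kind quoted in &#8220;numbers everyone should know&#8221; lists, not measurements, and the scenario (a 1 GB dataset made of 100,000 records) is invented for illustration.</p>

```java
// Hypothetical example class; every constant below is an assumed, order-of-magnitude figure.
class EnvelopeEstimate {
    static final long MEMORY_NS_PER_MB = 250_000L;    // sequential read from RAM
    static final long DISK_NS_PER_MB = 20_000_000L;   // sequential read from a spinning disk
    static final long DISK_SEEK_NS = 10_000_000L;     // one random seek on that disk

    // Time to stream the given number of megabytes at a fixed per-megabyte cost.
    static double sequentialReadSeconds(long megabytes, long nanosPerMegabyte) {
        return megabytes * nanosPerMegabyte / 1e9;
    }

    // Time spent purely on seeking when each read lands somewhere new on disk.
    static double randomReadSeconds(long seeks) {
        return seeks * DISK_SEEK_NS / 1e9;
    }

    public static void main(String[] args) {
        long megabytes = 1024;   // the 1 GB dataset
        long records = 100_000;  // fetched one seek at a time in the worst case
        System.out.println("RAM, sequential:  " + sequentialReadSeconds(megabytes, MEMORY_NS_PER_MB) + " s");
        System.out.println("disk, sequential: " + sequentialReadSeconds(megabytes, DISK_NS_PER_MB) + " s");
        System.out.println("disk, random:     " + randomReadSeconds(records) + " s");
    }
}
```

<p>Under these assumptions the three answers come out near a quarter of a second, twenty seconds, and a thousand seconds. Gaps that large are usually enough to rule a design in or out before you write any real code.</p>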
<h2>How to prepare</h2>
<p>How do you get better at something? If your answer isn’t along the lines of &#8220;practice&#8221; or &#8220;hard work,&#8221; then I have a bridge to sell you. Just like you have to write a lot of code to get better at coding and do a lot of drills to get really good at basketball, you’ll need practice to get better at design. Here are some activities that can help:</p>
<ul>
<li><strong>Do mock design sessions.</strong> Grab an empty room and a fellow engineer, and ask her to give you a design problem, preferably related to something she&#8217;s worked on. Don&#8217;t think of it as an interview &mdash; just try to come up with the best solution you can. Design interviews are similar to actual design sessions, so getting better at one will make you better at the other.</li>
<li><strong>Work on an actual system</strong>. Contribute to OSS or build something with a friend. Treat your class projects as more than just academic exercises &mdash; actually focus on the architecture and the tradeoffs behind each decision. As with most things, the best way to learn is by doing.</li>
<li><strong>Do back-of-the-envelope calculations for something you&#8217;re building and then write micro-benchmarks to verify them.</strong> If your micro-benchmarks don&#8217;t match your back-of-the-envelope numbers, some part of your mental model will have to give, and you&#8217;ll learn something in the process.</li>
<li><strong>Dig into the performance characteristics of an open source system.</strong>  For example, take a look at <a href="https://code.google.com/p/leveldb/">LevelDB</a>.  It&#8217;s new and clean and small and well-documented. Read about the <a href="http://leveldb.googlecode.com/svn/trunk/doc/impl.html">implementation</a> to understand how it stores its data on disk and how it compacts the data into levels. Ask yourself questions about tradeoffs: which kinds of data and sizes are optimal, and which degrade read/write performance? <em>(Hint: think about random vs. sequential writes.)</em></li>
<li><strong>Learn how databases and operating systems work</strong> under the hood. These technologies are not only tools in your belt, but also a great source of design inspiration. If you can  think like a DB or an OS and understand how each solves the problems it was designed to solve, you&#8217;ll be able to apply that mindset to other systems.</li>
</ul>
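<p>As one possible shape for the &#8220;estimate, then measure&#8221; exercise above, the sketch below times a sequential pass over a 16 MB array and prints it next to an estimate. The class and method names are invented, and the 250,000 ns per megabyte figure is an assumed rule of thumb. A hand-rolled timing loop like this is crude, since JIT warm-up and caches can skew it, so treat the output as a sanity check only.</p>

```java
import java.util.Arrays;

// Hypothetical example class: a deliberately simple timing loop, not a rigorous benchmark.
class SumBenchmark {
    // Written to after timing so the JIT cannot treat the summing work as dead code.
    static long sink;

    // The operation under test: one sequential pass over the array.
    static long sum(int[] data) {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }

    // Runs the pass several times and keeps the fastest run, which filters out warm-up noise.
    static long fastestRunNanos(int[] data, int rounds) {
        long best = Long.MAX_VALUE;
        long checksum = 0;
        for (int round = 0; round < rounds; round++) {
            long start = System.nanoTime();
            checksum += sum(data);
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        sink = checksum;
        return best;
    }

    public static void main(String[] args) {
        int[] data = new int[4 * 1024 * 1024]; // 4M ints of 4 bytes each = 16 MB
        Arrays.fill(data, 1);
        double estimatedNanos = 16 * 250_000.0; // assumed cost: 250,000 ns per MB from RAM
        long measuredNanos = fastestRunNanos(data, 20);
        System.out.println("estimated ns: " + estimatedNanos);
        System.out.println("measured ns:  " + measuredNanos);
    }
}
```

<p>If the two numbers differ by an order of magnitude, that is the interesting case: either the rule of thumb does not apply to your hardware or the benchmark is not measuring what you think it is.</p>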
<h2>Final thought: relax and be creative</h2>
<p>The systems design interview can be difficult, but it&#8217;s also a place to be creative and to take joy in the imagining of systems unbuilt. If you listen carefully, make sure you fully understand the problem, and then take a clear, straightforward approach to communicating your ideas, you should do fine.</p>
<p>Good luck!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/10/28/how-to-rock-a-systems-design-interview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Coding Interview</title>
		<link>http://blog.palantirtech.com/2011/10/03/the-coding-interview/</link>
		<comments>http://blog.palantirtech.com/2011/10/03/the-coding-interview/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 23:12:07 +0000</pubDate>
		<dc:creator>Allen Chang</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[interviewing]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1925</guid>
		<description><![CDATA[Note: this is part two of our series on doing your best in interviews. Part one: &#8220;How to Rock an Algorithms Interview&#8221;. Here at Palantir algorithms are important, but code is our lifeblood. We live and die by the quality of the code we ship. It’s no surprise, then, that coding ability is what [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 10px; margin-bottom: 10px'><img src="/wp-content/uploads/2011/09/einstein_coding_interview.jpg" alt="Einstein Coding Interview Joke Image" title="einstein_coding_interview" width="300"/></div>
<p><span style='font-size: 0.7em'><em>Note: this is part two of our series on doing your best in interviews.  Part one: <a href="/2011/09/26/how-to-rock-an-algorithms-interview/" title="How to Rock an Algorithms Interview" target="_blank">&#8220;How to Rock an Algorithms Interview&#8221;</a>.</em></span></p>
<p>Here at Palantir algorithms are important, but code is our lifeblood. We live and die by the quality of the code we ship. It’s no surprise, then, that coding ability is what we stress the most in our interview process. A candidate can get by with mediocre algorithm skills (depending on the role), but no one can skimp on coding.</p>
<p>Suppose you&#8217;re confident in your ability to write great software. Your task in a coding interview (of which there will be several) is to show the interviewers that you in fact do have the programming chops — that you&#8217;re an experienced coder who knows how to write solid, production-quality code.</p>
<p>This is easier said than done. After all, coding in your <a href="http://eclipse.org/">favorite IDE</a> from the comfort of <code>$familiar_place</code> is very different from coding on a whiteboard (on a problem you&#8217;re totally unfamiliar with) in a pressure-filled 45-minute interview. We realize that the interview environment is not the real world, and we adjust our expectations accordingly. Nonetheless, there are a number of things you can do to put your best foot forward during the interview.</p>
<p>First, though, we&#8217;d like to give you a sense for what we look for during a coding interview. Most important is the ability to write clean <strong>and</strong> correct code &mdash; it&#8217;s not enough just to be correct. A lot of people will be interacting with your code once you&#8217;re on the job, so it should be readable, maintainable, and extensible where appropriate. If your solution is clean and correct, and you produced it in a reasonable amount of time without a lot of help, you&#8217;re in good shape. But even if you stumble a bit, there are other ways to demonstrate your ability. As you work, we also watch for debugging ability, problem-solving and analytical skills, creativity, and an understanding of the ecosystem that surrounds production code.</p>
<p>With our evaluation criteria in mind, here are some suggestions we hope will help you perform at your very best.</p>
<p><span id="more-1925"></span></p>
<h2>Before you start coding</h2>
<ul>
<li><strong>Make sure you understand the problem.</strong> Don&#8217;t hesitate to ask questions. Specifically, if any of the problem requirements seem loosely defined or otherwise unclear, ask your interviewer to make things more concrete. There is no penalty for asking for clarifications, and you don&#8217;t want to miss a key requirement or proceed on unfounded assumptions.</li>
<li><strong>Work through simple examples.</strong> This can be useful both before you begin and after you&#8217;ve finished coding. Working through simple examples before coding can give you additional clarity on the nature of the problem — it may help you notice additional cases or patterns in the problem that you would otherwise have missed had you been thinking more abstractly.</li>
<li><strong>Make a plan.</strong> Be wary of jumping into code without thinking about your program&#8217;s high-level structure. You don&#8217;t have to work out every last detail (this can be difficult for more meaty problems), but you should give the matter sufficient thought. Without proper planning, you may be forced to waste your limited time reworking significant parts of your program.</li>
<li><strong>Choose a language.</strong> At Palantir, we don&#8217;t care what languages you know as long as you have a firm grasp on the fundamentals (decomposition, object-oriented design, etc.). That said, you need to be able to communicate with your interviewer, so choose something that both of you can understand. In general, it&#8217;s easier for us if you use Java or C++, but we&#8217;ll try to accommodate other languages. If all else fails, <a href="http://lolcode.com/">devise your own pseudo-code</a>. Just make sure it&#8217;s precise (i.e. not hand-wavy) and internally consistent, and explain your choices as you go.</li>
</ul>
<h2>While you&#8217;re coding</h2>
<ul>
<li><strong>Think out loud.</strong> Explain your thought process to your interviewer as you code. This helps you more fully communicate your solution, and gives your interviewer an opportunity to correct misconceptions or otherwise provide high-level guidance.</li>
<li><strong>Break the problem down and define abstractions.</strong> One crucial skill we look for is the ability to handle complexity by breaking problems into manageable sub-problems. For anything non-trivial, you&#8217;ll want to avoid writing one giant, monolithic function. Feel free to define helper functions, helper classes, and other abstractions to reach a working solution. You can leverage design patterns or other programming idioms as well. Ideally, your solution will be well-factored and as a result easy to read, understand, and prove correct.</li>
<li><strong>Delay the implementation of your helper functions.</strong> (this serves as a corollary to the previous point) Write out the signature, and make sure you understand the contract your helper will enforce, but don&#8217;t implement it right away. This serves a number of purposes: (1) it shows that you&#8217;re familiar with abstractions (by treating the method as an API); (2) it allows you to maintain momentum towards the overall solution; (3) it results in fewer context-switches for your brain (you can reason about each level of the call stack separately); and (4) your interviewer may grant you the implementation for free, if he or she considers it trivial.</li>
<li><strong>Don&#8217;t get caught up in trivialities.</strong> At Palantir we are much more interested in your general problem solving and coding abilities than your recall of library function names or obscure language syntax. If you can&#8217;t remember exactly how to do something in your chosen language, make something up and just explain to your interviewer that you would look up the specifics in the documentation. Likewise, if you utilize an abstraction or programming idiom which admits a trivial implementation, don&#8217;t be afraid to just write out the interface and omit the implementation so you can concentrate on more important aspects of the problem (e.g., &#8220;I&#8217;m going to use a circular buffer here with the following interface without writing out the full implementation&#8221;).</li>
</ul>
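<p>Here is a sketch of what delaying a helper can look like, using the circular buffer mentioned above. The problem (a moving average over the most recent readings) and all of the names are invented for illustration. The top-level logic comes first and is written purely against the interface; the implementation at the bottom is the part you would put off, or skip entirely if your interviewer grants it.</p>

```java
// Hypothetical example: the solution is written against an interface before any implementation exists.
class MovingAverage {
    // Top-level logic: it relies only on the contract documented on CircularBuffer.
    static double average(CircularBuffer buffer) {
        if (buffer.size() == 0) {
            throw new IllegalArgumentException("buffer is empty");
        }
        long total = 0;
        for (int i = 0; i < buffer.size(); i++) {
            total += buffer.get(i);
        }
        return (double) total / buffer.size();
    }
}

// The contract, written out up front so the caller above can be reasoned about on its own.
interface CircularBuffer {
    /** Adds a value, evicting the oldest one if the buffer is already full. */
    void add(int value);

    /** Number of values currently stored; never more than the capacity. */
    int size();

    /** Returns the stored value at position index, where 0 is the oldest. */
    int get(int index);
}

// The deferred part: filled in last, once the overall solution is on the board.
class ArrayCircularBuffer implements CircularBuffer {
    private final int[] slots;
    private int start = 0; // slot holding the oldest value
    private int size = 0;

    ArrayCircularBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new int[capacity];
    }

    public void add(int value) {
        if (size < slots.length) {
            slots[(start + size) % slots.length] = value;
            size++;
        } else {
            slots[start] = value;                 // overwrite the oldest value
            start = (start + 1) % slots.length;   // the next-oldest becomes the oldest
        }
    }

    public int size() {
        return size;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return slots[(start + index) % slots.length];
    }
}
```

<p>Notice that <code>average</code> can be checked for correctness without reading a single line of <code>ArrayCircularBuffer</code>. That separation is what lets you keep momentum during the interview.</p>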
<h2>Once you have a solution</h2>
<ul>
<li><strong>Think about edge cases.</strong> Naturally, you should strive for a solution that&#8217;s correct in all observable aspects. Sometimes there will be a flaw in the core logic of your solution, but more often your only bugs will be in how you handle edge cases. (This is true of real-world engineering as well.) Make sure your solution works on all edge cases you can think of. One way you can search for edge-case bugs is to&#8230;</li>
<li><strong>Step through your code.</strong> One of the best ways to check your work is to simulate how your code executes against a sample input. Take one of your earlier examples and make sure your code produces the right result. Huge caveat here: when mentally simulating how your code behaves, your brain will be tempted to project what it wants to happen rather than what the code actually says will happen. Fight this tendency by being as literal as possible. For example, if you&#8217;re calculating a string index with code like <code>str.length()-suffix.length()</code>, don&#8217;t just assume you know where that index will land; actually do the math and make sure the value is what you were hoping for.</li>
<li><strong>Explain the shortcuts you took.</strong> If you skipped things for reasons of expedience that you would otherwise do in a &#8220;real world&#8221; scenario, please let us know what you did and why. For example, &#8220;If I were writing this for production use, I would check an invariant here.&#8221; Since whiteboard coding is an artificial environment, this gives us a sense for how you&#8217;ll treat code once you&#8217;re actually on the job.</li>
</ul>
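<p>To illustrate the index arithmetic and edge-case advice above, here is a small hand-written suffix check. The class and method names are made up, and in real code you would simply call <code>String.endsWith</code>; the point is only to have something concrete to step through.</p>

```java
// Hypothetical example class; String.endsWith already does this in real code.
class SuffixCheck {
    static boolean endsWith(String str, String suffix) {
        // Do the math literally: "interview" (9) minus "view" (4) gives start = 5, the index of 'v'.
        int start = str.length() - suffix.length();
        if (start < 0) {
            return false; // edge case: the suffix is longer than the string itself
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (str.charAt(start + i) != suffix.charAt(i)) {
                return false;
            }
        }
        return true; // also reached for an empty suffix, where the loop never runs
    }

    public static void main(String[] args) {
        System.out.println(endsWith("interview", "view"));
        System.out.println(endsWith("view", "interview"));
        System.out.println(endsWith("interview", ""));
    }
}
```

<p>Without the <code>start &lt; 0</code> check, the second call in <code>main</code> would pass a negative index to <code>charAt</code> and throw. That is the kind of bug a literal walkthrough tends to catch.</p>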
<p>As an addendum, here are a few suggestions for books we like about the art of software construction:</p>
<p><em><a href='http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882'>Clean Code: A Handbook of Agile Software Craftsmanship</a></em> &#8211; Robert C. Martin<br />
<em><a href="http://www.cc2e.com/">Code Complete: A Practical Handbook of Software Construction</a></em> &#8211; Steve McConnell<br />
<em><a href="http://cm.bell-labs.com/cm/cs/tpop/">The Practice of Programming</a></em> &#8211; Brian Kernighan, Rob Pike<br />
<em><a href="https://secure.wikimedia.org/wikipedia/en/wiki/Design_Patterns">Design Patterns: Elements of Reusable Object-Oriented Software</a></em> &#8211; Erich Gamma, et al.<br />
<em><a href="http://java.sun.com/docs/books/effective/">Effective Java</a></em> &#8211; Joshua Bloch</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/10/03/the-coding-interview/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to Rock an Algorithms Interview</title>
		<link>http://blog.palantirtech.com/2011/09/26/how-to-rock-an-algorithms-interview/</link>
		<comments>http://blog.palantirtech.com/2011/09/26/how-to-rock-an-algorithms-interview/#comments</comments>
		<pubDate>Mon, 26 Sep 2011 17:11:54 +0000</pubDate>
		<dc:creator>Kevin Simler</dc:creator>
				<category><![CDATA[interviewing]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1902</guid>
		<description><![CDATA[Comic courtesy of XKCD, via Creative Commons License We do a lot of interviewing at Palantir, and let me tell you: it&#8217;s hard. I don&#8217;t mean that we ask tough questions (although we do). I mean that the task of evaluating a candidate is hard. The problem? Given a whiteboard and one hour, determine whether [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center'><a href='http://www.xkcd.com/399/'><img style='width: 100%; margin-bottom: 20px' src='/wp-content/uploads/2011/09/travelling_salesman_problem.png' alt='Traveling salesman problem comic, originally from http://www.xkcd.com/399/' title='Traveling salesman problem comic, originally from http://www.xkcd.com/399/' /></a>
<div style='text-align: right; font-size: 0.6em; margin-bottom: 1em;'>Comic courtesy of <a href='http://www.xkcd.com/399/'>XKCD</a>, via Creative Commons License</div>
</div>
<p>We do a lot of interviewing at Palantir, and let me tell you: it&#8217;s hard. I don&#8217;t mean that we ask tough questions (although we do). I mean that the task of evaluating a candidate is hard.</p>
<p>The problem? Given a whiteboard and one hour, determine whether the person across from you is someone you&#8217;d like to work with, in the trenches, for the next n years. A candidate&#8217;s performance during an interview is only weakly correlated with his or her true potential, but we&#8217;re stuck with the problem of turning the chickenscratch on the whiteboard into an &#8216;aye&#8217; or &#8216;nay&#8217;. Sometimes it feels like a high-stakes game of reading tea leaves. Believe me, we&#8217;re doing our best, but we&#8217;re often left with the nagging worry that we&#8217;re passing up brilliant people who just had a bad day or who didn&#8217;t click with a particular problem.</p>
<p>In an effort to improve this situation, we wanted to write up a guide that will help candidates make sense of this process, or at least the part known as an Algorithms Interview. At Palantir we ask questions that test for a lot of different skills — coding, design, systems knowledge, etc. — but one of our staple interviews is to ask you to design an algorithm to solve a particular problem.</p>
<p>It usually starts like this:</p>
<blockquote><p>Given X, figure out an efficient way to do Y.</p></blockquote>
<p><strong>First: Make sure you understand the problem</strong>. You&#8217;re not going to lose points asking for clarifications or talking through the obvious upfront. This will also buy you time if your brain isn&#8217;t kicking in right away. Nobody expects you to solve a problem in the first 30 seconds or even the first few minutes.</p>
<p>Once you understand the problem, <strong>try to come up with a solution – any solution whatever</strong>. As long as it&#8217;s valid, it doesn&#8217;t matter if your solution is trivial or ugly or extremely inefficient. What matters is that you&#8217;ve made progress. This does two things: (1) it forces you to engage with the structure of the problem, priming your brain for improvements you can make later, and (2) it gives you something in the bank, which will in turn give you confidence. If you can achieve a brute force solution to a problem, you&#8217;ve cleared a major hurdle to solving it in a more efficient way.</p>
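<p>As an illustration of getting something in the bank, suppose the prompt were &#8220;given an array of integers, find the largest sum of any contiguous run.&#8221; That problem is our own stand-in, chosen only as an example. A first brute-force answer might simply try every start and end position and add up each run from scratch:</p>

```java
// Hypothetical example: the obvious first answer, correct but cubic in the array length.
class MaxSubarrayBruteForce {
    static long maxSubarraySum(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        long best = Long.MIN_VALUE;
        for (int start = 0; start < values.length; start++) {
            for (int end = start; end < values.length; end++) {
                // Re-adding the whole run every time is what makes this O(n^3).
                long total = 0;
                for (int k = start; k <= end; k++) {
                    total += values[k];
                }
                best = Math.max(best, total);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxSubarraySum(new int[] {-2, 1, -3, 4, -1, 2, 1, -5, 4}));
    }
}
```

<p>It is slow, but it is valid, and writing it out already exposes the structure you can improve on: the innermost loop keeps recomputing sums it has mostly seen before.</p>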
<p>Now comes the hard part. You&#8217;ve given an O(n^3) solution and your interviewer asks you to do it faster. You stare at the problem, but nothing&#8217;s coming to you. At this point, there are a few different moves you can make, depending on the problem at hand and your own personality. Almost all of these can help on almost any problem:</p>
<ol>
<li><strong>Start writing on the board</strong>. This may sound obvious, but I&#8217;ve had dozens of candidates get stuck while staring at a blank wall. Maybe they&#8217;re not visual people, but still I think it&#8217;s more productive to stare at some examples of the problem than to stare at nothing. If you can think of a picture that might be relevant, draw it. If there&#8217;s a medium-sized example you can work through, go for it. (Medium-sized is better than small, because sometimes the solution to a small example won&#8217;t generalize.) Or just write down some propositions that you know to be true. Anything is better than nothing.
</li>
<p><br/></p>
<li><strong>Talk it through</strong>. And don&#8217;t worry about sounding stupid. If it makes you feel better, tell your interviewer, &#8220;I&#8217;m just going to talk out loud. Don&#8217;t hold me to any of this.&#8221; I know many people prefer to quietly contemplate a problem, but if you&#8217;re stuck, talking is one way out of it. Sometimes you&#8217;ll say something that clearly communicates to your interviewer that you understand what&#8217;s going on. Even though you might not put much stock in it, your interviewer may interrupt you to tell you to pursue that line of thinking. Whatever you do, please DON&#8217;T fish for hints. If you need a hint, be honest and ask for one.
</li>
<p><br/></p>
<li><strong>Think algorithms</strong>. Sometimes it&#8217;s useful to mull over the particulars of the problem-at-hand and hope a solution jumps out at you (this would be a bottom-up approach). But you can also think about different algorithms and ask whether each of them applies to the problem in front of you (a top-down approach). Changing your frame of reference in this way can often lead to immediate insight. Here are some algorithmic techniques that can help solve more than half the problems we ask at Palantir:
<ul>
<li>Sorting (plus searching / binary search)</li>
<li>Divide-and-conquer</li>
<li>Dynamic programming / memoization</li>
<li>Greediness</li>
<li>Recursion</li>
<li>Algorithms associated with a specific data structure (which brings us to our fourth suggestion&#8230;)</li>
</ul>
</li>
<p><br/></p>
<li><strong>Think data structures</strong>. Did you know that the top 10 data structures account for 99% of all data structure use in the real world? Probably not, because I just made those numbers up — but they&#8217;re in the right ballpark. Yes, on occasion we ask a problem whose optimal solution requires a <a href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</a> or <a href="http://en.wikipedia.org/wiki/Suffix_tree">suffix tree</a>, but even those problems tend to have a near-optimal solution that uses a much more mundane data structure. The data structures that are going to show up most frequently are:
<ul>
<li>Array</li>
<li>Stack / Queue</li>
<li>Hashset / Hashmap / Hashtable / Dictionary</li>
<li>Tree / binary tree</li>
<li>Heap</li>
<li>Graph</li>
</ul>
<p>You should know these data structures inside and out. What are the insertion/deletion/lookup characteristics? (O(log n) for a balanced binary tree, for example.) What are the common caveats? (Hashing is tricky, and usually takes O(k) time, where k is the size of the object being hashed.) What algorithms tend to go along with each data structure? (<a href="https://secure.wikimedia.org/wikipedia/en/wiki/Dijkstra%27s_algorithm">Dijkstra&#8217;s</a> for a graph.) Once you understand these data structures, the solution to a problem will sometimes pop into your mind as soon as you think about using the right one.</p>
</li>
<p><br/></p>
<li><strong>Think about related problems you’ve seen before and how they were solved</strong>. Chances are, the problem you&#8217;ve been presented with is one that you&#8217;ve seen before, or at least very similar to one.  Think about those solutions and how they can be adapted to the specifics of the problem at hand.  Don&#8217;t get tripped up by the form in which the problem is presented &#8211; distil it down to the core task and see if it matches something you&#8217;ve solved in the past.</li>
<p><br/></p>
<li><strong>Modify the problem by breaking it up into smaller problems.</strong> Try to solve a special case or simplified version of the problem.  Looking at the corner cases is a good way to bound the complexity and scope of the problem.  Reducing the problem to a subset of the larger problem can give you a base to start from; you can then work your way up to the full scope at hand.
<p>Looking at the problem as a composition of smaller problems may also be helpful. For example, “find a number in a sorted array which has been shifted cyclically by an unknown constant k” can be solved by (1) first figuring out “k” and then (2) figuring out how to perform binary search on a shifted array.</p></li>
<p><br/></p>
<li><strong>Don&#8217;t be afraid to backtrack</strong>. If you feel like a particular approach isn&#8217;t working, it might be time to try a different approach. Of course you shouldn&#8217;t give up too easily. But if you&#8217;ve spent a few minutes on an approach that isn&#8217;t bearing any fruit and doesn&#8217;t feel promising, back up and try something else. I&#8217;ve seen more candidates who overcommit than undercommit, which means you should (all else equal) be a little more willing to abandon an unpromising approach.</li>
</ol>
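<p>The shifted-array example in the list above can be sketched in two steps, as the text suggests. This is only a minimal sketch, not a model answer: the function names <code>find_shift</code> and <code>search_shifted</code> are made up for illustration, &#8220;k&#8221; is taken to be the index of the smallest element, and the elements are assumed to be distinct.</p>

```python
def find_shift(a):
    """Step 1: return k, assumed here to be the index of the smallest element.

    Assumes distinct elements; duplicates can defeat the comparison below.
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] > a[hi]:
            lo = mid + 1   # the minimum lies strictly to the right of mid
        else:
            hi = mid       # the minimum is at mid or to its left
    return lo


def search_shifted(a, target):
    """Step 2: binary search over logical positions, mapped back through k.

    Returns the index of target in a, or -1 if it is absent.
    """
    n = len(a)
    if n == 0:
        return -1
    k = find_shift(a)
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        physical = (mid + k) % n   # logical position mid lives at this index
        if a[physical] == target:
            return physical
        if a[physical] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

<p>Both steps are binary searches, so the whole thing stays at O(log n). Solving the two sub-problems separately is usually easier to get right on a whiteboard than a single combined search.</p>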
<p>Incidentally, trying out a few different approaches (rather than sticking with a single approach) tends to work well in interviews, because the problems we choose for an interview usually have many different solutions. Happily, the same is true for the problems we solve on the job =)</p>
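<p>To make the &#8220;many different solutions&#8221; point concrete, here is a sketch using a hypothetical problem (not necessarily one we ask): given a list of integers, decide whether any two distinct entries sum to a target. The three function names are invented for this illustration. The first is the brute-force answer you can put in the bank, the second leans on sorting plus binary search, and the third leans on a hash set.</p>

```python
from bisect import bisect_left


def has_pair_brute(nums, target):
    """Brute force: try every pair, O(n^2) time."""
    n = len(nums)
    return any(nums[i] + nums[j] == target
               for i in range(n) for j in range(i + 1, n))


def has_pair_sorted(nums, target):
    """Sort, then binary-search for each element's complement: O(n log n)."""
    s = sorted(nums)
    for i, x in enumerate(s):
        # Search only to the right of i so an element is never paired with itself.
        j = bisect_left(s, target - x, i + 1)
        if j < len(s) and s[j] == target - x:
            return True
    return False


def has_pair_hashed(nums, target):
    """Hash set of values seen so far: O(n) expected time, O(n) extra space."""
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
```

<p>All three return the same answers; they differ only in the time and space they trade away, which is exactly the kind of comparison worth talking through out loud.</p>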
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/09/26/how-to-rock-an-algorithms-interview/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Help! Is there a doctor in the network???</title>
		<link>http://blog.palantirtech.com/2010/07/23/help-is-there-a-doctor-in-the-network/</link>
		<comments>http://blog.palantirtech.com/2010/07/23/help-is-there-a-doctor-in-the-network/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 23:33:01 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1427</guid>
		<description><![CDATA[Cyber security is a hot topic, especially in national security circles. The world has witnessed a number of high-profile incidents in the past two years that have been notable for sharing three very important aspects: they were targeted attacks, carried out against specific institutions they were politically motivated, and, inconclusively, appear to be state-sponsored they [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 250px; margin-left: 15px; margin-bottom: 15px;'>
<img src='http://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Botnet.svg/500px-Botnet.svg.png' width='250'/>
</div>
<p>Cyber security is a hot topic, especially in national security circles.  The world has witnessed a number of high-profile incidents in the past two years that have been notable for sharing three very important aspects: </p>
<ul>
<li>they were targeted attacks, carried out against specific institutions
</li>
<li>they were politically motivated and, although the evidence is inconclusive, appear to be state-sponsored
</li>
<li>they used multi-step, multi-vector attacks and managed to evade existing security countermeasures
</li>
</ul>
<p>This deviates from the types of attacks that IT-centric approaches have sought to defend networks against.  Traditional approaches neutralize the perceived threats against a network with a host of countermeasures: firewalls, malware scanners, automated network vulnerability scanning, patch policies, and intrusion detection systems.  The network defenses can learn new tricks when the administrators update the signatures, or, for certain types of data, employ a <a href="http://en.wikipedia.org/wiki/Bayesian_inference">Bayesian inference</a> strategy (<a href="http://www.paulgraham.com/spam.html">as has been employed to fight spam</a>).  This approach does a good job of protecting against untargeted attacks as well as weak targeted attacks.  </p>
<p>Full network defense requires human analysts looking at anomalies at a level above the automated countermeasures.  Check out the rest of this post to take a look at how <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">human-driven, computer-aided analysis is a game changer</a> in cyber security.</p>
<p><span id="more-1427"></span></p>
<h2>A classic doctrine: the immune system</h2>
<p>If you&#8217;ve worked in network security, you&#8217;re undoubtedly familiar with most  (if not all) of the countermeasure systems listed above.  The question we don&#8217;t often ask is: </p>
<blockquote><p>What is the defensive doctrine being employed by this security architecture?</p></blockquote>
<p>Classic network security can be summed up as this philosophy: </p>
<blockquote><p>Become unattractive as a target-of-opportunity to the legions of script kiddies and somewhat more sophisticated opportunists who search for network defenses they can easily breach.  </p></blockquote>
<p>The goal of the IT-based approach is to be a tougher nut to crack than the network next door. Attackers throw themselves against the defenses, find no exploitable vulnerabilities and move on to the next target-of-opportunity. </p>
<p>As the old joke goes: when a tiger attacks your safari group, you don’t have to run faster than the tiger, you just need to run faster than your friends. We might rewrite that today as: <em><a href="http://en.wikipedia.org/wiki/Leet">when the &#8216;l33t h4cker comes a&#8217;knocking in your network neighborhood, just make sure that you&#8217;re less of a n00b than the next guy and you&#8217;ll probably avoid getting pwned too hard</a>.</em></p>
<p>And so we&#8217;re faced with this reality: today&#8217;s state-of-the-art network defense is a patchwork system of automated countermeasures designed to stop dumb, undirected, automated attacks. This architecture is not unique to cyber security &mdash; it has a close analog in biology. </p>
<p>The human immune system produces antibodies that recognize and defend against specific attacks; it learns over time through successful defense of the organism and, more recently, vaccinations. <a href="http://www.nytimes.com/2010/07/13/science/13micro.html?_r=1&#038;pagewanted=all">Millions of bacteria and viruses are foiled every day by immune systems</a>. We can observe this same pattern in cyberspace: hijacked systems tirelessly scour the Internet&#8217;s address space, looking for hapless networks ripe for takeover. <a href='http://blogs.forbes.com/firewall/2010/06/04/just-how-big-is-the-cyber-threat-to-dod/'>The Pentagon is probed something like 250,000 times a day</a>.</p>
<p>It would be insanity to connect a network to the modern Internet without security countermeasures in place to defend against these sorts of attacks.  However, while they are necessary to the task of securing a network, they are certainly not sufficient.</p>
<h2>Targeted attacks: slipping past the immune system</h2>
<div style='text-align: center; float: right; width: 250px; margin-left: 15px; margin-bottom: 15px;'>
<a href='http://www.dpd.cdc.gov/DPDx/HTML/Hookworm.htm'><img src='http://www.dpd.cdc.gov/DPDx/images/ParasiteImages/G-L/Hookworm/Hookworm_LifeCycle.gif' width='250'/><br/><br />
<span style='font-size: 0.8em; text-align: center; font-style: italic'>The Lifecycle of Hookworm</span></a>
</div>
<p>The countermeasures discussed thus far are essential but not infallible and can be bypassed by things like never-before-seen viruses or carefully crafted penetration attempts.  In the biological domain a targeted attack might come in the form of <a href="http://www.ncbi.nlm.nih.gov/pubmed/20208540">HIV</a> (evolved to slip past the immune defenses), a toxin (non-biological, nothing the immune system can do), or a parasite.</p>
<h3>The original crafty adversary</h3>
<p>A parasite can survive and thrive inside its host while <a href="http://jbiol.com/content/8/7/62">evading or suppressing the normal immune response to invaders</a>. Parasites take up comfortable residence inside the body of their host, using it as a source of food and protection; finally, they use the host as a place to reproduce and spread to other individuals in the host species.  Parasites don&#8217;t generally kill or gravely harm their hosts (or at least they don&#8217;t do it quickly), as it&#8217;s in their own self-interest to have the host continue living.</p>
<h3>Targeted parasite networks: GhostNet and the Shadow network</h3>
<p>Cyber analog?  You betcha: <a href='http://www.google.com/corporate/execs.html#vint'>Vint Cerf</a> was quoted just last week, <a href='http://voices.washingtonpost.com/fasterforward/2010/07/vint_cerf_at_palantir_night_li.html'>&#8220;The hackers don&#8217;t want to destroy the network. They want to keep it running, so they can keep making money from it.&#8221;</a></p>
<p><a href="http://citizenlab.org/">The Citizen Lab</a>, a University of Toronto-based non-profit that does in-depth, hands-on, technical research in the cyber security domain, had this to say:</p>
<blockquote><p>Crime and espionage form a dark underworld of cyberspace. Whereas crime is usually the first to seek out new opportunities and methods, espionage usually follows in its wake, borrowing techniques and tradecraft.
</p></blockquote>
<p>That&#8217;s in the foreword from their recent report, &#8220;<a href='http://www.scribd.com/doc/29435784/SHADOWS-IN-THE-CLOUD-Investigating-Cyber-Espionage-2-0'>Shadows in the Cloud: Investigating Cyber Espionage 2.0</a>&#8221;.  The report details their experiences tracking down the size, scope, and tradecraft behind a massive cyber-espionage botnet, dubbed <a href="http://en.wikipedia.org/wiki/GhostNet">GhostNet</a>:</p>
<blockquote style='text-align: justify;'><p><a href='http://www.scribd.com/doc/13731776/Tracking-GhostNet-Investigating-a-Cyber-Espionage-Network'>Tracking GhostNet: Investigating a Cyber Espionage Network</a> <em>[their first report on this botnet]</em> was the product of a ten-month investigation and analysis focused on allegations of Chinese cyber espionage against the Tibetan community. The research entailed field-based investigations in India, Europe and North America working directly with affected Tibetan organizations, including the Private Office of the Dalai Lama, the Tibetan Government-in-Exile, and several Tibetan NGOs in Europe and North America. The fieldwork generated extensive data that allowed us to examine Tibetan information security practices, as well as capture evidence of malware that had penetrated Tibetan computer systems. We also engaged in extensive data analysis and technical investigation of web-based interfaces to command and control servers that were used by attackers to send instructions to, and receive data from compromised computers.</p>
<p>The report documented a wide ranging network of compromised computers, including at least 1,295 spread across 103 countries, 30 percent of which we identified and determined to be &#8220;high-value&#8221; targets, including ministries of foreign affairs, embassies, international organizations, news organizations, and a computer located at NATO headquarters.</p></blockquote>
<p>These attacks used carefully forged email attacks, known as <a href='http://www.fbi.gov/page2/april09/spearphishing_040109.html'>spearphishing</a>, to entice their targets to unknowingly infect themselves with remote control software. The infections allowed the attackers to exfiltrate data from compromised machines and use them as springboards to attack other systems using similar targeted attacks.  <a href="http://www.dpd.cdc.gov/dpdx/html/hookworm.htm">Sound familiar?</a></p>
<h2>A New Doctrine: The Doctor</h2>
<p>Without an immune system, we&#8217;d be dead within hours; our immune system is absolutely necessary but, again,  not sufficient to keep us healthy.  For those things that the immune system can&#8217;t take care of, we use doctors.  Doctors are adaptive adversaries to disease: they can run tests, they can talk to the patient, they can apply insights learned from other patients or diseases.  Most importantly, a doctor has a much more <a href='http://jokesareawesome.com/joke/932/what_s_the_difference_between_god_and_a...'>omniscient view of the patient</a> than the immune system.</p>
<h3>Network Security &#8211; a 10,000 ft. discipline</h3>
<p>Applying this approach to the network enables security responses that can actually counter targeted attacks. A security officer (our network&#8217;s &#8220;doctor&#8221;) starts an investigation with some sort of anomalous event: an unexpected IP address in a log or an alert from an intrusion detection system.</p>
<p>Remember that a runny nose or flagged packet is not an illness or a network compromise, it&#8217;s a symptom.  Symptoms suggest causes, but are only clues. Taken in isolation, they don&#8217;t often offer conclusive information on the health of the patient. In fact, finding the root cause of a symptom (a <a href="http://en.wikipedia.org/wiki/Diagnosis">diagnosis</a>) requires the synthesis of multiple sources of data into a complete, coherent picture of the network or patient.  This often includes things that you can&#8217;t see in the blood or packet stream, like understanding where the patient or user has travelled, what environmental factors might be present in their home, existing allergies, open wireless networks, insecure web apps, drug use, etc.</p>
<h3>Node health vs. network health</h3>
<p>A node gets an infection on your network? <a href="http://www.thinkgeek.com/tshirts-apparel/unisex/frustrations/ad98/">Re-image it, the symptoms go away</a>. In the domain of human medicine, re-imaging of humans when they get sick has not yet gained FDA approval &ndash; something doctors have been uttering oaths about since way before the days of Hippocrates.</p>
<p>But it&#8217;s not the symptoms we&#8217;re after, it&#8217;s the root cause.  Couple that with how easy it is to treat the symptoms via re-imaging, and security officers become more akin to public health officials: more concerned about the overall health of the network than the health of a single node.  This broader concern immediately raises a list of questions about any security anomaly on the network:</p>
<ul>
<li>How did this happen?  Was it a machine (network exploit) or human vector (somebody clicked on something they shouldn&#8217;t have)?</li>
<li>What is the extent of this infection?  Is it limited to a single node?  <a href="http://www.youtube.com/watch?v=EVekNsgUqn4">Why does this small moon appear to have a tractor beam locked on to our ship?</a></li>
<li>Is this part of a larger attack?  What is the true target of this attack? <a href="http://www.youtube.com/watch?v=dddAi8FF3F4">Is this a trap?</a></li>
<li>Do the tracks lead out of or deeper into my network? Was this an inside job?  Did I find an intermediary node in a multi-node penetration?</li>
<li>Who is behind this attack and why do they want in? Can I match this modus operandi with any other known attacks on this or other networks?</li>
<li>How do I prevent this sort of attack in the future?  Do I need to deploy new countermeasures, re-architect parts of the network, and/or teach my people to be more careful?</li>
</ul>
<p>The answer to any of these questions does not appear in a single log file on your network, any more than a single antibody can tell you that the H1N1 flu you&#8217;re now infected with came from the grocery clerk who got it from her boyfriend who, in turn, acquired it on his recent trip to Mexico.</p>
<p>The trees don&#8217;t know how big the forest is.</p>
<h3>Cyber security doctors</h3>
<div style='text-align: center; float: right; width: 250px; margin-left: 15px; margin-bottom: 15px;'>
<a href='http://home.uchicago.edu/~bleakley/graphical_summaries/hookworm_paper_in_graphs.html'><img src='http://upload.wikimedia.org/wikipedia/commons/c/c6/Hookworm_Examination.jpg' width='250'/><br/><br />
<span style='font-size: 0.8em; text-align: center; font-style: italic'>A doctor examines a boy looking for hookworm.</span></a>
</div>
<p>The way to find the answers to these questions is to <em><strong>give a skilled, experienced analyst powerful tools to use against all the data about the attack on all of the systems on your network mashed up with relevant data about the messy meatspace that contains the computers, users, and attackers in question</strong></em>.  </p>
<p>You need firewall logs, intrusion detection system logs, malware detection logs, badge logs to determine who had physical access to the network, travel records of where you expect your employees to be logging into the VPN from, and a dozen other sources of data that are unique to this network.</p>
<p>The data alone are not enough; they must be accessible in a way that enables expedient analysis. In most shops, many of the aforementioned data sources exist, but accessing and cross-referencing them requires a high level of technical fluency in the storage systems themselves, <em>even for a user who has a strong grasp of the story that the data are telling</em>.  Some combination of SQL, shell, grep, awk, sed, perl, and <a href="http://en.wikipedia.org/wiki/Visual_inspection">Mk I Eyeball</a> is used to suss out answers from the data.  It&#8217;s a slow, fragile, error-prone game, and the bar is high to even begin playing.</p>
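<p>To give a flavor of what cross-referencing means in practice, here is a toy sketch that checks VPN logins against travel records. Everything in it is an assumption made for illustration: the record layouts, the user names, the country codes, and the helper names <code>expected_country</code> and <code>unexpected_logins</code>. Real sources would first need parsing, normalization, and IP geolocation.</p>

```python
from datetime import date

# Hypothetical, already-normalized records (assumed layouts, not a real schema).
travel = [  # (user, country, first_day, last_day)
    ("alice", "DE", date(2010, 7, 1), date(2010, 7, 5)),
]
home_country = {"alice": "US", "bob": "US"}
vpn_logins = [  # (user, day, country the source address was attributed to)
    ("alice", date(2010, 7, 3), "DE"),
    ("alice", date(2010, 7, 9), "DE"),
    ("bob", date(2010, 7, 3), "RU"),
]


def expected_country(user, day):
    """Where we expect the user to be on a given day, per the travel records."""
    for u, country, start, end in travel:
        if u == user and start <= day <= end:
            return country
    return home_country.get(user)


def unexpected_logins(logins):
    """Logins whose origin disagrees with where the user was expected to be."""
    return [(u, d, c) for (u, d, c) in logins if expected_country(u, d) != c]


for user, day, country in unexpected_logins(vpn_logins):
    print(user, day.isoformat(), country)
```

<p>Even this toy needs two data sources and a join condition that no single log contains. Each additional source (badge logs, firewall logs) adds another join, which is where hand-rolled scripts start to get fragile.</p>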
<p>Whenever computers are recording information about the activities of other computers, the data gets big and it gets big fast. For example, grep is a very powerful and flexible tool, but its linear search through data starts to falter as the data size exceeds about 10 GB on rotational media.</p>
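<p>The difference between a linear search and an indexed one can be sketched in a few lines. This is an illustration under simplifying assumptions, not a description of any particular product: the log lines are invented, tokens are split on whitespace, and <code>scan</code> matches whole tokens (grep itself matches patterns anywhere in a line).</p>

```python
from collections import defaultdict

# Invented sample lines using documentation-reserved addresses.
log_lines = [
    "2010-07-23 10:01:07 ALLOW src=10.0.0.5 dst=192.0.2.10",
    "2010-07-23 10:01:09 DENY src=198.51.100.7 dst=10.0.0.5",
    "2010-07-23 10:02:30 ALLOW src=10.0.0.8 dst=192.0.2.10",
    "2010-07-23 10:03:12 DENY src=198.51.100.7 dst=10.0.0.8",
]


def scan(lines, token):
    """grep-style lookup: every query reads every line, O(total data)."""
    return [i for i, line in enumerate(lines) if token in line.split()]


def build_index(lines):
    """One pass over the data to build an inverted index: token to line numbers."""
    index = defaultdict(list)
    for i, line in enumerate(lines):
        for tok in set(line.split()):
            index[tok].append(i)
    return index


def lookup(index, token):
    """Indexed lookup: cost depends on the number of hits, not the data size."""
    return index.get(token, [])


index = build_index(log_lines)
print(lookup(index, "src=198.51.100.7"))
```

<p>The index costs time and space to build, but that cost is paid once; every later question touches only the matching lines instead of re-reading the whole corpus from disk.</p>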
<p>In order to address and solve these sorts of problems, the world needs a platform with the following properties:</p>
<ul>
<li>Has access to all known information about a given incident</li>
<li>Makes querying and exploring relationships conceptual and interactive</li>
<li>Scales to handle large data sizes</li>
</ul>
<p>It probably looks something like this:</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/cyber/cyber1.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/cyber/cyber1.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/cyber/cyber1.flv"/></object> </p>
<p><a href="http://media.palantirtech.com/government/videos/cyber/cyber1.wmv">Download</a> the WMV (50 MB) | <a href="http://media.palantirtech.com/government/videos/cyber/cyber1.asx">Streaming Windows Media</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/07/23/help-is-there-a-doctor-in-the-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A rigorous friction model for human-computer symbiosis</title>
		<link>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/</link>
		<comments>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 03:18:52 +0000</pubDate>
		<dc:creator>Asher Sinensky</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1344</guid>
		<description><![CDATA[This is a response to Ari&#8217;s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look: We are attempting to understand the total analytic capability for a given task a of a human-computer [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center; float: right; margin-left: 15px; margin-right: 15px'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt="" width="300"/>
</div>
<p>This is a response to <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">Ari&#8217;s awesome post on human-computer symbiosis</a>. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look:</p>
<p>We are attempting to understand the total analytic capability for a given task <strong><em>a</em></strong> of a human-computer team. Analytic capability in this case probably means:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq1.png" alt="eq1"/>(1)
</div>
<p>Where <strong><em>A</em></strong> is the answer to the analytic problem in question and <strong><em>t<sub>A</sub></em></strong> is the time needed to arrive at the answer based on the inputs available. In the case of chess, <strong><em>A</em></strong> could be the optimum next move given all previous information and <strong><em>t<sub>A</sub></em></strong> would be how long it takes to decide on this move.</p>
<p>Read on for a look at how this generalizes in human-computer symbiotic systems.<br />
<span id="more-1344"></span></p>
<p>In the case of the human-computer team, we know that <strong><em>a</em></strong> is going to be a function of both the human&#8217;s analytical capability <strong><em>h</em></strong> and the computer&#8217;s analytical capability <strong><em>c</em></strong> (where both <strong><em>h</em></strong> and <strong><em>c</em></strong> have units of answers/time). In the limit case we know that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq2.png" alt="eq2"/>(2)
</div>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq3.png" alt="eq3"/>(3)
</div>
<p>Or in plain English, if there is no human present, the total analytic capability is simply the analytic capability of the computer, and vice versa. So the naïve solution would be that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq4.png" alt="eq4"/>(4)
</div>
<p>(4) clearly meets the limiting cases described in (2) and (3). Kasparov noticed a mixing function where the ability of the human and computer to work together becomes the dominant term &mdash; we might call this the mixing capability for the given task or <strong><em>m</em></strong>. Including this phenomenon, the total analytic capability (4) would be re-defined as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq5.png" alt="eq5"/>(5)
</div>
<p>where <strong><em>m</em></strong> has the property that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq6.png" alt="eq6"/>(6)
</div>
<p>This maintains the limits expressed in (2) and (3) and adheres to the observation that if there is no human or computer component then there will be no mixing advantage. A naïve solution to this constraint would be simple linear mixing:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq7.png" alt="eq7"/>  (7)
</div>
<p>where <strong><em>M</em></strong> (units of time per answer) is the mixing efficiency and will be primarily based on the type of task being solved. Some analytical tasks lend themselves to a combined process more than others (for example, multiplying 20-digit numbers does not really benefit from the intuition of a human, so the ability of a human and computer to perform this task is merely their additive ability).</p>
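<p>For readers who prefer code to equations, the model so far can be sketched numerically. Note the assumptions: the equations appear only as images in this post, so the forms used below (additive capability <em>h + c</em> for (4), and a mixing term <em>M &middot; h &middot; c</em> for (7)) are inferred from the stated limits and from the units of <em>M</em>, and the function names are invented for illustration.</p>

```python
def additive_capability(h, c):
    """Assumed form of (4): a = h + c, both in answers per unit time."""
    return h + c


def linear_mixing(h, c, M):
    """Assumed form of (7): m = M * h * c, with M in time per answer.

    The product vanishes when either h or c is zero, as (6) requires.
    """
    return M * h * c


def total_capability(h, c, M):
    """Assumed form of (5) using the linear mixing term: a = h + c + m."""
    return additive_capability(h, c) + linear_mixing(h, c, M)


# Limiting cases: with no human (or no computer) the mixing term disappears.
print(total_capability(0.0, 3.0, 2.0))
print(total_capability(2.0, 0.0, 2.0))
# With both present, the mixing term adds to the simple sum.
print(total_capability(2.0, 3.0, 0.5))
```

<p>Under these assumed forms, setting <em>M</em> to zero recovers the purely additive case described for tasks like multiplying large numbers.</p>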
<p>What Kasparov noticed is that the mixing was primarily based on the quality of the process rather than the analytical power of either the human or computer separately. This seems to imply that we must somehow account for the fact that the quality of the human-computer interface is responsible for the quality of the mixing. This can be modeled as a unitless friction of interaction <strong><em>f<sub>i</sub></em></strong> that impedes the ability of the human and computer to work together. </p>
<p>Equation (7) can thus be re-written as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq8.png" alt="eq8"/>(8)
</div>
<p>In this case, the maximum value for the mixing capability is realized when the friction of interaction goes to zero. This mixing capability is the same as the equation Ari developed (less the coefficient, which is necessary to maintain consistent units throughout).</p>
<p>We can now re-write our analytic capability in (5) as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq9.png" alt="eq9"/>(9)
</div>
<p>Below, see a plot of this function over a range of values for <strong><em>h</em></strong>, <strong><em>c</em></strong> and <strong><em>f<sub>i</sub></em></strong>:</p>
<div style='text-align: center; margin: auto; margin-bottom: 1em;'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt=""/>
</div>
<p>As can clearly be seen from this functional plot (note the vertical scale), the effect of interface friction dominates over the other terms whenever both the human and computer can make important contributions to the task at hand. The conclusion can be drawn that the most effective way to solve analytical problems is to minimize the friction of the human-computer interface; or to put it another way: optimal analytical systems are those that are built specifically to maximize the ability of the human to leverage the ability of the computer.</p>
<p>I am certain there is still the possibility for further refinement, for example:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq10a.png" alt="eq10a"/>(10)
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Haiti: effective recovery through analysis</title>
		<link>http://blog.palantirtech.com/2010/04/05/haiti-effective-recovery-through-analysis/</link>
		<comments>http://blog.palantirtech.com/2010/04/05/haiti-effective-recovery-through-analysis/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 21:58:56 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1336</guid>
		<description><![CDATA[Visualizing SMS hotspots in days following the earthquake in Palantir. Screenshot courtesy of Palantir Technologies [Editor's Note: an edited version of this post first appeared on O'Reilly's Radar blog.] The prologue was an earthquake of unexpected magnitude and location that left 250,000 dead. As computer scientists and technologists, we&#8217;re used to dealing with large numbers [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left:15px; margin-bottom: 15px; text-align: center; width: 300px'><a href="/wp-content/uploads/2010/03/haiti.png"><img src="/wp-content/uploads/2010/03/haiti-thumb.png" alt="" title="haiti" width="300" height="210" class="alignnone size-medium wp-image-1468" /></p>
<div style='text-align: center; font-size: 0.8em'>Visualizing SMS hotspots in days following the earthquake in Palantir.</div>
<div style='text-align: center; font-size: 0.6em'>Screenshot courtesy of Palantir Technologies</div>
<p></a></div>
<p><em>[Editor's Note: an edited version of this post <a href="http://radar.oreilly.com/2010/04/good-data-cuts-through-the-cha.html">first appeared on O'Reilly's Radar blog</a>.]</em></p>
<p>The prologue was an earthquake of unexpected magnitude and location that left 250,000 dead.</p>
<p>As computer scientists and technologists, we&#8217;re used to dealing with large numbers in the abstract. Expressed in human terms, the mind-boggling numbers of 250,000 dead, 300,000 injured and over 1 million people left homeless are hard to comprehend. </p>
<p>Hit the link to read more about how effective data management and analysis are crucial to recovery efforts, and see specific examples of data about the situation in Haiti modeled in Palantir Government.<br />
<span id="more-1336"></span></p>
<h2>Chapter One: Rescue</h2>
<p>There was one glimmer of hope in this sea of tragedy: the world&#8217;s reaction. In the early hours and days after the quake, the focus was on pinpointing, triaging, and rescuing those in grave danger. Since those first harrowing hours, <a href="http://en.wikipedia.org/wiki/Humanitarian_response_to_the_2010_Haiti_earthquake"> the world has made plain its willingness to help the people of Haiti</a>.  Supplies of money, food, medicine, fresh water, and volunteers have been pouring into Haiti and fundraising efforts are on-going around the world.</p>
<p>Technology also played an early, crucial role, with <a href="http://www.mission4636.org/">Mission 4636</a>, <a href="http://instedd.org/">InSTEDD</a> and <a href="http://haiti.ushahidi.com/reports/submit">Ushahidi</a> reacting lightning-fast to create a data collection system that enabled people in trouble to quickly communicate their urgent needs to rescuers and relief workers. If you haven&#8217;t already read it, Lukas Biewald&#8217;s piece, <a href="http://radar.oreilly.com/2010/03/how-crowdsourcing-helped-haiti.html">How crowdsourcing helped Haiti&#8217;s relief efforts</a>, is a great look at those first, and most urgent, efforts to collect data and synthesize information about the situation on the ground.</p>
<h2>Chapters Two Through Many: Recovery</h2>
<div style='float: right; margin-left:15px; margin-right: 15px'>
<a href="/wp-content/uploads/2010/03/p-hti0366.jpg"><img src="/wp-content/uploads/2010/03/p-hti0366-thumb.jpg" alt="" title="p-hti0366" width="300" height="200" class="alignnone size-medium wp-image-1475" /></a><br/></p>
<div style='text-align: center; font-size: 0.8em'>The extent of the devastation in Haiti.</div>
<div style='text-align: center; font-size: 0.6em'>Photo courtesy of Marko Kokic / ICRC / American Red Cross</div>
</div>
<p>Unfortunately, even partial recovery in Haiti will take years at the bare minimum. <a href="http://www.miamiherald.com/2010/01/17/1429872/vice-president-joe-biden-stresses.html">U.S. Vice President Joe Biden stated on 16 January</a> that President Obama &#8220;does not view this as a humanitarian mission with a life cycle of a month. This will still be on our radar screen long after it&#8217;s off the crawler at CNN. This is going to be a long slog.&#8221;</p>
<h3>Building the Deep, Big Picture</h3>
<p>The recovery from a disaster of this magnitude presents some important tasks in the sphere of information technology: coordination of effort, triaging those most in need, and getting good data into the hands of decision makers and aid workers.</p>
<p>Here&#8217;s a partial list of aid, relief, and rescue organizations currently in Haiti, gleaned from <a href="http://en.wikipedia.org/wiki/2010_Haiti_earthquake#Rescue_and_relief_efforts">Wikipedia</a>: </p>
<ul>
<li>An Argentine military field hospital</li>
<li>The Red Cross/Crescent, in various forms</li>
<li>The US military</li>
<li>Multiple UN agencies</li>
<li>Remnants of the Haitian government</li>
<li>The French navy</li>
<li>Sri Lankan relief workers</li>
<li>At least 2000 rescuers from 43 different groups (along with 161 search dogs)</li>
</ul>
<p>A wealth of collaborators like this presents some unique challenges around information fusion: unlike business competitors or opposing sides of a war, the different groups <em>want</em> to share as much information as possible to achieve their common goal.  A unified organization, like a single national military, will have pre-existing methods to model and share its <a href="http://en.wikipedia.org/wiki/Situational_awareness">situational awareness</a>. That is not the case in Haiti: this is a collection of groups coming together to form an ad-hoc relief force. Differences in everything from human languages and database schemas to collection methodology and problem domain make most of the datasets seemingly disjoint from one another.</p>
<p>However, each organization has produced a fairly detailed picture of the parts of Haiti that it is interacting with.  Each organization also wants to consume every other organization&#8217;s detailed knowledge of the situation.  To act effectively, they need to integrate that knowledge into a common operating picture that accurately models the situation on the ground yesterday, today, and tomorrow.</p>
<h3>Analyzing the Haiti situation using Palantir Government</h3>
<p>Our reaction to the earthquake was to try to help in the best way we knew how.  We set up a <a href="http://haiti.paas.palantirtech.com/">publicly available instance of our Palantir Government product</a>, already loaded with relevant data, for use by aid workers and organizations working in Haiti.  Using relevant, open-source data, we&#8217;ve started modeling a picture of what&#8217;s going on in Haiti.  </p>
<p>Our first cut was to include the locations and names of collapsed buildings, Internally Displaced People (IDP) camps, and Mission 4636 SMS messages, among others.  We also added in map layers that let us see what administrative zone any point on the map is located in.</p>
<p>Having mapped the data into this model, users have access to it through a suite of visualization, analysis, querying, and collaboration tools that allow them to get useful answers to practical questions.  Here are some examples:</p>
<ul>
<li>Which administrative sectors have had the most SMS requests for food in the past 24 hours?</li>
<li>Which collapsed buildings may have contained hazardous materials that will require special cleanup?</li>
<li>Are any IDP camps near enough to these hazmat sites to warrant special precautions or moving the residents?</li>
</ul>
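<p>Questions like the ones listed above boil down to filters, aggregations, and distance checks. As a rough sketch only (this is plain Python, not the query tooling in Palantir Government, and every record, field name, and threshold below is hypothetical), the first and third questions might be expressed like this:</p>

```python
from collections import Counter
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

# Hypothetical records; names, fields, and coordinates are illustrative only.
NOW = datetime(2010, 3, 1, 12, 0)
sms_messages = [
    {"sector": "Sector A", "need": "food", "received": NOW - timedelta(hours=2)},
    {"sector": "Sector A", "need": "food", "received": NOW - timedelta(hours=5)},
    {"sector": "Sector B", "need": "food", "received": NOW - timedelta(hours=30)},
    {"sector": "Sector B", "need": "water", "received": NOW - timedelta(hours=1)},
]
hazmat_sites = [{"name": "Collapsed warehouse", "lat": 18.540, "lon": -72.340}]
idp_camps = [
    {"name": "Camp 1", "lat": 18.542, "lon": -72.341},
    {"name": "Camp 2", "lat": 18.600, "lon": -72.300},
]


def food_requests_by_sector(messages, now, window=timedelta(hours=24)):
    """Count food requests per sector received within `window` before `now`."""
    cutoff = now - window
    recent = (m["sector"] for m in messages
              if m["need"] == "food" and m["received"] >= cutoff)
    return Counter(recent).most_common()  # busiest sectors first


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km is the mean Earth radius


def camps_near_hazmat(camps, sites, radius_km=1.0):
    """Return (camp, site) name pairs that lie within radius_km of each other."""
    return [(c["name"], s["name"]) for c in camps for s in sites
            if haversine_km(c["lat"], c["lon"], s["lat"], s["lon"]) <= radius_km]


print(food_requests_by_sector(sms_messages, NOW))
print(camps_near_hazmat(idp_camps, hazmat_sites))
```

<p>The 1 km radius is an arbitrary placeholder; a real precaution distance would depend on the material involved and on guidance from people in the field.</p>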
<p>We&#8217;ve created a video showing all the pieces put together into a seamless whole, using live data in our publicly available Haiti instance:</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti2.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti2.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/haiti/haiti2.flv"/></object>
</div>
<h2>The Next Chapter: Flooding</h2>
<div style='float: right; margin-left:15px; margin-bottom: 15px; width: 300px; text-align: center;'>
<a href="/wp-content/uploads/2010/03/p-hti0508.jpg"><img src="/wp-content/uploads/2010/03/p-hti0508-thumb.jpg" alt="" title="p-hti0508" width="300" height="199" class="alignnone size-medium wp-image-1472" /></a><br/></p>
<div style='text-align: center; font-size: 0.8em'>A view of the water point at the Citee Renault camp in Port-au-Prince, Haiti</div>
<div style='text-align: center; font-size: 0.6em'>Photo courtesy of Joe Lowry / IFRC / American Red Cross</div>
</div>
<p>From the <a href="http://www.redcross.org/portal/site/en/menuitem.1a019a978f421296e81ec89e43181aa0/?vgnextoid=0fe6e0b8da8b6210VgnVCM10000089f0870aRCRD">Red Cross website</a>: </p>
<blockquote><p>      “We’re racing against the clock with hurricane season just around the corner,” said Jean Pierre Taschereau, a Red Cross disaster expert just back from Haiti. “Getting semi-permanent structures in place as well as trenches for sanitation latrines will be critically important.”</p></blockquote>
<p>From <a href='http://www.abc.net.au/lateline/content/2010/s2844832.htm'>&#8220;Quake-ravaged Haiti faces flooding&#8221;</a>:</p>
<blockquote><p>
The UN wants to move 200,000 people out of overcrowded camps like this one. The Haitian government is trying to find land. It&#8217;s identified five sites outside of the Haitian capital, but those five sites are about 200 hectares and by the UN&#8217;s estimates 600 hectares will be needed to house the people it plans to move safely to have proper drainage when the rainy season finally arrives.
</p></blockquote>
<p>Haiti&#8217;s rainy season is notorious for causing flooding.  Now, with the infrastructure of the country destroyed, flood season will be more dangerous than usual.  Not only are the normal structures that protect people from the waters gone, but many residents have moved out of the ruins of Port-au-Prince to hastily constructed IDP camps, some of which are sitting in the flood plains of Haiti&#8217;s waterways.</p>
<p>The essential question facing relief workers: <em>Which of the approximately 2500 IDP camps are most at risk from flooding?</em></p>
<p>In a place like the United States, an earthquake response and recovery team could engage the services and expertise of the US Geological Survey,  which maintains the <a href="http://waterdata.usgs.gov/nwis">National Water Information System</a>, a warehouse of detailed information about all things water in this country. No such luck in Haiti, where the closest thing to the USGS is the <a href="http://www.cnigs.ht/">Centre National de l&#8217;Information Géo-Spatiale</a>.  A quick look at their website shows that they didn&#8217;t really make it through the earthquake. (In the video, we feature a picture of what&#8217;s left of their facility &mdash; it&#8217;s not pretty).</p>
<p>Since we&#8217;re starting from square one, we put together data from the <a href="http://www.agc.army.mil/Haiti/index.html">Army Geospatial Center</a>, <a href="http://ochaonline.un.org/tabid/6412/language/en-US/Default.aspx">the UN</a>, <a href="http://www.noaanews.noaa.gov/stories2010/20100119_haiti.html">NOAA</a>, Haiti-based NGOs, a number of academic papers, and even <a href="http://www.flickr.com/photos/tags/earthquake/map?&#038;fLat=18.5873&#038;fLon=-72.3666&#038;zl=6&#038;order_by=recent">geo-tagged photos from Flickr</a>.  The time it took to integrate this data? About six hours.  Time it took to do the analysis?  About seven minutes.  Amount of that work that is reusable?  All of it.</p>
<p>Check out this video for a walk-through of the analysis:</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv"/></object>
</div>
<p>The best way to improve this analysis will be to add more detailed information about flooding, gathered from the field.  We&#8217;re looking into getting new conduits of information into the Haiti instance to make this a reality as the rains really pick up.</p>
<h2>A Call To Action</h2>
<p>If you&#8217;d like to help us, we&#8217;re accepting new data sources, analyses, and contact with relief organizations.</p>
<p>Volunteers, supplies, and goodwill are only the raw ingredients to recovery; it&#8217;s the efficient and timely application of those resources to Haiti&#8217;s most pressing problems that will change lives and make recovery a reality instead of just a good intention.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/04/05/haiti-effective-recovery-through-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Friction in Human-Computer Symbiosis: Kasparov on Chess</title>
		<link>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/</link>
		<comments>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:32:06 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1302</guid>
		<description><![CDATA[As we build our platforms and applications following a human-computer symbiosis approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way. One of the areas that we&#8217;re interested is in the overall friction of analysis systems. The systems that we build are [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px;'>
<img src='/wp-content/uploads/2010/03/fools-mate.gif'/>
</div>
<p>As we build our <a href="http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/">platforms</a> and <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">applications</a> following a <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">human-computer symbiosis</a> approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.</p>
<p>One of the areas that we&#8217;re interested in is the overall friction of analysis systems.  The systems that we build run on commodity hardware &mdash; we&#8217;re not building faster computers and yet we can deliver orders-of-magnitude better performance on analysis tasks than existing solutions.  How do we do this?  By building software in such a way that it reduces the friction experienced at the boundaries between the computing power, the analyst, and the source data.</p>
<h2>Chess as analysis laboratory</h2>
<p>Chess is, at its heart, a predictive venture.  The player attempts to anticipate their opponent&#8217;s moves, planning their own moves accordingly, with the straightforward goal of finding a sequence of piece moves that force checkmate. </p>
<p>This game is, in its ideal form, analysis. (The moves made are the logical extension of the analysis.)  The data are clean, the problem is well-defined and everyone plays by the same rules.  There are even <a href="http://en.wikipedia.org/wiki/Elo_rating_system">well-defined metrics for ranking chess players by skill</a> &mdash; a better chess player is a better chess-game analyst.  </p>
<p>In the realm of evaluating analysis systems, this is about as good as it gets for designing controlled experiments that compare the relative strengths of different systems.</p>
<p><a href="http://en.wikipedia.org/wiki/Garry_Kasparov">Garry Kasparov</a>, widely considered to be the greatest chess player of all time,  recently wrote <a href="http://www.nybooks.com/articles/23592">a review of Diego Rasskin Gutman&#8217;s book</a>, <a href="http://www.amazon.com/Chess-Metaphors-Artificial-Intelligence-Human/dp/026218267X"><u>Chess Metaphors: Artificial Intelligence and the Human Mind</u>.</a></p>
<p>The review is excellent and covers a lot of ground.  However, one particular anecdote stood out as a very interesting example of human-computer symbiosis (emphasis added):</p>
<blockquote><p>In 2005, the online chess-playing site Playchess.com hosted what it called a &#8220;freestyle&#8221; chess tournament in which anyone could compete in teams with other players or computers. Normally, &#8220;anti-cheating&#8221; algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less &#8220;intelligent&#8221; than the playing programs they detect.)</p>
<p>Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.</p>
<p>The surprise came at the conclusion of the event. <em>The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time.</em> Their skill at manipulating and &#8220;coaching&#8221; their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. <em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em></p></blockquote>
<p>After the jump, we look at this finding in a more generalized way and map it onto the Palantir approach.<br />
<span id="more-1302"></span></p>
<h2>The cyborg Grandmaster: a fearsome opponent</h2>
<p>The tournament Kasparov recalls was a showcase of chess talent, human-computer symbiosis, and raw computing power.  Among those entered  in the tournament were a purpose-made chess machine (similar to <a href="http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>) named <a href="http://en.wikipedia.org/wiki/Hydra_(chess)">Hydra</a> and a team of <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmasters</a> assisted by computer programs.</p>
<p>One losing participant had this to say about the computer-aided Grandmasters:</p>
<blockquote><p>
Secondly, I have learned that a <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmaster</a> armed with a chess engine is a killer combination against a plain Engine. Engines see everything via brute force, Grandmasters use their intuition and are able to see &#8220;obvious&#8221; moves at once. So the two of them together are a mighty force.
</p></blockquote>
<p>This is just as Licklider predicted 50 years ago &#8212; quoting <a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a> (if I could put it better, I would):</p>
<blockquote><p>
Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions&#8230; In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.</p>
<p>&#8230;</p>
<p>In addition, the computer will serve as a statistical-inference, decision-theory, or game-theory machine to make elementary evaluations of suggested courses of action whenever there is enough basis to support a formal statistical analysis. Finally, it will do as much diagnosis, pattern-matching, and relevance-recognizing as it profitably can, but it will accept a clearly secondary status in those areas.
</p></blockquote>
<p>So in classic intelligence amplification fashion, having computer programs that can quickly evaluate a move&#8217;s likelihood of success can <em>amplify the power of the Grandmaster</em>.</p>
<p>While this is empirically true, it raises the question: how <em>much</em> does it amplify the power of the Grandmaster?</p>
<p>One approximation might be to model it as a simple linear amplification.  Let&#8217;s imagine a function, <em>a(h,c)</em>, in which the analytic power (<em>a</em>) is the product of the power of the human (<em>h</em>) and the computing power of the chess engine being used (<em>c</em>).  This gives us the equation:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-simple.png'/>
</div>
<h2>One term to dominate them all: friction-of-interface</h2>
<p>Does this simple approximation hold up?  It does not. The team that won the <a href="http://www.chessbase.com/newsdetail.asp?newsid=2461">PAL/CSS Freestyle Tournament in 2005</a> was composed of two amateur chess players who were able to best a computer-assisted Grandmaster.</p>
<p>How did they accomplish this feat?  It was not through superior compute power.  Instead, they did so by more effectively feeding insights to their three chess engines. They played so well that a large number of people assumed that it was actually Kasparov himself playing:</p>
<blockquote><p>
Many speculated that it might be Garry Kasparov, who was the initiator of this kind of computer assisted chess matches. When we asked him Kasparov confirmed that was not the case. But he reminded us that it doesn&#8217;t really matter. The guiding principle of Freestyle Chess: anything is allowed. &#8220;Even if they were assisted by the devil, that would probably be covered by the rules,&#8221; he joked. &#8220;Only the moves they played count.&#8221;
</p></blockquote>
<p>What does this mean for our simple equation? Well, it looks like it&#8217;s missing a term, one we&#8217;ll call <em>f</em>, that describes the efficiency or <strong>friction</strong> of the interface between human and computer.</p>
<p>Quoting Kasparov again:</p>
<blockquote><p>
<em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em>
</p></blockquote>
<p>The implication being that the equation actually looks like this:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-variable-h.png'>
</div>
<p>So as the friction of the interface goes to zero, the full amplification of the chess engine is brought to bear.  A quick gut-check in the opposite direction agrees: one can imagine the world&#8217;s most powerful chess engine with the world&#8217;s worst interface; spending the time it would take to express commands to this theoretically awful program would actually be worse than playing without it.</p>
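<p>To make the gut-check concrete, here is a small Python sketch. The functional form in it is an assumption made purely for illustration and is not necessarily the equation shown in the figure: it divides the simple product <em>h</em>&middot;<em>c</em> by (1 + <em>f</em>), so that zero friction recovers the full product and large friction drags the result below the human&#8217;s unaided power. The numeric values are made up and sit on arbitrary scales.</p>

```python
def analytic_power(h, c, f=0.0):
    """Hypothetical model: human power h times compute c, damped by friction f.

    ASSUMPTION: the h * c / (1 + f) form is chosen for illustration only.
    With f == 0 it reduces to the simple product h * c.
    """
    if f < 0:
        raise ValueError("friction must be non-negative")
    return h * c / (1.0 + f)


# Made-up values that mirror the three kinds of entrant Kasparov describes.
grandmaster_team = analytic_power(h=10.0, c=5.0, f=4.0)  # strong human, inferior process
amateur_team = analytic_power(h=6.0, c=5.0, f=0.5)       # weaker humans, better process
engine_alone = analytic_power(h=1.0, c=8.0)              # strong machine, minimal human input

# The "world's worst interface" case: the engine now hurts more than it helps.
terrible_interface = analytic_power(h=10.0, c=5.0, f=9.0)

print(amateur_team, grandmaster_team, engine_alone, terrible_interface)
```

<p>With these particular numbers the amateurs come out ahead of the Grandmaster team, which in turn beats the engine alone, and the Grandmaster saddled with the terrible interface scores below an unaided <em>h</em> of 10. Other forms and values would behave differently; the point is only that a friction term can reorder the outcome even when <em>h</em> and <em>c</em> favor the other side.</p>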
<h2>Palantir: a low-friction interface to data</h2>
<p>As analysis problems go, chess resembles <a href="http://en.wikipedia.org/wiki/Spherical_cow">a spherical cow in a vacuum</a>.  Analysis problems in the real world are orders of magnitude messier.</p>
<p>Let&#8217;s reframe the terms of our equation above into a more general approach to analysis:</p>
<ul>
<li><em>H</em> &#8211; this is the power of the analyst.  In chess, the value of this term varies widely between players; in designing real-world data analysis systems, this is more or less a constant (which is why <em>h</em> above becomes <em>H</em> below).  Of course there are differing levels of expertise, training, and raw ability amongst the user population, but when we design systems, it&#8217;s with the average case in mind.</li>
<li><em>c</em> &#8211; computing power. How fast are the machines?  How well do they scale?  How efficiently do they perform the data tasks at hand? Palantir spends significant engineering effort on optimizing the <em>c</em> term, but most of the growth in this term comes from the layers we depend on, built by companies like Intel, Sun, Oracle, etc.</li>
<li><em>f</em> &#8211; friction.  How easy is it to bring <em>c</em> to bear on the problem? Note that when we talk about <em>friction of interface</em>, this is not exclusively referring to user interface.  More generally, friction can be present at any interface between two systems: data-software, software-software, human-software, etc. The <em>f</em> that we consider in this simple model is sum total system friction.</li>
</ul>
<p>So our final formulation is just in terms of <em>c</em> and <em>f</em> (holding <em>H</em> as a constant): </p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-final.png'>
</div>
<p>When we discuss friction in real-world analysis systems, the friction actually exists at multiple levels:</p>
<ol>
<li>Creating an analysis model that will enable answering the questions that need to be explored</li>
<li>Integrating the data into a single coherent view of the problem</li>
<li>Enabling analysis tools to efficiently query and load the data</li>
<li>Exposing APIs that allow developers to develop custom solutions quickly and efficiently for modeling and analysis tasks not covered by general tools</li>
<li>User interface that makes the tools easy, enjoyable, and quick to use</li>
</ol>
<h3>Minimizing <em>f</em>: Haiti Flooding Predictions</h3>
<p>If this is starting to sound very similar to Palantir&#8217;s marketing information, this is no accident. While some of our backend engineers are concerned with things like scaling and speed-of-querying, the overall innovation that we&#8217;re bringing to the field is not simply about faster data processing systems (even if they are) but reducing the friction at every interface inside a complex human-computer symbiotic system.</p>
<p>You want an example that ties it all together?  It starts with a simple question: which of the many displaced-person camps in Haiti are most at risk for flooding as the rainy season approaches?  Easy to ask, but not so simple to answer. </p>
<p>The original introduction to this video: </p>
<blockquote><p>As we enter the beginning of the rainy season in Haiti, one of the biggest problems facing relief organizations today is the spectre of flooding and mudslides destroying Internally Displaced Persons (IDP) Camps. In this video, we integrate data from many sources to determine high risk aid locations.
</p></blockquote>
<p>The data integration for this video took about six hours, using sources of data that had never before been fused.  The analysis itself takes a few minutes and quickly comes to an actionable answer to the original question.</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv"/></object>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Palantir: like an operating system for data analysis</title>
		<link>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/</link>
		<comments>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 03:21:44 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1198</guid>
		<description><![CDATA[If you&#8217;ve taken the time to peruse the Palantir Government analysis blog, you&#8217;ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client. It&#8217;s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide range [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/VisiCalc'><img src='/wp-content/uploads/2009/11/visicalc.png' width='250'/></a>
</div>
<p>If you&#8217;ve taken the time to peruse the Palantir Government <a href='http://www.palantirtech.com/government/analysis-blog'>analysis blog</a>, you&#8217;ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client.  It&#8217;s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide range of datasets.</p>
<p>What enabled this analysis? Aside from the <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">obvious hard work of our UI and analysis tools teams</a>, it&#8217;s the flexibility and power of the Palantir data platform.  More than just a scalable datastore, the Palantir data platforms act as robust and clean abstractions on top of data.</p>
<p>One of the early architecture decisions that we made when building both <a href="http://www.palantirtech.com/government">Palantir Government</a> and <a href="http://www.palantirfinance.com/">Palantir Finance</a> was to separate the respective data platforms from the end-user applications used to actually perform analysis.  More than just following the client-server model, this separation made the data servers in both products into generic intelligence infrastructure for analytic problems, with our clients acting as analysis applications on top of those platforms.</p>
<p>And so, one way to look at our data platform is as an operating system for analytic applications.  In this post we&#8217;ll explore the history of operating systems, understand why they&#8217;re so important, and see how the Palantir data servers have the same potential to revolutionize the writing of analysis software that operating systems had for the writing of general programs for computers.</p>
<p><span id="more-1198"></span></p>
<h2>The OS: abstraction that begat a paradigm</h2>
<p>In the early days of computing, when a programmer wanted to write a program, they had to understand the inner workings of the machine. Writing a program required understanding things like the bus interface of a specific model of hard drive when all that was needed by the program was the clean abstraction of a filesystem. The upshot of this is that much of the time and effort put into a given task was spent writing code to interface with the &#8220;physical&#8221; minutiae of the machine rather than implementing the solution to the problem that the programmer was trying to solve with their software.</p>
<p>This pattern was observed by <a href="http://en.wikipedia.org/wiki/J._C._R._Licklider">J.C.R. Licklider</a> and noted in his influential paper, <i><a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a></i> (emphasis added):</p>
<blockquote><p>
<b>About 85 per cent of my “thinking” time was spent getting into a position to think, to make a decision, to learn something I needed to know. Much more time went into finding or obtaining information than into digesting it.</b> Hours went into the plotting of graphs, and other hours into instructing an assistant how to plot. When the graphs were finished, the relations were obvious at once, but the plotting had to be done in order to make them so.<br />
…<br />
<b>Throughout the period I examined, in short, my “thinking” time was devoted mainly to activities that were essentially clerical or mechanical</b>: searching, calculating, plotting, transforming, determining the logical or dynamic consequences of a set of assumptions or hypotheses, preparing the way for a decision or an insight. <b>Moreover, my choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability.</b>
</p></blockquote>
<p>This description of his time as a researcher was echoed in the work of the early programmers: they spent much of their programming time re-inventing the wheel and writing routines that were doing essentially clerical or mechanistic work related to the functioning of the hardware rather than the core functions of their programs.</p>
<p>The operating system changed all that: suddenly (and by that I mean: with years of hard work, research, and incremental change) that noisy, inconsistent pile of hardware was transformed into a set of clean abstractions. The programmer was finally freed to spend time and energy on the problem they were really trying to solve.</p>
<p>And so we come to the modern era: dealing with the messy details of hardware has been replaced by the clean and robust abstraction of the operating system.</p>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Operating_system'><img src='/wp-content/uploads/2009/11/250px-operating_system_placementsvg.png' width='250'/></a>
</div>
<p>Three important properties of modern operating systems:</p>
<ul>
<li><b>Hard boundaries between OS functions and process functions</b> &#8211; in modern operating systems, this is usually accomplished with system calls.  The process places the inputs to the system call in a known location and then asks the OS to perform some operation, like writing to a file or making a network connection.  The OS may or may not perform the function, based on things like permissions, availability of resources, etc.
<p>The most important feature here is that the process never has direct access to the true resources of the machine; instead, all access to the machine&#8217;s resources is brokered by the OS.</p>
</li>
<li><b>Extensions of the abstraction in every direction</b> &#8211; An OS like Linux is really, at its core, a kernel that does process scheduling and lifecycle, manages memory, and services system calls. Everything else is handled by some sort of driver.  A driver might also be called, more generically, a plugin or extension.  Drivers exist for everything from block devices (like hard drives), network cards, and filesystems to input devices and displays.</li>
<li><b>Designed as a general purpose framework</b> &#8211; the operating system <i>doesn&#8217;t actually do any computing</i>; rather, it&#8217;s a set of services to facilitate processes using the resources of the computer.  To that end, operating systems are not designed with a specific process in mind, but rather to serve a large class of programs, each designed and written to accomplish a different task using a similar set of resources.</li>
</ul>
<h2>Analysis: the modern computing task</h2>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/ENIAC'><img src='http://upload.wikimedia.org/wikipedia/commons/archive/4/4e/20050923152626!Eniac.jpg' width='250'/></a></div>
<p>The first general-purpose electronic computer, <a href="http://en.wikipedia.org/wiki/ENIAC">ENIAC</a>, was conceived to do calculation of ballistics tables for artillery pieces; it was a glorified calculator. Lacking anything even resembling an operating system, it would just run its program. Its compiler? A group of six women who would configure the machine by hand with the program logic.  The input for its first test run, a calculation related to the hydrogen bomb project, was approximately <i>one million punch cards</i>.</p>
<p>Times have changed: 40 or so years of the unrelenting march of Moore&#8217;s Law have given us something like an <b><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/PPTMooresLawai.jpg/596px-PPTMooresLawai.jpg">eight-order-of-magnitude increase</a></b> in the amount of computing power available per unit cost.  Coupled with similar, <a href="http://www.kk.org/thetechnium/archives/2009/07/was_moores_law.php">more recent gains in storage capacity and network bandwidth</a>, this has produced a world awash in data, <a href='http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/'>crying out for analysis.</a></p>
<p>So the situation today is that we now expect to bring these considerable computing resources to bear on larger, more complex problems in the world.  I&#8217;m talking about things like the <a href="http://www.palantirtech.com/government/analysis-blog/traceback">spread of food-borne illnesses</a>, understanding the connection between genes and protein expression, <a href="http://www.palantirtech.com/government/analysis-blog/sinjar">understanding terrorist networks</a>, <a href="http://www.palantirtech.com/government/analysis-blog/uncovering-a-bot-net-exploring-router-data-using-palantir">finding botnets in network traffic logs</a>, and <a href="http://www.palantirtech.com/government/analysis-blog/transparency">exploring influence networks in government</a>.</p>
<p>These problems, while spanning widely disparate areas of analysis, share some common traits:</p>
<h3>The data is spread out</h3>
<p>They are described by multiple data sources. Just to make things more interesting: the data sources don&#8217;t agree on their native representations of the real-world data. And finally, the real-world objects that the data are describing are actually described in multiple data sources, with no single source giving a complete and accurate representation.</p>
<h3>The data schema are not human-conceptual</h3>
<p>Rather than representing the data in some schema that maps easily into how the experts on a given problem think about said problem, the data stores in question tend to model data in whatever way was convenient for the creators of that particular data store. Put another way: people don&#8217;t think in tables, rows, columns, and XML snippets.  These first-class data storage elements don&#8217;t usually map to real-world objects.</p>
<h3>The data is sensitive</h3>
<p>Whether it&#8217;s patient information, <a href="http://www.palantirtech.com/government/analysis-blog/horizon">mortgage data</a>, a law enforcement investigation, or sensitive foreign intelligence, there is often the need for <a href="http://www.palantirtech.com/government/analysis-blog/mls">foolproof access controls on the data</a>.</p>
<h2>Palantir: an operating system-class abstraction for analysis</h2>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'><img src='http://blog.palantirtech.com/wp-content/uploads/2009/01/shot0016.png' width='250'/></div>
<p>A Palantir data server provides the same class of services that an operating system does, but focused on the specific needs of analytic tasks.  Here I&#8217;ll focus on the model used by Palantir Government; Palantir Finance uses a related but significantly different approach to delivering these services.</p>
<p>As you might imagine, however, they both start at a somewhat higher level than punch cards.</p>
<h3>It starts with an ontology</h3>
<p>The Palantir approach to analysis begins with a task-specific ontology: essentially, a human-conceptual description of the real-world problem that&#8217;s being analyzed.</p>
<p>It&#8217;s roughly composed of three pieces:</p>
<ul>
<li>A hierarchical type system of the real-world objects that human experts use to think about this problem. We call these <i>PTObjects</i>, short for &#8220;Palantir Objects&#8221;.</li>
<li>A type system of properties that will contain the data describing these PTObjects.  PTObjects are essentially typed containers for properties. This is where most of the detail of the ontology lies.</li>
<li>A type system of possible relationships between different types of PTObjects.</li>
</ul>
<p>Within the ontology, there are numerous extension points that allow the customization of how data is imported, retrieved, and displayed (following the principle of <i>extending the abstraction in all directions</i>).</p>
<p>The data server takes the ontology as input and is agnostic to its content. This is where the principle of <i>building a general purpose framework</i> comes into play.</p>
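<p>To make the three pieces concrete, here is a minimal sketch in Python. Everything in it (the <code>ObjectType</code> and <code>Ontology</code> classes, the example types and the <code>employed_by</code> relationship) is a hypothetical illustration invented for this post, not the actual Palantir API; it only shows how a small ontology can be handed to generic code that never inspects its content.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """One node in the hierarchical type system of real-world objects."""
    name: str
    parent: "ObjectType | None" = None

    def is_a(self, other):
        # Walk up the hierarchy until we reach `other` or run out of parents.
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass
class Ontology:
    """The three pieces: object types, property types, relationship types."""
    object_types: dict = field(default_factory=dict)
    property_types: dict = field(default_factory=dict)  # property name -> value type
    link_types: dict = field(default_factory=dict)      # link name -> (from type, to type)

    def add_object_type(self, name, parent=None):
        parent_type = self.object_types[parent] if parent else None
        self.object_types[name] = ObjectType(name, parent_type)

    def add_link_type(self, name, source, target):
        # Relationships may only connect object types the ontology knows about.
        if source not in self.object_types or target not in self.object_types:
            raise KeyError("unknown object type in link " + name)
        self.link_types[name] = (source, target)


# A toy, task-specific ontology (hypothetical names); the classes above are agnostic to it.
ontology = Ontology()
ontology.add_object_type("Entity")
ontology.add_object_type("Person", parent="Entity")
ontology.add_object_type("Organization", parent="Entity")
ontology.property_types.update({"name": str, "date_of_birth": str})
ontology.add_link_type("employed_by", "Person", "Organization")
```

<p>Swapping in a different problem domain means building a different <code>Ontology</code> instance; the generic classes stay untouched.</p>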
<h3>The data sources are mapped into the ontology</h3>
<p>This part of the Palantir data server is a pattern that is very similar to an operating system&#8217;s notion of block device drivers. The difference? Instead of low-level storage systems like hard drives, we&#8217;re dealing with complex databases describing the problem at hand.</p>
<p>In an operating system, every block device can read and write blocks of data.  In the Palantir data server, everything becomes a source of PTObjects.</p>
<p>Our data importer plugins, by analogy, fulfill the same role as a block device driver: we build glue code to map the data source&#8217;s schema into the ontology, and connectors to surface the data itself wrapped up in PTObjects.</p>
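<p>As a rough illustration of that driver-like role, here is a sketch of an importer in Python. The <code>CsvImporter</code> class, the column names and the dictionary layout of the emitted objects are all assumptions made up for this example, not the real plugin interface; the point is only that every source, whatever its native schema, ends up yielding objects described in ontology terms.</p>

```python
import csv
import io


class CsvImporter:
    """Hypothetical importer plugin: one data source in, ontology objects out."""

    def __init__(self, source_name, object_type, column_map, text):
        self.source_name = source_name
        self.object_type = object_type
        self.column_map = column_map  # glue: source column -> ontology property name
        self.text = text              # raw CSV content of the source

    def objects(self):
        reader = csv.DictReader(io.StringIO(self.text))
        for record_number, row in enumerate(reader, start=1):
            yield {
                "type": self.object_type,
                "properties": [
                    {
                        "name": self.column_map[column],
                        "value": value,
                        # Remember where each value came from, down to the record.
                        "source": self.source_name,
                        "record": record_number,
                    }
                    for column, value in row.items()
                    # Columns with no place in the ontology are simply not surfaced.
                    if column in self.column_map and value
                ],
            }


# Example source with its own idiosyncratic column names (invented data).
dmv = CsvImporter(
    "dmv",
    "Person",
    {"full_nm": "name", "dob": "date_of_birth"},
    "full_nm,dob,office\nAda Lovelace,1815-12-10,SF\n",
)
people = list(dmv.objects())
```

<p>A second source with different column names would get its own <code>column_map</code> but produce objects of exactly the same shape.</p>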
<h3>The data are composed into real-world objects.</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='/wp-content/uploads/2009/11/pg-object-model.jpg'><img src='/wp-content/uploads/2009/11/pg-object-model.jpg' width='250'/></a>
</div>
<p>Part of this mapping is composing real-world objects into composite PTObjects by resolving PTObjects together.</p>
<p>The operation of resolving is pretty straightforward: we basically union the properties of the two PTObjects into a new PTObject. The end result is a single PTObject that completely represents all the data about something in the real world from all the available data sources.</p>
<p>As we do this composition, we keep track of where each property came from, down to the record level, in each of its original sources.  (Note that most composed PTObjects will have at least one property that comes from two sources.)  Preserving the original identity of every atom of data allows us to later decompose these PTObjects into their constituent parts or, more importantly, censor a client&#8217;s view based on what permissions they have for each of the original data sources.</p>
<p>This is a fundamental operation in our system that doesn&#8217;t have an exact analog in operating systems. It&#8217;s sort of similar to taking multiple filesystems and mounting them inside a virtual filesystem tree, like Unix does.  However, if each data source is like a filesystem, what we&#8217;re doing is essentially composing individual files from their fragments stored on multiple block devices.</p>
<p>Another analogy: at a level below the block device in the OS, this is also sort of similar to what a <a href="http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0">RAID0</a> device does, the difference being that our composition is based on the contents of the data itself rather than some previously applied, content-agnostic, decomposition function.  The other difference is motivation: a RAID0 does it for performance, while Palantir is composing data to make it correspond to the real-world objects it represents.</p>
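<p>The resolve, decompose and censor operations described in this section can be sketched in a few lines of Python. This is a toy model under stated assumptions: an object is represented as nothing more than a set of properties, and the <code>Property</code>, <code>resolve</code>, <code>censor</code> and <code>decompose</code> names are hypothetical, chosen for this example rather than taken from the product.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    value: str
    source: str  # which data source the value came from
    record: int  # which record within that source


def resolve(a, b):
    """Union the properties of two objects; provenance rides along on each one."""
    return frozenset(a) | frozenset(b)


def censor(obj, allowed_sources):
    """Project an object down to the properties a client is permitted to see."""
    return frozenset(p for p in obj if p.source in allowed_sources)


def decompose(obj):
    """Split a resolved object back into one fragment per original source."""
    fragments = {}
    for p in obj:
        fragments.setdefault(p.source, set()).add(p)
    return fragments


# Two sources describing the same real-world person (invented data).
dmv_record = {
    Property("name", "Ada Lovelace", "dmv", 1),
    Property("date_of_birth", "1815-12-10", "dmv", 1),
}
bank_record = {
    Property("name", "Ada Lovelace", "bank", 42),
    Property("account", "12345", "bank", 42),
}

person = resolve(dmv_record, bank_record)
dmv_only_view = censor(person, {"dmv"})
```

<p>Because the source and record are part of each property, the name that appears in both sources is kept twice, once per origin, which is what makes both <code>decompose</code> and <code>censor</code> possible after the fact.</p>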
<h3>The server exposes Palantir &#8220;system calls&#8221;</h3>
<p>The interface that the Palantir data server exposes can be boiled down to two essential operations:</p>
<ul>
<li>The client can download copies of PTObjects from the server.  It may request them by id or perform some sort of search/query to specify a set of PTObjects.  This is roughly analogous to the <b><a href="http://en.wikipedia.org/wiki/Open_%28system_call%29">open()</a></b> and <b><a href="http://comsci.liu.edu/~murali/unix/read.htm">read()</a></b> system calls on Unix.
<p>Note that each client only sees the subset of properties for a given PTObject that it is authorized to see.  This censorship of full PTObjects into projected slices is something done by the server on every load of PTObjects.</p></li>
<li>The client can send new or updated PTObjects to the data server for storage. This is roughly analogous to the <b><a href="http://www.freebsd.org/cgi/man.cgi?query=write&#038;sektion=2&#038;manpath=FreeBSD+7.2-RELEASE">write()</a></b> system call in Unix. It, of course, entails a check as to whether the given client has permission to write to the given PTObject.</li>
</ul>
<p>The server&#8217;s responsibility is the same as the operating system&#8217;s: only let the client do what it has been granted permission to do.  In an operating system, the OS uses hardware features like <a href="http://en.wikipedia.org/wiki/Protected_mode">protected mode</a> to keep lower-privileged processes from accessing machine resources. Palantir uses network calls to achieve the same separation, by placing the client and server on different logical machines.  The effect is the same: the client basically requests (rather than commands) that certain operations be performed by the server.  The server uses its own rules to decide if the access or change is allowed and responds accordingly. And so the principle of <i>hard boundaries</i> is implemented.</p>
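<p>Here is a small Python sketch of those two &#8220;system calls&#8221; and the brokering around them. The <code>DataServer</code> class, its <code>load</code> and <code>store</code> methods and the access-control tables are assumptions invented for illustration; a real deployment puts a network boundary between client and server, which this in-process toy does not model.</p>

```python
class PermissionDenied(Exception):
    """Raised when a client asks for a change it has not been granted."""


class DataServer:
    """Toy broker: clients request operations, the server decides."""

    def __init__(self):
        self._objects = {}    # object id -> list of (name, value, source) tuples
        self._read_acl = {}   # client -> set of sources it may read
        self._write_acl = {}  # client -> set of object ids it may write

    def grant_read(self, client, source):
        self._read_acl.setdefault(client, set()).add(source)

    def grant_write(self, client, object_id):
        self._write_acl.setdefault(client, set()).add(object_id)

    def load(self, client, object_id):
        # Roughly open()/read(): return only the slice this client may see.
        allowed = self._read_acl.get(client, set())
        return [p for p in self._objects.get(object_id, []) if p[2] in allowed]

    def store(self, client, object_id, properties):
        # Roughly write(): check permission before touching stored data.
        if object_id not in self._write_acl.get(client, set()):
            raise PermissionDenied(client + " may not write " + object_id)
        self._objects[object_id] = list(properties)


server = DataServer()
server.grant_write("importer", "person-1")
server.store(
    "importer",
    "person-1",
    [("name", "Ada Lovelace", "dmv"), ("account", "12345", "bank")],
)

server.grant_read("analyst", "dmv")
visible = server.load("analyst", "person-1")
```

<p>The analyst gets back only the property that originated in the <code>dmv</code> source, and any attempt by the analyst to call <code>store</code> raises <code>PermissionDenied</code>: the client asks, the server decides.</p>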
<h3>The clients do the analysis</h3>
<p>When an operating system yields to a process, that&#8217;s the time when the true processing begins.  By the same token, in Palantir, it&#8217;s not until a client connects and starts searching, visualizing, and manipulating PTObjects that analysis actually starts taking place (even if the server is doing a lot of the heavy lifting).</p>
<h2>The wide open future</h2>
<p>So why is this exciting?  I&#8217;m glad you asked!</p>
<h3>It&#8217;s about taking analysis to the next level.</h3>
<p>Let&#8217;s say you&#8217;re someone who wants to write an analytic task. Let me ask you a series of rhetorical questions:</p>
<ul>
<li>Do you want to start with three disparate sources of data or with the data already mapped into a Palantir data server?</li>
<li>Which one is a better use of your time as a programmer?</li>
<li>Which one allows you to not repeat mistakes that other programmers have already made and fixed?</li>
<li>Which one is more like writing a program than an operating system?</li>
</ul>
<p>Operating systems took us to a new level of expressiveness when it came to writing computing processes to run on computing hardware. They inverted that 85/15 ratio that Licklider talked about so that programmers spent more time writing the code that did the thing they were trying to create and less time mucking around with hardware.</p>
<p>More programmer time == better analytic tasks.</p>
<h3>It&#8217;s about making machine learning easier.</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Skynet_%28Terminator%29'><img src='http://images1.wikia.nocookie.net/terminator/images/8/8a/Cyberdyne_logo.jpg' width='250'/></a>
</div>
<p>Now consider machine learning as a field.  Pretty much every machine learning task could benefit from starting with its data in something that looks like a Palantir data server.  I&#8217;ve taken an informal survey of machine learning researchers and they agree: the 85/15 ratio still holds for machine learning.</p>
<p>Simply put: <b>most of the time and effort in machine learning is spent getting the data into a form that you can actually apply an algorithm to!</b> Now imagine if the starting point for that was a Palantir data server &mdash; now the machine learning implementer has a world of expressiveness open to them and time and energy are spent on the task at hand instead of the overhead of messing with the data.</p>
<p>Now, we don&#8217;t think that we&#8217;re building Skynet.  Quite the contrary: we believe that platforms like the one we&#8217;ve built will allow machine learning techniques to be put in the hands of experts to augment their ability to look at the world and come to conclusions about complex real-world problems by asking questions of the data we&#8217;ve collected. It&#8217;s about <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">Intelligence Augmentation</a>, which can use machine learning techniques and algorithms to build better tools, not creating <a href="http://en.wikipedia.org/wiki/Strong_AI">Strong AI</a>.</p>
<h3>It&#8217;s about creating new markets</h3>
<p>Let&#8217;s go back to the well of operating systems and look back at the history of MS-DOS: one of the first &#8220;killer&#8221; applications on MS-DOS was <a href="http://en.wikipedia.org/wiki/VisiCalc">VisiCalc</a> (that screenshot at the top of this post), a text-based spreadsheet that had debuted on the Apple II.  As you know, VisiCalc was not the end of the story but just the introduction. MS-DOS, which evolved into Windows, gave application writers an (arguably) clean abstraction on top of commodity hardware on which to build the applications that users actually wanted. Today, we have things like web browsers, multimedia authoring software, virtual machines, and IDEs built on top of what is, essentially, the same set of abstractions that VisiCalc was built on.</p>
<p>However, the most important thing to note is that VisiCalc is credited with creating the market for commercial operating systems &#8212; businesses needed VisiCalc so they paid Microsoft for MS-DOS (and IBM for a PC).  Without VisiCalc, there was no market for MS-DOS (most people, unsurprisingly, didn&#8217;t want to buy a <a href="http://en.wikipedia.org/wiki/Microsoft_BASIC">BASIC interpreter</a>).</p>
<p>We&#8217;re in the business of selling software and we agree with our customers: the Palantir approach has tremendous value.  We&#8217;ve just started tapping the potential of this market.  Think about what Oracle looked like in 1979, think what Microsoft looked like in 1980 &mdash; that&#8217;s Palantir in 2009.</p>
<h3>It&#8217;s about the start of the analysis age</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Information_Age'><img src='http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Internet_map_1024.jpg/600px-Internet_map_1024.jpg' width='250'/></a>
</div>
<p>It can be argued that the operating system is the innovation that ushered in the &#8220;<a href="http://en.wikipedia.org/wiki/Information_Age">information age</a>&#8221;.  Without the operating system, there is no software explosion, which allows computing technology to actually be used on data in the world.</p>
<p>We think that we&#8217;re on the cusp of the analysis age, as imagined by <a href="http://en.wikipedia.org/wiki/Vernor_Vinge">Vernor Vinge</a> in <u><a href="http://books.google.com/books?id=SrLwPdBJodMC&#038;dq=rainbow%27s+end&#038;printsec=frontcover&#038;source=bn&#038;hl=en&#038;ei=TdX0Sui9HsTh8AbGlc3zCQ&#038;sa=X&#038;oi=book_result&#038;ct=result&#038;resnum=5&#038;ved=0CBsQ6AEwBA#v=onepage&#038;q=&#038;f=false">Rainbows End</a></u>.  It was something foreseen by Licklider in 1960, albeit with a timeline that was off by at least a few decades:</p>
<blockquote><p>
“…it seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone. There will nevertheless be a fairly long interim during which the main intellectual advances will be made by men and computers working together in intimate association. A multidisciplinary study group, examining future research and development problems of the Air Force, estimated that it would be 1980 before developments in artificial intelligence make it possible for machines alone to do much thinking or problem solving of military significance. That would leave, say, five years to develop man-computer symbiosis and 15 years to use it. The 15 may be 10 or 500, but those years should be intellectually the most creative and exciting in the history of mankind.”
</p></blockquote>
<p>It&#8217;s a golden age of analysis and we&#8217;re just getting started: we&#8217;ve got a lot of work to do, so if this sort of thing excites you, please <a href='http://www.palantirtech.com/careers/culture'>come and join us.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Palantir Technologies Demo Reel: screenshots, round 3</title>
		<link>http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/</link>
		<comments>http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 18:53:27 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[fun]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[swing]]></category>
		<category><![CDATA[user interface]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1185</guid>
		<description><![CDATA[Software engineering is a craft that blends science and art. This fact is easy to overlook as the artistic aspects are often eclipsed by discussions of the science and technology behind what we do. This is not one of those times: the art in software engineering is most evident when building compelling visual interfaces, something [...]]]></description>
			<content:encoded><![CDATA[<p>Software engineering is a craft that blends science and art. This fact is easy to overlook as the artistic aspects are often eclipsed by discussions of the science and technology behind what we do.</p>
<p>This is not one of those times:  the art in software engineering is most evident when building compelling visual interfaces, something Palantir knows a thing or two about. </p>
<p>A demo reel is an industry term in the movie business &mdash; a short reel that acts as a portfolio when applying for jobs, a highlight reel of the author&#8217;s visual career.  We&#8217;re not in the movie business, we&#8217;re in the software business.  We do, however, use moving pictures to tell stories, stories backed by data &mdash; this is our demo reel: two-and-a-half minutes of data visualization and user interface eye-candy (<i><span style='font-size: 0.9em'>It has pounding music &#8212; you may want to put on headphones or turn down your speakers.</span></i>):</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/palantir_demo_reel_final.mp4" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/palantir_demo_reel_final.mp4" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/palantir_demo_reel_final.mp4"></embed></object>
</div>
<p><i><span style='font-size: 0.9em'>The movie will take a few seconds to load.  It&#8217;s 800&#215;600, so expanding to full-screen is suggested. We&#8217;ve done our best to create a streamable-yet-good-looking video.  The compression artifacts are there, but shouldn&#8217;t be too distracting.  In a real Palantir client, there are no compression artifacts and everything looks even better than it does here.</span></i></p>
<p>The Palantir family of products is much more than just pretty pictures; we have the <a href="http://www.palantirtech.com/government/videos/whitevideos">underlying intelligence infrastructure</a> to make those realtime animations possible and (more importantly) <b><i>meaningful</i></b>.  That said, we sure do think they&#8217;re pretty.</p>
<p>By the way, if you&#8217;re interested in the progression of our interfaces, this is not the first time we&#8217;ve posted eye candy: we posted <a href="http://blog.palantirtech.com/2008/07/04/palantir-screenshots-round-two/">a set of updated screenshots</a> a little over a year ago; think of this as the next installment in the series.</p>
<p>And yes, it&#8217;s really all <a href="http://java.sun.com/docs/books/tutorial/ui/features/index.html">Java Swing</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

