<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; palantirtech</title>
	<atom:link href="http:///category/palantirtech/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>The Coding Interview</title>
		<link>http://blog.palantirtech.com/2011/10/03/the-coding-interview/</link>
		<comments>http://blog.palantirtech.com/2011/10/03/the-coding-interview/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 23:12:07 +0000</pubDate>
		<dc:creator>Allen Chang</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[interviewing]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1925</guid>
		<description><![CDATA[Note: this post is part two of our series on doing your best in interviews. Part one: &#8220;How to Rock an Algorithms Interview&#8221;. Here at Palantir, algorithms are important, but code is our lifeblood. We live and die by the quality of the code we ship. It’s no surprise, then, that coding ability is what [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 10px; margin-bottom: 10px'><img src="/wp-content/uploads/2011/09/einstein_coding_interview.jpg" alt="Einstein Coding Interview Joke Image" title="einstein_coding_interview" width="300"/></div>
<p><span style='font-size: 0.7em'><em>Note: this post is part two of our series on doing your best in interviews.  Part one: <a href="/2011/09/26/how-to-rock-an-algorithms-interview/" title="How to Rock an Algorithms Interview" target="_blank">&#8220;How to Rock an Algorithms Interview&#8221;</a>.</em></span></p>
<p>Here at Palantir, algorithms are important, but code is our lifeblood. We live and die by the quality of the code we ship. It’s no surprise, then, that coding ability is what we stress the most in our interview process. A candidate can get by with mediocre algorithm skills (depending on the role), but no one can skimp on coding.</p>
<p>Suppose you&#8217;re confident in your ability to write great software. Your task in a coding interview (of which there will be several) is to show the interviewers that you in fact do have the programming chops — that you&#8217;re an experienced coder who knows how to write solid, production-quality code.</p>
<p>This is easier said than done. After all, coding in your <a href="http://eclipse.org/">favorite IDE</a> from the comfort of <code>$familiar_place</code> is very different from coding on a whiteboard (on a problem you&#8217;re totally unfamiliar with) in a pressure-filled 45-minute interview. We realize that the interview environment is not the real world, and we adjust our expectations accordingly. Nonetheless, there are a number of things you can do to put your best foot forward during the interview.</p>
<p>First, though, we&#8217;d like to give you a sense for what we look for during a coding interview. Most important is the ability to write clean <strong>and</strong> correct code &mdash; it&#8217;s not enough just to be correct. A lot of people will be interacting with your code once you&#8217;re on the job, so it should be readable, maintainable, and extensible where appropriate. If your solution is clean and correct, and you produced it in a reasonable amount of time without a lot of help, you&#8217;re in good shape. But even if you stumble a bit, there are other ways to demonstrate your ability. As you work, we also watch for debugging ability, problem-solving and analytical skills, creativity, and an understanding of the ecosystem that surrounds production code.</p>
<p>With our evaluation criteria in mind, here are some suggestions we hope will help you perform at your very best.</p>
<p><span id="more-1925"></span></p>
<h2>Before you start coding</h2>
<ul>
<li><strong>Make sure you understand the problem.</strong> Don&#8217;t hesitate to ask questions. Specifically, if any of the problem requirements seem loosely defined or otherwise unclear, ask your interviewer to make things more concrete. There is no penalty for asking for clarifications, and you don&#8217;t want to miss a key requirement or proceed on unfounded assumptions.</li>
<li><strong>Work through simple examples.</strong> This can be useful both before you begin and after you&#8217;ve finished coding. Working through simple examples before coding can give you additional clarity on the nature of the problem — it may help you notice additional cases or patterns in the problem that you would otherwise have missed had you been thinking more abstractly.</li>
<li><strong>Make a plan.</strong> Be wary of jumping into code without thinking about your program&#8217;s high-level structure. You don&#8217;t have to work out every last detail (this can be difficult for more meaty problems), but you should give the matter sufficient thought. Without proper planning, you may be forced to waste your limited time reworking significant parts of your program.</li>
<li><strong>Choose a language.</strong> At Palantir, we don&#8217;t care what languages you know as long as you have a firm grasp on the fundamentals (decomposition, object-oriented design, etc.). That said, you need to be able to communicate with your interviewer, so choose something that both of you can understand. In general, it&#8217;s easier for us if you use Java or C++, but we&#8217;ll try to accommodate other languages. If all else fails, <a href="http://lolcode.com/">devise your own pseudo-code</a>. Just make sure it&#8217;s precise (i.e. not hand-wavy) and internally consistent, and explain your choices as you go.</li>
</ul>
<h2>While you&#8217;re coding</h2>
<ul>
<li><strong>Think out loud.</strong> Explain your thought process to your interviewer as you code. This helps you more fully communicate your solution, and gives your interviewer an opportunity to correct misconceptions or otherwise provide high-level guidance.</li>
<li><strong>Break the problem down and define abstractions.</strong> One crucial skill we look for is the ability to handle complexity by breaking problems into manageable sub-problems. For anything non-trivial, you&#8217;ll want to avoid writing one giant, monolithic function. Feel free to define helper functions, helper classes, and other abstractions to reach a working solution. You can leverage design patterns or other programming idioms as well. Ideally, your solution will be well-factored and as a result easy to read, understand, and prove correct.</li>
<li><strong>Delay the implementation of your helper functions.</strong> (This serves as a corollary to the previous point.) Write out the signature, and make sure you understand the contract your helper will enforce, but don&#8217;t implement it right away. This serves a number of purposes: (1) it shows that you&#8217;re familiar with abstractions (by treating the method as an API); (2) it allows you to maintain momentum towards the overall solution; (3) it results in fewer context-switches for your brain (you can reason about each level of the call stack separately); and (4) your interviewer may grant you the implementation for free, if he or she considers it trivial.</li>
<li><strong>Don&#8217;t get caught up in trivialities.</strong> At Palantir we are much more interested in your general problem solving and coding abilities than your recall of library function names or obscure language syntax. If you can&#8217;t remember exactly how to do something in your chosen language, make something up and just explain to your interviewer that you would look up the specifics in the documentation. Likewise, if you utilize an abstraction or programming idiom which admits a trivial implementation, don&#8217;t be afraid to just write out the interface and omit the implementation so you can concentrate on more important aspects of the problem (e.g., &#8220;I&#8217;m going to use a circular buffer here with the following interface without writing out the full implementation&#8221;).</li>
</ul>
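<p>As a rough illustration of delaying a helper, here is a minimal Java sketch. The problem (counting palindromes in an array of words) and the names <code>PalindromeCounter</code>, <code>countPalindromes</code>, and <code>isPalindrome</code> are made up for this example and are not taken from an actual interview question. The top-level method is written first against the helper&#8217;s stated contract, and the helper body is filled in only afterwards.</p>

```java
final class PalindromeCounter {
    // Step 1: write the top-level logic first, relying only on the
    // helper's contract (its signature plus the comment above it).
    static int countPalindromes(String[] words) {
        int count = 0;
        for (String word : words) {
            if (isPalindrome(word)) {
                count++;
            }
        }
        return count;
    }

    // Step 2: the helper, implemented after the caller is finished.
    // Contract: returns true iff s reads the same forwards and backwards.
    // The empty string counts as a palindrome; s must not be null.
    static boolean isPalindrome(String s) {
        int left = 0;
        int right = s.length() - 1;
        while (left < right) {
            if (s.charAt(left) != s.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return true;
    }
}
```

<p>On a whiteboard you might stop after step 1 and only write the contract comment for <code>isPalindrome</code>, then ask whether the interviewer wants the body spelled out.</p>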
<h2>Once you have a solution</h2>
<ul>
<li><strong>Think about edge cases.</strong> Naturally, you should strive for a solution that&#8217;s correct in all observable aspects. Sometimes there will be a flaw in the core logic of your solution, but more often your only bugs will be in how you handle edge cases. (This is true of real-world engineering as well.) Make sure your solution works on all edge cases you can think of. One way you can search for edge-case bugs is to&#8230;</li>
<li><strong>Step through your code.</strong> One of the best ways to check your work is to simulate how your code executes against a sample input. Take one of your earlier examples and make sure your code produces the right result. Huge caveat here: when mentally simulating how your code behaves, your brain will be tempted to project what it wants to happen rather than what the code actually says will happen. Fight this tendency by being as literal as possible. For example, if you&#8217;re calculating a string index with code like <code>str.length()-suffix.length()</code>, don&#8217;t just assume you know where that index will land; actually do the math and make sure the value is what you were hoping for.</li>
<li><strong>Explain the shortcuts you took.</strong> If you skipped things for reasons of expedience that you would otherwise do in a &#8220;real world&#8221; scenario, please let us know what you did and why. For example, &#8220;If I were writing this for production use, I would check an invariant here.&#8221; Since whiteboard coding is an artificial environment, this gives us a sense for how you&#8217;ll treat code once you&#8217;re actually on the job.</li>
</ul>
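<p>To make the string-index example concrete, here is a small Java sketch built around <code>str.length() - suffix.length()</code>. The class <code>SuffixIndex</code>, its method names, and the convention of returning -1 when the suffix is longer than the string are assumptions made for illustration. Stepping through it literally with <code>str = "interview"</code> (length 9) and <code>suffix = "view"</code> (length 4) gives a start index of 9 - 4 = 5, and the characters at indices 5 through 8 are indeed <code>v</code>, <code>i</code>, <code>e</code>, <code>w</code>.</p>

```java
final class SuffixIndex {
    // Returns the index in str at which suffix would have to begin,
    // or -1 if suffix is longer than str (an assumed convention).
    static int suffixStart(String str, String suffix) {
        int start = str.length() - suffix.length();
        return start < 0 ? -1 : start;
    }

    // True iff str ends with suffix. The negative-index edge case is
    // checked explicitly before the comparison.
    static boolean hasSuffix(String str, String suffix) {
        int start = suffixStart(str, suffix);
        return start >= 0 && str.startsWith(suffix, start);
    }
}
```

<p>The edge cases worth walking through by hand are a suffix longer than the string (the subtraction goes negative) and an empty suffix (the start index equals the length of the string).</p>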
<p>As an addendum, here are a few suggestions for books we like about the art of software construction:</p>
<p><em><a href='http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882'>Clean Code: A Handbook of Agile Software Craftsmanship</a></em> &#8211; Robert C. Martin<br />
<em><a href="http://www.cc2e.com/">Code Complete: A Practical Handbook of Software Construction</a></em> &#8211; Steve McConnell<br />
<em><a href="http://cm.bell-labs.com/cm/cs/tpop/">The Practice of Programming</a></em> &#8211; Brian Kernighan, Rob Pike<br />
<em><a href="https://secure.wikimedia.org/wikipedia/en/wiki/Design_Patterns">Design Patterns: Elements of Reusable Object-Oriented Software</a></em> &#8211; Erich Gamma, et al.<br />
<em><a href="http://java.sun.com/docs/books/effective/">Effective Java</a></em> &#8211; Joshua Bloch</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/10/03/the-coding-interview/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Hedgehog Programming Language</title>
		<link>http://blog.palantirtech.com/2011/02/02/hhlang/</link>
		<comments>http://blog.palantirtech.com/2011/02/02/hhlang/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 07:56:49 +0000</pubDate>
		<dc:creator>Kevin Simler</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1759</guid>
		<description><![CDATA[One thing about being a developer on the Palantir Finance product that doesn&#8217;t get nearly enough publicity is the fact that we have our own programming language. I&#8217;m pretty excited about it so let me repeat, with emphasis: we have our own programming language. Yeah, it&#8217;s awesome. All those late hours you spent in the [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 300px; margin-bottom: 15px; margin-left: 15px'><a target='new' href='http://www.pfinance.com/'><img src="http://blog.palantir.com/wp-content/uploads/2010/10/hedgehog.jpg" alt="" title="hedgehog" width="300" height="129" class="alignnone size-medium wp-image-1753" /></a></div>
<p>One thing about being a developer on the Palantir Finance product that doesn&#8217;t get nearly enough publicity is the fact that we have our own programming language.  I&#8217;m pretty excited about it so let me repeat, with emphasis:  <em><strong>we have our own programming language</strong></em>.  Yeah, it&#8217;s awesome.  All those late hours you spent in the lab working on your final project in compilers:  turns out they&#8217;re actually good for something other than getting into grad school.</p>
<p>Building this language ourselves &#8212; as opposed to, say, using an existing language that already just works &#8212; wasn&#8217;t an easy decision.  In fact, it wasn&#8217;t even a single decision.  We wracked our collective brain dozens of times trying to think of a better approach.  But every which way we sliced it, the problems we needed to solve always pointed to building our own language.  I still question this decision sometimes, but on the whole I&#8217;m very happy with how things have turned out.</p>
<p><span id="more-1759"></span></p>
<p>The Palantir Finance programming language &#8212; Hedgehog as we know it &#8212; is an interpreted, statically typed, object-oriented language. With a syntax that&#8217;s based loosely on Java, it mixes roughly Java-style semantics and a few idiosyncrasies that make it a really interesting case study in language design.  It&#8217;s built to be extremely efficient for batch operations on time series, which is the heavy lifting in financial analysis.  It also allows you to dynamically add methods to a class from outside the class itself (conceptually similar to <a href="http://juixe.com/techknow/index.php/2006/06/15/mixins-in-ruby/">Ruby&#8217;s Mixins</a>) &mdash; you define the function and its input type, and when you type the dot operator, your new method is auto-completed alongside all the &#8220;native&#8221; methods.  Hedgehog also has a vast number of effectively global constants: all the stocks, bonds, and other financial instruments that are essential to the user experience, but that make for quite a design challenge.</p>
<p>I&#8217;m not a language guy myself, so instead of continuing to geek out over the core language features, I want to geek out about an emergent property that&#8217;s truly unique to the Hedgehog language.  But first I&#8217;m going to back up and talk about something else that&#8217;s really important to us at Palantir:  user experience. (I&#8217;ll get back to languages, I promise.)</p>
<p>There&#8217;s a UX principle that says your interface should be &#8220;low threshold, high ceiling&#8221;. That is, it should be easy for the user to get started, but also able to do powerful things.  This is actually a corollary of a more general principle:  that your interface should strive for the <strong>optimal learning curve</strong>.  My first CS professor explained this with a set of three diagrams, each representing one of the major OS families.  I don&#8217;t remember exactly how he drew these diagrams at the time, but an updated version of them might look like this:</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/learning_curves.png" alt="" title="learning_curves" class="alignnone size-medium wp-image-1748" /></div>
<p>The x-axis of each curve represents &#8220;wizardry,&#8221; a measure of the user&#8217;s technical sophistication.  The y-axis represents the power of the system &#8212; how much the user can accomplish at a given level of wizardry.</p>
<p>The best of the three curves, my prof argued, was the third curve.  The first learning curve is great for providing incentives to learn.  Each unit of effort spent to increase your wizardry yields an appropriate amount of reward or power.  The drawback is that it&#8217;s hard for new users to do anything useful; its reward threshold is too high.  The middle curve has a lower threshold and is better for novice users, but will frustrate an intermediate user because of the great plateau in the middle.  (This might represent a place where the GUI isn&#8217;t powerful enough for the tasks you want to accomplish but scripting is still too difficult, leaving no way to express your commands.)  The third curve, however, is the best of both worlds:  a low threshold and a smooth trajectory to the top.</p>
<p>Now let&#8217;s apply this back to our topic at hand, programming languages.  Specifically, what does the learning curve look like for learning a first language?  (Once you&#8217;ve learned one, of course, the rest come pretty easily.)</p>
<div style='float: right; width: 469px; margin-bottom: 15px; margin-left: 15px'><img src="http://blog.palantir.com/wp-content/uploads/2010/10/learning_curves_languages.png" alt="" title="learning_curves_languages" class="alignnone size-medium wp-image-1749" /></div>
<p>If your experience of learning to program was anything like mine, the first few projects in your first language were <em>painful</em>.  You could sense the power further up the curve &#8212; it&#8217;s what convinced you to stick with CS &#8212; but simple tasks took a lot more effort than they should have, at the beginning.</p>
<p>Hedgehog on the other hand &#8212; our little homebrew that will someday have its own Wikipedia page &#8212; has the smoothest learning curve I&#8217;ve ever seen in a programming language.  That&#8217;s the emergent property I wanted to talk about, because it&#8217;s a thing of beauty.  You can get started with Hedgehog right away and accomplish quite a bit without even knowing that you&#8217;re &#8220;programming,&#8221; and the slope on the curve stays relatively constant throughout your trajectory.</p>
<p>We didn&#8217;t realize it at the time, but we were probably destined to create a low-threshold, high-ceiling language with a smooth learning curve, due to the nature of our user base.  Financial analysts are impatient, and they still need to perform many kinds of complicated analysis.  They definitely don&#8217;t have the time or inclination to spend a semester learning how to program.  The solution to their problem is Hedgehog.</p>
<p>Allow me to illustrate with one of the earliest things a user might type into the expression bar:</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/ibm.png" alt="" title="ibm" class="alignnone size-medium wp-image-1746" /></div>
<p>And that&#8217;s it.  The user types a ticker symbol and he gets a chart of IBM&#8217;s stock price.  At no point did he have to wonder about variables or types or #includes.  This experience is so <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">frictionless</a> he probably doesn&#8217;t even realize he&#8217;s writing code in a programming language.  He just starts with what he knows, and the system gives him what he wants.</p>
<p>It starts to get interesting as you move further up the curve.  Take this user input:</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/ibm_volume.png" alt="" title="ibm_volume" class="alignnone size-medium wp-image-1747" /></div>
<p>Of course that innocent dot between &#8220;IBM&#8221; and &#8220;volume&#8221; means a method invocation to anyone who&#8217;s familiar with C++ or Java.  But to a new Palantir Finance user it simply means, &#8220;Let me access all the types of data associated with IBM.&#8221;  Conceptually painless.</p>
<p>Or how about this one?</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/histogram.png" alt="" title="histogram" class="alignnone size-medium wp-image-1745" /></div>
<p>The <code>volume/1000</code> expression is an anonymous method acting in the scope of a Stock object; it&#8217;s syntactic sugar for <code>return this.volume()/1000;</code>.  But by allowing the user to strip away all the unnecessary syntax, we make learning the language that much easier.</p>
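<p>For readers who think in Java, the desugaring can be pictured roughly as follows. This is only an analogy: <code>Stock</code>, <code>StockExpression</code>, and <code>SugarDemo</code> are hypothetical stand-ins written for this sketch, not Hedgehog&#8217;s real types or Palantir Finance&#8217;s API. The idea is that the bare expression is treated as a function whose implicit receiver is the Stock it is evaluated against.</p>

```java
// Hypothetical stand-in for a Stock object; not Palantir's API.
final class Stock {
    private final double volume;

    Stock(double volume) {
        this.volume = volume;
    }

    double volume() {
        return volume;
    }
}

// An expression evaluated in the scope of a Stock (assumed shape).
interface StockExpression {
    double evaluate(Stock self);
}

final class SugarDemo {
    // What the user types is just "volume/1000"; the explicit form
    // names the receiver and the method call that the sugar hides.
    static final StockExpression VOLUME_IN_THOUSANDS =
            self -> self.volume() / 1000;
}
```

<p>Evaluating <code>SugarDemo.VOLUME_IN_THOUSANDS</code> against a <code>Stock</code> with a volume of 2,500,000 yields 2500.0, which is what the user meant without ever writing a receiver, a <code>return</code>, or a semicolon.</p>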
<p>I could go on tracing the curve here (I&#8217;ve only scratched the surface), but I hope I&#8217;ve made my point: we coax new users into writing code by making it look as much as possible like performing operations that they already intuitively understand.  This is one of the benefits of creating a domain-specific language &mdash; we got the richness of the domain for free, and all the understanding that comes with it &mdash; and then we went above and beyond the simplification of a traditional <a href="https://secure.wikimedia.org/wikipedia/en/wiki/Domain-specific_language">DSL</a> to really pare down the complexity of the language for novice users.</p>
<p>From simple beginnings like the ones I&#8217;ve shown here, it doesn&#8217;t take our users long at all to cross the threshold to more intermediate-level work, such as chaining function calls together or creating their own methods.  As far as the high ceiling goes, we&#8217;re still working on it, but the language is currently capable of producing not only a <a href="http://en.wikipedia.org/wiki/Quine_(computing)">quine</a>, as one of our candidates showed us (yes, we ended up hiring him), but also code that can generate studies like the one below:</p>
<p><a href="http://blog.palantir.com/wp-content/uploads/2010/10/dashboard.png"><img src="http://blog.palantir.com/wp-content/uploads/2010/10/dashboard.png" alt="" title="dashboard" width="100%" class="alignnone size-medium wp-image-1744" /></a></p>
<p>So Hedgehog has a low threshold and a smooth learning curve, and the ceiling is high enough that our users can do some really serious information processing with it &#8212; tasks that would make their other tools break down and cry.  But there&#8217;s still a lot of interesting work for us to do, especially in pushing the language&#8217;s ceiling higher (developing better interactive debugging; working with large objects efficiently) &mdash; and as always, making it <em>faster</em>.</p>
<p><em>If you&#8217;d like to see the Hedgehog Programming Language in action, you can sign up for an account at <a href='http://joyride.pfinance.com/'>Palantir JoyRide</a>, the <a href="http://www.pfinance.com/">Palantir Finance</a> public demo.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/02/02/hhlang/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Help! Is there a doctor in the network???</title>
		<link>http://blog.palantirtech.com/2010/07/23/help-is-there-a-doctor-in-the-network/</link>
		<comments>http://blog.palantirtech.com/2010/07/23/help-is-there-a-doctor-in-the-network/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 23:33:01 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1427</guid>
		<description><![CDATA[Cyber security is a hot topic, especially in national security circles. The world has witnessed a number of high-profile incidents in the past two years that have been notable for sharing three very important aspects: they were targeted attacks, carried out against specific institutions; they were politically motivated and, though not conclusively, appear to be state-sponsored; they [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 250px; margin-left: 15px; margin-bottom: 15px;'>
<img src='http://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Botnet.svg/500px-Botnet.svg.png' width='250'/>
</div>
<p>Cyber security is a hot topic, especially in national security circles.  The world has witnessed a number of high-profile incidents in the past two years that have been notable for sharing three very important aspects: </p>
<ul>
<li>they were targeted attacks, carried out against specific institutions
</li>
<li>they were politically motivated and, though not conclusively, appear to be state-sponsored
</li>
<li>they used multi-step, multi-vector attacks and managed to evade existing security countermeasures
</li>
</ul>
<p>This deviates from the types of attacks that IT-centric approaches have sought to defend networks against.  Traditional approaches neutralize the perceived threats against a network with a host of countermeasures: firewalls, malware scanners, automated network vulnerability scanning, patch policies, and intrusion detection systems.  The network defenses can learn new tricks when the administrators update the signatures, or, for certain types of data, employ a <a href="http://en.wikipedia.org/wiki/Bayesian_inference">Bayesian inference</a> strategy (<a href="http://www.paulgraham.com/spam.html">as has been employed to fight spam</a>).  This approach does a good job of protecting against untargeted attacks as well as weak targeted attacks.  </p>
<p>Full network defense requires human analysts looking at anomalies at a level above the automated countermeasures.  Check out the rest of this post to take a look at how <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">human-driven, computer-aided analysis is a game changer</a> in cyber security.</p>
<p><span id="more-1427"></span></p>
<h2>A classic doctrine: the immune system</h2>
<p>If you&#8217;ve worked in network security, you&#8217;re undoubtedly familiar with most  (if not all) of the countermeasure systems listed above.  The question we don&#8217;t often ask is: </p>
<blockquote><p>What is the defensive doctrine being employed by this security architecture?</p></blockquote>
<p>Classic network security can be summed up as this philosophy: </p>
<blockquote><p>Become unattractive as a target-of-opportunity to the legions of script kiddies and somewhat more sophisticated opportunists who search for network defenses they can easily breach.  </p></blockquote>
<p>The goal of the IT-based approach is to be a tougher nut to crack than the network next door. Attackers throw themselves against the defenses, find no exploitable vulnerabilities and move on to the next target-of-opportunity. </p>
<p>As the old joke goes: when a tiger attacks your safari group, you don’t have to run faster than the tiger, you just need to run faster than your friends. We might rewrite that today as: <em><a href="http://en.wikipedia.org/wiki/Leet">when the &#8216;l33t h4cker comes a&#8217;knocking in your network neighborhood, just make sure that you&#8217;re less of a n00b than the next guy and you&#8217;ll probably avoid getting pwned too hard</a>.</em></p>
<p>And so we&#8217;re faced with this reality: today&#8217;s state-of-the-art network defense is a patchwork system of automated countermeasures designed to stop dumb, undirected, automated attacks. This architecture is not unique to cyber security &mdash; it has a close analog in biology. </p>
<p>The human immune system produces antibodies that recognize and defend against specific attacks; it learns over time through successful defense of the organism and, more recently, vaccinations. <a href="http://www.nytimes.com/2010/07/13/science/13micro.html?_r=1&#038;pagewanted=all">Millions of bacteria and viruses are foiled every day by immune systems</a>. We can observe this same pattern in cyberspace: hijacked systems tirelessly scour the Internet&#8217;s address space, looking for hapless networks ripe for takeover. <a href='http://blogs.forbes.com/firewall/2010/06/04/just-how-big-is-the-cyber-threat-to-dod/'>The Pentagon is probed something like 250,000 times a day</a>.</p>
<p>It would be insanity to connect a network to the modern Internet without security countermeasures in place to defend against these sorts of attacks.  However, while they are necessary to the task of securing a network, they are certainly not sufficient.</p>
<h2>Targeted attacks: slipping past the immune system</h2>
<div style='text-align: center; float: right; width: 250px; margin-left: 15px; margin-bottom: 15px;'>
<a href='http://www.dpd.cdc.gov/DPDx/HTML/Hookworm.htm'><img src='http://www.dpd.cdc.gov/DPDx/images/ParasiteImages/G-L/Hookworm/Hookworm_LifeCycle.gif' width='250'/><br/><br />
<span style='font-size: 0.8em; text-align: center; font-style: italic'>The Lifecycle of Hookworm</span></a>
</div>
<p>The countermeasures discussed thus far are essential but not infallible and can be bypassed by things like never-before-seen viruses or carefully crafted penetration attempts.  In the biological domain a targeted attack might come in the form of <a href="http://www.ncbi.nlm.nih.gov/pubmed/20208540">HIV</a> (evolved to slip past the immune defenses), a toxin (non-biological, nothing the immune system can do), or a parasite.</p>
<h3>The original crafty adversary</h3>
<p>A parasite can survive and thrive inside its host while <a href="http://jbiol.com/content/8/7/62">evading or suppressing the normal immune response to invaders</a>. Parasites take up comfortable residence inside the body of their host, using it as a source of food and protection; finally, they use the host as a place to reproduce and spread to other individuals in the host species.  Parasites don&#8217;t generally kill or gravely harm their hosts (or at least they don&#8217;t do it quickly), as it&#8217;s in their own self-interest to have the host continue living.</p>
<h3>Targeted parasite networks: GhostNet and the Shadow network</h3>
<p>Cyber analog?  You betcha: <a href='http://www.google.com/corporate/execs.html#vint'>Vint Cerf</a> was quoted just last week, <a href='http://voices.washingtonpost.com/fasterforward/2010/07/vint_cerf_at_palantir_night_li.html'>&#8220;The hackers don&#8217;t want to destroy the network. They want to keep it running, so they can keep making money from it.&#8221;</a></p>
<p><a href="http://citizenlab.org/">The Citizen Lab</a>, a University of Toronto-based non-profit that does in-depth, hands-on, technical research in the cyber security domain, had this to say:</p>
<blockquote><p>Crime and espionage form a dark underworld of cyberspace. Whereas crime is usually the first to seek out new opportunities and methods, espionage usually follows in its wake, borrowing techniques and tradecraft.
</p></blockquote>
<p>That&#8217;s in the foreword from their recent report, &#8220;<a href='http://www.scribd.com/doc/29435784/SHADOWS-IN-THE-CLOUD-Investigating-Cyber-Espionage-2-0'>Shadows in the Cloud: Investigating Cyber Espionage 2.0</a>&#8221;.  The report details their experiences tracking down the size, scope, and tradecraft behind a massive cyber-espionage botnet, dubbed <a href="http://en.wikipedia.org/wiki/GhostNet">GhostNet</a>:</p>
<blockquote style='text-align: justify;'><p><a href='http://www.scribd.com/doc/13731776/Tracking-GhostNet-Investigating-a-Cyber-Espionage-Network'>Tracking GhostNet: Investigating a Cyber Espionage Network</a> <em>[their first report on this botnet]</em> was the product of a ten-month investigation and analysis focused on allegations of Chinese cyber espionage against the Tibetan community. The research entailed field-based investigations in India, Europe and North America working directly with affected Tibetan organizations, including the Private Office of the Dalai Lama, the Tibetan Government-in-Exile, and several Tibetan NGOs in Europe and North America. The fieldwork generated extensive data that allowed us to examine Tibetan information security practices, as well as capture evidence of malware that had penetrated Tibetan computer systems. We also engaged in extensive data analysis and technical investigation of web-based interfaces to command and control servers that were used by attackers to send instructions to, and receive data from compromised computers.</p>
<p>The report documented a wide ranging network of compromised computers, including at least 1,295 spread across 103 countries, 30 percent of which we identified and determined to be &#8220;high-value&#8221; targets, including ministries of foreign affairs, embassies, international organizations, news organizations, and a computer located at NATO headquarters.</p></blockquote>
<p>These attacks used carefully forged emails, a technique known as <a href='http://www.fbi.gov/page2/april09/spearphishing_040109.html'>spearphishing</a>, to entice their targets into unknowingly infecting themselves with remote control software. The infections allowed the attackers to exfiltrate data from compromised machines and use them as springboards to attack other systems using similar targeted attacks.  <a href="http://www.dpd.cdc.gov/dpdx/html/hookworm.htm">Sound familiar?</a></p>
<h2>A New Doctrine: The Doctor</h2>
<p>Without an immune system, we&#8217;d be dead within hours; our immune system is absolutely necessary but, again,  not sufficient to keep us healthy.  For those things that the immune system can&#8217;t take care of, we use doctors.  Doctors are adaptive adversaries to disease: they can run tests, they can talk to the patient, they can apply insights learned from other patients or diseases.  Most importantly, a doctor has a much more <a href='http://jokesareawesome.com/joke/932/what_s_the_difference_between_god_and_a...'>omniscient view of the patient</a> than the immune system.</p>
<h3>Network Security &#8211; a 10,000 ft. discipline</h3>
<p>Applying this approach to the network enables security responses that can actually counter targeted attacks. A security officer (our network&#8217;s &#8220;doctor&#8221;) starts an investigation with some sort of anomalous event: an unexpected IP address in a log, an alert from an intrusion detection system. </p>
<p>Remember that a runny nose or flagged packet is not an illness or a network compromise, it&#8217;s a symptom.  Symptoms suggest causes, but are only clues. Taken in isolation, they don&#8217;t often offer conclusive information on the health of the patient. In fact, finding the root cause of a symptom (a <a href="http://en.wikipedia.org/wiki/Diagnosis">diagnosis</a>) requires the synthesis of multiple sources of data into a complete, coherent picture of the network or patient.  This often includes things that you can&#8217;t see in the blood or packet stream, like understanding where the patient or user has travelled, what environmental factors might be present in their home, existing allergies, open wireless networks, insecure web apps, drug use, etc.</p>
<h3>Node health vs. network health</h3>
<p>A node gets an infection on your network? <a href="http://www.thinkgeek.com/tshirts-apparel/unisex/frustrations/ad98/">Re-image it, the symptoms go away</a>. In the domain of human medicine, re-imaging of humans when they get sick has not yet gained FDA approval &ndash; something doctors have been uttering oaths about since way before the days of Hippocrates.</p>
<p>But it&#8217;s not the symptoms we&#8217;re after, it&#8217;s the root cause.  Couple that with how easy it is to treat the symptoms via re-imaging, and security officers are more akin to public health officials, more concerned about the overall health of the network than the health of a single node.  This broader concern manifests as an instant list of begged questions about any security anomaly on the network:</p>
<ul>
<li>How did this happen?  Was it a machine (network exploit) or human vector (somebody clicked on something they shouldn&#8217;t have)?</li>
<li>What is the extent of this infection?  Is it limited to a single node?  <a href="http://www.youtube.com/watch?v=EVekNsgUqn4">Why does this small moon appear to have a tractor beam locked on to our ship?</a></li>
<li>Is this part of a larger attack?  What is the true target of this attack? <a href="http://www.youtube.com/watch?v=dddAi8FF3F4">Is this a trap?</a></li>
<li>Do the tracks lead out of or deeper into my network? Was this an inside job?  Did I find an intermediary node in a multi-node penetration?</li>
<li>Who is behind this attack and why do they want in? Can I match this modus operandi with any other known attacks on this or other networks?</li>
<li>How do I prevent this sort of attack in the future?  Do I need to deploy new countermeasures, re-architect parts of the network, and/or teach my people to be more careful?</li>
</ul>
<p>The answer to any of these questions does not appear in a single log file on your network, any more than any single antibody can tell you that the H1N1 flu you&#8217;re now infected with came from the grocery clerk who got it from her boyfriend who, in turn, acquired it on his recent trip to Mexico.</p>
<p>The trees don&#8217;t know how big the forest is.</p>
<h3>Cyber security doctors</h3>
<div style='text-align: center; float: right; width: 250px; margin-left: 15px; margin-bottom: 15px;'>
<a href='http://home.uchicago.edu/~bleakley/graphical_summaries/hookworm_paper_in_graphs.html'><img src='http://upload.wikimedia.org/wikipedia/commons/c/c6/Hookworm_Examination.jpg' width='250'/><br/><br />
<span style='font-size: 0.8em; text-align: center; font-style: italic'>A doctor examines a boy looking for hookworm.</span></a>
</div>
<p>The way to find the answers to these questions is to <em><strong>give a skilled, experienced analyst powerful tools to use against all the data about the attack on all of the systems on your network mashed up with relevant data about the messy meatspace that contains the computers, users, and attackers in question</strong></em>.  </p>
<p>You need firewall logs, intrusion detection system logs, malware detection logs, badge logs to determine who had physical access to the network, travel records of where you expect your employees to be logging into the VPN from, and a dozen other sources of data that are unique to this network.</p>
<p>The data alone are not enough; they must be accessible in a way that enables expedient analysis. In most shops, many of the aforementioned data sources exist, but accessing and cross-referencing them requires a high level of technical fluency in the storage systems themselves, <em>even for a user who has a strong grasp of the story that the data are telling</em>.  Some combination of SQL, shell, grep, awk, sed, perl, and <a href="http://en.wikipedia.org/wiki/Visual_inspection">Mk I Eyeball</a> is used to suss out answers from the data.  It&#8217;s a slow, fragile, error-prone game, and the bar is high to even begin playing.</p>
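<p>To make the cross-referencing concrete, here is a minimal Python sketch that checks VPN logins against travel records. Everything in it is an assumption made for illustration: the record layout, the user names, the country codes, and the <code>HOME_COUNTRY</code> default are invented, and real sources would need parsing and normalization before they could be joined like this.</p>

```python
from datetime import date

HOME_COUNTRY = "US"  # assumed default for anyone with no travel on file

# Hypothetical travel records: (user, day) maps to the country we expect them in.
travel = {
    ("carol", date(2010, 7, 20)): "DE",
}

# Hypothetical VPN log entries, already parsed into dictionaries.
vpn_logins = [
    {"user": "alice", "day": date(2010, 7, 19), "country": "US"},
    {"user": "bob", "day": date(2010, 7, 19), "country": "RO"},
    {"user": "carol", "day": date(2010, 7, 20), "country": "DE"},
]


def unexpected_logins(logins, travel, home=HOME_COUNTRY):
    """Return logins whose source country differs from where travel records place the user."""
    flagged = []
    for login in logins:
        # Fall back to the home country when no trip is on file for that day.
        expected = travel.get((login["user"], login["day"]), home)
        if login["country"] != expected:
            flagged.append(login)
    return flagged


for login in unexpected_logins(vpn_logins, travel):
    print(login["user"], login["day"].isoformat(), login["country"])
```

<p>Only the login for <code>bob</code> is flagged here, because neither the travel records nor the home default explain it. A flag like this is a symptom to investigate, not a diagnosis.</p>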
<p>Whenever computers are recording information about the activities of other computers, the data gets big and it gets big fast. For example, grep is a very powerful and flexible tool, but its linear search through data starts to falter as the data size exceeds about 10 GB on rotational media.</p>
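<p>The difference between a grep-style scan and an indexed lookup can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any particular product: the log format is invented, the IP addresses come from documentation ranges, and the index lives in memory, whereas a real system would have to keep it on disk and update it as logs arrive.</p>

```python
from collections import defaultdict

# Hypothetical log lines in an invented "timestamp source destination action" format.
log_lines = [
    "2010-07-19T10:00:01 10.0.0.5 203.0.113.9 ALLOW",
    "2010-07-19T10:00:02 10.0.0.7 198.51.100.4 ALLOW",
    "2010-07-19T10:00:03 10.0.0.8 203.0.113.9 DENY",
]


def linear_search(lines, token):
    """grep-style scan: every query reads every line."""
    return [n for n, line in enumerate(lines) if token in line.split()]


def build_index(lines):
    """One pass over the data, mapping each token to the line numbers that contain it."""
    index = defaultdict(list)
    for n, line in enumerate(lines):
        for token in set(line.split()):
            index[token].append(n)
    return index


index = build_index(log_lines)

# Both approaches find the same lines; the index answers without rereading the log.
print(linear_search(log_lines, "203.0.113.9"))
print(index.get("203.0.113.9", []))
```

<p>The scan costs time proportional to the size of the log on every query, while the index pays that cost once up front and then answers each lookup from a dictionary. The trade is extra storage and build time for faster repeated questions.</p>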
<p>In order to address and solve these sorts of problems, the world needs a platform with the following properties:</p>
<ul>
<li>Has access to all known information about a given incident</li>
<li>Makes querying and exploring relationships conceptual and interactive</li>
<li>Scales to handle large data sizes</li>
</ul>
<p>It probably looks something like this:</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/cyber/cyber1.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/cyber/cyber1.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/cyber/cyber1.flv"/></object>
<p><a href="http://media.palantirtech.com/government/videos/cyber/cyber1.wmv">Download</a> the WMV (50 MB) | <a href="http://media.palantirtech.com/government/videos/cyber/cyber1.asx">Streaming Windows Media</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/07/23/help-is-there-a-doctor-in-the-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In the spirit of the season: The Family Giving Tree</title>
		<link>http://blog.palantirtech.com/2008/12/18/in-the-spirit-of-the-season-the-family-giving-tree/</link>
		<comments>http://blog.palantirtech.com/2008/12/18/in-the-spirit-of-the-season-the-family-giving-tree/#comments</comments>
		<pubDate>Thu, 18 Dec 2008 21:16:43 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[palantirtech]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=163</guid>
		<description><![CDATA[Palantir is an intense place to work. There are people here around the clock (since developers set their own schedules) and folks and equipment arriving and leaving all the time. We&#8217;re a very focused bunch, trying to change the world as fast as we can by creating a whole new class of tools. However, we&#8217;re [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 210px; text-align: right;'><a href=''><img src='http://www.familygivingtree.org/about_us/images/logo.gif' width='200'/></a></div>
<p>Palantir is an intense place to work.  There are people here around the clock (since developers set their own schedules) and folks and equipment arriving and leaving all the time.  We&#8217;re a very focused bunch, trying to change the world as fast as we can by creating a whole new class of tools.</p>
<p>However, we&#8217;re not just people who build software; we&#8217;re sons and daughters, mothers and fathers and citizens of our community.  As we headed into the holidays, Palantir employees decided to give something back: we signed up with a local organization called <a href="http://www.familygivingtree.org/about_us/about_us.html">The Family Giving Tree</a>, a now-national charity that started as an MBA project out of San Jose State University.</p>
<p>The Family Giving Tree is unique in that it allows children to request the presents that they want.  In this way, rather than putting money into a <a href="http://en.wikipedia.org/wiki/Black_box">black box</a> of a charity, you purchase the gift itself and donate that.</p>
<p>The people of Palantir Technologies purchased over 100 gifts, fulfilling the holiday wishes of the children who asked for them, and made cash donations that will buy gifts for at least 40 more.</p>
<div class='postimg' style='text-align: center; margin-bottom: 20px'>
<a href="http://blog.palantirtech.com/wp-content/uploads/2008/12/giving-tree-presents.jpg"><img src="http://blog.palantirtech.com/wp-content/uploads/2008/12/giving-tree-presents.jpg" alt="" title="giving-tree-presents" width="640" /></a><br />
<a href="http://blog.palantirtech.com/wp-content/uploads/2008/12/giving-tree-presents.jpg"><i>The pile of presents collected by Palantir employees for The Family Giving Tree.</i></a>
</div>
<p>Happy Holidays, everyone!  We&#8217;ll be back next year with more technical articles and information about Palantir.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/12/18/in-the-spirit-of-the-season-the-family-giving-tree/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Palantir in the wild: Palantir Government Conference</title>
		<link>http://blog.palantirtech.com/2008/10/13/palantir-government-conference/</link>
		<comments>http://blog.palantirtech.com/2008/10/13/palantir-government-conference/#comments</comments>
		<pubDate>Mon, 13 Oct 2008 22:48:56 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=130</guid>
		<description><![CDATA[On Oct. 9th, Palantir hosted our quarterly Government Conference in the DC area. The idea was to bring together customers of Palantir Government from across the defense and intelligence community to create a forum for them to: Talk candidly about their experiences using Palantir Discuss the many different domains they apply our technology against, everything [...]]]></description>
			<content:encoded><![CDATA[<div style="background: black none repeat scroll 0% 0%; float: right; width: 300px; height: 186px; margin-left: 10px; margin-bottom: 10px;"><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="300" height="186" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://blog.palantirtech.com/wp-content/uploads/2008/10/gov-conf-banner-300x186.swf" /><embed id="banner" type="application/x-shockwave-flash" width="300" height="186" src="http://blog.palantirtech.com/wp-content/uploads/2008/10/gov-conf-banner-300x186.swf" bgcolor="#000000" quality="high"></embed></object></div>
<p>On Oct. 9th, Palantir hosted our quarterly Government Conference in the DC area.  The idea was to bring together customers of <a href="http://www.palantirtech.com/videos">Palantir Government</a> from across the defense and <a href="http://en.wikipedia.org/wiki/Intelligence_community">intelligence community</a> to create a forum for them to:</p>
<ul>
<li>Talk candidly about their experiences using Palantir</li>
<li>Discuss the many different domains they apply our technology against, everything from <a href="http://www.palantirtech.com/cyber">cyber defense</a> to <a href="http://www.palantirtech.com/fullct">counter-terrorism</a> to counter-proliferation</li>
<li>Share experiences deploying our large <a href="http://blog.palantirtech.com/2008/10/07/deploying-a-distributed-system/">distributed systems</a></li>
<li>Learn about and see what new features and capabilities are in the pipeline for our next quarterly release</li>
</ul>
<p>Most of the conference time is allocated to our government customers to present information on how they are using Palantir to provide deep mission impact.  While this is only the second conference we have held using this open, customer-focused forum, nearly 200 people attended.</p>
<p>The speakers included:</p>
<ul>
<li>Lt. Col. Robert “Pic” Piccerillo (ret), from the Counter IED Operations Integration Center (COIC)</li>
<li>David Arsenault, Assistant Department Head at <a href="http://www.mitre.org/">MITRE</a></li>
<li>Mike Jennings, an intelligence analyst from the <a href="http://www.fbi.gov/">FBI</a></li>
</ul>
<p>In addition to presentations/demonstrations from our customers, there were several presentations of new functionality and demos by us—including demonstrations of our:</p>
<ul>
<li><a href="http://www.palantirtech.com/platform">Application platform</a>, which allows customers to easily extend Palantir’s frontend by writing applications and helpers that embed in our platform framework</li>
<li>New geospatial capabilities, including geosearch, geotagging, and other integrated workflows not seen elsewhere</li>
<li>PalantirWeb—the new Palantir thin client/web frontend for expanded organizational integration</li>
</ul>
<p>We also had a very special presentation from Jeff Carr, author of the <a href="http://intelfusion.net/wordpress/">IntelFusion blog</a>.  Jeff launched <a href="http://www.palantirtech.com/greygoose">Project Grey Goose</a>, an <a href="http://en.wikipedia.org/wiki/Open_source_intelligence">open-source intelligence</a> effort to analyze the actors and nature of the <a href="http://blog.wired.com/defense/2008/08/georgia-under-o.html">cyber war launched against Georgia</a> that paralleled the Russian invasion.  Jeff presented some very compelling analytic tradecraft used in Project Grey Goose, along with some preliminary results.  The iteration 1 report comes out next week!</p>
<div style="margin: 4px; display: inline; clear: none; text-align: center; width: 200px;"><a href="http://blog.palantirtech.com/wp-content/uploads/2008/10/img_9009.jpg"><img title="Customer presentation on Palantir" src="http://blog.palantirtech.com/wp-content/uploads/2008/10/img_9009.jpg" alt="Customer presentation on Palantir" width="200" /></a></div>
<div style="margin: 4px; display: inline; clear: none; text-align: center; width: 200px;"><a href="http://blog.palantirtech.com/wp-content/uploads/2008/10/img_9000.jpg"><img title="Palantir Government Conference" src="http://blog.palantirtech.com/wp-content/uploads/2008/10/img_9000.jpg" alt="Palantir Government Conference" width="200" /></a></div>
<div style="margin: 4px; display: inline; clear: none; text-align: center; width: 200px;"><a href="http://blog.palantirtech.com/wp-content/uploads/2008/10/img_9004.jpg"><img title="Palantir Government Conference" src="http://blog.palantirtech.com/wp-content/uploads/2008/10/img_9004.jpg" alt="Palantir Government Conference" width="200" /></a></div>
<p>All in all, the conference went extremely well:  it was gratifying for the Palantir team to see some of the innovative uses of the product. When your users are surprising and delighting you with the depth and quality of analysis they&#8217;re presenting back to you, you know you&#8217;re building and selling the right platform to truly change the way that people relate to data.</p>
<p>We&#8217;re witnessing the end of the data age and the first sparks of the age of analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/10/13/palantir-government-conference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Palantir: so what is it you guys do?</title>
		<link>http://blog.palantirtech.com/2007/12/04/what-do-we-do/</link>
		<comments>http://blog.palantirtech.com/2007/12/04/what-do-we-do/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 08:01:18 +0000</pubDate>
		<dc:creator>Kevin Simler</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/12/04/what-do-we-do/</guid>
		<description><![CDATA[I often ask candidates if they&#8217;re familiar with what we do at Palantir. Most people think they are. &#8220;Oh, you&#8217;re that data viz. company,&#8221; or, worse, &#8220;You guys do data mining, right?&#8221; At least they&#8217;ve heard of us and at least they&#8217;re on the right track, but I cringe anyway. We aren&#8217;t just a &#8220;data [...]]]></description>
			<content:encoded><![CDATA[<p>I often ask candidates if they&#8217;re familiar with what we do at Palantir.  Most people think they are.  &#8220;Oh, you&#8217;re that data viz. company,&#8221; or, worse, &#8220;You guys do data mining, right?&#8221;  At least they&#8217;ve heard of us and at least they&#8217;re on the right track, but I cringe anyway.  We aren&#8217;t just a &#8220;data visualization&#8221; company and we don&#8217;t do &#8220;data mining.&#8221;  It&#8217;s almost impossible to convey the scope and complexity of what we do in a few short minutes&#8212;or to do so without taking the conversation to an eye-glazing level of abstraction.</p>
<p>The following is my attempt at describing what we do at a high level without oversimplifying.  I hope that after reading this a candidate will &#8216;get&#8217; what we&#8217;re about, or at least understand enough not to apply tiny labels to our expansive vision.</p>
<p><span id="more-82"></span></p>
<h2>The problem: implementing analysis</h2>
<p>At Palantir we specialize in <strong>analysis</strong>.</p>
<p>Yes, that&#8217;s painfully abstract, and I&#8217;ll get to it in a second.</p>
<p>In real-world terms, we are building a <strong>software platform</strong> that enables people to take whatever data is relevant to them and understand it more easily and thoroughly than ever before, using concepts that they already understand.  And we are applying this vision, at first, to solving problems in the finance sector and the government intelligence community.</p>
<p>The first important thing to note is that we don&#8217;t actually do the analysis ourselves.  We don&#8217;t devise winning trading strategies and we don&#8217;t catch terrorists.  We write software that enables other people to pull off these feats.  These people, experts in their respective fields, are called <em>analysts.</em></p>
<p>So what exactly do analysts do?  What is analysis?</p>
<blockquote><p>Analysis is everything necessary to extract <strong>insight</strong> from <strong>information</strong>.</p></blockquote>
<p>Let&#8217;s break that down a bit.</p>
<p>Information is easy:  It&#8217;s data.  It lives in a relational database or as files indexed on a hard drive, and you can easily run queries against it.  It comes in two forms, structured and unstructured.  And there is <em>a lot</em> of it in the modern world &#8211; too much, actually, for current tools to make sense of.</p>
<p>Insight is trickier.  Insight is something only a person can generate, and understanding this is critical for any organization that wants to do analysis right.  Thus the challenge of data analysis is how to bring vast amounts of information into productive contact with human intelligence.  In other words, the challenge is how to <em>enable the analyst</em>.</p>
<p>From the analyst&#8217;s perspective there are five essential features of an analysis platform:</p>
<ol>
<li>First, and most important, <em><strong>the analyst should be in control</strong></em>.  In other words, the primary way of interacting with an analysis tool should be <em>human-driven queries</em>.  While automated approaches can complement a human-driven approach, there simply is no substitute for human intelligence.  Unless you put a person behind the wheel, the system can never be flexible or creative enough to uncover truly original insight.  Artificial Intelligence just isn&#8217;t there yet.</li>
<li>Ability to <em><strong>summarize large data sets</strong></em>.  Some of this is what has traditionally been called data mining:  the largely automated approach&#8212;using machine learning or other statistical techniques&#8212;of processing lots of data at once and extracting nuggets that capture something interesting about the data.  Unlike Palantir, traditional approaches have focused almost exclusively on this aspect of analysis.</li>
<li>Ability to <em><strong>visualize large data sets</strong></em>.  Here the analyst wants interesting and informative ways of viewing data graphically, to make it easier for him to digest.  The analyst wants more than just a summary of the data; he wants a nuanced view of what&#8217;s going on <em>inside</em> these data sets:  What&#8217;s the overall shape of the distribution?  What are the outliers?  What are important structures within the data?</li>
<li>Ability to <em><strong>iterate rapidly</strong></em>.  This means enabling the analyst to ask a question, get the answer, and then quickly ask either a variant on the initial question or a follow-up question that depends on the answer to the initial question.  This rapid, iterative process allows the analyst to quickly test out hypotheses and develop theories about what&#8217;s going on in the data, and by extension to discover what&#8217;s going on in the world.</li>
<li>Ability to <em><strong>collaborate with other analysts</strong></em>.  Getting a handle on a terabyte of data, especially when it comprises multiple data types, is definitely more than a one-person job.  Any organization that&#8217;s serious about understanding the world needs a team of analysts that can work together as more than the sum of its parts.  This requires the ability for one analyst to effortlessly share the results of his analysis with his colleagues.</li>
</ol>
<h2>The Palantir approach</h2>
<p>That&#8217;s what analysis looks like to the analyst, or rather what it should look like in an ideal world.  (Current tools fall far short of this vision.)  So what do <em>we</em> do at Palantir in order to make analysis this smooth and easy?</p>
<p>You could say that we help summarize large data sets, in the sense that we have to provide the analyst with a rich library of techniques and algorithms.  You could also say that we do visualization, in the sense that we have to provide the analyst with a set of interesting and informative ways of visualizing their data.  We do both of these things, and we have to be creative and solve hard problems in order to add value in these areas.  But we do a lot more than that.</p>
<p>Probably the most central hard problem that we address in trying to enable the analyst is <strong>data modeling</strong>, the process of figuring out what data types are relevant to a domain, defining what they represent in the world, and deciding how to represent them in the system.  At Palantir we make sure our data model (ontology) is both flexible and dynamic, and that it mirrors the concepts people naturally use when reasoning about the domain.  This is no small challenge, but we&#8217;re already making it a reality.  In finance our basic data types include financial instruments, dates, portfolios, indices, and strategies&#8212;the same things that financial researchers think about, talk about, and reason with.  In the intelligence product our basic data types include people, places, and events (all with associated properties), which is exactly the way we all represent the world in our minds.</p>
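<p>As a rough illustration of what a flexible, dynamic data model can mean in code, here is a small Python sketch. It is not Palantir&#8217;s implementation; the class names (<code>ObjectType</code>, <code>Entity</code>, <code>Ontology</code>) and the example types and properties are hypothetical, chosen only to show object types whose properties can be extended at runtime.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """A concept in the domain, such as "person", with the property names it allows."""
    name: str
    properties: set = field(default_factory=set)


@dataclass
class Entity:
    """One instance of an ObjectType with concrete property values."""
    kind: ObjectType
    values: dict = field(default_factory=dict)

    def set_property(self, prop, value):
        # Reject properties the type does not currently define.
        if prop not in self.kind.properties:
            raise KeyError(f"{self.kind.name} has no property {prop!r}")
        self.values[prop] = value


class Ontology:
    """Registry of object types; types and properties can be added at runtime."""

    def __init__(self):
        self.types = {}

    def define(self, name, *properties):
        # Create the type on first use, then merge in any new property names.
        obj_type = self.types.setdefault(name, ObjectType(name))
        obj_type.properties.update(properties)
        return obj_type

    def create(self, type_name, **values):
        entity = Entity(self.types[type_name])
        for prop, value in values.items():
            entity.set_property(prop, value)
        return entity


ontology = Ontology()
ontology.define("person", "name")
ontology.define("event", "description", "date")

meeting = ontology.create("event", description="conference", date="2008-10-09")
ontology.define("event", "location")  # extend the type after entities already exist
meeting.set_property("location", "Washington, DC area")
print(meeting.values)
```

<p>Because each entity holds a reference to its type rather than a copy, extending the type later immediately applies to entities that already exist. That is one simple way to get the &#8220;dynamic&#8221; half of the requirement; others are certainly possible.</p>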
<p>Data modeling, data summarization, and data visualization are the core disciplines for approaching large data sets.  Human-driven queries, rapid iteration, and collaboration are multipliers, taking the power unlocked by the core disciplines to the next level.  When these pieces are brought together in a coherent system, the result is an analysis platform both very generic and very powerful.</p>
<p>This is what we mean when we say that we&#8217;re changing the way people approach data.  Welcome to the future of analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/12/04/what-do-we-do/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

