<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; problemspace &#8211; finance</title>
	<atom:link href="http://blog.palantirtech.com/category/problemspace-finance/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.palantirtech.com</link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Fri, 23 Jul 2010 23:33:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A rigorous friction model for human-computer symbiosis</title>
		<link>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/</link>
		<comments>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 03:18:52 +0000</pubDate>
		<dc:creator>Asher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1344</guid>
		<description><![CDATA[


This is a response to Ari&#8217;s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed, and I wondered whether some further refinements were possible&#8230; let&#8217;s take a look:
We are attempting to understand the total analytic capability for a given task a of a human-computer team. [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center; float: right; margin-left: 15px; margin-right: 15px'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt="" width="300"/>
</div>
<p>This is a response to <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">Ari&#8217;s awesome post on human-computer symbiosis</a>. Ari and I were chatting about the equation he developed, and I wondered whether some further refinements were possible&#8230; let&#8217;s take a look:</p>
<p>We are attempting to understand the total analytic capability for a given task <strong><em>a</em></strong> of a human-computer team. Analytic capability in this case probably means:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq1.png" alt="eq1"/>(1)
</div>
<p>Where <strong><em>A</em></strong> is the answer to the analytic problem in question and <strong><em>t<sub>A</sub></em></strong> is the time needed to arrive at the answer based on the inputs available. In the case of chess, <strong><em>A</em></strong> could be the optimum next move given all previous information and <strong><em>t<sub>A</sub></em></strong> would be how long it takes to decide on this move.</p>
<p>Read on for a look at how this generalizes in human-computer symbiotic systems.<br />
<span id="more-1344"></span></p>
<p>In the case of the human-computer team, we know that <strong><em>a</em></strong> is going to be a function of both the human&#8217;s analytical capability <strong><em>h</em></strong> and the computer&#8217;s analytical capability <strong><em>c</em></strong> (where both <strong><em>h</em></strong> and <strong><em>c</em></strong> have units of answers/time). In the limit case we know that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq2.png" alt="eq2"/>(2)
</div>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq3.png" alt="eq3"/>(3)
</div>
<p>Or in plain English: if there is no human present, the total analytic capability is simply the analytic capability of the computer, and if there is no computer present, it is simply the analytic capability of the human. So the naïve solution would be that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq4.png" alt="eq4"/>(4)
</div>
<p>(4) clearly meets the limiting cases described in (2) and (3). Kasparov noticed a mixing function where the ability of the human and computer to work together becomes the dominant term &mdash; we might call this the mixing capability for the given task or <strong><em>m</em></strong>. Including this phenomenon, the total analytic capability (4) would be re-defined as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq5.png" alt="eq5"/>(5)
</div>
<p>where <strong><em>m</em></strong> has the property that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq6.png" alt="eq6"/>(6)
</div>
<p>Thus maintaining the limits expressed in (2) and (3) and adhering to the observation that if there is no human or computer component then there will be no mixing advantage. A naïve solution to this constraint would be simple linear mixing:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq7.png" alt="eq7"/>  (7)
</div>
<p>where <strong><em>M</em></strong> (units of time per answer) is the mixing efficiency and will be primarily based on the type of task being solved: some analytical tasks lend themselves to a combined process more than others. For example, multiplying 20-digit numbers does not really benefit from the intuition of a human, so the ability of a human and computer to perform this task is merely the sum of their individual abilities.</p>
<p>What Kasparov noticed is that the mixing was primarily based on the quality of the process rather than the analytical power of either the human or computer separately. This seems to imply that we must somehow account for the fact that the quality of the human-computer interface is responsible for the quality of the mixing. This can be modeled as a unitless friction of interaction <strong><em>f<sub>i</sub></em></strong> that impedes the ability of the human and computer to work together. </p>
<p>Equation (7) can thus be re-written as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq8.png" alt="eq8"/>(8)
</div>
<p>In this case, the maximum value for the mixing capability is realized when the friction of interaction goes to zero. This mixing capability is the same as the equation Ari developed (less the coefficient which is necessary to maintain consistent units throughout).</p>
<p>We can now re-write our analytic capability in (5) as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq9.png" alt="eq9"/>(9)
</div>
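<p>The equations in this post are images, so as a rough sketch, assume (9) takes the form <em>a = h + c + M&middot;h&middot;c / f<sub>i</sub></em>. That form is inferred from the limits in (2) and (3) and from the units described above; the function name and the sample values are illustrative assumptions, not part of the original model.</p>

```python
def analytic_capability(h, c, f_i, M=1.0):
    """Assumed form of equation (9): a = h + c + M*h*c / f_i.

    h, c : human and computer capability (answers per unit time)
    f_i  : unitless friction of interaction, must be positive
    M    : mixing efficiency (time per answer), task dependent
    """
    if f_i <= 0:
        raise ValueError("f_i must be positive")
    mixing = M * h * c / f_i  # mixing capability m, assumed form of (8)
    return h + c + mixing


# Limiting cases (2) and (3): with no human or no computer,
# the mixing term vanishes and only the remaining party contributes.
assert analytic_capability(0.0, 5.0, 0.1) == 5.0
assert analytic_capability(3.0, 0.0, 0.1) == 3.0

# Hypothetical values: hold h and c fixed and lower the friction.
for f_i in (10.0, 1.0, 0.1):
    print(f_i, analytic_capability(2.0, 5.0, f_i))
```

<p>Under this assumed form, the additive part <em>h + c</em> stays fixed while the mixing term grows without bound as the friction shrinks, which is the behaviour the model is meant to capture.</p>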
<p>Below, see a plot of this function over a range of values for <strong><em>h</em></strong>, <strong><em>c</em></strong> and <strong><em>f<sub>i</sub></em></strong>:</p>
<div style='text-align: center; margin: auto; margin-bottom: 1em;'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt=""/>
</div>
<p>As can clearly be seen from this functional plot (note the vertical scale), the effect of interface friction dominates over the other terms whenever both the human and computer can make important contributions to the task at hand. The conclusion can be drawn that the most effective way to solve analytical problems is to minimize the friction of the human-computer interface; or to put it another way: optimal analytical systems are those that are built specifically to maximize the ability of the human to leverage the ability of the computer.</p>
<p>I am certain there is still the possibility for further refinement, for example:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq10a.png" alt="eq10a"/>(10)
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Friction in Human-Computer Symbiosis: Kasparov on Chess</title>
		<link>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/</link>
		<comments>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:32:06 +0000</pubDate>
		<dc:creator>Ari</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1302</guid>
		<description><![CDATA[


As we build our platforms and applications following a human-computer symbiosis approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.
One of the areas that we&#8217;re interested in is the overall friction of analysis systems.  The systems that we build are [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px;'>
<img src='/wp-content/uploads/2010/03/fools-mate.gif'/>
</div>
<p>As we build our <a href="http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/">platforms</a> and <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">applications</a> following a <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">human-computer symbiosis</a> approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.</p>
<p>One of the areas that we&#8217;re interested in is the overall friction of analysis systems.  The systems that we build run on commodity hardware: we&#8217;re not building faster computers, and yet we can deliver orders-of-magnitude better performance on analysis tasks than existing solutions.  How do we do this?  By building software in such a way that it reduces the friction experienced at the boundaries between the computing power, the analyst, and the source data.</p>
<h2>Chess as analysis laboratory</h2>
<p>Chess is, at its heart, a predictive venture.  The player attempts to anticipate their opponent&#8217;s moves, planning their own moves accordingly, with the straightforward goal of finding a sequence of piece moves that force checkmate. </p>
<p>This game is, in its ideal form, analysis. (The moves made are the logical extension of the analysis.)  The data are clean, the problem is well-defined and everyone plays by the same rules.  There are even <a href="http://en.wikipedia.org/wiki/Elo_rating_system">well-defined metrics for ranking chess players by skill</a> &mdash; a better chess player is a better chess-game analyst.  </p>
<p>When it comes to evaluating analysis systems, this is about as good as it gets for designing controlled experiments that compare their relative strengths.</p>
<p><a href="http://en.wikipedia.org/wiki/Garry_Kasparov">Garry Kasparov</a>, widely considered to be the greatest chess player of all time,  recently wrote <a href="http://www.nybooks.com/articles/23592">a review of Diego Rasskin Gutman&#8217;s book</a>, <a href="http://www.amazon.com/Chess-Metaphors-Artificial-Intelligence-Human/dp/026218267X"><u>Chess Metaphors: Artificial Intelligence and the Human Mind</u>.</a></p>
<p>The review is excellent and covers a lot of ground.  However, one particular anecdote stood out as a very interesting example of human-computer symbiosis (emphasis added):</p>
<blockquote><p>In 2005, the online chess-playing site Playchess.com hosted what it called a &#8220;freestyle&#8221; chess tournament in which anyone could compete in teams with other players or computers. Normally, &#8220;anti-cheating&#8221; algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less &#8220;intelligent&#8221; than the playing programs they detect.)</p>
<p>Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.</p>
<p>The surprise came at the conclusion of the event. <em>The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time.</em> Their skill at manipulating and &#8220;coaching&#8221; their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. <em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em></p></blockquote>
<p>After the jump, we look at this finding in a more generalized way and map it onto the Palantir approach.<br />
<span id="more-1302"></span></p>
<h2>The cyborg Grandmaster: a fearsome opponent</h2>
<p>The tournament Kasparov recalls was a showcase of chess talent, human-computer symbiosis, and raw computing power.  Among those entered  in the tournament were a purpose-made chess machine (similar to <a href="http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>) named <a href="http://en.wikipedia.org/wiki/Hydra_(chess)">Hydra</a> and a team of <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmasters</a> assisted by computer programs.</p>
<p>One losing participant had this to say about the computer-aided Grandmasters:</p>
<blockquote><p>
Secondly, I have learned that a <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmaster</a> armed with a chess engine is a killer combination against a plain Engine. Engines see everything via brute force, Grandmasters use their intuition and are able to see &#8220;obvious&#8221; moves at once. So the two of them together are a mighty force.
</p></blockquote>
<p>This is just as Licklider predicted 50 years ago &#8212; quoting <a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a> (if I could put it better, I would):</p>
<blockquote><p>
Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions&#8230; In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.</p>
<p>&#8230;</p>
<p>In addition, the computer will serve as a statistical-inference, decision-theory, or game-theory machine to make elementary evaluations of suggested courses of action whenever there is enough basis to support a formal statistical analysis. Finally, it will do as much diagnosis, pattern-matching, and relevance-recognizing as it profitably can, but it will accept a clearly secondary status in those areas.
</p></blockquote>
<p>So in classic intelligence amplification fashion, having computer programs that can quickly evaluate a move&#8217;s likelihood of success can <em>amplify the power of the Grandmaster</em>.</p>
<p>While empirically true, it does raise the question: how <em>much</em> does it amplify the power of the Grandmaster?</p>
<p>One approximation might be a simple product, that is, a linear amplification.  Let&#8217;s imagine a function, <em>a(h,c)</em>, in which the analytic power (<em>a</em>) is the product of the power of the human (<em>h</em>) and the computing power of the chess engine being used (<em>c</em>).  This gives us the equation:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-simple.png'/>
</div>
<h2>One term to dominate them all: friction-of-interface</h2>
<p>Does this simple approximation hold up?  It does not. The team that won the <a href="http://www.chessbase.com/newsdetail.asp?newsid=2461">PAL/CSS Freestyle Tournament in 2005</a> was composed of two amateur chess players that were able to best a computer-assisted Grandmaster.</p>
<p>How did they accomplish this feat?  It was not through superior compute power.  Instead, they did so by more effectively feeding insights to their three chess engines. They played so well that a large number of people assumed that it was actually Kasparov himself playing:</p>
<blockquote><p>
Many speculated that it might be Garry Kasparov, who was the initiator of this kind of computer assisted chess matches. When we asked him Kasparov confirmed that was not the case. But he reminded us that it doesn&#8217;t really matter. The guiding principle of Freestyle Chess: anything is allowed. &#8220;Even if they were assisted by the devil, that would probably be covered by the rules,&#8221; he joked. &#8220;Only the moves they played count.&#8221;
</p></blockquote>
<p>What does this mean for our simple equation? Well, it looks like it&#8217;s missing a term, one we&#8217;ll call <em>f</em>, that describes the efficiency or <strong>friction</strong> of the interface between human and computer.</p>
<p>Quoting Kasparov again:</p>
<blockquote><p>
<em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em>
</p></blockquote>
<p>The implication being that the equation actually looks like this:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-variable-h.png'>
</div>
<p>So as the friction of the interface goes to zero, the full amplification of the chess engine is brought to bear.  A quick gut-check in the opposite direction agrees: one can imagine the world&#8217;s most powerful chess engine with the world&#8217;s worst interface; spending the time it would take to express commands to this theoretically awful program would actually be worse than playing without it.</p>
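<p>As a quick illustration, assume the equation above has the form <em>a = h&middot;c / f</em>. The form is inferred from the surrounding text, and every number below is made up purely for the sake of the example; none of it is measured data from the tournament.</p>

```python
def team_power(h, c, f):
    """Assumed form of the equation above: a = h * c / f.

    h : power of the human, c : computing power, f : interface friction.
    All three are treated here as unitless, illustrative quantities.
    """
    if f <= 0:
        raise ValueError("friction must be positive")
    return h * c / f


# Hypothetical teams (values invented for illustration only).
amateurs = team_power(h=1.0, c=3.0, f=0.5)     # weak humans, good process
grandmaster = team_power(h=3.0, c=1.5, f=3.0)  # strong human, poor process

print("amateurs:", amateurs)
print("grandmaster:", grandmaster)
print("amateurs ahead:", amateurs > grandmaster)
```

<p>With these invented inputs, the low-friction team comes out ahead even though its human term is smaller, which mirrors Kasparov&#8217;s observation about process.</p>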
<h2>Palantir: a low-friction interface to data</h2>
<p>As analysis problems go, chess resembles <a href="http://en.wikipedia.org/wiki/Spherical_cow">a spherical cow in a vacuum</a>.  Analysis problems in the real world are orders of magnitude messier.</p>
<p>Let&#8217;s reframe the terms of our equation above into a more general approach to analysis:</p>
<ul>
<li><em>H</em> &#8211; this is the power of the analyst.  In chess, the value of this term varies widely between players; in designing real-world data analysis systems, this is more or less a constant (which is why <em>h</em> above becomes <em>H</em> below).  Of course there are differing levels of expertise, training, and raw ability amongst the user population, but when we design systems, it&#8217;s with the average case in mind.</li>
<li><em>c</em> &#8211; computing power. How fast are the machines?  How well do they scale?  How efficiently do they perform the data tasks at hand? Palantir spends significant engineering effort on optimizing the <em>c</em> term, but most of the growth in this term comes from the layers we depend on, built by companies like Intel, Sun, Oracle, etc.</li>
<li><em>f</em> &#8211; friction.  How easy is it to bring <em>c</em> to bear on the problem? Note that when we talk about <em>friction of interface</em>, this is not exclusively referring to user interface.  More generally, friction can be present at any interface between two systems: data-software, software-software, human-software, etc. The <em>f</em> that we consider in this simple model is the sum total of friction across the system.</li>
</ul>
<p>So our final formulation is just in terms of <em>c</em> and <em>f</em> (holding <em>H</em> as a constant): </p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-final.png'>
</div>
<p>When we discuss friction in real-world analysis systems, the friction actually exists at multiple levels:</p>
<ol>
<li>Creating an analysis model that will enable answering the questions that need to be explored</li>
<li>Integrating the data into a single coherent view of the problem</li>
<li>Enabling analysis tools to efficiently query and load the data</li>
<li>Exposing APIs that allow developers to develop custom solutions quickly and efficiently for modeling and analysis tasks not covered by general tools</li>
<li>User interface that makes the tools easy, enjoyable, and quick to use</li>
</ol>
<h3>Minimizing <em>f</em>: Haiti Flooding Predictions</h3>
<p>If this is starting to sound very similar to Palantir&#8217;s marketing information, this is no accident. While some of our backend engineers are concerned with things like scaling and speed-of-querying, the overall innovation that we&#8217;re bringing to the field is not simply about faster data processing systems (even if they are) but reducing the friction at every interface inside a complex human-computer symbiotic system.</p>
<p>You want an example that ties it all together?  It starts with a simple question: which of the many displaced-person camps in Haiti are most at risk for flooding as the rainy season approaches?  Easy to ask, but not so simple to answer. </p>
<p>The original introduction to this video: </p>
<blockquote><p>As we enter the beginning of the rainy season in Haiti, one of the biggest problems facing relief organizations today is the spectre of flooding and mudslides destroying Internally Displaced Persons (IDP) Camps. In this video, we integrate data from many sources to determine high risk aid locations.
</p></blockquote>
<p>The data integration for this video took about six hours, using sources of data that had never before been fused.  The analysis itself takes a few minutes and quickly comes to an actionable answer to the original question.</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv"/></object>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Palantir: like an operating system for data analysis</title>
		<link>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/</link>
		<comments>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 03:21:44 +0000</pubDate>
		<dc:creator>Ari</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1198</guid>
		<description><![CDATA[


If you&#8217;ve taken the time to peruse the Palantir Government analysis blog, you&#8217;ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client.  It&#8217;s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/VisiCalc'><img src='/wp-content/uploads/2009/11/visicalc.png' width='250'/></a>
</div>
<p>If you&#8217;ve taken the time to peruse the Palantir Government <a href='http://www.palantirtech.com/government/analysis-blog'>analysis blog</a>, you&#8217;ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client.  It&#8217;s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide range of datasets.</p>
<p>What enabled this analysis? Aside from the <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">obvious hard work of our UI and analysis tools teams</a>, it&#8217;s the flexibility and power of the Palantir data platform.  More than just a scalable datastore, the Palantir data platforms act as robust and clean abstractions on top of data.</p>
<p>One of the early architecture decisions that we made when building both <a href="http://www.palantirtech.com/government">Palantir Government</a> and <a href="http://www.palantirfinance.com/">Palantir Finance</a> was to separate the respective data platforms from the end-user applications used to actually perform analysis.  More than just following the client-server model, this separation made the data servers in both products into generic intelligence infrastructure for analytic problems, with our clients acting as analysis applications on top of those platforms.</p>
<p>And so, one way to look at our data platform is as an operating system for analytic applications.  In this post we&#8217;ll explore the history of operating systems, understand why they&#8217;re so important and see how the Palantir data servers deliver the same potential to revolutionize the writing of analysis software that operating systems did to the writing of general programs for computers.</p>
<p><span id="more-1198"></span></p>
<h2>The OS: abstraction that begat a paradigm</h2>
<p>In the early days of computing, when a programmer wanted to write a program, they had to understand the inner workings of the machine. Writing a program required understanding things like the bus interface of a specific model of hard drive when all that was needed by the program was the clean abstraction of a filesystem. The upshot of this is that much of the time and effort put into a given task was spent writing code to interface with the &#8220;physical&#8221; minutiae of the machine rather than implementing the solution to the problem that the programmer was trying to solve with their software.</p>
<p>This pattern was observed by <a href="http://en.wikipedia.org/wiki/J._C._R._Licklider">J.C.R. Licklider</a> and noted in his influential paper, <i><a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a></i> (emphasis added):</p>
<blockquote><p>
<b>About 85 per cent of my “thinking” time was spent getting into a position to think, to make a decision, to learn something I needed to know. Much more time went into finding or obtaining information than into digesting it.</b> Hours went into the plotting of graphs, and other hours into instructing an assistant how to plot. When the graphs were finished, the relations were obvious at once, but the plotting had to be done in order to make them so.<br />
…<br />
<b>Throughout the period I examined, in short, my “thinking” time was devoted mainly to activities that were essentially clerical or mechanical</b>: searching, calculating, plotting, transforming, determining the logical or dynamic consequences of a set of assumptions or hypotheses, preparing the way for a decision or an insight. <b>Moreover, my choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability.</b>
</p></blockquote>
<p>This description of his time as a researcher was echoed in the work of the early programmers: they spent much of their programming time re-inventing the wheel and writing routines that did essentially clerical or mechanistic work related to the functioning of the hardware rather than the core functions of their programs.</p>
<p>The operating system changed all that: suddenly (and by that I mean: with years of hard work, research, and incremental change) that noisy, inconsistent pile of hardware was transformed into a set of clean abstractions. The programmer was finally freed to spend time and energy on the problem they were really trying to solve.</p>
<p>And so we come to the modern era: dealing with the messy details of hardware has been replaced by the clean and robust abstraction of the operating system.</p>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Operating_system'><img src='/wp-content/uploads/2009/11/250px-operating_system_placementsvg.png' width='250'/></a>
</div>
<p>Three important properties of modern operating systems:</p>
<ul>
<li><b>Hard boundaries between OS functions and process functions</b> &#8211; in modern operating systems, this is usually accomplished with system calls.  The process places the inputs to the system call in a known location and then asks the OS to perform some operation, like writing to a file or making a network connection.  The OS may or may not perform the function, based on things like permissions, availability of resources, etc.
<p>The most important feature here is that the process never has direct access to the true resources of the machine; instead, all access to the machine&#8217;s resources is brokered by the OS.</p>
</li>
<li><b>Extensions of the abstraction in every direction</b> &#8211; An OS like Linux is really, at its core, a kernel that does process scheduling and lifecycle, manages memory, and services system calls. Everything else is handled by some sort of driver.  A driver might also be called, more generically, a plugin or extension.  Drivers exist for everything from block devices (like hard drives), network cards, and filesystems to input devices and displays.</li>
<li><b>Designed as a general purpose framework</b> &#8211; the operating system <i>doesn&#8217;t actually do any computing</i>; rather, it&#8217;s a set of services to facilitate processes using the resources of the computer.  To that end, they&#8217;re not designed with a specific process in mind, but rather to serve a large class of programs, each designed and written to accomplish a different task using a similar set of resources.</li>
</ul>
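<p>To make the first property concrete, here is a minimal sketch in Python of a process asking the OS to do work on its behalf. The helper name <i>write_via_os</i> and the file names are made up for illustration; <i>os.open</i>, <i>os.write</i>, and <i>os.close</i> are thin wrappers over the corresponding system calls, and the OS is free to refuse the request.</p>

```python
import os
import tempfile


def write_via_os(path, data):
    """Ask the OS to write bytes to a file; the OS may refuse the request."""
    try:
        # The process only asks; the kernel decides based on permissions,
        # whether the path exists, available resources, and so on.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    except OSError as err:
        return "refused: " + type(err).__name__
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)
    return "wrote %d bytes" % written


workdir = tempfile.mkdtemp()
ok = write_via_os(os.path.join(workdir, "notes.txt"), b"hello")
# The parent directory does not exist, so the OS declines the request.
denied = write_via_os(os.path.join(workdir, "missing", "notes.txt"), b"hello")
```

<p>In both calls the program never touches the disk itself; it only learns whether the OS agreed to carry out the operation.</p>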
<h2>Analysis: the modern computing task</h2>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/ENIAC'><img src='http://upload.wikimedia.org/wikipedia/commons/archive/4/4e/20050923152626!Eniac.jpg' width='250'/></a></div>
<p>The first computer, <a href="http://en.wikipedia.org/wiki/ENIAC">ENIAC</a>, was conceived to do calculation of ballistics tables for artillery pieces &mdash; it was a glorified calculator. Lacking anything even resembling an operating system, it would just run its program. Its compiler? A group of six women who would configure the machine by hand with the program logic.  The input for its first test run, a calculation related to the hydrogen bomb project, was approximately <i>one million punch cards</i>.</p>
<p>Times have changed: 40 or so years of the unrelenting march of Moore&#8217;s Law in computing power has given us something like an <b><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/PPTMooresLawai.jpg/596px-PPTMooresLawai.jpg">eight-order-of-magnitude increase</a></b> in the amount of computing power available per unit cost.  Coupled with similar, <a href="http://www.kk.org/thetechnium/archives/2009/07/was_moores_law.php">more recent gains in storage capacity and network bandwidth</a>, this has produced a world awash in data, <a href='http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/'>crying out for analysis.</a></p>
<p>So the situation today is that we now expect to bring these considerable computing resources to bear on larger, more complex problems in the world.  I&#8217;m talking about things like the <a href="http://www.palantirtech.com/government/analysis-blog/traceback">spread of food-borne illnesses</a>, understanding the connection between genes and protein expression, <a href="http://www.palantirtech.com/government/analysis-blog/sinjar">understanding terrorist networks</a>, <a href="http://www.palantirtech.com/government/analysis-blog/uncovering-a-bot-net-exploring-router-data-using-palantir">finding botnets in network traffic logs</a>, and <a href="http://www.palantirtech.com/government/analysis-blog/transparency">exploring influence networks in government</a>.</p>
<p>These problems, while spanning widely disparate areas of analysis, share some common traits:</p>
<h3>The data is spread out</h3>
<p>They are described by multiple data sources. Just to make things more interesting: the data sources don&#8217;t agree on their native representations of the real-world data. And finally, the real-world objects that the data are describing are actually described in multiple data sources, with no single source giving a complete and accurate representation.</p>
<h3>The data schemas are not human-conceptual</h3>
<p>Rather than representing the data in some schema that maps easily into how the experts on a given problem think about said problem, the data stores in question tend to model data in whatever way was convenient for the creators of that particular data store. Put another way: people don&#8217;t think in tables, rows, columns, and XML snippets.  These first-class data storage elements don&#8217;t usually map to real-world objects.</p>
<h3>The data is sensitive</h3>
<p>Whether it&#8217;s patient information, <a href="http://www.palantirtech.com/government/analysis-blog/horizon">mortgage data</a>, a law enforcement investigation, or sensitive foreign intelligence, there is often the need for <a href="http://www.palantirtech.com/government/analysis-blog/mls">foolproof access controls on the data</a>.</p>
<h2>Palantir: an operating system-class abstraction for analysis</h2>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'><img src='http://blog.palantirtech.com/wp-content/uploads/2009/01/shot0016.png' width='250'/></div>
<p>A Palantir data server provides a class of services similar to those of an operating system, but focused on the specific needs of analytic tasks.  Here I&#8217;ll focus on the model used by Palantir Government; Palantir Finance delivers these services with a related but significantly different approach.</p>
<p>As you might imagine, however, they both start at a somewhat higher level than punch cards.</p>
<h3>It starts with an ontology</h3>
<p>The Palantir approach to analysis begins with a task-specific ontology: essentially, a human-conceptual description of the real-world problem that&#8217;s being analyzed.</p>
<p>It&#8217;s roughly composed of three pieces:</p>
<ul>
<li>A hierarchical type system of the real-world objects that human experts use to think about this problem. We call these <i>PTObjects</i>, short for &#8220;Palantir Objects&#8221;.</li>
<li>A type system of properties that will contain the data describing these PTObjects.  PTObjects are essentially typed containers for properties. This is where most of the detail of the ontology lies.</li>
<li>A type system of possible relationships between different types of PTObjects.</li>
</ul>
<p>Within the ontology, there are numerous extension points that allow the customization of how data is imported, retrieved, and displayed (following the principle of <i>extending the abstraction in all directions</i>).</p>
<p>The data server takes the ontology as input and is agnostic to its content. This is where the principle of <i>building a general purpose framework</i> comes into play.</p>
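<p>As a rough illustration of the three pieces, here is a minimal sketch in Python. Every class, field, and type name below is hypothetical and chosen for the example; it is not the actual Palantir ontology format.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ObjectType:
    name: str
    parent: Optional[str] = None  # hierarchical: a type may extend another


@dataclass(frozen=True)
class PropertyType:
    name: str
    value_kind: str  # for example "string" or "date"


@dataclass(frozen=True)
class RelationshipType:
    name: str
    source: str  # object type the relationship starts from
    target: str  # object type the relationship points to


@dataclass
class Ontology:
    objects: Dict[str, ObjectType] = field(default_factory=dict)
    properties: Dict[str, PropertyType] = field(default_factory=dict)
    relationships: Dict[str, RelationshipType] = field(default_factory=dict)

    def is_a(self, type_name, ancestor):
        """Walk up the object type hierarchy looking for `ancestor`."""
        current = type_name
        while current is not None:
            if current == ancestor:
                return True
            current = self.objects[current].parent
        return False


ontology = Ontology()
for obj in (ObjectType("Entity"),
            ObjectType("Person", parent="Entity"),
            ObjectType("Organization", parent="Entity")):
    ontology.objects[obj.name] = obj
ontology.properties["name"] = PropertyType("name", "string")
ontology.relationships["employed_by"] = RelationshipType(
    "employed_by", source="Person", target="Organization")
```

<p>The point of the sketch is that the ontology is plain data: a generic server can be handed any such description without knowing what a Person or an Organization is.</p>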
<h3>The data sources are mapped into the ontology</h3>
<p>This part of the Palantir data server is a pattern that is very similar to an operating system&#8217;s notion of block device drivers. The difference? Instead of low-level storage systems like hard drives, we&#8217;re dealing with complex databases describing the problem at hand.</p>
<p>In an operating system, every block device can read and write blocks of data.  In the Palantir data server, everything becomes a source of PTObjects.</p>
<p>Our data importer plugins, by analogy, fulfill the same role as block device drivers:<br />
we build glue code to map the data source&#8217;s schema into the ontology, plus the connectors that surface the data itself wrapped up in PTObjects.</p>
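<p>A minimal sketch of the importer idea, assuming a CSV source and a plain dictionary standing in for a PTObject. The class name <i>CsvImporter</i>, the column names, and the dictionary layout are all invented for this example.</p>

```python
import csv
import io


class CsvImporter:
    """Hypothetical importer: maps one source's columns onto ontology properties."""

    def __init__(self, source_name, text, object_type, column_map):
        self.source_name = source_name
        self.text = text
        self.object_type = object_type
        self.column_map = column_map  # source column -> ontology property name

    def objects(self):
        """Yield one object-like dict per source record, tagged with its origin."""
        reader = csv.DictReader(io.StringIO(self.text))
        for record_number, row in enumerate(reader, start=1):
            yield {
                "type": self.object_type,
                "source": self.source_name,
                "record": record_number,
                "properties": {
                    prop: row[col]
                    for col, prop in self.column_map.items()
                    if row[col]  # skip values the source left empty
                },
            }


raw = "full_nm,dob\nAda Lovelace,1815-12-10\nAlan Turing,\n"
importer = CsvImporter("registry", raw, "Person",
                       {"full_nm": "name", "dob": "birth_date"})
people = list(importer.objects())
```

<p>Whatever the source looks like, everything downstream of the importer only ever sees objects described in ontology terms.</p>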
<h3>The data are composed into real-world objects.</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='/wp-content/uploads/2009/11/pg-object-model.jpg'><img src='/wp-content/uploads/2009/11/pg-object-model.jpg' width='250'/></a>
</div>
<p>Part of this mapping is composing real-world objects into composite PTObjects by resolving PTObjects together.</p>
<p>The operation of resolving is pretty straightforward: we basically union the properties of the two PTObjects into a new PTObject. The end result is a single PTObject that completely represents all the data about something in the real-world from all the available data sources.</p>
<p>As we do this composition, we keep track of where each property came from, down to the record level, in each of its original sources.  (Note that most composed PTObjects will usually have at least one property that comes from two sources.)  Preserving the original identity of every atom of data allows us to later decompose these PTObjects into their constituent parts or, more importantly, censor a client&#8217;s view based on what permissions they have for each of the original data sources.</p>
<p>This is a fundamental operation in our system that doesn&#8217;t have an exact analog in operating systems; it&#8217;s sort of similar to taking multiple filesystems and mounting them inside a virtual filesystem tree, like Unix does.  However, if each data source is like a filesystem, what we&#8217;re doing is essentially composing individual files from their fragments stored on multiple block devices.</p>
<p>Another analogy: at a level below the block device in the OS, this is also sort of similar to what a <a href="http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0">RAID0</a> device does, the difference being that our composition is based on the contents of the data itself rather than some previously applied, content-agnostic, decomposition function.  The other difference being motivation: a RAID0 does it for performance, while Palantir is composing data to make it correspond to the real-world objects it represents.</p>
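<p>The resolve, decompose, and censor operations described in this section can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: a PTObject is modeled as a bare list of properties, and the names <i>Prop</i>, <i>resolve</i>, <i>censor</i>, and <i>decompose</i>, as well as the sample data, are made up.</p>

```python
from collections import namedtuple

# Each property remembers the source and the record it came from.
Prop = namedtuple("Prop", ["name", "value", "source", "record"])


def resolve(a, b):
    """Union two property lists, dropping exact duplicates, keeping provenance."""
    merged = []
    for prop in list(a) + list(b):
        if prop not in merged:
            merged.append(prop)
    return merged


def censor(props, allowed_sources):
    """Project a composed object down to the sources a client may see."""
    return [p for p in props if p.source in allowed_sources]


def decompose(props):
    """Split a composed object back into its per-source parts."""
    by_source = {}
    for p in props:
        by_source.setdefault(p.source, []).append(p)
    return by_source


dmv = [Prop("name", "J. Smith", "dmv", 17),
       Prop("address", "1 Main St", "dmv", 17)]
clinic = [Prop("name", "John Smith", "clinic", 4),
          Prop("blood_type", "O+", "clinic", 4)]

merged = resolve(dmv, clinic)
dmv_only_view = censor(merged, {"dmv"})
```

<p>Because provenance travels with every property, the same composed object can be shown in full to one client and as a narrower slice to another.</p>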
<h3>The server exposes Palantir &#8220;system calls&#8221;</h3>
<p>The interface that the Palantir data server exposes can be boiled down to two essential operations:</p>
<ul>
<li>The client can download copies of PTObjects from the server.  It may request them by id or perform some sort of search/query to specify a set of PTObjects.  This is roughly analogous to the <b><a href="http://en.wikipedia.org/wiki/Open_%28system_call%29">open()</a></b> and <b><a href="http://comsci.liu.edu/~murali/unix/read.htm">read()</a></b> system calls on Unix.
<p>Note that each client only sees the subset of properties for a given PTObject that it is authorized to see.  The server performs this censorship of full PTObjects into projected slices on every load of PTObjects.</p></li>
<li>The client can send new or updated PTObjects to the data server for storage. This is roughly analogous to the <b><a href="http://www.freebsd.org/cgi/man.cgi?query=write&#038;sektion=2&#038;manpath=FreeBSD+7.2-RELEASE">write()</a></b> system call in Unix. It, of course, entails a check as to whether the given client has permission to write to the given PTObject.</li>
</ul>
<p>The server&#8217;s responsibility is the same as the operating system&#8217;s: only let the client do what it has been granted permission to do.  In an operating system, the OS uses hardware features like <a href="http://en.wikipedia.org/wiki/Protected_mode">protected mode</a> to keep lower-privileged processes from accessing machine resources. Palantir uses network calls to achieve the same separation, by placing the client and server on different logical machines.  The effect is the same: the client basically requests (rather than commands) that certain operations be performed by the server.  The server uses its own rules to decide if the access or change is allowed and responds accordingly. And so the principle of <i>hard boundaries</i> is implemented.</p>
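<p>Here is a toy, in-process sketch of those two &#8220;system calls&#8221; in Python. The class <i>DataServer</i> and its methods <i>grant</i>, <i>load</i>, and <i>store</i> are hypothetical names for this example, and properties are simplified to (name, value, source) tuples; a real deployment would put a network boundary between client and server.</p>

```python
class DataServer:
    """Toy server: clients request loads and stores; the server decides."""

    def __init__(self):
        self._objects = {}  # object id -> list of (name, value, source)
        self._read = {}     # client -> set of sources it may read
        self._write = {}    # client -> set of object ids it may write

    def grant(self, client, read_sources=(), write_ids=()):
        self._read.setdefault(client, set()).update(read_sources)
        self._write.setdefault(client, set()).update(write_ids)

    def load(self, client, object_id):
        """Return only the properties whose source the client may read."""
        allowed = self._read.get(client, set())
        stored = self._objects.get(object_id, [])
        return [p for p in stored if p[2] in allowed]

    def store(self, client, object_id, props):
        """Accept the write only if the client holds write permission."""
        if object_id not in self._write.get(client, set()):
            return False
        self._objects[object_id] = list(props)
        return True


server = DataServer()
server.grant("alice", read_sources={"dmv", "clinic"}, write_ids={"person-1"})
server.grant("bob", read_sources={"dmv"})

alice_wrote = server.store("alice", "person-1",
                           [("name", "J. Smith", "dmv"),
                            ("blood_type", "O+", "clinic")])
bob_wrote = server.store("bob", "person-1", [])
bob_view = server.load("bob", "person-1")
```

<p>The client never holds the full object unless the server decides it may; both calls are requests that the server is free to narrow or reject.</p>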
<h3>The clients do the analysis</h3>
<p>When an operating system yields to a process, that&#8217;s the time when the true processing begins.  By the same token, in Palantir, it&#8217;s not until a client connects and starts searching, visualizing, and manipulating PTObjects that analysis actually starts taking place (even if the server is doing a lot of the heavy lifting).</p>
<h2>The wide open future</h2>
<p>So why is this exciting?  I&#8217;m glad you asked!</p>
<h3>It&#8217;s about taking analysis to the next level.</h3>
<p>Let&#8217;s say you&#8217;re someone who wants to write an analytic task. Let me ask you a series of rhetorical questions:</p>
<ul>
<li>Do you want to start with three disparate sources of data or with the data already mapped into a Palantir data server?</li>
<li>Which one is a better use of your time as a programmer?</li>
<li>Which one allows you to not repeat mistakes that other programmers have already made and fixed?</li>
<li>Which one is more like writing a program than an operating system?</li>
</ul>
<p>Operating systems took us to a new level of expressiveness when it came to writing computing processes to run on computing hardware. They inverted that 85/15 ratio that Licklider talked about so that programmers spent more time writing the code that did the thing they were trying to create and less time mucking around with hardware.</p>
<p>More programmer time == better analytic tasks.</p>
<h3>It&#8217;s about making machine learning easier.</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Skynet_%28Terminator%29'><img src='http://images1.wikia.nocookie.net/terminator/images/8/8a/Cyberdyne_logo.jpg' width='250'/></a>
</div>
<p>Now consider machine learning as a field.  Pretty much every machine learning task could benefit from starting with its data in something that looks like a Palantir data server.  I&#8217;ve taken an informal survey of machine learning researchers and they agree: the 85/15 ratio still holds for machine learning.</p>
<p>Simply put: <b>most of the time and effort in machine learning is spent getting the data into a form that you can actually apply an algorithm to!</b> Now imagine if the starting point for that was a Palantir data server &mdash; now the machine learning implementer has a world of expressiveness open to them and time and energy are spent on the task at hand instead of the overhead of messing with the data.</p>
<p>Now, we don&#8217;t think that we&#8217;re building Skynet.  Quite the contrary: we believe that platforms like the one we&#8217;ve built will allow machine learning techniques to be put in the hands of experts to augment their ability to look at the world and come to conclusions about complex real-world problems by asking questions of the data we&#8217;ve collected. It&#8217;s about <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">Intelligence Augmentation</a>, which can use machine learning techniques and algorithms to build better tools, not creating <a href="http://en.wikipedia.org/wiki/Strong_AI">Strong AI</a>.</p>
<h3>It&#8217;s about creating new markets</h3>
<p>Let&#8217;s go back to the well of operating systems and look back at the history of MS-DOS: the first &#8220;killer&#8221; application on MS-DOS was <a href="http://en.wikipedia.org/wiki/VisiCalc">VisiCalc</a> (that screenshot at the top of this post), a text-based spreadsheet.  As you know, VisiCalc was not the end of the story but just the introduction. MS-DOS, which evolved into Windows, gave application writers an (arguably) clean abstraction on top of commodity hardware on which to build the applications that users actually wanted. Today, we have things like web browsers, multimedia authoring software, virtual machines, and IDEs built on top of what is, essentially, the same set of abstractions that VisiCalc was built on.</p>
<p>However, the most important thing to note is that VisiCalc is credited with creating the market for commercial operating systems &#8212; businesses needed VisiCalc so they paid Microsoft for MS-DOS (and IBM for a PC).  Without VisiCalc, there was no market for MS-DOS (most people, unsurprisingly, didn&#8217;t want to buy a <a href="http://en.wikipedia.org/wiki/Microsoft_BASIC">BASIC interpreter</a>).</p>
<p>We&#8217;re in the business of selling software and we agree with our customers: the Palantir approach has tremendous value.  We&#8217;ve just started tapping the potential of this market.  Think about what Oracle looked like in 1979, think what Microsoft looked like in 1980 &mdash; that&#8217;s Palantir in 2009.</p>
<h3>It&#8217;s about the start of the analysis age</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Information_Age'><img src='http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Internet_map_1024.jpg/600px-Internet_map_1024.jpg' width='250'/></a>
</div>
<p>It can be argued that the operating system is the innovation that ushered in the &#8220;<a href="http://en.wikipedia.org/wiki/Information_Age">information age</a>&#8221;.  Without the operating system, there is no software explosion, which allows computing technology to actually be used on data in the world.</p>
<p>We think that we&#8217;re on the cusp of the analysis age, as imagined by <a href="http://en.wikipedia.org/wiki/Vernor_Vinge">Vernor Vinge</a> in <u><a href="http://books.google.com/books?id=SrLwPdBJodMC&#038;dq=rainbow%27s+end&#038;printsec=frontcover&#038;source=bn&#038;hl=en&#038;ei=TdX0Sui9HsTh8AbGlc3zCQ&#038;sa=X&#038;oi=book_result&#038;ct=result&#038;resnum=5&#038;ved=0CBsQ6AEwBA#v=onepage&#038;q=&#038;f=false">Rainbows End</a></u>.  It was something foreseen by Licklider in 1960, albeit with a timeline that was off by at least a few decades:</p>
<blockquote><p>
“…it seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone. There will nevertheless be a fairly long interim during which the main intellectual advances will be made by men and computers working together in intimate association. A multidisciplinary study group, examining future research and development problems of the Air Force, estimated that it would be 1980 before developments in artificial intelligence make it possible for machines alone to do much thinking or problem solving of military significance. That would leave, say, five years to develop man-computer symbiosis and 15 years to use it. The 15 may be 10 or 500, but those years should be intellectually the most creative and exciting in the history of mankind.”
</p></blockquote>
<p>It&#8217;s a golden age of analysis and we&#8217;re just getting started: we&#8217;ve got a lot of work to do, so if this sort of thing excites you, please <a href='http://www.palantirtech.com/careers/culture'>come and join us.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Model Resolution in Palantir Finance: avoiding N²</title>
		<link>http://blog.palantirtech.com/2009/02/02/model-resolution-in-palantir-finance/</link>
		<comments>http://blog.palantirtech.com/2009/02/02/model-resolution-in-palantir-finance/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 20:00:49 +0000</pubDate>
		<dc:creator>Andy</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[problemspace - finance]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=180</guid>
		<description><![CDATA[
N², with N = 8
One of the big challenges in Palantir Finance comes when integrating data from multiple data providers.  When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers.  Each data provider defines a set of [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; width: 275px; text-align: center;"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/73/Complete_graph_K8.svg/600px-Complete_graph_K8.svg.png" alt="" width="260" /><br />
<em>N</em><sup>2</sup>, with <em>N</em> = 8</div>
<p>One of the big challenges in Palantir Finance comes when integrating data from multiple data providers.  When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers.  Each data provider defines a set of “models” that it supports.  These models can be things like equities, currencies, futures, options, or even new types that the providers themselves define.</p>
<p>The major challenge occurs when multiple providers define models that represent the same real-world entity.  Provider <em>A</em> might know about Google, have basic open/high/low/close data for the stock, and know its ticker, country, and <a href="http://en.wikipedia.org/wiki/International_Securities_Identifying_Number">ISIN</a>.  Provider <em>B</em> might also provide a Google model, have balance sheet data, and know its country, exchange, and ISIN.  We want to expose only one Google model to the user, however, and so we need a means of <a href="http://en.wikipedia.org/wiki/Identity_resolution">resolving</a> the two Googles together – recognizing that they’re the same instrument – and adding just one equity to the system that encompasses both.</p>
<p>Resolution logic can be fairly complicated.  For equities, for example, there are several different ways in which resolution can take place.  If two equities have identical ISINs, we can be pretty confident they match, since those identifiers are declared as globally unique.  If two equities have the same ticker and the same country of exchange, we might also consider that a match, though perhaps of weaker quality.  Two models resolve to each other if any form of resolution considers them equal (with errors being thrown if other forms of resolution contradict the form that considers them equal: for example, when provider <em>A</em> and provider <em>B</em> agree on an instrument’s ISIN but disagree on its ticker).</p>
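<p>A minimal Python sketch of that rule for a single pair of equities. The function names, the dictionary layout, and the identifier values are placeholders invented for the example (the ISINs are not real), and a form that lacks data on either side is treated as simply not applying.</p>

```python
class ResolutionConflict(Exception):
    """Raised when one resolution form says 'same' and another says 'different'."""


def same_isin(a, b):
    # None means this form does not apply: one side has no ISIN.
    if a.get("isin") is None or b.get("isin") is None:
        return None
    return a["isin"] == b["isin"]


def same_ticker_and_country(a, b):
    keys = ("ticker", "country")
    if any(a.get(k) is None or b.get(k) is None for k in keys):
        return None
    return all(a[k] == b[k] for k in keys)


FORMS = [same_isin, same_ticker_and_country]


def resolves(a, b):
    """True if any applicable form matches; error if applicable forms disagree."""
    verdicts = [v for v in (form(a, b) for form in FORMS) if v is not None]
    if True in verdicts and False in verdicts:
        raise ResolutionConflict("resolution forms disagree")
    return True in verdicts


# Placeholder data: provider A knows ticker, country, ISIN; B lacks the ticker.
from_a = {"ticker": "GOOG", "country": "US", "isin": "XX0000000001"}
from_b = {"country": "US", "exchange": "NASDAQ", "isin": "XX0000000001"}
bad = {"ticker": "GOGL", "country": "US", "isin": "XX0000000001"}
other = {"ticker": "MSFT", "country": "US", "isin": "XX0000000002"}

matched = resolves(from_a, from_b)
unrelated = resolves(from_a, other)
try:
    resolves(from_a, bad)
    conflict_raised = False
except ResolutionConflict:
    conflict_raised = True
```

<p>Run naively over every pair of models, this check is exactly the quadratic approach that the rest of the post avoids.</p>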
<p>Read on for the details of how we solve this seemingly <a href="http://en.wikipedia.org/wiki/Analysis_of_algorithms"><em>n</em><sup>2</sup></a> problem with a linear solution.<br />
<span id="more-180"></span><br />
Given <em>N</em> models across providers of a given asset class (say, equities), there are <em>N</em><sup>2</sup> potential checks that I need to do to properly “resolve” all models, since any model can resolve to any other model in the system (and I potentially do want to attempt to resolve a model from provider <em>A</em> to other models from provider <em>A</em> to do error checking, since I may consider it invalid for a provider to provide the same model twice).  Obviously we would like to do better than this, and we can, assuming that most models do not resolve to each other.</p>
<p>Envision the set of all <em>(model, provider)</em> pairs as the set of nodes on a graph.  Two models from different providers that resolve to each other can be represented by an edge between two nodes in the graph.  If the number of providers <em>k</em> is small relative to <em>N</em>, the number of resolution forms for a given asset class is small, and our data is valid, we can come up with an algorithm that solves our problem in <em>O</em>(<em>N</em>) time as follows:</p>
<ol>
<li>For every form of resolution, ask the data providers for all the data necessary for resolution to take place.  For ticker/country resolution, with our data provider interfaces, this gives us a map from every<em> (model, provider)</em> pair to its ticker and country.</li>
<li>We can then invert this map, giving us a map from <em>(ticker, country)</em> pairs to a set of <em>(model, provider)</em> pairs.  Note that the values in the inverted map do have to be sets, since there can be multiple <em>(model, provider)</em> pairs with the same ticker and country (indeed, this is expected if ANY models can be resolved between providers).</li>
<li> Then, for every model, for each resolution form, we can look up the relevant properties for that model, and then look up in the inverse map any models that are equivalent to it.  This tells us what edges to add to our <em>(model, provider)</em> graph.</li>
</ol>
<p>We&#8217;re essentially building up an in-memory, inverted index of the relevant data each model is giving us.  The amortized <em>O</em>(1) lookups that the hashtable-backed maps provide allow us to trade the <em>O</em>(<em>N</em><sup>2</sup>) complexity for something more like <em>O</em>(<em>N</em>).</p>
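<p>The three steps above can be sketched in Python for a single resolution form. This is an illustration, not the Palantir Finance code: the function name <i>build_edges</i> and the sample models are made up, and a plain dictionary stands in for whatever the data provider interfaces return.</p>

```python
from collections import defaultdict


def build_edges(key_by_node):
    """key_by_node maps (model, provider) -> resolution key, e.g. (ticker, country)."""
    # Step 2: invert the map, so each key points at every node that carries it.
    inverted = defaultdict(set)
    for node, key in key_by_node.items():
        inverted[key].add(node)

    # Step 3: for each node, a hash lookup finds its matches without a scan.
    edges = set()
    for node, key in key_by_node.items():
        for other in inverted[key]:
            if other != node:
                edges.add(frozenset((node, other)))
    return edges


# Step 1 (assumed already done): what the providers report for ticker/country.
key_by_node = {
    ("GOOG", "A"): ("GOOG", "US"),
    ("Google Inc", "B"): ("GOOG", "US"),
    ("MSFT", "A"): ("MSFT", "US"),
}
edges = build_edges(key_by_node)
```

<p>The inner loop only visits nodes that share a key, so the total work stays close to linear as long as each bucket holds at most a handful of nodes (roughly one per provider).</p>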
<div style="float: left; width: 200px; text-align: center; margin-right: 15px;"><img src="http://blog.palantirtech.com/wp-content/uploads/2009/02/disconnected-k-clusters.png" alt="" width="200" /><br />
<em>N</em> checks, rather than <em>N</em><sup>2</sup> (assuming that <em>k</em> is trivial compared to <em>N</em>).</div>
<p>Once we’ve done this for every model, each <a href="http://en.wikipedia.org/wiki/Connected_component_(graph_theory)">connected component</a> of our graph should correspond to one model to be added into our final system.  Since the connected components of a graph can be computed in time linear in the number of nodes and edges, and the number of edges stays small under the assumptions above, we can compute all the final models in linear time.  And what is nice is that the maps give us the ability to quickly post-process our data to look for errors.  If any two models in a given connected component come from the same provider, this is an error (either the provider has incorrect data, or it is modeling the data improperly).  If two models from two different providers resolve, but have conflicting data for a given resolution form, this is also an error.  Note that since providers do not have to provide data for every resolution form, it is possible that <em>k</em> models from different providers that resolve together do not form a <a href="http://en.wikipedia.org/wiki/Clique_(graph_theory)"><em>k</em>-clique</a> on the graph.</p>
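<p>A short Python sketch of this last stage, using an iterative depth-first search to find connected components and then checking each component for a repeated provider. The function names and sample data are invented for the example.</p>

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into components; linear in the number of nodes plus edges."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def duplicate_providers(component):
    """Providers that contribute more than one model to a single component."""
    providers = [provider for _model, provider in component]
    return sorted(p for p in set(providers) if providers.count(p) > 1)


nodes = [("GOOG", "A"), ("Google Inc", "B"), ("MSFT", "A"), ("Microsoft", "A")]
edges = [(("GOOG", "A"), ("Google Inc", "B")),
         (("MSFT", "A"), ("Microsoft", "A"))]

components = connected_components(nodes, edges)
errors = [duplicate_providers(c) for c in components]
```

<p>Here the second component contains two models from provider A, which is the kind of noisy failure the provider author should be told about.</p>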
<p>Writing data providers is not always easy.  There are many data sources out there that are messy, and properly modeling real world data in code can be quite challenging.  That’s why it is important to come up with sound, efficient resolution logic that fails noisily, and tells the engineer building the provider when they are and are not playing nicely with the rest of the system.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/02/02/model-resolution-in-palantir-finance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using Palantir to implement the TARP</title>
		<link>http://blog.palantirtech.com/2009/01/22/tarp/</link>
		<comments>http://blog.palantirtech.com/2009/01/22/tarp/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 09:27:41 +0000</pubDate>
		<dc:creator>AlexF</dc:creator>
				<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=170</guid>
		<description><![CDATA[
We talk often with our contacts in finance and intelligence, and an increasingly common subject is the U.S. Government&#8217;s Troubled Asset Relief Program (TARP &#8212; part of the Treasury Department). Our friends see the large problems facing the TARP and the Federal Reserve, and have been asking how our technology can help.
Some of the problems [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 226px; margin-left: 15px'><a href="http://www.treas.gov/initiatives/eesa/"><img src='http://www.ustreas.gov/education/fact-sheets/images/new_treas_seal.gif'/></a></div>
<p>We talk often with our contacts in finance and intelligence, and an increasingly common subject is the U.S. Government&#8217;s Troubled Asset Relief Program (<a href="http://www.treas.gov/initiatives/eesa/">TARP</a> &#8212; part of the Treasury Department). Our friends see the <a href="http://en.wikipedia.org/wiki/Troubled_Assets_Relief_Program#Purpose">large problems</a> facing the TARP and the Federal Reserve, and have been asking how our technology can help.</p>
<p>Some of the problems are out of our hands, but many others are solvable with the proper analytics. Taking a closer look at the task before TARP, we noticed that many challenges mirror those facing the intelligence community:</p>
<ul>
<li>Entity and relationship <strong>data</strong> is scattered across many sources in a <strong>wide variety of formats</strong>; some are <strong>structured</strong>, some are <strong>unstructured</strong>.</li>
<li>Entity structure and relationships are <strong>not always known upfront</strong>, so the solution must<strong> adapt to new data structures</strong> on the fly.</li>
<li>It is costly, time-consuming, and <strong>unnecessary to impose one structure</strong> on the entire industry.</li>
<li><strong>Scalability</strong> is a must: millions of mortgages have been securitized into hundreds of thousands of entities.</li>
<li>Sensitive, private data requires <strong>sophisticated access control and knowledge management</strong> &#8212; understanding who is accessing which data, what the organization knows, when it was known, and how it was discovered.</li>
<li>Specialists from different fields and geographical regions must be able to <strong>collaborate effectively</strong>.</li>
</ul>
<p>Palantir&#8217;s technology already solves these problems for the intelligence community. Our dynamic ontology makes it easy to import TARP data and entities, so we&#8217;ve created a short video using Palantir that shows the power of our approach. We analyze individual mortgage loans, mortgage-backed securities comprising these loans, and institutions holding <a href="http://en.wikipedia.org/wiki/Tranche">tranches</a> of the securities:</p>
<div style='postimg'>
<a href="http://www.palantirtech.com/government/videos/mbs/"><img src="http://blog.palantirtech.com/wp-content/uploads/2009/01/shot0016.png"/></a>
</div>
<p>For more detail on the similarities, click the link to see a detailed breakdown of intelligence vs. TARP workflows.</p>
<p><span id="more-170"></span></p>
<h2>Workflows</h2>
<p>The types of questions the TARP and the Federal Reserve need to answer successfully are similar to those in the intelligence community. In essence, TARP is performing the sort of analysis performed at intelligence agencies: making sense of large amounts of data to create a coherent and accurate picture of the world. TARP is performing analysis on domestic financial data rather than global intelligence data, and using those insights to craft solutions to the current financial crisis. Our breakdown and comparison of the different aspects of the workflows along the same broad lines looks like this:</p>
<h3>Strategic: Mission Planning and Policy Design</h3>
<table>
<tr>
<th>Classical Intel</th>
<th>TARP</th>
</tr>
<tr>
<td>
<ul>
<li>How have nation-states’ methods of supporting terrorist organizations evolved over the last 10 years?</li>
<li>How has deploying more troops to specific hot spots affected the overall level of violence in those areas?</li>
<li>What types of surrogate forces should be recruited and trained to support missions across theater?</li>
</ul>
</td>
<td>
<ul>
<li>Which institutions will require intervention and what markets are they most exposed to?</li>
<li>Which geographical regions and communities most urgently need federal support?</li>
<li>Which asset classes and types of mortgages should be purchased first?</li>
</ul>
</td>
</tr>
</table>
<h3>Operational: Asset Class Level Management and Tactical Planning</h3>
<table>
<tr>
<th>Classical Intel</th>
<th>TARP</th>
</tr>
<tr>
<td>
<ul>
<li>What known terrorist cells are present in a given region and what is the most effective way to combat them based on their ideology?</li>
<li>What are the various touch points for these organizations’ logistical networks and what measures have proved effective in dismantling them in the past?</li>
<li>
How can we measure the efficacy of various actions against the objectives through observable phenomena, including communications, financial information, and human source collection?</li>
</ul>
</td>
<td>
<ul>
<li>What are the characteristics of loans most likely to default in Florida and what is the best strategy for preventing foreclosure?</li>
<li>Which players were most involved in originating commercial loans in Florida? What tactics were used to justify appraisals, and how can these tactics be adjusted for?</li>
<li>What policy for mortgage adjustment yields the fairest outcome in Palm Springs, Florida?</li>
</ul>
</td>
</tr>
</table>
<h3>Tactical: Asset Targeting, Program Implementation, Specific Action Support.</h3>
<table>
<tr>
<th>Classical Intel</th>
<th>TARP</th>
</tr>
<tr>
<td>
<ul>
<li>What times are most likely for a patrol to be attacked in this neighborhood?  What methods are used during the day vs. the night?</li>
<li>Which repercussions are likely to occur as a result of arresting a specific individual? What organizations is this person associated with and who is likely to retaliate?</li>
<li>Which human sources are likely to be able to provide actionable intelligence to move against the time sensitive target?</li>
</ul>
</td>
<td>
<ul>
<li>What is the notional size of <a href="http://en.wikipedia.org/wiki/Credit_default_swap">credit default swaps</a> written on this tranche of this <a href="http://en.wikipedia.org/wiki/Commercial_mortgage-backed_security">commercial MBS</a>?  Which banks are the major holders, and how have their asset ratings changed?</li>
<li>Who originated this loan, and how close are the <a href="http://en.wikipedia.org/wiki/Comparables">comparables</a> used in the due-diligence report?</li>
<li>Who is the servicer for this mortgage, and which branch needs to be contacted if the size of the loan is adjusted down?</li>
</ul>
</td>
</tr>
</table>
<h2>Mission</h2>
<p>We believe that the TARP&#8217;s success is critical to the global financial markets and the health of our nation. We&#8217;ve said from the beginning that our mission is to change the way the world approaches data, and today Palantir is a technology leader in both intelligence and finance. As we begin work on this new challenge we&#8217;re excited to be making a difference where it&#8217;s needed most.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/01/22/tarp/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Hal Varian: analysis is the long-term value play</title>
		<link>http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/</link>
		<comments>http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/#comments</comments>
		<pubDate>Tue, 18 Mar 2008 20:00:34 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2008/02/28/why-hal-varian-thinks-palantir-is-a-great-idea/</guid>
		<description><![CDATA[Raw data is an increasingly abundant and inexpensive commodity. Intelligently filtering, analyzing and visually understanding data is where the value is.  Palantir invents technology and products that enable human analysts to harness the power of computers in an intuitive way to quickly and deeply analyze large amounts of data.
The value of data analysis as [...]]]></description>
			<content:encoded><![CDATA[<p>Raw data is an increasingly abundant and inexpensive commodity. Intelligently filtering, analyzing and visually understanding data is where the value is.  Palantir invents technology and products that enable human analysts to harness the power of computers in an intuitive way to quickly and deeply analyze large amounts of data.</p>
<p><a href="http://freakonomics.blogs.nytimes.com/2008/02/25/hal-varian-answers-your-questions/#more-2345">The value of data analysis as a career was recently emphasized by Hal Varian in the Freakonomics blog in The New York Times</a>. <a href="http://people.ischool.berkeley.edu/~hal/">Hal</a> is an internationally known economist who is currently serving as Google’s Chief Economist while on leave from his three professorships at the University of California at Berkeley. </p>
<blockquote><p>Q: Your job sounds extremely interesting. What jobs would you recommend to a young person with an interest, and maybe a bachelors degree, in economics?</p>
<p>A: If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. <strong>So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on. <em>[emphasis added]</em></strong></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Palantir: so what is it you guys do?</title>
		<link>http://blog.palantirtech.com/2007/12/04/what-do-we-do/</link>
		<comments>http://blog.palantirtech.com/2007/12/04/what-do-we-do/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 08:01:18 +0000</pubDate>
		<dc:creator>Kevin</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/12/04/what-do-we-do/</guid>
		<description><![CDATA[I often ask candidates if they&#8217;re familiar with what we do at Palantir.  Most people think they are.  &#8220;Oh, you&#8217;re that data viz. company,&#8221; or, worse, &#8220;You guys do data mining, right?&#8221;  At least they&#8217;ve heard of us and at least they&#8217;re on the right track, but I cringe anyway.  We [...]]]></description>
			<content:encoded><![CDATA[<p>I often ask candidates if they&#8217;re familiar with what we do at Palantir.  Most people think they are.  &#8220;Oh, you&#8217;re that data viz. company,&#8221; or, worse, &#8220;You guys do data mining, right?&#8221;  At least they&#8217;ve heard of us and at least they&#8217;re on the right track, but I cringe anyway.  We aren&#8217;t just a &#8220;data visualization&#8221; company and we don&#8217;t do &#8220;data mining.&#8221;  It&#8217;s almost impossible to convey the scope and complexity of what we do in a few short minutes&#8212;or to do so without taking the conversation to an eye-glazing level of abstraction.</p>
<p>The following is my attempt at describing what we do at a high level without oversimplifying.  I hope that after reading this a candidate will &#8216;get&#8217; what we&#8217;re about, or at least understand enough not to apply tiny labels to our expansive vision.</p>
<p><span id="more-82"></span></p>
<h2>The problem: implementing analysis</h2>
<p>At Palantir we specialize in <strong>analysis</strong>.</p>
<p>Yes, that&#8217;s painfully abstract, and I&#8217;ll get to it in a second.</p>
<p>In real-world terms, we are building a <strong>software platform</strong> that enables people to take whatever data is relevant to them and understand it more easily and thoroughly than ever before, using concepts that they already understand.  And we are applying this vision, at first, to solving problems in the finance sector and the government intelligence community.</p>
<p>The first important thing to note is that we don&#8217;t actually do the analysis ourselves.  We don&#8217;t devise winning trading strategies and we don&#8217;t catch terrorists.  We write software that enables other people to pull off these feats.  These people, experts in their respective fields, are called <em>analysts.</em></p>
<p>So what exactly do analysts do?  What is analysis?</p>
<blockquote><p>Analysis is everything necessary to extract <strong>insight</strong> from <strong>information</strong>.</p></blockquote>
<p>Let&#8217;s break that down a bit.</p>
<p>Information is easy:  It&#8217;s data.  It lives in a relational database or as files indexed on a hard drive, and you can easily run queries against it.  It comes in two forms, structured and unstructured.  And there is <em>a lot</em> of it in the modern world &#8211; too much, actually, for current tools to make sense of.</p>
<p>Insight is trickier.  Insight is something only a person can generate, and understanding this is critical for any organization that wants to do analysis right.  Thus the challenge of data analysis is how to bring vast amounts of information into productive contact with human intelligence.  In other words, the challenge is how to <em>enable the analyst</em>.</p>
<p>From the analyst&#8217;s perspective there are five essential features of an analysis platform:</p>
<ol>
<li>First, and most important, <em><strong>the analyst should be in control</strong></em>.  In other words, the primary way of interacting with an analysis tool should be <em>human-driven queries</em>.  While automated approaches can complement a human-driven approach, there simply is no substitute for human intelligence.  Unless you put a person behind the wheel, the system can never be flexible or creative enough to uncover truly original insight.  Artificial Intelligence just isn&#8217;t there yet.</li>
<li>Ability to <em><strong>summarize large data sets</strong></em>.  Some of this is what has traditionally been called data mining:  the largely automated approach&#8212;using machine learning or other statistical techniques&#8212;of processing lots of data at once and extracting nuggets that capture something interesting about the data.  Unlike Palantir, traditional approaches have focused almost exclusively on this aspect of analysis.</li>
<li>Ability to <em><strong>visualize large data sets</strong></em>.  Here the analyst wants interesting and informative ways of viewing data graphically, to make it easier for him to digest.  The analyst wants more than just a summary of the data; he wants a nuanced view of what&#8217;s going on <em>inside</em> these data sets:  What&#8217;s the overall shape of the distribution?  What are the outliers?  What are important structures within the data?</li>
<li>Ability to <em><strong>iterate rapidly</strong></em>.  This means enabling the analyst to ask a question, get the answer, and then quickly ask either a variant on the initial question or a follow-up question that depends on the answer to the initial question.  This rapid, iterative process allows the analyst to quickly test out hypotheses and develop theories about what&#8217;s going on in the data, and by extension to discover what&#8217;s going on in the world.</li>
<li>Ability to <em><strong>collaborate with other analysts</strong></em>.  Getting a handle on a terabyte of data, especially when it comprises multiple data types, is definitely more than a one-person job.  Any organization that&#8217;s serious about understanding the world needs a team of analysts that can work together as more than the sum of its parts.  This requires the ability for one analyst to effortlessly share the results of his analysis with his colleagues.</li>
</ol>
<h2>The Palantir approach</h2>
<p>That&#8217;s what analysis looks like to the analyst, or rather what it should look like in an ideal world.  (Current tools fall far short of this vision.)  So what do <em>we</em> do at Palantir in order to make analysis this smooth and easy?</p>
<p>You could say that we help summarize large data sets, in the sense that we have to provide the analyst with a rich library of techniques and algorithms.  You could also say that we do visualization, in the sense that we have to provide the analyst with a set of interesting and informative ways of visualizing their data.  We do both of these things, and we have to be creative and solve hard problems in order to add value in these areas.  But we do a lot more than that.</p>
<p>Probably the most central hard problem that we address in trying to enable the analyst is <strong>data modeling</strong>, the process of figuring out what data types are relevant to a domain, defining what they represent in the world, and deciding how to represent them in the system.  At Palantir we make sure our data model (ontology) is both flexible and dynamic, and that it mirrors the concepts people naturally use when reasoning about the domain.  This is no small challenge, but we&#8217;re already making it a reality.  In finance our basic data types include financial instruments, dates, portfolios, indices, and strategies&#8212;the same things that financial researchers think about, talk about, and reason with.  In the intelligence product our basic data types include people, places, and events (all with associated properties), which is exactly the way we all represent the world in our minds.</p>
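<p>To make the idea of a flexible, dynamic data model a little more concrete, here is a minimal Python sketch. It is purely illustrative and is not a description of how Palantir actually implements its ontology: the class names <code>ObjectType</code> and <code>Entity</code>, the <code>set_property</code> method, and the sample property names are all assumptions invented for this example.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """Hypothetical ontology entry: a named concept and the properties it may carry."""
    name: str
    allowed_properties: set = field(default_factory=set)


@dataclass
class Entity:
    """One object in the data set, typed by an ObjectType."""
    object_type: ObjectType
    properties: dict = field(default_factory=dict)

    def set_property(self, key, value):
        # Reject properties that the ontology does not define for this type.
        if key not in self.object_type.allowed_properties:
            raise KeyError(f"{self.object_type.name} has no property {key!r}")
        self.properties[key] = value


# Illustrative types only, named after the concepts mentioned above.
person = ObjectType("Person", {"name", "nationality"})
event = ObjectType("Event", {"date", "place"})

# "Dynamic" in this sketch means the ontology can grow at runtime.
person.allowed_properties.add("alias")

analyst_subject = Entity(person)
analyst_subject.set_property("name", "J. Doe")
analyst_subject.set_property("alias", "JD")
```

<p>The design choice worth noticing is that the types live in data rather than in code, so adding a new property or a new kind of object does not require changing the classes themselves.</p>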
<p>Data modeling, data summarization, and data visualization are the core disciplines for approaching large data sets.  Human-driven queries, rapid iteration, and collaboration are multipliers, taking the power unlocked by the core disciplines to the next level.  When these pieces are brought together in a coherent system, the result is an analysis platform that is both very generic and very powerful.</p>
<p>This is what we mean when we say that we&#8217;re changing the way people approach data.  Welcome to the future of analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/12/04/what-do-we-do/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
