<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; problemspace &#8211; finance</title>
	<atom:link href="http:///category/problemspace-finance/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Fun and Games with the Palantir Finance Spreadsheet Application</title>
		<link>http://blog.palantirtech.com/2011/08/11/mandelbrot-testing-with-hh-lang/</link>
		<comments>http://blog.palantirtech.com/2011/08/11/mandelbrot-testing-with-hh-lang/#comments</comments>
		<pubDate>Thu, 11 Aug 2011 20:18:49 +0000</pubDate>
		<dc:creator>Rico Chiu</dc:creator>
				<category><![CDATA[fun]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[user interface]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1859</guid>
		<description><![CDATA[&#8220;You&#8217;re asking us to test our platform&#8217;s programming language? How am I supposed to do that?&#8221; My head itches from trying to recall the bits and pieces of what I learned in high school about programming, specifically the semantics of a programming language. Sure, I did a bit of programming for homework assignments in college, [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 300px; margin-left: 15px; margin-bottom: 15px'><a href="/wp-content/uploads/2011/07/mandelbrot-15-1-to-51.jpg"><img src='/wp-content/uploads/2011/08/mandelbrot-15-1-to-51-tiny-thumb.jpg'/></a></div>
<blockquote><p>&#8220;You&#8217;re asking us to test our platform&#8217;s programming language?  How am I supposed to do that?&#8221;
</p></blockquote>
<p>My head itches from trying to recall the bits and pieces of what I learned in high school about programming, specifically the semantics of a programming language.  Sure, I did a bit of programming for homework assignments in college, but I was no CS major.  This was a much different challenge for a QA engineer to test.  Compared to an application, a programming language is completely open ended; there are no specifications to test, guidelines to follow, or limits to break.</p>
<p>The <a href="http://blog.palantir.com/2011/06/06/tech-talk-the-hedgehog-programming-language/">Hedgehog language</a> had the basic set of tools laid out for me already: I could declare variables, create data structures, and use loops for iteration.  As I was trying out individual usage examples, such as how to structure <code>if</code> statements or how to cast an object to a different type, I realized that this was no way to test something as powerful and flexible as an entire language.  It would be like a doctor who claims that since each individual organ works fine, there are no problems with the entire system.  This is insufficient: <a href="http://blog.palantir.com/2010/07/23/help-is-there-a-doctor-in-the-network/">one needs to look at the system as a whole, including examining the interactions between components</a>.  I decided I needed to create much larger and more elaborate code samples in order to test the Hedgehog language at a larger scope.</p>
<p>Using the Hedgehog language, I had programmed several algorithms, solving puzzles that would output a number.  This was getting a bit boring, since once the output value matched the expected number, there was nothing more to be done.  I wanted to create something more dynamic, a toy I could play around and experiment with.  An opportunity presented itself in the form of one of our newest tools: the spreadsheet application.  With the capability of setting the value of each cell programmatically and then coloring the cells depending on their values… hmm, what could I do with this?</p>
<p>Hedgehog is a powerful tool for coding functions and workflows that directly interact with our applications. Most of the time, the language is used to write expressions for an input value, create custom metrics that return values after a set of calculations, or even to set inputs, calculate, and save documents. Given the language’s ability to integrate with Spreadsheet, the capabilities of the Hedgehog language can be shown visually to the user, resulting in some stunning displays. Below are three examples I’ve coded in Palantir Finance’s own language: calculating and drawing the Mandelbrot fractal, simulating Conway’s Game of Life, and solving a Nonograms puzzle.<br />
<span id="more-1859"></span></p>
<p class="MsoNormal"><span style="underline;"><strong>The Mandelbrot fractal</strong></span></p>
<p class="MsoNormal">The <a title="Mandelbrot Set" href="http://en.wikipedia.org/wiki/Mandelbrot_set" target="_self">Mandelbrot set</a> is one of the most recognized images among fractals. Fractals are distinctive for their self-similarity, which gives them infinite detail: as you zoom in on a fractal border, detailed structures will continue to appear indefinitely.</p>
<p>Since the Mandelbrot set is an example of an escape time fractal, we can write a function that counts the number of iterations it takes for a given point to escape a threshold value.<span> </span>Applying this function to each point within a grid of specified range and resolution, we can generate a field of values to be set in a spreadsheet.<span> </span>After resizing the spreadsheet cells into small squares and applying conditional formatting, specifically a heat map that colors the cell based on its value, the below images can be generated:</p>
<div id="attachment_1912" class="wp-caption aligncenter" style="width: 660px"><a href="/wp-content/uploads/2011/07/mandelbrot-15-1-to-51.jpg"><img class="size-full wp-image-1912 " src="/wp-content/uploads/2011/07/mandelbrot-15-1-to-51.jpg" alt="Mandelbrot Fractal drawn from (-1.5, -1) to (0.5, 1)" width="650" height="460" /></a><p class="wp-caption-text">Fig M1. Mandelbrot Fractal drawn from (-1.5, -1) to (0.5, 1), with the top left quadrant visible</p></div>
<div id="attachment_1913" class="wp-caption aligncenter" style="width: 660px"><a href="/wp-content/uploads/2011/07/mandelbrot2.jpg"><img class="size-full wp-image-1913" src="/wp-content/uploads/2011/07/mandelbrot2-thumb.jpg" alt="Mandelbrot Fractal drawn from (0.27205, 0.00451) to (0.27505, 0.00751)" width="650" /></a><p class="wp-caption-text">Fig M2. Mandelbrot Fractal drawn from (0.27205, 0.00451) to (0.27505, 0.00751)</p></div>
<div id="attachment_1914" class="wp-caption aligncenter" style="width: 660px"><a href="/wp-content/uploads/2011/07/manual-heat-map.jpg"><img class="size-medium wp-image-1914" src="/wp-content/uploads/2011/07/manual-heat-map-thumb.jpg" alt="Mandelbrot Fractal with manually created heat map" width="650" /></a><p class="wp-caption-text">Fig M3. Mandelbrot Fractal with manually created rainbow heat map</p></div>
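<p>The escape-time approach described above can be sketched in a few lines. The original was written in Hedgehog, which is not reproduced here; this is a Python approximation in which the function names, the iteration cap of 50, the escape threshold of 2, and the 41 by 41 resolution are all assumptions made for illustration. The coordinate range is the one from Fig M1.</p>

```python
def escape_count(c, max_iter=50, threshold=2.0):
    """Count iterations of z = z*z + c before |z| exceeds the threshold.

    max_iter and threshold are illustrative defaults, not values from the post.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > threshold:
            return n
        z = z * z + c
    return max_iter  # never escaped: treat the point as inside the set


def mandelbrot_grid(x_min, y_min, x_max, y_max, cols, rows, max_iter=50):
    """Return rows of escape counts, one value per spreadsheet cell."""
    grid = []
    for r in range(rows):
        y = y_max - (y_max - y_min) * r / (rows - 1)  # top row holds y_max
        row = []
        for col in range(cols):
            x = x_min + (x_max - x_min) * col / (cols - 1)
            row.append(escape_count(complex(x, y), max_iter))
        grid.append(row)
    return grid


# The range used in Fig M1, at a hypothetical 41 x 41 resolution.
grid = mandelbrot_grid(-1.5, -1.0, 0.5, 1.0, cols=41, rows=41)
```

<p>Each entry of <code>grid</code> would then be written to one cell, and a heat map over the cell values produces the picture.</p>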
<p class="MsoNormal"><span style="underline;"><strong>The Game of Life</strong></span></p>
<p><span>John Conway’s <a title="Game of Life" href="http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life" target="_self">Game of Life</a> </span>is a simple cellular automaton that shows the evolution of an initial state of cells governed by a set of rules. In the classic scenario, cells can be alive or dead, indicated by a filled or empty cell respectively. When evaluating the next generation of cells, each cell checks the number of live neighbors among its adjacent 8 cells. If the cell is alive with 2 or 3 live neighbors, it remains alive. If a dead cell has 3 live neighbors, it becomes alive. In all other cases, the cell dies or remains dead.</p>
<p class="MsoNormal">In this example, we rely heavily on the spreadsheet’s cell dependency tree. Cells are able to reference the value of other cells within their expression, creating a directed acyclic graph. If the value of a cell changes, all cells that point towards the changed cell need to recalculate their value. In this specific case, the user can change the value of which generation to view, which causes the recalculation of the grid of cells representing the game.</p>
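<p>As a rough illustration of the rules just listed, here is a Python sketch (not the Hedgehog code from the spreadsheet). The function name and the choice to treat everything outside the grid as dead are assumptions.</p>

```python
def next_generation(grid):
    """Apply Conway's rules once to a grid of 0s (dead) and 1s (alive).

    Cells beyond the edge of the grid are assumed to be dead.
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        total = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue  # a cell is not its own neighbor
                rr, cc = r + dr, c + dc
                if rr in range(rows) and cc in range(cols):
                    total += grid[rr][cc]
        return total

    new_grid = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            n = live_neighbors(r, c)
            alive = grid[r][c] == 1
            # Birth on exactly 3 neighbors; survival on 2 or 3.
            new_row.append(1 if n == 3 or (alive and n == 2) else 0)
        new_grid.append(new_row)
    return new_grid


blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
flipped = next_generation(blinker)
```

<p>The three-cell "blinker" is a convenient check: one step turns the horizontal bar into a vertical one, and a second step turns it back.</p>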
<p class="MsoNoSpacing">In the spreadsheet below, the initial state, or generation 0, of “Noah’s Ark” is shown.<span> </span>The initial data for Noah’s Ark is derived from an external spreadsheet document containing a grid of 0’s and 1’s.<span> </span></p>
<p><a href="/wp-content/uploads/2011/07/initial-noahs-ark.jpg"><img class="size-medium wp-image-1915" src="/wp-content/uploads/2011/07/initial-noahs-ark.jpg" alt="Initial generation for Noah's Ark" width="650" /></a></p>
<p class="MsoNoSpacing">Below is an animated image of the evolution of Noah’s Ark from generation 0 to generation 10.</p>
<p><a href="/wp-content/uploads/2011/07/animation.gif"><img class="size-full wp-image-1916" src="/wp-content/uploads/2011/07/animation.gif" alt="Animation of Noah's Ark up to 10th Generation" width="640"/></a></p>
<p class="MsoNormal">One problem encountered is that it takes increasingly longer to calculate a larger generation number. Because the only generation that is saved is the initial generation, calculations must always start from generation 0 when the user changes the generation number.<span> This triggers a lot of redundant calculations: if I am at generation 10 and want to see generation 11, the calculation starts from generation 0, repeats all the same calculations up to generation 10, and finally calculates one more iteration to reach generation 11.  T</span>o prevent unnecessary calculations, the results of previous calculations can be written to a cache file: a separate spreadsheet document containing the generation data that was calculated previously.<span> When caching is enabled, previous generation calculations can be retrieved so the same calculation never happens twice.</span></p>
<p class="MsoNormal">Below is an animated image of a “Gosper Glider Gun” from generation 0 to generation 40, shown in increments of 2 generations. The glider gun will continue to oscillate over time while producing a 5-cell glider object that floats away from the gun diagonally down to the right. Without the caching functionality, the 40<sup>th</sup> generation would take approximately 4 times as long as the 10<sup>th</sup> generation. By enabling caching, each animation frame takes approximately the same time to calculate since only 2 generations are computed per frame.</p>
<p><a href="/wp-content/uploads/2011/07/211.gif"><img class="size-full wp-image-1918" src="/wp-content/uploads/2011/07/211.gif" alt="Animation of Glider Gun up to 40th generation" width="640" /></a></p>
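<p>The caching idea can be sketched as follows. In the post the cache is a separate spreadsheet document; this Python sketch keeps previously computed generations in an in-memory list instead, and the class name, the <code>steps_computed</code> counter, and the set-of-coordinates representation on an unbounded grid are assumptions made for illustration.</p>

```python
def step(cells):
    """Advance a set of live (row, col) cells by one generation."""
    counts = {}
    for (r, c) in cells:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    key = (r + dr, c + dc)
                    counts[key] = counts.get(key, 0) + 1
    return frozenset(
        cell for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in cells)
    )


class GenerationCache:
    """Remember every generation computed so far (hypothetical helper)."""

    def __init__(self, initial_cells):
        self.generations = [frozenset(initial_cells)]
        self.steps_computed = 0  # counts calls to step(), for illustration

    def get(self, n):
        # Only compute the generations that are not cached yet.
        while n >= len(self.generations):
            self.generations.append(step(self.generations[-1]))
            self.steps_computed += 1
        return self.generations[n]


glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
cache = GenerationCache(glider)
cache.get(10)   # computes generations 1 through 10
cache.get(11)   # computes only one more generation
```

<p>With the cache in place, moving from generation 10 to generation 11 costs a single step rather than eleven, which is why each animation frame takes roughly constant time.</p>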
<p><span style="underline;"><strong>Nonograms Solver</strong></span></p>
<p><span><a title="Nonograms" href="http://en.wikipedia.org/wiki/Nonogram" target="_self">Nonograms</a>, sometimes better known as Picross, </span>are a type of logic puzzle where the user is given a blank grid with a set of numbers for each row and column. Each number gives the length of a run of consecutively filled blocks in the final solution within that row or column. Using these numbers as clues, the solver deduces whether each space is filled or unfilled, until the entire grid simultaneously satisfies all given conditions. Solving nonograms is an NP-complete problem: it is very easy to verify a solution by checking if each row and column satisfies its set of numbers, but it becomes increasingly difficult to solve larger puzzles. The time it takes to check a puzzle increases linearly with puzzle size, but the time to solve increases much faster, because there are more iterations over each row or column and more possible permutations for each set of numbers per iteration. Solving a 25&#215;25 nonogram puzzle by hand usually takes about 30 minutes for an experienced solver.</p>
<div style='float: right; margin-left: 15px; margin-bottom: 15px;'><a href="/wp-content/uploads/2011/07/nonogram-permutation.jpg"><img class="size-full wp-image-1919" src="/wp-content/uploads/2011/07/nonogram-permutation.jpg" alt="Possible permutations of [2,1] in a row of 5 spaces" width="186" height="146" /></a></div>
<p class="MsoNormal">Given the initial set of values for each row and column, the algorithm iterates through each row or column individually. <span> </span>The simplest method to solve is to find all possible permutations of the given numbers over the length of the row or column and check if there are any spaces that are consistently filled or unfilled across all possibilities.<span> </span>Let’s consider a simple example: In a row of 5 spaces, the set of values given is [2,1].<span> </span>Thus the possible permutations are shown on the right, where 1 is a filled space, -1 is an unfilled space, and 0 is undetermined.</p>
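<p>The single-line deduction described above might look like the sketch below. It is written in Python rather than Hedgehog, it covers only the simplest case of a completely blank line, and the function names are assumptions. It uses the same encoding as the text: 1 for filled, -1 for unfilled, and 0 for undetermined.</p>

```python
def placements(clues, length):
    """Yield every arrangement of the clue blocks in a line (1 filled, -1 empty)."""
    if not clues:
        yield [-1] * length
        return
    first, rest = clues[0], clues[1:]
    # Space the remaining blocks need, including one gap before each of them.
    tail = sum(rest) + len(rest)
    for start in range(length - first - tail + 1):
        head = [-1] * start + [1] * first
        if rest:
            head.append(-1)  # mandatory gap between consecutive blocks
        for suffix in placements(rest, length - len(head)):
            yield head + suffix


def deduce(clues, length):
    """Keep cells that agree across all placements; mark the rest 0."""
    options = list(placements(clues, length))
    result = []
    for cells in zip(*options):
        if all(v == 1 for v in cells):
            result.append(1)
        elif all(v == -1 for v in cells):
            result.append(-1)
        else:
            result.append(0)
    return result


options = list(placements([2, 1], 5))
known = deduce([2, 1], 5)
```

<p>For the [2,1] clue in a row of 5 spaces this enumerates three arrangements, and only the second space is filled in all of them, so <code>known</code> is <code>[0, 1, 0, 0, 0]</code>. A full solver would also have to discard arrangements that contradict spaces already determined by crossing rows or columns, which this sketch leaves out.</p>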
<p class="MsoNormal">Additional optimizations, such as queuing the order of rows and columns to solve, trimming the solved ends of a range of spaces, and avoiding unnecessary calculations, were made to improve the performance of the algorithm such that the average time to solve a 25&#215;25 puzzle is 5 minutes.<span> </span>This is fairly impressive given that the Hedgehog language is by no means optimized to handle such calculations and array manipulations.<span> </span>Below are some examples of solved puzzles:</p>
<div id="attachment_1920" class="wp-caption aligncenter" style="width: 660px"><a href="/wp-content/uploads/2011/07/10x10.jpg"><img class="size-full wp-image-1920" src="/wp-content/uploads/2011/07/10x10.jpg" alt="10x10 nonogram solution, solved in less than a second" width="650" /></a><p class="wp-caption-text">Fig N1. 10x10 nonogram solution, solved in less than a second</p></div>
<div id="attachment_1921" class="wp-caption aligncenter" style="width: 657px"><a href="/wp-content/uploads/2011/07/20x20.jpg"><img class="size-full wp-image-1921" src="/wp-content/uploads/2011/07/20x20.jpg" alt="20x20 nonograms solution, solved in 20 seconds" width="647" /></a><p class="wp-caption-text">Fig N2. 20x20 nonograms solution, solved in 20 seconds</p></div>
<div id="attachment_1922" class="wp-caption aligncenter" style="width: 660px"><a href="/wp-content/uploads/2011/07/25x25.jpg"><img class="size-full wp-image-1922" src="/wp-content/uploads/2011/07/25x25.jpg" alt="25x25 nonograms solution, solved in 3 minutes" width="650" /></a><p class="wp-caption-text">Fig N3. 25x25 nonograms solution, solved in 3 minutes</p></div>
<p>The initial spark that ignited this series of code examples started from just a simple curiosity of, &#8220;I wonder if it would be possible to write this in Hedgehog?&#8221;  In a way, it was also an exercise for me to learn more about the Hedgehog language, extending the use of its capabilities and libraries of metrics, in order to inject more complicated expressions while testing other applications on our platform. Throughout the process of developing the finished product, I was able to expose issues with the language that may have gone unnoticed with smaller, simpler test cases.  </p>
<p>But of course, the driving motivation behind all this was to have fun, which is a big part of what it means to be working at Palantir.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/08/11/mandelbrot-testing-with-hh-lang/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tech Talk: the Hedgehog Programming Language</title>
		<link>http://blog.palantirtech.com/2011/06/06/tech-talk-the-hedgehog-programming-language/</link>
		<comments>http://blog.palantirtech.com/2011/06/06/tech-talk-the-hedgehog-programming-language/#comments</comments>
		<pubDate>Mon, 06 Jun 2011 20:53:38 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1844</guid>
		<description><![CDATA[A few months back, Kevin introduced us to the Hedgehog Programming language &#8211; (here&#8217;s the post if you missed it). The Palantir Finance programming language — Hedgehog as we know it — is an interpreted, statically typed, object-oriented language. With a syntax that’s based loosely on Java, it mixes roughly Java-style semantics and a few [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 300px; margin-bottom: 15px; margin-left: 15px'><a target='new' href='http://www.pfinance.com/'><img src="http://blog.palantir.com/wp-content/uploads/2010/10/hedgehog.jpg" alt="" title="hedgehog" width="300" height="129" class="alignnone size-medium wp-image-1753" /></a></div>
<p>A few months back, Kevin introduced us to the Hedgehog Programming language &#8211; <a href="http://www.youtube.com/watch?v=54Vv3Os3Ep4">(here&#8217;s the post if you missed it)</a>.  </p>
<p>The Palantir Finance programming language — Hedgehog as we know it — is an interpreted, statically typed, object-oriented language. With a syntax that’s based loosely on Java, it mixes roughly Java-style semantics and a few idiosyncrasies that make it a really interesting case study in language design. It’s built to be extremely efficient for batch operations on time series, which is the heavy lifting in financial analysis.</p>
<p>In this video, Eugene and Dave, two of the engineers that work on the language and platform features needed to support it, give a talk that goes into a number of areas around the Hedgehog language, including why we needed to build a language, how it makes the platform more powerful, how we built dev tools into the UI to make debugging easier, and a bunch of the nitty-gritty features that go into the strange (but fitting) beast that is the Hedgehog Language.</p>
<p><iframe title="YouTube video player" width="640" height="510" src="http://www.youtube.com/embed/54Vv3Os3Ep4" frameborder="0" allowfullscreen></iframe></p>
<p>As a final note: this is one of the things that I love about working at Palantir Technologies.  We study a problem pretty hard before we decide that we need to re-invent the wheel &#8211; and then when we do, we go all out.  It&#8217;s one of the benefits of working with the incredibly talented and motivated folks here.  When someone says, &#8220;well, we need to build a programming language.  No, we&#8217;re sure,&#8221; we just roll up our sleeves and do it.  We can add it to the list of: <a href="http://blog.palantir.com/2009/02/23/palantir-monitoring-server-where-build-beats-buy/">JMX monitoring system</a>, <a href="http://blog.palantir.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/">refined Lucene search engine</a>, <a href="http://blog.palantir.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/">speeding up Map-Reduce-like systems to interactive time</a>, and <a href="http://www.palantirtech.com/government/analysis-blog/isr">implementing our own GIS platform</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/06/06/tech-talk-the-hedgehog-programming-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Hedgehog Programming Language</title>
		<link>http://blog.palantirtech.com/2011/02/02/hhlang/</link>
		<comments>http://blog.palantirtech.com/2011/02/02/hhlang/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 07:56:49 +0000</pubDate>
		<dc:creator>Kevin Simler</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1759</guid>
		<description><![CDATA[One thing about being a developer on the Palantir Finance product that doesn&#8217;t get nearly enough publicity is the fact that we have our own programming language. I&#8217;m pretty excited about it so let me repeat, with emphasis: we have our own programming language. Yeah, it&#8217;s awesome. All those late hours you spent in the [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 300px; margin-bottom: 15px; margin-left: 15px'><a target='new' href='http://www.pfinance.com/'><img src="http://blog.palantir.com/wp-content/uploads/2010/10/hedgehog.jpg" alt="" title="hedgehog" width="300" height="129" class="alignnone size-medium wp-image-1753" /></a></div>
<p>One thing about being a developer on the Palantir Finance product that doesn&#8217;t get nearly enough publicity is the fact that we have our own programming language.  I&#8217;m pretty excited about it so let me repeat, with emphasis:  <em><strong>we have our own programming language</strong></em>.  Yeah, it&#8217;s awesome.  All those late hours you spent in the lab working on your final project in compilers:  turns out they&#8217;re actually good for something other than getting into grad school.</p>
<p>Building this language ourselves &#8212; as opposed to, say, using an existing language that already just works &#8212; wasn&#8217;t an easy decision.  In fact, it wasn&#8217;t even a single decision.  We wracked our collective brain dozens of times trying to think of a better approach.  But every which way we sliced it, the problems we needed to solve always pointed to building our own language.  I still question this decision sometimes, but on the whole I&#8217;m very happy with how things have turned out.</p>
<p><span id="more-1759"></span></p>
<p>The Palantir Finance programming language &#8212; Hedgehog as we know it &#8212; is an interpreted, statically typed, object-oriented language. With a syntax that&#8217;s based loosely on Java, it mixes roughly Java-style semantics and a few idiosyncrasies that make it a really interesting case study in language design.  It&#8217;s built to be extremely efficient for batch operations on time series, which is the heavy lifting in financial analysis.  It also allows you to dynamically add methods to a class from outside the class itself (conceptually similar to <a href="http://juixe.com/techknow/index.php/2006/06/15/mixins-in-ruby/">Ruby&#8217;s Mixins</a>) &mdash; you define the function and its input type, and when you type the dot operator, your new method is auto-completed alongside all the &#8220;native&#8221; methods.  Hedgehog also has a vast number of effectively global constants: all the stocks, bonds, and other financial instruments that are essential to the user experience, but that make for quite a design challenge.</p>
<p>I&#8217;m not a language guy myself, so instead of continuing to geek out over the core language features, I want to geek out about an emergent property that&#8217;s truly unique to the Hedgehog language.  But first I&#8217;m going to back up and talk about something else that&#8217;s really important to us at Palantir:  user experience. (I&#8217;ll get back to languages I promise.)</p>
<p>There&#8217;s a UX principle that says your interface should be &#8220;low threshold, high ceiling&#8221;. That is, it should be easy for the user to get started, but also able to do powerful things.  This is actually a corollary of a more general principle:  that your interface should strive for the <strong>optimal learning curve</strong>.  My first CS professor explained this with a set of three diagrams, each representing one of the major OS families.  I don&#8217;t remember exactly how he drew these diagrams at the time, but an updated version of them might look like this:</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/learning_curves.png" alt="" title="learning_curves" class="alignnone size-medium wp-image-1748" /></div>
<p>The x-axis of each curve represents &#8220;wizardry,&#8221; a measure of the user&#8217;s technical sophistication.  The y-axis represents the power of the system &#8212; how much the user can accomplish at a given level of wizardry.</p>
<p>The best of the three curves, my prof argued, was the third curve.  The first learning curve is great for providing incentives to learn.  Each unit of effort spent to increase your wizardry yields an appropriate amount of reward or power.  The drawback is that it&#8217;s hard for new users to do anything useful; its reward threshold is too high.  The middle curve has a lower threshold and is better for novice users, but will frustrate an intermediate user because of the great plateau in the middle.  (This might represent a place where the GUI isn&#8217;t powerful enough for the tasks you want to accomplish but scripting is still too difficult, leaving no way to express your commands.)  The third curve, however, is the best of both worlds:  a low threshold and a smooth trajectory to the top.</p>
<p>Now let&#8217;s apply this back to our topic at hand, programming languages.  Specifically, what does the learning curve look like for learning a first language?  (Once you&#8217;ve learned one, of course, the rest come pretty easily.)</p>
<div style='float: right; width: 469px; margin-bottom: 15px; margin-left: 15px'><img src="http://blog.palantir.com/wp-content/uploads/2010/10/learning_curves_languages.png" alt="" title="learning_curves_languages" class="alignnone size-medium wp-image-1749" /></div>
<p>If your experience of learning to program was anything like mine, the first few projects in your first language were <em>painful</em>.  You could sense the power further up the curve &#8212; it&#8217;s what convinced you to stick with CS &#8212; but simple tasks took a lot more effort than they should have, at the beginning.</p>
<p>Hedgehog on the other hand &#8212; our little homebrew that will someday have its own Wikipedia page &#8212; has the smoothest learning curve I&#8217;ve ever seen in a programming language.  That&#8217;s the emergent property I wanted to talk about, because it&#8217;s a thing of beauty.  You can get started with Hedgehog right away and accomplish quite a bit &mdash; without even knowing that you&#8217;re &#8220;programming&#8221;, and the slope on the curve stays relatively constant throughout your trajectory.</p>
<p>We didn&#8217;t realize it at the time, but we were probably destined to create a low-threshold, high-ceiling language with a smooth learning curve, due to the nature of our user base.  Financial analysts are impatient, and they still need to perform many kinds of complicated analysis.  They definitely don&#8217;t have the time or inclination to spend a semester learning how to program.  The solution to their problem is Hedgehog.</p>
<p>Allow me to illustrate with one of the earliest things a user might type into the expression bar:</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/ibm.png" alt="" title="ibm" class="alignnone size-medium wp-image-1746" /></div>
<p>And that&#8217;s it.  The user types a ticker symbol and he gets a chart of IBM&#8217;s stock price.  At no point did he have to wonder about variables or types or #includes.  This experience is so <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">frictionless</a> he probably doesn&#8217;t even realize he&#8217;s writing code in a programming language.  He just starts with what he knows, and the system gives him what he wants.</p>
<p>It starts to get interesting as you move further up the curve.  Take this user input:</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/ibm_volume.png" alt="" title="ibm_volume" class="alignnone size-medium wp-image-1747" /></div>
<p>Of course that innocent dot between &#8220;IBM&#8221; and &#8220;volume&#8221; means a method invocation to anyone who&#8217;s familiar with C++ or Java.  But to a new Palantir Finance user it simply means, &#8220;Let me access all the types of data associated with IBM.&#8221;  Conceptually painless.</p>
<p>Or how about this one?</p>
<div style='text-align: center'><img style='margin-auto' src="http://blog.palantir.com/wp-content/uploads/2010/10/histogram.png" alt="" title="histogram" class="alignnone size-medium wp-image-1745" /></div>
<p>The <code>volume/1000</code> expression is an anonymous method acting in the scope of a Stock object; it&#8217;s syntactic sugar for <code>return this.volume()/1000;</code>.  But by allowing the user to strip away all the unnecessary syntax, we make learning the language that much easier.</p>
<p>I could go on tracing the curve here (I&#8217;ve only scratched the surface), but I hope I&#8217;ve made my point: we coax new users into writing code by making it look as much as possible like performing operations that they already intuitively understand.  This is one of the benefits of creating a domain-specific language &mdash; we got the richness of the domain for free, and all the understanding that comes with it &mdash; and then we went above and beyond the simplification of a traditional <a href="https://secure.wikimedia.org/wikipedia/en/wiki/Domain-specific_language">DSL</a> to really pare down the complexity of the language for novice users.</p>
<p>From simple beginnings like the ones I&#8217;ve shown here, it doesn&#8217;t take our users long at all to cross the threshold to more intermediate-level work, such as chaining function calls together or creating their own methods.  As far as the high ceiling goes, we&#8217;re still working on it, but the language is currently capable of producing not only a <a href="http://en.wikipedia.org/wiki/Quine_(computing)">quine</a>, as one of our candidates showed us (yes, we ended up hiring him), but also code that can generate studies like the one below:</p>
<p><a href="http://blog.palantir.com/wp-content/uploads/2010/10/dashboard.png"><img src="http://blog.palantir.com/wp-content/uploads/2010/10/dashboard.png" alt="" title="dashboard" width="100%" class="alignnone size-medium wp-image-1744" /></a></p>
<p>So Hedgehog has a low threshold and a smooth learning curve, and the ceiling is high enough that our users can do some really serious information processing with it &#8212; tasks that would make their other tools break down and cry.  But there&#8217;s still a lot of interesting work for us to do, especially in pushing the language&#8217;s ceiling higher (developing better interactive debugging; working with large objects efficiently) &mdash; and as always, making it <em>faster</em>.</p>
<p><em>If you&#8217;d like to see the Hedgehog Programming Language in action, you can sign up for an account at <a href='http://joyride.pfinance.com/'>Palantir JoyRide</a>, the <a href="http://www.pfinance.com/">Palantir Finance</a> public demo.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/02/02/hhlang/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A rigorous friction model for human-computer symbiosis</title>
		<link>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/</link>
		<comments>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 03:18:52 +0000</pubDate>
		<dc:creator>Asher Sinensky</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1344</guid>
		<description><![CDATA[This is a response to Ari&#8217;s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look: We are attempting to understand the total analytic capability for a given task a of a human-computer [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center; float: right; margin-left: 15px; margin-right: 15px'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt="" width="300"/>
</div>
<p>This is a response to <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">Ari&#8217;s awesome post on human-computer symbiosis</a>. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look:</p>
<p>We are attempting to understand the total analytic capability for a given task <strong><em>a</em></strong> of a human-computer team. Analytic capability in this case probably means:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq1.png" alt="eq1"/>(1)
</div>
<p>Where <strong><em>A</em></strong> is the answer to the analytic problem in question and <strong><em>t<sub>A</sub></em></strong> is the time needed to arrive at the answer based on the inputs available. In the case of chess, <strong><em>A</em></strong> could be the optimum next move given all previous information and <strong><em>t<sub>A</sub></em></strong> would be how long it takes to decide on this move.</p>
<p>Read on for a look at how this generalizes in human-computer symbiotic systems.<br />
<span id="more-1344"></span></p>
<p>In the case of the human-computer team, we know that <strong><em>a</em></strong> is going to be a function of both the human&#8217;s analytical capability <strong><em>h</em></strong> and the computer&#8217;s analytical capability <strong><em>c</em></strong> (where both <strong><em>h</em></strong> and <strong><em>c</em></strong> have units of answers/time). In the limit case we know that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq2.png" alt="eq2"/>(2)
</div>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq3.png" alt="eq3"/>(3)
</div>
<p>Or in plain English, if there is no human present, the total analytic capability is simply the analytic capability of the computer, and vice versa. So the naïve solution would be that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq4.png" alt="eq4"/>(4)
</div>
<p>(4) clearly meets the limiting cases described in (2) and (3). Kasparov noticed a mixing function where the ability of the human and computer to work together becomes the dominant term &mdash; we might call this the mixing capability for the given task or <strong><em>m</em></strong>. Including this phenomenon, the total analytic capability (4) would be re-defined as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq5.png" alt="eq5"/>(5)
</div>
<p>where <strong><em>m</em></strong> has the property that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq6.png" alt="eq6"/>(6)
</div>
<p>Thus maintaining the limits expressed in (2) and (3) and adhering to the observation that if there is no human or computer component then there will be no mixing advantage. A naïve solution to this constraint would be simple linear mixing:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq7.png" alt="eq7"/>  (7)
</div>
<p>where <strong><em>M</em></strong> (units of time per answer) is the mixing efficiency and will be primarily based on the type of task being solved &mdash; some analytical tasks lend themselves to a combined process more than others (for example, multiplying 20-digit numbers does not really benefit from the intuition of a human, so the ability of a human and computer to perform this task is merely the sum of their individual abilities).</p>
<p>What Kasparov noticed is that the mixing was primarily based on the quality of the process rather than the analytical power of either the human or computer separately. This seems to imply that we must somehow account for the fact that the quality of the human-computer interface is responsible for the quality of the mixing. This can be modeled as a unitless friction of interaction <strong><em>f<sub>i</sub></em></strong> that impedes the ability of the human and computer to work together. </p>
<p>Equation (7) can thus be re-written as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq8.png" alt="eq8"/>(8)
</div>
<p>In this case, the maximum value for the mixing capability is realized when the friction of interaction goes to zero. This mixing capability is the same as the equation Ari developed (less the coefficient which is necessary to maintain consistent units throughout).</p>
<p>We can now re-write our analytic capability in (5) as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq9.png" alt="eq9"/>(9)
</div>
<p>Below, see a plot of this function over a range of values for <strong><em>h</em></strong>, <strong><em>c</em></strong> and <strong><em>f<sub>i</sub></em></strong>:</p>
<div style='text-align: center; margin: auto; margin-bottom: 1em;'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt=""/>
</div>
<p>As can clearly be seen from this functional plot (note the vertical scale), the effect of interface friction dominates over the other terms whenever both the human and computer can make important contributions to the task at hand. The conclusion can be drawn that the most effective way to solve analytical problems is to minimize the friction of the human-computer interface; or to put it another way: optimal analytical systems are those that are built specifically to maximize the ability of the human to leverage the ability of the computer.</p>
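<p>The dominance of the friction term can also be checked numerically. The following is a minimal Python sketch, not the author&#8217;s code. The equations in this post are images, so the sketch <em>assumes</em> that (9) has the form a = h + c + M*h*c / f_i, which is consistent with the limits in (2) and (3), the linear mixing in (7), and the statement that mixing is greatest as friction goes to zero. The function name and the sample values are hypothetical.</p>

```python
def analytic_capability(h, c, M, f_i):
    """Total analytic capability under the ASSUMED form of equation (9).

    h, c : human and computer capability (answers per unit time)
    M    : mixing efficiency (time per answer)
    f_i  : unitless friction of interaction, must be positive
    """
    if not f_i > 0:
        raise ValueError("friction of interaction must be positive")
    # Assumption: friction divides the linear mixing term M*h*c from (7).
    mixing = M * h * c / f_i
    return h + c + mixing


if __name__ == "__main__":
    # Hypothetical values: hold h, c and M fixed and lower the friction.
    for friction in (4.0, 1.0, 0.25):
        print(friction, analytic_capability(2.0, 3.0, 1.0, friction))
```

<p>With either <em>h</em> or <em>c</em> set to zero the mixing term vanishes and the function returns the other capability alone, as (2) and (3) require. Under this assumed form, halving the friction doubles the mixing term while leaving <em>h</em> + <em>c</em> untouched.</p>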
<p>I am certain there is still the possibility for further refinement, for example:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq10a.png" alt="eq10a"/>(10)
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Friction in Human-Computer Symbiosis: Kasparov on Chess</title>
		<link>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/</link>
		<comments>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:32:06 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1302</guid>
		<description><![CDATA[As we build our platforms and applications following a human-computer symbiosis approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way. One of the areas that we&#8217;re interested is in the overall friction of analysis systems. The systems that we build are [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px;'>
<img src='/wp-content/uploads/2010/03/fools-mate.gif'/>
</div>
<p>As we build our <a href="http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/">platforms</a> and <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">applications</a> following a <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">human-computer symbiosis</a> approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.</p>
<p>One of the areas that we&#8217;re interested in is the overall friction of analysis systems.  The systems that we build run on commodity hardware &mdash; we&#8217;re not building faster computers, and yet we can deliver orders-of-magnitude better performance on analysis tasks than existing solutions.  How do we do this?  By building software in such a way that it reduces the friction experienced at the boundaries between the computing power, the analyst, and the source data.</p>
<h2>Chess as analysis laboratory</h2>
<p>Chess is, at its heart, a predictive venture.  The player attempts to anticipate their opponent&#8217;s moves, planning their own moves accordingly, with the straightforward goal of finding a sequence of moves that forces checkmate.</p>
<p>This game is, in its ideal form, analysis. (The moves made are the logical extension of the analysis.)  The data are clean, the problem is well-defined and everyone plays by the same rules.  There are even <a href="http://en.wikipedia.org/wiki/Elo_rating_system">well-defined metrics for ranking chess players by skill</a> &mdash; a better chess player is a better chess-game analyst.  </p>
<p>For evaluating analysis systems, this is about as good as it gets: chess allows controlled experiments that compare the relative strengths of different analysis systems.</p>
<p><a href="http://en.wikipedia.org/wiki/Garry_Kasparov">Garry Kasparov</a>, widely considered to be the greatest chess player of all time,  recently wrote <a href="http://www.nybooks.com/articles/23592">a review of Diego Rasskin Gutman&#8217;s book</a>, <a href="http://www.amazon.com/Chess-Metaphors-Artificial-Intelligence-Human/dp/026218267X"><u>Chess Metaphors: Artificial Intelligence and the Human Mind</u>.</a></p>
<p>The review is excellent and covers a lot of ground.  However, one particular anecdote stood out as a very interesting example of human-computer symbiosis (emphasis added):</p>
<blockquote><p>In 2005, the online chess-playing site Playchess.com hosted what it called a &#8220;freestyle&#8221; chess tournament in which anyone could compete in teams with other players or computers. Normally, &#8220;anti-cheating&#8221; algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less &#8220;intelligent&#8221; than the playing programs they detect.)</p>
<p>Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.</p>
<p>The surprise came at the conclusion of the event. <em>The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time.</em> Their skill at manipulating and &#8220;coaching&#8221; their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. <em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em></p></blockquote>
<p>After the jump, we look at this finding in a more generalized way and map it onto the Palantir approach.<br />
<span id="more-1302"></span></p>
<h2>The cyborg Grandmaster: a fearsome opponent</h2>
<p>The tournament Kasparov recalls was a showcase of chess talent, human-computer symbiosis, and raw computing power.  Among those entered  in the tournament were a purpose-made chess machine (similar to <a href="http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>) named <a href="http://en.wikipedia.org/wiki/Hydra_(chess)">Hydra</a> and a team of <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmasters</a> assisted by computer programs.</p>
<p>One losing participant had this to say about the computer-aided Grandmasters:</p>
<blockquote><p>
Secondly, I have learned that a <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmaster</a> armed with a chess engine is a killer combination against a plain Engine. Engines see everything via brute force, Grandmasters use their intuition and are able to see &#8220;obvious&#8221; moves at once. So the two of them together are a mighty force.
</p></blockquote>
<p>This is just as Licklider predicted 50 years ago &#8212; quoting <a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a> (if I could put it better, I would):</p>
<blockquote><p>
Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions&#8230; In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.</p>
<p>&#8230;</p>
<p>In addition, the computer will serve as a statistical-inference, decision-theory, or game-theory machine to make elementary evaluations of suggested courses of action whenever there is enough basis to support a formal statistical analysis. Finally, it will do as much diagnosis, pattern-matching, and relevance-recognizing as it profitably can, but it will accept a clearly secondary status in those areas.
</p></blockquote>
<p>So in classic intelligence amplification fashion, having computer programs that can quickly evaluate a move&#8217;s likelihood of success can <em>amplify the power of the Grandmaster</em>.</p>
<p>While empirically true, it does raise the question: how <em>much</em> does it amplify the power of the Grandmaster?</p>
<p>One approximation might be a simple linear amplification: a product.  Let&#8217;s imagine a function, <em>a(h,c)</em>, in which the analytic power (<em>a</em>) is the product of the power of the human (<em>h</em>) and the computing power of the chess engine being used (<em>c</em>).  This gives us the equation:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-simple.png'/>
</div>
<h2>One term to dominate them all: friction-of-interface</h2>
<p>Does this simple approximation hold up?  It does not. The team that won the <a href="http://www.chessbase.com/newsdetail.asp?newsid=2461">PAL/CSS Freestyle Tournament in 2005</a> was composed of two amateur chess players that were able to best a computer-assisted Grandmaster.</p>
<p>How did they accomplish this feat?  It was not through superior compute power.  Instead, they did so by more effectively feeding insights to their three chess engines. They played so well that a large number of people assumed that it was actually Kasparov himself playing:</p>
<blockquote><p>
Many speculated that it might be Garry Kasparov, who was the initiator of this kind of computer assisted chess matches. When we asked him Kasparov confirmed that was not the case. But he reminded us that it doesn&#8217;t really matter. The guiding principle of Freestyle Chess: anything is allowed. &#8220;Even if they were assisted by the devil, that would probably be covered by the rules,&#8221; he joked. &#8220;Only the moves they played count.&#8221;
</p></blockquote>
<p>What does this mean for our simple equation? Well, it looks like it&#8217;s missing a term, one we&#8217;ll call <em>f</em>, that describes the efficiency or <strong>friction</strong> of the interface between human and computer.</p>
<p>Quoting Kasparov again:</p>
<blockquote><p>
<em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em>
</p></blockquote>
<p>The implication being that the equation actually looks like this:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-variable-h.png'>
</div>
<p>So as the friction of the interface goes to zero, the full amplification of the chess engine is brought to bear.  A quick gut-check in the opposite direction agrees: one can imagine the world&#8217;s most powerful chess engine with the world&#8217;s worst interface; spending the time it would take to express commands to this theoretically awful program would actually be worse than playing without it.</p>
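<p>Kasparov&#8217;s observation can be reproduced with a few lines of Python. This is only a sketch: the equation above is an image, so the code <em>assumes</em> that friction enters as a divisor, a = h * c / f, which fits both the zero-friction limit and the gut-check just described. The function name and all of the numbers are hypothetical, chosen only to show the ordering.</p>

```python
def analytic_power(h, c, f):
    """Analytic power under the ASSUMED form a = h * c / f.

    h : power of the human, c : computing power, f : friction of interface.
    """
    if not f > 0:
        raise ValueError("friction must be positive")
    return h * c / f


# Hypothetical, unitless scores on the same chess engine (c = 5).
grandmaster_team = analytic_power(h=10.0, c=5.0, f=4.0)  # strong human, inferior process
amateur_team = analytic_power(h=4.0, c=5.0, f=1.0)       # weak human, better process

if __name__ == "__main__":
    print("grandmaster + machine + inferior process:", grandmaster_team)
    print("amateur + machine + better process:", amateur_team)
```

<p>With these made-up inputs the amateur team scores 20.0 against the Grandmaster team&#8217;s 12.5: a fourfold reduction in friction more than offsets a human who is less than half as strong.</p>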
<h2>Palantir: a low-friction interface to data</h2>
<p>As analysis problems go, chess resembles <a href="http://en.wikipedia.org/wiki/Spherical_cow">a spherical cow in a vacuum</a>.  Analysis problems in the real world are orders of magnitude messier.</p>
<p>Let&#8217;s reframe the terms of our equation above into a more general approach to analysis:</p>
<ul>
<li><em>H</em> &#8211; this is the power of the analyst.  In chess, the value of this term varies widely between players; in designing real-world data analysis systems, this is more or less a constant (which is why <em>h</em> above becomes <em>H</em> below).  Of course there are differing levels of expertise, training, and raw ability amongst the user population, but when we design systems, it&#8217;s with the average case in mind.</li>
<li><em>c</em> &#8211; computing power. How fast are the machines?  How well do they scale?  How efficiently do they perform the data tasks at hand? Palantir spends significant engineering effort on optimizing the <em>c</em> term, but most of the growth in this term comes from the layers we depend on, built by companies like Intel, Sun, Oracle, etc.</li>
<li><em>f</em> &#8211; friction.  How easy is it to bring <em>c</em> to bear on the problem? Note that when we talk about <em>friction of interface</em>, this is not exclusively referring to user interface.  More generally, friction can be present at any interface between two systems: data-software, software-software, human-software, etc. The <em>f</em> that we consider in this simple model is sum total system friction.</li>
</ul>
<p>So our final formulation is just in terms of <em>c</em> and <em>f</em> (holding <em>H</em> as a constant): </p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-final.png'>
</div>
<p>When we discuss friction in real-world analysis systems, the friction actually exists at multiple levels:</p>
<ol>
<li>Creating an analysis model that will enable answering the questions that need to be explored</li>
<li>Integrating the data into a single coherent view of the problem</li>
<li>Enabling analysis tools to efficiently query and load the data</li>
<li>Exposing APIs that allow developers to develop custom solutions quickly and efficiently for modeling and analysis tasks not covered by general tools</li>
<li>User interface that makes the tools easy, enjoyable, and quick to use</li>
</ol>
<h3>Minimizing <em>f</em>: Haiti Flooding Predictions</h3>
<p>If this is starting to sound very similar to Palantir&#8217;s marketing information, this is no accident. While some of our backend engineers are concerned with things like scaling and speed-of-querying, the overall innovation that we&#8217;re bringing to the field is not simply about faster data processing systems (even if they are) but reducing the friction at every interface inside a complex human-computer symbiotic system.</p>
<p>You want an example that ties it all together?  It starts with a simple question: which of the many displaced-person camps in Haiti are most at risk for flooding as the rainy season approaches?  Easy to ask, but not so simple to answer. </p>
<p>The original introduction to this video: </p>
<blockquote><p>As we enter the beginning of the rainy season in Haiti, one of the biggest problems facing relief organizations today is the spectre of flooding and mudslides destroying Internally Displaced Persons (IDP) Camps. In this video, we integrate data from many sources to determine high risk aid locations.
</p></blockquote>
<p>The data integration for this video took about six hours, using sources of data that had never before been fused.  The analysis itself takes a few minutes and quickly comes to an actionable answer to the original question.</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv"/></object>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Palantir: like an operating system for data analysis</title>
		<link>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/</link>
		<comments>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 03:21:44 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1198</guid>
		<description><![CDATA[If you&#8217;ve taken the time to peruse the Palantir Government analysis blog, you&#8217;ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client. It&#8217;s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide range [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/VisiCalc'><img src='/wp-content/uploads/2009/11/visicalc.png' width='250'/></a>
</div>
<p>If you&#8217;ve taken the time to peruse the Palantir Government <a href='http://www.palantirtech.com/government/analysis-blog'>analysis blog</a>, you&#8217;ve seen numerous examples of Palantir Government as applied to interesting problems; they are recorded screen captures of our analysis desktop client.  It&#8217;s a showcase of useful, meaningful, and compelling visual and semantic tools being used to do analysis on a wide range of datasets.</p>
<p>What enabled this analysis? Aside from the <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">obvious hard work of our UI and analysis tools teams</a>, it&#8217;s the flexibility and power of the Palantir data platform.  More than just a scalable datastore, the Palantir data platforms act as robust and clean abstractions on top of data.</p>
<p>One of the early architecture decisions that we made when building both <a href="http://www.palantirtech.com/government">Palantir Government</a> and <a href="http://www.palantirfinance.com/">Palantir Finance</a> was to separate the respective data platforms from the end-user applications used to actually perform analysis.  More than just following the client-server model, this separation made the data servers in both products into generic intelligence infrastructure for analytic problems, with our clients acting as analysis applications on top of those platforms.</p>
<p>And so, one way to look at our data platform is as an operating system for analytic applications.  In this post we&#8217;ll explore the history of operating systems, understand why they&#8217;re so important and see how the Palantir data servers deliver the same potential to revolutionize the writing of analysis software that operating systems did to the writing of general programs for computers.</p>
<p><span id="more-1198"></span></p>
<h2>The OS: abstraction that begat a paradigm</h2>
<p>In the early days of computing, when a programmer wanted to write a program, they had to understand the inner workings of the machine. Writing a program required understanding things like the bus interface of a specific model of hard drive when all that was needed by the program was the clean abstraction of a filesystem. The upshot of this is that much of the time and effort put into a given task was spent writing code to interface with the &#8220;physical&#8221; minutiae of the machine rather than implementing the solution to the problem that the programmer was trying to solve with their software.</p>
<p>This pattern was observed by <a href="http://en.wikipedia.org/wiki/J._C._R._Licklider">J.C.R. Licklider</a> and noted in his influential paper, <i><a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a></i> (emphasis added):</p>
<blockquote><p>
<b>About 85 per cent of my “thinking” time was spent getting into a position to think, to make a decision, to learn something I needed to know. Much more time went into finding or obtaining information than into digesting it.</b> Hours went into the plotting of graphs, and other hours into instructing an assistant how to plot. When the graphs were finished, the relations were obvious at once, but the plotting had to be done in order to make them so.<br />
…<br />
<b>Throughout the period I examined, in short, my “thinking” time was devoted mainly to activities that were essentially clerical or mechanical</b>: searching, calculating, plotting, transforming, determining the logical or dynamic consequences of a set of assumptions or hypotheses, preparing the way for a decision or an insight. <b>Moreover, my choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability.</b>
</p></blockquote>
<p>This description of his time as a researcher was echoed in the work of the early programmers: they spent much of their programming time re-inventing the wheel and writing routines that did essentially clerical or mechanistic work related to the functioning of the hardware rather than the core functions of their programs.</p>
<p>The operating system changed all that: suddenly (and by that I mean: with years of hard work, research, and incremental change) that noisy, inconsistent pile of hardware was transformed into a set of clean abstractions. The programmer was finally freed to spend time and energy on the problem they were really trying to solve.</p>
<p>And so we come to the modern era: dealing with the messy details of hardware has been replaced by the clean and robust abstraction of the operating system.</p>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Operating_system'><img src='/wp-content/uploads/2009/11/250px-operating_system_placementsvg.png' width='250'/></a>
</div>
<p>Three important properties of modern operating systems:</p>
<ul>
<li><b>Hard boundaries between OS functions and process functions</b> &#8211; in modern operating systems, this is usually accomplished with system calls.  The process places the inputs to the system call in a known location and then asks the OS to perform some operation, like writing to a file or making a network connection.  The OS may or may not perform the function, based on things like permissions, availability of resources, etc.
<p>The most important feature here is that the process never has direct access to the true resources of the machine &mdash; instead, all access to the machine&#8217;s resources is brokered by the OS.</p>
</li>
<li><b>Extensions of the abstraction in every direction</b> &#8211; An OS like Linux is really, at its core, a kernel that does process scheduling and lifecycle, manages memory, and services system calls. Everything else is handled by some sort of driver.  A driver might also be called, more generically, a plugin or extension.  Drivers exist for everything from block devices (like hard drives), network cards, and filesystems to input devices and displays.</li>
<li><b>Designed as a general purpose framework</b> &#8211; the operating system <i>doesn&#8217;t actually do any computing</i>; rather, it&#8217;s a set of services to facilitate processes using the resources of the computer.  To that end, they&#8217;re not designed with a specific process in mind, but rather to serve a large class of programs, each designed and written to accomplish a different task using a similar set of resources.</li>
</ul>
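<p>The system-call boundary described in the first property can be seen from any high-level language. Here is a minimal Python sketch using the standard <code>os</code> module, whose <code>os.open</code>, <code>os.write</code>, and <code>os.close</code> functions are thin wrappers over the corresponding operating system calls. The helper names <code>write_via_syscalls</code> and <code>try_open</code> are hypothetical, introduced only for illustration.</p>

```python
import os


def write_via_syscalls(path, payload):
    """Ask the OS to create a file and write bytes to it.

    The process never touches the disk itself; it only receives a file
    descriptor, an opaque handle that the OS maps to the real resource.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return os.write(fd, payload)  # number of bytes the OS accepted
    finally:
        os.close(fd)


def try_open(path):
    """Request read access; the OS may grant or refuse the request."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as err:
        # Missing file, insufficient permissions, exhausted descriptors...
        return "refused: " + str(err.strerror)
    os.close(fd)
    return "granted"
```

<p>Calling <code>try_open</code> on a path that does not exist returns a string beginning with <code>refused:</code>, which illustrates the point that the OS decides whether to perform the requested operation at all.</p>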
<h2>Analysis: the modern computing task</h2>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/ENIAC'><img src='http://upload.wikimedia.org/wikipedia/commons/archive/4/4e/20050923152626!Eniac.jpg' width='250'/></a></div>
<p>One of the first general-purpose electronic computers, <a href="http://en.wikipedia.org/wiki/ENIAC">ENIAC</a>, was conceived to do calculation of ballistics tables for artillery pieces &mdash; it was a glorified calculator. Lacking anything even resembling an operating system, it would just run its program. Its compiler? A group of six women who would configure the machine by hand with the program logic.  The input for its first test run, a calculation related to the hydrogen bomb project, was approximately <i>one million punch cards</i>.</p>
<p>Times have changed: 40 or so years of the unrelenting march of Moore&#8217;s Law have given us something like an <b><a href="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/PPTMooresLawai.jpg/596px-PPTMooresLawai.jpg">eight-order-of-magnitude increase</a></b> in the amount of computing power available per unit cost.  Coupled with similar, <a href="http://www.kk.org/thetechnium/archives/2009/07/was_moores_law.php">more recent gains in storage capacity and network bandwidth</a>, this has produced a world awash in data, <a href='http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/'>crying out for analysis.</a></p>
<p>So the situation today is that we now expect to bring these considerable computing resources to bear on larger, more complex problems in the world.  I&#8217;m talking about things like the <a href="http://www.palantirtech.com/government/analysis-blog/traceback">spread of food-borne illnesses</a>, understanding the connection between genes and protein expression, <a href="http://www.palantirtech.com/government/analysis-blog/sinjar">understanding terrorist networks</a>, <a href="http://www.palantirtech.com/government/analysis-blog/uncovering-a-bot-net-exploring-router-data-using-palantir">finding botnets in network traffic logs</a>, and <a href="http://www.palantirtech.com/government/analysis-blog/transparency">exploring influence networks in government</a>.</p>
<p>These problems, while spanning widely disparate areas of analysis, share some common traits:</p>
<h3>The data is spread out</h3>
<p>They are described by multiple data sources. Just to make things more interesting: the data sources don&#8217;t agree on their native representations of the real-world data. And finally, the real-world objects that the data are describing are actually described in multiple data sources, with no single source giving a complete and accurate representation.</p>
<h3>The data schemas are not human-conceptual</h3>
<p>Rather than representing the data in some schema that maps easily into how the experts on a given problem think about said problem, the data stores in question tend to model data in whatever way was convenient for the creators of that particular data store. Put another way: people don&#8217;t think in tables, rows, columns, and XML snippets.  These first-class data storage elements don&#8217;t usually map to real-world objects.</p>
<h3>The data is sensitive</h3>
<p>Whether it&#8217;s patient information, <a href="http://www.palantirtech.com/government/analysis-blog/horizon">mortgage data</a>, a law enforcement investigation, or sensitive foreign intelligence, there is often the need for <a href="http://www.palantirtech.com/government/analysis-blog/mls">foolproof access controls on the data</a>.</p>
<h2>Palantir: an operating system-class abstraction for analysis</h2>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'><img src='http://blog.palantirtech.com/wp-content/uploads/2009/01/shot0016.png' width='250'/></div>
<p>A Palantir data server provides a class of services similar to those of an operating system, but focused on the specific needs of analytic tasks.  Here I&#8217;ll focus on the model used by Palantir Government; Palantir Finance uses a related but significantly different approach to delivering these services.</p>
<p>As you might imagine, however, they both start at a somewhat higher level than punch cards.</p>
<h3>It starts with an ontology</h3>
<p>The Palantir approach to analysis begins with a task-specific ontology: essentially, a human-conceptual description of the real-world problem that&#8217;s being analyzed.</p>
<p>It&#8217;s roughly composed of three pieces:</p>
<ul>
<li>A hierarchical type system of the real-world objects that human experts use to think about this problem. We call these <i>PTObjects</i>, short for &#8220;Palantir Objects&#8221;.</li>
<li>A type system of properties that will contain the data describing these PTObjects.  PTObjects are essentially typed containers for properties. This is where most of the detail of the ontology lies.</li>
<li>A type system of possible relationships between different types of PTObjects.</li>
</ul>
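<p>To make the three pieces concrete, here is a minimal sketch in Python. Every name in it (<code>ObjectType</code>, <code>PropertyType</code>, <code>RelationshipType</code>, and the tiny example ontology) is made up for illustration; it is not the actual Palantir ontology format, just one way the same three type systems could be modeled.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectType:
    """A node in the hierarchical type system of real-world objects."""
    name: str
    parent: Optional["ObjectType"] = None

    def is_a(self, other):
        """True if this type is `other` or descends from it."""
        current = self
        while current is not None:
            if current == other:
                return True
            current = current.parent
        return False


@dataclass(frozen=True)
class PropertyType:
    """A typed slot for the data that describes an object."""
    name: str
    value_type: type


@dataclass(frozen=True)
class RelationshipType:
    """An allowed link between two kinds of objects."""
    name: str
    source: ObjectType
    target: ObjectType


# A tiny, hypothetical ontology: two object types under a common root,
# one property type, and one relationship type.
entity = ObjectType("Entity")
person = ObjectType("Person", parent=entity)
organization = ObjectType("Organization", parent=entity)
date_of_birth = PropertyType("date_of_birth", str)
employed_by = RelationshipType("employed_by", source=person, target=organization)
```

<p>The point of the sketch is that the ontology is plain data: a server can take it as input without knowing anything about people or organizations.</p>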
<p>Within the ontology, there are numerous extension points that allow the customization of how data is imported, retrieved, and displayed (following the principle of <i>extending the abstraction in all directions</i>).</p>
<p>The data server takes the ontology as input and is agnostic to its content. This is where the principle of <i>building a general purpose framework</i> comes into play.</p>
<h3>The data sources are mapped into the ontology</h3>
<p>This part of the Palantir data server is a pattern that is very similar to an operating system&#8217;s notion of block device drivers. The difference? Instead of low-level storage systems like hard drives, we&#8217;re dealing with complex databases describing the problem at hand.</p>
<p>In an operating system, every block device can read and write blocks of data.  In the Palantir data server, everything becomes a source of PTObjects.</p>
<p>Our data importer plugins, by analogy, fulfill the same role as a block device driver:<br />
we build glue code to map the data source&#8217;s schema into the ontology and the connectors to surface the data itself wrapped up in PTObjects.</p>
<h3>The data are composed into real-world objects</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='/wp-content/uploads/2009/11/pg-object-model.jpg'><img src='/wp-content/uploads/2009/11/pg-object-model.jpg' width='250'/></a>
</div>
<p>Part of this mapping is composing real-world objects into composite PTObjects by resolving PTObjects together.</p>
<p>The operation of resolving is pretty straightforward: we basically union the properties of the two PTObjects into a new PTObject. The end result is a single PTObject that completely represents all the data about something in the real-world from all the available data sources.</p>
<p>As we do this composition, we keep track of where each property came from, down to the record level, in each of its original sources.  (Note that most composed PTObjects will have at least one property that comes from two sources.)  Preserving the original identity of every atom of data allows us to later decompose these PTObjects into their constituent parts or, more importantly, censor a client&#8217;s view based on what permissions they have for each of the original data sources.</p>
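<p>Here is a minimal sketch of that idea in Python. It assumes a PTObject can be modeled as a set of property &#8220;atoms&#8221;, each carrying its source and record id; the names <code>Prop</code>, <code>resolve</code>, <code>decompose</code>, and <code>censor</code> and the sample data are hypothetical, not the real server code.</p>

```python
from collections import namedtuple

# One "atom" of data: a property value plus the source and record it came from.
Prop = namedtuple("Prop", ["name", "value", "source", "record_id"])


def resolve(a, b):
    """Union the properties of two objects into a new composite object."""
    return frozenset(a) | frozenset(b)


def decompose(obj):
    """Split a composite object back into one fragment per original source."""
    fragments = {}
    for prop in obj:
        fragments.setdefault(prop.source, set()).add(prop)
    return fragments


def censor(obj, allowed_sources):
    """Keep only the properties whose source the client may read."""
    return frozenset(p for p in obj if p.source in allowed_sources)


# Two made-up sources describing the same person.
crm = {Prop("name", "Jane Doe", "crm", 17), Prop("phone", "555-0100", "crm", 17)}
dmv = {Prop("name", "Jane Doe", "dmv", 902), Prop("plate", "ABC123", "dmv", 902)}

person = resolve(crm, dmv)
crm_only_view = censor(person, {"crm"})
```

<p>Because the name is kept once per source rather than merged into a single value, a client with access to only one source still sees the name, and the composite can always be taken apart again.</p>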
<p>This is a fundamental operation in our system that doesn&#8217;t have an exact analog in operating systems. It&#8217;s sort of similar to taking multiple filesystems and mounting them inside a virtual filesystem tree, like Unix does.  However, if each data source is like a filesystem, what we&#8217;re doing is essentially composing individual files from their fragments stored on multiple block devices.</p>
<p>Another analogy: at a level below the block device in the OS, this is also sort of similar to what a <a href="http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_0">RAID0</a> device does, the difference being that our composition is based on the contents of the data itself rather than some previously applied, content-agnostic decomposition function.  The other difference is motivation: RAID0 stripes data for performance, while Palantir composes data to make it correspond to the real-world objects it represents.</p>
<h3>The server exposes Palantir &#8220;system calls&#8221;</h3>
<p>The interface that the Palantir data server exposes can be boiled down to two essential operations:</p>
<ul>
<li>The client can download copies of PTObjects from the server.  It may request them by id or perform some sort of search/query to specify a set of PTObjects.  This is roughly analogous to the <b><a href="http://en.wikipedia.org/wiki/Open_%28system_call%29">open()</a></b> and <b><a href="http://comsci.liu.edu/~murali/unix/read.htm">read()</a></b> system calls on Unix.
<p>Note that each client only sees the subset of properties for a given PTObject that it is authorized to see.  This censorship of full PTObjects into projected slices is done by the server on every load of PTObjects.</p></li>
<li>The client can send new or updated PTObjects to the data server for storage. This is roughly analogous to the <b><a href="http://www.freebsd.org/cgi/man.cgi?query=write&#038;sektion=2&#038;manpath=FreeBSD+7.2-RELEASE">write()</a></b> system call in Unix. It, of course, entails a check as to whether the given client has permission to write to the given PTObject.</li>
</ul>
<p>The server&#8217;s responsibility is the same as the operating system&#8217;s: only let the client do what it has been granted permission to do.  In an operating system, the OS uses hardware features like <a href="http://en.wikipedia.org/wiki/Protected_mode">protected mode</a> to keep lower-privileged processes from accessing machine resources. Palantir uses network calls to achieve the same separation, by placing the client and server on different logical machines.  The effect is the same: the client basically requests (rather than commands) that certain operations be performed by the server.  The server uses its own rules to decide if the access or change is allowed and responds accordingly. And so the principle of <i>hard boundaries</i> is implemented.</p>
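<p>As a rough illustration of the request-rather-than-command idea, here is a toy in-process sketch in Python. The class <code>DataServer</code>, its <code>grant</code>, <code>load</code>, and <code>store</code> methods, and the permission model (read access per source, write access per object) are all assumptions made for the example; the real server sits behind network calls and has a far richer access control model.</p>

```python
class DataServer:
    """Toy server: clients request operations, the server decides."""

    def __init__(self):
        self._objects = {}    # object id to a set of (name, value, source) tuples
        self._readable = {}   # client to the set of sources it may read
        self._writable = {}   # client to the set of object ids it may write

    def grant(self, client, read_sources=(), write_ids=()):
        self._readable.setdefault(client, set()).update(read_sources)
        self._writable.setdefault(client, set()).update(write_ids)

    def load(self, client, object_id):
        """Like open()/read(): return only the properties the client may see."""
        allowed = self._readable.get(client, set())
        stored = self._objects.get(object_id, set())
        return {prop for prop in stored if prop[2] in allowed}

    def store(self, client, object_id, props):
        """Like write(): refuse unless the client may write this object."""
        if object_id not in self._writable.get(client, set()):
            raise PermissionError(f"{client} may not write {object_id}")
        self._objects[object_id] = set(props)


server = DataServer()
server.grant("alice", read_sources={"crm", "dmv"}, write_ids={"person-1"})
server.grant("bob", read_sources={"crm"})
server.store("alice", "person-1", {
    ("name", "Jane Doe", "crm"),
    ("plate", "ABC123", "dmv"),
})
bob_view = server.load("bob", "person-1")  # bob only sees the "crm" slice
```

<p>The client never touches <code>_objects</code> directly; every read is filtered and every write is checked, which is the whole point of the hard boundary.</p>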
<h3>The clients do the analysis</h3>
<p>When an operating system yields to a process, that&#8217;s the time when the true processing begins.  By the same token, in Palantir, it&#8217;s not until a client connects and starts searching, visualizing, and manipulating PTObjects that analysis actually starts taking place (even if the server is doing a lot of the heavy lifting).</p>
<h2>The wide open future</h2>
<p>So why is this exciting?  I&#8217;m glad you asked!</p>
<h3>It&#8217;s about taking analysis to the next level.</h3>
<p>Let&#8217;s say you&#8217;re someone who wants to write an analytic task. Let me ask you a series of rhetorical questions:</p>
<ul>
<li>Do you want to start with three disparate sources of data or with the data already mapped into a Palantir data server?</li>
<li>Which one is a better use of your time as a programmer?</li>
<li>Which one allows you to not repeat mistakes that other programmers have already made and fixed?</li>
<li>Which one is more like writing a program than an operating system?</li>
</ul>
<p>Operating systems took us to a new level of expressiveness when it came to writing computing processes to run on computing hardware. They inverted that 85/15 ratio that Licklider talked about so that programmers spent more time writing the code that did the thing they were trying to create and less time mucking around with hardware.</p>
<p>More programmer time == better analytic tasks.</p>
<h3>It&#8217;s about making machine learning easier.</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Skynet_%28Terminator%29'><img src='http://images1.wikia.nocookie.net/terminator/images/8/8a/Cyberdyne_logo.jpg' width='250'/></a>
</div>
<p>Now consider machine learning as a field.  Pretty much every machine learning task could benefit from starting with its data in something that looks like a Palantir data server.  I&#8217;ve taken an informal survey of machine learning researchers and they agree: the 85/15 ratio still holds for machine learning.</p>
<p>Simply put: <b>most of the time and effort in machine learning is spent getting the data into a form that you can actually apply an algorithm to!</b> Now imagine if the starting point for that was a Palantir data server &mdash; now the machine learning implementer has a world of expressiveness open to them and time and energy are spent on the task at hand instead of the overhead of messing with the data.</p>
<p>Now, we don&#8217;t think that we&#8217;re building Skynet.  Quite the contrary: we believe that platforms like the one we&#8217;ve built will allow machine learning techniques to be put in the hands of experts to augment their ability to look at the world and come to conclusions about complex real-world problems by asking questions of the data we&#8217;ve collected. It&#8217;s about <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">Intelligence Augmentation</a>, which can use machine learning techniques and algorithms to build better tools, not creating <a href="http://en.wikipedia.org/wiki/Strong_AI">Strong AI</a>.</p>
<h3>It&#8217;s about creating new markets</h3>
<p>Let&#8217;s go back to the well of operating systems and look back at the history of MS-DOS: the first &#8220;killer&#8221; application on MS-DOS was <a href="http://en.wikipedia.org/wiki/VisiCalc">VisiCalc</a> (that screenshot at the top of this post), a text-based spreadsheet.  As you know, VisiCalc was not the end of the story but just the introduction. MS-DOS, which evolved into Windows, gave application writers an (arguably) clean abstraction on top of commodity hardware in order to build the applications that users actually wanted. Today, we have things like web browsers, multimedia authoring software, virtual machines, and IDEs built on top of what is, essentially, the same set of abstractions that VisiCalc was built on.</p>
<p>However, the most important thing to note is that VisiCalc is credited with creating the market for commercial operating systems &#8212; businesses needed VisiCalc so they paid Microsoft for MS-DOS (and IBM for a PC).  Without VisiCalc, there was no market for MS-DOS (most people, unsurprisingly, didn&#8217;t want to buy a <a href="http://en.wikipedia.org/wiki/Microsoft_BASIC">BASIC interpreter</a>).</p>
<p>We&#8217;re in the business of selling software and we agree with our customers: the Palantir approach has tremendous value.  We&#8217;ve just started tapping the potential of this market.  Think about what Oracle looked like in 1979, think what Microsoft looked like in 1980 &mdash; that&#8217;s Palantir in 2009.</p>
<h3>It&#8217;s about the start of the analysis age</h3>
<div style='float: right; text-align: right; margin-right: 15px; margin-left: 15px'>
<a href='http://en.wikipedia.org/wiki/Information_Age'><img src='http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Internet_map_1024.jpg/600px-Internet_map_1024.jpg' width='250'/></a>
</div>
<p>It can be argued that the operating system is the innovation that ushered in the &#8220;<a href="http://en.wikipedia.org/wiki/Information_Age">information age</a>&#8221;.  Without the operating system, there is no software explosion, which allows computing technology to actually be used on data in the world.</p>
<p>We think that we&#8217;re on the cusp of the analysis age, as imagined by <a href="http://en.wikipedia.org/wiki/Vernor_Vinge">Vernor Vinge</a> in <u><a href="http://books.google.com/books?id=SrLwPdBJodMC&#038;dq=rainbow%27s+end&#038;printsec=frontcover&#038;source=bn&#038;hl=en&#038;ei=TdX0Sui9HsTh8AbGlc3zCQ&#038;sa=X&#038;oi=book_result&#038;ct=result&#038;resnum=5&#038;ved=0CBsQ6AEwBA#v=onepage&#038;q=&#038;f=false">Rainbows End</a></u>.  It was something foreseen by Licklider in 1960, albeit with a timeline that was off by at least a few decades:</p>
<blockquote><p>
“…it seems worthwhile to avoid argument with (other) enthusiasts for artificial intelligence by conceding dominance in the distant future of cerebration to machines alone. There will nevertheless be a fairly long interim during which the main intellectual advances will be made by men and computers working together in intimate association. A multidisciplinary study group, examining future research and development problems of the Air Force, estimated that it would be 1980 before developments in artificial intelligence make it possible for machines alone to do much thinking or problem solving of military significance. That would leave, say, five years to develop man-computer symbiosis and 15 years to use it. The 15 may be 10 or 500, but those years should be intellectually the most creative and exciting in the history of mankind.”
</p></blockquote>
<p>It&#8217;s a golden age of analysis and we&#8217;re just getting started: we&#8217;ve got a lot of work to do, so if this sort of thing excites you, please <a href='http://www.palantirtech.com/careers/culture'>come and join us.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Model Resolution in Palantir Finance: avoiding N²</title>
		<link>http://blog.palantirtech.com/2009/02/02/model-resolution-in-palantir-finance/</link>
		<comments>http://blog.palantirtech.com/2009/02/02/model-resolution-in-palantir-finance/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 20:00:49 +0000</pubDate>
		<dc:creator>Andy Aymeloglu</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[problemspace - finance]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=180</guid>
		<description><![CDATA[One of the big challenges in Palantir Finance comes when integrating data from multiple data providers. When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers. Each data provider defines a set of “models” [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; width: 275px; text-align: center;"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/7/73/Complete_graph_K8.svg/600px-Complete_graph_K8.svg.png" alt="" width="260" /><br />
<em>N</em><sup>2</sup>, with <em>N</em> = 8</div>
<p>One of the big challenges in Palantir Finance comes when integrating data from multiple data providers.  When the server is launched, it needs to create a coherent model of the financial world based on data coming from potentially dozens of data providers.  Each data provider defines a set of “models” that it supports.  These models can be things like equities, currencies, futures, options, or even new types that the providers themselves define.</p>
<p>The major challenge occurs when multiple providers define models that represent the same real-world entity.  Provider <em>A</em> might know about Google, have basic open/high/low/close data for the stock, and know its ticker, country, and <a href="http://en.wikipedia.org/wiki/International_Securities_Identifying_Number">ISIN</a>.  Provider <em>B</em> might also provide a Google model, have balance sheet data, and know its country, exchange, and ISIN.  We want to expose only one Google model to the user, however, and so we need a means of <a href="http://en.wikipedia.org/wiki/Identity_resolution">resolving</a> the two Googles together – recognizing that they’re the same instrument – and adding just one equity to the system that encompasses both.</p>
<p>Resolution logic can be fairly complicated.  For equities, for example, there are several different ways in which resolution can take place.  If two equities have identical ISINs, we can be pretty confident they match, since those identifiers are declared as globally unique.  If two equities have the same ticker and the same country of exchange, we might also consider that a match, though perhaps of weaker quality.  Two models resolve to each other if any form of resolution considers them equal, with errors being thrown if other forms of resolution contradict the form that considers them equal (for example, provider <em>A</em> and provider <em>B</em> agree on an instrument’s ISIN but disagree on its ticker).</p>
<p>Read on for the details of how we solve this seemingly <a href="http://en.wikipedia.org/wiki/Analysis_of_algorithms"><em>n</em><sup>2</sup></a> problem with a linear solution.<br />
<span id="more-180"></span><br />
Given <em>N</em> models across providers of a given asset class (say, equities), there are on the order of <em>N</em><sup>2</sup> potential checks (<em>N</em>(<em>N</em> - 1)/2 distinct pairs) that I need to do to properly “resolve” all models, since any model can resolve to any other model in the system (and I potentially do want to attempt to resolve a model from provider <em>A</em> to other models from provider <em>A</em> to do error checking, since I may consider it invalid for a provider to provide the same model twice).  Obviously we would like to do better than this, and we can, assuming that most models do not resolve to each other.</p>
<p>Envision the set of all <em>(model, provider)</em> pairs as the set of nodes on a graph.  Two models from different providers that resolve to each other can be represented by an edge between two nodes in the graph.  If the number of providers <em>k</em> is small relative to <em>N</em>, the number of resolution forms for a given asset class is small, and our data is valid, we can come up with an algorithm that solves our problem in <em>O</em>(<em>N</em>) time as follows:</p>
<ol>
<li>For every form of resolution, ask the data providers for all the data necessary for resolution to take place.  For ticker/country resolution, with our data provider interfaces, this gives us a map from every <em>(model, provider)</em> pair to its ticker and country.</li>
<li>We can then invert this map, giving us a map from <em>(ticker, country)</em> pairs to a set of <em>(model, provider)</em> pairs.  Note that the values in the inverted map do have to be sets, since there can be multiple <em>(model, provider)</em> pairs with the same ticker and country (indeed, this is expected if ANY models can be resolved between providers).</li>
<li> Then, for every model, for each resolution form, we can look up the relevant properties for that model, and then look up in the inverse map any models that are equivalent to it.  This tells us what edges to add to our <em>(model, provider)</em> graph.</li>
</ol>
<p>We&#8217;re essentially building up an in-memory, inverted index of the relevant data each model is giving us.  The amortized <em>O</em>(1) lookups that the hashtable-backed maps provide allow us to trade the <em>O</em>(<em>N</em><sup>2</sup>) complexity for something more like <em>O</em>(<em>N</em>).</p>
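<p>The three steps can be sketched in a few lines of Python for a single resolution form. This is an illustration under assumptions, not our provider code: the function name <code>resolution_edges</code>, the shape of the input map, and the sample tickers are all made up, and a <code>None</code> key stands in for a provider that supplies no data for this form.</p>

```python
from collections import defaultdict


def resolution_edges(keys_by_node):
    """Find the pairs of nodes that share a resolution key.

    keys_by_node maps each (model, provider) node to its key for one
    resolution form, e.g. (ticker, country), or None if the provider
    has no data for that form.
    """
    # Step 2: invert the map, key to the set of nodes that carry it.
    inverted = defaultdict(set)
    for node, key in keys_by_node.items():
        if key is not None:
            inverted[key].add(node)

    # Step 3: for every node, look up its key and link it to its matches.
    edges = set()
    for node, key in keys_by_node.items():
        if key is None:
            continue
        for other in inverted[key]:
            if other != node:
                edges.add(frozenset((node, other)))
    return edges


# Step 1 (stubbed): what the providers reported for ticker/country.
ticker_country = {
    ("goog", "A"): ("GOOG", "US"),
    ("google-inc", "B"): ("GOOG", "US"),
    ("msft", "A"): ("MSFT", "US"),
    ("mystery", "B"): None,
}
edges = resolution_edges(ticker_country)
```

<p>Each bucket in the inverted map holds roughly one node per provider when the data is valid, so the inner loop does an amount of work bounded by <em>k</em> rather than by <em>N</em>.</p>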
<div style="float: left; width: 200px; text-align: center; margin-right: 15px;"><img src="http://blog.palantirtech.com/wp-content/uploads/2009/02/disconnected-k-clusters.png" alt="" width="200" /><br />
<em>N</em> checks, rather than <em>N</em><sup>2</sup> (assuming that <em>k</em> is trivial compared to <em>N</em>).</div>
<p>Once we’ve done this for every model, each <a href="http://en.wikipedia.org/wiki/Connected_component_(graph_theory)">connected component</a> of our graph should correspond to one model to be added into our final system.  Since the connected components of a graph can be computed in time linear in the number of nodes and edges, and the number of edges stays small when most models do not resolve to each other, we can compute all the final models in linear time.  And what is nice is that the maps give us the ability to quickly post-process our data to look for errors.  If any two models in a given connected component come from the same provider, this is an error (either the provider has incorrect data, or it is modeling the data improperly).  If two models from two different providers resolve, but have conflicting data for a given resolution form, this is also an error.  Note that since providers do not have to provide data for every resolution form, it is possible that <em>k</em> models from different providers that resolve together do not form a <a href="http://en.wikipedia.org/wiki/Clique_(graph_theory)"><em>k</em>-clique</a> on the graph.</p>
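<p>A sketch of this last stage, again in Python and again with hypothetical names (<code>connected_components</code>, <code>duplicate_providers</code>): a breadth-first search groups the <em>(model, provider)</em> nodes into components, and a simple count flags any component in which one provider appears twice. The sample graph is a chain rather than a clique, to show that the components still come out right.</p>

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into connected components with a breadth-first search."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


def duplicate_providers(component):
    """Providers that contribute more than one model to a component."""
    counts = defaultdict(int)
    for _model, provider in component:
        counts[provider] += 1
    return {provider for provider, count in counts.items() if count > 1}


nodes = [("goog", "A"), ("google-inc", "B"), ("googl", "C"), ("msft", "A")]
# A resolves to B (say, by ISIN) and B to C (say, by ticker/country),
# but A and C share no resolution form: a chain, not a 3-clique.
edges = [
    (("goog", "A"), ("google-inc", "B")),
    (("google-inc", "B"), ("googl", "C")),
]
components = connected_components(nodes, edges)
errors = [duplicate_providers(c) for c in components]
```

<p>Every node is enqueued once and every edge is examined twice, which is where the linear bound comes from.</p>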
<p>Writing data providers is not always easy.  There are many data sources out there that are messy, and properly modeling real world data in code can be quite challenging.  That’s why it is important to come up with sound, efficient resolution logic that fails noisily, and tells the engineer building the provider when they are and are not playing nicely with the rest of the system.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/02/02/model-resolution-in-palantir-finance/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using Palantir to implement the TARP</title>
		<link>http://blog.palantirtech.com/2009/01/22/tarp/</link>
		<comments>http://blog.palantirtech.com/2009/01/22/tarp/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 09:27:41 +0000</pubDate>
		<dc:creator>Alex Fishman</dc:creator>
				<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=170</guid>
		<description><![CDATA[We talk often with our contacts in finance and intelligence, and an increasingly common subject is the U.S. Government&#8217;s Troubled Asset Relief Program (TARP, part of the Treasury Department). Our friends see the large problems facing the TARP and the Federal Reserve, and have been asking how our technology can help. Some of the [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 226px; margin-left: 15px'><a href="http://www.treas.gov/initiatives/eesa/"><img src='http://www.ustreas.gov/education/fact-sheets/images/new_treas_seal.gif'/></a></div>
<p>We talk often with our contacts in finance and intelligence, and an increasingly common subject is the U.S. Government&#8217;s Troubled Asset Relief Program (<a href="http://www.treas.gov/initiatives/eesa/">TARP</a>, part of the Treasury Department). Our friends see the <a href="http://en.wikipedia.org/wiki/Troubled_Assets_Relief_Program#Purpose">large problems</a> facing the TARP and the Federal Reserve, and have been asking how our technology can help.</p>
<p>Some of the problems are out of our hands, but many others are solvable with the proper analytics. Taking a closer look at the task before TARP, we noticed that many challenges mirror those facing the intelligence community:</p>
<ul>
<li>Entity and relationship <strong>data</strong> is scattered across many sources in a <strong>wide variety of formats</strong>; some are <strong>structured</strong>, some are <strong>unstructured</strong>.</li>
<li>Entity structure and relationships are <strong>not always known upfront</strong>, so the solution must <strong>adapt to new data structures</strong> on the fly.</li>
<li>It is costly, time-consuming, and <strong>unnecessary to impose one structure</strong> on the entire industry.</li>
<li><strong>Scalability</strong> is a must: millions of mortgages have been securitized into hundreds of thousands of entities.</li>
<li>Sensitive, private data requires <strong>sophisticated access control and knowledge management</strong> &#8212; understanding who is accessing which data, what the organization knows, when it was known, and how it was discovered.</li>
<li>Specialists from different fields and geographical regions must be able to <strong>collaborate effectively</strong>.</li>
</ul>
<p>Palantir&#8217;s technology already solves these problems for the intelligence community. Our dynamic ontology makes it easy to import TARP data and entities, so we&#8217;ve created a short video using Palantir that shows the power of our approach. We analyze individual mortgage loans, mortgage-backed securities comprising these loans, and institutions holding <a href="http://en.wikipedia.org/wiki/Tranche">tranches</a> of the securities:</p>
<div style='postimg'>
<a href="http://www.palantirtech.com/government/videos/mbs/"><img src="http://blog.palantirtech.com/wp-content/uploads/2009/01/shot0016.png"/></a>
</div>
<p>For more detail on the similarities, click the link to see a detailed breakdown of intelligence vs. TARP workflows.</p>
<p><span id="more-170"></span></p>
<h2>Workflows</h2>
<p>The types of questions the TARP and the Federal Reserve need to answer successfully are similar to those in the intelligence community. In essence, TARP is performing the sort of analysis performed at intelligence agencies: making sense of large amounts of data to create a coherent and accurate picture of the world. TARP is performing analysis on domestic financial data rather than global intelligence data, and using those insights to craft solutions to the current financial crisis. Broken down along the same broad lines, the two sets of workflows compare as follows:</p>
<h3>Strategic: Mission Planning and Policy Design</h3>
<table>
<tr>
<th>Classical Intel</th>
<th>TARP</th>
</tr>
<tr>
<td>
<ul>
<li>How have nation-states’ methods of supporting terrorist organizations evolved over the last 10 years?</li>
<li>How has deploying more troops to specific hot spots affected the overall level of violence in those areas?</li>
<li>What types of surrogate forces should be recruited and trained to support missions across theater?</li>
</ul>
</td>
<td>
<ul>
<li>Which institutions will require intervention and what markets are they most exposed to?</li>
<li>Which geographical regions and communities most urgently need federal support?</li>
<li>Which asset classes and types of mortgages should be purchased first?</li>
</ul>
</td>
</tr>
</table>
<h3>Operational: Asset Class Level Management and Tactical Planning</h3>
<table>
<tr>
<th>Classical Intel</th>
<th>TARP</th>
</tr>
<tr>
<td>
<ul>
<li>What known terrorist cells are present in a given region and what is the most effective way to combat them based on their ideology?</li>
<li>What are the various touch points for these organizations’ logistical networks and what measures have proved effective in dismantling them in the past?</li>
<li>
How can we measure the efficacy of various actions against the objectives through observable phenomena, including communications, financial information, and human source collection?</li>
</ul>
</td>
<td>
<ul>
<li>What are the characteristics of loans most likely to default in Florida and what is the best strategy for preventing foreclosure?</li>
<li>Which players were most involved in originating commercial loans in Florida? What tactics were used to justify appraisals, and how can these tactics be adjusted for?</li>
<li>What policy for mortgage adjustment yields the fairest outcome in Palm Springs, Florida?</li>
</ul>
</td>
</tr>
</table>
<h3>Tactical: Asset Targeting, Program Implementation, Specific Action Support</h3>
<table>
<tr>
<th>Classical Intel</th>
<th>TARP</th>
</tr>
<tr>
<td>
<ul>
<li>What times are most likely for a patrol to be attacked in this neighborhood?  What methods are used during the day vs. the night?</li>
<li>Which repercussions are likely to occur as a result of arresting a specific individual? What organizations is this person associated with and who is likely to retaliate?</li>
<li>Which human sources are likely to be able to provide actionable intelligence to move against the time-sensitive target?</li>
</ul>
</td>
<td>
<ul>
<li>What is the notional size of <a href="http://en.wikipedia.org/wiki/Credit_default_swap">credit default swaps</a> written on this tranche of this <a href="http://en.wikipedia.org/wiki/Commercial_mortgage-backed_security">commercial MBS</a>?  Which banks are the major holders, and how have their asset ratings changed?</li>
<li>Who originated this loan, and how close are the <a href="http://en.wikipedia.org/wiki/Comparables">comparables</a> used in the due-diligence report?</li>
<li>Who is the servicer for this mortgage, and which branch needs to be contacted if the size of the loan is adjusted down?</li>
</ul>
</td>
</tr>
</table>
<h2>Mission</h2>
<p>We believe that the TARP&#8217;s success is critical to the global financial markets and the health of our nation. We&#8217;ve said from the beginning that our mission is to change the way the world approaches data, and today Palantir is a technology leader in both intelligence and finance. As we begin work on this new challenge we&#8217;re excited to be making a difference where it&#8217;s needed most.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/01/22/tarp/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Hal Varian: analysis is the long-term value play</title>
		<link>http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/</link>
		<comments>http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/#comments</comments>
		<pubDate>Tue, 18 Mar 2008 20:00:34 +0000</pubDate>
		<dc:creator>Bob McGrew</dc:creator>
				<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2008/02/28/why-hal-varian-thinks-palantir-is-a-great-idea/</guid>
		<description><![CDATA[Raw data is an increasingly abundant and inexpensive commodity. Intelligently filtering, analyzing and visually understanding data is where the value is. Palantir invents technology and products that enable human analysts to harness the power of computers in an intuitive way to quickly and deeply analyze large amounts of data. The value of data analysis as [...]]]></description>
			<content:encoded><![CDATA[<p>Raw data is an increasingly abundant and inexpensive commodity. Intelligently filtering, analyzing and visually understanding data is where the value is.  Palantir invents technology and products that enable human analysts to harness the power of computers in an intuitive way to quickly and deeply analyze large amounts of data.</p>
<p><a href="http://freakonomics.blogs.nytimes.com/2008/02/25/hal-varian-answers-your-questions/#more-2345">The value of data analysis as a career was recently emphasized by Hal Varian in the Freakonomics blog in The New York Times</a>. <a href="http://people.ischool.berkeley.edu/~hal/">Hal</a> is an internationally known economist who is currently serving as Google’s Chief Economist while on leave from his three professorships at the University of California at Berkeley. </p>
<blockquote><p>Q: Your job sounds extremely interesting. What jobs would you recommend to a young person with an interest, and maybe a bachelors degree, in economics?</p>
<p>A: If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. <strong>So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on. <em>[emphasis added]</em></strong></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/03/18/why-hal-varian-thinks-palantir-is-a-great-idea/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Palantir: so what is it you guys do?</title>
		<link>http://blog.palantirtech.com/2007/12/04/what-do-we-do/</link>
		<comments>http://blog.palantirtech.com/2007/12/04/what-do-we-do/#comments</comments>
		<pubDate>Tue, 04 Dec 2007 08:01:18 +0000</pubDate>
		<dc:creator>Kevin Simler</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[palantirtech]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/12/04/what-do-we-do/</guid>
		<description><![CDATA[I often ask candidates if they&#8217;re familiar with what we do at Palantir. Most people think they are. &#8220;Oh, you&#8217;re that data viz. company,&#8221; or, worse, &#8220;You guys do data mining, right?&#8221; At least they&#8217;ve heard of us and at least they&#8217;re on the right track, but I cringe anyway. We aren&#8217;t just a &#8220;data [...]]]></description>
			<content:encoded><![CDATA[<p>I often ask candidates if they&#8217;re familiar with what we do at Palantir.  Most people think they are.  &#8220;Oh, you&#8217;re that data viz. company,&#8221; or, worse, &#8220;You guys do data mining, right?&#8221;  At least they&#8217;ve heard of us and at least they&#8217;re on the right track, but I cringe anyway.  We aren&#8217;t just a &#8220;data visualization&#8221; company and we don&#8217;t do &#8220;data mining.&#8221;  It&#8217;s almost impossible to convey the scope and complexity of what we do in a few short minutes&#8212;or to do so without taking the conversation to an eye-glazing level of abstraction.</p>
<p>The following is my attempt at describing what we do at a high level without oversimplifying.  I hope that after reading this a candidate will &#8216;get&#8217; what we&#8217;re about, or at least understand enough not to apply tiny labels to our expansive vision.</p>
<p><span id="more-82"></span></p>
<h2>The problem: implementing analysis</h2>
<p>At Palantir we specialize in <strong>analysis</strong>.</p>
<p>Yes, that&#8217;s painfully abstract, and I&#8217;ll get to it in a second.</p>
<p>In real-world terms, we are building a <strong>software platform</strong> that enables people to take whatever data is relevant to them and understand it more easily and thoroughly than ever before, using concepts that they already understand.  And we are applying this vision, at first, to solving problems in the finance sector and the government intelligence community.</p>
<p>The first important thing to note is that we don&#8217;t actually do the analysis ourselves.  We don&#8217;t devise winning trading strategies and we don&#8217;t catch terrorists.  We write software that enables other people to pull off these feats.  These people, experts in their respective fields, are called <em>analysts.</em></p>
<p>So what exactly do analysts do?  What is analysis?</p>
<blockquote><p>Analysis is everything necessary to extract <strong>insight</strong> from <strong>information</strong>.</p></blockquote>
<p>Let&#8217;s break that down a bit.</p>
<p>Information is easy:  It&#8217;s data.  It lives in a relational database or as files indexed on a hard drive, and you can easily run queries against it.  It comes in two forms, structured and unstructured.  And there is <em>a lot</em> of it in the modern world &#8211; too much, actually, for current tools to make sense of.</p>
<p>Insight is trickier.  Insight is something only a person can generate, and understanding this is critical for any organization that wants to do analysis right.  Thus the challenge of data analysis is how to bring vast amounts of information into productive contact with human intelligence.  In other words, the challenge is how to <em>enable the analyst</em>.</p>
<p>From the analyst&#8217;s perspective there are five essential features of an analysis platform:</p>
<ol>
<li>First, and most important, <em><strong>the analyst should be in control</strong></em>.  In other words, the primary way of interacting with an analysis tool should be <em>human-driven queries</em>.  While automated approaches can complement a human-driven approach, there simply is no substitute for human intelligence.  Unless you put a person behind the wheel, the system can never be flexible or creative enough to uncover truly original insight.  Artificial Intelligence just isn&#8217;t there yet.</li>
<li>Ability to <em><strong>summarize large data sets</strong></em>.  Some of this is what has traditionally been called data mining:  the largely automated approach&#8212;using machine learning or other statistical techniques&#8212;of processing lots of data at once and extracting nuggets that capture something interesting about the data.  Unlike Palantir, traditional approaches have focused almost exclusively on this aspect of analysis.</li>
<li>Ability to <em><strong>visualize large data sets</strong></em>.  Here the analyst wants interesting and informative ways of viewing data graphically, to make it easier for him to digest.  The analyst wants more than just a summary of the data; he wants a nuanced view of what&#8217;s going on <em>inside</em> these data sets:  What&#8217;s the overall shape of the distribution?  What are the outliers?  What are important structures within the data?</li>
<li>Ability to <em><strong>iterate rapidly</strong></em>.  This means enabling the analyst to ask a question, get the answer, and then quickly ask either a variant on the initial question or a follow-up question that depends on the answer to the initial question.  This rapid, iterative process allows the analyst to quickly test out hypotheses and develop theories about what&#8217;s going on in the data, and by extension to discover what&#8217;s going on in the world.</li>
<li>Ability to <em><strong>collaborate with other analysts</strong></em>.  Getting a handle on a terabyte of data, especially when it comprises multiple data types, is definitely more than a one-person job.  Any organization that&#8217;s serious about understanding the world needs a team of analysts that can work together as more than the sum of its parts.  This requires the ability for one analyst to effortlessly share the results of his analysis with his colleagues.</li>
</ol>
<h2>The Palantir approach</h2>
<p>That&#8217;s what analysis looks like to the analyst, or rather what it should look like in an ideal world.  (Current tools fall far short of this vision.)  So what do <em>we</em> do at Palantir in order to make analysis this smooth and easy?</p>
<p>You could say that we help summarize large data sets, in the sense that we have to provide the analyst with a rich library of techniques and algorithms.  You could also say that we do visualization, in the sense that we have to provide the analyst with a set of interesting and informative ways of visualizing their data.  We do both of these things, and we have to be creative and solve hard problems in order to add value in these areas.  But we do a lot more than that.</p>
<p>Probably the most central hard problem that we address in trying to enable the analyst is <strong>data modeling</strong>, the process of figuring out what data types are relevant to a domain, defining what they represent in the world, and deciding how to represent them in the system.  At Palantir we make sure our data model (ontology) is both flexible and dynamic, and that it mirrors the concepts people naturally use when reasoning about the domain.  This is no small challenge, but we&#8217;re already making it a reality.  In finance our basic data types include financial instruments, dates, portfolios, indices, and strategies&#8212;the same things that financial researchers think about, talk about, and reason with.  In the intelligence product our basic data types include people, places, and events (all with associated properties), which is exactly the way we all represent the world in our minds.</p>
<p>Data modeling, data summarization, and data visualization are the core disciplines for approaching large data sets.  Human-driven queries, rapid iteration, and collaboration are multipliers, taking the power unlocked by the core disciplines to the next level.  When these pieces are brought together in a coherent system, the result is an analysis platform that is both very generic and very powerful.</p>
<p>This is what we mean when we say that we&#8217;re changing the way people approach data.  Welcome to the future of analysis.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/12/04/what-do-we-do/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

