<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; enterprise software</title>
	<atom:link href="http:///category/enterprise-software/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Inside Horizon: interactive analysis at cloud scale</title>
		<link>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/</link>
		<comments>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 19:04:46 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1837</guid>
		<description><![CDATA[Late last year, we were honored to be invited to talk at Reflections&#124;Projections, ACM@UIUC&#8217;s annual student-run computing conference. We decided to bring a talk about Horizon, our system for doing aggregate analysis and filtering across very large amounts of data. The video of the talk was posted a few weeks back on the conference website. [...]]]></description>
			<content:encoded><![CDATA[<div style='width: 250; margin-left: 10px; margin-bottom: 10px; float: right;'><a href="http://www.acm.uiuc.edu/conference/2010/"><img src="http://blog.palantir.com/wp-content/uploads/2011/03/reflectionsprojections.png" alt="" title="reflectionsprojections" width="250" height="215"/></a></div>
<p>Late last year, we were honored to be invited to talk at Reflections|Projections, ACM@UIUC&#8217;s annual student-run computing conference.  We decided to bring a talk about Horizon, our system for doing aggregate analysis and filtering across very large amounts of data.  The video of the talk was posted a few weeks back on <a href="http://www.acm.uiuc.edu/Conferenceware/Schedule/Videos">the conference website</a>.</p>
<p>Horizon started as a research project and technology demonstrator built during Palantir&#8217;s Hack Week &#8211; a periodic innovation sprint that our engineering team uses to build brand-new ideas from whole cloth.  It was then used by the Center For Public Integrity in their <a href="http://www.publicintegrity.org/investigations/economic_meltdown/">Who&#8217;s Behind The Subprime Meltdown</a> report.  We produced a short video on the subject, <a href="http://www.palantirtech.com/government/analysis-blog/horizon">Beyond the Cloud: Project Horizon</a>, released on our analysis blog.  Subsequently, it was folded into our product offering, under the name <a href="http://www.palantirtech.com/labs/object-explorer">Object Explorer</a>.</p>
<p>In this hour-long talk, two of the engineers who built this technology tell the story of how Horizon came to be and how it works, and show a live demo of analysis on hundreds of millions of records in interactive time.</p>
<p><iframe title="YouTube video player" width="640" height="510" src="http://www.youtube.com/embed/9dOpDeRMTMc" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>JavaInvoke allows you to spawn additional Java VMs during testing</title>
		<link>http://blog.palantirtech.com/2009/07/28/javainvoke/</link>
		<comments>http://blog.palantirtech.com/2009/07/28/javainvoke/#comments</comments>
		<pubDate>Tue, 28 Jul 2009 22:00:30 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[unit testing]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=209</guid>
		<description><![CDATA[Here at Palantir we use test-driven development (or TDD for short). Integrated tools like Eclipse and JUnit simplify writing and running unit tests. However, once you need to test a broader swath of functionality, it&#8217;s time to write functional, integration, and system tests. While technically not &#8216;unit testing&#8217;, the testing framework that JUnit provides is [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; width: 298px'><img src="/wp-content/uploads/2009/07/junit.png" alt="junit success" width="288" height="194" /></div>
<p>Here at Palantir we use <a href="http://en.wikipedia.org/wiki/Test-driven_development">test-driven development (or TDD for short)</a>.  Integrated tools like <a href="http://www.eclipse.org/">Eclipse</a> and <a href="http://junit.org/">JUnit</a> simplify <a href="http://open.ncsu.edu/se/tutorials/junit/">writing and running unit tests</a>.  However, once you need to test a broader swath of functionality, it&#8217;s time to write <a href="http://www.ibm.com/developerworks/library/j-test.html#h1">functional</a>, <a href='http://en.wikipedia.org/wiki/Integration_testing'>integration</a>, and <a href='http://en.wikipedia.org/wiki/System_testing'>system</a> tests.  While these are technically not &#8216;unit testing&#8217;, the framework that JUnit provides is basically the same infrastructure that you want to leverage for writing these more involved types of tests.</p>
<p>When you&#8217;re developing enterprise software, functional testing often means getting your clients to talk to your servers.  For the main <a href="http://www.palantirtech.com/government">Palantir Government</a> product, we integrate the process of bringing the server up and down with the Ant scripts that run our automated unit tests: our testing tasks bring up the server, <a href="http://ant.apache.org/manual/OptionalTasks/junit.html">run the test suite</a>, and then kill the server. This works great and produces nice results.</p>
<p>When I started working on our authentication server, the pattern that we had used before didn&#8217;t work for me.  While the Palantir Government tests ran with a single, static configuration file, I needed to run the authentication server with multiple configurations in the course of running through all the different functional tests.  I determined that I needed a way to programmatically bring the server up and down for testing. In JUnit parlance, I needed a way to launch the server component as part of the <code>setUp()</code> method for my tests and stop it in <code>tearDown()</code>.</p>
<p>With my itch-to-scratch firmly in hand (or some other mixed metaphor), I set out to figure out how to invoke new Java processes from inside a unit test.  The solution I came up with, including source code and examples, is after the jump.<br />
<span id="more-209"></span></p>
<h2>The Six Ingredients</h2>
<p>So there are six ingredients that go into spawning a new VM:</p>
<ul>
<li>The classpath to use for the new VM</li>
<li>The name of the class to run</li>
<li>The directory to be used as the current directory for the process</li>
<li>The command line arguments to pass to the process</li>
<li>The set of Java system properties to use for this process</li>
<li>The environment to pass to the process</li>
</ul>
<p>Let&#8217;s look at each item individually.</p>
<h3>Classpath</h3>
<p>The classpath will tell the spawned VM where to load classes from.  In JavaInvoke, we use the existing classpath (from the spawning VM) as a starting point and then prepend any new entries to allow overriding the classpath for the spawned VM.</p>
<p>This takes a lot of the tedium out of having to figure out what to put in the classpath.  Most likely, you want something similar to what you already have, if not completely identical.</p>
<p>We get the classpath from <code>System.getProperty("java.class.path")</code> and add new entries by prepending them, using the value of <code>File.pathSeparatorChar</code> as the entry delimiter.  Using <code>File.pathSeparatorChar</code> keeps the code cross-platform, since the path separator is &#8216;;&#8217; on Windows and &#8216;:&#8217; on Unix (Linux, Solaris, OS X, etc.).</p>
<p>Caveat: if you change the working directory and your original classpath was constructed using relative paths, you&#8217;ll probably have trouble getting anything to run (since your classpath will no longer point to the right locations).</p>
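<p>The prepend step can be sketched as follows.  This is a minimal illustration, not the JavaInvoke source: the class name <code>ClasspathSketch</code>, the method <code>prepend</code>, and the example entries are all hypothetical.  Resolving each override to an absolute path is one way to sidestep the working-directory caveat for the new entries.</p>

```java
import java.io.File;

// Hypothetical helper for illustration; not taken from JavaInvoke.
final class ClasspathSketch {

    /** Prepends override entries to a base classpath so they are searched first. */
    static String prepend(String baseClasspath, String... overrides) {
        StringBuilder sb = new StringBuilder();
        for (String entry : overrides) {
            // Absolute paths keep working even if the spawned VM runs in another directory.
            sb.append(new File(entry).getAbsolutePath()).append(File.pathSeparatorChar);
        }
        return sb.append(baseClasspath).toString();
    }

    public static void main(String[] args) {
        // Start from the spawning VM's own classpath, as described above.
        String current = System.getProperty("java.class.path");
        System.out.println(prepend(current, "test-overrides", "lib/extra.jar"));
    }
}
```

<p>Note that this sketch leaves the entries of the base classpath untouched, so relative entries inherited from the spawning VM are still subject to the caveat.</p>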
<h3>Class name</h3>
<p>Pretty simple: what do you want to run in the spawned VM?  The class must define a <code>public static void main(String[] args)</code> method, and it must be available for loading via the classpath.</p>
<h3>Working Directory</h3>
<p>If the spawned process should run in a directory other than the current working directory (CWD) of the spawning process, set it and JavaInvoke will launch the new process there.</p>
<h3>Command line arguments</h3>
<p>If the process needs any command line arguments, including VM options, specify them in a string array.  Note that not all of these arguments will necessarily make it to your main method: the VM executable parses them first and removes the VM options, passing only the program arguments through.</p>
<h3>Java System Properties</h3>
<p>System properties can be used to control many aspects of how a VM runs.  You can set them programmatically in your code or you can set them on the command line by passing <em>-Dkey=value</em>.  Our JavaInvoke implementation will take a <code>Map&lt;String,String&gt;</code> of properties as a convenience argument; all it does is rewrite the map into the command line.</p>
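<p>Rewriting the map into the command line amounts to something like the sketch below.  The class <code>SystemPropertyArgs</code>, the method <code>toJvmArgs</code>, and the property names are hypothetical stand-ins, not the JavaInvoke code.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not taken from JavaInvoke.
final class SystemPropertyArgs {

    /** Turns each map entry into one -Dkey=value argument, preserving the map's iteration order. */
    static List<String> toJvmArgs(Map<String, String> properties) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("server.port", "8443");                       // made-up property
        props.put("config.file", "conf/auth-test.properties");  // made-up property
        System.out.println(toJvmArgs(props));
    }
}
```

<p>Keeping each property as its own list element means values containing spaces need no extra quoting when the list is handed to <code>ProcessBuilder</code>.</p>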
<h3>Process environment</h3>
<p>This is an operating-system level construct.  This is the set of environment variables, also in a <code>Map&lt;String,String&gt;</code>, that you would like merged with the current environment.  This would be the place that you set things like LD_LIBRARY_PATH on Unix.</p>
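<p>Both the working directory and the environment merge map directly onto <code>ProcessBuilder</code>.  Here is a rough sketch under that assumption; <code>ProcessSetupSketch</code> and <code>configure</code> are hypothetical names, and the library path is made up.</p>

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not taken from ProcessSpawner.
final class ProcessSetupSketch {

    /** Builds (but does not start) a process with an optional working directory and extra environment. */
    static ProcessBuilder configure(List<String> command, File workingDir, Map<String, String> extraEnv) {
        ProcessBuilder pb = new ProcessBuilder(command);
        if (workingDir != null) {
            pb.directory(workingDir); // null would mean "inherit the parent's CWD"
        }
        // environment() starts out as a copy of the current process environment,
        // so putAll merges the extra variables on top of it.
        pb.environment().putAll(extraEnv);
        return pb;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("LD_LIBRARY_PATH", "/opt/native/lib"); // made-up path
        ProcessBuilder pb = configure(Arrays.asList("java", "-version"),
                new File(System.getProperty("java.io.tmpdir")), env);
        System.out.println(pb.directory() + " " + pb.environment().get("LD_LIBRARY_PATH"));
    }
}
```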
<h2>Dealing with input and output</h2>
<p>So you might ask the question, &#8220;Where does the output from the process go?&#8221;  Or more troubling, &#8220;How do I send the process some input?&#8221;  The Java <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Process.html">Process</a> object has methods to deal with this, allowing you to get streams that give you access to the input, output, and error streams of the spawned process. That API is straightforward to deal with, just like any other use of the java.io streams.</p>
<p>However, we want to make the typical case really easy: pulling the output from the spawned process back to the parent that spawned it.  To that end, we add into the mix a class called OutputPiper.  It fires up a thread that reads all output from the spawned process, tags it with an identifier, and then prints it to the spawner&#8217;s stdout/stderr.</p>
<h3>OutputPiper</h3>
<p>(as extracted from <a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ProcessSpawner.java.html'>ProcessSpawner.java</a>)</p>
<pre class="brush: java; title: ; notranslate">
	public static class OutputPiper extends Thread  {
		InputStream in;
		PrintStream out;
		String tag = null;

		public OutputPiper(String tag, InputStream in,PrintStream out) {
			this.in = in;
			this.out = out;
			this.tag = tag;
			// make sure that we don't keep the VM alive
			this.setDaemon(true);
			this.setName(&quot;OutputPiper-&quot; + tag);
			out.println(&quot;Starting output piper for tag: &quot; + tag);
			this.start();
		}

		@Override
		public void run() {
			try {
				BufferedReader reader = new BufferedReader(new InputStreamReader(in));
				String line = null;
				do {
					line = reader.readLine();
					if(line != null) {
						out.println(tag + &quot;: &quot; + line);
					}
				}while(line != null);
			}
			catch (Exception e) {
				// Ignored: a read failure is treated the same as end-of-stream
			}
			out.println(&quot;Output piper exiting for tag: &quot; + tag);
		}

		public static OutputPiper createOutputPiper(String tag, InputStream in, PrintStream out) {
			OutputPiper rc = new OutputPiper(tag, in,out);
			return rc;
		}
	}
</pre>
<p>OutputPiper extends <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html">Thread</a> so that all the output will arrive back to the controlling process in a timely manner.  For each given process, we spawn off two OutputPipers, one for stdout and one for stderr, corresponding to the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Process.html#getInputStream()">Process.getInputStream()</a> and the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Process.html#getErrorStream()">Process.getErrorStream()</a>.</p>
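<p>To show the two-pipers-per-process wiring in one self-contained piece, here is a sketch that spawns <code>java -version</code> as a child VM.  The <code>pipe</code> helper is a hypothetical, stripped-down stand-in for OutputPiper, and the tags are made up; the real class is the one listed above.</p>

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;

// Hypothetical sketch; "pipe" stands in for the OutputPiper class shown above.
final class PipeBothStreams {

    /** Starts a daemon thread that copies each line of 'in' to 'out', prefixed with 'tag'. */
    static Thread pipe(final String tag, final InputStream in, final PrintStream out) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        out.println(tag + ": " + line);
                    }
                } catch (IOException e) {
                    out.println(tag + ": read failed: " + e.getMessage());
                }
            }
        }, "pipe-" + tag);
        t.setDaemon(true); // never keep the parent VM alive just for piping
        t.start();
        return t;
    }

    /** Runs "java -version" in a child VM, pipes both streams to 'out', and returns the exit code. */
    static int runJavaVersion(PrintStream out) throws IOException, InterruptedException {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        Process p = new ProcessBuilder(javaBin, "-version").start();
        Thread stdout = pipe("child-stdout", p.getInputStream(), out);
        Thread stderr = pipe("child-stderr", p.getErrorStream(), out);
        int exit = p.waitFor();
        stdout.join(); // wait until both pipes have drained before reporting
        stderr.join();
        return exit;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit code: " + runJavaVersion(System.out));
    }
}
```

<p>Draining both streams on their own threads also keeps the child from blocking if it fills the buffer of a stream nobody is reading.</p>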
<h2>ProcessSpawner &#038; JavaInvoke</h2>
<p>There are two key classes in the example:</p>
<ul>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ProcessSpawner.java.html'>ProcessSpawner.java</a> &#8211; Essentially a wrapper around <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/ProcessBuilder.html">ProcessBuilder</a>, a generic process spawner that makes it simple to invoke processes that use OutputPipers to forward their output back to their parent. This class allows you to specify the working directory, process environment, and command line for the process to be invoked.</li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/JavaInvoke.java.html'>JavaInvoke.java</a> &#8211; A specialized subclass of ProcessSpawner, this class makes spawning new VMs a piece of cake, doing the necessary translation for Java system properties, setting the proper classpath environment variable with potential overrides, and filling in the fully qualified class name to run.</li>
</ul>
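<p>Putting the pieces together, the command line that a JavaInvoke-style class hands to ProcessBuilder might be assembled roughly like this.  It is a sketch under assumptions, not the actual JavaInvoke code: <code>JavaCommandSketch</code>, <code>build</code>, and <code>com.example.Server</code> are hypothetical, and passing the classpath through the <code>CLASSPATH</code> environment variable follows the description above.</p>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of command-line assembly; not the JavaInvoke source.
final class JavaCommandSketch {

    static ProcessBuilder build(String classpath, String className,
                                Map<String, String> systemProperties, List<String> programArgs) {
        List<String> command = new ArrayList<>();
        // Reuse the java binary of the VM that is doing the spawning.
        command.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        // VM options must come before the class name...
        for (Map.Entry<String, String> e : systemProperties.entrySet()) {
            command.add("-D" + e.getKey() + "=" + e.getValue());
        }
        command.add(className);
        // ...and program arguments after it.
        command.addAll(programArgs);

        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("CLASSPATH", classpath);
        return pb;
    }

    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>();
        props.put("mode", "test"); // made-up property
        ProcessBuilder pb = build("lib/app.jar", "com.example.Server", props, Arrays.asList("8080"));
        System.out.println(pb.command());
    }
}
```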
<h2>The Example &#038; Source Code</h2>
<p>I&#8217;ve put together a running example that implements a trivial client and server in a JUnit test.  The <code>setUp()</code> method spawns the server and then the tests run the client code against the server, tearing it down after each test.  It&#8217;s available in the <a href='/wp-content/uploads/2009/07/PalantirVMSpawnerExample.zip'>PalantirVMSpawnerExample.zip</a> zip file.  Unzip it and run the <i>run.sh</i> or <i>run.bat</i> script as appropriate.  It should generate output that looks like this:</p>
<pre class="console">
-----------------------------------------------------
Starting test testAck
INFO [main] JavaInvoke - CLASSPATH=./lib/devblog-vmspawner.jar
INFO [main] ProcessSpawner - Build process spawner for the following command line:
INFO [main] ProcessSpawner - /home/pteng/java/i586/jdk1.5.0_14/jre/bin/java com.palantir.blog.processspawner.Server
Starting output piper for tag: server-stdout
Starting output piper for tag: server-stderr
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: [Socket Handler2]: Got message: some message
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: [Socket Handler3]: Got message: SHUTDOWN
Output piper exiting for tag: server-stdout
Output piper exiting for tag: server-stderr
Finished test testAck
-----------------------------------------------------
-----------------------------------------------------
Starting test testShutdown
INFO [main] JavaInvoke - CLASSPATH=./lib/devblog-vmspawner.jar
INFO [main] ProcessSpawner - Build process spawner for the following command line:
INFO [main] ProcessSpawner - /home/pteng/java/i586/jdk1.5.0_14/jre/bin/java com.palantir.blog.processspawner.Server
Starting output piper for tag: server-stdout
Starting output piper for tag: server-stderr
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: [Socket Handler3]: Got message: SHUTDOWN
Output piper exiting for tag: server-stdout
Output piper exiting for tag: server-stderr
Took 3 ms to send shutdown.
Took 335 ms for process to die.
Finished test testShutdown
-----------------------------------------------------
SUCCESS: all 2 tests passed
</pre>
<p>The source is included in the zip file, but if you wanted to look at it or link to it on the web, here are the classes involved:</p>
<ul>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/Client.java.html'>Client.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/Example.java.html'>Example.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/JavaInvoke.java.html'>JavaInvoke.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ProcessSpawner.java.html'>ProcessSpawner.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/Server.java.html'>Server.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ServerSpawningTest.java.html'>ServerSpawningTest.java</a></li>
</ul>
<p>And as an added bonus, there&#8217;s an Ant <i>build.xml</i> that will let you tweak and rebuild the demo yourself.</p>
<p>Comments and questions welcome.  Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/07/28/javainvoke/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The MultiSnake Challenge</title>
		<link>http://blog.palantirtech.com/2009/07/06/the-multisnake-challenge/</link>
		<comments>http://blog.palantirtech.com/2009/07/06/the-multisnake-challenge/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 20:00:20 +0000</pubDate>
		<dc:creator>Nick Miyake</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[fun]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=997</guid>
		<description><![CDATA[&#8220;Freaking lag!&#8221; It had started to become a common refrain around the developer pit. Listed as a project on a candidate&#8217;s resume, MultiSnake was a game that we had started to play during our coding breaks. The game was really quite fun &#8212; it was easy to play, games were short, and its multi-player nature fostered [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; width: 310px'><a href="/wp-content/uploads/2009/06/snake1.png"><img src="/wp-content/uploads/2009/06/snake1.png" alt="multisnake game" width="300" height="217" /></a></div>
<p>&#8220;Freaking lag!&#8221; It had started to become a common refrain around the developer pit. Listed as a project on a candidate&#8217;s resume, MultiSnake was a game that we had started to play during our coding breaks. The game was really quite fun &#8212; it was easy to play, games were short, and its multi-player nature fostered great competition. The only real drawback was that we seemed to experience network lag. There was nothing more infuriating than having your long snake die by running straight into a completely avoidable wall because the game lagged and didn&#8217;t respond to your keyboard commands in time. During one of our particularly lag-heavy games, someone yelled out a gripe that would change our MultiSnaking days for good: &#8220;Man, we could totally write this game ourselves, in our app.&#8221;</p>
<p><span id="more-997"></span>The gripe stuck around, and one day someone finally called out the person making the claim.  &#8220;Do you seriously think we could write this ourselves?&#8221;</p>
<p>&#8220;Sure, why not? We have all of the architecture that we need to make this work. We could do it in four hours.&#8221;</p>
<p>&#8220;I bet you we couldn&#8217;t.&#8221;</p>
<p>The rest of the story and a video of MultiSnake in action follow.</p>
<h2>The Challenge</h2>
<p>A challenge was born. The task sounded fun, and it also provided us with a great chance to test the extensibility of our platform. Our most recent milestone had focused on solidifying the public APIs of our platform, and this challenge seemed like a way to test its pluggability. We also thought that it would be a great showcase to demonstrate how easily one could add capabilities to our platform &#8212; if we could write a multi-player network game using only the same public APIs available to our clients, it would be a strong signal that our framework was solid.</p>
<h2>The Rules</h2>
<p>Once we decided that we were going to take on the challenge, we decided that we would do it on Sunday from 8:00PM to midnight (our normal peak productivity hours) and laid out the following rules:</p>
<ul>
<li>There would be a strict four-hour time limit for all planning, design and coding.</li>
<li>The game had to be implemented using only our public APIs &#8212; no touching core code.</li>
<li>The game had to support all of the features provided by the online game and had to be lag-free (after all, that was the whole point of writing our own!).</li>
</ul>
<h2>The Race</h2>
<p>Sunday night came along, and it was off to the races! We projected a countdown timer onto a whiteboard in the middle of the developer space, blasted some techno music and got to work! Five developers decided to participate, and it was a pretty collaborative effort in which most of the participants ended up contributing in their standard roles &#8212; the frontend people did the game graphics and UI, the backenders worked on the server and game logic, and our data folks dealt with creating a game board provider.</p>
<p>The pluggability of our platform and the fact that it already had support for multiple users and sending out realtime messages from the server to clients made most of the work go pretty smoothly. Besides Eclipse crashing on one of our machines, development was pretty seamless and fast-paced, with team members yelling out at each other across the room to communicate. Within 45 minutes, most of us had finished our first iteration of code, and at the one-hour mark we verified that we could successfully get the server and client communicating with each other and draw some game state. By the time we were two hours in, we had most of the core game features implemented, and once we hit the three-hour mark we had a fully functional snake game that allowed us to play against each other. We spent the last hour doing some UI polish and ironing out a few bugs, and by the time midnight rolled around we had a fully functional MultiSnake implementation in our platform that was written using only our public APIs. We were even able to get in a few extra features, such as a visible timer to count down to the end of the game, a circle to show where a snake respawned, and a name that followed the snakes vertically to identify them. We celebrated by eating a rum cake that a coworker had brought in earlier and playing multiple rounds of lag-free MultiSnake against each other. Success!</p>
<h2>The Results</h2>
<div style='text-align: center'>
<object width="425" height="344"><param name="movie" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/misc/snake.flv"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param>
<embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/misc/snake.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object>
</div>
<p>MultiSnake lives on as an add-on to our platform. Even after the challenge was completed, devs have been playing around with the code on slow weekends, adding extra maps, new features such as wormholes, and overhauling the graphics. The final product is a pretty impressive and fun-to-play game that we now often demo as an example of the versatility and power of the APIs for our platform.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/07/06/the-multisnake-challenge/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.</title>
		<link>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/</link>
		<comments>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/#comments</comments>
		<pubDate>Sat, 23 May 2009 01:00:26 +0000</pubDate>
		<dc:creator>Bob McGrew</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=961</guid>
		<description><![CDATA[At Palantir, we work in Silicon Valley, read High Scalability, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that &#8220;Disk [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px'><img src='/wp-content/uploads/2009/05/ctu-clearance.jpg' alt='fake clearance screen'/></div>
<p>At Palantir, we work in Silicon Valley, read <a href="http://highscalability.com/">High Scalability</a>, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that &#8220;Disk is cheap.&#8221; or &#8220;CPU is cheap&#8221;. For a web company with a deployment in a commercial data center (or its own data center), this received knowledge is correct.  But for a company that ships distributed systems instead of hosting them, and for whom the deployment environment is the kind of locked-down server room in which classified data can reside, these assumptions couldn’t be more false.</p>
<p>At Palantir, we are almost never able to host our customers’ data – typically, as the data is very sensitive, we are not even allowed to see it!  Our customers&#8217; highly sensitive data has to reside in a <a href='http://en.wikipedia.org/wiki/Sensitive_Compartmented_Information_Facility'>Sensitive Compartmented Information Facility</a> or SCIF – a building which has been built to be resistant to attempts to access the information within, whether through active or passive measures.  The network inside a SCIF is physically separated – “airgapped” &#8211; from the public Internet to prevent information leakage.  As the entire rationale for such facilities is to prevent information leakage, moving information into or out of one is a tightly regulated process, almost always requiring a human to be in the loop.<br />
<span id="more-961"></span></p>
<h3>Bandwidth is narrow</h3>
<p>Bandwidth in and out of a data center is cheap. Bandwidth in and out of a SCIF is not &#8211; and this manifests in surprising ways. So what does it take to get data into a SCIF? First, the data has to be downloaded from wherever it&#8217;s hosted and burned to a CD. Then, someone has to carry it into the SCIF and find a security officer to approve adding it to the network. Finding the security officer can take anywhere from 10 minutes to an entire day. Once you&#8217;ve found the security officer, he has to run a virus scan on the CD, which can run at a rate of roughly 20 minutes per 100MB.</p>
<p>If you look at the entire process, you can model our connection into the SCIF as averaging about an 8-hour latency and 640 Kbps of bandwidth. That&#8217;s about the bandwidth of a slow DSL line and the latency of a radio connection to Pluto. (Actually, it’s somewhat slower.) There&#8217;s also a big non-linearity at 700MB, which is the amount of data that fits on a single CD.  For instance, this non-linearity is the big reason why we prefer to send patches to our customers rather than full distributions, which are slightly less than a gigabyte including dependencies – and thus why it’s worth it to us to build a system for automating patch application rather than simply replacing jar files by hand.</p>
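<p>As a back-of-the-envelope illustration of that model, here is a small sketch.  The latency, bandwidth, and CD capacity are the figures quoted above; the 50MB patch and 900MB distribution sizes are made-up examples, the class <code>ScifTransferModel</code> is hypothetical, and the model assumes decimal units and a single 8-hour trip regardless of how many CDs are carried.</p>

```java
// Hypothetical model for illustration only.
final class ScifTransferModel {
    static final double LATENCY_HOURS = 8.0;    // figure from the text
    static final double BANDWIDTH_KBPS = 640.0; // figure from the text
    static final double CD_CAPACITY_MB = 700.0; // figure from the text

    /** Estimated hours until a payload of the given size is usable inside the SCIF. */
    static double hoursToDeliver(double sizeMb) {
        double kilobits = sizeMb * 8.0 * 1000.0; // assumes 1 MB = 8,000 kilobits
        double transferSeconds = kilobits / BANDWIDTH_KBPS;
        return LATENCY_HOURS + transferSeconds / 3600.0;
    }

    /** Number of CDs needed; this is where the 700MB non-linearity shows up. */
    static int cdsNeeded(double sizeMb) {
        return (int) Math.ceil(sizeMb / CD_CAPACITY_MB);
    }

    public static void main(String[] args) {
        double[] sizesMb = {50.0, 900.0}; // made-up patch and full-distribution sizes
        for (double size : sizesMb) {
            System.out.printf("%.0f MB: %.2f hours, %d CD(s)%n",
                    size, hoursToDeliver(size), cdsNeeded(size));
        }
    }
}
```

<p>Under these assumptions the fixed latency dominates for a small patch, while a near-gigabyte distribution adds hours of scanning and spills onto a second CD.</p>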
<h3>Disks are expensive</h3>
<p>Similarly, if you are running a data warehouse, disk is cheap. You can buy a 1 TB, 7200 RPM disk for about $100, which is perfect for the kind of large, serial reads or writes that a data warehousing workflow requires. However, Palantir uses disk for our database and our search engine, both of which have an <a href='http://en.wikipedia.org/wiki/OLTP'>OLTP</a>-style usage pattern.  As opposed to a data warehouse access pattern, which emphasizes full table scans, OLTP emphasizes random access and therefore requires fast disk. To get 1TB at 15k RPM costs about $1000, and requires a disk array rather than a single disk. In order to keep the disk fast, you also want to leave it only about 20% full, which overall makes fast disk about 50 times more expensive than slow disk. Most importantly, however, installing a disk array requires trained personnel, a special approval process, and reconfiguring the system to use the new disks, which is a fairly complicated and error-prone process.</p>
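<p>The 50x figure falls out of two factors: a 10x price difference per raw terabyte and a 5x penalty for filling the fast disk only 20% of the way.  A tiny sketch of that arithmetic, using the prices quoted above and assuming the slow disk can be filled completely (the class <code>DiskCostSketch</code> is hypothetical):</p>

```java
// Hypothetical helper for illustration only.
final class DiskCostSketch {

    /** Dollars per terabyte of data actually stored, given the raw price and how full the disk is kept. */
    static double costPerUsableTb(double pricePerRawTb, double fillFraction) {
        return pricePerRawTb / fillFraction;
    }

    public static void main(String[] args) {
        double slow = costPerUsableTb(100.0, 1.0);   // 7200 RPM disk, assumed fully used
        double fast = costPerUsableTb(1000.0, 0.2);  // 15k RPM array, kept about 20% full
        System.out.println("fast/slow cost ratio: " + (fast / slow));
    }
}
```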
<h3>CPUs are hot</h3>
<p>Finally, in a commercial data center, CPU is the cheapest resource of all. In a secure server room, however, it can be quite expensive. Each CPU or additional box requires more power and cooling. If the room is nearly full, adding that extra box may require building out an entirely new server room, which can cost months and hundreds of thousands of dollars just for an office building. Building a server room in a SCIF is much more expensive and prohibitively time-consuming.</p>
<h3>RAM to the rescue</h3>
<p>On the other hand, some things in a SCIF are comparatively cheap. We never use boxes with less than 32GB of memory, and, in fact, lots of sites use 128GB of memory. RAM requires negligible power and cooling, and compared to disk, it&#8217;s relatively simple to install. It&#8217;s also easy to reconfigure the setup to use the additional memory.</p>
<h3>The upshot</h3>
<p>The design guidelines that follow from this are simple: <b>build a system that is as autonomous as possible and scales down as well as it scales out</b>.</p>
<p>All these statistics are compiled from our day-to-day experiences in the office environment of a SCIF. Deploying to soldiers in the field makes the issues involved in deploying to a SCIF seem minor. Of course, that’s what makes what we do fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

