<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; performance</title>
	<atom:link href="http:///category/performance/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Inside Horizon: interactive analysis at cloud scale</title>
		<link>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/</link>
		<comments>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 19:04:46 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1837</guid>
		<description><![CDATA[Late last year, we were honored to be invited to talk at Reflections&#124;Projections, ACM@UIUC&#8217;s annual student-run computing conference. We decided to bring a talk about Horizon, our system for doing aggregate analysis and filtering across very large amounts of data. The video of the talk was posted a few weeks back on the conference website. [...]]]></description>
			<content:encoded><![CDATA[<div style='width: 250px; margin-left: 10px; margin-bottom: 10px; float: right;'><a href="http://www.acm.uiuc.edu/conference/2010/"><img src="http://blog.palantir.com/wp-content/uploads/2011/03/reflectionsprojections.png" alt="" title="reflectionsprojections" width="250" height="215"/></a></div>
<p>Late last year, we were honored to be invited to talk at Reflections|Projections, ACM@UIUC&#8217;s annual student-run computing conference.  We decided to bring a talk about Horizon, our system for doing aggregate analysis and filtering across very large amounts of data.  The video of the talk was posted a few weeks back on <a href="http://www.acm.uiuc.edu/Conferenceware/Schedule/Videos">the conference website</a>.</p>
<p>Horizon started as a research project and technology demonstrator built during Palantir&#8217;s Hack Week &#8211; a periodic innovation sprint in which our engineering team builds brand new ideas from whole cloth.  It was then used by the Center for Public Integrity in its <a href="http://www.publicintegrity.org/investigations/economic_meltdown/">Who&#8217;s Behind The Subprime Meltdown</a> report.  We produced a short video on the subject, <a href="http://www.palantirtech.com/government/analysis-blog/horizon">Beyond the Cloud: Project Horizon</a>, released on our analysis blog.  Subsequently, it was folded into our product offering under the name <a href="http://www.palantirtech.com/labs/object-explorer">Object Explorer</a>.</p>
<p>In this hour-long talk, two of the engineers who built this technology tell the story of how Horizon came to be, explain how it works, and give a live demo of analysis over hundreds of millions of records at interactive speed.</p>
<p><iframe title="YouTube video player" width="640" height="510" src="http://www.youtube.com/embed/9dOpDeRMTMc" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Palantir: search with a twist (part two: realtime indexing and security)</title>
		<link>http://blog.palantirtech.com/2009/10/27/palantir-search-with-a-twist-part-two-realtime-indexing-and-security/</link>
		<comments>http://blog.palantirtech.com/2009/10/27/palantir-search-with-a-twist-part-two-realtime-indexing-and-security/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 07:01:01 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1260</guid>
		<description><![CDATA[[A number of weeks ago, we published a post on the search technology used by Palantir. That post covered raising the memory efficiency of a couple of operations. This is part two of that series.] The most familiar use of search engines is to index documents made available on the Internet via the hypertext transfer [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px'><img src='/wp-content/uploads/2009/08/200px-magnifying_glass_icon.png' alt='magnifying glass'/></div>
<p><em>[A number of weeks ago, we published <a href="http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/">a post on the search technology</a> used by Palantir.  That post covered raising the memory efficiency of a couple of operations.  This is part two of that series.]</em></p>
<p>The most familiar use of search engines is to index documents made available on the Internet via the <a href="http://www.ietf.org/rfc/rfc2616.txt">hypertext transfer protocol</a>. Forgotten names like <a href="http://en.wikipedia.org/wiki/AltaVista">AltaVista</a>, names not-yet-really-learned like <a href="http://web.archive.org/web/20040828134017/http://www.bing.com/">Bing</a>, and, of course, <a href="http://infolab.stanford.edu/~backrub/google.html">Google</a> come to mind.</p>
<p>This one, massive use case has a couple of properties that I&#8217;d like to highlight:</p>
<ul>
<li>Asynchronous indexing and querying &#8211; web search engines tend to use crawlers and indexers to build up an index of the web.  After each crawl is finished, the new index is brought online for use by the query engine.</li>
<li>Lack of access controls &#8211; all the data in the index is available to any query.  In fact, most queries are (from the standpoint of the index) completely anonymous.</li>
</ul>
<h3>Palantir: not a web search engine</h3>
<p>Search technology is just one part of what makes up a Palantir system.  For us, it&#8217;s a way to quickly retrieve Palantir objects; it&#8217;s not the whole of the application.</p>
<p>I&#8217;d like to highlight a couple of differences from the <a href="http://en.wikipedia.org/wiki/Web_search_engine">web search engine</a> case.  A Palantir system needs the following properties:</p>
<ul>
<li>Realtime indexing and querying &#8211; we need information to be available immediately as it changes in the system.</li>
<li>Leak-proof access controls &#8211; we need the search engine to help us make sure that we don&#8217;t have information leaking across access control boundaries.</li>
</ul>
<p><span id="more-1260"></span></p>
<h2>Realtime indexing</h2>
<p>The Palantir platforms implement realtime indexing: as soon as an analyst changes an object in the system, the change must be available to queries. This could be a change to data in the object or a change to the security tags on the object.</p>
<p>From a programming perspective, this is pretty straightforward: a Palantir transaction will not commit until the search engine is finished indexing the new data.</p>
<p>From a search engine operational perspective, this creates some challenges.  Asynchronous indexing allows a search engine to bring online a highly optimized, static form of the index.  With realtime indexing, by contrast, every cycle spent optimizing the index is a cycle taken away from serving queries, and there is likely a human waiting for the optimization to finish.</p>
<p>With a static index, a query accesses only one optimized index file, which then points to the documents containing the results.  However, as changes and additions are indexed into the system, merging each of them into the master index carries a lot of overhead.</p>
<p>Instead of merging and optimizing on every change, Lucene can keep around a number of smaller indexes that hold all the fresh entries.  These are fixed-size, append-only segments that are much cheaper to write to than the optimized and merged form of the index.  In effect, these &#8216;dynamic&#8217; indexes form a linear list of single-document indexes.  When the search engine runs a query, it has to follow this simple (yet expensive) algorithm:</p>
<ul>
<li>Query the static, merged index, accumulating results. <i>(this part is reasonably fast)</i></li>
<li>For each of the dynamic indexes:
<ul>
<li>Open the file, incurring IO overhead.</li>
<li>Query each single-document index and look for additional records or newer records that supersede one of the existing found results.</li>
</ul>
</li>
</ul>
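<p>The query path above can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Lucene&#8217;s implementation: the merged index is a plain term-to-document map, each dynamic segment holds exactly one document, and the class names <code>DocSegment</code> and <code>HybridIndex</code> are hypothetical. Segments are replayed oldest first, so a newer version of a document either replaces or removes an earlier match, and the number of segments visited grows with every fresh record.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical single-document "dynamic" segment: one version of one document. */
final class DocSegment {
    final int docId;
    final Set<String> terms;

    DocSegment(int docId, Set<String> terms) {
        this.docId = docId;
        this.terms = terms;
    }
}

/** Hypothetical toy model: a merged static index plus a list of fresh segments. */
final class HybridIndex {
    private final Map<String, Set<Integer>> merged = new HashMap<>(); // term -> doc ids
    private final List<DocSegment> dynamic = new ArrayList<>();       // oldest first
    int segmentsVisitedByLastQuery = 0;

    void addToMerged(int docId, Set<String> terms) {
        for (String term : terms) {
            merged.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Cheap append of a new or changed document; no merging happens here. */
    void append(int docId, Set<String> terms) {
        dynamic.add(new DocSegment(docId, terms));
    }

    Set<Integer> query(String term) {
        // Step 1: a single lookup in the static, merged index.
        Set<Integer> results =
            new HashSet<>(merged.getOrDefault(term, Collections.<Integer>emptySet()));
        // Step 2: visit every dynamic segment; this cost is linear in their number.
        segmentsVisitedByLastQuery = 0;
        for (DocSegment segment : dynamic) {
            segmentsVisitedByLastQuery++;
            if (segment.terms.contains(term)) {
                results.add(segment.docId);    // a new match, or a newer version that still matches
            } else {
                results.remove(segment.docId); // a newer version supersedes an older match
            }
        }
        return results;
    }
}
```

<p>If document 2 sits in the merged index under &#8220;king&#8221; and a later segment holds an edited version without that word, <code>query("king")</code> drops it, even though the merged index still lists it.</p>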
<p>You can see how this overhead can quickly grow large as dynamic indexes accumulate: it grows linearly with the number of newly indexed records.  Compare that with the optimized index, where any given query should take close to constant time.</p>
<p>To get around this, the indexer allows only a certain number of these dynamic indexes to accumulate before it kicks off a background merge job.  We take a noticeable performance hit during the merge, but batching up the merge work amortizes its overhead for an overall performance win.  This hybrid mode didn&#8217;t require us to write any new code; we only had to tune Lucene to give us the performance profile we wanted.</p>
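<p>The batching policy can be illustrated with a toy index that folds pending single-document segments into the merged index only once a threshold is reached. The class <code>BatchedMergeIndex</code> and its <code>maxDynamicSegments</code> knob are hypothetical; as noted above, the real behavior came from tuning Lucene&#8217;s own merge settings rather than from new code.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy index: cheap appends, with merges batched behind a threshold. */
final class BatchedMergeIndex {
    private final int maxDynamicSegments;                                 // hypothetical tuning knob
    private final Map<String, Set<Integer>> postings = new HashMap<>();   // merged: term -> doc ids
    private final Map<Integer, Set<String>> mergedDocs = new HashMap<>(); // merged: doc id -> terms
    private final List<Integer> pendingIds = new ArrayList<>();           // dynamic segments, oldest first
    private final List<Set<String>> pendingTerms = new ArrayList<>();
    int mergeRuns = 0;

    BatchedMergeIndex(int maxDynamicSegments) {
        this.maxDynamicSegments = maxDynamicSegments;
    }

    /** Appends a segment; pays for a merge only once enough segments have piled up. */
    void index(int docId, Set<String> terms) {
        pendingIds.add(docId);
        pendingTerms.add(terms);
        if (pendingIds.size() >= maxDynamicSegments) {
            mergeAll();
        }
    }

    int dynamicSegmentCount() {
        return pendingIds.size();
    }

    /** One batched merge: replays pending segments in order so the newest version wins. */
    void mergeAll() {
        for (int i = 0; i < pendingIds.size(); i++) {
            int docId = pendingIds.get(i);
            Set<String> previous = mergedDocs.remove(docId);
            if (previous != null) {
                for (String term : previous) {
                    postings.get(term).remove(docId); // drop the superseded version's postings
                }
            }
            Set<String> current = pendingTerms.get(i);
            mergedDocs.put(docId, current);
            for (String term : current) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            }
        }
        pendingIds.clear();
        pendingTerms.clear();
        mergeRuns++;
    }

    Set<Integer> mergedMatches(String term) {
        return new HashSet<>(postings.getOrDefault(term, Collections.<Integer>emptySet()));
    }
}
```

<p>With a threshold of 4, indexing 10 documents triggers two merges and leaves 2 segments pending, so each merge cost is shared across four cheap appends.</p>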
<h2>Preventing Information Leaks</h2>
<p>The Palantir data platform has a fairly sophisticated security model baked in (see <a href="http://www.palantirtech.com/government/videos/whitevideos"><em>The White Videos</em></a> for a more in-depth look at the security model).  One of the features that we have implemented is the ability to show a narrower view of an object based on the user&#8217;s permissions: the user sees only the slice of the data that they have been granted access to.  Part of the complexity in implementing this was that we can&#8217;t even hint that the other, hidden data exists at all.</p>
<p>Search engines rank their results by relevance, showing first the matches they believe to be most relevant to the query.  One common way to make these relevance calculations is to compare the length of the search term or phrase to the length of the text that it matched.  Consider the search term &#8216;king&#8217;: it will match the following phrases:</p>
<ul>
<li>&#8220;I&#8217;m the king of the world!&#8221;</li>
<li>&#8220;King salmon are often found in the Pacific Northwest and are also known as Chinook salmon.&#8221;</li>
<li>&#8220;Yes, my king.&#8221;</li>
</ul>
<p>Using a length-based relevance, the phrase &#8220;Yes, my king.&#8221; is the most relevant: it is the shortest of the three, so the search term makes up the largest share of it.</p>
<p>Getting back to the Palantir object model: for each distinct set of permissions that an object has, we compute a different object label based on the properties that are visible to that particular slice.  These multiple labels all go into the search engine.  Suppose we computed relevance from the length of the matched phrase, and the shortest matching label on the object were shorter than the label actually visible to us: we would return the object with a higher relevance than its visible label justifies.  If we were to do that, we&#8217;d be leaking information, namely that there&#8217;s data on this object that the user making the query is not privy to. (Note that objects that aren&#8217;t visible to the user at all are filtered out in a higher layer, after the search engine has accumulated and ranked the results.)</p>
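<p>A toy version of length-based relevance makes the example concrete. The scoring rule (number of query tokens divided by number of tokens in the matched text) and the simple letters-and-apostrophes tokenizer are assumptions for illustration; production engines such as Lucene use smoother length normalization.</p>

```java
import java.util.ArrayList;
import java.util.List;

final class LengthRelevance {
    /** Lower-cases text and splits on anything other than letters and apostrophes (an assumption). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Query tokens divided by matched-text tokens; 0 when the text lacks a query token. */
    static double score(String query, String text) {
        List<String> q = tokenize(query);
        List<String> t = tokenize(text);
        if (q.isEmpty() || !t.containsAll(q)) {
            return 0.0;
        }
        return (double) q.size() / t.size();
    }

    /** Returns the highest-scoring text, or null if nothing matches. */
    static String best(String query, List<String> texts) {
        String best = null;
        double bestScore = 0.0;
        for (String text : texts) {
            double s = score(query, text);
            if (s > bestScore) {
                bestScore = s;
                best = text;
            }
        }
        return best;
    }
}
```

<p>For &#8216;king&#8217;, the three phrases score 1/6, 1/16, and 1/3 respectively, so <code>best</code> picks &#8220;Yes, my king.&#8221;</p>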
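<p>The leak can be reproduced with a small sketch. It assumes each label carries a single required group and uses a 1 / (token count) length score; <code>SliceLabel</code> and <code>LabelLeakDemo</code> are hypothetical names, not Palantir code. The &#8220;leaky&#8221; score considers every label, while the &#8220;visible&#8221; score considers only what the user may see.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical per-permission-slice label of an object. */
final class SliceLabel {
    final String requiredGroup; // group a user must belong to in order to see this label
    final String text;

    SliceLabel(String requiredGroup, String text) {
        this.requiredGroup = requiredGroup;
        this.text = text;
    }
}

final class LabelLeakDemo {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z']+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Length-based score of one label: 1 / token count if it contains the term, else 0. */
    static double labelScore(SliceLabel label, String term) {
        List<String> toks = tokens(label.text);
        return toks.contains(term) ? 1.0 / toks.size() : 0.0;
    }

    /** Leaky: considers every label, including ones the user cannot see. */
    static double leakyScore(List<SliceLabel> labels, String term) {
        double best = 0.0;
        for (SliceLabel label : labels) {
            best = Math.max(best, labelScore(label, term));
        }
        return best;
    }

    /** The score that the user's visible labels alone justify. */
    static double visibleScore(List<SliceLabel> labels, String term, Set<String> userGroups) {
        double best = 0.0;
        for (SliceLabel label : labels) {
            if (userGroups.contains(label.requiredGroup)) {
                best = Math.max(best, labelScore(label, term));
            }
        }
        return best;
    }
}
```

<p>With a visible five-word label and a hidden one-word label, a &#8220;public&#8221; user&#8217;s object would score 1.0 instead of 0.2, which is exactly the hint about hidden data described above.</p>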
<p>Given this problem, there are two approaches one can take:</p>
<ol>
<li>Store all the information needed to decide which labels are visible to the user running the query, and then use only the visible labels when calculating the relevance of a match. Note that this is a fairly expensive operation.</li>
<li>Don&#8217;t use the length of the match to compute relevance. Note that skipping a relevance calculation is, obviously, a very cheap thing to do.</li>
</ol>
<p>Which do we do?  Both.</p>
<p>When matching against object labels, the length metric actually lets us discern between better and worse matches. So in that case, we incur the cost of this calculation in order to return higher quality results.</p>
<p>However, when matching against fields like document bodies, the ratio of the size of the match to the size of the search term carries much less meaning, yet it can still leak information through the query results.  For fields like this, we turn off the relevance calculations based on length of match.  The upshot is that, for these fields, we don&#8217;t have to store the permissions information in the index, nor incur the cost of the permissions/views calculation.</p>
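<p>The per-field split can be sketched as two scoring paths. Everything here is a hypothetical illustration: label fields pay for a visibility check and then use a length-based score, while body fields return the same constant for any match and never consult permissions. (Lucene can disable length norms per field through its <code>omitNorms</code> option; this sketch does not use Lucene.)</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical field value tagged with the group allowed to see it. */
final class FieldValue {
    final String group;
    final String text;

    FieldValue(String group, String text) {
        this.group = group;
        this.text = text;
    }
}

final class HybridRelevance {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z']+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Label fields (approach 1): check visibility per label, then score by length. */
    static double labelScore(List<FieldValue> labels, String term, Set<String> userGroups) {
        double best = 0.0;
        for (FieldValue label : labels) {
            if (!userGroups.contains(label.group)) {
                continue; // the expensive part: permission data must be available here
            }
            List<String> toks = tokens(label.text);
            if (toks.contains(term)) {
                best = Math.max(best, 1.0 / toks.size());
            }
        }
        return best;
    }

    /** Body fields (approach 2): no length norm and no permission lookup. */
    static double bodyScore(List<String> bodies, String term) {
        for (String body : bodies) {
            if (tokens(body).contains(term)) {
                return 1.0; // constant, whatever the body's length
            }
        }
        return 0.0;
    }
}
```

<p>Because every body match scores the same, a hidden body of a different length cannot shift the score, so no permission data is needed on that path.</p>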
<h2>A heartfelt thank you</h2>
<p>To be clear, this post highlights the ways in which our search code diverges from the main <a href="http://lucene.apache.org/java/docs/">Lucene</a> code base.  We&#8217;re huge fans of Lucene and have great respect for the developers that built and maintain what is probably the world&#8217;s greatest open-source search engine.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/10/27/palantir-search-with-a-twist-part-two-realtime-indexing-and-security/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Palantir: search with a twist (part one: memory efficiency)</title>
		<link>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/</link>
		<comments>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 07:53:59 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[software engineering]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1088</guid>
		<description><![CDATA[A Palantir cluster seamlessly integrates many pieces of proven technology. One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px'><img src='/wp-content/uploads/2009/08/200px-magnifying_glass_icon.png' alt='magnifying glass'/></div>
<p>A Palantir cluster seamlessly integrates many pieces of proven technology.  One of them is our customized version of the venerable Java search engine, <a href="http://lucene.apache.org/java/docs/">Lucene</a>. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and many, many documents as results. We want to leverage the <a href="http://en.wikipedia.org/wiki/Inverted_index">inverted index</a> capabilities of Lucene, but our data access patterns differ from the typical use case: we need things like pervasive range-querying, different types of relevance, and dynamic views of the data based on security constraints. So in building our data platform, we&#8217;ve run into some interesting challenges that are fairly unusual in the information retrieval realm, specifically:</p>
<ol>
<li>Raising memory efficiency</li>
<li>Real-time indexing</li>
<li>Preventing information leaks across access boundaries in an efficient manner</li>
</ol>
<p>I&#8217;ll cover (1) in this post and (2) and (3) in a <a href="https://wp-admin-techblog.yojoe.local/2009/10/27/palantir-search-with-a-twist-part-two-realtime-indexing-and-security/">later post</a>, due out in about two weeks. <i>(Note: part 2 is available <a href="https://wp-admin-techblog.yojoe.local/2009/10/27/palantir-search-with-a-twist-part-two-realtime-indexing-and-security/">here</a>)</i></p>
<p><span id="more-1088"></span></p>
<h2>Raising memory efficiency</h2>
<p>We&#8217;ve addressed the issue of resource constraints, generally, in our earlier post: <a href="http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/"><em>Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.</em></a> In that post, we posited &#8220;RAM to the rescue&#8221;:</p>
<blockquote><p>
On the other hand, some things in a SCIF are comparatively cheap. We never use boxes with less than 32GB of memory, and, in fact, lots of sites use 128GB of memory. RAM requires negligible power and cooling, and compared to disk, it’s relatively simple to install. It’s also easy to reconfigure the setup to use the additional memory.</p></blockquote>
<p>While this is true, no matter how much RAM you buy, your users will find a way to use it all, and search is no exception.  In many of our environments, the search processes share hardware with other processes in the Palantir cluster, so while the OS may have 128 GB of RAM available, the search process&#8217;s JVM has substantially less available to it. Contrast this with a cluster of dedicated search nodes, where each node&#8217;s indexes are sized to fit into the memory available on that node.</p>
<p>The upshot is that we needed to modify parts of <a href="http://lucene.apache.org/java/docs/index.html">Lucene</a> to deal with tighter memory constraints than it was designed for.</p>
<h3>Priority queue results accumulation</h3>
<p>Most systems that implement search include some notion of paging through the results.  We use a multi-level paging system: the search server maintains a server-side page for each query and serves smaller client-facing pages from it.</p>
<p>Vanilla Lucene uses the following algorithm for accumulating search results:</p>
<ol>
<li>Load all matching results.</li>
<li>Sort by some relevance metric(s).</li>
<li>Return the top <i>n</i> results.</li>
</ol>
<p>The results are cached as a server-side page in case the client wants to load more than the first <em>n</em> results. You can see where this could run into trouble: if the total number of matching documents is high, that&#8217;s a lot of wasted RAM while we winnow it down to the size of the server page. So we use the following algorithm:</p>
<ol>
<li>Construct a <a href="http://en.wikipedia.org/wiki/Priority_queue">priority queue</a> of constrained size with priority computed using the chosen relevance metric</li>
<li>Stream through the results, inserting into the queue</li>
<li>Return the set of results in the priority queue</li>
</ol>
<p>Now we never need more RAM than the size of a server-side page to serve results.  The downside is that if the client wants more than one server-side page, we have to run the search twice, in its entirety (ouch). To avoid returning the first page&#8217;s results a second time, we adjust the priority queue to kick out, based on the relevance metric, every result that belonged to the first page.</p>
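<p>A minimal sketch of the bounded priority queue and the second-page pass, using <code>java.util.PriorityQueue</code>. The <code>Hit</code> and <code>BoundedCollector</code> names are hypothetical, as is the tie-break on document id; skipping everything ranked at or above the previous page&#8217;s last hit is one way to realize the &#8220;kick out the first page&#8221; rule, not necessarily the exact one used.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** A scored search hit (hypothetical). */
final class Hit {
    final int docId;
    final double score;

    Hit(int docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

final class BoundedCollector {
    /** Ranking: higher score first; ties broken by lower doc id so paging is deterministic. */
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
                  .thenComparingInt(h -> h.docId);

    /**
     * Streams every hit once, keeping at most pageSize of them in memory.
     * after is the last hit of the previous server-side page, or null for the first page;
     * anything ranked at or above it was already served and is skipped.
     */
    static List<Hit> collectPage(Iterable<Hit> hits, int pageSize, Hit after) {
        List<Hit> page = new ArrayList<>();
        if (pageSize <= 0) {
            return page;
        }
        // Reversed ordering: the head of this queue is the worst hit currently kept.
        PriorityQueue<Hit> queue = new PriorityQueue<>(pageSize, BEST_FIRST.reversed());
        for (Hit h : hits) {
            if (after != null && BEST_FIRST.compare(h, after) <= 0) {
                continue; // belonged to an earlier page
            }
            if (queue.size() < pageSize) {
                queue.offer(h);
            } else if (BEST_FIRST.compare(h, queue.peek()) < 0) {
                queue.poll(); // evict the worst kept hit
                queue.offer(h);
            }
        }
        page.addAll(queue);
        page.sort(BEST_FIRST);
        return page;
    }
}
```

<p>Each call streams the full result set, which is the repeated-search cost described above, but memory stays proportional to the page size rather than to the number of matches.</p>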
<h3>Using bitsets to optimize range queries</h3>
<p>A range query can return a result set of very high cardinality &ndash; a range is a very compact way of describing a large set of matching terms (even if they are discrete values, like dates).  One way to think about a range query of, say, <em>10 <= age <= 15</em>, is that it expands to <em>age = 10 OR age = 11 OR age = 12 OR age = 13 OR age = 14 OR age = 15</em>.  Rather than treat range queries in any special way, Lucene just does this expansion of the range and runs the query like a normal query.</p>
<div style='float: right; text-align: right; width: 315px; margin-top: 10px; margin-bottom: 10px;'><img src='/wp-content/uploads/2009/08/searchindexes1.png'/></div>
<p>Internally, for each term Lucene stores a list of metadata nodes, ordered by document id, one for each document that contains the term.  The algorithm goes something like this:</p>
<ol>
<li>Open the document id lists for all matching terms</li>
<li>Walk the list pointers for each potential match such that you accumulate all the metadata for a given document.</li>
<li>Pass all this metadata up to the query processor which decides:
<ol>
<li>Does this document match the overall query? (remember that terms can be inverted)</li>
<li>Use term frequency taken from the metadata to calculate the relevance.</li>
</ol>
</ol>
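<p>A simplified, hypothetical rendering of this walk for an OR query (such as an expanded range) is shown below. <code>Posting</code> and <code>DocAtATime</code> are illustrative names, and summing term frequencies stands in for relevance; Lucene&#8217;s real scorers are more involved. Note that every list keeps an open cursor for the whole scan, and choosing the next document costs a pass over all cursors.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One posting: a document id plus per-document metadata (here just term frequency). */
final class Posting {
    final int docId;
    final int termFreq;

    Posting(int docId, int termFreq) {
        this.docId = docId;
        this.termFreq = termFreq;
    }
}

final class DocAtATime {
    /**
     * OR query over posting lists sorted by doc id, all open at once.
     * Returns doc id -> summed term frequency, in doc id order.
     */
    static Map<Integer, Integer> orQuery(List<List<Posting>> lists) {
        int[] cursor = new int[lists.size()];
        Map<Integer, Integer> result = new LinkedHashMap<>();
        while (true) {
            // O(number of terms) scan just to find which document comes next.
            boolean any = false;
            int next = 0;
            for (int i = 0; i < lists.size(); i++) {
                if (cursor[i] < lists.get(i).size()) {
                    int d = lists.get(i).get(cursor[i]).docId;
                    if (!any || d < next) {
                        next = d;
                        any = true;
                    }
                }
            }
            if (!any) {
                return result;
            }
            // Gather all metadata for this document, advancing every cursor that points at it.
            int freqSum = 0;
            for (int i = 0; i < lists.size(); i++) {
                List<Posting> list = lists.get(i);
                if (cursor[i] < list.size() && list.get(cursor[i]).docId == next) {
                    freqSum += list.get(cursor[i]).termFreq;
                    cursor[i]++;
                }
            }
            result.put(next, freqSum);
        }
    }
}
```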
<p>This structure and attendant algorithm has some nice properties:</p>
<ul>
<li>All documents are processed in a set order.</li>
<li>Everything is known about a document all at once.</li>
<li>It terminates in a single linear scan.</li>
</ul>
<p>&#8230; and has one very nasty property:</p>
<ul>
<li>All of the term value buckets that match the range must be open simultaneously.</li>
</ul>
<p>This is not a big deal for most English language queries.  However, for large ranges and the like, there can be thousands or even millions of terms.</p>
<p>The semantics of range queries have an interesting feature: a document that matches the range twice is not more relevant than one that matches once. (Contrast this with a simple term query: multiple matches <b>do</b> indicate higher relevance.) Being able to discard the accounting of how many times we match the range leads to a huge win:</p>
<ol>
<li>We only need a single bit to represent a match</li>
<li>We can process a single term value bucket at a time instead of holding all buckets open in memory.</li>
</ol>
<p>Our search engine accumulates range queries into bitset objects, allowing for a very compact representation of results. We need much less memory than we did before, since we load only one term value bucket at a time.  And the algorithm is simpler: no more walking pointers, and no <em>O(n)</em> check to figure out which pointer moves next.</p>
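<p>The term-at-a-time alternative can be sketched with <code>java.util.BitSet</code>. The <code>TreeMap</code> from term value to document ids is a hypothetical in-memory stand-in for the term dictionary and postings, and <code>RangeBitsetQuery</code> is an illustrative name; the point is that only one bucket is visited at a time and a repeated match simply sets a bit that is already set.</p>

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

final class RangeBitsetQuery {
    /**
     * termIndex: term value -> ids of documents containing that value
     * (hypothetical stand-in for the term dictionary and postings).
     * Returns the documents whose value lies in [lo, hi], one bit per document.
     */
    static BitSet rangeQuery(TreeMap<Integer, int[]> termIndex, int lo, int hi, int maxDoc) {
        BitSet matches = new BitSet(maxDoc);
        if (lo > hi) {
            return matches;
        }
        // One term value bucket at a time; nothing else needs to be open.
        for (Map.Entry<Integer, int[]> bucket : termIndex.subMap(lo, true, hi, true).entrySet()) {
            for (int docId : bucket.getValue()) {
                matches.set(docId); // a second match sets the same bit: no counting needed
            }
        }
        return matches;
    }
}
```

<p>For ages 10 through 15 over buckets 9:{0}, 10:{1,4}, 11:{2,4}, 15:{3}, and 16:{5}, the result is the bitset {1, 2, 3, 4}; document 4 matches twice but occupies a single bit.</p>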
<h2>The next episode</h2>
<p>Tune in for <em>Palantir: search with a twist (part two)</em> in a few weeks.  I&#8217;ll cover the following topics:</p>
<ul>
<li>Real-time indexing</li>
<li>Preventing information leaks across access boundaries in an efficient manner. (See Jason&#8217;s <a href='http://www.palantirtech.com/government/analysis-blog/mls'>Multi-Level Security</a> post over on the <a href="http://www.palantirtech.com/government/analysis-blog/">Palantir Government Analysis Blog</a> for a high-level look at why these features are important, and check out <a href="http://www.palantirtech.com/government/videos/whitevideos">Bob McGrew&#8217;s &#8220;Access Control Model&#8221; White Video</a> for an in-depth look at how we apply security to our object model.)
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.</title>
		<link>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/</link>
		<comments>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/#comments</comments>
		<pubDate>Sat, 23 May 2009 01:00:26 +0000</pubDate>
		<dc:creator>Bob McGrew</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=961</guid>
		<description><![CDATA[At Palantir, we work in Silicon Valley, read High Scalability, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that &#8220;Disk [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px'><img src='/wp-content/uploads/2009/05/ctu-clearance.jpg' alt='fake clearance screen'/></div>
<p>At Palantir, we work in Silicon Valley, read <a href="http://highscalability.com/">High Scalability</a>, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that &#8220;Disk is cheap.&#8221; or &#8220;CPU is cheap&#8221;. For a web company with a deployment in a commercial data center (or its own data center), this received knowledge is correct.  But for a company that ships distributed systems instead of hosting them, and for whom the deployment environment is the kind of locked-down server room in which classified data can reside, these assumptions couldn’t be more false.</p>
<p>At Palantir, we are almost never able to host our customers’ data – typically, as the data is very sensitive, we are not even allowed to see it!  Our customers&#8217; highly sensitive data has to reside in a <a href='http://en.wikipedia.org/wiki/Sensitive_Compartmented_Information_Facility'>Sensitive Compartmented Information Facility</a>, or SCIF – a facility built to resist attempts to access the information within, whether through active or passive measures.  The network inside a SCIF is physically separated (“airgapped”) from the public Internet to prevent information leakage.  Because the entire rationale for such facilities is to prevent information leakage, moving information into or out of one is a tightly regulated process that almost always requires a human in the loop.<br />
<span id="more-961"></span></p>
<h3>Bandwidth is narrow</h3>
<p>Bandwidth in and out of a data center is cheap. Bandwidth in and out of a SCIF is not &#8211; and this manifests in surprising ways. What does it take to get data into a SCIF? First, the data has to be downloaded from wherever it&#8217;s hosted and burned to a CD. Then, someone has to carry it into the SCIF and find a security officer to approve adding it to the network. Finding the security officer can take anywhere from 10 minutes to an entire day. Once you&#8217;ve found the security officer, he has to run a virus scan on the CD, which proceeds at roughly 20 minutes per 100MB.</p>
<p>If you look at the entire process, you can model our connection into the SCIF as averaging about an 8 hour latency and 640 Kbps bandwidth. That&#8217;s about the bandwidth of a slow DSL line and the latency of a radio connection to Pluto. (Actually, it’s somewhat slower.) There&#8217;s also a big non-linearity at 700MB, which is the amount of data that fits on a single CD.  For instance, this non-linearity is the big reason why we prefer to send patches to our customers rather than full distributions, which are slightly less than a gigabyte including dependencies – and thus why it’s worth it to us to build a system for automating patch application rather than simply replacing jar files by hand.</p>
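<p>The link model above can be turned into a back-of-the-envelope calculator. Assumptions: decimal units (1 MB = 8,000 kbit), the 8-hour latency added once per delivery, and the names <code>ScifLink</code>, <code>deliveryHours</code>, and <code>discsNeeded</code> are hypothetical. As a sanity check, the virus-scan rate alone (20 minutes per 100MB) works out to about 667 kbit/s, close to the 640 Kbps figure.</p>

```java
/** Hypothetical back-of-the-envelope model of the SCIF "link" described in the post. */
final class ScifLink {
    static final double LATENCY_HOURS = 8.0;    // from the post
    static final double BANDWIDTH_KBPS = 640.0; // from the post
    static final double CD_CAPACITY_MB = 700.0; // from the post

    /** Throughput in kbit/s of a process handling `megabytes` in `minutes` (decimal units). */
    static double scanRateKbps(double megabytes, double minutes) {
        return megabytes * 8_000.0 / (minutes * 60.0);
    }

    /** The non-linearity: every 700 MB of payload needs another disc. */
    static int discsNeeded(double megabytes) {
        return (int) Math.ceil(megabytes / CD_CAPACITY_MB);
    }

    /** Rough end-to-end hours: fixed latency plus payload divided by bandwidth. */
    static double deliveryHours(double megabytes) {
        double seconds = megabytes * 8_000.0 / BANDWIDTH_KBPS;
        return LATENCY_HOURS + seconds / 3600.0;
    }
}
```

<p>Under this model a full 700 MB disc takes about 10.4 hours end to end, and anything larger, such as a near-gigabyte distribution, needs a second disc.</p>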
<h3>Disks are expensive</h3>
<p>Similarly, if you are running a data warehouse, disk is cheap. You can buy a 1 TB, 7200 RPM disk for about $100, which is perfect for the kind of large, serial reads or writes that a data warehousing workflow requires. However, Palantir uses disk for our database and our search engine, both of which have an <a href='http://en.wikipedia.org/wiki/OLTP'>OLTP</a>-style usage pattern.  As opposed to a data warehouse access pattern, which emphasizes full table scans, OLTP emphasizes random access and therefore requires fast disk. To get 1TB at 15k RPMs costs about $1000, and requires a disk array rather than a single disk. In order to keep the disk fast, you also want to leave it only about 20% full, which overall makes fast disk about 50 times more expensive than slow disk. Most importantly, however, installing a disk array requires trained personnel, a special approval process, and reconfiguring the system to use the new disks, which is a fairly complicated and error-prone process.</p>
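<p>The &#8220;about 50 times&#8221; figure follows from the numbers above, assuming the slow disk can be used at full capacity while the fast array is kept about 20% full:</p>

```latex
\frac{\$1000 \,/\, (0.2 \times 1\,\mathrm{TB})}{\$100 \,/\, 1\,\mathrm{TB}}
  = \frac{\$5000/\mathrm{TB}}{\$100/\mathrm{TB}} = 50
```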
<h3>CPUs are hot</h3>
<p>Finally, in a commercial data center, CPU is the cheapest resource of all. In a secure server room, however, it can be quite expensive. Each CPU or additional box requires more power and cooling. If the room is nearly full, adding that extra box may require building out an entirely new server room, which can take months and cost hundreds of thousands of dollars, even in an ordinary office building. Building a server room in a SCIF is far more expensive and prohibitively time-consuming.</p>
<h3>RAM to the rescue</h3>
<p>On the other hand, some things in a SCIF are comparatively cheap. We never use boxes with less than 32GB of memory, and, in fact, lots of sites use 128GB of memory. RAM requires negligible power and cooling, and compared to disk, it&#8217;s relatively simple to install. It&#8217;s also easy to reconfigure the setup to use the additional memory.</p>
<h3>The upshot</h3>
<p>The design guidelines that follow from this are simple: <b>build a system that is as autonomous as possible and scales down as well as it scales out</b>.</p>
<p>All these statistics are compiled from our day-to-day experiences in the office environment of a SCIF. Deploying to soldiers in the field makes the issues involved in deploying to a SCIF seem minor. Of course, that’s what makes what we do fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oracle&#8217;s JDBC driver + prefetch == garbage [collection]</title>
		<link>http://blog.palantirtech.com/2007/02/23/oracles-jdbc-driver-garbage/</link>
		<comments>http://blog.palantirtech.com/2007/02/23/oracles-jdbc-driver-garbage/#comments</comments>
		<pubDate>Sat, 24 Feb 2007 02:50:31 +0000</pubDate>
		<dc:creator>Ryan Porter</dc:creator>
				<category><![CDATA[javatech]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/02/23/oracles-jdbc-driver-garbage/</guid>
		<description><![CDATA[The Problem Recently, we were experiencing major performance problems with loading documents from the database. Profiling did not isolate a single cause; everything (including unrelated, background operations) seemed slow. So, we started logging garbage collection, and found that we were collecting garbage at a rate of 20GB/min! Profiling revealed that the worst offender, by far, [...]]]></description>
			<content:encoded><![CDATA[<h3>The Problem</h3>
<p>Recently, we were experiencing major performance problems with loading documents from the database.  Profiling did not isolate a single cause; everything (including unrelated, background operations) seemed slow.    So, we started logging garbage collection, and found that we were collecting garbage at a rate of 20GB/min!</p>
<p>Profiling revealed that the worst offender, by far, was <code>OracleStatement.prepareAccessors()</code>.   Interestingly, it only caused a problem when our result set included a <a href="http://www.lc.leidenuniv.nl/awcourse/oracle/java.920/a96654/oralob.htm">LOB</a>.     For such queries, it allocates a 1MB object, regardless of whether the query returned any results at all.</p>
<p>Google searches revealed <a href="http://forums.oracle.com/forums/thread.jspa?messageID=632470">others</a> who saw <a href="http://forums.oracle.com/forums/thread.jspa?messageID=850689">similar problems</a> when accessing LOBs, but no solutions other than upgrading or changing drivers.  We were already using the latest Oracle JDBC driver, and reverting to earlier drivers did not help.  Switching drivers did solve the problem; however, pushing the change to production would require extensive testing to ensure that we were not trading in one problem for another (or more).</p>
<p>I was about to start conducting these tests when John discovered that we were setting the OracleConnection parameter &#8220;defaultRowPrefetch&#8221; to 1000.  This parameter determines how many rows are pulled back from the database on each round-trip, and increasing this value from its default of 10 will normally yield a <a href="http://dba-services.berkeley.edu/docs/oracle/manual-10gR2/java.102/b14355/oraperf.htm#i1043756">performance gain</a>.  As an experiment, I set the value to 1, and re-profiled memory allocation.  The amount of memory allocated by OracleStatement.prepareAccessors() decreased by about three orders of magnitude.  Thus, it appears that when a query can return a LOB, Oracle&#8217;s JDBC driver allocates approximately &#8220;rowPrefetch&#8221; KB of memory, even when zero rows are returned.</p>
<h3>The Solution</h3>
<p>Returning the &#8220;defaultRowPrefetch&#8221; parameter to 10 did rid us of our garbage collection problems.  However, because this is a global setting, reverting it reduced the performance of many other queries which returned many rows with no LOBs.  The prospect of setting &#8220;rowPrefetch&#8221; on a per-query basis was unappetizing, to say the least, but the performance loss was significant.  In the end, we altered how we retrieve rows from the database so that the fetch size geometrically increases as we pull results back from the database.  </p>
<p>Specifically, the first batch we retrieve contains at most 10 rows, after which we increase the batch size to 20.  Once we&#8217;ve retrieved 20 more rows, we increase the fetch size to 40, and so on.  In this way, we never allocate large amounts of memory for queries which return few (or no) results, but we still quickly ramp up to a large fetch size.  </p>
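<p>The ramp-up can be sketched with standard JDBC calls. <code>Statement.setFetchSize</code> and <code>ResultSet.setFetchSize</code> are part of JDBC, but the specification treats them as hints, so whether a given driver re-sizes its round trips exactly at these boundaries is an assumption. The class name, the <code>RowMapper</code> callback, and the 1000-row cap are hypothetical.</p>

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the fetch size starts small and doubles, as described above. */
final class GeometricFetch {
    static final int INITIAL_FETCH_SIZE = 10; // first batch of at most 10 rows
    static final int MAX_FETCH_SIZE = 1000;   // assumption: cap at the old defaultRowPrefetch value

    /** Maps the current row of a ResultSet to a value (hypothetical callback type). */
    interface RowMapper<T> {
        T map(ResultSet rs) throws SQLException;
    }

    /** Pure policy: rows retrieved per round trip when reading totalRows rows. */
    static List<Integer> batchSizes(int totalRows) {
        List<Integer> sizes = new ArrayList<>();
        int fetchSize = INITIAL_FETCH_SIZE;
        int remaining = totalRows;
        while (remaining > 0) {
            int batch = Math.min(fetchSize, remaining);
            sizes.add(batch);
            remaining -= batch;
            fetchSize = Math.min(fetchSize * 2, MAX_FETCH_SIZE);
        }
        return sizes;
    }

    /** Reads every row, doubling the fetch-size hint after each full batch. */
    static <T> List<T> readAll(PreparedStatement ps, RowMapper<T> mapper) throws SQLException {
        int fetchSize = INITIAL_FETCH_SIZE;
        ps.setFetchSize(fetchSize); // applies to the first round trip
        List<T> rows = new ArrayList<>();
        try (ResultSet rs = ps.executeQuery()) {
            int leftInBatch = fetchSize;
            while (rs.next()) {
                rows.add(mapper.map(rs));
                if (--leftInBatch == 0) {
                    fetchSize = Math.min(fetchSize * 2, MAX_FETCH_SIZE);
                    rs.setFetchSize(fetchSize); // a hint for later round trips
                    leftInBatch = fetchSize;
                }
            }
        }
        return rows;
    }
}
```

<p>For a 100-row result, <code>batchSizes(100)</code> yields 10, 20, 40, and 30 rows per round trip, so a query that returns nothing never asks for more than 10 rows&#8217; worth of buffers.</p>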
<p>For large queries which returned no LOBs, this solution is still slower than when &#8220;defaultRowPrefetch&#8221; was 1000.    However, the slowdown on those queries was minor, overall system performance was substantially improved, and, importantly, the changes did not require any per-query tuning.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/02/23/oracles-jdbc-driver-garbage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

