<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; distributed systems</title>
	<atom:link href="http:///category/distributed-systems/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Tech Talk: the Hedgehog Programming Language</title>
		<link>http://blog.palantirtech.com/2011/06/06/tech-talk-the-hedgehog-programming-language/</link>
		<comments>http://blog.palantirtech.com/2011/06/06/tech-talk-the-hedgehog-programming-language/#comments</comments>
		<pubDate>Mon, 06 Jun 2011 20:53:38 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1844</guid>
		<description><![CDATA[A few months back, Kevin introduced us to the Hedgehog Programming language &#8211; (here&#8217;s the post if you missed it). The Palantir Finance programming language — Hedgehog as we know it — is an interpreted, statically typed, object-oriented language. With a syntax that’s based loosely on Java, it mixes roughly Java-style semantics and a few [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 300px; margin-bottom: 15px; margin-left: 15px'><a target='new' href='http://www.pfinance.com/'><img src="http://blog.palantir.com/wp-content/uploads/2010/10/hedgehog.jpg" alt="" title="hedgehog" width="300" height="129" class="alignnone size-medium wp-image-1753" /></a></div>
<p>A few months back, Kevin introduced us to the Hedgehog Programming language &#8211; <a href="http://www.youtube.com/watch?v=54Vv3Os3Ep4">(here&#8217;s the post if you missed it)</a>.  </p>
<p>The Palantir Finance programming language — Hedgehog as we know it — is an interpreted, statically typed, object-oriented language. With a syntax that’s based loosely on Java, it mixes roughly Java-style semantics and a few idiosyncrasies that make it a really interesting case study in language design. It’s built to be extremely efficient for batch operations on time series, which is the heavy lifting in financial analysis.</p>
<p>In this video, Eugene and Dave, two of the engineers who work on the language and the platform features needed to support it, give a talk that covers a number of areas around the Hedgehog language, including why we needed to build a language, how it makes the platform more powerful, how we built dev tools into the UI to make debugging easier, and a bunch of the nitty-gritty features that go into the strange (but fitting) beast that is the Hedgehog Language.</p>
<p><iframe title="YouTube video player" width="640" height="510" src="http://www.youtube.com/embed/54Vv3Os3Ep4" frameborder="0" allowfullscreen></iframe></p>
<p>As a final note: this is one of the things that I love about working at Palantir Technologies.  We study a problem pretty hard before we decide that we need to re-invent the wheel &#8211; and then when we do, we go all out.  It&#8217;s one of the benefits of working with the incredibly talented and motivated folks here.  When someone says, &#8220;well, we need to build a programming language.  No, we&#8217;re sure,&#8221; we just roll up our sleeves and do it.  We can add it to the list that already includes a <a href="http://blog.palantir.com/2009/02/23/palantir-monitoring-server-where-build-beats-buy/">JMX monitoring system</a>, a <a href="http://blog.palantir.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/">refined Lucene search engine</a>, <a href="http://blog.palantir.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/">speeding up Map-Reduce-like systems to interactive time</a>, and <a href="http://www.palantirtech.com/government/analysis-blog/isr">implementing our own GIS platform</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/06/06/tech-talk-the-hedgehog-programming-language/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inside Horizon: interactive analysis at cloud scale</title>
		<link>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/</link>
		<comments>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 19:04:46 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">https://wp-admin-techblog.yojoe.local/?p=1837</guid>
		<description><![CDATA[Late last year, we were honored to be invited to talk at Reflections&#124;Projections, ACM@UIUC&#8217;s annual student-run computing conference. We decided to bring a talk about Horizon, our system for doing aggregate analysis and filtering across very large amounts of data. The video of the talk was posted a few weeks back on the conference website. [...]]]></description>
			<content:encoded><![CDATA[<div style='width: 250px; margin-left: 10px; margin-bottom: 10px; float: right;'><a href="http://www.acm.uiuc.edu/conference/2010/"><img src="http://blog.palantir.com/wp-content/uploads/2011/03/reflectionsprojections.png" alt="" title="reflectionsprojections" width="250" height="215"/></a></div>
<p>Late last year, we were honored to be invited to talk at Reflections|Projections, ACM@UIUC&#8217;s annual student-run computing conference.  We decided to bring a talk about Horizon, our system for doing aggregate analysis and filtering across very large amounts of data.  The video of the talk was posted a few weeks back on <a href="http://www.acm.uiuc.edu/Conferenceware/Schedule/Videos">the conference website</a>.</p>
<p>Horizon started as a research project and technology demonstrator built as part of Palantir&#8217;s Hack Week &#8211; a periodic innovation sprint that our engineering team uses to build brand new ideas from whole cloth.  It was then used by the Center For Public Integrity in their <a href="http://www.publicintegrity.org/investigations/economic_meltdown/">Who&#8217;s Behind The Subprime Meltdown</a> report.  We produced a short video on the subject, <a href="http://www.palantirtech.com/government/analysis-blog/horizon">Beyond the Cloud: Project Horizon</a>, released on our analysis blog.  Subsequently, it was folded into our product offering, under the name <a href="http://www.palantirtech.com/labs/object-explorer">Object Explorer</a>.</p>
<p>In this hour-long talk, two of the engineers who built this technology tell the story of how Horizon came to be, explain how it works, and show a live demo of doing analysis on hundreds of millions of records in interactive time.</p>
<p><iframe title="YouTube video player" width="640" height="510" src="http://www.youtube.com/embed/9dOpDeRMTMc" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2011/04/15/inside-horizon-interactive-analysis-at-cloud-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.</title>
		<link>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/</link>
		<comments>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/#comments</comments>
		<pubDate>Sat, 23 May 2009 01:00:26 +0000</pubDate>
		<dc:creator>Bob McGrew</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=961</guid>
		<description><![CDATA[At Palantir, we work in Silicon Valley, read High Scalability, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that &#8220;Disk [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px'><img src='/wp-content/uploads/2009/05/ctu-clearance.jpg' alt='fake clearance screen'/></div>
<p>At Palantir, we work in Silicon Valley, read <a href="http://highscalability.com/">High Scalability</a>, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading: when discussing a design decision, it’s received knowledge that &#8220;disk is cheap&#8221; or &#8220;CPU is cheap.&#8221; For a web company with a deployment in a commercial data center (or its own data center), this received knowledge is correct.  But for a company that ships distributed systems instead of hosting them, and whose deployment environment is the kind of locked-down server room in which classified data can reside, these assumptions couldn’t be more wrong.</p>
<p>At Palantir, we are almost never able to host our customers’ data – typically, as the data is very sensitive, we are not even allowed to see it!  Our customers&#8217; highly sensitive data has to reside in a <a href='http://en.wikipedia.org/wiki/Sensitive_Compartmented_Information_Facility'>Sensitive Compartmented Information Facility</a> or SCIF – a building which has been built to be resistant to attempts to access the information within, whether through active or passive measures.  The network inside a SCIF is physically separated – “airgapped” &#8211; from the public Internet to prevent information leakage.  As the entire rationale for such facilities is to prevent information leakage, moving information into or out of one is a tightly regulated process, almost always requiring a human to be in the loop.<br />
<span id="more-961"></span></p>
<h3>Bandwidth is narrow</h3>
<p>Bandwidth in and out of a data center is cheap. Bandwidth in and out of a SCIF is not &#8211; and this manifests in surprising ways. What does it take to get data into a SCIF? First, the data has to be downloaded from wherever it&#8217;s hosted and burned to a CD. Then, someone has to carry it into the SCIF and find a security officer to approve adding it to the network. Finding the security officer can take anywhere from 10 minutes to an entire day. Once found, the security officer has to run a virus scan on the CD, which proceeds at roughly 20 minutes per 100 MB.</p>
<p>If you look at the entire process, you can model our connection into the SCIF as averaging about an 8-hour latency and 640 Kbps bandwidth. That&#8217;s about the bandwidth of a slow DSL line and the latency of a radio connection to Pluto. (Actually, it’s somewhat slower.) There&#8217;s also a big non-linearity at 700 MB, which is the amount of data that fits on a single CD.  This non-linearity is the big reason why we prefer to send patches to our customers rather than full distributions, which are slightly less than a gigabyte including dependencies – and thus why it’s worth it to us to build a system for automating patch application rather than simply replacing jar files by hand.</p>
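<p>To make the model concrete, here is a minimal Python sketch of the arithmetic. The constants come from the figures above; the function names are ours for illustration, and the sketch assumes decimal units and that a batch of CDs passes through in a single trip. The scan rate alone works out to roughly 667 Kbps, in the same range as the 640 Kbps figure once other overhead is counted.</p>

```python
import math

CD_CAPACITY_MB = 700        # amount of data that fits on a single CD
SCAN_MIN_PER_100MB = 20     # virus-scan rate quoted above
HANDLING_LATENCY_HOURS = 8  # average end-to-end latency quoted above


def scan_bandwidth_kbps():
    """Bandwidth implied by the virus-scan rate alone, in kilobits per second."""
    kilobits = 100 * 1000 * 8          # 100 MB expressed in kilobits (decimal units)
    seconds = SCAN_MIN_PER_100MB * 60
    return kilobits / seconds


def cds_needed(size_mb):
    """Number of CDs to burn; this is where the 700 MB non-linearity shows up."""
    return math.ceil(size_mb / CD_CAPACITY_MB)


def delivery_hours(size_mb):
    """Fixed handling latency plus scan time (assumes one trip for all CDs)."""
    scan_hours = size_mb / 100 * SCAN_MIN_PER_100MB / 60
    return HANDLING_LATENCY_HOURS + scan_hours


# Illustrative sizes: a 300 MB patch versus a 950 MB full distribution.
for size in (300, 950):
    print(size, "MB:", cds_needed(size), "CD(s),", round(delivery_hours(size), 1), "hours")
```

<p>Under these assumptions a patch stays on one CD, while a full distribution just under a gigabyte spills onto a second one and spends more than three hours in the virus scanner.</p>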
<h3>Disks are expensive</h3>
<p>Similarly, if you are running a data warehouse, disk is cheap. You can buy a 1 TB, 7200 RPM disk for about $100, which is perfect for the kind of large, serial reads or writes that a data warehousing workflow requires. However, Palantir uses disk for our database and our search engine, both of which have an <a href='http://en.wikipedia.org/wiki/OLTP'>OLTP</a>-style usage pattern.  As opposed to a data warehouse access pattern, which emphasizes full table scans, OLTP emphasizes random access and therefore requires fast disk. Getting 1 TB at 15k RPM costs about $1000, and requires a disk array rather than a single disk. To keep the disk fast, you also want to leave it only about 20% full, which overall makes fast disk about 50 times more expensive than slow disk per usable terabyte. Most importantly, however, installing a disk array requires trained personnel, a special approval process, and reconfiguring the system to use the new disks, which is a fairly complicated and error-prone process.</p>
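<p>The 50x figure follows directly from the prices and the fill level. A short Python sketch of the calculation, assuming (as the comparison implies) that the slow disk can be used to full capacity; the function name is ours for illustration:</p>

```python
def cost_per_usable_tb(price_per_tb, fill_fraction):
    """Dollars per terabyte actually used, given how full the disk is kept."""
    return price_per_tb / fill_fraction


slow = cost_per_usable_tb(100, 1.0)    # 7200 RPM disk, assumed fully usable
fast = cost_per_usable_tb(1000, 0.20)  # 15k RPM array, kept about 20% full

print("slow:", slow, "fast:", fast, "ratio:", fast / slow)
```

<p>Ten times the purchase price, divided by a fifth of the usable space, gives the fifty-fold difference.</p>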
<h3>CPUs are hot</h3>
<p>Finally, in a commercial data center, CPU is the cheapest resource of all. In a secure server room, however, it can be quite expensive. Each CPU or additional box requires more power and cooling. If the room is nearly full, adding that extra box may require building out an entirely new server room, which can cost months and hundreds of thousands of dollars just for an office building. Building a server room in a SCIF is much more expensive and prohibitively time-consuming.</p>
<h3>RAM to the rescue</h3>
<p>On the other hand, some things in a SCIF are comparatively cheap. We never use boxes with less than 32GB of memory, and, in fact, lots of sites use 128GB of memory. RAM requires negligible power and cooling, and compared to disk, it&#8217;s relatively simple to install. It&#8217;s also easy to reconfigure the setup to use the additional memory.</p>
<h3>The upshot</h3>
<p>The design guidelines that follow from this are simple: <b>build a system that is as autonomous as possible and scales down as well as it scales out</b>.</p>
<p>All these statistics are compiled from our day-to-day experiences in the office environment of a SCIF. Deploying to soldiers in the field makes the issues involved in deploying to a SCIF seem minor. Of course, that’s what makes what we do fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Palantir Config Server: lining up the ducks</title>
		<link>http://blog.palantirtech.com/2009/03/06/palantir-config-server-lining-up-the-ducks/</link>
		<comments>http://blog.palantirtech.com/2009/03/06/palantir-config-server-lining-up-the-ducks/#comments</comments>
		<pubDate>Fri, 06 Mar 2009 10:00:57 +0000</pubDate>
		<dc:creator>Khan Tasinga</dc:creator>
				<category><![CDATA[distributed systems]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[problemspace-government]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=193</guid>
		<description><![CDATA[At Palantir, we build distributed software. When deployed at a customer site, our platform consists of several servers running on, and distributed across, a cluster of machines. When I first joined the company, deploying and managing our platform was tedious and time consuming. Need to install servers? One by one, login to the machines where [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px'><img src="/wp-content/uploads/2009/03/installpalantirservers.png" alt="" width="261" height="394" /></div>
<p>At Palantir, we build distributed software. When deployed at a customer site, our platform consists of several servers running on, and distributed across, a cluster of machines. When I first joined the company, deploying and managing our platform was tedious and time-consuming. Need to install servers? One by one, log in to the machines where they need to go, lay down their requisite files and manually configure them so that they can work together. Have to bring down a deployment for scheduled maintenance? One by one, and in the correct order, log in to the machines where the servers reside and shut them down. Want to change the private keys and certificates used to secure communication between servers? Well, you get the point.</p>
<p>From a customer perspective, the complexity associated with the administration of distributed software represents a significant challenge. Not providing tools to help reduce that complexity hurt the overall usability of our platform. Furthermore, from a Palantir perspective, a non-trivial portion of our resources was being devoted to deploying and managing instances of our platform, both externally (by Forward Deployed Engineers working directly with our customers) and internally (by development, QA and support staff working to maintain and improve our product). Could we be more efficient? No doubt. Given our intense focus on customer satisfaction and the desire to grow and scale our business, action was necessary.</p>
<p>To see how we solved this problem, read on.<br />
<span id="more-193"></span></p>
<p>We stepped back a bit, taking time to reflect on our situation and understand the problem. Based upon our experience, what key areas would a solution need to address? We settled on the following:</p>
<ol>
<li><strong>Lifecycle management.</strong>
<ol>
<li>Ease initial deployment and upgrade.</li>
<li>Handle coordinated starting, stopping and restarting.</li>
</ol>
</li>
<li><strong>Configuration management.</strong>
<ol>
<li>Track which servers are installed on what machines.</li>
<li>Provide centralized management of server configuration information.</li>
</ol>
</li>
<li><strong>Automation.</strong>
<ol>
<li>Support encoding common management tasks based on best practices.</li>
</ol>
</li>
</ol>
<p>In addition to those three key areas, we also identified several important requirements. A couple that definitely warrant mention:</p>
<ol>
<li><strong>Security.</strong></li>
<li><strong>Extensibility.</strong></li>
</ol>
<p>After getting a good sense of what needed to be accomplished, we investigated whether an existing solution would fit the bill. For a variety of reasons (e.g., available feature set, licensing constraints, etc.), we never found a good match. We did, however, come across several open source building blocks that could, when composed appropriately, form the foundation of a homegrown solution. The Config Server was born.</p>
<h2>Architecture</h2>
<div class="postimg"><img src="/wp-content/uploads/2009/03/configserverarchitecture.png" alt="" width="650" /></div>
<p>The Config Server works with remote agents to enable centralized deployment management. The diagram presented above provides an overview of our management infrastructure. Below is a brief discussion of each key component of our architecture.</p>
<ul>
<li><strong>Agent</strong> &#8211; Agents are installed on every machine in a deployment. They are lightweight background processes that sit around waiting to execute commands submitted by the Config Server, interacting directly with the services installed on a given machine. Instead of implementing our own agent solution, we decided to leverage existing technology, the open source peer-to-peer <a rel="nofollow" href="http://staf.sourceforge.net/">Software Testing Automation Framework (STAF)</a>. From its homepage:<br />
<blockquote><p>The Software Testing Automation Framework (STAF) is an open source, multi-platform, multi-language framework designed around the idea of reusable components, called services (such as process invocation, resource management, logging, and monitoring). STAF removes the tedium of building an automation infrastructure, thus enabling you to focus on building your automation solution. The STAF framework provides the foundation upon which to build higher level solutions, and provides a pluggable approach supported across a large variety of platforms and languages.</p></blockquote>
<p>We added support for two-way SSL to STAF to enhance the security of our management infrastructure (specifically, to allow us to implement authorization based on self-signed certificates). But beyond that, no modification was necessary. STAF provides us with a robust solution for remote process invocation and file management, both absolutely essential for centralized deployment management.</li>
<li><strong>Agent Manager</strong> &#8211; The Agent Manager provides lifecycle and configuration management functionality for the agents in a deployment. It interacts with remote machines through SSH, using the open source <a rel="nofollow" href="http://www.trilead.com/Products/Trilead_SSH_for_Java/">Trilead SSH for Java</a> library.</li>
<li><strong>Config Registry</strong> &#8211; The Config Registry maintains and provides access to all of the information the Config Server has about a deployment. It consists of the following:
<ul>
<li><strong>Agent Registry</strong> &#8211; The Agent Registry contains information about all of the agents in a deployment.</li>
<li><strong>Service Registry</strong> &#8211; The Service Registry keeps track of all of the services in a deployment.</li>
<li><strong>Config Repository</strong> &#8211; The Config Repository is a central store for configurations of the agents and services in a deployment.</li>
<li><strong>Package Repository</strong> &#8211; The Package Repository holds all of the service packages that can be installed in a deployment.</li>
<li><strong>Plugin Repository</strong> &#8211; The Plugin Repository houses all of the plugins that are available for use in the Config Server. Plugins are used by the Security Manager, Service Manager and Task Manager.</li>
</ul>
</li>
<li><strong>Security Manager</strong> &#8211; We secure our servers and management infrastructure using public key cryptography. The Security Manager handles the generation and packaging of private keys and certificates. We perform private key and certificate generation using the <a rel="nofollow" href="http://www.bouncycastle.org/java.html">Bouncy Castle Crypto APIs for Java</a>. Packaging is taken care of by plugins in the Plugin Repository. For example, one plugin packages private keys and certificates into JKS files for use with Java, while another packages them into PEM files for use with OpenSSL.</li>
<li><strong>Service</strong> &#8211; Services represent the software installed on the machines in a deployment that drives our platform. They correspond to the servers we&#8217;ve built and the 3rd party offerings on which they depend (e.g., databases, entity extractors, etc.).</li>
<li><strong>Service Manager</strong> &#8211; The Service Manager interacts with agents to provide lifecycle and configuration management functionality for the services in a deployment. The actual mechanics of lifecycle and configuration management vary from service to service. For example, starting service A might require invoking one script, while starting service B might require invoking another. For each type of service in a deployment, the Plugin Repository contains a corresponding plugin that embeds the necessary management logic. The Service Manager works with those plugins to get its job done.</li>
<li><strong>Task Manager</strong> &#8211; Managing a deployment requires performing tasks that go beyond lifecycle and configuration management for its constituent agents and services (e.g., log aggregation, database user creation, etc.). Such tasks are implemented as plugins. They make things happen by communicating with agents and / or directly with machines via SSH. The Task Manager interacts with the Plugin Repository to load tasks and coordinate their execution.</li>
</ul>
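<p>The split between the Service Manager and per-service plugins can be illustrated with a small sketch. This is not the Config Server&#8217;s actual code (which is Java and is not shown in this post); it is a Python illustration in which every class name, method name and script path is hypothetical, and the <code>run_on_agent</code> callable stands in for whatever call hands a command to a remote agent.</p>

```python
class ServicePlugin:
    """Hypothetical plugin interface: one subclass per type of service."""

    service_type = None

    def start_command(self, service_name):
        raise NotImplementedError

    def stop_command(self, service_name):
        raise NotImplementedError


class DatabasePlugin(ServicePlugin):
    """Embeds the management logic for one service type (paths are made up)."""

    service_type = "database"

    def start_command(self, service_name):
        return ["/opt/example/db/start.sh", service_name]

    def stop_command(self, service_name):
        return ["/opt/example/db/stop.sh", service_name]


class PluginRepository:
    """Holds plugins keyed by the type of service they manage."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.service_type] = plugin

    def lookup(self, service_type):
        return self._plugins[service_type]


class ServiceManager:
    """Looks up the right plugin, then hands its command to an agent."""

    def __init__(self, repository, run_on_agent):
        self._repository = repository
        self._run_on_agent = run_on_agent  # stand-in for the remote agent call

    def start(self, host, service_type, service_name):
        plugin = self._repository.lookup(service_type)
        return self._run_on_agent(host, plugin.start_command(service_name))

    def stop(self, host, service_type, service_name):
        plugin = self._repository.lookup(service_type)
        return self._run_on_agent(host, plugin.stop_command(service_name))


# Usage with a fake agent that only records what it was asked to run.
sent = []


def fake_agent(host, command):
    sent.append((host, command))
    return 0  # pretend exit code


repository = PluginRepository()
repository.register(DatabasePlugin())
manager = ServiceManager(repository, fake_agent)
manager.start("db-host-1", "database", "primary")
print(sent)
```

<p>The point of the design is that supporting a new type of service means registering one more plugin; the manager itself does not change.</p>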
<h2>Functionality</h2>
<p>How did we do with respect to our stated needs?</p>
<ul>
<li><strong>Lifecycle management</strong> &#8211; The Agent Manager and Service Manager provide centralized lifecycle management. Initial deployment and upgrades, as well as starting, stopping and restarting servers, can all be handled directly through the Config Server.</li>
<li><strong>Configuration management</strong> &#8211; The Config Repository of the Config Server maintains information about deployments and provides centralized configuration management. The Agent Manager and Service Manager support the remote retrieval and application of agent and service configuration.</li>
<li><strong>Automation</strong> &#8211; The Config Server&#8217;s functionality is exposed via a clean and consistent Java API. Common management tasks can be automated by writing code against that API.</li>
</ul>
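<p>As an illustration of the kind of task that becomes easy to automate, here is a sketch of coordinated starting and stopping &#8220;in the correct order.&#8221; It is written in Python for brevity rather than against the Config Server&#8217;s Java API, which this post does not show; the service names and the dependency map are hypothetical. Ordering uses the standard library&#8217;s <code>graphlib.TopologicalSorter</code> (Python 3.9 or later).</p>

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services
# that must already be running before it can start.
DEPENDS_ON = {
    "database": [],
    "search": ["database"],
    "app-server": ["database", "search"],
}


def start_order(depends_on):
    """Dependencies first, so every service starts after what it needs."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Reverse of the start order, so dependents shut down first."""
    return list(reversed(start_order(depends_on)))


print("start:", start_order(DEPENDS_ON))
print("stop: ", stop_order(DEPENDS_ON))
```

<p>A restart is then just the stop order followed by the start order, with each step dispatched to the machine where the service lives.</p>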
<p>And what about some of our more important requirements?</p>
<ul>
<li><strong>Security</strong> &#8211; All communication in our management infrastructure is secured using two-way SSL. A simple authorization mechanism, implemented using self-signed certificates, ensures that only authorized entities (most notably, the Config Server) can execute commands through agents. Client access to the data maintained, and functionality exposed, by the Config Server requires password-based authentication.</li>
<li><strong>Extensibility</strong> &#8211; The Config Server can be extended to support new types of services and perform new tasks by implementing plugins and dropping them in the Plugin Repository.</li>
</ul>
<h2>Future</h2>
<p>In the space of a few months, we built the Config Server to address several key needs and requirements related to the management of our platform. Our work has already begun to pay dividends. Looking ahead, there are several things we would like to do:</p>
<ul>
<li>Add support for low-level system management and configuration related to our platform (e.g., user and group management, firewall configuration, etc.).</li>
<li>Implement multi-deployment management with support for features like staging, mirroring and migration.</li>
<li>Pursue <a rel="nofollow" href="http://en.wikipedia.org/wiki/Autonomic_computing">Autonomic Computing</a> by integrating with our monitoring solution to implement platform self-management.</li>
</ul>
<p>While we&#8217;ve accomplished a fair amount, plenty of work remains. We look forward to enhancing our Config Server and its associated infrastructure as we strive to make our platform one that is not only powerful and a pleasure to use, but also easy to manage and maintain.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/03/06/palantir-config-server-lining-up-the-ducks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

