<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; software engineering</title>
	<atom:link href="http://blog.palantirtech.com/category/software-engineering/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.palantirtech.com</link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Fri, 23 Jul 2010 23:33:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A rigorous friction model for human-computer symbiosis</title>
		<link>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/</link>
		<comments>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 03:18:52 +0000</pubDate>
		<dc:creator>Asher</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1344</guid>
		<description><![CDATA[


This is a response to Ari&#8217;s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look:
We are attempting to understand the total analytic capability for a given task a of a human-computer team. [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center; float: right; margin-left: 15px; margin-right: 15px'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt="" width="300"/>
</div>
<p>This is a response to <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">Ari&#8217;s awesome post on human-computer symbiosis</a>. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look:</p>
<p>We are attempting to understand the total analytic capability for a given task <strong><em>a</em></strong> of a human-computer team. Analytic capability in this case probably means:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq1.png" alt="eq1"/>(1)
</div>
<p>Where <strong><em>A</em></strong> is the answer to the analytic problem in question and <strong><em>t<sub>A</sub></em></strong> is the time needed to arrive at the answer based on the inputs available. In the case of chess, <strong><em>A</em></strong> could be the optimum next move given all previous information and <strong><em>t<sub>A</sub></em></strong> would be how long it takes to decide on this move.</p>
<p>Read on for a look at how this generalizes in human-computer symbiotic systems.<br />
<span id="more-1344"></span></p>
<p>In the case of the human-computer team, we know that <strong><em>a </em></strong>is going to be a function of both the human&#8217;s analytical capability <strong><em>h</em></strong> and the computer&#8217;s analytical capability <strong><em>c</em></strong> (where both <strong><em>h</em></strong> and <strong><em>c</em></strong> have units of answers/time). In the limit case we know that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq2.png" alt="eq2"/>(2)
</div>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq3.png" alt="eq3"/>(3)
</div>
<p>Or in plain English, if there is no human present, the total analytic capability is simply the analytic capability of the computer, and if there is no computer present, it is simply that of the human. So the naïve solution would be that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq4.png" alt="eq4"/>(4)
</div>
<p>(4) clearly meets the limiting cases described in (2) and (3). Kasparov noticed a mixing function where the ability of the human and computer to work together becomes the dominant term &mdash; we might call this the mixing capability for the given task or <strong><em>m</em></strong>. Including this phenomenon, the total analytic capability (4) would be re-defined as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq5.png" alt="eq5"/>(5)
</div>
<p>where <strong><em>m</em></strong> has the property that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq6.png" alt="eq6"/>(6)
</div>
<p>Thus maintaining the limits expressed in (2) and (3) and adhering to the observation that if there is no human or computer component then there will be no mixing advantage. A naïve solution to this constraint would be simple linear mixing:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq7.png" alt="eq7"/>  (7)
</div>
<p>where <strong><em>M</em></strong> (units of time per answer) is the mixing efficiency and will be primarily based on the type of task being solved: some analytical tasks lend themselves to a combined process more than others (for example, multiplying 20-digit numbers does not really benefit from human intuition, so the combined ability of a human and computer on this task is merely the sum of their individual abilities).</p>
<p>What Kasparov noticed is that the mixing was primarily based on the quality of the process rather than the analytical power of either the human or computer separately. This seems to imply that we must somehow account for the fact that the quality of the human-computer interface is responsible for the quality of the mixing. This can be modeled as a unitless friction of interaction <strong><em>f<sub>i</sub></em></strong> that impedes the ability of the human and computer to work together. </p>
<p>Equation (7) can thus be re-written as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq8.png" alt="eq8"/>(8)
</div>
<p>In this case, the maximum value for the mixing capability is realized when the friction of interaction goes to zero. This mixing capability is the same as the equation Ari developed (less the coefficient which is necessary to maintain consistent units throughout).</p>
<p>We can now re-write our analytic capability in (5) as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq9.png" alt="eq9"/>(9)
</div>
<p>Below, see a plot of this function over a range of values for <strong><em>h</em></strong>, <strong><em>c</em></strong> and <strong><em>f<sub>i</sub></em></strong>:</p>
<div style='text-align: center; margin: auto; margin-bottom: 1em;'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt=""/>
</div>
<p>As can clearly be seen from this functional plot (note the vertical scale), the effect of interface friction dominates over the other terms whenever both the human and computer can make important contributions to the task at hand. The conclusion can be drawn that the most effective way to solve analytical problems is to minimize the friction of the human-computer interface; or to put it another way: optimal analytical systems are those that are built specifically to maximize the ability of the human to leverage the ability of the computer.</p>
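<p>To make the effect of friction concrete, here is a rough numerical sketch. The equations in this post are rendered as images, so the code <em>assumes</em> that (9) has the form <em>a = h + c + M&middot;h&middot;c / f<sub>i</sub></em>, which is consistent with the limits in (2) and (3), with the units given for <em>M</em>, and with the statement that mixing is greatest as friction goes to zero. The class and method names are made up for illustration.</p>

```java
// Hypothetical sketch: assumes equation (9) is a = h + c + M*h*c / f_i.
final class FrictionModel {

    /** Mixing term m(h, c): zero whenever either partner is absent. */
    static double mixing(double h, double c, double mixEfficiency, double friction) {
        if (friction <= 0.0) {
            throw new IllegalArgumentException("friction must be positive");
        }
        return mixEfficiency * h * c / friction;
    }

    /** Total analytic capability a(h, c), in answers per unit time. */
    static double capability(double h, double c, double mixEfficiency, double friction) {
        return h + c + mixing(h, c, mixEfficiency, friction);
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not taken from the plot.
        double h = 2.0;
        double c = 10.0;
        double mixEfficiency = 1.0;
        for (double friction : new double[] {10.0, 1.0, 0.1}) {
            System.out.printf("f_i=%.1f -> a=%.1f%n",
                    friction, capability(h, c, mixEfficiency, friction));
        }
    }
}
```

<p>Under this assumed form, holding <em>h</em> and <em>c</em> fixed and lowering <em>f<sub>i</sub></em> grows the mixing term without bound, while setting either <em>h</em> or <em>c</em> to zero removes it entirely.</p>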
<p>I am certain there is still the possibility for further refinement, for example:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq10a.png" alt="eq10a"/>(10)
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Friction in Human-Computer Symbiosis: Kasparov on Chess</title>
		<link>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/</link>
		<comments>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:32:06 +0000</pubDate>
		<dc:creator>Ari</dc:creator>
				<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1302</guid>
		<description><![CDATA[


As we build our platforms and applications following a human-computer symbiosis approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.
One of the areas that we&#8217;re interested in is the overall friction of analysis systems.  The systems that we build are [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px;'>
<img src='/wp-content/uploads/2010/03/fools-mate.gif'/>
</div>
<p>As we build our <a href="http://blog.palantirtech.com/2009/11/06/palantir-like-an-operating-system-for-data-analysis/">platforms</a> and <a href="http://blog.palantirtech.com/2009/09/29/the-palantir-technologies-demo-reel-screenshots-round-3/">applications</a> following a <a href="http://en.wikipedia.org/wiki/Intelligence_amplification">human-computer symbiosis</a> approach, we keep an ear to the ground for interesting examples that illuminate new techniques or validate our approach in some empirical way.</p>
<p>One of the areas that we&#8217;re interested in is the overall friction of analysis systems.  The systems that we build are built on commodity hardware. We&#8217;re not building faster computers, and yet we can deliver orders-of-magnitude better performance on analysis tasks than existing solutions.  How do we do this?  By building software in such a way that it reduces the friction experienced at the boundaries between the computing power, the analyst, and the source data.</p>
<h2>Chess as analysis laboratory</h2>
<p>Chess is, at its heart, a predictive venture.  The player attempts to anticipate their opponent&#8217;s moves, planning their own moves accordingly, with the straightforward goal of finding a sequence of piece moves that force checkmate. </p>
<p>This game is, in its ideal form, analysis. (The moves made are the logical extension of the analysis.)  The data are clean, the problem is well-defined and everyone plays by the same rules.  There are even <a href="http://en.wikipedia.org/wiki/Elo_rating_system">well-defined metrics for ranking chess players by skill</a> &mdash; a better chess player is a better chess-game analyst.  </p>
<p>When it comes to evaluating analysis systems, this is about as good as it gets for designing controlled experiments that compare their relative strengths.</p>
<p><a href="http://en.wikipedia.org/wiki/Garry_Kasparov">Garry Kasparov</a>, widely considered to be the greatest chess player of all time,  recently wrote <a href="http://www.nybooks.com/articles/23592">a review of Diego Rasskin Gutman&#8217;s book</a>, <a href="http://www.amazon.com/Chess-Metaphors-Artificial-Intelligence-Human/dp/026218267X"><u>Chess Metaphors: Artificial Intelligence and the Human Mind</u>.</a></p>
<p>The review is excellent and covers a lot of ground.  However, one particular anecdote stood out as a very interesting example of human-computer symbiosis (emphasis added):</p>
<blockquote><p>In 2005, the online chess-playing site Playchess.com hosted what it called a &#8220;freestyle&#8221; chess tournament in which anyone could compete in teams with other players or computers. Normally, &#8220;anti-cheating&#8221; algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less &#8220;intelligent&#8221; than the playing programs they detect.)</p>
<p>Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.</p>
<p>The surprise came at the conclusion of the event. <em>The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time.</em> Their skill at manipulating and &#8220;coaching&#8221; their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. <em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em></p></blockquote>
<p>After the jump, we look at this finding in a more generalized way and map it onto the Palantir approach.<br />
<span id="more-1302"></span></p>
<h2>The cyborg Grandmaster: a fearsome opponent</h2>
<p>The tournament Kasparov recalls was a showcase of chess talent, human-computer symbiosis, and raw computing power.  Among those entered  in the tournament were a purpose-made chess machine (similar to <a href="http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>) named <a href="http://en.wikipedia.org/wiki/Hydra_(chess)">Hydra</a> and a team of <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmasters</a> assisted by computer programs.</p>
<p>One losing participant had this to say about the computer-aided Grandmasters:</p>
<blockquote><p>
Secondly, I have learned that a <a href="http://en.wikipedia.org/wiki/Grandmaster_(chess)">Grandmaster</a> armed with a chess engine is a killer combination against a plain Engine. Engines see everything via brute force, Grandmasters use their intuition and are able to see &#8220;obvious&#8221; moves at once. So the two of them together are a mighty force.
</p></blockquote>
<p>This is just as Licklider predicted 50 years ago &#8212; quoting <a href="http://blog.palantirtech.com/man-computer-symbiosis/">Man-Computer Symbiosis</a> (if I could put it better, I would):</p>
<blockquote><p>
Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions&#8230; In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.</p>
<p>&#8230;</p>
<p>In addition, the computer will serve as a statistical-inference, decision-theory, or game-theory machine to make elementary evaluations of suggested courses of action whenever there is enough basis to support a formal statistical analysis. Finally, it will do as much diagnosis, pattern-matching, and relevance-recognizing as it profitably can, but it will accept a clearly secondary status in those areas.
</p></blockquote>
<p>So in classic intelligence amplification fashion, having computer programs that can quickly evaluate a move&#8217;s likelihood of success can <em>amplify the power of the Grandmaster</em>.</p>
<p>While empirically true, it does raise the question: how <em>much</em> does it amplify the power of the Grandmaster?</p>
<p>One approximation might be a simple linear amplification, expressed as a product.  Let&#8217;s imagine a function, <em>a(h,c)</em>, in which the analytic power (<em>a</em>) is the product of the power of the human (<em>h</em>) and the computing power of the chess engine being used (<em>c</em>).  This gives us the equation:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-simple.png'/>
</div>
<h2>One term to dominate them all: friction-of-interface</h2>
<p>Does this simple approximation hold up?  It does not. The team that won the <a href="http://www.chessbase.com/newsdetail.asp?newsid=2461">PAL/CSS Freestyle Tournament in 2005</a> was composed of two amateur chess players who were able to best a computer-assisted Grandmaster.</p>
<p>How did they accomplish this feat?  It was not through superior compute power.  Instead, they did so by more effectively feeding insights to their three chess engines. They played so well that a large number of people assumed that it was actually Kasparov himself playing:</p>
<blockquote><p>
Many speculated that it might be Garry Kasparov, who was the initiator of this kind of computer assisted chess matches. When we asked him Kasparov confirmed that was not the case. But he reminded us that it doesn&#8217;t really matter. The guiding principle of Freestyle Chess: anything is allowed. &#8220;Even if they were assisted by the devil, that would probably be covered by the rules,&#8221; he joked. &#8220;Only the moves they played count.&#8221;
</p></blockquote>
<p>What does this mean for our simple equation? Well, it looks like it&#8217;s missing a term, one we&#8217;ll call <em>f</em>, that describes the efficiency or <strong>friction</strong> of the interface between human and computer.</p>
<p>Quoting Kasparov again:</p>
<blockquote><p>
<em>Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</em>
</p></blockquote>
<p>The implication being that the equation actually looks like this:</p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-variable-h.png'>
</div>
<p>So as the friction of the interface goes to zero, the full amplification of the chess engine is brought to bear.  A quick gut-check in the opposite direction agrees: one can imagine the world&#8217;s most powerful chess engine with the world&#8217;s worst interface; spending the time it would take to express commands to this theoretically awful program would actually be worse than playing without it.</p>
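<p>A quick sketch of Kasparov&#8217;s observation in code. The equation above is an image, so this <em>assumes</em> it reads <em>a = h&middot;c / f</em>; the numbers are invented purely to illustrate the ordering he describes and are not measurements of any real team.</p>

```java
// Hypothetical sketch: assumes the equation above is a = (h * c) / f.
final class SimpleSymbiosis {

    /** Analytic power of a human-computer team under the assumed model. */
    static double analyticPower(double human, double computer, double friction) {
        if (friction <= 0.0) {
            throw new IllegalArgumentException("friction must be positive");
        }
        return human * computer / friction;
    }

    public static void main(String[] args) {
        // Strong human, same machine, inferior process (high friction).
        double grandmasterTeam = analyticPower(10.0, 5.0, 5.0);
        // Weak human, same machine, better process (low friction).
        double amateurTeam = analyticPower(4.0, 5.0, 1.0);

        System.out.println("grandmaster team: " + grandmasterTeam);
        System.out.println("amateur team:     " + amateurTeam);
    }
}
```

<p>With these made-up inputs the amateur team comes out ahead even though its human term is less than half the Grandmaster&#8217;s, because the friction term divides everything else.</p>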
<h2>Palantir: a low-friction interface to data</h2>
<p>As analysis problems go, chess resembles <a href="http://en.wikipedia.org/wiki/Spherical_cow">a spherical cow in a vacuum</a>.  Analysis problems in the real world are orders of magnitude messier.</p>
<p>Let&#8217;s reframe the terms of our equation above into a more general approach to analysis:</p>
<ul>
<li><em>H</em> &#8211; this is the power of the analyst.  In chess, the value of this term varies widely between players; in designing real-world data analysis systems, this is more or less a constant (which is why <em>h</em> above becomes <em>H</em> below).  Of course there are differing levels of expertise, training, and raw ability amongst the user population, but when we design systems, it&#8217;s with the average case in mind.</li>
<li><em>c</em> &#8211; computing power. How fast are the machines?  How well do they scale?  How efficiently do they perform the data tasks at hand? Palantir spends significant engineering effort on optimizing the <em>c</em> term, but most of the growth in this term comes from the layers we depend on, built by companies like Intel, Sun, Oracle, etc.</li>
<li><em>f</em> &#8211; friction.  How easy is it to bring <em>c</em> to bear on the problem? Note that when we talk about <em>friction of interface</em>, this does not refer exclusively to the user interface.  More generally, friction can be present at any interface between two systems: data-software, software-software, human-software, etc. The <em>f</em> that we consider in this simple model is the sum total of system friction.</li>
</ul>
<p>So our final formulation is just in terms of <em>c</em> and <em>f</em> (holding <em>H</em> as a constant): </p>
<div style='text-align: center'>
<img src='/wp-content/uploads/2010/03/hcs-eq-final.png'>
</div>
<p>When we discuss friction in real-world analysis systems, the friction actually exists at multiple levels:</p>
<ol>
<li>Creating an analysis model that will enable answering the questions that need to be explored</li>
<li>Integrating the data into a single coherent view of the problem</li>
<li>Enabling analysis tools to efficiently query and load the data</li>
<li>Exposing APIs that allow developers to develop custom solutions quickly and efficiently for modeling and analysis tasks not covered by general tools</li>
<li>User interface that makes the tools easy, enjoyable, and quick to use</li>
</ol>
<h3>Minimizing <em>f</em>: Haiti Flooding Predictions</h3>
<p>If this is starting to sound very similar to Palantir&#8217;s marketing information, this is no accident. While some of our backend engineers are concerned with things like scaling and speed-of-querying, the overall innovation that we&#8217;re bringing to the field is not simply about faster data processing systems (even if they are) but reducing the friction at every interface inside a complex human-computer symbiotic system.</p>
<p>You want an example that ties it all together?  It starts with a simple question: which of the many displaced-person camps in Haiti are most at risk for flooding as the rainy season approaches?  Easy to ask, but not so simple to answer. </p>
<p>The original introduction to this video: </p>
<blockquote><p>As we enter the beginning of the rainy season in Haiti, one of the biggest problems facing relief organizations today is the spectre of flooding and mudslides destroying Internally Displaced Persons (IDP) Camps. In this video, we integrate data from many sources to determine high risk aid locations.
</p></blockquote>
<p>The data integration for this video took about six hours, using sources of data that had never before been fused.  The analysis itself takes a few minutes and quickly comes to an actionable answer to the original question.</p>
<div style='text-align: center;'>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="id" value="banner" /><param name="quality" value="high" /><param name="bgcolor" value="#000000" /><param name="src" value="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" /><embed src="http://www.palantirtech.com/_ptwp_live_ect0/wp-content/themes/ptcom/swf/fvp.swf?movieurl=http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="480" movieurl="http://media.palantirtech.com/government/videos/haiti/haiti_flooding.flv"/></object>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun with jMock</title>
		<link>http://blog.palantirtech.com/2009/11/22/fun-with-jmock/</link>
		<comments>http://blog.palantirtech.com/2009/11/22/fun-with-jmock/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 21:15:08 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[development process]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[unit testing]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1274</guid>
		<description><![CDATA[
Here at Palantir, a lot of our automatic tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much like a production environment. This makes intuitive sense because it’s a faithful approximation of how the system will run in the field.
However, there are some disadvantages to this:

Full-pass [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 175px;'><a href='http://www.jmock.org/'><img src='http://www.jmock.org/logo.png' style='background-color: #000066; padding: 10px'/></a></div>
<p>Here at Palantir, a lot of our automatic tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much like a production environment. This makes intuitive sense because it’s a faithful approximation of how the system will run in the field.</p>
<p>However, there are some disadvantages to this:</p>
<ul>
<li>Full-pass tests don’t always localize the problem. Tests on a client class might fail even if it was the service that behaved incorrectly.
</li>
<li>These full-pass tests are relatively slow. Client code is running against an actual remote service. If a client is being tested, the server code still has to do work — sometimes a lot of work — even if that isn’t the focus of the test.</li>
<li>The constraints of the test are loose. Full-chain tests can mostly only see whether the operation finished correctly. It’s much harder to figure out whether the operation was done efficiently and without making unnecessary service calls.</li>
<li>There’s very little setup flexibility. If you want an RPC to return a specific value, you have little choice but to have your test get the service into a state where it can return that value. This is easy in some cases, but prohibitively difficult in others.</li>
<li>Client tests are forced to share any non-determinism leaked from the service. For example, under real conditions, a request to call A might respond before call B, and sometimes the other way around. This can result in flaky tests or tests that don’t always simulate the conditions you want to exercise.</li>
</ul>
<p>What’s to be done? Fortunately, there’s an option that handles these cases elegantly. We also test with <a href="http://www.jmock.org/">jMock</a>, a library that dynamically generates mock objects from arbitrary interfaces. These mock objects can be configured to check that particular methods are called with particular inputs a particular number of times, and then give prescribed responses.</p>
<p>Hit the link to see a concrete example of jMock in action.<br />
<span id="more-1274"></span></p>
<h2>jMock in action</h2>
<p>Let&#8217;s say I want to test my object viewer page in Palantir Web, but I don’t want to fire up a dispatch server at all. First, I create my mock service object.</p>
<pre class="brush: java;">
Mockery context = new Mockery();
final PalantirService service = context.mock(PalantirService.class);
</pre>
<p>Then, I set the expectations of my mock object. In this case, I want to tell my mock object to expect a call to PalantirService.getObject() and PalantirService.getDataSources(). getObject() will return a specific object. Any call made to the service apart from these will make the test fail.</p>
<pre class="brush: java;">
context.checking(new Expectations() {{
        oneOf(service).getObject(realm.getId(), myObject.getId());
        will(returnValue(myObject));
        oneOf(service).getDataSources(myObject.getDataSources());
}});
</pre>
<p>Now, I create the object I want to test and inject the service.</p>
<pre class="brush: java;">
ObjectViewController controller = new ObjectViewController();
controller.setService(service);
</pre>
<p>And then we fire away.</p>
<pre class="brush: java;">
ModelMap model = new ModelMap();
controller.doGet(myObject.getId(), model);
</pre>
<p>Now that the controller (the class we’re exercising) has gone off and populated the model, we check to see that the model is populated correctly. Just like we would in any other test.</p>
<pre class="brush: java;">
assertEquals(myObject.getName(), model.get(&quot;objectName&quot;));
assertEquals(myObject, model.get(&quot;object&quot;));
</pre>
<p>But in addition, we also assert that the expectations specified above were satisfied.</p>
<pre class="brush: java;">
context.assertIsSatisfied();
</pre>
<p>Not only can we be sure that the right calls were made with the right parameters, but we can also be sure that no calls besides the expected calls were made. So the next time you want more speed or control over your tests, take a look at jMock or another framework like it. It’s a powerful tool in the effort to test your best!</p>
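<p>If you&#8217;re curious how a mock object can be generated dynamically from an interface, here is a minimal sketch using only the JDK&#8217;s <code>java.lang.reflect.Proxy</code>. This is not jMock&#8217;s implementation or API, just an illustration of the general idea; <code>GreetingService</code> and <code>TinyMock</code> are hypothetical names invented for the example.</p>

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote service interface; not a Palantir API.
interface GreetingService {
    String greet(String name);
}

// Minimal hand-rolled mock: canned return values plus a record of calls.
final class TinyMock {
    private final Map<String, Object> cannedResults = new HashMap<>();
    private final List<String> calls = new ArrayList<>();

    /** Declare that calls to the named method are expected, and what they return. */
    void expect(String methodName, Object result) {
        cannedResults.put(methodName, result);
    }

    /**
     * Build a proxy implementing the interface. Any call without an expectation
     * fails fast (this includes Object methods such as toString on the proxy).
     */
    <T> T proxyFor(Class<T> type) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            String name = method.getName();
            if (!cannedResults.containsKey(name)) {
                throw new AssertionError("unexpected call: " + name);
            }
            calls.add(name);
            return cannedResults.get(name);
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    /** Names of the methods invoked so far, in call order. */
    List<String> calls() {
        return calls;
    }

    public static void main(String[] args) {
        TinyMock mock = new TinyMock();
        mock.expect("greet", "hello");
        GreetingService service = mock.proxyFor(GreetingService.class);
        System.out.println(service.greet("world"));
        System.out.println(mock.calls());
    }
}
```

<p>A real mocking library adds argument matching, call counts, and readable failure messages on top of this, which is why we reach for jMock rather than rolling our own.</p>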
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/11/22/fun-with-jmock/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Palantir: search with a twist (part one: memory efficiency)</title>
		<link>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/</link>
		<comments>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 07:53:59 +0000</pubDate>
		<dc:creator>Ari</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[software engineering]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1088</guid>
		<description><![CDATA[
A Palantir cluster seamlessly integrates many pieces of proven technology.  One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 15px; margin-bottom: 15px'><img src='/wp-content/uploads/2009/08/200px-magnifying_glass_icon.png' alt='magnifying glass'/></div>
<p>A Palantir cluster seamlessly integrates many pieces of proven technology.  One of them is our customized version of the venerable Java search engine, <a href="http://lucene.apache.org/java/docs/">Lucene</a>. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and many, many documents as results. We want to leverage the <a href="http://en.wikipedia.org/wiki/Inverted_index">inverted index</a> capabilities of Lucene, but our data access patterns are a bit different than the typical use case:  we need things like pervasive range-querying, different types of relevance, and dynamic views of the data based on security constraints. So in building our data platform, we&#8217;ve run into some interesting challenges that are pretty unique in the information retrieval realm, specifically:</p>
<ol>
<li>Raising memory efficiency</li>
<li>Real-time indexing</li>
<li>Preventing information leaks across access boundaries in an efficient manner</li>
</ol>
<p>I&#8217;ll cover (1) in this post and (2) and (3) in a later post, due out in about two weeks.</p>
<p>Hit the link and we&#8217;ll delve into this topic.<br />
<span id="more-1088"></span></p>
<h2>Raising memory efficiency</h2>
<p>We&#8217;ve addressed the issue of resource constraints, generally, in our earlier post: <a href="http://blog.palantirtech.com/2009/05/22/bandwidth-isnt-cheap-disk-isnt-cheap-cpu-isnt-cheap/"><em>Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.</em></a> In that post, we posited &#8220;RAM to the rescue&#8221;:</p>
<blockquote><p>
On the other hand, some things in a SCIF are comparatively cheap. We never use boxes with less than 32GB of memory, and, in fact, lots of sites use 128GB of memory. RAM requires negligible power and cooling, and compared to disk, it’s relatively simple to install. It’s also easy to reconfigure the setup to use the additional memory.</p></blockquote>
<p>While this is true, no matter how much RAM you buy, your users will find a way to use it all &#8212; search is no exception.  In many of our environments, the search processes share hardware with other processes in the Palantir cluster, so while the OS may have 128 GB of RAM available, the search process&#8217;s VM has substantially less available to it. Compare this to a cluster of dedicated search nodes, where each node will have indexes sized to fit specifically into the memory available.</p>
<p>The upshot is that we needed to modify parts of <a href="http://lucene.apache.org/java/docs/index.html">Lucene</a> to deal with tighter memory constraints than it was designed for.</p>
<h3>Priority queue results accumulation</h3>
<p>Most systems that implement search include some notion of paging through the results.  We use a multi-level paging system, with the search server maintaining a server-side page for each query and serving smaller client-facing pages from it.</p>
<p>Vanilla Lucene uses the following algorithm for accumulating search results:</p>
<ol>
<li>Load all matching results.</li>
<li>Sort by some relevance metric(s).</li>
<li>Return the top <i>n</i> results.</li>
</ol>
<p>The results are cached as a server-side page in case the client wants to load more than the first <em>n</em> results. You can see where this could run into trouble: if the total number of matching documents is high, that&#8217;s a lot of wasted RAM while we winnow it down to the size of the server page. So we use the following algorithm:</p>
<ol>
<li>Construct a <a href="http://en.wikipedia.org/wiki/Priority_queue">priority queue</a> of constrained size with priority computed using the chosen relevance metric</li>
<li>Stream through the results, inserting into the queue</li>
<li>Return the set of results in the priority queue</li>
</ol>
<p>Now we never need more RAM than the size of a server-side page to serve results.  The downside is that if the client wants more than one server-side page, we have to run the entire search a second time (ouch). To skip the results that were already served, we adjust the priority queue on the second pass to kick out every result that, by the relevance metric, belongs on the first page.</p>
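<p>As a rough illustration of the streaming approach above, here is a minimal sketch built on <code>java.util.PriorityQueue</code>. It is not our production code: the <code>Hit</code> class, the <code>topN</code> method, and the tie-breaking rule on document id are assumptions made for the example. Passing the last hit of the previous page as <code>lastServed</code> shows one way the second pass could skip results that were already returned.</p>
<pre class="brush: java;">
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedTopN {

    /** Hypothetical search hit: a document id plus its relevance score. */
    public static final class Hit {
        public final int docId;
        public final double score;

        public Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Weakest hit first: lower score, then higher docId as a tie-breaker,
    // so the ordering is total and page boundaries are unambiguous.
    static final Comparator&lt;Hit&gt; WEAKEST_FIRST =
            Comparator.comparingDouble((Hit h) -&gt; h.score)
                    .thenComparing(Comparator.comparingInt((Hit h) -&gt; h.docId).reversed());

    /**
     * Streams through the matches and keeps only the best pageSize hits.
     * If lastServed is non-null, hits ranked at or above it are skipped,
     * which yields the next server-side page on a second pass.
     */
    public static List&lt;Hit&gt; topN(Iterable&lt;Hit&gt; matches, int pageSize, Hit lastServed) {
        PriorityQueue&lt;Hit&gt; queue = new PriorityQueue&lt;&gt;(pageSize + 1, WEAKEST_FIRST);
        for (Hit hit : matches) {
            if (lastServed != null &amp;&amp; WEAKEST_FIRST.compare(hit, lastServed) &gt;= 0) {
                continue; // already returned on an earlier page
            }
            queue.add(hit);
            if (queue.size() &gt; pageSize) {
                queue.poll(); // evict the weakest hit; the queue never exceeds pageSize + 1
            }
        }
        List&lt;Hit&gt; page = new ArrayList&lt;&gt;(queue);
        page.sort(WEAKEST_FIRST.reversed()); // best hit first
        return page;
    }
}
</pre>
<p>Memory stays proportional to the page size rather than to the number of matching documents, at the cost of rescanning every match for each additional server-side page.</p>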
<h3>Using bitsets to optimize range queries</h3>
<p>A range query can return a result set of very high cardinality &ndash; a range is a very compact way of describing a large set of matching terms (even if they are discrete values, like dates).  One way to think about a range query of, say, <em>10 &lt;= age &lt;= 15</em>, is that it expands to <em>age = 10 OR age = 11 OR age = 12 OR age = 13 OR age = 14 OR age = 15</em>.  Rather than treat range queries in any special way, Lucene just does this expansion of the range and runs the query like a normal query.</p>
<div style='float: right; text-align: right; width: 315px; margin-top: 10px; margin-bottom: 10px;'><img src='/wp-content/uploads/2009/08/searchindexes1.png'/></div>
<p>Internally, Lucene stores a list of metadata nodes, ordered by document id, of each document that matches a given term.  The algorithm goes something like this:</p>
<ol>
<li>Open the document id lists for all matching terms</li>
<li>Walk the list pointers for each potential match such that you accumulate all the metadata for a given document.</li>
<li>Pass all this metadata up to the query processor which decides:
<ol>
<li>Does this document match the overall query? (remember that terms can be inverted)</li>
<li>Use term frequency taken from the metadata to calculate the relevance.</li>
</ol>
</li>
</ol>
<p>This structure and attendant algorithm has some nice properties:</p>
<ul>
<li>All documents are processed in a set order.</li>
<li>Everything is known about a document all at once.</li>
<li>It terminates in a single linear scan.</li>
</ul>
<p>&#8230; and has one very nasty property:</p>
<ul>
<li>All of the term value buckets that match the range must be open simultaneously.</li>
</ul>
<p>This is not a big deal for most English language queries.  However, for large ranges and the like, there can be thousands or even millions of terms.</p>
<p>The semantics of range queries have an interesting feature: a document that matches the range twice is not more relevant than one that matches once. (Contrast this with a simple term query: multiple matches <b>do</b> indicate higher relevance). Being able to discard the accounting of how many times we match the range leads to a huge win:</p>
<ol>
<li>We only need a single bit to represent a match</li>
<li>We can process a single term value bucket at a time instead of holding all buckets open in memory.</li>
</ol>
<p>Our search engine accumulates range queries into bitset objects, allowing for a very compact representation of results. We need much less memory than we did before since we only load one term value bucket at a time.  And the algorithm is simpler: no more walking pointers or <em>O(n)</em> check before figuring out which pointer moves next.</p>
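<p>To make the idea concrete, here is a small sketch using <code>java.util.BitSet</code>. The index is modeled as a <code>NavigableMap</code> from term value to an array of document ids, which is purely an assumption for illustration and says nothing about how Lucene lays out its term buckets; the names <code>RangeBitsetSketch</code> and <code>matchRange</code> are hypothetical too.</p>
<pre class="brush: java;">
import java.util.BitSet;
import java.util.NavigableMap;

public class RangeBitsetSketch {

    /**
     * Returns a bitset with one bit set per document whose term value lies in
     * [low, high]. Buckets are visited one at a time, and a document that
     * appears in several buckets still costs a single bit.
     */
    public static BitSet matchRange(NavigableMap&lt;Integer, int[]&gt; postings, int low, int high) {
        BitSet matches = new BitSet();
        if (low &gt; high) {
            return matches; // empty range; subMap would throw for inverted bounds
        }
        for (int[] bucket : postings.subMap(low, true, high, true).values()) {
            for (int docId : bucket) {
                matches.set(docId);
            }
        }
        return matches;
    }
}
</pre>
<p>Because each bucket only flips bits, the buckets can be processed in any order and none of them has to stay open while the next one is read.</p>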
<h2>The next episode</h2>
<p>Tune in for <em>Palantir: search with a twist (part two)</em> in a few weeks.  I&#8217;ll cover the following topics:</p>
<ul>
<li>Real-time indexing</li>
<li>Preventing information leaks across access boundaries in an efficient manner. (See Jason&#8217;s <a href='http://www.palantirtech.com/government/analysis-blog/mls'>Multi-Level Security</a> post over on the <a href="http://www.palantirtech.com/government/analysis-blog/">Palantir Government Analysis Blog</a> for a high-level look at why these features are important, and check out <a href="http://www.palantirtech.com/government/videos/whitevideos">Bob McGrew&#8217;s &#8220;Access Control Model&#8221; White Video</a> for an in-depth look at how we apply security to our object model.)
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>JavaInvoke allows you to spawn additional Java VMs during testing</title>
		<link>http://blog.palantirtech.com/2009/07/28/javainvoke/</link>
		<comments>http://blog.palantirtech.com/2009/07/28/javainvoke/#comments</comments>
		<pubDate>Tue, 28 Jul 2009 22:00:30 +0000</pubDate>
		<dc:creator>Ari</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[enterprise software]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[unit testing]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=209</guid>
		<description><![CDATA[
Here at Palantir we use test-driven development (or TDD for short).  Integrated tools like Eclipse and JUnit simplify writing and running unit tests.  However, once you need to test a broader swath of functionality, it&#8217;s time to write functional, integration, and system tests.  While technically not &#8216;unit testing&#8217;, the testing framework that [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; width: 298px'><img src="/wp-content/uploads/2009/07/junit.png" alt="junit success" width="288" height="194" /></div>
<p>Here at Palantir we use <a href="http://en.wikipedia.org/wiki/Test-driven_development">test-driven development (or TDD for short)</a>.  Integrated tools like <a href="http://www.eclipse.org/">Eclipse</a> and <a href="http://junit.org/">JUnit</a> simplify <a href="http://open.ncsu.edu/se/tutorials/junit/">writing and running unit tests</a>.  However, once you need to test a broader swath of functionality, it&#8217;s time to write <a href="http://www.ibm.com/developerworks/library/j-test.html#h1">functional</a>, <a href='http://en.wikipedia.org/wiki/Integration_testing'>integration</a>, and <a href='http://en.wikipedia.org/wiki/System_testing'>system</a> tests.  While technically not &#8216;unit testing&#8217;, the testing framework that JUnit provides is basically the same infrastructure that you want to leverage for writing these more involved types of tests.</p>
<p>When you&#8217;re developing enterprise software, functional testing often means getting your clients to talk to your servers.  For the main <a href="http://www.palantirtech.com/government">Palantir Government</a> product, we integrate the process of bringing the server up and down with the Ant scripts that run our automated unit tests: our testing tasks bring up the server, <a href="http://ant.apache.org/manual/OptionalTasks/junit.html">run the test suite</a>, and then kill the server. This works great and produces nice results.</p>
<p>When I started working on our authentication server, the pattern that we had used before didn&#8217;t work for me.  While the Palantir Government tests ran with a single, static configuration file, I needed to run the authentication server with multiple configurations in the course of running through all the different functional tests.  I determined that I needed a way to programmatically bring the server up and down for testing. In JUnit parlance, I needed a way to programmatically launch the server component as part of the setUp() method for my unit tests and stop it in tearDown().</p>
<p>With my itch-to-scratch firmly in hand (or some other mixed metaphor), I set out to figure out how to invoke new Java processes from inside a unit test.  The solution I came up with (with source code and examples) is after the jump.<br />
<span id="more-209"></span></p>
<h2>The Six Ingredients</h2>
<p>So there are six ingredients that go into spawning a new VM:</p>
<ul>
<li>The classpath to use for the new VM</li>
<li>The name of the class to run</li>
<li>The directory to be used as the current directory for the process</li>
<li>The command line arguments to pass to the process</li>
<li>The set of Java system properties to use for this process</li>
<li>The environment to pass to the process</li>
</ul>
<p>Let&#8217;s look at each item individually.</p>
<h3>Classpath</h3>
<p>The classpath will tell the spawned VM where to load classes from.  In JavaInvoke, we use the existing classpath (from the spawning VM) as a starting point and then prepend any new entries to allow overriding the classpath for the spawned VM.</p>
<p>This takes a lot of the tedium out of having to figure out what to put in the classpath.  Most likely, you want something similar to what you already have, if not completely identical.</p>
<p>We get the classpath from <code>System.getProperty("java.class.path")</code> and can add new entries by prepending the new entry, using the value of <code>File.pathSeparatorChar</code> as the entry delimiter.  Using <code>File.pathSeparatorChar</code> makes the code cross-platform friendly, since the path separator is &#8216;;&#8217; on Windows and &#8216;:&#8217; on Unix (Linux, Solaris, OS X, etc.).</p>
<p>Caveat: if you change the working directory and your original classpath was constructed using relative paths, you&#8217;ll probably have trouble getting anything to run (since your classpath will no longer point to the right locations).</p>
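<p>A minimal sketch of the prepending step might look like the following. The class and method names are made up for this illustration and are not taken from JavaInvoke; the <code>absolutize</code> helper is one possible way to sidestep the relative-path caveat, under the assumption that entries should be resolved against the spawning VM&#8217;s working directory.</p>
<pre class="brush: java;">
import java.io.File;
import java.util.regex.Pattern;

public class ClasspathSketch {

    /** Puts the override entries in front of an existing classpath string. */
    public static String prepend(String existing, String... overrides) {
        StringBuilder classpath = new StringBuilder();
        for (String entry : overrides) {
            classpath.append(entry).append(File.pathSeparatorChar);
        }
        return classpath.append(existing).toString();
    }

    /** Rewrites each entry as an absolute path so it survives a working-directory change. */
    public static String absolutize(String classpath) {
        StringBuilder result = new StringBuilder();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // skip empty entries left by stray separators
            }
            if (result.length() &gt; 0) {
                result.append(File.pathSeparatorChar);
            }
            result.append(new File(entry).getAbsolutePath());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String inherited = System.getProperty(&quot;java.class.path&quot;);
        // &quot;build/test-overrides&quot; is a placeholder directory for this example.
        System.out.println(absolutize(prepend(inherited, &quot;build/test-overrides&quot;)));
    }
}
</pre>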
<h3>Class name</h3>
<p>Pretty simple: what do you want to run in the spawned VM?  The class must have a <code>public static void main(String[] args)</code> method defined, and it must be available for loading via the classpath.</p>
<h3>Working Directory</h3>
<p>If it should be different from the current working directory (CWD) of the running process, then set it and JavaInvoke will start the spawned process in that directory.</p>
<h3>Command line arguments</h3>
<p>If the process needs any command line arguments, including VM options, specify them in a string array.  Note that not all of these arguments will necessarily make it to your main method, since the VM executable will parse them first and remove the VM arguments, passing through the program arguments.</p>
<h3>Java System Properties</h3>
<p>System properties can be used to control many aspects of how a VM runs.  You can set them programmatically in your code or you can set them on the command line by passing <em>-Dkey=value</em>.  Our JavaInvoke implementation will take a Map&lt;String,String&gt; of properties as a convenience argument; all it does is rewrite the map into the command line.</p>
<h3>Process environment</h3>
<p>This is an operating-system level construct.  This is the set of environment variables, also passed as a Map&lt;String,String&gt;, that you would like merged with the current environment.  This would be the place that you set things like LD_LIBRARY_PATH on Unix.</p>
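<p>Putting the six ingredients together, a rough sketch on top of <code>java.lang.ProcessBuilder</code> could look like this. It is not the JavaInvoke source (that is linked further down); the class name, method signature, and the choice to locate the launcher through the <code>java.home</code> system property are assumptions for illustration. The classpath is handed over in the CLASSPATH environment variable.</p>
<pre class="brush: java;">
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SpawnSketch {

    /** Configures, but does not start, a new VM; call start() on the result to launch it. */
    public static ProcessBuilder newJavaProcess(String classpath, String mainClass, File workingDir,
            List&lt;String&gt; programArgs, Map&lt;String, String&gt; systemProperties,
            Map&lt;String, String&gt; extraEnvironment) {
        List&lt;String&gt; command = new ArrayList&lt;&gt;();
        // Assumption: reuse the java launcher of the VM that is doing the spawning.
        command.add(System.getProperty(&quot;java.home&quot;) + File.separator + &quot;bin&quot; + File.separator + &quot;java&quot;);
        // System properties become -Dkey=value options, which must precede the class name.
        for (Map.Entry&lt;String, String&gt; property : systemProperties.entrySet()) {
            command.add(&quot;-D&quot; + property.getKey() + &quot;=&quot; + property.getValue());
        }
        command.add(mainClass);
        command.addAll(programArgs);

        ProcessBuilder builder = new ProcessBuilder(command);
        if (workingDir != null) {
            builder.directory(workingDir);
        }
        // environment() starts as a copy of the parent environment; merge on top of it.
        builder.environment().putAll(extraEnvironment);
        builder.environment().put(&quot;CLASSPATH&quot;, classpath);
        return builder;
    }
}
</pre>
<p>Returning the configured <code>ProcessBuilder</code> instead of a running <code>Process</code> keeps the sketch easy to inspect in a test before anything is actually spawned.</p>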
<h2>Dealing with input and output</h2>
<p>So you might ask the question, &#8220;where does the output from the process go?&#8221;  Or more troubling, &#8220;How do I send the process some input?&#8221;  The Java <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Process.html">Process</a> object has methods to deal with this, allowing you to get streams that give you access to the input, output, and error streams of the spawned process. That API is straightforward to deal with, just like any other use of the java.io streams.</p>
<p>However, we want to make the typical case really easy: pulling the output from the spawned process back to the parent that spawned it.  To that end, we add into the mix a class called OutputPiper.  It fires up a thread that pulls all output from the spawned process, tags it with an identifier, and then writes it to the spawner&#8217;s stdout/stderr.</p>
<h3>OutputPiper</h3>
<p>(as extracted from <a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ProcessSpawner.java.html'>ProcessSpawner.java</a>)</p>
<pre class="brush: java;">
	public static class OutputPiper extends Thread  {
		InputStream in;
		PrintStream out;
		String tag = null;

		public OutputPiper(String tag, InputStream in,PrintStream out) {
			this.in = in;
			this.out = out;
			this.tag = tag;
			// make sure that we don't keep the VM alive
			this.setDaemon(true);
			this.setName(&quot;OutputPiper-&quot; + tag);
			out.println(&quot;Starting output piper for tag: &quot; + tag);
			this.start();
		}

		@Override
		public void run() {
			try {
				BufferedReader reader = new BufferedReader(new InputStreamReader(in));
				String line = null;
				do {
					line = reader.readLine();
					if(line != null) {
						out.println(tag + &quot;: &quot; + line);
					}
				}while(line != null);
			}
			catch (Exception e) {
				//
			}
			out.println(&quot;Output piper exiting for tag: &quot; + tag);
		}

		public static OutputPiper createOutputPiper(String tag, InputStream in, PrintStream out) {
			OutputPiper rc = new OutputPiper(tag, in,out);
			return rc;
		}
	}
</pre>
<p>OutputPiper extends <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html">Thread</a> so that all the output will arrive back at the controlling process in a timely manner.  For each given process, we spawn off two OutputPipers, one for stdout and one for stderr, corresponding to the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Process.html#getInputStream()">Process.getInputStream()</a> and the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Process.html#getErrorStream()">Process.getErrorStream()</a>.</p>
<h2>ProcessSpawner &#038; JavaInvoke</h2>
<p>There are two key classes in the example:</p>
<ul>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ProcessSpawner.java.html'>ProcessSpawner.java</a> &#8211; Essentially a wrapper around <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/ProcessBuilder.html">ProcessBuilder</a>, a generic process spawner that makes it simple to invoke processes that use OutputPipers to forward their output back to their parent. This class allows you to specify the working directory, process environment, and command line for the process to be invoked.</li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/JavaInvoke.java.html'>JavaInvoke.java</a> &#8211; a specialized subclass of ProcessSpawner, this class makes spawning new VMs a piece of cake, doing the necessary translation for Java system properties, setting the proper classpath environment variable with potential overrides, and filling in the fully qualified class name to run.</li>
</ul>
<h2>The Example &#038; Source Code</h2>
<p>I&#8217;ve put together a running example that implements a trivial client and server in a JUnit test.  The setUp() method spawns the server and then the tests run the client code against the server, tearing it down after each test.  It&#8217;s available in the <a href='/wp-content/uploads/2009/07/PalantirVMSpawnerExample.zip'>PalantirVMSpawnerExample.zip</a> zip file.  Unzip it and run the <i>run.sh</i> or <i>run.bat</i> script as appropriate.  It should generate output that looks like this:</p>
<pre class="console">
-----------------------------------------------------
Starting test testAck
INFO [main] JavaInvoke - CLASSPATH=./lib/devblog-vmspawner.jar
INFO [main] ProcessSpawner - Build process spawner for the following command line:
INFO [main] ProcessSpawner - /home/pteng/java/i586/jdk1.5.0_14/jre/bin/java com.palantir.blog.processspawner.Server
Starting output piper for tag: server-stdout
Starting output piper for tag: server-stderr
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: [Socket Handler2]: Got message: some message
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: [Socket Handler3]: Got message: SHUTDOWN
Output piper exiting for tag: server-stdout
Output piper exiting for tag: server-stderr
Finished test testAck
-----------------------------------------------------
-----------------------------------------------------
Starting test testShutdown
INFO [main] JavaInvoke - CLASSPATH=./lib/devblog-vmspawner.jar
INFO [main] ProcessSpawner - Build process spawner for the following command line:
INFO [main] ProcessSpawner - /home/pteng/java/i586/jdk1.5.0_14/jre/bin/java com.palantir.blog.processspawner.Server
Starting output piper for tag: server-stdout
Starting output piper for tag: server-stderr
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: Spawning socket handler
server-stdout: Waiting for connection
server-stdout: [Socket Handler3]: Got message: SHUTDOWN
Output piper exiting for tag: server-stdout
Output piper exiting for tag: server-stderr
Took 3 ms to send shutdown.
Took 335 ms for process to die.
Finished test testShutdown
-----------------------------------------------------
SUCCESS: all 2 tests passed
</pre>
<p>The source is included in the zip file, but if you wanted to look at it or link to it on the web, here are the classes involved:</p>
<ul>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/Client.java.html'>Client.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/Example.java.html'>Example.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/JavaInvoke.java.html'>JavaInvoke.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ProcessSpawner.java.html'>ProcessSpawner.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/Server.java.html'>Server.java</a></li>
<li><a href='/wp-content/uploads/vmspawner_html/com/palantir/blog/processspawner/ServerSpawningTest.java.html'>ServerSpawningTest.java</a></li>
</ul>
<p>And as an added bonus, there&#8217;s an Ant <i>build.xml</i> that will let you tweak and rebuild the demo yourself.</p>
<p>Comments and questions welcome.  Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/07/28/javainvoke/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Model Change Eventing</title>
		<link>http://blog.palantirtech.com/2009/05/27/data-model-change-eventing/</link>
		<comments>http://blog.palantirtech.com/2009/05/27/data-model-change-eventing/#comments</comments>
		<pubDate>Wed, 27 May 2009 20:46:42 +0000</pubDate>
		<dc:creator>DerekC</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[swing]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=968</guid>
		<description><![CDATA[One of the early architectural challenges that we faced in building the Palantir Finance product was coming up with a good design for firing events from data models to their listeners.  There are many different concepts in our product such as charts, portfolios, and indices which are all maintained by different developers.  Initially, [...]]]></description>
			<content:encoded><![CDATA[<p>One of the early architectural challenges that we faced in building the <a href="http://www.palantirtech.com/finance">Palantir Finance</a> product was coming up with a good design for firing events from data models to their listeners.  There are many different concepts in our product such as charts, portfolios, and indices which are all maintained by different developers.  Initially, each developer had their own system for firing events when a data model changed.  This quickly became a drag on development as tools became more integrated because we had to learn each other&#8217;s event methodologies and translate between the different systems.</p>
<p>The solution was to select a single event firing system.  We wanted something that was easy-to-use yet powerful enough to express all the changes that might be made to a data model.  Java&#8217;s <a href="http://java.sun.com/docs/books/tutorial/javabeans/properties/bound.html">Property Change Support</a> (PCS) was a good fit because it can support arbitrary events in a very lightweight fashion.</p>
<p>Read on for details of our implementation&#8230;<br />
<span id="more-968"></span></p>
<h2>Property Change Support</h2>
<p>Java&#8217;s <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/beans/PropertyChangeSupport.html">PropertyChangeSupport class</a> (PCS) basically allows an object to easily fire events consisting of 4 pieces of information:</p>
<ul>
<li>source object &#8211; the thing that fired the event</li>
<li>property name &#8211; allows the listener to tell events for different things apart</li>
<li>old value &#8211; the old value for the property</li>
<li>new value &#8211; the new value for the property</li>
</ul>
<p>PCS handles all the bookkeeping for adding and removing listeners and firing events.  It is very useful for creating listenable models, but we wanted to make it just a little bit easier by having an abstract class that exposed the add/remove listener calls and took care of initializing PCS:</p>
<pre class="brush: java;">
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;

public abstract class AbstractListenableModel implements Serializable {

    private static final long serialVersionUID = 1L;

    private transient PropertyChangeSupport pcs;

    protected AbstractListenableModel() {
        this.init();
    }

    /**
    * Adds a property change listener to the model.
    */
    public final void addPropertyChangeListener(PropertyChangeListener listener) {
        this.pcs.addPropertyChangeListener(listener);
    }

    /**
    * Removes a property change listener from the model.
    */
    public final void removePropertyChangeListener(PropertyChangeListener listener) {
        this.pcs.removePropertyChangeListener(listener);
    }

    /**
    * Fires a property change event to listeners of the model.
    */
    protected final void firePropertyChange(String propertyName, Object oldValue, Object newValue) {
        this.pcs.firePropertyChange(propertyName, oldValue, newValue);
    }

    /**
    * Initializes transient fields during deserialization.
    */
    protected Object readResolve() {
        this.init();
        return this;
    }

    /**
    * Initializes transient fields.
    */
    private void init() {
        this.pcs = new PropertyChangeSupport(this);
    }
}
</pre>
<p>AbstractListenableModel is basically just a simple wrapper for exposing the functionality of PCS.  By extending this abstract class, it&#8217;s very easy to create a listenable model:</p>
<pre class="brush: java;">
public final class MyModel extends AbstractListenableModel {

    public static final String PROP_FOO = MyModel.class.getName() + &quot;.Foo&quot;;

    private int foo;

    public int getFoo() {
        return this.foo;
    }

    public void setFoo(int foo) {
        //
        // The semantics of the following line are a little hard to unpack,
        // but it does exactly what it needs to do, and the tradeoff
        // for conciseness over immediate readability is worth it for
        // large models with lots of properties.
        //
        // First, the JVM starts to create a stack frame for the call
        // into firePropertyChange().  It begins binding parameter values
        // from the left to the right.  The pointer to the String contained
        // in PROP_FOO is passed in first, then the current value of
        // this.foo is passed in, then the expression
        //        this.foo = foo
        // is evaluated (setting this.foo to the new value of foo), which
        // returns the new value of foo.  All the parameters are then
        // passed down into firePropertyChange(), which checks whether
        // the oldValue is equal to the newValue.  If they're not equal,
        // it fires the event.  If they are equal, it ignores the event.
        //
        this.firePropertyChange(PROP_FOO, this.foo, this.foo = foo);
    }
}
</pre>
<p>In this example, MyModel contains a single property called foo.  When the value of foo is changed, a property change event will be fired to listeners of the model.</p>
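<p>For the other side of the conversation, here is a small self-contained sketch of a listener reacting to such events. To keep it compilable on its own, it uses a stand-in <code>Counter</code> class that owns its <code>PropertyChangeSupport</code> directly instead of extending AbstractListenableModel; <code>Counter</code>, <code>PROP_VALUE</code>, and <code>demo</code> are hypothetical names for this example only.</p>
<pre class="brush: java;">
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {

    /** Stand-in for a listenable model with a single int property. */
    static final class Counter {
        static final String PROP_VALUE = Counter.class.getName() + &quot;.Value&quot;;

        private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
        private int value;

        void addPropertyChangeListener(PropertyChangeListener listener) {
            pcs.addPropertyChangeListener(listener);
        }

        void setValue(int value) {
            // Same idiom as setFoo(): the old value is read before the assignment runs.
            pcs.firePropertyChange(PROP_VALUE, this.value, this.value = value);
        }
    }

    /** Records one line per event received and returns the lines. */
    public static List&lt;String&gt; demo() {
        List&lt;String&gt; log = new ArrayList&lt;&gt;();
        Counter counter = new Counter();
        counter.addPropertyChangeListener(event -&gt; {
            // Check the property name so unrelated events can be ignored.
            if (Counter.PROP_VALUE.equals(event.getPropertyName())) {
                log.add(event.getOldValue() + &quot; -&gt; &quot; + event.getNewValue());
            }
        });
        counter.setValue(1);
        counter.setValue(1); // unchanged value: PropertyChangeSupport drops the event
        counter.setValue(5);
        return log;
    }
}
</pre>
<p>The second call shows why the one-line setter is safe to call blindly: when the old and new values are equal, no event reaches the listener.</p>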
<p>You may notice that the value of PROP_FOO is prefixed by the name of the class.  This ensures that naming collisions do not occur for scenarios in which the same listener is used to listen to multiple models which happen to use the same property name.  This scenario becomes much more likely in the case of event bubbling, which I&#8217;ll talk about next.</p>
<h2>Event Bubbling</h2>
<p>Imagine a scenario in which we have a nested model, with model B held as a property of model A.</p>
<p>Normally, if a listener needs to receive events from both models A and B, it will need to add itself as a listener to each individual model.  While this solution would work, it’s a little cumbersome, especially when model B can get swapped out for model B’—the listener then has to keep itself synched to the internal state of model A.  It would be nice if model A could just automatically forward all the events from model B (or B’) via its PCS support so that a listener only needs to attach itself to one model instead of multiple models.  With a bit more code in AbstractListenableModel, this is possible:</p>
<pre class="brush: java;">
public abstract class AbstractListenableModel implements Serializable {

    private transient PropertyChangeListener childModelListener;

    ...

    /**
    * Registers a child model to this model.
    */
    protected void registerChildModel(AbstractListenableModel childModel, String propertyName) {
        childModel.addPropertyChangeListener(this.childModelListener);
    }

    /**
    * Initializes transient fields.
    */
    private void init() {
        ...
        this.childModelListener = new ChildModelListener();
    }

    /**
    * Listener for property change events fired from child models.
    */
    private final class ChildModelListener implements PropertyChangeListener {
        public void propertyChange(PropertyChangeEvent event) {
            // This is where the bubbling happens
            pcs.firePropertyChange(event);
        }
    }
}
</pre>
<p>Now, whenever model B fires a property change event, this event will also be fired by model A.  This makes it much easier for the listener to listen to events arbitrarily deep in the model hierarchy, because each event fired by a child model gets re-fired (bubbled) by all its ancestors.  All you have to do is attach a listener to the root model and you’ll automatically receive events from all models in the hierarchy.</p>
<p>Note that the registerChildModel method above takes an unused propertyName argument.  In the full implementation of this class, events with the provided property name are monitored.  When an event with the provided property name is fired, childModelListener is detached from the old child model and attached to any new child model.  This ensures that the listenable model is always listening to the current child models.</p>
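<p>The detach-and-reattach behavior can be sketched as follows. This is a simplification and not the full implementation described above: instead of monitoring events by property name, the setter swaps the listener itself, and <code>Model</code>, <code>Leaf</code>, <code>Parent</code>, and <code>swapChild</code> are hypothetical names introduced for the example.</p>
<pre class="brush: java;">
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

public class BubblingSketch {

    /** Simplified stand-in for AbstractListenableModel. */
    static class Model {
        private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
        // Re-fires each child event unchanged, so the event source stays the child.
        private final PropertyChangeListener bubbler = event -&gt; pcs.firePropertyChange(event);

        public void addPropertyChangeListener(PropertyChangeListener listener) {
            pcs.addPropertyChangeListener(listener);
        }

        protected void fire(String propertyName, Object oldValue, Object newValue) {
            pcs.firePropertyChange(propertyName, oldValue, newValue);
        }

        /** Moves the bubbling listener from the old child to the new one. */
        protected void swapChild(Model oldChild, Model newChild) {
            if (oldChild != null) {
                oldChild.pcs.removePropertyChangeListener(bubbler);
            }
            if (newChild != null) {
                newChild.pcs.addPropertyChangeListener(bubbler);
            }
        }
    }

    static final class Leaf extends Model {
        static final String PROP_NAME = &quot;Leaf.Name&quot;;
        private String name = &quot;&quot;;

        void setName(String name) {
            fire(PROP_NAME, this.name, this.name = name);
        }
    }

    static final class Parent extends Model {
        static final String PROP_CHILD = &quot;Parent.Child&quot;;
        private Leaf child;

        void setChild(Leaf child) {
            swapChild(this.child, child);
            fire(PROP_CHILD, this.child, this.child = child);
        }
    }

    /** Returns the property names seen by a listener attached only to the parent. */
    public static List&lt;String&gt; demo() {
        List&lt;String&gt; seen = new ArrayList&lt;&gt;();
        Leaf b1 = new Leaf();
        Leaf b2 = new Leaf();
        Parent a = new Parent();
        a.setChild(b1);
        a.addPropertyChangeListener(event -&gt; seen.add(event.getPropertyName()));
        b1.setName(&quot;x&quot;); // bubbles up through a
        a.setChild(b2);  // a fires its own event
        b1.setName(&quot;y&quot;); // b1 is detached, so nothing arrives
        b2.setName(&quot;z&quot;); // bubbles up through a
        return seen;
    }
}
</pre>
<p>The listener never touches B or B&#8217; directly; it only ever registers with A.</p>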
<h2>Events for Collections</h2>
<p>Any model event support would not be complete without some consideration of how to handle collections such as sets and lists.  To solve this scenario, we created specialized collection classes called ListenableModelSet and ListenableModelList.  These collections hold AbstractListenableModels as their elements and fire events whenever their contents change.  Since the changes to collections can vary widely, the solution we came up with for communicating collection changes with full fidelity is basically to fire events with a copy of the old set as the old value and the new set as the new value.  Listeners can then diff the old and new values to determine exactly what changed if necessary.  Additionally, each ListenableModelSet or List adds a ChildModelListener to all of its children (themselves AbstractListenableModels), thereby ensuring that events are bubbled from all models in the collection.</p>
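<p>The old-copy/new-copy idea can be sketched like this. The <code>ListenableSet</code> class below is a much-reduced, hypothetical stand-in for ListenableModelSet: it holds arbitrary elements, leaves out the bubbling from child models, and the <code>difference</code> helper is just one way a listener might diff the two snapshots.</p>
<pre class="brush: java;">
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ListenableSetSketch {

    /** Fires an event carrying copies of the old and new contents on every change. */
    public static final class ListenableSet&lt;E&gt; {
        public static final String PROP_CONTENTS = ListenableSet.class.getName() + &quot;.Contents&quot;;

        private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
        private final Set&lt;E&gt; items = new LinkedHashSet&lt;&gt;();

        public void addPropertyChangeListener(PropertyChangeListener listener) {
            pcs.addPropertyChangeListener(listener);
        }

        public boolean add(E item) {
            if (items.contains(item)) {
                return false; // nothing changed, so no event
            }
            Set&lt;E&gt; before = new LinkedHashSet&lt;&gt;(items);
            items.add(item);
            Set&lt;E&gt; after = new LinkedHashSet&lt;&gt;(items);
            pcs.firePropertyChange(PROP_CONTENTS, before, after);
            return true;
        }

        public boolean remove(E item) {
            if (!items.contains(item)) {
                return false;
            }
            Set&lt;E&gt; before = new LinkedHashSet&lt;&gt;(items);
            items.remove(item);
            Set&lt;E&gt; after = new LinkedHashSet&lt;&gt;(items);
            pcs.firePropertyChange(PROP_CONTENTS, before, after);
            return true;
        }
    }

    /** Elements of a that are not in b. */
    public static Set&lt;Object&gt; difference(Set&lt;?&gt; a, Set&lt;?&gt; b) {
        Set&lt;Object&gt; result = new LinkedHashSet&lt;&gt;(a);
        result.removeAll(b);
        return result;
    }

    /** Shows a listener recovering exactly what changed from the two snapshots. */
    public static List&lt;String&gt; demo() {
        List&lt;String&gt; log = new ArrayList&lt;&gt;();
        ListenableSet&lt;String&gt; tickers = new ListenableSet&lt;&gt;();
        tickers.addPropertyChangeListener(event -&gt; {
            Set&lt;?&gt; oldSet = (Set&lt;?&gt;) event.getOldValue();
            Set&lt;?&gt; newSet = (Set&lt;?&gt;) event.getNewValue();
            log.add(&quot;added=&quot; + difference(newSet, oldSet) + &quot; removed=&quot; + difference(oldSet, newSet));
        });
        tickers.add(&quot;AAPL&quot;);
        tickers.add(&quot;MSFT&quot;);
        tickers.add(&quot;AAPL&quot;); // duplicate, no event
        tickers.remove(&quot;AAPL&quot;);
        return log;
    }
}
</pre>
<p>Copying the whole set on every change is the price of full fidelity; for very large collections a listener that only needs the delta pays for more than it uses.</p>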
<h2>Conclusion</h2>
<p>Just as we saw with the <a href="http://blog.palantirtech.com/2009/04/20/model-view-adapter/">Adapter</a> piece of the <a href="http://en.wikipedia.org/wiki/Model-view-adapter">MVA</a> triad, when we <a href="http://se.ethz.ch/~meyer/publications/patterns/visitor.pdf">componentize</a> the Model piece there are huge gains to be had.  Once we started using a base Model class and a consistent eventing infrastructure (PropertyChangeSupport), we could add features that made coding across our entire application a lot more pleasant.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/05/27/data-model-change-eventing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Pokémon Problem: a new anti-pattern</title>
		<link>http://blog.palantirtech.com/2009/03/19/the-pokemon-problem/</link>
		<comments>http://blog.palantirtech.com/2009/03/19/the-pokemon-problem/#comments</comments>
		<pubDate>Fri, 20 Mar 2009 03:59:45 +0000</pubDate>
		<dc:creator>John C</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=202</guid>
		<description><![CDATA[
It&#8217;s always fun to release a new piece of jargon into the wild. I&#8217;ve run into a number of bugs in our codebase that are caused by an anti-pattern I&#8217;d like to dub The Pokémon Problem.
Much like the game of Whac-a-Mole, this is a class of bugs where fixing every occurrence does not prevent the bug [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; margin-left: 15px; margin-bottom: 15px;"><img src="/wp-content/uploads/2009/03/pokepic1.jpg" alt="Gotta catch 'em all!" width="220" height="245" /></div>
<p>It&#8217;s always fun to release a new piece of jargon into the wild. I&#8217;ve run into a number of bugs in our codebase that are caused by an <a href="http://en.wikipedia.org/wiki/Anti-pattern">anti-pattern</a> I&#8217;d like to dub <strong>The Pokémon Problem</strong>.</p>
<p>Much like the game of <a href="http://en.wikipedia.org/wiki/Whack-a-mole">Whac-a-Mole</a>, this is a class of bugs where fixing every occurrence does not prevent the bug from returning in new code: ordinary code changes can easily re-introduce an instance of the bug into the code base. Even if you &#8220;<a href="http://www.youtube.com/watch?v=vZ3gPXVMWRY">catch &#8217;em all</a>&#8221;, nothing prevents someone else from introducing new Pokémon bugs later.</p>
<p>Not only is this bug easy to re-introduce, but it can also be hard to find every existing instance of the pattern. Tools like Eclipse make it easier to track down all the places a piece of code is called, but they do a poor job of searching for things that happen in a certain sequence, and dynamic invocation mechanisms like <a href="http://www.ibm.com/developerworks/library/j-dyn0603/">Java Reflection</a> can make an exhaustive search impossible.  This type of bug is also resistant to automated refactoring: changing the protocol for dealing with this corner of your code requires you to track down every place that touches it and refactor each one manually.  It generally signals insufficient <a href="http://en.wikipedia.org/wiki/Separation_of_concerns">separation of concerns</a>.</p>
<p>In general, this anti-pattern is a result of <a href="http://en.wikipedia.org/wiki/API">APIs</a> that require the caller to be responsible for state management of resources that the API owns.  This can include things like an object that requires the caller to have run an initialization method before calling any other method on the object.  These bugs get even more insidious when a failure to do things in the right order does not cause a hard failure (like throwing an exception) but instead creates some sort of subtle corruption that may go unnoticed or may cause subsequent calls to fail unexpectedly.</p>
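<p>To make that concrete, here&#8217;s a minimal sketch of an initialization-order hazard.  Both classes are hypothetical and exist only for illustration: the first relies on every caller remembering to call <em>open()</em>, while the second hands out instances only through a factory that does the initialization itself.</p>
<pre class="brush: java;">

// Hypothetical API for illustration: callers must remember to call open() first.
class FragileCounterStore {

	private int[] counts; // stays null until open() is called

	void open() {
		counts = new int[4];
	}

	void increment(int slot) {
		// Throws a NullPointerException if open() was skipped.
		counts[slot]++;
	}

	int get(int slot) {
		return counts[slot];
	}
}

// Same behavior, but the only way to obtain an instance is a factory
// method that performs the initialization, so the order cannot be violated.
class SafeCounterStore {

	private final int[] counts;

	private SafeCounterStore(int slots) {
		this.counts = new int[slots];
	}

	static SafeCounterStore open(int slots) {
		return new SafeCounterStore(slots);
	}

	void increment(int slot) {
		counts[slot]++;
	}

	int get(int slot) {
		return counts[slot];
	}
}
</pre>
<p>With the first class, every new call site is a fresh chance to forget <em>open()</em>; with the second, there is nothing to forget.</p>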
<p>Read on for some strategies on dealing with the Pokémon problem.<br />
<span id="more-202"></span></p>
<h2>Solving the Pokémon Problem</h2>
<p>How do we solve the Pokémon problem?  Sometimes it is as easy as writing a unit test to verify that no new Pokémon exist. More often than not, though, you&#8217;ll have to use better encapsulation to solve the problem.  If the Pokémon problem is the anti-pattern, <a href="http://en.wikipedia.org/wiki/Information_hiding">encapsulation</a> is its opposite.</p>
<p>For an example, I&#8217;ll turn to a real problem I ran into in the <a href="http://www.palantirtech.com/videos/">Palantir Government</a> codebase:  Let&#8217;s say I have a class <em>Parser</em> and it has a method <em>getAttribute()</em>.  I want to add the ability to have shortened (obfuscated) tag names to the <em>Parser</em> class.</p>
<h3>The wrong approach</h3>
<p>One approach I can take:</p>
<ol>
<li>Create a class called <em>AttributeHandler</em> that wraps calls to <em>Parser.getAttribute()</em>. It handles detection of the short or full XML dialect and then returns the appropriate attribute value.
</li>
<li>Stick an instance of this <em>AttributeHandler</em> class in a member variable of <em>Parser</em> and make sure to call <em>AttributeHandler.getAttribute()</em> instead of <em>Parser.getAttribute()</em> when processing attributes.</li>
<li>Put a comment on <em>Parser.getAttribute()</em> mentioning that it shouldn&#8217;t be called directly anymore.</li>
</ol>
<p>I just introduced a Pokémon problem.  The next person that edits this code may not read my comment and know that calling <em>Parser.getAttribute()</em> is a bug.  They may not test the short tag case and find the bug. How do I fix this Pokémon problem? </p>
<h3>The right approach</h3>
<p>There are a number of ways to approach this encapsulation problem.  Here&#8217;s what I chose:</p>
<ol>
<li>Create a class called <em>AttributeHandler</em> that wraps calls to <em>Parser.getAttribute()</em>. It handles detection of the short or full XML dialect and then returns the appropriate attribute value.
</li>
<li>Stick an instance of this <em>AttributeHandler</em> class in a member variable of <em>Parser</em>.</li>
<li>Create a wrapper class around <em>Parser</em> that delegates the <em>getAttribute()</em> call to the <em>AttributeHandler</em> member when processing attributes.</li>
</ol>
<p>(You might ask why I used a delegate instead of sub-classing the parser and overriding the <em>getAttribute()</em> method: it didn&#8217;t fit for architectural reasons that aren&#8217;t relevant here). Now the developer no longer has access to the <em>Parser.getAttribute()</em> method that would cause the undesired behavior.</p>
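<p>The steps above can be sketched roughly as follows.  This is a simplified illustration rather than the actual Palantir Government code: the array-backed <em>Parser</em>, the first-letter shortening rule, and the name <em>DialectAwareParser</em> are all assumptions made for the example, and for brevity the handler lives in the wrapper rather than in <em>Parser</em> itself.</p>
<pre class="brush: java;">

// Hypothetical stand-in for the real parser: one element whose
// attributes are stored as parallel arrays of names and values.
class Parser {

	private final String[] names;
	private final String[] values;

	Parser(String[] names, String[] values) {
		this.names = names;
		this.values = values;
	}

	// Dialect-unaware lookup: returns null if the exact name is absent.
	String getAttribute(String name) {
		for (int i = 0; i != names.length; i++) {
			if (names[i].equals(name)) {
				return values[i];
			}
		}
		return null;
	}
}

// Knows both dialects: tries the full name first, then the short form.
class AttributeHandler {

	private final Parser parser;

	AttributeHandler(Parser parser) {
		this.parser = parser;
	}

	String getAttribute(String fullName) {
		String value = parser.getAttribute(fullName);
		if (value != null) {
			return value;
		}
		return parser.getAttribute(shorten(fullName));
	}

	// Assumed obfuscation rule, for illustration only: the first letter.
	static String shorten(String fullName) {
		return fullName.substring(0, 1);
	}
}

// The only type handed to the rest of the code base.  It hides Parser,
// so there is no dialect-unaware getAttribute() left to call by mistake.
class DialectAwareParser {

	private final AttributeHandler handler;

	DialectAwareParser(Parser parser) {
		this.handler = new AttributeHandler(parser);
	}

	String getAttribute(String fullName) {
		return handler.getAttribute(fullName);
	}
}
</pre>
<p>Code that only ever sees a <em>DialectAwareParser</em> gets the right answer for both dialects, and nobody has to read a comment to stay out of trouble.</p>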
<h2>In-depth Example: Resource Management</h2>
<p>Another common place you might run into the Pokémon problem is code that needs to clean up resources.  Some coders are lazy and don&#8217;t always write their try/finally/close blocks properly.  This can lead to resource leaks; for things like file handles, this usually isn&#8217;t a large issue, but for scarce resources like database connections, it&#8217;s critical that they get cleaned up and returned to the pool.  For locks, releasing them reliably is absolutely essential to avoid deadlocks.</p>
<h3>The wrong way</h3>
<p>Here&#8217;s a simple example of bad resource management.  In this (admittedly contrived) scenario, we have a class that manages a resource for us: in this case, a status file.  Different parts of the app need to read the status file where, presumably, something is storing its status.</p>
<p>Here&#8217;s the naive implementation of the status manager class:</p>
<pre class="brush: java;">

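import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

public class PokemonStatusFileManager {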

	final File statusFile;

	public PokemonStatusFileManager(File statusFilePath) {
		this.statusFile = statusFilePath;
	}

	public InputStream getStatusFileInputStream() throws FileNotFoundException {
		return new FileInputStream(this.statusFile);
	}
}
</pre>
<p>Based on this implementation, here&#8217;s some code that would use it.  Note that the first method, <em>parseStatusFileIncorrectly()</em>, has no error handling and never closes the <em>InputStream</em> if <em>readFrom()</em> throws an exception.  The second method does proper resource handling, but it&#8217;s kind of ugly to read.</p>
<pre class="brush: java;">

import java.io.IOException;
import java.io.InputStream;

public class PokemonParseStatusFile {

	/**
	 * This is not proper resource management.
	 * @param manager
	 * @throws IOException
	 */
	public static void parseStatusFileIncorrectly(PokemonStatusFileManager manager)
		throws IOException {

		InputStream statusFile = manager.getStatusFileInputStream();
		readFrom(statusFile);
		statusFile.close();

	}

	/**
	 * This is proper resource management, but it's tedious to have to write.
	 * Some coders are too lazy to always do this the right way.
	 * @param manager
	 * @throws IOException
	 */
	public static void parseStatusFileCorrectly(PokemonStatusFileManager manager)
		throws IOException {

		InputStream statusFile = null;
		try {
			statusFile = manager.getStatusFileInputStream();
			readFrom(statusFile);
		} finally {
			// carefully close the resource
			if(statusFile != null) {
				try {
					statusFile.close();
				} catch (IOException e) {
					System.err.println(&quot;Error closing statusFile!&quot;);
					e.printStackTrace(System.err);
				}
			}
		}
	}

	static void readFrom(InputStream statusFile) throws IOException {
		// do the reading here...
	}
}
</pre>
<p>So this produces a classic Pokémon problem: everyone who interacts with the StatusFileManager has to do proper resource handling.  Now imagine that this is an important lock instead of just a file handle: this Pokémon problem could cause deadlock (which would be bad).</p>
<p>So how do we fix this? <a href="http://en.wikipedia.org/wiki/Visitor_pattern">The Visitor Pattern</a>.</p>
<h3>Solving the Pokémon problem with the <a href="http://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a></h3>
<p>The Visitor Pattern is a potent weapon in fighting the Pokémon problem: it allows you to fully encapsulate access to a resource by injecting it into the places where it&#8217;s needed while preserving overall control of the resource in the code that &#8220;owns&#8221; it.  Classically used for controlling things like iteration order, here the visitor pattern is applied as a form of <a href="http://en.wikipedia.org/wiki/Dependency_injection">dependency injection</a> to enable lifecycle management.</p>
<p>The visitor pattern as applied to resource management is fairly straightforward:</p>
<ol>
<li>Define an interface, <em>ResourceVisitor</em>, with one method, <em>visit(Resource r)</em> (where <em>Resource</em> is the type of resource we&#8217;re managing).</li>
<li>Define the resource manager with a method that takes a <em>ResourceVisitor</em> as a parameter.  Manage the lifecycle of the resource and call <em>ResourceVisitor.visit(r)</em> when appropriate, handling all initialization, error-handling and cleanup.</li>
</ol>
<p>Here&#8217;s our re-spin of the status file manager class.  You&#8217;ll notice that we&#8217;ve moved the resource handling code from the correct example above and encapsulated it in the visitor pattern:</p>
<pre class="brush: java;">

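import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class UnPokemonStatusFileManager {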

	/**
	 * Visitor interface implemented by callers wishing to
	 * interact with the status file.
	 */
	public static interface StatusFileVisitor {
		public void visitStatusFile(InputStream statusFile) throws IOException;
	}

	final File statusFile;

	public UnPokemonStatusFileManager(File statusFilePath) {
		this.statusFile = statusFilePath;
	}

	/**
	 * Note that this method is now private.
	 */
	private InputStream getStatusFileInputStream() throws FileNotFoundException {
		return new FileInputStream(this.statusFile);
	}

	/**
	 * Here's the method that takes the visitor and
	 * fully encapsulates the lifecycle of the status file.
	 */
	public void parseStatusFile(StatusFileVisitor parser)
	throws IOException {
		InputStream statusFile = this.getStatusFileInputStream();
		try {
			parser.visitStatusFile(statusFile);
		} finally {
			// carefully close the resource
			try {
				statusFile.close();
			} catch (IOException e) {
				System.err.println(&quot;Error closing statusFile!&quot;);
				e.printStackTrace(System.err);
			}
		}
	}
}
</pre>
<p>Now that we&#8217;ve encapsulated the complexity of the resource handling in the <em>UnPokemonStatusFileManager</em> class, code that needs to access the status file can be written correctly without much work.</p>
<pre class="brush: java;">

import java.io.IOException;
import java.io.InputStream;

public class UnPokemonParseStatusFile {

	public static void parseStatusFile(UnPokemonStatusFileManager statusFileManager)
		throws IOException {

		// generate anonymous visitor to do the processing
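		// StatusFileVisitor is nested in UnPokemonStatusFileManager, so qualify it
		UnPokemonStatusFileManager.StatusFileVisitor visitor =
			new UnPokemonStatusFileManager.StatusFileVisitor() {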
			public void visitStatusFile(InputStream statusFile) throws IOException {
				readFrom(statusFile);
			}
		};
		statusFileManager.parseStatusFile(visitor);
	}

	static void readFrom(InputStream statusFile) throws IOException {
		// do the reading here...
	}
}
</pre>
<p>By encapsulating the full lifecycle of the status file in the visitor pattern, I&#8217;ve ensured that it&#8217;s always accessed properly.  If I change the place where we store status to be a database rather than a file, nothing needs to change except this one class; the calling code remains the same.</p>
<p>We now have easy refactoring, no resource leaks, and simpler calling code.  And finally: there are no new bugs to be introduced by callers that aren&#8217;t sure how to use our resource.  Looks like we caught &#8217;em all!</p>
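<p>The same shape carries over to the lock case mentioned earlier.  Here&#8217;s a minimal sketch, assuming a hypothetical <em>GuardedStatus</em> class that protects an in-memory status string with a <em>ReentrantLock</em>; the class and its visitor interface are made up for illustration.  Callers never touch the lock, so they cannot forget to release it.</p>
<pre class="brush: java;">

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: the lock is owned and managed entirely by this class.
class GuardedStatus {

	/**
	 * Visitor implemented by callers that need to read or change
	 * the status while the lock is held.
	 */
	interface StatusVisitor {
		String visit(String currentStatus);
	}

	private final Lock lock = new ReentrantLock();
	private String status = "idle";

	/**
	 * Runs the visitor while holding the lock and stores whatever it returns.
	 * The unlock sits in a finally block, so an exception thrown by the
	 * visitor cannot leave the lock held.
	 */
	String update(StatusVisitor visitor) {
		lock.lock();
		try {
			status = visitor.visit(status);
			return status;
		} finally {
			lock.unlock();
		}
	}
}
</pre>
<p>If the visitor throws, the status is left unchanged and the lock is still released, which is exactly the guarantee that is tedious to get right at every call site.</p>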
<h2>Wrapping It All Up</h2>
<p>So there it is, your new piece of jargon: The Pokémon Problem anti-pattern.  You heard it here first!  Please post any other great examples to the comments section on this post.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/03/19/the-pokemon-problem/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Palantir Monitoring Server: where build beats buy</title>
		<link>http://blog.palantirtech.com/2009/02/23/palantir-monitoring-server-where-build-beats-buy/</link>
		<comments>http://blog.palantirtech.com/2009/02/23/palantir-monitoring-server-where-build-beats-buy/#comments</comments>
		<pubDate>Mon, 23 Feb 2009 20:00:56 +0000</pubDate>
		<dc:creator>Eric W.</dc:creator>
				<category><![CDATA[enterprise engineering]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=186</guid>
		<description><![CDATA[
Distributed systems are complex. Getting them right is hard, and when things don&#8217;t go right, it can be difficult to understand what went wrong. In an environment like ours, a good monitoring system isn&#8217;t just nice to have; it&#8217;s a critical component necessary for understanding behavior and diagnosing problems.
We had three primary goals for the [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align:right; margin-right:20px; width: 253px'><img src="http://blog.palantirtech.com/wp-content/uploads/2009/02/monitoringserverscreenshot-badge.png" alt="Graph of CPU usage over time" title="Graph of CPU usage over time" width="233" height="188"/></div>
<p>Distributed systems are complex. Getting them right is hard, and when things don&#8217;t go right, it can be difficult to understand what went wrong. In an environment like ours, a good monitoring system isn&#8217;t just nice to have; it&#8217;s a critical component necessary for understanding behavior and diagnosing problems.</p>
<p>We had three primary goals for the initial monitoring system: <b>graphing</b> of time-series data, <b>alerting</b> on event triggers, and <b>notifications</b> to users.  Furthermore, as a product company, we had a design goal of a simple, intuitive (yet powerful and flexible) solution.</p>
<p>Before starting, we did a quick survey of existing open-source packages. Unfortunately, nothing we found quite fit our needs, given our specific requirements of security, protocol, licensing, and integrability into our product. Given that, we made the decision to forge ahead and build our own; we try not to re-invent the wheel but it seemed to make sense here.</p>
<p>For an in-depth look at the architecture of the Monitoring Server and components we used to build it, read on&#8230;</p>
<p><span id="more-186"></span></p>
<h2>Architecture</h2>
<p>At the highest level, a two-tiered architecture made the most sense. The back-end, standalone server component would be responsible for collecting, processing, and exposing data through an API. The front-end component would be web-based <a href="http://en.wikipedia.org/wiki/Portlet">portlets</a> integrated into our existing management interface.</p>
<p>The server architecture was designed to allow generic components to work together, with everything connected up via Spring.  While we started with JMX as our collection method for monitoring data, the architecture sees this as just one pluggable component, with multiple data backends supported.  A Spring webservices API allows the front-end portlets to query and manipulate the components at each level.</p>
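<p>As a rough illustration of what &#8220;pluggable&#8221; means here, the sketch below uses plain constructor injection as a stand-in for the Spring wiring.  The <em>MetricSource</em>, <em>JmxLoadSource</em>, <em>FixedSource</em>, and <em>Collector</em> names are hypothetical, not the real Monitoring Server classes; the only real API used is the platform <em>OperatingSystemMXBean</em> that ships with the JDK.</p>
<pre class="brush: java;">

import java.lang.management.ManagementFactory;

/** Hypothetical collection interface; one implementation per data backend. */
interface MetricSource {
	double read();
}

/** Reads the system load average through the platform JMX MXBean. */
class JmxLoadSource implements MetricSource {
	public double read() {
		// Negative on platforms where the load average is not available.
		return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
	}
}

/** Stub backend, handy for tests or for wiring up a skeleton early. */
class FixedSource implements MetricSource {

	private final double value;

	FixedSource(double value) {
		this.value = value;
	}

	public double read() {
		return value;
	}
}

/** Depends only on the interface; the wiring decides which backend it gets. */
class Collector {

	private final MetricSource source;

	Collector(MetricSource source) {
		this.source = source;
	}

	String sample() {
		return "load=" + source.read();
	}
}
</pre>
<p>Swapping backends means changing <em>new Collector(new JmxLoadSource())</em> to <em>new Collector(new FixedSource(1.5))</em> and nothing else; with a container like Spring, that one line would live in the bean configuration instead of in code.</p>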
<p>For our first shipping release, we&#8217;ve only shipped the JMX backend, and so this is what the production architecture looks like for now:</p>
<div class='postimg'>
<img src="http://blog.palantirtech.com/wp-content/uploads/2009/02/monitoring-server-architecture.png" alt="Monitoring Server architecture diagram" title="Monitoring Server architecture diagram" width="650" /></div>
<h2>Components</h2>
<p>Any time you choose build instead of buy, there&#8217;s a lot of work to be done to get the full set of functionality you need. Fortunately, the Java platform has an extremely rich set of freely available projects and libraries, and we leveraged many of them for the back-end:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Java_Management_Extensions">JMX</a>: the core of our system, the Java Management Extensions is a standard for managing and monitoring applications. We use JMX to instrument and monitor our own servers, and because it&#8217;s an adopted standard, we gain access to MBeans exposed by third-party components as well.</li>
<li><a href="https://rrd4j.dev.java.net/">rrd4j</a>: round-robin databases (RRDs) are an excellent storage format for time-series data, and RRD4J is a pure Java implementation of the legendary RRDTool. The round-robin format allows for a fixed size file, since older data is overwritten as newer data arrives. The multi-resolution aspect of the files provides long historical views without a space premium. For example, an RRD can contain a high resolution series for recent information and a low resolution series for long-term data.</li>
<li><a href="http://hsqldb.org/">HSQLDB</a>: a lightweight, native Java, SQL database that can be run in-process. We use HSQLDB to store all non&#8211;time-series information, such as metadata about metrics we&#8217;re monitoring.</li>
<li><a href="http://www.opensymphony.com/quartz/">Quartz</a>: an open source job scheduling system, we use Quartz primarily for scheduling Alerts. Alerts run periodically to check for a condition, and notify if triggered. Each Alert&#8217;s wait period is specified by the user, and fortunately, with Quartz it&#8217;s easy to schedule many Alerts at different frequencies.</li>
<li><a href="http://groovy.codehaus.org/">Groovy</a>: self-described as &#8220;an agile dynamic language for the Java Platform,&#8221; Groovy is integrated into our alerting system. Alerts can contain Groovy scriptlets, which give us the expressiveness to create Alerts such as &#8220;alert if a metric&#8217;s average value over the past 5 minutes is greater than X,&#8221; or &#8220;alert if the variation of a set of metrics&#8217; values across all servers of type Y is greater than Z.&#8221;</li>
<li><a href="http://java.sun.com/products/javamail/">JavaMail</a>: a full-featured email framework. Supports SSL/TLS secure connection protocols, which our clients require.</li>
<li><a href="http://java.sun.com/developer/technicalArticles/WebServices/jaxb/">JAXB</a>: a simple-to-use Java to XML API, JAXB allows us to convert XML into Java objects (and vice-versa). We use JAXB for parsing configuration files and persisting objects into HSQLDB.</li>
<li><a href="http://www.theserverside.com/tt/articles/article.tss?l=IntrotoSpring25">Spring</a>: a framework for developing enterprise Java applications, Spring is the foundation for our monitoring server.
<p>Having never used a component framework before, using Spring&#8217;s Inversion of Control and Dependency Injection paradigms to build an application turned out to be a pleasant and educational experience. While it enforced discipline in using interfaces, it rewarded us with the ability to easily swap implementations of a component. For example, switching to an HSQLDB-based data store required only a single-line edit, and everything just worked. Seriously.</p>
<p>
We also leveraged Spring early in our development process: we pair-coded interfaces, created stub objects, and then wired everything up in Spring. Once our skeleton was in place, we independently worked on component implementations and swapped them in as they were completed. Later in the cycle, we used Spring in our unit tests to compose our application differently for specific tests, isolating important functionality and using dummy components for non-relevant areas.</p>
</li>
</ul>
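<p>For readers who haven&#8217;t met round-robin storage before, the fixed-size property mentioned in the rrd4j entry above boils down to a ring buffer.  The class below is a toy illustration of that idea only; it is not rrd4j&#8217;s API or file format, and the <em>RoundRobinSeries</em> name is made up.</p>
<pre class="brush: java;">

/** Fixed-size store: once full, each new sample overwrites the oldest one. */
class RoundRobinSeries {

	private final double[] slots;
	private int next;  // index the next sample will be written to
	private int count; // number of slots filled so far, capped at slots.length

	RoundRobinSeries(int capacity) {
		slots = new double[capacity];
	}

	void add(double sample) {
		slots[next] = sample;
		next = (next + 1) % slots.length;
		count = Math.min(count + 1, slots.length);
	}

	/** Mean of the samples currently retained, or NaN if there are none. */
	double average() {
		if (count == 0) {
			return Double.NaN;
		}
		double sum = 0;
		for (int i = 0; i != count; i++) {
			sum += slots[i];
		}
		return sum / count;
	}
}
</pre>
<p>Storage never grows past the chosen capacity, no matter how long the server runs.  A lower-resolution series for long-term history could be built the same way by feeding it periodic averages instead of raw samples.</p>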
<h2>User Interface</h2>
<p>By moving the user interface into the portlets, we were able to re-skin the fairly ugly native graphing capability that rrd4j provides with a more generic solution that looks good.  For comparison, here&#8217;s an MRTG style graph produced by rrd4j:</p>
<div class='postimg'><img src="https://rrd4j.dev.java.net/tutorial_files/speed4.gif" title='rrd4j sample graph' alt='rrd4j sample graph'/></div>
<p>And here are some graphs from our Monitoring Server (note the portlet UI components for controlling display of the graphs):</p>
<div class='postimg'><img src="http://blog.palantirtech.com/wp-content/uploads/2009/02/monitoringserverscreenshot.png" alt="Graphs from Monitoring Server" title="Graphs from Monitoring Server"/></div>
<p>While the difference is not that stark, our graphs are much easier on the eyes.</p>
<h2>Monitoring Server: present and future</h2>
<p>We recently released the monitoring system, and it&#8217;s already providing insights into our product&#8217;s behavior.  We have more features planned: <b>eventing</b>, which will help us track system events such as a server restart or job completion; <b>generating</b> new time-series data from existing data (for example, a series of the rolling standard deviation of a metric, or the number of failure events in the past 24 hours); and Groovy <b>scripting</b> directly against the monitoring server.  The last feature is particularly helpful when our engineering team can&#8217;t physically access a system due to security restrictions.</p>
<p>From an analysis perspective, we can now start to better understand our system&#8217;s behavior, which will help us identify problems before they occur and help steer our development energy going forward. Even the world&#8217;s best data analysis software needs a little analysis itself sometimes.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/02/23/palantir-monitoring-server-where-build-beats-buy/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Deploying a distributed system</title>
		<link>http://blog.palantirtech.com/2008/10/07/deploying-a-distributed-system/</link>
		<comments>http://blog.palantirtech.com/2008/10/07/deploying-a-distributed-system/#comments</comments>
		<pubDate>Wed, 08 Oct 2008 03:52:32 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[development process]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=118</guid>
		<description><![CDATA[
At Palantir, we write software that gets deployed at each client, integrated across their sensitive data sets, and maintained and administered by that client&#8217;s in-house admins.  Most deployed enterprise software is run on a single beefy box: consider wikis, blogging systems, bug tracking systems, or practically any client/server or web client software used [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; text-align: right; width: 230px'><img src="http://blog.palantirtech.com/wp-content/uploads/2008/10/pg-distrib-logo.png" alt="Distributed systems diagram" title="Distributed systems diagram" width="221" height="300" class="size-medium wp-image-466" /></div>
<p>At Palantir, we write software that gets deployed at each client, integrated across their sensitive data sets, and maintained and administered by that client&#8217;s in-house admins.  Most deployed enterprise software is run on a single beefy box: consider wikis, blogging systems, bug tracking systems, or practically any client/server or web client software used today.  On the other hand, most enterprise software that runs as a <a href="http://en.wikipedia.org/wiki/Distributed_system">distributed system</a> is hosted: Salesforce.com, Google Apps, or any approach that sells software as a <a href="http://en.wikipedia.org/wiki/Software_as_a_Service">service</a>.  What’s fairly unusual about our software is that it’s deployed as a distributed system at each client.</p>
<p>Distributed systems are hard to build and hard to maintain.  As long as that distributed system is built and maintained in-house, however, you have a number of advantages:</p>
<ul>
<li>The administrators are full-time product experts who are focused on the mission of keeping your system available and responsive.</li>
<li>The development organization can build internal tools for the administrators that only have to be “good enough” and can step in if necessary.</li>
<li>It’s easy to get feedback on how the system performs, because there are no sensitivity, privacy, or legal constraints.</li>
<li>A single, large deployment allows you to optimize your hardware purchasing and amortize installation headaches across a large number of machines.</li>
</ul>
<p>This is all great, of course, and if you can host and maintain your distributed system yourself, I’d highly recommend it.  Sometimes, however, it’s just not possible.  At Palantir, the client data we work with is so sensitive that even we cannot see it, except under very strictly controlled circumstances.  It’s also so large that the bandwidth limitations of pushing it into a system hosted by us would be prohibitive.</p>
<p>So suppose that you have to deploy your distributed system in a customer datacenter with external parties maintaining the system.  What do you need to consider?  In this post, I&#8217;ll go into a number of key points that we have faced and addressed at Palantir.</p>
<p><span id="more-118"></span></p>
<h3>Understand Your Administrators</h3>
<p>Assume that your administrators are part-time, not product experts, and constantly distracted by their other responsibilities.  They aren’t even experts in the technologies your system is based on: for example, they don’t really know much about databases and they are more comfortable with Windows than with Linux.  Even if these assumptions aren’t all true in any particular case, there will be administrators who meet each of these assumptions.</p>
<h3>Design For Manageability</h3>
<p> This means building powerful management tools for your system that are web-based and also scriptable.  Remember that your administrators are part-time, so usability is important: by the time your administrator touches the Foobar Configuration Widget the second time, he’s forgotten everything he learned a month ago when he did it the first time.  You also want to build management tools that go all the way down the stack: using low-level tools for occasional jobs leads to mistakes, because those low-level tools tend to be far more powerful than necessary for your system.</p>
<h3>Design In Monitoring And Notification</h3>
<p>Visibility is one of the biggest reasons people want control – but unnecessary control leads to mistakes.  Your administrator shouldn’t have to go to the command line to run <code><a href="http://en.wikipedia.org/wiki/Top_(Unix)">top</a></code> just to figure out whether the system is overloaded.  Each metric that is being monitored needs to have historical data so that your system can distinguish baseline behavior from anomalies.  Each metric displayed to an administrator needs to be displayed with context, whether that’s the mean and standard deviation of the metric, or whether it’s similar metrics on other servers.   Anomalous  behavior should trigger human action through a notification.</p>
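<p>As a minimal sketch of what &#8220;distinguishing baseline behavior from anomalies&#8221; can look like, the hypothetical <em>Baseline</em> helper below flags a value that lies more than <em>k</em> standard deviations from the historical mean.  The class name, the choice of population standard deviation, and the threshold <em>k</em> are assumptions for the example, and the history is assumed to be non-empty; a real system would likely want something more robust.</p>
<pre class="brush: java;">

// Hypothetical helper: compares a new reading against its own history.
class Baseline {

	/** Arithmetic mean of the history. */
	static double mean(double[] history) {
		double sum = 0;
		for (double v : history) {
			sum += v;
		}
		return sum / history.length;
	}

	/** Population standard deviation of the history. */
	static double stdDev(double[] history) {
		double m = mean(history);
		double squares = 0;
		for (double v : history) {
			squares += (v - m) * (v - m);
		}
		return Math.sqrt(squares / history.length);
	}

	/** True when the value lies more than k standard deviations from the mean. */
	static boolean isAnomalous(double[] history, double value, double k) {
		return Math.abs(value - mean(history)) > k * stdDev(history);
	}
}
</pre>
<p>With a history of 10, 12, 11, 9, and 8 (mean 10, standard deviation about 1.4), a reading of 20 is flagged at <em>k</em> = 3 while a reading of 11 is not.  Showing the mean and deviation next to the raw number gives the administrator the same context the check uses.</p>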
<p>Notifications also need to be carefully designed (as well as extensible by the administrator).  Carefully distinguish actionable items from non-actionable items, and try to reduce ambiguity as to what action is required.  It’s similar to error logging: if you let standard system events pollute your error logs, the administrator will soon stop paying attention to them.  Although you may not be able to send monitoring information directly back to the development team, you may also want to prepare reports of what’s gone wrong to collect every so often; just make sure that these reports are human-readable so that they can be vetted to make sure they don’t leak any sensitive information.</p>
<h3>Design For Autonomy</h3>
<p>Where possible, design the system to handle error conditions that can be systemically fixed. The best kind of failure is one that requires no admin intervention.  If you can automatically extend your <a href="http://en.wikipedia.org/wiki/Tablespace">tablespace</a> when it runs out of allocated space, do it.  But be sure to give sufficient warning if a long lead-time action is going to be required (like ordering and installing an additional disk).  You won’t be able to figure out everything that can go wrong ahead of time, but you can iterate to drive down the number of events for which human intervention is required.</p>
<p>In future posts, we plan to drill down on each of these challenges and look at what approaches and technologies worked for us.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/10/07/deploying-a-distributed-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
