<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Palantir Technologies &#187; javatech</title>
	<atom:link href="http:///category/javatech/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Wed, 14 Dec 2011 17:48:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A rigorous friction model for human-computer symbiosis</title>
		<link>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/</link>
		<comments>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 03:18:52 +0000</pubDate>
		<dc:creator>Asher Sinensky</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[Human-Computer Symbiosis]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[palantir]]></category>
		<category><![CDATA[problemspace - finance]]></category>
		<category><![CDATA[problemspace-government]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[softwarephilosophy]]></category>
		<category><![CDATA[user interface]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1344</guid>
		<description><![CDATA[This is a response to Ari&#8217;s awesome post on human-computer symbiosis. Ari and I were chatting about the equation he developed and I was wondering if there were some further refinements that are possible&#8230; let&#8217;s take a look: We are attempting to understand the total analytic capability for a given task a of a human-computer [...]]]></description>
			<content:encoded><![CDATA[<div style='text-align: center; float: right; margin-left: 15px; margin-right: 15px'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt="" width="300"/>
</div>
<p>This is a response to <a href="http://blog.palantirtech.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/">Ari&#8217;s awesome post on human-computer symbiosis</a>. Ari and I were chatting about the equation he developed, and I wondered whether further refinements were possible. Let&#8217;s take a look:</p>
<p>We are attempting to understand the total analytic capability <strong><em>a</em></strong> of a human-computer team for a given task. Analytic capability in this case probably means:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq1.png" alt="eq1"/>(1)
</div>
<p>Where <strong><em>A</em></strong> is the answer to the analytic problem in question and <strong><em>t<sub>A</sub></em></strong> is the time needed to arrive at the answer based on the inputs available. In the case of chess, <strong><em>A</em></strong> could be the optimum next move given all previous information and <strong><em>t<sub>A</sub></em></strong> would be how long it takes to decide on this move.</p>
<p>Read on for a look at how this generalizes in human-computer symbiotic systems.<br />
<span id="more-1344"></span></p>
<p>In the case of the human-computer team, we know that <strong><em>a</em></strong> is going to be a function of both the human&#8217;s analytical capability <strong><em>h</em></strong> and the computer&#8217;s analytical capability <strong><em>c</em></strong> (where both <strong><em>h</em></strong> and <strong><em>c</em></strong> have units of answers/time). In the limiting cases we know that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq2.png" alt="eq2"/>(2)
</div>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq3.png" alt="eq3"/>(3)
</div>
<p>Or in plain English, if there is no human present, the total analytic capability is simply the analytic capability of the computer, and if there is no computer present, it is simply that of the human. So the naïve solution would be that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq4.png" alt="eq4"/>(4)
</div>
<p>(4) clearly meets the limiting cases described in (2) and (3). Kasparov noticed a mixing function where the ability of the human and computer to work together becomes the dominant term &mdash; we might call this the mixing capability for the given task or <strong><em>m</em></strong>. Including this phenomenon, the total analytic capability (4) would be re-defined as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq5.png" alt="eq5"/>(5)
</div>
<p>where <strong><em>m</em></strong> has the property that:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq6.png" alt="eq6"/>(6)
</div>
<p>Thus maintaining the limits expressed in (2) and (3) and adhering to the observation that if there is no human or computer component then there will be no mixing advantage. A naïve solution to this constraint would be simple linear mixing:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq7.png" alt="eq7"/>  (7)
</div>
<p>where <strong><em>M</em></strong> (units of time per answer) is the mixing efficiency and will be primarily based on the type of task being solved; some analytical tasks lend themselves to a combined process more than others. For example, multiplying 20-digit numbers does not really benefit from the intuition of a human, so the ability of a human and computer to perform this task together is merely the sum of their individual abilities.</p>
<p>What Kasparov noticed is that the mixing was primarily based on the quality of the process rather than the analytical power of either the human or computer separately. This seems to imply that we must somehow account for the fact that the quality of the human-computer interface is responsible for the quality of the mixing. This can be modeled as a unitless friction of interaction <strong><em>f<sub>i</sub></em></strong> that impedes the ability of the human and computer to work together. </p>
<p>Equation (7) can thus be re-written as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq8.png" alt="eq8"/>(8)
</div>
<p>In this case, the maximum value for the mixing capability is realized when the friction of interaction goes to zero. This mixing capability is the same as the equation Ari developed (less the coefficient which is necessary to maintain consistent units throughout).</p>
<p>We can now re-write our analytic capability in (5) as:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq9.png" alt="eq9"/>(9)
</div>
<p>Below, see a plot of this function over a range of values for <strong><em>h</em></strong>, <strong><em>c</em></strong> and <strong><em>f<sub>i</sub></em></strong>:</p>
<div style='text-align: center; margin: auto; margin-bottom: 1em;'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/graph.png" alt=""/>
</div>
<p>As can clearly be seen from this functional plot (note the vertical scale), the effect of interface friction dominates over the other terms whenever both the human and computer can make important contributions to the task at hand. The conclusion can be drawn that the most effective way to solve analytical problems is to minimize the friction of the human-computer interface; or to put it another way: optimal analytical systems are those that are built specifically to maximize the ability of the human to leverage the ability of the computer.</p>
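<p>To put rough numbers on that claim, here is a minimal Java sketch. It assumes equation (9) has the form a = h + c + M&#183;h&#183;c / f<sub>i</sub>, which is how the prose above describes it; the class name, method name, and sample values are purely illustrative.</p>

```java
/** Illustrative only: evaluates the assumed form of equation (9). */
final class FrictionModel {

    /**
     * Assumed form: a = h + c + M * h * c / f.
     * h and c are in answers per unit time, M is in time per answer, f is unitless.
     */
    static double capability(double h, double c, double mixingEfficiency, double friction) {
        if (friction <= 0) {
            throw new IllegalArgumentException("friction must be positive");
        }
        return h + c + mixingEfficiency * h * c / friction;
    }

    public static void main(String[] args) {
        // Hold h, c and M fixed and lower the interface friction.
        for (double f : new double[] {1.0, 0.1, 0.01}) {
            System.out.printf("f_i = %.2f -> a = %.1f%n", f, capability(2, 3, 1, f));
        }
    }
}
```

<p>With h = 2, c = 3 and M = 1, the additive part stays at 5 while the mixing term grows from 6 to 600 as the friction drops from 1 to 0.01. If either h or c is zero, the mixing term vanishes and the limits in (2) and (3) are recovered.</p>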
<p>I am certain there is room for further refinement, for example:</p>
<div style='text-align: center;margin-bottom: 1em'>
<img style='vertical-align: middle' src="/wp-content/uploads/2010/06/eq10a.png" alt="eq10a"/>(10)
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fun with jMock</title>
		<link>http://blog.palantirtech.com/2009/11/22/fun-with-jmock/</link>
		<comments>http://blog.palantirtech.com/2009/11/22/fun-with-jmock/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 21:15:08 +0000</pubDate>
		<dc:creator>Steve Downing</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[development process]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[software engineering]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[unit testing]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1274</guid>
		<description><![CDATA[Here at Palantir, a lot of our automatic tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much like a production environment. This makes intuitive sense because it’s a faithful approximation of how the system will run in the field. However, there are some disadvantages to [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 175px;'><a href='http://www.jmock.org/'><img src='http://www.jmock.org/logo.png' style='background-color: #000066; padding: 10px'/></a></div>
<p>Here at Palantir, a lot of our automatic tests are full-chain tests. A backend server is fired up, client code runs against it, and everything runs much like a production environment. This makes intuitive sense because it’s a faithful approximation of how the system will run in the field.</p>
<p>However, there are some disadvantages to this:</p>
<ul>
<li>Full-chain tests don’t always localize the problem. Tests on a client class might fail even if it was the service that behaved incorrectly.
</li>
<li>Full-chain tests are relatively slow. Client code is running against an actual remote service. If a client is being tested, the server code still has to do work (sometimes a lot of work) even if that isn’t the focus of the test.</li>
<li>The constraints of the test are loose. Full-chain tests can mostly only see whether the operation finished correctly. It’s much harder to figure out whether the operation was done efficiently and without making unnecessary service calls.</li>
<li>There’s very little setup flexibility. If you want an RPC to return a specific value, you have little choice but to have your test get the service into a state where it can return that value. This is easy in some cases, but prohibitively difficult in others.</li>
<li>Client tests are forced to share any non-determinism leaked from the service. For example, under real conditions, call A might sometimes return before call B and sometimes after it. This can result in flaky tests or tests that don’t always simulate the conditions you want to exercise.</li>
</ul>
<p>What’s to be done? Fortunately, there’s an option that handles these cases elegantly. We also test with <a href="http://www.jmock.org/">jMock</a>, a library that dynamically generates mock objects from arbitrary interfaces. These mock objects can be configured to check that particular methods are called with particular inputs a particular number of times, and then give prescribed responses.</p>
<p>Hit the link to see a concrete example of jMock in action.<br />
<span id="more-1274"></span></p>
<h2>jMock in action</h2>
<p>Let&#8217;s say I want to test my object viewer page in Palantir Web, but I don’t want to fire up a dispatch server at all. First, I create my mock service object.</p>
<pre class="brush: java; title: ; notranslate">
Mockery context = new Mockery();
final PalantirService service = context.mock(PalantirService.class);
</pre>
<p>Then, I set the expectations of my mock object. In this case, I want to tell my mock object to expect a call to PalantirService.getObject() and PalantirService.getDataSources(). getObject() will return a specific object. Any call made to the service apart from these will make the test fail.</p>
<pre class="brush: java; title: ; notranslate">
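context.checking(new Expectations() {{
        // Expect exactly one getObject() call with these arguments, returning myObject.
        oneOf(service).getObject(realm.getId(), myObject.getId());
        will(returnValue(myObject));
        // Expect exactly one getDataSources() call; no return value is stubbed here.
        oneOf(service).getDataSources(myObject.getDataSources());
}});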
</pre>
<p>Now, I create the object I want to test and inject the service.</p>
<pre class="brush: java; title: ; notranslate">
ObjectViewController controller = new ObjectViewController();
controller.setService(service);
</pre>
<p>And then we fire away.</p>
<pre class="brush: java; title: ; notranslate">
ModelMap model = new ModelMap();
controller.doGet(myObject.getId(), model);
</pre>
<p>Now that the controller (the class we’re exercising) has gone off and populated the model, we check to see that the model is populated correctly. Just like we would in any other test.</p>
<pre class="brush: java; title: ; notranslate">
assertEquals(myObject.getName(), model.get(&quot;objectName&quot;));
assertEquals(myObject, model.get(&quot;object&quot;));
</pre>
<p>In addition, we assert that the expectations specified above were satisfied.</p>
<pre class="brush: java; title: ; notranslate">
context.assertIsSatisfied();
</pre>
<p>Not only can we be sure that the right calls were made with the right parameters, but we can also be sure that no calls besides the expected calls were made. So the next time you want more speed or control over your tests, take a look at jMock or another framework like it. It’s a powerful tool in the effort to test your best!</p>
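<p>If you&#8217;re curious how a mock can be generated from an arbitrary interface at all, the sketch below shows the general idea using the JDK&#8217;s own dynamic proxies. This is not jMock&#8217;s implementation and has none of its expectation checking; TinyMockDemo, GreetingService and mock() are hypothetical names made up for illustration.</p>

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hand-rolled recording mock built on java.lang.reflect.Proxy. */
final class TinyMockDemo {

    /** Hypothetical interface standing in for a remote service. */
    interface GreetingService {
        String greet(String name);
    }

    /**
     * Returns an object implementing the given interface that records every call
     * and answers with a canned value. Note that Object methods such as toString()
     * are routed to the same handler, so this is only suitable for a demo.
     */
    static <T> T mock(Class<T> type, List<String> calls, Object cannedResult) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (self, method, args) -> {
                    calls.add(method.getName() + Arrays.toString(args));
                    return cannedResult;
                });
        return type.cast(proxy);
    }

    /** Exercises the mock and returns the recorded calls plus the value it returned. */
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        GreetingService service = mock(GreetingService.class, calls, "hello");
        String result = service.greet("world");
        calls.add("returned " + result);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

<p>A real mocking library layers expectation matching, call counting and failure reporting on top of this kind of interception.</p>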
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/11/22/fun-with-jmock/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Model-View-Adapter</title>
		<link>http://blog.palantirtech.com/2009/04/20/model-view-adapter/</link>
		<comments>http://blog.palantirtech.com/2009/04/20/model-view-adapter/#comments</comments>
		<pubDate>Mon, 20 Apr 2009 20:00:17 +0000</pubDate>
		<dc:creator>Kevin Simler</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[swing]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=210</guid>
		<description><![CDATA[I used to think I understood MVC. In undergraduate CS programs, MVC is taught as an off-the-shelf pattern, explained once and then ready for use in the real world. Wikipedia also makes it seem pretty simple: Model–View–Controller (MVC) is an architectural pattern used in software engineering. Successful use of the pattern isolates business logic from [...]]]></description>
			<content:encoded><![CDATA[<p>I used to think I understood MVC.  In undergraduate CS programs, MVC is taught as an off-the-shelf pattern, explained once and then ready for use in the real world.  Wikipedia also makes it seem pretty simple:</p>
<blockquote><p><a href="http://en.wikipedia.org/wiki/Model-view-controller">Model–View–Controller (MVC)</a> is an architectural pattern used in software engineering. Successful use of the pattern isolates business logic from user interface considerations, resulting in an application where it is easier to modify either the visual appearance of the application or the underlying business rules without affecting the other. In MVC, the model represents the information (the data) of the application; the view corresponds to elements of the user interface such as text, checkbox items, and so forth; and the controller manages the communication of data and the business rules used to manipulate the data to and from the model.</p></blockquote>
<p>They go on to show the classic triangle diagram and how it&#8217;s baked into various GUI and web frameworks.  There&#8217;s only one clause in the entire article that hints at something deeper:  &#8220;Though MVC comes in different flavors…&#8221;</p>
<p>Different flavors indeed.  In fact MVC is not just <em>a</em> pattern but a whole family of patterns:  <a href="http://en.wikipedia.org/wiki/Model-view-controller">MVC</a>, <a href="http://en.wikipedia.org/wiki/Model-view-adapter">MVA</a>, <a href="http://en.wikipedia.org/wiki/Model_View_Presenter">MVP</a>, <a href="http://en.wikipedia.org/wiki/Presentation-abstraction-control">PAC</a>, <a href="http://c2.com/cgi/wiki?ModelDelegate">Model-Delegate</a>&#8230;.  It very quickly gets very hairy.</p>
<p>In this article I want to describe one of MVC&#8217;s lesser-known variants, the <a href="http://en.wikipedia.org/wiki/Model-view-adapter">Model-View-Adapter (MVA) pattern</a>, and talk about its advantages over traditional MVC in the context of a Java Swing application.</p>
<p><span id="more-210"></span></p>
<h2>Architecture</h2>
<p>The best place to start is with an architecture diagram.  While vanilla MVC is a triangle:</p>
<div class="postimg"><a href="/wp-content/uploads/2009/04/mvc.png"><img title="mvc" src="/wp-content/uploads/2009/04/mvc.png" alt="Model-View-Controller" /></a></div>
<p>MVA puts the Adapter in a position to strictly mediate between Model and View:</p>
<div class="postimg"><a href="/wp-content/uploads/2009/04/mva.png"><img title="mva" src="/wp-content/uploads/2009/04/mva.png" alt="Model-View-Adapter" /></a></div>
<p>Here a solid line represents a direct relationship while a dashed line represents an indirect relationship via the Observer pattern.  Put another way, the Adapter holds a pointer both to the Model and to the View and directly calls methods on both.  At the same time, it attaches itself as a listener both to the Model and to the View in order to receive events.  It receives property change events from the Model and action events (checkbox ticked, text entered, etc.) from the View, and then routes appropriate changes to the other side.  The Adapter is entirely responsible for keeping the Model and the View in sync; the Model and View are both relatively dumb structures, knowing nothing about the other.</p>
<p>The advantages to organizing code this way are:</p>
<ul>
<li>All &#8220;moving parts&#8221; are centralized in one place, the Adapter.  No worrying about where to add a listener; no hunting around to find isolated listeners.</li>
<li>Separation of concerns between the View and the Adapter.  The View is responsible for layout and visual presentation while the Adapter is responsible for synchronization and the dynamic aspects of the user interface.</li>
<li>Better decoupling between Models and Views.  Specifically, the View doesn&#8217;t need to know anything about the Model.</li>
</ul>
<p>Additionally, while it will never be possible to fully <a href="http://se.ethz.ch/~meyer/publications/patterns/visitor.pdf">componentize</a> any variant of the MVC pattern, MVA is more amenable to componentization and thus more of its implementation can be centralized (in a single class) and reused.  Once componentized, we can augment the basic functionality with things like:</p>
<ul>
<li>Automatic registration and unregistration of listeners when the View enters and exits the Swing component hierarchy, thereby preventing certain kinds of memory leaks.</li>
<li>Automatic unregistration of listeners when the program shuts down.  This can help free up resources like realtime subscriptions.</li>
<li>Method for swapping a new Model object in for an old Model object.</li>
<li>Ability to execute a task without listeners attached, to help prevent event-action-event loops.</li>
</ul>
<p>The downside to using MVA over MVC is that the Adapter tends to take on a lot of the responsibility and can get quite complicated.  But in my experience that can be mitigated by having good conventions about which pieces (M, V, A) are allowed to communicate with which other pieces and at what times.  Enforcing predictable control flow goes a long way toward managing complexity.</p>
<p>Read on for a code-level description of our implementation of the MVA pattern.</p>
<h2>Palantir MVA Implementation</h2>
<p>Our half-componentization of MVA resides in a single abstract class named Adapter:</p>
<pre class="brush: java; title: ; notranslate">
public abstract class Adapter&lt;ViewType extends Component, ModelType&gt; {
    // constructor
    protected Adapter(ViewType view, ModelType model) { ... }

    /**
     * Attach listeners to the View's subcomponents (checkboxes etc.).
     * Listeners should be stored as member variables in the Adapter
     * subclass.
     */
    protected abstract void registerViewListeners();

    /**
     * Detach the same listeners (member variables) that were
     * attached in registerViewListeners().
     */
    protected abstract void unregisterViewListeners();

    /**
     * Attach listener(s) to the Model.
     */
    protected abstract void registerModelListeners();

    /**
     * Detach the same listeners (member variables) that were
     * attached in registerModelListeners().
     */
    protected abstract void unregisterModelListeners();

    /**
     * Bring the View fully in sync with the Model.  Typically
     * this involves querying state from the Model and
     * reconfiguring subcomponents of the View accordingly.
     */
    protected abstract void fullSynchronize();

    protected ModelType getModel() { ... }
    protected ViewType getView() { ... }

    // other methods elided
}
</pre>
<p>New View components that want to stay synchronized with a Model must instantiate a subclass of Adapter and implement the abstract methods.  The Adapter parent class (itself an example of the Template Method design pattern) will then call into the appropriate abstract methods at the appropriate times.  For example, after the View is constructed, as soon as it&#8217;s displayed in the Swing component hierarchy the Adapter parent class will automatically call fullSynchronize() (whose implementation should bring the View in line with the Model) and then registerViewListeners() and registerModelListeners(), so the Adapter is poised to react to events.  Likewise, when the View is removed from the component hierarchy (when its containing frame is closed, say), both unregisterViewListeners() and unregisterModelListeners() will be called.  This can help ensure that no memory will be leaked when a long-life-cycle object (like a system-wide singleton) retains a pointer to a short-life-cycle object (the View) via the Observer pattern.</p>
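<p>To make that concrete, here is a rough, self-contained sketch of what a subclass might look like. Everything in it is hypothetical: MiniAdapter is a stand-in for our Adapter base class whose attach() and detach() are called by hand (the real class hooks into the Swing component hierarchy, which is not shown here), and FlagModel and FlagAdapter are made-up names. For brevity the demo touches the checkbox from the calling thread; real code should do so on the Swing thread.</p>

```java
import java.awt.Component;
import java.awt.event.ItemListener;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JCheckBox;

/** Illustrative only: a tiny Model-View-Adapter triad around a single checkbox. */
final class MvaSketch {

    /** Stand-in for the Adapter base class; attach()/detach() are template methods. */
    abstract static class MiniAdapter<V extends Component, M> {
        private final V view;
        private final M model;

        protected MiniAdapter(V view, M model) { this.view = view; this.model = model; }

        final void attach() { fullSynchronize(); registerViewListeners(); registerModelListeners(); }
        final void detach() { unregisterViewListeners(); unregisterModelListeners(); }

        protected abstract void registerViewListeners();
        protected abstract void unregisterViewListeners();
        protected abstract void registerModelListeners();
        protected abstract void unregisterModelListeners();
        protected abstract void fullSynchronize();

        protected V getView() { return view; }
        protected M getModel() { return model; }
    }

    /** Hypothetical Model with one boolean property and change events. */
    static final class FlagModel {
        private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
        private boolean enabled;

        boolean isEnabled() { return enabled; }

        void setEnabled(boolean value) {
            boolean old = enabled;
            enabled = value;
            changes.firePropertyChange("enabled", old, value); // no event if unchanged
        }

        void addListener(PropertyChangeListener l) { changes.addPropertyChangeListener(l); }
        void removeListener(PropertyChangeListener l) { changes.removePropertyChangeListener(l); }
    }

    /** Keeps a JCheckBox (the View) and a FlagModel in sync. */
    static final class FlagAdapter extends MiniAdapter<JCheckBox, FlagModel> {
        // Stored as fields so the very same listener instances can be detached later.
        private final ItemListener viewListener =
                e -> getModel().setEnabled(getView().isSelected());
        private final PropertyChangeListener modelListener =
                e -> getView().setSelected(getModel().isEnabled());

        FlagAdapter(JCheckBox view, FlagModel model) { super(view, model); }

        @Override protected void registerViewListeners() { getView().addItemListener(viewListener); }
        @Override protected void unregisterViewListeners() { getView().removeItemListener(viewListener); }
        @Override protected void registerModelListeners() { getModel().addListener(modelListener); }
        @Override protected void unregisterModelListeners() { getModel().removeListener(modelListener); }
        @Override protected void fullSynchronize() { getView().setSelected(getModel().isEnabled()); }
    }

    /** Walks through attach, view-to-model, model-to-view and detach, recording the results. */
    static List<Boolean> demo() {
        JCheckBox box = new JCheckBox("Enabled");
        FlagModel model = new FlagModel();
        FlagAdapter adapter = new FlagAdapter(box, model);
        List<Boolean> seen = new ArrayList<>();

        model.setEnabled(true);
        adapter.attach();                 // fullSynchronize() copies Model state into the View
        seen.add(box.isSelected());
        box.setSelected(false);           // View event is routed to the Model
        seen.add(model.isEnabled());
        model.setEnabled(true);           // Model event is routed to the View
        seen.add(box.isSelected());
        adapter.detach();
        model.setEnabled(false);          // no longer propagated
        seen.add(box.isSelected());
        return seen;
    }
}
```

<p>Neither FlagModel nor JCheckBox knows about the other; all of the wiring lives in FlagAdapter, and both fire events only when the value actually changes, which is what stops the two listeners from ping-ponging here.</p>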
<h2>Dealing With Listener Loops</h2>
<p>One problem that confronts UI developers is the problem of &#8220;listener loops&#8221;:  infinite loops that result when the View fires an event, the Adapter (or Controller) responds to it by setting some property on the Model, and an event is propagated from the Model back to the View, starting the whole cycle over again.</p>
<p>One way to combat this is to make sure your Model only fires events when the value that&#8217;s being set on the Model is different from the value currently stored in the Model.  (This will cut off the infinite loop after one and a half cycles.)  It&#8217;s a good practice but often isn&#8217;t enough, especially when your system is multithreaded and events start to queue up.  You can sometimes get into situations where an M-V-C triplet will thrash forever between two different values for one of the Model&#8217;s properties.</p>
<p>Our solution to this problem is a method on our Adapter base class called runWithoutViewListeners:</p>
<pre class="brush: java; title: ; notranslate">
/**
* Guarantees that the job r will be run:
*    - on the Swing thread
*    - with Model listeners attached
*    - with View listeners DEtached
*/
public final void runWithoutViewListeners(final Runnable r) { ... }
</pre>
<p>The implementation of this method checks to make sure the view listeners are attached when it&#8217;s called, detaches them via a call to unregisterViewListeners(), invokes the Runnable, then reattaches the view listeners via a call to registerViewListeners().  The code inside the Runnable can then make whatever changes it wants to the View without perturbing the Model downstream.  Listener loop averted!</p>
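<p>The body of that method is elided above, but the behavior just described could be sketched along these lines. This is a guess at the shape, not our actual code: ListenerGuardSketch is a hypothetical stand-in for the Adapter, the register and unregister methods only write to a log, and throwing when the listeners are not attached and reattaching them in a finally block are my assumptions.</p>

```java
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.SwingUtilities;

/** Illustrative only: detach view listeners, run a job, then reattach them. */
final class ListenerGuardSketch {
    private final List<String> log = new ArrayList<>();
    private boolean viewListenersAttached;

    // Stand-ins for the Adapter's real register/unregister methods.
    void registerViewListeners() { viewListenersAttached = true; log.add("attach"); }
    void unregisterViewListeners() { viewListenersAttached = false; log.add("detach"); }

    void runWithoutViewListeners(final Runnable r) {
        // Hop onto the Swing thread if we are not already on it.
        if (!SwingUtilities.isEventDispatchThread()) {
            SwingUtilities.invokeLater(() -> runWithoutViewListeners(r));
            return;
        }
        // Assumption: calling this while the listeners are detached is a programming error.
        if (!viewListenersAttached) {
            throw new IllegalStateException("view listeners are not attached");
        }
        unregisterViewListeners();
        try {
            r.run(); // changes made to the View here are not echoed back to the Model
        } finally {
            registerViewListeners(); // reattach even if the job throws
        }
    }

    /** Runs one job on the Swing thread and returns the order in which things happened. */
    static List<String> demo() {
        ListenerGuardSketch guard = new ListenerGuardSketch();
        try {
            SwingUtilities.invokeAndWait(() -> {
                guard.registerViewListeners();
                guard.runWithoutViewListeners(() -> guard.log.add("job"));
            });
        } catch (InterruptedException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
        return guard.log;
    }
}
```

<p>The log comes back as attach, detach, job, attach: the job runs in the window where the View&#8217;s listeners are detached.</p>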
<h2>More To Come</h2>
<p>I hope that&#8217;s given you some sense of the territory out there in the wide world of MVC-variants.  In a week or two, Derek will show off some of the work he&#8217;s done on the M piece of the MVA triad related to &#8220;event bubbling.&#8221;  Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2009/04/20/model-view-adapter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Printing to Plotters in Java</title>
		<link>http://blog.palantirtech.com/2008/08/11/printing-to-plotters-in-java/</link>
		<comments>http://blog.palantirtech.com/2008/08/11/printing-to-plotters-in-java/#comments</comments>
		<pubDate>Tue, 12 Aug 2008 00:00:18 +0000</pubDate>
		<dc:creator>Carl Freeland</dc:creator>
				<category><![CDATA[javatech]]></category>
		<category><![CDATA[tips and tricks]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/?p=104</guid>
		<description><![CDATA[Carl juggles with his creation One of the things our customers love to do is print our beautiful object graphs and tape them to the wall for discussion. What they hate to do is print 30 pages, line them up, and tape them to a poster one at a time. So we bought a plotter, [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; width: 300px; text-align: center; margin-left:10px; margin-bottom: 10px'>
<a href='http://blog.palantirtech.com/wp-content/uploads/2008/08/carljuggles-thumb.jpg' title='carljuggles-thumb.jpg'><img src='http://blog.palantirtech.com/wp-content/uploads/2008/08/carljuggles-thumb.jpg' alt='carljuggles-thumb.jpg' /></a><br/><a href='http://blog.palantirtech.com/wp-content/uploads/2008/08/carljuggles-thumb.jpg' title='carljuggles-thumb.jpg'><em>Carl juggles with his creation</em></a></div>
<p>One of the things our customers love to do is print our <a href="http://blog.palantirtech.com/wp-content/uploads/2008/07/pg-timefilter.png">beautiful object graphs</a> and tape them to the wall for discussion.  What they hate to do is print 30 pages, line them up, and tape them to a poster one at a time.  So we bought a <a href="http://en.wikipedia.org/wiki/Plotter">plotter</a>, and I started plotting.</p>
<p>I needed to print directly to a Java <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/Graphics.html">Graphics</a> object.  Unfortunately, the available information on large output printing from Java is thin at best.  While there are lots of ways to successfully place ink on paper, I was only able to find one that reliably lets the application pick odd paper sizes that plotters use, like 24&#215;19.7 inches. (The term &#8220;plotter&#8221; used to mean something with pens for printing blueprints and such.  Now it just means a large format printer, commonly printers that can use roll paper as a source.)</p>
<p>One of the first things you&#8217;ll learn when you start working with printing in Java is that a language intended to be all things to all people (i.e., cross-platform) is utterly lousy at tasks highly specific to a given environment, such as printing.  It will not surprise you to hear that native print services on Windows are pretty different from those available on a Mac, which themselves are pretty different from the <a href="http://www.cups.org/">CUPS</a> system common to Unix systems.</p>
<p>So, by and large, you are reduced to the least common denominator of printing.  Part and parcel of this least common denominator is agreeing on what constitutes a piece of paper and sticking to it.  This is fine for people thinking, &#8220;My paper is 8.5 inches wide by 11 inches tall.&#8221;  It poses a bit of a problem for people with plotters who are thinking, &#8220;My paper is 24 inches wide by as many damned inches tall as I need.&#8221;  Even relatively powerful programs like Photoshop or <a href="http://www.gimp.org/features/">GIMP</a> don&#8217;t seem to support plotters well.  I believe Photoshop requires you to specify the exact paper size you want to use, but any technique in which the easiest solution for the user is to pull out a calculator does not meet with my approval.<br />
<span id="more-104"></span></p>
<h2>Advice in a nutshell</h2>
<p>A <a href="http://www.apl.jhu.edu/~hall/java/Swing-Tutorial/Swing-Tutorial-Printing.html">tutorial by Marty Hall</a> provides some valuable insights into printing which I applied to my printing component.  While it didn&#8217;t address my specific issues, props to a well-done starting point.</p>
<p>I&#8217;m starting with the assumption that you&#8217;ve got yourself a class that implements the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/print/Printable.html">Printable</a> interface so that once the Java printing subsystem is coerced into accepting the appropriate paper size,<a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/print/Printable.html#print(java.awt.Graphics,%20java.awt.print.PageFormat,%20int)"> Printable.print(Graphics, PageFormat, int)</a> will be called correctly and your job will print properly.  Since my usage was screen-visible as well as printable, I simplified and asserted that one screen pixel is equivalent to one point (1/72 inch), which is how Java prepares the scaling on the Graphics object passed to Printable.print(&#8230;), thus making printouts look about the size of onscreen displays.  Then, knowing that the printout needed, for example, 1200 vertical pixels/points, I can automatically calculate that I want to use a paper size 18&#8243; tall or so (1200 / 72 dpi = 16.67&#8243; + margins).</p>
<p>Here&#8217;s a summary of the critical steps:</p>
<ol>
<li>Do the math yourself to get the paper size required in points (1/72 of an inch).</li>
<li>Implement the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/print/Pageable.html">Pageable</a> interface to report page count (usually one) and the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/print/PageFormat.html">PageFormat</a> using the computed paper size.</li>
<li>Create a <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/print/PrinterJob.html">PrinterJob</a> configured with the correct <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/print/PrintService.html">PrintService</a>.</li>
<li>Set your Pageable on the PrinterJob instance.</li>
<li>Optionally display the print setup dialog (PrinterJob.printDialog()), recognizing that changes to orientation or paper size will be ignored.</li>
<li>Call <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/print/PrinterJob.html#print()">PrinterJob.print()</a>.</li>
</ol>
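<p>The steps above can be sketched roughly as follows. This is a minimal illustration rather than our actual printing component: PlotterPage and its parameters are hypothetical names, the margin is whatever the caller picks, and the PrintService is assumed to have been looked up elsewhere. The optional dialog step is left out.</p>

```java
import java.awt.print.PageFormat;
import java.awt.print.Pageable;
import java.awt.print.Paper;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import javax.print.PrintService;

/** Illustrative only: a one-page Pageable whose paper is sized to fit the content. */
final class PlotterPage implements Pageable {
    private final Printable printable;
    private final PageFormat format;

    /** All sizes are in points (1/72 inch); one screen pixel is treated as one point. */
    PlotterPage(Printable printable, double contentWidth, double contentHeight, double margin) {
        // Step 1: do the math ourselves to get the paper size in points.
        Paper paper = new Paper();
        paper.setSize(contentWidth + 2 * margin, contentHeight + 2 * margin);
        paper.setImageableArea(margin, margin, contentWidth, contentHeight);
        this.format = new PageFormat();
        this.format.setPaper(paper);
        this.printable = printable;
    }

    // Step 2: report a single page and the computed PageFormat.
    @Override public int getNumberOfPages() { return 1; }

    @Override public PageFormat getPageFormat(int pageIndex) {
        checkIndex(pageIndex);
        return format;
    }

    @Override public Printable getPrintable(int pageIndex) {
        checkIndex(pageIndex);
        return printable;
    }

    private static void checkIndex(int pageIndex) {
        if (pageIndex != 0) {
            throw new IndexOutOfBoundsException("no such page: " + pageIndex);
        }
    }

    /** Steps 3, 4 and 6: configure the job, hand it the Pageable, and print. */
    static void print(PrintService service, PlotterPage page) throws PrinterException {
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPrintService(service);
        job.setPageable(page);
        job.print();
    }
}
```

<p>For content 1200 points tall on a 24-inch roll with half-inch (36-point) margins, new PlotterPage(printable, 1656, 1200, 36) yields paper of 1728 by 1272 points, or 24 by about 17.67 inches.</p>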
<h3>Generally, don&#8217;t use the standard dialogs</h3>
<p>This might seem contrary to desired behavior, but I found that giving the user access to the usual native or cross-platform print dialogs can be a bit sticky because those are places where the user can alter the selected paper; in order to properly print to roll paper sources, we must override any paper settings the dialogs might provide. Ultimately this can be confusing for the user, but that&#8217;s an unfortunate side effect of supporting plotters.</p>
<h3>Don&#8217;t take any advice too seriously</h3>
<p>We still choose to display the dialog because complex printers like plotters have many options that the users may need to alter and would otherwise have no way to modify.</p>
<h2>StickFigure: looking at plotting in Palantir</h2>
<p>This work didn&#8217;t happen in a vacuum: we have a need to print large graphs for display.  Being able to print them in a large format is one of the simplest and most beloved methods of collaboration.</p>
<p>To give an example of why plotter printing is interesting, we created a stick figure on the graph in Palantir. After creating this monstrosity, we went to print it and examined the layout.  The first layout was with letter-size paper, and just by looking at the head, we could see that it was going to be a lot of pages:</p>
<p><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/letterzoom.jpg' title='letterzoom-thumb.jpg'><img src='http://blog.palantirtech.com/wp-content/uploads/2008/08/letterzoom-thumb.jpg' alt='letterzoom-thumb.jpg' /></a><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/letterzoom.jpg' title='letterzoom-thumb.jpg'><em>Zoomed-in view of the head of the stick figure. (click for full image)</em><br/></a></p>
<p>Zooming out to view the entire graph, it&#8217;s clear that it&#8217;s going to be 35 pages!</p>
<p><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/letter.jpg' title='letterzoom-thumb.jpg'><img src='http://blog.palantirtech.com/wp-content/uploads/2008/08/letter-thumb.jpg' alt='letter-thumb.jpg' /></a><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/letter.jpg' title='letterzoom-thumb.jpg'><em>Full pagination of the stick figure. (click for full image)</em><br/></a></p>
<p>Even switching to the largest standard paper size only gets us to two sheets:</p>
<p><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/arche.jpg' title='arche.jpg'><img src='http://blog.palantirtech.com/wp-content/uploads/2008/08/arche-thumb.jpg' alt='arche.jpg' /></a><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/arche.jpg' title='arche.jpg'><br />
<em>Paginated out to largest standard paper size, still two pages (click for full image)</em></a></p>
<p>When we switch over to using the plotter layout, we finally get to the one sheet we were looking for:</p>
<p><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/roll.jpg' title='roll-thumb.jpg'><img src='http://blog.palantirtech.com/wp-content/uploads/2008/08/roll-thumb.jpg' alt='roll-thumb.jpg' /></a><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/roll.jpg' title='roll-thumb.jpg'><br />
<em>Paginated onto a plotter roll. (click for full image)</em></a></p>
<p>And finally, Mr. StickFigure comes to life:</p>
<p><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/carljuggles.jpg' title='carljuggles.jpg'><img src='http://blog.palantirtech.com/wp-content/uploads/2008/08/carljuggles-med.jpg' alt='carljuggles-med.jpg' /></a><a style='display: block; text-align: center' href='http://blog.palantirtech.com/wp-content/uploads/2008/08/carljuggles.jpg' title='carljuggles.jpg'><em>Carl juggles with his creation</em></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/08/11/printing-to-plotters-in-java/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>James Gosling comes to visit</title>
		<link>http://blog.palantirtech.com/2008/03/11/gosling-comes-to-visit/</link>
		<comments>http://blog.palantirtech.com/2008/03/11/gosling-comes-to-visit/#comments</comments>
		<pubDate>Tue, 11 Mar 2008 18:21:45 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[javatech]]></category>
		<category><![CDATA[palantir]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2008/03/11/gosling-comes-to-visit/</guid>
		<description><![CDATA[Following the discovery that our offices were the birthplace of Java (or at least the place where it had its childhood), I invited James Gosling to come visit. For those that don&#8217;t know who James Gosling is, he&#8217;s more-or-less the father of Java. Java started as a project of James Gosling&#8217;s in 1991; today, 17 years [...]]]></description>
			<content:encoded><![CDATA[<div style='float: right; margin-left: 20px; margin-bottom: 15px;'><a href='http://blogs.sun.com/jag/' title="james gosling's blog"><img src='http://blogs.sun.com/roller/resources/jag/SouthParkJAG-small.png' alt='james gosling as a south park character' style='border: 0px'/></a></div>
<p>Following <a href="http://blog.palantirtech.com/2008/03/04/birthplace-of-java/">the discovery that our offices were the birthplace of Java</a> (or at least the place where it had its childhood), I invited <a href="http://blogs.sun.com/jag/">James Gosling</a> to come visit.  For those that don&#8217;t know who James Gosling is, he&#8217;s more-or-less the <a href="http://www.alenz.org/mirror/khason/why-microsoft-can-blow-off-with-c.html">father of Java</a>.  <a href="http://java.sun.com/features/1998/05/birthday.html">Java started as a project of James Gosling&#8217;s in 1991</a>; today, 17 years later, <a href="http://www.sun.com/aboutsun/media/ceo/bio.jsp?name=James%20Gosling">he&#8217;s still at Sun, in charge of guiding the Java platform into the future.</a></p>
<p>How does one invite such a luminary to come visit one&#8217;s offices?  One guesses what his email address is and sends him an email out of the blue:</p>
<blockquote><p>
James,</p>
<p>My name is Ari Gordon-Schlosberg, an engineer at Palantir Technologies.  I recently became interested in the <a href="http://blog.palantirtech.com/2008/03/04/birthplace-of-java/">storied history of our current facilities at 100 Hamilton Ave. in Palo Alto.</a>  As Java programmers, our engineering team is really excited to be working in the same place that gave the world Java.</p>
<p>You may not have heard of Palantir, but we&#8217;re <a href="http://blog.palantirtech.com/2007/12/04/what-do-we-do/">working on some pretty interesting problems, using Java to build large-scale analysis applications that really push forward the state-of-the-art</a>. We&#8217;ve won some <a href="http://blog.palantirtech.com/2007/09/11/palantir-screenshots/">accolades for our use of Swing by Romain Guy.</a>  If you felt like dropping by the next time you&#8217;re in the valley, we&#8217;d love to have you come by, see your old digs, and take a peek at what we&#8217;re working on.</p>
<p>Sincerely,</p>
<p>Ari Gordon-Schlosberg
</p></blockquote>
<p>To quote the Microsoft Program Manager&#8217;s book of proverbs: 90% of making things happen is sending email.</p>
<p>So James dropped by one Thursday for demos, lunch, and schmoozing with our engineers.</p>
<p>The first order of business was to demo our software to James.  We got a bunch of the senior engineers together and showed him an abbreviated demo of both <a href="http://palantirtech.com/products.html">Palantir Government and Palantir Finance</a>.  We focused less on the problem-space aspects of the software and more on how we&#8217;re using Java to build the application.  We went over how both of our apps are completely written in Java and that our GUIs are built with custom Swing components.</p>
<p>The most memorable part of the conversation went something like this:</p>
<blockquote><p>
LEAD DEV: So&#8230; what do you think of our applications?</p>
<p>GOSLING: It makes me want to weep.</p>
<p>LEAD DEV: Uh&#8230; ?</p>
<p>GOSLING: Yeah, we&#8217;ve been working on this infrastructure for years to be able to build applications like this and finally someone is doing it.
</p></blockquote>
<div style='float: left; margin-right: 20px; margin-bottom: 15px; width: 100px;'>
<a href='http://www.sun.com/aboutsun/media/ceo/bio.jsp?name=James%20Gosling' title='jag.jpg'><img style='width: 100px; border: 0px;' src='http://blog.palantirtech.com/wp-content/uploads/2008/03/jag.jpg' alt='jag.jpg' /></a>
</div>
<p>The rest of the visit was spent talking about Java, its history and its future.  Topics ranged from why it&#8217;s hard to get dinosaurs like cable companies and mobile carriers to use modern technology to some of the complications in building an optimizing JIT compiler.</p>
<p>After lunch, I walked him to the elevator to see him off.  We said our goodbyes and he stepped into the elevator, which was already occupied by the mailman making his rounds.  As the doors closed, I heard the mailman say to James:</p>
<p>&#8220;Well, I haven&#8217;t seen you around here in a while.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2008/03/11/gosling-comes-to-visit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Best Practices: compareTo consistent with equals</title>
		<link>http://blog.palantirtech.com/2007/09/02/compareto-consistent-with-equals/</link>
		<comments>http://blog.palantirtech.com/2007/09/02/compareto-consistent-with-equals/#comments</comments>
		<pubDate>Mon, 03 Sep 2007 01:48:51 +0000</pubDate>
		<dc:creator>John Carrino</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/07/28/compareto-consistent-with-equals/</guid>
		<description><![CDATA[What is wrong with this class and why? I&#8217;ll tell you beforehand there are two things I am looking for and they are both in the compareTo function. Yes, this came from the Palantir code base and caused me some issues. It has been modified slightly for illustrative purposes. According to the documentation for Comparable: [...]]]></description>
			<content:encoded><![CDATA[<p>What is wrong with this class and why?  I&#8217;ll tell you beforehand there are two things I am looking for and they are both in the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Comparable.html#compareTo(java.lang.Object)">compareTo</a> function.  Yes, this came from the Palantir code base and caused me some issues.  It has been modified slightly for illustrative purposes.</p>
<pre class="brush: java; title: ; notranslate">
class GenericDataSource extends DataSource implements Comparable&lt;DataSource&gt; {
    public int compareTo(DataSource o) {
        if(o != null){
            return getName().compareTo(o.getName());
        }
        return 1;
    }
    @Override
    public boolean equals(Object o) {
        if(o == null) return false;
        if(o instanceof DataSource)
            return this.getLocator().equals(((DataSource)o).getLocator());
        return false;
    }
}
</pre>
<p><span id="more-61"></span></p>
<p>According to the documentation for <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Comparable.html">Comparable</a>:</p>
<blockquote><p> It is strongly recommended, but not strictly required that <code>(x.compareTo(y)==0) == (x.equals(y))</code>. Generally speaking, any class that implements the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Comparable.html">Comparable</a> interface and violates this condition should clearly indicate this fact. The recommended language is &#8220;Note: this class has a natural ordering that is inconsistent with equals.&#8221;</p></blockquote>
<p>It is clear that this class uses two totally different schemes for <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Object.html#equals(java.lang.Object)">equals</a> and <code>compareTo</code>.  One uses the <code>locator</code> field and the other uses the <code>name</code> field.  This means that this class has a <code>compareTo</code> that is inconsistent with <code>equals</code>.</p>
<p>How would a natural ordering that is inconsistent with <code>equals</code> affect me?   You would expect the following expression to be true for collections that don&#8217;t contain null, but this is not the case for classes whose natural ordering is inconsistent with <code>equals</code>.</p>
<blockquote><p><code>new HashSet&lt;Type&gt;(collection).size() == new TreeSet&lt;Type&gt;(collection).size()</code></p></blockquote>
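<p>A small sketch of how the two sizes can diverge. <code>InconsistentSource</code> is a hypothetical stand-in for the class above (it is not the Palantir class), with made-up names and locators: equality goes by locator while ordering goes by name.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in: equals uses the locator, compareTo uses the name. */
class InconsistentSource implements Comparable<InconsistentSource> {
    final String name;
    final String locator;

    InconsistentSource(String name, String locator) {
        this.name = name;
        this.locator = locator;
    }

    public int compareTo(InconsistentSource o) {
        return name.compareTo(o.name);      // ordering by name
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof InconsistentSource
                && locator.equals(((InconsistentSource) o).locator);  // equality by locator
    }

    @Override
    public int hashCode() {
        return locator.hashCode();
    }
}

class SetSizeDemo {
    /** Returns { HashSet size, TreeSet size } for two sources sharing a name. */
    static int[] sizes() {
        List<InconsistentSource> sources = Arrays.asList(
                new InconsistentSource("reports", "db://host-a/reports"),
                new InconsistentSource("reports", "db://host-b/reports"));
        int hashSize = new HashSet<InconsistentSource>(sources).size();  // 2: locators differ
        int treeSize = new TreeSet<InconsistentSource>(sources).size();  // 1: names compare as 0
        return new int[] { hashSize, treeSize };
    }
}
```

<p>The <code>TreeSet</code> relies only on <code>compareTo</code>, so it treats the second source as a duplicate and silently drops it.</p>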
<p>Another issue with this <code>compareTo</code> function has to do with its handling of <code>null</code>.  The <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Comparable.html#compareTo(T)"><code>compareTo</code></a> contract must be symmetric in sign, and calling <code>compareTo</code> on <code>null</code> can only throw, so the javadoc states that <code>x.compareTo(null)</code> must throw an <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/lang/NullPointerException.html">NPE</a> as well, rather than returning 1 as this class does.</p>
<blockquote><p>In the foregoing description, the notation <code>sgn(expression)</code> designates the mathematical <code>signum</code> function, which is defined to return one of -1, 0, or 1 according to whether the value of expression  is negative, zero or positive. The implementor must ensure <code>sgn(x.compareTo(y)) == -sgn(y.compareTo(x))</code> for all <code>x</code> and <code>y</code>. (This implies that <code>x.compareTo(y)</code> must throw an exception if and only if <code>y.compareTo(x)</code> throws an exception.)</p></blockquote>
<p>In short: in general, make sure that <code>equals</code> and <code>compareTo</code> are implemented in a coherent and consistent manner and it will save you some subtle and annoying headaches.</p>
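<p>One way to keep the two methods coherent is to derive both from the same field. The sketch below is an assumption about what a fix could look like, not the change that went into our code base; <code>LocatedSource</code> is a hypothetical class that compares and tests equality by locator only, and lets <code>null</code> fail with an NPE.</p>

```java
/** Hypothetical class whose natural ordering and equals both use the locator. */
final class LocatedSource implements Comparable<LocatedSource> {
    private final String locator;

    LocatedSource(String locator) {
        if (locator == null) {
            throw new NullPointerException("locator");
        }
        this.locator = locator;
    }

    public int compareTo(LocatedSource other) {
        // No null check on purpose: dereferencing a null argument throws
        // NullPointerException, as the Comparable javadoc asks.
        return locator.compareTo(other.locator);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LocatedSource
                && locator.equals(((LocatedSource) o).locator);
    }

    @Override
    public int hashCode() {
        return locator.hashCode();
    }
}
```

<p>With this shape, <code>(x.compareTo(y) == 0) == x.equals(y)</code> holds for every pair of non-null instances, so hash-based and tree-based collections agree on what counts as a duplicate.</p>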
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/09/02/compareto-consistent-with-equals/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SimpleDateFormat is not thread-safe</title>
		<link>http://blog.palantirtech.com/2007/07/11/simpledateformat-is-not-thread-safe/</link>
		<comments>http://blog.palantirtech.com/2007/07/11/simpledateformat-is-not-thread-safe/#comments</comments>
		<pubDate>Wed, 11 Jul 2007 08:58:10 +0000</pubDate>
		<dc:creator>Allen Chang</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/07/11/simpledateformat-is-not-thread-safe/</guid>
		<description><![CDATA[It seems like a relatively common mistake is to assume that the java.text.SimpleDateFormat class is thread-safe (at least for methods such as format(), which you might not expect to mutate state!). This is not true; SimpleDateFormat is not thread-safe, and format() does mutate state. From the javadoc: Date formats are not synchronized. It is recommended [...]]]></description>
			<content:encoded><![CDATA[<p>It seems like a relatively common mistake is to assume that the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html">java.text.SimpleDateFormat</a> class is thread-safe (at least for methods such as <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html#format(java.util.Date,%20java.lang.StringBuffer,%20java.text.FieldPosition)">format()</a>, which you might not expect to mutate state!). This is not true; SimpleDateFormat is <em>not</em> thread-safe, and <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html#format(java.util.Date,%20java.lang.StringBuffer,%20java.text.FieldPosition)">format()</a> does mutate state. From the javadoc:</p>
<blockquote><p>Date formats are not synchronized.  It is recommended to create separate format instances for each thread.  If multiple threads access a format concurrently, it must be synchronized  externally.</p></blockquote>
<p>Suggestions on handling synchronization:</p>
<ul>
<li>Use the <a href="http://joda-time.sourceforge.net/">Joda Time libraries</a> if possible. They are thread safe. Joda Time is <a href='http://joda-time.sourceforge.net/userguide.html#JDK_Interoperability'>fairly easy to port to from existing JDK compatible code</a>.</li>
<li>Synchronize calls to <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/text/SimpleDateFormat.html#format(java.util.Date,%20java.lang.StringBuffer,%20java.text.FieldPosition)">format()</a> manually.  You probably want to write a wrapper class/function to do this, and make sure nobody calls the original format() method by accident. This seems like it should be safe, but see <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4228335">http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4228335</a>.</li>
<li>Use <a href="http://en.wikipedia.org/wiki/Thread-local_storage">thread-local storage</a>, e.g.</li>
</ul>
<pre class="brush: java; title: ; notranslate">
private static final ThreadLocal&lt;SimpleDateFormat&gt; shortTimeFormat =
    new ThreadLocal&lt;SimpleDateFormat&gt;() {
        @Override
        protected SimpleDateFormat initialValue() {
            return new SimpleDateFormat(&quot;HH:mm&quot;);
        }
    };
</pre>
<p><strong>N.B.</strong> In general, you should be very careful when using <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/ThreadLocal.html">ThreadLocal</a> storage, especially when using thread pools (your data won&#8217;t be automatically garbage collected and may be visible to other Threads&#8212;<a href="http://www.devwebsphere.com/devwebsphere/2005/06/dont_use_thread.html">a security risk</a>) or <a href="http://crazybob.org/2006/02/threadlocal-memory-leak.html">when storing references to Thread objects in ThreadLocal storage </a>(which might confuse the GC). However, in this case, we should be okay.</p>
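<p>For completeness, here is a sketch of how callers might use such a thread-local format through a small wrapper, so that nobody touches a shared <code>SimpleDateFormat</code> directly. The class name <code>ShortTimeFormatter</code> is hypothetical, and pinning the time zone to UTC and the locale to <code>Locale.ROOT</code> are assumptions made only to keep the output predictable.</p>

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical wrapper: each thread lazily gets its own SimpleDateFormat. */
class ShortTimeFormatter {
    private static final ThreadLocal<SimpleDateFormat> SHORT_TIME_FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                SimpleDateFormat format = new SimpleDateFormat("HH:mm", Locale.ROOT);
                // Assumption for this sketch: format in UTC rather than the default zone.
                format.setTimeZone(TimeZone.getTimeZone("UTC"));
                return format;
            }
        };

    /** Safe to call from any thread; no instance is ever shared between threads. */
    static String format(Date date) {
        return SHORT_TIME_FORMAT.get().format(date);
    }
}
```

<p>Callers simply write <code>ShortTimeFormatter.format(date)</code>; the cost is one <code>SimpleDateFormat</code> instance per thread that ever formats a date.</p>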
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/07/11/simpledateformat-is-not-thread-safe/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>XML Pull Parsing and Enums: like chocolate and peanut butter</title>
		<link>http://blog.palantirtech.com/2007/05/31/life-on-earth/</link>
		<comments>http://blog.palantirtech.com/2007/05/31/life-on-earth/#comments</comments>
		<pubDate>Thu, 31 May 2007 09:59:18 +0000</pubDate>
		<dc:creator>Ari Gesher</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/05/31/life-on-earth/</guid>
		<description><![CDATA[There comes a time in every developer&#8217;s life when they need to write code that processes some XML. Lately, we&#8217;ve seen the proliferation of APIs that make XML processing easier, like JAXB (Java API for XML Binding). However, when speed and scale are required, chances are you&#8217;re going to need to roll your own [...]]]></description>
			<content:encoded><![CDATA[<p style="float: right; margin-left: 15px"> <img src="http://blog.palantirtech.com/wp-content/uploads/2007/03/enum-screenshot.png" alt="Enumeration Screenshot" /></p>
<p>There comes a time in every developer&#8217;s life when they need to write code that processes some XML.  Lately, we&#8217;ve seen the proliferation of APIs that make XML processing easier, like <a href="http://java.sun.com/webservices/jaxb/index.jsp">JAXB (Java API for XML Binding)</a>.  However, when speed and scale are required, chances are you&#8217;re going to need to roll your own processor.  Before I continue, let me clear up some terminology: when I say <em>&#8220;processor&#8221;</em>, I mean the code of yours that&#8217;s wrapped around a <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/SAXParser.html">SAX</a> <a href="http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPSAX.html#wp69937">(tutorial)</a>, <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilder.html">DOM</a> <a href="http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPDOM.html#wp79994">(tutorial)</a>, or an <a href="http://xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html">XPP</a> <a href="http://www.xmlpull.org/v1/download/unpacked/doc/quick_intro.html">(tutorial)</a> parser, not the guts of the parser itself.</p>
<p>At the end of the day, that&#8217;s the interesting part of what you&#8217;re doing &#8211; the grammar of your data model rather than the minutiae of start and end tags.  The processor is the interface between the data interchange format and the internal data model of your application.</p>
<p>Click through for a tour of XML parsers and a look at a novel technique for encoding processors that use pull parsers (as usual, we&#8217;ve included a <a href="http://blog.palantirtech.com/wp-content/uploads/2007/05/launch_lifeonearth.jnlp" title="launch_lifeonearth.jnlp">WebStart demo</a>, as well as a <a href="http://blog.palantirtech.com/wp-content/uploads/2007/05/lifeonearth.jar" title="lifeonearth.jar">jar file containing the compiled example along with all of its source code</a>).<br />
<span id="more-53"></span></p>
<h2>XML Parsers</h2>
<p>Each of the three most common types of XML parsers has its own strengths and weaknesses.  DOM, in particular, does not scale well.  The DOM technique of reading the entire document into memory and creating first-class language objects for every explicit and implied element makes it slow and resource intensive.  Allegedly, this is balanced by the ease of programming against the <a href="http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/package-summary.html">DOM API</a>, although I personally find the DOM API to be a bit cumbersome.  The upshot is that DOM is not well suited for large documents (because of its memory requirements) or for applications that need low latency and high overall throughput (since a) you don&#8217;t see the DOM until the whole thing has been built and b) objects are expensive to construct).</p>
<p>SAX Parsing scales very well, but it uses a <a href="http://en.wikipedia.org/wiki/Callback_(computer_science)">callback pattern</a> which can be awkward to code against for large grammars. This problem has been addressed in various ways.  Gianluigi Colaiacomo of IBM wrote this article detailing a method that ends up looking a lot like pull parsing for SAX: <a href="http://www-128.ibm.com/developerworks/xml/library/x-dochan.html">Simplify document handler programs with the SAX parser</a>.</p>
<p>XML Pull Parsing <a href="http://www.sujal.net/tech/briefs/XPP-TB.html">(excellent tech brief on XPP here)</a> is optimized for parsing tasks that require all elements in a document to be processed, and it leads to an event-stream style of programming that is much more intuitive and easier to follow than SAX.  In the past few years, the work on pull parsing has come together into a standard called <a href="http://www.xmlpull.org/history/index.html">StAX</a>, or <em><strong>St</strong>reaming <strong>A</strong>PI for <strong>X</strong>ML</em>.</p>
<p>StAX has more or less replaced XmlPull as the standard for XML streaming, but the XPP parsers still exist and are very, very fast. XPP became our parser of choice when looking at things like real-time serialization of data and high-speed import of object graphs from XML for our integration interfaces.</p>
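<p>To give a feel for the event-stream style, here is a minimal pull-parsing loop. It uses the StAX API (<code>javax.xml.stream</code>) rather than XPP, purely so the sketch needs nothing outside the JDK; the class name <code>PullParsingSketch</code> and the tiny inline document used to exercise it are assumptions for illustration.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParsingSketch {
    /** Collects the "name" attribute of every species element, in document order. */
    static List<String> speciesNames(String xml) {
        List<String> names = new ArrayList<String>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // The processor pulls events when it is ready for them,
            // instead of being called back as in SAX.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "species".equals(reader.getLocalName())) {
                    names.add(reader.getAttributeValue(null, "name"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }
}
```

<p>Because the loop is ordinary straight-line code, the state of the parse lives in local variables and the call stack rather than in fields shared between callbacks.</p>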
<p style="float: right"> <a href="http://en.wikipedia.org/wiki/Scientific_classification"><br />
<img src="http://upload.wikimedia.org/wikipedia/commons/thumb/5/5f/Biological_classification_L_Pengo.svg/150px-Biological_classification_L_Pengo.svg.png" style="border: 0pt none ; margin-left: 15px; margin-right: 15px" alt="Taxonomy" /><br />
</a></p>
<h2>Life On Earth</h2>
<p>First, let&#8217;s set the stage: rather than going into the detail and complexity of the Palantir XML formats, I&#8217;ve designed a somewhat simpler example to make this all easier to digest.  Note that the <a href="http://blog.palantirtech.com/wp-content/uploads/2007/05/lifeonearth.xsd" title="LifeOnEarth Schema definition">LifeOnEarth schema document</a>, source code, and  <a href="http://blog.palantirtech.com/wp-content/uploads/2007/05/species-0.xml" title="LifeOnEarth example instance document">example LifeOnEarth instance document</a> are included in the JAR file.</p>
<p>In this example, what will we be encoding? Organisms!  That&#8217;s right, <a href="http://en.wikipedia.org/wiki/Taxon">biological taxonomy</a>. The biological taxonomy is the set of official Latin names for all the living things in the world.  It&#8217;s basically a straight hierarchy, with one notable exception: <em>Phylum</em> vs. <em>Division</em>.  Plant biologists use the term <em>division</em> to describe the same broad morphological grouping that their zoological counterparts call a <em>phylum</em>.</p>
<p>This is great for our example: after all, who wants to have a straight hierarchy?  We need something a little out of the ordinary to make things interesting.</p>
<h2>The XML schema: <em>lifeOnEarth</em></h2>
<p>The idea here is that we&#8217;ll build an <a href="http://www.w3.org/XML/Schema">XML Schema</a> document describing an instance document format for encoding information about the biological taxonomy.  We make extensive use of <a href="http://www.w3.org/TR/2005/WD-xmlschema11-1-20050224/structures.html#element-complexType">complexType</a> elements in the schema to afford us a sort of code reuse. The <em>Class-&gt;Order-&gt;Family-&gt;Genus-&gt;Species</em> hierarchy is logically repeated underneath both <em>Phylum</em> and <em>Division</em> in the taxonomy model.  So as to be as <a href="http://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY</a> as possible, we use named complex type declarations instead of defining complex types in-line when declaring the elements. It&#8217;s a lot like using declared types instead of <a href="http://www.javaworld.com/javaworld/javaqa/2000-03/02-qa-innerclass.html">anonymous inner classes</a> in Java.</p>
<p>Using the complexType elements, we define the names, attributes, and allowed members for every step of the hierarchy. We then define the document itself by declaring a single top level element, <em>lifeOnEarth</em>:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsd:element name=&quot;lifeOnEarth&quot; type=&quot;tns:lifeOnEarth&quot;&gt;&lt;/xsd:element&gt;
</pre>
<p>You can see that the <strong>element</strong> <em>lifeOnEarth</em> is of <strong>complexType</strong> <em>lifeOnEarth</em>; the same name in different namespaces. The type <em>lifeOnEarth</em> is defined thusly:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;xsd:complexType name=&quot;lifeOnEarth&quot;&gt;
    &lt;xsd:sequence&gt;
        &lt;xsd:element name=&quot;domain&quot; type=&quot;tns:domain&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
    &lt;/xsd:sequence&gt;
&lt;/xsd:complexType&gt;
</pre>
<p>An element of type <em>lifeOnEarth</em> contains one or more elements (<em><a href="http://www.w3.org/TR/2005/WD-xmlschema11-1-20050224/structures.html#element-sequence">sequence</a></em>, in XSD parlance) of type <em>domain</em>.  An element of type <em>domain</em> contains a sequence of elements of type <em>kingdom</em>.  So on and so forth, all the way down to <em>species</em> (with the double hierarchy for the <em>phylum</em>/<em>division</em> duality).</p>
<p>Here&#8217;s the schema document:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;

&lt;xsd:schema xmlns:xsd=&quot;http://www.w3.org/2001/XMLSchema&quot;
            targetNamespace=&quot;http://www.palantirtech.com/schema/examples/lifeOnEarth&quot;
            xmlns:tns=&quot;http://www.palantirtech.com/schema/examples/lifeOnEarth&quot;
            elementFormDefault=&quot;qualified&quot;&gt;
    &lt;xsd:annotation&gt;
        &lt;xsd:documentation&gt;This schema is based on alpha taxonomy, the science of describing, categorizing and naming organisms.  For a good overview, see http://en.wikipedia.org/wiki/Taxon&lt;/xsd:documentation&gt;
    &lt;/xsd:annotation&gt;
    &lt;xsd:complexType name=&quot;lifeOnEarth&quot;&gt;
        &lt;xsd:sequence&gt;
            &lt;xsd:element name=&quot;domain&quot; type=&quot;tns:domain&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;species&quot;&gt;
        &lt;xsd:attribute name=&quot;commonName&quot; type=&quot;xsd:string&quot;/&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;genus&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;species&quot; type=&quot;tns:species&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;family&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;genus&quot; type=&quot;tns:genus&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;order&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;family&quot; type=&quot;tns:family&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;class&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;order&quot; maxOccurs=&quot;unbounded&quot; type=&quot;tns:order&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;phylum&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;class&quot; type=&quot;tns:class&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;division&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;class&quot; type=&quot;tns:class&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;kingdom&quot;&gt;
        &lt;xsd:choice maxOccurs=&quot;unbounded&quot; minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;phylum&quot; type=&quot;tns:phylum&quot;&gt;&lt;/xsd:element&gt;
            &lt;xsd:element name=&quot;division&quot; type=&quot;tns:division&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:choice&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:complexType name=&quot;domain&quot;&gt;
        &lt;xsd:sequence minOccurs=&quot;0&quot;&gt;
            &lt;xsd:element name=&quot;kingdom&quot; type=&quot;tns:kingdom&quot; maxOccurs=&quot;unbounded&quot;&gt;&lt;/xsd:element&gt;
        &lt;/xsd:sequence&gt;
        &lt;xsd:attribute name=&quot;name&quot; type=&quot;xsd:string&quot;/&gt;
    &lt;/xsd:complexType&gt;
    &lt;xsd:element name=&quot;lifeOnEarth&quot; type=&quot;tns:lifeOnEarth&quot;&gt;&lt;/xsd:element&gt;
&lt;/xsd:schema&gt;
</pre>
<p>The upside of making the schema document comes from being able to leverage existing validation mechanisms.  Using a validator, any document can be verified to be a valid instance of the schema.  <strong>This allows the processor to be able to assume valid documents</strong>, greatly reducing the error handling burden on the developer when writing the processor.  Additionally, if you&#8217;re writing both sides of the XML transaction, you can validate the output of the renderer to perform early error detection. And finally, the validator is the perfect thing to use in <a href="http://en.wikipedia.org/wiki/Unit_testing">unit testing</a> related to your XML processing.  Sun included a <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/package-summary.html#example-1">simple validation example</a> in the Javadocs for the <a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/validation/package-summary.html">javax.xml.validation</a> package.  For a more in-depth look at using validation, see <a href="http://blog.palantirtech.com/wp-content/uploads/2007/05/lifeonearthschema.java" title="LifeOnEarthSchema.java">LifeOnEarthSchema.java</a>, part of this example that shows how to create a shared schema object (using the <a href="http://en.wikipedia.org/wiki/Singleton_pattern">singleton pattern</a>) from an XSD file loaded from the classpath.</p>
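<p>As a rough illustration of the validation step, the sketch below checks a document against a schema with <code>javax.xml.validation</code>. It is not LifeOnEarthSchema.java: the class name <code>ValidationSketch</code> is hypothetical, and the inline schema is a deliberately tiny stand-in (a <em>genus</em> containing <em>species</em>, with no target namespace) so that the example is self-contained.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidationSketch {
    // Tiny stand-in schema for this sketch only; the real one is the XSD shown above.
    private static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"genus\">"
        + "<xsd:complexType>"
        + "<xsd:sequence minOccurs=\"0\">"
        + "<xsd:element name=\"species\" maxOccurs=\"unbounded\">"
        + "<xsd:complexType><xsd:attribute name=\"name\" type=\"xsd:string\"/></xsd:complexType>"
        + "</xsd:element>"
        + "</xsd:sequence>"
        + "<xsd:attribute name=\"name\" type=\"xsd:string\"/>"
        + "</xsd:complexType>"
        + "</xsd:element>"
        + "</xsd:schema>";

    private static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema itself is broken", e);
        }
    }

    /** Returns true if the document is well-formed and valid against the schema. */
    static boolean isValid(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // malformed or invalid document
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O failure on an in-memory document", e);
        }
    }
}
```

<p>In a unit test, a call such as <code>isValid(renderedOutput)</code> is enough to catch a renderer that drifts away from the schema; real code would load the schema once and reuse it instead of rebuilding it on every call.</p>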
<h2>Encoding the XML schema&#8217;s FSA (peanut butter)</h2>
<p class="postimg"> <img src="http://blog.palantirtech.com/wp-content/uploads/2007/03/lifeonearth-fsa.png" alt="Life On Earth XML Parsing Finite State Automata" /></p>
<p>So it turns out that processing XML, just like any parsing task, can be modeled with a <a href="http://en.wikipedia.org/wiki/Finite_state_machine">finite state automaton</a>.  A finite state automaton (FSA) can be encoded as a list of valid states and the valid transitions between them.  Since XML is nicely hierarchical, encoding the information is fairly straightforward: a tag name maps to a state in the FSA, and its valid transitions are to any of its parents or any of its children.</p>
<p>With this in mind, I sat down to design our XML formats for integration, realizing that I needed a way to represent all the states the parser could be in as it parsed the document.  That context would drive how the parser events were handled and the routing of parsed data to the proper places.</p>
<p>So I started thinking: &#8220;I&#8217;ll bet I can do this really cleanly with <a href="http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html">enums</a>.&#8221; Here&#8217;s what I came up with:</p>
<p>We encode all the states of the parser into a Java <a href="http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html">Enum</a>.  The Enum constructor takes two arguments: one or more ParserStates that are valid parent states, and the name of the XML tag as it appears in the document. Since most states have only one parent, we overload the constructor for convenience and readability to take a single state instead of an array.</p>
<p>After the constructors, we define an instance method that checks whether a passed state is a valid transition from this state, plus some static members and initialization that let us map strings passed by the XML parser to the Enum objects.</p>
<p>Here&#8217;s the Enum&#8217;s definition:</p>
<pre class="brush: java; title: ; notranslate">
	private enum ParserState {

		GROUND(new ParserState[]{},&quot;&quot;),
		LIFE_ON_EARTH(GROUND,&quot;lifeOnEarth&quot;),
		DOMAIN(LIFE_ON_EARTH,&quot;domain&quot;),
		KINGDOM(DOMAIN,&quot;kingdom&quot;),
		PHYLUM(KINGDOM,&quot;phylum&quot;),
		DIVISION(KINGDOM,&quot;division&quot;),
		CLASS(new ParserState[]{DIVISION,PHYLUM},&quot;class&quot;),
		ORDER(CLASS,&quot;order&quot;),
		FAMILY(ORDER,&quot;family&quot;),
		GENUS(FAMILY,&quot;genus&quot;),
		SPECIES(GENUS,&quot;species&quot;);

		/**
		 * Array of parent states to this one.
		 */
		ParserState parents[] = new ParserState[]{};
		/**
		 * Tag name for this state.
		 */
		String tagName = null;

		/**
		 * Constructor for ParserState with a single parent state.
		 * @param parent
		 * @param tagName
		 */
		ParserState(ParserState parent,String tagName){
			this.parents = new ParserState[]{parent};
			this.tagName = tagName;
		}

		/**
		 * Constructor for ParserState with a multiple parent states.
		 * @param parents
		 * @param tagName
		 */
		ParserState(ParserState[] parents,String tagName){
			this.parents = parents;
			this.tagName = tagName;
		}

		/**
		 * Checks whether it is valid to transition to the
		 * specified state from this state.
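		 * @param toState the state the parser wants to move to
		 * @return true if toState is this state, one of its children, or one of its parents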
		 */
		public boolean checkValid(ParserState toState){
			if(this.equals(toState))
				return true;

			for(int i = 0; i &lt; toState.parents.length ; i++){
				if(toState.parents[i].equals(this)){
					return true;
				}
			}
			for(int i = 0; i &lt; this.parents.length; i++){
				if(this.parents[i].equals(toState)){
					return true;
				}
			}
			return false;
		}

		public String getTagName() {
			return tagName;
		}

		// End enum instance methods and variables

		// Static methods, variables, and initializations
		static Map&lt;String,ParserState&gt; tagLookup = new HashMap&lt;String,ParserState&gt;();

		/*
		 * This code executes after the enums have been constructed.
		 *
		 * Because of the order of execution when initializing an enum,
		 * the constructors can't use the static lookup map: the enum
		 * constants are constructed before static initialization runs.
		 *
		 * Instead, we use a static initializer to populate the lookup
		 * hashmap after all the enums are constructed.
		 */
		static {
			for(ParserState state : ParserState.values()){
				registerState(state);
			}
		}

		/**
		 * Maps a tag name to a ParserState
		 * @param tagName
		 * @return the ParserState for that tag.
		 */
		public static ParserState lookupStateForTag(String tagName){
			return tagLookup.get(tagName);
		}

		private static void registerState(ParserState state){
			tagLookup.put(state.tagName, state);
		}

	}
</pre>
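<p>To see the transition rule in isolation, here is a cut-down sketch of the same idea. The enum name <code>TaxonState</code>, the four states, and the map name <code>TAGS</code> are assumptions chosen to keep the example short; the logic mirrors <code>checkValid()</code> and the static lookup above.</p>

```java
import java.util.HashMap;
import java.util.Map;

enum TaxonState {
    GROUND(new TaxonState[]{}, ""),
    DOMAIN(new TaxonState[]{GROUND}, "domain"),
    KINGDOM(new TaxonState[]{DOMAIN}, "kingdom"),
    PHYLUM(new TaxonState[]{KINGDOM}, "phylum");

    final TaxonState[] parents;
    final String tagName;

    TaxonState(TaxonState[] parents, String tagName) {
        this.parents = parents;
        this.tagName = tagName;
    }

    /** A move is valid if it stays put, descends to a child, or returns to a parent. */
    boolean checkValid(TaxonState toState) {
        if (this == toState) {
            return true;
        }
        for (TaxonState parent : toState.parents) {
            if (parent == this) {
                return true; // toState is a child of this state
            }
        }
        for (TaxonState parent : this.parents) {
            if (parent == toState) {
                return true; // toState is a parent of this state
            }
        }
        return false;
    }

    // Filled in by the static initializer, which runs after all constants exist.
    static final Map<String, TaxonState> TAGS = new HashMap<String, TaxonState>();

    static {
        for (TaxonState state : values()) {
            TAGS.put(state.tagName, state);
        }
    }
}
```

<p>With these states, DOMAIN to KINGDOM and KINGDOM back to DOMAIN are both accepted, while DOMAIN to PHYLUM is rejected because it skips a level.</p>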
<h2>XML Pull Parsing (chocolate)</h2>
<p>The idea with XML Pull Parsing is that the document can be represented as a stream of events describing the document content.  You <em>pull</em> the events off the stream as you process them (hence the name). These events come in five flavors: <a href="http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html#START_DOCUMENT">START_DOCUMENT</a>, <a href="http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html#END_DOCUMENT">END_DOCUMENT</a>, <a href="http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html#START_TAG">START_TAG</a>, <a href="http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html#END_TAG">END_TAG</a>, <a href="http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html#TEXT">TEXT</a>.</p>
<p>The event names are pretty easy to understand intuitively, except perhaps for TEXT, which is the event for free text inside of an element.</p>
<p>An example document like this:</p>
<pre class="brush: xml; title: ; notranslate">

&lt;foo&gt; bar &lt;baz&gt;jones&lt;/baz&gt; bleargh&lt;/foo&gt;
</pre>
<p>&#8230; would produce events in the following order: START_DOCUMENT, START_TAG, TEXT, START_TAG, TEXT, END_TAG, TEXT, END_TAG, END_DOCUMENT.</p>
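<p>If you want to watch that event stream without adding the XmlPull jar, the sketch below does the same walk with StAX (<code>javax.xml.stream</code>), a different pull API that ships with the JDK. This is a swapped-in technique, not the XmlPullParser used in this post: StAX calls the events START_ELEMENT, END_ELEMENT and CHARACTERS instead of START_TAG, END_TAG and TEXT. The class and method names are made up for the example.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullEventsSketch {

    /** Pulls every event off the stream and records its name, in order. */
    static List<String> events(String xml) {
        List<String> seen = new ArrayList<String>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask for adjacent text to be reported as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

            int event = reader.getEventType(); // the reader starts on START_DOCUMENT
            while (true) {
                seen.add(name(event));
                if (event == XMLStreamConstants.END_DOCUMENT) {
                    break;
                }
                event = reader.next();
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    static String name(int event) {
        switch (event) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START_ELEMENT";
            case XMLStreamConstants.END_ELEMENT:    return "END_ELEMENT";
            case XMLStreamConstants.CHARACTERS:     return "CHARACTERS";
            default:                                return "OTHER(" + event + ")";
        }
    }
}
```

<p>Feeding it the <code>foo</code>/<code>baz</code> document above yields the same nine-event shape listed in the text, just with StAX's names.</p>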
<p>Pretty much every XML pull parser-based processor has a loop at its core that looks like this:</p>
<pre class="brush: java; title: ; notranslate">
	protected void processDocument() throws XmlPullParserException, IOException, PalantirException {

		// pull first event
		int eventType = parser.getEventType();

		do { // core loop
			switch(eventType){

			case XmlPullParser.START_DOCUMENT:
				log.debug(&quot;Start document&quot;);
				break;

			case XmlPullParser.START_TAG:
				processStartElement();
				break;

			case XmlPullParser.END_TAG:
				processEndElement();
				break;

			case XmlPullParser.TEXT:
				processText();
				break;

			// never called, here for completeness
			case XmlPullParser.END_DOCUMENT:
				log.debug(&quot;End document&quot;);
				break;

			}

			eventType = parser.next();

		} while (eventType != XmlPullParser.END_DOCUMENT);
	}
</pre>
<p>So the next logical thing to take a look at is the <strong>processStartElement()</strong> method. It turns out that here&#8217;s where the real peanut-butter &amp; chocolate synergy comes into play.  In this case, we call <a href="http://www.xmlpull.org/v1/doc/api/org/xmlpull/v1/XmlPullParser.html#getName()">getName()</a> on the parser object and it will tell us the name of the tag just started.</p>
<p>We want to:</p>
<ul>
<li>Check that it&#8217;s a valid state transition.</li>
<li>Change the state of the parser.</li>
<li>Dispatch to the appropriate method to deal with this kind of tag.</li>
</ul>
<p>Here&#8217;s the method (note that stateStack is of type <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/Stack.html">Stack&lt;ParserState&gt;</a>):</p>
<pre class="brush: java; title: ; notranslate">
	public void processStartElement() throws PalantirException, XmlPullParserException {

		String tagName = parser.getName();

		// check for state transition
		ParserState newState = ParserState.lookupStateForTag(tagName);
		if(newState != null){
			// we know it's a valid tag
			if(parserState.checkValid(newState)){
				// change FSA state
				stateStack.push(parserState);
				parserState = newState;
			}
			else{
				// invalid transition
				String message = &quot;Got illegal state transition in parser, suspect malformed XML.\n&quot; +
				&quot;At line &quot; + parser.getLineNumber() + &quot;, col &quot; + parser.getColumnNumber() + &quot;.\n&quot; +
				&quot;Invalid state transition: &quot; + parserState.toString() + &quot; -&gt; &quot; + newState.toString();
				log.error(message);
				throw new PalantirException(message);
			}
		}
		else{
			// unknown tag: ignore it, and skip dispatch so the
			// enclosing tag's handler isn't re-run for it
			return;
		}

		// dispatch to handling method
		dispatch();
	}
</pre>
<p>You can see how we leverage the encoding in the ParserState Enum to make this a very simple method.  Since we&#8217;re encoding tag-to-state mappings, we can easily look up the appropriate state for this tag (go back and look at the static initializer if I lost you there).  Once we know the proposed new state, we can check whether or not the state transition is a valid one, throwing an exception on an invalid transition (which implies a malformed document).  Finally, we call <strong>dispatch()</strong>, which is nothing but a big switch statement using all the possible states from the Enum:</p>
<pre class="brush: java; title: ; notranslate">
	private void dispatch() throws XmlPullParserException, PalantirException {

		// logging statements removed for readability

		switch (parserState) {
		case DOMAIN:
			processDomain();
			break;
		case KINGDOM:
			processKingdom();
			break;
		case PHYLUM:
			processPhylum();
			break;
		case DIVISION:
			processDivision();
			break;
		case CLASS:
			processClass();
			break;
		case FAMILY:
			processFamily();
			break;
		case ORDER:
			processOrder();
			break;
		case GENUS:
			processGenus();
			break;
		case SPECIES:
			processSpecies();
			break;
		default:
			break;
		}
	}
</pre>
<p>The methods that process TEXT and END_TAG events are much simpler but also leverage the dispatch() method:</p>
<pre class="brush: java; title: ; notranslate">
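	public void processEndElement() throws XmlPullParserException, PalantirException {
		// unknown tags never pushed a state in processStartElement(),
		// so there is nothing to dispatch or pop for them
		if(ParserState.lookupStateForTag(parser.getName()) == null){
			return;
		}
		dispatch();
		// already know to be valid, since transition
		// validity is bi-directional
		parserState = stateStack.pop();
	}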

	public void processText() throws XmlPullParserException, PalantirException {
		dispatch();
	}
</pre>
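<p>The push-on-start, pop-on-end bookkeeping is easy to lose among the parser calls, so here it is on its own as a small sketch. States are plain strings, and the class name <code>StateStackSketch</code> and its methods are hypothetical, chosen only to keep the example self-contained.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StateStackSketch {

    // States we will return to, innermost on top.
    private final Deque<String> stack = new ArrayDeque<String>();
    private String current = "GROUND";

    /** START_TAG: remember where we were, then enter the new state. */
    void start(String newState) {
        stack.push(current);
        current = newState;
    }

    /** END_TAG: go back to whatever state enclosed this one. */
    void end() {
        current = stack.pop();
    }

    String current() {
        return current;
    }

    int depth() {
        return stack.size();
    }
}
```

<p>Every <code>start()</code> must be matched by exactly one <code>end()</code>; if a start tag is skipped without pushing, the matching end tag must also skip the pop, or the stack drifts out of step with the document.</p>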
<p>What you end up with is a very straightforward, readable, and <a href="http://boulter.com/blog/2004/08/19/performant-is-not-a-word/">performant</a> dispatching framework in your processor.  The code practically writes itself!  All that&#8217;s left to do is to define each of the tag processing methods, which will have to understand how to process the attributes (START_TAG), any free text (TEXT),  and finally close up any data structures which the tag data is being recorded to.  They will all loosely follow this form:</p>
<pre class="brush: java; title: ; notranslate">
	void processTag() throws XmlPullParserException {

		switch (parser.getEventType()) {
		case XmlPullParser.START_TAG:
			// handle attributes and tag existence here
			break;
		case XmlPullParser.END_TAG:
			// handle close-up here
			break;
		case XmlPullParser.TEXT:
			// handle free text here
			break;
		}
	}
</pre>
<p>This ends up being really nice since all the logic about each tag is encapsulated in a single method, facilitating debugging, modifications, and having other people read and actually understand your code (not the easiest thing to do with SAX Parser callbacks!).</p>
<h2>Putting it all together</h2>
<p>So the demo that shows this off parses the instance document that&#8217;s filled with entries like this:</p>
<pre class="brush: xml; title: ; notranslate">
&lt;tns:domain name='Bacteria'&gt;
  &lt;tns:kingdom name='Monera'&gt;
    &lt;tns:phylum name='Proteobacteria'&gt;
      &lt;tns:class name='Proteobacteria'&gt;
        &lt;tns:order name='Enterobacteriales'&gt;
          &lt;tns:family name='Enterobacteriaceae'&gt;
            &lt;tns:genus name='Escherichia'&gt;
              &lt;tns:species name='E. coli' commonName='E. coli'/&gt;
            &lt;/tns:genus&gt;
          &lt;/tns:family&gt;
        &lt;/tns:order&gt;
      &lt;/tns:class&gt;
    &lt;/tns:phylum&gt;
  &lt;/tns:kingdom&gt;
&lt;/tns:domain&gt;
</pre>
<p>It then spits out strings describing each species to our Swing-based console, like this:</p>
<pre class="console">
Lifeform 'E. coli':
Bacteria : Monera : Proteobacteria : Proteobacteria : Enterobacteriales : Enterobacteriaceae : Escherichia : E. coli</pre>
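<p>For readers who want to reproduce that output line without downloading the demo, here is a stripped-down sketch. It is not the demo's code: it uses the JDK's StAX reader instead of XmlPullParser, skips the ParserState check entirely, and simply keeps a stack of <code>name</code> attributes. The class name <code>LineageSketch</code> and its methods are invented for the example.</p>

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class LineageSketch {

    /** Returns one "a : b : c" line per species element in the document. */
    static List<String> lineages(String xml) {
        List<String> out = new ArrayList<String>();
        Deque<String> names = new ArrayDeque<String>(); // one entry per open element
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getAttributeValue(null, "name");
                    // Fall back to the tag name when there is no name attribute.
                    names.addLast(name == null ? reader.getLocalName() : name);
                    if ("species".equals(reader.getLocalName())) {
                        out.add(join(names));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    names.removeLast();
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    static String join(Iterable<String> parts) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String part : parts) {
            if (!first) {
                sb.append(" : ");
            }
            sb.append(part);
            first = false;
        }
        return sb.toString();
    }
}
```

<p>The real demo routes each tag through its own handler method; this version trades that structure for brevity.</p>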
<p><a href="http://blog.palantirtech.com/wp-content/uploads/2007/05/launch_lifeonearth.jnlp" title="launch_lifeonearth.jnlp">Check it out</a> and let me know about any questions or improvements or errors in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/05/31/life-on-earth/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Custom Alpha Compositing</title>
		<link>http://blog.palantirtech.com/2007/03/27/custom-alpha-compositing/</link>
		<comments>http://blog.palantirtech.com/2007/03/27/custom-alpha-compositing/#comments</comments>
		<pubDate>Wed, 28 Mar 2007 03:57:50 +0000</pubDate>
		<dc:creator>Carl Freeland</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[swing]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/03/27/custom-alpha-compositing/</guid>
		<description><![CDATA[Every so often (can&#8217;t be more than once every two or three days), Swing doesn&#8217;t quite do what we need, and we end up writing customized code. In this case, all the available AlphaComposite instances provided with Java were variations on the theme of combining the colors and alpha channel of both source images into [...]]]></description>
			<content:encoded><![CDATA[<p>Every so often (can&#8217;t be more than once every two or three days), <a href="http://en.wikipedia.org/wiki/Swing_(Java)">Swing</a> doesn&#8217;t quite do what we need, and we end up writing customized code.  In this case, all the available <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/AlphaComposite.html" title="AlphaComposite">AlphaComposite </a>instances provided with Java were variations on the theme of combining the colors and alpha channel of both source images into a target image. (<a href="http://en.wikipedia.org/wiki/Alpha_compositing">Wikipedia&#8217;s <i>Alpha Compositing</i> article</a> is good background on the topic).</p>
<p align="left">What if what you really wanted was the color from one image and the alpha channel from another?  You&#8217;d be out of luck, but for the talents of Brien.  Here&#8217;s what you normally get with a standard <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/AlphaComposite.html#SRC_OVER" title="AlphaComposite.SRC_OVER">AlphaComposite.SRC_OVER</a> sort of technique.  In the following two examples, the icon is opaque and the rectangle is partially opaque black fading to transparency.</p>
<p align="center"><img src="http://blog.palantirtech.com/wp-content/uploads/2007/03/traditionaltransparency.png" alt="AlphaComposite.SRC_OVER" /></p>
<p align="left">What we needed looks more like this:</p>
<p align="center"><img src="http://blog.palantirtech.com/wp-content/uploads/2007/03/thepalantirway.png" alt="SourceAlphaComposite" /></p>
<p>Read on to find out how we did it, and why.<span id="more-44"></span></p>
<p align="left"> While it seems like this might be overkill, this was only a simple example.  In practice, we wanted a complex circular fade applied to another relatively complex graphic that couldn&#8217;t be easily replaced with primitive graphics operations.  With <a href="/wp-content/uploads/2007/04/sourcealphacomposite.java" title="SourceAlphaComposite.java">SourceAlphaComposite.java</a>, we were able to produce two graphics to get exactly the effect we wanted.  Be on the lookout for another cool usage of this class in the next posting.</p>
<p align="left">Assuming your image <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/ColorModel.html" title="ColorModel">ColorModel</a> supports transparency, the transparency of any given pixel is encoded in the bit patterns of the raster.  We picked an image type that makes things easy: in a <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/BufferedImage.html" title="BufferedImage">BufferedImage</a> of type <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/BufferedImage.html#TYPE_4BYTE_ABGR" title="BufferedImage.TYPE_4BYTE_ABGR">BufferedImage.TYPE_4BYTE_ABGR</a> the opacity of the pixel is the fourth sample (index 3) that the raster returns for that pixel.  We implement the <a href="https://java.sun.com/j2se/1.5.0/docs/api/java/awt/CompositeContext.html" title="CompositeContext">CompositeContext</a> interface and its <a href="https://java.sun.com/j2se/1.5.0/docs/api/java/awt/CompositeContext.html#compose(java.awt.image.Raster,%20java.awt.image.Raster,%20java.awt.image.WritableRaster)" title="CompositeContext.compose">compose() method</a>.  The method has this signature:</p>
<pre>
<code>public void compose(<a href="https://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/Raster.html" title="Raster">Raster</a> src, <a href="https://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/Raster.html" title="Raster">Raster</a> dstIn, <a href="https://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/WritableRaster.html" title="WritableRaster">WritableRaster</a> dstOut)</code>
</pre>
<p>The algorithm is simple:
</p>
<ol>
<li><a href="http://en.wikipedia.org/wiki/Bit_blit">blit</a> the <code>dstIn</code> argument to the output, <code>dstOut</code>.
<li> Loop over each pixel in the <code>src</code> argument, applying its opacity to the corresponding pixel in <code>dstOut</code> (the code below multiplies the destination alpha by the source alpha).
</ol>
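<p>The two steps above can be tried on a pair of images directly, without going through <code>Graphics2D</code>. The following is only a sketch of the per-pixel arithmetic: the class name <code>AlphaFromSourceSketch</code> and the method <code>applySourceAlpha</code> are hypothetical, both images are assumed to be <code>TYPE_INT_ARGB</code> or <code>TYPE_4BYTE_ABGR</code> (so alpha is band 3), and <code>src</code> is assumed to be at least as large as <code>dst</code>.</p>

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

class AlphaFromSourceSketch {

    /** Keeps dst's color bands and scales dst's alpha by src's alpha, in place. */
    static void applySourceAlpha(BufferedImage src, BufferedImage dst) {
        final int alphaBand = 3; // assumption: ARGB/ABGR image types only
        WritableRaster s = src.getRaster();
        WritableRaster d = dst.getRaster();
        int[] sp = new int[4];
        int[] dp = new int[4];

        for (int y = 0; y < dst.getHeight(); y++) {
            for (int x = 0; x < dst.getWidth(); x++) {
                s.getPixel(x, y, sp);
                d.getPixel(x, y, dp);
                // Color bands stay as they are; only alpha is rescaled.
                dp[alphaBand] = dp[alphaBand] * sp[alphaBand] / 0xFF;
                d.setPixel(x, y, dp);
            }
        }
    }
}
```

<p>An opaque destination pixel combined with a half-transparent source pixel ends up half-transparent while keeping its original color, which is the effect shown in the second screenshot.</p>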
<p align="left"> A word of warning, however: since this class involves direct manipulation of the alpha channel of a <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/BufferedImage.html" title="BufferedImage">BufferedImage</a>, and not all image types support alpha, let alone in the same way, we picked the image types most convenient to us. Our code example only works with <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/BufferedImage.html" title="BufferedImages">BufferedImages</a> of type <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/BufferedImage.html#TYPE_4BYTE_ABGR" title="BufferedImage.TYPE_4BYTE_ABGR">BufferedImage.TYPE_4BYTE_ABGR</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/awt/image/BufferedImage.html#TYPE_INT_ARGB">BufferedImage.TYPE_INT_ARGB</a>; implementation of support for other image types should be pretty straightforward.</p>
<p>In any case, check out the source code and let me know if you have any questions, comments, or enhancements to share.</p>
<pre class="brush: java; title: ; notranslate">
package com.palantir.ui;

import java.awt.Composite;
import java.awt.CompositeContext;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

/**
 * A {@link Composite} implementation that uses the destination RGB and the source alpha.
 * This can be used to perform alpha gradients.
 */
public class SourceAlphaComposite implements Composite, CompositeContext {
	public static SourceAlphaComposite createComposite(final BufferedImage bimage) throws UnsupportedBufferException {
		return createComposite(bimage.getType());
	}

	/**
	 * This factory currently supports &lt;code&gt;TYPE_4BYTE_ABGR&lt;/code&gt; and &lt;code&gt;TYPE_INT_ARGB&lt;/code&gt;.
	 */
	public static SourceAlphaComposite createComposite(int type) throws UnsupportedBufferException {
		switch ( type ) {
			case BufferedImage.TYPE_4BYTE_ABGR:
			case BufferedImage.TYPE_INT_ARGB:
				return new SourceAlphaComposite( 3 );

			default:
				throw new UnsupportedBufferException();
		}
	}

	private final int alphaIndex;

	public SourceAlphaComposite(final int alphaIndex) {
		if ( alphaIndex &lt; 0 ) {
			throw new IllegalArgumentException( &quot;There is no way a negative index will work.&quot; );
		}

		this.alphaIndex = alphaIndex;
	}

	public int getAlphaIndex() {
		return alphaIndex;
	}

	public CompositeContext createContext(
		final ColorModel srcColorModel, final ColorModel dstColorModel, final RenderingHints hints
	) {
		return this;
	}

	public void dispose() {
    	// Do nothing
    }

    public void compose(final Raster src, final Raster dstIn, final WritableRaster dstOut) {
    	final int
    		w = dstOut.getWidth(),
    		h = dstOut.getHeight();

    	final int n = src.getNumBands();
    	final int[] spixel = new int[ n ], opixel = new int[ n ], dpixel = new int[ n ];

    	for ( int x = 0; w &gt; x; x++ )
    		for ( int y = 0; h &gt; y; y++ ) {
    			src.getPixel( x, y, spixel );
    			dstIn.getPixel( x, y, dpixel );

    			// Use the destination color (except use the source alpha, below):
    			System.arraycopy( dpixel, 0, opixel, 0, opixel.length );

    			final int dalpha = dpixel[ alphaIndex ];
    			if ( 0 != dalpha ) {
	    			// Use the source alpha:
	    			opixel[ alphaIndex ] = dalpha * spixel[ alphaIndex ] / 0xFF;
    			}

    			dstOut.setPixel( x, y, opixel );
    		}
    }

    /**
     * Exception we throw for images that we don't support.
     */
    public static class UnsupportedBufferException extends Exception {
		public UnsupportedBufferException() {
		}

		public UnsupportedBufferException(final Exception cause) {
			super( cause );
		}
	}
}
</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/03/27/custom-alpha-compositing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Unicode and happy user experiences</title>
		<link>http://blog.palantirtech.com/2007/03/06/unicode-and-happy-user-experiences/</link>
		<comments>http://blog.palantirtech.com/2007/03/06/unicode-and-happy-user-experiences/#comments</comments>
		<pubDate>Wed, 07 Mar 2007 01:42:33 +0000</pubDate>
		<dc:creator>Carl Freeland</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[javatech]]></category>
		<category><![CDATA[tips and tricks]]></category>

		<guid isPermaLink="false">http://blog.palantirtech.com/2007/03/06/unicode-and-happy-user-experiences/</guid>
		<description><![CDATA[Everyone agrees that it&#8217;s crucial to do validation on user input so that, among other things, your application never tries to write a value that&#8217;s too long into a database field with a specific limit. Users of your application shouldn&#8217;t, however, be left guessing whether the megabyte they pasted (and you know they will) into [...]]]></description>
			<content:encoded><![CDATA[<p>Everyone agrees that it&#8217;s crucial to do validation on user input so that, among other things, your application never tries to write a value that&#8217;s too long into a database field with a specific limit.  Users of your application shouldn&#8217;t, however, be left guessing whether the megabyte they pasted (and you <em>know</em> they will) into the eensy-teensy text field really got saved to the database or not.  So you should limit the text field itself so they get immediate feedback, rather than via some Johnnie-come-lately error message, or worse, a bunch of text gets dropped in the bit bucket.</p>
<p>One fairly well established technique is to write a <code><a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/swing/text/DocumentFilter.html">DocumentFilter</a></code>, and when <code><a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/swing/text/DocumentFilter.html#insertString(javax.swing.text.DocumentFilter.FilterBypass,%20int,%20java.lang.String,%20javax.swing.text.AttributeSet)">insertString()</a></code> or <code><a href="http://java.sun.com/j2se/1.5.0/docs/api/javax/swing/text/DocumentFilter.html#replace(javax.swing.text.DocumentFilter.FilterBypass,%20int,%20int,%20java.lang.String,%20javax.swing.text.AttributeSet)">replace()</a></code> is called, validate the added text and truncate as necessary to ensure the database field length is not exceeded.</p>
<p>Now the fun part.  What happens when you try to store your comments on <a href="http://www.unicode.org/charts/PDF/U07C0.pdf">N&#8217;Ko</a>, <a href="http://www.unicode.org/charts/PDF/U1800.pdf">Mongolian</a>, <a href="http://www.unicode.org/charts/PDF/U3100.pdf">Bopomofo</a> (phonetic markers, <a href="http://en.wikipedia.org/wiki/Zhuyin#Use_as_an_input_method">now commonly used as an input character set for Mandarin</a>), or even <a href="http://www.unicode.org/charts/PDF/U16A0.pdf">ancient Viking runes</a>? You get two choices.  Store as <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a> or <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1">ISO-8859-1 (aka Latin-1)</a>, or whatever, and you lose data.  Oops.  Or convert to <a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a> or <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a>.  Hm.  Wait a minute, now the encoded value takes somewhere between one and three times as many bytes as the original String has characters.  So, how do you limit the text field to the number of bytes the database will permit?  If you picked UTF-16, it&#8217;s pretty simple: divide the database limit by two.  But it&#8217;s usually pretty wasteful of space.  On the other hand, you can&#8217;t predict exactly how many bytes the UTF-8 representation needs until you try it out.</p>
<p>The following algorithm will produce a <code><a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html">String</a></code> which, <strong>if converted to the supplied <code><a href="http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html">Charset</a></code></strong>, will be no more than <code>maxBytes</code> in length.  It could be less, depending on the charset chosen and the text being trimmed.  This happens because it removes whole characters at once, which may trim several bytes, jumping you from one byte over the limit to two under.</p>
<pre class="brush: java; title: ; notranslate">
public static String limitStringByBytes(String string, int maxBytes, String encoding) {
   if(string == null)
      return string;
   int i = string.length() - 1;
   int shaveBytes = computeByteLength(string,encoding) - maxBytes;
   while ( shaveBytes &gt; 0 &amp;&amp; i &gt;= 0 ) {
      shaveBytes -= computeByteLength( string.charAt( i ), encoding );
      i--;
   }
   if( (i+1) &lt;= 0 )
      return &quot;&quot;;
   else if( (i+1) &gt;= string.length() )
      return string;
   else
      return string.substring(0, i + 1 );
}
</pre>
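<p>The listing calls a <code>computeByteLength</code> helper that isn't shown in this post. One plausible way to fill it in, so the method can be run as-is, is sketched below; both overloads and the wrapper class <code>ByteLimitSketch</code> are guesses rather than the original helper.</p>

```java
import java.nio.charset.Charset;

class ByteLimitSketch {

    // Assumed helper: number of bytes the whole string takes in the given encoding.
    static int computeByteLength(String s, String encoding) {
        return s.getBytes(Charset.forName(encoding)).length;
    }

    // Assumed helper: number of bytes a single char takes in the given encoding.
    // Caveat: the "UTF-16" charset prepends a byte-order mark on every call,
    // which inflates per-character counts; UTF-8 does not have that problem.
    static int computeByteLength(char c, String encoding) {
        return computeByteLength(String.valueOf(c), encoding);
    }

    // Same logic as the listing above: shave characters off the end until it fits.
    static String limitStringByBytes(String string, int maxBytes, String encoding) {
        if (string == null) {
            return string;
        }
        int i = string.length() - 1;
        int shaveBytes = computeByteLength(string, encoding) - maxBytes;
        while (shaveBytes > 0 && i >= 0) {
            shaveBytes -= computeByteLength(string.charAt(i), encoding);
            i--;
        }
        if (i + 1 <= 0) {
            return "";
        } else if (i + 1 >= string.length()) {
            return string;
        } else {
            return string.substring(0, i + 1);
        }
    }
}
```

<p>With UTF-8, trimming <code>"h\u00e9llo"</code> to two bytes leaves just <code>"h"</code>: the two-byte <code>&eacute;</code> has to go as a unit, so the result lands one byte under the limit.</p>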
<p>As a final note (thanks to the comments by one of our faithful and numerous readers), we would like to acknowledge that we have indeed ignored the existence of the <a href="http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Supplementary_Multilingual_Plane">supplementary planes of Unicode mappings</a>, sticking to the <a href="http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Basic_Multilingual_Plane">Basic Multilingual Plane</a> in this example.  This avoids the even more intricate hassle of dealing with <a href="http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates">surrogate pairs</a>. If characters from one of these rather obscure scripts (<a href="http://www.unicode.org/charts/PDF/U1D000.pdf">Byzantine Music Symbols</a>, <a href="http://en.wikipedia.org/wiki/Phoenician_alphabet">Phoenician</a>, or my personal favorite, <a href="http://www.unicode.org/charts/PDF/U10400.pdf">Deseret</a> [editor's note: yeah, I didn't know what it was either. <a href="http://en.wikipedia.org/wiki/Deseret_alphabet">Wikipedia to the rescue</a>], for example) should appear, it&#8217;s possible that they might be truncated mid-character.  According to the Unicode standard, <a href="http://unicode.org/faq/utf_bom.html#39">this is an error</a>, but also a very unlikely situation to encounter. Free Palantir t-shirt to the first person who posts a working example that properly deals with surrogates.</p>
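<p>One possible way to handle surrogates is to walk the string by code point instead of by <code>char</code>, so a pair is always kept or dropped as a unit. The sketch below takes that approach and scans from the front rather than shaving from the back, so it is a different technique from the method above. The class and method names are invented for the example, and it assumes an encoding such as UTF-8 where each code point's byte count can be measured on its own.</p>

```java
import java.nio.charset.Charset;

class CodePointLimitSketch {

    /**
     * Returns the longest prefix of s, made of whole code points, whose
     * encoded form is at most maxBytes long.
     */
    static String limitByBytes(String s, int maxBytes, Charset charset) {
        if (s == null) {
            return null;
        }
        int used = 0; // bytes consumed by the prefix so far
        int end = 0;  // char index just past the prefix
        while (end < s.length()) {
            int codePoint = s.codePointAt(end);
            int width = Character.charCount(codePoint); // 2 for a surrogate pair
            int cost = s.substring(end, end + width).getBytes(charset).length;
            if (used + cost > maxBytes) {
                break; // the next code point does not fit, so stop before it
            }
            used += cost;
            end += width;
        }
        return s.substring(0, end);
    }
}
```

<p>Given <code>"a"</code>, a Deseret letter (U+10400, four bytes in UTF-8) and <code>"b"</code>, a four-byte limit returns just <code>"a"</code> rather than half of a surrogate pair.</p>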
]]></content:encoded>
			<wfw:commentRss>http://blog.palantirtech.com/2007/03/06/unicode-and-happy-user-experiences/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

