Palantir Tech Blog

Archive for the ‘javatech’ Category

Printing to Plotters in Java

Monday, August 11th, 2008

One of the things our customers love to do is print our beautiful object graphs and tape them to the wall for discussion. What they hate to do is print 30 pages, line them up, and tape them to a poster one at a time. So we bought a plotter, and I started plotting.

I needed to print directly to a Java Graphics object. Unfortunately, the available information on large output printing from Java is thin at best. While there are lots of ways to successfully place ink on paper, I was only able to find one that reliably lets the application pick odd paper sizes that plotters use, like 24×19.7 inches. (The term “plotter” used to mean something with pens for printing blueprints and such. Now it just means a large format printer, commonly printers that can use roll paper as a source.)

One of the first things you’ll learn when you start working with printing in Java is that a language intended to be all things to all people (i.e., cross-platform) is utterly lousy at tasks highly specific to a given environment, such as printing. It will not surprise you to hear that native print services on Windows are pretty different from those available on a Mac, which themselves are pretty different from the CUPS system common to Unix systems.

So, by and large, you are reduced to the least common denominator of printing. Part and parcel of this least common denominator is agreeing on what constitutes a piece of paper and sticking to it. This is fine for people thinking, “My paper is 8.5 inches wide by 11 inches tall.” It poses a bit of a problem for people with plotters who are thinking, “My paper is 24 inches wide by as many damned inches tall as I need.” Even relatively powerful programs like Photoshop or GIMP don’t seem to support plotters well. I believe Photoshop works by specifying the exact paper size you want to use, but any technique in which the easiest solution for the user is to pull out a calculator does not meet with my approval.
(more…)
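
For readers who have not met the API: the stock java.awt.print classes do let an application describe an arbitrary sheet. Paper takes its size in points (1/72 inch), and a PageFormat carries it to the print job. The sketch below only builds such a page description for the 24×19.7 inch example. The class and method names are made up for illustration, the zero margin is an assumption, and whether a given driver honors the request is exactly the platform-specific question discussed above. It is not necessarily the technique the full post settles on.

```java
import java.awt.print.PageFormat;
import java.awt.print.Paper;

// Hypothetical helper: builds a PageFormat for an arbitrary sheet size.
final class PlotterPages {
    private static final double POINTS_PER_INCH = 72.0;

    static PageFormat pageFor(double widthInches, double heightInches) {
        double width = widthInches * POINTS_PER_INCH;
        double height = heightInches * POINTS_PER_INCH;

        Paper paper = new Paper();
        paper.setSize(width, height);
        // Assumption: no margins. Real devices usually need some.
        paper.setImageableArea(0, 0, width, height);

        PageFormat format = new PageFormat(); // portrait by default
        format.setPaper(paper);
        return format;
    }
}
```

A PrinterJob would typically receive the result through setPrintable(printable, format), possibly after passing it through validatePage to see what the printer is willing to accept.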

James Gosling comes to visit

Tuesday, March 11th, 2008

Following the discovery that our offices were the birthplace of Java (or at least the place where it had its childhood), I invited James Gosling to come visit. For those who don’t know who James Gosling is, he’s more-or-less the father of Java. Java started as a project of James Gosling’s in 1991; today, 17 years later, he’s still at Sun, in charge of guiding the Java platform into the future.

How does one invite such a luminary to come visit one’s offices? One guesses what his email address is and sends him an email out of the blue:

James,

My name is Ari Gordon-Schlosberg, an engineer at Palantir Technologies. I recently became interested in the storied history of our current facilities at 100 Hamilton Ave. in Palo Alto. As Java programmers, our engineering team is really excited to be working in the same place that gave the world Java.

You may not have heard of Palantir, but we’re working on some pretty interesting problems, using Java to build large-scale analysis applications that really push forward the state-of-the-art. We’ve won some accolades for our use of Swing by Romain Guy. If you felt like dropping by the next time you’re in the valley, we’d love to have you come by, see your old digs, and take a peek at what we’re working on.

Sincerely,

Ari Gordon-Schlosberg

To quote the Microsoft Program Manager’s book of proverbs: 90% of making things happen is sending email.

So James dropped by one Thursday for demos, lunch, and schmoozing with our engineers.

The first order of business was to demo our software to James. We got a bunch of the senior engineers together and showed him an abbreviated demo of both Palantir Government and Palantir Finance. We focused less on the problem-space aspects of the software and more on how we’re using Java to build the application. We went over how both of our apps are completely written in Java and that our GUIs are built with custom Swing components.

The most memorable part of the conversation went something like this:

LEAD DEV: So… what do you think of our applications?

GOSLING: It makes me want to weep.

LEAD DEV: Uh… ?

GOSLING: Yeah, we’ve been working on this infrastructure for years to be able to build applications like this and finally someone is doing it.

The rest of the visit was spent talking about Java, its history and its future. Topics ranged from why it’s hard to get dinosaurs like cable companies and mobile carriers to use modern technology to some of the complications in building an optimizing JIT compiler.

After lunch, I walked him to the elevator to see him off. We said our goodbyes and he stepped into the elevator, which was already occupied by the mailman making his rounds. As the doors closed, I heard the mailman say to James:

“Well, I haven’t seen you around here in a while.”

Best Practices: compareTo consistent with equals

Sunday, September 2nd, 2007

What is wrong with this class and why? I’ll tell you beforehand there are two things I am looking for and they are both in the compareTo function. Yes, this came from the Palantir code base and caused me some issues. It has been modified slightly for illustrative purposes.

(more…)
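
As background while the class itself sits behind the link: “consistent with equals” means that a.compareTo(b) == 0 exactly when a.equals(b). A well-known JDK example of the inconsistency (not the Palantir class the post discusses) is BigDecimal, where 1.0 and 1.00 compare as equal but are not equals(). The helper names below are made up for illustration.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.TreeSet;

// Hypothetical demo class showing how the two notions of equality diverge.
final class CompareToVsEquals {
    /** Counts distinct values the way a hash-based collection does (equals/hashCode). */
    static int distinctByEquals(BigDecimal... values) {
        return new HashSet<BigDecimal>(Arrays.asList(values)).size();
    }

    /** Counts distinct values the way a sorted collection does (compareTo). */
    static int distinctByCompareTo(BigDecimal... values) {
        return new TreeSet<BigDecimal>(Arrays.asList(values)).size();
    }
}
```

Given new BigDecimal("1.0") and new BigDecimal("1.00"), the first helper counts two values and the second counts one, so the same data behaves differently depending on which collection it lands in.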

SimpleDateFormat is not thread-safe

Wednesday, July 11th, 2007

It seems like a relatively common mistake is to assume that the java.text.SimpleDateFormat class is thread-safe (at least for methods such as format(), which you might not expect to mutate state!). This is not true; SimpleDateFormat is not thread-safe, and format() does mutate state. From the javadoc:

Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.

Suggestions on handling synchronization:

N.B. In general, you should be very careful when using ThreadLocal storage, especially when using thread pools (your data won’t be automatically garbage collected and may be visible to other Threads—a security risk) or when storing references to Thread objects in ThreadLocal storage (which might confuse the GC). However, in this case, we should be okay.
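
One way to follow the javadoc’s advice of a separate instance per thread is to hold the format in a ThreadLocal, which is the approach the note above has in mind. A minimal sketch, with a made-up class name and pattern; the UTC time zone and US locale are assumptions chosen only to make the output predictable:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical utility: each thread lazily gets its own SimpleDateFormat.
final class DateFormats {
    private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
                    format.setTimeZone(TimeZone.getTimeZone("UTC"));
                    return format;
                }
            };

    /** Safe to call from any thread; no instance is ever shared between threads. */
    static String formatDay(Date date) {
        return DAY_FORMAT.get().format(date);
    }
}
```

The simpler alternatives are to create a new SimpleDateFormat on every call, or to synchronize on a shared instance; both trade some speed for less machinery.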

XML Pull Parsing and Enums: like chocolate and peanut butter

Thursday, May 31st, 2007

There comes a time in every developer’s life when they need to write code that processes some XML. Lately, we’ve seen the proliferation of APIs that make XML processing easier, like JAXB (Java Architecture for XML Binding). However, when speed and scale are required, chances are you’re going to need to roll your own processor. Before I continue, let me clear up some terminology: when I say “processor”, I mean the code of yours that’s wrapped around a SAX (tutorial), DOM (tutorial), or an XPP (tutorial) parser, not the guts of the parser itself.

At the end of the day, that’s the interesting part of what you’re doing: the grammar of your data model rather than the minutiae of start and end tags. A processor is the interface between the data interchange format and the internal data model of your application.

Click through for a tour of XML parsers and a look at a novel technique for encoding processors that use pull parsers (as usual, we’ve included a WebStart demo, as well as a jar file containing the compiled example along with all of its source code).
(more…)
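
To give a flavor of how a pull parser and an enum can fit together, here is a small sketch. It uses StAX (javax.xml.stream, the pull parser that ships with the JDK) rather than XPP, and the element names, the Tag enum, and the readNames method are all invented for the example. It shows one plausible pairing, not necessarily the technique the full post describes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical processor: the enum names the grammar, the loop pulls events.
final class PullWithEnums {
    enum Tag {
        PEOPLE, PERSON, NAME, UNKNOWN;

        /** Maps an element name to a Tag, falling back to UNKNOWN. */
        static Tag of(String localName) {
            try {
                return valueOf(localName.toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                return UNKNOWN;
            }
        }
    }

    /** Collects the text of every name element in the document. */
    static List<String> readNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<String>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                switch (Tag.of(reader.getLocalName())) {
                    case NAME:
                        names.add(reader.getElementText());
                        break;
                    default:
                        break; // containers and unknown tags carry no data here
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

Switching on an enum keeps the tag vocabulary in one place and avoids chains of string comparisons in the event loop.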

Custom Alpha Compositing

Tuesday, March 27th, 2007

Every so often (can’t be more than once every two or three days), Swing doesn’t quite do what we need, and we end up writing customized code. In this case, all the available AlphaComposite instances provided with Java were variations on the theme of combining the colors and alpha channel of both source images into a target image. (Wikipedia’s Alpha Compositing article is good background on the topic).

What if what you really wanted was the color from one image and the alpha channel from another? You’d be out of luck, but for the talents of Brien. Here’s what you normally get with a standard AlphaComposite.SRC_OVER sort of technique. In the following two examples, the icon is opaque and the rectangle is partially opaque black fading to transparency.

AlphaComposite.SRC_OVER

What we needed looks more like this:

SourceAlphaComposite

Read on to find out how we did it, and why. (more…)
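
The idea itself (color from one image, alpha from another) can be illustrated pixel by pixel without any compositing machinery. The sketch below is only that illustration: it is not a java.awt.Composite implementation and not the SourceAlphaComposite described in the post, and the class and method names are made up.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: RGB from one image, alpha channel from another.
final class SourceAlphaSketch {
    static BufferedImage colorWithAlphaOf(BufferedImage color, BufferedImage alpha) {
        // Only the overlapping region of the two images is combined.
        int width = Math.min(color.getWidth(), alpha.getWidth());
        int height = Math.min(color.getHeight(), alpha.getHeight());
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = color.getRGB(x, y) & 0x00FFFFFF; // keep the color bits
                int a = alpha.getRGB(x, y) & 0xFF000000;   // keep the alpha bits
                out.setRGB(x, y, a | rgb);
            }
        }
        return out;
    }
}
```

Applied to the example above, the icon would supply the color and the fading rectangle would supply the alpha, so the icon itself fades out instead of being darkened.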

Unicode and happy user experiences

Tuesday, March 6th, 2007

Everyone agrees that it’s crucial to do validation on user input so that, among other things, your application never tries to write a value that’s too long into a database field with a specific limit. Users of your application shouldn’t, however, be left guessing whether the megabyte they pasted (and you know they will) into the eensy-teensy text field really got saved to the database or not. So you should limit the text field itself so they get immediate feedback, rather than via some Johnny-come-lately error message, or worse, a bunch of text gets dropped in the bit bucket.

One fairly well established technique is to write a DocumentFilter, and when insertString() or replace() is called, validate the added text and truncate as necessary to ensure the database field length is not exceeded.
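
A minimal sketch of such a filter, assuming the limit is expressed in chars (UTF-16 code units) rather than bytes. The class name and constructor parameter are made up, and the cut can land in the middle of a surrogate pair:

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;

// Hypothetical filter: silently truncates edits that would exceed maxChars.
final class LengthLimitFilter extends DocumentFilter {
    private final int maxChars;

    LengthLimitFilter(int maxChars) {
        this.maxChars = maxChars;
    }

    @Override
    public void insertString(FilterBypass fb, int offset, String text, AttributeSet attrs)
            throws BadLocationException {
        // An insert is a replace of zero existing characters.
        replace(fb, offset, 0, text, attrs);
    }

    @Override
    public void replace(FilterBypass fb, int offset, int length, String text, AttributeSet attrs)
            throws BadLocationException {
        String incoming = (text == null) ? "" : text;
        // Room left once the replaced range has been removed.
        int room = maxChars - (fb.getDocument().getLength() - length);
        if (incoming.length() > room) {
            incoming = incoming.substring(0, Math.max(0, room));
        }
        fb.replace(offset, length, incoming, attrs);
    }
}
```

It would be installed with something like ((AbstractDocument) textField.getDocument()).setDocumentFilter(new LengthLimitFilter(255)). A friendlier version might also beep or flash the field when it truncates.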

Now the fun part. What happens when you try to store your comments on N’Ko, Mongolian, Bopomofo (phonetic markers, now commonly used as an input character set for Mandarin), or even ancient Viking runes? You get two choices. Store as ASCII or ISO-8859-1 (aka Latin-1), or whatever, and you lose data. Oops. Or convert to UTF-16 or UTF-8. Hm. Wait a minute, now the value (in bytes) is somewhere between one and three times as long as the original String length. So, how do you limit the text field to the number of bytes the database will permit? If you picked UTF-16, it’s pretty simple: divide the database limit by two. But it’s pretty wasteful of space, usually. On the other hand, you can’t predict exactly how many bytes the UTF-8 representation needs until you try it out.

The following algorithm will produce a String which, if converted to the supplied Charset, will be no more than maxBytes in length. It could be less, depending on the charset chosen and the text being trimmed. This happens because it removes whole characters at once, which may trim several bytes, jumping you from one byte over the limit to two under.
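
A minimal sketch of an algorithm matching that description (a reconstruction under assumptions, not the post’s original listing; the class and method names are made up): drop one char from the end until the encoded form fits. It ignores surrogate pairs, and re-encoding on every step is quadratic, which is tolerable for text fields but not for large inputs.

```java
import java.nio.charset.Charset;

// Hypothetical helper: trims a String so its encoded form fits in maxBytes.
final class ByteLimit {
    static String trimToBytes(String text, Charset charset, int maxBytes) {
        String result = text;
        // Remove one char at a time; each removal may free several bytes.
        while (!result.isEmpty() && result.getBytes(charset).length > maxBytes) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }
}
```

For example, "abc" followed by a euro sign is six bytes in UTF-8. With a limit of five, the euro sign goes and the result is the three-byte "abc": one byte over the limit becomes two under.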


As a final note (thanks to the comments by one of our faithful and numerous readers), we would like to acknowledge that we have indeed ignored the existence of the supplementary planes of Unicode mappings, sticking to the Basic Multilingual Plane in this example. This avoids the even more intricate hassle of dealing with surrogate pairs. If characters from one of these rather obscure scripts (Byzantine Music Symbols, Phoenician, or my personal favorite, Deseret [editor’s note: yeah, I didn’t know what it was either. Wikipedia to the rescue], for example) should appear, it’s possible that they might be truncated mid-character. According to the Unicode standard, this is an error, but also a very unlikely situation to encounter. Free Palantir t-shirt to the first person who posts a working example that properly deals with surrogates.
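
One way to avoid cutting a surrogate pair in half is to step backwards by code point rather than by char, which String.offsetByCodePoints supports. A sketch under that assumption, with made-up class and method names (and no claim on the t-shirt):

```java
import java.nio.charset.Charset;

// Hypothetical helper: trims by whole code points so surrogate pairs stay intact.
final class CodePointByteLimit {
    static String trimToBytes(String text, Charset charset, int maxBytes) {
        int end = text.length();
        while (end > 0 && text.substring(0, end).getBytes(charset).length > maxBytes) {
            // Moves back one code point: two chars for a surrogate pair, else one.
            end = text.offsetByCodePoints(end, -1);
        }
        return text.substring(0, end);
    }
}
```

With "ab" followed by a Deseret letter (U+10400, four bytes in UTF-8) and a limit of five bytes, both halves of the pair are removed together and "ab" remains.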

Oracle’s JDBC driver + prefetch == garbage [collection]

Friday, February 23rd, 2007

The Problem

Recently, we were experiencing major performance problems with loading documents from the database. Profiling did not isolate a single cause; everything (including unrelated, background operations) seemed slow. So, we started logging garbage collection, and found that we were collecting garbage at a rate of 20GB/min!

Memory allocation profiling then revealed that the worst offender, by far, was OracleStatement.prepareAccessors(). Interestingly, it only caused a problem when our result set included a LOB. For such queries, it allocates a 1MB object, regardless of whether the query returned any results at all.

Google searches revealed others who saw similar problems when accessing LOBs, but no solutions other than upgrading or changing drivers. We were already using the latest Oracle JDBC driver, and reverting to earlier drivers did not help. Switching drivers did solve the problem; however, pushing the change to production would require extensive testing to ensure that we were not trading in one problem for another (or more).

I was about to start conducting these tests when John discovered that we were setting the OracleConnection parameter “defaultRowPrefetch” to 1000. This parameter determines how many rows are pulled back from the database on each round-trip, and increasing this value from its default of 10 will normally yield a performance gain. As an experiment, I set the value to 1, and re-profiled memory allocation. The amount of memory allocated by OracleStatement.prepareAccessors() decreased by about three orders of magnitude. Thus, it appears that when a query can return a LOB, Oracle’s JDBC driver allocates approximately “rowPrefetch” KB of memory, even when zero rows are returned.

The Solution

Returning the “defaultRowPrefetch” parameter to 10 did rid us of our garbage collection problems. However, because this is a global setting, reverting it reduced the performance of many other queries which returned many rows with no LOBs. The prospect of setting “rowPrefetch” on a per-query basis was unappetizing, to say the least, but the performance loss was significant. In the end, we altered how we retrieve rows from the database so that the fetch size geometrically increases as we pull results back from the database.

Specifically, the first batch we retrieve contains at most 10 rows, after which we increase the batch size to 20. Once we’ve retrieved 20 more rows, we increase the fetch size to 40, and so on. In this way, we never allocate large amounts of memory for queries which return few (or no) results, but we still quickly ramp up to a large fetch size.
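
The schedule can be sketched with plain JDBC. This is an illustration, not the code we shipped: the class, the RowHandler interface, and the cap of 1000 are assumptions, and ResultSet.setFetchSize is only a hint that a driver is free to ignore.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: doubles the fetch size each time a full batch is consumed.
final class GeometricFetch {
    static final int INITIAL_FETCH_SIZE = 10;
    static final int MAX_FETCH_SIZE = 1000; // assumed cap, not from the post

    /** Callback invoked once per row (made up for this sketch). */
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    static int nextFetchSize(int current) {
        return Math.min(current * 2, MAX_FETCH_SIZE);
    }

    /** Reads every row, growing the fetch size 10, 20, 40, ... Returns the row count. */
    static int readAll(ResultSet rs, RowHandler handler) throws SQLException {
        int fetchSize = INITIAL_FETCH_SIZE;
        rs.setFetchSize(fetchSize);
        int rowsInBatch = 0;
        int total = 0;
        while (rs.next()) {
            handler.handle(rs);
            total++;
            if (++rowsInBatch == fetchSize) {
                // A full batch has been consumed; ask for a bigger one next time.
                fetchSize = nextFetchSize(fetchSize);
                rs.setFetchSize(fetchSize);
                rowsInBatch = 0;
            }
        }
        return total;
    }
}
```

Because the first round-trip usually happens when the statement executes, the statement itself would also need its fetch size set to the initial value (Statement.setFetchSize) before the query runs.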

For large queries which returned no LOBs, this solution is still slower than when “defaultRowPrefetch” was 1000. However, the slowdown on those queries was minor, overall system performance was substantially improved, and, importantly, the changes did not require any per-query tuning.

Add speel checking to your Swng text components (the squiggly way)

Monday, February 12th, 2007



web start | download source

Marking up txt

Let’s hook Swing text components up to some tokenizing logic: a spell checker (the example above uses Jazzy), a regex (the example above will pick out some electronic musicians), or something more advanced.

Like all Swing components, text components are factored into an MVC setup. The model is javax.swing.text.Document; the view is javax.swing.plaf.TextUI (which delegates out to a javax.swing.text.View, which is generated from some ViewFactory); and the controller is the text component itself. A very simple way to add the notion of a token to this setup is to create a new kind of Document – a TokenDocument.

When text is inserted into a token document, not only does it need to be tokenized, but existing tokens need to be shifted. We could do this manually; however, all javax.swing.text.Document implementations provide something called a sticky position (javax.swing.text.Position) that will do the shifting for us. Sticky positions are automatically updated by the document to reflect insertions and deletions of text. They also are guaranteed to maintain their ordering – that is, if position A is <= position B, it will always remain that way. This means the token document can maintain sorted trees of sticky positions (to store tokens) without worrying that their sort order will change.
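
A small sketch of that shifting behavior, using a stock PlainDocument rather than the TokenDocument from the download; the class and method names are made up for the example:

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;
import javax.swing.text.Position;

// Hypothetical demo: a Position follows its text when earlier text is inserted.
final class StickyPositionDemo {
    /**
     * Creates a document holding text, marks the given offset with a sticky
     * position, inserts prefix at the start, and returns where the mark ended up.
     */
    static int offsetAfterPrefixInsert(String text, int offset, String prefix)
            throws BadLocationException {
        PlainDocument doc = new PlainDocument();
        doc.insertString(0, text, null);
        Position mark = doc.createPosition(offset);
        doc.insertString(0, prefix, null);
        return mark.getOffset();
    }
}
```

Marking the misspelled word in "hello wrld" at offset 6 and then inserting the four characters "oh, " at the front leaves the mark at offset 10, still pointing at the start of "wrld".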

Once we have the tokens in the model, we need to hook them into the view. We do this through a custom TextUI. It basically does everything a BasicTextUI does except it also paints a token layer underneath the highlights (above the background). In general with the javax.swing.text package, whenever code starts painting outside of the view bounds (for each offset, this is the tightest bounding box for the letter at that offset), the dirty region needs to be expanded to include everywhere that was painted. In this code, you’ll see a line in the UI to deal with the dirty region.

Playing wth lines

Custom strokes like the squiggle stroke (above left) and smoothed noise stroke (above right) help give meaning to lines. Also, they can make an interface more fun.

Wrap pu

In this code, we extended the UI to paint lines under text. To change the display of the text itself, we would have to write our own View implementation (or more likely, extend PlainView [0]). This is not exclusive of the approach we took here. A more powerful View implementation could work in tandem with our custom UI, opening up even more ways to present information extracted from user-entered text.

Until next time!

[0] There’s a great introduction to this at Customizing a Text Editor, an article on the Sun Developer Network.

LICENSE — I wrote this using the Jazzy spell check engine + some open source trinkets (especially a Perlin noise generator). Except Jazzy (which is LGPL’d), all of it is Apache/BSD licensed.