Palantir Tech Blog

Archive for February, 2007

Oracle’s JDBC driver + prefetch == garbage [collection]

Friday, February 23rd, 2007

The Problem

Recently, we were experiencing major performance problems with loading documents from the database. Profiling did not isolate a single cause; everything (including unrelated, background operations) seemed slow. So, we started logging garbage collection, and found that we were collecting garbage at a rate of 20GB/min!

Profiling revealed that the worst offender, by far, was OracleStatement.prepareAccessors(). Interestingly, it only caused a problem when our result set included a LOB. For such queries, it allocates a 1MB object, regardless of whether the query returned any results at all.

Google searches revealed others who saw similar problems when accessing LOBs, but no solutions other than upgrading or changing drivers. We were already using the latest Oracle JDBC driver, and reverting to earlier drivers did not help. Switching drivers did solve the problem; however, pushing the change to production would require extensive testing to ensure that we were not trading in one problem for another (or more).

I was about to start conducting these tests when John discovered that we were setting the OracleConnection parameter “defaultRowPrefetch” to 1000. This parameter determines how many rows are pulled back from the database on each round-trip, and increasing this value from its default of 10 will normally yield a performance gain. As an experiment, I set the value to 1, and re-profiled memory allocation. The amount of memory allocated by OracleStatement.prepareAccessors() decreased by about three orders of magnitude. Thus, it appears that when a query can return a LOB, Oracle’s JDBC driver allocates approximately “rowPrefetch” KB of memory, even when zero rows are returned.
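
For reference, here is a minimal sketch of how a connection-wide prefetch like ours might be configured. The class and method names are made up for illustration; "defaultRowPrefetch" is the connection property discussed above, and the resulting Properties object would be handed to DriverManager.getConnection(url, props).

```java
import java.util.Properties;

// Hypothetical helper; only the "defaultRowPrefetch" key comes from the post.
final class PrefetchConfig {
    private PrefetchConfig() {}

    /** Builds connection properties with a connection-wide row prefetch. */
    public static Properties connectionProps(String user, String password, int rowPrefetch) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Applies to every statement created from the connection,
        // which is why a large value also hit our LOB queries.
        props.setProperty("defaultRowPrefetch", Integer.toString(rowPrefetch));
        return props;
    }
}
```

Because the value is set once per connection, every query inherits it, whether or not it can return a LOB.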

The Solution

Returning the “defaultRowPrefetch” parameter to 10 did rid us of our garbage collection problems. However, because this is a global setting, reverting it reduced the performance of many other queries which returned many rows with no LOBs. The prospect of setting “rowPrefetch” on a per-query basis was unappetizing, to say the least, but the performance loss was significant. In the end, we altered how we retrieve rows from the database so that the fetch size geometrically increases as we pull results back from the database.

Specifically, the first batch we retrieve contains at most 10 rows, after which we increase the batch size to 20. Once we’ve retrieved 20 more rows, we increase the fetch size to 40, and so on. In this way, we never allocate large amounts of memory for queries which return few (or no) results, but we still quickly ramp up to a large fetch size.
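
The ramp-up can be sketched roughly as follows. This is not our production code: the class, method names, and the cap parameter are assumptions for illustration, and the only JDBC call used is the standard ResultSet.setFetchSize(), which the driver treats as a hint for subsequent round-trips.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a geometrically growing fetch size.
final class FetchSizeRamp {
    private FetchSizeRamp() {}

    /** Doubles the fetch size, never exceeding max. */
    public static int next(int current, int max) {
        return (int) Math.min((long) current * 2, max);
    }

    /** Batch sizes requested to cover totalRows rows (at least one batch). */
    public static List<Integer> schedule(int totalRows, int initial, int max) {
        List<Integer> sizes = new ArrayList<>();
        int size = initial;
        int remaining = totalRows;
        do {
            sizes.add(size);
            remaining -= size;
            size = next(size, max);
        } while (remaining > 0);
        return sizes;
    }

    /** Walks a result set, raising the fetch-size hint after each full batch. */
    public static int countRows(ResultSet rs, int initial, int max) throws SQLException {
        int size = initial;
        int inBatch = 0;
        int total = 0;
        rs.setFetchSize(size);
        while (rs.next()) {
            total++;
            if (++inBatch == size) {
                size = next(size, max);
                rs.setFetchSize(size);
                inBatch = 0;
            }
        }
        return total;
    }
}
```

For example, schedule(35, 10, 1000) yields [10, 20, 40], while a query that returns nothing only ever asks for the initial 10 rows.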

For large queries which returned no LOBs, this solution is still slower than when “defaultRowPrefetch” was 1000. However, the slowdown on those queries was minor, overall system performance was substantially improved, and, importantly, the changes did not require any per-query tuning.

Tips and tricks for immutable objects

Tuesday, February 20th, 2007

IBM DeveloperWorks has a good article on tips and tricks when creating immutable objects. Everyone knows that a good way to enforce thread safety without locks or explicit customization is to use immutable objects. This article covers some of the tricky parts of doing that:

  • You have to remember to make the members final.
  • You can cache derived state without computing it in the constructor if you store it in a write-once field. (Which, interestingly, does not mean it’s only written once. It means that it’s only ever written with one value.)
  • The write-once trick works for both primitives and object references, but with objects there’s no guarantee that you see the same object reference between accesses.
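
As a rough illustration of the first two points (my own sketch, not code from the article), here is an immutable class with final members and a lazily cached hash code, in the style of String.hashCode(). The class name and fields are invented for the example.

```java
// Hypothetical immutable value class with a write-once cached hash.
final class ImmutablePoint {
    private final int x;
    private final int y;
    // Write-once cache: 0 means "not computed yet". Several threads may
    // race to write it, but they all write the same value.
    private int hash;

    public ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    @Override
    public int hashCode() {
        int h = hash;              // read the field once into a local
        if (h == 0) {
            h = 31 * (31 * 17 + x) + y;
            hash = h;              // always the same value for this object
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ImmutablePoint)) {
            return false;
        }
        ImmutablePoint p = (ImmutablePoint) o;
        return x == p.x && y == p.y;
    }
}
```

If the computed hash happens to be 0, it is simply recomputed on every call, which is harmless for a deterministic computation like this one.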

Read the whole thing.

Thoughts on security

Wednesday, February 14th, 2007

A quick read on the pitfalls of designing computer security, The Six Dumbest Ideas In Security:

Let me introduce you to the six dumbest ideas in computer security. What are they? They’re the anti-good ideas. They’re the braindamage that makes your $100,000 ASIC-based turbo-stateful packet-mulching firewall transparent to hackers. Where do anti-good ideas come from? They come from misguided attempts to do the impossible - which is another way of saying “trying to ignore reality.” Frequently those misguided attempts are sincere efforts by well-meaning people or companies who just don’t fully understand the situation, but other times it’s just a bunch of savvy entrepreneurs with a well-marketed piece of junk they’re selling to make a fast buck. In either case, these dumb ideas are the fundamental reason(s) why all that money you spend on information security is going to be wasted, unless you somehow manage to avoid them.

A well-written piece that’s worth reading for anyone who’s implementing computer security, either at an operational level or as a software engineer.

Add speel checking to your Swng text components (the squiggly way)

Monday, February 12th, 2007




Marking up txt

Let’s hook Swing text components up to some tokenizing logic: a spell checker (the example above uses Jazzy), a regex (the example above will pick out some electronic musicians), or something more advanced.

Like all Swing components, text components are factored into an MVC setup. The model is javax.swing.text.Document; the view is javax.swing.plaf.TextUI (which delegates out to a javax.swing.text.View, which is generated from some ViewFactory); and the controller is the text component itself. A very simple way to add the notion of a token to this setup is to create a new kind of Document – a TokenDocument.

When text is inserted into a token document, not only does it need to be tokenized, but existing tokens need to be shifted. We could do this manually; however, every javax.swing.text.Document provides something called a sticky position (javax.swing.text.Position) that will do the shifting for us. Sticky positions are automatically updated by the document to reflect insertions and deletions of text. They also are guaranteed to maintain their ordering – that is, if position A is <= position B, it will always remain that way. This means the token document can maintain sorted trees of sticky positions (to store tokens) without worrying that their sort order will change.
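
To see sticky positions at work outside of the TokenDocument, here is a small sketch using a stock PlainDocument. The class name and sample text are made up; createPosition(), insertString(), and getOffset() are the standard javax.swing.text calls.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;
import javax.swing.text.Position;

// Hypothetical demo: marks the token "wrold" and inserts text before it.
final class StickyPositionDemo {
    private StickyPositionDemo() {}

    /** Returns the token's start and end offsets after an insertion at offset 0. */
    public static int[] offsetsAfterInsert() {
        try {
            PlainDocument doc = new PlainDocument();
            doc.insertString(0, "hello wrold again", null);

            // Token "wrold" occupies offsets [6, 11).
            Position start = doc.createPosition(6);
            Position end = doc.createPosition(11);

            // Insert four characters in front of the token.
            doc.insertString(0, "oh, ", null);

            // The document has shifted both positions for us.
            return new int[] { start.getOffset(), end.getOffset() };
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After the insertion the two positions report 10 and 15, so the token still brackets "wrold" without any bookkeeping on our side.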

Once we have the tokens in the model, we need to hook them into the view. We do this through a custom TextUI. It basically does everything a BasicTextUI does except it also paints a token layer underneath the highlights (above the background). In general with the javax.swing.text package, whenever code starts painting outside of the view bounds (for each offset, this is the tightest bounding box for the letter at that offset), the dirty region needs to be expanded to include everywhere that was painted. In this code, you’ll see a line in the UI to deal with the dirty region.

Playing wth lines

Custom strokes like the squiggle stroke (above left) and smoothed noise stroke (above right) help give meaning to lines. Also, they can make an interface more fun.
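
A full java.awt.Stroke implementation is more than fits here, so the following is a simpler stand-in rather than the stroke used in the demo: it just builds a zigzag path between two x coordinates, which could then be drawn with Graphics2D.draw(). The class name and parameters are invented for the example.

```java
import java.awt.geom.Path2D;

// Hypothetical helper: a zigzag underline, not the demo's squiggle stroke.
final class Squiggle {
    private Squiggle() {}

    /**
     * Builds a zigzag from (x1, y) to (x2, y) that alternates between
     * y - amplitude and y + amplitude every step units.
     */
    public static Path2D underline(double x1, double x2, double y,
                                   double step, double amplitude) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(x1, y);
        boolean up = true;
        for (double x = x1 + step; x < x2; x += step) {
            path.lineTo(x, up ? y - amplitude : y + amplitude);
            up = !up;
        }
        path.lineTo(x2, y);   // finish back on the baseline
        return path;
    }
}
```

Note that the path extends amplitude units above and below the baseline, which is exactly the kind of painting that requires growing the dirty region.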

Wrap pu

In this code, we extended the UI to paint lines under text. To change the display of the text itself, we would have to write our own View implementation (or more likely, extend PlainView [0]). This is not exclusive of the approach we took here. A more powerful View implementation could work in tandem with our custom UI, opening up even more ways to present information extracted from user-entered text.

Until next time!

[0] There’s a great introduction to this at Customizing a Text Editor, an article on the Sun Developer Network.

LICENSE — I wrote this using the Jazzy spell check engine + some open source trinkets (especially a Perlin noise generator). Except Jazzy (which is LGPL’d), all of it is Apache/BSD licensed.

Pipes: using unix pipelines for beautiful answers to quick and dirty questions

Wednesday, February 7th, 2007
/loony/bin

As we approach a release at Palantir we usually cut to a stable branch that QA can start testing as a release candidate. Further bug fixing and testing may continue on trunk by the developers, but we code review changes before committing them to the stable branch. As the time to really cut the release gets truly imminent we start asking questions like:

What changes are on trunk that are not in the stable branch?

We’re less concerned with what the changes are and more concerned with who owns the changes. What we really want to know is:

Do the changes on trunk represent pending changes that should be moved to stable or are they further development that shouldn’t be put into the stable branch for this release?

For the most part, the person who can answer that question is the coder who made the changes on trunk. To that end, what we would really love to have is a report of all files in trunk that differ from the stable branch and who last touched each file. There isn’t really an svn command that will do this succinctly, so I started thinking about how to accomplish this. I had an inkling that it could all be solved with a single Unix pipeline and so I set out on my way to craft such a beast. Here’s what I came up with in about ten minutes:

for name in `diff -r --brief --exclude=.svn pgstable/src pgtrunk/src  | awk '{print $4}' | grep pgtrunk `; do
    author=`svn info $name | grep -E "Last Changed Author" | awk '{print $4}'`;
    echo $author    $name;
done | sort | sed 's/pgtrunk\/src\///' > difflist.txt

Which produces output that looks like this:

gbush com/palantir/foo/Bar.java
bclinton com/palantir/baz/Fargle.java

How did I come up with such a beast? I deconstruct this inscrutable wonder after the jump.