Palantir Finance Applied to Log4J Data
August 26th, 2009
In a previous post, Eric W. covered how we analyze polled system health information. Now we’ll look at pushed information, in the form of logging events.
Use Cases & Constraints
We decided on three kinds of questions we wanted to answer:
- What is the health of the deployment?
  - Example: What errors have occurred in the last 24 hours?
- Which parts of the platform are our users engaged with?
  - Example: How much time do users spend in each application?
- How is our server performing over time?
  - Example: What is the average wait on a search query?
The chief constraint was that we build our platform on Log4J. We already use Log4J all over the project, so switching to a different logging framework was out of the question. Besides, Log4J provides a guideline for the kind of metadata our events should support, and it makes it easy to record events to a database.
That left us with two problems to solve: how to store structured data with a Log4J message, and how to analyze the collected data.
Analysis is the easy part: just use Palantir! After all, a sequence of logging events has a lot in common with a time series. The rest is explained below.
Recording Structured Data
Consider the problem of plotting usage by a user on a given day. The simplest approximation is to log an event every time an application is closed, and to include the time spent in the application with that event. Posting the information as an unstructured String (e.g. “Andrew spent 46 seconds in Chart”) makes it difficult to extract the data later for analysis. To solve this problem we developed the class RichLogEntry.
A RichLogEntry contains a human-readable message plus tagged data in the form of a set of key/value pairs, such as {duration, 46}, {user, Andrew}. This adds to the up-front cost, since log messages become more complex, but the benefit is that the analysis engine can access the data in events easily and generically.
Furthermore, RichLogEntry plays nicely with existing Log4J infrastructure. Loggers in Log4J already accept an arbitrary Object to pass to Appenders, and Log4J’s default Appenders call toString() on the Objects provided. For RichLogEntry the toString() is simply the human-readable message. So a call to the logging framework with a RichLogEntry would look like (pseudo-Java):
logger.info( ("Andrew just spent 46 seconds in Chart",
              {"duration" : 46.0, "application" : "Chart", "user" : "Andrew"}) );
For most Appenders this would produce the human-readable String, but our custom Appender knows how to store the tagged data for later analysis.
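To make the idea concrete, here is a minimal sketch in plain Java. It is not the actual Palantir class: the constructor, the getData() accessor, and the TaggedDataSink stand-in are all assumptions made for illustration, since this post only describes RichLogEntry's behavior. In a real Log4J 1.x deployment the instanceof check would presumably live in a custom Appender (for example a subclass of AppenderSkeleton that inspects LoggingEvent.getMessage()); the stand-in below avoids the Log4J dependency so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the post does not show RichLogEntry's real API.
final class RichLogEntry {
    private final String message;
    private final Map<String, Object> data;

    RichLogEntry(String message, Map<String, Object> data) {
        this.message = message;
        // Copy the tags so later changes to the caller's map cannot alter the entry.
        this.data = Collections.unmodifiableMap(new LinkedHashMap<>(data));
    }

    Map<String, Object> getData() {
        return data;
    }

    // Appenders that only call toString() see the human-readable message.
    @Override
    public String toString() {
        return message;
    }
}

// Stand-in for the core logic of a custom Appender (assumed, no Log4J dependency).
final class TaggedDataSink {
    private final List<Map<String, Object>> stored = new ArrayList<>();

    // Accepts any logged Object and returns the text a default Appender would print.
    String append(Object message) {
        if (message instanceof RichLogEntry) {
            // Only rich entries carry tagged data worth storing for analysis.
            stored.add(((RichLogEntry) message).getData());
        }
        return String.valueOf(message);
    }

    List<Map<String, Object>> getStored() {
        return stored;
    }
}

class RichLogEntryDemo {
    public static void main(String[] args) {
        Map<String, Object> tags = new LinkedHashMap<>();
        tags.put("duration", 46.0);
        tags.put("application", "Chart");
        tags.put("user", "Andrew");

        TaggedDataSink sink = new TaggedDataSink();
        System.out.println(sink.append(
                new RichLogEntry("Andrew just spent 46 seconds in Chart", tags)));
        System.out.println(sink.append("a plain String message"));
        System.out.println(sink.getStored());
    }
}
```

Plain String messages pass through untouched, which is the property that lets rich and ordinary log calls coexist in the same code base.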
Example: Application Usage
We implemented the above (i.e., we log a “duration” message each time a Palantir application loses focus) and hooked the data in with a Palantir Data Provider plugin.
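As a rough illustration of the "log a duration when focus is lost" step, the sketch below tracks when an application gained focus and produces the tagged data when it loses focus. The FocusTracker class, its method names, and the choice to pass timestamps in explicitly are all hypothetical; the post does not describe how the real client detects focus changes. The tag names match the ones used in the pseudo-Java above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns focus gained/lost callbacks into "duration" tags.
final class FocusTracker {
    private final String user;
    private String activeApplication; // null while no application has focus
    private long focusGainedAtMillis;

    FocusTracker(String user) {
        this.user = user;
    }

    void focusGained(String application, long nowMillis) {
        activeApplication = application;
        focusGainedAtMillis = nowMillis;
    }

    // Returns the tags to log, or null if no application had focus.
    Map<String, Object> focusLost(long nowMillis) {
        if (activeApplication == null) {
            return null;
        }
        Map<String, Object> tags = new LinkedHashMap<>();
        // Durations are recorded in seconds, as in the example message.
        tags.put("duration", (nowMillis - focusGainedAtMillis) / 1000.0);
        tags.put("application", activeApplication);
        tags.put("user", user);
        activeApplication = null;
        return tags;
    }
}

class FocusTrackerDemo {
    public static void main(String[] args) {
        FocusTracker tracker = new FocusTracker("Andrew");
        long start = System.currentTimeMillis();
        tracker.focusGained("Chart", start);
        // Pretend the user switched windows 46 seconds later.
        System.out.println(tracker.focusLost(start + 46_000));
    }
}
```

Passing the timestamp in, rather than reading the clock inside the tracker, keeps the logic easy to exercise without waiting on real time.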
In the image below, we’re using our Explorer application to analyze the logging data. Our filter framework combines filtering and visualization into a single application. The image contains three filters from top to bottom, each containing a blue title bar. The results of each filter are fed into the filter below it:
- The top filter separates messages by application and displays statistics for each. We’ve selected the Explorer application, so its 144 messages will be fed into the next filter.
- The middle filter has a histogram of the number of seconds each Explorer instance was active (on a log scale). Each “bucket” represents a range of durations, and its height is the number of messages whose duration falls in that range. It looks like I usually spend around 10 seconds in Explorer before switching windows. We’re selecting the gray part of the histogram to avoid skewing our results for the times I’ve gone AFK with Palantir running.
- The bottom filter counts the number of log events over time.
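The log-scale histogram in the middle filter can be approximated in a few lines. This is only a sketch of the bucketing idea, not how Explorer computes it: the LogScaleHistogram class is hypothetical, and the decade-wide buckets (1 to 10 seconds, 10 to 100 seconds, and so on) are an assumption chosen for simplicity.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of log-scale bucketing with one bucket per power of ten.
final class LogScaleHistogram {
    // Bucket k holds durations in [10^k, 10^(k+1)) seconds.
    static int bucketOf(double seconds) {
        return (int) Math.floor(Math.log10(seconds));
    }

    // Counts how many durations fall into each bucket, ordered by bucket index.
    static Map<Integer, Integer> histogram(double[] durationsSeconds) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (double seconds : durationsSeconds) {
            if (seconds <= 0) {
                continue; // log10 is undefined for non-positive durations
            }
            counts.merge(bucketOf(seconds), 1, Integer::sum);
        }
        return counts;
    }
}

class LogScaleHistogramDemo {
    public static void main(String[] args) {
        double[] durations = {3, 8, 12, 46, 950};
        System.out.println(LogScaleHistogram.histogram(durations));
    }
}
```

Selecting a range of buckets, as with the gray region described above, then amounts to keeping only the events whose bucket index falls inside the chosen interval.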
In Palantir Finance, filters such as these can be saved and used anywhere in the platform. Let’s do that, and compare my usage to my coworker Eric L’s. Creating a new set of filters for Eric is easy: I just modify a single filter above to specify him instead of me (the filter isn’t shown for simplicity’s sake), and then save a new copy. Our Chart application is a good place to view the two series side by side:
Of course, I’m the harder worker!
Conclusion
Our logging framework is complete, and we’ve found many new use cases. We use the framework to:
- analyze performance metrics across builds of both Palantir products
- automatically compile usage reports on deployed installations
- import and explore exotic event data sets by running the events through Log4J
Building the Log4J analysis framework was valuable, fun, and easy, and it demonstrates the flexibility of Palantir Finance for working with arbitrary data sets.







