<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Palantir: search with a twist (part one: memory efficiency)</title>
	<atom:link href="http:///2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/</link>
	<description>Articles from the Engineering Group at Palantir Technologies</description>
	<lastBuildDate>Tue, 24 Jan 2012 09:51:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Katherine</title>
		<link>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/comment-page-1/#comment-240</link>
		<dc:creator>Katherine</dc:creator>
		<pubDate>Thu, 01 Oct 2009 23:20:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1088#comment-240</guid>
		<description>Mark,
  Sorry if the blog post didn&#039;t make it clear - we think Lucene is some of the best third-party search software out there, which is why we use it as the base index/search engine, and we&#039;re not trying to discourage people from using it. However, no generic library is going to fit our exact use case, and we&#039;re outlining some places where we&#039;ve run into that. The post also glossed over some technical details for the sake of brevity - apologies if you feel that amounted to a misrepresentation of Lucene.

About loading results: Lucene does use a PriorityQueue for the results, so our description would be substantially different from what actually happens if we didn&#039;t want access to all the matching results. Unfortunately, our use case requires that users be able to see all matching results if they want to - in order to get n results from out-of-the-box Lucene, even if you plan to break them into k pages of size n/k, you have to size the PriorityQueue to hold n results. This means that you&#039;re effectively loading all matching results - supporting paging over chunks of results is not built in.
 
 About RangeQueries: We started using Lucene in 2005, so some of our custom code has been replicated in later versions. Some of it hasn&#039;t - for instance, we use the same range-query logic for both ranges and broad wildcards, although only numeric ranges were discussed in the post. As far as I know, broad wildcards have no equivalent support as of Lucene 2.4 (the version we&#039;re currently using, and the most recent release at the time of the original post).
However, the 2.9 release that came out last week looks like it may have added similar functionality, using index-size-based cutoffs instead of the memory-size-based ones we use (I haven&#039;t traced through all the new source code yet, but that&#039;s what I&#039;ve seen so far). 
And, since Lucene is a generic platform, some custom code has been replicated in a more generic version. For example, Payloads, which were introduced after we started using Lucene, are similar to how we do security enforcement, but not as optimized for our use case.</description>
		<content:encoded><![CDATA[<p>Mark,<br />
  Sorry if the blog post didn&#8217;t make it clear &#8211; we think Lucene is some of the best third-party search software out there, which is why we use it as the base index/search engine, and we&#8217;re not trying to discourage people from using it. However, no generic library is going to fit our exact use case, and we&#8217;re outlining some places where we&#8217;ve run into that. The post also glossed over some technical details for the sake of brevity &#8211; apologies if you feel that amounted to a misrepresentation of Lucene.</p>
<p>About loading results: Lucene does use a PriorityQueue for the results, so our description would be substantially different from what actually happens if we didn&#8217;t want access to all the matching results. Unfortunately, our use case requires that users be able to see all matching results if they want to &#8211; in order to get n results from out-of-the-box Lucene, even if you plan to break them into k pages of size n/k, you have to size the PriorityQueue to hold n results. This means that you&#8217;re effectively loading all matching results &#8211; supporting paging over chunks of results is not built in.</p>
<p> About RangeQueries: We started using Lucene in 2005, so some of our custom code has been replicated in later versions. Some of it hasn&#8217;t &#8211; for instance, we use the same range-query logic for both ranges and broad wildcards, although only numeric ranges were discussed in the post. As far as I know, broad wildcards have no equivalent support as of Lucene 2.4 (the version we&#8217;re currently using, and the most recent release at the time of the original post).<br />
However, the 2.9 release that came out last week looks like it may have added similar functionality, using index-size-based cutoffs instead of the memory-size-based ones we use (I haven&#8217;t traced through all the new source code yet, but that&#8217;s what I&#8217;ve seen so far).<br />
And, since Lucene is a generic platform, some custom code has been replicated in a more generic version. For example, Payloads, which were introduced after we started using Lucene, are similar to how we do security enforcement, but not as optimized for our use case.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Harwood</title>
		<link>http://blog.palantirtech.com/2009/08/13/palantir-search-with-a-twist-part-one-memory-efficiency/comment-page-1/#comment-239</link>
		<dc:creator>Mark Harwood</dc:creator>
		<pubDate>Thu, 01 Oct 2009 16:26:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.palantirtech.com/?p=1088#comment-239</guid>
		<description>Some derogatory and misleading information here about Lucene. 
 
&gt;&gt;Vanilla Lucene uses the following algorithm for accumulating search results: Load all matching results.
 
No it doesn&#039;t and never has. PriorityQueues are used everywhere.
 
 
&gt;&gt;RangeQuery… has one very nasty property
 
That is why the Javadocs tell you not to use it and the QueryParser hasn&#039;t supported it as the default for quite some time.
The alternative approach you describe is implemented in ConstantScoreRangeQuery, which is the newer default in the query parser.
 
Are you working on very old versions of Lucene? If so, please target your comments at something vaguely recent or take the trouble to read the current documentation more closely before spreading bad advice.</description>
		<content:encoded><![CDATA[<p>Some derogatory and misleading information here about Lucene.<br />
 <br />
&gt;&gt;Vanilla Lucene uses the following algorithm for accumulating search results: Load all matching results.<br />
 <br />
No it doesn&#8217;t and never has. PriorityQueues are used everywhere.<br />
 <br />
 <br />
&gt;&gt;RangeQuery… has one very nasty property<br />
 <br />
That is why the Javadocs tell you not to use it and the QueryParser hasn&#8217;t supported it as the default for quite some time.<br />
The alternative approach you describe is implemented in ConstantScoreRangeQuery, which is the newer default in the query parser.<br />
 <br />
Are you working on very old versions of Lucene? If so, please target your comments at something vaguely recent or take the trouble to read the current documentation more closely before spreading bad advice.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

