Wednesday, June 11, 2008

Lucene concurrent search performance with 1,2,4,8 IndexReaders

My last Lucene evaluation, Simultaneous (Threaded) Query Lucene Performance, from a couple of days ago looked at concurrent (multithreaded) queries using a single IndexReader shared across all threads. Following suggestions and demand from the Lucene User mailing list, I have expanded the evaluation to include multiple IndexReaders.
A single IndexReader is known to be a limiting factor in a multithreaded environment, so I ran the same tests with 1, 2, 4 and 8 IndexReaders. (More precisely, I create the IndexReaders, create one IndexSearcher from each, and share the IndexSearchers among the threads.)
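
The sharing scheme described above (several IndexSearchers, each built from its own IndexReader, shared among the query threads) can be sketched roughly as follows. This is not the evaluation code, which has not been released yet. RoundRobinPool and its methods are hypothetical names, the sketch uses only the Java standard library, and plain strings stand in for the IndexSearchers. It also assumes the searchers are handed out in round-robin order, which the post does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not taken from the evaluation code: hands out one of
// N shared objects in round-robin order. It can be called from many threads
// as long as the pooled objects themselves are safe to share.
final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinPool(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("pool needs at least one item");
        }
        this.items = new ArrayList<>(items); // defensive copy
    }

    // Returns the next item; floorMod keeps the index non-negative even
    // after the counter wraps past Integer.MAX_VALUE.
    T next() {
        return items.get(Math.floorMod(counter.getAndIncrement(), items.size()));
    }

    int size() {
        return items.size();
    }
}

class RoundRobinPoolDemo {
    public static void main(String[] args) {
        // Strings stand in for four shared searchers (an assumption for
        // illustration; in the evaluation these would be IndexSearchers).
        List<String> searchers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            searchers.add("searcher-" + i);
        }
        RoundRobinPool<String> pool = new RoundRobinPool<>(searchers);
        for (int i = 0; i < 6; i++) {
            System.out.println(pool.next());
        }
    }
}
```

In a real run, each query thread would ask the pool for a searcher just before issuing its query, so that with 4 readers the load is spread over 4 independent IndexReaders instead of one.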
Below are the results. The test environment is the same as in my previous evaluation, except:

  • It goes up to 8192 threads instead of the original 4096 threads
  • I had to pass -Xmx4000m to the Java VM because it was running out of heap with 8 readers
  • I've made the graph larger

(Click on graphic to see results)

As you can see, 2, 4 and 8 readers significantly improve the query rate over a single shared reader between ~10 and 512 threads. The overall winner is 4 readers, which shows a marked improvement over 2 and 8 readers from 16 threads to about 512 threads. Beyond that point, all reader counts perform effectively the same.

I am not sure why 4 readers appear to be the sweet spot for this particular configuration. I will have to re-run this experiment with a finer granularity of readers (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14 and 16 readers). However, remember that this machine is a dual-CPU, dual-core configuration (4 real cores, hyper-threaded to 8 virtual cores): it may be that matching the number of readers to the number of physical cores improves things, perhaps through less state swapping. I am not an expert; with more evaluations we may be able to comment more intelligently on this.
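
A finer-grained re-run like the one proposed above could be driven by a small harness along these lines. This is only a sketch under stated assumptions, not the evaluation code: ReaderSweepSketch, runWorkers and queriesPerSecond are hypothetical names, the query is passed in as a plain Runnable so the example needs nothing beyond the Java standard library, and the measured time includes thread start-up, which the real evaluation may well treat differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness for illustration; not the evaluation code.
final class ReaderSweepSketch {
    // Reader counts proposed in the post for the finer-grained re-run.
    static final int[] READER_COUNTS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16};

    // Releases `threads` workers at the same moment, lets each run `query`
    // `queriesPerThread` times, and returns the number of completed queries.
    static long runWorkers(int threads, int queriesPerThread, Runnable query) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        CountDownLatch startSignal = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            exec.execute(() -> {
                try {
                    startSignal.await(); // wait until all workers are queued
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int q = 0; q < queriesPerThread; q++) {
                    query.run();
                    completed.incrementAndGet();
                }
            });
        }
        startSignal.countDown();
        exec.shutdown();
        try {
            if (!exec.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }

    // Completed queries per second for one thread-count setting. The elapsed
    // time includes creating and starting the worker threads.
    static double queriesPerSecond(int threads, int queriesPerThread, Runnable query) {
        long begin = System.nanoTime();
        long done = runWorkers(threads, queriesPerThread, query);
        double seconds = (System.nanoTime() - begin) / 1e9;
        return done / seconds;
    }
}
```

For each entry in READER_COUNTS, one would open that many readers, wrap a search against the shared searchers in the Runnable, and record queriesPerSecond for each thread count of interest. Comparing the curve at 4 readers with its neighbours at 3 and 5 would give a first hint as to whether the number of physical cores really matters.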

This is just one data point, but I hope it will be helpful.
I am still planning on releasing the code for the evaluation and the results plotting with gnuplot.

I would appreciate any feedback.

This work was done as part of a Lucene evaluation for my employer, CISTI, the National Research Council Canada. I work with Lucene as it is relevant to my research, with an example of some of my Lucene-based research here: Ungava.

5 comments:

Anonymous said...

Hi, Glen Newton. Could you share your test code?

Glen Newton said...

Yes, I am going to clean up the code, add some reasonable documentation and release it. I'll do that next week, as I am at JCDL2008 this week and don't have time.

I will announce here on Zzzoot and on the Lucene Users mailing list.
Thanks!

Anonymous said...

Thanks for the quick reply!

Mike Stoppelman said...

Any word on the code being released Glen?

Anonymous said...

Hi Glen, I wrote another message, but the 4 old comments about the code have suddenly appeared now?! I'm interested in the difference between your corpus and others bigger than this one: are the trend curves (time of parallel queries, time of indexing, ...) linear, or what? Thanks, Paolo