These benchmarks were run on the same machine, VM, and OS described in my earlier Lucene indexing performance benchmarks. The index is a little different, however: it holds title, author, journal name, keywords, and similar metadata (no full text) for 25.6M journal articles. The index is 19GB and, using the same framework as the previous benchmarks, took 4.25 hours to build. YMMV.
The test set was 2900 user queries, ranging from single-word queries to queries of more than 600 characters that use multiple fields and operators (no range queries). Lucene was pre-warmed with 2000 different queries before measurement. Ten runs were performed and the results averaged.
Below are the results, plotting requests per second handled against the number of threads making requests. Everything ran in the same VM, using an instance of java.util.concurrent.ThreadPoolExecutor to parallelize the requests:
The best results were for 6 or 7 threads. Interestingly, throughput flattens out at around 32 threads and stays steady until about 1024 threads.
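The threaded setup described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the class name `ThroughputSketch`, the method `requestsPerSecond`, and the `fakeSearch` stand-in are all hypothetical, and `fakeSearch` only burns a little CPU where the real test would run a query against the shared IndexSearcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ThroughputSketch {

    // Hypothetical stand-in for one search request; the real benchmark
    // would execute a parsed query against the shared IndexSearcher here.
    static int fakeSearch(String query) {
        int h = 0;
        for (int i = 0; i < 10_000; i++) {
            h = 31 * h + query.hashCode() + i;
        }
        return h;
    }

    // Runs every query once on a fixed-size pool and returns requests/second.
    static double requestsPerSecond(List<String> queries, int threads) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String q : queries) {
                tasks.add(() -> fakeSearch(q));
            }
            long start = System.nanoTime();
            // invokeAll blocks until every submitted task has completed.
            List<Future<Integer>> results = pool.invokeAll(tasks);
            long elapsedNanos = System.nanoTime() - start;
            for (Future<Integer> f : results) {
                f.get(); // surfaces any exception thrown inside a task
            }
            return queries.size() / (elapsedNanos / 1e9);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder queries; the count mirrors the 2900-query test set.
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 2900; i++) {
            queries.add("query " + i);
        }
        for (int threads : new int[] {1, 2, 4, 8, 16, 32}) {
            System.out.printf("%d threads: %.1f requests/second%n",
                    threads, requestsPerSecond(queries, threads));
        }
    }
}
```

A real run would warm up first and average several runs per thread count, as described above; the sketch measures a single pass only.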
Of course, we were interested in the wait times of end-users, so below I've plotted the average wait times of users. It is calculated:
It is an approximation of course, but good enough to get a general idea.
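One common way to estimate an average wait time from throughput figures like these is Little's law: the mean time a request spends in the system is the mean number of requests in flight divided by the throughput. The sketch below assumes that approach for illustration only; it is not necessarily the calculation used for the plot, and the class name, method name, and sample numbers are hypothetical.

```java
final class WaitTimeSketch {

    // Little's law: mean time in system = requests in flight / throughput.
    // Assumption: every thread always has exactly one request outstanding,
    // so the number of threads equals the number of requests in flight.
    static double averageWaitSeconds(int concurrentRequests, double requestsPerSecond) {
        if (requestsPerSecond <= 0.0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return concurrentRequests / requestsPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical figures, not taken from the benchmark results.
        int threads = 8;
        double throughput = 200.0;
        System.out.println("average wait (s): " + averageWaitSeconds(threads, throughput));
    }
}
```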
As you can see below, the average wait times of requests are:
- less than 0.08 seconds for up to 32 requests / second
- less than 0.5 seconds for up to 192 requests / second
- less than 1 second for up to 256 requests / second
- less than 2 seconds for up to 768 requests / second
I am running a much larger query set of ~900k queries as we speak, but I don't think it will be finished for another day or so. I will post the results when they are ready, although preliminary results suggest that the performance on this query data set will be poorer (probably due to the nature of the queries: many "b*" types of query terms).
I am going to clean up the code that does this testing and release it in the next week or so.
PS. The plots were done using gnuplot. Thanks, gnuplot!
Update 2008-06-10: As pointed out in some follow-ups to my original posting on the Lucene User list for these benchmarks, I left out some details:
- The index format was the compound format.
- No command line arguments were passed to the Java VM.
- One IndexSearcher is shared across all threads.