Posts

Showing posts from June, 2008

If you only read one article on Cyberinfrastructure...

...it should probably be this one: Jelinkova, K., Carvalho, T., Kerian, D., Knosp, B., Percival, K., Yagi, S. (2008). Creating a Five-Minute Conversation About Cyberinfrastructure. EDUCAUSE Quarterly, 31(2), 78-82. Thanks, Roy Tennant et al., Current Cites, June 2008.

Re-reading "Godel, Escher, Bach: An Eternal Golden Braid"

I have decided to re-read Douglas Hofstadter's "Godel, Escher, Bach: An Eternal Golden Braid". When I first read it, 25+ years ago, it significantly changed how I looked at many things in the world, and I would describe it as a seminal book in my development. What I am wondering is whether I will have new revelations on re-reading it (I am quite sure I will), as both I and the world around me have changed, and what they might be. I am also curious what others have experienced when re-reading, later in life, books that were personally seminal early in life, and how they interpreted the nature of the new revelations. And I am sure Dr. Seuss is one author whose books many of us have re-read and re-interpreted many times! :-)

Lucene concurrent search performance with 1,2,4,8 IndexReaders

My last Lucene evaluation (Simultaneous (Threaded) Query Lucene Performance) from a couple of days ago looked at concurrent (multithreaded) queries using a single IndexReader across all threads. Due to suggestions and demand from the Lucene User mailing list, I have expanded the evaluation to include multiple IndexReaders. It is known that a single IndexReader is a limiting factor in a multithreaded environment. So I decided to run the same tests with 1, 2, 4 and 8 IndexReaders (actually I create the IndexReaders, then create an IndexSearcher from each of these and share the IndexSearchers). Below are the results. The test environment is the same as in my previous evaluation, except: (1) it goes up to 8192 threads instead of the original 4096 threads; (2) I had to pass -Xmx4000m to the Java VM because the VM was running out of heap for 8 readers; (3) I've made the graph larger (click on the graphic to see results). As you can see, 2, 4 and 8 readers significantly improve query rate over a
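
One way to share several searchers across many threads is a simple round-robin hand-out. The excerpt does not say how the IndexSearchers were actually shared, so the sketch below is only an assumption: `RoundRobinPool` is a hypothetical helper, written against a generic type so that it does not depend on any particular Lucene version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not from the original post): hands out shared
// items, e.g. one IndexSearcher per IndexReader, in round-robin order.
final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinPool(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("need at least one item");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.items = new ArrayList<>(items);
    }

    // Safe to call from many threads; floorMod keeps the index
    // non-negative even after the counter overflows.
    T next() {
        int i = Math.floorMod(counter.getAndIncrement(), items.size());
        return items.get(i);
    }
}
```

Each query thread would call `next()` to pick a searcher, so with N readers roughly 1/N of the threads contend on any one of them.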

Simultaneous (Threaded) Query Lucene Performance

I've recently had to do some query performance tests on Lucene (v2.3.1) under concurrent request load. These benchmarks were run on the same machine, VM and OS described in my earlier Lucene indexing performance benchmarks; however, the index is a little different: it is an index of title, author, journal name, keywords, etc. metadata (no full text) for 25.6M journal articles. The index size is 19GB and, using the same framework as the previous benchmarks above, indexing took 4.25 hours. YMMV. The tests used a set of 2900 user queries (ranging from single-word queries to queries with >600 characters and using multiple fields and operators; no range queries); Lucene was pre-warmed with 2000 (different) queries. Ten runs were performed and averaged. Below are the results, plotting #requests per second handled vs. #threads making requests. This was all run in the same VM, using an instance of java.util.concurrent.ThreadPoolExecutor to parallelize things: The best results were for 6 or 7 t
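
A minimal sketch of how requests per second could be measured with a ThreadPoolExecutor follows. It is not the original benchmark code: `QueryThroughput`, `requestsPerSecond` and the `runQuery` callback are hypothetical names, and the callback stands in for whatever actually executes a Lucene search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical harness (an assumption, not the author's code).
final class QueryThroughput {

    // Runs every query once on a fixed pool of nThreads threads and
    // returns the observed rate in requests per second.
    static double requestsPerSecond(List<String> queries, int nThreads,
                                    Consumer<String> runQuery) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String q : queries) {
                tasks.add(() -> { runQuery.accept(q); return null; });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has finished;
            // get() rethrows any exception raised by a query.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
            long elapsedNanos = Math.max(1L, System.nanoTime() - start);
            return queries.size() / (elapsedNanos / 1e9);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling this once per thread count, after a warm-up pass, and averaging over several runs would give one point per thread count on a requests-per-second vs. threads plot.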