Zzzoot

Posts

Showing posts from November, 2008

The (near) Future of Research Articles

November 27, 2008

Rod Page's demo for his Elsevier Grand Challenge submission ("Towards realising Darwin’s dream: setting the trees free") shows the type of enrichment of biological (if not all research) articles that is quickly becoming possible. Taking a published article ("Mitochondrial paraphyly in a polymorphic poison frog species (Dendrobatidae; D. pumilio)"), various additional biological, geographical and other metadata are extracted and added to a web page for the article. These include:
- Map showing all localities mentioned in the paper, with their enclosing polygon
- List of other studies which have samples in the area enclosed by the study polygon
Each of the following is linked through to its underlying database (such as NIH accession number and NCBI nucleotide viewer) or linked to a ubio taxonomic name viewer record:
- List of sequence features (such as genes) in the article
- List of taxa sequenced in the article
- List of gene sequences cited by the article
- An image c...

Lucene 2.3.1 vs 2.4 benchmarks using LuSql

November 24, 2008

I have been running some indexing performance tests with LuSql, and have some numbers comparing Lucene 2.3.1 with 2.4. Despite some discussion about 2.4 having poorer indexing performance, my tests with LuSql 0.9 suggest otherwise:

Lucene 2.3.1
Number of records added= 2000000
Optimizing index
Closing index
Optimizing index time: 311 seconds
Closing JDBC: result set
Closing JDBC: statement
Closing JDBC: connection
***********
Elapsed time: 854 seconds 15m 18s

Lucene 2.4
Number of records added= 2000000
Optimizing index
Closing index
Optimizing index time: 322 seconds
Closing JDBC: result set
Closing JDBC: statement
Closing JDBC: connection
***********
Elapsed time: 759 seconds 12m 39s

Index size: 3.7GB. It is interesting that the overall elapsed time is noticeably lower with 2.4 (759 vs. 854 seconds), while the optimizing time is slightly higher (322 vs. 311 seconds). Data, hardware and system configuration: as per my previous Lucene benchmarking. Note that this is a simple benchmark, so YMWV. This benchmark was done with the LuSql de...
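To put the two runs side by side, here is a minimal sketch that takes the reported seconds figures at face value and works out the relative change and a rough throughput. The names `percent_change` and `records_per_second` are only illustrative helpers, not part of LuSql or Lucene.

```python
# Figures copied from the benchmark output in this post (all in seconds).
RECORDS = 2_000_000
ELAPSED = {"2.3.1": 854, "2.4": 759}
OPTIMIZE = {"2.3.1": 311, "2.4": 322}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new in percent (negative means less time)."""
    return (new - old) / old * 100.0


def records_per_second(records: int, seconds: float) -> float:
    """Average throughput over the whole run, including the optimize step."""
    return records / seconds


if __name__ == "__main__":
    elapsed = percent_change(ELAPSED["2.3.1"], ELAPSED["2.4"])
    optimize = percent_change(OPTIMIZE["2.3.1"], OPTIMIZE["2.4"])
    print(f"Elapsed time change:  {elapsed:+.1f}%")
    print(f"Optimize time change: {optimize:+.1f}%")
    for version, seconds in ELAPSED.items():
        rate = records_per_second(RECORDS, seconds)
        print(f"Lucene {version}: {rate:.0f} records/second")
```

On these numbers the total elapsed time drops by roughly 11% while the optimize step takes about 3.5% longer. A single run per version is not enough to treat either figure as more than indicative.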

Asian Digital Libraries 2008 Proceedings

November 19, 2008

Proceedings of the 11th International Conference on Asian Digital Libraries, ICADL 2008, Bali, Indonesia, December 2-5, 2008 are now available:
- DL2Go: Editable Digital Libraries in the Pocket. Hyunyoung Kil, Wonhong Nam, Dongwon Lee.
- Hierarchical Classification of Web Pages Using Support Vector Machine. Yi Wang, Zhiguo Gong.
- The Prevalence and Use of Web 2.0 in Libraries. Alton Yeow Kuan Chua, Dion Hoe-Lian Goh, Chei Sian Lee.
- Usability of Digital Repository Software: A Study of DSpace Installation and Configuration. Nils Körber, Hussein Suleman.
- Developing a Traditional Mongolian Script Digital Library. Garmaabazar Khaltarkhuu, Akira Maeda.
- Weighing the Usefulness of Social Tags for Content Discovery. Khasfariyati Razikin, Dion Hoe-Lian Goh, Chei Sian Lee, Alton Yeow Kuan Chua.
- A User Reputation Model for DLDE Learning 2.0 Community. Fusheng Jin, Zhendong Niu, Quanxin Zhang, Haiyang Lang, Kai Qin.
- Query Relaxation Based on Users Unconfidences on Query Terms and Web K...

Software Announcement: LuSql: Database to Lucene indexing

November 17, 2008

LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application that constructs a Lucene index from an arbitrary SQL query against a JDBC-accessible SQL database. It lets the user control a number of parameters, including the SQL query to use, the indexing/storage/term-vector nature of individual fields, the analyzer, the stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple cores. LuSql can handle complex queries, allows for additional per-record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver. LuSql has been extensively tested, including on a large collection of 6+ million full-text and article-metadata documents, producing an 86GB Lucene index. I am the author of the LuSql software.
LuSql at CISTI Lab
LuSql at freshmeat
Upda...
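The general idea of turning the rows of an SQL query into indexed documents can be illustrated with a small toy. This is not LuSql and does not use Lucene or JDBC: it is a conceptual sketch in Python that swaps in the standard-library `sqlite3` module and a plain in-memory inverted index. The `article` table, its columns, and the `build_index` and `demo` functions are all hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(conn, query, stop_words=frozenset()):
    """Run `query`; treat each result row as one document.

    Returns (docs, postings): docs is a list of {column: value} dicts, and
    postings maps (field, term) to the set of positions in docs containing it.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    docs = []
    postings = defaultdict(set)
    for doc_id, row in enumerate(cur):
        doc = dict(zip(columns, row))
        docs.append(doc)
        for field, value in doc.items():
            if value is None:
                continue
            # Very crude analyzer: lowercase, split on non-alphanumerics.
            for term in TOKEN.findall(str(value).lower()):
                if term not in stop_words:
                    postings[(field, term)].add(doc_id)
    return docs, postings


def demo():
    """Build a tiny in-memory database and index it (all data is made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    conn.executemany(
        "INSERT INTO article (title, abstract) VALUES (?, ?)",
        [
            ("Poison frog phylogeny", "Mitochondrial sequences of the frog"),
            ("Indexing databases", "Building a search index of articles"),
        ],
    )
    docs, postings = build_index(
        conn,
        "SELECT id, title, abstract FROM article ORDER BY id",
        stop_words={"the", "of", "a"},
    )
    conn.close()
    return docs, postings


if __name__ == "__main__":
    docs, postings = demo()
    for doc_id in sorted(postings[("title", "frog")]):
        print(docs[doc_id]["title"])
```

The query string is the only thing that decides what ends up in each document, which is the appeal of the approach: joins, filters and column aliases all live in SQL rather than in indexing code.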

New Book: Semantic Digital Libraries

November 12, 2008

I am looking forward to getting hold of this just-announced book, Semantic Digital Libraries. Editors: Sebastian Ryszard Kruk, DERI NUI, Galway; Bill McDaniel, DERI NUI, Galway. Springer-Verlag, Heidelberg (DE) 2009, XVI, 246 p. 1 illus., Hardcover ISBN: 978-3-540-85433-3. The site for the book includes the Tutorial on Semantic Digital Libraries, presented at JCDL2008, as well as a faceted searchable interface to the (extensive and useful) links described in the book.

Contents
Introduction
Part I - Introduction to Digital Libraries and Semantic Web
  Digital Libraries and Knowledge Organization
  Semantic Web and Ontologies
  Social Semantic Information Spaces
Part II - A Vision of Semantic Digital Libraries
  Goals of Semantic Digital Libraries
  Architecture of Semantic Digital Libraries
  Long-time Preservation
Part III - Ontologies for Semantic Digital Libraries
  Bibliographic Ontology
  Community-aware Ontologies
Part IV - Prototypes of Semantic Digital Libraries
  JeromeDL...

Opportunistic Software Systems Development

November 03, 2008

In the 25th anniversary issue (November/December 2008, vol. 25, no. 6) of IEEE Software, my NRC colleague Anatol Kark is part of the editorial team for the special issue on "Opportunistic Software Systems Development". These are all great articles, and I particularly like the Jansen et al. article ("Pragmatic and Opportunistic Reuse in Innovative Start-up Companies") and feel that almost everyone who is trying to bring their organization's IT into the 21st century should be forced to read the Gamble et al. article ("Monoliths to Mashups: Increasing Opportunistic Assets").
Cornelius Ncube, Patricia Oberndorf, Anatol W. Kark, "Opportunistic Software Systems Development: Making Systems from What's Available," IEEE Software, vol. 25, no. 6, pp. 38-41, Nov/Dec, 2008
Slinger Jansen, Sjaak Brinkkemper, Ivo Hunink, Cetin Demir, "Pragmatic and Opportunistic Reuse in Innovative Start-up Companies," IEEE Software, vol. 25, no. 6, p...