Software Announcement: LuSql: Database to Lucene indexing

November 17, 2008

LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application that constructs a Lucene index from an arbitrary SQL query against any JDBC-accessible SQL database. The user controls a number of parameters, including the SQL query to use, the per-field indexing, storage, and term-vector settings, the analyzer, the stop word list, and other tuning parameters. In its default mode it uses multiple threads to take advantage of multiple cores.

LuSql can handle complex queries, allows for additional per-record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver.
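To make the idea concrete, here is a rough sketch (not LuSql's actual code) of the mapping described above: each row of a query result becomes a document of named fields, each field carries its own indexed/stored settings, and the conversion is spread over a pool of worker threads. It uses only the JDK so that it runs without Lucene or a database. Rows are plain maps standing in for a JDBC result set, and the names `RowIndexingSketch`, `FieldSpec`, `SketchField`, `rowToDocument`, and `buildDocuments` are all hypothetical. In a real indexer, each field would become a Lucene field and each document would be handed to a Lucene index writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from LuSql or Lucene.
class RowIndexingSketch {

    // Per-field settings: should the value be searchable, and should it be kept verbatim?
    static final class FieldSpec {
        final boolean indexed;
        final boolean stored;

        FieldSpec(boolean indexed, boolean stored) {
            this.indexed = indexed;
            this.stored = stored;
        }
    }

    // One field of a document: column name, value, and the settings applied to it.
    static final class SketchField {
        final String name;
        final String value;
        final FieldSpec spec;

        SketchField(String name, String value, FieldSpec spec) {
            this.name = name;
            this.value = value;
            this.spec = spec;
        }
    }

    // Assumed default for columns without explicit settings: indexed and stored.
    static final FieldSpec DEFAULT_SPEC = new FieldSpec(true, true);

    // Turn one row (column name -> value) into a document. SQL NULLs are skipped.
    static List<SketchField> rowToDocument(Map<String, String> row,
                                           Map<String, FieldSpec> specs) {
        List<SketchField> doc = new ArrayList<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getValue() == null) {
                continue;
            }
            FieldSpec spec = specs.getOrDefault(column.getKey(), DEFAULT_SPEC);
            doc.add(new SketchField(column.getKey(), column.getValue(), spec));
        }
        return doc;
    }

    // Convert all rows on a fixed pool of worker threads; output keeps row order.
    static List<List<SketchField>> buildDocuments(List<Map<String, String>> rows,
                                                  Map<String, FieldSpec> specs,
                                                  int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<SketchField>>> pending = new ArrayList<>();
            for (Map<String, String> row : rows) {
                Callable<List<SketchField>> task = () -> rowToDocument(row, specs);
                pending.add(pool.submit(task));
            }
            List<List<SketchField>> docs = new ArrayList<>();
            for (Future<List<SketchField>> result : pending) {
                docs.add(result.get());
            }
            return docs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while building documents", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, passing a `specs` map that marks an `id` column as stored but not indexed leaves every other column at the assumed default, which is roughly the kind of per-field control the parameters above expose.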

LuSql has been extensively tested, including on a large collection of more than 6 million full-text and article-metadata documents, which produced an 86GB Lucene index.

I am the author of the LuSql software.

Update 2008 11 17 14:16:

Update 2008 11 17 22:00

Discussion on the Solr list

Comments

Daniel Lemire said…

Looks good. So, you basically can index documents stored in, say, mysql, right?

17 November, 2008 09:10

Glen Newton said…

Yes. Anything that is accessible through a JDBC driver.

The default is MySQL, but you just tell LuSql that you want to use another JDBC driver and give it the appropriate connect string.

Note that -- in addition to SQL databases -- there are JDBC drivers for CSV text (CsvJdbc) and Excel (xlSql) (I haven't tested them though...)

17 November, 2008 09:17
