Monday, October 08, 2007

Drill Clouds for Search Refinement

I'd like to introduce something I call drill clouds, an extension to tag clouds for search refinement in information retrieval.

I will be using an experimental Lucene-based search platform that I have developed, called Ungava (more on this later), which includes my implementation of drill clouds. Note that much of this post is derived from a post of mine on drill clouds on the CISTI Lab wiki.

Drill clouds extend tag clouds to make them a useful tool for search refinement: the user refines an existing query by interacting with the cloud, and each interaction adds a new element to the query. As this results in a kind of drill-down search behaviour, I have named these new clouds drill clouds. Some differences between traditional tag clouds and drill clouds:
  • Drill clouds are applied to search results and, as search results can be very large and include many result items and many tags, the cloud that is presented is created from a subset of the result set (usually the top N). This is done for both user interface and performance reasons. It differs from traditional tag clouds, which are usually applied to all items. In Ungava, the number of tags and the number of search result articles from which those tags were derived are displayed and can be changed by the user.
  • When a tag is used in the query refinement, this tag is excluded from the subsequent cloud, as it exists in every result item of the new search. This is perhaps the most distinguishing attribute of a drill cloud: the tags accumulated through search refinement are excluded from the subsequent cloud(s).
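The two differences above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not Ungava's actual code (Ungava is built on Lucene): the function name drill_cloud, its parameters top_n and max_tags, and the dictionary layout of a result are all hypothetical, and the documents are made-up toy data.

```python
from collections import Counter

def drill_cloud(results, field, refinements=(), top_n=100, max_tags=50):
    """Count tags of `field` over the top_n results, skipping refinement tags.

    `results` is a ranked list of dicts mapping a field name to a list of tags.
    `refinements` is a list of (field, tag) pairs already added to the query.
    (Hypothetical data layout, chosen for this sketch only.)
    """
    excluded = {tag for (f, tag) in refinements if f == field}
    counts = Counter()
    for doc in results[:top_n]:              # only a subset of the result set
        for tag in set(doc.get(field, [])):  # count each tag once per document
            if tag not in excluded:          # drop tags already in the query
                counts[tag] += 1
    return counts.most_common(max_tags)

# Toy, made-up result set: every hit carries the keyword "chromatin".
docs = [
    {"keyword": ["chromatin", "histone"]},
    {"keyword": ["chromatin", "nucleosome"]},
    {"keyword": ["chromatin", "histone"]},
]

plain_cloud = drill_cloud(docs, "keyword")
refined_cloud = drill_cloud(docs, "keyword", refinements=[("keyword", "chromatin")])
```

In this toy data, plain_cloud is led by chromatin with a count equal to the number of hits, while refined_cloud contains only histone (2) and nucleosome (1), which is the effect the exclusion rule is meant to achieve.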
For example:
  1. Using the default Ungava search form, the user searches for fulltext: cell


  2. The user now clicks on the keyword cloud link, getting:


  3. The user now clicks on the chromatin keyword cloud entry, which adds keyword:chromatin to the original search query, resulting in a new set of results. Note that this refined search results in 52 hits, down from the original 3461 hits.



  4. Now when the user clicks on the keyword cloud link, they get the keyword drill cloud for the new results, but with the keyword (tag) chromatin excluded from the cloud, removing its dominating influence (as all articles in the new results have chromatin as a keyword). Here is the resulting keyword drill cloud:
     If chromatin were not excluded, its dominance would shrink the other cloud entries, reducing the discriminating power of the cloud and its overall usefulness. Here is what the cloud would look like in our example:
     The user can continue to iteratively refine their search using the drill cloud from each search. Note that users are not constrained to using the same type of metadata tag cloud for refinement, i.e. they can follow a keyword drill cloud refined search with one from any of the other available drill clouds.



  5. Continuing our example that produced this results list:
  6. Selecting the Author cloud produces the following drill cloud:

  7. Clicking on the author tag Ausió, Juan produces the following results:
Note how, after only a small number of drill cloud iterations involving only mouse clicks (no typing in forms), the result set was reduced from the original 3461 hits to 5 hits.
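The drill-down itself can be sketched in the same spirit. Again this is a rough Python illustration under my own assumptions, not how Ungava or Lucene evaluates queries: the helpers search and refine are hypothetical, the query is modelled as a simple list of (field, value) clauses that must all match, full text is reduced to a list of terms, and the corpus is made-up toy data.

```python
def search(docs, clauses):
    """Return the docs that match every (field, value) clause (an AND query)."""
    return [d for d in docs if all(v in d.get(f, []) for f, v in clauses)]

def refine(clauses, field, value):
    """Clicking a cloud entry adds one clause to the existing query."""
    return clauses + [(field, value)]

# Toy, made-up corpus; field values are lists of terms or tags.
corpus = [
    {"fulltext": ["cell"], "keyword": ["chromatin", "histone"], "author": ["Author A"]},
    {"fulltext": ["cell"], "keyword": ["chromatin"], "author": ["Author B"]},
    {"fulltext": ["cell"], "keyword": ["apoptosis"], "author": ["Author A"]},
    {"fulltext": ["gene"], "keyword": ["chromatin"], "author": ["Author A"]},
]

query = [("fulltext", "cell")]            # the initial typed search
hits = [len(search(corpus, query))]

# Two clicks: first on a keyword cloud entry, then on an author cloud entry.
for field, value in [("keyword", "chromatin"), ("author", "Author A")]:
    query = refine(query, field, value)
    hits.append(len(search(corpus, query)))
```

Each click only appends a clause, so the result set can never grow between iterations, and the clicks are free to alternate between cloud types (a keyword click followed by an author click here).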

I am hoping to put together a paper on drill clouds and submit it to JCDL 2008.

-----------
Update:
  • Tuesday, Oct 9: Ungava appears to be back up again.
  • Monday, Oct 8 2007, 1300 EST: It seems that Ungava is down. Today is Canadian Thanksgiving, so I don't think I'll be able to have this brought back up before tomorrow.
