Wednesday, October 31, 2007

"When Is Open Access Not Open Access?"

The article When Is Open Access Not Open Access? (CJ MacCallum) in PLoS Biology examines the slippery activities of publishers that try to fly the flag of Open Access (with varying degrees of capitalization) but offer only the free-as-in-beer definition of freedom, as opposed to the Open Access definition, which includes -- as well as free-gratis freedom -- extensive intellectual property rights permitting unrestricted derivative use. This issue and these distinctions were discussed earlier this year in "Free but not open?" at the PLoS blog. I have noticed that many journals use weasel words like "We conform to open access as defined by SHERPA". The SHERPA definition does not include the extensive IP rights described by Open Access:

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
-Budapest Open Access Initiative
This watering-down of freedom from "free-gratis and free-to-use-and-modify-and-distribute" to simply "free-gratis" (and maybe some IP freedom for the authors) and the general obfuscation/duplicity/ignorance by publishers parallels similar activities in the software world, where the freedom issue has also been confused and watered-down in various "open source" (note case) licenses. See Open Source vs. Free Software.

Saturday, October 27, 2007

Tag Cloud inspired HTML Select lists

I have been working with Tag clouds and other Web 2.0 sorts of things quite a bit lately [see earlier post: Drill Clouds for Search Refinement] and couldn't help noticing that it might be useful to apply the Tag cloud "Size reflects frequency/importance" idiom to HTML select lists, so I did a little bit of experimenting (BTW, I did look for these on the Web but didn't find them: that doesn't mean they are not already out there...).

So I played with the styles of these elements, and was able to get something that looks like this:

I am not sure how the above HTML renders in your browser (Update: Daniel has some info on how/if this works in different browsers), but here is how it renders in mine (Firefox on Linux, Suse 10.2):

It is interesting how the browser allocates space: it seems to use the largest (tallest) item in the list to set the height of the widget, which makes sense. But while this version of Firefox appropriately sizes the pull-down contents (i.e. above, left), when a term is selected, it is displayed at the default text font size (above right), even if its font size as defined and as displayed in the pull-down is larger. This appears to be a bug, although it is quite possible that there is some CSS that I should be using to look after this but do not know about. I have not tested this behaviour in other browsers, but I have for other versions of Firefox (1.06,

Notwithstanding this behaviour, on experimenting with these select variations, I think that they work well and are useful in the appropriate situations.

Update 2008 Oct 16: Oh, here is how I made this:

<select>
<option style="font-size: 80%;" value="Aggregators"> Aggregators</option>
<option style="font-size: 155%;" value="Blogs"> Blogs</option>
<option style="font-size: 80%;" value="Collaboration"> Collaboration</option>
<option style="font-size: 125%;" value="Joy of Use"> Joy of Use</option>
<option style="font-size: 80%;" value="Podcasting"> Podcasting</option>
<option style="font-size: 125%;" value="RSS"> RSS</option>
<option style="font-size: 200%;" value="Web 2.0"> Web 2.0</option>
<option style="font-size: 80%;" value="XHTML"> XHTML</option>
</select>


<select size="5">
<option style="font-size: 80%;" value="Aggregators"> Aggregators</option>
<option style="font-size: 155%;" value="Blogs"> Blogs</option>
<option style="font-size: 80%;" value="Collaboration"> Collaboration</option>
<option style="font-size: 125%;" value="Joy of Use"> Joy of Use</option>
<option style="font-size: 80%;" value="Podcasting"> Podcasting</option>
<option style="font-size: 125%;" value="RSS"> RSS</option>
<option style="font-size: 200%;" value="Web 2.0"> Web 2.0</option>
<option style="font-size: 80%;" value="XHTML"> XHTML</option>
</select>
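The option lists above were written by hand. As a rough sketch only, the same "size reflects frequency" markup could be generated from tag counts. The function names, the linear scaling, and the 80%-200% range are my assumptions for illustration (the range is borrowed from the sizes used above), and the counts are made up:

```python
from html import escape


def font_size_percent(count, min_count, max_count, min_pct=80, max_pct=200):
    """Linearly map a tag's count onto a font-size percentage."""
    if max_count == min_count:
        # All tags are equally frequent: use the smallest size for every one.
        return min_pct
    scale = (count - min_count) / (max_count - min_count)
    return round(min_pct + scale * (max_pct - min_pct))


def render_select(tag_counts, size=None):
    """Build a <select> whose options are sized by tag frequency."""
    low, high = min(tag_counts.values()), max(tag_counts.values())
    size_attr = f' size="{size}"' if size else ""
    lines = [f"<select{size_attr}>"]
    for tag in sorted(tag_counts):
        pct = font_size_percent(tag_counts[tag], low, high)
        value = escape(tag, quote=True)  # guard against markup in tag names
        lines.append(
            f'<option style="font-size: {pct}%;" value="{value}"> {value}</option>'
        )
    lines.append("</select>")
    return "\n".join(lines)


# Hypothetical counts, chosen only to illustrate the scaling.
example_counts = {"Aggregators": 1, "Blogs": 6, "RSS": 4, "Web 2.0": 9}
print(render_select(example_counts, size=5))
```

Linear scaling is the simplest choice; with very skewed tag counts a logarithmic scale might keep the rare tags more legible.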



Tuesday, October 23, 2007

Intellectual Property articles in CACM

The October issue of the Communications of the ACM has two complementary articles in the area of Intellectual Property. Complementary in that one is on copyright reform and the other is on (software) patents:

NIH Open Access at Risk in U.S. bill

Peter Suber of Open Access News reports that a U.S. Senate labour bill has recently had an amendment added to it, putting the Open Access mandate of the NIH at risk:

The provision to mandate OA at the NIH is in trouble. Late Friday, just before the filing deadline, a Senator acting on behalf of the publishing lobby filed two harmful amendments, one to delete the provision and one to weaken it significantly.

Cyberinfrastructure and Data preservation

Richard Akerman - my colleague here at CISTI - has a couple of excellent pointers to digital preservation and cyberinfrastructure resources at Science Library Pad:
-- CLIR cyberinfrastructure short articles:

-- PV 2007 - Ensuring the Long-Term Preservation and Value Adding to Scientific and Technical Data:

Thursday, October 18, 2007

Minister of Industry (Canada) Appoints Members of Science, Technology and Innovation Council

It is good to see that this advisory body -- promised in the Canadian government's science and technology strategy (Mobilizing Science and Technology to Canada's Advantage) released in May 2007 -- has now been created and its appointments made. I hope that it will be effective in its activities. Now that it is in place, perhaps this body might lend some focus (and hopefully its support) to various national science activities, initiatives and proposals, such as the recommendations of the National Consultation on Access to Scientific Research Data (NCASRD).

Saturday, October 13, 2007

IJDL Special Issue: Connecting digital libraries to eScience

The International Journal on Digital Libraries has a special issue entitled "Connecting digital libraries to eScience". I haven't had a chance to read any of the articles, but they look very interesting, and include some discussion on various scientific data issues, collaboration, repositories, research infrastructure, etc:

Thursday, October 11, 2007

New JISC Data Sharing Documents

As part of its DISC-UK DataShare project, JISC has released two documents:

  1. DISC-UK DataShare: State-of-the-Art Review, Harry Gibbs
  2. Data Sharing Continuum graphic, Robin Rice
The former is a summary of recent projects and policy, and introduced me to a number of projects and initiatives that I hadn't previously known about. The latter is a well thought-out view of the data sharing continuum, showing us where we have been (and perhaps for some of us, still are!) and a good idea of where we will/should be going. A good graphic to show to a manager trying to understand the big picture.

Monday, October 08, 2007

Drill Clouds for Search Refinement

I'd like to introduce something I call drill clouds, an extension to tag clouds for search refinement in information retrieval.

I will be using an experimental Lucene-based search platform that I have developed, called Ungava (more on this later), which includes my implementation of drill clouds. Note that much of this posting is derived from a posting of mine on drill clouds on the CISTI Lab wiki.

Drill clouds are what I call an extension to tag clouds to make them a useful tool for search refinement. That is, to use a tag cloud to refine an existing query by adding new elements to the query through interactions with the cloud. As this results in a kind of drill-down search behaviour, these new clouds have been named drill clouds. Some differences between traditional tag clouds and drill clouds:

  • Drill clouds are applied to search results and -- as search results can be very large and include many result items and many tags -- the cloud that is presented is created from a subset of the result set (usually the top N). This is done for both user interface and performance reasons. This is different from traditional tag clouds, which are usually applied to all items. In Ungava, the number of tags and the number of search result articles from which those tags were derived are displayed and can be manipulated by the user.
  • When a tag is used in the query refinement, this tag is excluded from the subsequent cloud, as it exists in every result item of the new search. This is perhaps the most distinguishing attribute of a drill cloud: the exclusion of the accumulating search-refinement tags from the subsequent cloud(s).
For example:
  1. Using the default Ungava search form, the user searches for fulltext: cell

  2. The user now clicks on the keyword cloud link, getting:

  3. The user now clicks on the chromatin keyword cloud entry, which adds keyword:chromatin to the original search query, resulting in a new set of results: Note that this refined search results in 52 hits, down from the original 3461 hits.

  4. Now when the user clicks on the Keyword cloud link, they get the keyword drill cloud for the new results, but with the keyword (tag) chromatin excluded from the cloud, removing its dominating influence on the cloud (as all articles would have chromatin as a keyword). Here is the resulting keyword drill cloud: If chromatin were not excluded, its dominance would shrink the other cloud entries, reducing the discriminating power of the cloud and its overall usefulness. Here is what the cloud would look like in our example: The user can continue to iteratively refine their search using the drill cloud from each search. Note that users are not constrained to using the same type of metadata tag cloud for refinement, i.e. they can follow a keyword drill cloud refined search with one from any of the other available drill clouds.

  5. Continuing our example that produced this results list:
  6. Selecting the Author cloud produces the following drill cloud:

  7. Clicking on the author tag Ausió, Juan produces the following results:
Note how, after only a small number of drill cloud iterations involving only mouse clicks (not typing in forms), the result set was reduced from the original 3461 hits to 5 hits.
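The two distinguishing behaviours described above (building the cloud from only the top N results, and excluding the tags already used for refinement) can be sketched roughly as follows. This is not the Ungava code: the function names, parameters and sample data are hypothetical, and the in-memory refine step merely stands in for adding a keyword:... clause to a real search query.

```python
from collections import Counter


def drill_cloud(results, refinement_tags, top_n=100, max_tags=50):
    """Count tags over the top N results, skipping tags already in the query.

    results: ranked list of tag lists, one per result item (best hit first).
    refinement_tags: tags the user has already clicked to refine the query.
    """
    excluded = set(refinement_tags)
    counts = Counter()
    for tags in results[:top_n]:
        # Count each tag once per result item, ignoring refinement tags.
        counts.update(set(tags) - excluded)
    return counts.most_common(max_tags)


def refine(results, tag):
    """Keep only the result items that carry the clicked tag (assumed stand-in
    for re-running the search with the tag added to the query)."""
    return [tags for tags in results if tag in tags]


# Hypothetical keyword lists standing in for a ranked result set.
hits = [
    ["chromatin", "histone"],
    ["chromatin", "histone", "nucleosome"],
    ["apoptosis"],
    ["chromatin", "nucleosome"],
]
refined = refine(hits, "chromatin")
print(drill_cloud(refined, ["chromatin"]))
```

Without the exclusion, chromatin would be counted in every refined result and would dominate the cloud, which is exactly the effect described in step 4.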

I am hoping to put together a paper on drill clouds and submit it to JCDL 2008.

  • Tuesday, Oct 9: Ungava appears to be back up again.
  • Monday, Oct 8 2007, 1300 EST: It seems that Ungava is down. Today is Canadian Thanksgiving, so I don't think I'll be able to have this brought back up before tomorrow.