FREE THE ARTICLES! (Full-text for researchers & scientists and their machines)

At a recent plenary I gave [earlier post] at the Colorado Association of Research Libraries Next Gen Library Interfaces conference, I went a little off-script and educated (/harangued) the mostly librarian audience about the present and near-future importance of the accessibility of full-text research articles to their researchers and scientists.

By accessibility of full-text I didn't mean the ability of a human to access the PDF or HTML of an article via a web browser: I was referring to the machine-accessibility of the text contained in the article (and the metadata and the citation information).

I was concerned because of the increasing number of discipline-specific tools that use full-text (& metadata & citations) to let users navigate and analyze the research literature and discover new ideas and relationships in it (via text mining, semantic analysis, etc.). The general label for this kind of research is 'literature-based discovery', where new knowledge hidden in the literature is exposed using text mining and other tools.
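
To make the idea concrete, here is a minimal sketch in Python of the "ABC" pattern often associated with literature-based discovery: if term A is mentioned alongside B, and B alongside C, but A and C never appear together, then the A-C pair is a candidate hidden connection. Everything in it is an assumption for illustration: the toy "documents" are invented term sets, not real abstracts, and the function names (cooccurrence, hidden_links) are hypothetical. Real tools work from parsed full text and need far more careful term extraction and filtering.

```python
from collections import defaultdict
from itertools import combinations

# Toy "abstracts", invented for illustration only.
# Each one is reduced to the set of terms it mentions.
DOCS = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "raynaud's syndrome"},
    {"platelet aggregation", "raynaud's syndrome"},
    {"fish oil", "triglycerides"},
]


def cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    links = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def hidden_links(docs, start):
    """Return {C: [B, ...]} for every term C that is never co-mentioned
    with `start` but is reachable through an intermediate term B."""
    links = cooccurrence(docs)
    direct = links[start]
    found = defaultdict(set)
    for b in direct:
        for c in links[b]:
            if c != start and c not in direct:
                found[c].add(b)
    return {c: sorted(bs) for c, bs in found.items()}


result = hidden_links(DOCS, "fish oil")
print(result)
```

On this toy data the only candidate for "fish oil" is "raynaud's syndrome", reached through the two intermediate terms. The point for librarians is the input: a sketch like this only works if a program can get at the text of the articles, not just a person with a browser.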

Most publisher licenses do not allow for the sort of access to the full-text that many of these discovery and exploration tools need.

When I asked for a show of hands from those aware of this issue, not one of the ~200 people in the audience raised a hand.

I went on to suggest (/rant) that librarians should expect more of their researcher and scientist patrons to need, and demand, this sort of access to the full-text of (licensed) journal articles. Librarians need to anticipate this demand, and I suggested the following non-mutually-exclusive strategies:

demanding licenses from publishers and aggregators that allow them to offer access to full-text for analysis by arbitrary patron tools
asking publishers to publish their full-text in the Open Text Mining Interface (OTMI)
supporting Open Access journals, which allow for much of this out of the box (but often have very difficult APIs, or none at all and only web pages to get at the content!!)

Recently I retro-discovered an article [1] in The Economist that explains to the layperson some of the kinds of things that can be done with access to the literature. The study it reports on [2] shows how researchers discovered the biochemical pathway involved in drug addiction from the literature alone. They did no experiments. This discovery was derived from an analysis and extraction of information from more than 1000 articles! This is not the first time this sort of thing has happened [3]. Clearly, this sort of analysis can save time and money in discovering important and relevant scientific knowledge.

[1] Drug Addiction: Going by the book (2008). The Economist, January 10 print issue.
[2] Li, C., Mao, X., Wei, L. (2008). Genes and (Common) Pathways Underlying Drug Addiction. PLoS Computational Biology, 4(1), e2. DOI: 10.1371/journal.pcbi.0040002
[3] Swanson, D. (1986). Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1), 7-18.

Additional reading:

Bourne, P.E., Fink, J.L., Gerstein, M. (2008). Open Access: Taking Full Advantage of the Content. PLoS Computational Biology, 4(3), e1000037. DOI: 10.1371/journal.pcbi.1000037
de Miranda Santo, M., Coelho, G., dos Santos, D., Filho, L. (2006). Text mining as a valuable tool in foresight exercises: A study on nanotechnology. Technological Forecasting and Social Change, 73(8), 1013-1027. DOI: 10.1016/j.techfore.2006.05.020
Džeroski, S., Langley, P., Todorovski, L. (2007). Computational Discovery of Scientific Knowledge. Lecture Notes in Computer Science, 4660. DOI: 10.1007/978-3-540-73920-3
Glenisson, P. (2004). Integrating scientific literature with large scale gene expression analysis. PhD Thesis, Katholieke Universiteit Leuven, Belgium.
Hristovski, D., Peterlin, B., Džeroski, S., Stare, S. (2007). Literature Based Discovery Support System and Its Application to Disease Gene Identification. Lecture Notes in Computer Science, 4660, 307-326. DOI: 10.1007/978-3-540-73920-3_15
Kostoff, R. (2007). Validating discovery in literature-based discovery (letter to the editor). Journal of Biomedical Informatics, 40(4), 448-450. DOI: 10.1016/j.jbi.2007.05.001
Krallinger, M., Valencia, A. (2005). Text-mining and information-retrieval services for molecular biology. Genome Biology, 6(7), 224. DOI: 10.1186/gb-2005-6-7-224
Krogel, M., Scheffer, T. (2004). Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics. Machine Learning, 57(1/2), 61-81. DOI: 10.1023/B:MACH.0000035472.73496.0c
Mack, R. (2002). Text-based knowledge discovery: search and mining of life-sciences documents. Drug Discovery Today, 7(11), S89-S98. DOI: 10.1016/S1359-6446(02)02286-9
Džeroski, S., Todorovski, L., Eds. (2007). Computational Discovery of Scientific Knowledge: Introduction, Techniques, and Applications in Environmental and Life Sciences. Lecture Notes in Computer Science, 4660. Springer. ISBN 978-3-540-73919-7
Weeber, M. (2007). Drug Discovery as an Example of Literature-Based Discovery. Lecture Notes in Computer Science, 4660, 290-306. DOI: 10.1007/978-3-540-73920-3_14
Weeber, M., Kors, J.A., Mons, B. (2005). Online tools to support literature-based discovery in the life sciences. Briefings in Bioinformatics, 6(3), 277-286. DOI: 10.1093/bib/6.3.277
Zhou, D., He, Y. (2008). Extracting interactions between proteins from the literature. Journal of Biomedical Informatics, 41(2), 393-407. DOI: 10.1016/j.jbi.2007.11.008
Zweigenbaum, P., Demner-Fushman, D., Yu, H., Cohen, K.B. (2007). Frontiers of biomedical text mining: current progress. Briefings in Bioinformatics, 8(5), 358-375. DOI: 10.1093/bib/bbm045

Update 2008 April 7: Peter Suber's posts on how OA facilitates meta-analysis and text-mining.

Thanks to Martha Lee (UCLA) via NGC4LIB.

Comments

Daniel Lemire said…

My impression is that librarians do not take open access seriously. I suppose they consider that they are gatekeepers to the locked information.

Who needs them if people can freely get access to the data?

Ah! Precisely. We don't need gatekeepers. There is still a huge need for people to help organize the data, through software or through manual intervention, but please, we don't need people to grant us access to the data.

04 April, 2008 19:40

Glen Newton said…

While traditionally librarians have often played a gatekeeper role, and certainly there has been an existential crisis in the library community over the last ~10 years, I have to mostly disagree with your comment.

My experience is that most librarians are supportive of Open Access; many of them were and are impacted by the quickly rising costs of commercial journals. They have experienced reduced buying power and commensurate reduced availability of journals for their users. And - for the most part - most librarians want information to be as freely available as possible. There is no question that their roles are evolving and that the uncertainty of this evolution has made many librarians uncomfortable.

References:

Library sees red over rising journal prices

The Cost of Journals

Biomedical Journal Costs and Trends

06 April, 2008 09:43

