FREE THE ARTICLES! (Full-text for researchers & scientists and their machines)
At a recent plenary I gave [earlier post] at the Colorado Association of Research Libraries Next Gen Library Interfaces conference, I went a little off-script and started educating (/haranguing) the mostly librarian audience about the present and near-future importance of making full-text research articles accessible to their researchers and scientists.
By accessibility of full-text I didn't mean the ability of a human to access the PDF or HTML of an article via a web browser: I was referring to the machine-accessibility of the text contained in the article (and the metadata and the citation information).
I was concerned because of the increasing number of discipline-specific tools that use full-text (& metadata & citations) to let users (via text mining, semantic analysis, etc.) navigate and analyze the research literature and discover new ideas and relationships in it. The general label for this kind of research is 'literature-based discovery', where new knowledge hidden in the literature is exposed using text mining and other tools.
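To make 'literature-based discovery' a little more concrete, here is a minimal sketch of the kind of A-B-C linking usually associated with Swanson's work [3]: if some articles connect A to B, others connect B to C, and none mention A and C together, then A-C is a candidate hidden relationship. Everything in it is an assumption for illustration: the toy documents are hand-made concept sets loosely modeled on the fish oil / Raynaud's example, and the function names `cooccurrence` and `abc_candidates` are hypothetical. Real tools work on concepts extracted from machine-accessible full-text, which is exactly why that access matters.

```python
from collections import defaultdict
from itertools import combinations

# Toy "articles": each one is reduced to the set of concepts it mentions.
# Hand-made illustrative data, NOT extracted from real articles.
documents = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "raynaud's syndrome"},
    {"platelet aggregation", "raynaud's syndrome"},
    {"fish oil", "omega-3"},
]


def cooccurrence(docs):
    """Map each concept to the set of concepts it shares an article with."""
    links = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(sorted(concepts), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def abc_candidates(docs, start):
    """Find concepts C never seen with `start` (A) but reachable via some B.

    Returns {C: set of intermediate B concepts linking A to C}.
    """
    links = cooccurrence(docs)
    direct = links[start]  # B terms: co-occur with A in some article
    candidates = defaultdict(set)
    for b in direct:
        for c in links[b]:
            # Keep only C terms that are new: not A itself, not already linked to A.
            if c != start and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


result = abc_candidates(documents, "fish oil")
for concept, bridges in sorted(result.items()):
    print(concept, "via", sorted(bridges))
```

With this toy data, the only candidate for "fish oil" is "raynaud's syndrome", reached through both intermediate concepts. A real system would also need to rank and filter candidates, which this sketch does not attempt.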
Most publisher licenses do not allow for the sort of access to the full-text that many of these discovery and exploration tools need.
When I asked for a show of hands on how many were aware of this issue, not one of the ~200 people in the audience raised a hand.
I went on to suggest/rant that librarians should expect more and more of their researcher/scientist patrons to need (and demand) this sort of access to the full-text of (licensed) journal articles. Librarians need to anticipate this demand, and I suggested the following non-mutually-exclusive strategies:
- demanding licenses from publishers and aggregators that allow them to offer access to full-text for analysis by arbitrary patron tools
- asking publishers to publish their full-text in the Open Text Mining Interface (OTMI)
- supporting Open Access journals, which allow for much of this out of the box (but often have very difficult APIs, or none at all and only web pages to get at the content!!)
[1] Drug Addiction: Going by the book (2008). The Economist, January 10 print issue.
[2] Li, C., Mao, X., Wei, L. (2008). Genes and (Common) Pathways Underlying Drug Addiction. PLoS Computational Biology, 4(1), e2. DOI: 10.1371/journal.pcbi.0040002
[3] Swanson, D. (1986). Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1), 7-18.
Additional reading:
- Bourne, P.E., Fink, J.L., Gerstein, M. (2008). Open Access: Taking Full Advantage of the Content. PLoS Computational Biology, 4(3), e1000037. DOI:10.1371/journal.pcbi.1000037
- de Miranda Santo, M., Coelho, G., dos Santos, D., Filho, L. (2006). Text mining as a valuable tool in foresight exercises: A study on nanotechnology. Technological Forecasting and Social Change, 73(8), 1013-1027. DOI: 10.1016/j.techfore.2006.05.020
- Džeroski, S., Langley, P., Todorovski, L. (2007). Computational Discovery of Scientific Knowledge. Lecture Notes in Computer Science 4660 DOI:10.1007/978-3-540-73920-3
- Glenisson, P. (2004). Integrating scientific literature with large scale gene expression analysis. PhD Thesis, Katholieke Universiteit Leuven, Belgium.
- Hristovski, D., Peterlin, B., Džeroski, S., Stare, S. (2007). Literature Based Discovery Support System and Its Application to Disease Gene Identification. Lecture Notes in Computer Science, 4660, 307-326. DOI: 10.1007/978-3-540-73920-3_15
- Kostoff, R. (2007). Validating discovery in literature-based discovery (letter to the editor). Journal of Biomedical Informatics, 40(4), 448-450. DOI:10.1016/j.jbi.2007.05.001
- Krallinger, M., Valencia, A. (2005). Text-mining and information-retrieval services for molecular biology. Genome Biology, 6(7), 224. DOI:10.1186/gb-2005-6-7-224
- Krogel, M., Scheffer, T. (2004). Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics. Machine Learning, 57(1/2), 61-81. DOI: 10.1023/B:MACH.0000035472.73496.0c
- Mack, R. (2002). Text-based knowledge discovery: search and mining of life-sciences documents. Drug Discovery Today, 7(11), S89-S98. DOI:10.1016/S1359-6446(02)02286-9
- Saso Dzeroski, Ljupco Todorovski, Eds. (2007). Computational Discovery of Scientific Knowledge, Introduction, Techniques, and Applications in Environmental and Life Sciences. Lecture Notes in Computer Science 4660 Springer, ISBN 978-3-540-73919-7
- Weeber, M. (2007). Drug Discovery as an Example of Literature-Based Discovery. Lecture Notes in Computer Science, 4660, 290-306. DOI: 10.1007/978-3-540-73920-3_14
- Weeber, M., Kors, J.A., Mons, B. (2005). Online tools to support literature-based discovery in the life sciences. Briefings in Bioinformatics, 6(3), 277-286. DOI:10.1093/bib/6.3.277
- Zhou, D., He, Y. (2008). Extracting interactions between proteins from the literature. Journal of Biomedical Informatics, 41(2), 393-407. DOI:10.1016/j.jbi.2007.11.008
- Zweigenbaum, P., Demner-Fushman, D., Yu, H., Cohen, K.B. (2007). Frontiers of biomedical text mining: current progress. Briefings in Bioinformatics, 8(5), 358-375. DOI:10.1093/bib/bbm045
Thanks to Martha Lee UCLA via NGC4LIB.
Comments
Who needs them if people can freely get access to the data?
Ah! Precisely. We don't need gatekeepers. There is still a huge need for people to help organize the data, through software or through manual intervention, but please, we don't need people to grant us access to the data.
My experience is that most librarians are supportive of Open Access; many of them were and are impacted by the quickly rising costs of commercial journals. They have experienced reduced buying power and a commensurate reduction in the availability of journals for their users. And, for the most part, librarians want information to be as freely available as possible. There is no question that their roles are evolving and that the uncertainty of this evolution has made many librarians uncomfortable.
References:
Library sees red over rising journal prices
The Cost of Journals
Biomedical Journal Costs and Trends