Posts

Showing posts from April, 2008

"Science 2.0 -- Is Open Access Science the Future?"

" Is posting raw results online, for all to see, a great tool or a great risk? " Scientific American article . Read it. Nuff said.

Media in Motion Symposium

McGill University's Documentation and Conservation of the Media Arts Heritage (DOCAM) Research Alliance and Media@McGill have a call for papers out for their symposium "Media in Motion: The Challenge of Preservation in the Digital Age", to be held October 29, 2008 at McGill University (Annual International DOCAM Summit). The topics include but are not limited to: Archival Practices; Challenges of Audio, Film, Video, and Digital Media Preservation; Cultural Influences, Impacts, and Considerations; Cultural Property Law; Digital Preservation and Cultural Memory; Digitization of the Humanities; Effects on Artistic Practices; Ethical, Social, and Philosophical Concerns; Preservation Strategies and Techniques; Future Trends and Directions. I found this information on the DIGLIB mailing list, but I can't find the CFP on the DOCAM site, so here is the link to the mailing list announcement.

"Wikipedia for Data"

Bret Taylor has a refreshing and rather simple (powerful) idea: Wikipedia for data. All kinds of data. From what I would consider scientific data, to census data, to geographic data, to TV listings, to stock data, to CD cover and track data, to... whatever. Cut out the lawyers and the licensing. Open Data for all to use and experiment with. Coming from a country where you have to buy census data (!!!), this is a wonderful idea. Spread the meme.

"Libraries in the Converging Worlds of Open Data, E-Research, and Web 2.0"

This looks like an interesting article (Libraries in the Converging Worlds of Open Data, E-Research, and Web 2.0, Stuart MacDonald, March/April issue of ONLINE magazine), but I don't have a subscription so I can't really comment much on it. Ironic that the abstract mentions Peter Suber's Open Access News blog... ;-) Abstract: "The new forms of research enabled by the latest technologies bring about collaboration among researchers in different locations, institutions, and even disciplines. These new collaborations have two key features -- the prodigious use and production of data. This data-centric research manifests itself in such concepts as e-science, cyberinfrastructure, or e-research. Over the last decade there has been much discussion about the merits of open standards, open source software, open access to scholarly publications, and most recently open data. There are a range of authoritative weblogs that address the open movement, some of which include: 1. DC

Semantic Markup

The Economist has a good article (The Semantic Web: Start Making Sense, April 6, 2008) on the Semantic Web (Web 3.0?). One of the technologies they describe for automatically adding semantic markup to structured and unstructured text is Calais from Reuters: "Reuters, however, believes it has overcome this problem. It recently launched a service called Calais[1] that takes raw web pages (and, indeed, any other form of data) and does the marking up itself. The acronyms can then get to work. That promises to imbue the streams of unstructured text and data sloshing around the internet with almost instant meaning. The idea is that any website can send a jumble of text and code through Calais and receive back a list of "entities" that the system has extracted--mostly people, places and companies--and, even more importantly, their relationships. It will, for instance, be able to recognise a pharmaceutical company's name and, on its own initiative, cross-reference that agai

Lucene indexing performance benchmarks for journal article metadata and full-text

I posted these journal article metadata & full-text Lucene indexing benchmarks to the Lucene user mailing list using the suggested XML format, but it seems that was not the proper thing to do. One of the list members (Cass Costello) converted it to HTML (thanks :-) ). I've decided to give it a permanent home here. If you have any questions, just let me know. I have some other benchmarks I will be posting with more records (~25 million) but only article metadata, not full-text. The loader that does all of this was developed as part of my Ungava project.
Hardware Environment
Dedicated machine for indexing: yes
CPU: Dual processor dual core Xeon CPU 3.00GHz; hyperthreading ON for 8 virtual cores
RAM: 8GB
Drive configuration: Dell EMC AX150 storage array fibre channel
Software environment
Lucene Version: 2.3.1
Java Version: Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
Java VM: Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
OS Version :

Must-read for Science Librarians: "Open Notebook Science: Implications for the Future of Libraries"

Jean-Claude Bradley's presentation "Open Notebook Science: Implications for the Future of Libraries" is a must-read for all research and science librarians who want to know how science is starting to be, and will be, done. It should also be read by those who plan the futures of research and science libraries, in order to understand how, for instance, the millennials will be doing science, if they are not already. Fundamental to this future (and present) are Open Access, Open Data, social (research) networking, the blogging/wiki/GoogleDocs/mailing-list dynamics, wiki versioning (of experimental and other research activities), Second Life for presentations and teaching, and the necessity of machine-to-machine communications and interactions (see my earlier blog entry: "New Open Access Criterion: Support access by machines"). Abstract: Open Notebook Science involves a variety of internet-based techniques for sharing of scientific information, from the use of wiki

New Open Access Criterion: Support access by machines (m2m)

Related to my last posting (FREE THE ARTICLES! (full-text for researchers & scientists and their machines)) and in the light of Peter Murray-Rust's recent annoying discovery that he cannot text-mine PubMed Central (Can I data- and Text-mine Pubmed Central?), I would like to suggest an additional criterion for the definition of Open Access: Open Access must include access by machines. At minimum, one must allow crawls of the site/content or (to reduce the impact of badly configured crawlers) create a compressed XML file containing all metadata and either the content or direct links to the content, and make it available for download (and if bandwidth is still an issue, put it on a P2P network like BitTorrent). Preferable is to offer some kind of API (OTMI) or protocol (OAI-PMH) to get at content, metadata, and citations. Better is to offer access to the XML of the articles in addition to the PDF and/or HTML; if the XML actually has some semantic content, then we are approaching th
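To make the "protocol" option a little more concrete, here is a minimal Python sketch of what a machine client for an OAI-PMH repository might look like. It only builds a ListRecords request URL and pulls the titles and the resumption token out of a response. The base URL (https://example.org/oai), the sample response, and the function names are hypothetical and exist only for illustration; a real harvester would also need to fetch the URLs, handle errors, and loop until no token is returned.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses and by Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # When continuing a harvest, the token is sent instead of metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An absent or empty token means there are no more pages to harvest.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# A made-up, heavily trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    titles, token = parse_list_records(SAMPLE_RESPONSE)
    print(titles, token)
    print(list_records_url("https://example.org/oai", resumption_token=token))
```

The point of the sketch is that a machine needs nothing more than a stable URL pattern and predictable XML to walk an entire collection, which is exactly what a browser-only PDF or HTML interface does not provide.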

FREE THE ARTICLES! (Full-text for researchers & scientists and their machines)

At a recent plenary I gave [earlier post] at the Colorado Association of Research Libraries Next Gen Library Interfaces conference, I went a little off-script and was educating (/haranguing) the mostly librarian audience about the present-and-near-future importance of the accessibility of full-text research articles to their researchers and scientists. By accessibility of full-text I didn't mean the ability of a human to access the PDF or HTML of an article via a web browser: I was referring to the machine-accessibility of the text contained in the article (and the metadata and the citation information). I was concerned because of the increasing number of discipline-specific tools that use full-text (& metadata & citations) to allow users (via text mining, semantic analysis, etc.) to navigate, analyze and discover new ideas and relationships in the research literature. The general label for this kind of research is 'literature-based discovery', where new

"Places & Spaces: Mapping Science" exhibit @ NRC-CISTI

It is very exciting that the Places and Spaces: Mapping Science exhibit from Indiana University will be on display at NRC-CISTI from April 3 to June 27, 2008. This is the first time this collection of amazing maps of science is on display outside the U.S. The diverse and creative collection includes traditional cartographic maps, concept maps and domain maps. These are all physical paper (+other media) maps, and the collection also includes some hands-on maps made specifically for children to interact with. Congrats to all involved at NRC-CISTI and in particular my CISTI Research colleague Jeff Demaine, who was the originator and champion of this initiative. References: Boyack, K.W., Klavans, R., Börner, K. (2005). Mapping the Backbone of Science. Scientometrics, 64 (3), 351-374. Update: 2008 April 15: Indiana University SLIS Events News: Mapping Science Exhibit at the National Research Council - Ottawa, Canada