Research Data and Metadata at Risk: Degradation over Time

Research data and metadata - usually derived from an experiment or series of experiments - are the researcher's central focus during the experiment and the subsequent interpretation, paper writing and publishing. But once the researcher has moved on to their next effort, this data is very much at risk. Often (read 'the rule rather than the exception'), the data is not properly managed or archived, and resides as a single copy on the researcher's desktop or perhaps a research server.

In addition, the metadata is minimal or non-existent, and where it does exist it is interpretable only by the researcher and their colleagues or students. Over time, the chance increases that the data will be lost or that the researcher will forget useful knowledge about it, and the information content of the data and metadata rapidly decreases. Some events can seriously accelerate this decrease: data loss (media failure, computer replacement, other serious accidents or failures, etc.); a change of career or retirement; and the death of the researcher.

This common scenario was first described in published form in 1997 (to my knowledge) in a paper entitled Nongeospatial Metadata for the Ecological Sciences (citation below). It included a very expressive diagram, which I re-created for another paper. The re-created diagram appeared in the draft of that paper, but it was decided not to use it in the final version. You can see it below:





A higher resolution jpeg image can be found here.
A high resolution PDF can be found here.

The diagram was created using LaTeX and TikZ. The source files can be found in the GitHub repo.


From (only in draft): 
de la Sablonnière, Auger, Sabourin and Newton. 2012. Facilitating Data Sharing in the Behavioral Sciences. Data Science Journal. Volume 11, 23 March 2012. DOI: 10.2481/dsj.11-DS4
   
After: 
Michener, W., J. Brunt, J. Helly, T. Kirchner & S. Stafford. 1997. Nongeospatial Metadata for the Ecological Sciences. Ecological Applications 7(1):330-342. DOI: 10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2

 

The Michener paper is excellent and well ahead of its time. Many of its insights generalize to other research domains.
