Posts

Showing posts from 2010

Research Data and Metadata at Risk: Degradation over Time

Research data and metadata - usually derived from an experiment or series of experiments - is the researcher's main focus during the experiment and the subsequent interpretation, paper writing and publishing. But once the researcher has moved on to their next effort, this data is very much at risk. Often (read 'the-rule-rather-than-the-exception'), the data is not properly managed and archived, and resides as a single copy on the researcher's desktop or perhaps a research server. In addition, the metadata is minimal or non-existent, and where it does exist it is interpretable only by the researcher and their colleagues or students. Over time, the chance that this data will be lost, or that useful knowledge about it will be forgotten by the researcher, increases, and the information content of the data and metadata rapidly decreases. Some events can seriously accelerate this decrease: data loss (media failure, computer replacement, other serious accidents or failures, etc.); change of care

Canadian "Open Data" Cities: 'No stars' in Tim Berners-Lee's Five Star Rating for Open Government Data

At the International Open Government Data Conference (IOGDC), Tim Berners-Lee "...reiterated his “five star system” for open government data: 1 Star for putting data on the Web at all, with an open license. 2 Stars if it’s machine-readable. 3 Stars for machine-readable, non-proprietary formats. 4 Stars if the data is converted into open linked data standards like RDF or SPARQL. 5 Stars when people have gone through the trouble of linking it." From: http://gov20.govfresh.com/open-data-accountability-citizen-utility-and-economic-opportunity/ Original TB-L ref: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/ So, according to this metric by TB-L, the Canadian cities (Ottawa, Vancouver, Toronto, Edmonton, etc.) that have recently released data under licenses that are not open get ZERO stars!!!
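To make the "zero stars" point concrete, here is a rough sketch of the rating as a cumulative checklist. It assumes each star requires all the previous ones, which is how I read the scheme. The `Dataset` class, its field names and the example values for the city are hypothetical, invented only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a released dataset (illustration only)."""
    open_license: bool            # 1 star: on the Web with an open license
    machine_readable: bool        # 2 stars
    non_proprietary_format: bool  # 3 stars
    linked_data_standard: bool    # 4 stars: e.g. published as RDF
    linked_to_other_data: bool    # 5 stars


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively, stopping at the first unmet criterion."""
    criteria = [
        d.open_license,
        d.machine_readable,
        d.non_proprietary_format,
        d.linked_data_standard,
        d.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Assumed example: machine-readable CSV, but under a license that is not open.
city = Dataset(
    open_license=False,
    machine_readable=True,
    non_proprietary_format=True,
    linked_data_standard=False,
    linked_to_other_data=False,
)
print(star_rating(city))  # 0
```

Because the open license is the first rung, nothing else a city does with formats or linking counts until the license is fixed.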

Not 'in spite of' but 'because of': agile: Multics--The first seven years

In Multics--The first seven years (1972 Spring Joint Computer Conference), Corbató et al. make the interesting statement: "In spite of [bold added] the unexpected design iteration phase, the Multics system became sufficiently effective by late 1968 to allow system programmers to use the system while still developing it." I think that many from the agile community might change the "in spite of" to "because of"... :-)

Mars Inc. Cacao Genome Database claims Open Access, public domain: falls short

This initially looked very promising: Mars, along with a number of collaborators (USDA, IBM, Clemson University Genomics Institute; Public Intellectual Property Resource for Agriculture at the University of California-Davis; National Center for Genome Resources; Center for Genomics and Bioinformatics at Indiana University; HudsonAlpha Institute for Biotechnology; and Washington State University), have sequenced the cacao genome and released it "Open Access" and "public domain" for the benefit of all, at a site called the Cacao Genome Project: McLean, VA – Today, Mars, Incorporated, the U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), and IBM released the preliminary findings of their breakthrough cacao genome sequence and made it available in the public domain. - From the Mars Inc. press release, 15 September 2010. A quote from the Independent article on the release (First rice, then wheat – now cocoa genome unravelled, 15 Sep 2010) f

Conference proceedings: Electronic Government and the Information Systems Perspective

Volume 6267: Electronic Government and the Information Systems Perspective. First International Conference, EGOVIS 2010, Bilbao, Spain, August 31 - September 2, 2010. [NB: Behind paywall]
Stakeholders’ Views on Government Enterprise Architecture: Strategic Goals and New Public Services. Katja Penttinen, Hannakaisa Isomäki
An Investigation into Critical Determinants of e-Government Implementation in the Context of a Developing Nation. Nahid Rashid, Shams Rahman
“What We Cannot Speak about We Must Pass over in Silence” – (In)correctly Arguing and Comparing the Costs of IT Investments in Public Sector. Samuli Pekkola, Kimmo Wideroos
Small-Area Population Projections - A Key Element in Knowledge Based e-Governance. Henning Sten Hansen
From Policy-Making Statements to First-Order Logic. Adam Wyner, Tom Engers, Kiavash Bahreini
A Fuzzy Recommender System for eElections. Luis Terán, Andreas Meier
Web 2.0 Creates a New Government. Roland Traunmüller
Elements of Comprehensive Assessments

What is Open Gov Data? The Sunlight Foundation: Ten Principles for Opening Up Government Information

My earlier entry/rant, It's not Open Data, so stop calling it that... about the non-Open Data nature of a number of Canadian cities' Open Data initiatives is supported by the just released Ten Principles for Opening Up Government Information from the Sunlight Foundation . Specifically: 6. Non-discrimination "Non-discrimination" refers to who can access data and how they must do so. Barriers to use of data can include registration or membership requirements. Another barrier is the uses of "walled garden," which is when only some applications are allowed access to data. At its broadest, non-discriminatory access to data means that any person can access the data at any time without having to identify him/herself or provide any justification for doing so. 8. Licensing The imposition of "Terms of Service," attribution requirements, restrictions on dissemination and so on acts as barriers to public use of d

ARL Report: E-Science and Data Support Services

The U.S. Association of Research Libraries (ARL) has produced a new report ( E-Science and Data Support Services ).

Alzheimer's Spinal Fluid Test and Research Data Sharing

The recent reports on being able to predict Alzheimer's (Alzheimer's predicted by spinal-fluid test -- CBC, 2010.08.10) are the direct result of data sharing by scores of biomedical researchers (Sharing of Data Leads to Progress on Alzheimer’s -- New York Times, 2010.08.10). The sharing included both academic researchers and drug company researchers. The data sets are available online: Companies as well as academic researchers are using the data. There have been more than 3,200 downloads of the entire massive data set and almost a million downloads of the data sets containing images from brain scans. The Alzheimer’s Disease Neuroimaging Initiative (ADNI), the organization looking after the data, has a very complete policy (Alzheimer’s Disease Neuroimaging Initiative (ADNI) Data Sharing and Publication Policy) about their data sharing.

It's not Open Data, so stop calling it that...

While it is a great positive change that data is being released through numerous efforts around the world, data release is not the same as Open Data release . A number of Canadian cities have announced Open Data initiatives, but they are not releasing Open Data. They are just releasing data. Of course, this is better than not releasing data. But let's at least be honest about what we are doing. Why aren't they Open Data? Because their licenses are not Open Data licenses: Not Open Data : Edmonton: " The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and for any reason ..." - from Terms of Use Not Open Data : Vancouver: "The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and for any reason..." - Terms of Use Not Open Data : Ottawa: " The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and f

University visitor @ Australian National University

Tomorrow is (sadly) my last official day* as a university visitor at the Australian National University (ANU), Canberra. I've been here since late June, invited by ANU adjunct and Funnelback chief scientist (and ex-CSIRO) David Hawking, to visit the Algorithms and Data Research Group, School of Computer Science, College of Engineering and Computer Science. I was installed in a lovely office looking out into the campus, where I've been working on large scale journal visualization, a continuation of the Torngat project. I've been working on a couple of things, including applying Mulan to the multi-label problem of the corpus I am working with, so I can get precision and recall to evaluate this method empirically. My productivity has been hampered by a recurring stomach problem (which appears to be gone this last week: yay!), so I've not progressed as much as I would have wanted to.... :-( At the end of last week I gave a presentation at CSIRO (in the same bui

E-Government ICT Conference

The recent conference E-Government ICT Professionalism and Competences Service Science (IFIP International Federation for Information Processing, IFIP 20th World Computer Congress, Industry-Oriented Conferences, September 7-10, 2008, Milano, Italy) is of interest to those involved with ICT in government and eGovernment in general (although the conference is rather EU-centred). Note the content is behind a pay-wall, so you can't read the articles unless you belong to an institution that has a subscription or you have one yourself. Of interest: Why is True eGovernment still difficult to be achieved? E-Government For Small Local Government Organizations A normative approach to democracy in the electronic government framework IT skill requirements in Public Administration How to move forward and implement e-skills on a long term basis Search: "open source": Business Process Monitoring: BT Italy case study The Italian Public Administration Electronic Market: Scenario, Opera

JCDL2010 Research Data Papers

I am at the 2010 Joint Conference on Digital Libraries (JCDL2010) at Surfer's Paradise in Queensland, Australia. Among the many interesting papers are two papers very relevant to those interested in research data issues. [JCDL2011 will be held in Ottawa, Canada June 13-17 2011. I am the general chair for the conference] Digital Libraries for Scientific Data Discovery and Reuse: From Vision to Practical Reality Jillian Wallis, Matthew Mayernik, Christine Borgman and Alberto Pepe Abstract. Science and technology research is becoming not only more distributed and collaborative, but more highly instrumented. Digital libraries provide a means to capture, manage, and access the data deluge that results from these research enterprises. We have conducted research on data practices and participated in developing data management services for the Center for Embedded Networked Sensing since its founding in 2002 as a National Science Foundation Science and Technology Center.

Sydney in Winter is great!

I am in Sydney, soon to go to Surfer's Paradise for JCDL2010. It is winter here in Australia, with Sydney being ~2C overnight and 17-22C during the day. Fortunately the days have been sunny and dry. I had a great tour of the University of Sydney's libraries by John Shipp, university librarian, who was very generous with his time.

Presenting at Code4Lib-North

Using Open Source Tools for Visualization and Semantic Mapping in a Large Scale Article Digital Library [slideshare] - Questions: Anyone working in the cloud? Converting 4TB of TIFFS to PDF in 24hrs: Hadoop + EC2 + S3 = Super alternatives for researchers (& real people too!)

W3C to lead Linked Open Data Camp at WWW2010

Linked Open Data (LOD) Camp - The year open data went worldwide , by Tim Berners-Lee - Open data deployments in electronic Government , by Sandro Hawke - Linked data: What about privacy? , by Thomas Roessler - Linked Open Data: What about Applications? , by Ivan Herman - Selection of Topics of Discussion for the afternoon sessions Afternoon lightning talks subjects: Linked Open Drug Data eGov opportunities Health Care/Life Sciences opportunities Financial and Business Data Crawling the LOD with ldspider Wiki

Article: Open Source Econometric Software shows better performance, accuracy and bug fixing than commercial software

Yalta and Yalta 2010 ( " Should Economists Use Open Source Software for Doing Research? ") examine the reliability, accuracy and bug fixing time for an Open Source econometric software package and five commercial econometric software packages. They find that after 5 years many of the bugs in the commercial software have not been fixed, whereas similar bugs in the Open Source software are fixed and released within a week of the discovery of the bugs. Building on the work done by McCullough in 2004 applying a set of tests called Wilkinson's tests to the five commercial software packages, they re-apply these tests to the new versions of the commercial software, and apply the tests to the Open Source econometrics software Gretl . The idea behind the Yalta and Yalta paper is to evaluate the bug fix time of the Open Source software, and compare this to the fixes -- if any -- and their times, that have been applied to the commercial software, since the 2004 McCullough paper

Preservation of Digital Geospatial Materials

The Journal of Map And Geography Libraries has a special issue, Preservation of Digital Geospatial Materials (Volume 6 Issue 1 2010). Papers describe a national level effort, a state effort, and a complex multi-state effort. All three projects are funded by the U.S. National Digital Information Infrastructure and Preservation Program (NDIIPP):
The National Geospatial Digital Archive: A Collaborative Project to Archive Geospatial Data. Tracey Erwin, Julie Sweetkind-Singer
The North Carolina Geospatial Data Archiving Project: Challenges and Initial Outcomes. Steven P. Morris
GeoMAPP: A Geospatial Multistate Archive and Preservation Partnership. Alec Bethune, Butch Lazorchak, Zsolt Nagy

The Economist special report on information management

Data, data everywhere. The Economist, ISSN 0013-0613, 2/27/2010, Vol. 394, Issue 8671. All too much: Monstrous amounts of data

Open Data and Open Source Tools: Examine BC Power Transmission

"*pybctc* is a python package that makes access to British Columbia Transmission Corporation (BCTC) electric data easier. The British Columbia Transmission Corporation is a crown corporation with a mandate to plan, build, and operate the province of British Columbia's electricity transmission system. It publishes valuable information on electricity generation, transmission, and consumption to its website. This information is useful for many purposes including economic analysis, power trading, electric system study, and forecasting. The first step in using such information is to download it and parse it into useful data structures - a task performed by this library. The processed data normally will feed statistical methods, heuristics, and system models to provide a useful analysis of the British Columbia electric system. The *pybctc* project is hosted at http://bitbucket.org/kc/pybctc The data this library accesses can be found here: B.C. Transmission Corp historical trans

Paper: "Sociological implications of scientific publishing"

Sociological implications of scientific publishing: Open access, science, society, democracy, and the digital divide Ulrich Herb. First Monday , Volume 15, Number 2 - 1 February 2010 Claims for open access are mostly underpinned with science–related arguments (open access accelerates scientific communication); financial arguments (open access relieves the serials crisis); social arguments (open access reduces the digital divide); democracy–related arguments (open access facilitates participation); and, socio–political arguments (open access levels disparities). Using sociological concepts and notions, this article focuses strongly on Pierre Bourdieu’s theory of (scientific) capital and its implications for the acceptance of open access, Michel Foucault’s discourse analysis and the implications of open access for the concept of the digital divide. Bourdieu’s theory of capital implies that the acceptance of open access depends on the logic of power and the accumulation of

Suite of ecology journals moving to open access research data policy

In the recent editorial (Am Nat 2010. Vol. 175, pp. 145–146) of The American Naturalist, it was announced that the journals "The American Naturalist, Evolution, the Journal of Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in evolution and ecology..." would be introducing data archiving policies supporting access, re-use and long term preservation. These policies are to be put in place in one year, and the example policy for The American Naturalist is given: This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allow

Mattress Tobogganing in Quebec

Mid-December the new bed was delivered, and the delivery guys were supposed to take away the old bed. But - as I live in the country and it was snowy - the delivery truck got stuck down the road from my place. While the delivery guys were waiting for the tow truck, they decided to carry up the mattress, box spring and bed frame. They also decided to carry down the old mattress and box spring (wrapped in the plastic that the new items had arrived in). Spontaneously they decided to use these as plastic-wrapped toboggans first to toboggan down my driveway and then carry their toboggans around the corner, to the hill on my street, down to their truck. The mattress proved to be the better of the two, getting some pretty good speed down the road. I almost regret not having kept the old mattress for tobogganing! :-)