Showing posts from 2007

W3C Proposed SPARQL Recommendations

The W3C has just released three SPARQL-related proposed recommendations: SPARQL Query Language for RDF, SPARQL Query Results XML Format, and SPARQL Protocol for RDF.

CSIRO research program takes data leadership

CSIRO has announced a new program, Terabyte Science (press release: From molecules to the Milky Way: dealing with the data deluge), that is oriented around the issues raised by the large volumes of data generated by much of modern science. While this includes the difficult problem of managing large volumes of data, the program will also focus on new ways to analyse and exploit this data. Kudos to CSIRO for recognizing this issue and realizing an important program for data science, important both to their own country and to the rest of us. Hopefully other such initiatives will take hold in other countries. And perhaps we will be seeing some of their work published in CODATA's Data Science Journal. (Disclosure: I am an observer on the Canadian National Committee for CODATA.)

"When Is Open Access Not Open Access?"

The article When Is Open Access Not Open Access? (CJ MacCallum, PLoS Biology) examines the slippery activities of publishers that try to fly the flag of Open Access (with varying degrees of capitalization) but who offer only the free-as-in-beer definition of freedom, as opposed to the Open Access definition, which includes, as well as free-gratis freedom, extensive intellectual property rights permitting unrestricted derivative use. This issue and these distinctions were discussed earlier this year in "Free but not open?" at the PLoS blog. I have noticed that many journals use weasel words like "We conform to open access as defined by SHERPA". The SHERPA definition does not include the extensive IP rights described by Open Access: By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of

Tag Cloud inspired HTML Select lists

I have been working with Tag clouds and other Web 2.0 sorts of things quite a bit lately [see earlier post: Drill Clouds for Search Refinement ] and couldn't help notice that it might be useful to use the Tag cloud "Size reflects frequency/importance" idiom in HTML select lists, so I did a little bit of experimenting (BTW, I did look for these on the Web but didn't find them: it doesn't mean they are not already out there...). So I played with the styles of these elements, and was able to get something that looks like this: Aggregators Blogs Collaboration Joy of Use Podcasting RSS Web 2.0 XHTML Aggregators Blogs Collaboration Joy of Use Podcasting RSS Web 2.0 XHTML I am not sure how the above HTML renders in your browser (Update: Daniel has some info on how/if this works in different browsers ), but here is how it renders in mine (Firefox 2.0.0.4 on Linux (Suse 10.2): It i
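The "size reflects frequency" idiom described in this post can be sketched in a few lines of Python that generate the HTML. This is only an illustration, not the markup used in the original experiment: the function name `select_cloud`, the 80% to 200% font-size range, and the tag counts are all assumptions made up for the example, and (as the post itself notes) browsers differ in whether they honour styles on option elements.

```python
from html import escape


def select_cloud(counts, min_pct=80, max_pct=200):
    """Return an HTML <select> whose option font sizes grow with each count.

    counts maps a tag label to its frequency; sizes are scaled linearly
    between min_pct and max_pct (both hypothetical defaults).
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    lines = ['<select size="%d">' % len(counts)]
    for label in sorted(counts):
        pct = min_pct + (counts[label] - lo) * (max_pct - min_pct) // span
        lines.append(
            '  <option style="font-size: %d%%">%s</option>' % (pct, escape(label))
        )
    lines.append("</select>")
    return "\n".join(lines)


# Hypothetical frequencies for the tags listed in the post.
example_counts = {
    "Aggregators": 3, "Blogs": 12, "Collaboration": 5, "Joy of Use": 2,
    "Podcasting": 4, "RSS": 9, "Web 2.0": 12, "XHTML": 6,
}
print(select_cloud(example_counts))
```

Linear scaling keeps the sketch short; a logarithmic scale is a common alternative when a few tags dominate the counts.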

Intellectual Property articles in CACM

The October issue of the Communications of the ACM has two complementary articles in the area of Intellectual Property. Complementary in that one is on copyright reform and the other is on (software) patents: Does copyright law need to be reformed? Pamela Samuelson. Abstract: Considering the issues involved in developing a simplified new copyright law and associated administrative mechanisms. [Software Patents]: The good, the bad and the ugly. Matt E. Thatcher, David E. Pingry. Abstract: The goal of improving patent quality remains elusive both from an economic and process perspective.

NIH Open Access at Risk in U.S. bill

Peter Suber of Open Access News reports that a U.S. Senate labour bill has recently had an amendment added to it, putting the Open Access mandate of the NIH at risk: The provision to mandate OA at the NIH is in trouble. Late Friday, just before the filing deadline, a Senator acting on behalf of the publishing lobby filed two harmful amendments, one to delete the provision and one to weaken it significantly.

Cyberinfrastructure and Data preservation

Richard Akerman - my colleague here at CISTI - has a couple of excellent pointers to digital preservation and cyberinfrastructure resources at Science Library Pad: -- CLIR cyberinfrastructure short articles: Cyberinfrastructure: It's All about Sharing. Amy Friedlander As We May Rethink . Chuck Henry. -- PV 2007 - Ensuring the Long-Term Preservation and Value Adding to Scientific and Technical Data : The Role of Standards in Managing and Preserving Heterogeneous Scientific Data . John Rumble, Jr. - Information International Associates, USA. Long term digital preservation of scientific and scholarly information: the perspective of the European Commission . Carlos Oliveira - EC, Luxemburg. Preserving and Re-using 20th Century Astronomical Observations . Elizabeth Griffin - Herzberg Institute for Astrophysics , DAO, Canada. Observations on Cost Modeling and Performance Measurement of Long Term Archives . Kathy Fontaine -

Minister of Industry (Canada) Appoints Members of Science, Technology and Innovation Council

It is good to see this advisory body -- promised in the Canadian government's science and technology strategy ( Mobilizing Science and Technology to Canada's Advantage ) released in May 2007 -- has now been created and appointments made . I hope that it will be effective in its activities. Now that it is in place, perhaps this body might lend some focus (and hopefully its support) to various national science activities, initiatives and proposals, such as the recommendations of the National Consultation on Access to Scientific Research Data (NCASRD).

IJDL Special Issue: Connecting digital libraries to eScience

The International Journal on Digital Libraries has a special issue entitled " Connecting digital libraries to eScience". I haven't had a chance to read any of the articles, but they look very interesting, and include some discussion on various scientific data issues, collaboration, repositories, research infrastructure, etc: Connecting digital libraries to eScience: the future of scientific scholarship . Michael Wright, Tamara Sumner, Reagan Moore, Traugott Koch Not by metadata alone: the use of diverse forms of knowledge to locate data for reuse . Ann Zimmerman Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries . Christine L. Borgman, Jillian C. Wallis, Noel Enyedy Collaborative eScience libraries . Linn Marks Collins, Mark L. B. Martinez, Ketan K. Mane, James E. Powell, Chad M. Kieffer, Tiago Simas, Susan K. Heckethorn, Kathryn R. Varjabedian, Miriam E. Blake, Richard E. Luce Pathways: augmenting interoperabilit

New JISC Data Sharing Documents

As part of its DISC-UK DataShare project, JISC has released two documents: DISC-UK DataShare: State-of-the-Art Review, by Harry Gibbs, and the Data Sharing Continuum graphic, by Robin Rice. The former is a summary of recent projects and policy, and introduced me to a number of projects and initiatives that I hadn't previously known about. The latter is a well thought-out view of the data sharing continuum, showing us where we have been (and perhaps for some of us, still are!) and a good idea of where we will/should be going. A good graphic to show to a manager trying to understand the big picture.

Drill Clouds for Search Refinement

I'd like to introduce something I call drill clouds, an extension to tag clouds for search refinement in information retrieval. I will be using an experimental Lucene-based search platform that I have developed, called Ungava (more on this later), which includes my implementation of drill clouds. Note that much of this posting is derived from a posting of mine on drill clouds on the CISTI Lab wiki. Drill clouds are what I call an extension to tag clouds to make them a useful tool for search refinement. That is, to use a tag cloud to refine an existing query by adding new elements to the query through interactions with the cloud. As this results in a kind of drill-down search behaviour, these new clouds have been named drill clouds. Some differences between traditional tag clouds and drill clouds: Tags in drill clouds can be any useful metadata and are not necessarily user applied or exclusively keyword-like (but can include controlled and uncontrolled vocabularies,

New NSF Program: Cyber-Enabled Discovery and Innovation (CDI)

The NSF today announced a very exciting -- at least to me and my research colleagues -- program: CDI seeks ambitious, transformative, multidisciplinary research proposals within or across the following three thematic areas: From Data to Knowledge: enhancing human cognition and generating new knowledge from a wealth of heterogeneous digital data; Understanding Complexity in Natural, Built, and Social Systems: deriving fundamental insights on systems comprising multiple interacting elements; and Building Virtual Organizations: enhancing discovery and innovation by bringing people and resources together across institutional, geographical and cultural boundaries . Congruent with the three thematic areas, CDI projects will enable transformative discovery to identify patterns and structures in massive datasets; exploit computation as a means of achieving deeper understanding in the natural and social sciences and engineering; simulate and predict complex stochastic or chaotic syste

Sustainable Digital Data Preservation and Access Network Partners (DATANET) CFP

The US National Science Foundation Office of Cyberinfrastructure has a call for proposals . From the call: The new types of organizations envisioned in this solicitation will integrate library and archival sciences, cyberinfrastructure, computer and information sciences, and domain science expertise to: provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline; continuously anticipate and adapt to changes in technologies and in user needs and expectations; engage at the frontiers of computer and information science and cyberinfrastructure with research and development to drive the leading edge forward; and serve as component elements of an interoperable data preservation and access network. ...these exemplar organizations can serve as the basis for rational investment in digital preservation and access by diverse sectors of society at the local, regional, national, and international levels,

Scholarly Electronic Publishing Bibliography, 2006

I've just discovered this great resource, created by Charles W. Bailey, Jr., which is a 266-page bibliography (PDF) that includes sections on: Economic Issues; Electronic Books and Texts; Electronic Serials (including Electronic Distribution of Printed Journals); Legal Issues (including Intellectual Property Rights and License Agreements); Library Issues (including Digital Libraries and Information Integrity and Preservation); New Publishing Models; Publisher Issues (including Digital Rights Management); Repositories, E‐Prints, and OAI. This is the latest edition of this bibliography, first described in: Bailey, Charles W., Jr. "Evolution of an Electronic Book: The Scholarly Electronic Publishing Bibliography." The Journal of Electronic Publishing 7 (December 2001).

Open Data for Global Science

The CODATA (Committee on Data for Science and Technology) Data Science Journal has a special issue entitled " Open Data for Global Science " (June 2007) which has a series of excellent articles. It is broken into two parts ( Recent International and National Governmental Data Policy Developments and Analysis of Data Policy Issues ). The Canadian National Consultation on Access to Scientific Research Data (NCASRD) is reported on , and other articles of interest include: Open Data for Global Science: A Review of Recent Developments in National and International Scientific Data Policies and Related Proposals . Paul F. Uhlir. Big Opportunities in Access to "Small Science" Data . Harlan Onsrud and James Campbell. OECD Principles and Guidelines for Access to Research Data from Public Funding . Dirk Pilat and Yukiko Fukasaku. Open Access to Scientific Data: Promoting Science and Innovation . Guan-Hua Xu. Open Data for Global Science . Paul F. Uhlir and Peter Schröder

New Zealand Science and Open Access

In " An Information Revolution ", David Penman discusses Open Access and Open Data (especially as applied to government-funded research) in general, and more specifically as applied to New Zealand science and scientists. While there is some good news: "The Foundation for Research, Science and Technology is now reviewing its data policy and moving towards the norm for the OECD – greater open access for publicly-funded data. Rather than the research provider deciding on access, all information is openly and freely available unless restrictions such as national security, environmental damage (eg, the GPS co-ordinates of threatened species), or clear commercial disadvantage can be justified." He has some blunt - and appropriate - words for NZ scientists: Our researchers will also have to change. No longer can they sit with filing cabinets full of data waiting for the definitive experiment or the life time monograph. Publish quickly in electronic media, make your data a

CIHR announces "Policy on Access to Research Outputs"

The Canadian Institutes of Health Research (CIHR) have announced this new policy, which includes publications AND data. Basically, they have taken a gentle but significant step in opening up the research outputs of the grant recipients they support. It does not impact all forms of publications. Two of the most salient points: 5.1.1 Peer-reviewed Journal Publications "Grant recipients are now required to make every effort to ensure that their peer-reviewed publications are freely accessible through the Publisher's website (Option #1) or an online repository as soon as possible and in any event within six months of publication (Option #2)." and 5.1.2 Publication-related Research Data Recognizing that access to research data promotes the advancement of science and further high-quality and ethical investigation, CIHR explored current best practices and standards related to the deposition of publication-related data in openly accessible databases. As a first step, CIHR will now re

Financial Times on Open Access

In " The irony of a web without science " (Sept 4 2007) James Boyle decries the state of scientific research and describes the limited amount of scientific output - in particular journal publications - that can be accessed in an Open Access manner. The author cannot reconcile what he describes as the "genius of the web is that it is an open network" with the closed and expensive nature of what is modern science and the modern scientific publishing landscape. But the author goes on to say: Thus I do not support the proposal that all articles based on state-funded research must pass immediately into the public domain. But there are more modest proposals that deserve our attention. Pending legislation in the US balances the interest of commercial publishers and the public by requiring that, a year after its publication, NIH-funded research must be available, online, in full... I think the author muddles a number of different ideas here (OA does not imply pub

Fedora to grow to include open access publishing, eScience, and eScholarship

Sandy Payette has plans to expand Fedora to support open access publishing, eScience and eScholarship. With a recent $4.9M grant from the Moore foundation, it looks like she might have the opportunity to do this...

Clifford Lynch on Cyberinfrastructure and E-Research

Clifford Lynch 's closing keynote to the 2007 Seminars On Academic Computing entitled " The Institutional Challenges of Cyberinfrastructure and E-Research " is now available as a podcast . Abstract: It has become clear that scholarly practice and scholarly communication across a wide range of disciplines are being transfigured by a series of developments in IT and networked information. While this has been widely discussed at the national and international levels in the context of large-scale advanced scientific projects, the challenges at the level of individual universities and colleges may prove more complex and more difficult. This presentation will focus on these challenges, as well as the development of truly institution-wide strategies that can support and advance the promises of e-research.

Study: Reduced Open Source developer productivity linked to "restrictive" FLOSS licenses (where "restrictive"=GPL and "non-restrictive"=BSD)

A study by economists from Tel Aviv University and the Centre for Economic Policy Research (CEPR) entitled "Open source software: Motivation and restrictive licensing" [1] (pre-print) looks at the productivity of developers on Open Source projects and concludes: "...that the output per contributor in open source projects is much higher when licenses are less restrictive and more commercially oriented." The authors also observe: "Projects written for the Linux operating system have lower output per contributor than projects written for other operating systems ..." and: "Output per contributor in projects oriented towards end users (DESKTOP) is significantly lower than that in projects for developers." They also observed the median number of contributors in "restrictive" projects (13) to be much lower than in "non-restrictive" projects (35). They chose the 71 most active projects on SourceForge in January 2000 and studied them

Australia talks about Research data archiving

I see how the Australians appear to have the good fortune of having the discussion on research data archiving moving forward, as suggested by the upcoming meeting in September, " Long-Lived Collections: the Future of Australia's research data " at the National Library of Australia. This meeting is a follow-up to some very good efforts, including the Australian government's " Data for Science (DFS)" prepared for the Prime Minister’s Science, Engineering and Innovation Council, and the Australian Partnership for Sustainable Repositories' " Sustainability Issues for Australian Research Data: The report of the Australian eResearch Sustainability Survey Project ". I can only be envious of this activity, given the -- unfortunately -- almost complete vacuum of activity following the release of Canada's National Consultation on Access to Scientific Research Data (NCASRD). The two reports - DfS and NCASRD - are very similar in scope and in reco

Data Archiving of Publicly Funded Research in Canada

Carol Perry presented this revealing study at last year's Access & Privacy Workshop 2006 held in Toronto. Its objectives were: "To assess the attitudes of academic researchers regarding the archiving of data resulting from publicly funded research To assess impediments to the creation of a national data archive program in Canada" She randomly polled 173 SSHRC grant recipients for 2004-2005 (with 75 respondents). Her results: "41% indicated they had current plans to archive their research data Of these, only 18.7% identified an established data archive as a deposit site for their data. 72% were not aware of SSHRC’s mandatory data archiving policy for all grant recipients 90% were not aware that Canada is a recent signatory to the OECD declaration on access to publicly funded data ." and •" In 2001: 60% favoured a national data archive 39% analyzed data created by others •In 2006: 69% favoured a national data archive 48% analyzed data created by others

"Sharing the fruits of science"

University Affairs has an interesting article on Open Science that examines the patents and licensing regime and its impacts on science and the ability to do science. While at times advocating an Open Source-like model of Open Science, the author is a little too wishy-washy and supports hybrid models which are too much of a slippery slope for me. I also don't agree with a number of statements including: But now an international scientific counterculture is emerging. Often referred to as "open science" this growing movement proposes that we err on the side of collaboration and sharing. Counter-culture? I think that he has it backwards: despite the many biotechnologists and biotech companies and other science-based industries that use the patent system to support their business interests - usually encumbering further scientific discovery - the vast majority of scientists - at least working in academia, and of course with exceptions - have long been and will continue, wo

Data Curation Report

Liz Lyon, of UKOLN and DCC, has produced an excellent report titled "Dealing with Data". It is a very applied look at the issues around data curation and preservation and examines "the roles, rights, responsibilities and relationships of institutions, data centres and other key stakeholders who work with data." While it is UK-oriented, most of its recommendations can be applied to other regions. It includes 35 recommendations in eight categories: Co-ordination and Strategy; Policy and Planning; Practice; Technical Integration and Interoperability; Legal and Ethical Issues; Sustainability; Advocacy; Training and Skills. Many of the recommendations resonate with those of the National Consultation on Access to Research Data (NCASRD) here in Canada that I and others helped organize in 2005. Some recommendations of particular interest: REC 2. Research funding organisations should jointly develop a co-ordinated Data Curation and Preservation Strat

Microsoft Open XML efforts good? - British Library. Update

Microsoft Open XML efforts good ? - British Library . For more problems with the OOXML "open" standard: see Slashdot's Microsoft's OOXML Formulas Could Be Dangerous and the original article by Rob Weir, The Formula for Failure . And perhaps of more significance, some real questions from FSF Europe to national standards bodies, perhaps lessons learned (or those which should be learned) from the OOXML standardization fiasco: Six Questions to national standardization bodies: Application Independence? Supporting pre-existing Open Standards? Backward compatibility for all vendors? Proprietary extensions? Dual Standards? Legally safe?

Microsoft Open XML efforts good? - British Library

It seems that - in a BBC article (" Warning of Data Ticking Time Bomb ", discovered at the ACM TechNews for this week)- Adam Farquhar , head of e-architecture at the British Library, has made a rather disappointing comment on Microsoft and the Open XML format: Microsoft has taken tremendous strides forward in addressing this problem. There has been a sea change in attitude. Sigh. This is very sad. The original press release from the U.K. National Archives is here: The National Archives and Microsoft join forces to preserve the UK´s digital heritage . The Wikipedia article on Open XML shows why this is such a disappointing comment. The Do you love Microsoft? blog has an excellent recent posting on this: OpenXML and the British Library - Part 4. And others also used "disappointing" for other MS related comments by Mr. Farquhar : Adam Farquhar's presentation on OOXML disappointing - CyberTech Rambler Having Your Digital Cake and Eating It - Open... And more

Some catching up....

I am rather behind on some posts (for example, I attended JCDL2007 in Vancouver last week, sans wireless, and need to post on some goings-on there) and would like to point out some excellent work presented by a colleague of mine at CISTI: Richard Akerman's presentation at ICSTI 2007 Nancy, titled "Web tools for web reviewers...and Everyone", and at IATUL, titled "Library service-oriented architecture to enhance access to science". [Thanks to Richard for correcting my earlier confusions....]

W3C Releases WSDL 2.0 Recommendation

Today the W3C released version 2.0 of WSDL, which supports both REST-style HTTP and SOAP, and includes a converter to WSDL 2.0.

Nature Precedings

Nature has announced what is basically a repository, Nature Precedings - similar to arXiv.org for physics - for researchers in biology, medicine, chemistry and the Earth sciences to share early findings: "pre-publication research, unpublished manuscripts, presentations, posters, white papers, technical papers, supplementary findings, and other scientific documents". There is no peer review, but staff curators filter out materials that are not legitimate scientific contributions. There are also 13 subject RSS feeds. Of particular interest is how every item is given a DOI or Handle, making it more easily citable. More discussion at O'Reilly Radar and Connotea.

Stewardship of digital research data: a framework of principles and guidelines

Sub-title: "Responsibilities of research institutions and funders, data managers, learned societies and publishers". This draft report from the Research Information Network (RIN), UK, for consultation is a must-read for those wrestling with policies and guidelines concerned with the long-term management, access to, and archiving of digital data generated by the activities of researchers. It outlines a comprehensive policy framework, based around five principles: Roles and responsibilities; Standards and quality assurance; Access, usage and credit; Benefits and cost effectiveness; Preservation and sustainability. This draft report is a follow-up to the excellent January 2007 report: Research Funders' Policies for the management of information outputs and the June 2005 RCUK position on issue of improved access to research outputs, the latter focusing solely on research outputs as publications. Of particular interest are the responses of the various research funding councils in

Ontario Data Documentation, Extraction Service and Infrastructure Initiative (ODESI) - Launched

In what likely will become a busy trend, the Ontario Council of University Libraries (OCUL) has announced a project for the creation of a data service providing researchers access to "a significant number of datasets". ODESI will be part of OCUL's already popular Scholar's Portal. The press release is unclear as to whether this will house only standard data sets (like those from Statistics Canada, etc.) or whether the service will also allow researchers to deposit their data. I would argue that a data deposit archive service is much more important at this time, as described and argued in the National Consultation on Access to Scientific Research Data (NCASRD), of which I was a participant. I also was not able to find any mention of this on the OCUL or Scholar's Portal sites.

Tag Cloud inspired HTML Select lists

I have been working with Tag clouds and other Web 2.0 sorts of things quite a bit lately and couldn't help notice that it might be useful to use the Tag cloud "Size reflects frequency/importance" idiom in HTML select lists, so I did a little bit of experimenting (BTW, I did look for these on the Web but didn't find them: it doesn't mean they are not already out there...). So I played with the styles of these elements, and was able to get something that looks like this: Aggregators Blogs Collaboration Joy of Use Podcasting RSS Web 2.0 XHTML Aggregators Blogs Collaboration Joy of Use Podcasting RSS Web 2.0 XHTML I am not sure how the above HTML renders in your browser, but here is how it renders in mine (Firefox 2.0.0.4 on Linux (Suse 10.2): It is interesting how the browser allocates space: it seems like it uses the largest (tallest) item in t

Geist: Open Data and Open Access

In an article in the Toronto Star ("Science and Tech Strategy a Missed Opportunity"; archived version), Michael Geist strongly advocates that the new Canadian government's science and technology strategy go further and mandate Open Access for articles derived from publicly-supported research (that is, research supported by the Federal research funding agencies: NSERC, SSHRC, CIHR, etc.), as well as the opening of publicly-supported research data ("raw scientific data"). This would better support re-use by both industry and researchers without the existing complicating and onerous licensing regimes that encumber these data.

NSF Community-based Data Interoperability Networks (INTEROP) Proposal Solicitation

A solicitation for proposals has been issued by the US National Science Foundation (NSF) Office of Cyberinfrastructure with the goals of funding projects supporting the re-use and re-purposing of data, data discovery, interoperability and "consensus-building activities and for providing the expertise necessary to turn the consensus into technical standards with associated implementation tools and resources." Scientific data is very expensive to acquire, and much of it cannot be reproduced, due to its temporal nature. Vast resources of data acquired through publicly-funded research languish due to the lack of archiving of these data sets. Much of this exists on the hard-drives and (yes) floppy disks of researchers, much of which is thrown away when the researcher retires. Both due to the loss of dataset, and the lack of standard metadata (some disciplines are better off than others) and tools for the discovery and use (interoperability) of existing data sets, re-use

I am at WWW2007 only for today (Thursday) after attending the W3C meeting from Sunday to Tuesday. I must say that I regret not registering for the rest of the WWW2007 conference, as it has moved to what I believe to be a more relevant, robust venue for web research and activities. This morning I attended the panel session "Building a Semantic Web in Which Our Data Can Participate", moderated by Paul Miller of Talis, with panelists Steve Coast (OpenStreetMap), Peter Murray-Rust (University of Cambridge), Rob Styles (Talis) and Jamie Taylor (Metaweb). It was a very good panel, although the discussion revolved more around getting access to data than around the Semantic Web aspect.

"Everyone uses Linux, because everyone uses Google" - Tim O'Reilly

Tim O'Reilly points out in his presentation at the W3C AC meeting at Banff, Alberta, that since Google is the largest deployed Linux app (the backend Google farm is Linux boxen), and since everyone uses Google, therefore everyone uses Linux.

"Embracing Web 3.0"

In IEEE Internet Computing, Ora Lassila and James (Jim) Hendler discuss Web 3.0 in their Embracing Web 3.0 as a union of (some) parts of Web 2.0 and the Semantic Web.

W3C AC Meeting

I've made my way to Banff, Alberta for the May 2007 W3C advisory committee (AC) meeting (I am the NRC's W3C AC rep). I will be reporting on various aspects of the meeting.