Wednesday, August 29, 2007

Fedora to grow to include open access publishing, eScience, and eScholarship

Sandy Payette has plans to expand Fedora to support open access publishing, eScience and eScholarship. With a recent $4.9M grant from the Moore Foundation, it looks like she may have the opportunity to do so...

Clifford Lynch on Cyberinfrastructure and E-Research

Clifford Lynch's closing keynote to the 2007 Seminars On Academic Computing entitled "The Institutional Challenges of Cyberinfrastructure and E-Research" is now available as a podcast.

Abstract: It has become clear that scholarly practice and scholarly communication across a wide range of disciplines are being transfigured by a series of developments in IT and networked information. While this has been widely discussed at the national and international levels in the context of large-scale advanced scientific projects, the challenges at the level of individual universities and colleges may prove more complex and more difficult. This presentation will focus on these challenges, as well as the development of truly institution-wide strategies that can support and advance the promises of e-research.

Friday, August 24, 2007

Study: Reduced Open Source developer productivity linked to "restrictive" FLOSS licenses (where "restrictive"=GPL and "non-restrictive"=BSD)

A study by economists from Tel Aviv University and the Centre for Economic Policy Research (CEPR) entitled "Open source software: Motivation and restrictive licensing"[1] (pre-print) looks at the productivity of developers on Open Source projects and concludes:

"...that the output per contributor in open source projects is much higher when licenses are less restrictive and more commercially oriented."

They also observe:

"Projects written for the Linux operating system have lower output per contributor than projects written for other operating systems..."

"Output per contributor in projects oriented towards end users (DESKTOP) is significantly lower than that in projects for developers."

Finally, they report that the median number of contributors in "restrictive" projects (13) is much lower than in "non-restrictive" projects (35).

They chose the 71 most active projects on SourceForge in January 2000 and studied them over an 18-month period starting in January 2002, measuring the projects every 2 months for a total of 9 samples. The metrics they used include: source lines of code (SLOC), number of contributors, the "restrictiveness" of the license (GPL = very; LGPL, Mozilla, NPL, MPL = moderate; BSD = non), operating system, age of the project, whether it is a desktop or system application, language (C++ or C = 1; all others = 0), and others. They accounted for differences in LOC across languages by also analyzing the C and C++ projects separately.
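To make this coding scheme concrete, here is a minimal Python sketch of how a single project's record might be encoded. The numeric codes 2/1/0 for "very"/"moderate"/"non" restrictive and the `ProjectRecord` type are my own hypothetical choices (the paper's actual encoding may well differ), and treating SLOC divided by number of contributors as "output per contributor" is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical ordinal codes: the study only labels licenses very/moderate/non.
RESTRICTIVENESS = {
    "GPL": 2,      # "very" restrictive
    "LGPL": 1,     # "moderate"
    "Mozilla": 1,  # "moderate"
    "NPL": 1,      # "moderate"
    "MPL": 1,      # "moderate"
    "BSD": 0,      # "non" restrictive
}


@dataclass
class ProjectRecord:
    """One sampled project (hypothetical structure, not the paper's)."""
    name: str
    license: str
    language: str
    sloc: int
    contributors: int

    def restrictiveness(self) -> int:
        # Look up the assumed ordinal code for this project's license.
        return RESTRICTIVENESS[self.license]

    def c_family(self) -> int:
        # Language dummy as described in the study: C++ or C = 1, others = 0.
        return 1 if self.language in ("C", "C++") else 0

    def output_per_contributor(self) -> float:
        # Assumption: "output" measured as SLOC, divided by contributor count.
        return self.sloc / self.contributors
```

For example, `ProjectRecord("demo", "GPL", "C", 60_000, 13)` would be coded as restrictiveness 2 with a language dummy of 1.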

I do not understand the lag between choosing the projects (January 2000) and the start of data sampling (January 2002). This lag in itself could have skewed the results: the 71 most active projects in 2000 would almost certainly NOT be the most active two years later. I think this may be a major flaw in this study.

I also don't think the sample size is large enough, and I think the sampling method should have been a random selection of projects that met some reasonable criteria, like:
  • had at least C contributors
  • had at least L lines of code contributed over the last M months
  • had at least D downloads over the last M months (though this might penalize very new and very unpopular projects?)
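The selection I have in mind could be sketched in Python roughly as follows. This is a minimal illustration, assuming per-project counts over the last M months are already available; the `Candidate` type and the `sample_projects` function are hypothetical, and the thresholds C, L and D are simply passed in as parameters.

```python
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-project summary over the last M months."""
    name: str
    contributors: int
    recent_loc: int        # lines of code contributed over the last M months
    recent_downloads: int  # downloads over the last M months


def sample_projects(candidates, min_contributors, min_loc, min_downloads,
                    k, seed=None):
    """Randomly pick up to k projects that meet all three thresholds (C, L, D)."""
    eligible = [
        p for p in candidates
        if p.contributors >= min_contributors
        and p.recent_loc >= min_loc
        and p.recent_downloads >= min_downloads
    ]
    rng = random.Random(seed)  # seeded so a sample can be reproduced
    return rng.sample(eligible, min(k, len(eligible)))
```

The point of the design is that the thresholds only define the population; which eligible projects end up in the study is left to chance rather than to "most active at one point in time".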
I also believe that they made another possible error: they observe in their discussion that the median number of LOC per project was 53K for "non-restrictive" and 60K for "restrictive" projects. They suggest that this is not a big difference (they do not appear to statistically check the distribution of LOC per project within each license grouping). But I would suggest that 500 lines of code contributed to a 5K LOC project can often be a more significant contribution than 500 LOC contributed to a 100K LOC project. They should have looked into the effect of normalizing the contributed LOC by the total LOC in the project.
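A quick worked version of the normalization I am suggesting, as a Python sketch (the function name is mine, not the paper's): the same 500 lines are 10% of a 5K LOC project but only 0.5% of a 100K LOC project, a 20-fold difference in relative weight.

```python
def relative_contribution(contributed_loc: int, total_loc: int) -> float:
    """Fraction of a project's total LOC that a contribution represents."""
    if total_loc <= 0:
        raise ValueError("total_loc must be positive")
    return contributed_loc / total_loc


small = relative_contribution(500, 5_000)    # 500 LOC into a 5K LOC project
large = relative_contribution(500, 100_000)  # 500 LOC into a 100K LOC project
print(f"{small:.1%} vs {large:.1%}")  # 10.0% vs 0.5%
```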

I haven't taken too much time to go over all of their experimental design, model & stats....

This study builds on an earlier 2005 study, "The Scope of Open Source Licensing"[2] (pre-print), which is where the authors get their notion of license "restrictiveness". That earlier study found:
"Projects geared toward end-users tend to have restrictive licenses, while those oriented toward developers are less likely to do so. Projects that are designed to run on commercial operating systems and whose primary language is English are less likely to have restrictive licenses. Projects that are likely to be attractive to consumers—such as games—and software developed in a corporate setting are more likely to have restrictive licenses. Projects with unrestricted licenses attract more contributors."
That study used all 40K SourceForge projects available in 2002.

[1] Fershtman, C. and Gandal, N. (2007). Open source software: Motivation and restrictive licensing. International Economics and Economic Policy.

[2] Lerner, J. and Tirole, J. (2005). The scope of open source licensing. Journal of Law, Economics and Organization, 21, 20–56.

Monday, August 20, 2007

Australia talks about Research data archiving

The Australians appear to have the good fortune of seeing the discussion on research data archiving move forward, as suggested by the upcoming September meeting, "Long-Lived Collections: the Future of Australia's research data", at the National Library of Australia. This meeting is a follow-up to some very good efforts, including the Australian government's "Data for Science (DFS)" prepared for the Prime Minister’s Science, Engineering and Innovation Council, and the Australian Partnership for Sustainable Repositories' "Sustainability Issues for Australian Research Data: The report of the Australian eResearch Sustainability Survey Project".

I can only be envious, given the unfortunate and almost complete vacuum of activity following the release of Canada's National Consultation on Access to Scientific Research Data (NCASRD).

The two reports, DFS and NCASRD, are very similar in scope and recommendations, reflecting the similar but not identical situations in the two countries.


Update 2008 09 01:

Tuesday, August 14, 2007

Data Archiving of Publicly Funded Research in Canada

Carol Perry presented this revealing study at last year's Access & Privacy Workshop (2006) in Toronto. Its objectives were:

  • "To assess the attitudes of academic researchers regarding the archiving of data resulting from publicly funded research
  • To assess impediments to the creation of a national data archive program in Canada"
She randomly polled 173 SSHRC grant recipients from 2004-2005, of whom 75 responded. Her results:
  • "41% indicated they had current plans to archive their research data
  • Of these, only 18.7% identified an established data archive as a deposit site for their data.
  • 72% were not aware of SSHRC’s mandatory data archiving policy for all grant recipients
  • 90% were not aware that Canada is a recent signatory to the OECD declaration on access to publicly funded data."

  • "In 2001:
      • 60% favoured a national data archive
      • 39% analyzed data created by others
  • In 2006:
      • 69% favoured a national data archive
      • 48% analyzed data created by others"

"86.7% would not alter their grant-seeking behaviour if SSHRC enforced its data archiving policy"
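To get a feel for the absolute numbers behind these percentages, here is a small Python sketch that converts them into approximate respondent counts. It assumes each percentage is taken over all 75 respondents, which the slides do not actually state (the 18.7% item is explicitly "of these", so I leave it out); `approx_count` is just an illustrative helper.

```python
POLLED = 173
RESPONDENTS = 75

response_rate = RESPONDENTS / POLLED  # roughly 43% of those polled responded


def approx_count(percent: float, n: int = RESPONDENTS) -> int:
    """Approximate head count for a reported percentage.

    Assumption: the percentage is over all n respondents (not stated in the source).
    """
    return round(percent * n / 100)


reported = [
    ("plan to archive their data", 41),
    ("unaware of SSHRC's archiving policy", 72),
    ("unaware of the OECD declaration", 90),
    ("would not alter grant-seeking behaviour", 86.7),
]
for label, pct in reported:
    print(f"{label}: about {approx_count(pct)} of {RESPONDENTS}")
print(f"response rate: {response_rate:.1%}")
```

Under that assumption, roughly 31 of the 75 respondents planned to archive their data, while about 54 had never heard of SSHRC's policy.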

These results are both hopeful and frustrating. On the one hand, reuse of data created by others is up, support for a national data archive is up, researchers perceive little negative impact from SSHRC's data archiving policy, and almost half of the respondents indicated that they were planning to archive their data. On the other hand, nearly 3/4 of the respondents didn't even know about SSHRC's policy, and 9/10 didn't know about Canada's recent OECD commitment. That commitment has led to the recent OECD publication OECD Principles and Guidelines for Access to Research Data from Public Funding, which overlaps a great deal with Canada's National Consultation on Access to Scientific Research Data (2005) and the earlier National Data Archive Consultation, Building Infrastructure for Access to and Preservation of Research Data (2002).

I believe researchers' level of acceptance is high enough to move forward on a national data archive. Clearly there also needs to be a better education campaign by SSHRC and other Canadian research funding bodies, both at the strategic level (read "policy and funding") and at the tactical level (read "engaging, informing and educating researchers").

"Sharing the fruits of science"

University Affairs has an interesting article on Open Science that examines the patent and licensing regime and its impact on science and on the ability to do science. While at times advocating an Open Source-like model of Open Science, the author is a little too wishy-washy and supports hybrid models, which are too much of a slippery slope for me.

I also don't agree with a number of statements including:

But now an international scientific counterculture is emerging. Often referred to as "open science" this growing movement proposes that we err on the side of collaboration and sharing.
Counter-culture? I think he has it backwards. Despite the many biotechnologists, biotech companies and other science-based industries that use the patent system to support their business interests (usually encumbering further scientific discovery), the vast majority of scientists, at least those working in academia and of course with exceptions, have long worked, and will continue to work, in an Open Science environment. This is not to take away from the Open Science movement and what it is trying to do. But it existed before someone decided to call it Open Science, and it is the default mode for most scientists in academia. The tail is wagging the dog a little here...

Thanks to Mary Zborowsky and Michel Sabourin for pointing out this article.

Related article in University Affairs: "The bottom line on open access" by John Lorinc, March 2006.