Friday, September 17, 2010

Mars Inc. Cacao Genome Database claims Open Access, public domain: falls short

This initially looked very promising: Mars, along with a number of collaborators (USDA, IBM, Clemson University Genomics Institute, Public Intellectual Property Resource for Agriculture at the University of California-Davis, National Center for Genome Resources, Center for Genomics and Bioinformatics at Indiana University, HudsonAlpha Institute for Biotechnology, and Washington State University), has sequenced the cacao genome and released it as "Open Access" and "public domain" for the benefit of all, at a site called the Cacao Genome Project:
McLean, VA – Today, Mars, Incorporated, the U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS), and IBM released the preliminary findings of their breakthrough cacao genome sequence and made it available in the public domain.
- From the Mars Inc. press release 15 September 2010
A quote from the Independent article on the release ("First rice, then wheat – now cocoa genome unravelled", 15 Sep 2010), from one of the collaborators on the project:
Professor Shapiro, a molecular biologist, said: "We thought: 'Let's put this in the public domain so everyone has free access to it for eternity'. It could be patented and it can't be now. We have full open access."
"public domain"

"full open access"

As this is data, we could also be talking about Open Data.

Let's take a look at how 'open' this Cacao Genome Project is by examining the fine print (of the license):

In order to get access to the data, you have to get an account (no anonymous access; obligatory registration is pretty counter-Open Access and arguably not 'public domain'). In order to get an account, you have to agree to a license.

Registration & license

From the license:
The Provider is making available the information and data found in the cocoa genome databases for general information purposes for scientific research, germplasm conservation and enhancement such as plant breeding, technical training, general education, academic use, or personal use.
Restricted use, which appears not to include commercial use. So this is more of a GPL-ish license than a BSD-ish one (before anyone calls me out: I am not saying the GPL is NOT commercial, just that it is generally viewed as less commercial-friendly than BSD).

Moving on:
Anytime the User consults the data base through the cocoa genome database web site, he/she shall be bound to the same obligations under this IAA. Should the User store the information and data for future use he/she shall be bound to the same obligations under this IAA.

The User shall not transfer the information referred to in this agreement, or any copy of them, to a third party without obtaining written authorization from the Providers which will only be provided subject to the third party user entering into this same IAA.

Wow. That is particularly extraordinary. A WTF moment.

Fortunately I didn't agree to the license so I AM able to talk about it now.

Not allowing third parties to see a license is inherently incompatible with the ideas of Open Access, Open Source, Open Data and public domain.

It is simply bizarre in these modern times.

Moving on:
The User shall not claim legal ownership over the information and data found in the data base nor seek intellectual property protection under any form over these information, data and data base. For clarity, the user agrees not to claim any of the sequences disclosed in these databases in any patent application.

Translation: Don't claim legal ownership, because we own the IP for the data AND the sequences, and (maybe) we will be claiming patents, etc., some time in the future. I have not been able to find anything on the site to the contrary (see 'Deluded or disingenuous?' below).

Moving on:
However, the foregoing shall not prevent the User from releasing, reproducing or seeking intellectual property protection on improved seeds or plants that may be developed using the information for purposes of making such seeds or plants available to farmers for cultivation.
This appears to allow commercial use of the database ("making available" can include selling the seeds), which seems to conflict with the earlier clause restricting the permitted uses.


Clearly, this data set has not been released as Open Access and certainly not released into the public domain.

Instead of Open Access or public domain, they have a restrictive license, which allows gated access for a restricted set of uses.

They should therefore not be claiming Open Access or public domain for this data.

Deluded or disingenuous?

The "About" page of the Cacao Genome Project claims that the license is in place to defensively block patents on the sequences. While this may be true, claiming an Open Access AND public domain release of the data is either disingenuous or deluded.
Public access to the genome will be available permanently without
patent via the Cacao Genome Database. Before viewing the data, users
have to agree that they will not seek any intellectual property
protection over the data, including gene sequences contained in the
database. The Information Access Agreement allows any cacao breeders
and other researchers to freely use the genome information to develop
new cacao varieties. This allows for a level playing field and a
healthy competitive environment that will ultimately benefit the
sustainability of cacao production in the long term.

'Free' as in 'beer' they should have said.


newsreader said...

Not only that, but during the first day of the website, it also stated that if you use the data you could not publish any articles with it until some unspecified time in the future, not that I would ever know what to do with it;) but someone out there must ..... so where does that leave them? but it changed the next day... so which version is accurate if you enter the first day...

Egon Willighagen said...

Thanx for writing up this detailed analysis!

Anders Norgaard said...


I agree that the comparison with GPL is not spot-on (in particular because the GPL does not have field-of-use restrictions).

A comparison with the Creative Commons NC clause would be more correct, I think.


Anders Norgaard said...

To be more precise. The field-of-use restrictions are not GPL-ish. They are incompatible with the GPL (and free software).


Jenny Reiswig said...

I'm not sure they mean you can't share the license. I think they mean you can't share the information which is the subject OF the license, ie, the content of the database. IOW, you can't save a copy of the database and give it to someone who hasn't agreed to the terms. It could be clearer though.