Tuesday, July 27, 2010

It's not Open Data, so stop calling it that...

While it is a great positive change that data is being released through numerous efforts around the world, data release is not the same as Open Data release. A number of Canadian cities have announced Open Data initiatives, but they are not releasing Open Data. They are just releasing data. Of course, this is better than not releasing data. But let's at least be honest about what we are doing.

Why aren't they Open Data? Because their licenses are not Open Data licenses:
  • Not Open Data: Edmonton: "The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and for any reason..." - from Terms of Use
  • Not Open Data: Vancouver: "The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and for any reason..." - from Terms of Use
  • Not Open Data: Ottawa: "The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and for any reason..." - from Terms of Use
  • Not Open Data: Toronto: "The City may, in its sole discretion, cancel or suspend your access to the datasets without notice and for any reason..." - from Terms of Use
All of these licenses also suffer from the additional mis-feature of arbitrary retroactivity:
"The City may at any time and from time to time add, delete, or change the datasets or these Terms of Use. Notice of changes may be posted on the home page for these datasets or this page. Any change is effective immediately upon posting, unless otherwise stated"

These two clauses mean that there is no stability for someone using this data. If something they do or say (data related or not) is not liked by the city whose data they are using, they can lose access. Or, if the city finds that many data users are doing things it does not like, it can change the Terms of Use in a way that affects data those users have already obtained.

How to fix
Make versioning of both datasets and licenses obligatory, and drop the two clauses above. When a dataset is released, it is given a version, and that release is matched to a license version (usually the most recent) that will always apply to that version of the data. Any change to a license generates a new version, applicable only to subsequent releases that choose to use the new license.
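To make the idea concrete, here is a minimal sketch in Python of how such versioning could be modelled. It is not any city's actual system: the class names (License, DataRelease, Catalogue), the dataset name "bus-stops" and the license texts are all hypothetical, invented for illustration. The point it tries to show is that a change of terms creates a new license version and never touches the terms attached to an earlier release.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class License:
    """One immutable version of a license text (hypothetical model)."""
    name: str
    version: int
    text: str


@dataclass(frozen=True)
class DataRelease:
    """One immutable dataset release, bound to exactly one license version."""
    dataset: str
    version: int
    terms: License


class Catalogue:
    """Hypothetical registry of license versions and dataset releases."""

    def __init__(self) -> None:
        self._licenses = []   # every license version ever published
        self._releases = {}   # (dataset, version) -> DataRelease

    def publish_license(self, name: str, text: str) -> License:
        # Changing the terms never edits an old version; it appends a new one.
        version = sum(1 for lic in self._licenses if lic.name == name) + 1
        lic = License(name, version, text)
        self._licenses.append(lic)
        return lic

    def release(self, dataset: str, terms: Optional[License] = None) -> DataRelease:
        if terms is None:
            if not self._licenses:
                raise ValueError("publish a license before releasing data")
            # Default to the most recently published license version.
            terms = self._licenses[-1]
        version = sum(1 for name, _ in self._releases if name == dataset) + 1
        rel = DataRelease(dataset, version, terms)
        self._releases[(dataset, version)] = rel
        return rel

    def terms_for(self, dataset: str, version: int) -> License:
        # The terms of a release are fixed at release time.
        return self._releases[(dataset, version)].terms


catalogue = Catalogue()
catalogue.publish_license("City Data License", "Use for any purpose.")
first = catalogue.release("bus-stops")

# The city later tightens its terms: this creates version 2 of the license.
catalogue.publish_license("City Data License", "Non-commercial use only.")
second = catalogue.release("bus-stops")

# Anyone holding release 1 is still governed by license version 1.
print(catalogue.terms_for("bus-stops", 1).text)
print(catalogue.terms_for("bus-stops", 2).text)
```

Both record types are frozen, so in this sketch there is simply no operation that rewrites the terms of an existing release; the only way to change terms is to publish a new license version and attach it to a new release.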

This is how things work in the Open Source world. It means that if you possess a piece of Open Source software, with a license of a specific version, someone half-way across the world from you cannot turn you into a criminal and/or shut you down by retroactively changing the license. It means that you have stability. Of course, you may be shut out of the next version if they change its license, but that doesn't necessarily shut you down today. You have some level of stability.

An example: an SME builds a business based on data released by the cities. This business perhaps includes data mining tools that reveal some things that some of the cities do not like revealed or discussed. To shut this company out, those cities change the license, or simply cancel or suspend the company's data access (remember: "...cancel or suspend ...without notice and for any reason..."), and the company goes out of business.


So, if you want to release Open Source code or Open Data, you must be willing to accept that it will be used in ways that you (and/or your constituents) may find offensive. That is how it works.

Update: 2010 10 14: Eight Principles of Open Data from Open Government Data Principles:
  1. Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
  2. Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
  3. Timely: Data is made available as quickly as necessary to preserve the value of the data.
  4. Accessible: Data is available to the widest range of users for the widest range of purposes.
  5. Machine processable: Data is reasonably structured to allow automated processing.
  6. Non-discriminatory: Data is available to anyone, with no requirement of registration.
  7. Non-proprietary: Data is available in a format over which no entity has exclusive control.
  8. License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
The above cities' licenses are not compliant with #4 and #6 of these eight principles. See also http://zzzoot.blogspot.com/2010/08/what-is-open-gov-data-sunlight.html

Update: 2010 Nov 7: http://acrosscanadatrails.posterous.com/civicaccess-discuss-importance-of-true-open-d


Luke Closs said...

If I post up some open source code on github, you would still call it open source even though:

I could, in my sole discretion remove access to that code from github without any notice to you.

I could also at any time add/delete/change the code on github or the license of the code.

Yet you'd still call it open source.

And you'd have no stability for that code.

I'm in no way saying the data license is good, just saying it's actually very similar to the guarantees you get for open source code.

Glen Newton said...

No, this is different: in your example, if I have already downloaded the code before you deleted it and am using it in my business, and it has an Open Source license, then you deleting it on github has no effect on me. The business can still continue.

And if you change the license, I am still unaffected: I downloaded the code under the terms of the previous license, which is an Open Source one. No retroactivity.
Therefore, there is some level of stability. You can't shut me down, as I am using the version of the software as per the license the code is associated with.

And so these controlled data licenses are very far from the Open Source one.

John said...

FYI: Most of the Canadian cities copied Vancouver, as Vancouver was the first to really release datasets. That's why the same errors appear to repeat.

At Ottawa Changecamp 2010 a new license was discussed with city officials.

Anonymous said...

Glen, just to update you as you seem to be interested.

As you can probably tell the Cities of Toronto, Edmonton and Ottawa all used similar text as created by the City of Vancouver.

As a group and as individuals we realize that the terms do create restrictions; that has been pointed out to many of us already. Since late last year the 4 cities have been working together to address this issue. We are making great progress, but if you know anything about government, you know that review can take some time.

So stay tuned as we all work together to change our open terms.

Thanks for your insights and comments

Chris Moore
City of Edmonton

Glen Newton said...

Hi Chris,

I am very happy progress is being made. I understand that both the politicians and the lawyers need to be eased into the Open Data realm.

What I am focusing on most now is the versioning of both licenses and data. I would think that the lawyers would prefer versioning too.

If you ever want my direct input on things, please feel free to contact me.