Andrew Dalke
2012-08-29 13:47:36 UTC
Perhaps the Blue Obelisk "Open Data" page could describe what one should do in order to make their datasets open, or to disclaim any legal protections to data sets?
Here is text which I think helps fill in the "Open Data", "Open Source", and "Open Specification" pages of the Blue Obelisk wiki.
Please note that I have changed some of the Blue Obelisk terms
to be more acceptable to my views and understanding. For example, I
do not believe that an Open Standard essentially requires "community process."
Cheers,
Andrew
***@dalkescientific.com
===============
http://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Open_Source
We use "Open Source" to encompass
free [http://www.gnu.org/philosophy/free-sw.html]
and
open source [http://opensource.org/docs/definition.php] software
as well as software in the
public domain [http://en.wikipedia.org/wiki/Public_domain].
(ODFOSOS doesn't have the same simplicity to it.) We believe
that open source is an essential prerequisite for useful peer
review and that open source research software is
the best way to advance and disseminate knowledge, and we
encourage people to release their own software under an open
license.
Do you want to release your software as Open Source?
The principles of free and open source software are based in
copyright law. Software is essentially text, and in most countries
you, or perhaps your employer, automatically have the right to
prevent others from copying that text. You must actively either
grant others a license to make copies or give up your copyright
and put it into the public domain.
The book "Producing Open Source Software" has an excellent
summary of how to
choose a license [http://producingoss.com/en/license-quickstart.html].
Our recommendation is to use the
GNU GPL [http://www.gnu.org/licenses/#GPL] if you want a copyleft license,
the MIT / X Window System License [http://opensource.org/licenses/mit-license.php]
if you are interested in a simple license, and the
Apache License 2.0 [http://www.apache.org/licenses/LICENSE-2.0]
for a permissive license which also explicitly grants a patent license.
If you want to put your software into the public domain then we
recommend you either use the text from the
SQLite public domain dedication [http://www.sqlite.org/copyright.html]
or use the
CC0 public domain dedication [http://creativecommons.org/about/cc0].
Be aware that "public domain" doesn't have a well-defined internationally
recognized meaning, so many people prefer instead the certainty of
a license.
===============
http://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Open_Data
We use "Open Data" to apply to data sets which are part
of the open scientific literature, including data sets
which are published on research web sites. We believe
that facts and collections of facts must be in the public
domain, or at the very least distributed by a data license
which adheres to the
Panton Principles [http://pantonprinciples.org/].
We believe all published data sets, including those which
contain material covered under copyright, should be
released under a permissive data license which allows
anyone to copy, process, analyze, modify, and redistribute
that data, for any purpose.
We specifically exclude personally identifying information from
this belief, and have no position about data which might be
seen as posing a public hazard.
Do you want to release your data as Open Data?
A fact is something like "the triple point of water is 273.16 K"
or "the SMILES for methane is [CH4]". While gene patents
come close, facts concerning the natural world are not,
by themselves, protectable. There must be some transformative
or creative step to make a fact protectable under patent or
copyright law.
The difficulty is that a collection of facts might be protected.
For example, if you've used creative thought and expert opinion
in order to select the elements in the data set, then you may
have a specific database right. If you've arranged the data in
a novel and creative fashion, then that is also protected.
The specific details depend on your country's legal system;
in the UK, a database right exists if there is a "substantial
investment in obtaining [or] verifying" the data in the database.
You can easily see how that might apply to scientific data.
You can make your data Open Data by putting it in the public
domain. This is called a public domain dedication or waiver,
and it is not a license. The easiest and best solution is to use the
CC0 public domain dedication [http://creativecommons.org/about/cc0].
You may think it's okay to use a simpler version, like a
variation of the
SQLite [http://www.sqlite.org/copyright.html]
or
PDB [http://www.rcsb.org/pdb/static.do?p=general_information/about_pdb/policies_references.html]
dedications. The problem is that these only cover copyright
and not the database, moral, or other related rights which might
also be in the data set.
You need to be careful. Just because you collected a bunch of
data doesn't mean that you have the right to dedicate it to
the public domain. For one, your employer or university might
be the legal rights holder, so check with them first. You also need
to make sure that you don't violate the legal rights of others.
For example, suppose you develop a crowd-sourced collection
of chemical protocols, where commentary on each of the protocols
was contributed by others. Those people have a copyright interest
in the result, so you can't distribute your data set without their
permission. For this case we recommend that you require that
contributors license their contribution under the
Attribution-ShareAlike 3.0 Unported License [http://creativecommons.org/licenses/by-sa/3.0/].
And importantly, don't take someone else's proprietary data set,
extract a lot of the records, and release them. Even if you
add a lot of new data, that's just plain illegal under copyright law.
===============
http://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Open_Standards
We believe that Open Standards are necessary to promote scientific
data exchange, analysis services, and data archiving.
By Open Standards we mean that data formats, and control and exchange protocols,
must be documented in such a way that a person
"skilled in the art" [http://en.wikipedia.org/wiki/Person_having_ordinary_skill_in_the_art]
can produce software which can read, write, and exchange data with other
software which implements those Open Standards. Open Standards must be available
on non-discriminatory terms, must not require patent, trademark, or other
licensing, must not require a royalty fee of any sort, must not be protected
using DRM or other technical means, and must not prohibit reverse engineering.
For historical reasons, we acknowledge that older standards published in
the open literature and available through public research libraries and
the publishers may be included as Open Standards. Otherwise, Open
Standards must not require an access fee, registration, or license agreement
in order to access and use the documentation, and must not prevent redistribution
of copies of the standard, including by DRM or copyright restrictions.
We take no position on whether Open Standards must allow derivative works. We
specifically exclude trademark issues so long as software may use the
Open Standard without trademark permission. We take no position on whether
an Open Standard requires public involvement or feedback.
Do you want to release your specification as an Open Standard?
A standard is just documentation. The easiest way to make it an
Open Standard is to release the document under one of the open
content licenses. Do you want others to be able to develop
variations of the standard? Then we suggest using the
Attribution 3.0 (CC BY 3.0) [http://creativecommons.org/licenses/by/3.0/]
or
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) [http://creativecommons.org/licenses/by-sa/3.0/]
licenses. Otherwise, if you want to control the standards document
(which is not the same as controlling the standard) then we suggest
the
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) [http://creativecommons.org/licenses/by-nd/3.0/].
Unfortunately, standards may depend on patents, and patent issues are
not part of the above licenses. You need to know whether your standard
requires any patent. You may also need to require that participants
in the standards development waive their patent claims before joining,
but so far a lack of patent waiver has not caused problems.
You may also release a reference implementation or validation dataset
as part of the specification. These should be released as
Open Source [http://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Open_Source]
and
Open Data [http://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Open_Data],
respectively.