Open Access and Open Data

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 23 Jan 2008 20:56:21 +0000

    Hyperlinked version:
    http://openaccess.eprints.org/index.php?/archives/353-guid.html

The arch-analyst of apertivity, Richard Poynder, has published yet
another excellent interview, this time with Peter Murray-Rust, a
dedicated advocate of Open Data (OD).
http://poynder.blogspot.com/2008/01/open-access-interviews-peter-murray.html

Here are a few comments on some important differences between Open
Access (OA) and Open Data (OD).

The explicit, primary target content of OA is the full-texts of all
the articles published in the world's 25,000 peer-reviewed scholarly
and scientific journals. This is a special case, among all texts,
partly because (i) research depends critically on access to those
journal articles, because (ii) journals are expensive, because (iii)
authors don't seek or get revenue from the sale of their articles, and
hence have always given them away to any would-be user, and because
(iv) lost access means lost research impact.

Research data are also critical to research progress, of course, but
the universal practice of publishing research findings in refereed
journal articles has not extended to the publication of the raw data
on which the articles are based. There have been two main reasons for
this. One was the capacity of the paper medium: There was no
affordable way that data could be published alongside articles in
paper journals. The other was that not all authors wanted to publish
their data, or at least not right away: They wanted the chance to
fully data-mine the data they had themselves gathered, before making
it available for data-mining by other researchers.

The online era has now made it possible to publish all data affordably
online. That removed the first barrier (although there are still
technical problems, which Peter Murray-Rust and others discuss and are
working to overcome). But the question of whether and when an author
makes his data open is still a matter for the author to decide.
Perhaps it ought not to be the author's choice -- but that is a much
bigger and more complicated question than OA (for in OA all authors
already want to make their published articles freely accessible
online).

That difference in scope and universality is one of the reasons the OA
and OD movements are distinct ones: OD has both technical and
political problems that OA does not have, and it is important that OA
should not be slowed down by inheriting these extraneous problems --
just as it is important that OD should not be weighed down by the
publisher copyright problems of OA (which do not apply to OD for the
simple reason that the authors do not publish their data, hence do not
transfer copyright to a publisher).

So far, this is all simple and transparent: OA and OD have different
target contents, with different problems to contend with. OA's
solution has been for researchers' institutions and funders to mandate
the self-archiving of all of OA's target content, making it free for
all online. But an interesting overlap region is thereby created
between OA and OD: for article texts are themselves data! And one of
the most important purposes for which the OD movement has sought to
make data freely available online -- apart from the purpose of making
it available for collaboration and use by all researchers -- is
data-mining, by individuals as well as by software, and for
re-publication in further 3rd-party online databases. Data-mining can
be done not only on raw research data, but on article texts too,
treating them as data: text-mining.

Here too, the interests of OA and OD are perfectly compatible and
complementary -- except for one thing: If text-minability and
3rd-party re-publication were indeed to be made part of the definition
of OA (i.e., not just removing price barriers to access by making
research free for all online, but also "removing permissions barriers"
by renegotiating copyright) then this would at the same time radically
raise the barriers to achieving OA itself (just as insisting on making
the paper edition free would), making it contingent on authors'
willingness and success in renegotiating copyright with their
publishers.

The online medium itself had been the critical new factor that had
made it possible to remove price barriers to access, by making
research articles toll-free online. But the price for going on to
insist on the removal of both price barriers and "permissions
barriers" jointly, as part of the very definition of OA, would have
been to raise the problem of overcoming permissions barriers as a
barrier to overcoming price barriers! For the new online medium that
made toll-free online access possible, did not, in and of itself,
redefine copyright, any more than it redefined ownership of the paper
edition.

Toll-free online access (OA) will lead to copyright reform (and
publishing reform, and perhaps eventually also to the demise of the
paper edition). But the online medium alone, in and of itself, simply
made toll-free online access possible -- and that is hence the proper
definition of OA. (After all, copyright retention by authors was
perfectly possible in the paper era. In and of itself, it is not an
online matter at all -- although the online medium, and OA itself,
will eventually lead to it.)

Peter Murray-Rust is right that there was some naivete about some of
this at the time of the drafting of the BOAI definition of OA (which I
signed, even though I later opted for an updated definition of OA, one
that resolved this ambiguity in favor of immediate OA and its capacity
to grow). More than naivete, there was ignorance and lack of
foresight, both about the technical possibilities and about the
practical obstacles. It was the online medium that had made OA
possible: Toll-free access for all users had not been possible or even
thinkable in the paper era, either to articles or to data, for both
economic and practical reasons. But with the advent of the online era,
toll-free access online became thinkable, and possible. Indeed it was
already within reach: The only thing authors had to do was to make
their articles and data accessible free for all, online.

But most article authors did not make their articles freely accessible
online -- even though they all, without exception, sought no income
from their sale, wanting them only to be used, applied, cited and
built upon. Most authors remained paralyzed because (1) they were
worried about copyright and because (2) they didn't know how to
provide OA, imagining that it might require a lot of time and effort.

The solution was Green OA self-archiving mandates on the part of their
universities and funders, as an extension of their already existing
publish-or-perish mandate. In particular, the IDOA
(Immediate-Deposit/Optional-Access) Mandate requires researchers to
deposit their articles in their Institutional Repositories (IRs)
immediately upon publication (with access temporarily set to Closed
Access for those journals that impose an access embargo period).

The IDOA solution works for OA -- it provides immediate OA for all the
articles that are published in the 62% of journals that already
endorse immediate OA. And for the 38% that do not, the articles are
deposited as Closed Access; the IR's semi-automatic "email eprint
request" button then provides users with almost-immediate, almost-OA
during any embargo period.
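
The IDOA access rule just described can be sketched in a few lines of
Python. This is only an illustration, not the code of any actual
repository software: the names Deposit, embargo_ends and access_status
are hypothetical, and the sketch assumes that a journal which endorses
immediate OA is simply recorded as having no embargo date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Deposit:
    """A hypothetical record of one article deposited in an IR."""
    title: str
    deposited_on: date
    # Assumption: None means the journal endorses immediate OA.
    embargo_ends: Optional[date] = None


def access_status(deposit: Deposit, today: date) -> str:
    """Return how a user can obtain the deposited full text on a given day."""
    # Every article is deposited immediately; only the access setting varies.
    if deposit.embargo_ends is None or today >= deposit.embargo_ends:
        return "open"
    # During an embargo the deposit is Closed Access, but the user can
    # still ask the author for a copy via the "email eprint request" button.
    return "closed: email eprint request"
```

The point of the sketch is that the deposit itself is never optional
or delayed; the embargo affects only which of the two access routes
the user sees.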

But this solution does not work for OD, because (a) depositing data
cannot be mandated, only encouraged, and because (b) making
article-texts re-usable by 3rd-party text-miners and re-publishers as
data requires permission from the copyright holder. That is not part
of IDOA, and the "email eprint request" button does not cover it
either.

So the strategic issue is whether to insist on something stronger than
IDOA -- at the risk of not reaching consensus on any mandate at all --
or to wait patiently a little while longer, to allow IDOA mandates to
become universal, generating toll-free online access (OA), with its
immediate resultant benefits to research and researchers -- and to
trust that the pressure exerted by those very benefits will lead to
the demise of embargoes as well as to OD (for both data and texts) in
due course.

I would accordingly urge patience on the part of the OD community, as
well as the Gold OA (publishing) and copyright-reform communities
(even though I am by no means patient by nature myself!). Their day
will come soon too!

But first, please allow Green OA to take the natural course that is
now wide open for it, paving the way with universal IDOA mandates
generating toll-free online access to research, and all its immediate
benefits. The strategic course to take now is to allow those mandates
to propagate globally. This is not the time for over-reaching, raising
the ante for OA higher than what the mandates can provide, and thereby
only jeopardizing their chances of being adopted in the first place.

    Brody, T., Carr, L., Gingras, Y., Hajjem, C., Harnad, S. and
    Swan, A. (2007) Incentivizing the Open Access Research Web:
    Publication-Archiving, Data-Archiving and Scientometrics. CTWatch
    Quarterly 3(3).
    http://eprints.ecs.soton.ac.uk/14418/

Stevan Harnad
AMERICAN SCIENTIST OPEN ACCESS FORUM:
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/

UNIVERSITIES and RESEARCH FUNDERS:
If you have adopted or plan to adopt a policy of providing Open Access
to your own research article output, please describe your policy at:
    http://www.eprints.org/signup/sign.php
    http://openaccess.eprints.org/index.php?/archives/71-guid.html
    http://openaccess.eprints.org/index.php?/archives/136-guid.html

OPEN-ACCESS-PROVISION POLICY:
    BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
    http://romeo.eprints.org/
OR
    BOAI-2 ("Gold"): Publish your article in an open-access journal if/when
    a suitable one exists.
    http://www.doaj.org/
AND
    in BOTH cases self-archive a supplementary version of your article
    in your own institutional repository.
    http://www.eprints.org/self-faq/
    http://archives.eprints.org/
    http://openaccess.eprints.org/
Received on Wed Jan 23 2008 - 21:03:24 GMT
