Re: OA IRs Are Research Access Providers, Not Publishers or Library Collections

From: Stevan Harnad <>
Date: Wed, 5 Mar 2008 22:54:35 +0000

On Wed, 5 Mar 2008, John Smith wrote:

> Let's go back to the beginning.
> Local archives/repositories were intended to store items but no-one
> outside the organisation concerned knew what was in them.

John, that's not the beginning! If we fast-forward from the origins of
language to the origins of writing and print, we have texts stored on
paper and then online. These included texts the reader bought (books,
journal articles) and texts the author sold (books) or gave away
(journal articles).

Authors' drafts can be stored online in Closed Access or in Open Access.
"Archives" and "repositories" are creations of our own (sometimes not
too clear or coherent) imaginations -- at least insofar as the online ones
are concerned. Bits are stored on devices (including networked ones),
with various user access privileges.

> OAI-PMH was invented (by the Open Archives Initiative) to enable the
> builders of archives to make the metadata that described their contents
> available to the outside world.
> The Open Access Movement (OAM) came along and adopted this idea of open
> (i.e. visible) archives via OAI-PMH to build Institutional Repositories
> (IRs) as an alternative to subject archives.

Not quite. The OA Movement (not yet thus named) much earlier came up
with the idea of making journal articles freely accessible online. Until
the Web (1990s), this could only be via FTP sites (1980s). Then it was
via websites. And with the OAI-PMH (1999) it became possible via any
OAI-compliant site -- and these in turn could be harvested (for various
reasons, among them the creation of virtual subject archives).
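To make "harvested" concrete: an OAI-PMH harvester asks any compliant site for its records with a plain HTTP GET. The sketch below only builds the request URL. `verb=ListRecords`, `metadataPrefix=oai_dc` and the optional `from` date are standard OAI-PMH parameters; the base URL is a hypothetical placeholder. Note that what comes back is metadata (here, Dublin Core), which may or may not point to a deposited full text.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a harvester."""
    # Parameter names are those defined by the OAI-PMH specification.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records added/changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
print(list_records_url("https://repository.example.org/oai", from_date="2008-01-01"))
```

A virtual subject archive is then little more than the result of running such requests against many compliant sites and merging what comes back.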

> This is where the line crossing started. In the original OAI proposal
> the metadata exposed described the actual content of the repository but
> the OAM view (at least some members of it) was that the IR was a record
> of research outputs and a provider of access to some variant of those
> outputs. Under this view, metadata is not a description of the content of
> the repository; it is a description of a research output which may have
> a version of that output attached to it. This has led to a situation
> where many (most?) IRs today contain many more bibliographic records
> (metadata with no associated full text) than they do records with
> associated full text.

The current end-state is indeed as described (many IRs devoid of anything
but metadata) but the path to that current end-state (and beyond it!) is
not quite as described!

The original motivation for the OAI-PMH (1999) was to provide a "universal
preprint service" by making all "preprint" archives interoperable. Two
things soon became clear: (1) The real target was not the preprint but
the postprint; and (2) the OAI-PMH was potentially applicable to many
more kinds of digital content besides preprints and postprints.

The protocol then split between its originally intended use
-- to provide free, interoperable online access to research -- and its
subsequently extended use -- to make broader digital content harvestable
and interoperable (and to manage and preserve it).

The original OA agenda (not yet thus named) was temporarily obscured;
and then it was recovered, in 2001, with the founding of the BOAI and
the coining of the term "OA". Meanwhile, OAI-compliant IRs were born,
and to this day most still confuse their potential OA function with
their potential digital content management function.

> There is another area of line-crossing. When IRs were true repositories
> they were meant to be stable environments where a link to a full text
> item was maintained indefinitely and it was the same full text item.

This is the generic digital preservation agenda -- not to be confused
with the OA agenda, which is about research access provision, not
(particularly) about generic digital preservation.

> This
> was the approach adopted by the original subject archives; a new version
> did not overwrite an earlier version. This was done (I believe in part)
> to discourage the depositing of early inaccurate versions because the
> authors knew it was difficult or impossible to retract an item after
> deposit.

I am not sure what the "original subject archives" are: arXiv?

In any case, IRs welcome, and track, successive versions of articles,
pre- and post-publication.

> Now it seems to be becoming common to replace earlier versions as
> later versions become available.

Some authors keep their old drafts deposited; others prefer to remove them.
Nothing fundamental here: Just evolving scholarly practice. No canons to
adhere to (though keeping old unpublished drafts OA when they are not too
embarrassing to you is always better than removing all traces).

> Once you start to do this you no longer
> have a repository, you have a publications list with associated full text
> (where available).

I am not sure who has the monopoly on the intended meaning of
"repository": We've already gone through one round of needless and
useless synonym propagation: We used to call them "open archives"
(that's what the OA in OAI was); now we call them IRs. So what? They're
all recent and they're still mostly empty. Let's fill them instead of
worrying about semiology. OA IRs need to be filled with the full texts
of refereed research papers, optionally before publication, obligatorily
at publication, and, optionally, postpublication updates too.

Publication lists continue to be publication lists, and pertain only to
what has been formally published. The rest is listed as "unpublished work."

Access to both published and unpublished work used to be only
on paper. Now it is also online -- and especially via OA IRs.

> I have no problem with this but let's stop calling it a
> 'repository' and call it a 'publication list with associated full text
> (where available)'.

We stopped calling them archives; now we should stop calling them
repositories? (I think we should stop calling them and start filling
them!) They are certainly not "publication lists" -- though they can
be used to generate publication lists -- as well as to access
the publications (if they are deposited). If they contain just metadata,
then they are empty repositories; if they contain other contents instead
of research publications, then they are just generic digital

> [While we are clarifying naming systems, let's get
> rid of the totally misleading 'pre-prints' and 'post-prints'; I propose
> 'pre-refereed' and 'post-refereed'.]

They already mean exactly that (except that not all published journal
articles are refereed, because not all journals are refereed! It is the
refereed ones, however, that are OA's primary target).

Unrefereed or pre-refereeing research papers: preprints.
Refereed or published research papers: postprints.

> The EPrints package has in part been a contributor to this
> confusion. This is because some of its questions are ambiguous under
> the two interpretations or models.

The purpose of EPrints is not to contribute to online storage semiology.
It is to help fill IRs with their primary intended content: refereed

> When entering an item with associated
> full text you are asked to tag it as 'refereed/non-refereed' and you
> are also asked to tag it as 'draft/submitted/etc'.

Yes indeed. It is useful to users and searchers to be able to restrict
themselves to refereed content only, if they wish, and also to know
whether the version they are accessing is the publisher's PDF or the
author's final draft.

There are dependencies between the categories: A submitted draft is not
a published draft; nor is it a refereed draft.

But there aren't two "models" here: There's just one target content
(preprints and postprints). The IR may contain the texts themselves, or
just the metadata that describe them.

> If you are following
> the repository model both questions refer to the full text item being
> entered but if you are following the publications list model the first
> question is about the item itself and the second is about the full text
> associated with it *at the time of entry*. This means two users of EPrints
> can build two different IRs whilst using the same package because the
> questions allow two different (but internally consistent) interpretations.

Let me try to illustrate how these card-catalogue questions are mooted by
the real target use of OA IRs, by researchers: A researcher is searching
for refereed research, and wants access to a refereed draft. The tags
make doing that possible. Or the researcher widens the target and wants
to look at unrefereed drafts too. That's fine too.

The "models" above have nothing whatsoever to do with these users'
needs. They are obsolete.

Few (if any) institution-external users will have any use for a metadata-only
IR. They are looking for the full texts. Hence empty OA IRs are not
OA IRs at all.
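The researcher's search described above, and why metadata-only records drop out of it, can be sketched as a simple filter over deposit records. This is an illustration under stated assumptions, not EPrints' actual data model: the `Deposit` type and its fields (`refereed`, `full_text_url`) are hypothetical stand-ins for the refereed/non-refereed tag and for the presence of an attached full text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deposit:
    # Hypothetical record shape, for illustration only.
    title: str
    refereed: bool                 # the refereed/non-refereed tag
    full_text_url: Optional[str]   # None means a metadata-only record


def search(deposits: List[Deposit], refereed_only: bool = True) -> List[Deposit]:
    """Return the deposits a researcher can actually read.

    Metadata-only records are always excluded, because the user wants
    the text itself; unrefereed drafts are included only when the user
    widens the search.
    """
    return [d for d in deposits
            if d.full_text_url is not None
            and (d.refereed or not refereed_only)]
```

With one refereed postprint, one refereed metadata-only record and one unrefereed preprint, the default search returns only the postprint; widening it adds the preprint. The metadata-only record is never returned either way, which is the sense in which an IR holding only metadata provides no access at all.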

If the user for some reason wants the publisher's PDF, and it's not in
the IR, the user is out of luck (but that's not what OA IRs are for). A
link will take him to the publisher's website.

If an author has been sloppy, and has updated the metadata for the
already deposited preprint and dubbed it "refereed" at the time the
paper was accepted for publication, without adding the postprint, and
that has scholarly consequences, then scholarly practice will tighten up,
in the OA age. No big deal. And nothing to do with the a priori meaning of
"repository" in Plato's formal eternity.

That said, EPrints software and IRs can certainly be used for other
purposes as well (including library card catalogues...)

> It may be that I am the only person feeling confused about IRs and
> everyone else is clear - but are you sure you are clear about the same
> thing :-) .
> Just to complete my reply to Stevan's note:
> Nothing in my previous note was about certification and I'm not sure
> how that crept in.

I thought you were concerned that there was no way to be sure that the
draft an author self-archives is indeed as tagged (refereed, published,
final draft, etc.)... If your concern is that IRs only contain metadata,
it is time for you to start promoting postprint deposit mandates!

> Finally my comment about metrics and citation counts was an attempt to
> suggest a way of providing some form of quality assessment to searchers
> that did not require any reference to peer review or refereeing.

The citations of published, peer-reviewed articles presuppose peer
review. If the item is published, one does not cite deposits, one cites
publications. (Unpublished papers are an extra bonus; they are not the
primary target content of the OA movement.)

Neither downloads nor citations refer directly to peer review, but when
what you are downloading is a refereed postprint, then it does pertain
to refereeing, and likewise when you cite a published work. (You read a
version -- hopefully the refereed, published version -- and you cite the
published work. Only in the case of downloading or citing unrefereed,
unpublished papers is peer review not involved; and, as noted, this is
just an added bonus of an OA IR, not its mainstay or raison d'etre.)

Stevan Harnad

If you have adopted or plan to adopt a policy of providing Open Access
to your own research article output, please describe your policy at:

    BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
    BOAI-2 ("Gold"): Publish your article in an open-access journal if/when
    a suitable one exists.
    in BOTH cases self-archive a supplementary version of your article
    in your own institutional repository.
Received on Wed Mar 05 2008 - 23:04:45 GMT
