Re: Optimal OA IR Preprint and Postprint Deposit and Withdrawal Policy

From: Stevan Harnad
Date: Fri, 11 Aug 2006 21:03:02 +0100

On Fri, 11 Aug 2006, Simeon Warner wrote:

> On Fri, 11 Aug 2006, Stevan Harnad wrote:
> > SH:
> > UNREFEREED PREPRINTS: If you want authors to be willing to deposit
> > their unrefereed *preprints* at all, you *must* allow them to remove
> > them at will, instantaneously.
> >
> > (It is a good and useful author practice to self-archive preprints:
> > it establishes priority, it elicits corrective peer feedback, it
> > creates a historic record of stages of development of a work, it
> > accelerates and increases research impact and progress. But if the
> > institution imposes a foolishly oppressive removal policy, authors
> > will simply be discouraged from taking the useful step of depositing
> > their unrefereed preprints in the first place).
> SW:
> I disagree with Stevan here, and this is not the policy we follow at
> arXiv. If you expect a preprint to allow authors to establish priority
> then you are saying that the preprint has become part of the scholarly
> record and it should thus not be removed. In arXiv we allow authors to
> post a withdrawal notice but old versions remain publicly available (in a
> very small number of cases of copyright infringement and personal insult we
> have removed articles).

First, the (many) points of agreement with Simeon:

(1) Yes, all things being equal, it is greatly preferable not to remove
deposited documents, whether preprint or postprint, hence removal should
not be encouraged.

(2) Yes, depositing a pre-refereeing preprint is a good way to establish
priority, even before formal publication.

(3) Yes, depositing pre-refereeing preprints is in any case a good practice,
beneficial to research progress, especially in fast-moving, early-uptake
fields, and is to be encouraged.

(4) Yes, a scholarly record of pre-publication stages of research reports
is of interest and value in and of itself.

But now the disagreements:

(i) An Institutional Repository (IR) is not the same thing as a Central
(uni-disciplinary or multidisciplinary) Repository (CR) like Arxiv or
PubMed Central.

(ii) A pre-refereeing preprint is not the same as a refereed postprint.

(iii) The first and most fundamental goal of the Open Access movement is to
provide Open Access to the published, peer-reviewed research literature.

(iv) Open Access to pre-refereeing preprints is and must remain an optional
*bonus* that the author may or may not provide, temporarily or permanently,
over and above access to the refereed postprint.

(v) Open Access to the peer-reviewed postprint is a necessity, across all
disciplines, to supplement Toll Access (via journal

(vi) Open Access to the unrefereed preprint is not a necessity, not
necessarily discipline-universal, and should not be portrayed as such.

(vii) Central Repositories (CRs) evolved on the basis of spontaneous, voluntary
self-archiving, of both preprints and postprints.

(viii) Institutional self-archiving is a matter of systematic institutional
policy, and pertains specifically to refereed, published postprints.

(ix) Institutional self-archiving is (largely) restricted to the institution's
own authors self-archiving their own work: preprints and postprints.

(x) Institutions can and should control the content of their IRs (mainly by
restricting it to their own researchers' output and by ensuring that it includes
all the institutional published postprint output).

(xi) The fact that institutional employees are the self-archivers gives
IRs a level of control and answerability that superordinate CRs like
Arxiv, in which anyone in the world can deposit, do not and cannot have
(although research-funder CRs are a partial exception).

(xii) But for neither IRs nor CRs should access-provision (self-archiving)
be conflated with publication, nor preprints (provisional) with postprints
(peer-reviewed, published, and permanent).

> For a thought experiment to help with this, imagine [depositing]
> multiple solutions to some problem to an archive and then removing
> all but the correct one at some later date. Is that a reasonable way
> to establish priority?

No. The reasonable way to establish priority is to deposit the unrefereed
preprint in your IR (or CR) and then to get it refereed and published
in a refereed journal. If the author of the
published version is no longer interested in asserting or preserving
pre-publication priority (for some unfathomable reason), he can remove
the unrefereed preprint (although downloaded, cached and harvested
residues may still perdure). The canonical version is, always was,
and will continue for the foreseeable future to be the published,
peer-reviewed, "certified" version: the postprint.

Corrections are another matter. In principle, any version could turn
out to contain an error, detected later: the unrefereed preprint or the
refereed postprint. The difference is that the unrefereed preprint can (in
principle) be deleted (not necessarily in practice, as ghostly remnants,
downloaded or cached elsewhere, can return to haunt the author). The
published version can only be formally "retracted," but it cannot be
"unpublished." It cannot be withdrawn from the bookshelves and the
hard-disks of the world, nor from the annals of the journal in which
it was published. Corrected post-publication updates, however, can be
disseminated too.

So please don't conflate preprint self-archiving, which is a (possibly
temporary and ephemeral) way of providing early (risky) access to
unrefereed research, with postprint self-archiving, which is a way of
supplementing access to refereed, published research.

> I think the option to allow authors to remove e-prints is simply an
> unpleasant compromise that may be necessary to help populate repositories.

Again, unrefereed, unpublished preprints and refereed, published postprints
are being conflated here, as are preservation-archiving and access-archiving:

Self-archived drafts can be disinterred from the archive: "un-archived." I
agree that this should be discouraged, wherever it is unnecessary,
but I don't find it at all unpleasant to allow authors the permanent
option of withdrawing unpublished work from public view if they so wish
(and not merely as a sop for enticing reluctant self-archivers to go
ahead and self-archive!). That's the difference between publishing
something and merely providing access to it. Publishing is archival,
permanent and irreversible. Access-provision is not.

> One could hope that the option might later be removed in a
> bait-and-switch move. This was how it played out in arXiv though it was
> not thought of in that way. Versions have been stored since 1997 but
> before that a revision overwrote the previous version.

With all due respect, I think Arxiv was an important milestone in the
evolution of self-archiving, Open Access, and Institutional Repositories,
but it is neither the optimal model nor (I believe) the wave of the
future. The wave of the future (thanks to OAI-interoperability) is
(I believe) distributed local-institutional self-archiving of each
institution's own research output in its own IR, not central, Arxiv-style
self-archiving. Central harvesting -- Oaister-, citebase-, citeseer-,
scirus- and google-scholar-style -- will take care of the rest,
harvesting the distributed OA IR contents seamlessly into searchable
central "virtual archives" ("VRs").

Research institutions (universities, mostly) have an interest in two things:
(1) maximising the usage and impact of their research output, by maximising
access to it and (2) preserving a permanent record of their research output.

Self-archiving institutional research output (preprint and postprint)
serves the purposes of both (1) and (2), but only (1) requires that the
output be made Open Access; (2) would be equally well-served by Closed
Access self-archiving. And the only thing an institution can insist upon
being deposited in its permanent archive is the author's final, refereed,
published drafts; authors are well within their rights and reason to
reserve the prerogative to decide for themselves what pre-refereeing
drafts they wish to grant access to, and which of them, if any, they
wish to retain in the permanent record.

This being the online, networked age, however, the following unprecedented
sequence can happen (and no doubt has and will): An unrefereed preprint
is posted publicly only, read, used and cited, and then withdrawn,
orphaning links and citations (unless the users/citers preserved a
draft). This is not good for scholarly progress, and a solution will
evolve. The most likely solution is that institutions will make their
authors answerable for what they post publicly in their IR, at least to
the extent of requiring them to leave a Closed Access version
of it in the archive permanently -- with a URL or DOI that permanently
identifies it, but does not necessarily provide public access to the full
text itself. Under special circumstances, referees, official auditors,
etc., should be able to apply to the institutions for access to the full
text, in cases of scholarly dispute about what it had contained.
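
For IR implementers, the withdrawal rule just described might be sketched
roughly as follows. This is only an illustration under stated assumptions,
not the behaviour of any existing repository software: the class and method
names, the "authorised" flag standing in for referees and official auditors,
and the example identifier are all hypothetical. The refusal to withdraw a
postprint is likewise one reading of the policy, not something an IR must do.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PREPRINT = "unrefereed preprint"
    POSTPRINT = "refereed postprint"


class Access(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class Deposit:
    identifier: str  # permanent URL or DOI; survives withdrawal
    kind: Kind
    full_text: str
    access: Access = Access.OPEN

    def withdraw(self) -> None:
        """Revert a preprint to Closed Access. The record is never deleted."""
        if self.kind is not Kind.PREPRINT:
            # Assumption: the withdrawal option covers only unrefereed,
            # unpublished drafts, not the refereed postprint.
            raise ValueError("only unrefereed preprints can be withdrawn")
        self.access = Access.CLOSED

    def read(self, authorised: bool = False) -> str:
        """Return the full text if it is Open Access, or if the requester
        (e.g. a referee or auditor in a dispute) has been authorised."""
        if self.access is Access.OPEN or authorised:
            return self.full_text
        raise PermissionError(f"{self.identifier} is Closed Access")


# Hypothetical usage: a preprint is posted, cited, then withdrawn.
draft = Deposit("hdl:example/123", Kind.PREPRINT, "draft text")
draft.withdraw()
```

After withdraw(), the identifier still resolves to a record, so links and
citations are not orphaned, but read() refuses the full text to anyone who
has not been authorised.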

Why leave the option to allow the publicly posted preprint to revert
to Closed Access status? Because if it did contain an error, leaving
it publicly accessible -- even if there are links and pointers to
corrected versions and updates -- leaves open the possibility that an
unwitting user will access the erroneous version. The probability is low;
and even withdrawal does not reduce that probability to zero (because
of likely downloaded and harvested residues here and there); but the
sensible, scholarly policy for an IR is to support the withdrawal of
unrefereed, unpublished work, while formally discouraging it.

That is the long and short of it. It has nothing whatsoever to do
with "unpublishing" published work. And, yes, the difference between
peer-reviewed publications and unrefereed self-postings is a profound
and important one, even in the OA age. The official scholarly record is
the *published* record, not the "posted" record.

Stevan Harnad
Received on Fri Aug 11 2006 - 21:16:54 BST
