Re: The archival status of archived papers

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Mon, 2 Dec 2002 19:40:34 +0000

On Mon, 2 Dec 2002, J Adrian Pickering wrote:

> It isn't just a technical issue.
>
> If you follow Mark's solution you end up with the risk of people citing
> papers that don't contain the information they cite anymore.

Mark suggested that an archived article should be a persisting object,
with a persisting identifier. That seems reasonable. Now if there have
been several successive versions of a paper (which the author wants to
consider as successive versions rather than new papers), then it also
seems reasonable that the archive should link all the successive
versions and point to the latest one by default.

All the prior versions are preserved, and accessible (unless the archive
has a policy allowing withdrawal -- a policy that should not be
encouraged).

It is the user's or citer's responsibility to specify which version they
have used/cited, if there is more than one. That will become part of
good scholarship, just as spelling the author's name correctly is.

So we need both a unique identifier for a generic paper and a unique
identifier for a specific draft of that paper.
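
To make the two-level idea concrete, here is a minimal sketch in Python.
It is not a description of how any existing archive works: the Archive
class, the "v<n>" suffix for draft identifiers, and the identifier
"example/0001" are all assumptions chosen for illustration. The generic
identifier resolves to the latest draft by default, while every earlier
draft stays reachable under its own identifier.

```python
class Archive:
    """Hypothetical archive keeping every draft of every paper."""

    def __init__(self):
        # generic paper id -> list of draft texts, oldest first
        self._drafts = {}

    def deposit(self, paper_id, text):
        """Store a new draft; return its version-specific identifier."""
        drafts = self._drafts.setdefault(paper_id, [])
        drafts.append(text)
        return f"{paper_id}v{len(drafts)}"

    def versions(self, paper_id):
        """List the identifiers of all drafts of a paper, oldest first."""
        count = len(self._drafts[paper_id])
        return [f"{paper_id}v{n}" for n in range(1, count + 1)]

    def resolve(self, identifier):
        """Return (version_id, text) for a generic or a versioned id."""
        # A generic identifier defaults to the most recent draft.
        if identifier in self._drafts:
            drafts = self._drafts[identifier]
            return f"{identifier}v{len(drafts)}", drafts[-1]
        # Otherwise expect "<paper_id>v<n>" (assumed naming convention).
        paper_id, sep, version = identifier.rpartition("v")
        if sep and version.isdigit() and paper_id in self._drafts:
            drafts = self._drafts[paper_id]
            n = int(version)
            if 1 <= n <= len(drafts):
                return f"{paper_id}v{n}", drafts[n - 1]
        raise KeyError(identifier)


archive = Archive()
first = archive.deposit("example/0001", "first draft")
second = archive.deposit("example/0001", "revised draft")
latest_id, latest_text = archive.resolve("example/0001")
```

A citer who cares about the exact draft would cite the value of first
or second; a citer who does not would cite "example/0001" and get
whatever is latest.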

> This is particularly likely when the matter being discussed is
> controversial. A citation strictly refers to a manifestation/version
> not the generic paper.

Correct. It refers to a specific draft, usually with a calendar date and
some other identifying features.

> If the person making the citation wishes to change the citation to a later
> version then that is *their* right. The link is *their* link, not the
> target's. If you have 'published' something then it is in the public domain
> and you must expect people to cite it (and that version).

I mostly agree. But this seems to be covered by providing unique version
identifiers; it does not prevent the Archive from defaulting to the most
recent version -- while offering the earlier versions too.

The view people take on this may depend subtly on whether they are
thinking of the Archive as a centralized one (rather like a journal) or
a distributed institutional one (rather like author-provided reprints).
It is conceivable that different drafts of a paper will be in different
archives. Those distributed versions, too, need to be trackable and
integrated. My technical inexpertise leaves me
unable to propose how to do this, but it is the hardest-case scenario,
and the one we should aim to cover, eventually. Assuming it will all
be in one central archive is probably unrealistic (and unnecessary, in the
spirit of distributed OAI archiving and interoperability).

To my layman's ear it sounds as if every version of a paper will need a
unique version identifier, and in addition, there will need to be some
interoperable way of linking those versions together as different
versions of the same paper. The new scholarship will be, at the gross
level, concerned only with citing the generic paper (without worrying
about version fine-tuning), but careful scholars need to have the
option of specifying the version too, uniquely, for those cases where
it matters.
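
For the distributed case, one possible shape of the integration is
sketched below in Python. Everything in it is an assumption made for
illustration: the DraftRecord fields, the idea that archives agree on a
shared work_id for the generic paper, and the archive names and
locations. It only shows that, given such a shared identifier, drafts
held in different archives can be merged into one ordered history.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DraftRecord:
    work_id: str     # hypothetical shared id for the generic paper
    archive: str     # hypothetical name of the holding archive
    location: str    # hypothetical locator of the draft in that archive
    deposited: date  # deposit date, used to order the drafts


def merge_histories(*archives):
    """Group records from several archives by work_id, oldest first."""
    histories = defaultdict(list)
    for records in archives:
        for record in records:
            histories[record.work_id].append(record)
    for drafts in histories.values():
        drafts.sort(key=lambda record: record.deposited)
    return dict(histories)


archive_a = [
    DraftRecord("work-1", "archive-a", "a/12", date(2002, 11, 1)),
    DraftRecord("work-2", "archive-a", "a/13", date(2002, 11, 5)),
]
archive_b = [
    DraftRecord("work-1", "archive-b", "b/7", date(2002, 12, 2)),
]

history = merge_histories(archive_a, archive_b)
latest = history["work-1"][-1]  # the default target for a generic citation
```

The hard part, which this sketch simply assumes away, is getting
independent archives to agree on the shared identifier in the first
place.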

> I agree that archive items should persist and, therefore, the references to
> them. The relationship between the versions should be issue to click
> through too.
>
> Regards the 'user' query, they need to be told not to submit so many
> versions i.e. *think* carefully before submission! This is a matter of
> policy and governs the degree of 'resistance' there is to making
> submissions. There needs to be some, otherwise the quality level drops.

It cuts both ways. Yes, authors should not start archiving willy-nilly
every raw draft and every afterthought. But neither should they feel
constrained from making corrections and updates whenever they are
needed. Authors should know, though, that from the moment they place a draft
into a public open-access archive, it may be read, cited, and pointed to
-- that specific draft -- in perpetuum. That is part of what it means
to have archived something publicly.

I'm sure scholars will easily get a sense for this, as they have for
everything else. In the beginning some will fumble and treat the
archive as a home for labile first drafts or lapidary touch-me-nots, but experience
and feedback will calibrate everyone's practice and reflexes. The
Archives just have to make sure they do not pre-judge or short-circuit
any important options a priori.

Stevan

> A/
>
>
> >Stevan Harnad
> >
> >On Mon, 2 Dec 2002, Mark Doyle wrote:
> >
> > > Greetings,
> > >
> > > On Tuesday, November 26, 2002, at 08:27 PM, Stevan Harnad wrote:
> > >
> > > > Now it is conceivable that the eprints architecture can be slightly
> > > > modified, so that the old, suppressed URL for the deleted paper
> > > > automatically redirects to the new draft if someone tries to access
> > > > the old one. That I have to let Chris reply about. Here I have merely
> > > > explained the rationale for not having designed the archive so a paper
> > > > could be deposited, and then modified willy-nilly under the same URL.
> > > > For that would not have been an archive at all, and user complaints,
> > > > about trying to use and cite a moving target, would have far
> > > > out-numbered depositor complaints about what to do with
> > > > after-thoughts and successive drafts.
> > >
> > > Well, that is one way to look at it. On the other hand, arXiv.org uses
> > > version numbers and the persistent name/id and URL (say hep-th/0210311
> > > and http://arXiv.org/abs/hep-th/0210311) always point to the latest
> > > version, with links to the earlier versions.
> > >
> > > I believe you are advocating a poor design choice here. One cannot
> > > overemphasize the importance of human-friendly persistent names that
> > > are easily converted to URL's for linking and quick location. Patching
> > > the system to redirect to the latest linked version is a hack. Is one
> > > actually able to download the earlier version (which is what was
> > > cited)? Generally, a better approach is to give a good persistent name
> > > to a "work" and not a single manifestation of that work (whether it be
> > > a particular format or a particular version) and then give a reader a
> > > single point of entry into the system that can be bookmarked or cited
> > > reliably which gives a choice of what to download. Cutting off access
> > > to an earlier, citeable version is a mistake. Archives should not
> > > delete items or make them hard to access - rather they should show
> > > items in context and give easy access to an item's history and
> > > versioning with a single identifier for the work taken as a whole.
> > >
> > > Cheers,
> > > Mark
> > >
> > > Mark Doyle
> > > Manager, Product Development
> > > The American Physical Society
> > >
>
Received on Mon Dec 02 2002 - 19:40:34 GMT
