Re: Central vs. Distributed Archives

From: Stevan Harnad
Date: Thu, 2 Nov 2000 18:28:35 +0000

On Thu, 2 Nov 2000, Greg Kuperberg wrote:

> I certainly think that a standard for interoperability could be useful,
> but it is wishful thinking to suppose that it can tame an anarchy of many
> tiny little e-print archives. In my discipline, when the literature
> is excessively decentralized, as it was entirely before 1998 and still
> largely is, neither authors nor readers have any confidence that papers
> floating around on the Net are permanent. And they are right, because
> no one could promise to keep those papers forever with any credibility.
> ... The fact that the arXiv is so large and so widely used and
> mirrored is a necessary ingredient for assuring permanence.

(1) Archives meeting the conditions to be registered OAI-compliant
data-providers are not likely to be "tiny little" ones (though it is no
problem if some of them are).

(2) Most Eprints Archives are likely to be university-based archives, for
all the university's refereed research, in all its disciplines. That's
hardly tiny (or impermanent) either.

(3) The goal is to free the refereed literature, across disciplines,
now. Once the literature is thus freed the process will be irreversible.

(4) The mechanisms for preserving and navigating it will continue to
evolve and improve, with the whole world's refereed assets in this
distributed basket (suitably mirrored, harvested, cached, backed up, etc.).

(5) The immediate issue is hence not the PERMANENCE of the
self-archived drafts but their EXISTENCE, free for all, now. The
permanence will take care of itself.

> The "self" in self-archiving could mean individuals acting for themselves,
> or it could mean the research community acting for itself by directly
> supporting one or a few archives. I have the feeling that you don't
> see this as an important distinction.

You are right; I think it is a red herring. Most of the individuals in
question (the authors of the refereed literature) are researchers at
universities and research institutions. In principle each of them could
set up his own Eprints Archive and register it with the OAI (and that
would be fine as a start, and would free the literature irreversibly).

But of course the likely, practical strategy is for the researchers'
universities and research institutions (or, more specifically, their
libraries) to create and administer their institutional Eprint Archives
for all their researchers' refereed output, in all disciplines. (We can
have at least as abiding a faith in the durability of the collections
on universities' airwaves, then, as we now have in the durability of
the collections on their shelves.)

> I can't say that this ambitious goal is "within immediate reach" in
> mathematics, because many of us have worked hard to make it happen and
> we see a lot of work ahead. We can't expect all mathematicians to change
> their minds in one day.

You are now talking about something else: You are talking about what it
will take to induce the research cavalry to drink, once they have been led
to the waters of self-archiving.

There's no second-guessing human nature, but my own hunch is that the
motivational structure at the researchers' own institution -- the one
that benefits from (and rewards) the impact of its own researchers'
refereed output, and the one that is today weighed down by the serials
crisis and the limitations that that puts on its own researchers'
access to the refereed output of researchers at other institutions --
may provide just the kind of local incentive for self-archiving that a
centralized, discipline-based entity so far seems unable to provide.

In any case, these two routes to the liberation of the refereed corpus
(centralized and distributed) are complementary (and interoperable!).

> If you think that encouraging many small archives to spring up is the
> magic step, then I simply disagree. Because when we glued together
> many small archives into the math arXiv, the whole was much more than
> the sum of the parts. Even though the math arXiv has only 5% of new
> math papers, and even though it will take years for it to get to even
> 50%, it is at least growing more quickly than all of the Lilliputian
> mathematical archives put together.

I am not a mathematician, but this "whole is greater than the sum of its
parts" argument does not add up for me!

Centralized archiving in maths is at 5% and will take years to get to
50%. What possible reason would there be not to encourage complementing it
by institutional Eprint Archives immediately -- given that they will all be
co-harvested (and mirrored, and cached, etc.) in global virtual archives
anyway, thanks to interoperability?

> other disciplines are sufficiently different that their open archives
> might need separate administration. And that would lead to fragmentation,
> which concerns me more than it does you.

My concern is freeing the refereed literature online, now. There is no
reason it should stay hostage to S/L/P barriers for another minute. The
future will then take care of itself. We have nothing to lose, and
everything to gain (self-archiving the literature online does not remove
it from the pages of the journals or the shelves of the libraries it is
on now; it just increases its accessibility dramatically).

A priori worries about distributed archiving, alas, belong to that long
litany of prima facie rationales for inaction that have been keeping
the research cavalry in Zeno's Paralysis, and the refereed literature
behind financial firewalls, well past the point when posterity will
chide us for it.

Stevan Harnad
Professor of Cognitive Science
Department of Electronics and Computer Science
phone: +44 23-80 592-582
fax: +44 23-80 592-865
University of Southampton
Highfield, Southampton

NOTE: A complete archive of the ongoing discussion of providing free
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00):

You may join the list at the site above.

Discussion can be posted to:
Received on Mon Jan 24 2000 - 19:17:43 GMT
