Re: Central vs. Distributed Archives

From: Stevan Harnad
Date: Thu, 2 Nov 2000 15:07:58 +0000

On Thu, 2 Nov 2000, Greg Kuperberg wrote:

> 1) I have mixed feelings about the grass-roots connotations of the
> "Open Archives Initiative" and even more in Harnad's phrase
> "self-archiving".

You have to distinguish between the Open Archives Initiative (OAI) and
the "(Author/Institution) Self-Archiving (Sub-)Initiative."

OAI has now evolved into an initiative for shared standards and
interoperability in the metadata tagging of the contents of online
archives -- WHETHER OR NOT the contents (i.e., apart from the metadata)
of the archives are full-text or free:

A commercial publisher, for example, can establish an OAI-compliant
Open Archive as readily as any other institution or individual, and
would benefit from the increased visibility provided by the
OAI-compliant interoperability for the contents of the Archive, even if
the full-texts were kept behind an S/L/P financial firewall.

A journal publisher can also establish an OAI-compliant FREE Open
Archive, if they do wish to give away their full-text contents at this
time (as around 400 biomedical publishers are currently willing to do,
as indicated in a very recent posting:
-- although most of those archives are not yet OAI-compliant).

Nor is the OAI particularly committed to either centralized,
discipline-based Open Archiving (e.g. ArXiv, CogPrints) or distributed,
institution-based Open Archiving (Eprints): It is developing
interoperability standards that apply to both, with the objective of
making the difference between them less significant, eventually perhaps
even irrelevant.

The (Author/Institution) Self-Archiving (Sub-)Initiative, however, is
SPECIFICALLY concerned with freeing the refereed research literature
through author/institution self-archiving (in OAI-compliant Open
Archives).

> I do believe that the research literature should be
> electronic and free, and it is possible that each discipline must pass
> through an anarchic, do-it-yourself phase of open archival before
> moving on to a more organized stage.

It is not at all clear why you describe open archiving as "anarchic"!
It was precisely in order to put order into distributed online digital
archiving resources through interoperability that the OAI was

And the other aspect of the order is the order already provided by the
refereed journals, in the form of peer review and its certification.
That order is medium-independent, and will be preserved in a
well-tagged Open Archive: "Journal-Name" will be a field, etc.

The only "do-it-yourself" issue is self-archiving itself. And the issue
is very clear: If researchers want the refereed literature freed, now,
then they can do it themselves, by self-archiving, now. Otherwise, they
have to wait until someone else (the journal publishers?) decides to
free it for them -- and that could prove to be a very long wait

    Harnad, S. (1999) Free at Last: The Future of Peer-Reviewed
    Journals. D-Lib Magazine 5(12) December 1999

> However, when I started archive work in mathematics, we already had an
> array of separate preprint servers cum e-print archives. The effort
> since then has been to reorganize much of this jumble into the math
> arXiv. Having many copies of one huge archive is superior to having
> many little archives, no matter how interoperable. Serious permanence
> and stability requires closer cooperation than that.

Again, it is a question of how long the researcher community is willing
to wait for the optimal and inevitable: It is now within immediate
reach to eliminate all the research access/impact-barriers, now,
through self-archiving. Interoperability will integrate the results
into a "global" Archive of the entire refereed research literature, in
all disciplines, as searchable as the Institute for Scientific
Information's Current Contents Database -- but including the full-texts
themselves (and free). (See ARC as a prototype and fore-taste of this

But note that arXiv-style centralized, discipline-based self-archiving
in Physics, the most advanced self-archiving on the planet -- with
130,000 archived papers in 10 years -- has only freed 30-40% of the
Physics literature so far, and will take 10 more years to free it all
at the present steady linear growth rate:

Note that I used to cite the above graph repeatedly as evidence that
the self-archiving cup is half-full. But it is also evidence that it is
still half-empty -- and taking another 10 years to fill.

So the idea is that distributed, pan-disciplinary, institution-based
self-archiving (OAI-compliant, of course) may be what is needed to get
this growth rate into the exponential range for Physics, as well as to
carry it over into all the other disciplines.

Of course multiple copies and mirroring (and harvesting and caching)
will be as important for distributed Open Archives as for centralized
ones. But there is no need to rely only on the centralized model:
Interoperability allows distributed archives to be harvested into
virtual central archives!
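
As a rough illustration of that last point, here is a minimal sketch of
how metadata from several distributed archives might be merged into one
"virtual" central index. It is not the OAI protocol itself: the Record
fields, the harvest() and search() helpers, and the sample archive names
and papers are all hypothetical, chosen only to show the idea of
harvesting plus de-duplication of mirrored copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str    # hypothetical globally unique ID for the paper
    title: str
    author: str
    journal_name: str  # empty string for an unrefereed preprint
    archive: str       # which archive supplied the record


def harvest(archives):
    """Merge metadata from many archives into one virtual index.

    `archives` maps an archive name to a list of plain dicts, standing in
    for whatever each archive would expose to a harvester.
    """
    index = {}
    for name, records in archives.items():
        for raw in records:
            rec = Record(
                identifier=raw["identifier"],
                title=raw["title"],
                author=raw["author"],
                journal_name=raw.get("journal_name", ""),
                archive=name,
            )
            # Keep the first copy seen; mirrors and duplicates are skipped.
            index.setdefault(rec.identifier, rec)
    return index


def search(index, term):
    """Case-insensitive substring search over titles, sorted by identifier."""
    term = term.lower()
    return sorted(
        (r for r in index.values() if term in r.title.lower()),
        key=lambda r: r.identifier,
    )


# Made-up sample data: one central archive, one institutional archive.
archives = {
    "central-physics": [
        {"identifier": "paper-1", "title": "Quantum Knots",
         "author": "A. Author", "journal_name": "Journal A"},
    ],
    "university-x": [
        {"identifier": "paper-1", "title": "Quantum Knots",
         "author": "A. Author", "journal_name": "Journal A"},
        {"identifier": "paper-2", "title": "Knots in Cognition",
         "author": "B. Author"},
    ],
}

index = harvest(archives)
hits = search(index, "knots")
```

The searcher sees one index and need not know or care which archive,
central or institutional, actually holds each paper.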

You give no reason at all why "serious permanence and stability
requires" all archives to be centralized ones.

> At the overall STM level the literature may have to be divided into
> single-discipline or few-discipline fragments for some time.


> The Los-Alamos based arXiv works well for the TeX-based e-print culture
> in mathematics, physics, and parts of computer science. But it is not
> clear how to extend that particular system to the rest of science.

Why? This formula has been repeated so many times that people are
actually believing it, without anyone ever having explained why it
should be thought to be true!

It is true that (1) arXiv started in Physics. It is true that (2)
physics papers are mostly TeX-based. And it is true that physicists had
(3) a culture of sharing their unrefereed preprints with one another
before publication, first on-paper, and, once possible, on-line. This
explains why it all started in Physics.

But "eprints" are not, and never have been, just unrefereed, TeX-based
papers. They always included the all-important refereed, published
POSTprints too, once they were available, from the very beginning.
Those postprints are eprints too, and they might be TeX, PDF, HTML,
PostScript, or what have you (so, for that matter, could the
preprints be in any of these formats).

The only aspect of this system about which we need to ask whether or
not it can "extend... to the rest of science" concerns whether the rest
of science, too, would or would not benefit from having its refereed
literature (preprints optional) freed through self-archiving in this
way.

The answer, I think, is a resounding Yes. A "no" would be tantamount to
assuming that, apart from Physicists, (a) researchers in other
disciplines do not care whether or not the impact of their research is
restricted to those researchers who happen to be at institutions that
are willing and able to pay the S/L/P costs of accessing it and (b)
other disciplines likewise do not care whether or not they themselves
can access the research of others when their own institution is
unwilling or unable to pay the S/L/P costs of accessing it.

So the feasibility and benefits of freeing the refereed literature
through self-archiving have nothing whatsoever to do with TeX, or
preprint culture, or Physics -- apart from the fact that the physicists
were the fastest off the mark, historically (perhaps because they are
smarter and more serious about research).

Let us not confuse the unique features of the initial conditions that
actually initiated self-archiving in Physics first, with the universal
steady-state benefits of an online corpus, freed by self-archiving (or
any other means).

> If you have to have disjoint archives, fragmented interoperability is
> then a good goal to work towards. But you have to realize that it is
> only a partial solution. And I have reservations about encouraging
> every tenth researcher to set up yet another archive, because that can
> lead to entrenched Lilliputian fiefdoms of e-prints. By my standards
> the physics part of the arXiv, with 130,000 e-prints, is large; the
> math arXiv, with 13,000, is medium-sized; and an archive with 1,300 or
> less is tiny.

I don't know about "entrenched Lilliputian fiefdoms," but I know the
difference between having, say, 130,000 current articles in Physics
available online now, and 170,000 NOT available now, hence not
available to anyone not now at an institution that can afford the
S/L/P: That's a LOT of physicists, the vast majority of those on the
planet. Add to that the number of researchers in other disciplines who
cannot access their own respective refereed literatures, and you get an
access-deprived population of Brobdingnagian, not Lilliputian,

Would all of these, and research itself, be better off with "disjoint
archives" -- OAI-compliant and interoperable -- NOW?

You bet.

So what are you actually worrying about here?

> 2) I have been accused, sometimes correctly, of being overzealous in my
> support of the arXiv. I see that Stevan Harnad has about as much
> enthusiasm as I do, and I can't criticize that. But if the September98
> forum has strong advocacy in favor of open archives, it doesn't make
> sense to limit criticism. Because then you're just preaching to the
> choir. If you don't want to debate whether or not open archives are a
> good idea, maybe that makes sense. But then you shouldn't dwell on how
> fantastic open archives are; instead you should steer the discussion to
> practical plans.

What gave the impression that criticism of either Open Archives or
self-archiving is limited in this Forum?

I have, as moderator, terminated discussion on a few irrelevant or
saturated topics (is there a conspiracy of university administrators to
control researchers' intellectual property? is the library serials
crisis simply a consequence of under-funding the libraries? how can we
reform or abandon peer review?), but comments, whether supportive or
critical, on the Forum's central theme -- "How to free the refereed
literature online, now?" -- have never been suppressed.

Indeed, the OAI has never yet been criticized in this Forum, and I am
eager to hear your substantive criticism. Your current posting,
however, was extremely vague about why you think centralized archiving
is the only way to go.

> 3) I also can't criticize Elsevier's Chemistry Preprint Server
> project. In a way I can't even criticize commercial publishers with
> high journal prices, even though I believe that the mathematical
> literature should be free. A for-profit company is entitled to
> maximize profit. If it is publicly traded, it is legally required to
> do so up to a point.

I couldn't agree with you more! But what gives you the impression that
this Forum is trying to prevent companies from doing whatever they
like? What we are trying to do is free the refereed literature.
Vendors are free to continue selling it, on-line or on-paper, with any
deluxe add-ons they see fit -- as long as the author/researchers
themselves are free to free their own refereed papers online through
self-archiving.

> (By the same token, the customer, academia, is entitled to minimize
> expenses.)

It's not about minimizing anyone's expenses, but about freeing access
to one's own research. There is no reason ANYONE, ANYWHERE should have
to pay a penny for (online) access to my research, which I (and all
other authors of refereed journal papers) give away, and have always
given away, for free.

> I'm against Napster-style copyright infringement

So am I. That is CONSUMER THEFT, whereas we are talking about AUTHOR
GIVE-AWAY. (This Forum has prior threads on this topic.)

> and I have mixed feelings about journal boycotts.

My feelings are mixed too: If there were a guarantee that they would
work, and work overnight, and force all publishers to make a free
version of the entire refereed literature available online (without
immediately ruining the publishers), I might support boycotts, but I
don't believe for a moment that they would have that effect. Moreover,
I don't believe authors would (or should) give up their preferred
journals until/unless the journals agree to free their contents. There
is simply no reason to give anything up, because authors can free the
contents of the journals themselves, through self-archiving (and I
think distributed, institution-based self-archiving will hasten and
strengthen that process).

> My approach is less confrontational.

There is nothing confrontational about self-archiving (and it is
completely legal):

> My own recent papers lie permanently in the arXiv, I keep the
> copyright, and I will publish in any journal that wants the papers on
> those terms.

Highly commendable. But authors don't need to make even that much of a
sacrifice: See the above URL.

> From this point of view, I am not sure about the Chemistry Preprint
> Server, because I don't see the business model for it. But then, I
> don't see the business model for Google either, and I think that Google
> is great. It is possible that the Chemistry Preprint Server will be an
> important gift from Elsevier to the chemistry research community.
> Arguably the chemists should have done it for themselves, but maybe
> they lack leadership and need Elsevier to do it for them.

I don't know that we should worry too much about the Chemistry Preprint
server one way or the other (why only preprints? will they stay
online?). Just go ahead and self-archive (whether in centralized
discipline-based Archives or distributed, institution-based ones). The
rest will take care of itself.

Stevan Harnad
Professor of Cognitive Science
Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton
phone: +44 23-80 592-582
fax: +44 23-80 592-865

NOTE: A complete archive of the ongoing discussion of providing free
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00):

You may join the list at the site above.

Discussion can be posted to:
Received on Mon Jan 24 2000 - 19:17:43 GMT
