Re: Central vs. Distributed Archives

From: Stevan Harnad <harnad_at_coglit.ecs.soton.ac.uk>
Date: Thu, 2 Nov 2000 21:29:24 +0000

I like Greg Kuperberg's postings, even though we disagree. Greg too is an
advocate of freeing the literature through author self-archiving, but he
prefers centralized archives, whereas I think both centralized and
distributed archiving are welcome and should be encouraged, as both can
hasten the freeing of the refereed literature.

Centralized archiving has been with us for over 10 years, and at its
current rates it will take 10 more years to free the Physics literature
alone, where it is most advanced. In Greg's own field of mathematics,
it might be going even more slowly. It looks to me as if centralized
self-archiving can now use the help of distributed institutional
self-archiving.

By way of counterevidence, Greg cites the fact that in mathematics
institutional self-archiving predated centralized self-archiving
and was unreliable. It was centralized self-archiving that accelerated
and stabilized the process.

What Greg seems to overlook is that the institutional self-archiving he
describes PRE-DATED the Open Archives Initiative (OAI), with its
interoperability. Hence the question of whether or not distributed
self-archiving in OAI-compliant Institutional Eprint Archives will
accelerate the freeing of the literature has not yet been tested.

Greg also seems to conflate, at some junctures, the self-archiving of
unrefereed preprints with the self-archiving of refereed postprints,
as if self-archiving were in some sense a rival to or substitute for
refereed publication (which I certainly do not think it is);
self-archiving is merely a way to free the refereed literature.

On Thu, 2 Nov 2000, Greg Kuperberg wrote:

> In 1997, the year before the universal math arXiv was started, there
> were already some 10 or 20 thousand research papers freely available on
> the web. Most of them were on personal home pages, but thousands were
> in institutional and subject-based preprint series.

This is irrelevant, as noted above. These archives were not
OAI-compliant and hence could not be integrated or navigated in a
useful way.

> Nonetheless the vast majority of these papers were still eventually
> sold as published papers.

This too is irrelevant. The initiative to free the refereed literature
is a PRO-RESEARCHER and PRO-RESEARCH initiative, not an anti-publisher
initiative (nor even particularly a "pro-library" initiative):

The goal is to free the refereed literature for one and all online.
That is what self-archiving does.

The goal is NOT to prevent other versions of the refereed literature
from being sold, on-paper or on-line, if there is a market for them.
(Why would we want to do that?)

> So what were the publishers selling? Not peer review, because you
> can learn from Math Reviews where a paper has been published without
> subscribing to the journal. To a large extent the journal system was
> selling, and is still selling, stability and permanence.

Fine. Let it continue to do so (whether the stability/permanence is real
or merely imagined). As long as another version is online and free, the
goal is met.

> So that has been the fundamental question of open archival in
> mathematics for years. That is why some of the recalcitrant math
> publishers say that the arXiv is "just a preprint server" and not a
> "permanent e-print archive". Of course I don't agree with them; I
> choose the arXiv over subscription journals as the future route to
> permanent archival.

I'm afraid that this is not making sense to me. What is the argument?
That the jeering of some publishers nullifies the fact that that portion
of the refereed literature that has been freed is indeed free?

The substantive question is: Are the refereed papers online and free? If
they are, who cares if some people keep calling them "preprints," when in
reality they include both pre-refereeing preprints + post-refereeing
postprints (= eprints)?

But I sense another point of disagreement with Greg: Earlier he said
it's not the peer-review that makes people keep paying for the for-fee
(refereed) version despite the availability of the for-free (refereed)
version, but the "stability and permanence". Perhaps. But if the
implementation of the peer-review were no longer paid for by the
continued support for the publishers' version, perhaps the true value
and causal role of peer-review in all of this would become clearer.

Moreover, for now, it is not true stability/permanence that
distinguishes the publishers' for-fee version and the archives'
for-free version, but mere PERCEIVED stability/permanence.

With time, that may change. But for now it certainly isn't any reason to
deter us from self-archiving, either centrally or institutionally. On
the contrary; as long as the publishers' for-fee version is seen as the
guarantor of the stability/permanence, there is no reason whatever NOT
to SUPPLEMENT that with the self-archived free version -- without giving
the stability/permanence issue another thought!

> As a practical matter most of the institutional preprint series in
> mathematics are at the department level. At every university at which
> I have studied or held an appointment, interdepartmental computer
> services (a) are often mediocre, and (b) are often a one-size-fits-all
> straightjacket. I don't even like central campus e-mail. In my view the
> strength of university research is rooted in departmental independence.

Fine. But irrelevant to OAI-compliant Institutional Eprint archives,
administered by the Institution's Library (or do you not trust library
resources to anything but the department either?)

> So should we mathematicians trust individual math departments to
> permanently preserve their e-prints? I don't think so. Our own math
> preprint series at UC Davis is an arXiv overlay - all articles are
> automatically contributed to the math arXiv. One of my arguments for this
> arrangement is that we can't promise to babysit these preprints forever.
> We could easily forget our obligation.

The Department could easily forget; the institutional library is unlikely
to do so. It has a lot of prior practice with stability/permanence! (And
it has a good deal to gain from maintaining robust institutional Eprint
Archives: The prospects of serials-crisis relief, as other
institutional libraries do the same thing, with their own Eprint
archives -- perhaps it will be this reciprocity with each institution's
intellectual goods, formerly mediated by journal subscriptions, that
will provide just the stability/permanence we're looking for?)

> When we put together the universal math arXiv from its disparate parts,
> submissions immediately jumped by 40% (as of December 1997).
> Since then the math arXiv has grown more quickly than the subject-based
> archives that were not pulled into the fold.

Very welcome news. But not fast enough, alas. And there has been a new
development since 1999 (OAI). And the OAI-compliant Eprints software
will be ready for institutional adoption in a few weeks:
http://www.eprints.org

> As I said above, in math the institutional archives are there already

Yes, but not OAI-compliant, hence not interoperable, hence anarchic.

> They distract authors as much as they encourage them. In fact one of
> the serious problems with the fragmented interoperability system is
> multiple submissions. Many authors like to advertise themselves by
> putting their papers in more than one archive. Or if a paper has four
> authors, it could go to four archives because each one has a different
> favorite.

I repeat. You are describing history. These are not registered,
OAI-compliant archives!

> As for your vision of global virtual archives, that hasn't happened yet.

It hasn't happened yet, because the global self-archiving hasn't
happened yet! Moreover, until the OAI-protocol, there wouldn't have
been the means to make it happen, even if there had been global
self-archiving. Now it is possible. See ARC for a glimpse of how
distributed Open Archives can be drawn together:
http://arc.cs.odu.edu/

And as I said, Institution-based incentives may be precisely the
critical ones that have been missing in purely centralized,
discipline-based self-archiving. Now they will be there, helping the
initiative. And so will the free off-the-shelf OAI-compliant Eprints
software.

> If you wait for that then you can't also assure us that the revolution
> can take place immediately. If we do have something to wait for, why
> wait for an integrated facade with a fragmented foundation instead of
> the other way around?

Greg, you know that what I said was that authors could free the refereed
literature immediately through self-archiving. I didn't say anyone
should wait for the global virtual archive to be full before filling it!

You, on the other hand, seem to be marshalling rationales for not
filling it (because of a priori worries about stability/permanence and
"fragmented foundations")...

> Obviously I'm not a conservative offering rationales for inaction.
> And my worry is not "a priori". NCSTRL and MPRESS are two long-standing
> attempts at standards-based fragmented interoperability. Neither one
> has as much readership as the younger, fully integrated math arXiv.

They pre-dated OAI and Eprints. Have just a bit more patience; but be
prepared to set aside prior prejudices or you will obstruct precisely
what we both want to facilitate!

--------------------------------------------------------------------
Stevan Harnad                     harnad_at_cogsci.soton.ac.uk
Professor of Cognitive Science    harnad_at_princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
             Computer Science     fax:   +44 23-80 592-865
University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00):

    http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html

You may join the list at the site above.

Discussion can be posted to:

    american-scientist-open-access-forum_at_amsci.org
Received on Mon Jan 24 2000 - 19:17:43 GMT