Re: Interoperability - subject classification/terminology

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sat, 23 Nov 2002 14:38:25 +0000

On Sat, 23 Nov 2002, Subbiah Arunachalam wrote:

> Why is it that Open Archives/ E-prints works well in
> some fields (physics, astronomy, computer science) and
> not in other fields (say, agriculture)? I would like
> to hear from members of the list.

Others are invited to reply too. Here is my own candidate explanation:

(1) It is not that physics or astronomy or computer science are
different from other fields with regard to the benefits or feasibility of
self-archiving and open access in their fields. All fields can benefit
from it and it is feasible in all fields. There are reasons, however,
why self-archiving *began* in physics/astronomy, and why it came early in
computer science too.

(2) Self-archiving began in physics (and soon generalized to astronomy)
because physics already had, in paper days, a "preprint culture."
Physicists had already learned, well before the online era, that they
could accelerate the pace and interactivity of research if they did
not wait till published versions of papers appeared in print. Especially
in high-energy physics, they adopted the practice of mailing preprints
of their work to one another, to routing lists, and to a number of
central depositories.

(3) This practice simply generalized, in the beginning of the '90s,
quite naturally, as the technology became available, first to email
routing lists, and then to a web depository. Given the existing preprint
culture, this subsequent development requires no special explanation.
The physicists were smarter than the rest of us in having already
discovered the benefits to research progress of sharing preprints as
early as possible. They would have had to be rather thick to just keep
doing that in paper once email and the web were available!

(4) The practice of self-archiving immediately began to spread to other
areas of physics and allied fields (astronomy, mathematics), but it is
important to note that from the very beginning in August 1991 to the
present day, over a decade later, that growth has been merely linear
(which currently means 3500 deposits per month):
http://arxiv.org/show_monthly_submissions

(5) At that linear growth rate, it would take another 10 years before
everything being published in physics (in that year, 2012) was being
self-archived.
Physics/astronomy/maths are still ahead of all disciplines, but their
lead is not dramatic enough, and another decade would be far, far too
long a wait. What is needed is something that will not only (i) accelerate
self-archiving in those head-start fields to a curvilinear upward
growth-rate that will capture their total current research output much
sooner, but also something that will (ii) universalize the practice
of self-archiving to all the other late-comer disciplines, and capture
their full research output too (currently about 2,000,000 articles per
year, appearing in the approximately 20,000 peer-reviewed journals
that exist today in all disciplines and languages worldwide).

(6) My own hypothesis is that distributed, institutional self-archiving
will be the critical factor that will induce this acceleration and
universalization of self-archiving, as centralized, discipline-based
self-archiving alone has so far failed to do.

(7) The reason is that the rationale for institutional self-archiving
makes the benefits of open access explicit for all
researchers. Researchers and their own institutions (not their
disciplines) are the co-beneficiaries of the maximized research
visibility, accessibility, usage, citation and impact that are provided
by maximizing research access (i.e., universal, open access) through
self-archiving. It is researchers and their institutions whose research
output and research impact, and the indirect rewards that they bring --
in the form of research funding, income and standing, prizes and prestige
-- benefit from open access.

(8) In addition, research institutions have the further motivation to
try to relieve their serials subscription/license crises by doing whatever
they can to promote open access through self-archiving: Distributed
self-archiving is reciprocal.

(9) And the motivation for institutional reciprocity in self-archiving
is not just based on (a) the potential to maximize the impact of
institutional research output, nor on the possibility of eventually
(b) relieving institutional serials budget burdens. Access itself -- (c)
access to the peer-reviewed research output of all other universities --
can only enhance the quality and productivity of their own researchers'
work, for in the current toll-access system no institution, not even
the biggest or wealthiest institution, can afford to provide access
to anywhere near the total peer-reviewed research literature for its
researchers (in any field).

(10) The fourth reason that distributed institutional self-archiving may
well prove to be the way to accelerate and universalize open access is
that (d) internal and external research assessment (to reward researchers
for their past contributions and to fund their future contributions
http://www.hero.ac.uk/rae/ ) also promises to be greatly strengthened
through the creation of a global, open-access digital database of total
institutional research output, accessible to the many new scientometric
assessment tools that are being and will be created to analyze and monitor
research productivity and impact (e.g., http://citebase.eprints.org)
when applied to this rich new resource. This cause/effect loop, and the
means to monitor, measure, and display it, will not remain for long lost
on either university administrations or research funders.

(11) I have still to reply about computer science: This is another sort
of special case. The content of computer science, as a discipline, is
by its nature closest to the medium of self-archiving itself, namely,
computers, digital data, and distributed networks. It was only natural
that computer-scientists should create and store their digital research
output on the Net, and they did so, in huge numbers -- greater even
than those of physics and the other head-start disciplines. But they
stored them on their home websites or departmental tech-report pages
rather than in a centralized computer science archive as the physicists
had done in ArXiv. (There is a computer-science sector in ArXiv too,
but it is still one of the smaller sectors and growing no faster than
the others.)

(12) The brilliant (but also quite natural) strategy of NEC's Steve
Lawrence, Lee Giles and Kurt Bollacker had then been to try to *harvest*
all of the anarchically self-archived computer science papers distributed
all over the web (and this was before the days of OAI-interoperability --
http://www.openarchives.org -- and OAI-compliant institutional Eprints
Archives -- http://www.eprints.org -- which have since made harvesting so
much easier). The result, ResearchIndex -- http://citeseer.nj.nec.com/cs
-- was (and still is!) the biggest open-access archive of them all,
having harvested in computer science over twice as many papers (currently
500,000) as all the papers (currently 200,000) in all the fields in the
Physics ArXiv put together. But ResearchIndex is a "virtual" archive,
not a centralized one at all; it is a google-style selective harvest
from distributed websites all over the Web. Lawrence et al. have also
demonstrated the power of such a virtual database to generate rich new
citation-based scientometric indicators of research and researcher
productivity and impact.

(13) All these currents are currently converging. The Physics ArXiv is
OAI-compliant, as are all the distributed institutional Eprint Archives,
so they can all be harvested and navigated seamlessly as if they were
all one global archive. The computer science archive (ResearchIndex)
has announced that it too will shortly become OAI-compliant. So
there is no longer any difference between central and distributed
archiving. Universities worldwide are becoming increasingly aware of
the causal connections between research access and research impact,
and their implications for research productivity and funding, and are
moving towards self-archiving their institutional research output, with
the reciprocal benefits this confers on the entire worldwide research
community.
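
To make concrete what "harvested and navigated seamlessly as if they were
all one global archive" can mean, here is a minimal sketch of harvesting
with the OAI protocol's ListRecords request and Dublin Core (oai_dc)
metadata. It is only an illustration, not a description of how any
particular service harvests: the function names, the example.org base URL
and the sample record are all hypothetical, and the XML namespace assumed
is that of version 2.0 of the protocol. No network access is performed;
the code only builds request URLs and parses a response that has already
been fetched.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces assumed here: OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    A resumptionToken, when present, is sent on its own, without
    metadataPrefix, to fetch the next page of a long result list.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        identifier = rec.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = rec.findtext(".//" + DC_NS + "title")
        records.append((identifier, title))
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return records, (token or None)


def merge_harvests(pages_by_archive):
    """Pool already-fetched response pages from several archives into one list."""
    merged = []
    for archive, pages in pages_by_archive.items():
        for xml_text in pages:
            records, _ = parse_list_records(xml_text)
            merged.extend((archive, ident, title) for ident, title in records)
    return merged


# A made-up response page, standing in for what an archive might return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

Because every compliant archive answers the same request in the same
format, a harvester can run the same loop over a central archive and over
any number of institutional ones and pool the results, which is the sense
in which the central/distributed distinction stops mattering to the user.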

(14) But it is all still happening far too slowly! We need not, and
should not, wait another decade to reap the immense benefits of open
access to the planet's research output.

(15) For ideas about what researchers, their institutions, and their
research funders can do to hasten us all along the road to the optimal
and inevitable, see:
http://www.eprints.org/self-faq/#researcher/authors-do
http://www.eprints.org/self-faq/#institution-facilitate-filling
http://www.eprints.org/self-faq/#research-funders-do

Replies to Arun's question are invited from others too!

Stevan Harnad
Received on Sat Nov 23 2002 - 14:38:25 GMT
