OA Archives: Full-texts vs. metadata-only and other digital objects

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Thu, 9 Jun 2005 14:44:41 +0100

On Thu, 9 Jun 2005, Tim Gray, Library Assistant, Homerton College Library wrote:

> >....only 15% of OA's target content (the annual 2.5 million full-text
> >research articles published in the world's 24,000 journals) is as yet
> >being self-archived, worldwide [...]
> I was under the, obviously mistaken, impression that all items harvested by
> OAIster were open access. Open access, to me, means (amongst other things)
> full text. So what are these authors doing? Archiving their metadata but
> not the actual text? They might as well not archive anything, then, I would
> have thought.
> I know that 85% are not self-archiving yet, but I assumed that OAIster
> covered the 15% who *were*. But maybe I've missed something?

Your query is a natural one, and an important one to raise and to answer.

It is extremely important to distinguish, and to understand the relation, between:

    (OAI) (1999) the Open Archives Initiative (OAI), with its metadata
    harvesting/interoperability protocol and

    (OA) (2001) the (Budapest) Open Access (OA) Initiative (BOAI), with its
    objective of open access to the full texts of preprints and postprints
    of articles (and dissertations)

OAI began in 1999 with an OA focus -- to make OA archives interoperable. But it
soon became much more general: a metadata harvesting protocol for making all sorts
of digital archives -- not just OA archives -- interoperable.

OAIster harvests from all known OAI-compliant archives, not just OA archives.

Moreover, even the OA archives are not necessarily 100% full-text! In other words,
it is not at the moment known what percentage, and which, of the current
5,475,850 records from 480 institutions harvested by OAIster correspond to
OA full-texts.
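
To illustrate why a harvester cannot simply count full texts, here is a
minimal sketch in Python. It parses a small, invented OAI-PMH ListRecords
response in unqualified Dublin Core (the XML namespaces are the standard
ones; the two records, their identifiers and the example.org URLs are
hypothetical). The classification rule, which looks for a PDF format
declaration or a link ending in ".pdf", is only an assumed heuristic and
not something OAIster is known to use: a harvested record carries
metadata, and nothing in it guarantees that an openly accessible full
text sits behind the link.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response: both records are invented for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A postprint with a deposited file</dc:title>
          <dc:format>application/pdf</dc:format>
          <dc:identifier>http://example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A bibliographic entry only</dc:title>
          <dc:identifier>http://example.org/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def classify(xml_text):
    """Label each record using an assumed heuristic (PDF format or .pdf link)."""
    root = ET.fromstring(xml_text)
    labels = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        formats = [e.text or "" for e in record.iterfind(".//dc:format", NS)]
        links = [e.text or "" for e in record.iterfind(".//dc:identifier", NS)]
        looks_full_text = (
            any(f.strip().lower() == "application/pdf" for f in formats)
            or any(u.strip().lower().endswith(".pdf") for u in links)
        )
        # Even a positive hit is only a hint: the file may be restricted or absent.
        labels[oai_id] = "possible full text" if looks_full_text else "metadata only?"
    return labels


if __name__ == "__main__":
    for oai_id, label in classify(SAMPLE).items():
        print(oai_id, "->", label)
```

Both labels are deliberately tentative. Settling the question for real
records would mean following each link and inspecting what comes back,
which is what full-text crawlers do.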

We can be fairly sure that OA full-texts are the minority in OAIster, based
on the estimates from the OA full-text crawlers from Oldenburg, Southampton
and Universite du Quebec a Montreal, which converge on 15% worldwide
and across disciplines for the past 10 years.


OAIster's percentage of OA full-texts is probably higher than the 15%
that is the average for the literature as a whole, but it might not be
very much higher. This is not just because not all records link to
full texts, or to OA texts, but especially because some of the "full-text"
records are not OA target-texts (i.e., preprints, postprints, theses)
but other kinds of digital objects: courseware, institutional records,
video, audio, software!

Kat Hagedorn at OAIster can estimate the proportion of full texts offline
and has done so in the past:

    Re: DOAJ, OAIster and Romeo should chart growth, as EPrints does

OAIster cannot yet accurately chart the full-text OA subset online,
but I hope it soon will (Kat?). I hope Kat will correct my errors
or omissions in summarizing OAIster!

Tim Brody's Institutional OA Archives Registry covers a somewhat more
OA-focussed subset of OAIster's archives and is now in the process of
adding powerful new features, but it too does not yet distinguish
full-texts from metadata-only. It charts the growth of both the number
of archives and the number of records for 7 kinds of OAI-compliant
archives:

(A = number of archives; c = celestial-harvestable subset of those archives;
rc = number of records from the celestial-harvestable subset;
ac = average records per celestial-harvestable archive)

Institutional/Departmental Research Archives: A:210 c:158 rc:452600 ac:2865
Cross-Institutional Research Archives: A:55 c:43 rc:1429670 ac:33248
E-Thesis Archives: A:54 c:40 rc:155965 ac:3899
E-Journal/E-Publication Archives: A:39 c:30 rc:83631 ac:2788
Demonstration Archives: A:24 c:11 rc:5961 ac:542
Database Archives: A:8 c:4 rc:1958 ac:490
Other Kinds of Archives: A:44 c:26 rc:373058 ac:14348

But at the moment the record counts and averages (rc and ac) cannot
distinguish full-text records from metadata-only records, and the
latter are in the vast majority. The Archives Registry can also only track
archives that are harvestable by celestial: http://celestial.eprints.org/

(The administrators of Institutional Repositories/Archives could
help us a great deal if (1) those with OAI-compliant archives not
in the Registry could register them at
and (2) those archives in the registry that are not celestial-harvestable
(122/434 = 28%) could provide the data to make them celestial-harvestable.)
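
As a check on the arithmetic, the short Python sketch below recomputes
the averages (ac = rc / c, rounded) and the 122/434 = 28% figure from the
Registry counts quoted above. The numbers are copied from this message;
the dictionary layout and the function name are only illustrative
choices, not anything the Registry itself provides.

```python
# Registry counts as quoted in this message:
# kind -> (A: archives, c: celestial-harvestable, rc: records from those)
REGISTRY = {
    "Institutional/Departmental Research": (210, 158, 452600),
    "Cross-Institutional Research": (55, 43, 1429670),
    "E-Thesis": (54, 40, 155965),
    "E-Journal/E-Publication": (39, 30, 83631),
    "Demonstration": (24, 11, 5961),
    "Database": (8, 4, 1958),
    "Other": (44, 26, 373058),
}


def summarize(registry):
    """Return per-kind averages plus the totals behind the 28% figure."""
    # ac: average records per celestial-harvestable archive, rounded.
    averages = {kind: round(rc / c) for kind, (_, c, rc) in registry.items()}
    total_archives = sum(a for a, _, _ in registry.values())
    total_harvestable = sum(c for _, c, _ in registry.values())
    not_harvestable = total_archives - total_harvestable
    share = round(100 * not_harvestable / total_archives)
    return averages, total_archives, not_harvestable, share


if __name__ == "__main__":
    averages, total, missing, share = summarize(REGISTRY)
    for kind, ac in averages.items():
        print(f"{kind}: ac = {ac}")
    print(f"{missing}/{total} = {share}% not celestial-harvestable")
```

Note that none of these counts says anything about full texts: rc counts
records, whether or not a full text is attached.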

In conclusion: The 15% OA full-text estimate is probably right. So the
most important task is to increase that OA content from 15% to 100%. This
requires the adoption of institutional OA self-archiving policies.


Sorting out the 15% full-texts from the metadata-only and other kinds of digital
objects in OAI-space today will help a little, but only institutional
self-archiving policy will get us over the top at last!

Stevan Harnad

A complete Hypermail archive of the ongoing discussion of providing
open access to the peer-reviewed research literature online (1998-2005)
is available at:
        To join or leave the Forum or change your subscription address:
        Post discussion to:

UNIVERSITIES: If you have adopted or plan to adopt an institutional
policy of providing Open Access to your own research article output,
please describe your policy at:

    BOAI-1 ("green"): Publish your article in a suitable toll-access journal
    BOAI-2 ("gold"): Publish your article in an open-access journal if/when
            a suitable one exists.
    in BOTH cases self-archive a supplementary version of your article
            in your institutional repository.
Received on Thu Jun 09 2005 - 14:44:41 BST
