Re: OA Archives: Full-texts vs. metadata-only and other digital objects

From: Stevan Harnad <>
Date: Mon, 13 Jun 2005 14:40:09 +0100

On Mon, 13 Jun 2005, Tim Gray wrote:

> Thank you for your full and illuminating reply to my query about how much
> material in OA archives is available as full text. I am surprised at how
> low you estimate the figure to be and that it is not, yet, possible to
> produce a definitive number.

The number of full texts in OA archives is so low because the
number of institutions with OA self-archiving mandates (as opposed to
the number of institutions with OA Archives) is so low. Cf.:

The remedy is quite obvious (and will come, but is taking rather longer
than it might).

        Swan, Alma and Brown, Sheridan (2005) Open access self-archiving:
        An author study. Technical Report, Joint Information
        Systems Committee (JISC), UK FE and HE funding councils.

> I am wondering if the Open DOAR (Directory of Open Access Repositories -
> the 'sister project' to the Directory of Open Access Journals, DOAJ) will
> set strictly 'full text only' rules for inclusion in its directory?

Archives with mixed contents, some of it other than OA full-texts, should
not be excluded, but an algorithm must be devised to recognise and record
the number of full-texts separately. Tim Brody and co-workers at Southampton
are working on this now for the Southampton OA Archives Registry.

    "Newly enhanced Registry of Open Access Repositories (ROAR)"
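
To make the idea of counting full-texts separately concrete, here is a
minimal sketch of one possible heuristic. It is NOT the algorithm being
developed at Southampton, which is not described here; the function names
and the choice of signals (a PDF/PostScript dc:format, or a dc:identifier
ending in a document suffix) are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}"

# Assumed signals that a record has a full text attached (illustrative only).
FULLTEXT_FORMATS = {"application/pdf", "application/postscript"}
FULLTEXT_SUFFIXES = (".pdf", ".ps", ".ps.gz", ".doc")


def looks_like_full_text(dc_record):
    """Guess whether one oai_dc record points at a full text."""
    for fmt in dc_record.iter(DC + "format"):
        if (fmt.text or "").strip().lower() in FULLTEXT_FORMATS:
            return True
    for ident in dc_record.iter(DC + "identifier"):
        if (ident.text or "").strip().lower().endswith(FULLTEXT_SUFFIXES):
            return True
    return False


def count_full_texts(xml_text):
    """Return (likely full-text, likely metadata-only) record counts."""
    root = ET.fromstring(xml_text)
    records = list(root.iter(OAI_DC + "dc"))
    full = sum(1 for rec in records if looks_like_full_text(rec))
    return full, len(records) - full
```

A heuristic like this will miscount archives that do not expose file
formats or file URLs in their Dublin Core metadata, so its output could
only ever be an estimate.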

> how will it relate to the archives.eprints directory you are involved with?

That remains to be clarified, but my understanding is that there will be a
collaboration and DOAR will be built on the Southampton OA Archives Registry.
(Others will have to confirm whether that is indeed the case.)

> It gets confusing to me because there are so many lists of repositories around
> on the web.

That was why the Southampton OA Archives Registry was created, two years ago.
Moreover, because all the other registries rely only on voluntary
self-registration, and archives have not been rigorous about self-registering,
the Southampton OA Archives Registry has been hand-trawling the Web and other
registries to find and register new OA Archives as they are created.

Perhaps a recognizable, consistent self-identifier tag will evolve, so
OA Archives can be automatically harvested and registered, but so far
this has not yet happened. Indeed, some of the ostensibly OAI-compliant
OA Archives may not even be OAI-compliant!

This too will improve, as more institutions adopt institutional self-archiving
policies. Germany's DINI certificate will help.

    "Goettingen/DINI/SPARC-Europe Open Access Meeting"

> How does the celestial harvesting list you mention relate to
> the archives.eprints list (are they the same list?)

Celestial, written by Tim Brody of the University of Southampton, is an
OAI aggregator/cache application that imports OAI metadata from
OAI-compliant repositories (protocol versions 1.0, 1.1 and 2.0) and
re-exposes that metadata through either an aggregated or a per-repository
OAI 2.0-compliant interface.
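
For readers unfamiliar with what "importing OAI metadata" involves, the
following is a rough sketch of an OAI-PMH 2.0 harvesting loop. It is not
Celestial's code: the function names are hypothetical, only protocol
version 2.0 is covered, and error handling is omitted. It relies only on
the protocol's ListRecords verb and its resumptionToken paging.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, token=None, prefix="oai_dc"):
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    params = {"verb": "ListRecords"}
    if token:
        # resumptionToken is an exclusive argument in OAI-PMH 2.0.
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = prefix
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iter(OAI + "identifier")]
    tok = root.find(f".//{OAI}resumptionToken")
    token = tok.text.strip() if tok is not None and tok.text else None
    return ids, token


def http_get(url):
    """Fetch one URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url, fetch=http_get):
    """Yield the identifier of every record in one repository."""
    token = None
    while True:
        ids, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from ids
        if not token:
            break
```

The fetch parameter exists only so that the loop can be exercised without
a network connection; an aggregator would additionally store the harvested
records and serve them again through its own OAI interface.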

Tim is also the creator and maintainer of the Southampton OA Archives
Registry, where it is explained that:

    What does Not in Celestial mean?

    This means the archive has not been listed/harvested by Celestial.
    This may be because the archive doesn't have a functioning OAI-PMH
    interface.

    What does OAI Interface Unknown mean?

    Either the archive doesn't have a functioning Open Archives interface,
    or we couldn't track down where it is. Site admins should say on
    their 'about' or 'help' page where their OAI interface is and use a
    common URL for it (e.g. /perl/oai or /cgi-bin/oai). Submitting your
    site to the OAI registry/Hussein Suleman's Repository Explorer will
    also help to get your site noticed.
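
The advice quoted above suggests a simple probe that an archive manager
could run against his own site. The sketch below is only an illustration
and not the Registry's procedure: it tries the two example paths quoted
above with the OAI-PMH Identify verb and checks that the reply is a
well-formed OAI-PMH 2.0 Identify response. The function names are
hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
COMMON_PATHS = ("/perl/oai", "/cgi-bin/oai")  # the two examples quoted above


def candidate_identify_urls(site_url):
    """Build Identify requests for the commonly used OAI paths."""
    return [
        urllib.parse.urljoin(site_url, path) + "?verb=Identify"
        for path in COMMON_PATHS
    ]


def is_working_identify(xml_text):
    """True if the text is an OAI-PMH 2.0 Identify response with a baseURL."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not XML at all, e.g. an HTML error page
    if root.tag != OAI + "OAI-PMH":
        return False
    base = root.find(f"{OAI}Identify/{OAI}baseURL")
    return base is not None and bool((base.text or "").strip())
```

Fetching each candidate URL and passing the body to is_working_identify()
would show whether any of them is a usable base URL. A protocol error
reply (badVerb and the like) has no Identify element and so fails the
check.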

> or the large list kept
> by the University of Illinois at Urbana-Champaign (UIUC) at
> <>?

That is one of the registries from which the Southampton OA Archives Registry
hand-harvests. The Registry also regularly harvests from OAIster.

It can also import lists from OAI list-friends automatically:

> I take the archives.eprints to be the closest to a definitive list of the
> OA Institutional Repositories which we are concerned with here - although I
> notice that our 'DSpace_at_Cambridge' repository
> <> is not included.

DSpace_at_Cambridge is in the Registry: See

But it is "not in Celestial" because the OAI base URL given for it
is either not the correct one or does not work.

In contrast, Cambridge's other OA Archive *is* in Celestial:

All OA archive managers are encouraged to register their Archives, including
their OAI Base URL, and to contact Tim to make sure it works:

(I have emailed this posting to Cambridge's Tom de Mulder and Peter Morgan
in the hope that they will work with Tim to make sure Cambridge is

> I see the distinction between OA Archives and the Open Archives Initiative.

Yes, the OAI protocol is for all digital contents, whether OA or non-OA.
It concerns metadata interoperability.

> Maybe this is not strictly relevant to this forum and a basic
> misunderstanding of the purposes of archiving, but I still cannot
> understand why people are archiving *just* the metadata and not the full
> text. It makes OA search engines like OAIster more like any other
> standard bibliographic database with mostly subscription-only access.

You are quite right about the latter. And the main reason they are only
archiving metadata is what I have already pointed out: The low number of
institutional (full-text) OA self-archiving requirements to date.

But a second reason is that for some kinds of objects (non-OA objects,
i.e., not preprints, postprints or dissertations, e.g., library or
institutional records) the institution may not *want* to archive the
object, only its OAI metadata. The solution, as noted, is automatic
distinction between OA full-text and other kinds of OAI records.

> I am interested in the whole area of Open Access and keeping up with
> developments. This forum is excellent for that purpose.


Stevan Harnad
Received on Mon Jun 13 2005 - 14:40:09 BST