Re: OA Archives: Full-texts vs. metadata-only and other digital objects

From: Tim Brody <>
Date: Mon, 13 Jun 2005 15:20:20 +0100

Tim Gray wrote:
> Stevan
> Thank you for your full and illuminating reply to my query about how much
> material in OA archives is available as full text. I am surprised at how
> low you estimate the figure to be and that it is not, yet, possible to
> produce a definitive number.

Knowing whether a record is a "full-text" (and also whether it's
scholarly/published/peer-reviewed) is something in the realm of Google
Scholar, Citeseer etc.

Without wishing to recreate one of those services, I don't know of a
method for producing a definitive number. I suspect simple approaches
(e.g. "does record have PDF link") will be undermined by (sorry for
picking on you!) sites like:
No prizes for spotting why that wouldn't work :-)
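To make the "does record have PDF link" idea concrete, here is a minimal Python sketch of such a naive check. It assumes records are harvested as OAI-PMH `oai_dc` (unqualified Dublin Core) XML and that a file link, if present at all, shows up in `dc:identifier` or `dc:format`. Both are assumptions, since repositories differ in what they expose. The helper name `looks_full_text` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def looks_full_text(record: ET.Element) -> bool:
    """Naive heuristic (hypothetical helper): does this oai_dc record mention a PDF?

    Checks dc:identifier for an http(s) URL ending in .pdf, and dc:format
    for the application/pdf MIME type. Easily fooled by any linked PDF.
    """
    for ident in record.iterfind(".//dc:identifier", NS):
        text = (ident.text or "").strip().lower()
        if text.startswith("http") and text.endswith(".pdf"):
            return True
    for fmt in record.iterfind(".//dc:format", NS):
        if (fmt.text or "").strip().lower() == "application/pdf":
            return True
    return False


# Hypothetical sample record for illustration only.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example paper</dc:title>
      <dc:identifier>http://example.org/1/paper.pdf</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

if __name__ == "__main__":
    print(looks_full_text(ET.fromstring(SAMPLE)))  # True
```

A check like this only tells you that *some* PDF is linked, not that it is the peer-reviewed full text, which is one reason counts built on it can't be treated as definitive.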

> I am wondering if the Open DOAR (Directory of Open Access Repositories -
> the 'sister project' to the Directory of Open Access Journals, DOAJ) will
> set strictly 'full text only' rules for inclusion in its directory? And how
> will it relate to the archives.eprints directory you are involved with? It
> gets confusing to me because there are so many lists of repositories around
> on the web. How does the celestial harvesting list you mention relate to
> the archives.eprints list (are they the same list?) or the large list kept
> by the University of Illinois at Urbana-Champaign (UIUC) at
> <>?

Celestial is an OAI cache - it retrieves every metadata record from
those archives I've added to it. To make archives.eprints (IAR), I
stapled together the GNU EPrints listing with Celestial's record counts
(as an aside, anyone can use the records graphs from Celestial). I keep
a firmer technical control of Celestial than I do the IAR.
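For readers unfamiliar with OAI caching: retrieving every metadata record from an archive over OAI-PMH amounts to issuing `ListRecords` requests and following each `resumptionToken` until the archive returns an empty one. A minimal Python sketch of that loop follows; it is not Celestial's code. The `fetch` function is injected (a hypothetical stand-in for an HTTP GET) so the paging logic runs without network access, and the base URL is a placeholder.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url: str, fetch: Callable[[str], str],
            prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> across a sequence of ListRecords responses."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        list_records = root.find(OAI + "ListRecords")
        if list_records is None:
            # e.g. an <error code="noRecordsMatch"> response: stop here
            return
        yield from list_records.findall(OAI + "record")
        token = list_records.find(OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # an absent or empty token marks the last page
        # Follow-up requests carry only the verb and the resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: two pages, the second ends the list."""
    if "resumptionToken" not in url:
        body, token = "<record/><record/>", "page2"
    else:
        body, token = "<record/>", ""
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
            "<ListRecords>" + body +
            "<resumptionToken>" + token + "</resumptionToken>"
            "</ListRecords></OAI-PMH>")


if __name__ == "__main__":
    n = sum(1 for _ in harvest("http://archive.example.org/oai", fake_fetch))
    print(n)  # 3
```

Counting the yielded records gives the kind of per-archive record count mentioned above. A real cache would also have to store the records, report OAI-PMH errors rather than silently stopping, and re-harvest incrementally (e.g. via the `from` argument); all of that is left out here.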

UIUC is the point of entry to get added to OAIster, and it also provides
analyses of all *OAI* repositories registered with it. The IAR includes
many archives with no or broken OAI interfaces, as well as aggregates
(e.g. single entry with multiple OAI interfaces). We also collect
additional metadata in the IAR that isn't exposed by OAI (type,
software, etc.). (Not forgetting the registry at &
Hussein Suleman's OAI explorer)

My hope and expectation is that OpenDOAR will include some metric of
full-textness. There was also an effort for the recent Amsterdam
SURF/JISC/CNI meeting to ascertain some figures (by survey) for the
content of IRs - I believe that report will be published in the next
month or so.

> I take the archives.eprints to be the closest to a definitive list of the
> OA Institutional Repositories which we are concerned with here - although I
> notice that our 'DSpace_at_Cambridge' repository
> <> is not included.


> I see the distinction between OA Archives and the Open Archives Initiative.
> Maybe this is not strictly relevant to this forum and a basic
> misunderstanding of the purposes of archiving, but I still cannot
> understand why people are archiving *just* the metadata and not the full
> text. It makes OA search engines like OAIster more like any other
> standard bibliographic database with mostly subscription-only access.

I'm glad to see you're an "archivangelist" rather than a "repologist"
('sorry, the full-text isn't available here')!

It's the IR vs. Open Archives paradigm. The IR serves an institutional
need to *track* as well as to *expose* research output. Tracking
research output does not require making that research available for free
on the Web. The purpose of Open Archives is to make research more
efficient by maximising access to research, hence maximising research
impact.

If a high-quality body of freely accessible literature is available
through IRs, then the services that build on them will be more useful.
There are a lot of records appearing out there, but the full-texts
available from ad hoc Web pages still dwarf those in IRs. IRs also draw
no clear distinction between "prestigious" research and material
deposited under a "capture all" philosophy - administrators and authors
need to realise that what they put into the IR may very well turn up on
automated CVs, and they probably don't want to have their high-impact
peer-reviewed articles hidden amongst thousands of PowerPoint slides!

Tim Brody <>
Administrator, Institutional Archives Registry
Received on Mon Jun 13 2005 - 15:20:20 BST
