Estimating Annual Growth in OA Repository Content

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Sat, 9 Aug 2008 11:02:26 -0300

            Deblauwe, Francis (2008) OA Academia in
            Repose: Seven Academic Open-Access
            Repositories Compared

This is a useful beginning in analyzing the growth of Open Access
(OA), but it is based mostly on central collections holding many
different kinds of content.

A useful way to benchmark OA progress would be to focus on OA's
target content -- first and foremost, peer-reviewed scientific and
scholarly journal articles -- and to report, year by year, what
proportion of the content-providers' total annual output has been
deposited, rather than just absolute annual deposit totals.

The OA content-providers are universities and research institutions.
For each institution, the denominator of every measure should be the
number of articles the institution publishes in a given year, and the
numerator should be the number of those articles whose full texts
have been deposited in the institution's Institutional Repository
(IR).

Merely counting total deposits, without specifying the year of
publication, the year of deposit, and the total target output of
which they are a fraction (and without confirming that they are
article full-texts rather than just metadata), is only minimally
informative.
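The measure described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical record schema (`Deposit`, with publication year, deposit year, full-text flag, and article flag) and a hypothetical per-year table of the institution's published-article totals; none of these names or figures come from a real repository. The optional `as_of` cut-off uses the deposit year, so the same publication year can be re-measured as deposits accumulate.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """One IR record (hypothetical schema, for illustration only)."""
    pub_year: int        # year the article was published
    deposit_year: int    # year it was deposited in the IR
    is_full_text: bool   # False if the record is metadata only
    is_article: bool     # True for peer-reviewed journal articles


def annual_oa_proportion(deposits, articles_published, year, as_of=None):
    """Fraction of the articles published in `year` whose full texts are in the IR.

    Numerator: full-text journal-article deposits published in `year`
    (optionally only those deposited by the end of `as_of`).
    Denominator: total articles the institution published in `year`.
    """
    total = articles_published.get(year, 0)
    if total == 0:
        raise ValueError(f"no published-article total for {year}")
    deposited = sum(
        1
        for d in deposits
        if d.pub_year == year
        and d.is_full_text
        and d.is_article
        and (as_of is None or d.deposit_year <= as_of)
    )
    return deposited / total


# Hypothetical data: four IR records, two publication years.
deposits = [
    Deposit(2007, 2007, True, True),
    Deposit(2007, 2008, True, True),   # published 2007, deposited later
    Deposit(2007, 2007, False, True),  # metadata only: not counted
    Deposit(2006, 2007, True, True),   # different publication year
]
articles_published = {2006: 10, 2007: 4}

print(annual_oa_proportion(deposits, articles_published, 2007))              # → 0.5
print(annual_oa_proportion(deposits, articles_published, 2007, as_of=2007))  # → 0.25
```

A raw count would report four deposits here; the ratio shows that half of the 2007 output is OA, and that only a quarter was OA by the end of 2007.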

Absolute totals for Central Repositories (CRs), based on open-ended
input from distributed institutions, are even less informative,
because there is no indication of the size of the total output, and
hence of what fraction of it has been deposited.

If an institution does not know its own annual published-article
output -- as is likely, since such record-keeping is one of the many
functions that OA IRs are meant to perform -- an estimate can be
derived from the Institute for Scientific Information's (ISI's)
annual data for that institution. The estimate is then simple:
determine what proportion of the institution's annual ISI-indexed
items have their full texts in the IR. (ISI does not index
everything, but it probably indexes the most important output, so
this ratio estimates what proportion of the institution's most
important output is being made OA each year.)
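The ISI-based estimate can be sketched in the same spirit. This sketch assumes that both the institution's ISI records for one year and its IR full-text deposits can be reduced to DOIs; `isi_based_oa_estimate` is a hypothetical helper, and real matching is messier, since not every item carries a DOI.

```python
def isi_based_oa_estimate(isi_ids, ir_full_text_ids):
    """Estimate the share of one year's ISI-indexed output that is in the IR.

    Both arguments are collections of article identifiers, assumed here
    to be DOIs. DOIs are case-insensitive, so they are normalised to
    lower case before matching.
    """
    isi = {doi.strip().lower() for doi in isi_ids}
    if not isi:
        raise ValueError("no ISI items for this institution-year")
    ir = {doi.strip().lower() for doi in ir_full_text_ids}
    # Numerator: ISI items whose full texts are in the IR.
    return len(isi & ir) / len(isi)


# Hypothetical identifiers for one institution-year.
isi_2007 = ["10.1000/a1", "10.1000/a2", "10.1000/a3", "10.1000/a4"]
ir_full_texts = ["10.1000/A1", "10.1000/a3", "10.1000/x9"]  # x9 is not ISI-indexed

print(isi_based_oa_estimate(isi_2007, ir_full_texts))  # → 0.5
```

IR items that ISI does not index (`x9` above) drop out of both numerator and denominator, so the ratio describes only the ISI-indexed share of output.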

This calculation could easily be done for Cambridge University's IR,
the only university IR among the seven repositories Deblauwe
analyzed. It was probably chosen because it holds the largest total
number of items of any IR (see ROAR) and is one of the few IRs whose
total item count is comparable with those of multi-institutional
collections such as arXiv. However, it is unclear what proportion of
the items in Cambridge's IR are full-texts of journal articles, and
what percentage of Cambridge's annual journal-article output they
represent.

CERN is an institution, but not a multidisciplinary university: it
covers High Energy Physics only. CERN has, however, done the
recommended estimate of its annual OA growth in 2006, and found its
IR to be "Three Quarters Full and Counting"
(http://library.cern.ch/HEPLW/12/papers/2/). CERN is, moreover, one
of the 25 institutions, universities and departments that have
mandated deposit in their IRs, and those IRs are also the ones
growing fastest.

(Deblauwe notes that "Resources... remain a big issue, e.g., in 2006,
after the initially-funded three years, DSpace@Cambridge's growth
rate slowed down due to underestimation of the expenses and
difficulty of scaling up." I would suggest that what Cambridge needs
is not more resources for the IR but a deposit mandate, like those of
Southampton, QUT, Minho, CERN, Harvard, Stanford, and the rest of the
25 mandates to date: see ROARMAP.)

Stevan Harnad
American Scientist Open Access Forum
Received on Sat Aug 09 2008 - 15:08:29 BST