Re: New Ranking of Central and Institutional Repositories

From: Stevan Harnad
Date: Sat, 9 Feb 2008 15:10:43 +0000

On Sat, 9 Feb 2008, Thomas Krichel wrote:

> > Stevan Harnad:
> > Yes, the three first ranks go to "thematic" (i.e., discipline- or
> > subject-based) Central Repositories (CRs): (1) Arxiv (Physics), (2)
> > Repec (Economics) and (3) E-Lis (Library Science). That is to be expected,
> > because such CRs are fed from institutions all over the world.
> Thomas Krichel:
> Yeah, but E-LIS is really small, looking at it today it tells
> us it has 7253 documents. That IRs struggle to compete with that
> sort of effort demonstrates that IRs don't populate, even in the
> presence of mandates. No amount of Driver summits will change this.
> Disclaimer: I am the founder of RePEc and also do volunteer work
> for E-LIS.

(0) IRs "don't populate, even in the presence of mandates"?

(Could Tom please state his evidence for this, comparing the 12 mandated
IRs so far with unmandated control IRs -- as Arthur Sale did for a subset,
demonstrating the exact opposite of what Tom here claims.)

(Perhaps all Tom means here is that ROARMAP is not yet populated with
enough mandates -- in which case I suggest he stay tuned, and watch what
happens in 2008.)

(1) I don't know the Webometrics ranking formula, but it is
clearly based on multiple weighted parameters (country, size,
visibility, rich files, "scholar"), and not merely on total number
of records; otherwise the rank order would have been the same
as what ROAR gives you if you select "Sort by Total Records":

The Webometrics "Size" parameter seems to be the same as ROAR's "Total
records" -- except Webometrics so far seems to omit PubMedCentral, which
would otherwise be the biggest of the CRs. I expect that Webometrics'
coverage and perhaps also their formula is still being refined. [They
only seem to cover a total of 200 CRs and IRs right now.] And of course
there is also still the not-yet-solved problem of distinguishing the
records that are full-texts from those that are just metadata, and
distinguishing OA content from other kinds of deposits. Stay tuned.
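
To illustrate the reasoning in (1): a ranking that combines several
weighted parameters can order repositories quite differently from a
plain sort by total records. The sketch below is only an assumption-laden
toy. The repository names, the figures, and the equal weights are all
made up, and this is not the actual Webometrics formula (which I have
said I do not know).

```python
# Illustrative only: repository names, figures, and weights are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    name: str
    size: int        # total records
    visibility: int  # hypothetical visibility count
    rich_files: int  # hypothetical count of rich files
    scholar: int     # hypothetical "scholar" count


# Hypothetical equal weights; the real weighting is not known here.
WEIGHTS = {"size": 0.25, "visibility": 0.25, "rich_files": 0.25, "scholar": 0.25}


def rank_positions(values):
    """Map each value to its position (1 = largest); ties share a position."""
    ordered = sorted(set(values), reverse=True)
    return {value: i + 1 for i, value in enumerate(ordered)}


def sort_by_total_records(repos):
    """Order repositories by size alone, largest first."""
    return [r.name for r in sorted(repos, key=lambda r: r.size, reverse=True)]


def sort_by_weighted_score(repos, weights=WEIGHTS):
    """Order repositories by a weighted sum of per-parameter positions."""
    positions = {
        field: rank_positions([getattr(r, field) for r in repos])
        for field in weights
    }

    def score(repo):
        # Lower is better: a weighted average of rank positions.
        return sum(w * positions[f][getattr(repo, f)] for f, w in weights.items())

    return [r.name for r in sorted(repos, key=score)]


REPOS = [
    Repo("Repo A", size=9000, visibility=100, rich_files=200, scholar=50),
    Repo("Repo B", size=5000, visibility=900, rich_files=800, scholar=400),
    Repo("Repo C", size=1000, visibility=500, rich_files=500, scholar=300),
]
```

With these invented figures, sort_by_total_records(REPOS) puts Repo A
first, whereas sort_by_weighted_score(REPOS) puts it last, which is the
kind of divergence between the two orderings noted above.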

Another prominent omission is OAIster, which of course is a harvested
meta-repository of repositories (both IRs and CRs) -- but that should
cause some logical reflection about the notion of a "CR" altogether: For
of course OAIster is itself a CR! Moreover, some of the other CRs are
themselves harvested, either from individual websites or from IRs.
(CiteSeer is, and so is Repec -- and so, for that matter, are Google
Scholar and Google!) So the CR-enthusiasts still need to sort out their
logic: they must sort out harvested CRs from CRs that are deposited into
directly.

(My own provisional conclusion is that as it becomes obvious that
virtually every researcher has or will shortly have his own
institution's IR, it is the local OAI-compliant IRs (including
departmental IRs) that are the natural, optimal, and systematic locus of
direct deposit, since institutions are the direct providers of the
research in any case. "CRs" should be seen and treated as central
services, harvesting (either full-texts or just their metadata) from the
distributed IRs, rather than being seen as alternative "repositories",
either for direct deposit (in foolish and dysfunctional competition with
direct IR deposit), or as entrants in a size competition with IRs --
when the institutions and their IRs are the source of their contents in
any case!)

(And CRs are encouraged and welcome to enhance the metadata of their
harvested contents. But metadata enhancement should not get in the way
of content provision itself: That is putting the cart before the (still
largely absent) horse!)
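
A minimal sketch of the division of labour argued for above, with
deposit happening locally in each IR and a central service merely
harvesting the metadata, might look like the following. It parses
OAI-PMH ListRecords-style responses (Dublin Core metadata) and merges
them into one central index. The hosts, identifiers, titles, and the
helper names (sample_response, parse_list_records, harvest) are all
hypothetical, and the responses are built in memory rather than fetched
from real repositories.

```python
# Sketch only: hosts, identifiers, and titles are invented for illustration.
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and by unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def sample_response(host, title):
    """Build a minimal ListRecords-style response for a hypothetical IR."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        "<ListRecords><record>"
        f"<header><identifier>oai:{host}:1</identifier></header>"
        "<metadata>"
        '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<dc:title>{title}</dc:title>"
        "</oai_dc:dc></metadata>"
        "</record></ListRecords></OAI-PMH>"
    )


def parse_list_records(xml_text, source):
    """Extract identifier and title from each record in one IR's response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"source": source, "identifier": identifier, "title": title})
    return records


def harvest(responses):
    """Merge metadata from many IRs into one central index (a list of dicts)."""
    index = []
    for source, xml_text in responses.items():
        index.extend(parse_list_records(xml_text, source))
    return index


RESPONSES = {
    "ir-a.example.org": sample_response("ir-a.example.org", "Paper deposited at A"),
    "ir-b.example.org": sample_response("ir-b.example.org", "Paper deposited at B"),
}
CENTRAL_INDEX = harvest(RESPONSES)
```

A real harvester would instead request each IR's OAI-PMH base URL with
verb=ListRecords and metadataPrefix=oai_dc and follow any resumption
tokens; the point here is only that the content stays in the IRs while
the central service holds the harvested metadata.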

(2) All repositories are still "struggling" for content. Arxiv and Repec
are based on long-standing, semi-successful spontaneous self-archiving
practices by a very small, unchanging number of disciplines (and even
that is very far from covering all or most of the research article output
of Physics or Economics). The rest of Physics and Economics -- and the
rest of the disciplines, and the rest of the research institutions of
the world -- are not capturing their research output spontaneously. That
is by now quite obvious. The solution is equally obvious, tried, tested,
and already shown to successfully approach a deposit rate of 100%
within 1-2 years: Green Self-Archiving Mandates.

But again, comparing IRs and CRs in this regard is comparing apples and
fruit: Growing alongside (i) the one high-volume spontaneous
self-archiving CR, Arxiv, (ii) the semi-central CR, Repec, and (iii) the
harvested CR, Citeseer, there are now 12 university mandates and 22
funder mandates adopted, several quite recently -- plus 9 more proposed,
including nation-wide multi-university mandate proposals in Brazil, and
across the 791 universities in 46 countries in Europe.

And the question of the *locus* of mandated deposit still needs to
be sorted out for the funder mandates: they ought to be mandating IR
deposit and central harvesting rather than going against the tide by
needlessly mandating direct central deposit.

So it's early days for IR mandates, hence also for IR content over and
above the spontaneous deposit baseline of about 15%. But stay tuned to
see whether it is the apples, oranges, etc. or the fruit that prove to
multiply more fruitfully!

(3) Disclaimer: I started in 1994 as an institutional self-archiving
advocate. But then, foolishly enthralled by the spontaneous success of
Arxiv, I temporarily switched allegiance to central self-archiving. But
with the creation of the OAI harvesting protocol in 1999, which
effectively made IRs and CRs all interoperable, hence equivalent, my
allegiance sensibly reverted to where it should have stayed all along: to
the systematic source of all the target content, namely, each
researcher's own institutional IR. The institutions are the direct
providers of all the OA target content. It is in their own interests and
within their means to ensure that it is all made OA in their own IRs.
That is the optimal solution that scales, naturally and systematically,
to cover all of the world's research output, across all disciplines,
institutions, languages and nations. The rest is just a matter of
central harvesting, indexing, data-mining, and metadata enrichment of
the IR contents. But the first and foremost objective is OA content
provision, now; and the natural host for that is the author's own local
IR.

(It was my impression that Tom Krichel too was a fan of distributed
local self-archiving and central harvesting; as I recall, he was one of
those who warned me off of centralism during my brief fatuous flirtation
with it. But now Tom seems so comfortable with the continuing
spontaneous deposit rate of economists that he does not notice that this
spontaneous formula has utterly failed to generalize to all
other disciplines for well over a decade now, and that Green OA IR
Self-Archiving Mandates have meanwhile become the tried, tested and proven
means of generating deposit rates approaching 100% locally. Hence it is
toward generalizing those mandates that the OA movement must now devote
its efforts (and is indeed doing so, successfully).)

Stevan Harnad

If you have adopted or plan to adopt a policy of providing Open Access
to your own research article output, please describe your policy at:

    BOAI-1 ("Green"): Publish your article in a suitable toll-access journal
    BOAI-2 ("Gold"): Publish your article in an open-access journal if/when
    a suitable one exists.
    in BOTH cases self-archive a supplementary version of your article
    in your own institutional repository.
Received on Sat Feb 09 2008 - 15:33:59 GMT
