Please Don't Conflate Direct with Harvested CRs (Central Repositories), Or Deposit Locus With Search Locus

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Sun, 8 Feb 2009 14:59:37 -0500

On Sun, Feb 8, 2009 at 12:27 PM, Imre Simon wrote:

      "It is an unquestionable reality that unmandated IR's
      [Institutional Repositories] remain all but empty. ArXiv,
      CiteSeerX, Repec and SSRN are the four examples of large
      thematic repositories [Central Repositories, CRs] I know
      of which are populated without a mandate. One wonders

(1) There is a profound difference between (1a) Arxiv (and perhaps
also SSRN), on the one hand (these are Central Repositories [CRs] in
which authors deposit papers directly) and (1b) CiteSeerX (and
partly also Repec), on the other hand (for these are harvested CRs,
their papers and metadata harvested from local repositories, usually
at the author's host institution, where they have been directly
deposited). Harvested CRs are like OAIster, or, for that matter,
Google Scholar!

(2) The difference is crucial, because central vs. institutional
locus-of-deposit is what is really under discussion here; no one is
disputing that navigation and search are done, and should be done, at
the central level, irrespective of whether CR deposit is direct or CR
contents are harvested.

(3) There are several reasons why these particular CRs are fuller
than IRs:

      (3a) An entire discipline is bigger than a single
      (multidisciplinary but local) institution

      (3b) These CRs contain only the deposits of those
      individual authors and disciplines that do deposit
      spontaneously, unmandated; these amount to about 15% of
      OA's total target output, and that is well known. The
      problem is the remaining 85% -- which will be pretty
      homogeneously represented in each individual
      multidisciplinary institution's IR (85% empty if

      (3c) But there is a systematic denominator bias here, for
      the success of an IR in capturing its institutional
      research output is reckoned as the ratio of its annual
      deposited papers to the total annual paper output for
      that institution, whereas for a CR this must be reckoned
      as the ratio of its annual deposited papers to the total
      annual output for the discipline or disciplines the CR
      covers (worldwide)! For certain disciplines and
      subdisciplines, such as High Energy Physics,
      Astrophysics, Economics and Computer Science, this ratio
      will be quite high. But they are not OA's problem
      disciplines, because they are depositing already, whether
      centrally or locally, unmandated, and have been doing so
      for years. OA's problem is all the disciplines that are
      not doing so, for they are the main basis of the 85%
      emptiness of IRs.
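
The denominator difference in (3c) can be made concrete with a small
sketch. All the figures below are hypothetical, chosen only to
illustrate the arithmetic (the 15% echoes the spontaneous-deposit
rate mentioned in 3b); the function name deposit_ratio is likewise
just an illustrative choice.

```python
def deposit_ratio(deposited: int, total_output: int) -> float:
    """Fraction of the relevant annual output that was actually deposited."""
    if total_output <= 0:
        raise ValueError("total_output must be positive")
    return deposited / total_output


# Hypothetical figures, for illustration only.
# IR: denominator is ONE institution's annual output, across all disciplines.
ir_ratio = deposit_ratio(deposited=300, total_output=2000)

# CR: denominator is ONE discipline's annual output, worldwide.
cr_ratio = deposit_ratio(deposited=40000, total_output=50000)

print(f"IR: {ir_ratio:.0%}  CR: {cr_ratio:.0%}")
```

With these made-up numbers the CR holds far more papers and also has
a far higher ratio, but the two ratios are taken over different
denominators, so neither the raw counts nor the percentages can be
compared directly.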

(4) The reason all this matters, and the reason it is so important
not to conflate direct and harvested CRs, nor to conflate deposit
locus with search locus, is that the issues of locus-of-deposit and
mandates are very deeply interrelated.

(5) Deposit mandates can be funder mandates or institutional
mandates.

(6) Funder mandates only cover funded research, and not all (perhaps
not even most) research output is funded; and this would be true even
if all funders already mandated OA.

(7) In contrast, (virtually) all research output (and hence all of
OA's target content) is institutional. Institutions are the universal
research providers.

(8) So if all institutions mandated OA, that would generate universal
OA.

(9) Hence if all of OA's target content is institutional output, it
follows that, inasmuch as the 85% of research that is not being
deposited spontaneously will be deposited once it is mandated, what
is most needed is universal institutional OA mandates.

(10) Funder mandates already help, for their portion of OA's target
content, but they would help far more if they could facilitate the
deposit not only of the research they fund, but all research: in
other words, if they could help induce institutions to mandate OA for
all of their research output, not just the subset mandated by the

(11) In order to be able to do this, funder mandates need only ensure
the presence of one implementational detail, which does not lose any
of their own target content, but potentially extends also to the rest
of the research output of each one of their fundees' institutions.

(12) Funders need to stipulate the fundee's own IR as the
locus-of-deposit for complying with the funder's deposit mandate (or
an interim backup repository like DEPOT, to host deposits until the
institution sets up an IR, to which the deposits can then be
automatically exported: DEPOT currently has only 66 deposits because
most UK funders are either requiring CR deposit or leaving it open
which repository their fundees deposit in).

(13) The contents can be harvested to CRs from there.

(14) The issue of search and functionality at the harvester level is
nothing but a red herring. (CiteSeerX is a perfect example of the
functionality of a CR that harvests from distributed IRs.)
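
The separation of deposit locus from search locus in (12)-(14) can be
sketched as a toy model: papers are deposited once, in the author's
own IR, and a central harvester then builds a discipline-level index
over all the IRs. This is only an illustration under assumed names;
the Deposit class, the harvest function and the example repository
names are hypothetical, and no real harvesting protocol is modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    title: str
    discipline: str
    repository: str  # the IR where the paper was directly deposited


def harvest(repositories):
    """Build a central, per-discipline index from distributed IRs."""
    index = defaultdict(list)
    for deposits in repositories.values():
        for deposit in deposits:
            index[deposit.discipline].append(deposit)
    return index


# Hypothetical IRs and papers, for illustration only.
irs = {
    "ir.univ-a.example": [
        Deposit("Paper 1", "physics", "ir.univ-a.example"),
        Deposit("Paper 2", "economics", "ir.univ-a.example"),
    ],
    "ir.univ-b.example": [
        Deposit("Paper 3", "physics", "ir.univ-b.example"),
    ],
}

central_index = harvest(irs)

# Search happens centrally, although every paper lives in its own IR.
for paper in central_index["physics"]:
    print(paper.title, "deposited at", paper.repository)
```

In this sketch the search is done on the central index, while each
record still points back to the institutional repository in which it
was deposited.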

(15) Nor do the special features of the few disciplines (such as
computer science -- the first -- physics and economics) that took
spontaneously to self-archiving without a mandate long ago have
anything to do with either (a) the IR/CR issue, or (b) viable
alternatives to mandates (because no one so far has demonstrated
any, apart from waiting and waiting) for generating the 85% of
content missing from IRs, and from OA as a whole.

Stevan Harnad
Received on Sun Feb 08 2009 - 20:00:24 GMT
