Re: Repositories: Institutional or Central?

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Tue, 10 Feb 2009 10:46:38 -0500

On Mon, Feb 9, 2009 at 3:56 PM, Tomasz Neugebauer (Digital Projects &
Systems Development Librarian, Concordia University) wrote:

      A granting agency can make open access to the results of
      the research a condition of funding, but a university
      mandate that makes the university IR the compulsory locus
      of deposit... is not a good idea.  An appeal to
      individualism of the researchers should be sufficient...

IRs have been going up for nearly 10 years now; there are over 1000
of them, and the unmandated ones all remain near-empty (< 15%),
whereas the (very few) mandated ones approach 100% annual deposit
within about two years: How much longer do you propose we go on
waiting for "individualism" (unmandated deposit) to start proving to
be sufficient?

You are only now setting up Concordia's IR, and you have no direct
experience of the time that has already elapsed worldwide, or of how
the hope for spontaneous, "individualistic" deposits, even with
assistance and incentives, has consistently proved insufficient. Now
that we have heard your a priori ideological views on the matter, it
might be a good idea for you to spend the next few years observing
what actually happens.

Meanwhile, those of us who have already been through this many times
will continue to advocate the solution for which there is already
evidence that it works.

      "How do we get more research to be open access?" - I
      think this is still an open question - certainly
      voluntary support (and usage) by the researchers seems
      more beneficial to me than a mandated 'compliance'.  The
      current voluntary submission rates in CTRs is a fact that
      supports their role.  Perhaps IRs can learn something
      from CTRs in terms of the type of services that they

The (common) error you are making is the failure to normalize when
comparing IR and CR deposit rates: The numerator is the number of
deposits (per year, of that year's target content) and the denominator
is the total target content (per year, of that year's target content).

Of course CR numerators are much bigger than IR numerators: CRs'
annual target content ranges over an entire field's output, across
all institutions, whereas any individual IR's annual target content
consists of that institution's annual output only (across all fields).

Once you normalize the annual deposit rate by dividing the deposit
count by the total annual target content count, you will find that in
almost all fields (but not all: see below) CRs have the very same
deposit rate as IRs. The very few exceptions (computer science,
economics and physics) are due entirely to the nature of the field,
and not the nature of the repository. 
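A minimal Python sketch of the normalization described above. The
repository labels and all counts are hypothetical, invented purely for
illustration (they are not ROAR figures): a CR can hold a hundred
times as many deposits as an IR and still have exactly the same
normalized annual deposit rate.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical per-year counts for one repository."""
    name: str
    annual_deposits: int        # numerator: items deposited this year
    annual_target_content: int  # denominator: items eligible for deposit this year

    def deposit_rate(self) -> float:
        """Normalized annual deposit rate: deposits / target content."""
        if self.annual_target_content <= 0:
            raise ValueError("target content must be positive")
        return self.annual_deposits / self.annual_target_content


# Illustrative figures only; not taken from ROAR or any real repository.
# The CR's target content is a whole field across all institutions;
# the IR's target content is one institution's output across all fields.
cr = Repository("field-wide CR", annual_deposits=60_000, annual_target_content=400_000)
ir = Repository("single-institution IR", annual_deposits=600, annual_target_content=4_000)

if __name__ == "__main__":
    for repo in (cr, ir):
        print(f"{repo.name}: {repo.annual_deposits} deposits, rate {repo.deposit_rate():.0%}")
```

Both lines report a 15% rate even though the raw deposit counts
differ by a factor of 100; comparing the raw counts alone would make
the CR look far more successful than the IR.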

High Energy Physicists, who have been spontaneously self-archiving at
close to 100% since 1991, happen to deposit in a CR, Arxiv, which
today has a total of 518,884 documents (including other subfields of
physics, plus a few other subdisciplines as well). (All figures are
from ROAR.)

Computer Scientists, who have been self-archiving even longer, mostly
deposit on their own distributed institutional websites, and these
deposits are then harvested by their field's CR, Citeseerx -- but
that CR too has a healthy total for its field of 716,772 documents.
So does Repec, with 774,432 documents, which is likewise a harvested
rather than a direct-deposit CR.

There are two lessons to be learned from these data. First, even in
the CRs with a strong normalized deposit rate (as in computer science,
economics and physics), success has nothing to do with the locus of
deposit, since two of these three big ones are harvested from
distributed institutional websites. That makes them a lot more like
Google or Google Scholar than like repositories, since no one
deposits directly in Google or Google Scholar. Second, the success
of these CRs has everything to do with discipline-specific practices,
because all three of these disciplines have long had the practice of
sharing pre-refereeing drafts (preprints) before publication.

Other disciplines, too, might one day find it useful to share
unrefereed drafts, but not all will do so, probably not even most,
and the sharing of unrefereed preprints is not what OA is primarily
about: It is about the sharing of refereed postprints. In some
fields, such as biomedicine, the dissemination of unrefereed and
possibly invalid results might even be dangerous to public health; in
other fields it might simply be too risky for the researcher's
reputation to publicly post unvalidated writings. (This is all up to
the researcher, and cannot and should not be mandated.)

This brings us to the biggest CR of all, PubMed Central, with
1,525,967 documents: Most of these are not deposited by their authors
at all (only the 80,000 per year mandated by NIH are); the rest are the
result of various arrangements with publishers, after various
embargo periods of up to a year or more have elapsed.

We have now surveyed the top four CRs and found that there is in fact
no lesson at all to be learned there by IRs on how to overcome the
15% spontaneous-deposit baseline. It has nothing to do with local vs.
central deposit, nor with the functionality of CRs.

The CR functionality issue is even more of a red herring, because of
course users will consult the harvested global service, the CR, not
the individual, distributed local sources, the IRs, for navigation
and search, just as they consult Google and Google Scholar. It would
be absurd to implement sophisticated direct search capability at the
single IR level, when the obvious locus for search is the central
harvester level -- and again, that has nothing whatsoever to do with
whether the central service is itself a locus of direct deposit, like
Arxiv, or harvested from distributed local sites, like Citeseerx or
Google Scholar.

The only relevant functionality at the level of the repository, the
locus of deposit, is author (depositor) functionality, not user
functionality.

      I think that we would need to let researchers have the
      opportunity to voice their opinion... before we can be
      sure that no researcher would indeed complain.

Tomasz, now that you have voiced your own opinion, it would be a good
idea for you to read the background literature on this topic. There
you will find the large, multidisciplinary and multinational author
surveys that were conducted several years ago by Alma Swan and
Sheridan Brown, in which researchers did indeed voice their opinion,
and their opinion was that they would not deposit until and unless it
was mandated by their institutions and/or funders, but that if and
when deposit was indeed mandated, 95% would deposit, and over 80%
would deposit willingly. This finding has since been confirmed by
others; and Arthur Sale has gone on to do studies on authors' actual
behavior, with and without a mandate, to find that authors do indeed
behave in accordance with the opinion they voiced in the Swan/Brown
surveys, with their actual deposit rate approaching 100% within two
years of the adoption of a deposit mandate (but languishing at the
baseline 15% -- or 30% if incentives and assistance are provided --
if deposit is not mandated).

Stevan Harnad
Received on Tue Feb 10 2009 - 15:53:20 GMT