Re: Central versus institutional self-archiving

From: Atanu Garai/Lists <atanugarai.lists_at_GMAIL.COM>
Date: Sat, 8 Mar 2008 22:21:08 +0530

Thanks, Stevan. These are the key points that come to my mind.

Stevan Harnad wrote:
      On Sat, 8 Mar 2008, Atanu Garai/Lists wrote:

            Dear Colleagues
            This question is very basic. Institutions all over the world are
            developing their own repositories to archive papers written by
            their staff. On the other hand, it is entirely feasible to develop
            thematic and consortia repositories in which authors all over the
            world can archive their papers very easily. Both approaches have
            their own pros and cons. However, having a few big thematic (e.g.
            subject-based) and/or consortia (e.g. Indian universities archive)
            repositories is more advantageous than maintaining hundreds of
            thousands of small IRs, taking cost, management, infrastructure
            and technology considerations into account. Moreover, knowledge
            sharing and preservation become easier across the participating
            individuals and institutions in large IRs. If these advantages are
            so obvious, it is not understandable why there is so much
            advocacy for building IRs in all

      Not only are the advantages of central repositories (CRs) over
      institutional repositories (IRs) not obvious, but the pros of IRs
      vastly outweigh those of CRs on every count:

This forum must have discussed this issue before. I should also make
clear my objective in posing this question, so that you can take it in
the right context and spirit. At one time, and still now, we wanted
dispersed information platforms and databases. But with the emergence
of large digitisation projects, notably Google Books, the advantages
of centralised global databases are becoming obvious. A choice between
a 'central repository' and an 'IR' is a policy decision for a
university or group of universities, and such a decision is driven by
a number of factors. Again, the question is: what sequence of events
and what rationale led the open access community to select IRs over
CRs as the primary archiving mechanism? Institutions should be able to
make their own choice, but if you want to advise institutions, what
should be the key criteria for advising them to opt for their own IRs
over CRs?
      (1) The research providers are not a central entity but a
      network of independent research institutions (mostly

      (2) Those independent institutions share with their own researchers
      a direct (and even somewhat competitive) interest in archiving,
      evaluating, showcasing, and maximizing the usage and impact of their
      own research output. (Most institutions already have IRs, and there
      are provisional back-up CRs such as Depot for institutionally
      unaffiliated researchers or those whose institutions don't yet have
      their own IR.)

Points 1 and 2 essentially deal with the notion of a self-archiving
mandate that an institution may or may not invoke for its researchers.
From an institutional point of view, the choice between a CR and an IR
will primarily be driven by the management, impact and effectiveness
of the repositories. For universities that produce a large number of
research papers annually, creating IRs may be sensible, but there are
universities in India that produce only a handful of research papers.
My understanding is that, for such universities, maintaining their own
repositories is less effective, even if we take cost considerations
alone. The issue of "a direct (and even somewhat competitive) interest
in archiving, evaluating, showcasing, and maximizing the usage and
impact of their own research output" does not conflict with the choice
of having a CR (or rather a global repository). Independent
institutions can mandate self-archiving and can archive, evaluate,
showcase and maximize the usage of their output in CRs as well.
      (3) The OAI protocol has made all these distributed repositories
      interoperable, meaning that their metadata (or data) can all be
      harvested into multiple central collections, as desired, and
      searched, navigated and data-mined at that level. (Distributed
      archiving is also important for mirroring, backup and preservation.)

      (4) Deposit takes the same (small) number of keystrokes
      or centrally, so there is no difference there; but researchers
      normally have one IR whereas the potential CRs for their work are
      multiple. (The only "global" CR is Google, and that's harvested.)

Technology is not a constraint in making metadata interoperable,
though interoperability does come with some compromise in data
quality. For full-text data, interoperability is challenged by
copyright restrictions. These dilemmas are intrinsically avoided in
CRs. Moreover, large-scale CRs have the opportunity to make full-text
search and retrieval feasible. The volatility of metadata harvested
from IRs is also avoided when CRs are implemented.
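
For readers unfamiliar with the harvesting mentioned in point (3), here is a minimal sketch of what an OAI-PMH harvester does with one page of a ListRecords response. It is only an illustration under stated assumptions: the base URL, the record identifier and the sample XML are hypothetical, and the helper names build_list_records_url and parse_list_records are made up for this example. Only the verb, the oai_dc metadata prefix, the resumptionToken argument and the XML namespaces come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def build_list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means there are no further pages.
    return records, (token.strip() or None)

# Hypothetical response from an imaginary repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:ir.example.edu:101</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample preprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_list_records(SAMPLE_RESPONSE)
next_url = build_list_records_url("https://ir.example.edu/oai", next_token)
```

A real harvester would fetch each URL over HTTP and repeat until no resumption token is returned. Note that this exchanges metadata only; the full-text and data-quality concerns raised above are outside what such a loop addresses.
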
      (5) The distributed costs of institutional self-archiving are
      certainly lower than maintaining CRs (how many? for what fields?
      and who maintains them and pays their costs?), particularly as the
      costs of a local IR are low, and they can cover all of an
      institution's research output as well as many other forms of
      institutional digital assets.

You may like to give some empirical data here to corroborate your
statement. The creation and maintenance costs of an IR are minimal,
but if you want to advocate and popularise IRs, you will need staff.
There are some figures that were submitted to a UK parliamentary
committee. CRs absorb all these costs, and institutions may or may not
pay the CRs the same amount in subscription costs. Preserving "as well
as many other forms of institutional digital assets" was not in the
IR's mandate, but obviously CRs can also do that, purely from a tech
point of view.
      (6) Most important of all, although research funders can
      self-archiving mandates, the natural and universal way to ensure
      that IRs (and hence harvested CRs) are actually filled with all of
      the world's research output, funded and unfunded, is for
      institutions to mandate and monitor the self-archiving of their own
      research output, in their own IRs, rather than hoping it will find
      its way willy-nilly into external CRs.

Mandating self-archiving is not a technological issue; it is a
regulatory one. Hence, it can be done in IRs and/or CRs.
Atanu Garai
Online Networking Specialist
International Secretariat:
150, route de Ferney
CH-1211 Geneva 2
Tel: 41.22791.6249/67
Fax: 41.22710.2386
New Delhi Contact:
Tel: 91.98996.22884

Received on Sat Mar 08 2008 - 20:52:53 GMT