Re: Central versus institutional self-archiving

From: Stevan Harnad
Date: Sat, 2 Oct 2004 13:16:41 +0100

On Fri, 1 Oct 2004, [identity deleted] wrote:

> While OAI compliance is a sine qua non condition of some measure of
> inter-operability, it does not (yet?) ensure the kind of ease of retrieval
> that other forms of archiving can provide, including some form of central
> archiving.

This is incorrect.

This erroneous view that central archiving is somehow better or safer
than distributed/institutional archiving is exactly analogous to the older
view that on-paper publication is somehow better or safer than on-line
publication. The latter papyrocentric habit and illusion has happily
faded, thanks mainly to the force of the example and experience with the
growing mass of on-line content and usage. (But this obsolete thinking
did not fade before it managed to delay progress for several years;
nor has it faded entirely, yet!)

The instinctive preference for central over distributed archiving is a
remnant of that same papyrocentric thinking ("the texts are safer and more
tractable when they are all located in the same physical place") and will
likewise fade with actual experience and more technical understanding. The
trouble is that the preference (in both cases) is invariably voiced in
contexts and to populations that lack both the technical expertise and
the experience with the newer, untrusted modality.

And it always appeals to an uninformed audience that is a-priori more
receptive to what more closely resembles the old and familiar than what
resembles the new and less familiar, and that bases its sense of what is
"optimal" not on objective experiment and evidence, but on subjective
familiarity and habit.

The place to voice any doubts or uncertainties on technical questions
like this is among technical experts with relevant experience, such as
the OAI technical group, not in the wider populace that is still naive
and leery about the online medium itself, archiving, and open access.

> Let us not forget that OAI-compliance may also lead to a mixing of various
> levels of documents, for example some peer-reviewed, others not.

The Eprints software includes the tags "peer-reviewed" and "not peer-reviewed".
This means documents can be "de-mixed" according to the metadata tags, as
intended. In addition, the journal-name tag is an indicator. The old idea that
physical location is the way to de-mix is obsolete in the distributed
online era that the Web itself so clearly embodies.
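
To make the "de-mixing" point concrete, here is a minimal sketch in Python.
It is only an illustration: the Record type and its field names (refereed,
journal, archive) are hypothetical stand-ins for the peer-review and
journal-name tags, not the actual Eprints schema, and the archive hostnames
are invented. The point it shows is that the split depends on the metadata
alone, never on where the paper happens to be deposited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    archive: str                   # where the paper is deposited (not used for filtering)
    refereed: bool                 # hypothetical stand-in for the peer-reviewed tag
    journal: Optional[str] = None  # hypothetical stand-in for the journal-name tag


def demix(records):
    """Split records into (peer_reviewed, unrefereed) using metadata only."""
    peer_reviewed = [r for r in records if r.refereed]
    unrefereed = [r for r in records if not r.refereed]
    return peer_reviewed, unrefereed


# Invented sample records spread over institutional and central archives.
records = [
    Record("Paper A", "archive.univ-one.example", True, "Journal X"),
    Record("Paper A preprint", "archive.univ-two.example", False),
    Record("Paper B", "central.example", True, "Journal Y"),
]

peer_reviewed, unrefereed = demix(records)
print([r.title for r in peer_reviewed])
```

Note that the archive field plays no role in demix(): the same filter works
whether the records come from one central archive or from many institutional
ones.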

Moreover, the mixing of types of documents is a function of the archiving
policy, not of the archive-type (institutional or central) or location.

Lastly, the inclusion of peer-reviewed journal articles *together with*
preprints and post-publication revisions and updates is a desirable
complement, and can likewise be handled by various forms of pre-
and post-triage using both the metadata and meta-algorithms based on
metadata and full-text (de-duplication, dating and versioning at the
harvester level).
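
As a rough sketch of what de-duplication, dating and versioning at the
harvester level could look like, consider the Python below. Everything in it
is an assumption made for illustration: the record fields (title,
first_author, date, status, source), the crude title-plus-author matching
key, and the sample data. Real harvesters use more robust matching; this only
shows that the triage is a metadata operation performed after harvesting.

```python
import re
from datetime import date


def work_key(record):
    """Crude identity key: normalized title plus first-author surname."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["first_author"].lower())


def latest_versions(records):
    """Keep, per work, the most recently dated record and count versions seen."""
    best = {}
    counts = {}
    for rec in records:
        key = work_key(rec)
        counts[key] = counts.get(key, 0) + 1
        if key not in best or rec["date"] > best[key]["date"]:
            best[key] = rec
    return [dict(rec, versions_seen=counts[key]) for key, rec in best.items()]


# Invented records, as if harvested from two institutional archives and a central one.
harvested = [
    {"title": "On Open Access", "first_author": "Smith",
     "date": date(2003, 5, 1), "status": "preprint", "source": "inst-a"},
    {"title": "On open access.", "first_author": "Smith",
     "date": date(2004, 2, 10), "status": "postprint", "source": "central"},
    {"title": "Another Paper", "first_author": "Jones",
     "date": date(2004, 1, 5), "status": "postprint", "source": "inst-b"},
]

merged = latest_versions(harvested)
for rec in merged:
    print(rec["title"], rec["status"], rec["versions_seen"])
```

The earlier versions are not discarded in the archives themselves; the
harvester merely decides which one to present first.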

> because of this, the perception of archives that are only OAI-compliant
> may not be entirely favorable. Scientists/scholars may not make much or
> even any use of these sources simply because they consider them as too
> "noisy" or worse.

Are we then to recommend policy not on the basis of the actual empirical
and technical facts, but on the basis of the prevailing "perception"? If
we had adopted that strategy, we would have renounced the online medium
itself a-priori, and renounced also the notion of Open Access! We are
here to promote what is *in fact* optimal, not what is *perceived* to be
optimal, by force of existing habits and practices.

(Moreover, what is specifically at issue here is what form of
self-archiving to *mandate* -- institutional or central -- for erstwhile
non-self-archivers. This is an opportunity to guide and shape new habits,
rather than to be guided by old ones.)

> Central (OAI-compliant) archiving is not mutually exclusive with distributed,
> OAI-compliant archives; it simply completes and reinforces the archival
> system that is being presently explored and experimented with.

That is entirely correct, and is one of the premises of OAI-compliance and
interoperability: *All* forms of archiving are in fact forms of distributed
archiving and are made interoperable and equivalent by OAI-compliance. So
no one has said central archiving lacks any of the functionality of
institutional archiving.

The reason I (and others) are coming out so strongly in favour of
distributed institutional archiving is hence not *functional*, since we
fully understand how and why both forms of archiving are functionally
equivalent (and indeed it is the advocates of central archiving that
often fail to grasp this, and argue on the basis of putative functional
advantages of central archiving that do not in fact exist).

The reason I (and others) are coming out so strongly in favour of
distributed institutional archiving has to do with the probability of
OA content-provision itself, i.e., the probability that the OA archives
will be filled, rather than lie fallow, as well as the closely connected
probability that archive-filling will propagate across fields and
archives, rather than remain restricted to just one field and archive. (It
also has a little to do with distributing the archiving burden and costs,
but that is not the primary reason.)

The authors of the annual 2.5 million articles that we would all like to
see self-archived as soon as possible are virtually all affiliated with
institutions of their own (universities or research institutions). They
also each have disciplines, but author/institution is the relevant pair
here, not author/discipline. This is not just because disciplines are
nebulous entities, or because few disciplines have central archives and
creating and maintaining them is a much more nebulous matter. It is
because it is authors and their respective institutions that share a
common stake in maximizing the access to and impact of their (joint)
research output. Authors and their respective disciplines do not
(disciplines are, if anything, a locus of competition for impact, rather
than being its joint co-beneficiaries).

Moreover, institutions (particularly universities) also share most or all
of the disciplines. So when a self-archiving policy or practice is adopted
by an institution at all, the probability is very high that it will also
propagate across all of that same institution's disciplines. Moreover,
as institutions (and not disciplines) are in competition with one
another for visibility and impact, the probability is also high that
if some institutions adopt the policy and practice of self-archiving,
this will also propagate across (competing) institutions.

In addition, institutions, being the employers of their researchers and
the co-beneficiaries of research impact, are in a position to mandate,
monitor and reward compliance with an institutional self-archiving policy
(through employment, salary, promotion, tenure, etc.).

Disciplines have neither the interest nor the wherewithal to mandate,
monitor and reward central self-archiving. Neither do Learned Societies.

The one prominent and valuable non-institution-based exception is
research-funders, whether discipline-based or national/international and
pan-disciplinary: Research-funders too have an interest in maximizing
the access to and impact of the research they fund, and are hence in a
position to mandate, monitor and reward self-archiving.

However -- and this is a critical point, particularly with the US/NIH
self-archiving mandate -- research-funders can mandate, monitor and reward
self-archiving either way: They can mandate central self-archiving, as
the current version of the US/NIH recommendation does, or they can mandate
institutional self-archiving, as the UK recommendation does. The effect,
for the specific funded research itself, is exactly the same. The critical
difference is in the probability of propagation *beyond* the specific funded
research in question, toward the 100% OA that we are all seeking.

Having established that institutional and central archiving are
functionally equivalent, and that research-funders can equally well
mandate, monitor and reward self-archiving on either a central or
an institutional basis, the only relevant question is: Which of these
otherwise completely equivalent means is more likely to yield more OA?
And the answer is unequivocal: mandating institutional self-archiving,
according to the UK recommendation, rather than central self-archiving,
according to the US recommendation.

Yes, there is some probability that discipline-based central-archiving
mandates by research-funders will propagate across disciplines and
research-funders too (and they no doubt will). But that propagation is
just as likely (and will in fact occur far more readily and quickly
of its own accord) if each discipline and research-funder does not
need to create and fund and maintain a central archive of its own, but
can distribute that load on the institutional OAI-archiving network,
which is already growing because of the self-archiving mandates of
both prior research-funders and research-institutions themselves, and
is already propagating across the disciplines within each institution,
and across institutions.

(Institutional self-archiving, by the way, is actually distributed
locally too, with departments administering and monitoring compliance
in their own sectors: that is part of the beauty and functionality of
OAI-interoperability -- as well as of the modular OAI-compliant software:
"Institutional" self-archiving should really be called
"Institutional-Departmental" self-archiving.)

A forthcoming analysis by Rowland & Swan commissioned by the UK Joint
Information Systems Committee (JISC) has come out decisively in favor of
distributed institution-based self-archiving over central self-archiving,
for a variety of reasons, including both functional and economic ones
based on efficiency, cost and ease of implementation and monitoring,
as well as strategic reasons based on institutional research culture and
probability and ease of compliance. (I will circulate the URL of that
report as soon as it is released.)

> Consequently, it does not make much sense to focus on this issue. Simply let
> archives flourish wherever they may and in whatever form.

On the contrary, it makes a great deal of sense to focus on this issue,
to try to understand it, and to try to guide policy and implementation
in the direction that is likely to maximise the propagation of OA
self-archiving across disciplines and institutions, rather than to
minimize it:

The US central self-archiving mandate will certainly generate OA for
NIH-funded biomedical research. But why not, for the same money and
mandate, generate so much more OA, by simply dropping the stipulation that
the self-archiving must be central (in PubMed Central), and instead
let the self-archiving propagate naturally across institutions and their
disciplines? This OA maximization is attainable at no functional cost
or sacrifice whatsoever. All it requires is a small parameter change
that will confer huge benefits.

> If some institutions seem to feel more at ease with the presence of some
> centralized archives, so be it, so long as they do not object to the
> parallel development of institutional, disciplinary or even individual
> archives.

I could not follow the logic of this. (It seems to confound two senses
of the word "institution"). As far as I know, no one has spoken about
what institutions do or do not feel "at ease" with (and most individual
sentiments of "ease" here are more about ease with what individuals are
accustomed to, rather than about what is actually optimal, either for
the individuals, their institutions, or OA). The issue concerns what form of
mandated self-archiving is likely to generate the most OA, soonest.

The concerned parties seem to be the following: (1) NIH, which is a
central (national) research-funding agency, which is also associated with
(2) NLM, which has a superb and invaluable central index for abstracts
and links across all of biomedicine, PubMed, and which also has associated
with it a small but useful and growing central full-text OA Archive,
PubMed Central (PMC).

The US Congress is considering making it law that NIH should mandate
the self-archiving of all NIH-funded research in PMC. The self-archiving
mandate for NIH-funded research is extremely desirable and welcome. The
point under discussion here is that by changing one small parameter in
the mandate -- namely, to require only that the research be self-archived
in an OAI-compliant OA archive, without stipulating that it must be PMC
-- the very same NIH mandate can and will generate far, far more OA,
naturally propagating of its own accord across institutions and their
disciplines. The functionality will be identical (and PMC can easily
and automatically harvest all the NIH-funded institutional metadata if
it wishes, as well as serving as a backup OAI archive for the full-texts
if an author's institution does not yet have an OAI archive).
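
For readers who have not seen what such harvesting involves, here is a
minimal Python sketch of the harvester's side. It relies on the standard
OAI-PMH ListRecords verb and the oai_dc (Dublin Core) metadata format; the
archive URL, the set name "nih-funded" and the sample response are
hypothetical, chosen only for illustration, and no network request is made.
How NIH-funded records would actually be flagged by each institutional
archive is an assumption, not something specified in this discussion.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for Dublin Core metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # the set name is an assumption (see lead-in)
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier and title from a ListRecords response."""
    root = ET.fromstring(xml_text)
    out = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        out.append({"identifier": ident, "title": title})
    return out


# Invented response, shaped like what an institutional archive might return.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.univ-one.example:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

url = list_records_url("https://archive.univ-one.example/oai", "nih-funded")
harvested = parse_records(SAMPLE)
print(url)
print(harvested)
```

A central service would simply repeat the same request against each
institutional archive's base URL and pool the results.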

The UK mandate (if/when implemented) is already optimal in this regard.

Stevan Harnad

    BOAI-2 ("gold"): Publish your article in a suitable open-access
            journal whenever one exists.
    BOAI-1 ("green"): Otherwise, publish your article in a suitable
            toll-access journal and also self-archive it.