Re: The OA Deposit-Fee Kerfuffle: APA's Not Responsible; NIH Is

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Thu, 17 Jul 2008 14:40:38 -0400

Chris Armbruster, as in the past, and like many others, completely
conflates the problem of content and the problem of functionality:

(1) Virtually all OA repositories today -- institutional and central
-- are low on content: Only about 15% of annual refereed research is
being deposited today.

(2) The only two exceptions are the fields of physics and economics,
where authors have been spontaneously depositing their papers in,
respectively, Arxiv and various collections of working papers in
economics, now harvested by RePEc.

(3) Even after many years of their positive example, the
self-archiving practices of these two fields have failed to
generalize to the rest of scholarly and scientific research.

(4) This is why self-archiving mandates -- from research institutions
and research funders -- are needed.

(5) Since all research, in all fields, originates from institutions,
institutional repositories (IRs) are the natural, convergent locus of
deposit for both institutional and funder mandates.

(6) Because IRs are OAI-compliant, hence interoperable, their
contents (metadata + links, or metadata + full-texts) can be
harvested into central collections (CRs) of various kinds
(subject-based, funder-based, nation-based, or global).

(7) Functionality can be enhanced at the harvester level in many
ways; all that is needed is the content itself.

(8) But we won't have the content unless we mandate it.

(9) And mandates won't work if funder mandates and institutional
mandates are in competition, and diverge.

(10) Institutions are the content-providers, in all fields, funded or
unfunded.

(11) Institutions share with their researchers a joint interest in
maximizing the accessibility, uptake, usage, and impact of their
joint research output.

(12) Institutions can also monitor and ensure compliance with funder
mandates (alongside their own institutional mandates).

(13) Locus of deposit has absolutely nothing to do with
functionality.

(14) But locus of deposit has everything to do with ensuring that the
content is provided.

On 17-Jul-08, at 3:54 AM, Armbruster, Chris wrote:

> I would like to publicly applaud the NIH policy makers for
> a central repository.

NIH could "strengthen" its central repository (CR) (PubMed Central)
irrespective of the locus of deposit. Locus of deposit is relevant to
maximizing content provision and unrelated to functionality.

> As far as I can see, after several years,
> institutional repositories have not made decisive progress in being
> useful to either authors or readers by providing services that are
> of any value (beyond storage). 

The purpose of IRs is not to provide services but to provide content.
The services are provided at the harvested collection (CR) level.

And the usefulness of CR services depends entirely on whether the
content -- on which the service is to be based -- is actually
provided in the first place.

> If I look at the kinds of services
> that arxiv, SSRN, CiteSeerX, RePEc and PMC offer, I see no
> emerging from the IRs, no matter how much you synchronize and
> harvest. 

I have great difficulty understanding the point Chris is trying to
make here: 

Both CiteSeerX and RePEc are harvester services. There is no CR there
in which authors deposit directly. CiteSeerX and RePEc (like Google
Scholar) harvest their contents from IRs and other institutional and
personal sites on the web.

Arxiv, as noted, is a longstanding CR in which physicists have been
depositing directly since 1991, but there is no sign of that
spontaneous phenomenon duplicating itself in any other field (even
though CRs are available in other fields too, including CogPrints, in
cognitive sciences, which I created in 1997).

SSRN is a CR, but to assess how full it is, divide its
annual contents by the global annual output in all the fields
covered. It will be found to hover at the very same spontaneous
deposit level (15%) as the IRs. And no matter how many or wondrous
the services you provide over it, 15% is still just 15%.
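
To make that arithmetic concrete, here is a minimal Python sketch of
the fill-rate calculation. The function name deposit_rate and the
counts in the usage line are hypothetical, chosen only to illustrate
the division; they are not actual SSRN figures.

```python
def deposit_rate(deposited_per_year: int, global_output_per_year: int) -> float:
    """Fraction of a field's annual output that is actually deposited."""
    if global_output_per_year <= 0:
        raise ValueError("global annual output must be positive")
    return deposited_per_year / global_output_per_year


# Hypothetical counts: 15,000 deposits out of 100,000 papers published.
rate = deposit_rate(15_000, 100_000)
print(f"{rate:.0%} of annual output deposited")
```

Adding services over the repository changes neither argument of the
function, so it cannot change the ratio.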

No one would search a topic IR by IR, so it makes no sense to provide
certain services at the IR level. (IRs provide local services
pertinent to the institution itself, such as generating CVs, research
assessment data, and usage statistics.) If you want to search across
IRs, go to OAIster, Google Scholar, CiteSeerX, or Citebase.

But you will be disappointed, because all you will find is about 15%
(except in physics and economics).

That's what the mandates are for.

And that's why it's important that institutional and funder mandates
converge on the providers, the IRs, rather than competing, by
requiring direct deposit in institution-external CRs (instead of just
having the CRs harvest).

> Also, centralized repositories seem to lend themselves
> much more easily to the creation of overlay services that extract
> further value for the scholarly community. 

Overlay services can be developed over any OAI-compliant
repositories, whether IRs or CRs. The locus of deposit makes no
difference whatsoever. That was the whole point of the OAI protocol.
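
As a rough illustration of why locus of deposit does not matter to an
overlay service, here is a minimal Python sketch of a harvester that
merges OAI-PMH ListRecords responses into one index. The namespaces
and the verb/metadataPrefix request form come from the OAI-PMH 2.0
specification; the repository names, the sample record, and the
function names are hypothetical, and the sketch parses a canned
response rather than fetching anything over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response; any OAI-compliant IR or CR
# would return the same structure.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str) -> str:
    """Build the harvesting request a service would send to a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return f"{base_url}?{query}"


def parse_records(xml_text: str, source: str) -> list[dict]:
    """Extract identifier and title from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "source": source,
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    return records


def build_index(responses: dict[str, str]) -> list[dict]:
    """Merge responses from many repositories into one searchable list."""
    index = []
    for source, xml_text in responses.items():
        index.extend(parse_records(xml_text, source))
    return index


# The same code handles an institutional and a central repository alike.
index = build_index({
    "ir.example.edu": SAMPLE_RESPONSE,
    "cr.example.org": SAMPLE_RESPONSE,
})
```

Nothing in the parsing or merging step depends on whether a given
source is an IR or a CR; only the base URL differs.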

> Just consider the following
> service: (developed in Germany, based on
> the efforts of the NIH, a splendid example for the kind of
> trans-national innovation that has become possible on the basis of
> repositories).

And if NIH mandated direct deposit in IRs, and harvested PMC content
from there, the very same services could be built on it. The
difference would be that the NIH mandate would be convergent and
synergistic with institutional mandates, generating far more content,
beyond just what is funded by NIH, across all fields, institutions
and countries.

> I hope the NIH holds fast and that more research funders will
> deposit in centralized repositories - either discipline-specific
> or at least national.

For the "success" of national CRs, see France's HAL. Without
mandates, it languishes at the usual 15%, no matter how you cut the

No, Chris, what's missing is content, not functionality. And the
reason for the focus on IRs is that they are the convergent,
systematic way to get all the content, rather than depositing
willy-nilly and hoping that that will somehow cover all of OA space.

Stevan Harnad
Received on Thu Jul 17 2008 - 22:36:39 BST
