Conflating OA Repository-Content, Deposit-Locus, and Central-Service Issues

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Mon, 30 Nov 2009 15:07:53 -0500

On Sat, Nov 28, 2009 at 12:08 PM, Chris Armbruster wrote:
>     CA: "I have some doubts that the juxtaposition of institutional versus central repository is helpful (any longer)"

Helpful for what?

It is not only helpful but essential if what one is interested in is
filling repositories with the target content of the OA movement. For
in order to fill them, you have to get their target contents
deposited. And to get their target contents deposited you have to
mandate deposit. And to mandate deposit you have to specify the locus
of the deposit. And the only two kinds of loci are institutional and
central. And it makes a huge difference to the probability of
achieving consensus and compliance on mandates *where* the mandates
propose to require the author to deposit: institutionally or centrally.

So the only ones for whom the distinction between institutional and
central repositories is not "helpful" are those for whom it is
immaterial whether repositories are full or remain empty. Their
concerns are, apparently, at some other, more abstract or idealized level.
>     CA: "that is why the proposition is to henceforth distinguish between four ideal types of repositories on an abstract level, so as to be able to examine each specific repository in more detail."

And while we are theorizing at an abstract level about ideal
repository types, real, concrete repositories remain mostly empty, in
no small part because of some funders' failing to adopt practical,
realistic mandates on locus of deposit, mandates that converge rather
than compete with institutional mandates.

The abstract distinctions among the four "ideal types of
repositories," apart from three of them being of doubtful substance,
have nothing to do with this crucial concrete distinction, three of
the four being "subtypes" of central repository.

To repeat: Only a portion of OA's target content is funded, but all of
it originates from institutions.
>     CA: "For example PMC was a subject-based repository, but it languished before it became a research repository (capturing publication outputs) due to a national mandate, which is compatible with also having a UK PMC and PMC Canada."

The only thing that changed with PMC was that it went from being an
empty repository to being a less-empty repository when full-text
deposit was mandated for NIH-funded articles.

That had nothing to do with its changing from being a "subject"
repository to a "research" repository. Its target contents were always
the same: biomedical research articles. The only difference is that
the mandates raised somewhat the proportion of their target contents
that actually got deposited.

But the cost of that increase was a great opportunity lost and a bad
example set -- because NIH (and now its emulators) insisted on direct
deposit in a central repository (PMC, and now its emulators) instead
of allowing -- indeed preferring -- institutional deposit, and then
harvesting, importing or exporting (one or many) central collections
and services therefrom.

That would have facilitated institutional mandates for all the rest of
OA's target content, not just research funded by NIH (and its
emulators), by spurring institutions -- the universal providers of all
research output -- to mandate institutional deposit for all the rest
of their research output too, funded and unfunded.

Not all funders copied NIH, however, and there is still hope that NIH
will rethink its arbitrary and counterproductive locus-of-deposit
policy, in the interest of all of OA's target content.

(See: "NIH Open to Closer Collaboration With Institutional Repositories")
>     CA: "The point here is to examine (here: for the life sciences) past and (possible) future repository development and help stakeholders make informed decisions."

Help which stake-holders make which decisions about what, and why?

While repositories remain near empty -- and that includes PMC (or its
emulators), whose target contents comprise all of US (or other nations'
or funders') biomedical research -- the only substantive thing at stake
is *content*, and the "stake-holders" are mostly institutions -- who
also happen to be the *providers* of all that content, funded and
unfunded, across all nations and funders.
>     CA: "Another example: the Dutch system looks like a network of institutional repositories, but is now part of a national gateway (NARCIS)."

So what? The only relevant thing is: what proportion of their own
annual research output are those Dutch institutional repositories
actually capturing?

The last time I asked Leo Waaijers, he admitted quite frankly that no
one has checked. But unless there is something
different about the air breathed in the Netherlands, all indications
are that their institutional repositories, like repositories
everywhere, are only capturing about 15% of their target output. That
is the approximate deposit rate for spontaneous (unmandated)
self-archiving, worldwide. Only deposit mandates can raise that
deposit rate appreciably -- and so far the Netherlands has no OA
mandates.

It matters how you do the arithmetic. An institutional repository can
calculate its annual deposit rate by dividing its annual full-text
article deposits for that year by the institution's annual article
publication total for that year.

But for a central repository -- or for a "network of institutional
repositories" -- you have to make sure to divide by their respective
annual total target output. For the Netherlands, that's the total
annual article output from all the institutions in the NARCIS network.
And for PMC it's all of US biomedical research article output.

Otherwise one gets carried away in one's idealized abstractions by the
spurious fact that central repositories often have much more content,
in absolute terms, than individual institutional repositories. But
remedying this "denominator fallacy" by dividing annual deposit counts
by their total annual target content count quickly puts things back
into practical perspective.
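
To make the arithmetic concrete, here is a minimal sketch in Python. The institution names and all the counts are invented purely for illustration (they are not measured data), and deposit_rate is a hypothetical helper, not part of any repository software. The point is only that an aggregate with a large absolute deposit count can have exactly the same rate as each of its members once it is divided by its own total target output.

```python
def deposit_rate(deposited: int, target_output: int) -> float:
    """Fraction of a year's target article output that was actually deposited."""
    if target_output <= 0:
        raise ValueError("target_output must be positive")
    return deposited / target_output


# Hypothetical annual figures, chosen only to illustrate the arithmetic.
institutions = {
    "Univ A": {"deposited": 300, "published": 2000},
    "Univ B": {"deposited": 150, "published": 1000},
    "Univ C": {"deposited": 450, "published": 3000},
}

# Each institutional repository divides by its own institution's output.
per_institution = {
    name: deposit_rate(counts["deposited"], counts["published"])
    for name, counts in institutions.items()
}

# A central repository or national network must divide by the summed
# output of everything it targets, not by any single institution's output.
network_deposits = sum(c["deposited"] for c in institutions.values())
network_output = sum(c["published"] for c in institutions.values())
network_rate = deposit_rate(network_deposits, network_output)

for name, rate in per_institution.items():
    print(f"{name}: {rate:.0%}")
print(f"Network: {network_deposits} deposits, {network_rate:.0%} of target output")
```

With these made-up numbers the network holds three times as many articles as Univ A, yet its deposit rate is no higher: the larger absolute count reflects only the larger denominator.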

(And this is without even mentioning the question of time-of-deposit,
which is almost as important as locus-of-deposit: Many of the central
repositories -- e.g. PMC -- have access embargoes because funder
mandates have allowed them (and have even left it in the hands of
publishers rather than fundees to do the deposits, even though it is
fundees, not their publishers, who are subject to funder mandates).
Institutional repositories have a powerful solution for providing
"Almost OA" to closed access deposits during any embargo period -- the
"email eprint request" Button. This Button is naturally and easily
implemented by the repository software at the local institutional
level, but would be devilishly difficult -- though not impossible --
to implement at the central level (especially where there is proxy
deposit by publishers) because it requires immediate email approval by
the author of eprint requests from the would-be user, mediated
automatically by the repository software.)
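
The request flow described above can be sketched roughly as follows. This is only an illustration of the steps (user requests, software emails the author, author approves, software sends the copy). The class and method names are hypothetical and are not the API of any actual repository software, and the "outbox" list merely stands in for real email.

```python
from dataclasses import dataclass, field


@dataclass
class Deposit:
    title: str
    author_email: str
    open_access: bool  # False while the deposit is closed access (e.g. embargoed)


@dataclass
class EprintRequest:
    deposit: Deposit
    requester_email: str
    status: str = "pending"  # becomes "approved" or "declined"


@dataclass
class Repository:
    # (recipient, message) pairs standing in for messages the software would send
    outbox: list = field(default_factory=list)

    def request_eprint(self, deposit: Deposit, requester_email: str) -> EprintRequest:
        """A would-be user asks for a copy of a closed-access deposit."""
        if deposit.open_access:
            raise ValueError("deposit is already open access; no request needed")
        request = EprintRequest(deposit, requester_email)
        self.outbox.append(
            (deposit.author_email,
             f"Eprint request for '{deposit.title}' from {requester_email}")
        )
        return request

    def author_decides(self, request: EprintRequest, approve: bool) -> EprintRequest:
        """The author approves or declines; on approval a copy goes to the requester."""
        request.status = "approved" if approve else "declined"
        if approve:
            self.outbox.append(
                (request.requester_email,
                 f"Copy of '{request.deposit.title}' attached")
            )
        return request


repo = Repository()
paper = Deposit("An embargoed article", "author@example.org", open_access=False)
req = repo.request_eprint(paper, "reader@example.org")
repo.author_decides(req, approve=True)
```

The sketch assumes the repository already holds the author's email address, which is the natural situation when the author deposits locally; with proxy deposit by a publisher that link to the author is exactly what is hard to guarantee.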
>     CA: "Moreover, the major institutions in the network are research universities.  Thus the question arises, if Dutch repository development could be improved if stakeholders used the notion of research repository and national repository system to consider their options (rather than thinking that the institutions must do the job)."

What on earth does this "arising question" mean at this late stage of
the game? We have researchers, the ones who do the research and write
the articles. They are (mostly, 85%) *not depositing* unless mandated
by their institutions and/or funders. This has been unchangingly true
for decades.

Now what -- in specific, concrete, practical terms -- is it that using
"the notion of research repository and national repository system to
consider their options (rather than thinking that the institutions
must do the job)" is supposed to do to fill those empty repositories?
Is there any evidence that theorists abstractly contemplating ideal
repository subtypes translates into concrete, practical action on the
part of researchers 85% of whom consistently fail to deposit
unmandated into any-which repository across the years?
>     CA: "In two decades of immersion in digital worlds, we have witnessed the development of various repository solutions and accumulated a better understanding of what works and what doesn't. The main repository solutions may be distinguished as follows:"

Before we go on: The only thing we have learned in two decades --
apart from the fact that computer scientists, physicists and
economists deposit spontaneously, unmandated (two of them
institutionally, one of them centrally) at far higher than the global
baseline 15% rate -- is that the only thing that will raise the
spontaneous deposit rate is mandates (from institutions or funders).

That lesson has nothing whatsoever to do with "various repository
solutions" (central or institutional, abstract or concrete, real or
ideal, actual or notional).
>     CA: "Subject-based repositories (commercial and non-commercial, single and federated) usually have been set up by community members and are adopted by the wider community. Spontaneous self-archiving is prevalent as the repository is of intrinsic value to scholars."

Spontaneous self-archiving is "prevalent" at the steadfast rate of
about 15%, and that is the problem.

The nature of the repository has absolutely nothing to do with this,
one way or the other. It is a matter of "community" practice.

And, as noted, the few scholarly "communities" that have adopted
spontaneous self-archiving practices unmandated (computer scientists,
physicists and economists) did so very early on in these two decades,
continuing pre-Web practices, two of them institutionally and one of
them centrally; and they did so mainly to share preprints of
unrefereed drafts early in their research cycle. The value they found
in that practice predated the Web and had absolutely nothing to do
with repository type.

(And if it's hard to get authors to make their final drafts of
refereed, published articles publicly accessible unless it is
mandated, it would be incomparably harder to get authors from the
"communities" that have their own reasons for not wanting to make
their unrefereed drafts public to do so, against their wills: their
institutions and funders certainly cannot mandate it!)

"Commercial" vs. "non-commercial" also sounds like a can of worms: In
speaking of "repositories," are we mixing up the Open Access (OA) ones
with the Fee Access ones? And those that contain full-texts with those
that contain only metadata? And those that contain articles with those
that contain other kinds of content? If so, we are not even talking
about the same thing when we speak of repositories, for all I mean is
OA repositories of the full-texts of refereed research journal
>     CA: "Much of the intrinsic value for authors comes from the opportunity to communicate ideas and results early in the form of working papers and preprints, from which a variety of benefits may result, such as being able to claim priority, testing the value of an idea or result, improving a publication prior to submission, gaining recognition and attention internationally and so on."

We are comparing apples and oranges. OA's primary target is not and
has never been unpublished, unrefereed drafts.

Distinguish the self-archiving of OA's target content -- refereed
articles -- from the self-archiving of unrefereed preprint drafts. The
latter practice has been found very useful by some disciplines
(computer science, physics, economics) for a long time -- indeed
before the Web. But this practice has not caught on with other
disciplines, for an equally long time, in all likelihood because most
disciplines are not interested in making their unrefereed drafts
public. (Some may find this practice unscholarly; others might find it
potentially embarrassing professionally; in some disciplines it might
even be dangerous to public health.)

And the overall global self-archiving rate remains the baseline 15%
unless self-archiving is mandated.
>     CA: "As such, subject-based repositories are thematically well defined, and alert services and usage statistics are meaningful for community users"

This not only conflates unrefereed draft-sharing with OA and
repositories with services over repositories, but it also mixes up
cause and effect. There is no central repository functionality that
cannot just as well be provided over distributed or harvested
repositories. And there is no repository that cannot succeed if it
manages to capture its target content. Otherwise, the rest of the
functional details are merely decorative, for empty repositories.

And neither OA's nor OA mandates' target is unrefereed drafts (though
they are of course welcome if the author wants to deposit them too).
>     CA: "Research repositories are usually sponsored by research funding or performing organisations to capture results. This capturing typically requires a deposit mandate."

It makes no difference whether one calls a repository of, say,
biomedical research a "subject" repository or a "research" repository.
That's just words. And both institutions and funders "sponsor" them.
All that matters is whether or not deposit is mandated, because that
is what determines whether the repository is full or near-empty.

You are conflating "mandated repository" with "central research
repository." All OA repositories are "research repositories" because
all have the same target content: refereed research articles. And both
central and institutional deposit can be mandated.

You seem to keep missing the sole substantive point at issue, which is
that institutions are the universal providers of *all* of OA's target
content, funded and unfunded, across all research subjects and all
nations -- and funder mandates requiring direct central deposit
compete with and discourage institutional mandates for all the rest of
OA's target content, by requiring (from already-sluggish authors)
divergent, multiple institution-external deposit instead of convergent
one-stop institution-internal deposit (which can then be imported,
exported or harvested by central collections and services).
>     CA: "Publications are results, including books, but data may also be considered a result worth capturing, leading to a collection with a variety of items."

It's nice to get more ambitious in speculating about what one would
ideally like to see deposited, but let us not lose sight of practical
reality today: Authors (85%) are not even depositing their refereed
research articles until it is mandated. These are articles that --
without a single exception -- authors *want* to be accessible to any
would-be user, for they have already *published* them.

In contrast, it is certainly not true that all, most or even many
authors today want to make their unpublished research data (perhaps
still being data-mined by them) or their published books (perhaps
still earning royalty revenue, or hoping to) or their unrefereed
drafts (perhaps embarrassing or even dangerous until validated by peer
review) publicly accessible to all users today.

Now, does it not make more sense to try to encourage authors to
provide OA to content that they would already wish to see freely
accessible to any would-be user today -- by mandating the practice --
rather than imagining (contrary to fact) that authors are already
providing OA to content that many of them may not yet even wish to see
freely accessible to any would-be user today?
>     CA: "Because these items constitute a record of science, standards for deposit and preservation must be stringent."

Stringent standards for deposit? When most authors are not even
bothering to deposit at all? That seems an odd way to try to generate
more deposits! Rather like raising the price of a product that no one
is bothering to buy at current prices.

(No, it's not raising the quality of the product either: Users are the
ones who benefit from repository functionality; but it is authors that
we are trying to induce to provide the content to which this
user-functionality is applied.)

And is the scientific record not already in our journals and
libraries, on paper and online? And is peer review not already a
stringent enough standard?

Yes, peer-reviewed articles need to be preserved, but what has that to
do with authors depositing them in an OA repository? (They are usually
deposited in the form of a refereed final draft, which is not the
canonical "version of record," but merely a supplementary version, to
provide OA for those would-be users who do not have subscription
access to the journal in which the canonical version -- the one that
really needs the preservation -- was published.)

This is the old canard, again, conflating digital preservation with
Open Access provision -- and perhaps also conflating unpublished
preprints with published postprints.

And as to record-keeping: Yes, both institutions and funders need to
keep records -- indeed archives -- of the research output that they
employ and fund researchers to produce. Again, the natural locus for
that record is the institutional repository, which the institution can
manage, monitor and show-case, and from which the funder can import,
export or harvest its funded subset if it wishes. Direct
institution-external deposit, willy-nilly, would be like institutions
relying on their banks to do their record-keeping instead of
keeping their own records.
>     CA: "The sponsor of the repository is likely to tie reporting functions to the deposit mandate, this being, for example, the reporting of grantees to the funder or the presentation of research results in an annual report."

Yes, both grant fulfillment and annual research output recording and
evaluation can and should be implemented through repository deposit
mandates, by both funders and institutions. But the question remains:
What should be the locus of deposit? And should there be one
convergent locus of deposit, for a researcher and/or article, or
multiple divergent ones?

The obvious answer, again, is one-time, one-place institution-internal
deposit, mandated by both institutions and funders, and the rest by
institution-external import/export/harvesting therefrom.
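
As a rough sketch of "deposit once, harvest many": each record below is deposited exactly once, in its institution's repository, and every central collection (funder, subject, national) is just a filtered view derived from those deposits. All names and records are hypothetical, and the in-memory harvest function only stands in for a real harvesting protocol (OAI-PMH being the obvious candidate), which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Record:
    title: str
    institution: str
    subject: str
    funder: Optional[str]  # None for unfunded research


# Hypothetical deposits: one deposit per article, in the author's own institution.
institutional_repositories: Dict[str, List[Record]] = {
    "Univ A": [
        Record("Paper 1", "Univ A", "biomedicine", "Funder X"),
        Record("Paper 2", "Univ A", "physics", None),
    ],
    "Univ B": [
        Record("Paper 3", "Univ B", "biomedicine", None),
        Record("Paper 4", "Univ B", "economics", "Funder Y"),
    ],
}


def harvest(repositories: Dict[str, List[Record]],
            keep: Callable[[Record], bool]) -> List[Record]:
    """Collect, across all repositories, every record for which keep(record) is true."""
    return [r for records in repositories.values() for r in records if keep(r)]


# Central collections are derived views; nothing is deposited a second time.
funder_x_collection = harvest(institutional_repositories,
                              lambda r: r.funder == "Funder X")
biomedical_collection = harvest(institutional_repositories,
                                lambda r: r.subject == "biomedicine")
everything = harvest(institutional_repositories, lambda r: True)
```

Note that the biomedical collection picks up the unfunded Paper 3 as well, which a funder's own direct-deposit repository would never see.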
>     CA: "Research repositories are likely to contain high-quality output. This is because its content is peer-reviewed multiple times (e.g. grant application, journal submission, research evaluation) and the production of the results is well funded."

This is extremely blurred and vague.

Inasmuch as refereed journal articles report funded research, they
have been both grant-reviewed and peer-reviewed, so that's

Accepted grant proposals are not part of OA's target content, and are
just a book-keeping matter for institutions and funders.

Research evaluation is done on the basis of research performance and
impact, including refereed publications as their primary input. We are
again double-counting if we dub as triply peer-reviewed content that
is simply standardly peer-reviewed articles, deposited for research
evaluation in a repository.

This sounds mostly like massaging the obvious without stating the
obvious: None of it happens if the content in question is not
deposited. Deposit needs to be mandated, and the locus of the deposit
needs to be institutional, not central, to avoid needlessly placing
divergent multiple-deposit burdens on the (already sluggish) author.
>     CA: "Users who are collaborators, competitors or instigating a new research project are most likely to find the collections of relevance"

Yes indeed -- if they are deposited. And they will only be deposited
if deposit is mandated. And mandates need to be convergent rather than
competitive in order to reach consensus on adoption and compliance.
And hence the sole stipulated locus of deposit needs to be
institutional. The rest is all just a matter of harvesting and
services over distributed institutional repositories.
>     CA: "National repository systems require coordination - more for a federated system, less for a unified system. National systems are designed to capture scholarly output more generally and not just with a view to preserving a record of scholarship, but also to support, for example, teaching and learning in higher education.  Indeed, only a national purpose will justify the national investment. Such systems are likely to display scholarly outputs in the national language, highlight the publications of prominent scholars and develop a system for recording dissertations. One could conceive of such a national system as part of a national research library that serves scholarly communication in the national language, is an international showcase of national output and supports public policy, e.g.  higher education and public access to knowledge"

You are talking about a harvesting service. No need for it to be a
direct locus of deposit.

Which brings us back to the sole real priority, which is concerted,
convergent mandates from institutions and funders (and national
governments) to deposit (once only) in institutional repositories,
minimizing the burden on authors.
>     CA: "Institutional repositories contain the various outputs of the institution."

And all other repositories -- subject-based, funder-based, or national
-- likewise contain "the various outputs of the institution,"
institutions being the sole universal providers of all research
>     CA: "While research results are important among these outputs, so are works of qualification or teaching and learning materials. If the repository captures the whole output, it is both a library and a showcase. It is a library holding a collection, and it is a showcase because the online open access display and availability of the collection may serve to impress and connect, for example, with alumni of the institution or the colleagues of researchers."

It is highly desirable for universities to make their courseware
freely accessible online. But it is a different agenda from OA's. And
it has an even lower deposit rate today than OA: MIT is the only
institution that has a policy of making its courseware openly

If people are not yet recycling their waste, what needs to be done is
to mandate waste recycling, not to find other worthy things it would
be a good idea to do, but that people are likewise not doing, such as
giving up cigarettes -- or other worthy things that a (near-empty)
waste-recycling depository could host, aside from its target contents,
such as charity-donation booths.

Besides, some courseware -- especially material prepared in the hope
of writing a best-selling textbook -- is more like data, books,
unrefereed preprints (and software, and music and movies):
discretionary give-aways, depending on the author, rather than
universal give-aways, written solely for uptake and impact, like
refereed research articles.

So let's not remain oblivious to the vast shortfall in OA's target
content by blurring it with fantasies about other kinds of content
(much of it absent too!).

As for theses: The natural solution for them is to treat them the same
way as journal articles: mandate deposit in the institutional
repository (as more and more universities are now beginning to do).
>     CA: "A repository may also be an instrument of the institution by supporting, for example, internal and external assessment as well as strategic planning."

Yes, and this is yet another rationale for mandating deposit of OA's
target content: refereed research publications. Australia and the UK
are beginning to link their institutional repositories to submissions
for research assessment nationally, and universities like Liège are
doing so for internal performance assessment.
>     CA: "Moreover, an institutional repository could have an important function in regional development. It allows firms, public bodies and civil society organisations to immediately understand what kind of expertise is available locally."

Yes, all true. These are further rationales for institutions mandating
institutional deposit -- and for funder mandates to reinforce
institutional deposit mandates rather than compete with them.
>     CA: "These four ideal types have been derived partly from the history of repositories, partly through logical reasoning. This includes an appreciation of the relevant literature on scholarly communication, open access and repositories, though the [paper] is not a literature review but an argument that moves back and forth between abstract ideal types and specific cases.  Ideal types should not be misunderstood as a classification, in which each and every repository may be identified as belonging unambiguously to a category. Rather, the purpose of creating ideal types is to aid our understanding of repositories and provide a tool for analysing repository development."

The "argument" does not seem to be grounded in a grasp either of what
(OA) repositories are for, or of the practical problem of filling
them. The distinctions among central repositories are largely
arbitrary and spurious; they are more about services and functionality
than about locus of deposit or repository type. The fundamental and
sole substantive point is completely missed: Deposit needs to be
mandated (by the universal providers of the target OA content --
institutions -- reinforced by funders) and the locus of deposit needs
to be institutional.

The rest is just counting abstract chickens before their concrete eggs
are fertilized, let alone laid or hatched.
>     CA: "Some publication repositories may be identified easily as resembling very much one ideal type rather than another. Some of the classic repositories conventionally identified as subject-based, such as arXiv and RePEc, exhibit few features of another type. Yet, one of the more interesting questions to ask is in how far other elements are present and what this means. ArXiv, for example, is also a research repository, with institutions sponsoring research in high-energy physics being important to its development and success. RePEc, by comparison, has a strong institutional component because the repository is a federated system that relies on input and service from a variety of departments and institutes."

arXiv is based on direct central deposit of preprints (and postprints)
in physics; RePEc amalgamates distributed institutional deposits of
preprints in economics; CiteSeer harvests distributed institutional
deposits of preprints and postprints in computer science. There is
nothing to be learned here except that the spontaneous preprint (and
postprint) deposit practices in these three research subject
communities have failed to generalise to other research subject
communities and therefore postprint deposit mandates from institutions
and funders are needed, with one convergent locus of deposit: the
repositories of the universal providers of all research, funded and
unfunded, across all subjects and nations: the world's universities
and research institutes.
>     CA: "To continue with another example, PubMed Central (PMC), at first glance, is a subject-based repository. Acquisition of content, however, only took off once it was declared a research repository capturing the output of publicly funded research (by the NIH). Notably, US Congress passed the deposit mandate, transforming PMC into a national repository. That a parallel, though integrated, repository should emerge in the UK (UK PMC) and Canada (PMC Canada) is thus not surprising. Utilisation of the ideal types outlined above would thus be fruitful in analysing the development of PMC and, presumably, be equally valuable in discussing the future potential of PMC, for example the possible creation of a Europe PMC."

This just repeats the very same incorrect analysis made earlier: PMC
is and always was a US central research subject repository for
refereed biomedical research publications (so are its emulators, for
their own "national" output). What changed was not that NIH rebaptized
PMC by "declaring" it a "research repository." What changed was that
NIH mandated deposit (after two years wasted in the hope that a mere
"invitation" would do).

The rest is just monkey-see, monkey-do. What those aping the US
missed, however, was all the rest of OA's target content, funded and
unfunded -- across all nations, subjects and institutions -- and how
not only mandating deposit, but mandating convergent institutional
deposit is essential in order to have universal OA to refereed
research in all subjects, worldwide.

(The various national PMCs are a joke, and will be quietly rebaptized
as harvested archival national collections -- if those are desired at
all -- once worldwide OA content picks up, as institutional deposit
mandates become universal. The global search functionality will not be
at the level of all these absurd and superfluous national PMC clones,
but at the level of global harvesting/search services. Why would any
user -- peer or public -- want to search the world's biomedical
literature by country (or institution, for that matter) -- other than
for parochial actuarial purposes?)
>     CA: "National solutions are increasingly common (and principally may also be regional in form), but vary especially with regard to privileging either research outputs or the institutions. The French HAL system is powered by the CNRS, the most prestigious national research organisation, and thus is strong on making available research results."

Strong on making them available if/when deposited, but no stronger
than the default 15% on getting them deposited at all. (The
denominator fallacy again...)
>     CA: "In Japan, the National Institute of Informatics has supported the Digital Repository Federation, which covers eighty-seven institutions, with mainly librarians working to make the system operational."

Unless librarians in Japan have executive privileges over authors'
writings that librarians elsewhere in the world lack, they will not be
able to raise the deposit rate without mandating deposit either...
>     CA: "In Spain, an aggregator and search portal, Recolecta, sits atop a multitude of institutional repositories, with a large variety of items."

A large variety of "items": But what percentage of Spanish annual
refereed article output is being deposited? My guess is that -- apart
from Spain's 4 institutional mandates and 1 funder mandate -- that
percentage will be the usual baseline 15% (looking spuriously bigger
because aggregated centrally across multiple institutions: the
denominator fallacy yet again...).
>     CA: "In Australia, institutional repositories are prominently tied to the national research assessment exercise, with due emphasis on peer reviewed publications."

That's promising, because being required to submit for research
assessment via institutional repositories is effectively a deposit
mandate. Moreover, with 1 funder mandate and 5 institutional mandates
-- including the world's first institution-wide mandate at QUT --
Australia is neck-and-neck, proportionately, with the UK, in the
worldwide national OA sweepstakes: The UK has 13 funder mandates, 11
institutional mandates, and 3 departmental mandates, including the
world's very first OA mandate (U Southampton School of Electronics and
Computer Science); the UK too is moving toward linking deposit to the
new national research assessment scheme.
>     CA: "More here:"
>     Armbruster, Chris and Romary, Laurent, Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication (November 23, 2009).  Available at SSRN:

Let the reader be prepared for a rather confused and practically
unproductive mashup of OA repository-content, deposit-locus, and
central-service issues in the Armbruster & Romary paper.

Yet the resolution is a simple one-liner: All research institutions
and funders worldwide need to mandate institutional deposit, and then
reap the harvest centrally, with search services, subject collections,
national collections, language collections, and any other "ideal" on
which hearts are set.

(But don't let the function-tail wag the content-dog now, when it's
only at 15% body weight and needs to settle down and eat.)

Stevan Harnad
Received on Mon Nov 30 2009 - 20:10:44 GMT
