Re: Role of arXiv

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Fri, 8 Oct 2010 07:47:06 -0400

On Fri, Oct 8, 2010 at 12:57 AM, Simeon Warner wrote:

>> On Thu, 7 Oct 2010, Joseph Esposito wrote (in liblicense):
>> >
>> >JE:
>> >Finally, once again taking the centrality of arXiv to the
>> >community it serves into consideration, what would happen if a
>> >modest deposit fee were assessed--say, $50 per article?
>> SH:
>> The IR cost per paper deposited will be closer to 50c than $50, once all
>> universities are hosting their own output, and mandating that it be
>> deposited.
> SW:
> I do not think the 50c number is supported by fact or by trend. I know
> that for Cornell's IR the number is much closer to $50 than to 50c if
> one divides cost to operate by the number of new submissions in the
> same period. (I would love to see data for other IRs.)

Simeon, I can only repeat:

*"once all universities are hosting their own output, and mandating that it be
deposited."*

Cornell has not mandated deposit, and it is far from hosting all of
its annual output. Ditto for all but about 100 universities so far.

> SW:
> For arXiv the number is <$7. We have the benefit of significant scale
> (65k submissions/year) and a user community that require very little
> hand-holding.

Yes, you have significant scale. But, for Arxiv, Cornell -- and the
other subsidisers, including some universities -- are paying for all
deposits, from all universities, in one central repository.

To repeat: The sensible solution (and probably the only practical,
affordable one) is for Arxiv -- and any other central archives like
it, in other fields -- to harvest their content automatically from
Institutional Repositories that host their own research output.
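To make "harvest automatically" concrete, here is a minimal sketch, assuming the Institutional Repositories expose an OAI-PMH interface (the post itself does not name a protocol). The base URL `https://ir.example.edu/oai` and the sample response are hypothetical placeholders; no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Incremental harvest: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# HYPOTHETICAL response from a hypothetical IR, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:ir.example.edu:1</identifier>
      <datestamp>2010-10-01</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:ir.example.edu:2</identifier>
      <datestamp>2010-10-02</datestamp>
    </header></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://ir.example.edu/oai", from_date="2010-10-01"))
print(record_identifiers(SAMPLE_RESPONSE))
```

A central archive would loop over many such repositories, which is why the hosting cost stays with each institution and only the harvesting cost falls on the central service.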

The annual cost per paper deposited will be far less for an
Institutional Repository -- hosting only its own research output --
*once the institutions are indeed hosting all of their annual research
output* -- and not a small fragment of it, as now.
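The arithmetic at issue is simply annual operating cost divided by deposits in the same period. The sketch below uses the two arXiv figures quoted in this thread (under $7 per submission, 65k submissions/year); the IR operating cost is a HYPOTHETICAL round number, not Cornell's or anyone's actual budget, and it is assumed, as a simplification, to stay fixed as deposits grow.

```python
def cost_per_deposit(annual_operating_cost, deposits_per_year):
    """Annual operating cost divided by deposits received in the same period."""
    if deposits_per_year <= 0:
        raise ValueError("deposits_per_year must be positive")
    return annual_operating_cost / deposits_per_year


# Figures quoted in the thread for arXiv.
ARXIV_SUBMISSIONS_PER_YEAR = 65_000
ARXIV_COST_PER_SUBMISSION_BOUND = 7.0  # "<$7", so this is an upper bound

# Upper bound on annual operating cost implied by those two figures.
implied_arxiv_budget_bound = ARXIV_COST_PER_SUBMISSION_BOUND * ARXIV_SUBMISSIONS_PER_YEAR

# HYPOTHETICAL IR: a fixed annual cost chosen only to show the scaling.
HYPOTHETICAL_IR_ANNUAL_COST = 50_000.0

for deposits in (1_000, 10_000, 100_000):
    per_paper = cost_per_deposit(HYPOTHETICAL_IR_ANNUAL_COST, deposits)
    print(f"{deposits:>7} deposits/year -> ${per_paper:.2f} per deposit")
```

Under the fixed-cost assumption, the same repository moves from $50 to 50c per deposit purely through fill rate, which is the point about near-empty versus mandated IRs.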

Most institutions have IRs that are near-empty rather than at capacity
(as far as OA's target output is concerned). (The cost/benefit of
hosting their grey literature and other kinds of content is another
matter, but not to be reckoned into this comparison with Arxiv
regarding per-paper cost. IRs can archive lots of kinds of things,
including family photo albums, if desired...)

And Cornell, of course, has the double burden of hosting a near-empty,
unmandated IR for its own refereed research output, plus the (partial)
expense of hosting Arxiv for the world!


Annual Costs Per Deposit of Hosting Refereed Research Output Centrally
Versus Institutionally

Why Cornell's Institutional Repository Is Near-Empty


> SW:
> This is not to say that IRs aren't worth the support from their local
> institution! Compared with the cost of doing research resulting in an
> article, $50 is pocket change. I think that a key driver for IRs is
> that they align well funding with mission. At Cornell we consider it a
> worthwhile service for our faculty to provide considerably more
> support for the IR than arXiv could provide its users.

There are many valid reasons for institutions creating and supporting
their IRs -- but only if they mandate that they be filled with their
target content.

Among those many valid reasons are economic ones:

    "Among the many important implications of Houghton et al’s (2009)
timely and illuminating JISC analysis of the costs and benefits of
providing free online access (“Open Access,” OA) to peer-reviewed
scholarly and scientific journal articles one stands out as
particularly compelling: It would yield a forty-fold benefit/cost
ratio if the world’s peer-reviewed research were all self-archived by
its authors so as to make it OA. There are many assumptions and
estimates underlying Houghton et al’s modelling and analyses, but they
are for the most part very reasonable and even conservative. This
makes their strongest practical implication particularly striking: The
40-fold benefit/cost ratio of providing Green OA is an order of
magnitude greater than all the other potential combinations of
alternatives to the status quo analyzed and compared by Houghton et
al. This outcome is all the more significant in light of the fact that
self-archiving already rests entirely in the hands of the research
community (researchers, their institutions and their funders), whereas
OA publishing depends on the publishing community. Perhaps most
remarkable is the fact that this outcome emerged from studies that
approached the problem primarily from the standpoint of the economics
of publication rather than the economics of research."

Harnad, S. (2010) The Immediate Practical Implication of the Houghton
Report: Provide Green Open Access Now. Prometheus 28 (1). pp. 55-59.

> SW:
> (As a side note I mention that at arXiv we consider free access and
> free submission to be foundational and thus did not consider an
> author-pays model. See for
> more details of our business planning process.)

Arxiv is a repository for articles that have been or will be refereed
and published by *journals*. There is an "author pays" model for
paying for that refereeing and publishing through author/institution
publication fees (for OA journals, and a subscription model for non-OA
journals, still the vast majority).

But there is not, never was, and never need be an "author pays" model
merely for the *deposit* of the author's draft of those same articles.

Arxiv is a repository, providing access, not a publisher of refereed
research. The journals are still doing that. And they need to be paid
for it, either via subscriptions or via "author pays."

>> > JE:
>> >I am not
>> >suggesting that this should or should not happen; I am simply
>> >wondering what the outcome would be.  (BioMed Central, PLoS, and
>> >Hindawi all charge more than this, though they provide additional
>> >services.)  Would the number of deposits remain about the same?
>> >Would the number drop?  And if it dropped, how precipitously?
>> SH:
>> Guess again! Once the burden of hosting, access-provision and archiving is
>> offloaded onto each author's institution, the only service that journals
>> will need to provide is peer review, and hence journals will be charging
>> institutions a lot less than they are charging now. (Print editions as
>> well as online editions and their costs will be gone too.)
> SW:
> Overlay journals are also very interesting and I hope will grow in
> number. This does not seem to be happening yet though. A trend we see
> right now is a rather problematic increase in the number of low
> quality author-pays website-and-little-else online journals. They
> aggressively promote their articles through open-access services such
> as arXiv while established journals wrestle with the transition.

On this you are entirely right, Simeon (though I think the term
"overlay journals" is a misdescription of what may eventually come to
pass, once all refereed, published articles are being self-archived in
their authors' IRs).

(And Cornell is aiding and abetting this trend, by agreeing
pre-emptively to subsidize "author pays" costs for (some of) their
authors' articles while failing to mandate self-archiving of all of
their authors' articles, cost-free!)


Harnad, S. (2009) The PostGutenberg Open Access Journal. In: Cope, B.
& Phillips, A. (Eds.) The Future of the Academic Journal. Chandos.

> SW:
> In all of this the tools necessary to use IR content effectively still
> lag well behind the facilities offered by subject repositories.

Many of the necessary tools are not needed at the individual IR level,
because search occurs at the harvester level.

What IRs lack is not tools, but content. Once we have the content,
developing the tools is a piece of cake.

> SW:
> One should also not underestimate the cost of building effective
> collections over harvested data (see, for example, the NSDL experience).

We can cross that bridge when we get to it -- if Google Scholar does
not cross it for us -- once the target content is indeed being
deposited in the IRs, globally, because deposit has been mandated.

Stevan Harnad
Received on Fri Oct 08 2010 - 12:48:16 BST
