A Physicist's Challenge to Duplicate Arxiv's Functionality Over Distributed Institutional Repositories

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Mon, 5 Jan 2009 09:39:00 -0500

      Physicist (anonymous):

            "If you want to convince me [that
            institutional self-archiving plus central
            harvesting can provide all the functionality
            of Arxiv], then try to do so by conducting
            the following experiment with any...
            "harvesting" vehicles you like: 

                  (1) Choose an area, such as
                  Mathematical Physics, or
                  Integrable Systems, and find all
                  the papers that have been
                  deposited in any of the archives
                  that they cover, within the past
                  week.  (If they cover 95% of the
                  arXiv, they must necessarily
                  produce this information just as
                  well). No other barrage of junk;
                  just that simple list of papers. 

                  (2) Do the same with respect to
                  all the posted publications by a
                  given author for the past ten
                  years. Again: not a barrage of
                  google-like junk dumped upon you,
                  but this specific information.
                  (If I want a ton of junk, I can
                  also go to Google scholar, and
                  waste endless time trying to find
                  what I need.) 

                  (3) Find out, at one go, if a
                  given article, or set of
                  articles, from the above list,
                  has been published in a journal,
                  and what the journal reference is.

                  (4) Get a copy of any of these
                  articles, at once, in any
                  convenient format, like .pdf,
                  that is available.

                  (5) Be equally sure that all the
                  above is simultaneously done for
                  all such articles deposited in
                  individual institutional repositories.

            "If you can do all the above, successfully,
            you will have given the 'proof of concept'."

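The physicist's five requirements can be restated as operations over one pool of metadata records harvested from many repositories. What follows is a minimal Python sketch of that restatement, not a description of any existing harvester: the `Record` fields, the `merge` rule and the function names are all hypothetical, and the sketch assumes the harvester has already normalized identifiers (e.g. DOIs), subject labels and author names across repositories. It does not address how that normalization is done.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """One harvested metadata record (hypothetical, normalized field names)."""
    identifier: str                 # key shared by every copy, e.g. a DOI
    repository: str                 # the IR (or Arxiv) it was harvested from
    title: str
    creators: Tuple[str, ...]
    subjects: Tuple[str, ...]
    deposited: date
    journal_ref: Optional[str] = None   # requirement (3)
    pdf_url: Optional[str] = None       # requirement (4): link to the full text


def merge(pools: Iterable[Iterable[Record]]) -> List[Record]:
    """Requirement (5): pool the harvests of many repositories, one entry per identifier."""
    merged = {}
    for pool in pools:
        for rec in pool:
            prev = merged.get(rec.identifier)
            # Keep the first copy seen, unless a later copy adds a journal reference.
            if prev is None or (prev.journal_ref is None and rec.journal_ref):
                merged[rec.identifier] = rec
    return list(merged.values())


def recent_in_subject(records: Iterable[Record], subject: str,
                      today: date, days: int = 7) -> List[Record]:
    """Requirement (1): papers in `subject` deposited within the past `days` days."""
    since = today - timedelta(days=days)
    return [r for r in records
            if subject in r.subjects and since <= r.deposited <= today]


def by_author(records: Iterable[Record], author: str,
              today: date, years: int = 10) -> List[Record]:
    """Requirement (2): papers by `author` deposited within the past `years` years."""
    try:
        since = today.replace(year=today.year - years)
    except ValueError:  # today is 29 February and the target year is not a leap year
        since = today.replace(year=today.year - years, day=28)
    return [r for r in records
            if author in r.creators and since <= r.deposited <= today]
```

Keying the merge on a shared identifier is what lets the same paper, deposited once in an IR and exported to Arxiv, show up as a single entry rather than as the "barrage of junk" the physicist objects to.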
      Les Carr (ECS, Southampton):

      "I think we can reasonably build the required
      functionality on top of the Celestial OAI-PMH harvester.
      The "proof of concept" project would need to fund a
      server to allow registered users to subscribe to alerting
      emails, based on searches over the "recently added" OAI
      metadata held in Celestial." 
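The proposal rests on incremental OAI-PMH harvesting: a harvester asks each repository for the records added or changed since a given date and pages through the answer with resumption tokens. Below is a minimal Python sketch of that one step, using only the standard library and the OAI-PMH 2.0 `ListRecords` verb with the `oai_dc` metadata format. It leaves out the network fetch (any HTTP client can retrieve the URL it builds), does not model Celestial's own interface, and the `matches_alert` rule standing in for a user's alert subscription is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces fixed by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build a ListRecords request for records added or changed since `from_date`."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # day granularity: YYYY-MM-DD
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if header.get("status") == "deleted" or dc is None:
            continue  # deleted records carry no metadata
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            "subjects": [e.text for e in dc.findall("dc:subject", NS)],
        })
    # An empty (or absent) resumptionToken marks the last page of the list.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def matches_alert(record, subject=None, author=None):
    """Hypothetical subscription rule: case-insensitive substring match."""
    def hit(needle, values):
        return needle is None or any(needle.lower() in (v or "").lower() for v in values)
    return hit(subject, record["subjects"]) and hit(author, record["creators"])
```

A daily job could, for instance, request each repository's records since the previous day, follow the returned token until it comes back as `None`, and email each subscriber the records that pass `matches_alert`. Because `from` is a standard OAI-PMH argument, the same loop works against Arxiv's OAI interface and any compliant IR alike.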

Note: This is not about the relatively trivial issue of whether
longstanding Arxiv self-archivers need either to change their locus
of deposit or to do double the keystrokes in order to deposit their
papers in both Arxiv and their IRs: That can be accomplished
automatically, depositing only once, by the IR software's SWORD
import/export functionality. 

This is instead about whether central harvesters of distributed IRs
can indeed provide (at least) the same functionality as
direct-deposit central repositories (or even better). The provisional
reply is that they can, but it is now important and timely to
demonstrate this technically.

The functionality question is extremely important for another matter:
Getting the IRs filled. It has become clear that deposit mandates are
needed in order to fill repositories (whether central or
institutional) with OA's target content: the 2.5 million articles per
year published in the planet's 25,000 peer-reviewed journals, in all
disciplines and languages, and originating from all the world's
research institutions (universities, mostly).

OA deposits need to be mandated by all the world's research
institutions, the research providers, reinforced by deposit mandates
from the funders of the funded subportion of that research. The
universal adoption of these deposit mandates needs to be facilitated
and accelerated : There have only been 61 adopted so far (from 31
institutions and 30 funders). The institutional mandates cover all
research output, whereas the funder mandates only cover funded
research. But whereas an institutional mandate covers covers all
research output, cutting across all fields, funded and unfunded, from
that institution alone, a funder mandate covers only funded research,
usually only in one or a few fields; however, it cuts across all

Hence a funder mandate that requires institutional IR
deposit (followed by optional automatized central harvesting or
export) also simultaneously serves to stimulate, motivate and
reinforce the adoption of institutional mandates by each of its
funded institutions, to cover the rest of each institution's own
research output, across all fields, funded and unfunded. In contrast,
a funder mandate that requires direct deposit in an
institution-external, central repository (1) touches only the
research output that it funds, (2) fails to propagate so as to
facilitate the adoption of complementary institutional mandates for
all the rest of institutional research output -- and even (3)
competes with institutional mandates by (giving the appearance of)
necessitating double-deposit were the institution to contemplate
adopting a deposit mandate of its own too. 

In reality, of course, the SWORD automatic import/export capability
moots any need for double-deposit, but this is not yet widely known
or understood; and even without double-deposit as a perceived
deterrent, divergent funder mandates, needlessly requiring direct
institution-external deposit, simply miss the opportunity to provide
the synergy and incentive for the adoption of complementary
institutional mandates that convergent funder mandates, requiring
institutional IR deposit (plus optional central harvesting) do.

Hence the demonstration that central harvesting of distributed IR
deposits can not only duplicate but surpass the functionality of
direct central deposit should help encourage funders to adopt the
convergent IR deposit mandates that facilitate the adoption of
complementary mandates by the universal provider of research output,
the worldwide network of institutions (OA's "sleeping giant"),
rather than divergent mandates that fail to encourage (or even
discourage) institutional mandates.

Stevan Harnad 
Received on Mon Jan 05 2009 - 14:42:38 GMT