Re: Central vs. Distributed Archives

From: Stevan Harnad
Date: Thu, 6 Nov 2003 13:42:41 +0000

Yet another piece of evidence has appeared that seems to confirm that
whereas central archiving was historically the way in which self-archiving
began, it is not the fastest or best form for it to grow and spread today.

The Nature headline is (as usual for the press) an exaggeration:

    "Critical comments threaten to open libel floodgate for physics

and so is SciDevNet's:

    "Legal concerns plague open access physics archive"

but the facts seem to be that, across the years, some papers that
contained plagiarism or libel might have found their way into ArXiv's vast
(250,000 papers) and unvetted collection.

I said "unvetted," but of course almost all those papers are
also submitted to peer-reviewed journals, which *do* vet them,
and when there have been any corrections to the unrefereed
preprint, the authors self-archive the refereed postprint too.

So the (tiny) problem of plagiarism and libel is with papers that have
*not* been peer-reviewed.

ArXiv could make an effort to vet its daily submissions for plagiarism or
libel, but at nearly 4000 per month, this would be quite a task.

So the natural conclusions to draw from this seem to be the following:

(1) OAI-interoperability has now made all OAI-compliant archives
equivalent: They can all be harvested and jointly searched. It no
longer makes any difference which archive a paper is actually deposited in.

(2) Institutions are in the best position to vet their own research
output before approving deposits in their own institutional archives
(probably best done on a departmental basis). This vetting load is
also much better shouldered in a distributed way than by one
centralized vetter for all of the planet's research output (in physics,
mathematics, or other disciplines).

(3) Having institutional self-archived research output housed in the
institution's own archives not only immunizes the archive from external
liabilities (such as plagiarizers from other institutions) but also
makes it even clearer that -- contrary to what the Nature article
says it is, and perhaps contrary even to what the Physics ArXiv *thinks*
it is -- open-access archives are not *publishers*! They are merely a
means of providing open access to (refereed) publications (as well as
to their precursor unrefereed preprints).

    "Garfield: 'Acknowledged Self-Archiving is Not Prior Publication'"

For those who needed a reminder of it, research's "publish or perish"
mandate is *not* "self-archive or perish"! "Publication" refers to
certification as having met the known peer-review quality standards of
a journal, not to having clicked the button to self-archive an
unrefereed draft in an open-access archive! That meets the (trivial)
legal definition of "publishing," to be sure -- even hand-writing it
on paper once and showing it to someone does! But it certainly doesn't
meet the definition of what the research community (and promotion/salary
committees, and research-funding councils) means by "publication,"
which is to be certified by a qualified, neutral third party as having
met its known standards of peer review. At best, the self-archiving
of an unrefereed draft qualifies as vanity-press *self-publication* --
but that is precisely what researchers' institutions and their "publish
or perish" mandates are there to *protect* their researchers from!
(Or rather, to ensure that they go on to get their papers
properly peer-reviewed and certified as having met the peer-review
standards of the particular journal that accepted the paper.)

By the same token, it is each researcher's own institution -- not a
centralized entity like ArXiv -- that is in the best position to prevent
its own researchers (and themselves) from self-archiving plagiarized or
libellous papers -- and to take action if they do.

Having said that, the Physics ArXiv's "legal concerns" are all a tempest
in a teapot anyway. A central archive is a service provider. The service
it provides is to operate an archive for authors to self-archive in. If
an author self-archives a piece of plagiarism or libel therein, the only
legal responsibility of the archive is to *remove* that item as soon as
it is drawn to its attention. This is exactly the same rule as the one
applied to other Internet service providers: If someone posts or emails
pornography in an AOL discussion list or bulletin board, AOL does not
become liable as a pornographer if it removes the item as soon as
it is drawn to its attention and blocks further postings
from the poster. (The poster, of course, is the one to prosecute for
the pornography!) It is absurd to imagine that AOL could vet all emails
and postings in advance, to screen out pornography! It is reasonable,
though, to insist on better identity-control, for authenticating and
tracing the identity of posters, in case legal action needs to be taken
against them.

So depositor authentication and tracing are the only things ArXiv may need
to shore up (along with the capability of removing an item). Fortunately,
institutional archives can do this much more easily and naturally with
their own research staff!

See the long thread:
"Central vs. Distributed Archives"

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03).

Dual Open-Access Strategy:
    BOAI-2 ("gold"): Publish your article in a suitable open-access
            journal whenever one exists.
    BOAI-1 ("green"): Otherwise, publish your article in a suitable
            toll-access journal and also self-archive it.