Re: EPrints, DSpace or ESpace?

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Tue, 11 Feb 2003 20:09:03 +0000

On Tue, 11 Feb 2003, D M Sergeant wrote:

>DS> So preservation should focus on tolled publications, and not
>DS> self-publications? Self-archiving systems cannot have a
>DS> preservation component?

(1) Self-archiving is self-archiving, not self-publication.
http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm#1.4

(2) Self-archiving is intended to provide open-access to what is
otherwise only available by toll-access.

Preservation should focus on the locus classicus of *publications* --
which is currently tolled -- not on the attempts to *supplement* them
with free access.

(The tolled literature in question here is the planet's 20,000 refereed
journals, both their paper and their online editions, which are both
proprietary products of their publishers.)

Of course self-archiving can have a preservation component: It does. And
as it gets more content, the preservation component will get more
attention.

> DS> [T]he wrong [software] choice may lead to a failure in the
> DS> preservation. Other material is ergo being needlessly lost
> DS> while ever it is not being preserved.
>
>DS> So really this ArXiv self-archiving initiative is an example of
>DS> preservation. This is a good thing. And surely it is a good thing
>DS> that the library community is beginning to preserve other research
>DS> disciplines.
>
>DS> Not having the correct software is no rationale for losing digital
>DS> material. Surely it is best to build software that does as good
>DS> a job as can be done. Yes, the job still needs to be done.

Derek, you seem to be systematically missing the point. The right
self-archiving software today is the software that self-archives and
provides open-access today, and tomorrow, and after tomorrow, as ArXiv
has been doing for over 12 years. The free self-archiving software under
discussion here does everything ArXiv has been doing, and more. It is
*you* [see DS above] who were raising questions about whether it is
sufficiently preservational, and I was replying that there is no reason
whatsoever either to worry about or to be held back by that now.

The library community can only preserve the self-archived research of
other research disciplines to the extent that other research disciplines
self-archive it. Those other disciplines are not doing nearly enough
self-archiving yet. Needless worries about whether the self-archiving
software is "correct" enough are among the things holding them back.

So the question arises: does the library community wish to help accelerate
self-archiving or help hold it back? If the answer is the former (as I
assume, on reflection, it will prove to be), then it would be helpful
not to keep raising unnecessary and irrelevant concerns about preservation
in the context of either choosing self-archiving software or doing
full-speed self-archiving, now.

Without content, there is no content to preserve. And a growing mass of
content is the best guarantor that any eventual preservation needs will
be addressed. Virtually all the content in question here (that 20,000
refereed-journals-worth) is currently proprietary toll-access content:
Let preservation worries be focussed on that toll-access corpus for
now. And let researchers meanwhile go ahead and supplement it with
open-access versions of their own publications that they self-archive
in their own institutional Eprint Archives, using today's perfectly
adequate self-archiving software.

> SH> The library community is worrying about the "needless loss" of
> SH> nonexistent content --
>
>DS> I thought that it was you who suggested that a whole decade of
>DS> (nonexistent) research had been lost needlessly!

Could the misunderstanding underlying all this run so deep that even
those words of mine were misconstrued? I said that the physicists had
been self-archiving open-access versions of their toll-access content
for 12 years, whereas other disciplines had not (and in part because
of groundless worries about its preservation!) -- at the cost to those
other disciplines of the loss of 12 years of access to and impact of their
(non-existent) open-access content! (Meanwhile, the toll-access versions
in all disciplines have been carrying on as usual.) And even the
physicists' open-access content -- recklessly self-archived despite the
preservation hazards! -- is still here to tell the tale, 12 years on...

> SH> I would say that there was a certain incompatibility here between the
> SH> desiderata of the library community and the research community! Yet it
> SH> is all so simply resolved, if we simply remind ourselves that we are
> SH> talking here about immediate *supplements* to publication and existing
> SH> forms of preservation, not *substitutes* for them.
>
>DS> I cannot even remember raising the banner of the library
>DS> community. My desire is that nothing digital is lost
>DS> inadvertently. This means effort on someone's part to decide what
>DS> to preserve, and to preserve it.

That's the banner! Today the effort that is needed is the effort to
self-archive researchers' own refereed research publications, to make
them open-access. The preservation efforts come after that, not before.
And the software is already more than adequate for the task.

> SH> Note that the emphasis is on "immediate" rather than "delay" -- including
> SH> delays for the sake of future-proofing.
>
>DS> Emphasis is on getting the "immediate" done well.

If "done well" means deferring the immediate in any way at this time,
then it is most definitely the wrong emphasis (and precisely the banner
I would like to see shelved at last).

(Note that we are not talking about digital contents in general, but only
about the self-archived, open-access versions of toll-access refereed
journal publications. The picture gets hopelessly scrambled if you try
to apply what I am saying to digital contents in general, or you try to
apply what is true for digital contents in general to this very special,
*supplementary* subset of them.)

> >DS> How much do either [EPrints or DSpace -- or http://cdsware.cern.ch/]
> >DS> conform to the OAIS reference model?
> >
> SH> How much do they *need* to (and why?), in order to provide many years
> SH> of enhanced access and impact to otherwise unaffordable research, *now*?
>
>DS> So that the many years happens on purpose, instead of in isolated
>DS> instances by accident. It was actually a genuine question, which
>DS> I would like to know the answer to.
>
>DS> How much do EPrints or DSpace conform to the OAIS Reference Model?

Not much; nor is it clear why they should. (See the parallel reply of my
colleague, Les Carr.)

> DS> It is unlikely that either [EPrints or DSpace] will be able to provide
> DS> the full solution.
>
> SH> The full solution for what?
>
>DS> The full solution to keeping my database application for many years.
>DS> (This was the example I used earlier in my original reply.)

But why should we be concerned about your database application when what
is being lost year after year is the access, usage, and impact of papers
that are currently only accessible via toll-access? We are talking here
about researchers doing day-to-day research, by accessing the full-text
of one another's research papers; not about some present or future
database application.

>DS> As I mentioned, self-archiving is a good idea. Immediate
>DS> self-archiving is even better. Immediate self-archiving and
>DS> self-preservation is even more better!

But what was on offer was software for immediate self-archiving. And
what you were raising were concerns about whether it is good enough, for
preservational reasons. To simplify: We can all go back to our corners
and work on improving the software for preservation, or we can go ahead
and use it to self-archive, now. Which is it to be?

Meanwhile, of course all the self-archiving software developers are
keeping an eye on posterity; but their closer eye is on content, now,
as a matter of priority, both for posterity and for today's research
progress.

Stevan Harnad
