Re: Archive Preservation Considerations

From: Jean-Claude Guedon <guedon_at_ERE.UMontreal.CA> <harnad_at_COGSCI.SOTON.AC.UK>
Date: Mon, 31 Aug 1998 05:43:07 -0400

Alan Lesgold wrote:

al> The Walker article, combined with Harnad's comments, present a
al> picture that seems ideal. However, it does not attend sufficiently
al> to the issue of archival preservation... My point is that we face
al> an urgent need to develop and finance enduring archival
al> preservation for at least the modest share of our scholarly output
al> that is still worth reading after a decade.

al> 1. Multiple sites store independent copies of journals in formats
al> with relatively long lifetimes, such as CD-ROM or DVD.

Stevan Harnad replied:

sh> CD-ROM is not a format, it is a storage medium (but format is a
sh> consideration too: see the posting by Mark Doyle of American
sh> Physical Society in the "Savings...30/70..." thread).

sh> xxx is mirrored in [15] countries.

Sorry to respond so slowly, people, but here I go on this ancient (three
days) message... I guess Stevan really wants to prove that the speed of
writing can catch up with the speed of thought!

CD-ROM is not a format indeed and one should not confuse the material
substratum (e.g. a clay tablet, a pyramid, a plastic-based CD-ROM, a
metal-based CD-ROM) with the format. On this latter point, see the
amusing site where you will find a rather oneiric project to mail a
message to our descendants 50,000 years down the line...

Once the substratum is defined, the "ink" has to be defined. On a
CD-ROM, the little pits and their location on the surface are part of
this layer.

Finally, the language used belongs to a third layer.

In a sense, mirroring is like publishing several copies of the same
corpus. Except that each corpus is materialized with various forms of
substrata and various "inks". Only the third level, language, remains
the same.

If we begin to calculate mirrors on the order of a hundred or even a
thousand sites, and not a mere dozen, then you publish your corpus on a
level commensurate with the number of copies often found in the print
world. We know that a thousand copies contribute greatly to the
capacity of a document to survive through time. That is, if the material
substratum is not too lousy (e.g. acidic paper).

If you publish your corpus over hundreds of sites across the world, the
equipment used will vary greatly and its rate of renewal will also vary
greatly from place to place. Within ten years, your hundreds of sites
might well represent a hundred different forms of levels one and two.
This means that older forms of digitized materials can easily be
replaced by more modern forms, if needed, through a judicious search of
the network of mirrors. But it also means that digitized materials, in
their meta-data, must include, for each site, the precise description
of the material substratum and the "ink" used. At any rate,
collectively, the hundreds of sites amount to a constant, statistically
driven, form of automatic updating that will need very little
monitoring, especially if the right watchful software is applied to the
meta-data so as to warn that variety is growing dangerously low. Genes
have survived very well throughout life in a similar fashion through a
judicious technique of mirroring through the species. We could do worse
than take a leaf out of that book... :-)
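
As a rough illustration of the "watchful software" idea, here is a
minimal Python sketch. Everything in it is an assumption made for the
example: the record fields (site, substratum, ink), the function name
check_variety and the threshold of three distinct combinations are all
hypothetical, and no existing mirroring system is being described.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorRecord:
    """Hypothetical per-site meta-data: where a copy lives and how it is stored."""
    site: str
    substratum: str  # level one, e.g. "magnetic disk", "CD-ROM"
    ink: str         # level two, e.g. the physical encoding used on that medium


def check_variety(records, min_variety=3):
    """Return a list of warnings if too few distinct (substratum, ink) pairs remain.

    min_variety is an arbitrary illustrative threshold, not a recommendation.
    """
    counts = Counter((r.substratum, r.ink) for r in records)
    warnings = []
    if len(counts) < min_variety:
        warnings.append(
            f"only {len(counts)} distinct substratum/ink combinations "
            f"across {len(records)} mirrors (minimum wanted: {min_variety})"
        )
    return warnings


if __name__ == "__main__":
    mirrors = [
        MirrorRecord("site-a", "magnetic disk", "encoding-1"),
        MirrorRecord("site-b", "magnetic disk", "encoding-1"),
        MirrorRecord("site-c", "CD-ROM", "encoding-2"),
    ]
    for message in check_variety(mirrors):
        print("WARNING:", message)
```

With the three sample mirrors above, only two distinct combinations
exist, so a single warning is produced; adding a mirror on a third kind
of substratum or ink would silence it.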

al> 2. Support for those sites comes from a continuing income stream
al> provided by libraries and scholarly societies. The income stream
al> must be sufficient to support copying of all media every five
al> years as well as conservative amortization of server hardware to
al> assure up-to-date operations and high accessibility.

Mirroring will cost less and less as costs of storage continue to
plummet. It costs almost nothing to mirror a site, especially if you
compare that with the price of a subscription to an Elsevier journal...
:-) It could almost disappear in the general operating budget of any
university library.

Jean-Claude Guedon
Departement de litterature comparee
Universite de Montreal
CP 6128, Succursale "Centre-ville" Surfaces
Montreal, Qc H3C 3J7 Canada
Tel. 514-343-6208
Fax. 514-343-2211

See you at INET'99, San Jose, June 22-25, 1999
Received on Tue Aug 25 1998 - 19:17:43 BST
