Re: Ann Okerson on institutional archives

From: Stevan Harnad
Date: Mon, 28 Mar 2005 15:44:34 +0100 (BST)

I have to point out that the information from Franck Laloe about CNRS's
HAL is correct and very helpful but risks being extremely misleading
about the cost of distributed institutional archiving. Here are the
pertinent points:

(1) France is unique in having a national research "mega-institution",
the CNRS. This consists of CNRS researchers in just about all scholarly and
scientific disciplines (not just those we call "science") distributed
all over the country, either in independent CNRS units or in CNRS units
that are administratively associated with local universities.

(2) I am not sure what percentage of the researchers and research output
of France the CNRS comprises, but it is considerable, and if we add in the
three other CNRS-like national research institutes (INSERM in medicine,
INRA in biology and INRIA in information/computer science, which are all
collaborating with CNRS in self-archiving their research output in HAL),
that covers the great majority of French research output.

(3) Because of this unified national mega-institution and mega-archive,
France is in a position to take a huge step forward toward making 100%
of French research output OA, thereby setting an example for the rest of
the world. The total cost of this is very low, because of the economies
of scale that come with having all national research output centralized
in this way.

(4) Most important of all, because all four of these institutions are
indeed institutions, with the status of employer (and, I am not sure
about this, but I believe also the status of research funder in some
cases), CNRS, INSERM, INRIA and INRA are in a position to adopt a unified
self-archiving policy at a national level, and to ensure that the policy
is implemented in the whole country, by just about all of its researchers,
for just about all of French research output, all at once.

(5) Now the misinterpretation of all this:

    (5a) Few if any other countries are in a position to adopt and
    implement a national self-archiving policy like this, distributed
    across all disciplines. Their research output is local to their
    distributed universities and research institutions, and hence
    self-archiving policy must be distributed and local to those
    institutions.

    (5b) The cost of self-archiving *per local institution* (which
    is what Les and I, and others who have actually implemented such
    local archives, said it was: about a $2000 server, plus a few days of
    one-time sysad time for start-up and a few days a year of sysad time
    for maintenance) is far, far lower than the cost of national, central
    archiving (which is itself quite low). It may be that the national
    sum of the local costs of the institutional self-archiving across
    all local universities in a country comparable to the size of France
    will be somewhat higher than the price of France's single national
    archive, HAL, but *this national sum is meaningless* in countries
    that have no such national structure! It is like summing the library
    book acquisition costs for each of the universities in a country
    and comparing them to central costs: There is no national "pocket"
    out of which all those local library acquisition budgets come, just
    as there is no national pocket for the sum of institutions' computer,
    network, telephone or research travel costs. Such a comparison only
    makes sense for a country with centralized research, like France.

    (5c) HAL, though an excellent and no doubt robust and highly
    functional national research self-archiving system, happens to be
    modelled on the properties of the Physics Arxiv. This is all fine,
    but rather arbitrary, in making comparisons with distributed local
    institutional self-archiving: There is no reason whatsoever why
    local institutions need to adopt either the particular properties
    of Arxiv or the strategies of a centralized national archive. The only
    thing that these local university archives need to ensure is that they
    are OAI-interoperable. The rest of the properties of HAL are merely
    specific further choices that have been made (many no doubt based
    on a-priori guesses, not concrete experience or empirical study
    of what is optimal) in the special case of HAL and CCSD.

    (5d) Franck Laloe's guess that OAI-interoperability is not enough
    (to forestall a 'Tower of Babel') is precisely that -- an a-priori
    guess. It has not been tested; all the a-posteriori evidence to
    date, from actual distributed university archives, is that the
    guess is simply incorrect: that what archives need is not more
    functionality (whether arxiv-like functionality, HAL-like
    functionality, or otherwise) but *more contents*. Archive content
    is the only thing standing between the research world and 100% OA.

    (5e) The only systematic analysis that has been done, comparing the
    merits of central, national self-archiving and distributed, local
    institutional self-archiving has come out very strongly in favour of
    distributed local institutional self-archiving -- followed by central
    *harvesting* and (if desired) metadata enhancement. A primary reason
    given was the existing research culture of independent research
    universities and institutions, which is local, not centralized or
    national: CNRS and France are a prominent exception in this regard
    (and hence not considered in this study). One of the
    secondary reasons was cost.

        Swan, Alma and Needham, Paul and Probets, Steve and Muir,
        Adrienne and O'Brien, Ann and Oppenheim, Charles and Hardy,
        Rachel and Rowland, Fytton (2005) Delivery, Management and Access
        Model for E-prints and Open Access Journals within Further and
        Higher Education. JISC Report.

        Swan, Alma and Needham, Paul and Probets, Steve and Muir, Adrienne
        and Oppenheim, Charles and O'Brien, Ann and Hardy, Rachel and
        Rowland, Fytton and Brown, Sheridan (2005) Developing a model
        for e-prints and open access journal content in UK further and
        higher education. Learned Publishing.
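
To make concrete what "OAI-interoperable" buys a central harvester, here
is a minimal sketch in Python of the harvesting step. It only builds an
OAI-PMH ListRecords request and parses a response in the standard oai_dc
format. The host archive.example.org, the function names and the sample
record are assumptions for illustration; nothing here is taken from HAL,
Arxiv or any particular archive's software.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for one archive.

    A resumptionToken, when present, replaces metadataPrefix:
    the protocol treats it as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], next_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(OAI + "header/" + OAI + "identifier")
        title = rec.findtext(".//" + DC + "title")
        records.append((ident, title))
    token = root.findtext(".//" + OAI + "resumptionToken")
    return records, (token or None)


# Hypothetical response from a hypothetical local archive.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A self-archived preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://archive.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A central service would loop over its list of local archives, fetch each
URL, and keep requesting with the returned token until none comes back.
The sketch leaves out the network fetch and error handling on purpose.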

So, in summary, the special case of CNRS+, HAL and France is a great
asset to world OA, accelerating French OA provision substantially, in a
way not possible in any other country, at a national and central level,
and setting a splendid example (of systematic self-archiving policy)
that will encourage the rest of the world's research institutions to
self-archive too.

But please, having already lost so much time in reaching 100% OA because
of so many other misunderstandings, let us not now lose still more time
in over-focusing on the local particulars of France's centralized research
institutions, as these cannot be generalized literally to other countries
lacking such centralized institutions. Even less should we focus on the
special Arxiv-like and other features HAL has elected to incorporate, or,
indeed, the cost of HAL: The Arxiv features and their extensions are not
essential (nor even necessarily optimal!) ones; OAI-interoperability is
enough, and the costs of a national centralized archive have no basis
for comparison with countries that distribute their research across
independent universities and research institutions. What is essential is
more content, *not* more functionality!

The take-home message from France is accordingly that 100% self-archiving
is desirable and feasible -- but the details (central-institutional
vs. distributed-institutional, HAL's specific special features,
and their cost) are, as they say in hexagonese: << des précisions
inutiles >> (useless details). The principle of adopting and implementing
institutional self-archiving policies for 100% of research output is what
the rest of the world should be taking to heart from France's splendid
example and initiative.

Best wishes,

Stevan Harnad

 On Mon, 28 Mar 2005, Franck Laloe wrote:

> At 18:15 26/03/2005, Leslie Carr wrote:
> >On 26 Mar 2005, at 15:14, Franck Laloe wrote:
> >
> >>We now have good experience of this question at CCSD, since we have
> >>run an archive for the CNRS (a French research institution) for a few
> >>years. Actually, the cost of running an archive is not much; one salary
> >>is needed to pay someone to check that the documents which are uploaded
> >>are OK for the archive; the price of buying and maintaining the
> >>hardware is comparable or less.
> >>
> >>What costs more money, on the other hand, is to write new software. We
> >>constantly improve ours (it is now significantly different from ArXiv,
> >>although it remains compatible with it), and we pay three engineers for
> >>this. I would say that for a whole (medium size) country like France, a
> >>centralized system for all disciplines would cost about 10 salaries; this
> >>is of course an extremely small fraction of the research budget of the country.
> >
> >This is very interesting and important information. Would you be able to
> >give an indication of the kinds of changes that you have had to build on
> >the base software (I assume from your message that you began with arxiv)?
> >With all of these systems, the devil (and the expense) is in the details,
> >but the precise details differ from one situation to another. It would be
> >a terrific insight to have an Institutional Repository costing data-point
> >at the National end of the spectrum!
> >---
> >Les Carr
> Well, maybe I should first say that I was reasoning more in terms of the
> contribution of one country (France for us) to international archives (or
> repositories, I do not know which word is best). Of course, if each
> institution in the country wants to have totally independent archives (even
> if compatible through OAI-PMH for instance), the overall cost would be much
> higher. In my country there are many institutions (we have universities,
> research institutions, what we call "grandes écoles", etc..), and the
> danger of building an expensive Tower of Babel is real. The whole idea of CCSD is
> to offer a kind of national (or international) service to all institutions
> that want to set up "direct scientific communication" through open archives;
> CCSD develops the software and maintains it, adapts it when special
> requirements are necessary, and will ensure the long term preservation
> (technical migrations, soft and hard). This is the general idea, with no
> special limit put at the borders of the country: if any scientific
> institution in the world wants to join, they are welcome, assuming a
> sufficient scientific quality of course.
> The data base where the articles are stored is a single base, with
> homogeneous metadata. But our technique allows institutions to create
> personalized environments, with their own texts, logos, screen layout, and
> even with additional metadata if useful. Everyone can have access to the
> general system (submission and consultation) either through a generic
> interface, or through a personalized interface that is institution
> dependent and selects only the articles belonging to the institution.
> Institutions which want to have a mirror or backup of all their data on a
> computer they own may do so, if for some reason they do not trust CCSD for
> keeping their material.
> I should add that it was agreed with our American friends who run ArXiv
> that every document that is collected by CCSD and belongs to one of the
> scientific categories of ArXiv will automatically be transferred to ArXiv.
> This works pretty well, and ensures more visibility to the articles we
> collect. But we also collect articles in history, education, linguistics,
> etc.. , which do not go to ArXiv for obvious reasons.
> In practice, of course, there is still a long way to go before we collect
> all scientific production of the country. CNRS is the largest research
> institution in France, and roughly speaking half of the scientific
> departments strongly support CCSD by asking their people; there is good
> hope to include more departments soon. We now have an agreement with
> another research institution in France, INRIA, so this will expand the
> impact of the system. Negotiations with other scientific institutions are
> underway. Just a figure to give an idea: in 2004 we collected 1,500
> thesis files, i.e. about 10% of the national production. My hope is to be
> at about 50% in two or three years, but this is only an extrapolation for
> the moment. And our main goal is not limited to theses; it includes all
> kinds of scientific documents (articles, conference proceedings, etc..).
> Now, at last, the answer to your questions! No, we did not start from the
> ArXiv software, and actually were not advised by Paul Ginsparg and
> colleagues to do so when we started in 2000, for good reasons. ArXiv is
> almost 15 years old, techniques have changed since. Our software, which we
> call Hal (like the crazy computer in the movie!) does many things that ArXiv
> does not do: as I said above, it allows a personalization of environments,
> contains the notion of "stamps", of "collections", can extract lists of
> publications, etc.. It constantly evolves under the pressure of various
> demands, and this is why we need three engineers at CCSD.
> This has been a long message, I will stop here! But please do not hesitate
> to ask if you wish to know more. Concerning the cost of CCSD, it is easy to
> calculate: salaries for three engineers (count 4 if you count Marco and me,
> two part-time physicists), offices, usual expenses, computers and servers
> (but this is not much, except if you count backup procedures which can be
> expensive if they are at a high level of security).
> best wishes
> Franck Laloë
> Franck Laloë, LKB, Dept de physique de l'ENS, 24 rue Lhomond, F 75005 Paris
> (France)
> tel et fax 33 (1) 47 07 54 13 --
> _______________________________________________
> SI mailing list

Received on Mon Mar 28 2005 - 15:44:34 BST

This archive was generated by hypermail 2.3.0 : Fri Dec 10 2010 - 19:47:51 GMT