Fwd: Repositories: Institutional or Central ? - further questions from NERC.

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Fri, 6 Feb 2009 09:18:38 -0500

On 6-Feb-09, at 6:59 AM, Gerry Lawson (RCUK,Secretariat) wrote (in
JISC-REPOSITORIES):

      Stevan, a very useful series of postings - thanks.

      UK Research Councils have a variety of OA mandates -
      including two which mandate deposition in CRs (MRC- UK
      PubMed and ESRC - Society Today).  With the exception of
      EPSRC (and this may well change) the others do mandate
      deposition, but are unspecific about where.  NERC, for
      example, says:

      "From 1 October 2006 NERC requires that, for new funding
      awards, an electronic copy of any published peer-reviewed
      paper, supported in whole or in part by NERC-funding, is
      deposited at the earliest opportunity in an e-print
      repository.  NERC also encourages award-holders to
      deposit published peer-reviewed papers arising from
      awards made before October 2006.  "

      BUT it's very difficult to check compliance with these
      mandates!  Councils have reduced their final reporting
      requirements on the expectation that it will be possible
      to collect outputs information (not just publications)
      electronically from grantholders.  RCUK is assessing
      options for doing this - either pushing/pulling from
      Institutional Repositories or from HEI CRIS systems, or
      both.  Whatever is decided, it's certain that we'd be
      assisted by inclusion in IRs of metadata fields for a)
      "Funder" (perhaps using a dropdown list of funders' URIs);
      and b) "GrantReference".  


Gerry, you are absolutely right. IRs need to have a metadata field
that specifies the funder, for a variety of reasons, including
verification of grant fulfillment conditions.

(As you note below, the EPrints IR software has already implemented
this metadata tag.)

This is also yet another strong reason why funders should not require
direct deposit in a CR, nor even simply require open-ended deposit in
any repository, but should instead specify the author's own
institutional IR as the designated locus of deposit (and DEPOT for
those fundees whose institution has not yet set up its own IR).

Universities are already eager to do everything they can to help in
ensuring compliance with grant conditions. They can accordingly be
invaluable aids to the funding council in verifying compliance.
See: How To Integrate University and Funder Open Access Mandates

      The disadvantage of using IRs rather than Central
      Repositories is the absence of minimum standards and
      formats in the former.  Both the above fields exist in
      CRs (e.g. UK PubMed and Society Today)


But the standards and formats can all be implemented in IRs. EPrints
is continuously upgrading its functionality to keep pace with the
emerging needs of Open Access (including Open Access mandates by
funders and institutions). 

Don't forget that two free IR software packages -- EPrints and DSpace -- are
used to create the majority of IRs. IR software standards can be made
widespread or even universal (as OAI-PMH, for example, was made) in
the distributed worldwide IR community with a resultant power, scope
and functionality that can not only match but exceed what can be done
with CRs -- and without any of the disadvantages of CRs that
Professor Rentier, author of the U. Liège mandate, and I have both
described.

      So, three questions re IRs (reply offline if you
      prefer)..........

      1. Funder and GrantRef fields exist in EPrints (as free
      text) from version 3.0  - do they exist in DSpace and
      Fedora - and in what form?


I don't know. But EPrints -- which is the first of the IR software
packages and invariably the leader in keeping upgrades in lock-step
with the emerging needs of OA -- will contact DSpace and Fedora
developers, as it has in the past (most notably with the all-important
"request a copy" Button) to urge them to implement the GrantRef field
too. (Meanwhile, institutions should just adopt EPrints!)

      2. Can a standard be introduced where they allow multiple
      funders - like multiple authors? (it's unlikely we'd want
      to be as sophisticated as adding a %DueToGrant field!)


I can't see any reason why not. I am branching this to Les Carr, who
will be able to reply. (Perhaps it has been implemented already.)
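
To make the question concrete, here is a minimal sketch in Python of
what "multiple funders, like multiple authors" could look like as a
data structure. It is purely illustrative: the class and field names
(FundingSource, DepositRecord, grant_reference) and the grant
identifiers are hypothetical, and nothing here reflects how EPrints,
DSpace or Fedora actually store these fields.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FundingSource:
    """One funder/grant pair attached to a deposit (hypothetical structure)."""
    funder: str           # a funder name or URI, ideally from a controlled list
    grant_reference: str  # the funder's own grant identifier


@dataclass
class DepositRecord:
    """A deposited article with repeatable author and funding fields."""
    title: str
    authors: list[str] = field(default_factory=list)
    funding: list[FundingSource] = field(default_factory=list)

    def funded_by(self, funder: str) -> list[str]:
        """Return the grant references this record lists for one funder."""
        return [f.grant_reference for f in self.funding if f.funder == funder]


# Placeholder values, invented for the example.
record = DepositRecord(
    title="An example article",
    authors=["A. Author", "B. Author"],
    funding=[
        FundingSource("NERC", "EXAMPLE-GRANT-1"),
        FundingSource("EPSRC", "EXAMPLE-GRANT-2"),
    ],
)
print(record.funded_by("NERC"))
```

Treating funding as a repeatable pair, rather than two independent
free-text fields, keeps each grant reference tied to the funder that
issued it.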

      3. If Councils were to add to their mandates a sentence
      like:  'By [date] such records should be tagged with
      Funder and Grant Reference information, and made
      available for harvesting', what would be an appropriate
      [date]?  I guess this depends on the harvesting tool.
      I'm told that standard OAI-PMH doesn't handle these
      fields and that SWAP is not widely used?  What is the
      best approach?


For the technical answer, I defer to Les Carr and the EPrints
development team.

But for timing, the question is slightly more complicated: The
Councils should specify that the deposit must take place immediately
upon the date of acceptance for publication. This will vary of
course, so it cannot be specified in advance, but it is the best
milestone for authors and funders to use to time the deposit.
See: Optimizing OA Self-Archiving Mandates: What? Where? When? Why?
How?

With IRs (as long as we ensure that they provide the requisite
functionality), harvesting need not be restricted to OAI fields.
Again, I defer to Les, but the EPrints and DSpace metadata fields
should surely be uniformly detectable and automatically harvestable
regardless of whether they are part of the OAI protocol. Les?
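
As a rough illustration of what "automatically harvestable" might mean
in practice, the following Python sketch pulls funder and grant fields
out of an XML export. The element names (record, funder,
grantReference) and the sample data are assumptions made up for the
example; they are not the OAI-PMH schema, nor what EPrints or DSpace
actually emit.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format, invented for this sketch.
SAMPLE = """
<records>
  <record>
    <title>An example article</title>
    <funder>NERC</funder>
    <grantReference>EXAMPLE-GRANT-1</grantReference>
  </record>
  <record>
    <title>An unfunded article</title>
  </record>
</records>
"""


def extract_funding(xml_text):
    """Return one dict per record with its title, funders and grant references."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for rec in root.findall("record"):
        rows.append({
            "title": rec.findtext("title", default=""),
            # Both fields are read as repeatable, so multiple funders work too.
            "funders": [e.text for e in rec.findall("funder")],
            "grants": [e.text for e in rec.findall("grantReference")],
        })
    return rows


rows = extract_funding(SAMPLE)
for row in rows:
    print(row["title"], row["funders"], row["grants"])
```

The point of the sketch is only that, once the fields are exposed in
some agreed machine-readable form, reading them is a few lines of code
on the harvester's side.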


      Additionally, some Councils mandate deposition only
      'where a suitable repository exists'.  Should we change
      this to something like 'where a suitable Institutional
      Repository does not exist it is expected that the
      JISC-supported repository of last resort, 'The Depot' ,
      will be used.'?


Yes, definitely! It will at last breathe some life into DEPOT so that
it begins to be used for its intended purpose, which was precisely
that!

I am ever so grateful for your reply, Gerry, because it shows not
only that the funding councils are listening, but also how
important and fruitful convergent mandates can and will be. Much
gratitude also to Professor Rentier, Rector of the University of
Liège, whose timely and perspicacious essay on the relation between
IRs and CRs, and between institutional and funder deposit mandates,
has triggered all this constructive discussion and coordination.

Best wishes, Stevan Harnad


      Many thanks, Gerry Lawson

      NERC Research Information Systems
      RCUK fEC Review
      01793-444417

      ________________________________

      From: Repositories discussion list on behalf of Stevan
      Harnad
      Sent: Thu 05/02/2009 22:33
      To: JISC-REPOSITORIES_at_JISCMAIL.AC.UK
      Subject: Re: Repositories: Institutional or Central ? [in
      French, from Rector's blog, U. Liège]


      On Thu, Feb 5, 2009 at 12:34 PM, Chanier Thierry
      <thierry.chanier_at_univ-fcomte.fr> wrote:


      TC:

      I agree. The question of tools for central repository
      (CR) is central.

      - It is preferable to avoid opposing CR and
      institutional repository (IR).


      They are not opposed. Both are welcome and useful. What
      is under discussion is locus of deposit. (The deposited
      document itself, once deposited, may be exported,
      imported, harvested to/from as many repositories as
      desired. The crucial question is where it is actually
      deposited, and especially where deposit mandates from
      funders stipulate that it should be deposited.)

      The issues for locus-of-deposit are:


      (1) Single or multiple deposit?



I think everyone would agree that at a time when most authors (85%
<http://elpub.scix.net/cgi-bin/works/Show?_id=178_elpub2008&sort=DEFAULT&search=%22ELPUB%3a2008%22&hits=52>
) are not yet depositing at all, this is not the time to talk about
depositing the same paper more than once.


(2) If single deposit: where, institution-internally or
institution-externally?



The author's institutional repository (IR) might be his
university's IR, or his research institute's IR, or the IR of
some subset of his institution, such as his department's IR or
his laboratory's IR. The point is that the locus of production
of all research output -- funded and unfunded, in all
disciplines and worldwide -- is the author's institution. The
author's institution also has a shared stake and interest with
its authors in hosting and showcasing their joint research
output.

All other links to the author's research are fragmented: Some
of it will be funded by some funders, some by others, and some
will be unfunded. Some will be in some discipline or
subdiscipline, some in another, some in several. There is much
scope for collecting it together in various combinations into
such institution-external collections, but it makes no sense at
all to deposit directly in some or all of them: One deposit is
enough, and the rest can be harvested automatically. The
natural and optimal locus for that one deposit is at the
universal source: the author's own institution.


(3) Import/Export/Harvest from where to where?



The natural and optimal procedure is: deposit
institution-internally and then, where desired,
import/export/harvest institution-externally. This one-to-many
procedure makes sense from every standpoint: Single convergent
deposit, convergent mandates, maximal flexibility and
efficiency, minimal effort and complication (hence maximal
willingness and compliance from authors). The alternative, of
many-to-one importation, or many-to-many import/export means
multiple, divergent deposit, divergent mandates, reduced
flexibility and efficiency, increased effort and complications
(and hence reduced willingness and compliance from authors).


TC:

In some countries, CRs may be prominent (particularly because local
institutions have a low status, so IRs may not mean much to
researchers ... when they exist), because centralized procedures for
evaluating research may offer researchers an opportunity to start
depositing (see below, about France).


Institutional status-level is irrelevant, because research is
not searched at the individual IR level but at the harvester
(CR) level. We are discussing here what is the optimal locus of
deposit, so as to capture (and mandate the capture of) all of OA's
target content, worldwide, and as quickly and efficiently as
possible. What matters for this is to find a procedure for
systematically capturing all research output, and the natural
and exhaustive locus for that is at the source: the institution
(university, research institute, department, laboratory) that
hosts the researcher, pays his salary, and provides his
institutional affiliation.

There is of course research evaluation at the
institution-internal as well as the institution-external
(funder and national) level. But even for national research
assessment exercises, such as the RAE in the UK, the
institution and department are the "unit of assessment"; they
are local, and distributed. And the natural locus for their
research output is their own IRs. And that is exactly how many
UK universities provided their submissions to RAE 2008. See the
IRRA <http://irra.eprints.org/>  .



TC:

- Researchers should be free to choose where they deposit but with
requirements to deposit. They may do it in different repositories (I
mean one document is only in one place, but depending on the nature
of the document / data, one may choose various repositories)


I am afraid that it is here that we reach the gist of the
matter (and the height of the misunderstanding and
equivocation):

First, the only kind of deposit under discussion here is OA's
primary target content: refereed journal articles. That is also
the only deposit requirement (mandate) under discussion here,
because although there are many other things an author might
choose to deposit too -- books, software, multimedia,
courseware, research data -- those are optional contents
insofar as OA deposit mandates are concerned. And it is
specifically the locus of deposit of the required contents
(refereed journal articles) that matters so much, particularly
in funder mandate policies.

So whereas it may seem optimal for a funder to simply require
deposit in some OA repository or other, but to leave it up to
the author to choose which (and such a funder mandate is
certainly preferable to a mandate that specifies deposit in a
CR, or to no mandate at all), this is in fact far from being
the optimal mandate
<http://openaccess.eprints.org/index.php?/archives/369-guid.html>
, for the reasons discussed by Prof. Rentier:

Most researchers (85%) do not deposit unless they are required
to. Funders can only mandate the deposit of the research that
they fund. If they require that it must be deposited in a
specific CR, they are in direct competition with institutional
mandates (necessitating double or divergent deposit). If funder
mandates simply leave it open where authors deposit, then they
are not in competition with IR mandates, but they are not
helping them either. As noted, institutions are the producers
of all research output -- funded and unfunded, in all
disciplines, worldwide. Only 30 institutions mandate deposit so
far, worldwide (out of tens of thousands). If a funder mandates
deposit, but is open-ended about locus of deposit, it leaves
institutions in their current state of inertia. But if they
specifically stipulate IR deposit, they thereby immediately
increase the probability and the motivation for creating an IR
as well as adopting an institutional deposit mandate for the
rest of the research output of every one of the institutions
that have a researcher funded by that funder.


TC:

- It is a tactical decision for OA supporters, knowing the local
habits, to advertise ways of deposit to colleagues


But we already know
<http://eprints.utas.edu.au/view/authors/Sale,_AHJ.html>  that
advertisement, encouragement, exhortation, evidence of
benefits, assistance -- none of these is sufficient to get most
researchers to deposit. Only requirements (mandates) work (and
you seem to agree).


Now institutions are the "sleeping giant" of OA, because they
are the universal providers of all of OA's target content. So
to induce the "sleeping giant" to wake up and mandate OA for
all of his research output, there has to be something in it for
him (or rather them, because the "sleeping giant" is in fact a
global network of universities and research institutions). What
is in it for each of them? A collection of its own
institutional research output that it can host, manage, audit,
assess and showcase. What use is it to each of them if their
research output is scattered globally willy-nilly, in diverse
CRs? It increases the research impact of the institution's
research output, to be sure, but how to measure, credit,
showcase and benefit from that, institutionally, when it is
scattered willy-nilly?

Now, as noted, importation/exportation/harvesting can in
principle work both ways. But if a university that might wish
to host its own research assets has to go out and find and
harvest them back from all over the web, because they were
deposited institution-externally, instead of being deposited
institutionally in the first place, the time and effort
involved is considerably greater than simply mandating direct
institutional deposit would have been -- and that back-harvest
does not even yield all of the university's output: only
whatever institutional research output happened to be funded by
funders that also mandate OA! Yet if those funders had mandated
IR deposit, all that work would already be done, and the
university would have a strong incentive to adopt a mandate
requiring the rest of its research output to be deposited too.

Meanwhile, for a mandating funder, harvesting the distributed
IR content of all of its fundees into a CR is far easier, as
the fulfillment conditions for the grant need only specify that
the author should send the funder the URL for the IR deposit of
all articles resulting from the grant. The rest can be done
automatically by software.
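
A minimal sketch of the kind of automatic check this makes possible,
assuming the funder holds a list of articles reported against each
grant and has harvested the corresponding IR records. The function
names, record layout, grant identifier and URL below are all
hypothetical placeholders, not any funder's or repository's actual
system.

```python
def normalise(title):
    """Lower-case a title and collapse whitespace so small differences don't matter."""
    return " ".join(title.lower().split())


def compliance_report(reported, deposits):
    """Compare the articles reported for each grant with the IR deposits found.

    reported: {grant_reference: [article titles]}  (hypothetical funder-side list)
    deposits: list of dicts with 'grant_reference', 'title' and 'url' keys
              (hypothetical harvested IR records)
    """
    found = {}
    for dep in deposits:
        found[(dep["grant_reference"], normalise(dep["title"]))] = dep["url"]

    report = {}
    for grant, titles in reported.items():
        deposited = {t: found[(grant, normalise(t))]
                     for t in titles if (grant, normalise(t)) in found}
        missing = [t for t in titles if t not in deposited]
        report[grant] = {"deposited": deposited, "missing": missing}
    return report


# Invented example data.
reported = {"EXAMPLE-GRANT-1": ["First example article", "Second example article"]}
deposits = [{
    "grant_reference": "EXAMPLE-GRANT-1",
    "title": "First  Example Article",
    "url": "https://ir.example.ac.uk/id/eprint/1",  # placeholder URL
}]
report = compliance_report(reported, deposits)
print(report["EXAMPLE-GRANT-1"]["missing"])
```

Matching on normalised titles is a simplification chosen for the
example; matching on a persistent identifier would be more robust
where one is available.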


TC:

- We have to make sure that people in charge of funding research (EU,
national) do not oblige researchers to deposit in one specific place
(their CR or any other)


On the contrary, there is every reason that funders should
specify the fundee's IR as the preferred locus of deposit, for
the reasons just adduced. Open-ended mandates are better than
competing CR mandates, but they are not nearly as good as
convergent, synergistic IR mandates (to help awaken the
sleeping giant).

(As I was writing this posting, two new funder mandates have
been announced -- FRSQ in Canada and NRC in Norway: Both are
welcome, but both are open-ended about deposit locus, and
consequently both miss the opportunity to have a far greater
positive effect on global OA growth, by stipulating IR
deposit.)


- But I understand them, because when they ask researchers to give
access to their work and advertise the fact that they have been paid
by them, there is currently no practical way of doing it (labels put
on deposit with the name of the program which gave the money, and
harvesters able to compute this information?)


Yes, precisely. Funding metadata can easily be added as a field
in the IR deposit software -- and institutions will be only too
happy to help in monitoring grant fulfillment conditions in
this way, in exchange for the jump-start it provides for the
filling of their own IRs.


- I also understand them because I feel that they want to add
interesting tools (search, computation, meta-engine), tools which
could be developed by central harvesters (CH). We are late on this
issue and harvesters have not made much progress (see hereafter).


To repeat: Locus of direct deposit has nothing whatever to do
with harvester-level search. Search is not done at the IR level
but at the harvester (e.g., CR)  level.


TC:

1) HAL and research evaluation
---------------------
Three years ago I tried to convince my former lab to open a
sub-archive within HAL (same repository, but URL specific to the lab,
with proper interface). I also tried to convince my university to
have a general meeting with directors of local labs in order to
invite them to do the same and, at another level, to manage the
sub-archive in HAL for the university (a solution somewhere in
between CR and IR). My colleague at the lab agreed, started the work
but gave up because of lack of time. My university never answered my
proposal.


HAL is a nationwide resource that can in principle be used
(much the way the Web itself is used) to allow an institution
to create and manage its own "virtual IR". As such, HAL is
partly a platform for creating virtual IRs, rather than a CR.

So, essentially, what you and your colleague tried to do (and
only partly succeeded) was to create and manage an IR. That's
splendid, and welcome, but we already know that IRs alone are
not enough. Without a mandate, they idle at the usual 15%
baseline.

(Please note that a lab repository is an IR.)


TC:

Now, thanks to procedures for evaluating research in France, labs
will have to choose the way they want to be evaluated (I mean the
technical procedure to achieve it). Some software used by the
national board will do the computation out of HAL. Consequently, my
lab decided this week to urgently re-open and manage its sub-archive
in HAL. Of course, the first thing they have to do is deposit
metadata. Actual deposit of the corresponding papers is not
mandatory. But they will take the opportunity to suggest that
researchers deposit their full papers as well.


It won't work; it's been tried many times before. So this is a
great opportunity lost. As you see, the IR clearly languishes
neglected without a mandate. With a mandate -- particularly one
in which evaluation is based on what is deposited, as in Prof.
Rentier's mandate at Liège -- researchers perk up and deposit.
But if all they have to deposit is metadata, that's all they
will deposit (even though adding the full-text is just one more
keystroke).

The reason is that the effect of mandates is mostly not
coercive. Researchers don't jump to deposit just because they
are required to deposit. They actually want to deposit, but
they are held back by two main constraints, one small, the
other big:

(1) The small constraint is ergonomic. Researchers are
overloaded, and they will not do something extra unless it
really has a high priority. A deposit mandate, especially one
tied to funding and/or evaluation, gives the few minutes-worth
of keystrokes per paper <http://eprints.ecs.soton.ac.uk/10688/>
 (which is all that a deposit amounts to) the requisite
priority that they otherwise lack.

(2) The big constraint is psychological: Researchers are
(groundlessly) afraid to deposit their papers (even the 63%
<http://romeo.eprints.org/stats.php>  for which the journal
already gives them its explicit blessing to do so) -- afraid
until and unless their institutions and/or their funders tell
them they must, because then they know it is officially okay to
do so! The mandate unburdens their souls, and unlocks their
fingers.


TC:

Last thing: I do not mean that in France, only HAL should be used. We
should make sure we have the choice to deposit where we please.


What France needs, like every other country, is funder and
institutional mandates converging on single-locus IR deposit
(irrespective of whether the IR is hosted by HAL). But if
mandating funders leave locus-of-deposit open, or insist on
generic deposit in some CR or other, the giant will keep
hibernating, institutional (departmental, laboratory) mandates
will not be adopted, and what IRs there are will continue to
lie fallow.


2) Harvesters: advantages and current limits
----------
Just a personal experience. Until recently I used to advertise my
list of publications by giving the URL of an open archive, Edutice (a
thematic one, VERY USEFUL in our domain, a sub-part of HAL but with
its local procedure, interface, etc.).
Now I give colleagues the OAISTER URL (with the path to follow) to
get all my publications (because some of them are in other archives).
The problem is: deposits in Edutice appear twice in the OAISTER list
(as deposits of Edutice and of HAL - but there is only one deposit).
It is a concrete example of progress which should be made to avoid
repetitions in harvesters (among many other new features).


If they had all been deposited in your own IR you would have
had an automatic listing of all your works (without
duplications) through a simple Google IR site-search "chanier
site:http-IRetc." -- and your institution would have it all
too. And so would OAIster. And you could have exported to
Edutice with SWORD
<http://openaccess.eprints.org/index.php?/archives/484-guid.html>
 if you wished.

De-duplication and version-comparator software
<http://www.jisc.ac.uk/whatwedo/programmes/reppres/tools/valrec.aspx>
 is already being developed (though it's hardly worth it, when
the problem is not the presence of duplicates but the absence
of even a singleton for 85% of global refereed research output) --
and that's what mandates in general -- and convergent IR
mandates in particular, to awaken the slumbering giant -- are
needed for.
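
For what it is worth, the duplicate-listing problem described above
(one deposit surfacing once via Edutice and once via HAL) can be
sketched in a few lines of Python. The matching rule used here,
normalised title plus first author, is an assumption chosen for
illustration, and the records are invented; this is not how OAIster or
the JISC tool actually work.

```python
def dedupe(records):
    """Merge harvested records that appear to describe the same deposit.

    Two records are treated as the same paper when their normalised title
    and first author match (an illustrative rule; a real harvester would
    more likely compare persistent identifiers).
    """
    merged = {}
    for rec in records:
        key = (" ".join(rec["title"].lower().split()), rec["authors"][0].lower())
        if key not in merged:
            # First sighting: keep the record and start its list of sources.
            merged[key] = {**rec, "sources": [rec["source"]]}
        elif rec["source"] not in merged[key]["sources"]:
            # Same paper seen via another archive: only note the extra source.
            merged[key]["sources"].append(rec["source"])
    return list(merged.values())


# Invented example records.
harvested = [
    {"title": "An example article", "authors": ["Author, A."], "source": "Edutice"},
    {"title": "An Example Article", "authors": ["Author, A."], "source": "HAL"},
    {"title": "Another article", "authors": ["Author, A."], "source": "HAL"},
]
unique = dedupe(harvested)
print(len(unique))
```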

Stevan Harnad





****************************** end of Thierry's message
************

On Wed, 4 February 2009 22:12, Bernard Rentier wrote:
> I agree. It is exactly what I was trying to say in my last paragraph:
> it is my belief that launching a centralised and/or thematic
> repository (C-TR) can make sense, but only if it does not discourage
> authors from posting their publications in an institutional
> repository (IR), otherwise many publications will be lost in the
> process (I mean lost for easy and open access).
>
> In addition, direct posting in C-TRs will bypass IRs and it will be
> a loss for universities in their attempt to host their entire
> scholarly production (this is just a collateral effect, I know, but
> being a University President, it is a worry for me).
>
> C-TRs are of much more interest if they collect data at a secondary
> level by harvesting from primary IRs.
>
> Bernard Rentier



Received on Fri Feb 06 2009 - 14:20:08 GMT
