Re: Required and Desirable metadata in a repository

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sat, 1 Mar 2008 17:48:58 +0000

I am posting this at Prof. Wigan's request, but appending comments to
avoid generating misunderstandings.

(My replies refer to capabilities of the EPrints software. DSpace has most
of these capabilities too, and other software can easily implement
them.)

On Sat, 1 Mar 2008, Marcus Wigan wrote:

> Stevan, could you please forward to the list?
> There are several points I'd like to make:
> 1) peer reviewed flag: YES please
>
> but there are several variants here
>
> a) a preprint held in an internal repository, which has subsequently been
> published in a peer reviewed journal, but at the time of deposit had not
> been, nor had the journal been selected... it's still an invaluable tag,
> but implies a double entry and update process. Not desirable from a
> management point of view (unless the authors do it... and not many will)

(i) For documents deposited as unrefereed preprints, their metadata can be
updated to add that they have been peer-reviewed (including journal
name, date, etc.) once those data are available; and if the paper itself
has been revised in response to the refereeing, an updated full-text can be
added too.

(ii) The versions are linked; the latest version is tagged as such.
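
A minimal sketch of points (i) and (ii), in Python. This is not the EPrints
data model or API: every class, field and method name below is a hypothetical
stand-in chosen for illustration. It only shows the idea of one record whose
metadata is updated in place, with linked versions and a "latest" tag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    """One deposited full-text (hypothetical structure)."""
    full_text: str
    is_latest: bool = True


@dataclass
class Deposit:
    """One repository record; versions are linked by living in one list."""
    title: str
    peer_reviewed: bool = False
    journal: Optional[str] = None
    publication_date: Optional[str] = None
    versions: List[Version] = field(default_factory=list)

    def add_version(self, full_text: str) -> None:
        # Only the newest version carries the "latest" tag.
        for version in self.versions:
            version.is_latest = False
        self.versions.append(Version(full_text))

    def mark_peer_reviewed(self, journal: str, publication_date: str,
                           revised_text: Optional[str] = None) -> None:
        # Update the existing record instead of creating a second entry.
        self.peer_reviewed = True
        self.journal = journal
        self.publication_date = publication_date
        if revised_text is not None:
            self.add_version(revised_text)


# Deposit an unrefereed preprint, then update it once it has been refereed.
deposit = Deposit(title="Example paper")
deposit.add_version("unrefereed preprint text")
deposit.mark_peer_reviewed("Hypothetical Journal", "2008-03-01",
                           revised_text="refereed final draft text")
```

Under these assumptions there is a single update step on a single record,
rather than a second, duplicate entry.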

> b) a postprint (easy)

Yes.

> 2) invaluable metadata fields.
>
> a) I use [abstracts and] keywords heavily (which is of course why I
> implemented an automated keyword populator matching against specialist
> Thesauri [pdf available if anyone is interested])

(i) This is not a repository-level function, because searching is not done
at the repository level but at the harvester level.

(ii) The repository software has abstract and keyword fields, but they
are unlikely to be more useful or usable than inverted full-text
boolean search, again at the harvester level, particularly when search
is restricted to OA content (or even peer-reviewed content).

(iii) Thesauri can be grafted on (again by harvesters/indexers)
if/when the OA content is complete or near-complete (but, again, I'll bet
they won't improve much on boolean full-text search).
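
To make "inverted full-text boolean search at the harvester level" concrete,
here is a toy sketch in Python. It is an illustration under stated
assumptions, not how any actual harvester works: the record layout, the
function names and the sample texts are all hypothetical, and the tokenizer
is deliberately crude.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records):
    """Build an inverted index: term -> set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for term in tokenize(rec["full_text"]):
            index[term].add(rec_id)
    return index


def search_all(index, records, terms, peer_reviewed_only=False):
    """Boolean AND over the terms, optionally limited to peer-reviewed items."""
    wanted = [t.lower() for t in terms]
    if not wanted:
        return set()
    hits = set(records)  # start from all record ids, then intersect
    for term in wanted:
        hits &= index.get(term, set())
    if peer_reviewed_only:
        hits = {r for r in hits if records[r]["peer_reviewed"]}
    return hits


# Hypothetical harvested records (full text plus the peer-reviewed flag).
records = {
    "a": {"full_text": "Open access repositories and metadata harvesting",
          "peer_reviewed": True},
    "b": {"full_text": "Metadata fields for repositories: a draft",
          "peer_reviewed": False},
}
index = build_index(records)
all_hits = search_all(index, records, ["metadata", "repositories"])
refereed_hits = search_all(index, records, ["metadata", "repositories"],
                           peer_reviewed_only=True)
```

Here the full text alone answers the query, and the peer-reviewed flag is the
only metadatum the search consults.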

> b) long ago when abstracting services were new, the IRRD inputs from
> Australia were created using expert researchers in the fields involved to
> write the abstracts and create the keywords (both IRRD Thesaurus based and
> additional new keywords suggested - most of which were of course
> methodological, where specialist thesauri are generally very poor). I wrote
> several hundred of these...

Those days are happily over: Provide the OA full-texts and the software
will take care of the rest. (Again, certainly not a repository-level
function, though repositories need to configure themselves to provide
optimal output for harvesters/indexers.)

> so an ideal field in document metadata would be something like
>
> - what are the key items you the author of the document rate as the major
> contribution in this document?

Can't hurt, but certainly not essential for search.

> I've had far, far greater success in getting authors to add this than to add
> abstracts or keywords or... almost anything else... and they are
> invaluable.

Those were in the days when the full-text itself was not OA, harvested,
inverted, and indexed.

Stevan Harnad

> I'm posting this as the latter point was endorsed by others off list
>
> marcus wigan
>
>
>
> At 6:39 PM +0000 29/2/2008, Stevan Harnad wrote:
> > Bill Hubbard is spot-on on the utility of an explicitly searchable
> > field indicating whether or not an item has been peer reviewed. The
> > EPrints software has such a tag.
> >
> > (It is only likely to be useful at a harvester level, as individual
> > repositories (IR) are only likely to be searched for
> > institution-internal purposes. So this is a metadatum worth displaying
> > for harvesters, and harvesters should be set up in such a way as to make
> > it possible to search on only the peer-reviewed items, if the user
> > wishes.)
> >
> > I am not certain, however, about the usefulness or urgency of a
> > "copyright" tag at this time, for either author or user: This might
> > possibly be useful institution-internally (e.g., for IP vigilantes --
> > though one wonders whether they would trust an author's
> > self-assessment!) but I doubt it would be useful at the webwide user
> > level.
> >
> > Individual users certainly don't need to see or know the copyright
> > information, in order to view the item on-screen, download it, print it
> > off, and store it locally. (Users certainly don't worry about that in
> > accessing the billions (trillions?) of other kinds of items that are
> > web-readable!)
> >
> > It would only be relevant if the individual user wished to re-post or
> > republish the journal article -- and I'd be inclined to treat that rare
> > and non-fundamental usage-need as a special case, one not requiring a
> > universal tag to facilitate it at this time -- especially because as OA
> > content grows, the copyright picture will change, and these extra
> > re-use rights will eventually become part of the default conditions for
> > OA content.
> >
> > I'd be inclined to say the same about the utility of an explicit
> > copyright status tag for the sake of harvesters who wish to put the
> > article in a database or to data-mine it: Again, harvesters like Google
> > do this already, without further ado, for the billions (trillions) of
> > items on the web already. It is hard to imagine that the minuscule
> > portion of all that web content that OA content represents (c. 2.5
> > million articles per year) warrants or necessitates explicit copyright
> > tagging at this time.
> >
> > Stevan Harnad
> >
> > On 08-02-29, at 12:10, Hubbard Bill wrote:
> >
> > > Dear Colleagues,
> > >
> > > Just picking up on Ian Stuart's question as to opinion on "Required" and
> > > "Desired" metadata fields for eprints records.
> > >
> > > Could I ask colleagues how they view a "peer-reviewed" field?
> > >
> > > In terms of what users want, my own experience from talking to academics
> > > is that when faced with a mass of Open Access eprints the great majority
> > > have asked unprompted about how to search only within peer-reviewed
> > > material.
> > >
> > > And for this facility we need to give services a peer-review field,
> > > unless they start interpolating from other metadata features like
> > > journal-title or somesuch.
> > >
> > > Copyright and peer-review (p-r) are the two topics that can be
> > > guaranteed to come up in academic discussions in relation to
> > > repositories: the first from their perspective as an author, the second
> > > from their perspective as researcher/user.
> > >
> > > My strong suspicion is that most of those academics that haven't asked
> > > about a p-r filter would want the feature before they used OA material
> > > as a habitual source for research. Again, it may be that they didn't ask
> > > because they assumed that it was all p-r, or, that it was all non-p-r.
> > > (I have found repositories have a slighted reputation in some quarters
> > > (often BioMedical) as being all referred to as "pre-print servers").
> > >
> > > In terms of ingest, I think that the author is the best person to know
> > > if their eprint has been p-r'd and that a peer-review tick-box would be
> > > an acceptable additional task. Authors are generally pleased that their
> > > article has passed p-r and would probably be happy about noting that. As
> > > to how that information is recorded, that is another matter.
> > >
> > > Does this agree with other colleagues' experience? Is a p-r field
> > > required to facilitate future use of the material?
> > >
> > > Regards,
> > >
> > > Bill
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Repositories discussion list
> > > > [mailto:JISC-REPOSITORIES -- JISCMAIL.AC.UK]
> > > > On Behalf Of Ian Stuart
> > > > Sent: 21 February 2008 14:41
> > > > To: JISC-REPOSITORIES -- JISCMAIL.AC.UK
> > > > Subject: Required and Desirable metadata in a repository
> > > >
> > > > [This is primarily a question for those involved in repositories for
> > > > e-prints, but others may have interesting views]
> > > >
> > > > Within your own Repository, what [primarily metadata] fields are
> > > > *Required* and what are *Desired*?
> > > >
> > > > If you were advising a fellow Institution about setting up a repository,
> > > > what fields would you advise as *Required* and what are *Recommended*?
> > > >
> > > > If you were to harvest[1] from a repository, what fields would you
> > > > consider essential, and what would you consider helpful?
> > > >
> > > > Following on from that: if you were to harvest the Depot (or even the
> > > > Intute Repository Search), how would you hope to identify[2] deposits
> > > > that could be imported into your own Institutional Repository
> > > >
> > > > [1] This is where I come in: The depot will have a transfer service, but
> > > > what to transfer?
> > > > [2] I've had loads of thoughts on this one, and they all seem to spiral
> > > > and knit and knot and hide their threads, and not actually conclude in
> > > > any meaningful way.... for me.
> > > >
> > > >
> > > > --
> > > >
> > > > Ian Stuart.
> > > > Developer for The Depot,
> > > > EDINA,
> > > > The University of Edinburgh.
> > > >
> > > > http://edina.ac.uk/
> > > >
> > >
> > > --
> > >
> > > Bill Hubbard
> > > SHERPA Manager
> > >
> > > SHERPA - www.sherpa.ac.uk
> > > RSP - www.rsp.ac.uk
> > > RoMEO - www.sherpa.ac.uk/romeo
> > > JULIET - www.sherpa.ac.uk/juliet
> > > OpenDOAR - www.opendoar.org
> > >
> > > SHERPA
> > > Greenfield Medical Library
> > > University of Nottingham
> > > Queens Medical Centre
> > > Nottingham
> > > NG7 2UH
> > > UK
> > >
> > > Tel +44(0) 115 846 7657
> > > Fax +44(0) 115 846 8244
> > >
>
>
> --
> ==============================================================================
> Dr Marcus Wigan, Personal website http://go.to/mwigan
> Personal email m.wigan -- hertford.oxon.org
> * Principal Oxford Systematics, Box 126 Heidelberg 3084 Australia
> * Senior Consultant Demis BV, Delft, The Netherlands
> * Professorial Fellow, GAMUT, Faculty of Architecture and Planning,
> University of Melbourne
> * Emeritus Professor of Computing and of Transport Systems, Napier
> University Edinburgh Scotland
> * Professorial Fellow, Civil and Environmental Engineering, University of
> Melbourne
> * Visiting Professor, CTS, Civil and Environmental Engineering, Imperial
> College London
> ==============================================================================
Received on Sat Mar 01 2008 - 21:20:01 GMT
