Re: OpenDOAR Search (extended to discuss formalisation of metadata)

From: Jeffery, KG (Keith)
Date: Fri, 27 Oct 2006 15:37:47 +0100

All -

I agree with Les that we still need repositories and that Google, even
if customised, is still a rather blunt instrument.

The problem with the metadata is fairly obvious: it is machine readable
but not machine understandable, i.e. the syntax is rather loose and the
semantics almost non-existent. This results in the end-user having to
browse on screen to achieve the required degree of recall and relevance
- a time-consuming and non-scalable way forward.

If this is compared with the formal metadata of a DBMS schema, or that
associated with any particular, specialised domain of scientific
research for data exchange / access, then the difference is obvious.
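
To make the "machine readable but not machine understandable" point
concrete, here is a minimal sketch in Python. The date strings are
hypothetical examples of what a loosely specified date field might
hold; they are not taken from any real repository. A program can read
all three, but only the one that follows an agreed syntax can be
interpreted without a human looking at it.

```python
from datetime import datetime

def parse_strict_date(value):
    """Return a date if value matches the year-month-day pattern, else None."""
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date()
    except ValueError:
        # Readable as a string, but not interpretable under the agreed syntax.
        return None

# Hypothetical values for a free-text date field (assumed for illustration).
loose_dates = ["2006-10-27", "27 Oct 2006", "Autumn 2006"]

# Only values that follow the formal syntax survive automatic processing.
understood = [d for d in loose_dates if parse_strict_date(d) is not None]
```

A DBMS schema would reject the other two values at entry time rather
than leaving them for the end-user to sort out on screen.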

We need formalised metadata that ensures (heterogeneous) computer
software systems can interoperate using it. We have to resolve character
set, language, syntax and semantics. We've had a go at this at CCLRC
(so-called formalised DC) and it is strongly interlinked with CERIF (the
data/metadata standard for research information maintained by
euroCRIS). Stevan in particular will remember all this from the
CRIS2006 conference, which he attended.


Prof Keith G Jeffery Director Information Technology
and International Strategy CCLRC Rutherford Appleton Laboratory
T:+44 1235 44 6103 Chilton, Didcot, OXON OX11 0QX UK
F:+44 1235 44 5147
WWW Person:
President ERCIM & CCLRC Director:
W3C Office at CLRC-RAL
President euroCRIS
VLDB Trustee Emeritus:
EDBT Board Member

-----Original Message-----
From: American Scientist Open Access Forum On Behalf Of Stevan Harnad
Sent: 27 October 2006 11:17
Subject: Re: OpenDOAR Search


Date: Fri, 27 Oct 2006 10:15:52 +0100
From: Leslie Carr
Subject: Re: OpenDOAR Search

On 26 Oct 2006, at 19:00, Hubbard Bill wrote:

> Please find below an announcement from OpenDOAR for a search facility
> based on OpenDOAR holdings.

This is a very interesting service!

There was a discussion on this list at the beginning of August about
"Search Engines for Repositories Only". There were several attempts to
define constrained searches using RollYO or similar, but they all
suffered from one defect or another (too few sites, or logins required
etc). The Google Custom Search that OpenDOAR have set up seems much more
suitable to the repository community needs. Further, it would seem to be
fairly simple to set up Country-specific searches (a la UKOLN's EPrints
UK) by providing location-identifying annotations for each repository.

I have had a go with this, and created a ROAR-based Repository Search
Engine at
You can search all the ROAR repositories for a keyword and then Derek
Law can click on 'Scottish Research' to reduce the set of results to
those coming from the Scottish repositories (the "small and smart"
ones, according to his recent keynote at Open Scholarship :-)

There is a serious point that this opens up: why would we bother with
OAI-based repositories, if you can do it all with Google? The advantage
that OAI provided us was "metadata", i.e. the possibility of providing
more accurate resource identification. The advantage of repositories
was that they provided an identifiable source of (well-maintained)
research material. Of course, the one can be simulated by
the other, and if Google could support a simple quality control
"refereed material" tag then we could get by without OAI and without

Well, it doesn't, and so OAI still seems our best hope. However, even
with five years of OAI our repositories are not doing a very good job of
sharing metadata that helps a service to comprehend the status of the
holdings that it harvests (is this a published, refereed journal
article or equivalent? Is this a paper from an unrefereed workshop?
Is this a chemical data file?) Too much is still down to interpretation
and subsequent data mining of the web pages. The Eprints Application
Profile (
digirep/index/Eprints_Application_Profile) seems to be doing a good job
in achieving consensus in the use of Dublin Core, but there is an urgent
need for it to be implemented by all repositories!
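
As a rough illustration of what a harvesting service has to do today,
here is a minimal Python sketch that tries to work out the status of a
holding from the dc:type values in a Dublin Core record. The sample
record and the STATUS_BY_TYPE vocabulary are assumptions made up for
this example; they are not the Eprints Application Profile's actual
terms. Only the two namespace URIs are the standard oai_dc and Dublin
Core ones.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical record, shaped like the oai_dc payload a harvester receives.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example paper</dc:title>
  <dc:type>Article</dc:type>
  <dc:type>PeerReviewed</dc:type>
</oai_dc:dc>"""

# Hypothetical vocabulary: an assumption for illustration, not an agreed standard.
STATUS_BY_TYPE = {
    "peerreviewed": "refereed",
    "nonpeerreviewed": "unrefereed",
    "dataset": "data",
}

def holding_status(record_xml):
    """Return the status implied by the first recognised dc:type, else 'unknown'."""
    root = ET.fromstring(record_xml)
    for element in root.iter(DC + "type"):
        value = (element.text or "").strip().lower()
        if value in STATUS_BY_TYPE:
            return STATUS_BY_TYPE[value]
    # No recognised term: the service is back to guessing from the web pages.
    return "unknown"

status = holding_status(SAMPLE)
```

The lookup only works if every repository uses the same terms, which
is exactly the consensus an application profile is meant to provide.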

We've spent a lot of time and effort on advocacy and policies over the
last couple of years, but I think it's time that we went back to some of
the technical fundamentals and made sure that our information
interoperability is up to scratch, otherwise we'll find ourselves in a
universe where the only thing you can do is a keyword search!
(just my opinion)
Received on Fri Oct 27 2006 - 16:32:28 BST
