Re: Search Engine for Repositories Only?

From: Stevan Harnad
Date: Fri, 4 Aug 2006 14:58:08 +0100 (BST)

On Fri, 4 Aug 2006, Philip Hunter wrote:

> Re: the OAI PMH architecture - the division into data and service providers
> was originally just that: a nice simple explanation of what was going on.
> Service providers add third-party services to simple data provision.

Still sounds good to me!

> All of which was up to the community to think about and to develop. There
> was no assumption that ultimately search and browse would happen through one
> subdividable 'global' super-aggregation of records, which you are suggesting
> as the most efficient approach.

No, let 1001 browsers compete: But they should compete via superior
functionality, not via subtotal coverage!

> On Google, it is worth pointing out that it doesn't do harvesting. In
> principle or otherwise. It _indexes_. The difference is subtle, but
> important for understanding what it can (and can't) do.

Google does not harvest OAI metadata. That is true (though it could, and, I
predict, Google Scholar will, sooner or later). But as important as harvesting
metadata is the fact that Google *does* harvest (and invert) full-texts!
That, after all, is half of the miracle of Google (PageRank, ranking hits by
recursive link counts and weights being the other half).

There is no reason for OAI to duplicate the effort of harvesting
and inverting full-texts (at least not right now). Hence a strategic
collaboration with Google would be beneficial to both OA/OAI and Google
Scholar at this stage of play. It would give OA/OAI the power of full-text
boolean search over all of OA/OAI space, and it would help provide Google
Scholar with more OA content.
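
For readers unfamiliar with what "inverting" full-texts buys you, here is a
minimal sketch of an inverted index with boolean AND search. It is only an
illustration of the general technique, not a description of how Google or any
OAI service actually implements it; the document identifiers, texts and
function names are all made up for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents that contain every one of the terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical full-texts keyed by invented identifiers.
docs = {
    "ir-a:1": "Open access repositories maximise research impact",
    "ir-b:7": "Digital preservation in institutional repositories",
    "ir-c:3": "Measuring research impact with citation counts",
}

index = build_inverted_index(docs)
hits = boolean_and(index, "research", "impact")
```

Here `hits` contains the two documents that mention both "research" and
"impact". The index is built once, at harvest time, which is why it makes
little sense for two parties to each build their own over the same texts.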


> ......
> >> There are a number of reasons why you might want national and restricted
> >> range OAI search engines as part of a global infrastructure for repository
> >> usage, including questions of quality assurance and the practical
> >> management of repository networks.
> >
> > I don't understand. If you have T total local IRs harvested by a
> > full-spectrum OAI harvesting/searching service such as OAIster, and Q
> > of them meet your local or national quality standards, then why wouldn't
> > simply restricting to that Q subset of T be the local quality assurer?
> >
> > And how do global or local OAI searcher/harvesters affect the practical
> > management of the local IRs themselves?
> >
> > Or is it that there are specific search functionalities that the national
> > or local search engines would provide that global ones cannot or do not
> > provide?
> >
> >> This question came up at the WWW2006 JISC workshop
> >> on repositories, where I suggested that global services might be built most
> >> practically on the basis of locally developed and managed services.
> >
> > The rationale for the OAI protocol has been to develop global OAI services
> > on top of distributed local OAI data-providers (IRs) -- not on top of
> > distributed local OAI services.
> >
> > The latter is possible too, but I would be keen to know the concrete
> > functional objective of doing it that way.
> >
> >> The institutional and geographic levels of repository services, while of
> >> little or no interest to the user, offer a number of features which
> >> can support the quality and sustainability of global OAI search services.
> >
> > I am not saying that there is no possible functional advantage there: I am
> > just saying I have not yet heard what it is, concretely. What are the
> > institutional and national OAI search engines meant to do that the global
> > ones do not or cannot do?
> >
> >> This is one of those quasi-theological issues which are often quite
> >> divisive - do we work from the centre (full-spectrum OAI search engines,
> >> harvesting local services directly, with restricted range search options),
> >> or do we build global services using a tiered structure (restricted range
> >> OAI search engines whose aggregated records are globally harvested)?
> >
> > It is only theological if we are not specific about exactly what concrete
> > functionality we have in mind. On the face of it, the OAI picture
> > is: distributed local OAI data-providers (IRs), plus global OAI
> > service-providers providing services on top of those local IRs. There
> > can of course be OAI services on top of OAI services too, but with
> > search-services in particular, it is not obvious what sorts of things
> > the subglobal ones would be doing that the global ones would/could not.
> >
> > Please help me see!
> >
> >> We might have to suck it and see. The choice also depends to some extent on
> >> what the world at large thinks repositories are for - a matter clearly
> >> still in flux.
> >
> > That might be the gist of it: There are those who think IRs are for
> > digital content management and preservation, and those who think IRs
> > are for maximizing research access-provision. It might be helpful to
> > distinguish OAI DL IRs (OAI-compliant Digital-Library IRs, for digital
> > content management and preservation) from OAI OA IRs (OAI-compliant
> > Open-Access IRs, for providing research access). What the requisite
> > search services and functionalities might be, and be for, may then look
> > quite different for the two kinds of IRs.
> >
> > (Very similar questions underlie the [what should likewise be functional
> > rather than theological] issues surrounding the question of central
> > versus local repositories: CRs vs. IRs. And again it depends on what
> > you want them for, and what you want them to do, how.)
> >
> > (To go still further: In principle, the harvester of all harvesters is
> > Google, and it harvests and inverts much of web content already. What
> > sets the OAI harvesters apart is that (1) they focus on a specific kind
> > of content, not all of webspace, and (2) they use the OAI tags. But of
> > course Google could be restricted to that subset, and configured to allow
> > navigation based on the OAI tags too! Google Scholar is already going
> > in that direction. With a full-text harvester already trawling the net,
> > does OA/OAI really have to duplicate the effort? This is not a
> > rhetorical question, but a practical, functional one.)
> >
> > Stevan Harnad
> >
> >
> >
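
The quoted point about restricting a full-spectrum harvest to the Q of T
repositories that meet a local or national standard can be sketched in a few
lines. This is only an illustration under assumed names: the `Record` type, the
repository identifiers and the `restrict_to` helper are hypothetical and do not
correspond to OAIster or any other real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    repository: str  # identifier of the IR the record was harvested from
    title: str


def restrict_to(records, approved):
    """Keep only records whose source repository is in the approved set Q."""
    return [r for r in records if r.repository in approved]


# T: everything a full-spectrum harvester holds (invented data).
harvested = [
    Record("ir.example-a.ac.uk", "Paper one"),
    Record("ir.example-b.edu", "Paper two"),
    Record("ir.example-c.org", "Paper three"),
]

# Q: repositories that meet some local or national standard (hypothetical).
approved = {"ir.example-a.ac.uk", "ir.example-c.org"}

subset = restrict_to(harvested, approved)
```

The restricted view is just a filter over the global harvest, so on this
picture no separate harvesting tier is needed to obtain it.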
Received on Fri Aug 04 2006 - 15:49:54 BST
