Re: Non-Discoverability or Non-Existence?

From: Stevan Harnad
Date: Fri, 20 Jul 2007 13:18:35 +0100

On Fri, 20 Jul 2007, Steve Hitchcock wrote:

> Google makes the known needles easy to find in IRs. Here's the harder
> part. How do I find what I don't know about? In other words, it's
> there in an IR, just like there's lots of stuff in repositories, but
> how do I find it if I know nothing about it? Is it just random
> chance, or is there a more systematic way? Alternatively, how do IRs
> advertise their contents to people?

There are two kinds of search. One (the trivial one) is where I have the
reference in hand, and I am simply seeking the full-text online. I don't think
anyone disputes that such items are "discovered" and retrieved by Google
uncannily well (if they are indeed out there in Google space).

The other kind of search is the one based on topic words and keywords. That's the
kind of search that those who claim there is a "discoverability" problem with OA
IRs have in mind. And that is where they have to provide objective evidence that
the problem is real, rather than simply a consequence of the fact that IRs are
near-empty, hence that the target contents are non-existent rather than
non-discoverable.

The only way to test this is to establish that a sufficiently large sample
of identical target items is present in both OA IR space and a benchmark
database that *does* have the "discoverability" capability they envision
-- and then to demonstrate the size of the putative discoverability problem
in the OA IR case.

Stevan Harnad

> Steve Hitchcock
> IAM Group, School of Electronics and Computer Science
> University of Southampton, SO17 1BJ, UK
> Email:
> Tel: +44 (0)23 8059 7698 Fax: +44 (0)23 8059 2865
> At 12:11 20/07/2007, Leslie Carr wrote:
> >On 20 Jul 2007, at 09:03, Mahendra Mahey wrote:
> >
> >>JIBS and JISC Collections Workshop -
> >>Discovering eprints: finding needles in the haystack?
> >
> >Andy Powell did 30 minutes work on this last year and showed that the
> >needles were actually quite easy to find with Google. (see http://
> > )
> >
> >I have just repeated his exercise with some eprints selected from
> >repositories at Southampton, Loughborough, Strathclyde and
> >Westminster and found that the situation is unchanged, i.e., it is very
> >easy to find a specific needle using the needle's title or using
> >keywords drawn from its title.
> >
> >I suspect that the real difficulty in finding needles comes from the
> >fact that most of them haven't been put in the haystack in the first
> >place.
> >
> >Can anyone point me at some data showing the difficulty that people
> >are having in finding eprints? I would genuinely like to know - I am
> >NOT a Google apologist (I believe that there are probably serious
> >theoretical flaws with using it for certain types of information
> >discovery), but I dislike perpetuating urban myths and I would like
> >to find some serious data.
> >---
> >Les
