Re: Use of Navigational Tools in a Repository

From: Stevan Harnad <>
Date: Thu, 9 Mar 2006 13:12:20 +0000

> From: Maryanne Kennan <>
> Date: Thu, 9 Mar 2006 18:25:54 +1100
> Hi all, I am new to all this, so please excuse me if these are dumb
> questions... Questions: don't the external search engines harvest
> the metadata which may include entries from the taxonomies of the
> repositories? So if a researcher is searching via say Google or
> Google Scholar, they may be (knowingly or unknowingly) using words
> from the subject index/taxonomy rather than words from the full
> text...? So perhaps the work of the repository manager is used in
> ways other than those initially imagined? Cheers, Mary Anne
> Graduate research student
> School of Information Systems, Technology and Management
> Faculty of Commerce and Economics
> The University of New South Wales
> Email:
> Telephone: 61 2 9385 4472

Good question. Someone more technical than me will have to say whether
Google also indexes the metadata and not just the full text. But even
if it does, it is almost certain that (1) the index terms are redundant
with the full text and (2) they cannot currently be searched on to the
exclusion of the full text. So any subject terms therein are not being
searched *as* subject terms, but just as a (tiny) subset of the full text.

I am certain, though, that if and when the OAI-compliant OA corpus grows
from its current sparse 15% to something closer to 100% of the target
corpus (OA versions of the 2.5 million articles published annually in the
24,000 journals), then Google and Google Scholar *will* use the OAI
metadata for search, and not only the full text. (And if they don't, the
OAI search engines certainly will, and they will be the preferred search
engines for searching and navigating the OA/OAI corpus.)

On that happy day, there *will* be searchable subject taxonomies
available too, but they will not have been hand-entered at deposit by the
poor author (or the author's proxy) from a vast prefabricated menu! They
will be generated automatically and centrally, by AI processing of the
full texts after the all-important fact (i.e., after the local deposit
itself)!

Stevan Harnad
Received on Thu Mar 09 2006 - 13:15:37 GMT
