Re: Use of Navigational Tools in a Repository

From: Frank McCown <fmccown_at_CS.ODU.EDU>
Date: Fri, 10 Mar 2006 09:52:49 -0500

> Why is this observation apparently so different from that reported
> by McCown et al? Firstly, the figures quoted from the papers are
> averages. The paper gives a table of 10 representative repositories
> whose Google percentage varies between 100% and 1.3%. Secondly, we
> are measuring different things - McCown et al tested a statistical
> sample of the search engine's index by query, whereas I have examined
> the actions of the search engine's crawler and ASSUMED that a
> document that is crawled must be indexed.

We were only testing URLs that we found in the DC (Dublin Core) records,
so it may be that the URLs you saw being crawled were not present in
those records. You can see which URLs we tested here:

http://www.cs.odu.edu/~fmccown/research/oaipmh_coverage/datafiles.html
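One way to check whether the two measurements simply cover different sets of URLs is to compare the URLs listed in the DC records with the URLs a crawler actually requested. Here is a minimal Python sketch. It assumes the records come from an OAI-PMH ListRecords response in oai_dc format and that crawler activity comes from an Apache combined-format access log. The function names, the base URL and the plain "Googlebot" user-agent match are illustrative assumptions, not the method used in our study.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Captures the request path and the final quoted field (the user agent)
# of a combined-format log entry.
LOG_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+".*"([^"]*)"$')


def dc_urls(list_records_xml: str) -> set:
    """Collect http(s) dc:identifier values from a ListRecords response."""
    root = ET.fromstring(list_records_xml)
    urls = set()
    for ident in root.iterfind(".//dc:identifier", NS):
        text = (ident.text or "").strip()
        if text.startswith(("http://", "https://")):
            urls.add(text)
    return urls


def crawled_urls(log_lines, base_url: str, agent: str = "Googlebot") -> set:
    """Collect paths requested by one crawler (hypothetical UA substring
    match), made absolute by prefixing base_url."""
    urls = set()
    for line in log_lines:
        m = LOG_RE.search(line.rstrip())
        if m and agent in m.group(2):
            urls.add(base_url.rstrip("/") + m.group(1))
    return urls


def compare(record_urls: set, crawl_urls: set):
    """Return (crawled but not in records, in records but never crawled)."""
    return crawl_urls - record_urls, record_urls - crawl_urls
```

A non-empty first set would be consistent with the explanation above: for example, a crawler fetching a full-text PDF while the DC record only lists the abstract page.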

I believe you can usually assume that a document that is crawled will
be indexed, but search engines also do some post-processing (checking
for spam, duplicate content, etc.) that may keep a page out of the index.

--
Frank McCown
Old Dominion University
http://www.cs.odu.edu/~fmccown/
Received on Fri Mar 10 2006 - 15:38:30 GMT
