Re: Economic effects of link-based search engines on e-journals

From: William Y. Arms <wya_at_CS.CORNELL.EDU>
Date: Mon, 2 Oct 2000 14:38:16 -0400


I found your posting about web links very interesting.

My observation (based on conversations with my colleagues and questioning
the students in my Cornell classes) is that most of them use general Web
search engines (notably Google) as their first-choice way of looking for
information. I can only hypothesize about their motivation, but here are
some possible reasons:

1. Instant gratification -- If something is identified through Google, a
single click brings the actual item.

2. Recall is more important than precision -- Users do not mind scanning
10-50 items (many of which are clearly irrelevant), so long as they find
something. Low precision does not matter with a good ranking algorithm.

3. Two-step coverage -- The facts that (a) Google indexes a billion items
and (b) its ranking algorithm emphasizes general materials mean that it
usually finds a good introduction or a good overview of a topic, which
often acts as a guide to more detailed information.
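
As an illustration of point 2, here is a minimal sketch of how precision
and recall behave over a ranked result list. The document names and the
relevance judgments are hypothetical, invented for the example; they are
not data from any search engine or from the observations above.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision, recall) computed over the top-k ranked results."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def first_relevant_rank(ranked_ids, relevant_ids):
    """Return the 1-based rank of the first relevant result, or None."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return rank
    return None


# Hypothetical result list: 20 items, of which only 3 are relevant.
ranked = [f"doc{i}" for i in range(1, 21)]
relevant = {"doc2", "doc5", "doc17"}

precision, recall = precision_recall_at_k(ranked, relevant, k=20)
rank = first_relevant_rank(ranked, relevant)
print(f"precision={precision:.2f} recall={recall:.2f} first hit at rank {rank}")
```

In this made-up case the user scans 20 items at a precision of only 0.15,
yet every relevant item is retrieved and the first useful one appears at
rank 2, which is the sense in which a good ranking offsets low precision.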

However, perhaps the most instructive insight came from a senior member of
Springer-Verlag. As a good marketing firm, Springer has observed that
their potential customers are heavy users of web search services.
Therefore, they are setting up web materials explicitly designed for the
web crawlers to index.

The potential advantage of additional metadata is to improve the precision
of searching. This will be increasingly important as the volume of online
information grows. For this reason, I am an advocate of the Open Archives
approach. The Open Archives approach also offers a good way to provide
access to materials that cannot be found by web crawlers (e.g., because
they are in formats other than text, are dynamic, are held in databases,
or have restricted access).


William Y. Arms
Professor of Computer Science email:
Cornell University web:
5159 Upson Hall telephone: 607-255-3046
Ithaca, NY 14853 fax: 607-255-4428
Received on Mon Jan 24 2000 - 19:17:43 GMT