Re: Google's Scholarly Search Service and Institutional OA Self-Archiving from Stevan Harnad on 2004-11-29 (American-Scientist-Open-Access-Forum)

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Mon, 29 Nov 2004 13:11:49 +0000

Google Scholar http://scholar.google.com
currently has 3,630 items for DSpace and
16,100 for Eprints. I expect both are undercounts. Eprints
certainly is, otherwise the figure would be at the very least
55,000 (for the Eprints subset harvested by celestial)
http://celestial.eprints.org/cgi-bin/eprints.org/graph
or at the very-very-very least 31,688:
http://www.eprints.org/

I hope Google Scholar will cover these two sets of scholarly
full-text sites more fully. They are likely to provide the
richest source of scholarly full-texts. Regular Google, for
example, already carries 210,000 items from Eprints, so they
are in there! Just a matter of porting them to Google Scholar!

The fact that an item is in Eprints or DSpace should be a criterion in
scholar.google's identification rule. So should the fact that
it comes from an OAI-compliant site.
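
One way a harvester might test the "OAI-compliant site" criterion is to
send the OAI-PMH Identify request and check the shape of the reply. The
following is only a minimal sketch under stated assumptions: the base URL
repository.example.edu and the sample response are hypothetical, the
helper names are illustrative, and nothing here describes how Google
Scholar actually identifies sites.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def identify_url(base_url: str) -> str:
    """Build the OAI-PMH Identify request URL for a repository base URL."""
    return f"{base_url}?{urlencode({'verb': 'Identify'})}"


def looks_like_oai_identify(xml_text: str) -> bool:
    """Return True if xml_text has the shape of an OAI-PMH Identify response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed XML, so not an OAI-PMH reply
    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        return False  # wrong root element or namespace
    return root.find(f"{{{OAI_NS}}}Identify") is not None


# Hypothetical, abbreviated response used only for illustration.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-11-29T13:11:49Z</responseDate>
  <request verb="Identify">http://repository.example.edu/oai</request>
  <Identify>
    <repositoryName>Example Eprints Archive</repositoryName>
    <baseURL>http://repository.example.edu/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>
"""
```

In practice the text passed to looks_like_oai_identify would be whatever
the site returns for identify_url(base_url); fetching it is left out so
the sketch stays self-contained.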

Stevan Harnad

On Mon, 29 Nov 2004, Peter Suber wrote:

> [Forwarding from the DSpace-general list. --Peter.]
>
>
> Hi all,
>
> I wanted to mention that the new Google Scholar search
> (http://scholar.google.com) is including items from DSpace repositories
> in the results, as long as they're open for harvesting the full-text.
> I did notice that some institutions running DSpace that should be there
> aren't yet, so I've asked Google why they're missing.
>
> It can be a little tricky to figure out if your institution is getting
> included or not -- search some known items from your repository and plow
> through all the results, and be sure to check all the versions since your
> copy might not be one of the first listed. If you're there, great, and if
> you're not (and want to be) then first make sure your repository's web
> server isn't blocking crawlers, and then write to me or them directly
> (scholar-support_at_google.com) to make sure they crawl your site.
>
> They also wanted me to mention that if you have limited access material
> that you would like to get indexed by Google but not cached by them for
> display, they're very interested in working with you. For example, at
> MIT we have some book titles from the MIT Press in our DSpace repository
> which are only available for free to the MIT community. Google proposes
> to index them, but not cache them, so that when a searcher finds one of
> them in a result set in google.com they're returned to DSpace to view the
> item and can get to the Press's online ordering system from there. More
> traffic for the book, more money for the Press. Let me know if you're
> interested in this and I'll put you in touch with the Google folks.
> Remember: if your DSpace content is freely available to the public then
> Google and the other web search engines should *already* be harvesting it
> so you don't need to do anything...
>
> MacKenzie
>
>
> MacKenzie Smith
> Associate Director for Technology
> MIT Libraries
> Building E25-131d
> 77 Massachusetts Avenue
> Cambridge, MA 02139
> (617)253-8184
> kenzie_at_mit.edu
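
The advice in the forwarded message to make sure the repository's web
server isn't blocking crawlers can be checked, at least as far as
robots.txt goes, with Python's standard urllib.robotparser. This is a
rough sketch only: the robots.txt contents, the host
repository.example.edu and the item path are made up for illustration,
and a server can also block crawlers by other means (firewall rules,
login walls) that this does not detect.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that shuts every crawler out of the whole site.
BLOCKING_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only hides an admin area.
OPEN_ROBOTS = """\
User-agent: *
Disallow: /admin/
"""

# Hypothetical item URL in a repository.
ITEM_URL = "http://repository.example.edu/handle/1234/5678"
```

With these samples, crawler_allowed(BLOCKING_ROBOTS, "Googlebot", ITEM_URL)
is False and crawler_allowed(OPEN_ROBOTS, "Googlebot", ITEM_URL) is True.
To test a live site, the text of its own /robots.txt would be passed in
place of the samples.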
Received on Mon Nov 29 2004 - 13:11:49 GMT
