Re: access to self archive via google scholar

From: Peter Suber <peters_at_EARLHAM.EDU>
Date: Mon, 23 Oct 2006 08:27:41 -0400


Many OA repositories are inadvertently configured to frustrate rather than
facilitate Google crawling. Does the ScholArchive repository follow the
recommendations posted here
<>? These
recommendations were written for straight Google, not Google Scholar, but
they are likely to help with both.

      Peter Suber

At 05:42 PM 10/22/2006, you wrote:

>Those starting an IR should not expect Google to quickly harvest or to
>logically rank the journal articles posted on the new IR. This is based
>on my recent experience with helping the Florida Center for Library
>Automation (FCLA) in their efforts to test the IR waters with
>ScholArchive ( ), a pilot IR that is focused on
>the scholarly output of the faculty and graduate students of the
>University of Florida's Department of Entomology and Nematology.
>ScholArchive (using E-print software) went on line 28 July 2006 with 7
>posted articles. Here are excerpts from five emails relevant to their
>harvesting and ranking by Google and a report of today's results:
>1 Aug 2006 (from ScholArchive Administrator)
>"I have registered us with Google, Google Scholar, SCIRUS, ROAR, DOAR,
>etc. so we should be indexed very soon by lots of search engines,
>1 Sep 2006 (from ScholArchive Administrator)
>"I have been monitoring Google Scholar, Google and other discovery sites
>for the past 5+ weeks since your papers were loaded, with the same
>disappointing results, even though I registered ScholArchive with these
>1 Sep 2006 (from Tom Walker to ScholArchive staff)
>"This is disappointing because faculty will be more likely to post their
>journal articles in ScholArchive IF we can show that doing so will
>significantly help Google users find openly accessible full text of the
>To illustrate how this might prove to be the case, consider my 2001
>Environmental Entomology article entitled "Butterfly migrations in
>Florida: seasonal patterns and long-term changes." This morning I
>entered "butterfly migrations in Florida" as a Google search phrase and
>got 36 hits (under 11 main listings). Here are the first six main
>1. My personal web site. [A click on Google's listing loaded the PDF
>file of the article.]
>2. BioOne. [A click on the listing loaded the abstract, but without a
>BioOne license the full text would be inaccessible.]
>3. Ingenta Connect [A click led to a page with the abstract and a chance
>to pay $25 for access to the full text.]
>4. TX-BUTTERFLY archives. [A click led to a bibliographic entry that had
>a dead link to the PDF file of the article. (My web site's URL was
>changed a few years ago)]
>5. Journal of the Lepidopterists' Society [A couple of clicks led to a
>1993 article on trapping migrating butterflies.]
>6. The Entomological Society of America Journals Online [A click led to
>the TOC of the issue, another click led to the abstract, and a third led
>to the PDF file. But unless someone knew that I had paid ESA to provide
>OA for my article, who would have thought that free access to the PDF
>file would have been found here?]
>BOTTOM LINE: Had I not posted the PDF file on my Web site, very few
>would have found free access to the article's full text. Thus it is
>important to know how Google will rank the ScholArchive posting.
>Incidentally, I ran the same search in Google Scholar BETA and got only
>one hit, the same as no. 2 above!"
>20 Sep 2006 (from ScholArchive Administrator)
>"As it turns out, Google is indeed indexing our site, but only the
>top-level pages, not the papers inside the repository. I am working on
>how this can be changed."
>9 Oct 2006 (from Tom Walker to ScholArchive staff)
>"Yesterday I checked Google to see if the ScholArchive version of my
>butterfly migration paper had been harvested. It had not, and worse,
>the order of the sites that offered it had been changed. My (free)
>offering of the paper on my home page was now the fourth of the main
>listings (instead of first). Two for-fee offerings were first and third
>and BioOne was second."
>22 Oct 2006
>When I searched Google this afternoon for the butterfly migration paper,
>the for-fee sites that had been No. 1 and No. 3 now occupied main
>entries No. 1 and 2 in the search results. However, my homepage site
>(free) was now No. 3 and the posting on ScholArchive (free) was now No.
>4. BioOne had dropped to No. 5.
>The current ranking is still a disappointment but better than on 9 Oct
>(and worse than 1 Sep).
>[On Google Scholar (beta), the BioOne posting was all that was offered.]
>Thomas J. Walker
>Department of Entomology & Nematology
>PO Box 110620 (or Natural Area Drive)
>University of Florida, Gainesville, FL 32611-0620
>FAX: (352)392-0190
Received on Mon Oct 23 2006 - 17:22:41 BST
