Re: access to self archive via google scholar

From: Arthur Sale
Date: Mon, 23 Oct 2006 09:49:02 +1100

When reading this posting from Thomas J Walker, please remember the following:

1 Google's bot will probably visit only about once a month. You can't
expect instant results.

2 Page rankings based on links take time to develop, as they are the result
of repeated harvesting of many sites.

3 Good repository software like Eprints has deliberately designed features
which allow Google (and other bots) to penetrate into the repository and
index the pdfs. These include a 'browse facility' (bots can't search) and
avoidance of the type of pages that go 'first 20', 'next 20', etc etc etc
which cause most bots to give up through excessive depth.

4 If it can't be found by a bot, it doesn't exist on a search engine.

5 Google indexes pdfs; it first converts them to html. Most other bots don't.
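Point 3 can be illustrated with a toy model. This is a hypothetical Python sketch, not a description of how Googlebot or Eprints actually behave: it assumes a bot that follows links breadth-first and gives up beyond an arbitrary link depth (MAX_DEPTH = 5, an invented number), and it compares a repository whose papers hang off a chain of 'next 20' listing pages with one whose browse pages (e.g. by year or subject) are all linked from the home page. The function and page names are illustrative only.

```python
from collections import deque

# Hypothetical give-up depth; real crawler policies are not public and differ.
MAX_DEPTH = 5


def crawl(links, start="home", max_depth=MAX_DEPTH):
    """Breadth-first crawl recording the link depth at which each page is first seen.

    Pages at max_depth are recorded but their links are not followed,
    modelling a bot that gives up beyond that depth.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # too deep: do not follow this page's links
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def batch(p, per_page, n_papers):
    """Names of the papers listed on listing/browse page number p."""
    return [f"paper{i}" for i in range(p * per_page, min((p + 1) * per_page, n_papers))]


def paginated_site(n_papers, per_page=20):
    """Listing pages chained by 'next 20' links, like a long result list."""
    n_pages = (n_papers + per_page - 1) // per_page
    links = {"home": ["list0"]}
    for p in range(n_pages):
        next_link = [f"list{p + 1}"] if p + 1 < n_pages else []
        links[f"list{p}"] = batch(p, per_page, n_papers) + next_link
    return links


def browse_site(n_papers, per_page=20):
    """Browse pages all linked directly from the home page."""
    n_pages = (n_papers + per_page - 1) // per_page
    links = {"home": [f"browse{p}" for p in range(n_pages)]}
    for p in range(n_pages):
        links[f"browse{p}"] = batch(p, per_page, n_papers)
    return links


def papers_found(links, max_depth=MAX_DEPTH):
    """Count how many paper pages the depth-limited crawl reaches."""
    return sum(1 for page in crawl(links, max_depth=max_depth) if page.startswith("paper"))


if __name__ == "__main__":
    print("paginated:", papers_found(paginated_site(200)))  # 80 of 200
    print("browse:   ", papers_found(browse_site(200)))     # 200 of 200
```

Under these assumed numbers, each extra 'next 20' page pushes the next batch of papers one link deeper, so only the first 80 of 200 papers fall within reach, while the browse layout keeps every paper two links from the home page.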

Arthur Sale

> -----Original Message-----
> From: American Scientist Open Access Forum
> Sent: Monday, 23 October 2006 8:42 AM
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] access to self archive
> via google scholar
> Those starting an IR should not expect Google to quickly harvest or to
> logically rank the journal articles posted on the new IR. This is based
> on my recent experience with helping the Florida Center for Library
> Automation (FCLA) in their efforts to test the IR waters with
> ScholArchive, a pilot IR that is focused on
> the scholarly output of the faculty and graduate students of the
> University of Florida's Department of Entomology and Nematology.
> ScholArchive (using EPrints software) went online on 28 July 2006 with 7
> posted articles. Here are excerpts from five emails relevant to their
> harvesting and ranking by Google and a report of today's results:
> 1 Aug 2006 (from ScholArchive Administrator)
> "I have registered us with Google, Google Scholar, SCIRUS, ROAR, DOAR,
> etc. so we should be indexed very soon by lots of search engines,
> hopefully."
> 1 Sep 2006 (from ScholArchive Administrator)
> "I have been monitoring Google Scholar, Google and other discovery sites
> for the past 5+ weeks since your papers were loaded, with the same
> disappointing results, even though I registered ScholArchive with these
> sites."
> 1 Sep 2006 (from Tom Walker to ScholArchive staff)
> "This is disappointing because faculty will be more likely to post their
> journal articles in ScholArchive IF we can show that doing so will
> significantly help Google users find openly accessible full text of the
> articles.
> To illustrate how this might prove to be the case, consider my 2001
> Environmental Entomology article entitled "Butterfly migrations in
> Florida: seasonal patterns and long-term changes." This morning I
> entered "butterfly migrations in Florida" as a Google search phrase and
> got 36 hits (under 11 main listings). Here are the first six main
> listings:
> 1. My personal web site. [A click on Google's listing loaded the PDF
> file of the article.]
> 2. BioOne. [A click on the listing loaded the abstract, but without a
> BioOne license the full text would be inaccessible.]
> 3. Ingenta Connect [A click led to a page with the abstract and a chance
> to pay $25 for access to the full text.]
> 4. TX-BUTTERFLY archives. [A click led to a bibliographic entry that had
> a dead link to the PDF file of the article. (My web site's URL was
> changed a few years ago)]
> 5. Journal of the Lepidopterists' Society [A couple of clicks led to a
> 1993 article on trapping migrating butterflies.]
> 6. The Entomological Society of America Journals Online [A click led to
> the TOC of the issue, another click led to the abstract, and a third led
> to the PDF file. But unless someone knew that I had paid ESA to provide
> OA for my article, who would have thought that free access to the PDF
> file would have been found here?]
> BOTTOM LINE: Had I not posted the PDF file on my Web site, very few
> would have found free access to the article's full text. Thus it is
> important to know how Google will rank the ScholArchive posting.
> Incidentally, I ran the same search in Google Scholar BETA and got only
> one hit, the same as no. 2 above!"
> 20 Sep 2006 (from ScholArchive Administrator)
> "As it turns out, Google is indeed indexing our site, but only the
> top-level pages, not the papers inside the repository. I am working on
> how this can be changed."
> 9 Oct 2006 (from Tom Walker to ScholArchive staff)
> "Yesterday I checked Google to see if the ScholArchive version of my
> butterfly migration paper had been harvested. It had not, and worse,
> the order of the sites that offered it had been changed. My (free)
> offering of the paper on my home page was now the fourth of the main
> listings (instead of first). Two for-fee offerings were first and third
> and BioOne was second."
> 22 Oct 2006
> When I searched Google this afternoon for the butterfly migration paper,
> the for-fee sites that had been No. 1 and No. 3 now occupied main
> entries No. 1 and 2 in the search results. However, my homepage site
> (free) was now No. 3 and the posting on ScholArchive (free) was now No.
> 4. BioOne had dropped to No. 5.
> The current ranking is still a disappointment but better than on 9 Oct
> (and worse than 1 Sep).
> [On Google Scholar (beta), the BioOne posting was all that was offered.]
> Tom
> ====================================
> Thomas J. Walker
> Department of Entomology & Nematology
> PO Box 110620 (or Natural Area Drive)
> University of Florida, Gainesville, FL 32611-0620
> FAX: (352)392-0190
> ====================================
> -----Original Message-----
> From: American Scientist Open Access Forum On Behalf Of Donat Agosti
> Sent: Saturday, October 21, 2006 1:54 AM
> Subject: access to self archive via google scholar
> What would it take for self-archived articles to be indexed by Google
> Scholar, so that they could be found? Search, for example, for
> "viaticus was tridecane"
> and you end up at this paper:
> This is the original, copyrighted version, and there is no hint that
> the paper is also available open access on ZORA.
> Ideally, the open access copy should show up, since it would then be used more often.
> Donat
> Dr. Donat Agosti
> Science Consultant
> Research Associate, American Museum of Natural History and Naturmuseum
> der Burgergemeinde Bern
> Skype: agostileu
> Dalmaziquai 45
> 3005 Bern
> Switzerland
> +41-31-351 7152
Received on Mon Oct 23 2006 - 11:05:20 BST

This archive was generated by hypermail 2.3.0 : Fri Dec 10 2010 - 19:48:32 GMT