Re: New Ranking of Central and Institutional Repositories from Arthur Sale on 2008-02-09 (American-Scientist-Open-Access-Forum)

From: Arthur Sale <ahjs_at_ozemail.com.au>
Date: Sun, 10 Feb 2008 08:36:08 +1100

It looks as though the algorithm is the same as for university
websites.

Rank each repository for inward bound hyperlinks (VISIBILITY)
Rank every repository for number of pages (SIZE)
Rank every repository for number of 'interesting' documents, e.g. .doc,
.pdf (RICH FILES)
Rank every repository for number of records returned by a Google
Scholar search (GOOGLE SCHOLAR)
Compute (VISIBILITY x 50%) + (SIZE x 20%) + (RICH FILES x 15%) +
(GOOGLE SCHOLAR x 15%)
And then rank the repositories on this score.
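
As a rough illustration of the steps above, here is a minimal Python
sketch. It is my reading of the description, not Webometrics' actual
code: the repository names and counts are invented, rank 1 is assumed
to mean the largest raw count, ties are assumed to share the best rank,
and a lower combined score is assumed to be better because the inputs
are rank positions.

```python
# Weights as stated in the description above.
WEIGHTS = {"visibility": 0.50, "size": 0.20, "rich_files": 0.15, "scholar": 0.15}


def rank_positions(values):
    """Map name -> rank position (1 = largest raw value; ties share the best rank)."""
    ordered = sorted(values.values(), reverse=True)
    return {name: ordered.index(v) + 1 for name, v in values.items()}


def composite_ranking(raw):
    """raw: {repo: {metric: count}}. Returns (repos best-first, combined scores)."""
    scores = {repo: 0.0 for repo in raw}
    for metric, weight in WEIGHTS.items():
        ranks = rank_positions({repo: m[metric] for repo, m in raw.items()})
        for repo, position in ranks.items():
            scores[repo] += weight * position
    # Lower weighted rank sum = better placed (an assumption, see lead-in).
    return sorted(scores, key=lambda repo: scores[repo]), scores


# Hypothetical data: B is the smallest repository on every count except inlinks.
example = {
    "A": {"visibility": 100, "size": 5000, "rich_files": 300, "scholar": 50},
    "B": {"visibility": 900, "size": 1000, "rich_files": 100, "scholar": 20},
    "C": {"visibility": 500, "size": 9000, "rich_files": 800, "scholar": 90},
}
order, scores = composite_ranking(example)
print(order)  # ['C', 'B', 'A']
```

In this invented example B is placed above A purely on inward links,
although A is larger on the other three measures.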

This is a poor measure in general. VISIBILITY (which accounts for 50%
of the score!) is not necessarily useful for repositories, where
harvesting is more important than hyperlinks. It will be strongly
influenced by staff members linking to their publications from a
repository search. Both SIZE and RICH FILES measure absolute size and
say nothing about currency or activity. Some of the higher-placed
Australian university repositories have simply had old material dumped
in them, and are relatively inactive in acquiring current material.
Activity should be a major factor in metrics for repositories, and
this could easily be measured by a search limited to a year (e.g.
2007), or by the way ROAR does it through OAI-PMH harvesting.
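
To make the activity idea concrete, a small Python sketch follows. It
is only an assumption about how one might do it, not a description of
ROAR's harvester: the base URL is a placeholder, the function names are
mine, and paging through resumption tokens is left out. Note also that
an OAI-PMH datestamp records when a record was created or last changed
in the repository, not when the paper was published.

```python
from collections import Counter
from urllib.parse import urlencode


def list_identifiers_url(base_url, year, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListIdentifiers request limited to one calendar year."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "from": f"{year}-01-01",
        "until": f"{year}-12-31",
    }
    return base_url + "?" + urlencode(params)


def records_per_year(datestamps):
    """Count harvested record datestamps (YYYY-MM-DD...) by year."""
    return Counter(int(stamp[:4]) for stamp in datestamps)


# Placeholder endpoint; a real repository publishes its own OAI-PMH base URL.
url = list_identifiers_url("http://repository.example.edu/oai", 2007)

# Invented datestamps standing in for a harvest result.
activity = records_per_year(["2005-03-01", "2007-06-15", "2007-11-30", "2006-01-09"])
print(activity[2007])  # 2
```

A repository with a large total but a small count for the latest year
would be the "dumped and inactive" case described above.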

Arthur Sale
University of Tasmania

>
> (1) I don't know the Webometrics ranking formula, but it is
> clearly based on multiple weighted parameters, and not merely
> on total number of records (country, size, visibility, rich
> files, "scholar"), otherwise the rank order would have been
> the same as what ROAR gives you if you select "Sort by Total
> Records":
> http://roar.eprints.org/?action=home&q=&country=&version=&type=&order=recordcount&submit=Filter
>
> The Webometrics "Size" parameter seems to be the same as
> ROAR's "Total records" -- except Webometrics so far seems to
> omit PubMedCentral, which would otherwise be the biggest of
> the CRs. I expect that Webometrics' coverage and perhaps also
> their formula are still being refined. [They only seem to cover
> a total of 200 CRs and IRs right now.] And of course there is
> also still the
> not-yet-solved problem of distinguishing the records that are
> full-texts from those that are just metadata, and
> distinguishing OA content from other kinds of deposits. Stay tuned.
> http://trac.eprints.org/projects/iar/wiki/Missing
>
Received on Sat Feb 09 2008 - 22:30:28 GMT
