Re: New Ranking of Central and Institutional Repositories

From: Arthur Sale <ahjs_at_ozemail.com.au>
Date: Tue, 12 Feb 2008 10:13:28 +1100

 Isidro

As one of those that contributed to that discussion, may I be more
specific?

The impact of a repository should be measured by things other than
some of the measures that you use. PageRank and Size are both very
weak indicators. I give examples below.

VISIBILITY
Visibility as you measure it has nothing to do with the purpose
of repositories, and is only a minor factor in their impact. Let me
give examples:
 * Inward links to the repository itself are relatively rare, and
    probably negligible in the total. Almost no-one really goes to a
    repository to search its content except locally - its value is in
    federation. The exceptions are (1) central repositories such as
    CERN, RePEc, arXiv, etc, and (2) exemplar repositories such as
    Southampton and QUT. The component is hugely biased towards these
    repositories.
 * The majority of links to institutional repositories on the Web
    are probably from depositors' home pages, linking to their
    research records. In UTas we will gain 600-1000 such links once
    the link is in the standard staff member template. Is this
    visibility? Or does it measure university size?
 * In a few cases, viewers may link to a paper. However to do this
    they have to value the paper significantly, then copy the URL,
    and then post it to a public website or blog. I expect this is a
    minority in the total of links. Any data otherwise? In any case
    it is dependent on an author's importance in the field, not the
    repository value.


REAL VISIBILITY
Real visibility in the case of a repository consists in (a) whether
it provides a compliant OAI-PMH interface, and (b) whether that
interface is harvested by federated services, such as ROAR, OAIster,
etc. One might also add whether the repository is actively harvested
as a flat file or via OAI by Google and Google Scholar, Scopus, or
Thomson. Nothing else really matters in respect of visibility. All
these are measurable. PageRank is irrelevant, sorry.
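
As a rough illustration of point (a), the sketch below builds an
OAI-PMH Identify request and reads a few fields from the response.
The base URL, the sample response and the function names are made up
for illustration, and the "compliance" test is only a minimal
assumption (protocol version 2.0 and a base URL present), not a full
validation. Point (b), whether services actually harvest the
interface, cannot be read from the repository itself and is not
covered here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Build the OAI-PMH Identify request URL for a repository base URL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def parse_identify(xml_text):
    """Extract a few fields from an Identify response; missing ones are None."""
    root = ET.fromstring(xml_text)
    fields = ("repositoryName", "baseURL", "protocolVersion", "earliestDatestamp")
    return {
        name: root.findtext("oai:Identify/oai:" + name, namespaces=OAI_NS)
        for name in fields
    }


def looks_compliant(info):
    """Minimal check (an assumption, not a validator): version 2.0 and a base URL."""
    return info["protocolVersion"] == "2.0" and bool(info["baseURL"])


# Hypothetical response, shaped like an OAI-PMH 2.0 Identify reply.
SAMPLE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-02-12T00:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.edu/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.edu/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2004-01-01</earliestDatestamp>
  </Identify>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(identify_url("http://repository.example.edu/oai"))
    print(looks_compliant(parse_identify(SAMPLE)))
```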

SIZE
Size is a terrible measure. Australia is full of examples where the
repository has been populated by uploading zillions of old stub
records going back to the 1930s or before. The full text is mostly
missing, though sometimes a grant has funded image scanning of the
document. This is fullness for the sake of fullness. To give one
example in your list, the Australasian Digital Thesis Program has
110,000 records of this type of old PhD theses. The full-text simply
says: contact the university for a photocopy. That's OK, but the
weighting of size ought to be low - less than 20%.

If it is necessary to measure size, and it probably is, then I
suggest a measure that counts the number of records with a
publication date within the last five years. Choose 10 years if you
want, but ancient record-keeping does not translate into impact.
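
A minimal sketch of that size measure, assuming the publication year
of each record has already been extracted (for example from harvested
metadata). The function name and the year-level granularity are
choices made for this illustration only.

```python
from datetime import date


def recent_record_count(publication_years, window_years=5, today=None):
    """Count records published within the last `window_years` calendar years.

    Records with an unknown year (None) are ignored.
    """
    today = today or date.today()
    cutoff = today.year - window_years
    return sum(
        1
        for year in publication_years
        if year is not None and cutoff < year <= today.year
    )


if __name__ == "__main__":
    years = [1935, 1990, 2003, 2004, 2007, 2008, None]
    print(recent_record_count(years, today=date(2008, 2, 12)))
    print(recent_record_count(years, window_years=10, today=date(2008, 2, 12)))
```

Old stub records then contribute nothing to the score, however many
of them have been uploaded.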

ACTIVITY
It is quite clear from ROAR that deposit activity is a major measure
of impact. There are three easy measures to derive.
 * The number of acquisitions in the last 12 months. Easily
    discovered from the OAI interface.
 * The number of acquisitions with a publication date in the last 12
    months. Easily discovered from the OAI interface. This measures
    currency as well as activity.
 * Some repositories are sporadic, some are continuous, the latter
    reflecting a deep-seated integration within the university's
    activity. A simple measure would be to derive a statistic from
    the traffic (see ROAR), such as
     + number of days in last 12 months with a deposit event
     + the Fourier spectrum of the last 12 months' deposit events
        having no component with a period longer than 7 days above
        10% (I guess at what is significant and perhaps this can be
        turned into a score).

RICH TEXT
This is a reasonable measure, though subject to error. For example we
sometimes put a full-text that gives instructions on how to ask for
access to the item concerned, or a bio of the creator of an artwork.


DOWNLOADS
I'd love to promote downloads as a measure of impact, but there is as
yet no federated way to access this data.

I'm happy to continue this dialogue.

Arthur Sale
Professor of Computer Science
University of Tasmania

> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Isidro F. Aguillo
> Sent: Monday, 11 February 2008 6:53 PM
> To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] New
> Ranking of Central and Institutional Repositories
>
> Dear all:
>
> Thanks for your interest in the Ranking of repositories, part
> of our larger effort to rank the web presence of universities
> and research centers. A few comments on your messages:
>
> - Currently the Ranking of repositories is a beta version. We
> would be grateful for comments, suggestions and criticisms.
> Information about missed repositories is warmly welcomed. After
> the feedback received during the last few days we are considering
> a new edition before the scheduled one in July.
> - Our rank formula mimics PageRank in part, but our
> "inspiration" was in fact the impact factor. We maintain a 1:1
> ratio between visibility (impact) and size (activity), which
> is the basis of the IF. In order to take into account the
> diversity of web information we decided to split the size
> contribution according to additional criteria.
> - Freshness is a topic we are concerned about not only for
> repositories but for the rest of the rankings too. We are
> considering taking it into account in the Scholar
> contribution by giving more weight to recent publications.
> - There are methodological problems in producing relative
> indicators, such as percentage of global output or
> normalization by institution size. But as you know, rankings
> are usually built on GDP (US, Japan, Germany, ...) and not GDP
> per capita (Luxembourg, United Arab Emirates, ...).
> - Our position as a research group has been stated previously,
> but I am going to summarise it again: the rankings are made with
> the aim of increasing the volume of academic information
> available on the Web, promoting the electronic publication of
> all the activities of the universities, not only the
> research-related ones, and especially from institutions in
> developing countries.
>
> Best regards,
>
> Leslie Carr wrote:
> >
> > On 9 Feb 2008, at 21:36, Arthur Sale wrote:
> >
> >> It looks as though the algorithm is the same as for university
> >> websites.
> >>
> >> Rank each repository for inward bound hyperlinks (VISIBILITY)
> >> Rank every repository for number of pages (SIZE)
> >> Rank every repository for number of 'interesting' documents
> >> eg .doc, .pdf (RICH FILES)
> >> Rank every repository for number of records returned by a Google
> >> Scholar search (GOOGLE SCHOLAR)
> >> Compute (VISIBILITY x 50%) + (SIZE x 20%) + (RICH FILES x 15%) +
> >> (GOOGLE SCHOLAR x 15%)
> >> And then rank the repositories on this score.
> >>
> >> This is a poor measure in general. VISIBILITY (accounts for 50% of
> >> score!) is not necessarily useful for repositories, when harvesting
> >> is more important than hyperlinks. It will be strongly influenced by
> >> staff members linking their publications off a repository search.
> >> Both SIZE and RICH FILES measure absolute size and say nothing about
> >> currency or activity. Some of the higher placed Australian
> >> universities have simply had old stuff dumped in them, and are
> >> relatively inactive in acquiring current material. Activity should
> >> be a major factor in metrics for repositories, and this could easily
> >> be measured by a search limited to a year (eg 2007), or by the way
> >> ROAR does it through OAI-PMH harvesting.
> >>
> > I believe that the Webometrics (ghastly name!) ranking of
> > repositories uses the same criteria as its ranking of universities
> > ie it is attempting to quantify the impact that the repository has
> > had. This is very different to the size, deposit activity, or even
> > used-ness of the repository and explains why the major contributing
> > factor is VISIBILITY. The main issue for this league table is "how
> > much evidence is there in the public web that your active research
> > and scholarly outputs are valued enough by your community of peers
> > that they are linking to them".
> >
> > This will probably seem entirely arbitrary to some people, and
> > entirely obvious to others, depending on how much they see "the
> > web" as a para-literature. It mimics Google's PageRank valuation of
> > web pages according to how many 'votes' (links/quasi-citations)
> > they get from other pages from independent sources.
> >
> > It is not possible to tell with any accuracy whether a University
> > Website is "a good website" simply by looking at the University's
> > place in the Webometrics Ranking of Universities. The website is
> > simply a channel which delivers visibility-impact for the
> > University (or not). Similarly for the repository.
> > --
> > Les Carr
> >
>
> --
> ****************************
> Isidro F. Aguillo
> Laboratorio de Cibermetría
> Cybermetrics Lab
> CCHS - CSIC
> Joaquin Costa, 22
> 28002 Madrid. Spain
>
> isidro _at_ cindoc.csic.es
> +34-91-5635482 ext 313
> ****************************
>
Received on Mon Feb 11 2008 - 23:50:56 GMT
