My problem is that many more variables are in play, and this sort of
analysis cries out for better research. For example:
Big numbers (however derived) often turn out to include a great many
records that are not backed up by the full text of an article.
Big numbers are also closely related to the energy with which archival
material is posted. One university in Australia decided to mount a campaign
to centrally post all its government-reported research for the last six
years, but of course since the e-text was not available, most of it consists
of unhelpful and unsearchable scanned page images in PDF files. I discount
this very considerably compared to a born-digital file.
The real measure of top class is the *current acquisition rate*. For
example, how much of the university's research output for 2006 has made it
into the online world by now (March 2007)? Or take 2005 if you quibble
about delays.
Size is actually not important. I'd rather have one 100% university (of
current research) than ten 10% universities of hit-or-miss stuff, even if
the first were small and the ten large.
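To make the acquisition-rate idea concrete, here is a minimal sketch. It assumes the rate is simply the number of full-text items deposited for a given publication year divided by the institution's total research output for that year. The function name and all the figures are hypothetical, chosen only to illustrate the small-versus-large comparison; they are not data from any real university.

```python
def acquisition_rate(deposited_full_text: int, published: int) -> float:
    """Fraction of one year's research output that is online as full text.

    Both arguments are counts for the same publication year (e.g. 2006).
    This definition is an assumption made for illustration.
    """
    if published <= 0:
        raise ValueError("published must be a positive count")
    if not 0 <= deposited_full_text <= published:
        raise ValueError("deposited_full_text must be between 0 and published")
    return deposited_full_text / published


# Hypothetical figures, for illustration only.
small_university = acquisition_rate(deposited_full_text=900, published=900)
large_university = acquisition_rate(deposited_full_text=1900, published=19000)

print(f"small university: {small_university:.0%}")
print(f"large university: {large_university:.0%}")
```

On these made-up numbers the large university holds more files in absolute terms, yet the small one captures all of its current output, which is the comparison that a raw file count hides.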
The list takes no account of university size, and we all know that some
small universities beat some large ones in excellence. In my own case, the
University of Tasmania does not publish 19000 or even 3000 articles a year.
Hardly any of the online documents are .ps, and .doc and .ppt generally only
exist alongside the others as non-portable (and deprecated)
Microsoft-specific alternatives. I am afraid that this table simply does not
make sense.
Arthur Sale
University of Tasmania
> -----Original Message-----
> From: American Scientist Open Access Forum
> [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
> On Behalf Of Isidro F. Aguillo
> Sent: Wednesday, 21 March 2007 4:41 AM
> To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
> Subject: Re: [AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM] Size of repositories
> 
> Leslie Carr wrote:
> > On 20 Mar 2007, at 14:42, Isidro F. Aguillo wrote:
> >
> >> The thresholds are as follows:
> >>
> >>                    PDF    DOC    PPT     PS   SCHOLAR
> >> PREMIER LEAGUE  19000   4000   2000   1000      3300
> >> WORLD CLASS      7000   2000   1000    300      1200
> >> REGIONAL CLASS   3000    500    300     50       400
> >>
> >> These figures could be used as a reference in repository planning.
> >
> > What are the justifications for your thresholds? Are you saying that
> > a University with an existing Premier League Ranking should have 19K
> > PDFs in its repository? Or are you saying that if you create a league
> > table that is defined by repository size, then the top 200 should all
> > have 19K PDFs in their repositories?
> Dear all:
> 
> The figures should be read as the minimum needed to reach a certain
> position in the Webometrics Ranking, and they include all the files in
> the university web domain: self-archived documents on personal pages,
> institutional centralized repositories, documents available from public
> access electronic journals, conference and workshop documents and
> slides, and academic and non-academic reports. There are two main sources
> of error: documents duplicated in two different places under the same
> domain and files not directly related to scholarly communication.
> > Either way I don't know where your figures come from. There are only
> > 16 institutional- or pseudo-institutional repositories in the world
> > with more than 19K records - and that's not filtering out the
> > bibliographic records with no full texts.
> Webometrics Ranking (www.webometrics.info) intends to encourage web
> publication as proposed by the Open Access initiatives and to show the
> commitment of the academic institutions to Web publication. /If the web
> performance of an institution is below the expected position according
> to their academic excellence, university authorities should reconsider
> their web policy, promoting substantial increases in the volume and
> quality of their electronic publications/.
> >
> > --
> > Les Carr
> >
> 
> --
> ***************************************
> Isidro F. Aguillo
> isidro_at_cindoc.csic.es
> Ph:(+34) 91-5635482 ext. 313
> 
> Cybermetrics Lab
> CINDOC-CSIC
> Joaquin Costa, 22
> 28002 Madrid. SPAIN
> 
> http://www.webometrics.info
> http://www.cindoc.csic.es/cybermetrics
> http://internetlab.cindoc.csic.es
> ****************************************
Received on Wed Mar 21 2007 - 03:06:24 GMT