Re: Ranking Web of Repositories: July 2010 Edition

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Thu, 8 Jul 2010 12:34:15 -0400

On 2010-07-08, at 4:43 AM, Isidro F. Aguillo wrote:

> Dear Hélène:
>
> Thank you for your message, but I disagree with your proposal. We are not measuring only contents but contents AND visibility on the web.

Dear Isidro,

If I may intervene with some comments too, as this discussion has some wider implications:

Yes, you are measuring both contents and visibility, but presumably you want the difference between (1) the ranking of the top 800 repositories and (2) the ranking of the top 800 *institutional* repositories to be based on the fact that the latter are institutional repositories whereas the former are all repositories (central, i.e., multi-institutional, as well as institutional).

Moreover, if you list redundant repositories (some being the proper subsets of others) in the very same ranking, it seems to me the meaning of the ranking becomes rather vague.

> Certainly HyperHAL covers the contents of all its participants, but the impact of these contents depends on other factors. Probably researchers prefer to link to the paper in INRIA because of the prestige of this institution, the affiliation of the author or the marketing of their institutional repository.

All true, but perhaps the significance and usefulness of the rankings would be greater if you either changed the weight of the factors (volume of full-text content, number of links) or, alternatively, you designed the rankings so the user could select and weight the criteria on which the rankings are displayed.

Otherwise your weightings become like the "h-index" -- an a-priori combination of untested, unvalidated weights that many users may not be satisfied with, or fully informed by...
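
As a purely illustrative sketch of what user-selectable weights could look like, the Python below ranks a few repositories under different weightings. The repository names, the two criteria (`fulltext_items`, `inlinks`), the normalization by maximum, and all figures are hypothetical assumptions made for this example; this is not the Webometrics methodology.

```python
from typing import Dict, List

# Hypothetical data, invented for illustration only.
REPOSITORIES: Dict[str, Dict[str, float]] = {
    "Repo A": {"fulltext_items": 12000, "inlinks": 2000},
    "Repo B": {"fulltext_items": 3000, "inlinks": 10000},
    "Repo C": {"fulltext_items": 9000, "inlinks": 6000},
}


def rank(repos: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[str]:
    """Return repository names ordered by a user-weighted score.

    Each criterion is scaled to [0, 1] by dividing by its maximum across
    repositories (an assumed normalization), then combined as a weighted
    average using the weights the user supplies.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Largest value per criterion; fall back to 1 to avoid dividing by zero.
    maxima = {c: max(r[c] for r in repos.values()) or 1 for c in weights}

    scores = {
        name: sum(w * values[c] / maxima[c] for c, w in weights.items()) / total
        for name, values in repos.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # The same data yields different orderings under different weightings.
    print(rank(REPOSITORIES, {"fulltext_items": 1.0, "inlinks": 0.0}))
    print(rank(REPOSITORIES, {"fulltext_items": 0.0, "inlinks": 1.0}))
    print(rank(REPOSITORIES, {"fulltext_items": 0.5, "inlinks": 0.5}))
```

With these made-up figures, the content-only, links-only and equal-weight rankings all differ, which is the point: a single fixed weighting hides choices that a user might reasonably want to make differently.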

> But here is a more important aspect. If I were the president of INRIA I would prefer people using my institutional repository instead of CCSD. No problem with the latter; they are doing a great job and increasing the reach of INRIA, but the papers deposited are a very important (the most important?) asset of INRIA.

But how much INRIA papers are linked, downloaded and cited is not necessarily (or even probably) a function of their direct locus!

What is important for INRIA (and all institutions) is that as much as possible of their paper output should be OA, simpliciter, so that it can be linked, downloaded, read, applied, used and cited. It is entirely secondary, for INRIA (and all institutions), *where* their papers are OA, compared to the necessary condition *that* they are OA (and hence freely accessible, usable, harvestable).

Hence (in my view) by far the most important ranking factor for institutional repositories is how much of their full-text institutional paper output is indeed deposited and OA. INRIA would have no reason to be disappointed if the locus from which its content is searched, retrieved and linked is some other, multi-institutional harvester. INRIA still gets the credit and benefits from all the links, downloads and citations of INRIA content!

(Having said that, locus of deposit *does* matter, very much, for deposit mandates. Deposit mandates are necessary in order to generate OA content. And, for strategic reasons that are elaborated in my reply to Chris Armbruster, it makes a big practical difference for success in agreeing on the adoption of a mandate that both institutional and funder mandates should require convergent *institutional* deposit, rather than divergent and competing institutional vs. institution-external deposit. Here too, your repository rankings would be much more helpful and informative if they gave a greater weight to the relative size of each institutional repository's content and eliminated multi-institutional repositories from the institutional repository rankings -- or at least allowed institutional repositories to be ranked independently on content vs links.)

I think you are perhaps being misled here by the analogy with your sister rankings of universities (RWWU, http://www.webometrics.info/) rather than their repositories. In university rankings, the links to the university site itself matter a lot. But in repository rankings links matter much less than *how much institutional content is accessible*. For the degree of usage of that content, harvester sites may be more relevant measures, and, after all, downloads and citations, unlike links, carry their credits (to the authors and institutions) with them no matter where the transaction happens to occur...

> Regarding the other comments we are going to correct those with mistakes but it is very difficult for us to realize that Virginia Tech University is "faking" its institutional repository with contents authored by external scholars.

I have called Gail McMillan at Virginia Tech about this, and she has explained it to me. The question was never whether Virginia Tech was "faking"! They simply host content over and above Virginia Tech content -- for example, OA journals whose content originates from other institutions.

As such, the Virginia Tech repository, besides providing access to Virginia Tech content, is also a conduit or portal for accessing the content of those other institutions. The "credit" for providing the conduit goes to Virginia Tech, of course. But the credit for the links, usage and citations goes to those other institutions! (When an institutional repository is also used as a portal for other institutions, its function becomes a hybrid one -- both an aggregator and a provider. I think it's far more useful and important to try to keep those functions separate, in both the rankings and the weightings.)

Best wishes,

Stevan

> El 07/07/2010 23:03, Hélène.Bosc escribió:
>> Isidro,
>> Thank you for your Ranking Web of World Repositories and for informing us about the best quality repositories!
>>
>>
>> Being French, I am delighted to see HAL so well ranked and I take this opportunity to congratulate Franck Laloe for having set up such a good national repository as well as the CCSD team for continuing to maintain and improve it.
>>
>> Nevertheless, there is a problem in your ranking that I have already had occasion to point out to you in private messages.
>> May I remind you that:
>>
>> Correction for the top 800 ranking:
>>
>>
>> The ranking should either index HyperHAL alone, or index both HAL/INRIA and HAL/SHS, but not all three repositories at the same time: HyperHAL includes both HAL/INRIA and HAL/SHS.
>>
>> Correction for the ranking of institutional repositories:
>>
>>
>> Not only does HyperHAL (#1) include both HAL/INRIA (#3) and HAL/SHS (#5), as noted above, but HyperHAL is a multidisciplinary repository, intended to collect all French research output, across all institutions. Hence it should not be classified and ranked against individual institutional repositories but as a national, central repository. Indeed, even HAL/SHS is multi-institutional in the usual sense of the word: single universities or research institutions. The classification is perhaps being misled by the polysemous use of the word "institution."
>>
>>
>> Not to seem to be biassed against my homeland, I would also point out that, among the top 10 of the top 800 "institutional repositories," CERN (#2) is to a certain extent hosting multi-institutional output too, and is hence not strictly comparable to true single-institution repositories. In addition, "California Institute of Technology Online Archive of California" (#9) is misnamed -- it is the Online Archive of California http://www.oac.cdlib.org/ (CDLIB, not CalTech) and as such it too is multi-institutional. And Digital Library and Archives Virginia Tech University (#4) may also be anomalous, as it includes the archives of electronic journals with multi-institutional content. Most of the multi-institutional anomalies in the "Top 800 Institutional" seem to be among the top 10 -- as one would expect if multiple institutional content is inflating the apparent size of a repository. Beyond the top 10 or so, the repositories look to be mostly true institutional ones.
>>
>>
>> I hope that this will help in improving the next release of your increasingly useful ranking!
>>
>>
>> Best wishes
>> Hélène Bosc
>>
>> ----- Original Message ----- From: "Stevan Harnad" <harnad_at_ECS.SOTON.AC.UK>
>> To: <AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG>
>> Sent: Tuesday, July 06, 2010 6:07 PM
>> Subject: Fwd: Ranking Web of Repositories: July 2010 Edition
>>
>>
>>
>> Begin forwarded message:
>>
>> From: "Isidro F. Aguillo" <isidro.aguillo_at_CCHS.CSIC.ES>
>> Date: July 6, 2010 11:13:58 AM EDT
>> To: SIGMETRICS_at_listserv.utk.edu
>> Subject: [SIGMETRICS] Ranking Web of Repositories: July 2010 Edition
>>
>> Ranking Web of Repositories: July 2010 Edition
>>
>> The second edition of the 2010 Ranking Web of Repositories was published on the same day OR2010 started here in Madrid. The ranking is available from the following URL:
>>
>> http://repositories.webometrics.info/
>>
>> The main novelty is the substantial increase in the number of repositories analyzed (close to 1000). The Top 800 are ranked according to their web presence and visibility. As usual, thematic repositories (CiteSeer, RePEc, Arxiv) lead the Ranking, but the French research institutes (CNRS, INRIA, SHS) using HAL are very close. Two issues have changed from previous editions from a methodological point of view: the use of Bing's engine data has been discarded due to irregularities in the figures obtained, and MS Excel files have been excluded again.
>>
>> At the end of July the new edition of the Rankings of universities, research centers and hospitals will be published.
>>
>> Comments, suggestions and additional information are greatly appreciated.
>>
>
>
> --
> ===========================
>
> Isidro F. Aguillo, HonPhD
> Cybermetrics Lab (3C1)
> IPP-CCHS-CSIC
> Albasanz, 26-28
> 28037 Madrid. Spain
>
>
> Editor of the Rankings Web
> ===========================
Received on Thu Jul 08 2010 - 17:42:06 BST
