Re: Scientometric OAI Search Engines

From: Stevan Harnad
Date: Thu, 27 May 2004 01:37:17 +0100 (BST)

    Subject Thread:
    "Scientometric OAI Search Engines"

On Wed, 26 May 2004, Michael Leach wrote:

> As we build institutional repositories (IR) and begin the process of
> linking these repositories, we could have the ability to create our own
> impact factors, linking the articles and citations among repositories all
> over the world.

This is not only already possible, but already happening. See:

OpCit: The Open Citation Project providing
Reference Linking and Citation Analysis for Open Archives

Citebase: The Cross-OAI-Archive Citation and Download Ranking Search

Citeseer: The oldest citation engine of them all, operating on harvested
non-OAI articles in computer science archived on arbitrary websites:

and the
Usage/Citation Correlator, which can be used to predict eventual
citations from current downloads:

Many other new forms of digitometric analyses and performance indicators
will emerge as the Open Access Corpus grows.
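
As a rough illustration of the usage/citation correlation mentioned above,
here is a minimal sketch in Python. It is not how the Usage/Citation
Correlator itself is implemented; the function name and the download and
citation counts are hypothetical, and a plain Pearson coefficient stands in
for whatever statistic a real service would use.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, one entry per paper: downloads shortly after deposit,
# and citations counted some time later.
early_downloads = [120, 45, 300, 10, 80]
later_citations = [14, 3, 40, 0, 9]

r = pearson(early_downloads, later_citations)
print(f"download/citation correlation: r = {r:.2f}")
```

A high r on real data would suggest that early downloads carry predictive
information about later citations; it would not by itself show causation.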

> Similarly, as IR administrators work with publishers
> (including open access as well as more traditional publishers) to directly
> deposit postprint copies of articles and other digital objects in IRs, the
> new IR-Impact Factors could gain a similar weight to the Thomson/ISI
> Impact Factor. It is likely that the IR-Impact Factor could cover
> literature not currently covered by Thomson/ISI, so while the two Impact
> Factors overlap, they would provide some independent means of assessing a
> journal's or article's impact in a given community.

They can, and already do. Their only limit is the limited size of the OA
corpus so far.

> However, there may be another way to create an "Impact Factor-like"
> statistic to analyze open access materials and other published works.
> With the COUNTER standard and similar e-journal statistical tools, it is
> possible for a variety of libraries to merge their user access statistics
> and produce lists of "most accessed papers" or "most accessed ejournals"
> for given fields.

These are the download statistics that Tim Brody's Citebase and
Usage/Citation Correlator already gather. As the OA corpus grows, there
will no doubt be cross-archive arrangements for monitoring, storing and
harvesting download statistics along with citation statistics.
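
Pooling such statistics across sites is simple in principle. The following
is a minimal sketch, assuming each site can export a mapping from a shared
paper identifier to a download count; the identifiers, counts and function
names are hypothetical, and no COUNTER report format is parsed here.

```python
from collections import Counter


def pool_downloads(per_site_counts):
    """Sum per-paper download counts reported by several sites."""
    pooled = Counter()
    for counts in per_site_counts:
        # Counter.update adds counts to existing totals instead of replacing them.
        pooled.update(counts)
    return pooled


def most_accessed(pooled, n=3):
    """Return the n most-downloaded papers as (identifier, count) pairs."""
    # Sort by count (descending), then by identifier so ties are deterministic.
    return sorted(pooled.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical per-site exports keyed by a shared paper identifier.
site_a = {"paper:001": 40, "paper:002": 15}
site_b = {"paper:001": 25, "paper:003": 30}
site_c = {"paper:002": 5, "paper:003": 10}

pooled = pool_downloads([site_a, site_b, site_c])
top = most_accessed(pooled, n=2)
print(top)
```

In practice the hard part is not the arithmetic but agreeing on shared
identifiers and on how to discount robot traffic and repeat downloads.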

> For instance, the NERL (NorthEast Research Library) Consortium could pool
> their statistics to produce such lists, or perhaps the top research
> institutes in a given field (e.g. MIT, Harvard, Stanford, CalTech, etc. in
> physics) could produce the lists. Granted, this "ranking" would be less
> "scientific" than the current Thomson/ISI Impact Factor, but it may still
> serve the purpose our users and readers want, which is defining quality
> and relevance.

The only handicap of OAI digitometrics relative to ISI measures is the
size and scope of the OA corpus. There is nothing less "scientific" about it.

> License agreements would have to be adjusted with publishers to include a
> provision for publishing and pooling the statistical data. Open access
> publishers would have to be willing and able to supply such data as well.

If we wait for OA journals to prevail in order to approach 100% OA
coverage, we will wait till doomsday. OA self-archiving will prevail far
earlier. I doubt that non-OAI publishers will mind pooling usage data
once OA prevails, perhaps even earlier.

> The debate surrounding open access, in part, resides with quality and
> relevance issues. Waiting five years for an Impact Factor, as IOP's New
> Journal of Physics did, could hinder the process of open access
> acceptance. Creating other measures of quality, such as the "pooled
> statistics/ranking" or IR-Impact Factor model above could provide another
> measure, and an earlier one, for many new publications. With many such
> quality models available, individual readers and authors could pick what
> works best for them in determining quality and relevance.

OA Eprint archives will provide early-days metrics and predictors in the
form of download and citation counts, not only for the published final
drafts (postprints) but also for the even earlier-days pre-refereeing
drafts (preprints).

And other, richer digitometric measures will develop too, such as
co-citation statistics (already available with Citebase), Google
PageRank-like weightings (but using citations rather than links),
Hub/Authority analysis, co-text semantic analysis, correlation and
prediction, time-series analysis, and much more. All this awaits is the
growth of the Open Access Corpus.
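
To make the PageRank-like weighting concrete, here is a minimal sketch that
applies the standard PageRank power iteration to a citation graph instead
of a link graph. It is an illustration under stated assumptions, not a
description of what Citebase computes; the function name, the damping value
of 0.85 and the three-paper graph are all hypothetical choices.

```python
def citation_rank(cites, damping=0.85, iterations=50):
    """PageRank-style scores for papers.

    `cites` maps each paper to the list of papers it cites.
    """
    papers = set(cites)
    for targets in cites.values():
        papers.update(targets)
    papers = sorted(papers)
    n = len(papers)

    # Start from a uniform distribution over all papers.
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            targets = cites.get(p, [])
            if targets:
                # A citing paper splits its weight equally among its references.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A paper with no references spreads its weight over all papers.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical graph: A cites C; B cites A and C; C cites nothing.
ranks = citation_rank({"A": ["C"], "B": ["A", "C"], "C": []})
for paper, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 3))
```

Unlike a raw citation count, this weights a citation from a highly cited
paper more heavily than one from a paper that nobody cites.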

Stevan Harnad


Hitchcock, S., Carr, L., Jiao, Z., Bergmark, D., Hall, W., Lagoze, C. &
Harnad, S. (2000) Developing services for open eprint archives:
globalisation, integration and the impact of links. Proceedings of the
5th ACM Conference on Digital Libraries. San Antonio Texas June 2000.

Harnad, Stevan & Carr, L. (2000) Integrating, Navigating and Analyzing
Eprint Archives Through Open Citation Linking (the OpCit Project).
Current Science 79(5): 629-638.

Harnad, Stevan (2001) "Research access, impact and assessment." Times
Higher Education Supplement 1487: p. 16.

Hitchcock, S., Brody, T., Gutteridge, C., Carr, L., Hall, W., Harnad, S.,
Bergmark, D. & Lagoze, C. (2002) Open Citation Linking: The Way Forward.
D-Lib Magazine 8(10), October 2002.

Hitchcock, Steve; Woukeu, Arouna; Brody, Tim; Carr, Les; Hall, Wendy
and Harnad, Stevan. (2003)
Evaluating Citebase, an open access Web-based citation-ranked search and
impact discovery service

Harnad, S., Carr, L., Brody, T. & Oppenheim, C. (2003) Mandated online
RAE CVs Linked to University Eprint Archives:
Improving the UK Research Assessment Exercise whilst making it cheaper
and easier. Ariadne 35 (April 2003).

Harnad, Stevan (2003) Measuring and Maximising UK Research Impact. Times
Higher Education Supplement. Friday, June 6 2003.