Scientometric OAI Search Engines

From: Stevan Harnad <>
Date: Sun, 25 Aug 2002 15:28:55 +0100

Something revolutionary is in the making in the form of scientometric
OAI search engines.

Citebase is a prototype OAI service now available
(free, of course) to give research authors, users, their institutions
and their research-funders a foretaste of what is coming and what
is possible.

Citebase has just been incorporated as an experimental feature for
all users of the Physics Archive -- the largest and most heavily used Eprint Archive to date.

We are hoping that by demonstrating the remarkable possibilities that a
full-text, citation-linked, open-access corpus opens up, Citebase will
help to accelerate the rate at which the refereed research
literature is made openly accessible online through institutional

The mother of all hyperlinks is bibliographic
citation. Google's spectacularly successful system of
ranking digital content by the number of incoming links is simply
a generalized case of the pre-existing scholarly practice of following
the links that authors provide by citing their references. The
number of incoming reference links has long been used for ranking
scholarly/scientific content by, for example, the Institute for
Scientific Information (ISI) in the form of the citation impact factor.
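To make the parallel concrete, here is a minimal Python sketch of ranking by incoming citation links. It assumes a toy corpus in which each article's reference list is already known. The sketch does three things:

- It counts how many articles cite each article.
- It ranks articles by that count.
- It computes a simplified per-journal average of incoming citations per article, in the spirit of the impact factor.

The article IDs, journal names and helper functions are all hypothetical. ISI's actual impact factor is computed over a two-year publication window, which this sketch ignores.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: article ID -> IDs of the articles it cites.
references = {
    "A1": ["B1", "C1"],
    "A2": ["B1", "C1", "C2"],
    "B1": ["C1"],
    "B2": ["B1"],
    "C1": [],
    "C2": ["C1"],
}

# Hypothetical journal in which each article appeared.
journal_of = {"A1": "J-A", "A2": "J-A", "B1": "J-B",
              "B2": "J-B", "C1": "J-C", "C2": "J-C"}


def incoming_citations(refs):
    """Count incoming citation links for every article in the corpus."""
    counts = Counter(dict.fromkeys(refs, 0))  # uncited articles start at 0
    for cited_list in refs.values():
        for cited in set(cited_list):  # a citing article counts once per cited article
            counts[cited] += 1
    return counts


def rank_by_citations(refs):
    """Article IDs sorted by incoming citation count, highest first (ties by ID)."""
    counts = incoming_citations(refs)
    return sorted(counts, key=lambda a: (-counts[a], a))


def average_citations_per_journal(refs, journals):
    """Simplified 'impact factor': mean incoming citations per article, per journal."""
    counts = incoming_citations(refs)
    totals, sizes = defaultdict(int), defaultdict(int)
    for article, journal in journals.items():
        totals[journal] += counts[article]
        sizes[journal] += 1
    return {j: totals[j] / sizes[j] for j in totals}


if __name__ == "__main__":
    print(rank_by_citations(references))
    # ['C1', 'B1', 'C2', 'A1', 'A2', 'B2']
    print(average_citations_per_journal(references, journal_of))
    # {'J-A': 0.0, 'J-B': 1.5, 'J-C': 2.5}
```

This sketch counts each citing article at most once per cited article. A weighted scheme in the style of Google's PageRank would go further and also take into account how highly ranked the citing articles themselves are.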

Here are the elements in the chain:

    (1) A manuscript (preprint) is submitted to a journal for evaluation
    in the form of peer review.

    (2) The manuscript is peer-reviewed, revised and, if successful,
    published under the journal's name, certifying that it has met that
    journal's quality standards.

    (3) The journal's name and track record for quality are then used by
    researchers, research-funders, and the author's own institution as
    one of the guides in evaluating whether the work should be read,
    used, cited and further funded, and whether the author should be
    rewarded through salary increases, promotion, or prizes.

    (4) In addition to the journal-name's established reputation for
    peer-review standards, its "citation impact factor" (the average
    number of citation links to its articles from other articles) is
    used as an evaluative guide by potential users and funders.

    (5) Articles and authors can also be evaluated and ranked, not just
    by the name-brand and citation impact of the journal in which they
    appear, but by the citation impact of each individual article
    and/or author.

    (6) Journal reputations and journal/article/author citation
    impacts can also be supplemented by evaluations in review
    articles and commentaries and by various forms of
    promotion and self-promotion by journals, authors,
    alerting services, and the public press (although these
    evaluations themselves would need to be evaluated, if
    they were not simply to be counted as further citations).

    (7) A new potential measure of on-line impact, not available in the
    on-paper era, is usage, in the form of "hits." This measure is noisy
    (it can be inflated by automated web-crawlers, short-changed by
    intermediate caches, abused by deliberate self-hits from authors,
    and unable to distinguish nonspecific site-browsing from
    item-specific reading), yet it seems to have some signal value too,
    partly correlated with and partly independent of citation impact.

    (8) Nor do citations and hits exhaust the potential of online
    performance indicators. They are just the beginning of a wealth of
    potential scientometric guides to users and evaluators, including
    co-citation analysis, time-series analysis, and other potentially
    predictive analyses of correlations and trends among citations, hits,
    and even articles' content-words that will no doubt be invented and
    discovered as more of this corpus comes online.
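Co-citation analysis is one example of the further analyses listed in (8). Here is a minimal sketch of co-citation counting. Two articles are co-cited whenever a third article cites both of them. Their co-citation strength is the number of such citing articles.

The reference lists and names below are hypothetical. A real service would presumably build these lists from the parsed reference sections of the full texts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing article -> articles it cites.
references = {
    "P1": ["X", "Y", "Z"],
    "P2": ["X", "Y"],
    "P3": ["Y", "Z"],
    "P4": ["X"],
}


def cocitation_counts(refs):
    """For each unordered pair of cited articles, count how many articles cite both."""
    pairs = Counter()
    for cited_list in refs.values():
        # sorted(set(...)) removes duplicates and makes each pair canonical,
        # so (X, Y) and (Y, X) are counted under the same key.
        for a, b in combinations(sorted(set(cited_list)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    for (a, b), n in sorted(cocitation_counts(references).items()):
        print(a, b, n)
    # X Y 2
    # X Z 1
    # Y Z 2
```

Pairs with high co-citation strength are candidates for closely related work. The same counts could feed clustering or time-series analysis as the corpus grows.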

So try out Citebase, and don't forget to supplement your experience with
your imagination:

    (a) Citebase content right now is preponderantly in physics,
    mathematics and computer science. Imagine what it would be like if
    the full-text open-access content
    were up there in all the other disciplines too. (And remember
    that getting it up there depends on -- and waits on -- only

    (b) Notice how natural and useful it feels to navigate the literature
    via citation links, guided by author or article ranking in terms of
    citation impact or hit impact. Imagine how much more useful it will
    feel when all the research literature is up there, gap-free, and
    spawns still newer and more powerful online scientometric guides.

Stevan Harnad


NOTE: A complete archive of the ongoing discussion of providing open
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00 & 01):

Discussion can be posted to:

See also the Budapest Open Access Initiative:

and the Free Online Scholarship Movement: