Enriching the Impact Regression Equation

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sun, 16 Jan 2005 15:01:57 +0000 (GMT)

In the OACI Leiden statement (if there is to be one)
the following constructive recommendations could perhaps be made:

The 2-year average number of citations to a journal (i.e., the ISI impact
factor) is not meaningless and unpredictive, but merely a needlessly
crude measure of the impact of either an article, an author or a journal.
It can be greatly refined and improved.

Apart from exact citation counts for articles (and authors), and apart
from avoiding the comparison of apples with oranges (by making sure these
measures are used in comparing like with like), there are obvious ways
that even journal impact factors could be made far more accurate and
representative of true research impact.

Right now, "like tends to cite like" in more ways than one! Not only do
articles in phytology tend to cite articles in phytology, but average
research tends to cite average research. This means that there is
necessarily a quantitative citation bulge toward the middle (mean) of the
distribution that masks any far more important qualitative impact from
the smaller, higher-quality tail-end of the distribution.

There are at least five ways that this could be remedied -- and it makes
no sense to wait for ISI, with their primary need to pay more attention
to market matters, to get around to doing all this for us. A growing
Open Access full-text corpus can count on many talented and enterprising
doctoral students like Tim Brody doing this and more:

(1) RECURSIVE "CiteRank": A recursive measure of citation weight could
replace flat citation counting: If article A cites article B, article
A's citation weight is not 1 but a normalized multiple of 1 based on the
number of citations the *citing* article has itself received. This would
go some way toward replacing the pure weight of numbers by a recursive
measure of the weight of the numbers (without ever yet leaving the
circle of citation counts themselves). Average work will lose some
of its strength-of-numbers unless it manages to draw citations from
above-average articles too (still in terms of citation counts).

[This recursive technique is analogous to Google's PageRank, hence could
perhaps be called "CiteRank"; it is ironic that Google got the idea of
PageRank from citation ranking, but then improved it, yet the improvement
has not yet percolated back to citation ranking, because ISI had no
particular motive to implement it -- perhaps even a disincentive, as it
might reduce the journal impact factor of the large, average journals
which are of necessity ISI's numerical mainstay!]
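
A minimal sketch of the recursive weighting in (1), assuming a
PageRank-style iteration. The function name cite_rank, the damping value
0.85 and the toy graph are illustrative assumptions, not an existing
implementation; dividing a citer's weight by the length of its reference
list is PageRank's own normalization, borrowed here as one possible
reading of "normalized multiple".

```python
def cite_rank(citations, damping=0.85, iterations=50):
    """Recursive citation weight, PageRank-style.

    citations maps each citing article to the list of articles it cites.
    Returns a dict of article -> weight; the weights sum to 1.
    """
    articles = set(citations)
    for cited in citations.values():
        articles.update(cited)
    n = len(articles)
    rank = {a: 1.0 / n for a in articles}

    for _ in range(iterations):
        # Every article keeps a small baseline weight.
        new_rank = {a: (1.0 - damping) / n for a in articles}
        for a in articles:
            cited = citations.get(a, [])
            if cited:
                # A citer passes its own weight on, split over its references.
                share = damping * rank[a] / len(cited)
                for b in cited:
                    new_rank[b] += share
            else:
                # An article with no references spreads its weight evenly.
                share = damping * rank[a] / n
                for b in articles:
                    new_rank[b] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: B and E each receive exactly two citations.
toy = {"A": ["B"], "C": ["B"], "D": ["E"], "B": ["E"]}
ranks = cite_rank(toy)
for article in sorted(ranks, key=ranks.get, reverse=True):
    print(article, round(ranks[article], 3))
```

In this toy graph B and E have the same flat citation count, but E ends
up ranked above B because one of its citers (B) is itself cited, whereas
B's citers are uncited.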

(2) USAGE COUNTS: The circularity of citation counting can also be broken
in various ways. One is by adding download counts to the impact measure,
not as a weight on the citation count, but as a second variable in a
multiple regression equation. We know now from Tim Brody's findings that
downloads correlate with and hence predict citations. That means citation
counts plus download counts are better predictors of impact than just
citation counts alone, and are especially good at correcting for early
impact, which may not yet be felt in the citation counts.
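
To make "a second variable in a multiple regression equation" concrete,
here is a rough sketch that fits later impact on early citation and
download counts by ordinary least squares. The function name, the choice
of later citations as the criterion, and the numbers are all invented
for illustration (the toy data are constructed to lie exactly on a
plane); they are not Tim Brody's data or method.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one tuple of predictor values per article.
    targets: the criterion value for each article.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic example: (early citations, early downloads) per article,
# with later impact generated as 1 + 2*citations + 0.5*downloads.
predictors = [(0, 0), (1, 0), (0, 2), (1, 2), (3, 4)]
later_impact = [1.0, 3.0, 2.0, 4.0, 9.0]
intercept, w_citations, w_downloads = fit_least_squares(predictors, later_impact)
print(round(intercept, 6), round(w_citations, 6), round(w_downloads, 6))
```

Further variables (ratings, self-citation scores) would simply be extra
columns in each predictor tuple.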

(3) RATING SCORES: A more radical break from the circularity of citation
counting is to use rating scores, in one of two ways. Systematic rating polls
can easily be conducted, asking researchers (by field and subfield) to
rank the N most important articles in their field in the past year (or
two). Even with the inevitable noise from incest, bias and subjectivity
that this will evoke, a good-sized systematic sample will still pick out
the recurrent articles (because, by definition, local-average mediocrity
effects/biases are merely local) and then the rankings could either
be used as (3a) a third independent variable in the impact regression
equation or, perhaps more interestingly, as (3b) another constraint on the
weighting of the CiteRank score (effectively making that weight the result
of a 2nd order regression equation based on the citer's citation count
as well as on the citer's rating score: the download count could also be
used instead as a 3rd component in this 2nd order regression). The result
will be a still better adjustment of the citation count for an article
(and hence an adjustment of the journal's average citation count too).
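
One simple way to turn such ranked polls into a single rating score per
article is a Borda count, sketched below. Borda counting is a swapped-in
aggregation rule chosen here for illustration, not something specified
above, and the ballots are hypothetical.

```python
from collections import defaultdict


def borda_scores(ballots, n):
    """Aggregate ranked ballots into one rating score per article.

    Each ballot is one researcher's ranked list (best first) of up to n
    articles. First place earns n points, second n - 1, and so on.
    """
    scores = defaultdict(int)
    for ballot in ballots:
        for position, article in enumerate(ballot[:n]):
            scores[article] += n - position
    return dict(scores)


# Three hypothetical researchers each rank their top 3 articles.
ballots = [["A", "B", "C"], ["B", "A", "D"], ["A", "D", "B"]]
print(borda_scores(ballots, 3))
```

The resulting score could then serve either as a regression variable
(3a) or as an extra weight on the citer (3b).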

(4) CO-CITATION & HUB-AUTHORITY SCORES: Although I would need to consult
with a statistician to sort it out optimally, I am certain that
co-citation (what article/author is co-cited with what article/author)
can also be used to correct or add to the impact regression equation. So,
I expect, could a hub (fan-out) and authority (fan-in) score, as well
as a better use of citation latency (ISI's "immediacy index") in the
impact equation.
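
As a starting point for (4), raw co-citation counts can be tallied
directly from reference lists. The sketch below only counts how often
two articles appear together in the same reference list; how those
counts should enter the regression is left open, and the function name
and data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(reference_lists):
    """Count how often each pair of articles is cited by the same paper.

    reference_lists maps a citing paper to the list of articles it cites.
    Returns a Counter keyed by alphabetically ordered (article, article) pairs.
    """
    counts = Counter()
    for refs in reference_lists.values():
        # set() ignores duplicate references within one paper.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists for three citing papers.
refs = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C"]}
for pair, count in sorted(co_citation_counts(refs).items()):
    print(pair, count)
```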

(5) AUTHOR/JOURNAL SELF-CITATIONS: Another clean-up factor for citation
counts is of course the correction for self-citations, which would
be interesting not only for author self-citations, but also journal
self-citations: This too might be added as a further pair of variables in
the regression equation (author self-citation score and journal
self-citation score), with each weight adjusting itself as the variable
proves its predictivity.
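
A rough sketch of the author self-citation correction, assuming a
citation counts as a self-citation whenever the citing and cited
articles share at least one author. That definition, the function name
and the data are assumptions for illustration; the journal analogue
would map each article to its journal instead of its authors.

```python
def citation_counts(citations, authors):
    """Return (total, external) citation counts per cited article.

    citations: list of (citing, cited) article pairs.
    authors: dict mapping each article to a set of author names.
    external excludes citations where citer and cited share an author.
    """
    total, external = {}, {}
    for citing, cited in citations:
        total[cited] = total.get(cited, 0) + 1
        shared = authors.get(citing, set()) & authors.get(cited, set())
        if not shared:
            external[cited] = external.get(cited, 0) + 1
    return total, external


# Hypothetical data: X2 cites X1 and shares an author with it.
authors = {"X1": {"Smith"}, "X2": {"Smith", "Lee"}, "X3": {"Jones"}}
citations = [("X2", "X1"), ("X3", "X1"), ("X3", "X2")]
total, external = citation_counts(citations, authors)
print(total, external)
```

The self-citation score for an article is then the difference between
its total and external counts.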

The predictivity and validity of the regression equation should of
course also be actively tested and calibrated by validating it against
(a) later citation impact, (b) subjective impact ratings (3, above), (c)
other impact measures such as prizes, funding, and time-line descendants
that are further than one citation-step away (A is cited by B, B is
cited by C: this could be an uncited credit to A...)
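
The "further than one citation-step away" idea in (c) can be sketched as
a lookup of second-order citers: articles that cite a citer of A without
citing A directly. The function name and the toy data are hypothetical.

```python
def second_order_citers(cited_by, article):
    """Articles exactly two citation-steps downstream of `article`.

    cited_by maps an article to the list of articles that cite it.
    Direct citers, and the article itself, are excluded from the result.
    """
    direct = set(cited_by.get(article, []))
    indirect = set()
    for citer in direct:
        indirect.update(cited_by.get(citer, []))
    return indirect - direct - {article}


# Hypothetical graph: A is cited by B and D; B is cited by C and D.
cited_by = {"A": ["B", "D"], "B": ["C", "D"]}
print(second_order_citers(cited_by, "A"))
```

Here C never cites A, yet reaches it through B, so C is the kind of
"uncited credit" such a measure would pick up.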

And all of this is without even mentioning full-text "semantic" analysis.
So the potential world of impact analysis is a rich and diverse one. Let
us not be parochial, focusing only on the limits of the ISI 2-year
average journal citation-count that has become so mindlessly overused by
libraries and assessors. Let us talk instead about the positive horizons
OA opens up!

Cheers, Stevan