
From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sun, 16 Jan 2005 15:01:57 +0000 (GMT)

In the OACI Leiden statement (if there is to be one)

http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4082.html

the following constructive recommendations could perhaps be made:

The 2-year average number of citations to a journal (i.e., the ISI impact
factor) is not meaningless and unpredictive, but merely a needlessly
crude measure of the impact of either an article, an author or a journal.
It can be greatly refined and improved.
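
For reference, the standard 2-year impact factor for year Y divides the
citations received during Y by items a journal published in Y-1 and Y-2
by the number of citable items it published in those two years. A
minimal Python sketch; the function name and the dictionary inputs are
hypothetical, chosen only for illustration:

```python
def impact_factor(citations_in_year, citable_items, year):
    """2-year impact factor for `year` (illustrative sketch).

    citations_in_year: hypothetical dict, publication year -> citations
        received during `year` by items the journal published that year.
    citable_items: hypothetical dict, publication year -> citable items.
    """
    window = (year - 1, year - 2)
    cites = sum(citations_in_year.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the 2-year window")
    return cites / items


# Invented journal: 300 + 200 citations in 2004 to 100 + 150 items.
print(impact_factor({2003: 300, 2002: 200}, {2003: 100, 2002: 150}, 2004))  # 2.0
```

A single journal-level average like this says nothing about how those
citations are spread across the journal's individual articles.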

Apart from exact citation counts for articles (and authors), and apart
from avoiding the comparison of apples with oranges (by making sure these
measures are used in comparing like with like), there are obvious ways
that even journal impact factors could be made far more accurate and
representative of true research impact.

Right now, "like tends to cite like" in more ways than one! Not only do
articles in phytology tend to cite articles in phytology, but average
research tends to cite average research. This means that there is
necessarily a quantitative citation bulge toward the middle (mean) of the
distribution that masks any far more important qualitative impact from
the smaller, higher-quality tail-end of the distribution.

There are at least five ways that this could be remedied -- and it makes
no sense to wait for ISI, with their primary need to pay more attention
to market matters, to get around to doing all this for us. A growing
Open Access full-text corpus can count on many talented and enterprising
doctoral students like Tim Brody doing this and more:

(1) RECURSIVE "CiteRank": A recursive measure of citation weight could
replace flat citation counting: if article A cites article B, article
A's citation weight is not 1 but a normalized multiple of 1 based on the
number of citations the *citing* article has itself received. This would
go some way toward replacing the pure weight of numbers by a recursive
measure of the weight of the numbers (without ever yet leaving the
circle of citation counts themselves). Average work will lose some
of its strength-of-numbers unless it manages to draw citations from
above-average articles too (still in terms of citation counts).

[This recursive technique is analogous to Google's PageRank, hence could
perhaps be called "CiteRank"; it is ironic that Google got the idea of
PageRank from citation ranking, but then improved it, yet the improvement
has not yet percolated back to citation ranking, because ISI had no
particular motive to implement it -- perhaps even a disincentive, as it
might reduce the journal impact factor of the large, average journals
which are of necessity ISI's numerical mainstay!]
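
The recursive weighting in (1) can be sketched as a PageRank-style power
iteration over a citation graph. This is a minimal sketch, not ISI's or
Google's actual implementation: the `cite_rank` name, the dict-of-lists
graph format, and the 0.85 damping factor (the value commonly quoted for
PageRank) are all assumptions made for illustration:

```python
def cite_rank(cites, damping=0.85, iterations=100, tol=1e-10):
    """PageRank-style recursive citation score (hypothetical helper).

    cites: dict mapping each article id to the list of ids it cites.
    Returns a dict of scores summing to 1.
    """
    articles = set(cites)
    for refs in cites.values():
        articles.update(refs)
    articles = sorted(articles)
    n = len(articles)
    rank = {a: 1.0 / n for a in articles}
    for _ in range(iterations):
        # Score held by articles that cite nothing is spread evenly over all.
        dangling = sum(rank[a] for a in articles if not cites.get(a))
        new = {a: (1.0 - damping) / n + damping * dangling / n for a in articles}
        for citer, refs in cites.items():
            if refs:
                # Each citation passes on a share of the *citing* article's score.
                share = damping * rank[citer] / len(refs)
                for cited in refs:
                    new[cited] += share
        delta = sum(abs(new[a] - rank[a]) for a in articles)
        rank = new
        if delta < tol:
            break
    return rank


# B and C each have one citation, but B's citer (A) is itself cited twice.
scores = cite_rank({"A": ["B"], "X": ["A"], "Y": ["A"], "Z": ["C"]})
print(scores["B"] > scores["C"])  # True
```

Flat counting would tie B and C; the recursive score separates them
because it looks at who is doing the citing.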

(2) USAGE COUNTS: The circularity of citation counting can also be broken
in various ways. One is by adding download counts to the impact measure,
not as a weight on the citation count, but as a second variable in a
multiple regression equation. We know now from Tim Brody's findings that
downloads correlate with and hence predict citations. That means citation
counts plus download counts are better predictors of impact than just
citation counts alone, and are especially good at correcting for early
impact, which may not yet be felt in the citation counts.

http://www.ecs.soton.ac.uk/~harnad/Temp/timcorr.doc
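
Using downloads as a second predictor, rather than as a weight, amounts
to an ordinary least-squares fit with two independent variables. A
minimal dependency-free sketch, assuming per-article records of early
citations and downloads and using later citations as the outcome to be
predicted; the `ols` name and the toy data are hypothetical:

```python
def ols(rows, y):
    """Least squares via the normal equations (X^T X) b = X^T y.

    rows: list of predictor tuples, e.g. (early_citations, downloads).
    y: list of outcomes, e.g. citations received later.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0, *map(float, r)] for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data, built so that later = 1 + 2*citations + 0.5*downloads.
early = [(2, 40), (5, 10), (1, 100), (8, 60), (3, 20)]  # (citations, downloads)
later = [25, 16, 53, 47, 17]
intercept, b_cites, b_downloads = ols(early, later)
print(round(intercept, 6), round(b_cites, 6), round(b_downloads, 6))
```

With real data one would compare this two-predictor fit against a
citations-only fit; a library routine such as NumPy's
`numpy.linalg.lstsq` performs the same fit more robustly.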

(3) RATING SCORES: A more radical way to break out of the circularity of
citation counting is through systematic rating polls, which can easily
be conducted, asking researchers (by field and subfield) to rank the N
most important articles in their field in the past year (or two). Even
with the inevitable noise from incest, bias and subjectivity that this
will evoke, a good-sized systematic sample will still pick out the
recurrent articles (because, by definition, local-average mediocrity
effects/biases are merely local). The rankings could then be used in
two ways: either (3a) as a third independent variable in the impact
regression equation or, perhaps more interestingly, (3b) as another
constraint on the weighting of the CiteRank score (effectively making
that weight the result of a 2nd-order regression equation based on the
citer's citation count as well as on the citer's rating score; the
download count could also be used as a 3rd component in this 2nd-order
regression). The result will be a still better adjustment of the
citation count for an article (and hence an adjustment of the journal's
average citation count too).
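
Item (3) leaves both the poll aggregation and the weighting function
open, so the sketch below fills them with assumptions. Ballots are
combined with a Borda count (a swapped-in technique, not one the post
specifies), and for option (3b) each citation's weight is a linear
function of the citer's citation count and poll score. The function
names and the coefficients `a`, `b`, `c` are hypothetical placeholders
that a fitted regression would replace:

```python
from collections import Counter


def poll_scores(ballots, n):
    """Borda-style aggregation of top-N ballots (hypothetical helper).

    Each ballot lists article ids, most important first. Rank 1 earns n
    points and rank n earns 1, so articles that recur across many ballots
    accumulate points while idiosyncratic picks stay low.
    """
    scores = Counter()
    for ballot in ballots:
        for position, article in enumerate(ballot[:n]):
            scores[article] += n - position
    return scores


def rated_citation_count(cited, cites, citation_counts, ratings,
                         a=1.0, b=1.0, c=1.0):
    """Citations to `cited`, each weighted by the citing article's own
    citation count and poll rating (option 3b). a, b, c are placeholder
    coefficients, not fitted values."""
    total = 0.0
    for citer, refs in cites.items():
        if cited in refs:
            total += (a + b * citation_counts.get(citer, 0)
                      + c * ratings.get(citer, 0))
    return total


ballots = [["P1", "P2", "P3"], ["P2", "P1", "P4"], ["P5", "P2", "P1"]]
ratings = poll_scores(ballots, 3)
print(ratings.most_common(2))  # [('P2', 7), ('P1', 6)]
```

In practice the citation counts and poll points live on different scales,
which is exactly what fitting `a`, `b` and `c` by regression would
absorb.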

(4) CO-CITATION & HUB-AUTHORITY SCORES: Although I would need to consult
with a statistician to sort it out optimally, I am certain that
co-citation (what article/author is co-cited with what article/author)
can also be used to correct or add to the impact regression equation. So,
I expect, could a hub (fan-out) and authority (fan-in) score, as well
as a better use of citation latency (ISI's "immediacy index") in the
impact equation.
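
Both ideas in (4) can be computed directly from a citation graph.
Co-citation counts how often two articles appear together in one
reference list; hub and authority scores follow Kleinberg's HITS
iteration, where an authority is cited by good hubs and a hub cites good
authorities. A minimal sketch; the function names and dict-of-lists
graph format are assumptions:

```python
from collections import Counter
from itertools import combinations


def co_citations(cites):
    """Count how often each pair of articles is cited by the same article."""
    pairs = Counter()
    for refs in cites.values():
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs


def hits(cites, iterations=50):
    """HITS-style hub and authority scores (hypothetical helper)."""
    nodes = set(cites)
    for refs in cites.values():
        nodes.update(refs)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority (fan-in): sum of hub scores of the articles citing it.
        auth = {v: 0.0 for v in nodes}
        for citer, refs in cites.items():
            for cited in set(refs):
                auth[cited] += hub[citer]
        # Hub (fan-out): sum of authority scores of the articles it cites.
        hub = {v: sum(auth[c] for c in set(cites.get(v, ()))) for v in nodes}
        # Normalize both to unit Euclidean length so scores stay bounded.
        for scores in (auth, hub):
            norm = sum(s * s for s in scores.values()) ** 0.5 or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth


graph = {"R1": ["A", "B"], "R2": ["A", "B"], "R3": ["A"], "R4": ["C"]}
print(co_citations(graph))  # Counter({('A', 'B'): 2})
hub, auth = hits(graph)
```

How these scores should enter the regression is left open, as in the
post; they are simply further candidate variables.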

(5) AUTHOR/JOURNAL SELF-CITATIONS: Another clean-up factor for citation
counts is of course the correction for self-citations, which would
be interesting not only for author self-citations, but also for journal
self-citations: this too might be added as a further pair of variables in
the regression equation (author self-citation score and journal
self-citation score), with each weight adjusting itself as the variable
proves its predictivity.
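
The pair of self-citation variables in (5) could be derived per article
as the share of its incoming citations that come from its own authors or
from its own journal. A minimal sketch, assuming a hypothetical metadata
schema with an author-name set and a journal name per article; matching
authors by exact name is a naive assumption that ignores homonyms and
name variants:

```python
from collections import Counter


def self_citation_shares(cites, meta):
    """Per cited article: (author self-citation share, journal self-citation share).

    cites: dict citing id -> list of cited ids.
    meta: hypothetical dict id -> {"authors": set of names, "journal": str}.
    """
    totals, author_self, journal_self = Counter(), Counter(), Counter()
    for citer, refs in cites.items():
        for cited in refs:
            totals[cited] += 1
            # Author self-citation: the two articles share at least one author.
            if meta[citer]["authors"] & meta[cited]["authors"]:
                author_self[cited] += 1
            # Journal self-citation: both articles appeared in the same journal.
            if meta[citer]["journal"] == meta[cited]["journal"]:
                journal_self[cited] += 1
    return {a: (author_self[a] / n, journal_self[a] / n)
            for a, n in totals.items()}


meta = {
    "P": {"authors": {"Smith"}, "journal": "J1"},
    "Q": {"authors": {"Smith", "Lee"}, "journal": "J2"},
    "R": {"authors": {"Kim"}, "journal": "J1"},
    "S": {"authors": {"Ng"}, "journal": "J3"},
}
print(self_citation_shares({"Q": ["P"], "R": ["P"], "S": ["P"]}, meta))
```

Here P has three citations, one from a co-author (Q) and one from its own
journal (R), so both shares are one third.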

The predictivity and validity of the regression equation should of
course also be actively tested and calibrated by validating it against
(a) later citation impact, (b) subjective impact ratings (3, above), and
(c) other impact measures such as prizes, funding, and time-line
descendants that are further than one citation-step away (A is cited by
B, B is cited by C: this could be an uncited credit to A...)
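
Check (a) can be run as a rank correlation between the equation's
predicted scores and citations received later, and the indirect credit
in (c) requires walking the citation graph two steps. A minimal sketch;
the choice of Spearman's rank correlation is an assumption (it only asks
whether the equation orders articles correctly), and the function names
and the `cited_by` mapping (article -> articles citing it) are
hypothetical:

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def two_step_descendants(article, cited_by):
    """Articles citing a citer of `article` without citing it directly:
    the 'A is cited by B, B is cited by C' credit."""
    direct = set(cited_by.get(article, ()))
    indirect = set()
    for b in direct:
        indirect.update(cited_by.get(b, ()))
    return indirect - direct - {article}


predicted = [0.9, 0.4, 0.7, 0.1]   # invented scores from the equation
later_cites = [30, 8, 12, 2]       # invented later citation counts
print(spearman(predicted, later_cites))
print(two_step_descendants("A", {"A": ["B"], "B": ["C", "D"], "D": ["E"]}))
```

The same `spearman` check could be reused against poll ratings for (b).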

And all of this is without even mentioning full-text "semantic" analysis.

So the potential world of impact analysis is a rich and diverse one. Let
us not be parochial, focusing only on the limits of the ISI 2-year
average journal citation-count that has become so mindlessly overused by
libraries and assessors. Let us talk instead about the positive horizons
OA opens up!

Cheers, Stevan

Received on Sun Jan 16 2005 - 15:01:57 GMT

