Merits of Multiple Post-Publication Metrics Do Not Relegate Peer Review To Generic "Pass/Fail"

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Thu, 23 Jul 2009 17:35:28 -0400

[Hyperlinked version of this posting:
http://openaccess.eprints.org/index.php?/archives/612-guid.html]

Patterson, Mark (2009) "PLoS Journals - measuring impact where it
matters" https://www.plos.org/cms/node/478/ writes:

"[R]eaders tend to navigate directly to the articles that are relevant
to them, regardless of the journal they were published in... [T]here
is a strong skew in the distribution of citations within a journal --
typically, around 80% of the citations accrue to 20% of the
articles... [W]hy then do researchers and their paymasters remain
wedded to assessing individual articles by using a metric (the impact
factor) that attempts to measure the average citations to a whole
journal?

"We?d argue that it?s primarily because there has been no strong
alternative. But now alternatives are beginning to emerge... focusing
on articles rather than journals... [and] not confining article-level
metrics to a single indicator... Citations can be counted more
broadly, along with web usage, blog and media coverage, social
bookmarks, expert/community comments and ratings, and so on...

"[J]udgements about impact and relevance can be left almost entirely
to the period after publication. By peer-reviewing submissions purely
for scientific rigour, ethical conduct and proper reporting before
publication, articles can be assessed and published rapidly. Once
articles have joined the published literature, the impact and
relevance of the article can then be determined on the basis of the
activity of the research community as a whole... [through]
[a]rticle-level metrics and indicators..."

Merits of Metrics. Of course direct article and author citation counts
are infinitely preferable to -- and more informative than -- just a
journal average (the journal "impact factor").
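To make the arithmetic concrete, here is a minimal illustrative sketch
(plain Python, with invented citation counts, not PLoS data) of how a
journal-level average can sit far from most of its own articles when
citations are skewed in roughly the 80/20 way quoted above:

    # Invented citation counts for ten articles in one hypothetical journal:
    # two heavily cited papers and a long tail, mimicking the skew quoted above.
    citations = [120, 80, 9, 7, 5, 4, 3, 2, 1, 0]

    journal_average = sum(citations) / len(citations)          # impact-factor-style mean
    median_article = sorted(citations)[len(citations) // 2]    # a "typical" article
    top_fifth_share = sum(sorted(citations, reverse=True)[:2]) / sum(citations)

    print(f"journal-level average:      {journal_average:.1f}")   # 23.1
    print(f"median article's citations: {median_article}")        # 5
    print(f"citation share of top 20%:  {top_fifth_share:.0%}")   # 87%

On these invented numbers the journal-level mean lands near 23 even
though the typical (median) article has about 5 citations: that gap is
the difference between a journal impact factor and a direct
article-level count.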

And yes, multiple post-publication metrics will be a great help in
navigating, evaluating and analyzing research influence, importance
and impact.

But it is a great mistake to imagine that this implies that peer
review can now be done on just a generic "pass/fail" basis.

Purpose of Peer Review. Not only is peer review dynamic and
interactive -- improving papers before approving them for publication
-- but the planet's 25,000 peer-reviewed journals differ not only in
the subject matter they cover but also, within a given subject matter,
often quite substantially, in their respective quality standards and
criteria.

It is extremely unrealistic (and would be highly dysfunctional if it
were ever made a reality) to suppose that these 25,000 journals can
(or ought to) be flattened into a single 0/1 pass/fail decision on
publishability at some generic level common to all refereed research.

Pass/Fail Versus Letter-Grades. Nor is it just a matter of switching
all journals from pass/fail to a letter grade system (A, B+, etc.),
although that is effectively what the system of multiple, independent
peer-reviewed journals provides. For not only do journal peer-review
standards and criteria differ, but the expertise of their respective
"peers" differs too. Better journals have better and more referees,
exercising more rigorous peer review. (So it is not that one generic
journal accepts papers for publication with grades ranging from A+ on
down; rather, there are A+ journals, B- journals, etc.)

Track Records and Quality Standards. And users know all this, from the
established track records of journals. Whether we like it or not, this
all boils down to selectivity across a Gaussian distribution of
research quality in each field. There are highly selective journals
that accept only the very best papers, and even those often only after
several rounds of rigorous refereeing, revision and re-refereeing; and
there are less selective journals that impose less exacting standards,
all the way down to the fuzzy pass/fail threshold that distinguishes
refereed journals from journals whose standards are so low that they
are virtually vanity-press journals.

Supplement Versus Substitute. This difference (and independence) among
journals in their quality standards is essential if peer review is to
serve as the quality enhancer and filter that it is intended to be. Of
course the system is imperfect, and for just that reason alone
(amongst many others) a rich diversity of post-publication metrics is
an invaluable supplement to peer review. But such metrics are
certainly no substitute for it.

Quality Distribution. On the basis of a generic 0/1 quality threshold,
researchers cannot decide rationally or reliably what new research is
worth the time and investment to read, use and try to build upon.
Researchers differ in quality too, and they are entitled to know a
priori, as they do now, whether or not a newly published work has made
the highest quality cut, rather than merely that it has met some
generic default standard, leaving them to wait for multiple
post-publication metrics to accumulate before any more nuanced quality
assessment becomes possible.

Rejection Rates. Fine-grained sorting is precisely what peer review is
about, and for, especially at the highest quality levels. Although
authors (knowing the quality track-records of their journals) mostly
self-select, submitting their papers to journals whose standards are
roughly commensurate with their papers' quality, the underlying
correlate of a journal's refereeing quality standards is basically its
rejection rate: what percentage of the annual papers in its subject
matter would meet its standards (if all were submitted to that
journal, and the only constraint were the quality level of the
article, not how many articles the journal could manage to referee and
publish per year)?
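As a purely illustrative sketch (mine, not the posting's: the Gaussian
quality assumption follows the rough characterization above, and the
tier labels and rejection rates are invented), the link between a
journal's rejection rate and its effective quality bar can be made
explicit: a journal that would reject a fraction r of its field's
annual papers is, in effect, setting its cutoff at the r-quantile of
the field's quality distribution.

    from statistics import NormalDist

    # Illustrative assumption: article "quality" within a field is roughly
    # Gaussian (mean 0, standard deviation 1), and a journal's standards
    # amount to accepting only papers above some quality cutoff.
    quality = NormalDist(mu=0.0, sigma=1.0)

    def implied_cutoff(rejection_rate: float) -> float:
        """Quality level a paper must exceed, given the journal's rejection rate."""
        return quality.inv_cdf(rejection_rate)

    # Hypothetical journal tiers and rejection rates, for illustration only.
    for label, r in [("A+ journal", 0.95), ("B journal", 0.70), ("pass/fail floor", 0.20)]:
        cutoff = implied_cutoff(r)
        print(f"{label:15s} rejection rate {r:.0%} -> cutoff {cutoff:+.2f} sd from the field mean")

On those invented numbers, a 95% rejection rate corresponds to a bar
about 1.6 standard deviations above the field mean, a 70% rate to
about 0.5, and a 20% rate to a bar below the mean; that is the
letter-grade continuum described in the next paragraph, expressed as
selectivity.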

Quality Ranges. This independent standard-setting by journals
effectively arrays the 25,000 along a rough letter-grade continuum
within each field, and their "grades" are roughly known to authors and
users from the journals' track-records for quality.

Quality Differential. Making peer review generic would wipe out that
differential quality information for new research and force
researchers at all levels to take pot luck with newly published
research (until and unless enough time has elapsed to sort out the
rest of the quality variance with post-publication metrics).

Turn-Around Time. Now, pre-publication peer review takes time too; but
if it sorts the quality of new publications in terms of known,
reliable letter-grade standards (the journals' names and
track-records), then it is time well spent. Offloading that dynamic
pre-filtering function onto post-publication metrics, no matter how
rich and plural, would greatly handicap research progress, especially
at its all-important highest quality levels.

More Value From Post-Publication Metrics Does Not Entail Less Value
From Pre-Publication Peer Review. It would be ironic if the valid and
timely call for a wider and richer variety of post-publication metrics
-- in place of just the unitary journal average (the journal impact
factor) -- were coupled with an ill-considered call for collapsing the
planet's wide and rich variety of peer-reviewed journals and quality
levels onto a unitary global pass/fail grade.
