Re: Manual Evaluation of Algorithm Performance on Identifying OA

From: Stevan Harnad <>
Date: Fri, 20 Jan 2006 13:14:51 +0000 (GMT)

Before anyone gets too excited about the tiny Goodman et al.
test result, may I suggest waiting a couple of weeks, until we
report the results of a far bigger and more accurate test
of the robot's accuracy?
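For readers unfamiliar with how such an accuracy test is scored, here is a minimal sketch. It assumes each sampled article has one label assigned by the robot (OA or not) and one assigned by manual checking. The `Check` record, the `accuracy_summary` function and the toy sample are hypothetical illustrations, not the Goodman et al. data or the code used in either test.

```python
from dataclasses import dataclass
import math


@dataclass
class Check:
    """One manually checked article (hypothetical record layout)."""
    robot_says_oa: bool   # label assigned by the robot
    manual_says_oa: bool  # label assigned by a human checker


def accuracy_summary(checks):
    """Score robot labels against manual labels, treating 'OA' as the positive class."""
    tp = sum(c.robot_says_oa and c.manual_says_oa for c in checks)
    fp = sum(c.robot_says_oa and not c.manual_says_oa for c in checks)
    fn = sum(not c.robot_says_oa and c.manual_says_oa for c in checks)
    tn = sum(not c.robot_says_oa and not c.manual_says_oa for c in checks)
    return {
        # share of all checked articles on which robot and human agree
        "accuracy": (tp + tn) / len(checks),
        # share of truly OA articles the robot found (NaN if none were OA)
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        # share of truly non-OA articles the robot correctly rejected
        "specificity": tn / (tn + fp) if tn + fp else math.nan,
    }


# Toy sample, invented for illustration only.
sample = [
    Check(True, True),
    Check(True, True),
    Check(True, False),
    Check(False, False),
    Check(False, True),
]
print(accuracy_summary(sample))
```

Sensitivity and specificity are reported separately because overall agreement alone can look respectable while the robot still misclassifies much of the smaller class; how large the manual sample is determines how much any of these figures can be trusted.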

Those who (for some reason) were hoping that the robot would prove
too inaccurate and that the findings on the OA advantage would prove
invalid may be disappointed with the outcome. I can already say that
reading the tiny Goodman et al. test as showing that the OA/OAA
findings to date are "worthless" is an overstatement even on the
meagre evidence to date, especially since two thirds of the published
findings on the OA citation advantage are not even robot-based!

(This shrillness also seems to me to be trying to make rather much out
of having actually done rather little!)

As to the separate issue of how to treat the OA journal article
counts (as opposed to the counts for the self-archived non-OA
journal articles): We count it all, of course, but only use
the non-OA journal article counts in calculating the OA advantage,
because those are (necessarily) within-journal ratios, and citation
ratios of zero and infinity are meaningless. Think about it.
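A minimal sketch of the exclusion rule described above, assuming each article is recorded as a (journal, is_oa, citations) tuple and that the advantage is expressed as the ratio of mean OA to mean non-OA citations within a journal. The function name, the record layout and the toy data are hypothetical; this is not the code used in the studies discussed here.

```python
from statistics import mean


def within_journal_oa_ratios(articles):
    """Return {journal: mean OA citations / mean non-OA citations}.

    `articles` is an iterable of (journal, is_oa, citations) tuples; this
    record layout is a hypothetical assumption for illustration.
    """
    by_journal = {}
    for journal, is_oa, citations in articles:
        groups = by_journal.setdefault(journal, {True: [], False: []})
        groups[is_oa].append(citations)

    ratios = {}
    for journal, groups in by_journal.items():
        oa, non_oa = groups[True], groups[False]
        # If an empty group were counted as zero citations, a journal with no
        # OA articles would give a ratio of 0 and one with no non-OA articles
        # (e.g. an OA journal) an infinite ratio; neither is a comparison.
        if not oa or not non_oa:
            continue
        non_oa_mean = mean(non_oa)
        if non_oa_mean == 0:
            continue  # assumption: skip rather than divide by zero
        ratios[journal] = mean(oa) / non_oa_mean
    return ratios


# Toy data, invented for illustration only.
articles = [
    ("J1", True, 10), ("J1", True, 6), ("J1", False, 4), ("J1", False, 4),
    ("J2", True, 7), ("J2", True, 3),    # an OA journal: every article is OA
    ("J3", False, 5), ("J3", False, 1),  # a journal with no OA articles
]
print(within_journal_oa_ratios(articles))
```

Here J1 yields a ratio of 2.0, i.e. a 100% within-journal citation advantage, while J2 and J3 contribute no ratio at all; their articles still appear in the overall counts.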

And as to the (completely independent) question of the multiple
factors that generate the OA citation advantage, see:

    OA Impact Advantage = EA + (AA) + (QB) + QA + (CA) + UA

May I close by noting in passing that it is quite remarkable how
quantities of positive results across the past two years have elicited
no particular theoretical or methodological interest, but one tiny hint
of a possible negative outcome pulls out all the pundits!

Stay tuned.

Stevan Harnad

On Fri, 20 Jan 2006, David Goodman wrote:

> I agree with you, and will now explain why.
> What you are asking about are the implications of the posting,
> which we deliberately did not discuss there. Here is my
> understanding of what our results mean.
> The measurements we analyzed were of both OA and OAA. The OA
> measurement is significant, because such a number can obviously
> be used as a mark of progress towards OA, and I think both
> supporters and opponents would want to have accurate measurements
> of this. We found, as posted, that for all the sets of previous
> measurements we were trying to verify, the determination was done
> by a robot program which is not able to distinguish accurately
> between OA and non-OA. Thus, none of the %OA data based on that
> robot can be validly used. We did not discuss earlier %OA
> measurements from the same group, because we did not analyze
> them.
> The OA Advantage is not a straightforward measure of anything. It
> is probably meaningless on several levels.
> First, it is based on the robot's correct identification of OA
> and non-OA articles, and the robot cannot do this even
> approximately correctly.
> Second, we have not yet discussed the details of the method by
> which that number is computed, but it may suffice to say that the
> method does not count anything in OA journals, or in any journal
> that included no OA articles whatsoever. (We copied the method
> used in order to have comparative data.) This is not yet
> important, because the robot cannot make the correct distinctions
> in any case.
> Third, to the extent that there is any OA advantage or other
> advantage at the journal level, it will always have been taken
> into account by the Impact Factor for the journal.
> But you are raising a more important question. You are asking,
> even if the measurements were correct, even if the OAA number
> were correctly determined, even if it were separated from the
> Impact Factor, what is being measured?
> So, fourth, OAA is composed of many parts. Kurtz has shown best
> how to distinguish some of them in a closed universe of both
> citers and citations, but his analysis by its nature applies only
> in astronomy, because of the distinctive data system astronomers
> use.
> One would expect that publishing in more universally available
> journals will increase readership, and that some of those
> readings may eventually result in citations. But of course this
> cannot be predicted for any given article. The quality of the
> article will certainly increase both readership and citations,
> and authors might quite reasonably pay to gain this increased
> readership for their very best papers. This has been termed the
> Quality Advantage, and I think that term appropriate. It again
> cannot be predicted for any individual article.
> One would want to measure this as a function of all the
> relevant variables. You have listed several; I suspect there are
> even more. But there does not yet seem to be any sound way of
> distinguishing the variables or making these measurements. I
> agree that it seems necessary to use controlled samples, and I
> doubt this can be done.
> For example, in 1953, Watson & Crick published two papers in
> Nature, the first giving the structure of DNA and the second, a
> few weeks later, giving the genetic implications of that
> structure. The first has been cited more than twice as often as
> the second. To me, this reduces the likelihood of finding truly
> matched pairs of articles for a controlled measurement.
> Perhaps someone more clever than I will find a way. But this is
> all hypothetical until the basic determinations can be carried
> out correctly.
> I am curious whether there will be any forthcoming papers
> or postings using these results without at least a warning of
> their probable invalidity.
> In conclusion, "OA can increase the impact of an article by >
> 50-250%!" is indeed not a good argument. The instruments used are
> worthless. The calculation used does not include all the data,
> however determined. The impact is a function of many variables,
> which have not been distinguished.
> There are many real reasons why all authors should publish OA,
> such as the public appreciation and interest gained by the easy
> access to the material, the greater availability to students at
> small institutions, and scientists' basic responsibility to
> publish in the widest practical manner.
> Dr. David Goodman (and Nisa Bakkalbasi and Kristin Antelman)
> Palmer School of Library and Information Science
> Long Island University
> and formerly
> Princeton University Library
> ----- Original Message -----
> From: Phil Davis <>
> Date: Thursday, January 19, 2006 7:16 pm
> To:
> > David,
> >
> > Your work on the validity of using automated robots to detect OA
> > articles is very important if we are to better understand the
> > effect of Open Access on article impact. Many of us appreciate
> > your work on this topic.
> >
> > The thing that I'm troubled by is that the term "Open Access
> > Advantage" is both loaded and vague. There are many different
> > types of open access (full, partial, delayed, open to developing
> > countries, publisher archived, self-archived, institutional
> > archived, etc.), so to talk about an Open Access Advantage gives
> > credit to many different publishing and access models at the same
> > time.
> >
> > Because studies of this sort are not controlled experiments,
> > the best a researcher can do is come up with likely causal
> > explanations, and hope to rule others out. In the case of
> > higher citations for articles found as OA, article quality and
> > early notification are possible explanations, as are editor
> > selection bias, self-selection bias, and the Matthew Effect.
> > Hundreds of Emerald articles that were republished in two or
> > more journals demonstrate that simple article duplication can
> > also explain increased impact. All of these may be partial
> > causes that explain higher impact (the "Open Access Advantage"),
> > yet they are not limited to Open Access.
> >
> > As a consequence, I am worried about people making unqualified
> > statements like "OA can increase the impact of an article by
> > 50-250%!" The answer may be more like, "author republishing
> > (online and in print) may increase citation impact in some
> > fields, especially among highly prestigious journals and
> > authors". Although this is not as simple as declaring that Open
> > Access increases citation impact, it may be much more precise.
> >
> > --Phil Davis
Received on Fri Jan 20 2006 - 13:30:44 GMT
