Re: Open Access Speeds Use by Others

From: Stevan Harnad
Date: Fri, 26 May 2006 03:12:59 +0100

As Eysenbach's very long and remarkably intemperate response on his blog
to my own prior response (on my blog)
mainly repeats prior points, I will respond only to his very few
substantive points:

I had asked: "Does [Eysenbach] seriously think that partialling out the
variance in the number of authors would make a dent in that huge,
consistent effect [the within-journal citation advantage for
self-archived articles]?"

> GE: "the answer is "absolutely"... If high-author papers are
> overrepresented in self-archived papers, then this confounder alone
> will contribute to having a greater number of citations... Only if one
> statistically controls for all these confounders (there are several
> of them - see PLoS paper), and one STILL sees an open access citation
> advantage, then (and only then) one has a SOLID, defendable study."

Here is Eysenbach's list of confounders:

(1) number of authors: As Eysenbach says he is serious, we will now test
this. We have the data. Eysenbach's prediction is that partialling out
the effect of the number of authors will make a dent in our huge,
consistent citation advantage. Stay tuned...

(2) number of days since publication: This is relevant and feasible in a
1-year, 1-journal study like Eysenbach's but neither relevant nor
feasible for a sample of over a million articles ranging over 12 years,
12 disciplines, and hundreds of journals -- all showing exactly the same
citation advantage for self-archived articles in every year and every
discipline.

(3) article type: We are able to test this separately too (because we
have ISI data on article type) but first let's see whether partialling
out author numbers makes a dent in our basic effect.

(4) country of the corresponding author: This is testable too, but first
let's see how the author-number "confounder" pans out (we could look at
the first-author's birth-sign too...).

(5) funding type: Data not available, and extremely far-fetched.

(6) subject area: Already tested and reported in our data, separately
for 12 different disciplines: the self-archiving advantage is
consistently present in all of them.

(7) submission track (PNAS has three different ways that authors can
submit a paper): Not relevant to the journals we tested, which were all
non-OA and pre-dated Open Choice.

(8) previous citation record of the first and last authors: This, as I
noted, is -- along with the demonstration of how early the OA advantage
emerges in PNAS -- a potentially interesting variable in the fine-tuning
of the OA advantage, but our own studies are concerned with estimating
the generality and size of the OA advantage, not with its fine tuning.

(9) whether authors choosing the OA option in PNAS chose to do so for
only their most important research ("they didn't"): Neither Eysenbach's
study nor ours can confirm causality or eliminate the possibility of
self-selection bias.
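
To make concrete what "partialling out" the number of authors (item 1)
amounts to, here is a minimal sketch on purely synthetic data. It is not
our data and not Eysenbach's analysis: the sample size, the effect sizes
and the function names (fit_line, residuals, simulate) are all
assumptions made up for illustration. The sketch regresses both OA
status and citation counts on author number, then compares the raw
OA/non-OA citation gap with the gap that remains between the residuals.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=5000, oa_effect=2.0, author_effect=0.5, seed=0):
    """Synthetic articles; every number here is an invented assumption."""
    rng = random.Random(seed)
    authors, oa, cites = [], [], []
    for _ in range(n):
        k = rng.randint(1, 10)  # number of authors
        # Assumed confounding: many-author papers are self-archived more often.
        is_oa = 1 if rng.random() < 0.1 + 0.05 * k else 0
        c = 5.0 + author_effect * k + oa_effect * is_oa + rng.gauss(0, 2)
        authors.append(k)
        oa.append(is_oa)
        cites.append(c)
    return authors, oa, cites


authors, oa, cites = simulate()

# Raw gap: with a 0/1 predictor the slope is the difference in mean citations.
_, raw_gap = fit_line(oa, cites)

# Partialled gap: remove author number from both variables, then refit.
_, adjusted_gap = fit_line(residuals(authors, oa), residuals(authors, cites))

print("raw OA citation gap:     ", round(raw_gap, 2))
print("gap with authors removed:", round(adjusted_gap, 2))
```

In this toy setup the raw gap overstates the simulated OA effect because
author number feeds into both variables, and the residual-on-residual
slope is the OA coefficient one would get from a regression that
includes author number as a covariate. Whether the dent is large or
small in real data is exactly the empirical question at issue.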

> GE: "the fact that we look at a immediate (gold-)OA article population
> in a longitudinal cohort study design takes care of the "arrow of
> causation" problem, because it makes sure that open access status comes
> first, then the citations are coming, not the other way round."

I'm afraid it's not quite that easy to take care of the "arrow of
causation" problem, which is confounded (sic) with the problem of
self-selection bias: For if authors are (contrary to their subjective
reports) indeed self-selecting their better papers for OA-gold (or for
self-archiving) then that, and not the OA, could explain why their
papers get more citations.
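
The point can be sketched with a small simulation, again on invented
numbers rather than anyone's real data. The assumptions are: each paper
has a latent quality, the author chooses OA at publication time with a
probability that rises with that quality, and citations depend on
quality only, with no OA term at all. The function name
simulate_self_selection and all parameters are hypothetical.

```python
import random


def simulate_self_selection(n=10000, seed=1):
    """Mean citations of OA and non-OA papers when OA has NO causal effect."""
    rng = random.Random(seed)
    oa_cites, non_oa_cites = [], []
    for _ in range(n):
        quality = rng.random()  # latent quality in [0, 1), assumed uniform
        # The OA choice is made first, at publication, and favours better papers.
        chose_oa = rng.random() < quality
        # Citations accrue afterwards and depend on quality alone.
        cites = 10.0 * quality + rng.gauss(0, 1)
        (oa_cites if chose_oa else non_oa_cites).append(cites)
    return sum(oa_cites) / len(oa_cites), sum(non_oa_cites) / len(non_oa_cites)


oa_mean, non_oa_mean = simulate_self_selection()
print("mean citations, OA:    ", round(oa_mean, 2))
print("mean citations, non-OA:", round(non_oa_mean, 2))
```

Here OA status always precedes the citations, as in a longitudinal
cohort design, yet the OA papers still end up with more citations on
average purely because of which papers were selected. Temporal order
alone therefore does not separate an OA effect from self-selection.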

> GE: "it is entirely possible that the articles in his sample (which
> he refers to as green-OA articles) were not "immediately" self-archived
> after publication, but 1 month, 6 months, or 12 months after original
> publication, therefore not really what Harnad refers to as green-OA,
> implying "immediate" deposition."

This is actually a valid point of definition: OA should be defined as
"immediate" in order to rule out claims that delayed/embargoed access is
Open Access. The point at which refereed research can and should begin
to be used is when the final refereed draft is accepted for publication,
and that is the point when it should be made freely accessible online. So
a portion of the citation advantage for self-archived articles could
well have come from self-archiving later than the publication date;
technically speaking this should be called a "free access" advantage,
if we reserve OA for access that is free immediately. But surely nothing
of substance rides on this: If there is a self-archiving advantage even
for tardy self-archiving, that confirms, a fortiori, the self-archiving
advantage of immediate (OA) self-archiving too!

> GE: "I... made a conscientious decision to submit my paper to a
> gold-OA journal (PLoS) rather than publishing the study in an obscure
> scientrometrics journal and then self-archived [sic] it"

Actually, unless I am mistaken, I seem to recall correspondence from GE
to the effect that it was first declined by Science (or was it Nature?)
-- not a gold-OA journal -- before being submitted to PLoS Biology.

> GE: "The visibility of an article published in a properly promoted OA
> journal site will always be better than a paper that is published in a
> toll-access journal site, even if it is self-archived. This is exactly
> why my study shows an advantage of gold-OA over green-OA, this is also
> why I personally chose the gold route to publish this paper in PLoS,
> and not the green route"

Let us not confound a journal's profile/impact level with its OA/non-OA
status.

The visibility (and no doubt also the citation impact) of an article will
always be better when it is published in a high-profile, high-impact
journal, whether it is OA (like PLoS) or non-OA (like Science or
Nature) rather than an obscure scientometrics journal (or an obscure OA
journal). Its visibility and impact will be higher if self-archived in
either case (except perhaps if the journal is both high-profile and OA,
which is partly what Eysenbach's study has shown).

> GE: "the PLoS paper is the first study which contains an analysis of
> both gold and green (thus focuses on "OA itself"), whereas the rest of
> the studies is actually focused on "green"".

Because most of the existing data for within-journal OA/non-OA
comparisons come from the millions of articles published in the
thousands of non-gold journals indexed by ISI and not just the thousands
of articles published in the few journals that are as yet (like PNAS)

Stevan Harnad
Received on Fri May 26 2006 - 03:18:42 BST
