Re: Increased citation of OA

From: Stevan Harnad
Date: Tue, 1 Apr 2008

On Tue, 1 Apr 2008, Philip Davis wrote:

> We've been conducting a randomized controlled trial of open access
> publishing with 7 publishers in the multidisciplinary sciences, biology,
> medicine, social sciences, and humanities since January 2007.
> The type of methodology we're using (randomized controlled trial) is key
> here since previous observational studies simply assume that
> author-sponsored OA articles are qualitatively similar to
> subscription-based articles.

Most prior studies simply compared articles within the same journals and
years that were and were not made OA by being self-archived by their authors.

The ideal study would be one that randomly *imposed* self-archiving on
articles from within the same journals and years, and compared it with
unimposed self-archiving for the same journals and years. This forthcoming
study seems to do only half of this.

A potential problem with assessing the effects of self-archiving
on citations is, of course, the "self": Authors self-select to
self-archive (some authors -- c. 15% -- do it, most don't), and authors
can also self-select which of their papers they self-archive. Hence this
leaves open the possibility that self-archived papers (and authors)
are self-selected to be the better ones. And then the question is:
What proportion of the enhanced citations of self-archived papers occurs
because of OA and what proportion because of self-selection?

A study that imposes the OA self-archiving randomly could help answer
this question.

But a potential problem with this forthcoming study is its time-scale and
sample size.

The published findings on the higher citations for OA self-archived
articles (e.g. Hajjem et al 2005) are based on hundreds of thousands of
articles, in thousands of journals, across a number of fields, across
a number of years. The effects are always the weakest in the first year
or two after publication (depending on field), before the citations have
had a chance to grow.

During that early period, it is downloads rather than citations
that reflect the OA advantage -- and downloads have been shown to be
correlated with, and predictive of, later citations (Brody et al 2006):

> Preliminary results from 11 journals published by the American
> Physiological Society indicate an increase in article downloads, although
> many of these downloads are attributable to indexing robots. The articles
> are currently between 11 and 14 months old and we see no citation
> advantage. In fact, the randomly selected OA articles received slightly
> fewer citations, although this result is non-significant.
> Our paper is currently in review and should be made public shortly.

This profile (i.e., no difference) is perfectly compatible with the
conclusion that the sample was too small and the time-span was too
short to have picked up any effects at all. It is comparing apples and
oranges unless there is a control group, in the same journal sample and
year-span, consisting of self-selected, self-archived articles that *do*
show the citation increase whose causes are here being tested.
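
A rough back-of-envelope sketch of why a small, young sample can show no
difference even if an effect exists. Everything in it is an illustrative
assumption, not data from either study: citation counts are treated as
Poisson (variance equal to the mean), the OA advantage is set at 10%, and
the group size of 250 and the mean counts of 1 and 10 are placeholders.

```python
import math


def z_score(mean_non_oa, relative_advantage, n_per_group):
    """Approximate z statistic for the difference in mean citation counts
    between two equal-sized groups, assuming Poisson-distributed counts
    (a simplification; real citation counts are more skewed)."""
    mean_oa = mean_non_oa * (1.0 + relative_advantage)
    diff = mean_oa - mean_non_oa
    # Under the Poisson assumption, the variance of each group mean is
    # mean / n, so the variance of the difference is their sum.
    std_err = math.sqrt((mean_oa + mean_non_oa) / n_per_group)
    return diff / std_err


# Hypothetical inputs: about 1 citation per article early on,
# about 10 per article a few years later, same 10% advantage.
early = z_score(mean_non_oa=1.0, relative_advantage=0.10, n_per_group=250)
later = z_score(mean_non_oa=10.0, relative_advantage=0.10, n_per_group=250)
print(f"early z = {early:.2f}, later z = {later:.2f}")
```

With these assumed inputs the early z falls below the conventional 1.96
threshold and the later one exceeds it, only because the counts have had
time to accumulate.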

If an equal-sized sample of self-selected, self-archived articles from
the same 11 journals, over the same 11-14 months, *did* show the citation
increase, whereas the control sample with the self-archiving imposed did
not, then we could make the inference that it is the self-selection that
causes the citation increase.

But with a small sample and a small time-span, and no difference, the
most likely outcome is that neither group would yet show any citation
increase.

(Some comparisons might possibly be made with the Eysenbach (2006)
study, which was also based on a small sample -- a single very
high-profile journal (PNAS) and about 1500 articles -- and a small
time span. The OA/non-OA citation difference was found surprisingly
early. There were two kinds of "self-archiving": most were done by
PNAS on the (paying) authors' behalf, on the PNAS website; the other
kind was done by (nonpaying) authors, on their own websites (or IRs). The
lion's share of the early OA citation advantage was for the articles
made OA on the PNAS site. But of course both kinds of OA self-archiving
here were self-selected, rather than imposed. And the fact that the OA
advantage was much bigger for the articles "self-archived" on the PNAS
site suggests that the big early effect may have had something to do
with being freely accessible at the much-consulted websites of one of
the highest-citation journals of all.)

> We conclude that the 'citation advantage' so widely promoted in the
> literature is an artifact of other explanatory variables.

These are rather big conclusions to draw from what seems to be a rather
small study (that does not seem to control for the most important
explanatory variable of all, which is unimposed self-selection, in the
same sample and time-interval)!

We are currently conducting a somewhat bigger study, comparing the size
of the citation difference between self-archived and non-self-archived
articles within the same journals and years for the four earliest of the
institutions that mandate self-archiving. A mandate is not a guarantor
that all articles will be self-archived; and mandates have not been
around for that long either; but the prediction would be that if the
self-archiving citation increase were all or mostly due to self-selection,
then mandates should either substantially reduce or eliminate the
OA/non-OA difference, compared to the unmandated OA/non-OA difference.

Our study compares the size of the self-archived/non-self-archived
citation difference separately for mandated and unmandated self-archiving.
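
The comparison just described could be sketched roughly as follows. This is
only a minimal illustration, not the study's actual code: the record fields
(`citations`, `self_archived`, `mandated`) are hypothetical names, and the
sketch pools all articles instead of matching them within journal and year
as the real comparison would.

```python
from statistics import mean


def oa_ratio(articles):
    """Mean citations of self-archived articles divided by mean citations
    of non-self-archived ones. Assumes both groups are non-empty and the
    non-self-archived mean is greater than zero."""
    oa = [a["citations"] for a in articles if a["self_archived"]]
    non_oa = [a["citations"] for a in articles if not a["self_archived"]]
    return mean(oa) / mean(non_oa)


def compare(articles):
    """Return (mandated_ratio, unmandated_ratio) so the two OA/non-OA
    differences can be set side by side."""
    mandated = [a for a in articles if a["mandated"]]
    unmandated = [a for a in articles if not a["mandated"]]
    return oa_ratio(mandated), oa_ratio(unmandated)
```

If the mandated ratio came out close to 1 while the unmandated ratio stayed
well above 1, that would point to self-selection; if the two ratios were
similar, it would not.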

Stay tuned.

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as
Predictors of Later Citation Impact. Journal of the American Society for
Information Science and Technology (JASIST) 57(8) pp. 1060-1072.

Eysenbach, G. (2006) Citation Advantage of Open Access Articles. PLoS
Biology 4(5): e157 DOI: 10.1371/journal.pbio.0040157

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary
Comparison of the Growth of Open Access and How it Increases Research
Impact. IEEE Data Engineering Bulletin 28(4) pp. 39-47.

Stevan Harnad

> Philip Davis
> PhD student
> Cornell University, Dept. of Communication

