Re: On Self-Selection Bias In Publisher Anti-Open-Access Lobbying

From: Stevan Harnad <amsciforum_at_GMAIL.COM>
Date: Wed, 18 Nov 2009 14:49:33 -0500

On Wed, Nov 18, 2009 at 1:15 PM, Pippa Smart wrote:

> Citation and impact are not easy to quantify as different studies have shown
> and therefore should not form the basis for arguing in favour of open
> access.

Citations (and downloads) are countable, and are counted. Research
usage and impact are not *synonymous* with citations or downloads, but
citations and downloads are certainly *measures* of research usage and
impact.

If OA generates more citations (and downloads), that is most
definitely a compelling basis for arguing in favor of OA.

(Indeed, no argument for OA could be more compelling than that OA
increases research impact; certainly not the argument that OA makes
journals more affordable, nor the argument that it makes accessible to
the general public peer-reviewed research journal articles most of
which the general public has no interest in reading [nor do most
peers!]. So every download and citation matters for this esoteric
content, written by specialists to be read, used, applied and
built-upon by specialists, for the sake of research progress, and
thereby for the benefit of the general public.)

> Intuitively if an article is made open access then it will have
> higher visibility and gain greater citation - but this is not necessarily
> true.

No one has said this is *necessarily* true. The empirical research on
the OA impact advantage and the "self-selection bias" is being
conducted to see whether it is true as a matter of empirical evidence.

> Studies have shown variable citation behaviour in which the access of
> an article appears to have no bearing. For example higher citation of the
> same article within different (higher "Impact Factor") journals (Vincent
> Larivière and Yves Gingras on

(1) Not every OA article will be cited more: only the ones that are
found useful enough to be citeable will. And the more useful an
article is, the greater the observed OA citation advantage. That is
why the empirical question about causality is: "Are OA articles more
likely to be cited because they are OA? Or are they more likely to be
OA if they are more cited (the self-selection bias)?"

(2) 80% of citations are citations of the top 20% of articles.

(3) The top journals are both more likely to publish the top articles
and more likely to be cited.

(4) Gingras & Larivière are our co-authors on the study testing
mandated OA against self-selected OA that I mentioned (and that is
being submitted for peer review).
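
As a rough illustration of points (2) and (4), here is a minimal Python
sketch of the two quantities involved: the share of all citations going
to the most-cited articles, and the ratio of mean citations for OA
versus non-OA articles. The function names and the toy numbers are
hypothetical, invented for illustration only; they are not data or
methods from the study mentioned above. The idea being sketched is that
if the OA/non-OA ratio is about as large for mandated OA as for
self-selected OA, self-selection alone is unlikely to explain it.

```python
from statistics import mean


def top_share(citation_counts, top_fraction=0.2):
    """Fraction of all citations received by the most-cited
    `top_fraction` of articles (hypothetical helper)."""
    if not citation_counts:
        raise ValueError("need at least one article")
    ranked = sorted(citation_counts, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def oa_advantage(oa_counts, non_oa_counts):
    """Ratio of mean citations, OA over non-OA (hypothetical helper).
    A ratio above 1 means the OA articles are cited more on average."""
    return mean(oa_counts) / mean(non_oa_counts)


# Toy citation counts, made up for illustration only.
toy_counts = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]
print(top_share(toy_counts))  # top 2 of 10 articles hold 80 of 100 citations

# Compare the ratio for mandated OA and for self-selected OA
# against the same non-OA baseline (again, invented numbers).
non_oa = [6, 4, 2]
print(oa_advantage([12, 8, 4], non_oa))   # "mandated" toy sample
print(oa_advantage([13, 8, 3], non_oa))   # "self-selected" toy sample
```

A real comparison would of course need matched samples (same journals,
same years) and significance testing, which this sketch omits.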

> and the "cluster-effect" of citations whereby authors follow citation trails
> laid by papers that they read resulting in a reduction in the number of
> articles being cited (James Evans in Science, 18 July 2008).

Larivière, V; Gingras, Y; & Archambault, E. (2008) The decline in the
concentration of citations, 1900-2007 ["This paper challenges recent
research (Evans, 2008) reporting that the concentration of cited
scientific literature increases with the online availability of
articles and journals"]

> I guess (as with all statistics) it is quite possible to find a study that supports
> one's point of view.

Yes, and that's called a self-selection bias. The remedy is properly
controlled studies and meta-analyses to determine where the
preponderance of the evidence lies (in the metaphoric, not the
mendacious sense!).

> I agree with Ian Russell that accusing publishers of "intensive lobbying" is
> inflammatory since both sides have formed lobbying bodies.

The difference is that OA lobbyists are not doing it for the money.

> Many publishers (commercial or not) are offering authors the opportunity
> to publish OA within their journals.

And to pay them a hefty price for doing it.

But what is under scrutiny here (the self-selection-bias hypothesis)
is not this generous offer on the part of some "Open Choice"
hybrid-Gold publishers, but the alternative, which is author Green OA
self-archiving, and whether that enhances citations, or is merely a
self-selective bias toward self-archiving the top articles.

> The current problem is that someone has to pay for
> the operation of scholarly communication, and there is no simplistic answer
> that will provide an overarching solution for all disciplines in all parts
> of the world - as much as both publishers and other lobbyists would like
> there to be.

"Scholarly communication" is being paid for, handsomely, today, by
institutional journal subscriptions. So that is definitely not the
"current problem." The problem is that not all the intended users for
whom this research is being conducted can access it, because their
institutions can only afford to subscribe to a small fraction of the
peer-reviewed journal corpus.

That's the "current problem." And the -- yes, simple -- solution is
for researchers' institutions and funders to mandate that all their
own journal article output be made freely accessible online -- to all
its intended users (not just to the ones at the institutions that have
subscriptions to the journal in which it happens to be published) --
by ensuring that all authors self-archive their refereed final drafts
in their own institutional repositories immediately upon acceptance
for publication.

There are no disciplinary or geographic differences for the
peer-reviewed journal article corpus in this overarching solution --
neither in its benefits nor in its feasibility -- much though some
publishing lobbyists might wish there were.

> (And to pre-empt the response that repositories would provide the answer,
> no, I don't [think] they necessarily will for all disciplines and in all
> institutions, partly because they do not provide the content filtering and
> other valuable benefits that journals currently do, and partly because of
> the additional time/effort/expenditure required of libraries/institutions -
> some can easily meet the requirements, whereas others may not.)

Pippa, I think you may have missed the point about *what* is being
mandated for deposit in authors' institutional repositories: the
peer-reviewed final draft, immediately upon acceptance for
publication.

That core fact has been mentioned explicitly and frequently enough, I
should have thought, to pre-empt such a stupendous non-sequitur. But
-- to pre-empt another -- I rather suspect you are co-bundling another
unstated familiar [publishers' doomsday] hypothesis with your notion
of "repositories" [and OA mandates]: that they will destroy journals
and peer review.

Don't worry. The peers -- the very same authors and users in question
-- do the peer-reviewing. The expenses of a reputable 3rd-party
honest-broker to continue implementing the peer review and certifying
its outcome with its (journal-) title and track-record for those
titles whose publishers prefer not to downsize to this more
parsimonious niche if and when the time comes will simply migrate --
title, track-record, editorial board, referees, authors, readers and
all -- to other (Gold) OA publishers, who will.

So don't worry about "content filtering and other valuable benefits
that journals currently do..."

And let OA advocates worry about (and see to the solution of) the real
"current problem": the needless continuing and cumulative daily,
weekly, monthly, yearly loss of research access, usage and impact
owing to access-denial to intended users.

Journals are doing just fine. It's just research access, usage and
impact that isn't.

Stevan Harnad

