How to compare research impact of toll- vs. open-access research

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Wed, 11 Jun 2003 15:48:17 +0100

Reply to an (anonymized) query about how to equate and compare the impact
of toll-access and open-access research:

> I am generally quite taken by the Open Access approach and think that
> publishers should be encouraged to see it as both an opportunity
> and I think also, reasonably, a threat, unless they reinvent
> themselves. However I think the case will be best advanced if we
> are clear what has actually been demonstrated.

Eventually, open-access publishing might become an opportunity. At the
moment, however, it is neither an opportunity nor a threat, because far
more open access (in terms of the number of articles freely accessible
online) is being provided by authors self-archiving their toll-access
publications than by authors publishing in open-access journals.

[Compare the total number of articles in all the open-access journals
listed in the Directory of Open Access Journals, http://www.doaj.org/,
with the number of freely accessible articles indexed by
http://oaister.umdl.umich.edu/o/oaister/ and http://citeseer.nj.nec.com/cs
(equating for year of publication; figures that also counted articles
self-archived on authors' own websites would be even higher). There are
about 500 open-access journals to date, out of a total of at least 20,000
toll-access journals, which publish at least 2,000,000 articles a year.]

>> [Steve Lawrence]: "Freely available online articles are 4.5 times
>> more cited than non-freely available online articles [online articles
>> that sit behind a toll gate or articles not online]"
>
> [This] comes close to justifying the... assertion but not quite,
> since the definition of "non-freely available online articles"
> includes "articles not online" i.e. articles only available in
> paper.

Lawrence's study tried to compare equivalent articles in the same
publication, but because free online availability is also strongly
correlated with time, his comparisons could not be exact. (The articles
compared might appear in the same publication, but if that publication
only became freely accessible online in recent years, the non-free
articles might also be the older ones, confounding access with age.)

Here are two studies that we are undertaking at Southampton (involving
Tim Brody, designer of citebase; Mike Jewell, designer of paracite; and
Les Carr, designer of the citation linking in Opcit):

    (1) Comparing citation counts for self-archived and non-self-archived
    articles, equated for year, volume, and issue, in the same toll-access
    journals (all of them hybrid journals, i.e. having both a paper edition
    and a toll-access online edition), across a diverse sample of
    disciplines, using paracite to seek full-text free-access versions.
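The matching in (1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Southampton analysis itself: it assumes each article has already been reduced to a record with its journal, year, volume, issue, citation count, and a flag saying whether a free full-text version was found (the `Article` type and its field names are hypothetical, not paracite's output format). Articles are compared only within the same journal/year/volume/issue stratum, and strata lacking either group are skipped.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Article:
    # Hypothetical record layout; these field names are illustrative only.
    journal: str
    year: int
    volume: int
    issue: int
    citations: int
    self_archived: bool  # True if a free full-text version was found


def matched_citation_ratio(articles):
    """Compare self-archived and non-self-archived articles only within
    strata sharing the same journal, year, volume and issue.

    Returns (ratio, n_strata): the sum of per-stratum mean citations for
    self-archived articles divided by the same sum for non-self-archived
    ones, over strata that contain both kinds of article.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for a in articles:
        key = (a.journal, a.year, a.volume, a.issue)
        strata[key][a.self_archived].append(a.citations)

    free_total = toll_total = 0.0
    n_strata = 0
    for groups in strata.values():
        if groups[True] and groups[False]:  # a stratum needs both groups
            free_total += mean(groups[True])
            toll_total += mean(groups[False])
            n_strata += 1
    if n_strata == 0 or toll_total == 0:
        raise ValueError("no comparable strata with cited toll-access articles")
    return free_total / toll_total, n_strata


# Invented numbers, purely to show the call; not data from any study.
sample = [
    Article("J1", 2000, 12, 3, 10, True),
    Article("J1", 2000, 12, 3, 4, False),
    Article("J1", 2000, 12, 3, 6, False),
    Article("J2", 2001, 5, 1, 8, True),  # no toll-only partner: skipped
]
print(matched_citation_ratio(sample))  # (2.0, 1)
```

Each stratum counts equally here regardless of its size; weighting strata by their article counts would be an equally defensible choice.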

The most revealing study would be of articles published *now* in
toll-access journals (most of which are now hybrid, having both paper
and online toll-access versions), where the only difference is whether
or not the authors have made them freely accessible by self-archiving
them.

A long enough timeline is needed to leave time for citations to occur,
so perhaps 1998-2003 could be sampled, making sure that every journal
already had an online version throughout that period (and that each
free-access version was self-archived early enough). An approximation
to this selective search could be done using paracite:
http://paracite.eprints.org/

A set of appropriate journals is selected, and their full contents for
1998-2003 are analyzed by having paracite seek an online full-text
free-access version of each article. The comparison can then be made,
carefully equating both the date of publication and the date of
self-archiving of the free-access version. (Any error here works
*against* our estimate: paracite may fail to find some free-access
versions that do exist, and counting those articles as toll-access
would only reduce the measured citation ratio for free-access
vs. toll-access papers.)
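The claim that detection errors can only work against the estimate can be checked with a small worked calculation. The sketch below assumes (hypothetically) that the free-access versions paracite misses are a random subset of all free-access articles, so the detected ones keep the true free-access mean, while the missed ones are folded into the toll-access group and pull its mean up. The function name and the numbers used are illustrative only.

```python
def observed_ratio(free_mean, toll_mean, n_free, n_toll, miss):
    """Free/toll citation ratio that would be measured if a fraction
    `miss` of the truly free-access articles went undetected and were
    therefore counted as toll-access.

    Assumes the undetected articles are a random subset of the free ones,
    so the detected free articles keep the true free-access mean.
    """
    if not 0 <= miss < 1:
        raise ValueError("miss must be in [0, 1)")
    missed = n_free * miss
    # The "toll-access" group is now a mixture of genuine toll-access
    # articles and the free-access articles the search failed to find.
    toll_observed = (n_toll * toll_mean + missed * free_mean) / (n_toll + missed)
    return free_mean / toll_observed


# Illustrative values only: a true ratio of 4 and equal group sizes.
for miss in (0.0, 0.25, 0.5):
    print(miss, observed_ratio(4.0, 1.0, 100, 100, miss))
```

With a true ratio of 4.0 and equal group sizes, missing half of the free-access versions lowers the measured ratio to 2.0. Whenever free-access articles really are cited more, the measured ratio shrinks toward 1 but never drops below it, since the contaminated toll-access mean can never exceed the free-access mean.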

There may be some confounding of preprints and postprints in such a
comparison. (Authors may self-archive unrefereed preprints, refereed
postprints, or both.) If we wished, the obvious cases could be eliminated
by excluding papers whose free version appeared before the published
version or whose title differed. But it is not clear that such cases
introduce an artifact at all, and frankly I don't think this is really
relevant: enhanced citations owing to self-archived preprints are still
enhanced citations.
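If one did want to drop the obvious preprint cases, the two exclusion rules named above (free version earlier than the published version, or a differing title) could be applied as a simple filter. The `SelfArchived` record and its fields are hypothetical, and the title comparison used here (case-insensitive, ignoring extra whitespace) is one assumed choice among many.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SelfArchived:
    # Hypothetical record; field names are illustrative only.
    published_on: date
    archived_on: date
    published_title: str
    archived_title: str


def _normalize(title):
    # Case-insensitive comparison that ignores runs of whitespace.
    return " ".join(title.casefold().split())


def is_obvious_preprint(item):
    """True if the free version predates publication or carries a
    different title from the published version."""
    return (item.archived_on < item.published_on
            or _normalize(item.archived_title) != _normalize(item.published_title))


def drop_obvious_preprints(items):
    return [item for item in items if not is_obvious_preprint(item)]


early = SelfArchived(date(2001, 3, 1), date(2000, 11, 1), "A Study", "A Study")
later = SelfArchived(date(2001, 3, 1), date(2001, 6, 1), "A Study", "a  study")
print(drop_obvious_preprints([early, later]))  # keeps only `later`
```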

    (2) Comparing citation counts for self-archived and non-self-archived
    articles, equated for year, volume, and issue, in the same toll-access
    journals (all of them hybrid journals, i.e. having both a paper edition
    and a toll-access online edition), in the Physics ArXiv, using citebase.

Physics (along with some areas of maths and allied disciplines) is
further advanced than other disciplines in self-archiving. It would
nevertheless be revealing to compare citation counts for
(journal/issue-equated) papers that are and are not self-archived in
ArXiv. Some subfields, like High Energy Physics, will be virtually 100%
self-archived, so there will be no room for comparison there. But in
subfields where self-archiving is still incomplete, a combination of
citebase and Web of Science, applied to selected samples from the same
journal issues, would allow direct comparison.
http://citebase.eprints.org/cgi-bin/search
http://wos.mimas.ac.uk/
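Picking the sub-complete subfields for study (2) amounts to keeping those in which both self-archived and non-self-archived papers are common. A minimal sketch, assuming per-subfield counts of self-archived and total papers are already available (e.g. from citebase and Web of Science); the 5% and 95% cut-offs and the subfield labels are arbitrary placeholders, not values from the study.

```python
def comparable_subfields(counts, low=0.05, high=0.95):
    """counts maps subfield -> (n_self_archived, n_total).

    Returns {subfield: self-archived share} for subfields whose share lies
    between `low` and `high`, i.e. where both groups are well represented.
    The default cut-offs are arbitrary placeholders.
    """
    keep = {}
    for subfield, (archived, total) in counts.items():
        if total == 0:
            continue  # nothing to compare
        share = archived / total
        if low <= share <= high:
            keep[subfield] = share
    return keep


# Invented counts for three unnamed subfields.
print(comparable_subfields({"A": (990, 1000), "B": (400, 1000), "C": (10, 1000)}))
# {'B': 0.4}
```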

Tim Brody already has indirect evidence that free access is affecting
citations: citations are occurring earlier and earlier with every year
that self-archiving grows. This is largely due to the earlier
availability of preprints, but it does show the direct connection
between accessibility and citation:
http://citebase.eprints.org/analysis/correlation.php
(viewing this requires a recent version of Java)
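One way to express "citations occurring earlier and earlier" as a number is the median lag between an article's deposit and the citations it receives, grouped by deposit year. This is only an illustrative sketch, not necessarily how the citebase analysis computes it; the input format (pairs of deposit and citation dates) is an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_citation_lag_by_year(citations):
    """citations: iterable of (deposited_on, cited_on) date pairs, one per
    citation received. Returns {deposit year: median lag in days}."""
    lags = defaultdict(list)
    for deposited_on, cited_on in citations:
        lags[deposited_on.year].append((cited_on - deposited_on).days)
    return {year: median(days) for year, days in sorted(lags.items())}


# Invented dates, only to show the call.
example = [
    (date(2000, 1, 1), date(2000, 1, 11)),
    (date(2000, 1, 1), date(2000, 1, 31)),
    (date(2001, 1, 1), date(2001, 1, 6)),
]
print(median_citation_lag_by_year(example))
```

A median that shrinks from one deposit year to the next would correspond to the pattern described above.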

> Open Access publishing has a great intuitive appeal and
> in many ways sits better with the overall academic ethos than the
> present model. The present model does, however, work, albeit within
> its own toll-gated parameters, and any transition from the current
> model to an Open Access Model needs to be undertaken with a shared
> understanding of what we do and don't know.

Agreed. So let us not speak about a transition just yet (except in the
case of those toll-access publishers who may already wish to convert to
open access, or of new publishers, like BioMed Central, that wish to
enter refereed journal publishing as open-access publishers).

As noted above, the lion's share of open access today is being provided
by authors, not by publishing in open-access journals but by
self-archiving their toll-access publications, both preprints and
postprints. It is in this arena that the benefits of open access, and
its effects on research impact, are being felt and can be measured.

> Given the importance attached to citation ranking both by authors
> and publishers... further research along the lines undertaken by
> Steven Lawrence in different disciplines would be of great use,
> to the community as a whole. If anyone knows of any research being
> taken along these lines perhaps they could let me know...

See above. See also:
http://cfa-www.harvard.edu/~kurtz/jasist-submitted.ps
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2829.html

I might close with the suggestion that statistical comparisons of
toll-access impact with open-access impact are fine for those who do
not know, or are sceptical, but those who already self-archive don't
need any further convincing! Whatever the logic of the relation between
access and impact does not already convey (more access is not a
sufficient condition for more citations, but it is certainly a
necessary one!) is conveyed by the actual experience of self-archivers:

    The rapid evolution of scholarly communication, A. M. Odlyzko.
    Learned Publishing, 15(1) (Jan. 2002), pp. 7-19. Also to appear
    in Bits and Bucks: Economics and Usage of Digital Collections,
    W. Lougee and J. MacKie-Mason, eds., MIT Press, 2002.
    http://www.catchword.com/alpsp/09531513/v15n1/contp1-1.htm

Stevan Harnad

> -----Original Message-----
> From: JISC Electronic Libraries Programme
> [mailto:LIS-ELIB_at_JISCMAIL.AC.UK]On Behalf Of Stevan Harnad
> Sent: 10 June 2003 6:27 pm
> To: LIS-ELIB_at_JISCMAIL.AC.UK
> Subject: Re: THES article on research access Friday June 6 2003
>
> > Date: Tue, 10 Jun 2003 09:19:45 +0100
> > From: [identity deleted]
> >
> >> Re: THES article on research access Friday June 6 2003
> >> "All UK research output should be online"
> >> http://www.ecs.soton.ac.uk/~harnad/Temp/thes.html
> >> Details: http://www.ariadne.ac.uk/issue35/harnad
> >
> > Interesting, and a little ahead of its time. I am sure that citations
> > will play an increasingly important role in the judgements of some
> > [UK Research Assessment] panels next time. But to go the whole way you
> > suggest requires a number of other things to be in place, not least
> > [1] new copyright arrangements, and confidence that other academics
> > everywhere else in the world are [2] able to be made aware of and then
> > [3] access the research publications in question. We are not there yet.
>
> It is certainly true that we are not there yet, but we are much, much
> closer than it may appear. And the outcome is both inevitable and optimal
> for research, researchers, their institutions, their research funders,
> and the funders of their funders (tax-paying society). What needs to be done
> is to hasten and facilitate it, and the UK is in a unique position to
> do this.
>
> [1] Regarding copyright, see the Table of Publishers' Policies on
> Self-Archiving maintained by JISC's Project Romeo (Rights Metadata for
> Open Archiving):
> http://www.lboro.ac.uk/departments/ls/disresearch/romeo/index.html
>
> Of the over 7000 journals so far surveyed, 55% already formally support
> self-archiving, and most of the remaining 45% (perhaps 30% of the total)
> will agree on an individual-paper basis if asked. And there are even
> legal means of self-archiving the remaining 15%:
> http://www.eprints.org/self-faq/#self-archiving-legal
>
> So, depending on which way we decide to reckon it, we are at least 55%,
> probably 85% and potentially 100% there already, insofar as copyright
> arrangements are concerned.
>
> So copyright is certainly not the problem.
>
> [2] Regarding international awareness of self-archived open-access
> research, both the awareness and the evidence of the incomparably
> higher visibility and usage of open-access research is already there
> in abundance: It has been reported in Nature that research that is
> freely accessible online is cited 336% as much as equivalent research
> that is not:
> http://www.neci.nec.com/~lawrence/papers/online-nature01/
> There are also search engines such as
> http://oaister.umdl.umich.edu/o/oaister/ poised to become the
> googles of the refereed research literature as soon as that research
> is self-archived, and webmetric search engines ready to monitor and
> quantify impact, in many rich new ways:
> http://citebase.eprints.org/cgi-bin/search
> http://citebase.eprints.org/java/correlation/correlation.html
>
> So worldwide awareness certainly is not the problem.
>
> [3] International access certainly is not the problem either: That is
> what open-access self-archiving is all about!
>
> No, everything is in place and ready. The only thing that is missing
> (and hence the only problem) is the research itself! Researchers (and
> their institutions) have not yet realised that the way to maximise their
> work's impact is to make it open-access by self-archiving it.
>
> It is precisely for this reason that it is so important that
> research-funders should help them realise the importance of maximising
> their research's impact, by the simple and eminently natural extension of
> the "publish or perish" rule to: "publish with maximal impact (through
> self-archiving)."
>
> And it is for this reason that HEFCE and RAE and the UK Research Funding
> Councils are in a position to hasten and facilitate the optimal and
> inevitable, thereby leading the way for the rest of the research world,
> while, paradoxically, simplifying their own lives, insofar as research
> assessment is concerned, even while increasing the predictive power and
> validity of the RAE!
>
> You are right that we are not there yet. To get there we need to go the
> whole way. And the time for that is now. (Indeed, it is overdue, as
> research impact is being needlessly lost daily, and assessment effort is
> being needlessly expended, while we wait.)
>
> Stevan Harnad
>
> PS
> (i) The standardised online RAE-CV can include not only refereed
> journal papers and their webmetric impact measures but all other
> performance indicators too, tailored to each discipline.
> http://paracite.eprints.org/cgi-bin/rae_front.cgi
>
> (ii) Book-based disciplines can self-archive their books' metadata
> (author, title, date, publisher) and reference list to derive the
> full benefit of these new measures of impact even if they prefer not
> to self-archive the full-text.
> http://www.ecs.soton.ac.uk/~harnad/Temp/bookcite.htm
>
> (iii) And even research data (normally too voluminous to be
> co-published with the research papers based on it) can be self-archived,
> and benefit from measures of its citation and usage:
> http://www.ecs.soton.ac.uk/~harnad/Temp/data-archiving.htm
Received on Wed Jun 11 2003 - 15:48:17 BST
