CERN's Self-Archiving Growth Rate

From: Stevan Harnad
Date: Fri, 2 Dec 2005 13:03:42 +0000

Re-posted from ozeprints list.

---------- Forwarded message ----------
Date: Fri, 2 Dec 2005 12:15:09 +0100
From: Joanne Yeomans
To: Stevan Harnad
Subject: RE: [ozeprints] Comparative stats for comparing causal factors

Hi Stevan,

Just to step in with a little correction about the CERN figures in case
people believe we have 100% OA preprint coverage. Yes, we have a mandate
to deposit, but we also have to work hard finding and uploading papers
ourselves. We're in the process of doing some new uploads (of older
digitised papers) and are about to start some author-chasing techniques
on more recent papers and will be recalculating our figures soon to see
the effects of these. However, the magic 100% has still to be seen.

The average we calculated at the end of last year for the five year
period 2000-2004 was 72% coverage (i.e. of all the papers we knew were
published in this period we had our own full-text version attached to
72% of them).

For 2004 this appeared to be 90% but as predicted this has now dropped
to 68% because we have begun to find metadata for published articles
that were not deposited. Hopefully as we chase up some of these authors
the proportion will climb again and with some of our new chasing
techniques we'll get a general upward trend towards 100%.
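
The coverage figure described above, and the way it falls when newly found metadata enlarges the denominator, can be sketched roughly as follows. This is only an illustration: the function name and the counts are invented (CERN's actual counts are not given here), and the counts were chosen simply so that the percentages match the 90% and 68% quoted above.

```python
def coverage_percent(with_fulltext: int, known_published: int) -> float:
    """Percentage of known published papers that have a full text attached."""
    if known_published <= 0:
        raise ValueError("known_published must be positive")
    if not 0 <= with_fulltext <= known_published:
        raise ValueError("with_fulltext must be between 0 and known_published")
    return 100 * with_fulltext / known_published


# Hypothetical counts, for illustration only.
deposited = 90          # records with our own full text attached
known_at_first = 100    # published papers we knew about initially
known_later = 132       # after finding metadata for undeposited articles

print(f"{coverage_percent(deposited, known_at_first):.0f}%")
print(f"{coverage_percent(deposited, known_later):.0f}%")
```

With the number of deposits unchanged, the percentage drops purely because more published papers become known; it climbs again only as the missing full texts are chased up and attached.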

Note these figures exclude conference papers which we think tend not to
be deposited as meticulously.

We're obviously still very pleased to get this 70% coverage, but we aren't
ready to blow any trumpets announcing 100% quite yet. Rest assured,
we're still working at it.

Thanks, Joanne

Joanne Yeomans
Office 3/1-012, DSU/SI Service
Mail address:
Mailbox C27810
CERN CH 1211 Geneva 23
Tel: 70548 (externally dial +41 22 76 70548)

> -----Original Message-----
> From: Stevan Harnad
> Sent: Friday, December 02, 2005 12:04 AM
> To: Arthur Sale
> Cc: Alma Swan; Eloy Rodrigues; Alexander Borbely; Jens Vigen;
> Tom Cochrane; Joanne Yeomans
> Subject: [ozeprints] Comparative stats for comparing causal factors
> On Wed, 30 Nov 2005, Paula Callan wrote:
> > I feel I need to add some comments about the graphs in [Arthur Sale's]
> > presentation and paper comparing eprint deposit levels at QUT and UQ
> > with the annual number of DESTable publications...
> >
> > I realise that you created these graphs with the very best of
> > intentions (i.e. to encourage institutions to mandate
> > self-archiving), but I have grave concerns about the practice of
> > comparing local institutional eprint deposit levels. There is a
> > fantastic spirit of collaboration and mutual support within the open
> > access community in Australia and I worry that the use of local
> > institutional comparisons in this manner would introduce an
> > element of competition that could undermine this.
> Dear Paula,
> I yield to no one in my admiration for your work, but I think
> you might be misinterpreting the purpose of such comparative
> data -- which I think are absolutely essential both to
> understanding what progress has been made, and why/how/when,
> and in helping to promote more of it.
> If comparing repository filling rates were something
> untoward, then one could make the same objection to comparing
> growth graphs in the Institutional Archives Registry,
> which explicitly compares the growth rates of nearly 400
> institutional repositories! That was why they were created.
> And by the same token, you would need to worry also about the
> Archive Policy Registry (in which your own QUT archive shares
> pride of place with the other four institutions that have OA mandates).
> That too was created for comparative purposes, so we could
> discern the contributions of different kinds of policies (encourage vs
> require) etc.
> What is missing from these existing comparative (not competitive:
> comparative, informative) graphs is the element
> Arthur has ingeniously added, which is the contents growth
> shown not just as raw number of records, but records relative
> to annual research output. We *need* those as a basis for
> comparison, otherwise we have no way of objectively gauging
> our progress, or determining what factors may be responsible
> for it, or what factors might be missing.
> It is with this same rationale that Arthur has been providing
> the weblogs showing the download data. Not to crow about
> UTas's superiority to those who don't have usage stats, nor
> even to crow about the impact of the UTas authors, but to
> give UTas authors objective feedback on the benefits of
> self-archiving -- and, by the same token, to show authors who
> don't self-archive what they are missing.
> This is not competition but rational comparison, based on
> feedback from the consequences of doing or not doing this or that.
> > My second concern relates to the actual validity of the message:
> >
> > a) There are a number of variables that influence eprint deposit rates
> > and the existence or absence of a University eprint policy is only one
> > of them. QUT's eprint policy has been very helpful in my quest to
> > populate the repository, but not in the way that most people would
> > expect. When talking directly to our academics and research students,
> > I never mention the policy as the reason why they should deposit their
> > work in the repository. In fact, I suspect that it would probably have
> > the opposite effect.
> That may very well be an excellent psychological strategy.
> But it certainly does not demonstrate that the mandate is not
> playing its causal role. Unless Tom Cochrane implemented the
> mandate but kept it a secret, QUT researchers know there is a
> mandate. So although you are right to be discreet about the
> help you are giving, and to focus on the positive aspects of
> it rather than the fact that QUT has mandated that it must be
> done, that in no way means the mandate was not a necessary factor.
> And if further evidence for that is wanted, we need only
> compare the QUT growth rate with that of other universities
> (e.g., UQ) that have provided 2 of the 3 essential ingredients
> (the archive and the library help) but not the 3rd (the mandate).
> Not only UQ but many
> other libraries have only 2 out of the three, and their
> growth rates show it. Arthur, for correct statistical
> purposes, chose to compare QUT (+A +L +M) with UQ (+A +L -M)
> and UTas (+A -L -M) because they were in other respects more
> closely comparable than, say, a non-Australian university
> (with no DEST baseline to use).
> > The endorsement of the policy by the University Academic Board has
> > definitely helped me to get the attention of senior academic staff
> > (Assistant Deans, Heads of School, Research Centre Directors etc).
> > However, this is just an opportunity to sell the many REAL benefits of
> > the eprint repository. The policy may induce some senior academics
> > and early career researchers to form an intention to deposit their
> > papers. However, it will take more than a policy to get them, and the
> > vast majority of middle career academics, to act on that intention.
> I agree completely. And Arthur's data do not show, nor does
> he interpret them to show, that QUT would have had the same
> growth rate with +A -L +M. What he showed was the additive,
> indeed the multiplicative, effect of the two essential factors,
> +L and +M, compared to +L alone (UQ).
> > The evidence for this assertion lies in the fact that we had the
> > policy in place at the beginning of 2004 yet during the whole year we
> > only managed to get 400 papers into the repository, and most of those
> > were deposited by me on their behalf.
> This is all valuable information, and confirms how important
> +L is. But the fact is that we have many archives with just
> +L and no +M, and they are not growing. (I don't discount
> personal creativity and dedication, of course. The +PC factor
> is one that it would be very difficult to test statistically,
> for there is only one PC in the world!) In contrast, there
> are only 5 archives with +M, and 4 of them are growing
> robustly. (The latest one, Zurich, remains to be observed: At
> the moment it seems to be +M -L -A! But that will change
> soon, I am sure, as Prof. Borbely sees to it that the archive
> is created and the library help is provided.)
> So the evidence seems to be that +M is a necessary but not a
> sufficient condition for robust archive growth; +L and +M together
> are sufficient (especially when the +L comes with +PC!).
> And that's what Arthur's data show.
> > It was only when we identified and lowered the barriers to
> > participation that our academics started depositing their own papers.
> > That is, we (the Library) relieved them of the burden of
> > responsibility for checking the publisher's policy on self-archiving
> > and allowed them to upload the file in any format (including MS Word).
> > Once the perceived benefits outweighed the perceived difficulties and
> > worries, the floodgates were opened. With these new guidelines in
> > place, this year we have so far had over 1600 papers deposited by the
> > authors themselves.
> That is an important and valuable procedural detail. Others
> have discovered it too. The file formats need to be
> inclusive, not exclusive, if they are to draw rather than
> deter. And copyright worries, irrational and groundless as
> they are, need to be assuaged, and you have successfully done
> so, and should be imitated!
> > I felt that the graphs imply that the relationship between an eprint
> > policy and a high unmediated deposit rate is direct and causal. This
> > could be misleading as the reality is much more complex.
> I don't think Arthur inferred that from the graphs; I
> certainly didn't. There are several causal factors there, and
> the graphs show their synergy. (And statistically, we would
> need data on a bigger sample of archives to really draw any
> causal conclusions; the single examples are suggestive, but
> hardly definitive in themselves, since so many local factors
> -- including +PC! -- could have been involved!)
> > b) I also think that the overlaying of annual DEST output levels
> > against the raw number of deposits in a year is rather misleading.
> > The graph implies that we are capturing 100% of our DESTable research
> > outputs. However, the deposited papers included a significant
> > proportion of publications from previous years. A more valid analysis
> > would be to compare the number of deposits with a specific publication
> > year with the DEST output for that year.
> You are quite right, and I am sure that will be Arthur's next
> step. It is *annual* output that is at issue; cumulative
> output is only a very rough first approximation. (But better
> than nothing, and looking rather good, so far!) Fortunately,
> we have the CERN data (+L +M) and their current annual
> article output is at 100% already! Hard to find a comparable
> institution to benchmark its success against, though...
> > I have done this analysis for QUT and found that so far we have
> > captured approximately 41% of our 2004 publications (512 out of 1240).
> > Even though we have had to block access to some author-manuscript
> > versions (due to publisher policy constraints) approximately 95% of
> > our content remains open access.
> Excellent data. Now what is needed is (1) the figures for 2002, 2003,
> 2004 and 2005 for QUT and (2) the corresponding figures for
> UQ (+L -M) and UTas (-L -M). (3) The annual *growth rates* for each.
> And of course all relative to annual output.
> Then we will have a good basis for evaluating the causal role
> of +M and +L (not the relative virtues of UQ, QUT and UTas,
> nor of their respective personnel!)
> (It is of no consequence at all that some of the contents are
> OA and some are IA: The hurdle for the moment is the
> depositing itself. Soon we will have an automatic email
> eprint-request feature in Eprints that will dissolve the
> functional difference between OA and IA and hasten the moment
> when authors make their texts OA straight off.)
> > I am happy with this figure as it is still much better than the global
> > open access average of 15% and, as 2004 publications are still being
> > deposited, I am sure we will be able to improve on this. P.S. The
> > global average is a nice anonymous benchmark that I am happy to thrash
> It's a *comparative* benchmark. Comparisons should also be
> done on larger samples grouped according to the factors M and
> L (and any others).
> Chrs, Stevan
Received on Fri Dec 02 2005 - 14:39:44 GMT
