Re: Self-Archiving JSTOR OCR'd Retrospective Publications

From: Matthew Cockerill <matt_at_BIOMEDCENTRAL.COM>
Date: Fri, 2 Jan 2004 19:38:09 +0000


You say:
> "If it is left to be done by the journals themselves, or an entity
> (like JSTOR) contracted by them, then those contents are probably
> doomed to sit behind toll-access barriers instead of being open-access.
> It would be far better if another entity could pick up the tab for the
> scanning and OCR, and then make the contents open-access."

Important to note that this is already happening. PubMed Central has
received funding to work towards scanning/OCRing the back issues of all
the journals that are archived in PubMed Central. They have already
completed the scanning of more than 1 million pages. All of this
content is accessible toll-free in perpetuity via PubMed Central.

More info on the scanning project here:

And more info about participation in PubMed Central here:

Essentially, I believe that the basic deal for publishers is that, if they
agree to archive their content with PubMed Central, in return NIH will
bear the cost of digitizing the back content. And that back content
will then become globally available to the scientific community (like
the rest of the content in PubMed Central.)

So I would strongly encourage all publishers of journals who are unable
to finance the digitization of their own backfile to consider
participation in PubMed Central.

Matt Cockerill

[NB - my only affiliation with PubMed Central is that the journals that
BioMed Central publishes are archived there.]

On 1 Jan 2004, at 22:00, Stevan Harnad wrote:

> On Wed, 31 Dec 2003, [identity deleted] wrote:
>> In your many contributions about open-access publishing, many
>> references are made to the annual publication of 2.5 million
>> scientific articles, but what is happening to the contents of
>> hard-copy journals of the past?
> You are, I assume, referring to the retrospective (or "legacy") contents
> of the 24,000 journals in which the 2.5 million articles appear? Author
> self-archiving can provide open access to the living-author portion
> of that literature. If universities and other research institutions
> -- as a component in their open-access provision policy for their own
> research output -- extend their self-archiving to their own legacy
> contents too, that will cover still more of the legacy literature. But
> the rest will require a generic scan/OCR initiative (like JSTOR). If it
> is left to be done by the journals themselves, or an entity (like JSTOR)
> contracted by them, then those contents are probably doomed to sit
> behind toll-access barriers instead of being open-access. It would be
> far better if another entity could pick up the tab for the scanning and
> OCR, and then make the contents open-access.
> If individual authors did their own bit and their institutions filled
> in with the rest of their own retrospective output, then it would be
> easy to pitch in, consortially (perhaps via SPARC) to cover the cost of
> the rest, so that the entire retrospective journal literature could be
> made open-access, pari passu with the current and future literature.
>> Some are being digitally archived by their publishers [e.g. [deleted]]
> Yes, but the output from that will alas be toll-access rather than
> open-access! If researchers and their institutions pitched in by
> providing just their own legacy literature, a lot more of it could be
> rendered open access.
>> but what is to happen to the many national journals
> Online access to the retrospective contents of any journal (national or
> otherwise) that has no retrospective scanning and access-provision
> agenda of its own must rely on individual and institutional
> self-archiving and either consortial subsidy (for open access) or
> JSTOR-style investment (for toll access).
>> The publisher of [a number of] national journals has concluded for
>> the time being that digital archiving of backfiles is too expensive
>> for immediate implementation.
> It would be good if national funding councils made it a policy to
> mandate open-access provision for all funded research output. This
> would encourage researchers and their institutions to self-archive
> their current research, with which a natural parallel step will be to
> self-archive their legacy research too. The distributed cost, per
> researcher's current and past output, is negligible. The outcome will
> be both access-provision and open-access provision for a goodly portion
> of the legacy literature -- and not just for nationally-funded research
> or research published in nationally-funded journals, but for all
> research output.
>> As a demonstration project of a cheaper alternative [we have
>> retrodigitised the contents of one journal]... The contents would be
>> readily accessible 24/7/365 if placed on our local computer intranet
>> or made available on the university's Web site. Copyright restrictions
>> currently prevent us from making this "quick and dirty" solution
>> available.
> An admirable project -- though it would obviously be far more useful if
> it were accessible not only to your university (and still better if it
> were accessible toll-free, i.e., open access)!
>> [The cost for all this is low] yet not one national funding agency
>> has been able to identify a program providing a source of funds for
>> our work.
> There is a kind of grim logic in that: If national funding agencies
> (in any country) were well-informed about the causal connection between
> research access and research impact, they would adopt a policy of
> open-access provision for all current and future research output. A
> natural extension of that policy would be open-access provision for
> retrospective content -- but not on a journal by journal basis (there
> is no common interest there) but on a research institution by
> institution basis. If a policy of that scale were in place, funding the
> remaining bits (of specific national journal content) that got away
> would be much lower, much broader-based (not just one journal but all
> of them) and much more readily justifiable (as a component in a
> coherent and systematic whole).
> But no such national open-access provision policy exists yet, so the
> prospect of paying for access-provision to the retrospective contents
> of just one national journal is not very compelling to government
> agencies.
> I would suggest the same to you: Don't think in terms of national
> journals. Think in terms of current and retrospective national research
> output. That way you'll get all that, plus the journal contents too!
>> We hope to approach [national and international research- and
>> archive-supporting agencies] to see if they are interested in funding
>> a major archiving project for the backfiles of national journals. I
>> would be interested in your comments on the points I have made.
> If I were advising the funders of such a proposal I would suggest it
> was worthy of funding, but only as the third component (3) in a much
> larger and more important and urgent project, the first two components
> of which are (1) open-access provision to all current national research
> output and (2) open-access provision to all retrospective institutional
> research output.
> (1) has no funding implications, just policy implications. (2) will
> require some funding. Then (3) could be the retrodigitisation of any
> retrospective national journal content that had not already been
> covered by (1) and (2).
> (If the archiving is OAI-compliant, it will be very easy to check
> exactly what is still missing, journal by journal, once (1) and (2)
> have been implemented.)
> But, in the scale of priorities, I would have to assign a far lower
> priority to just (3) alone, simply because it is so much less urgent
> and important than (1) and (2), as well as a logical and practical
> subset of them.
> Prior threads on this topic:
> "Self-Archiving JSTOR OCR'd Retrospective Publications"
> Stevan Harnad
> NOTE: A complete archive of the ongoing discussion of providing open
> access to the peer-reviewed research literature online is available at
> the American Scientist Open Access Forum (98 & 99 & 00 & 01 & 02 & 03):
> To join the Forum:
> Post discussion to:
> Archive:
> Unified Dual Open-Access-Provision Policy:
> BOAI-2 ("gold"): Publish your article in a suitable open-access
> journal whenever one exists.
> BOAI-1 ("green"): Otherwise, publish your article in a suitable
> toll-access journal and also self-archive it.

Received on Fri Jan 02 2004 - 19:38:09 GMT
