Re: Distinguishing the Essentials from the Optional Add-Ons

From: Stevan Harnad <harnad_at_coglit.ecs.soton.ac.uk>
Date: Thu, 13 Sep 2001 21:03:03 +0100

                    ENHANCED INTEROPERABILITY IS A SUPPLEMENT TO,
                    NOT A SUBSTITUTE FOR FREE ACCESSIBILITY
 
                     Stevan Harnad

On Wed, 5 Sep 2001, Declan Butler wrote:

> 06 September 2001 Nature 413, 1 - 3 (2001) © Macmillan Publishers Ltd.
> http://www.nature.com/nature/debates/e-access/Articles/opinion2.html
>
> The future of the electronic scientific literature
>
> The Internet's transformation of scientific communication has only begun,
> but already much of its promise is within reach. The vision below may change
> in its detail, but experimentation and lack of dogmatism are undoubtedly the
> way forward.

Unfortunately, Declan's vision conflates the primary question of access
with secondary questions of enhancements. (We will return to the question
of "dogma" shortly.)

> "The Internet is easier to invent than to predict"...
> Much the same might be said of scientific publishing
> on the Internet, the history of which is littered with failed predictions.

Including, alas, Declan's predictions...

(My own predictions, since 1990, failed too, so I stopped predicting,
realizing that to predict was simply to try to second-guess human
nature, a stochastic nightmare. Instead, I have made one safely
time-independent projection (no date attached): that a free, online
refereed literature is optimal and inevitable. The rest is not about
prediction, but about practice, and preaching: How to get there, from
here, the fastest, surest way.)

    Harnad, S. (2001) For Whom the Gate Tolls? How and Why to Free the
    Refereed Research Literature Online Through Author/Institution
    Self-Archiving, Now.
    http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm

> Technological advance itself will, of course, bring dramatic changes -- and
> it is a safe bet that bright software minds will punctually overturn any
> vision. But it is becoming clear that developing common standards will be
> critical in determining both the speed and extent of progress towards a
> scientific web.

Correct. And what Declan goes on to point out, again correctly, is that
the OAI (Open Archives Initiative: http://www.openarchives.org ), which
had its roots in what is now the SAI (Self-Archiving
Initiative: http://www.eprints.org ), has since grown into something
much bigger, namely, a set of standards for making the entire digital
literature interoperable: http://www.arl.org/newsltr/217/mhp.html

The entire digital literature in principle includes all digital
objects: books, magazines, images, music, software. This has made the
OAI extremely important, and extremely wide in its remit. But we must
not forget that the focus of what concerns us here is the refereed
research literature (20,000 annual journalsful). That is only a small
portion of the OAI's digital remit, but it differs radically from the
rest of it in being all AUTHOR GIVE-AWAYS: written by researchers, for
researchers, solely for research impact and uptake, not for royalties
or fees to the author/researcher.

http://www.ecs.soton.ac.uk/~harnad/Tp/nature4.htm

I will point out below where this crucial feature, setting this
literature apart from the rest of what OAI covers, is overlooked
in Declan's Nature essay, with only confusion as a result.

> 'Standards' for managing electronic content are hardly a riveting topic for
> researchers. But they are key to a host of issues that affect scientists,
> such as searching, data mining, functionality and the creation of stable,
> long-term archives of research results. Moreover, just as the Internet and
> web owe their success to agreed network protocols on which others were able
> to build, common standards in science will provide a foundation for a
> diversity of publishing models and experiments and be a better alternative
> to 'one-size-fits-all' solutions.

True, but equally true for the much larger, for-fee, non-give-away
corpus, and for the much smaller (20K refereed journal) corpus that
is the real target of all of this concern (e.g., the concern of the
27K signers of the PLoS petition).

In focusing on OAI interoperability in general and forgetting the
specific giveaway/non-giveaway divide over which it interoperates,
Declan's essay blurs the deeper issues (repeatedly referring to
them only obliquely as "dogmatism").

> This explains why the Open Archives Initiative (OAI), one of many
> alternatives now being offered to scientists to disseminate their work, has
> now broadened its focus from e-prints to promoting common web standards for
> digital content.

What explains the widening of the OAI's remit is the importance of
making the entire digital corpus interoperable. But our concern here is
with the refereed research subset of that corpus, and there is much
more to say about that. (And the OAI is not a way to DISSEMINATE work,
but a way to TAG it to make it interoperable!)

> The reason is that some of the most promising emerging technologies will
> only realize their full promise if they are adopted in a consensual fashion
> by entire communities. At the level of the online scientific 'paper', one
> major change, for example, is a shift in format to make papers more
> computer-readable. Searches will become much more powerful; tables and
> figures will cease to be flat, lifeless objects, and instead will be able to
> be queried and manipulated by users, using suites of online visualization
> and data-analysis tools.

This is all true, but it is old news. The entire literature (book,
magazine, journal, etc.) is now hybrid: Most texts that were formerly
only on-paper now also have a digital, on-line incarnation. Yes,
standards and interoperability are essential for the usefulness of this
entire corpus, and they are on the way, but those generalities and
foregone conclusions are not what is at issue! They miss the point.

> This is being made possible by Extensible Mark-up Language (XML), which
> allows a document to be tagged with machine-readable 'metadata', in effect
> converting it into a sort of mini-database.

Yes, yes, we all applaud and embrace XML, but that is not what this is
all about either! (The concerns of the 27,000 signatories of the Public
Library of Science [PLoS] Petition, for example, are neither met nor
even addressed by singing the praises and promises of XML!)

> The possibilities for tagging are endless. But a major need now is for
> stakeholders to agree on common metadata standards for the basic structure
> of scientific papers. This would allow more specific queries to be made
> across large swathes of the literature. Indeed, what is above all hampering
> the usefulness of today's online journals, e-print archives and scientific
> digital libraries is the lack of means to federate these resources through
> unified interfaces.

Well now we have indeed arrived at a substantive question of fact:

        Is "what is above all hampering the usefulness of today's
        online journals" indeed their limited INTEROPERABILITY, or is
        it rather their limited ACCESSIBILITY?

(for that vast majority of researchers whose institutions cannot afford
to access more than a small portion of the 20K -- whose contents, I
hasten to remind Declan, all consist, without exception, of AUTHOR
GIVE-AWAYS, written for impact, not income)?

There are 20K refereed journals, and MOST of them are inaccessible to
MOST of their potential users because of the fee-based access barriers
(and I am talking about the First, not the Second or Third World):
http://fisher.lib.virginia.edu/cgi-local/arlbin/arl.cgi?task=setupstats

If their "usefulness" is their usefulness to the researchers who cannot
access them, then their usefulness is not just hampered but ZERO to the
disenfranchised majority with no access at all. By the same token,
their "usefulness" to their researcher/authors, and to research itself,
is diminished by the full amount of the lost potential impact and uptake
corresponding to that lost access. And no amount of interoperability
will remedy inaccessibility.

(Readers are referred to the "Let Them Eat Cake" discussion-thread in
this Forum.)
http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/1525.html

> The OAI has agreed metadata standards to facilitate improved
> searching across participating archives, which can therefore be
> queried by users as if they were one seamless site.

Ahem, queried by users whose institutions can afford to pay for the
full-text access where access is not free, which makes for a rather
gormless "seamlessness"...

> The OAI is attractive compared with centralized archives in that it
> allows any group to create an archive while, by agreeing common
> standards, they become part of a greater whole.

Correct. But that pertains to the digital corpus as a whole. For the
specific 20K in question, OAI compliance is just as compatible with a
(seamless) click-through oligopoly of distributed vendors' toll-booths
as with a global virtual collection of institutional research
give-aways of their self-archived refereed research. Let us not blur
the economics with the ecumenism.

> The idea is catching on: it is supported by the Digital Library
> Federation (DLF), a consortium of US libraries and agencies, including
> the Online Computer Library Center. CrossRef, a collaboration of 78
> learned society and commercial publishers, in which Nature's publishers
> are taking a leading role, is also actively developing common metadata
> standards that would allow better cross-searching of the 3 million
> articles they hold.

Better cross-searching for those researchers who are lucky enough to be
at institutions that can afford the access fees for this (small)
portion of the contents of the annual 20K refereed journals (whose
contents are all, without exception, author/institution give-away
research, let us not stop reminding ourselves...).

This is not to deny that these federations are all very important and
welcome. But in and of themselves, they are not a solution to the
fundamental underlying problem, which is that the accessibility of the
giveaway subset of the digital corpus, the refereed research in the 20K
refereed journals, is still fee-based, be it ever so "seamlessly"
interoperable for those who can pay! (By the way, we are talking here
about the accessibility to the full-text: Access to just the metadata
[title/abstracts] is merely a teaser, like G.B. Shaw's hell, to the
vast majority who lack full-text access to the paper itself.)

> Minimal options
> As metadata are expensive to create - it is estimated that tagging papers
> with even minimal metadata can add as much as 40% to costs - OAI is
> developing its core metadata as a lowest common denominator to avoid putting
> an excessive burden on those who wish to take part. But even these skimpy
> metadata already allow one to improve retrieval.

I don't think Declan has quite understood the rationale behind OAI's
philosophy of minimalism (it is in order to encourage maximalism in
participation!). And it is a complete mystery to me where the 40%
figure came from! (OAI mark-up time per paper is more like .0004%,
relative to everything else that goes into the preparation of the
paper, even if we leave out the data-gathering and analysis and start
the clock only at write-up-time!)

> Minimal metadata will suffice for much of the literature. But there will
> increasingly be sophisticated and novel forms of publications built around
> highly organized communities working off large, shared data sets. These hubs
> will stand out by their large investment in rich metadata and sophisticated
> databases. The future electronic landscape should see such high added-value
> hubs evolving as overlays to vast but largely automated literature archives
> and databases.

All true, but again all orthogonal to the fundamental problem -- of
freeing the access to the 20K -- which neither depends on nor is
conferred by (or substituted for by) any of these enhancements.

But, yes, sophisticated interoperable data-archiving and co-processing
will be a welcome and important SUPPLEMENT to the free online
accessibility of the refereed research corpus (the 20K). The trouble is
that the way Declan describes it makes it look as if that primary
objective I keep refocusing us on were somehow contingent on any of
this other stuff. There's no connection whatsoever, and it blurs things
to imply that there is, or that it is somehow "dogmatic" to keep
reminding us that there isn't! Declan is only talking about the
add-ons, whereas the real issue is the essentials (free access to the
20K).

> In such an early stage of development, it is essential to avoid dogmatic
> solutions.

What is meant by this repeated allusion to "dogma," one wonders? Is
it dogmatic to keep pointing out that interoperability is not the same
as accessibility, nor a substitute for it?

All things are negotiable but one principle: That, now that it is
possible, access to the refereed research corpus, the 20K, an
author/institution give-away, must become FREE online. Not fancier: free.
Not cheaper: free. Not free 6-12 months after publication: free.

That isn't dogmatism; that is the dictionary meaning of FREE.

Unless someone can adduce a reason why free access to all refereed
research would NOT be beneficial to researchers, their institutions,
and research itself -- why it is preferable to restrict access to the
20K to those researchers who are at institutions that can afford to pay
for it, now that that is clearly no longer necessary, and the alternative
is clearly within reach?

http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/399we152.htm

> Not all papers will warrant the costs of marking up with
> metadata, nor will much of the grey literature, such as conference
> proceedings or the large internal documentation of government agencies.

I have no idea what Declan means here, or what he is basing this on. If
a (refereed, published) paper warrants the cost and effort of being
written at all, it's worth the few minutes of extra effort it takes to
tag its title, abstract, etc. for an OAI compliant Eprint Archive.
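To make the "few minutes" concrete, here is a minimal sketch of the kind of record an author would be filling in: a simple Dublin Core description of the paper, serialized as XML. It assumes the `oai_dc` namespaces of OAI-PMH version 2.0 (a later revision of the protocol than the one current when this was written), omits the schema-location attributes a real archive would add, and the function name, paper details and URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces of the "oai_dc" simple Dublin Core format (assumed: OAI-PMH 2.0).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def minimal_dc_record(title, creators, date, abstract, identifier):
    """Build a minimal Dublin Core record from the fields an author types in."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    for name in creators:  # one dc:creator element per author
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = name
    ET.SubElement(root, f"{{{DC_NS}}}date").text = date
    ET.SubElement(root, f"{{{DC_NS}}}description").text = abstract
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = identifier
    return ET.tostring(root, encoding="unicode")


# Hypothetical example paper and archive URL.
record = minimal_dc_record(
    "A Hypothetical Refereed Paper",
    ["Author, A.", "Author, B."],
    "2001-09-13",
    "One paragraph summarizing the findings.",
    "http://eprints.example.ac.uk/1/",
)
print(record)
```

The sketch fills in five Dublin Core elements; every element in simple Dublin Core is optional, which is what keeps the entry cost for authors low.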

(On the question of coding the whole paper in XML -- an open-ended
undertaking -- I plead "nolo contendere" because it simply has nothing
to do with the central question: freeing access to the 20K online.)

Nolo contendere also as regards the unrefereed, unpublished
literature.

So it is unclear what the point here is.

> Many high-cost, low-circulation print journals could be replaced by
> digital libraries. Overheads would be kept low, and the economics
> argues that the cheapest means of handling the bulk of the literature
> may be automated digital libraries. Tags automatically generated from
> machine analysis of the text, for example, might minimize the quantity
> of manual metadata needed.

This conflation has all the virtues of swapping chalk and cheese! Are
digital libraries going to also become implementers of peer review?
Why?

This comes of mixing not only the give-away and non-give-away
literature, but the refereed and nonrefereed literature. The
conclusions it underwrites would have all the utility of a chalk/cheese
souffle...

> Or take ResearchIndex, software produced by the computer company NEC,
> which builds digital libraries with little human intervention. It
> gathers scientific papers from around the web and, using simple rules
> based on document formatting, can extract the title, abstract, author
> and references.

Chalk and cheese again: ResearchIndex http://citeseer.nj.nec.com/cs?/
is wonderful, but it is not REPLACING peer-reviewed journals either,
any more than OAI or Digital Libraries are; it is merely REPACKAGING
those (500K) journal papers (or their pre-refereeing precursors) that
their authors have had the good sense to self-archive online!

ResearchIndex is in fact a kind of ALTERNATIVE to the OAI
approach. OAI is predicated on minimal author self-tagging with
metadata, and then author self-archiving in an OAI compliant Eprint
Archive, whose contents can then be harvested by OAI services such as
cross-archive searchers http://arc.cs.odu.edu/.
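The harvesting step can be sketched as follows: a service sends a `ListRecords` request for `oai_dc` metadata to an archive's OAI base URL and pulls the identifiers, titles and authors out of the XML it gets back. This is a minimal sketch assuming OAI-PMH 2.0 element names; it parses a canned, trimmed response instead of making a network call, ignores the `resumptionToken` paging a real harvester must follow, and the archive URL and records are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A trimmed, hypothetical ListRecords response from an OAI-compliant archive.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:eprints.example.ac.uk:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First Hypothetical Paper</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:eprints.example.ac.uk:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second Hypothetical Paper</dc:title>
          <dc:creator>Author, B.</dc:creator>
          <dc:creator>Author, C.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url):
    """Request every record from the archive in simple Dublin Core."""
    return f"{base_url}?verb=ListRecords&metadataPrefix=oai_dc"


def parse_records(xml_text):
    """Extract identifier, title and creators from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        dc_path = "oai:metadata/oai_dc:dc/"
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(dc_path + "dc:title", namespaces=NS),
            "creators": [c.text for c in rec.iterfind(dc_path + "dc:creator", NS)],
        })
    return records


print(list_records_url("http://eprints.example.ac.uk/perl/oai2"))
for r in parse_records(SAMPLE_RESPONSE):
    print(r["id"], "|", r["title"], "|", "; ".join(r["creators"]))
```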

ResearchIndex, in contrast, directly "harvests" what there is on the
web, such as it is, whether or not it is perspicuously tagged. As long
as a harvestable file "looks something like" a research article (in
computer science), ResearchIndex "extracts" the key information (title,
author, abstracts, references, etc.) and makes it accessible in one
global virtual archive, with many terrific enhancements, especially
citation navigation and analysis.
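The post does not describe ResearchIndex's actual extraction rules, so the following is only a minimal sketch of the general idea, using invented heuristics: treat the first non-blank line as the title, the next as the author list, the paragraph under an "Abstract" heading as the abstract, and the lines under a "References" heading as the reference list. The function name and sample text are hypothetical, and real papers would defeat rules this simple far more often than a working service could tolerate.

```python
import re


def extract_fields(text):
    """Guess title, authors, abstract and references from a paper's plain text.

    Hypothetical layout rules, for illustration only.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    nonblank = [ln for ln in lines if ln]
    title = nonblank[0] if nonblank else ""
    authors = []
    if len(nonblank) > 1:
        # Split "A. Author, B. Author and C. Author" into separate names.
        authors = [a.strip() for a in re.split(r",| and ", nonblank[1]) if a.strip()]

    abstract, references, section = [], [], None
    for ln in lines:
        heading = ln.lower()
        if heading == "abstract":
            section = "abstract"
        elif heading in ("references", "bibliography"):
            section = "references"
        elif section == "abstract":
            if ln:
                abstract.append(ln)
            elif abstract:
                section = None  # a blank line after the text ends the abstract
        elif section == "references" and ln:
            references.append(ln)
    return {"title": title, "authors": authors,
            "abstract": " ".join(abstract), "references": references}


SAMPLE = """A Hypothetical Paper on Archiving

A. Author, B. Author and C. Author

Abstract
We describe a hypothetical archive.
It has two lines of abstract.

1 Introduction
Body text here.

References
[1] First cited work.
[2] Second cited work.
"""

print(extract_fields(SAMPLE))
```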

ResearchIndex has successfully processed over 500K computer science
papers this way. That's actually more papers than there are in the
Physics Archive [150K] http://arXiv.org. But that's still an awkward,
nonoptimal way of doing it (and the interface looks as horrible as the
capability itself is wonderful! [one could say the same of the Physics
arXiv!]).

What's true, though, is that the 500K ResearchIndex virtual archive is
ITSELF now ripe for harvesting -- into an OAI-compliant Eprint
Archive! (And that's what we soon hope to do -- or to persuade someone
else to do, using the eprints.org software and some automatic
archive-transfer software.)

> It interprets the latter, and can conduct automatic citation analyses for
> all the papers indexed. Such digital libraries will also provide new tools,
> for example to generate new metrics based on user behaviour, which will
> complement and even surpass citation rankings and impact factors.

Yes, and such citation analyses and metrics are also being done on the
OAI side in the OpCit project (in which ResearchIndex are
collaborators): http://opcit.eprints.org/

But both these efforts are predicated on free access to the archived
papers in question (and not just to their metadata). (Besides, for the
time being, even a paper's reference list is part of the full-text
rather than the metadata [as it should preferably be] in most cases.)
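Once reference lists are accessible, the core of an automatic citation count is simple. The minimal sketch below assumes each archived paper has already been reduced to a hypothetical ID plus the list of IDs it cites (the hard part, as the preceding paragraph points out, is that those lists usually sit in the full text). It is not the OpCit or ResearchIndex method, only a basic tally of the kind citation analysis begins with.

```python
from collections import Counter


def citation_counts(papers):
    """Count how many other archived papers cite each paper.

    `papers` maps a paper ID to the IDs in its reference list. References
    to papers outside the archive, self-citations and repeated references
    within one paper are not counted.
    """
    counts = Counter({pid: 0 for pid in papers})
    for pid, refs in papers.items():
        for ref in set(refs):
            if ref in papers and ref != pid:
                counts[ref] += 1
    return counts


# Hypothetical four-paper archive; "x9" is a reference to an outside paper.
archive = {
    "p1": ["p2", "p3"],
    "p2": ["p3", "x9"],
    "p3": [],
    "p4": ["p3", "p1", "p3"],
}
counts = citation_counts(archive)
print(counts.most_common(1))  # [('p3', 3)]
```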

So these "digital libraries" are not alternatives to anything, insofar
as refereed journal publication (the 20K) is concerned (unless the
refereed papers are freely archived to begin with, and, a fortiori,
unless they are already REFEREED!). Apples and oranges, chalk and
cheese.

> At the other end of the spectrum, specialized communities organized around
> shared data sets will produce highly sophisticated electronic
> 'publications', making it much more arduous for authors to submit
> information because of the amount and detail they will be required to enter
> in machine-readable form. Take the Alliance for Cellular Signaling (AfCS), a
> 10-year, multimillion-dollar, multidisciplinary project run by a consortium
> of 20 US institutions. It is taking a systems view of proteins involved in
> signalling, and integrating large amounts of data into models that will
> piece together how cellular signalling functions as a whole in the cell.
> Here, authors would be required to input information, for example, on the
> protocols, tissues, cell types, specific concentration factors used and the
> experimental outcomes. Inputs would be chosen from menus of strictly defined
> terms and ranges, corresponding to predefined knowledge representations and
> vocabularies for cell signalling.

May I suggest that we leave this more complex special case until we
have agreed on a solution for the 20K simple cases still waiting patiently
to be freed?
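For readers wondering what "menus of strictly defined terms and ranges" amounts to in practice, here is a minimal sketch: a submission is checked field by field against a fixed vocabulary and a permitted numeric range before it is accepted. The field names, terms and range are invented for illustration and are not taken from the AfCS.

```python
# Hypothetical controlled vocabulary and permitted ranges for a submission.
VOCABULARY = {
    "cell_type": {"B cell", "cardiac myocyte", "macrophage"},
    "protocol": {"ligand screen", "knockdown", "time course"},
}
RANGES = {"concentration_nM": (0.0, 10_000.0)}


def validate_submission(entry):
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    for field, allowed in VOCABULARY.items():
        value = entry.get(field)
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not a defined term")
    for field, (lo, hi) in RANGES.items():
        value = entry.get(field)
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            problems.append(f"{field}: {value!r} is outside {lo}-{hi}")
    return problems


good = {"cell_type": "macrophage", "protocol": "time course", "concentration_nM": 50}
bad = {"cell_type": "neuron", "protocol": "time course", "concentration_nM": -1}
print(validate_submission(good))
print(validate_submission(bad))
```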

> The idea is that, rather than simply producing their own data, communities
> instead create a vast, shared pool of well-structured information, and
> benefit by being able to make much more powerful queries, simulations and
> data mining. A series of 'molecule pages' would also pull together virtually
> all published data and literature about individual molecules in relation to
> signalling.

Great stuff. But research as a whole is more concerned about access to
the no-frills 20K...

> Indeed, the high-throughput nature of much of modern research means that,
> increasingly, important results can be fully expressed only in electronic
> rather than print format. Systems biology in particular is driving research
> that seeks to describe the function of whole pathways and networks of genes
> and proteins, and to cover scales ranging from atoms and molecules to
> organisms. Increasingly, the literature and biological databases will
> converge to create new forms of publications. Other disciplines stand to
> benefit, too.

And I'll bet that the researchers contributing to these more complex
forms of publication will be just as eager to ensure that access to
their give-away contributions is not blocked by any needless toll-booths
as are all the other authors of the papers in the 20K simpler refereed
research journals...

> Helping machines make sense of science on the web
> Many communities, including the AfCS, are building ontologies to underpin
> such schemes. Ontologies mean different things to different people, but they
> are in effect representations that attempt to hard-code human knowledge
> about a topic and the intrinsic relationships in ways that computers can
> use. The microarray community has been very active in this area. The
> Microarray Gene Expression Database group has coordinated global standards;
> as a result, users will be able to query vast shared data sets to find all
> experiments that use a specified type of biological material, test the
> effects of a specified treatment or measure the expression of a specified
> gene, and much more.

The big question will be: Are the creators of these data and these
standards and services going to want to charge access fees? If so, more
power to them. They are making their livings as data- or
service-providers. But if they are instead researchers, making their
livings from doing and reporting their research, and particularly from
the IMPACT of that research on other researchers and on research
itself, then they will not want to constrain that impact one bit by
access-blocking tolls.

> Ontologies can also be used to tag literature automatically, and will be
> particularly useful for grey literature and archival material for which
> manual tagging was not justified. Papers tagged automatically with concepts
> can be matched, grouped into topic maps and mined. By breaking down
> terminological barriers between disciplines, this should also enhance
> interdisciplinary understanding and even serendipity. Nature is actively
> investigating such possibilities.

All good stuff, and definitely on the way, but these enhancements are
not what all the fuss is about...
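The automatic concept tagging described in the quoted paragraph can be sketched with a toy vocabulary: each concept lists the terms different communities use for it, papers are tagged with every concept whose terms appear in their text, and papers sharing a concept are grouped together. The three-concept "ontology", the sample texts and the function names are hypothetical, and plain keyword matching is a much cruder stand-in for what a real ontology supports.

```python
import re
from collections import defaultdict

# A tiny, hypothetical ontology: concept -> terms that signal it in text.
ONTOLOGY = {
    "gene expression": ["gene expression", "transcript level", "mrna abundance"],
    "signal transduction": ["signal transduction", "cell signalling", "cell signaling"],
    "microarray": ["microarray", "gene chip"],
}


def tag_concepts(text, ontology=ONTOLOGY):
    """Return the concepts whose terms occur as whole words in the text."""
    low = text.lower()
    return {
        concept
        for concept, terms in ontology.items()
        if any(re.search(r"\b" + re.escape(t) + r"\b", low) for t in terms)
    }


def topic_map(papers, ontology=ONTOLOGY):
    """Group paper IDs by the concepts automatically tagged in their text."""
    groups = defaultdict(set)
    for pid, text in papers.items():
        for concept in tag_concepts(text, ontology):
            groups[concept].add(pid)
    return dict(groups)


papers = {
    "p1": "We measured mRNA abundance with a gene chip.",
    "p2": "A model of cell signaling in yeast.",
    "p3": "Microarray study of transcript level changes.",
}
print(topic_map(papers))
```

Papers p1 and p3 use different words for the same two concepts, yet end up in the same groups; that is the sense in which concept tagging can cut across terminological barriers.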

> The advent of structured papers that are increasingly held in literature
> databases blurs further the distinction between the scientific paper and
> entries in biological databases. Already, entries in the biological
> databases are often hyperlinked to relevant articles in the literature and
> vice versa, and CrossRef is developing standards for such linking. As text
> becomes more structured, it will be possible to increase the sophistication
> of both linking, data manipulation and retrieval.

There's no blur when it comes to the 20K. So what are we supposed to
conclude (undogmatically) about the 20K from the future-casting based
on these anomalies? It is not even clear whether we are talking here
about refereed publications or merely (accredited) contributions to a
data-base.

> Biological databases and journals have evolved relatively independently of
> one another. Database annotations lack the prestige of published papers;
> indeed, their value is largely ignored by citation metrics, and their upkeep
> is often regarded as a thankless task. Database curation has consequently
> lacked the quality control typical of good journals. The convergence between
> databases and the literature means that database annotators and curators
> will increasingly perform the functions of journal editors and reviewers,
> while publishers will develop sophisticated database platforms and tools.

Yes, database contributions will increase, and should receive due
credit. But the problem at hand is the unambiguous, present 20K, not a
few one-off cases and a possible future increase in them.

Moreover, the more likely trajectory for the convergence between
refereed papers and data is that more data will be included or linked
to research papers (hence will itself have to be refereed). That still
leaves the accessibility status of the give-away research (and
accompanying data) to be determined...

> New ways in
> Database- and metadata-driven systems will drive interfaces to publications
> from simple keyword search models to ones that reflect the structure of
> biological information. Visualization tools of chromosomal location,
> biochemical pathways and structural interactions may become the obvious
> portals to the wider literature, given that there are far fewer protein
> structures or gene sequences than there are articles about them. As Mark
> Gerstein, a bioinformaticist at Yale University, points out: "One might 'fly
> through' a large three-dimensional molecular structure, such as the
> ribosome, where various surface patches would be linked to publications
> describing associated chemical binding studies."

And will following those links to the refereed research papers (still
author give-aways) be free or fee-based? That's still the question, and
it is unchanged no matter how complex a superstructure of enhancements
one builds around it.

http://www.ecs.soton.ac.uk/~harnad/Tp/nature4.htm#B1

> Future electronic literature will therefore be much more heterogeneous than
> the current journal system, and dogmatic solutions should therefore be
> resisted.

One must ask again: What is the dogma? That giveaway refereed research,
done and refereed by and for researchers and research, should be freely
accessible? That sounds a lot more like common sense than dogma (now
that it is possible, with Gutenberg costs gone and only the
implementation of the refereeing to be paid for).

> It is significant and sensible that both CrossRef and OAI have
> made key strategic choices favouring openness and adaptability. They seek
> to federate distributed actors rather than to create centralized structures.

This is again a non sequitur. CrossRef and OAI are about linking and
interoperability. By definition, the entities involved are distributed
and would benefit from linking and interoperability. But the
substantive question (about which CrossRef and OAI are rightly neutral)
is whether the linked entities will or will not be surrounded by a
fee-based firewall in the special case of the 20K. This rosy
all-things-linked picture does not address that question, yet that is
the fundamental question for present purposes, not all these frills.

> They also make their work independent of the type of content, which makes it
> flexible enough to incorporate and link seamlessly not just papers but news,
> books and other media.

News links may or may not be toll-gated, book links almost certainly
will, and media mostly will be. But the big question is about links to
the rest of the 20K. No quantity of link enhancement will offset the
fact that the links to that 20K must at last be freed, now that
they can be.

> Crucially, both OAI and CrossRef have also decided to build systems
> independent of the economic mechanisms surrounding that content.

As they should. Linking is one thing, whether or not the link is
toll-gated is another.

But that is precisely why the enhancements are orthogonal to the
essentials.

> Many publishers, in particular some learned societies, may be willing
> to make their content free, perhaps after a certain delay.

All very welcome. But that's only some of the 20K (and some of it only
after a delay). Is it dogmatic, or merely dogged, to point out that
author/institution self-archiving (in OAI-compliant, interoperable,
Eprint Archives) will free all of the 20K with no delay?

http://www.ecs.soton.ac.uk/~harnad/Tp/science2.htm

> Others are exploring business models where authors or sponsors pay,
> which would allow free access to articles on publication.

Authors pay (out of what money? their pockets?)? Sponsors pay (who?
why?)? And for what? For everyone else's access tolls to their work?
Across the 20K refereed journals (at current subscription rates) that
averages $2000 per paper! It's obscene to think of asking an
author/researcher to pay that, and a fantasy to imagine that every
author can find a "sponsor" to pay it for him. (As long as we are
dreaming, why not simply find a sponsor to pay each institution's
subscription tolls for all 20K? It works out to the same amount of
funny-money.)

No, it's somewhat more complicated than this, yet much simpler than it
seems: Rather than 500K authors each reaching into their pockets to pay
an average of $2000 per paper to free the access to each of their 4
annual papers in the 20K refereed journals, those authors can instead
self-archive those same papers immediately in their OAI interoperable
institutional Eprint Archives, and if ever the journal publishers'
Subscription/License/Pay-per-view [S/L/P] revenue drops to where the
true cost of peer reviewing of those articles can no longer be covered
out of the annual S/L/P revenues, then the authors' own institutions
can cover those costs for their author/researchers out of their annual
windfall S/L/P savings (peer review only costs 10-30% or $200-500
per paper).

http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/399we152.htm
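The figures in the last two paragraphs can be checked with a few lines of arithmetic. This sketch uses only numbers stated in the post (500K authors, 4 papers each per year, 20K journals, an average of $2000 in access tolls per paper, and peer review at 10-30% of that cost) and derives the rest; the variable names are illustrative.

```python
# Figures stated in the post.
AUTHORS = 500_000
PAPERS_PER_AUTHOR = 4            # refereed papers per author per year
JOURNALS = 20_000                # refereed journals
TOLL_PER_PAPER = 2_000           # average access tolls per paper, USD
PEER_REVIEW_PERCENT = (10, 30)   # peer review as a share of that average

# Derived quantities.
papers = AUTHORS * PAPERS_PER_AUTHOR                  # papers per year
papers_per_journal = papers // JOURNALS
total_tolls = papers * TOLL_PER_PAPER                 # USD per year
review_per_paper = [TOLL_PER_PAPER * p // 100 for p in PEER_REVIEW_PERCENT]
review_total = [cost * papers for cost in review_per_paper]

print(f"{papers:,} papers/year, about {papers_per_journal} per journal")
print(f"tolls: ${total_tolls:,}/year")
print(f"peer review: ${review_per_paper[0]}-{review_per_paper[1]} per paper, "
      f"${review_total[0]:,}-{review_total[1]:,}/year")
```

On these numbers the annual toll bill comes to about $4 billion, while peer review at 10-30% of the per-paper average comes to roughly $0.4-1.2 billion a year, which is the amount that, on this proposal, institutions would cover out of their S/L/P savings.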

> The open technological frameworks also mean that
> particular communities, such as scientists with specific metadata needs for
> their discipline, are free to build in more complex data structures; the
> higher overheads incurred may require charging for added-value services.

"Added-value" is fine, but it must no longer be used as a pretext for
holding the refereed paper hostage. Let add-ons be charged for as
options, as we do with everything else. The only essential is peer
review, and the formula for covering that was just described above.

But a word of caution to anyone contemplating making a living out of
selling add-on enhancements to the refereed research literature (other
than peer review itself): Consider that whatever you can do, the
author, or his institution, or OAI harvesters providing OAI services
(e.g. http://arc.cs.odu.edu/ or
http://cite-base.ecs.soton.ac.uk/cgi-bin/search ) may well be willing
and able to do just as well or better, for free. That's one of the
risks of the on-line age.

> Neutrality
> The OAI and CrossRef strategies therefore differ fundamentally from more
> centralized systems proposed by PubMed Central (PMC), operated by the US
> National Library of Medicine, and E-Biosci, being developed by the European
> Molecular Biology Organization.

Instead of mixing apples and oranges here, Declan is mixing apples and
fruit! Both central archives like PubMed Central, E-Biosci, the Physics
ArXiv, and CogPrints, and the growing number of distributed
institutional archives (http://www.eprints.org/users.php and
http://oaisrv.nsdl.cornell.edu/Register/BrowseSites.pl ) can, if they
wish, be OAI-compliant or not. They can also be CrossRef accessible or
not.

So it's not central archiving vs. OAI! Besides, the whole point of OAI
is to make archives interoperable. This means that all archives,
whether formerly "central" or "distributed," now effectively become
distributed archives, integrated by the common OAI "glue" that allows
their metadata (and full-texts, if they are free) to be harvested into
global meta-archives or "virtual archives." In the end, these global
virtual archives will be the only "central" ones, and there may well
be many, often re-presenting the same basic contents but in a different
way, perhaps with different add-ons (some for-free, some for-fee: it
doesn't matter, as long as the basic refereed corpus of 20K is free).
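The "virtual archive" idea can be illustrated with a minimal sketch: records harvested from several archives, central or institutional, are merged under a shared key, the metadata from every source are kept, and a full-text link is recorded only where some archive actually offers the text free. The record fields and the assumption that copies of the same paper share an identical key are hypothetical simplifications; real de-duplication across archives is considerably harder.

```python
def merge_into_virtual_archive(*harvests):
    """Merge metadata harvested from several archives into one index.

    Each harvest is a list of dicts with hypothetical keys:
      'key'          - identifier shared by all copies of the same paper
      'title'        - paper title
      'archive'      - name of the archive the record came from
      'fulltext_url' - free full-text link, or None if the text is not free there
    """
    index = {}
    for harvest in harvests:
        for rec in harvest:
            entry = index.setdefault(
                rec["key"],
                {"title": rec["title"], "archives": [], "free_fulltext": None},
            )
            entry["archives"].append(rec["archive"])
            # Keep the first free full-text link found for this paper.
            if entry["free_fulltext"] is None and rec["fulltext_url"]:
                entry["free_fulltext"] = rec["fulltext_url"]
    return index


central = [
    {"key": "paper-a", "title": "Paper A", "archive": "central", "fulltext_url": None},
]
institutional = [
    {"key": "paper-a", "title": "Paper A", "archive": "institution",
     "fulltext_url": "http://eprints.example.ac.uk/1/"},
    {"key": "paper-b", "title": "Paper B", "archive": "institution", "fulltext_url": None},
]
virtual = merge_into_virtual_archive(central, institutional)
for key, entry in virtual.items():
    print(key, entry["archives"], entry["free_fulltext"])
```

Paper B is fully searchable in the merged index, but a reader who finds it still has no free text to follow: interoperability without accessibility.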

> But PMC and E-Biosci highlight the urgent need to index the full text of
> papers and their metadata and not just abstracts, as is the practice of
> PubMed and other aggregators. Services that require publishers to deposit
> full text only for indexing and improving search are useful.

And those based on authors self-archiving their full-texts for all
purposes (apart from plagiarism!) for free will be more useful still.

> Unfortunately, PMC, unlike E-Biosci, confounds this primarily technological
> issue with an economic one, by requiring that all text be made available
> free after, at most, one year. It is regrettable that PMC has not in the
> first instance sought full-text indexing itself as a goal, as this in itself
> would be an immediate boon to researchers. It would also probably have been
> more successful in attracting publishers.

Ah me! With no supporting argument or evidence whatever, Declan is here
preaching the virtues of continuing to hold the essentials (the peer
reviewed full-texts) hostage to the add-ons, and berating PMC precisely
for declining to do this! (Is it dogmatic to keep pointing this out?)

> The reality is that all of those involved in scientific publishing are in a
> period of intense experimentation, the outcome of which is difficult to
> predict. Getting there will require novel forms of collaboration between
> publishers, databases, digital libraries and other stakeholders. It would be
> unwise to put all of one's eggs in the basket of any one economic or
> technological 'solution'. Diversity is the best bet.

Indeed, let 1000 flowers bloom -- and among them the authors' own
home-grown versions of their 2M annual give-away papers, published in
the 20K refereed journals, but also self-archived in their
institutional, OAI-compliant Eprint Archives. That both frees the
refereed literature and allows the market to decide what add-on
enhancements it still chooses to pay for.


--------------------------------------------------------------------
Stevan Harnad harnad_at_cogsci.soton.ac.uk
Professor of Cognitive Science harnad_at_princeton.edu
Department of Electronics and phone: +44 23-80 592-582
             Computer Science fax: +44 23-80 592-865
University of Southampton http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM


NOTE: A complete archive of the ongoing discussion of providing free
access to the refereed journal literature online is available at the
American Scientist September Forum (98 & 99 & 00 & 01):

    http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
or
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

You may join the list at the site above.

Discussion can be posted to:

    american-scientist-open-access-forum_at_amsci.org
Received on Thu Sep 13 2001 - 20:58:18 BST
