Re: Interoperability - subject classification/terminology

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Sat, 8 Mar 2003 13:15:21 +0000

On Fri, 7 Mar 2003, David Goodman wrote:

> I agree that a
> decentralized archive, as distinguished from arXiv, does not need
> much in the way of classification

Not even arXiv needs it: those are physics articles, not books. They don't
need Library of Congress (LoC) classification, only full-text boolean search,
with scientometric ranking along the lines of:
http://citebase.eprints.org/cgi-bin/search
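For readers unfamiliar with the combination, here is a minimal Python sketch of a conjunctive (AND-only) boolean full-text search whose hits are ranked by citation count. It assumes a hypothetical toy corpus (ARTICLES) and a hypothetical list of citation links (CITATIONS). The raw incoming-citation count is only a simple stand-in for scientometric ranking and is not meant to reproduce citebase's actual ranking.

```python
import re
from collections import Counter

# Hypothetical toy corpus: article id -> full text.
ARTICLES = {
    "a1": "Quark confinement in lattice gauge theory",
    "a2": "Lattice gauge theory simulations of quark matter",
    "a3": "Dark matter halos and galaxy rotation curves",
}

# Hypothetical citation links: (citing article id, cited article id).
CITATIONS = [("a2", "a1"), ("a3", "a1"), ("a3", "a2")]


def terms(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_and(query, articles):
    """Return ids of articles whose full text contains every query term."""
    wanted = terms(query)
    return [aid for aid, text in articles.items() if wanted <= terms(text)]


def rank_by_citations(ids, citations):
    """Sort ids by incoming-citation count (highest first), ties by id."""
    counts = Counter(cited for _, cited in citations)
    return sorted(ids, key=lambda aid: (-counts[aid], aid))


def search(query, articles=ARTICLES, citations=CITATIONS):
    """Boolean AND full-text match, then citation-based ranking of the hits."""
    return rank_by_citations(boolean_and(query, articles), citations)


if __name__ == "__main__":
    print(search("lattice quark"))
```

Here search("lattice quark") matches a1 and a2 and lists a1 first, because a1 has two incoming citations to a2's one. No subject classification is consulted at any step.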

Moreover, if ever a useful taxonomy is generated for the refereed research
article literature, it will be one that is scientometrically (i.e.,
computationally) generated *from* such a digital database, not an
old-style a-priori human classification.

> I suspect the practical access for the immediate future will be
> by known author, supplemented by the citation network.

and boolean full-text search.

> On the other hand, to rely on OAI harvesters and automated search tools
> for accessing the union of all such collections is premature.

Yes, but not for the reason I think you have in mind! It is premature
because the union of all such collections is still so empty! As it
grows, the associated tools will grow (they are the easy part!).

> I am not certain whether it is within human capabilities to design
> this--certainly none of the extensive efforts at automatic document
> retrieval are really adequate--it's a problem of the same magnitude
> as AI in general.

That may be true for the corpus of the human written word as a whole. But
not for the 20,000 refereed research journals, classified, as a first cut,
by discipline and journal name. The rest most definitely *is* within human
capabilities to design (along the lines mentioned above).

> I would love to see this solved, of course, because the
> known manual methods, as they are applied in libraries and
> indexing services, are almost equally unsatisfactory.

In the case of the refereed journal corpus (the only corpus at issue
here), they are not only unsatisfactory, but completely unnecessary.
Let us not conflate this very special (and small and tractable) part
with the (possibly intractable) whole.

Stevan Harnad
Received on Sat Mar 08 2003 - 13:15:21 GMT
