Re: Full-text search capability for Eprint Archives

From: Leslie Chan <chan_at_UTSC.UTORONTO.CA>
Date: Tue, 17 Jun 2003 17:16:35 -0400

on 6/15/03 7:33 PM, Stevan Harnad wrote:

> On Mon, 16 Jun 2003, Tamas Dombos wrote:
>> Has anyone tried truly integrating full text search into the eprints
>> software? By integration I mean a common search interface, with both
>> metadata AND full text search at once. (All the full text search I found
>> was two separate searches and of course the results list didn't use the
>> metadata, but some excerpt from the text.)
>> One (apparently) easy way to do this would be to use an already existing
>> search engine (like ht://dig), and pass the results (eprint ID is in the
>> URL of the documents) to the eprints software, and combine the id search
>> with the metadata search from the original search form. I know that some
>> information would be lost (relevance ordering of the full text search,
>> for example) but this seems to be a good start. Now this works in theory,
>> but I have no idea how difficult it is to implement this.
>> Any suggestions?
> Full-text search capability for Eprint Archives is an excellent
> idea, and I am sure it will become an essential feature.
> Htdig is already implemented by at least one site:
> but it is certainly *highly* desirable to have inverted full-text
> for all Eprint Archives.
> It needs some thought whether it makes more sense to invert full-text
> at each local archive, or at a harvester level (google-like).
> Chris?
> Stevan Harnad
> PS Here are some prior discussions of inverted full text
> in the Amsci Forum:
> and OAI-General:

For an example of an Eprint archive with full-text search capability, see
the server at the Indian Institute of Science:

Dr. T.B. Rajashekar of the National Centre for Science Information in India
integrated the Greenstone Digital Library software with the Eprints software
to enable
this feature. Try it out.
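
For comparison, the simpler approach Tamas outlines in the quoted message
(take the document URLs returned by an external engine such as ht://dig,
pull the eprint ID out of each URL, and intersect those IDs with the
metadata search results) might look roughly like the sketch below. This is
an illustration only, not how the Greenstone integration works. The URL
layout, the regular expression, and the function names are all assumptions
made for the example, and the calls to the full-text engine and to the
eprints metadata search are left out.

```python
import re

# Assumed (hypothetical) URL layout: the eprint ID is the path segment
# right after "/archive/", zero-padded, for example
#   http://eprints.example.org/archive/00000123/01/paper.pdf
EPRINT_ID_IN_URL = re.compile(r"/archive/0*(\d+)/")


def eprint_ids_from_urls(urls):
    """Return the eprint IDs found in the URLs, in first-seen order, no repeats."""
    ids = []
    seen = set()
    for url in urls:
        match = EPRINT_ID_IN_URL.search(url)
        if match is None:
            continue  # URL does not follow the assumed layout; skip it
        eprint_id = int(match.group(1))
        if eprint_id not in seen:
            seen.add(eprint_id)
            ids.append(eprint_id)
    return ids


def combine(fulltext_urls, metadata_ids):
    """Keep the metadata hits whose eprint ID also appears in the full-text hits.

    The result follows the order of the metadata results, so the relevance
    ordering of the full-text engine is lost, as noted in the quoted message.
    """
    fulltext_ids = set(eprint_ids_from_urls(fulltext_urls))
    return [eprint_id for eprint_id in metadata_ids if eprint_id in fulltext_ids]


if __name__ == "__main__":
    # Made-up full-text hits: two files belong to the same eprint (123).
    urls = [
        "http://eprints.example.org/archive/00000123/01/paper.pdf",
        "http://eprints.example.org/archive/00000045/01/thesis.pdf",
        "http://eprints.example.org/archive/00000123/02/appendix.pdf",
    ]
    # Made-up metadata hits, in the order the metadata search returned them.
    print(combine(urls, [45, 7, 123]))  # prints [45, 123]
```

Intersecting on IDs keeps the existing metadata search untouched. The cost
is that every full-text hit has to be fetched before the two result sets
can be combined.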

Leslie Chan
Received on Tue Jun 17 2003 - 22:16:35 BST
