An OAI Gateway Service for Web Crawlers: DP9

From: Xiaoming Liu <liu_x_at_cs.odu.edu>
Date: Wed, 21 Nov 2001 13:02:57 +0000

Hi all,

A new OAI service provider for web crawlers, DP9, is now available. The idea
came from a discussion on this list: how can OAI archives be indexed by
Google?

DP9 is a gateway service that enables an Internet search engine to index an
OAI data provider. It lets a web crawler retrieve the records in an OAI
collection by executing OAI requests and translating the XML responses into
HTML on the crawler's behalf.
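To make the translation step concrete, here is a minimal Python sketch (DP9 itself uses JSP and XSLT, so this is not its code) that turns an OAI GetRecord response carrying oai_dc metadata into a plain HTML page a crawler can index. It assumes the OAI-PMH 2.0 and oai_dc namespace URIs; the function name getrecord_to_html is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Namespace URIs assumed from the OAI-PMH 2.0 / oai_dc schemas (an assumption;
# earlier protocol versions used different URIs).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def getrecord_to_html(xml_text: str) -> str:
    """Render the Dublin Core fields of a GetRecord response as a plain HTML page."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    if record is None:
        return "<html><body><p>No record found.</p></body></html>"
    identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    rows = []
    title = identifier
    if dc is not None:
        for elem in dc:
            # Strip the "{namespace}" prefix to get a field name such as "title".
            name = elem.tag.rsplit("}", 1)[-1]
            rows.append(f"<dt>{html.escape(name)}</dt><dd>{html.escape(elem.text or '')}</dd>")
        title = dc.findtext("dc:title", default=identifier, namespaces=NS)
    return (
        f"<html><head><title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}</h1><dl>{''.join(rows)}</dl></body></html>"
    )
```

Every value is HTML-escaped, since record metadata can contain characters such as "&" or "<" that would otherwise break the generated page.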

Below are the services that DP9 provides:

An entry page. If a web crawler finds the entry page and follows its links,
it will index all records in an OAI data provider.
 http://arc.cs.odu.edu:8080/dp9/index.jsp
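One way an entry page can expose every record is to page through the provider's identifiers with OAI ListIdentifiers requests, following resumptionToken values, and emit one getrecord.jsp link per identifier. The sketch below assumes the OAI-PMH 2.0 response structure; fetch, list_identifiers and entry_page are hypothetical names, and fetch is passed in so the sketch does no network I/O itself.

```python
import html
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Iterator
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}  # OAI-PMH 2.0 (assumed)

def list_identifiers(fetch: Callable[[dict], str], prefix: str = "oai_dc") -> Iterator[str]:
    """Yield every identifier from a provider, following resumptionTokens.

    `fetch` is a hypothetical callable that sends the OAI arguments to the
    data provider and returns the XML response as text.
    """
    args = {"verb": "ListIdentifiers", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(args))
        for ident in root.iterfind("oai:ListIdentifiers/oai:header/oai:identifier", OAI_NS):
            yield ident.text
        token = root.findtext("oai:ListIdentifiers/oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # a missing or empty token marks the last page
            return
        # A resumed request carries only the verb and the token.
        args = {"verb": "ListIdentifiers", "resumptionToken": token}

def entry_page(identifiers: Iterable[str], prefix: str = "oai_dc",
               base: str = "/dp9/getrecord.jsp") -> str:
    """Build a page with one plain link per record for a crawler to follow."""
    items = []
    for ident in identifiers:
        href = f"{base}?{urlencode({'identifier': ident, 'prefix': prefix})}"
        # Escape the href so the "&" between parameters becomes "&amp;" in HTML.
        items.append(f'<li><a href="{html.escape(href)}">{html.escape(ident)}</a></li>')
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"
```

A real entry page for a large archive would probably spread the links over several pages rather than emit one huge list; the flat list here just keeps the sketch short.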

Persistent, bookmarkable URLs for OAI records, for example:

 http://arc.cs.odu.edu:8080/dp9/getrecord.jsp?identifier=oai:arXiv:astro-ph/9501031&prefix=oai_dc
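A bookmarkable DP9 URL only has to carry the identifier and the metadata prefix; each time it is requested, the gateway can rebuild the corresponding OAI GetRecord request. The sketch assumes identifiers follow the oai:<repository>:<local-id> form and that a providers table (hypothetical here, for example one built from the registered-provider list) maps each repository name to its OAI base URL. The base URLs in the usage are placeholders, not real endpoints.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def repository_id(identifier: str) -> str:
    """Return the repository part of an identifier of the form oai:<repo>:<local-id>."""
    scheme, repo, _local = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return repo

def dp9_to_oai_request(dp9_url: str, providers: dict[str, str]) -> str:
    """Translate a DP9-style record URL into an OAI GetRecord request URL."""
    query = parse_qs(urlsplit(dp9_url).query)
    identifier = query["identifier"][0]
    prefix = query.get("prefix", ["oai_dc"])[0]  # default prefix is an assumption
    base_url = providers[repository_id(identifier)]  # hypothetical lookup table
    return base_url + "?" + urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix})
```

Because the URL holds only stable values, it stays valid as long as the provider keeps the identifier, which is what makes it safe for a search engine to store.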

Parallel metadata sets. Only a limited number of formats are supported now,
but support for a new metadata format can easily be added: just send us your
XSL file. For example:

 http://arc.cs.odu.edu:8080/dp9/getrecord.jsp?identifier=oai:VTETD:etd-3345131939761081&prefix=oai_rfc1807
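Supporting parallel metadata formats can be as simple as a table from metadata prefix to XSL stylesheet, so adding a format means adding one entry. The stylesheet paths and function names below are hypothetical, not DP9's actual files; applying the stylesheet would additionally need an XSLT processor, which Python's standard library does not include.

```python
from pathlib import Path

# Hypothetical stylesheet locations, one per supported metadata prefix.
STYLESHEETS: dict[str, Path] = {
    "oai_dc": Path("xsl/oai_dc.xsl"),
    "oai_rfc1807": Path("xsl/oai_rfc1807.xsl"),
}

class UnsupportedFormat(LookupError):
    """Raised when no stylesheet is registered for a metadata prefix."""

def stylesheet_for(prefix: str) -> Path:
    """Pick the XSL stylesheet that renders records in the given format."""
    try:
        return STYLESHEETS[prefix]
    except KeyError:
        raise UnsupportedFormat(f"no XSL stylesheet registered for {prefix!r}") from None

def register_format(prefix: str, xsl_path: str) -> None:
    """Adding support for a new metadata format is one more table entry."""
    STYLESHEETS[prefix] = Path(xsl_path)
```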

The DP9 code is available from
   http://arc.cs.odu.edu:8080/dp9/install.jsp
It is based on JSP and XSLT. If you install it on your own server, it will
make your OAI-compliant archive accessible to web crawlers under your own URL.

DP9 is a gateway service: it does not cache OAI records and simply forwards
each request to the corresponding OAI data provider, so its quality of
service depends heavily on the availability of the data providers' servers.
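Because nothing is cached, every page view becomes a live request to the data provider, and an outage there shows up directly at the gateway. Below is a minimal sketch of that pass-through, assuming a render function such as an XML-to-HTML converter; forward is a hypothetical name, and returning 502 (Bad Gateway) instead of 200 on failure is an assumed design choice that keeps a crawler from indexing an error page as if it were a record.

```python
import html
import urllib.request
from typing import Callable

def forward(oai_url: str, render: Callable[[str], str],
            timeout: float = 30.0, opener=urllib.request.urlopen) -> tuple[int, str]:
    """Fetch the OAI response on every call (no cache) and render it as HTML.

    Returns (HTTP status to send to the crawler, HTML body).
    """
    try:
        with opener(oai_url, timeout=timeout) as resp:
            xml_text = resp.read().decode("utf-8")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        # The provider is down or slow, and there is no cached copy to fall back on.
        body = f"<p>Data provider unavailable: {html.escape(str(exc))}</p>"
        return 502, f"<html><body>{body}</body></html>"
    return 200, render(xml_text)
```

The opener parameter exists only so the function can be exercised without a network; in normal use the default urllib.request.urlopen is used.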

DP9 currently uses the list of data providers from the OAI website:
 http://www.openarchives.org/Register/ListFriends.pl

We'd welcome any feedback or advice.


Xiaoming Liu
DL Research Group
Old Dominion Univ

_______________________________________________
OAI-general mailing list
OAI-general_at_oaisrv.nsdl.cornell.edu
http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-general
Received on Wed Nov 21 2001 - 13:03:32 GMT
