Re: The fundamental importance of capturing cited-reference metadata in Institutional Repository deposits

From: Sally Morris <editor_at_alpsp.org>
Date: Fri, 23 Jan 2009 15:41:52 -0000

If CrossRef's SimpleTextQuery can parse any standard reference
format, then presumably so could repositories?

 

Sally

Sally Morris
Editor-in-Chief, Learned Publishing
South House, The Street
Clapham, Worthing, West Sussex BN13 3UU, UK
Tel: +44(0)1903 871286
Fax: +44(0)8701 202806
Email: editor_at_alpsp.org


____________________________________________________________________________


From: American Scientist Open Access Forum
[mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG]
On Behalf Of Stevan Harnad
Sent: 23 January 2009 00:28
To: AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG
Subject: The fundamental importance of capturing cited-reference
metadata in Institutional Repository deposits

 

On 22-Jan-09, at 5:18 AM, Francis Jayakanth wrote on the eprints-tech
list:

Until recently, we included references for all the uploads going into
our repository. While copying and pasting metadata content from the
PDFs, we don't paste the copied content directly onto the submission
screen. Instead, we first copy the content into an editor like Notepad
or WordPad and then copy it from the editor onto the submission
screen. This is especially true for the references.

Our experience has been that when the references are copied and pasted
from the PDF file into an editor like Notepad or WordPad, non-ASCII
characters are invariably found in almost every reference. Correcting
the non-ASCII characters takes a considerable amount of time. Also, as
is to be expected, the references from different publishers are in
different styles, which may not make reference linking
straightforward. Both these factors forced us to take the decision to
do away with uploading references henceforth. I would appreciate it if
you could share your experiences on this matter.


The items in an article's reference list are among the most important
metadata, second only to the equivalent information about the
article itself. Indeed they are the canonical metadata: authors,
year, title, journal. If each Institutional Repository (IR) has those
canonical metadata for every one of its deposited articles as well as
for every article cited by every one of its deposited articles, that
creates the glue for distributed reference interlinking and metric
analysis of the entire distributed OA corpus webwide, as well as a
means of triangulating institutional affiliations and even name
disambiguation.
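
As a rough illustration of how those four canonical fields could act
as interlinking glue, here is a minimal Python sketch. The Citation
record, the match_key function and the example values are all
hypothetical and are not taken from EPrints or any other repository
software. It builds a crude key from the first author's surname, the
year and the first few title words, so that a deposited article and a
reference to it elsewhere can be paired up despite small differences
in spelling or punctuation.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """The four canonical fields (hypothetical record type)."""
    authors: tuple  # author surnames, in citation order
    year: int
    title: str
    journal: str


def fold(text: str) -> str:
    """Lower-case, strip accents and punctuation so minor variants collapse."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9 ]", "", text.lower())


def match_key(c: Citation) -> str:
    """Crude linking key: first author surname, year, first four title words."""
    title_words = fold(c.title).split()[:4]
    return "|".join([fold(c.authors[0]), str(c.year), " ".join(title_words)])


# Made-up example: a deposited record and a citation of it found elsewhere.
deposited = Citation(("Müller", "Smith"), 2005,
                     "Open Access and Citation Impact: A Study",
                     "Journal of Examples")
cited = Citation(("Muller",), 2005,
                 "Open access and citation impact.", "J. Examples")
same_work = match_key(deposited) == match_key(cited)
```

A key this coarse will sometimes merge distinct papers or miss true
matches, so a real service would presumably treat it only as a first
filter before a closer comparison.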

Yes, there are some technical problems to be solved in order to
capture all references, such as they are, filtering out noise, but
those technical problems are well worth solving (and sharing the
solution) for the great benefits they will bestow.
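
One part of that noise, the stray characters left by copying
references out of a PDF, can be reduced automatically. The following
Python sketch is only an assumption about what such a clean-up step
might look like. The function name clean_reference, the replacement
table and the sample string are invented for illustration. It uses
Unicode normalisation to expand ligatures, maps typographic quotes and
dashes to plain equivalents, and leaves legitimate accented letters in
author names untouched.

```python
import re
import unicodedata

# Typographic characters that commonly survive a PDF copy-and-paste.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
    "\u00ad": "",                   # soft hyphen
}


def clean_reference(text: str) -> str:
    # NFKC expands compatibility characters such as the "fi" ligature
    # (U+FB01) and the no-break space, but keeps letters like "ü" intact.
    text = unicodedata.normalize("NFKC", text)
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Rejoin a word split by a hyphen at a line break (letters only, so
    # page ranges are left alone). This also joins genuine compounds that
    # happened to break at the hyphen, which is a known limitation.
    text = re.sub(r"(?<=[^\W\d_])-\n\s*(?=[^\W\d_])", "", text)
    # Collapse the remaining line breaks and runs of spaces.
    return re.sub(r"\s+", " ", text).strip()


# Made-up pasted reference containing a ligature, curly quotes and an en dash.
pasted = ("M\u00fcller, K. (2005). \u201cDe\ufb01ning open ac-\n"
          "cess\u201d. J. Doc. 12\u201334.")
cleaned = clean_reference(pasted)
```

Running something like this at deposit time would not remove the need
for an occasional manual check, but it could take care of the most
repetitive corrections.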

The same is true for handling the numerous (but finite) variant
formats that references may take: Yes, there are many, including
different permutations in the order of the key components,
abbreviations, incomplete components etc., but those too are finite,
can be solved once and for all to a very good approximation, and the
solution can be shared and pooled across the distributed IRs and
their software. And again, it is eminently worthwhile to make the
relatively small effort to do this, because the dividends are so
vast.
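
To make the "finite set of variant formats" idea concrete, here is a
small Python sketch that tries a list of patterns in turn and returns
the canonical fields from the first one that fits. It is not how
CrossRef, CiteSeer or EPrints actually parse references. The two
patterns (an author-date style and a numeric, Vancouver-like style),
the function name parse_reference and the sample references are
assumptions chosen for illustration, and the numeric pattern assumes
author initials written without full stops. Covering more styles would
mean adding more patterns to the shared list.

```python
import re

# Each pattern captures the same four canonical fields.
PATTERNS = [
    # Author-date style: Authors (Year). Title. Journal, volume...
    re.compile(
        r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.?\s*"
        r"(?P<title>.+?)\.\s*(?P<journal>[^.,]+),"
    ),
    # Numeric (Vancouver-like) style: Authors. Title. Journal. Year;volume...
    re.compile(
        r"^(?P<authors>[^.]+)\.\s*(?P<title>[^.]+)\.\s*"
        r"(?P<journal>[^.]+)\.\s*(?P<year>\d{4});"
    ),
]


def parse_reference(ref: str):
    """Return a dict of canonical fields, or None if no pattern fits."""
    for pattern in PATTERNS:
        m = pattern.match(ref)
        if m:
            fields = {k: v.strip() for k, v in m.groupdict().items()}
            fields["year"] = int(fields["year"])
            return fields
    return None


# Two made-up renderings of the same made-up reference.
author_date = parse_reference(
    "Muller, K., & Smith, J. (2005). Open access and citation impact. "
    "Journal of Examples, 12(3), 45-67."
)
numeric = parse_reference(
    "Muller K, Smith J. Open access and citation impact. "
    "J Examples. 2005;12(3):45-67."
)
```

Both calls yield the same year and title, which is the property that
matters for linking. References that match no pattern return None and
could be queued for manual checking or for a new pattern.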

I hope the IR community in general -- and the EPrints community in
particular -- will make the relatively small, distributed,
collaborative effort it takes to ensure that this all-important OA
glue unites all the IRs in one of their most fundamental functions.

 

(Roman Chyla has since replied to eprints-tech with one potential
solution: "The technical solution has been there for quite some time,
look at citeseer where all the references are extracted automatically
(the code of the citeseer, the old version, was available upon
request - I don't know if that is the case now, but it was in the
past). That would be the right way to go, imo. I seem to remember that
a citeseer-based library for economics existed, so not only the
computer-science texts with predictable reference styles are possible
to process. With humanities it is yet another story.")

Stevan Harnad

 
Received on Fri Jan 23 2009 - 15:52:51 GMT
