Re: Manual Evaluation of Algorithm Performance on Identifying OA

From: David Goodman <David.Goodman_at_LIU.EDU>
Date: Sat, 17 Dec 2005 17:36:53 -0500

We are posting the present note about the project separately, because
it has the agreement of only three of the authors: KA, NB and DG.
 
For our own non-technical interpretation of these results, see:
Goodman, David and Antelman, Kristin and Bakkalbasi, Nisa (2005) Identifying Open Access Articles: Valid and Invalid Methods. In Proceedings XXV Annual Charleston Conference: Issues in Book and Serial Acquisition, Charleston, South Carolina.

available at http://dlist.sir.arizona.edu/968/

David Goodman, Palmer School of Library and Information Science,
 Long Island University <dgoodman_at_liu.edu>
Kristin Antelman, North Carolina State University Libraries
<kristin_antelman_at_ncsu.edu>
Nisa Bakkalbasi, Yale University Library
<nisa.bakkalbasi_at_yale.edu>
 
 
________________________________

From: David Goodman [mailto:David.Goodman_at_liu.edu]
Sent: Sat 12/17/2005 12:06 AM
To: American Scientist Open Access Forum; AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_listserver.sigmaxi.org
Subject: Manual Evaluation of Algorithm Performance on Identifying OA


We have just posted the results
from our cooperative project:

Antelman, K., Bakkalbasi, N., Goodman, D., Hajjem, C. and Harnad,
    S. (2005) Evaluation of Algorithm Performance on Identifying
    OA. Technical Report, North Carolina State University Libraries, North
    Carolina State University. http://eprints.ecs.soton.ac.uk/11689/

ABSTRACT: This is a second signal-detection analysis of the accuracy
    of a robot in detecting open access (OA) articles (by checking by
    hand how many of the articles the robot tagged OA were really OA,
    and vice versa). We found that the robot significantly overcodes for OA.
    In our Biology sample, 40% of identified OA was in fact OA. In
    our Sociology sample, only 18% of identified OA was in fact OA.
    Missed OA was lower: 12% in Biology and 14% in Sociology.
    The sources of the error are impossible
    to determine from the present data, since the algorithm
    did not capture URLs for documents identified as OA. In conclusion,
    the robot is not yet performing at a desirable level, and future work
    may be needed to determine the causes and improve the algorithm.
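
As a rough illustration of how the two kinds of percentage in the abstract relate to a hand-checked sample, here is a minimal sketch. It assumes that "identified OA that was in fact OA" means the share of robot-tagged OA articles confirmed by hand, and that "missed OA" means the share of robot-tagged non-OA articles that turned out to be OA; the abstract does not spell out the second definition. The function name and the counts are hypothetical, chosen only so the arithmetic matches the Biology percentages, and are not the counts from the report.

```python
def tagging_accuracy(true_pos, false_pos, false_neg, true_neg):
    """Summarise a hand check of robot OA tags.

    true_pos  -- robot tagged OA, hand check confirmed OA
    false_pos -- robot tagged OA, hand check found non-OA
    false_neg -- robot tagged non-OA, hand check found OA (missed OA)
    true_neg  -- robot tagged non-OA, hand check confirmed non-OA
    """
    tagged_oa = true_pos + false_pos
    tagged_non_oa = false_neg + true_neg
    return {
        # Share of robot-tagged OA that was really OA.
        "confirmed_oa_share": true_pos / tagged_oa,
        # Share of robot-tagged non-OA that was really OA
        # (assumed reading of "missed OA").
        "missed_oa_share": false_neg / tagged_non_oa,
    }


# Hypothetical counts, NOT taken from the report: 100 articles tagged OA
# and 100 tagged non-OA, picked to reproduce the Biology percentages.
example = tagging_accuracy(true_pos=40, false_pos=60, false_neg=12, true_neg=88)
print(example)
```

Under this reading, a robot that "overcodes" is one with a large false_pos count relative to true_pos, which lowers the confirmed share without necessarily changing the missed share.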

  (in alphabetical order)
Kristin Antelman, North Carolina State University Libraries <kristin_antelman_at_ncsu.edu>
Nisa Bakkalbasi, Yale University Library <nisa.bakkalbasi_at_yale.edu>
David Goodman, Palmer School of Library and Information Science, Long Island University <dgoodman_at_liu.edu>
Chawki Hajjem, Institut des sciences cognitives, Université du Québec à Montréal <Hajjem_at_vif.com>
Stevan Harnad, Institut des sciences cognitives, Université du Québec à Montréal <harnad_at_ecs.soton.ac.uk>
Received on Sat Dec 17 2005 - 23:52:47 GMT
