Re: Optimising Deposit and Import Capabilities of EPrints

From: Yorick Wilks <yorick_at_DCS.SHEF.AC.UK>
Date: Fri, 18 Aug 2006 16:12:49 +0100

 (was Re: measuring affiliation)

It would be nice if this eminently sensible proposal could be done within the overall Semantic Web movement involving unique semantic identifiers for all major classes of objects (of which authors may well be one).
Yorick Wilks

On 18 Aug 2006, at 15:13, guedon wrote:

The author disambiguation is indeed a really important issue. It affects
all kinds of things, ranging from the Science Citation Index to even
some commercial offerings. For example, while searching an author in a
Springer journal the other day, I noticed that their own search engines
distinguished between the author's name with full first name from the
same author's name with just the initial... I had to search through two
lists of articles instead of one.
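
To illustrate the two-lists problem, here is a minimal sketch (not how any publisher's search engine actually works) that folds both name forms onto one key of surname plus first initial, ignoring case and diacritics. The function name `name_key` and the sample names are hypothetical.

```python
import unicodedata


def name_key(name):
    """Reduce 'Surname, Given' to (surname, first initial).

    Case and diacritics are ignored. This is an illustrative heuristic only.
    """
    def fold(text):
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", text)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.strip().lower()

    surname, _, given = name.partition(",")
    return (fold(surname), fold(given)[:1])


# The full-name and initial-only forms now land on the same key ...
print(name_key("Guédon, Jean") == name_key("Guedon, J."))
# ... but so does a different (made-up) person with the same initial.
print(name_key("Guedon, Julie") == name_key("Guedon, J."))
```

The second comparison shows the limit of name matching: a key loose enough to merge the two forms also merges distinct people, which is the case for a real identifier.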

I believe that scientific and scholarly authors ought to be given a
permanent identifier which ought to accompany their publication in any
journal that carries peer review. In effect, it would be the equivalent
of an ISBN.

The easiest way to begin implementing this PAI (Permanent Author
Identifier) might be for a group of journals to come together and agree
that when a paper is submitted, the author must supply his/her permanent
identifier. If he/she does not have one, indicating so would mean that
the cooperating publisher would assign one immediately and would place
it in an open database. Universities could encourage their students to
take up such an identifier as soon as these are on a track (e.g.
doctoral studies) that should lead to some publishing.
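
The submission step described above could be sketched roughly as follows. This is only an assumption about how a cooperating publisher might behave: the class `AuthorRegistry`, the method `resolve`, and the use of random UUIDs as identifiers are all hypothetical, and a real open database would need persistence and duplicate checks.

```python
import uuid


class AuthorRegistry:
    """Toy in-memory stand-in for an open database of author identifiers."""

    def __init__(self):
        self._records = {}  # identifier -> author name as supplied

    def resolve(self, name, pai=None):
        """Return the author's identifier, assigning a new one if none is supplied."""
        if pai is not None:
            # The author claims an existing identifier: it must be known.
            if pai not in self._records:
                raise KeyError(f"unknown identifier: {pai}")
            return pai
        # No identifier supplied: assign one immediately and record it.
        new_pai = uuid.uuid4().hex
        self._records[new_pai] = name
        return new_pai

    def lookup(self, pai):
        """Return the name recorded for an identifier."""
        return self._records[pai]


registry = AuthorRegistry()
first = registry.resolve("Smith, Ann")            # first submission, no identifier yet
again = registry.resolve("Smith, A.", pai=first)  # later submission reuses it
print(first == again)
```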

In conclusion, I do not claim to have clear strategies about this PAI,
but the need for one appears very high to me. In particular, it would be
very useful for institutional repositories and the OA movement in

Google and other large search engines might be interested in supporting
such a development. It would greatly enhance the capability of Google
Scholar. Countries that do not use the Latin script or use it with funny
diacritical marks (as in Guédon) might also find it useful to have their
scientists unambiguously visible in the whole world, even though this
might decrease the number of "scientists" for any given country.



Le vendredi 18 août 2006 à 08:51 -0400, Timothy Miles-Board a écrit :
The EPrints team have been looking at this issue in some detail. The current
version of EPrints has "clone" and "new version" options which save having
to re-enter metadata for similar/different versions of an existing deposit.
However, this doesn't help much if you are starting a new deposit. The
approach we've been favouring of late is auto-completion (like Google
Suggest), whereby the depositor begins typing
the first few characters of the name of a co-author and is presented with a
pop-up list of suggestions. The behind-the-scenes logic that determines what
to suggest can be customised to an individual repository's requirements e.g.
suggest from the list of registered users, suggest by looking up in the
institution's user account (e.g. LDAP) server, suggest according to an
internal database list of institutional and non-institutional users. The
previous deposits that you have made can also inform the list of suggestions
e.g. frequent/recent co-authors can be promoted to the top of the list of

This is not just about minimising keystrokes - the suggestion mechanism we
implemented is also able to carry additional data about the authors being
suggested. You mention the potential for cross-linking an author's work
between archives. In order to do this you need to be able to uniquely
identify them. Author disambiguation is potentially important for the
Research Assessment Exercise (RAE) in the UK. When an author's name is
autocompleted, the ID of that author is also attached.
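
A minimal sketch of the kind of suggestion logic described above, assuming a simple in-memory candidate list. It is not the EPrints implementation: the names `Author` and `suggest`, the prefix-matching rule, and the ranking by past co-authorship count are all illustrative assumptions. Each suggestion carries the author's ID along with the name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    author_id: str  # unique ID passed along with the suggestion
    name: str       # display name, e.g. "Surname, Given"


def suggest(prefix, candidates, past_coauthor_ids=(), limit=10):
    """Return candidates whose name starts with prefix, frequent co-authors first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    # How often each author ID appears in the depositor's earlier deposits.
    counts = Counter(past_coauthor_ids)
    matches = [a for a in candidates if a.name.lower().startswith(prefix)]
    # Most frequent co-authors first, then alphabetical by name.
    matches.sort(key=lambda a: (-counts[a.author_id], a.name.lower()))
    return matches[:limit]


# Hypothetical candidate list and deposit history.
authors = [
    Author("u1", "Smith, Ann"),
    Author("u2", "Smithson, Bo"),
    Author("u3", "Jones, Cy"),
]
history = ["u2", "u2", "u1"]

for a in suggest("smi", authors, history):
    print(a.author_id, a.name)
```

The candidate list could equally come from registered users, an LDAP lookup, or an internal database; only the source of `candidates` would change.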

We have also successfully applied the auto-completion technique to keywords
and journal names (with the ISSN number of the journal being passed with the
suggestion and used to auto-fill the ISSN field upon selection of the
intended journal by the user).

Although for the moment we've decided not to include it in the next version
of EPrints (3.0), it will be in a future version. In the meantime, I'd be
happy to describe our technique in more technical detail on the
wiki if that would be useful (creating an autocompleting field in the
EPrints deposit form using an open source AJAX library is straightforward;
the complicated bit comes in designing the (independent) program that makes
appropriate and useful suggestions in response to the user's keystrokes).

It is also worth noting that EPrints 3.0 will have a number of new options
for importing data e.g. users can create new deposits by cutting and pasting
BibTeX/EndNote/etc entries from a bibliography file into a textbox and
hitting a button.
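
As a rough idea of what turning a pasted BibTeX entry into deposit metadata involves, here is a minimal sketch. It is not the EPrints 3.0 import code. The function name `parse_bibtex_entry` is hypothetical, and the parser assumes a single entry whose field values are wrapped in braces or double quotes, with no nested braces and no bare numbers.

```python
import re

# One entry: @type{key, ...fields... }
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}", re.DOTALL)
# One field: name = {value} or name = "value"
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a dict of metadata fields."""
    match = ENTRY_RE.search(text)
    if match is None:
        raise ValueError("no BibTeX entry found")
    entry_type, key, body = match.groups()
    record = {"type": entry_type.lower(), "key": key}
    for name, braced, quoted in FIELD_RE.findall(body):
        value = braced if braced else quoted
        # Collapse line breaks and repeated spaces inside the value.
        record[name.lower()] = " ".join(value.split())
    if "author" in record:
        # BibTeX separates multiple authors with the word "and".
        record["authors"] = [
            a.strip() for a in re.split(r"\s+and\s+", record.pop("author"))
        ]
    return record


sample = """@article{smith2006,
  author = {Smith, Ann and Jones, Cy},
  title = {An Example Title},
  year = {2006}
}"""
print(parse_bibtex_entry(sample))
```

A production importer would need a full BibTeX parser, but the shape of the result (one record per entry, with authors split into a list) is what a deposit form would be pre-filled from.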


Timothy Miles-Board
EPrints Services
Southampton, UK
Consultancy - Training - Hosting

On Tue, 15 Aug 2006 11:08:58 +0100, Andrew A. Adams
<a.a.adams_at_READING.AC.UK> wrote:
Regarding this note, one of the things we're struggling with in setting up a
pilot of an IR at the University of Reading (the School of Systems
Engineering and the School of Maths, Meteorology and Physics are jointly
piloting an IR for the University) is that of manually inputting local
institutional co-authors. It's one of the weaknesses, IMHO, of the GNU
eprints software that it doesn't have two methods of author input - selection
from a list of institutional users already registered, and free text input of
non-institutional authors. In fact, even with non-institutional authors, it's
quite common to regularly author joint papers with the same
non-co-institutional authors a number of times, if one has a productive external
collaboration. I would prefer, rather than manually entering each author name
in free text, to have a search system available for "registered authors" not
all of whom need to be registered users of the system (which deals with the
issue of people leaving institutions and stopping being registered users but
remaining as authors for their prior papers). If a new co-author is to be
entered, then minimising the number of keystrokes, and the utility of offering
more than free-text name entry (available, though not necessarily
mandated), should be considered. As the IR grows then, if it is deemed useful,
people can be employed to add extra information onto the non-user author
details, such as affiliation at the time the paper was deposited, and
possibly cross-links to other IRs containing the works of that author (which
could also be useful for authors moving between institutions).
*E-mail*********  Dr Andrew A Adams
**snail*27 Westerham Walk**********  School of Systems Engineering
***mail*Reading RG2 0BA, UK********  The University of Reading
****Tel*+44-118-378-6997***********  Reading, United Kingdom
Received on Fri Aug 18 2006 - 19:04:50 BST
