Re: Repositories: Institutional or Central ? emergent properties and the compulsory open society

From: Tomasz Neugebauer <Tomasz.Neugebauer_at_CONCORDIA.CA>
Date: Fri, 6 Feb 2009 13:38:23 -0500

Research repositories, whether physical libraries, electronic journal archives, institutional repositories or subject repositories, are collections of interconnected components. Understood in this way, as systems, they have emergent properties: properties of the collection that none of its components (e.g., individual research articles) have, as well as properties that the components have only as a result of being part of that collection (e.g., relevance ranking with respect to a topic within that collection). Some examples of emergent properties of repositories are the subject coverage, the intended purpose of the collection, and the demographics of the readers and authors of the collection.
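
The second kind of emergent property, relevance ranking within a collection, can be illustrated with a minimal Python sketch. It assumes a simple TF-IDF weighting, which is only one possible ranking scheme and is chosen here purely for illustration; the article text and both collections are invented. The same article receives a different score for the same term depending on which collection it sits in.

```python
import math

def idf(term, collection):
    """Smoothed inverse document frequency of `term` in `collection`."""
    containing = sum(1 for doc in collection if term in doc.split())
    return math.log((1 + len(collection)) / (1 + containing)) + 1

def score(term, doc, collection):
    """Term frequency in `doc`, weighted by the collection-dependent idf."""
    return doc.split().count(term) * idf(term, collection)

# Hypothetical data: one article placed in two different collections.
ARTICLE = "dependent arising in buddhist philosophy"
SUBJECT_REPO = [ARTICLE, "buddhist monastic codes", "buddhist logic and debate"]
GENERAL_REPO = [ARTICLE, "protein folding kinetics", "labour market models"]

if __name__ == "__main__":
    # The article is identical; only the surrounding collection changes.
    print("subject repository:", score("buddhist", ARTICLE, SUBJECT_REPO))
    print("general repository:", score("buddhist", ARTICLE, GENERAL_REPO))
```

In the subject collection every item mentions the term, so it barely distinguishes the article; in the general collection the same term is rare and weighs more. The score is a property of the article only in relation to its collection.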

When a researcher decides where to publish or provide access to their work, the emergent properties of the repository are a relevant consideration. Consider the following hypothetical situation: a researcher in Buddhist studies may object to being "mandated" to place his article on the topic of "interdependent co-arising" in the same repository that is also home to articles from another department in his institution that specializes in, say, promoting the philosophy of Charles Darwin in social science. That researcher may wish to place his article in the Tibetan and Himalayan Digital Library, but not in the IR of his university. I agree with Thomas Krichel that researchers currently have the freedom to choose and promote the channels of distribution for their work.

About Arthur Sale's statements such as:

Arthur Sale:
"Researchers are not free agents.
I strongly support academics being required to contribute to their discipline and access to knowledge (and opinion). Otherwise why are they employed?"

In my opinion, these statements can only succeed in creating resistance from researchers. I don't think that "the compulsory open society" ("Open access in your employer's IR, or else!") is what Karl Popper had in mind when he wrote The Open Society and Its Enemies. The fact that the Open Society Institute claims to be inspired by Karl Popper's The Open Society and Its Enemies does not mean that Popper ever intended his theories to be implemented through OSI's NGOs, or at all. The Budapest Open Access Initiative claims to define and promote "open access", but the concepts of open society and open access reach back to antiquity and touch on paradoxes of freedom and political theory. As an aside, OAI-PMH is "a low-barrier mechanism", but a barrier nevertheless. Perhaps it is not a paradox, but there is something counterintuitive about promoting open access with a new barrier.

The concepts of open society and open access existed long before Budapest OAI and OAI-PMH. And here we can go back to Jean-Claude Guédon's point about the fact that this debate goes a long way back, and that there is an important difference between theory and practice:

Jean-Claude Guédon:
"This is an old debate where one should carefully distinguish between two levels of analysis.

1. In principle, is it better to have institutional, distributed, depositories, or to have central, thematic, whatever depositories?

2. In practice, we know we will not escape the will by various institutions to develop central, thematic, whatever depositories (e.g. Hal in France). And these depositories will exist. The question then becomes: how do we best live with this mixed bag of situations?

Pursuing the battle on principles is OK with me, but it does not get me enthused.

Pursuing the battle on the pragmatic, practical level, knowing that various tools exist that will restore the distributed nature of these depositories anyway, appears to me far preferable."

I agree with Jean-Claude Guédon regarding the important difference between theory/principle and practice. However, I don't agree with the last sentence, where he expresses confidence that "various tools exist that will restore the distributed nature of these depositories". I am not convinced of this. A while back there was a posting on this list about A Physicist's Challenge to Duplicate Arxiv's Functionality Over Distributed Institutional Repositories:

"If you want to convince me [that institutional self-archiving plus central harvesting can provide all the functionality of Arxiv], then try to do so by conducting the following experiment with any... "harvesting" vehicles you like:

    (1) Choose an area, such as Mathematical Physics, or Integrable Systems, and find all the papers that have been deposited in any of the archives that they cover, within the past week. (If they cover 95% of the arXiv, they must necessarily produce this information just as well). No other barrage of junk; just that simple list of papers.

    (2) Do the same with respect to all the posted publications by a given author for the past ten years. Again: not a barrage of google-like junk dumped upon you, but this specific information. (If I want a ton of junk, I can also go to Google scholar, and waste endless time trying to find what I need.)

    (3) Find out, at one go, if a given article, or set of articles, from the above list, has been published in a journal, and what the journal reference is.

    (4) Get a copy of any of these articles, at once, in any convenient format, like .pdf, that is available.

    (5) Be equally sure that all the above is simultaneously done for all such articles deposited in individual institutional repositories.

If you can do all the above, successfully, you will have given the 'proof of principle'."

This challenge is about recreating some of the emergent properties of arXiv with distributed IRs. I think that even this problem is currently unsolved, and that it will be very difficult to solve at best. It calls for authority control on the researchers' names in a distributed environment that includes thousands of repositories from all subjects. And this challenge calls for the re-creation of only some, and not all, of the emergent properties of arXiv.
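
A minimal Python sketch of why item (2) of the challenge depends on authority control. Everything in it is an assumption made for illustration: the records, the repository labels, and both matching rules are invented, and no real harvester is claimed to work this way. It groups harvested author strings under two crude keys and shows that one rule merges different people while the other splits the same person.

```python
from collections import defaultdict

# Hypothetical records, as if harvested from three repositories.
RECORDS = [
    {"repo": "ir-a", "author": "Smith, John A.", "title": "Integrable lattices"},
    {"repo": "ir-b", "author": "J. Smith", "title": "Soliton hierarchies"},
    {"repo": "ir-c", "author": "Smith, Jane", "title": "Tibetan manuscript catalogues"},
    {"repo": "ir-c", "author": "John Smith", "title": "Lax pairs revisited"},
]

def split_name(name):
    """Return (surname, given names) in lower case from either name order."""
    name = name.replace(".", "").strip().lower()
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname, given

def loose_key(name):
    """Surname plus first initial: tends to merge different people."""
    surname, given = split_name(name)
    return f"{surname},{given[:1]}"

def strict_key(name):
    """Surname plus full given names: tends to split the same person."""
    surname, given = split_name(name)
    return f"{surname},{given}"

def group_by_author(records, key=loose_key):
    """Group titles under whatever author key the chosen rule produces."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record["author"])].append(record["title"])
    return dict(groups)

if __name__ == "__main__":
    for rule in (loose_key, strict_key):
        print(rule.__name__, group_by_author(RECORDS, key=rule))
```

With the loose rule all four records collapse into one "author", so Jane Smith's catalogue is listed among John Smith's physics papers; with the strict rule the four records become four separate authors. Neither rule yields the clean per-author list the challenge asks for, and the example involves only three repositories and one surname.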

I think that ignoring the relevance of emergent properties of collections is a mistake. I remain skeptical of attempts at formalizing this abstract notion of "collection" into the data model of IR software (as is the case with DSpace), as well as of the vision of future harvesters that recreate the emergent properties of subject-thematic repositories with probabilistic algorithms. I do not object to trying to create these new algorithms and technologies; in fact, the topic is of great interest to me. But I don't think it is helpful to trivialize that which is far from trivial.

I am not an opponent of IRs; in fact, I am preparing one for Concordia University. But I see IRs as a service that the university offers to its faculty.

Tomasz Neugebauer
Digital Projects & Systems Development Librarian
Concordia University Libraries
1400 de Maisonneuve West (LB 341-3)
Tel.: (514) 848-2424 ex. 7738

-----Original Message-----
From: American Scientist Open Access Forum [mailto:AMERICAN-SCIENTIST-OPEN-ACCESS-FORUM_at_LISTSERVER.SIGMAXI.ORG] On Behalf Of Thomas Krichel
Sent: Thursday, February 05, 2009 9:46 PM
Subject: Re: Repositories: Institutional or Central ? [in French, from Rector's blog, U. Liège]

  Stevan Harnad writes

> (Academic freedom refers to the freedom to research (just about) whatever
> one wishes, and to report (just about) whatever one finds and concludes
> therefrom.

  in the channel of one's choice. IRs should make themselves
  publication channels of choice.


  Thomas Krichel
                                               skype: thomaskrichel
Received on Fri Feb 06 2009 - 19:20:02 GMT
