OAI3 - CERN 12 - 14 Feb 2004

CERN Workshop Series on Innovations in Scholarly Communication: Implementing the benefits of OAI.

The thrust of the workshop was in the area of self-archiving and institutional repositories. With OAI (Open Archives Initiative) repositories running on servers compliant with the PMH (Protocol for Metadata Harvesting), it is hoped that the content of these repositories can be shared, thus providing a viable alternative to traditional publishing activities in STM (science, technology and medicine). Raf Dekeyser, welcoming the conference participants, recalled the vision of OAI-PMH but immediately pointed out that the benefits of OAI for libraries and scholarly communication have not yet materialised. It would not be an overstatement to say that much of the workshop gave the impression of being an exchange of experience between librarians learning how to become efficient and effective online publishers. There were few scientists (3) amongst the 180 participants and even fewer publishers (2). In fact 3 + 2 = 4, because the two sets overlap.

I feel obliged to report that there were a significant number of presentations in which the speakers spent most of their time asking questions (or rather pointing out problems) to which not only had they no answer but to which they seemed to have given little or no thought.

The OAI-PMH was designed to be a protocol with a low barrier to entry. From what was presented it is clear that in practice things have not at all worked out in the way that was intended. Implementors have trouble coming to grips with XML. Repositories are difficult to discover and evaluate. The metadata is often entered by non-professionals and is of little use. Metadata must at least be provided as Dublin Core, but this is almost always insufficient for service providers to build significant applications. Whatever richer metadata is available in a repository is often not exported through the PMH, and in any case would probably not be understandable to a service provider. Both data and service providers are asking for rights metadata in the PMH. In the basic design of OAI it was thought that the "Open" made rights data unnecessary. Rights and the PMH data model do not fit well together. Another general problem is that of filling an institutional repository with content. The vast majority of academics are not interested in such repositories as a means of scientific communication. Most institutional repositories are being designed to deal essentially with research output and pay scant attention to the core activity of many universities, i.e. students and teaching. What little enthusiasm I had for institutional repositories has definitely been diminished by attending this workshop.

My more detailed notes on the talks follow. Those on Velden, Asschenfeldt and Voss are particularly worth looking at.

Most of the presentation slides are online at the conference site.


Raf Dekeyser
The benefit for libraries and scholarly communication of OAI is not yet there for the moment.

Diann Rusch-Feja - International University of Bremen
Overview of OAI and its relation to scientific publishing in 2004 
The second version of the OAI-PMH is available.
She had many questions but no answers. There was no mention at all of the important role of the funding agencies.

Carl Lagoze - Cornell
The OAI and OAI-PMH: where to go from here
Technical talk. OAI is supposed to be a low-barrier protocol, but many implementers have trouble coming to grips with XML. Lagoze is trying out new systems to make OAI easier to use, but they sound rather complicated. With OAI-PMH they wanted to have nothing to do with rights and rights management, but now they find that people want to put rights statements on the metadata. In fact metadata is not free to produce. They think they can produce a system for rights statements which will cover about 95% of cases. It seems to me that much of this is already going beyond what a cash-strapped institutional repository can manage.
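To give a feel for the XML that implementers have to handle, here is a minimal sketch of one OAI-PMH harvesting step in Python: building a ListRecords request for unqualified Dublin Core (metadataPrefix oai_dc, optionally restricted to a set) and pulling identifiers and titles out of a response. The base URL, set name and sample record are hypothetical, and the sketch parses a canned response instead of fetching one; a real harvester would also handle errors and resumptionToken paging.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses and by oai_dc records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, set_spec=None):
    """Build a ListRecords request asking for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # harvest only one set of the repository
    return base_url + "?" + urlencode(params)

def titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((ident, title))
    return pairs

# Hypothetical response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai", "physics"))
print(titles(SAMPLE))
```

Even this toy case involves three XML namespaces, which goes some way to explaining why a "low-barrier" protocol still trips people up.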

Chris Awre - JISC
The UK FAIR programme: OAI in context.
FAIR (Focus on Access to Institutional Resources) deals with e-prints and theses. It has also set up a rights working group, as there is a need to clarify what can be shared. OAI does not address long-term preservation issues.

Lilian van de Vaart - DARE
DARE: (a)Live and kicking
DARE - Digital Academic Repositories has now been running for two weeks. 

Peter Schirmbacher - Humboldt University, Berlin
Certification of a publishing server - an initiative of DINI
DINI has produced a report (copy available) on electronic publishing in higher education with the help of some 47 German universities. The report is both a set of best practices for electronic publishers (all standard stuff known to any publisher) and a statistical survey of how far these are being put into practice. They found very wide differences between implementations concerning policy, standards, etc. In practice repository managers do not achieve these best practices; in fact they fall very far short of them. Individual repository managers lack the experience and resources to do so. DINI undertakes certification of repositories. The price ranges from EUR 50'000 for a non-profit DINI member up to EUR 250'000 for a for-profit commercial organization. To date they have awarded one certificate.

Philip Hunter - UKOLN, Bath
Open Archives Forum
Funding for the OAF has now ended. OAI-PMH is a low-barrier interoperability specification. He said that the University of Bath claims copyright on all material in its repository, although the IPR belongs to the authors (the University of Geneva does exactly the same). Hunter has reservations about the university taking the copyright.

Theresa Velden - Heinz Nixdorf Centre for Information Management
On the open access strategy of the Max Planck Society
The MPS initiated and signed the Berlin Declaration in October 2003. In fact the driving force is the President of the MPS himself (Dr. Peter Gruss), who has not yet managed to convince all of his institute directors that OA is the way to go. Historically the MPS did not do its own publishing, but two or three years ago it opened a new 'Heinz Nixdorf Centre for Information Management'. So the MPS is now providing infrastructure to implement its open access strategy and is actively promoting OA. The MPS president wants to abandon journal impact factors in evaluation exercises and replace them with some measure of the intrinsic quality of a publication.
(ICSTI would be well advised to get Velden to come and talk and persuade the MPS to join.)

Martin Wynne - Oxford
OLAC - Open Language Archives Community

Colin Steele - Australian National University
OAI: A down-under perspective

Ziga Turk - Ljubljana
Scientific Publication Process Re-engineering with SciX Open Publication Services
A general analysis of print and electronic publishing.

Raym Crow - SPARC Consulting Group (USA)
Half Full: the improving state of scholarly publishing 
Funding for OA in the USA may come from the Howard Hughes Medical Institute. In the UK the Wellcome Trust is opening its purse. In France the CNRS has signed the Berlin Declaration.

Christiane Asschenfeldt - Creative Commons
Copyright and Licensing issues - The International Commons
Creative Commons provides standard licensing statements at four different levels of openness, in conformity with the law. One marks each article with the icon appropriate to the licensing arrangement in operation. The International Commons part of the project, run by Asschenfeldt, extends this concept to the international arena, providing licensing statements in the national language and in conformity with national law. A translation into English is also provided. More details of the individual licensing schemes are available in her slides. In effect, Creative Commons has created a set of template licences for protecting creative works.

David Prosser - Sparc Europe
Two Roads, One Destination: The Interaction of Self-Archiving and Open Access Journals.
A very good speaker who described the activities of SPARC Europe. His slides include a full list of funding bodies that are interested in OA or have signed the Berlin Declaration.

Lotte Jorgensen - Lund
How to disseminate Open Access Journals through OAI, the DOAJ project 
DOAJ (http://www.doaj.org), the Directory of Open Access Journals, applies its definition to whole journals and takes OA in the full sense of the Berlin Declaration. I understood that it does not include journals following the hybrid model.

Yakimischak & Krot - JSTOR
Building the JSTOR OAI-PMH Service: A Technical Case Study in Best Practices.
They are rewriting JSTOR and building in an OAI-PMH delivery system. They showed the UML analysis of the design. For JSTOR the implementation of PMH is a huge project. Yakimischak suggested that, rather than sharing source code, it is much more interesting to share UML models describing the analysis of a system in terms of operations, functions, data, etc. They did not mention OAIS at all.

Bill Hubbard - SHERPA, Nottingham
SHERPA: Institutional repositories and personal advocacy (Securing a Hybrid Environment for Research Preservation and Access)
The essential problem is that of obtaining content for an institutional repository from academics. He is of the opinion that only librarians see the crisis in information diffusion and the serials crisis. He noted that research cultures vary across subject disciplines. Moreover, institutional repositories have the problem of integrating into the institutional information service. Maybe it is the nature of the SHERPA project, but listening to Hubbard I had the impression that there are no longer any students at the University of Nottingham.

Saskia Franken - Utrecht
What do scientific authors want? Attracting scientists to institutional repositories
They have only really thought about publications produced at their own institution. No thought has been given to papers whose authors come from several institutions.

Ruediger Voss - CERN
Peer review in the era of LHC experiments. 
Voss described past and future CERN experiments. He explained how experimental high-energy physicists have had to devise their own publication procedure in an era when one or two thousand physicists participate in an experiment and in its publication as 'authors'. He wondered whether the methods they have developed could be exported to other fields of science. It should be understood that since the LEP era, experiments have grown too complex to be mastered by a single scientist. The technical correctness of design, operation and analysis is difficult, if not impossible, to assess by classical peer review. A given piece of work is therefore subject to internal review by the collaboration, following a strictly regulated multi-step process. A paper is written by a small editorial board (5 people) and then reviewed by the publication committee. A draft is made public inside the collaboration for comments. There are iterations until the programme committee declares convergence. An 'open' archive local to the collaboration is essential for efficient and transparent management of the authoring and refereeing process. The paper is then subjected to formal traditional peer review with subsequent commentary, but to be sure this is not much more than a rubber-stamping exercise. This procedure has been successfully implemented by LEP and other major non-CERN collaborations. One has to be aware that ultimately quality assurance in particle physics is enforced by the organized redundancy of several different experiments using different methods and approaches and undertaken by different people.

Thomas Krichel - Long Island University
Building a discipline-specific aggregate for computing and library
Krichel gave his talk over the telephone from New York because he was unable to leave the USA due to visa problems. Krichel's talk made it clear that he thinks institutional repositories will work better if they are backed up by discipline-specific aggregation systems. Frankly I wonder whether the intermediate stage of the institutional repository for primary publication is not more of a handicap than an asset.


Reports of the Breakout Sessions:

(1) Implementation: the FAIR and DARE experience
Questions discussed: how to incorporate data from arXiv; data versioning; the future of libraries as publishers; how to use institutional repositories (IR) for research. One conclusion was that the IR should be more integrated into the core activity of the university, e.g. teaching.

(2) The relationship between OAI-PMH and Dublin Core: Required, recommended or other?
Many implementors have rich metadata but only put out the Dublin Core. 'Dumbing down' the rich metadata is hard work, and afterwards implementors are too tired to put out the rich metadata itself. Also, much metadata is entered by non-professionals and is of very poor quality, even the Dublin Core metadata itself.
There were questions about the appropriateness of using DC to describe people or places. The quality of metadata content is a very real problem. Junk records make junky indexes. By way of response it was noted that junk metadata is a problem broader than just quality. The semantics of DC is too broad. The DC definition is very general and fuzzy, so it is interpreted in all sorts of different ways. This is an intrinsic problem with DC.
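What 'dumbing down' involves can be shown with a minimal sketch of a crosswalk from a richer local record to unqualified Dublin Core. The rich record's field names are invented for illustration and do not correspond to any real schema; the mapping choices (joining title and subtitle, flattening names to "Family, Given", and dropping affiliation and funder) are this sketch's assumptions, not a standard crosswalk.

```python
# Hypothetical richer local record; the field names are invented.
RICH = {
    "main_title": "Peer review in large collaborations",
    "subtitle": "A case study",
    "authors": [{"family": "Doe", "given": "Jane", "affiliation": "CERN"}],
    "funder": "Example Funding Agency",
    "issued": "2004-02-13",
}

def to_simple_dc(rich):
    """Map a rich record onto unqualified Dublin Core elements.

    Structured values are flattened into plain strings, and fields this
    sketch gives no DC home (affiliation, funder) are simply dropped.
    """
    dc = {}
    title = rich.get("main_title", "")
    if rich.get("subtitle"):
        title += ": " + rich["subtitle"]
    dc["title"] = [title]
    # The affiliation inside each author entry is discarded here.
    dc["creator"] = [f"{a['family']}, {a['given']}" for a in rich.get("authors", [])]
    if "issued" in rich:
        dc["date"] = [rich["issued"]]
    return dc

print(to_simple_dc(RICH))
```

Every decision in such a mapping loses information, which is one reason service providers find plain DC insufficient and why exporting the rich record as well is worth the effort.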

(3) Possible enhancements to OAI-PMH  including MetaSearching and Authentication
MetaSearch is defined as searching several repositories at the same time. (They did not mention the fate of X.500. It found its rightful place as LDAP, i.e. without distributed searching!)
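OAI-PMH itself only supports harvesting and has no search verb, which is why metasearch was discussed as a possible enhancement. The idea of searching several repositories at the same time can be sketched as a concurrent fan-out followed by a merge. The repositories here are hypothetical in-memory dictionaries standing in for remote services, and the substring match is a placeholder for whatever query mechanism a real enhancement would define.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote repositories: identifier -> title.
REPOSITORIES = {
    "repo_a": {"oai:a:1": "Open access and peer review", "oai:a:2": "Detector design"},
    "repo_b": {"oai:b:7": "Peer review in physics"},
}

def search_one(name, term):
    """Search one repository; a real client would send a remote query here."""
    records = REPOSITORIES[name]
    return [(name, ident, title) for ident, title in records.items()
            if term.lower() in title.lower()]

def metasearch(term):
    """Query every repository concurrently and merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_one, name, term) for name in REPOSITORIES]
        hits = [hit for f in futures for hit in f.result()]
    return sorted(hits)

print(metasearch("peer review"))
```

The merge step is where the harder problems raised in the workshop would surface: the same item held in several repositories, and metadata of uneven quality.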

(4) Overlay journals
For the ignorant (i.e. me), an overlay journal is one providing certification for material residing in distributed institutional repositories. A main difficulty is to find a certification or quality indicator which applies to items in one or more repositories. There are few existing examples of overlay journals. The principal issue is that one cannot readily build cross-repository overlays until there is a widespread, stable and accepted institutional repository infrastructure. Cross-repository overlays will require special metadata support.

(5) The OAI-PMH Community: How to enable better collaboration and communication between Data and Service Providers
This is the breakout session that I attended; it was preceded by demonstrations of some prototype software. One difficulty is locating relevant OAI repositories to harvest, and especially selecting individual sets within a repository to harvest. Registering repositories helps discovery. For the Ann Arbor compilation of OAI-PMH sites, they do some selection of repositories, but it is based more on the technical specification of the data than on its content. There were repeated calls for more controlled vocabularies and classification schemes. Apparently OCLC is working on producing cross-maps between classification schemes. Another problem is that of copies of (the same?) metadata on several servers. One needs a system to identify the original source of the metadata. There are many problems with character encoding: much software mishandles UTF-8. It seems that data providers are distributing all sorts of junk. Markup is not supposed to be included in the Dublin Core metadata, but it nevertheless frequently occurs. Data providers have many problems providing rich metadata, and at the moment most service providers use only the Dublin Core. The group thought that a catalogue of best practices would be useful for data providers. However, there are not really any tools available for data providers to check the compliance of their database with OAI-PMH. There are a few tools from OAI, but they do not do a complete job.
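As an illustration of the kind of check a data provider could run before exposing records, here is a minimal sketch that flags two of the problems mentioned: field values that are not valid UTF-8 and values containing embedded markup. It is a hypothetical helper, not one of the OAI tools, and it works on raw field bytes. The markup test is a deliberately crude regular expression that will also flag harmless text such as "a < b and c > d".

```python
import re

# Crude test for embedded HTML/XML tags in a field value.
TAG = re.compile(r"<[^>]+>")

def check_dc_value(raw):
    """Return a list of problems found in one raw (bytes) Dublin Core value."""
    problems = []
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
        # Keep going with a lossy decoding so the other checks still run.
        text = raw.decode("utf-8", errors="replace")
    if TAG.search(text):
        problems.append("contains markup")
    if not text.strip():
        problems.append("empty value")
    return problems

print(check_dc_value("Étude des détecteurs".encode("utf-8")))
print(check_dc_value(b"\xe9tude"))  # Latin-1 bytes mislabelled as UTF-8
print(check_dc_value(b"<i>Higgs</i> search"))
```

A checker like this only covers content quality; protocol compliance (verbs, error responses, date formats) would need a separate validator.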

(6) Open Access Citation Index
Guedon's answer to the ISI Science Citation Index. Lots of questions but few replies. In fact Guedon himself was not there to report.

(7) If you build it, Will They Come? Filling an Institutional Repository.
The discussion centred on the concerns of academics with respect to IRs, and all of the usual things were brought to the fore. There was also discussion of the concerns of learned societies.
Getting a journal to become viable takes five years. IRs may take even longer.

Panel Session: participants are listed in the order in which they were invited to express their views, roughly speaking starting with the Goodies and finishing with the Baddies.
(1) Peter Suber, Open Access Project Director at Public Knowledge
(2) David Prosser, director SPARC Europe
(3) Bas Savenije, librarian Utrecht
(4) Simeon Warner, OAI research associate Cornell
(5) Ian Butterworth, scientist etc, Imperial College London
(6) Howard Flack
(7) Desmond Reaney, Institute of Physics Publishing

My personal view is that much of the OAI movement seems to have turned into librarians trying to learn to become publishers.
