Interactive Publications and the Record of Science

The following report also appears on the ICSTI web site and in the UK Serials Group Serials e-News newsletter.

An ICSTI Workshop in Paris on 8 February 2010 sought to showcase interactive innovations in scientific research publications, and also to explore the challenges in archiving such features as a persistent component of the scientific record.

An early session showed three examples of visualization technologies used within journal articles: optics and medical imaging accessible in three dimensions through a helper application launched by following a hyperlink in the article PDF; interactive PDFs, where 3D figures can be manipulated directly within the page image; and a molecular visualizer sitting within the browser window view of a full-text HTML article. Each technology has its advantages and disadvantages. A publisher can support a helper application relatively easily; all that is needed is a link to a data file and a means of flagging the special nature of the data file. The end user has the burden of actually installing the software (the counter to this is that the user is free to choose a favourite among alternative programs, if they exist). Interactive PDFs are even easier for the publisher to deliver, and often the creation of the interactive figures can be delegated entirely to the author. However, there is a concern about vendor lock-in to a specific solution; and librarians, already wary of over-reliance on PDF as a long-term archive medium, currently endorse standards that prohibit the use of such 'advanced' features of PDF. Embedded applets require the most effort from publishers to support; but in principle the publisher can control upgrades, provide authoring tools, and build procedures to integrate the authoring and editing of such applications into the journal production workflow.
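As an illustration of the helper-application route only, the 'link plus flag' could be sketched as below. This is a minimal sketch, not a description of any publisher's system: the function names and the file path are hypothetical, and it assumes the flag takes the form of a media type such as the widely used (but unregistered) chemical/x-cif and chemical/x-pdb.

```python
import html
import mimetypes

# Assumption: unofficial but commonly used chemical media types.
# A real publisher would choose whatever types its readers' helpers expect.
CHEMICAL_TYPES = {
    ".cif": "chemical/x-cif",
    ".pdb": "chemical/x-pdb",
}


def build_registry():
    """Return a media-type registry that also knows the chemical types above."""
    registry = mimetypes.MimeTypes()
    for extension, media_type in CHEMICAL_TYPES.items():
        registry.add_type(media_type, extension)
    return registry


def data_link(registry, href, label):
    """Build an HTML link to a data file, flagged with its media type if known."""
    media_type, _encoding = registry.guess_type(href)
    type_attr = f' type="{html.escape(media_type)}"' if media_type else ""
    return f'<a href="{html.escape(href)}"{type_attr}>{html.escape(label)}</a>'


if __name__ == "__main__":
    # Hypothetical path to a supplementary data file.
    print(data_link(build_registry(), "data/structure.cif", "3D structure (CIF)"))
```

The same media type would normally also be sent by the web server in the Content-Type header, which is what lets the reader's browser hand the file to whichever installed program the reader prefers.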

There were interesting discussions on the use and usability of such applications. Scientists do value the richer insight into data that good visualization tools provide (a later presentation on powerful visualizations of OECD data sets demonstrated that the benefits are not reserved for the hard sciences alone); but objective tests of the degree to which these tools do improve readers' comprehension and recall suggest that the benefits can be overstated. It seems that the brightest future lies in providing tools that allow the end user to access the underlying data directly, and not just as interpreted through a particular visualization. There was also interest in the implications this has for the peer review process. While the ability to access data sets for review was highly desirable, the need to invest even more effort in effective peer review was challenging.

A second session focused on added value through semantic enhancements to the content. This could be done by marking up the content during the editorial process, as practised by the Royal Society of Chemistry, or by tagging the content on-the-fly using ontologies and machine-derived relational markup in the approach of the Concept Web Alliance. Both approaches allow articles to hyperlink to related resources - definitions of terms, databases of related content, suppliers of relevant products or services. The two approaches differ in quality control and in timeliness: the editorial markup is frozen in time (although it can point to changing resources); dynamic tagging can change as related resources change, but in consequence cannot be archived so readily.
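To make the on-the-fly idea concrete, here is a deliberately simple sketch of tagging at display time. It is not the method used by either organisation named above: the glossary, its example.org URLs and the function name are all hypothetical, and a flat dictionary lookup stands in for a real ontology.

```python
import html
import re

# Hypothetical glossary standing in for an ontology: term -> resource URL.
GLOSSARY = {
    "benzene": "https://example.org/terms/benzene",
    "x-ray diffraction": "https://example.org/terms/x-ray-diffraction",
}


def tag_terms(text, glossary):
    """Wrap each known term found in plain text with a hyperlink."""
    # Try longer terms first so multi-word terms win over shorter ones.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def link(match):
        url = glossary[match.group(0).lower()]
        return f'<a href="{html.escape(url)}">{match.group(0)}</a>'

    # Escape the plain text first, then insert the links.
    return pattern.sub(link, html.escape(text))


if __name__ == "__main__":
    print(tag_terms("Benzene was studied by X-ray diffraction.", GLOSSARY))
```

Because the links are computed each time the article is displayed, editing the glossary changes every article at once; that is the timeliness gained, and also the reason the result is harder to archive than markup fixed at publication.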

Another approach to semantic enhancement is to restructure the presentation of the article to take advantage of the fluidity of the online medium, and to integrate text, graphics, associated data and other supporting materials, multimedia annotation and content in a more dynamic yet structured format. This approach has been developed by Cell Press in their new 'Article of the Future' format.

A post-lunch session was devoted to the archiving challenge. Much dynamism and interactivity arises from linking to novel components of a scholarly article - data sets, multimedia content, etc. - which may or may not reside alongside the article itself on the publisher's web site. It is important to be able to locate and relate these components, and so continuing developments to register and characterise persistent identifiers are vitally important. CrossRef is pioneering many developments in this area, and the registration of DOIs for data sets by the new DataCite initiative will help to establish the necessary infrastructure for preserving these relationships. Meanwhile, large national libraries such as the British Library are still most concerned with archiving content streams - storing the bits, and hoping that the relationships between them will look after themselves, or at least that they can be addressed later.

The final session, exploring the new directions in which technology is taking the whole process of scholarly communication, may have given the libraries small comfort. Nature Publishing Group has been energetic in recent years in experimenting with many new technologies in scholarly communication. Their references back to the journal's 19th-century founding principles promoting early information and communication between scientists suggest that pursuing this mission will not be hindered by concerns that not everything might be archived for posterity. The SciVee project, allowing scientists to provide video annotations and commentaries on their more conventional publications, also highlights the perception that innovation in scientific communication is currently driven more by immediacy than by concern for 'keeping the minutes' of science.

Overall, the impression is that much more work needs to be done to incorporate interactivity fully into the historical record. All the interactive visualization projects represented provided parallel or failover static figures, ensuring some archivability of the interactive content. Those that permit deposit of the original data files will secure the preservation of more content for posterity (and perhaps through standardization of graphical description or scripting languages they may retain more of the essence of the interactivity: preserving 'not how it looks, but what one was looking at').

This was a stimulating workshop, bringing together two topics that are rarely tackled within the same forum, and allowing a first-rate panel of expert speakers to illustrate and identify the challenges in bringing both into harmony. Publishers are still wary of exploiting interactive technologies to the full - no doubt in part because of concerns over long-term preservation. But, paradoxically, the more publishers endeavour to make the interactive an everyday component of their products, the clearer the formal relationships between the components of the interactive article will become, and the easier it will be - in the long term - to encapsulate those relationships within the formal record of science.

Brian McMahon
Workshop organiser