Report on CODATA 2008 Conference

To members of the Electronic Publishing and Database Committees
(copy to IUCr staff for information)

Herewith my digest of the highlights of the recent CODATA meeting in
Kyiv.

Brian
_________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography            e-mail:  bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England

==============================================================================

Scientific information for society - from today to the future
-------------------------------------------------------------

CODATA 2008 - Kyiv, Ukraine, 5-8 October 2008

The theme of the 21st International CODATA Conference continued the
emphasis on the information society that has emerged in the last few
biennial meetings. But if the last conference in Beijing focused on
the maturity of CODATA after 40 years of promoting and representing
international data science, the 2008 meeting took as its keynote the
importance of engaging the younger generation of scientists to lead
future developments in a world community increasingly dependent upon
information and scientific data.

Plenary lectures
----------------

A sombre assessment of the problems that need to be tackled was
provided in the plenary lecture of Bohdan Hawrylyshyn ("Information
and knowledge as a tool in facing global challenges"); put bluntly,
his thesis was "the world is sick in its main components". Focusing in
turn on the primary areas of demography, ecology, economy, geopolitics
and the failure of the institutions of society (schools, churches,
even family), he provided a reasoned but unsettling picture of the
state of the modern world. But the advantage of a rational analysis is
that it suggests rational responses, and of course scientific research
and analysis can help to fuel sensible and carefully judged
responses. For each aspect of his analysis, he provided pointers to
how science could help to address the decisions that needed to be
made. It was important that "social wisdom" should grow in an effort
to keep pace with rapidly developing technologies; and he cited the
example of Scandinavian countries that, in his view, better
demonstrated approaches to development that had a sustained emphasis
on social justice.

The plenary lecture of Michael Zgurovsky ("Interdisciplinary
scientific data for sustainable development global simulation")
suggested a practical approach to developing analytic tools for
modeling sustainable development across the nations of the world. A
numeric model of suitable metrics to describe the stability and
security of individual nations could be built from a matrix of factors
describing performance along the economic, ecologic, social and
institutional dimensions - the "sustainable development gauging
matrix" (SDGM). Sustainability, based on the UN declaration of 1996,
is considered an important metric in characterising economic and
political stability, and a number of entries in the SDGM demonstrated
how many economically strong nations were still ranked low in terms of
security of their societies. It was argued that global modeling in
this multidimensional way was important in making well-informed policy
decisions.

These global views were accompanied by a special presentation from
Wataru Iwamoto, representing UNESCO, who described many of the
initiatives in which UNESCO is playing a part to promote the
information society. These include the promotion of open access or
differential pricing for access to scientific information; the
development of metadata to facilitate long-term archiving; the
promotion of evidence-based decision making in national policies; and
the recruitment of young scientists and other workers in these tasks.

While supranational agencies are promoting evidence-based policy
making, there is a practical need for high-quality technical
structures to support the management and analysis of the large amounts
of data involved, and in a plenary lecture "The EGEE infrastructure
and its support for European scientific collaboration", Robert Jones
described a particular collaborative effort to provide such a
structure. EGEE is entering its third two-year phase of operation to
provide and increase the capacity of a production computing Grid
infrastructure. With support from over 50 countries in and beyond
Europe, EGEE includes over 300 sites linked together in a
collaborative model. Applications cover many fields, including
high-energy physics, earth sciences and life sciences, and the system
provides not only high-capacity, highly resilient hardware, but also
middleware linking contributing centres in coherent "virtual"
organisations. It was acknowledged that data management across the
various applications is still rudimentary compared with the hardware
and middleware provision, but the quality of service provided is very
high, and is promoting an enormous amount of new and exciting
science. Although the current approach is still project-based, the
ultimate goal of EGEE is to provide a long-term sustainable Grid
infrastructure throughout Europe and among its collaborating partners.

In the final plenary lecture, "Curating data? What about curating
services and workflows?", Carole Goble presented a complementary
approach to linking together complex scientific data-driven
inquiries. In the life sciences, over a thousand databases are
regularly used by bioinformaticians. They are increasingly disparate
in structure and architecture, but are usually accessed through Web
services. This allows the construction of workflows that combine,
integrate, link, process, derive and curate data resources from any
combination of these database sources. The workflows are instantiated
as discrete modules within a computational framework, and the modules
can be exchanged, extended and linked as required. Workflows do have
advantages, in that the choice of modules automatically documents the
processes involved in managing data from a number of disparate
sources. On the other hand, the individual modules are constantly
evolving as living program segments, and so it is essential to capture
the particular versions used in any application. Curation of such
rapidly changing components is not easy. Neither is validation,
especially as a community of authors contributes workflow modules to a
common pool. At this stage in their development, workflows are being
generated by an active community that is equally active in quality
assessment and validation. They are carried along on the wave of
enthusiasm for social computing and networking that underlies the "Web
2.0" approach. The "bottom-up" approach to building solutions is in
some ways at the opposite pole from the large-scale integrated
architecture that can be seen in network infrastructures like EGEE;
but it has a real potential to solve problems and perhaps to catalyse
the development of a completely new approach to computer-assisted
problem solving.

Oral sessions
-------------

If the plenary sessions provided an opportunity to state and develop
the overall theme of the conference, the multiplicity of parallel
sessions provided ample evidence for the diversity of activities
embraced by CODATA. Just a few examples of the session topics will
illustrate this: Information Society, global climate change, Grid
infrastructure, geophysical data systems and analysis, biodiversity,
scientific capacity building, repositories for scientific data,
materials: data exchange, nanotechnology, natural disasters and risk,
e-science collaboration, International Polar Year, biological and
genetics data, etc. The full programme can be reviewed
on the CODATA web site. However, a definite disadvantage of so many oral
presentation sessions (up to 11 in parallel) is that the
interdisciplinary nature of the conference becomes diluted, as each
session focuses on a particular discipline, and it is impossible to
see at one time how different communities face and tackle the same
problems in their different environments. I would certainly recommend
that future programme committees reduce drastically the number of
parallel sessions, and work harder to ensure that each session
explores topics of interest across subject boundaries. There would be
merit in expanding greatly the number of posters presented, since
there is clearly an enthusiasm for presenting the results of research,
and a large poster session can generate much discussion and
excitement.

Among the sessions that I attended, almost at random given the choice,
were a number of exciting astronomy sessions that reviewed many of the
collaborative initiatives contributing to the Virtual Observatory
projects characterising much contemporary astronomical work. A keynote
talk by George Djorgovski was particularly good at demonstrating how
the virtual observatories of astronomy sat within the broader context
of e-science. Modern information technology hardware can - just about
- keep up with the explosive growth in data volumes (large digital sky
surveys currently collect 10 or 100 terabytes of data, and forthcoming
ones will collect petabytes; the latest generation of telescopes can
collect 30 TB per day). There are now real problems in keeping up with
real-time data analysis, and the science is challenged not only by the
data volume, but increasingly by its complexity, such as with
panchromatic (multi-wavelength) views of the Universe, and the
additional computational challenges of simulations. A particular point
of note was the increasing reliance on computational modeling, so that
computer science is in many areas becoming the "new mathematics" of
scientific discovery.

Other astronomy talks covered a range of large-scale observational
projects, including Russian, Armenian, Ukrainian and European
ventures. There were also discussions of the benefits of common data
formats (FITS, VOTables), common interrogation languages and a common
data model in unifying the discipline and increasing the synergy of
collaborative projects. There was also a very nice presentation by
Fabien Chereau of Stellarium and VirGO, open-source desktop
planetarium applications that tap into the large databases of
astronomical objects that are openly available, and allow both amateur
and professional access to fundamental data.

Two sessions on biological responses to low dose radiation illustrated
the rather more mundane, but extraordinarily practical, benefits of
careful collection and comparison of data from individual incidents -
in this case the widespread exposure of human and other biological
populations to radiation from the atomic bomb detonations in Japan and
the Chernobyl reactor incident in Ukraine. A number of careful studies
were reported, building up a more complete picture of long-term health
effects from a (thankfully) very small number of direct observations;
and variations in the epidemiology of various forms of leukaemia
between the two cases provide an example of the new knowledge that can
be gained.

The session on long-term data and knowledge management surveyed a
number of large-scale and successful approaches to archiving, such as
those of the Earth Sciences Sector of Natural Resources Canada, NASA's
Planetary Data System, and the data management and publishing
activities of the Canada Institute for Scientific and Technical
Information (CISTI). Bob Chen of Columbia University made the very
important point that governance and organisational sustainability are
at least as important in building durable archives as the technical
infrastructure and data storage capacity that are most often
discussed. Arrangements to provide long-term archiving for data
collected by the Center for International Earth Science Information
Network (CIESIN) involve lengthy discussion with Columbia University
Libraries to guarantee the preservation of existing data long after
CIESIN itself may have disappeared. Other contributions in this
session looked at the prospects for peer-reviewed data publication, to
confer appropriate academic credit on data generators, managers and
analysts, and to provide citable records; and what could be learned
from the policies and norms for collaborative production and
dissemination of scientific data sets from the activities of the
open-source software developer community.

A stimulating session on physical science: data quality and databases,
which I was privileged to co-chair with Fedor Kuznetsov, included
excellent surveys of science data quality, especially in applied
sciences, as managed in China through application of national and
international standards (Hu Lianglin); of the extensive and careful
programmes of standard reference data evaluation as carried out by the
National Center for Standard Reference Data in Korea (Chang Geung Kim,
H. S. Suh et al.); of the many important physical databases throughout
the Russian Federation (T. Golashvili), in which attention was drawn
to the need to differentiate carefully between reference, recommended
and standard data values; and of nuclear data science activities in
India (S. Ganesan). This latter presentation included a vivid
illustration of the importance of continuously updating working
practices and associated documentation to reflect revised values of
physical data, as failure to do so had led to a near-accident in an
Indian nuclear reactor. Vigorous attempts to redress this problem, and
energetic efforts to practise the highest quality of nuclear science
in the power industry, demonstrate a maturity in Indian nuclear
science that is reflected in the growth of international collaborative
projects in nuclear science and technology, and in high-energy
physics. The session also included a warning from a Russian
high-energy physicist, Vladimir Ezhela, that physics journals needed
to provide full machine-readable copies of numerical measured data as
reported in their publications, to allow adequate refereeing and
quality assurance. He provided the example of negative eigenvalues in
the correlation matrix of certain combinations of the fundamental
physical constants that would be obtained if the published values of
the constants were used, and not their full-precision values. The IUCr
of course requires deposition of experimental data to allow numerical
peer review (and by the nature of our subject we can conduct most
routine validation automatically); in discussions it was suggested
that the International Union of Pure and Applied Physics (IUPAP)
should be engaged to explore similar policies in physics; or that a
CODATA Task Group might be a useful way to approach this. The session
concluded with a challenging paper presented by Dong Bong Yang, Gun
Woong Bahang and Sang Zee Lee that suggested a new natural units
system to define all physical constants as well as the SI units by
dimensionless numerical values.
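
The rounding problem raised by Vladimir Ezhela in this session can be
illustrated with a small sketch. The numbers below are hypothetical
and are not taken from any published table of constants; they are
chosen only so that three quantities are almost linearly dependent.
The check used here (an attempted Cholesky factorisation, written out
in plain Python) is an assumption of this sketch, not the method used
in the talk. It succeeds only for a positive-definite matrix, so a
failure indicates an eigenvalue that is zero or negative.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorisation."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    # A non-positive pivot means the matrix is not
                    # positive definite.
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def round_matrix(matrix, digits):
    """Round every entry, mimicking values truncated for publication."""
    return [[round(value, digits) for value in row] for row in matrix]


# Hypothetical correlation coefficient (illustration only).
R = 0.7071

# "Full-precision" correlation matrix: valid, but close to singular.
full_precision = [
    [1.0, 0.0, R],
    [0.0, 1.0, R],
    [R, R, 1.0],
]

# The same matrix as it might appear if printed to two decimal places.
published = round_matrix(full_precision, 2)

print("full precision is positive definite:",
      is_positive_definite(full_precision))
print("rounded values are positive definite:",
      is_positive_definite(published))
```

With these illustrative numbers the full-precision matrix passes the
check and the two-decimal version does not, which is the kind of
inconsistency that machine-readable, full-precision deposition would
let a referee detect.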

Finally, the session on data visualization approaches, which promised
an interesting variety of examples, was disappointing because many
speakers failed to show up. Nevertheless, Jean-Jacques Royer presented
to the many local students present an excellent overview of the
three-dimensional subsurface modelling carried out by his group at the
GOCAD project, University of Nancy. I also demonstrated the IUCr
approach to the interactive visualization of data as a feature of
online crystallography journal articles.

Awards
------

The CODATA Prize was awarded this year to Liu Chuang, Professor and
Director of the Global Change Information and Research Center at the
Institute of Geography and Natural Resources, Chinese Academy of
Sciences, who has been very actively involved as Co-Chair of the
CODATA Task Group on the Preservation and Archiving of Scientific and
Technical Data in Developing Countries, and who served on the ICSU
Priority Area Assessment (PAA) Panel on Scientific Data and
Information. Her lecture on receiving the CODATA Prize was entitled "A
worldwide solution for bridging the digital divide for innovative
research and development", and ranged widely over her many activities
within CODATA and other organisations to promote archiving,
development and innovation. Among the highlights were the creation and
active development of the CODATA Task Group on Preservation and
Archiving, various workshops, the development of an Open Data policy
in China, the presentation by CODATA at the World Summit on the
Information Society (WSIS) meeting in Tunis, the Berlin Declaration on
open access, and the identification of bridging the digital divide as
a strategic goal for CODATA following the ICSU PAA. She concluded by
looking forward to the activities of the newly created United Nations
Global Alliance for ICT and Development (UN-GAID).

At the same prize-giving ceremony, the Sangster Award 2008 for a young
Canadian Scientist was awarded to Sabrina Fortin, who subsequently
presented a paper on "Normative models to manage collective research
resources - from commons to contracts: the case of human populational
databases" in the parallel session on biomedical data sharing and
informatics.

In this, and in many other ways, the CODATA conference made strong
efforts to showcase young talent. A number of presentations were
singled out as contributions from young scientists. A Young Scientist
Roundtable was held, from which came the idea that a CODATA Working
Group should be formed by young scientists, with a longer-term goal of
establishing a full Task Group. The idea of a CODATA Prize for Young
Scientists was floated. For me, however, the most direct way to reach
out to young scientists was to engage directly with the many students
and young researchers who were able to attend sessions, and who helped
out as part of the local organisation. This was a real benefit of
holding such a conference in a university environment, and the
cheerful hospitality and enthusiasm of the local students was greatly
appreciated, and will not easily be forgotten.

Summary
-------

As always, I found the CODATA conference a stimulating meeting,
providing a useful cross-disciplinary survey of progress in data
science. The IUCr has benefited from hearing many of the
presentations, and I hope it has also provided stimulus and input to
participants through our involvement. I certainly took advantage of
many informal opportunities to make new contacts, open up new
possibilities for collaboration, and indeed make new friendships. I
hope that the next conference will be structured with fewer parallel
sessions, in order to maximise the opportunities for exploring
interdisciplinary themes, and I also hope that CODATA will continue to
value the contributions of the more laboratory-based sciences in
emphasising the necessity for quality assurance, critical peer review
and proper annotation and management of scientific data.

Brian McMahon
CODATA Representative

==============================================================================