Linking raw experimental data with scientific workflow and software repository: some early experience in the PanData-ODI project


Erica Yang* and Brian Matthews

Scientific Computing, Rutherford Appleton Laboratory, Science and Technology Facilities Council, Harwell, Oxford, Didcot OX11 0QX, UK

Abstract

Providers of large facilities have often developed mature data and publication infrastructures to capture the scientific outputs of experiments. The aim is not only to ensure the long-term accessibility of these digital assets, but also to demonstrate the prolonged impact of the research they support. Traditionally, the emphasis has been on the cataloguing and archiving processes at the two ends: raw experimental data and publications. However, due to the rapidly rising data rates and volumes from scientific experiments and the complexity of certain types of data analysis, researchers are becoming increasingly reliant on the infrastructure services provided by the facility operators.

This talk presents early evidence gathered in the data provenance work package of the PanData-ODI project. The evidence demonstrates the emerging needs in the community, and we present some early snapshots of our approach to addressing the problem. In particular, we will examine the interplay of experimental data archives, scientific workflows and software repositories.