The Role of Scientific Data in e-Science:

How Do We Preserve All Necessary Data?

John Rumble, Jr.

Information International Associates, Inc.

P.O. Box 4219

Oak Ridge, TN 37831-4219, USA

jumbleusa@earthlink.net

Scientific data are not homogeneous in any respect. The disciplines that generate data vary widely in how they report experimental, observational, and calculation conditions and the resulting metadata. Archiving practices, such as direct deposition into community databases or inclusion in peer-reviewed papers, also differ greatly. Yet because almost all data are generated and managed electronically, the dream exists of making everything available.

Considerable work is under way on data exchange and recording formats, ontologies, and direct deposition schemes, some of which are now quite mature. Yet a number of barriers stand in the way of making the majority of scientific data available. Some are obvious, such as cost, time, and data volume. A group of more subtle barriers also presents significant obstacles: the evolution of scientific knowledge and language; the gap between an ideal experiment, observation, or calculation and what is actually carried out; and the multi-center nature of scientific research.

In this paper, I discuss each of these obstacles in some detail and offer specific suggestions for possible solutions. I describe the complexity of evolving scientific knowledge and language and outline its impact on e-data standards. I outline ideal and real experiments, observations, and calculations and discuss how the difference between them relates to e-Science ontology development. I address the additional complications brought about by the geographic and disciplinary dispersion of data generation and research. Finally, I discuss how focusing on these barriers will greatly advance the e-Science movement.

Keywords: e-Science, data archiving, scientific knowledge, scientific data, ontology development, data standards