19th International CODATA Conference
Category: Data Archiving

Very Long Term Preservation of Digital Information Using Metadata and Ontologies

Prof. Dr. Heinrich Jasper (jasper@tu-freiberg.de)
TU Freiberg, Institut fuer Informatik, Germany


Long-term preservation, or archiving, of the digital information that accompanies technological artefacts is crucial in several domains. We have examined the requirements for preserving information about nuclear waste disposal, in particular those of the long-term nuclear waste repository ERAM (Endlager für radioaktive Abfälle Morsleben) in Germany. The artefacts in ERAM must be monitored for tens of thousands of years, and the information about the disposed nuclear waste should remain available for that entire period. Furthermore, legal authorities demand access to all information concerning ERAM without any time limit. For ERAM this concerns especially information about the mine and its drifts, the contracts with the disposing companies, the description and disposal location of each disposed object, and the time series of exposure measurements for many radioactive nuclides.

The technological infrastructure of ERAM is heterogeneous, and the resulting information is therefore stored in various formats. Today, the long-term preservation of digital information in ERAM is an open issue that is handled pragmatically through redundancy: various regional authorities store parts of the information, depending on their duties. Additionally, non-digital technology is used: visual copies on microfiche are provided for important information items.

Several approaches to archiving digital information in the long run are well known: migration, emulation, the universal virtual computer, the preservation layer method and the archaeological approach, to name just a few. Our study shows several disadvantages for each of them, and we want to overcome some of these by using modern strategies. We tackle the problem with two techniques from digital libraries and semantic information management: metadata and ontologies.

Metadata are well known as a means of describing information items in both databases and digital libraries. Because the content of the ERAM information bases is heterogeneous, metadata are additionally used for a detailed description of the storage media and storage structures. In addition, each information item of a specific type must carry metadata for its coding and decoding schemas, i.e. the corresponding algorithms. This approach works only with open source components whose design information is also publicly available. In order to access the data in the long run, all information and metadata must be migrated to future digital systems. Given the time frame of ERAM, this migration must be repeated many times. With information items that carry the considerably enriched metadata described above, IT professionals should be able to perform it manually.
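
The kinds of metadata named above can be illustrated with a minimal sketch of a per-item record. It is not the ERAM data model: the class name, the field names, the helper functions and all example values are hypothetical and chosen only for illustration. The record covers the item description, the storage medium and structure, and references to the coding and decoding schemas, and a simple check reports which entries are still missing before an item is handed over for migration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ItemMetadata:
    """Hypothetical metadata record for one archived information item."""
    item_id: str
    description: str        # what the item is about
    item_type: str          # e.g. "time series", "contract", "mine map"
    storage_medium: str     # physical carrier the item is stored on
    storage_structure: str  # file or record layout on that carrier
    encoding_schema: str    # reference to the openly documented coding algorithm
    decoding_schema: str    # reference to the matching decoding algorithm


def missing_fields(meta: ItemMetadata) -> List[str]:
    """Return the names of metadata fields that are empty or whitespace only."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]


def ready_for_migration(meta: ItemMetadata) -> bool:
    """An item counts as migratable here only if every metadata field is filled."""
    return not missing_fields(meta)


# Illustrative values only; none of them are taken from the ERAM information bases.
example = ItemMetadata(
    item_id="ts-0001",
    description="Exposure measurements for one nuclide",
    item_type="time series",
    storage_medium="magnetic tape",
    storage_structure="fixed-width records",
    encoding_schema="",  # deliberately left empty to show the check
    decoding_schema="docs/decoders/fixed-width.txt",
)

print(missing_fields(example))
print(ready_for_migration(example))
```

A flat record with a completeness check is the simplest possible design. A real archive would more likely reference versioned, self-describing schema documents than plain strings.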

In our approach we want to support these engineers with an ontological framework that guides them with knowledge about the information, metadata, infrastructures and algorithms used in the ERAM scenario. A first version of this ontology will follow the abstract universal virtual computer concept defined by Lorie and used by the Koninklijke Bibliotheek in The Hague for archiving digital information. The presentation concentrates on the requirements of the application scenario, known approaches to the long-term preservation of digital information and their drawbacks, our approach to overcoming some of these, a critical analysis of the field, and the final question of whether other, physical approaches will do better in the long run.