19th International CODATA Conference
Category: Data Archiving

Digital archiving of scientific information: Czech experience

Prof. Dr. Pavel Slavik (slavik@fel.cvut.cz), P. Mach, M. Snorek
CTU Prague, Czech Republic


MSc and PhD theses are a very interesting source of scientific data. Given a large number of these documents, it may become possible after a certain period of time to obtain an overall picture of the context in which a particular piece of research was performed. As the majority of theses now exist in digital form, methods for preserving them need to be developed. This problem is not new. Nevertheless, existing solutions are mostly directed towards the needs of digital libraries, which are responsible for the loan and distribution of theses (systems like ProQuest etc.).

The approach selected at CTU Prague is based on the results of a national project in which a strategy for implementing national digital archives was established. The authors of this paper were members of the project team. A large-scale system of this type is very complex (it is based on general standards such as OAIS). Nevertheless, the theses should clearly be archived within such a general archive in the future. As implementing such an archive will take a couple of years, it was necessary to design a methodology for archiving the theses in such a way that their later transfer into the large archive will be relatively problem-free. The solution must be easy to implement and easy to handle for the non-experts who will use the archive.

The solution developed is based on redundancy: each thesis is stored in three formats, which allows later digital archiving in an environment where documents are migrated. The main idea was to choose formats that are as independent as possible of software and hardware platforms and that allow easy reconstruction of a document should migration problems occur. The formats chosen are PDF, text and bitmap. A pilot implementation of the system has been completed, and the first practical experiments are currently under way.
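
The redundant storage described above can be illustrated with a minimal sketch. It is not the pilot implementation: the file suffixes (`.pdf`, `.txt`, `.tif`), the manifest layout, the use of SHA-256 checksums, and the function names `build_manifest` and `verify_manifest` are all assumptions made for illustration. The sketch records the three representations of one thesis and later reports which of them can no longer be trusted, so that a damaged copy could be reconstructed from the remaining ones.

```python
import hashlib
from pathlib import Path

# Assumed mapping of each representation to a file suffix (hypothetical;
# the abstract names only "PDF, text and bitmap").
FORMATS = {"pdf": ".pdf", "text": ".txt", "bitmap": ".tif"}


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(thesis_dir, thesis_id):
    """List the files of all three representations with their checksums.

    Raises FileNotFoundError if any representation is missing, because the
    redundancy idea relies on all three being present at ingest time.
    """
    thesis_dir = Path(thesis_dir)
    representations = {}
    for name, suffix in FORMATS.items():
        files = sorted(
            p for p in thesis_dir.iterdir() if p.suffix.lower() == suffix
        )
        if not files:
            raise FileNotFoundError(f"no {name} representation in {thesis_dir}")
        representations[name] = [
            {"file": p.name, "sha256": sha256_of(p)} for p in files
        ]
    return {"thesis_id": thesis_id, "representations": representations}


def verify_manifest(thesis_dir, manifest):
    """Return the names of representations with missing or altered files."""
    thesis_dir = Path(thesis_dir)
    damaged = []
    for name, files in manifest["representations"].items():
        for entry in files:
            path = thesis_dir / entry["file"]
            if not path.is_file() or sha256_of(path) != entry["sha256"]:
                damaged.append(name)
                break  # one bad file is enough to flag the representation
    return damaged
```

In this sketch the manifest is a plain dictionary, so it could be written out as a small text file next to the thesis and read without special software, in keeping with the aim of platform independence.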