19th International CODATA Conference
Category: Publication and Citation of Scientific Data

Digital Object Identifiers for scientific data sets

Norman Paskin (n.paskin@doi.org)
International DOI Foundation, Linacre House, UK
http://www.doi.org


The Digital Object Identifier (DOI) is a widely used system for identifying content "objects" in the digital environment. DOIs are names assigned to content objects (physical, digital or abstract), such as electronic journal articles, images, or any other kind of content. Scientific data sets may also be identified by DOIs, and several efforts are now underway in this area.

DOIs provide persistent identification together with current information about the object. The system is managed by the International DOI Foundation (IDF), an open-membership consortium that includes both commercial and non-commercial partners. Several million DOIs have been assigned by DOI Registration Agencies in the US, Australasia, and Europe. DOI builds on several existing standards, notably the Handle resolution system and the indecs Data Dictionary. DOIs can be used for any form of data management, whether commercial or non-commercial.

The DOI system has several components: a specified numbering syntax, a resolution service (based on the Handle System), a metadata system (based on the indecs Data Dictionary), and procedures for the implementation of DOIs through a federation of Registration Agencies.
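As an illustration of the numbering-syntax and resolution components, the following is a minimal sketch only. It assumes the commonly documented form of a DOI name, a prefix beginning with "10." followed by a slash and a registrant-chosen suffix, and a public proxy address for resolution. The example name "10.1234/example-dataset.v1", the function names, and the proxy URL default are all hypothetical choices made for this sketch, not part of the abstract.

```python
from typing import NamedTuple


class DoiName(NamedTuple):
    prefix: str  # directory indicator "10" plus a registrant code, e.g. "10.1234"
    suffix: str  # string chosen by the registrant; may itself contain "/"


def parse_doi(name: str) -> DoiName:
    """Split a DOI name into prefix and suffix at the first slash."""
    prefix, sep, suffix = name.strip().partition("/")
    if not sep or not suffix:
        raise ValueError(f"missing '/suffix' part: {name!r}")
    if not prefix.startswith("10.") or len(prefix) <= 3:
        raise ValueError(f"prefix must look like '10.<registrant>': {name!r}")
    return DoiName(prefix, suffix)


def resolver_url(name: str, proxy: str = "https://doi.org/") -> str:
    """Build a resolution URL; the proxy address is an assumption of this sketch."""
    doi = parse_doi(name)
    return f"{proxy}{doi.prefix}/{doi.suffix}"


# Hypothetical data set identifier, used for illustration only.
example = parse_doi("10.1234/example-dataset.v1")
print(example.prefix, example.suffix)
print(resolver_url("10.1234/example-dataset.v1"))
```

Splitting at the first slash keeps the suffix opaque, which is how an existing community numbering scheme could be carried inside the suffix unchanged.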

The Handle System implementation in DOI has been supplemented by expanded technical infrastructure and features specific to DOI applications. Handle multiple resolution allows one entity to be resolved to multiple other entities; it can therefore be used to embody a parent-children relationship, or any other relationship, and so is suitable for describing relationships between data sets. The Handle System itself deliberately imposes no constraints that would define a framework for expressing relationships (analogy: spreadsheet software); DOI is an application of Handle that adds such constraints for the specific purpose of content management (analogy: a spreadsheet application). In DOI the constraints are metadata defining the entities, using a semantically interoperable data dictionary. The IDF is the Registration Authority for one such dictionary, the ISO/IEC MPEG-21 Rights Data Dictionary, and the developer of the wider indecs Data Dictionary, which includes it; a data dictionary makes it possible to express relationships.
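The idea of multiple resolution constrained by a data dictionary can be sketched as a small in-memory model. This is an illustrative assumption, not the Handle System's or the DOI System's actual data model or API: the class `Record`, the relationship terms in `ALLOWED_RELATIONS`, and all identifiers below are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary standing in for a data dictionary.
ALLOWED_RELATIONS = {"hasPart", "isPartOf", "location"}


@dataclass
class Record:
    """One identifier holding several typed values (multiple resolution)."""
    doi: str
    values: list = field(default_factory=list)  # (relation, target) pairs

    def add(self, relation: str, target: str) -> None:
        # The constraint: only relations defined in the dictionary are accepted.
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"undefined relation: {relation!r}")
        self.values.append((relation, target))

    def resolve(self, relation=None) -> list:
        """Return all targets, or only those of the requested relation."""
        return [t for r, t in self.values if relation is None or r == relation]


# A parent data set pointing to two child data sets and one location.
parent = Record("10.1234/survey")
parent.add("hasPart", "10.1234/survey.table1")
parent.add("hasPart", "10.1234/survey.table2")
parent.add("location", "https://example.org/survey")

print(parent.resolve("hasPart"))
print(parent.resolve())
```

Without the `ALLOWED_RELATIONS` check the record would accept any label, which mirrors the unconstrained case described above; the check plays the role of the metadata constraint that DOI adds.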

Any existing numbering scheme or metadata scheme that provides an accepted numbering or descriptive syntax for a particular community or area of interest (whether a formal ISO standard or accepted community practice) can be used within the DOI System.

The presentation will outline the rationale for the DOI System, its application in current scientific data set activity, and future possibilities.