International Council for Science : Committee on Data for Science and Technology

Report from the Delegate International Union of Crystallography (1994)

Much of the data-related activity of the International Union of Crystallography over the past few years has centered on developing applications of the Self-Defining Text Archive and Retrieval (STAR) syntax. These applications were:

  1. upgrading of the Crystallographic Information File (CIF) core.
  2. development of the macromolecular CIF dictionary.
  3. development of the CIF powder dictionary.
In consultation with the IUCr Executive Committee, it was decided to establish a separate IUCr committee for this activity, to be known as the Committee for the Maintenance of the CIF Standard (ComCIFS). Collaboration with the Chemical Structure Association on implementing their Standard Molecular Data (SMD) format as a STAR application continued. This application, the Molecular Information File (MIF), is designed as a standard carrier of chemical structure (connectivity) information. Publication of the standard is scheduled for late 1994.
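
As an illustration of the STAR syntax on which CIF and MIF are built, the following is a minimal sketch in Python of reading simple name/value data items from a CIF-style data block. It is not the reference CIF parser: the function name `parse_simple_cif` and the example values are hypothetical, and the sketch deliberately ignores `loop_` constructs, multi-line text fields, and save frames.

```python
def parse_simple_cif(text):
    """Collect single-line data items into {block_name: {data_name: value}}.

    Assumption: only 'data_' block headers, '#' comment lines and
    '_name value' pairs on one line are handled.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            # A data block header opens a new block.
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            if len(parts) == 2:
                name, value = parts[0], parts[1].strip()
                # Strip matching quotes around values containing spaces.
                if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                    value = value[1:-1]
                current[name] = value
    return blocks


# Illustrative input only; the numbers are made up for this example.
EXAMPLE = """\
data_example
_cell_length_a    5.959
_symmetry_space_group_name_H-M  'P 21/c'
"""

if __name__ == "__main__":
    print(parse_simple_cif(EXAMPLE))
```

Because every value is introduced by its own data name, a reader of the file needs no fixed column layout, which is the self-defining property the STAR name refers to.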

Crystallographic data centre activities are monitored by the IUCr Commission on Crystallographic Data. These activities include building and disseminating crystallographic databases. There are six major crystallographic databases:

  1. Cambridge Structural Database - CSD.
  2. Inorganic Crystal Structure Database - ICSD.
  3. International Centre for Diffraction Data - ICDD.
  4. NRC Metals Crystallographic Data File - CRYSTMET.
  5. NIST Crystal and Electron Diffraction Data Center.
  6. Protein Data Bank - PDB.
The following are brief summaries of developments with each of these databases.

CSD - CAMBRIDGE STRUCTURAL DATABASE

This section summarises recent developments in the Cambridge Structural Database (CSD) and gives some details of plans for the near future.

Database News

  1. Contents

    The database, which aims to include all published organic and metal-organic complex structures, now contains 120,481 data entries (April 1994 release) and 1828 Protein Data Bank entries.

    There are also 8086 entries which have crystal data only, and 678 entries with only partial structural determination.

    The growth rate in the last year has been 10,603 new entries.

    Detailed statistics are available from CCDC on request.

  2. Protein Data Bank (PDB)

    In October 1993 PDB entries were included in the CSD for the first time by agreement with the Brookhaven group. These entries contain the bibliographic details, plus the residue sequence information as a searchable field, but not the coordinates.

Platforms & Formats

The CSD continues to be supplied to all academic institutions via the Affiliated National Data Centres. It is also supplied commercially to industrial companies, predominantly in the pharmaceutical sector, where it is used in drug design.

  1. Platforms

    The CSD is supported on VAX/VMS, DEC Alpha, SGI (UNIX), and SUN (UNIX) platforms.

    A most important development has been a Generic Unix Package, which enables the CSDS to be installed on most commonly available Unix systems. The SGI and SUN versions are individual instances of this package. In practice this has satisfied many of the former 'machine independent' users. The machine independent package (MIP) is still available; it requires the user to do some work to get a basic system running, but can be used on almost any 32-bit computer.

  2. CD-ROM

    In March 1994 CSD was released for the first time on CD-ROM, for UNIX and VMS platforms. This has been widely welcomed, and allows the complete CSD to be accommodated on a single CD.

  3. Magnetic Media

    The last year has seen a move toward higher-density magnetic media, for example DAT and 1/4 inch tape cassettes, with the almost complete demise of 1600 bpi tape. It is thought that CD will almost completely replace magnetic media in the next few years.

  4. CSD-MDL database

    In Jan. 1994 a version of CSD was released as an MDL registered database, which is searchable by the MDL MACCS and ISIS software.

  5. CIF

    The Crystallographic Information File (CIF) format has been accepted as a deposition format at CCDC since 1993. It is intended that CIF will also become a standard output option from CSD.

Software for CSD

  1. QUEST3D

    QUEST3D is the search program provided with CSD; it reached a mature state of development in 1993. The program allows the CSD to be searched for fragments subject to geometric constraints, which are defined as a set of geometric parameters (distances, angles, torsion angles etc.) specified within certain ranges.

    QUEST3D also allows searching for fragments containing intermolecular (non-bonded) contacts.

    QUEST3D has a new search feature for the PDB residue sequence field.

  2. VISTA

    VISTA is a new program, released in March 1994, which provides a visualisation tool for statistics on geometric fragment parameters. QUEST3D is used to define a set of parameters for a fragment, which are output to a 'tables' file that is then easily examined by VISTA. Version 1.0 allows display of histograms, scattergrams (with regression), and principal component analysis (PCA). The program allows immediate examination of any entry in the CSD by selection of the corresponding data point, with rotatable 3D display. This is very useful in interpreting a data set in terms of molecular conformation, and in the scrutiny of outliers.

  3. PLUTO

    This display program was upgraded in October 1993 with an interactive menu. New features have been developed to enable easy exploration of non-bonded intermolecular contacts, e.g. H-bonding, and of metal coordination.
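
The kind of geometric constraint described above for QUEST3D searches (distances, angles and torsion angles required to lie within given ranges) can be illustrated with a minimal Python sketch. This is not QUEST3D's implementation: the function names and the constraint tuple layout are hypothetical, atoms are assumed to be given in Cartesian coordinates, and torsion ranges that wrap around ±180° are not handled.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    """Distance between atoms p and q."""
    return _norm(_sub(p, q))


def angle(p, q, r):
    """Angle p-q-r in degrees, measured at the central atom q."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion(p, q, r, s):
    """Signed torsion angle p-q-r-s in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(q, p), _sub(r, q), _sub(s, r)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_hat = tuple(x / _norm(b2) for x in b2)
    return math.degrees(math.atan2(_dot(_cross(n1, n2), b2_hat), _dot(n1, n2)))


def matches(atoms, constraints):
    """True if every (kind, atom_indices, low, high) constraint is satisfied."""
    funcs = {"distance": distance, "angle": angle, "torsion": torsion}
    return all(
        low <= funcs[kind](*(atoms[i] for i in indices)) <= high
        for kind, indices, low, high in constraints
    )


if __name__ == "__main__":
    # Four made-up atom positions forming a 90 degree torsion.
    fragment = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    query = [
        ("distance", (0, 1), 0.9, 1.1),
        ("angle", (0, 1, 2), 85.0, 95.0),
        ("torsion", (0, 1, 2, 3), 80.0, 100.0),
    ]
    print(matches(fragment, query))
```

A real search would first locate candidate fragments by chemical connectivity and convert fractional crystal coordinates to Cartesian form before applying range tests of this kind.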

ICSD - INORGANIC CRYSTAL STRUCTURE DATABASE

The ICSD is produced by Fachinformationszentrum Karlsruhe and the Gmelin-Institute, Frankfurt. The database presently contains 37,000 entries. It can be accessed online at STN with MESSENGER. For in-house use there is a mainframe version and a CD-ROM version for PC. Both versions are accompanied by effective retrieval systems. The retrieval system is completely menu-driven and has features for graphical representation of all structures and for the calculation of other interesting features. An expert system for further evaluation of the content of the database is under development.

INTERNATIONAL CENTRE FOR DIFFRACTION DATA

Powder diffraction patterns supplied to the ICDD through its Grants-in-Aid program and other sources are now being archived in CIF/STAR format. Implementation of this format has not been uneventful.

As a prototype of a CIF/STAR database product, development of a database containing diffraction patterns of clays and associated minerals is proceeding. While the data format is well-defined, many data processing, editorial, data evaluation, and production issues need to be resolved before a complete product can be released.

Development of the Set 44 Powder Diffraction File products is on schedule for the usual August release. New releases of the Crystal Data Identification File and the Electron Diffraction Database have been completed.

The release date for PC-PDF for Windows is scheduled for 1 August 1994. This Microsoft Windows version of the PC-PDF retrieval program for the Powder Diffraction File contains the considerable enhancement of an Organic Functional Groups index, which makes possible substructure searches of the Powder Diffraction File. Structure display software is also included.

A new product, the Metals and Alloys Indexes, was released. Intended to facilitate entry to the Powder Diffraction File, this print volume consists of an alphabetical formula index, a Pearson symbol code index, a common names index, and a Strukturbericht symbol index. It is an exceptionally useful tool for studying structural similarities in alloy systems. Data evaluation technology developed during the preparation of these indexes will be applied to other ICDD products in the future.

The Dow collection of polymer diffraction patterns is being reprinted. We hope to produce a CIF/STAR database of digital polymer diffraction patterns if there is a market.

A two-volume trial of PC-LIBRUM, the CD-ROM version of the Denver Conference proceedings, has been authorized. If the product meets with customer acceptance, more volumes of the series "Advances in X-Ray Analysis" will be placed on CD-ROM.

NIST CRYSTAL AND ELECTRON DIFFRACTION DATA CENTER

During the year, this Data Center has focused on abstracting and evaluating crystallographic data. Data are collected on all classes of materials and there are over 200,000 entries in the master database from which products are derived. As the rate of increase is over 15,000 entries per year, it is becoming increasingly difficult to keep current.

Data Center products include CD-ROMs and an on-line service via CISTI. Two up-to-date CD-ROMs, one with NIST Crystal Data and the other with the NIST/Sandia/ICDD Electron Diffraction Database, will be made available to the scientific community by October 1994. The on-line version of NIST Crystal Data in CISTI (CRYSTDAT) is currently being upgraded to include all recent entries. These analytical and research tools have many applications, including compound identification and materials design. For example, in research on superconductor-related materials these databases have been used routinely in the synthesis of new compounds and in the identification of impurities in neutron powder patterns.

In addition to data, the Data Center has been supplying many copies of its evaluation software (NBS*AIDS83) to the scientific community. NBS*AIDS83 is widely used in both single crystal and powder research laboratories for the evaluation and standardization of primary crystallographic data.

NRC METALS CRYSTALLOGRAPHIC DATA FILE (CRYSTMET)

CRYSTMET includes metals, alloys and intermetallics, composed of elements on the left of the Zintl line and, in addition, some cross-compounds (phases formed with elements immediately to the right of the Zintl line) along with some dioxides. Complete back to 1922, CRYSTMET currently has some 52,500 entries and is updated annually. It is produced by the National Research Council of Canada in collaboration with Villars' Intermetallic of Switzerland.

PROTEIN DATA BANK

The Protein Data Bank (PDB) archival database of 3-D structures for biological macromolecules has tripled in size over the past two years. The entire PDB collection (2441 atomic coordinate sets as of April 1994) is now freely available over the Internet, permitting entirely new possibilities for electronic access and on-line searching. Copies of PDB can also be purchased as before from Brookhaven (BNL) on CD-ROM or magnetic tape. A number of affiliated centers in Europe, Japan and North America offer PDB for distribution as well, and several vendors have incorporated PDB information into software products for applications including modeling, simulation and computer-aided molecular design.

The project team at BNL is working to extend PDB beyond its traditional role as an information repository. Along with making the data even more accessible and accelerating the deposition process for new structures, PDB's goals include development of a genuine database in which users can ask questions across the full collection of structures instead of being limited to querying individual ones. At the same time, PDB is evolving into a true international collaboration, receiving not only data but also a free flow of ideas from its growing community of users. A PDB User Group has recently been formed to help in achieving this goal.

To head PDB at this time of major new challenges and opportunities, BNL has appointed Joel L. Sussman, a professor at the Weizmann Institute of Science in Israel.


John R. Rodgers
Chairman
Commission on Crystallographic Data
