19th International CODATA Conference
  Category: Plenary - Mark-Up Languages
XML Description of Protein Structural Data for Data Grid and Computing Grid
Haruki Nakamura
  Institute for Protein Research, Osaka University, Japan
  The Protein Data Bank (PDB) has been a primary archive of three-dimensional 
  structural information on biological macromolecules. Protein Data Bank Japan 
  (PDBj, http://www.pdbj.org/) 
  curates new PDB entries as a member of the worldwide Protein Data Bank 
  (wwPDB) [1], along with the Research Collaboratory for Structural Bioinformatics 
  (RCSB) and the European Bioinformatics Institute (EBI).
A new extensible mark-up language (XML) format describing the PDB data, the pdbML, is being developed by wwPDB. Its structure is defined in an XML Schema (pdbx-v1.000.xsd at http://deposit.pdb.org/pdbML/) based on the Macromolecular Crystallographic Information Format (mmCIF). The entire PDB content in the pdbML format is now available from ftp://beta.rcsb.org/pub/pdb/uniformity/data/XML. To make the most of the XML format, we at PDBj have constructed an XML-based PDB data browser (xPSSS: xml-based Protein Structure Search Service at http://www.pdbj.org/xpsss/) on top of a native XML database. Information on the biological and biochemical functions of proteins can also be browsed. In addition to simple searches, full XPath searches are implemented, which allows users to perform complicated searches and to control the output of their searches in detail. The xPSSS is also accessible through a SOAP service for large-scale analyses and data grid applications.
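As a rough illustration of how an XPath query can both select entries and control what is returned, the following Python sketch runs two path expressions over a small in-memory XML document. The element names (`entries`, `entry`, `title`, `method`), the entry IDs, and the helper functions are hypothetical and much simpler than the actual pdbML schema. The sketch uses the limited XPath subset of Python's standard library rather than the xPSSS service itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for structure entries.
# This is NOT the pdbML schema; names and values are illustrative only.
SAMPLE = """\
<entries>
  <entry id="1ABC">
    <title>Hypothetical kinase</title>
    <method>X-RAY DIFFRACTION</method>
  </entry>
  <entry id="2DEF">
    <title>Hypothetical protease</title>
    <method>SOLUTION NMR</method>
  </entry>
  <entry id="3GHI">
    <title>Hypothetical lysozyme</title>
    <method>X-RAY DIFFRACTION</method>
  </entry>
</entries>
"""

root = ET.fromstring(SAMPLE)


def entry_ids(xpath):
    """Return the id attribute of every entry matched by the path."""
    return [e.get("id") for e in root.findall(xpath)]


def element_texts(xpath):
    """Return the text of every element matched by the path."""
    return [e.text for e in root.findall(xpath)]


# The predicate selects entries; the trailing step chooses the output.
xray_ids = entry_ids("./entry[method='X-RAY DIFFRACTION']")
xray_titles = element_texts("./entry[method='X-RAY DIFFRACTION']/title")

print(xray_ids)
print(xray_titles)
```

The same predicate appears in both queries; only the final path step changes, which is how a single search condition can yield either whole entries or one chosen field.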
In multiscale biological 
  systems, integrating simulation methods for models at different levels 
  is essential, and a new platform, BioPfuga (Biosimulation Platform United on 
  Grid Architecture), has been developed for this purpose [2]. It requires that 
  (1) application programs be divided into a set of many components, and that 
  (2) data be communicated among the program components through a standard XML 
  description. An example of the BioPfuga application to a hybrid 
  QM(HF)/QM(DFT)/MM method will be shown.
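The idea of separate program components exchanging data through an XML description can be sketched as follows. This is a minimal illustration only: the message layout, the element and attribute names, and the two component functions are assumptions made for the example, not the BioPfuga format or API.

```python
import xml.etree.ElementTree as ET


def mm_component(atoms):
    """Hypothetical sending component: encode atom positions as XML text."""
    msg = ET.Element("message", {"from": "mm", "to": "qm"})
    for name, (x, y, z) in atoms:
        # repr() of a float round-trips exactly through float().
        ET.SubElement(
            msg, "atom",
            {"name": name, "x": repr(x), "y": repr(y), "z": repr(z)},
        )
    return ET.tostring(msg, encoding="unicode")


def qm_component(xml_text):
    """Hypothetical receiving component: decode the XML back into tuples."""
    msg = ET.fromstring(xml_text)
    return [
        (a.get("name"),
         (float(a.get("x")), float(a.get("y")), float(a.get("z"))))
        for a in msg.findall("atom")
    ]


# Illustrative coordinates only; not taken from any PDB entry.
atoms = [("O", (0.0, 0.0, 0.117)), ("H", (0.0, 0.757, -0.469))]
wire = mm_component(atoms)      # the only thing the two components share
received = qm_component(wire)
print(received)
```

Because the components share nothing except the XML text, either side could be replaced by a program written in another language or running on another grid node, provided it reads and writes the same description.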
  
References:
  [1] Berman et al. (2003) Nature Struct. Biol. 10, 980.
  [2] Nakamura et al. (2004) New Generat. Comput. 22, 157-166.