20th International CODATA Conference

 

Session: Primary Biological Databases

 

 

Quality of services of the primary nucleotide sequences databases

 

Hideaki Sugawara (tree_of_life@leaf.ocn.ne.jp), Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan

 

             

Biological databases have expanded in step with the progress of biology, driven especially by the development of precise and extensive experimental technologies. “The Molecular Biology Database Collection: 2006 update” (Galperin 2006) lists 858 databases in a wide range of categories, from nucleotide sequences to immunology. That is 139 more than in 2005. Among them, the International Nucleotide Sequence Database (INSD, http://www.insdc.org/) has been one of the successful primary databases for 20 years. Its success so far is due to the understanding and support of academia, industry and governments. However, the INSD, maintained by DDBJ, EMBL and GenBank, has to be well prepared for the increase in the quantity and variety of data. The data grew from 10 Kbp in 1987 to 100,000,000 Kbp in 2006 and keep increasing day by day. In 1987 the data consisted mainly of sequences of cloned genes, whereas more than half of the INSD now comes from whole-genome shotgun projects. Therefore, the INSD has to expand its computer resources, improve its application programs and train experts in order to maintain the quality of its services. As an effort toward quality assurance, we at DDBJ have carried out a project named Gene Trek in Procaryote Space (GTPS), which applies a common protocol to all the bacterial genome sequences in the public domain by use of GRID, a large-scale PC cluster and expert annotators. The GTPS database is available at http://gtps.ddbj.nig.ac.jp/.

Reference: Galperin, Michael Y. (2006) The Molecular Biology Database Collection: 2006 update. Nucleic Acids Research, Vol. 34, Database issue, D3-D5

Keywords: nucleotide sequence, database, prokaryote, genome, annotation