A compressed format for representing data on the primary and spatial structure of biopolymers in data banks. Tools for accessing compressed data banks

We describe the open CAN (Compressed Amino acids and Nucleotides) format for storing genetic information in compressed form in data banks (DBs). The data compression principles are demonstrated in detail using the EMBL (nucleotide sequences), SWISSPROT (amino acid sequences), and PDB (3D structures) data banks as examples. A unified compressed data format makes it possible to integrate EMBL, SWISSPROT, and PDB into a single data bank. We plan to use this approach to integrate GENBANK and other similar DBs. One outcome of this research is a library of data retrieval procedures for DB access, which gives developers of application software packages a uniform interface to DBs containing biologically related data. The proposed data representation scheme was recommended by the Expert Commission of the Informatics Section of the RSSIP "Human Genome" as a standard for the distribution of data banks in Russia.