Modernized Uniform Representation of Carbohydrate Molecules in the Protein Data Bank.

2021 
Since 1971, the Protein Data Bank (PDB) has served as the single global archive for experimentally determined three-dimensional structures of biological macromolecules made freely available to the global community according to the FAIR principles of Findability-Accessibility-Interoperability-Reusability. During the first 50 years of continuous PDB operations, standards for data representation have evolved to better represent rich and complex biological phenomena. Carbohydrate molecules present in more than 14,000 PDB structures have recently been reviewed and remediated to conform to a new standardized format. This machine-readable data representation for carbohydrates occurring in PDB structures and corresponding reference data improves the findability, accessibility, interoperability, and reusability of structural information pertaining to these molecules. The PDBx/mmCIF data dictionary now supports (i) standardized atom nomenclature that conforms to IUPAC-IUBMB recommendations for carbohydrates; (ii) uniform representation of branched entities for oligosaccharides; (iii) commonly used linear descriptors of carbohydrates developed by the glycoscience community; and (iv) annotation of glycosylation sites in proteins. For the first time, carbohydrates in PDB structures are consistently represented as collections of standardized monosaccharides, which precisely describe oligosaccharide structures and enable improved carbohydrate visualization, structure validation, robust quantitative and qualitative analyses, search for dendritic structures, and classification. The uniform representation of carbohydrate molecules in the PDB described herein will facilitate broader usage of the resource by the glycoscience community and researchers studying glycoproteins.