Accurate and timely functional genome annotation is essential for translating basic pathogen research into clinically impactful advances. Here, through literature curation and structure-function inference, we systematically update the functional genome annotation of Mycobacterium tuberculosis virulent type strain H37Rv. First, we systematically curated annotations for 589 genes from 662 publications, including 282 gene products absent from leading databases. Second, we modeled 1,711 underannotated proteins and developed a semiautomated pipeline that captured shared function between 400 protein models and structural matches of known function on Protein Data Bank, including drug efflux proteins, metabolic enzymes, and virulence factors. In aggregate, these structure- and literature-derived annotations update 940/1,725 underannotated H37Rv genes and generate hundreds of functional hypotheses. Retrospectively applying the annotation to a recent whole-genome transposon mutant screen provided missing function for 48% (13/27) of underannotated genes altering antibiotic efficacy and 33% (23/69) required for persistence during mouse tuberculosis (TB) infection. Prospective application of the protein models enabled us to functionally interpret novel laboratory generated pyrazinamide (PZA)-resistant mutants of unknown function, which implicated the emerging coenzyme A depletion model of PZA action in the mutants' PZA resistance. Our findings demonstrate the functional insight gained by integrating structural modeling and systematic literature curation, even for widely studied microorganisms. Functional annotations and protein structure models are available at https://tuberculosis.sdsu.edu/H37Rv in human- and machine-readable formats. IMPORTANCE Mycobacterium tuberculosis, the primary causative agent of tuberculosis, kills more humans than any other infectious bacterium. Yet 40% of its genome is functionally uncharacterized, leaving much about the genetic basis of its resistance to antibiotics, capacity to withstand host immunity, and basic metabolism yet undiscovered. Irregular literature curation for functional annotation contributes to this gap. We systematically curated functions from literature and structural similarity for over half of poorly characterized genes, expanding the functionally annotated Mycobacterium tuberculosis proteome. Applying this updated annotation to recent in vivo functional screens added functional information to dozens of clinically pertinent proteins described as having unknown function. Integrating the annotations with a prospective functional screen identified new mutants resistant to a first-line TB drug, supporting an emerging hypothesis for its mode of action. These improvements in functional interpretation of clinically informative studies underscore the translational value of this functional knowledge. Structure-derived annotations identify hundreds of high-confidence candidates for mechanisms of antibiotic resistance, virulence factors, and basic metabolism and other functions key in clinical and basic tuberculosis research. More broadly, they provide a systematic framework for improving prokaryotic reference annotations.
Introduction We assessed the applicability of next-generation sequencing (NGS)-based IGH / IGK clonality testing and analyzed the repertoire of immunoglobulin heavy chain ( IGH ) or immunoglobulin kappa light chain ( IGK ) gene usage in Korean patients with multiple myeloma (MM) for the first time. Methods Fifty-nine bone marrow samples from 57 Korean patients with MM were analyzed, and NGS-based clonality testing that targeted the IGH and IGK genes was performed using IGH FR1 and IGK primer sets. Results Clonal IGH and IGK rearrangements were observed in 74.2% and 67.7% of samples from Korean patients with kappa-restricted MM, respectively (90.3% had one or both), and in 60.7% and 95.5% of samples from those with lambda-restricted MM, respectively (85.7% had one or both). In total, 88.1% of samples from Koreans with MM had clonal IGH and/or IGK rearrangement. Clonal rearrangement was not significantly associated with the bone marrow plasma cells as a proportion of all BM lymphoid cells. IGHV3-9 (11.63%) and IGHV4-31 (9.30%) were the most frequently reported IGHV genes and were more common in Koreans with MM than in Western counterparts. IGHD3-10 and IGHD3-3 (13.95% each) were the most frequent IGHD genes; IGHD3-3 was more common in Koreans with MM. No IGK rearrangement was particularly prevalent, but single IGKV-J rearrangements were less common in Koreans with kappa-restricted MM than in Western counterparts. IGKV4-1 was less frequent in Koreans regardless of light chain type. Otherwise, the usages of the IGH V, D, and J genes and of the IGK gene were like those observed in previous Western studies. Conclusion NGS-based IGH / IGK clonality testing ought to be applicable to most Koreans with MM. The overrepresentation of IGHV3-9 , IGHV4-31 , and IGHD3-3 along with the underrepresentation of IGKV4-1 and the differences in IGK gene rearrangement types suggest the existence of ethnicity-specific variations in this disease.
ABSTRACT Accurate and timely functional genome annotation is essential for translating basic pathogen research into clinically impactful advances. Here, through literature curation and structure-function inference, we systematically update the functional genome annotation of Mycobacterium tuberculosis virulent type strain H37Rv. First, we systematically curated annotations for 589 genes from 662 publications, including 282 gene products absent from leading databases. Second, we modeled 1,711 under-annotated proteins and developed a semi-automated pipeline that captured shared function between 400 protein models and structural matches of known function on protein data bank, including drug efflux proteins, metabolic enzymes, and virulence factors. In aggregate, these structure- and literature-derived annotations update 940/1,725 under-annotated H37Rv genes and generate hundreds of functional hypotheses. Retrospectively applying the annotation to a recent whole-genome transposon mutant screen provided missing function for 48% (13/27) of under-annotated genes altering antibiotic efficacy and 33% (23/69) required for persistence during mouse TB infection. Prospective application of the protein models enabled us to functionally interpret novel laboratory generated Pyrazinamide-resistant (PZA) mutants of unknown function, which implicated the emerging Coenzyme A depletion model of PZA action in the mutants’ PZA resistance. Our findings demonstrate the functional insight gained by integrating structural modeling and systematic literature curation, even for widely studied microorganisms. Functional annotations and protein structure models are available at https://tuberculosis.sdsu.edu/H37Rv in human- and machine-readable formats. IMPORTANCE Mycobacterium tuberculosis , the primary causative agent of tuberculosis, kills more humans than any other infectious bacteria. Yet 40% of its genome is functionally uncharacterized, leaving much about the genetic basis of its resistance to antibiotics, capacity to withstand host immunity, and basic metabolism yet undiscovered. Irregular literature curation for functional annotation contributes to this gap. We systematically curated functions from literature and structural similarity for over half of poorly characterized genes, expanding the functionally annotated Mycobacterium tuberculosis proteome. Applying this updated annotation to recent in vivo functional screens added functional information to dozens of clinically pertinent proteins described as having unknown function. Integrating the annotations with a prospective functional screen identified new mutants resistant to a first-line TB drug supporting an emerging hypothesis for its mode of action. These improvements in functional interpretation of clinically informative studies underscores the translational value of this functional knowledge. Structure-derived annotations identify hundreds of high-confidence candidates for mechanisms of antibiotic resistance, virulence factors, and basic metabolism; other functions key in clinical and basic tuberculosis research. More broadly, it provides a systematic framework for improving prokaryotic reference annotations.