RiboReport - Benchmarking tools for ribosome profiling-based identification of open reading frames in bacteria

2021 
Small proteins, those encoded by open reading frames, with less than or equal to 50 codons, are emerging as an important class of cellular macromolecules in all kingdoms of life. However, they are recalcitrant to detection by proteomics or in silico methods. Ribosome profiling (Ribo-seq) has revealed widespread translation of sORFs in diverse species, and this has driven the development of ORF detection tools using Ribo-seq read signals. However, only a handful of tools have been designed for bacterial data, and have not yet been systematically compared. Here, we have performed a comprehensive benchmark of ORF prediction tools which handle bacterial Ribo-seq data. For this, we created a novel Ribo-seq dataset for E. coli, and based on this plus three publicly available datasets for different bacteria, we created a benchmark set by manual labeling of translated ORFs using their Ribo-seq expression profile. This was then used to investigate the predictive performance of four Ribo-seq-based ORF detection tools we found are compatible with bacterial data (Reparation_blast, DeepRibo, Ribo-TISH and SPECtre). The tool IRSOM was also included as a comparison for tools using coding potential and RNA-seq coverage only. DeepRibo and Reparation_blast robustly predicted translated ORFs, including sORFs, with no significant difference for those inside or outside of operons. However, none of the tools was able to predict a set of recently identified, novel, experimentally-verified sORFs with high sensitivity. Overall, we find there is potential for improving the performance, applicability, usability, and reproducibility of prokaryotic ORF prediction tools that use Ribo-Seq as input. Key pointsO_LICreated a benchmark set for Ribo-seq based ORF prediction in bacteria C_LIO_LIDeepRibo the first choice for bacterial ORF prediction tasks C_LIO_LITool performance is comparable between operon vs single gene regions C_LIO_LIIdentification of novel sORF with DeepRibo is, with restrictions, possible, by using the top 100 novel sORFs sorted by rank. C_LIO_LIExperimental results show that considering translation initiation site data could boost the detection of novel small ORFs C_LIO_LIDetermination of novel sORFs in E. coli using a new experimental protocol to enrich for translation initiation site. These data-set shows that still a significant part (here 8 out 24, so 1/3) are not detected dispute sufficient Ribo-seq signal. An additional 7 could be recovered using translation initiation site protocols. C_LIO_LITools should embrace the use of replicate data and improve packaging, usability and documentation. C_LI
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    61
    References
    0
    Citations
    NaN
    KQI
    []