Mass spectrometry-based identification and characterization of human hypothetical proteins highlighting the inconsistency across the protein databases

2020 
A myriad of predicted proteins have been described at the genome scale, but their existence has not been confirmed at the protein level. Proteins that are predicted to be expressed from an open reading frame (ORF) but for which translation has not been demonstrated are known as hypothetical proteins, and they constitute a major fraction of the human proteome. In this study, we aim to identify and characterize hypothetical proteins from human tumor cell lines, viz., HeLa, MCF7, and BT474, thus providing analytical evidence for their expression. We used gel electrophoresis followed by in-gel digestion of the selected protein lanes and subsequent LC–MS/MS analysis of the tryptic digests. The Ensembl genome browser was used for genomic alignment. A search against human hypothetical protein data from the NCBI database identified 110 proteins common to the three selected cell lines. Of these, 88 proteins were already functionally characterized, and the remaining 22 were still unreviewed in UniProt, lacking evidence of expression at the protein level. To explore them further, following HPP guidelines, 15 proteins were selected and aligned against the human reference genome. Five hypothetical proteins were confirmed as isoforms of known proteins. We conclude that the proteomic approach used here is a suitable tool for validating the existence of predicted or hypothetical proteins at the protein level. The MS proteomics data have been deposited to the ProteomeXchange Consortium via PRIDE with the data set identifier PXD014258.
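The filtering step described above (proteins common to all three cell lines, then split by UniProt review status) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, and the accessions are placeholders rather than real identifiers or results from the study.

```python
from typing import Dict, Set, Tuple


def common_identifications(hits_by_cell_line: Dict[str, Set[str]]) -> Set[str]:
    """Return the accessions identified in every cell line."""
    hit_sets = list(hits_by_cell_line.values())
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)


def split_by_review_status(
    accessions: Set[str], reviewed: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Partition accessions into (characterized, unreviewed) sets."""
    characterized = accessions & reviewed
    unreviewed = accessions - reviewed
    return characterized, unreviewed


# Placeholder accessions (assumption: NOT taken from the study's data).
hits = {
    "HeLa": {"P1", "P2", "P3", "P4"},
    "MCF7": {"P1", "P2", "P3", "P5"},
    "BT474": {"P1", "P2", "P3", "P6"},
}
reviewed_in_uniprot = {"P1", "P2"}  # hypothetical review-status lookup

common = common_identifications(hits)
characterized, unreviewed = split_by_review_status(common, reviewed_in_uniprot)
```

Applied to the study's counts, the same partition would correspond to 110 common proteins splitting into 88 characterized and 22 unreviewed entries.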