A Novel Approach for Increasing Taxonomic Resolution in Protein-Based Alignments

2018 
Most of today's genome sequencing technology requires that genomes be sequenced in fragments. Typically, these fragments are then aligned using one of a variety of alignment programs. All alignment tools query a reference database to determine the most accurate reassembly of the original DNA strand's nucleotide sequence. Although these programs can align in either nucleotide or protein space, each method comes with its own disadvantages. Protein aligners such as PALADIN consistently align a greater percentage of reads in less time and provide greater insight into the functional capabilities of the aligned sequence. On the other hand, protein alignment reduces the sensitivity of taxonomic classification because of the degeneracy of the genetic code. Our program, Renuc, is a PALADIN plugin that addresses this issue by taking protein alignment results obtained with the UniProt database and identifying the most likely taxonomic origin for each nucleotide sequence associated with each detected protein. We have validated our approach and its implementation in Renuc by successfully retrieving the nucleotide sequences and corresponding taxonomic IDs for all of the aligned proteins in our test dataset, which consists of a whole Escherichia coli genome. Our program aligns over 99 percent of the nucleotide reads, with 97 percent of them remaining in the same protein cluster as the original protein alignment. However, this dataset is exceptionally well studied and documented in UniProt. Future work should evaluate Renuc on a dataset with fewer annotations in the database. Renuc quickly identifies and visualizes the alignment's taxonomic data in a user-friendly way. The integration of SQLite into the program significantly reduces the time required to retrieve information from the UniProt database. Currently, we seek to improve the retrieval of nucleotide sequences by creating a local cache of the NCBI RefSeq database, and to visualize taxonomy with greater resolution using RAxML.
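
The loss of taxonomic sensitivity caused by codon degeneracy can be illustrated with a minimal sketch. The two nucleotide sequences below are invented for illustration and are not taken from Renuc or its test dataset; the codon table is a small subset of the standard genetic code, and the function names are hypothetical. The sequences differ at several positions yet translate to the same peptide, so an alignment in protein space alone cannot tell them apart.

```python
# Illustrative subset of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G",
    "TAA": "*", "TGA": "*",
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(nucleotides) - len(nucleotides) % 3
    codons = [nucleotides[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


def count_mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sequences standing in for the same gene in two different taxa.
seq_a = "ATGCTGAAAGGTTAA"
seq_b = "ATGTTAAAGGGCTGA"

print(translate(seq_a) == translate(seq_b))  # identical in protein space
print(count_mismatches(seq_a, seq_b))        # distinguishable in nucleotide space
```

Returning to the nucleotide sequences behind each detected protein, as Renuc does, recovers the differences that translation hides.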