Leveraging the k-Nearest Neighbors classification algorithm for Microbial Source Tracking using a bacterial DNA fingerprint library

2015 
Fecal contamination of bodies of water is a problem that cities must combat regularly, and city governments often have to restrict access to affected water sources until the contaminants dissipate. Identifying the host species responsible for the fecal matter helps curb the problem, giving city governments the ability to mitigate its effects before it recurs. Microbial Source Tracking (MST) aims to determine the host species from which strains of microorganisms originate, and library-based MST is one method that can assist in identifying the source of fecal matter. Recently, the Biology and Computer Science Departments at California Polytechnic State University, San Luis Obispo (Cal Poly) teamed up to build a database called the Cal Poly Library of Pyroprints (CPLOP). Students collect fecal samples, culture and pyrosequence the E. coli in the samples, and insert the resulting data, called pyroprints, into CPLOP. Cal Poly biologists use two intergenic transcribed spacer regions of DNA to study strain differentiation. We propose using k-Nearest Neighbors, a straightforward machine learning technique, to classify the host species of a given pyroprint; we construct four algorithms to reconcile the data from the two regions and investigate the resulting classification accuracy.
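A minimal sketch of k-Nearest Neighbors classification of pyroprints, under stated assumptions: each isolate is represented by one pyroprint (a vector of peak heights) per region, similarity between pyroprints is measured with Pearson correlation, and the two regions are reconciled by simply averaging their correlations. The region labels `"23S-5S"` and `"16S-23S"`, the functions `pearson`, `similarity`, and `knn_classify`, and the toy library values are all hypothetical. The averaging rule is only one plausible way to combine the regions and is not presented as any of the four algorithms the study constructs.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one pyroprint (peak-height vector) per region.
Isolate = Dict[str, List[float]]
REGIONS: Tuple[str, ...] = ("23S-5S", "16S-23S")  # assumed region labels


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length pyroprints (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def similarity(query: Isolate, ref: Isolate, regions: Sequence[str] = REGIONS) -> float:
    """Assumed reconciliation rule: average the per-region correlations."""
    return sum(pearson(query[r], ref[r]) for r in regions) / len(regions)


def knn_classify(query: Isolate, library: List[Tuple[Isolate, str]], k: int = 3) -> str:
    """Return the majority host species among the k most similar library isolates."""
    # Higher correlation means closer, so rank in descending order of similarity.
    ranked = sorted(library, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(species for _, species in ranked[:k])
    # Counter keeps first-insertion order for equal counts, so a tie goes to the
    # species whose first vote came from the nearer neighbor.
    return votes.most_common(1)[0][0]


# Toy library (made-up values, for illustration only).
library = [
    ({"23S-5S": [1, 2, 3, 4, 5], "16S-23S": [5, 4, 3, 2, 1]}, "cow"),
    ({"23S-5S": [1, 2, 3, 5, 5], "16S-23S": [5, 4, 3, 2, 2]}, "cow"),
    ({"23S-5S": [5, 4, 3, 2, 1], "16S-23S": [1, 2, 3, 4, 5]}, "human"),
    ({"23S-5S": [5, 5, 3, 2, 1], "16S-23S": [1, 2, 3, 4, 4]}, "human"),
]
query = {"23S-5S": [1, 2, 3, 4, 6], "16S-23S": [6, 4, 3, 2, 1]}
print(knn_classify(query, library, k=3))  # prints: cow
```

Here the query correlates strongly with both "cow" isolates in each region, so two of its three nearest neighbors vote "cow". Replacing `similarity` with a different rule, for example classifying each region separately and then combining the two votes, changes how the regions are reconciled without touching the neighbor search itself.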