Development of the decision tree for distinguishing individuals of Chinese four surnames from Guangdong Han population based on Y-STR haplotypes

2021 
Abstract Studies of the co-separation of surnames and Y-chromosome genetic markers help reveal population migrations, surname origins and population formation histories, and they support forensic familial searching. We investigated the genetic distributions of 27 Y-STRs in four Chinese surnames (Li, Lin, Chen and Huang) from the Zhanjiang Han population and tried to develop a decision tree model for surname prediction based on Y-STR haplotypes. Allele frequencies of the 27 Y-STRs showed that some alleles were observed in only one surname; in addition, some alleles had higher frequencies in one surname than in the others, implying that these alleles might serve as useful indicators for surname prediction. Haplotype match probability values of the 27 Y-STRs in these surnames indicated that the system could be a valuable tool for forensic male identification. The developed decision tree model performed well on the training set, with an accuracy of 0.9860, and obtained relatively high accuracy (>0.70) for surname prediction on the testing set. In summary, we explored the power of machine learning for surname prediction based on the obtained Y-STR haplotypes, which showed promising application value in forensic familial searching.
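As an illustration of the haplotype match probability mentioned above, the following minimal sketch uses the standard definition: the sum of squared haplotype frequencies in a sample. It also shows the related haplotype diversity. The function names and the three-locus toy haplotypes are hypothetical and are not taken from the study, whose exact computation is not shown here.

```python
from collections import Counter


def haplotype_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Haplotype diversity with the usual n / (n - 1) sample correction."""
    n = len(haplotypes)
    return n / (n - 1) * (1 - haplotype_match_probability(haplotypes))


# Made-up toy data: each haplotype is a tuple of repeat counts at three
# hypothetical loci (the study itself typed 27 Y-STRs).
toy = [(14, 13, 29), (14, 13, 29), (15, 12, 30), (16, 13, 28)]

hmp = haplotype_match_probability(toy)
hd = haplotype_diversity(toy)
print(f"HMP = {hmp:.4f}, HD = {hd:.4f}")
```

A lower match probability means two unrelated males are less likely to share a haplotype by chance, which is why the value is used to judge a marker set's usefulness for male identification.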