A scalable method for identifying recombinants from unaligned sequences

2020 
Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is of major interest to biologists. However, current methods for detecting recombinants work only on aligned sequences, often require a reference panel, and do not scale well to large datasets. Thus, they are not suitable for the analysis of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination. We introduce an algorithm to detect recombinant sequences from an unaligned dataset. Our approach can effectively handle thousands of sequences without the need for an alignment or a reference panel, offering a general tool suitable for the analysis of many different types of sequences. We demonstrate the effectiveness of our algorithm through extensive numerical simulations; in particular, it maintains its accuracy in the presence of insertions and deletions. We apply our algorithm to a dataset of 17,335 DBL types in var genes from Ghana, enabling a comparison between recombinant and non-recombinant types for the first time. We observe that sequences belonging to the same ups type or DBL subclass recombine amongst themselves more frequently, and that non-recombinant DBL types are more conserved than recombinant ones.