Analysis of paralogs in target enrichment data pinpoints multiple ancient polyploidy events in Alchemilla s.l. (Rosaceae)

2020 
Target enrichment is becoming increasingly popular for phylogenomic studies. Although baits for enrichment are typically designed to target single-copy genes, paralogs are often recovered with increased sequencing depth, sometimes from a significant proportion of loci. Common approaches for processing paralogs in target enrichment datasets include removal, random selection, and manual pruning of loci that show evidence of paralogy. These approaches can introduce errors in orthology inference and can substantially reduce the number of loci, especially in groups experiencing whole-genome duplication (WGD) events. Here we used an automated approach for paralog processing in a target enrichment dataset of 68 species of Alchemilla s.l. (Rosaceae), a widely distributed clade of plants found primarily in temperate climate regions. Previous molecular phylogenetic studies and chromosome numbers both suggested a polyploid origin of the group. However, the putative parental lineages remain unknown. By taking paralogs into consideration, we identified four nodes in the backbone of Alchemilla s.l. with an elevated proportion of gene duplication. Furthermore, using a gene-tree reconciliation approach, we established the autopolyploid origin of the entire Alchemilla s.l. and the nested allopolyploid origin of four clades within the group. We thus show the utility of automated orthology methods, commonly used in genomic or transcriptomic datasets, for studying complex scenarios of polyploidy and reticulate evolution from target enrichment datasets.