Machine learning substantiates biologically meaningful species delimitations in the phylogenetically complex North American box turtle genus Terrapene

2020 
Model-based approaches to species delimitation are constrained both by computational capacity and by algorithmic assumptions that are frequently violated when applied to biologically complex systems. An alternative approach, demonstrated herein, employs machine learning (ML) methods from which species limits are derived without an explicit definition of an underlying species model. In doing so, we demonstrate the capacity of these approaches to designate phylogenomically and biologically relevant groups, using North American box turtles (Terrapene spp.) as an example. Several ML-based and traditional species delimitation algorithms were applied to a large SNP dataset derived from ddRAD sequencing. Our results yield two major findings. First, traditional model-based approaches performed poorly, likely reflecting systematic biases inherent in their formulation: multispecies coalescent methods consistently over-split Terrapene relative to prior evidence and our own phylogenetic results. Second, ML and clustering algorithms consistently recovered clades that were well supported in prior species-tree analyses. In summary, we highlight both the strengths and limitations of ML algorithms and, in doing so, explore appropriate approaches to data manipulation and model fit. Our study was conducted within a well-characterized empirical system, which allowed a direct contrast between ML and traditional approaches. This underscores the utility of ML methods for species delimitation and provides a case study from which guidelines for applying ML methods can be extended to other study systems.
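As a rough illustration of deriving groups "without an explicit definition of an underlying species model", the sketch below clusters individuals purely by their SNP genotypes using plain k-means. This is a hypothetical stand-in, not the authors' pipeline: the 0/1/2 allele-count encoding, the toy genotype matrix, the function name `kmeans_genotypes`, and the choice of k-means itself are all assumptions made only for illustration.

```python
from typing import List, Sequence

# Hypothetical encoding: each individual is a row of SNP genotypes coded as
# 0, 1, or 2 copies of the alternate allele.
Genotype = Sequence[int]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_genotypes(genotypes: List[Genotype], k: int, n_iter: int = 20) -> List[int]:
    """Cluster individuals into k groups with Lloyd's k-means (no species model).

    Centroids are seeded with the first k individuals, a simplifying assumption;
    practical analyses usually use random restarts or k-means++ seeding.
    """
    centroids = [[float(x) for x in g] for g in genotypes[:k]]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each individual joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(g, centroids[c]))
            for g in genotypes
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean genotype of its members.
        for c in range(k):
            members = [g for g, lab in zip(genotypes, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (invented for illustration): two groups differing at every locus.
toy_genotypes = [
    [0, 0, 0, 2, 2],
    [2, 2, 2, 0, 0],
    [0, 1, 0, 2, 2],
    [2, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [2, 1, 2, 0, 0],
]

print(kmeans_genotypes(toy_genotypes, k=2))  # → [0, 1, 0, 1, 0, 1]
```

Here k is fixed by hand. In a real analysis the number of groups must itself be chosen and evaluated, which is one of the model-fit questions raised above, and any recovered clusters would still need to be checked against independent evidence such as species-tree results.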