The origins and divergence of Drosophila simulans and its close relatives D. mauritiana and D. sechellia were examined using patterns of DNA sequence variation within and between species at 14 genes. D. sechellia consistently showed low levels of polymorphism, and genes from D. sechellia have accumulated mutations at a rate approximately 50% higher than the same genes in D. simulans. At synonymous sites, D. sechellia has experienced a significant excess of unpreferred codon substitutions. Together these observations suggest that D. sechellia has had a reduced effective population size for some time and is accumulating slightly deleterious mutations as a result. D. simulans and D. mauritiana are both highly polymorphic, and the two species share many polymorphisms, probably retained since the time of their common ancestry. A simple isolation model of speciation, with zero gene flow after the incipient species separated, was fitted to both the simulans/mauritiana divergence and the simulans/sechellia divergence. In both cases the model fit the data quite well, and the analyses revealed little evidence of gene flow between the species. The one exception is a single gene copy at one locus in D. sechellia that closely resembled D. simulans sequences. The overall picture is of two allopatric speciation events that occurred close together in time.
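One standard way to compare substitution rates between two lineages, such as D. sechellia and D. simulans, is a relative rate test against an outgroup. The sketch below implements Tajima's (1993) 1D relative rate test; the function name and toy sequences are hypothetical, and the test is offered as an illustration rather than as the method used in this study.

```python
def tajima_relative_rate(seq1: str, seq2: str, outgroup: str) -> tuple[int, int, float]:
    """Tajima's 1D relative rate test on three aligned sequences.

    m1 counts sites where only seq1 differs from the other two sequences,
    m2 counts sites where only seq2 differs. Under equal rates m1 and m2
    share the same expectation; (m1 - m2)^2 / (m1 + m2) is compared with a
    chi-square distribution with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if b == o and a != o:
            m1 += 1  # change unique to lineage 1
        elif a == o and b != o:
            m2 += 1  # change unique to lineage 2
        # sites where all three differ, or none differ, are ignored
    chi2 = (m1 - m2) ** 2 / (m1 + m2) if (m1 + m2) else 0.0
    return m1, m2, chi2


# Hypothetical toy alignment (not data from the study).
out = "A" * 12
seq1 = "CCCCCC" + "AAAAAA"       # 6 changes unique to lineage 1
seq2 = "AAAAAA" + "GGGG" + "AA"  # 4 changes unique to lineage 2
print(tajima_relative_rate(seq1, seq2, out))  # (6, 4, 0.4)
```

In the toy example lineage 1 carries 1.5 times as many unique changes as lineage 2 (a 50% excess), yet the statistic of 0.4 is far below 3.84, the 5% critical value for one degree of freedom; detecting a difference of that size requires many more informative sites, which is one reason multilocus data are useful here.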
The divergence of bonobos and three subspecies of the common chimpanzee was examined under a multipopulation isolation-with-migration (IM) model, using data from 73 loci drawn from the literature. An advantage of a full multipopulation model over multiple pairwise analyses between sampled populations is that the full model can reveal historical gene flow involving ancestral populations. One such case was found: gene flow is indicated between the western common chimpanzee subspecies and the ancestor of the central and eastern common chimpanzee subspecies. The results of a full analysis of all four populations are strongly consistent with analyses of pairs of populations and generally similar to results from previous studies. The basal split between bonobos and common chimpanzees was estimated at 0.93 Ma (0.68–1.54 Ma, 95% highest posterior density interval), the split of the ancestor of the three common chimpanzee populations at 0.46 Ma (0.35–0.65), and the most recent split, between the central and eastern common chimpanzee populations, at 0.093 Ma (0.041–0.157). Population size estimates mostly fell between 5,000 and 10,000 individuals. The exceptions are the ancestor of the common chimpanzee and the bonobo, at 17,000 (8,000–28,000) individuals, and the central common chimpanzee and its immediate common ancestor with the eastern common chimpanzee, with effective size estimates of 27,000 (16,000–44,000) and 32,000 (19,000–54,000) individuals, respectively.
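A 95% highest posterior density (HPD) interval is the narrowest interval that contains 95% of the posterior mass. A minimal sketch of how such an interval can be computed from MCMC samples, assuming a unimodal posterior; `hpd_interval` is a hypothetical helper, not the implementation used by the IM software.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the sampled values.

    Assumes the samples approximate a unimodal posterior (e.g. MCMC draws
    of a split time). Returns (lower, upper).
    """
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if n == 0 or not 0 < mass <= 1:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    # Slide a window of k consecutive sorted samples and keep the narrowest;
    # on ties, min() keeps the first (leftmost) window.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Hypothetical right-skewed toy sample.
print(hpd_interval([0, 1, 2, 3, 4, 100], mass=0.5))  # (0, 2)
```

Unlike an equal-tailed interval, which trims the same fraction from each tail, the HPD interval shifts toward the region of highest density, which matters for right-skewed posteriors like the basal split-time estimate above.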
Bifurcating phylogenies are frequently used to describe the evolutionary history of groups of related species. However, simple bifurcating models may poorly represent the evolutionary history of species that have been exchanging genes. Here, we show that the history of three well-known, closely related species, Drosophila pseudoobscura, D. persimilis and D. p. bogotana, is not well represented by a bifurcating phylogenetic tree. The phylogenetic relationships among these species vary widely between different genomic regions. Much of this phylogenetic variation can be explained by the potential of different genomic regions to introgress between species, as measured in laboratory studies. We argue that the utility of multiple markers in species-level phylogenetic studies can be greatly enhanced by knowledge of genomic location and, in the case of hybridizing species, by knowledge of the functional or linkage relationships among the markers and regions of the genome that reduce hybrid fitness.
Background: The explosive radiation of cichlid fishes in Lake Malawi has produced a remarkable number of haplochromine species, estimated at 500 to 800, with surprising diversity not only in color and stripe pattern but also in jaw and body shape. Because this morphological diversity has been central to the study of adaptive speciation and to taxonomic classification, it could also serve as a foundation for automated species identification of cichlids. Methodology/Principal Findings: Here we demonstrate a method for automatic classification of Lake Malawi cichlids based on computer vision and geometric morphometrics. To this end we developed a pipeline that integrates multiple image processing tools to automatically extract informative color and stripe-pattern features from a large set of photographs of wild cichlids. The extracted information was evaluated with two statistical classifiers, Support Vector Machine and Random Forests. Both classifiers performed better when body shape information was added to the color and stripe features: body shape variables increased classification accuracy by about 10%. The programs classified 594 live cichlid individuals belonging to 12 classes (species and sexes) with an average accuracy of 78%, compared with a success rate of only 42% for human observers. The variables that contributed most to accuracy were body height and the hue of the most frequent color. Conclusions: Computer vision performed notably well in extracting information from the color and stripe patterns of Lake Malawi cichlids, although this information was not sufficient for error-free species identification. Our results suggest an unavoidable difficulty in automatic species identification of cichlid fishes, which may arise from short divergence times and gene flow between closely related species.
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which holds under many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only summarizing a posterior sample of assignments but also estimating the posterior density of the demographic history. We show that joint estimation of assignment and demographic history is possible, including estimation of the population phylogeny for samples from three populations. The new method is compared with a widely used assignment method, using simulated and published empirical data sets.
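Because population labels in an assignment are arbitrary, two assignments must be compared with a label-free distance. A minimal sketch, assuming the common definition of partition distance (the minimum number of individuals that must be removed for two partitions to coincide); the function name is hypothetical, and the brute-force search over label matchings is practical only for a handful of populations, whereas larger problems would use an assignment (Hungarian) algorithm.

```python
from collections import Counter
from itertools import permutations


def partition_distance(a, b):
    """Partition distance between two assignments of the same individuals.

    a[i] and b[i] are the population labels of individual i under each
    assignment. The distance is n minus the largest number of individuals
    that can be kept under a one-to-one matching of clusters in a to
    clusters in b (found here by brute force over all matchings).
    """
    if len(a) != len(b):
        raise ValueError("assignments must cover the same individuals")
    overlap = Counter(zip(a, b))  # individuals shared by each cluster pair
    la, lb = sorted(set(a)), sorted(set(b))
    # Pad the smaller label set with None so every cluster can go unmatched.
    size = max(len(la), len(lb))
    la += [None] * (size - len(la))
    lb += [None] * (size - len(lb))
    best = max(
        sum(overlap[(x, y)] for x, y in zip(la, perm))
        for perm in permutations(lb)
    )
    return len(a) - best


print(partition_distance([0, 0, 1, 1, 2], ["x", "x", "y", "y", "z"]))  # 0
print(partition_distance([0, 0, 1, 1], [0, 1, 1, 1]))  # 1
```

The first pair differs only in label names, so the distance is 0; in the second, moving one individual makes the assignments agree.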
As evolutionary lineages, species are expected to show greater evolutionary independence from one another than populations within species do. Two measures of evolutionary independence that stem from the study of isolation-with-migration models, one reflecting the amount of gene exchange and one reflecting the time of separation, were drawn from the literature for a large number of pairs of closely related species and pairs of populations within species. Both measures showed broadly overlapping distributions for species pairs and for population pairs. On average, species pairs show longer separation times and less gene flow than population pairs, but the similarity of the distributions argues against a qualitative difference associated with species status. The two measures of evolutionary independence were similarly correlated with FST estimates, which in turn showed similar distributions for species comparisons and population comparisons. The measures of gene flow and separation time were examined for their capacity to discriminate intraspecific from interspecific differences. Used together, the two measures could form the basis of an objective (in the sense of repeatable) measure for species diagnosis.
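FST estimates in the literature come from several estimators. As one illustration, the sketch below computes the sequence-based estimator of Hudson, Slatkin and Maddison (1992), FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean number between them; the function names and toy sequences are hypothetical and not taken from the compiled studies.

```python
from itertools import combinations, product


def mean_diffs(pairs):
    """Mean number of differing sites over pairs of aligned sequences."""
    diffs = [sum(x != y for x, y in zip(s, t)) for s, t in pairs]
    return sum(diffs) / len(diffs)


def hudson_fst(pop1, pop2):
    """FST = 1 - Hw / Hb for two samples of aligned sequences.

    Hw averages the two within-sample means (each sample needs at least
    two sequences); Hb is the mean over all between-sample pairs.
    """
    hw = (mean_diffs(combinations(pop1, 2)) + mean_diffs(combinations(pop2, 2))) / 2
    hb = mean_diffs(product(pop1, pop2))
    return 1 - hw / hb


# Hypothetical toy samples (not data from any study).
print(hudson_fst(["AAAA", "AAAT"], ["CCCA", "CCCT"]))  # approximately 0.714 (5/7)
```

Under Wright's island model at equilibrium, FST is approximately 1/(1 + 4Nm), so lower gene flow goes with higher FST; this is one reason gene-flow estimates and FST are expected to be correlated.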
Two regions of the genome, a 1-kbp portion of the zeste locus and a 1.1-kbp portion of the yolk protein 2 locus, were sequenced in six individuals from each of four species: Drosophila melanogaster, D. simulans, D. mauritiana, and D. sechellia. The species and strains were the same as those of a previous study of a 1.9-kbp region of the period locus. No evidence was found for recent balancing or directional selection or for the accumulation of selected differences between species. Yolk protein 2 has a high level of amino acid replacement variation and a low level of synonymous variation, while zeste has the opposite pattern. This contrast is consistent with information on gene function and patterns of codon bias. Polymorphism levels are consistent with a ranking of effective population sizes, from low to high, in the following order: D. sechellia, D. melanogaster, D. mauritiana, and D. simulans. The apparent species relationships are very similar to those suggested by the period locus study. In particular, D. simulans appears to be a large population that is still segregating variation that arose before the separation of D. mauritiana and D. sechellia. It is estimated that the separation of ancestral D. melanogaster from the other species occurred 2.5–3.4 Mya. The separations of D. sechellia and D. mauritiana from ancestral D. simulans appear to have occurred 0.58–0.86 Mya, with D. mauritiana having diverged from ancestral D. simulans 0.1 Myr more recently than D. sechellia.
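Polymorphism levels bear on effective population size through the population mutation rate θ, which is proportional to the product of Ne and the mutation rate; if mutation rates are similar across species, ranking species by an estimate of θ ranks their effective sizes. A minimal sketch using Watterson's estimator, θ_W = S / (a_n L) with a_n = 1 + 1/2 + ... + 1/(n - 1), for n sequences over L sites; the function names, labels, and segregating-site counts are hypothetical, not values from this study.

```python
def watterson_theta(seg_sites: int, n: int, length: int) -> float:
    """Per-site Watterson estimator: S / (a_n * L), a_n = sum of 1/i for i < n."""
    if n < 2 or length <= 0:
        raise ValueError("need at least two sequences and a positive length")
    a_n = sum(1.0 / i for i in range(1, n))
    return seg_sites / (a_n * length)


def rank_by_theta(seg_counts: dict, n: int, length: int) -> list:
    """Order sample labels from lowest to highest per-site theta_W."""
    return sorted(seg_counts, key=lambda k: watterson_theta(seg_counts[k], n, length))


# Hypothetical segregating-site counts for six sequences over 1,000 sites.
counts = {"sample_a": 10, "sample_b": 2, "sample_c": 5}
print(rank_by_theta(counts, n=6, length=1000))  # ['sample_b', 'sample_c', 'sample_a']
```

With equal sample sizes and region lengths, as in a design with six individuals per species, the ranking reduces to ordering by the number of segregating sites; the estimator matters when n or L differ between samples.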
The population genetic history of a 10.1-kbp noncoding region of the human X chromosome was studied using the males of the HGDP-CEPH Human Genome Diversity Panel (672 individuals from 52 populations). The geographic distribution of patterns of variation was roughly consistent with previous studies, with the major exception of one highly divergent haplotype (haplotype X, hX) that was observed at low frequency in widely scattered non-African populations but not at all in the panel's sub-Saharan African populations. Microsatellite (short tandem repeat) variation within the sequenced region was low among copies of hX, even though the estimated time to the common ancestor of hX and the other sequences was 1.44 Myr. The estimated age of the common ancestor of all hX copies was 5,230 years (95% confidence interval: 2,000–75,480 years). To further address the presence of hX in Africa, additional samples from Chad and Tanzania were screened. Five additional copies of hX were observed, consistent with a history in which hX was present in Africa before modern humans migrated out of Africa, and with eastern Africa being the source of non-African modern human populations. Taken together, these features of hX (it is much older than other haplotypes, yet uncommon and patchily distributed across Africa, Europe, and Asia) present a cautionary tale for interpretations of human history.