This dataset contains the coordinates and presence probabilities of all biogeographies described in the article "Restructuring of genomic provinces of surface ocean plankton under climate change" https://www.biorxiv.org/content/10.1101/2020.10.20.347237v8
Biogeographical studies have traditionally focused on readily visible organisms, but recent technological advances are enabling analyses of the large-scale distribution of microscopic organisms, whose biogeographical patterns have long been debated. Here we assessed the global structure of plankton geography and its relation to the biological, chemical, and physical context of the ocean (the ‘seascape’) by analyzing metagenomes of plankton communities sampled across oceans during the Tara Oceans expedition, in light of environmental data and ocean current transport. Using a consistent approach across organismal sizes that provides unprecedented resolution to measure changes in genomic composition between communities, we report a pan-ocean, size-dependent plankton biogeography overlying regional heterogeneity. We found robust evidence for a basin-scale impact of transport by ocean currents on plankton biogeography, and on a characteristic timescale of community dynamics going beyond simple seasonality or life history transitions of plankton.
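A common way to ask whether community dissimilarity tracks transport is to correlate two sample-by-sample matrices, for example a metagenomic dissimilarity matrix (Supplementary Tables 3-8) against the minimum travel time matrix (Supplementary Table 15). The sketch below implements a basic Mantel test in plain Python: Pearson correlation of the upper triangles, with a one-sided p-value from jointly permuting the rows and columns of one matrix. It is a generic, swapped-in technique shown for orientation, not necessarily the statistic used in the article, and the four-station matrices are hypothetical.

```python
import random


def upper_triangle(m):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel test: Pearson r between two square distance matrices, plus a
    one-sided permutation p-value from jointly permuting rows and columns of d2."""
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [[d2[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(shuffled)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical travel times (years) between four stations along one current,
# and a dissimilarity that grows linearly with travel time.
positions = [0, 1, 3, 4]
travel = [[abs(a - b) for b in positions] for a in positions]
dissim = [[0.0 if a == b else 0.1 + 0.05 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(dissim, travel, permutations=199)
```

With only four stations there are very few distinct permutations, so the p-value here is coarse; real matrices with dozens of stations give a much finer permutation distribution.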
Supplementary Table 1. List of Tara Oceans samples sequenced with a metabarcoding (18S V9) approach and with a metagenomic approach, including identifiers for sequencing reads deposited in the DDBJ/ENA/GenBank Short Read Archives (SRA). [This Table is identical in version 1.]
Supplementary Table 2. Table of environmental parameters for each sample. [The column headers of this Table were modified from version 1. All data values are identical.]
Supplementary Table 3. Matrix of metagenomic dissimilarity for the 0-0.22 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 4. Matrix of metagenomic dissimilarity for the 0.22-1.6/3 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 5. Matrix of metagenomic dissimilarity for the 0.8-5 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 6. Matrix of metagenomic dissimilarity for the 5-20 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 7. Matrix of metagenomic dissimilarity for the 20-180 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 8. Matrix of metagenomic dissimilarity for the 180-2000 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 9. Matrix of OTU dissimilarity for the 0-0.22 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 10. Matrix of OTU dissimilarity for the 0.22-1.6/3 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 11. Matrix of OTU dissimilarity for the 0.8-5 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 12. Matrix of OTU dissimilarity for the 5-20 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 13. Matrix of OTU dissimilarity for the 20-180 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 14. Matrix of OTU dissimilarity for the 180-2000 μm size fraction. [This Table is identical in version 1.]
Supplementary Table 15. Matrix of minimum travel time, in years. [This Table is identical in version 1.]
Supplementary Table 16. Matrix of minimum geographic distance (without traversing land), in kilometers. [This Table is identical in version 1.]
Supplementary Table 17. Matrix of imaging-based dissimilarity. [This Table is new in version 2.]
Supplementary Table 18. Matrix of metagenome-assembled genome (MAG)-based dissimilarity for the 20-180 μm size fraction. [This Table is new in version 2.]
Supplementary Table 19. The cophenetic correlation coefficient for different methods of clustering metagenomic dissimilarity. [This Table is identical in version 1, where it was labeled Supplementary Table 17.]
Supplementary Table 20. Baker's Gamma index comparing clustering results within size fractions. [This Table is identical in version 1, where it was labeled Supplementary Table 18.]
Supplementary Table 21. Rand Index for K-means and spectral clustering, and multivariate ANOVA calculated by the adonis function. [This Table is identical in version 1, where it was labeled Supplementary Table 19.]
Dataset 1. Reference database (in FASTA format) used to perform taxonomic assignment of metabarcodes. The header line of each reference V9 rDNA barcode (beginning with a > sign) contains a unique identifier derived from the GenBank accession number, followed by the taxonomic path associated with the reference barcode. [This Dataset is identical in version 1.]
Dataset 2. V9 rDNA abundance at the metabarcode level. md5sum = unique identifier; totab = total abundance across all samples; cid = identifier of the OTU to which the barcode belongs (see Dataset 3); pid = best percentage identity to a barcode in Dataset 1; refs = identifier(s) of the best matching barcode(s) in Dataset 1; lineage = taxonomic lineage of the best match in Dataset 1; taxogroup = high-level taxonomic grouping of the best match in Dataset 1; sequence = V9 rDNA sequence; TV9_XXX = barcode abundance by sample (see Supplementary Table 1 for sample identifiers). [This Dataset is identical in version 1.]
Dataset 3. V9 rDNA abundance at the OTU (operational taxonomic unit) level. cid = identifier of the OTU; md5sum = unique identifier of the most abundant barcode in the OTU; pid, refs, lineage, taxogroup, sequence = defined as in Dataset 2; rtotab = total abundance of the most abundant barcode in the OTU; ctotab = total abundance of all barcodes in the OTU; TV9_XXX = abundance by sample of all barcodes in the OTU (see Supplementary Table 1 for sample identifiers). [This Dataset is identical in version 1.]
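The column definitions of Datasets 2 and 3 imply how the OTU table can be derived from the barcode table: an OTU's per-sample abundance (TV9_XXX) is the sum over all barcodes sharing its cid, ctotab is the sum of those barcodes' totab values, and rtotab and md5sum come from the most abundant barcode. The sketch below assumes the barcode rows have already been parsed into Python dictionaries keyed by those column names; the helper name `aggregate_otus`, the sample columns `TV9_A`/`TV9_B`, and all values are hypothetical, and the annotation columns (pid, refs, lineage, taxogroup, sequence) are not handled.

```python
def aggregate_otus(barcodes, sample_cols):
    """Collapse barcode-level rows (Dataset 2 layout) into OTU-level rows.

    Hypothetical helper: each barcode is a dict with at least 'md5sum',
    'cid', 'totab' and one abundance column per sample (e.g. 'TV9_A').
    """
    otus = {}
    for bc in barcodes:
        otu = otus.get(bc["cid"])
        if otu is None:
            # First barcode seen for this OTU: start running totals.
            otu = {
                "cid": bc["cid"],
                "md5sum": bc["md5sum"],
                "rtotab": bc["totab"],
                "ctotab": 0,
                **{s: 0 for s in sample_cols},
            }
            otus[bc["cid"]] = otu
        # ctotab: total abundance of all barcodes in the OTU.
        otu["ctotab"] += bc["totab"]
        # Per-sample OTU abundance: sum of its barcodes' per-sample abundances.
        for s in sample_cols:
            otu[s] += bc[s]
        # rtotab/md5sum track the most abundant barcode (first one wins ties).
        if bc["totab"] > otu["rtotab"]:
            otu["rtotab"] = bc["totab"]
            otu["md5sum"] = bc["md5sum"]
    return otus


# Hypothetical barcode rows: two barcodes in OTU "otu1", one in OTU "otu2".
barcodes = [
    {"md5sum": "a1", "cid": "otu1", "totab": 7, "TV9_A": 5, "TV9_B": 2},
    {"md5sum": "b2", "cid": "otu1", "totab": 3, "TV9_A": 0, "TV9_B": 3},
    {"md5sum": "c3", "cid": "otu2", "totab": 4, "TV9_A": 4, "TV9_B": 0},
]
otus = aggregate_otus(barcodes, ["TV9_A", "TV9_B"])
```

Keeping running totals in a single pass avoids holding per-OTU barcode lists in memory, which matters for barcode tables with millions of rows.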
Supplementary Table 1. List of Tara Oceans samples sequenced with a metabarcoding (18S V9) approach and with a metagenomic approach, including identifiers for sequencing reads deposited in the DDBJ/ENA/GenBank Short Read Archives (SRA). [This Table is identical in version 2.]
Supplementary Table 2. Table of environmental parameters for each sample. [This Table is identical in version 2.]
Supplementary Table 3. Matrix of metagenomic dissimilarity for the 0-0.22 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 4. Matrix of metagenomic dissimilarity for the 0.22-1.6/3 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 5. Matrix of metagenomic dissimilarity for the 0.8-5 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 6. Matrix of metagenomic dissimilarity for the 5-20 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 7. Matrix of metagenomic dissimilarity for the 20-180 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 8. Matrix of metagenomic dissimilarity for the 180-2000 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 9. Matrix of OTU dissimilarity for the 0-0.22 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 10. Matrix of OTU dissimilarity for the 0.22-1.6/3 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 11. Matrix of OTU dissimilarity for the 0.8-5 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 12. Matrix of OTU dissimilarity for the 5-20 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 13. Matrix of OTU dissimilarity for the 20-180 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 14. Matrix of OTU dissimilarity for the 180-2000 μm size fraction. [This Table is identical in version 2.]
Supplementary Table 15. Matrix of minimum travel time, in years. [This Table is identical in version 2.]
Supplementary Table 16. Matrix of minimum geographic distance (without traversing land), in kilometers. [This Table is identical in version 2.]
Supplementary Table 17. Matrix of imaging-based dissimilarity. [This Table is identical in version 2.]
Supplementary Table 18. Matrix of metagenome-assembled genome (MAG)-based dissimilarity for the 20-180 μm size fraction. [The filename of this Table was modified from version 2. The contents of the Table are identical.]
Supplementary Table 19. The cophenetic correlation coefficient for different methods of clustering metagenomic dissimilarity. [This Table is identical in version 2.]
Supplementary Table 20. Baker's Gamma index comparing clustering results within size fractions. [This Table is identical in version 2.]
Supplementary Table 21. Rand Index for K-means and spectral clustering, and multivariate ANOVA calculated by the adonis function. [This Table is identical in version 2.]
Dataset 1. Reference database (in FASTA format) used to perform taxonomic assignment of metabarcodes. The header line of each reference V9 rDNA barcode (beginning with a > sign) contains a unique identifier derived from the GenBank accession number, followed by the taxonomic path associated with the reference barcode. [This Dataset is identical in version 2.]
Dataset 2. V9 rDNA abundance at the metabarcode level. md5sum = unique identifier; totab = total abundance across all samples; cid = identifier of the OTU to which the barcode belongs (see Dataset 3); pid = best percentage identity to a barcode in Dataset 1; refs = identifier(s) of the best matching barcode(s) in Dataset 1; lineage = taxonomic lineage of the best match in Dataset 1; taxogroup = high-level taxonomic grouping of the best match in Dataset 1; sequence = V9 rDNA sequence; TV9_XXX = barcode abundance by sample (see Supplementary Table 1 for sample identifiers). [This Dataset is identical in version 2.]
Dataset 3. V9 rDNA abundance at the OTU (operational taxonomic unit) level. cid = identifier of the OTU; md5sum = unique identifier of the most abundant barcode in the OTU; pid, refs, lineage, taxogroup, sequence = defined as in Dataset 2; rtotab = total abundance of the most abundant barcode in the OTU; ctotab = total abundance of all barcodes in the OTU; TV9_XXX = abundance by sample of all barcodes in the OTU (see Supplementary Table 1 for sample identifiers). [This Dataset is identical in version 2.]
Dataset 4. Relative abundances of metagenome-assembled genomes (MAGs) in metagenomic samples from the 20-180 μm size fraction. [This Dataset is new in version 3.]
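Supplementary Table 21 reports Rand index values involving K-means and spectral clustering. As a reminder of what that statistic measures, the sketch below computes the standard (unadjusted) Rand index between two flat clusterings of the same samples: the fraction of sample pairs on which the clusterings agree, either by grouping both samples together in both or by separating them in both. The label vectors are invented for illustration, the sketch is not taken from the authors' analysis code, and this text does not state whether the table uses the unadjusted or an adjusted variant.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand index between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair counts as agreement if both clusterings treat it the same way.
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


# Hypothetical cluster labels for five samples from two clustering methods.
kmeans = [0, 0, 1, 1, 2]
spectral = [1, 1, 0, 0, 0]
ri = rand_index(kmeans, spectral)
```

Only co-membership matters, so the arbitrary label numbers chosen by each method do not need to match; here the two methods disagree only on the two pairs involving the fifth sample, giving 8 agreeing pairs out of 10.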
Marine planktonic eukaryotes play critical roles in global biogeochemical cycles and climate. However, their poor representation in culture collections limits our understanding of the evolutionary history and genomic underpinnings of planktonic ecosystems. Here, we used 280 billion
The smallest phytoplankton species are key actors in ocean biogeochemical cycling, and their abundance and distribution are affected by global environmental change. Among them, algae of the class Pelagophyceae include coastal species that cause harmful algal blooms, while others are cosmopolitan and abundant. The lack of a reference genome for this lineage is a major limitation to studying its ecological importance. Here, we analysed the relative abundance, ecological niche and adaptive potential of Pelagomonas calceolata across all oceans using a complete, chromosome-scale genome assembly. Our results show that P. calceolata is one of the most abundant eukaryotic species in the oceans, with a relative abundance favoured by high temperature, low light and iron-poor conditions. Climate change projections based on its relative abundance suggest a poleward extension of the P. calceolata habitat by the end of this century. Finally, we observed a specific gene repertoire and variations in expression levels that may explain its ecological success in low-iron and low-nitrate environments. Collectively, these findings reveal the ecological importance of P. calceolata and lay the foundation for a global-scale analysis of the adaptation and acclimation strategies of this small phytoplankton in a changing environment.
In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA thus has a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule, beyond which correlations in its orientation are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of genes. Since bacteria live in very different physico-chemical conditions and their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length of around 10-20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genomes of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may coexist alongside these fundamental universal structures.
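The abstract does not spell out how GC content correlations are computed. A generic way to look for a characteristic length in sequence alone is to compute the GC fraction in non-overlapping windows along a chromosome and measure the Pearson autocorrelation of that profile as a function of lag. The plain-Python sketch below does this on a synthetic chromosome; the window size, lags and toy sequence are hypothetical choices, and this naive estimator is not the authors' method, since it is not insensitive to large idiosyncratic genomic features.

```python
import random


def gc_profile(seq, window):
    """GC fraction in consecutive non-overlapping windows (trailing remainder dropped)."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        out.append((chunk.count("G") + chunk.count("C")) / window)
    return out


def autocorrelation(xs, lag):
    """Pearson correlation between the profile and itself shifted by `lag` windows."""
    if lag < 1 or lag >= len(xs) - 1:
        raise ValueError("lag must be at least 1 and leave at least two pairs")
    a, b = xs[:-lag], xs[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Hypothetical chromosome: alternating GC-rich and AT-rich blocks of 5 kb each,
# built so that the correlation changes sign at a known scale.
rng = random.Random(0)
blocks = []
for k in range(40):
    p_gc = 0.65 if k % 2 == 0 else 0.35
    blocks.append("".join(
        rng.choice("GC") if rng.random() < p_gc else rng.choice("AT")
        for _ in range(5000)
    ))
genome = "".join(blocks)
profile = gc_profile(genome, window=1000)  # 1 kb windows
for lag in (1, 5, 10):
    print(f"lag {lag} kb: r = {autocorrelation(profile, lag):.2f}")
```

In this toy sequence the correlation is strongly negative at a 5 kb lag (adjacent blocks of opposite composition) and strongly positive at 10 kb, which is the kind of lag-dependent signature such an analysis looks for in real chromosomes.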