The amount of genomic sequence information continues to grow at an exponential rate, while the identification and characterization of genes without known homologs remains a major challenge. For non-model organisms with limited resources for manipulative studies, high-throughput transcriptomic data combined with bioinformatics methods provide a powerful approach to obtain initial insights into the function of unknown genes. In this study, we report the identification and characterization of a novel family of putatively secreted, small, cysteine-rich proteins herein named Small Cysteine-Rich Proteins (SCRiPs). Their discovery in expressed sequence tag (EST) libraries from the coral Montastraea faveolata required an iterative search strategy based on BLAST and Hidden-Markov-Model algorithms. Although no discernible homolog could be identified in either the genome of the sea anemone Nematostella vectensis or a large EST dataset from the symbiotic sea anemone Aiptasia pallida, we identified SCRiP sequences in multiple scleractinian coral species. Therefore, we postulate that this gene family is an example of lineage-specific gene expansion in reef-building corals. Previously published gene expression microarray data suggest that a sub-group of SCRiPs is highly responsive to thermal stress. Furthermore, data from microarray experiments investigating developmental gene expression in the coral Acropora millepora suggest that different SCRiPs may play distinct roles in the development of corals. The function of these proteins remains to be elucidated, but our results from in silico, transcriptomic, and phylogenetic analyses provide initial insights into the evolution of SCRiPs, a novel, taxonomically restricted gene family that may be responsible for a lineage-specific trait in scleractinian corals.
Microbes are dominant drivers of biogeochemical processes, yet drawing a global picture of functional diversity, microbial community structure, and their ecological determinants remains a grand challenge. We analyzed 7.2 terabases of metagenomic data from 243 Tara Oceans samples from 68 locations in epipelagic and mesopelagic waters across the globe to generate an ocean microbial reference gene catalog with >40 million nonredundant, mostly novel sequences from viruses, prokaryotes, and picoeukaryotes. Using 139 prokaryote-enriched samples, containing >35,000 species, we show vertical stratification with epipelagic community composition mostly driven by temperature rather than other environmental factors or geography. We identify ocean microbial core functionality and reveal that >73% of its abundance is shared with the human gut microbiome despite the physicochemical differences between these two ecosystems.
Fecal microbiota transplantation (FMT) has shown efficacy in treating recurrent Clostridium difficile infection and is increasingly being applied to other gastrointestinal disorders, yet the fate of native and introduced microbial strains remains largely unknown. To quantify the extent of donor microbiota colonization, we monitored strain populations in fecal samples from a recent FMT study on metabolic syndrome patients using single-nucleotide variants in metagenomes. We found extensive coexistence of donor and recipient strains, persisting 3 months after treatment. Colonization success was greater for conspecific strains than for new species, the latter falling within fluctuation levels observed in healthy individuals over a similar time frame. Furthermore, same-donor recipients displayed varying degrees of microbiota transfer, indicating individual patterns of microbiome resistance and donor-recipient compatibilities.
Species interaction networks are shaped by abiotic and biotic factors. Here, as part of the Tara Oceans project, we studied the photic zone interactome using environmental factors and organismal abundance profiles and found that environmental factors are incomplete predictors of community structure. We found associations across plankton functional types and phylogenetic groups to be nonrandomly distributed on the network and driven by both local and global patterns. We identified interactions among grazers, primary producers, viruses, and (mainly parasitic) symbionts and validated network-generated hypotheses using microscopy to confirm symbiotic relationships. We have thus provided a resource to support further research on ocean food webs and integrating biological components into ocean models.
Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.
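As a rough illustration of what a metagenomic read simulator does, the following minimal sketch draws reads from a set of genomes in proportion to their abundance and introduces substitution errors. It is not the simulators described above: the function name `simulate_reads`, its parameters, and the single uniform `error_rate` are assumptions made for illustration, whereas the platform-specific models in the study are more detailed.

```python
import random

BASES = "ACGT"

def simulate_reads(genomes, abundances, n_reads, read_len, error_rate, seed=0):
    """Draw reads from genomes and add uniform substitution errors.

    genomes:    dict name -> sequence (each at least read_len long)
    abundances: dict name -> relative cell abundance (hypothetical input)
    error_rate: per-base substitution probability (a simplification; real
                platforms have position- and context-dependent errors)
    """
    rng = random.Random(seed)
    names = list(genomes)
    # Reads are sampled in proportion to the amount of DNA each genome
    # contributes, i.e. abundance multiplied by genome length.
    weights = [abundances[n] * len(genomes[n]) for n in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        read = list(seq[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append((name, start, "".join(read)))
    return reads

if __name__ == "__main__":
    toy = {"g1": "ACGT" * 50, "g2": "TTGACA" * 40}
    for name, start, read in simulate_reads(toy, {"g1": 0.7, "g2": 0.3}, 3, 30, 0.01):
        print(name, start, read)
```

Each read keeps its source genome and start position, so assemblies built from the simulated reads can be checked against the known origin of every read.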
We simulated five human gut metagenomic samples to assess the taxonomic quantification accuracy of the mOTUs tool (link). In this directory you can find the metagenomic samples, the gold standard (used to produce them) and the profiles obtained with three metagenomic profiler tools. Check README.txt for more information. NOTE: Version1 is obsolete; the files used in the paper are in Version2.
We simulated ten human gut metagenomic samples to assess the taxonomic quantification accuracy of the mOTUs tool (link). In this directory you can find the metagenomic samples, the gold standard (used to produce them) and the profiles obtained with four metagenomic profiler tools. Check README.txt for more information.
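One simple way to compare a profiler's output against the gold standard is a dissimilarity between the two relative-abundance profiles. The sketch below uses Bray-Curtis dissimilarity as an example. This choice is an assumption and not necessarily the measure used in the associated paper, and the dictionary-based profile format and function names are hypothetical rather than the format of the files in this directory.

```python
def normalize(profile):
    """Scale a taxon -> abundance mapping so the values sum to 1."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive abundance")
    return {taxon: value / total for taxon, value in profile.items()}

def bray_curtis(gold, predicted):
    """Bray-Curtis dissimilarity between two profiles (0 = identical, 1 = disjoint).

    For profiles that each sum to 1, this equals half the sum of absolute
    differences in relative abundance across all taxa.
    """
    g = normalize(gold)
    p = normalize(predicted)
    taxa = set(g) | set(p)
    return 0.5 * sum(abs(g.get(t, 0.0) - p.get(t, 0.0)) for t in taxa)

if __name__ == "__main__":
    gold = {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10}
    predicted = {"taxon_A": 55, "taxon_B": 35, "taxon_D": 10}
    print(round(bray_curtis(gold, predicted), 3))
```

Taxa missing from one profile are treated as having zero abundance there, so both false positives and false negatives increase the dissimilarity.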
This data is the result of the primary analysis of the ITS2 sequencing data associated with the Coral Diversity dataset collected from all islands as part of the Tara Pacific expedition. A full README is contained within the data upload.
The Tara Pacific expedition (2016-2018) sampled coral ecosystems around 32 islands in the Pacific Ocean, and sampled the surface of oceanic waters at 249 locations, resulting in the collection of nearly 58,000 samples (Gorsky et al. 2019, Planes et al. 2019, Flores et al. 2020). The expedition was designed to systematically study corals, fish, plankton, and seawater, and included the collection of samples for advanced biogeochemical, molecular, and imaging analysis. Here we provide the continuous dataset from the hyperspectral and multispectral spectrophotometer [ACS] instruments, which acquired data continuously throughout the campaign. Surface seawater was pumped continuously through a hull inlet located 1.5 m under the waterline using a membrane pump (10 LPM; Shurflo), circulated through a vortex debubbler and a flow meter, and distributed to a number of flow-through instruments. An [ACS] spectrophotometer (WETLabs) measured hyperspectral (4 nm resolution) attenuation and absorption in the visible and near infrared, except between Panama and Tahiti, where an AC-9 multispectral spectrophotometer (WETLabs) was used instead. The flow was automatically directed through a 0.2 µm filter for 10 minutes every hour before being circulated through the spectrophotometer, both to eliminate the impact of biofouling and instrument drift and to estimate particulate absorption [ap] and attenuation [cp] (Slade et al. 2010). Chlorophyll a content was estimated from the particulate absorption line height at 676 nm (Boss et al. 2001). The particulate organic carbon concentration [poc] was estimated using an empirical relation (Gardner et al. 2006) between measured [poc] and measured [cp]. An indicator for the size distribution of particles between 0.2 and ~20 µm [gamma] was calculated from [cp] (Boss et al. 2001). The data were processed with custom software for underway optical data (InLineAnalysis software available on GitHub).
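The derived products described above can be illustrated with a minimal sketch. It is not the InLineAnalysis implementation. The baseline wavelengths (650 and 715 nm), the reference wavelength (532 nm), and all function and parameter names are assumptions made for illustration, and the coefficients that convert line height to chlorophyll a are left as required inputs rather than guessed.

```python
import math

def line_height_676(ap_650, ap_676, ap_715):
    """Particulate absorption at 676 nm above a linear baseline.

    The baseline is drawn between 650 and 715 nm (assumed wavelengths)
    and evaluated at 676 nm.
    """
    baseline = ap_650 + (676 - 650) / (715 - 650) * (ap_715 - ap_650)
    return ap_676 - baseline

def chl_from_line_height(a_lh, coeff, exponent):
    """Chlorophyll a as a power law of line height.

    coeff and exponent are empirical and must be supplied by the user;
    negative line heights are clipped to zero.
    """
    return coeff * max(a_lh, 0.0) ** exponent

def spectral_slope_gamma(wavelengths_nm, cp, ref_nm=532.0):
    """Fit cp(wl) = cp(ref) * (wl / ref) ** (-gamma) by least squares in log-log space.

    Requires strictly positive cp values at two or more distinct wavelengths.
    """
    xs = [math.log(w / ref_nm) for w in wavelengths_nm]
    ys = [math.log(c) for c in cp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # gamma is the negative of the log-log slope

if __name__ == "__main__":
    print(line_height_676(0.010, 0.030, 0.005))
    wl = [440.0, 488.0, 532.0, 600.0, 650.0]
    print(spectral_slope_gamma(wl, [0.5 * (w / 532.0) ** -0.8 for w in wl]))
```

A larger gamma corresponds to a steeper decrease of [cp] with wavelength, which is generally interpreted as a particle population dominated by smaller particles.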
Detailed information on the data processing is given in the processing report attached to the data and in Lombard et al. (In prep.). These results are preliminary: no matchup with in situ chlorophyll from HPLC or with [poc] measurements has been performed.
Summary
To obtain a proxy for the stress level of collected corals, we checked for previous occurrences of bleaching events at sampled reef sites by matching island GPS coordinates to the Reef Check dataset (reefcheck.org) obtained from Sully et al. (2019). For each Tara Pacific island coordinate, we identified the closest Reef Check site (by distance in km) and considered only Reef Check data within a 10 km radius. We further determined short- and long-term climate variables that are known to affect coral stress resilience for all Tara Pacific collection sites that are available from Lombard et al. (2022). These data make it possible to assess whether corals from a given site were exposed to a higher or lower prevalence of thermal stress events and bleaching in the years prior to sampling.

References
Sully, S., Burkepile, D. E., Donovan, M. K., Hodgson, G. & van Woesik, R. A global analysis of coral bleaching over the past two decades. Nature Communications 10, 1264 (2019).
Fabien Lombard, Guillaume Bourdin, Stephane Pesant, Sylvain Agostini, Alberto Baudena, Emilie Boissin, Nicolas Cassar, Megan Clampitt, Pascal Conan, Ophélie Da Silva, Celine Dimier, Eric Douville, Amanda Elineau, Jonathan Fin, J. Michel Flores, Jean François Ghiglione, Benjamin C.C. Hume, Laetitia Jalabert, Seth G. John, Rachel L. Kelly, Ilan Koren, Yajuan Lin, Dominique Marie, Ryan McMinds, Zoé Mériguet, Nicolas Metzl, David A. Paz-García, Maria Luiza Pedrotti, Julie Poulain, Mireille Pujo-Pay, Josephine Ras, Gilles Reverdin, Sarah Romac, Eric Röttinger, Assaf Vardi, Christian R. Voolstra, Clémentine Moulin, Guillaume Iwankow, Bernard Banaigs, Chris Bowler, Colomban de Vargas, Didier Forcioli, Paola Furla, Pierre E. Galand, Eric Gilson, Stéphanie Reynaud, Shinichi Sunagawa, Olivier Thomas, Romain Troublé, Rebecca Vega Thurber, Patrick Wincker, Didier Zoccola, Denis Allemand, Serge Planes, Emmanuel Boss, Gaby Gorsky. Open science resources from the Tara Pacific expedition across the surface ocean and coral reef ecosystems. Submitted (2022).
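The site-matching step described in the summary can be sketched as a nearest-neighbour search on great-circle distance. This is a minimal illustration, not the code used to produce the dataset. The haversine formula, the spherical Earth radius, the treatment of the 10 km limit as a maximum distance, and the record layout (`id`, `lat`, `lon`) are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_site(island_lat, island_lon, sites, max_km=10.0):
    """Return (site_id, distance_km) of the closest site within max_km, else None.

    sites is an iterable of dicts with hypothetical keys 'id', 'lat', 'lon'.
    """
    best = None
    for site in sites:
        d = haversine_km(island_lat, island_lon, site["lat"], site["lon"])
        if d <= max_km and (best is None or d < best[1]):
            best = (site["id"], d)
    return best

if __name__ == "__main__":
    reef_check_sites = [
        {"id": "site_near", "lat": -17.52, "lon": -149.83},
        {"id": "site_far", "lat": -18.50, "lon": -149.80},
    ]
    print(nearest_site(-17.50, -149.80, reef_check_sites))
```

Islands with no site inside the cutoff return `None`, which corresponds to having no bleaching history available for that location.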