Cetaceans (whales, dolphins, and porpoises) are a group of mammals adapted to diverse aquatic habitats, from oceans to freshwater rivers. We report the sequencing, de novo assembly, and analysis of a finless porpoise genome, together with the re-sequencing of 48 additional finless porpoise individuals. We use these data to reconstruct the demographic history of finless porpoises from their origin to their colonization of the Yangtze River. Selection analyses comparing marine and freshwater porpoises identify genes associated with renal water homeostasis and the urea cycle, such as urea transporter 2 and angiotensin I-converting enzyme 2, which likely represent adaptations to the difference in osmotic stress between ocean and river environments. Our results strongly suggest that the critically endangered Yangtze finless porpoises are reproductively isolated from other porpoise populations and harbor unique genetic adaptations, supporting the view that they should be considered a distinct incipient species.
Colorectal cancer (CRC) causes high morbidity and mortality worldwide, and noninvasive gut microbiome (GM) biomarkers are promising tools for early CRC diagnosis. However, the GM varies significantly with ethnicity, diet, and living environment, which suggests that GM biomarker performance may differ between regions. To identify GM biomarkers for CRC in Chongqing, China, we performed a metagenomic association analysis on stool samples from 52 patients and 55 healthy family members who lived with them. The GM of patients differed significantly from that of healthy controls. A total of 22 microbial genes were selected as screening biomarkers. These biomarkers classified an additional 46 cases and 40 randomly selected healthy adults in Chongqing with high accuracy (area under the receiver operating characteristic curve (AUC) = 0.905, 95% CI 0.832–0.977). The classifier based on these 22 biomarkers also performed well in cohorts from Hong Kong (AUC = 0.811, 95% CI 0.715–0.907) and France (AUC = 0.859, 95% CI 0.773–0.944). Quantitative PCR was then used to measure three selected biomarkers for classifying CRC patients in an independent Chongqing population of 30 cases and 30 controls. The best-performing biomarker, from Coprobacillus, achieved a high AUC (0.930, 95% CI 0.904–0.955). This study shows that our GM biomarkers offer greater sensitivity and broader applicability than previously reported biomarkers, and that they could significantly advance the early diagnosis of CRC.
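The AUC values above summarize how well a classifier score separates CRC cases from controls. The following is a minimal sketch. It assumes that each stool sample has already been reduced to a single classifier score; the text does not describe how the 22 marker genes are combined into that score. Under that assumption, the AUC can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC: P(case score > control score), ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case (label 1) and one control (label 0)")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0  # case correctly ranked above control
            elif c == h:
                wins += 0.5  # tie contributes half a win
    return wins / (len(cases) * len(controls))


# Toy scores, not study data: two cases (1) and two controls (0).
print(roc_auc([0.9, 0.3, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop costs O(cases × controls), which is negligible for cohorts of this size. The 95% confidence intervals quoted in the text require a separate procedure, such as DeLong's method or bootstrapping, which is not shown here.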
Medulloblastoma (MB), a heterogeneous pediatric brain tumor, poses therapeutic challenges, particularly in cases of tumor recurrence and dissemination. To characterize its cellular diversity and genetic features, we comprehensively analyzed single-cell/single-nucleus RNA sequencing (sc/snRNA-seq), single-nucleus assay for transposase-accessible chromatin sequencing (snATAC-seq), and spatial transcriptomics profiles. We identified distinct cellular populations in the SHH (sonic hedgehog) and Group_3 subgroups, whose proportions differed between local recurrence and dissemination. Locally recurrent tumors showed greater enrichment of cycling tumor cells, whereas disseminated lesions contained a more prominent population of differentiated subsets. Evaluation of chromosomal alterations revealed distinct genetic subclones during MB progression, such as chr7q gain and chr11 loss in Group_3 disseminated lesions. A subpopulation termed "high cellular plasticity (HCP)" emerged during MB progression. This subpopulation was associated with increased proliferative potential and chromatin accessibility, and it contributed to recurrence. Inhibiting HCP-associated markers, such as protein tyrosine phosphatase receptor type Z1 (PTPRZ1), efficiently suppressed MB progression in preclinical models. These findings address critical gaps in understanding the cellular diversity, chromosomal alterations, and biological dynamics of recurrent MB, and they offer potential therapeutic insights.
Protein inter-residue contacts are highly useful for protein structure determination and prediction. Recent CASP experiments have shown that even a few accurately predicted contacts can improve both the computational efficiency and the prediction accuracy of ab initio folding methods. This paper develops an integer linear programming (ILP) method for consensus-based contact prediction. The simple "majority voting" method assumes that all individual servers are equal and independent. Our method instead estimates the correlations between servers using maximum likelihood and constructs latent independent servers using principal component analysis. We then use an ILP model to assign weights to these latent servers so as to maximize the separation between correct and incorrect contacts. Our consensus prediction is the weighted combination of these latent servers. In addition to consensus information, our method uses server-independent correlated mutation (CM) as a prediction feature. Experimental results demonstrate that our contact prediction server outperforms the "majority voting" method. On CASP7 targets, our method achieves an accuracy of 73.41% for the top L/5 contacts, which is much higher than previously reported results. On the 16 free modeling (FM) targets, it achieves an accuracy of 37.21%.
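The weighted consensus and the top-L/5 accuracy metric can be illustrated with a minimal sketch. The weights here are supplied by hand as an assumption. Neither the PCA step that builds latent servers nor the ILP that learns their weights is reproduced. The function names `consensus_scores` and `top_l_accuracy` are hypothetical, and residue pairs are represented as `(i, j)` index tuples.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]  # (residue i, residue j); 1-based indices assumed


def consensus_scores(server_scores: List[Dict[Pair, float]], weights: List[float]) -> Dict[Pair, float]:
    """Weighted sum of per-server contact scores (weights assumed, not ILP-derived)."""
    combined: Dict[Pair, float] = {}
    for weight, predictions in zip(weights, server_scores):
        for pair, score in predictions.items():
            combined[pair] = combined.get(pair, 0.0) + weight * score
    return combined


def top_l_accuracy(scores: Dict[Pair, float], true_contacts: Set[Pair], length: int, divisor: int = 5) -> float:
    """Fraction of the top length // divisor ranked pairs that are true contacts."""
    k = max(1, length // divisor)
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(pair in true_contacts for pair in ranked) / k


# Toy example: two servers scoring two residue pairs of a 10-residue protein.
servers = [{(1, 5): 1.0, (2, 8): 0.0}, {(1, 5): 0.5, (2, 8): 1.0}]
combined = consensus_scores(servers, [0.75, 0.25])
print(combined)  # {(1, 5): 0.875, (2, 8): 0.25}
print(top_l_accuracy(combined, {(1, 5)}, length=10))  # 0.5
```

The denominator stays at k even when fewer than k pairs are predicted, so missing predictions count as misses rather than inflating the accuracy.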
Elysia chlorotica, a sacoglossan sea slug found off the East Coast of the United States, is well known for its ability to sequester chloroplasts from its algal prey and survive by photosynthesis for up to 12 months without food. Here we present a draft genome assembly of E. chlorotica generated using a hybrid assembly strategy with Illumina short reads and PacBio long reads. The assembly comprises 9,989 scaffolds, with a total length of 557 Mb and a scaffold N50 of 442 kb. BUSCO assessment indicated that 93.3% of the expected metazoan genes are completely present in the assembly. Annotation of the assembly identified 176 Mb (32.6%) of repetitive sequences and a total of 24,980 protein-coding genes. We anticipate that the annotated draft genome of the E. chlorotica sea slug will promote the investigation of sacoglossan genetics and evolution. In particular, it should help reveal the genetic signatures that account for the long-term functioning of algal chloroplasts in an animal.
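The scaffold N50 quoted above is defined as follows. Sort the scaffolds from longest to shortest and keep a running total of their lengths. The N50 is the length of the scaffold at which that running total first reaches half of the assembly size. Equivalently, scaffolds at least that long contain at least half of the assembled bases. The following is a minimal sketch of that definition. The function name `n50` is hypothetical, and the toy lengths are not from this assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the scaffold at which the cumulative length (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("assembly has no scaffolds")
    total = sum(ordered)
    covered = 0
    for length in ordered:
        covered += length
        if 2 * covered >= total:  # integer test avoids float rounding at exactly 50%
            return length
    return ordered[-1]  # not reached: the running total always crosses half


print(n50([100, 80, 50, 30, 20]))  # 80
```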
Accurate recognition of protein fold types is a key step in template-based prediction of protein structures. Existing approaches to fold recognition mainly exploit features derived from alignments of the query protein against templates. These approaches have been successful for fold recognition at the family level but usually fail at the superfamily and fold levels. A key step toward overcoming this limitation is to explore more structurally informative features of proteins. Residue-residue contacts carry abundant structural information, but how to fully exploit this information for fold recognition remains a challenge. In this study, we present an approach, called DeepFR, to improve fold recognition at the superfamily and fold levels. The basic idea is to extract fold-specific features from predicted residue-residue contacts of proteins using a deep convolutional neural network (DCNN). Based on these fold-specific features, we calculate the similarity between the query protein and each template. We then assign the query protein the fold type of the most similar template. DCNNs have shown excellent performance in image feature extraction and image recognition. The rationale for applying a DCNN to fold recognition is that contact likelihood maps are essentially analogous to images, as both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that, even using the extracted fold-specific features alone, our approach achieves a success rate comparable to that of state-of-the-art approaches. When these features are further combined with traditional alignment-related features, the success rate of our approach increases to 92.3%, 82.5% and 78.8% at the family, superfamily and fold levels, respectively. These rates are about 18% higher than the state-of-the-art approach at the fold level, 6% higher at the superfamily level and 1% higher at the family level.
An independent assessment on the SCOP_TEST dataset showed consistent performance improvement, indicating the robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with the fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features. Combining the two could greatly facilitate fold recognition at the superfamily and fold levels, as well as template-based prediction of protein structures. Source code of DeepFR is freely available at https://github.com/zhujianwei31415/deepfr, and a web server is available at http://protein.ict.ac.cn/deepfr. Contact: zheng@itp.ac.cn or dbu@ict.ac.cn. Supplementary data are available at Bioinformatics online.
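The final assignment step in DeepFR is a nearest-neighbour lookup: compute the similarity between the query and each template, then take the fold type of the most similar template. The following is a minimal sketch. It assumes that the fold-specific features are already available as fixed-length vectors. It also uses cosine similarity as a stand-in for the similarity measure, which the text does not specify. The function names and fold labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_fold(query: Sequence[float], templates: List[Tuple[str, Sequence[float]]]) -> str:
    """Return the fold label of the template whose features are most similar to the query."""
    if not templates:
        raise ValueError("template library is empty")
    best_label, _ = max(
        ((label, cosine_similarity(query, features)) for label, features in templates),
        key=lambda item: item[1],
    )
    return best_label


# Toy feature vectors standing in for DCNN outputs.
library = [("fold_A", [1.0, 0.0, 0.2]), ("fold_B", [0.0, 1.0, 0.1])]
print(assign_fold([0.9, 0.1, 0.2], library))  # fold_A
```

This sketch covers only the feature-similarity step. Combining it with alignment-related features, as the text describes for the best-performing configuration, would require an additional scoring step that is not shown here.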
Autoimmune diseases (ADs) are characterized by their complexity and wide clinical heterogeneity. Even when patients present with similar symptoms and disease patterns, their responses to treatment may vary. Personalized medicine based on molecular data is seen as an effective way to address this variability. This review examined how the pathologic classification of ADs, such as multiple sclerosis and lupus nephritis, has developed over time. Acknowledging the limitations inherent in pathologic classification, the review then turned to molecular classification to gain deeper insight into disease heterogeneity. It outlined the established methods and findings from the molecular classification of ADs. These classify systemic lupus erythematosus (SLE) into four subtypes, inflammatory bowel disease (IBD) into two, rheumatoid arthritis (RA) into three, and multiple sclerosis (MS) into a single subtype. The high-inflammation subtype of IBD, the RA inflammation subtype, and the MS "inflammation & EGF" subtype were found to share similarities. All three display a consistent inflammatory pattern driven primarily by activation of the JAK-STAT pathway, and the effective drugs for them are those that target this signaling pathway. Additionally, by identifying markers uniquely associated with the different subtypes of the same disease, the review described the differences between subtypes in detail. These findings are expected to contribute to the development of personalized treatment plans and to establish a strong basis for tailored approaches to treating autoimmune diseases.
Additional file 3: Visualization of the imputation process. a, c Heatmaps of SF and OV lab test data before imputation. b, d Heatmaps of SF and OV lab test data after imputation. Black tiles indicate missing entries. Abbreviations: NK, natural killer cells. Th, T-helper lymphocyte. Ts, T-suppressor lymphocyte. CRP, C-reactive protein. PCT, procalcitonin. IFN-γ, interferon-γ. TNF-α, tumor necrosis factor α. IL-1β, interleukin 1β. IL-2R, interleukin 2 receptor. IL-4, interleukin 4. IL-6, interleukin 6. IL-8, interleukin 8. IL-10, interleukin 10. C-IGM, SARS-CoV-2-specific antibody IgM. C-IGG, SARS-CoV-2-specific antibody IgG. SF, Sino-French New City Campus of Tongji Hospital. OV, Optical Valley Campus of Tongji Hospital.