Objective: We performed an updated meta-analysis, combining logistic regression with a model-free approach, to evaluate more precisely the role of the rs4444235 variant near the bone morphogenetic protein-4 (BMP4) gene in susceptibility to colorectal cancer (CRC). Methods: A total of 19 studies with 28770 cases and 28234 controls were included. The Metagen system with logistic regression was applied to choose the most plausible genetic model for rs4444235. The generalized odds ratio (ORG) metric was used to provide a global test of the relationship between rs4444235 and CRC risk. Results: Metagen analysis suggested that rs4444235 fitted best to an additive model. Under the additive model, heterogeneity was observed (P = 0.059, I² = 36.1%), and the pooled per-allele OR was 1.08 (95% CI = 1.05–1.11). Based on the model-free approach, the pooled ORG was 1.09 (95% CI = 1.05–1.14) under a random-effects model. Stratified analyses suggested that heterogeneity could be explained in part by population ethnicity, study design, source of controls, and sample size. Sensitivity analysis further supported the stability of the results, showing similar pooled estimates before and after sequential removal of each study. Conclusions: This meta-analysis provides a robust estimate of the positive association between rs4444235 and CRC risk and further emphasizes the importance of rs4444235 in CRC risk prediction.
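The pooling step reported in the meta-analysis above (a pooled OR with a 95% CI, the heterogeneity statistic I², and a random-effects model) can be illustrated with a minimal sketch. The abstract does not name the between-study variance estimator, so the DerSimonian-Laird estimator used here is an assumption, and the function name `pool_log_odds_ratios` is hypothetical. Inputs are per-study log odds ratios and their variances; no values from the paper are reproduced.

```python
import math


def pool_log_odds_ratios(log_ors, variances):
    """Pool per-study log odds ratios with a random-effects model.

    Assumption: DerSimonian-Laird estimate of the between-study
    variance (tau^2); the source abstract does not name its estimator.
    """
    k = len(log_ors)
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw

    # Cochran's Q and the I^2 heterogeneity statistic (in percent).
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate, and 95% CI.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sw_star
    se = math.sqrt(1.0 / sw_star)
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }
```

When the studies agree exactly, Q is zero, tau² collapses to zero, and the random-effects estimate coincides with the fixed-effect one. A leave-one-out sensitivity analysis of the kind the abstract describes would call this function once per omitted study.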
Defining meaningful feature (molecule) combinations can enhance the study of disease diagnosis and prognosis. However, feature combinations in biosystems are complex and varied, and existing methods examine feature cooperation in a single, fixed pattern for all feature pairs, such as linear combination. To identify the appropriate combination between two features and to evaluate feature combinations more comprehensively, this paper adopts kernel functions to study feature relationships and proposes a new omics data analysis method, KF-[Formula: see text]-TSP. Besides linear combination, KF-[Formula: see text]-TSP also explores nonlinear combinations of features and allows multiple kernel functions to be hybridized to evaluate feature interaction from multiple views. KF-[Formula: see text]-TSP selects [Formula: see text] > 0 top-scoring pairs to build an ensemble classifier. Experimental results show that KF-[Formula: see text]-TSP with multiple kernel functions, which evaluates feature combinations from multiple views, performs better than the same method with only one kernel function. KF-[Formula: see text]-TSP also performs better than TSP family algorithms and previous methods based on the conversion strategy in most cases. It performs similarly to popular machine learning methods in omics data analysis but involves fewer feature pairs. During physiological and pathological changes, molecular interactions can be both linear and nonlinear. Hence, KF-[Formula: see text]-TSP, which can measure molecular combinations from multiple perspectives, can help to mine information closely related to physiological and pathological changes and to study disease mechanisms.
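For orientation, the baseline that the TSP family builds on can be sketched as follows. This is the classic top-scoring-pair score, which ranks a feature pair (i, j) by how differently the ordering x_i < x_j occurs in the two classes. It is not the kernel-based scoring proposed in the abstract above, whose details the abstract does not give. The function names and the 0/1 label encoding are assumptions made for illustration.

```python
from itertools import combinations


def tsp_scores(samples, labels):
    """Score every feature pair by |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.

    Assumptions: `samples` is a list of equal-length feature vectors,
    `labels` holds 0 or 1, and both classes contain at least one sample.
    """
    by_class = {0: [], 1: []}
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    n_features = len(samples[0])
    scores = {}
    for i, j in combinations(range(n_features), 2):
        # Fraction of samples in each class where feature i is below feature j.
        freq = [
            sum(1 for x in by_class[c] if x[i] < x[j]) / len(by_class[c])
            for c in (0, 1)
        ]
        scores[(i, j)] = abs(freq[0] - freq[1])
    return scores


def top_k_pairs(samples, labels, k):
    """Return the k highest-scoring feature pairs (ties keep enumeration order)."""
    scores = tsp_scores(samples, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A pair whose ordering flips completely between the classes scores 1, and a pair whose ordering is the same in both classes scores 0. An ensemble classifier in the k-TSP style would let each selected pair vote according to its observed ordering.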
In a Chinese prospective cohort, 500 patients with new‐onset type 2 diabetes (T2D) within 4.61 years and 500 matched healthy participants are selected as case and control groups and randomized into discovery and validation sets to discover the metabolite changes before T2D onset and the related diabetogenic loci. A serum metabolomics analysis reveals that 81 metabolites change significantly before T2D onset. Based on binary logistic regression, eight metabolites are defined as a biomarker panel for T2D prediction. Pipecolinic acid, carnitine C14:0, epinephrine, and phosphatidylethanolamine 34:2 are found for the first time to be associated with future T2D. Adding the biomarker panel to the clinical markers (BMI, triglycerides, and fasting glucose) significantly improves the predictive ability in both the discovery and validation sets. By associating metabolomics with genomics, a significant correlation (p < 5.0 × 10⁻⁸) between eicosatetraenoic acid and the FADS1 (rs174559) gene is observed, and suggestive correlations (p < 5.0 × 10⁻⁶) between pipecolinic acid and CHRM3 (rs535514), and between leucine/isoleucine and WWOX (rs72487966), are discovered. Elevated leucine/isoleucine levels increase the risk of T2D. In conclusion, multiple metabolic dysregulations are observed before T2D onset, and the new biomarker panel can help to predict T2D risk.
Large-scale metabolite annotation is a bottleneck in untargeted metabolomics. Here, we present a structure-guided molecular network strategy (SGMNS) for deep annotation of untargeted ultra-performance liquid chromatography-high resolution mass spectrometry (MS) metabolomics data. Unlike current network-based metabolite annotation methods, SGMNS is based on a global connectivity molecular network (GCMN), which is constructed from the molecular fingerprint similarity of chemical structures in metabolome databases. Neighbor metabolites with similar structures in the GCMN are expected to produce similar spectra. Network annotation propagation in SGMNS is performed using known metabolites as seeds. The experimental MS/MS spectra of seeds are assigned to the corresponding neighbor metabolites in the GCMN as their "pseudo" spectra; the propagation is done by searching predicted retention times, MS1, and "pseudo" spectra against metabolite features in untargeted metabolomics data. The annotated metabolite features are then used as new seeds for further annotation propagation. Performance evaluation of SGMNS showed its unique advantages for metabolome annotation. The developed method was applied to annotate six typical biological samples; a total of 701, 1557, 1147, 1095, 1237, and 2041 metabolites were annotated from the cell, feces, plasma (NIST SRM 1950), tissue, urine, and their pooled sample, respectively, and the annotation accuracy was >83% with RSD <2%. The results show that SGMNS fully exploits the chemical space of existing metabolomes for deep metabolite annotation and overcomes the shortage of reference MS/MS spectra.
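The network construction and seed-based propagation described above can be sketched in a few lines. The abstract says only that the network is built from "molecular fingerprint similarity", so the Tanimoto coefficient, the similarity threshold, and all function names below are assumptions made for illustration. Fingerprints are represented as sets of on-bit indices.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def build_network(fingerprints, threshold):
    """Connect every pair of structures whose similarity meets the threshold.

    `fingerprints` maps a metabolite name to its set of on-bits.
    The threshold value is a free parameter (an assumption here).
    """
    graph = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def pseudo_spectrum_candidates(graph, seeds):
    """Map each non-seed neighbor to the seeds whose spectra it would borrow."""
    candidates = {}
    for seed in seeds:
        for neighbor in graph[seed]:
            if neighbor not in seeds:
                candidates.setdefault(neighbor, set()).add(seed)
    return candidates
```

In a full workflow, the candidates returned by the last function would be matched against measured features (retention time, MS1, and the borrowed spectrum), and confirmed matches would join the seed set for the next round.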
Retention time (RT) prediction contributes to the identification of small molecules measured by high-performance liquid chromatography coupled with high-resolution mass spectrometry. Deep learning algorithms based on big data can enhance the accuracy of RT prediction. However, the RTs of compounds differ under different chromatographic conditions, and the number of compounds with known RTs is small in most cases. Therefore, knowledge learned from big data must be transferred. In this work, a strategy using a deep neural network (DNN) pretrained by weighted autoencoders and transfer learning (DNNpwa-TL) was proposed to efficiently predict the RTs of compounds. The loss function in the autoencoders was calculated with features weighted by mutual information, producing a DNN pretrained by weighted autoencoders (DNNpwa). For other specific chromatographic methods, transfer learning models (DNNpwa-TL) were built by fine-tuning the DNNpwa with a small number of compounds with known RTs. With this strategy, a DNNpwa was first built on the METLIN small molecule retention time data set containing 80 038 small molecule compounds, achieving a median relative error of 3.1% and a mean relative error of 4.9%. Then, 17 data sets from different chromatographic methods were studied, and the results showed that DNNpwa-TL performed better than other deep learning models. In addition, DNNpwa-TL outperformed random forest, gradient boosting, least absolute shrinkage and selection operator regression, and DNN on most of the 17 data sets. Therefore, DNNpwa-TL provides an efficient method for RT prediction of small molecule compounds across different chromatographic methods and conditions.
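Two quantities in the abstract above lend themselves to a short illustration: a reconstruction loss in which features are weighted, and the median and mean relative errors used to report accuracy. The abstract does not give the exact form of the weighted loss, so the normalized weighted squared error below is an assumption, as are the function names. The weights would come from a mutual-information estimate that is not computed here.

```python
from statistics import mean, median


def weighted_reconstruction_loss(x, x_hat, weights):
    """Weighted squared reconstruction error for one sample.

    Assumption: each feature's squared error is scaled by its weight
    (for example a mutual-information score) and normalized by the
    weight sum; the source does not state the exact formula.
    """
    total = sum(weights)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, x_hat)) / total


def relative_errors(rt_true, rt_pred):
    """Per-compound relative error |predicted - measured| / measured."""
    return [abs(p - t) / t for t, p in zip(rt_true, rt_pred)]


def summarize_errors(rt_true, rt_pred):
    """Return (median relative error, mean relative error) as fractions."""
    errs = relative_errors(rt_true, rt_pred)
    return median(errs), mean(errs)
```

Weighting the loss in this way makes reconstruction mistakes on informative features cost more than mistakes on uninformative ones, which is one plausible reading of why the pretraining step uses weights.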
Ran Liu¹, Xiaohui Lin¹, Zuojing Li², Qing Li¹ and Kaishun Bi¹. ¹School of Pharmacy, Shenyang Pharmaceutical University, Shenyang, 110016, China. ²School of Medical Devices, Shenyang Pharmaceutical University, Shenyang, 110016, China. Correspondence to: Kaishun Bi, email: kaishunbi.syphu@gmail.com. Keywords: polyamine; colorectal cancer; lasso regression analysis; quantitative metabolomics. Received: October 12, 2017. Accepted: November 15, 2017. Published: December 04, 2017. ABSTRACT: As important biomarkers for cancer, polyamine levels in body fluids could be employed for monitoring colorectal cancer (CRC); however, the role of polyamines in the development and treatment phases of CRC remains uncertain. In this paper, the relationship between polyamines and CRC development and treatment was investigated by studying changes in plasma polyamine levels during the precancerous, developmental, and treatment phases of CRC. After CRC was induced in Wistar rats by intraperitoneal injection of 1,2-dimethylhydrazine, the animals were given a traditional Chinese medicine, Aidi injection. First, the polyamine levels in the plasma of CRC, healthy, and medicated rats were measured by a UHPLC-MS/MS assay. In addition, lasso regression analysis was used to screen and confirm the key markers that can distinguish healthy from CRC rats and CRC from medicated rats. The results showed that polyamine metabolism was disrupted by CRC but returned to normal levels following Aidi injections and, in particular, that putrescine and agmatine were closely correlated with CRC. Our results demonstrate the potential value of plasma polyamine metabolic profiling in the early diagnosis and medical treatment of CRC. The integrated method of targeted polyamine metabolite analysis and lasso regression analysis can also be applied in metabolomics to identify differential metabolites.
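The marker-screening role of lasso regression mentioned above comes from its L1 penalty, which shrinks the coefficients of uninformative variables to exactly zero. The sketch below shows this with a plain coordinate-descent solver for the least-squares lasso. The abstract does not state which lasso variant or software was used, so this linear formulation, the absence of an intercept (features and response assumed centered), and the function names are all assumptions.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    Assumptions: X is a list of rows, no intercept is fitted, and a
    fixed number of sweeps stands in for a convergence check.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for row, target in zip(X, y):
                others = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - others)
                z += row[j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta
```

Variables whose coefficients survive at a chosen penalty strength are the candidate markers. Choosing that strength (for example by cross-validation) is outside the scope of this sketch.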