Background: Methods based on comparisons of within-sample relative expression orderings (REOs) have been proposed for identifying differentially expressed genes (DEGs) at the individual level and for detecting disease-associated genes in one-phenotype disease data by reusing data of normal samples from other sources. However, it is still unclear whether common potential confounding factors, including age, cigarette smoking, sex and race, affect the REOs of gene pairs. Methods: For each confounding factor, we evaluated its effect on the REOs of gene pairs in the normal lung tissue transcriptome based on the numbers of related gene pairs or DEGs. Results: Our results showed that age has little effect on REOs within lung tissues. We found that about 0.23% of the significantly stable REOs of gene pairs in non-smokers' lung tissues are reversed in smokers' lung tissues. These reversals are attributable to 344 DEGs between the two groups of samples (RankCompV2, FDR < 0.05), which are enriched in metabolism of xenobiotics by cytochrome P450, glutathione metabolism and other pathways (hypergeometric test, FDR < 0.05). Comparison between the normal lung tissue samples of males and females revealed fewer reversed REOs, attributable to 24 DEGs between the sex groups. Of these DEGs, 19 are located on sex chromosomes, and 5, involved in spermatogenesis and oocyte regulation, are located on autosomes. Between the normal lung tissue samples of white and black people, we identified 22 DEGs (RankCompV2, FDR < 0.05), which introduced a few reversed REOs between the two races. Conclusions: In summary, REO-based studies should take into account the confounding factors of cigarette smoking, sex and race.
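The reversal statistic above (the share of stable REOs in one group that are reversed in another) can be illustrated with a small sketch. The sketch makes two simplifying assumptions:

* An REO counts as "significantly stable" in a group when a one-sided binomial test (p = 0.5) on the number of samples with one gene above the other gives p < 0.05.
* A stable REO counts as reversed when the opposite ordering is stable in the other group.

The names `stable_reos` and `reversal_fraction` are hypothetical. The study's own criteria, for example FDR control across all gene pairs, may differ.

```python
import math
from itertools import combinations


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def stable_reos(expr: dict[str, list[float]], alpha: float = 0.05) -> set[tuple[str, str]]:
    """Return ordered pairs (hi, lo) where gene hi is stably expressed above gene lo.

    expr maps each gene to its expression values across the samples of one group.
    Tied samples are ignored for that pair; no multiple-testing correction is
    applied (a simplifying assumption of this sketch).
    """
    stable = set()
    for a, b in combinations(sorted(expr), 2):
        a_gt_b = sum(x > y for x, y in zip(expr[a], expr[b]))
        b_gt_a = sum(y > x for x, y in zip(expr[a], expr[b]))
        n = a_gt_b + b_gt_a
        if n == 0:
            continue
        if binom_upper_tail(a_gt_b, n) < alpha:
            stable.add((a, b))
        elif binom_upper_tail(b_gt_a, n) < alpha:
            stable.add((b, a))
    return stable


def reversal_fraction(reference: set[tuple[str, str]], other: set[tuple[str, str]]) -> float:
    """Fraction of stable REOs in `reference` whose opposite ordering is stable in `other`."""
    if not reference:
        return 0.0
    reversed_pairs = [(hi, lo) for hi, lo in reference if (lo, hi) in other]
    return len(reversed_pairs) / len(reference)


# Toy data: A > B > C in every "non-smoker" sample; A and B swap in "smokers".
non_smokers = {"A": [10 + i for i in range(10)], "B": [5 + i for i in range(10)], "C": list(range(10))}
smokers = {"A": [5 + i for i in range(10)], "B": [10 + i for i in range(10)], "C": list(range(10))}

ref = stable_reos(non_smokers)
print(sorted(ref))                                   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(reversal_fraction(ref, stable_reos(smokers)))  # 0.3333333333333333
```

With real data the fraction is taken over all significantly stable pairs genome-wide. The toy reversal of one pair in three is only for illustration.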
The Connectivity Map (CMAP) database, an important public data source for drug repositioning, archives gene expression profiles of cancer cell lines treated with and without bioactive small molecules. However, there are only one or two technical replicates for each cell line under each treatment condition. For such small-scale data, current fold-change-based methods lack statistical control when identifying differentially expressed genes (DEGs) in treated cells. In particular, one-to-one comparisons may yield many drug-irrelevant DEGs because of random experimental factors. To tackle this problem, CMAP adopts a pattern-matching strategy to build a “connection” between disease signatures and gene expression changes associated with drug treatments. However, many drug-irrelevant genes may blur this “connection” if all genes are used instead of pre-selected DEGs induced by drug treatments. We applied OneComp, a customized version of RankComp, to identify DEGs in such small-scale cell line datasets. For each cell line, gene pairs with stable relative expression orderings (REOs) were identified in a large collection of control cell samples measured in different experiments, and these pairs formed the background stable REOs. When OneComp was applied to a small-scale cell line dataset, the background stable REOs were customized by filtering out gene pairs whose REOs were reversed in the control samples of the analyzed dataset. In simulated data, the consistency scores of the DEGs identified by both OneComp and SAM were all higher than 99%, while the consistency score of the DEGs identified solely by OneComp was 96.85%, as evaluated by the observed expression difference method.
The usefulness of OneComp in drug repositioning was exemplified by identifying phenformin- and metformin-related genes in small-scale cell line datasets. These genes helped to support both drugs as potential anti-tumor drugs for non-small-cell lung carcinoma, whereas the pattern-matching strategy adopted by CMAP missed these two connections. The implementation of OneComp is available at https://github.com/pathint/reoa. OneComp performed well on both simulated and real data. It is useful in drug repositioning studies for finding hidden “connections” between drugs and diseases.
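The background-REO customization step described for OneComp can be sketched as follows. This is an illustrative sketch only, not the code at https://github.com/pathint/reoa, and the function names are hypothetical. It assumes that a background REO is "stable" when one ordering holds in at least 99% of the pooled control samples; this threshold is a hypothetical choice. The last helper lists the customized pairs that are reversed in one treated sample. Those pairs are the raw material a RankComp-style test would then evaluate.

```python
from itertools import combinations

Sample = dict[str, float]  # gene -> expression value in one sample


def background_stable_reos(controls: list[Sample], min_freq: float = 0.99) -> set[tuple[str, str]]:
    """Pairs (hi, lo) with hi > lo in at least `min_freq` of the pooled control samples."""
    n = len(controls)
    stable = set()
    for a, b in combinations(sorted(controls[0]), 2):
        a_gt_b = sum(s[a] > s[b] for s in controls)
        b_gt_a = sum(s[b] > s[a] for s in controls)
        if a_gt_b / n >= min_freq:
            stable.add((a, b))
        elif b_gt_a / n >= min_freq:
            stable.add((b, a))
    return stable


def customize(background: set[tuple[str, str]], local_controls: list[Sample]) -> set[tuple[str, str]]:
    """Drop background pairs whose ordering is reversed in any control of the analysed dataset."""
    return {(hi, lo) for hi, lo in background if not any(s[lo] > s[hi] for s in local_controls)}


def reversed_in(pairs: set[tuple[str, str]], treated: Sample) -> set[tuple[str, str]]:
    """Customized stable pairs whose ordering is reversed in one treated sample."""
    return {(hi, lo) for hi, lo in pairs if treated[lo] > treated[hi]}


pooled = [{"G1": 9.0, "G2": 5.0, "G3": 1.0} for _ in range(100)]
local = [{"G1": 9.0, "G2": 1.0, "G3": 5.0}]  # G2/G3 ordering is reversed in the local control
pairs = customize(background_stable_reos(pooled), local)
print(sorted(pairs))                                          # [('G1', 'G2'), ('G1', 'G3')]
print(reversed_in(pairs, {"G1": 2.0, "G2": 5.0, "G3": 1.0}))  # {('G1', 'G2')}
```

Filtering against the local controls removes pairs that would otherwise look "reversed" in treated cells only because of dataset-specific effects. The remaining reversals are therefore more likely to reflect the treatment.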
Owing to the remarkable heterogeneity of gastric cancer (GC), population-level differentially expressed genes (DEGs) identified by case-control comparison cannot indicate how frequently each DEG is dysregulated across individual GC patients. In this work, we first used the RankComp method to identify individual-level DEGs for 1,090 GC tissues without paired normal tissues. Second, we identified individual-level DEGs in 448 GC tissues with paired normal tissues by directly comparing gene expression in each cancer tissue with that in its paired normal tissue. We found 25 DEGs that were dysregulated in more than 90% of the 1,090 GC tissues and also in more than 90% of the 448 GC tissues with paired normal tissues. These 25 genes were defined as universal DEGs for GC. We then validated them further by RNA-seq of 24 paired cancer-normal gastric tissues. Among the universal DEGs, 4 upregulated genes (BGN, E2F3, PLAU, and SPP1) and 1 downregulated gene (UBL3) are cancer genes already documented in the COSMIC or F-Census databases. Protein-protein interaction network analysis showed that the 284 direct neighbors of the 12 universally upregulated genes were significantly enriched with cancer genes and key cancer-related biological pathways, such as the MAPK signaling pathway, cell cycle, and focal adhesion. The 13 universally downregulated genes and their 16 direct neighbor genes were also significantly enriched with cancer genes and with pathways related to gastric acid secretion. These universal DEGs may be of special importance as diagnostic markers and treatment targets for GC, and they may facilitate studies of the molecular mechanisms underlying GC.
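The selection rule for universal DEGs is dysregulation in more than 90% of samples in both cohorts. That rule reduces to a frequency count over per-sample DEG calls. The sketch below assumes each sample's individual-level calls (from RankComp or from paired comparison) are already available as gene-to-direction mappings. It counts up- and down-regulation separately. The function names and the toy gene labels are hypothetical.

```python
from collections import Counter

Calls = dict[str, str]  # gene -> "up" or "down" in one sample; absent genes are not DEGs


def dysregulation_frequency(cohort: list[Calls]) -> dict[tuple[str, str], float]:
    """Fraction of samples in which each (gene, direction) call occurs."""
    counts = Counter((gene, direction) for calls in cohort for gene, direction in calls.items())
    return {key: c / len(cohort) for key, c in counts.items()}


def universal_degs(cohort_a: list[Calls], cohort_b: list[Calls],
                   threshold: float = 0.9) -> set[tuple[str, str]]:
    """(gene, direction) pairs dysregulated in more than `threshold` of samples in both cohorts."""
    freq_a = dysregulation_frequency(cohort_a)
    freq_b = dysregulation_frequency(cohort_b)
    return {key for key, f in freq_a.items() if f > threshold and freq_b.get(key, 0.0) > threshold}


# Toy cohorts with hypothetical genes.
cohort_a = [{"GENE_A": "up", "GENE_B": "down"} for _ in range(10)]
cohort_b = [{"GENE_A": "up", "GENE_B": "down"} for _ in range(9)] + [{"GENE_A": "up"}]
print(universal_degs(cohort_a, cohort_b))  # {('GENE_A', 'up')}: GENE_B reaches only 90% in cohort_b
```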
The heterogeneity of cancer is a major obstacle to cancer diagnosis and treatment. Prioritizing combinations of driver genes that are mutated in most patients with a specific cancer, or with a subtype of that cancer, is a promising way to tackle this problem. Here, we developed an empirical algorithm, named PathMG, to identify common and subtype-specific mutated sub-pathways for a cancer. By analyzing mutation data from 408 lung cancer samples (Lung-data1), we identified three sub-pathways, each covering at least 90% of samples, as the common sub-pathways of lung cancer. These sub-pathways were enriched with mutated cancer genes and drug targets and were validated in two independent datasets (Lung-data2 and Lung-data3). In particular, we applied PathMG to the two major subtypes of lung cancer, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LSCC). This analysis identified 13 subtype-specific sub-pathways with a mutation frequency difference of at least 0.25 between LUAD and LSCC samples in Lung-data1, and 12 of them were reproducible in Lung-data2 and Lung-data3. Similar analyses were performed for colorectal cancer. Together, PathMG provides a novel tool for identifying potential common and subtype-specific sub-pathways for a cancer, which can provide candidates for cancer diagnosis and sub-pathway-targeted treatments.
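The PathMG summary states two selection criteria: coverage of at least 90% of samples, and a subtype mutation frequency difference of at least 0.25. A short sketch can make both concrete. It assumes that a sub-pathway "covers" a sample when at least one of its genes is mutated in that sample. It only scores a given sub-pathway and does not reproduce PathMG's search procedure. Function names and genes are hypothetical.

```python
def coverage(subpathway: set[str], samples: list[set[str]]) -> float:
    """Fraction of samples with at least one mutated gene in the sub-pathway."""
    if not samples:
        return 0.0
    return sum(1 for mutated in samples if mutated & subpathway) / len(samples)


def is_common(subpathway: set[str], samples: list[set[str]], min_coverage: float = 0.9) -> bool:
    """Common sub-pathway: mutated in at least `min_coverage` of samples."""
    return coverage(subpathway, samples) >= min_coverage


def is_subtype_specific(subpathway: set[str], subtype_a: list[set[str]],
                        subtype_b: list[set[str]], min_diff: float = 0.25) -> bool:
    """Subtype-specific: mutation frequencies differ by at least `min_diff`."""
    return abs(coverage(subpathway, subtype_a) - coverage(subpathway, subtype_b)) >= min_diff


sub = {"G1", "G2"}
luad = [{"G1"}, {"G2"}, {"G3"}, {"G1", "G3"}]  # covered: 3 of 4 samples
lscc = [{"G3"}, {"G4"}, {"G1"}, {"G5"}]        # covered: 1 of 4 samples
print(coverage(sub, luad), coverage(sub, lscc))                    # 0.75 0.25
print(is_common(sub, luad), is_subtype_specific(sub, luad, lscc))  # False True
```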
Identifying differentially expressed genes (DEGs) between two phenotypes is a basic task in high-throughput gene expression profiling studies. However, weak differential expression signals between two phenotypes are hard to detect with limited sample sizes. To solve this problem, many researchers have combined multiple independent datasets using meta-analysis or batch effect adjustment algorithms. However, as demonstrated in this study, these algorithms may distort true biological differences between two phenotypes and introduce unacceptably high rates of false DEGs. These problems pose critical obstacles to analyzing the transcriptional data in The Cancer Genome Atlas, which contains many small-scale batches of data. We previously developed RankComp to detect DEGs for individual disease samples by exploiting the reversed relative expression orderings (REOs) of gene pairs between two phenotypes. Here, we further improved it to identify DEGs using multiple independent datasets. We demonstrated that the improved RankComp can directly analyze integrated cross-site data to detect DEGs between two phenotypes without the need for batch effect adjustment. Its usage was illustrated by detecting weak differential expression signals in breast cancer drug-response data combined from multiple experiments.
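The REO-based idea behind RankComp can be illustrated for a single gene in a single disease sample. The sketch below is a simplified, non-iterative version written for illustration and is not the published implementation. It uses a two-sided Fisher's exact test to compare two counts:

* how often the gene lies above or below its stable partners in normal tissue;
* how often it lies above or below the same partners in the disease sample.

The function names are hypothetical. The test is implemented with the standard library to keep the example self-contained.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, r1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return math.comb(c1, x) * math.comb(n - c1, r1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, r1 - (n - c1)), min(r1, c1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def reo_shift_test(gene: str, stable_pairs: set[tuple[str, str]],
                   sample: dict[str, float]) -> tuple[float, str]:
    """Test whether `gene` shifted relative to its stable REO partners in one disease sample.

    stable_pairs holds (hi, lo) pairs that are stable in normal tissue.
    Returns (p-value, direction), where direction is "up", "down" or "none".
    """
    above = [lo for hi, lo in stable_pairs if hi == gene]  # gene > partner in normal
    below = [hi for hi, lo in stable_pairs if lo == gene]  # gene < partner in normal
    partners = above + below
    if not partners:
        return 1.0, "none"
    disease_gt = sum(sample[gene] > sample[partner] for partner in partners)
    disease_lt = sum(sample[gene] < sample[partner] for partner in partners)  # ties ignored
    p = fisher_exact_two_sided(len(above), len(below), disease_gt, disease_lt)
    normal_frac = len(above) / len(partners)
    total = disease_gt + disease_lt
    disease_frac = disease_gt / total if total else normal_frac
    direction = "up" if disease_frac > normal_frac else "down" if disease_frac < normal_frac else "none"
    return p, direction


pairs = {("G", f"X{i}") for i in range(10)} | {(f"Y{i}", "G") for i in range(10)}
shifted = {"G": 100.0, **{f"X{i}": float(i) for i in range(10)}, **{f"Y{i}": 50.0 + i for i in range(10)}}
unchanged = {**shifted, "G": 25.0}
print(reo_shift_test("G", pairs, shifted))    # small p-value, direction "up"
print(reo_shift_test("G", pairs, unchanged))  # p-value close to 1, direction "none"
```

The published method also refines the stable-pair set iteratively and controls the FDR across genes. This sketch omits both steps.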
Effective compound combination (ECC; i.e., 20-S-ginsenoside Rh1, astragaloside, icariin, nobiletin, and paeonol), derived from Chinese herbal medicine, significantly ameliorates chronic obstructive pulmonary disease (COPD) in rats; however, the underlying mechanisms of ECC remain largely unclear. In this study, network pharmacology analysis integrated with experimental validation was used to explore the therapeutic mechanisms of ECC against COPD. ECC targets, COPD genes, and COPD targets were identified from multiple databases and then used to analyze protein–protein interaction (PPI) networks, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and biological functions. BisoGenet was used to comprehensively analyze the hub network. We validated the therapeutic effect and mechanisms of ECC both in vivo and in vitro. We identified 45 ECC targets, which were mainly related to inflammatory processes, such as the NOD-like and NF-kappa B signaling pathways, hematopoietic cell lineage, Th17 cell differentiation, cellular response to lipopolysaccharide, and interleukin-8 secretion. In addition, 1180 COPD genes and 70 COPD targets were identified as being involved in biological functions associated with COPD development, such as cytokine–cytokine receptor interaction, the TNF signaling pathway, the mitogen-activated protein kinase (MAPK) signaling pathway, regulation of lymphocyte proliferation, and positive regulation of leukocyte migration. Integrative analysis of the COPD gene, COPD target, and ECC target networks revealed 54 genes mainly involved in inflammatory processes, such as IL-17 signaling, NF-kappa B signaling, innate immune response–activating signal transduction, and macrophage cell differentiation. Six targets in the hub network (AR, ESR1, HNRNPA1, PARP1, TP53, and VCAM1) and their four related compounds were identified as the key molecules underlying the effects of ECC.
Molecular docking demonstrated that the four compounds could bind to the six targets that interact with COPD genes. Finally, in vivo and in vitro experiments verified two effects of ECC. In rats, ECC treatment ameliorated COPD symptoms by improving lung function, reducing pathological changes, and suppressing oxidative responses and pro-inflammatory cytokine secretion. In vitro, ECC inhibited inflammation in LPS-induced macrophages. These effects may be associated with the regulation of NF-kappa B and MAPK signaling. This study demonstrates that ECC exerts therapeutic effects on COPD by regulating the underlying inflammatory process.
Until recently, few prognostic signatures for colorectal cancer (CRC) patients receiving 5-fluorouracil (5-FU)-based chemotherapy could be used in clinical practice. Here, we developed a prognostic signature for stage II-III CRC patients receiving 5-FU-based chemotherapy, using transcriptional profiles from a panel of cancer cell lines and three cohorts of CRC patients. The signature is based on the within-sample relative expression orderings (REOs) of six gene pairs. This REO-based signature has the unique advantage of being insensitive to experimental batch effects and free of the often impractical requirement for data normalization. Using the signature, we stratified 184 CRC samples with multi-omics data from The Cancer Genome Atlas into two prognostic groups. Patients with high recurrence risk were characterized by frequent gene copy number aberrations that reduce 5-FU efficacy. They also carried DNA methylation aberrations that induce distinct transcriptional alterations conferring 5-FU resistance. In contrast, patients with low recurrence risk exhibited deficient mismatch repair and carried frequent gene mutations that suppress cell adhesion. These results reveal the multi-omics landscapes that determine the prognoses of stage II-III CRC patients receiving 5-FU-based chemotherapy.
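The batch-effect insensitivity claimed for the REO-based signature has a simple basis. A within-sample ordering is unchanged by any strictly increasing transformation of that sample's values. The sketch below uses six hypothetical gene pairs and a hypothetical majority-vote rule: a sample is high risk when at least four of the six pairs show the risk-associated ordering. The actual genes and decision rule of the published signature are not given here.

```python
import math

# Hypothetical signature: (a, b) means "a expressed above b" counts as a high-risk vote.
SIGNATURE = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4"), ("A5", "B5"), ("A6", "B6")]


def risk_group(sample: dict[str, float], signature=SIGNATURE, min_votes: int = 4) -> str:
    """Classify one sample using only within-sample gene-pair orderings."""
    votes = sum(sample[a] > sample[b] for a, b in signature)
    return "high" if votes >= min_votes else "low"


sample = {"A1": 8, "B1": 2, "A2": 9, "B2": 3, "A3": 7, "B3": 1,
          "A4": 6, "B4": 4, "A5": 5, "B5": 2, "A6": 1, "B6": 10}
# Mimic a batch effect or a different normalization with a strictly increasing transform.
shifted = {g: 1.7 * math.log2(v) + 3.0 for g, v in sample.items()}
print(risk_group(sample), risk_group(shifted))  # high high
```

Because only comparisons within one sample are used, no reference cohort or cross-sample normalization is needed to classify a new patient.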
Chronic obstructive pulmonary disease (COPD) is a common respiratory disease with high morbidity and mortality. The etiology of COPD is complex, and its pathogenic mechanisms remain unclear. In this study, we used rat and human COPD gene expression data from our laboratory and the Gene Expression Omnibus (GEO) database to identify differentially expressed genes (DEGs) between individuals with COPD and healthy individuals. We then constructed protein-protein interaction (PPI) networks and identified hub genes. Cytoscape was used to construct the co-expression network and competing endogenous RNA (ceRNA) networks. A total of 198 DEGs were identified, and a PPI network with 144 nodes and 355 edges was constructed. Twelve hub genes were identified with the cytoHubba plugin in Cytoscape. Of these genes, CCR3, CCL2, COL4A2, VWF, IL1RN, IL2RA, and CCL13 were related to inflammation or immunity or showed tissue-specific expression in the lung, and their messenger RNA (mRNA) levels were validated by qRT-PCR. COL4A2, VWF, and IL1RN were further verified in the GEO dataset GSE76925, and the ceRNA network was constructed with Cytoscape. The dysregulation of these three genes was consistent with that observed in the COPD rat model compared with controls. The direction of their dysregulation was reversed when the COPD rat model was treated with effective-component compatibility of Bufei Yishen formula III. This bioinformatics analysis strategy may be useful for elucidating novel mechanisms underlying COPD. We pinpointed three key genes that may play a role in COPD pathogenesis and therapy and that deserve further study.
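cytoHubba offers several topological ranking methods, and the summary does not state which one was used. As a hedged illustration only, the sketch below ranks the nodes of an undirected PPI edge list by degree, one of the simplest of those methods. The edge list is toy data, and the function name is hypothetical.

```python
from collections import Counter


def top_hubs_by_degree(edges: list[tuple[str, str]], k: int) -> list[str]:
    """Return the k nodes with the highest degree (ties broken alphabetically)."""
    degree = Counter()
    # Deduplicate undirected edges (A-B equals B-A) and skip self-loops.
    for u, v in {tuple(sorted(e)) for e in edges if e[0] != e[1]}:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree, key=lambda node: (-degree[node], node))[:k]


edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("B", "A")]
print(top_hubs_by_degree(edges, 3))  # ['H', 'A', 'B']
```

Other cytoHubba rankings, such as MCC, weight local network structure differently and can select a different hub set from the same network.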
Background: Methods based on comparisons of within-sample relative expression orderings (REOs) can be applied to various medical problems, such as individualized cancer diagnosis and subtype identification. They can also be used to identify differentially expressed genes (DEGs) at the individual level and to detect disease-associated genes in one-phenotype disease data by reusing data of normal samples from other sources. However, it is still unclear whether common potential confounding factors, including age, cigarette smoking, sex and race, affect the REOs of gene pairs. Here, we evaluated the effects of these confounding factors on the REOs of gene pairs in the normal lung tissue transcriptome. Results: For each confounding factor, its effect on REOs was evaluated based on the number of related gene pairs or DEGs. Our results showed that age has little effect on REOs within lung tissues. We found that about 0.23% of the significantly stable REOs of gene pairs in non-smokers' lung tissues are reversed in smokers' lung tissues. These reversals are attributable to 344 DEGs between the two groups of samples (RankCompV2, FDR < 0.05), which are enriched in metabolism of xenobiotics by cytochrome P450, glutathione metabolism and other pathways (hypergeometric test, FDR < 0.05). Comparison between the normal lung tissue samples of males and females revealed fewer reversed REOs, attributable to 24 DEGs between the sex groups. Of these DEGs, 19 are located on sex chromosomes, and 5, involved in spermatogenesis and oocyte regulation, are located on autosomes. Between the normal lung tissue samples of white and black people, we identified 22 DEGs (RankCompV2, FDR < 0.05), which introduced a few reversed REOs between the two races. Conclusions: In summary, REO-based studies should consider the confounding factors of cigarette smoking, sex and race.