Background: Aberrant protein glycosylation is a common feature of cancer and contributes to malignant behavior. However, how and to what extent the cellular glycome is involved in cancer development and progression is still undefined. The primary objective of this study is the in silico identification of glycome genes that could reveal a signature of cancer, using expression profiles of cancer genomes. A list of ~500 glycome genes in several molecular categories already exists. This study is based on the hypothesis that, if aberrant glycosylation is a common feature of cancer, there exists a shortlist of cancer glycome genes whose expression profiles carry a signature capable of differentiating the 33 cancer types available in The Cancer Genome Atlas (TCGA). Method: The distribution of cancer samples in TCGA is highly imbalanced, ranging from 36 samples for Cholangiocarcinoma (CHOL) to 1089 for Breast Cancer (BRCA). Supervised feature selection approaches for identifying the signature genes would therefore be biased toward the larger groups. We developed a computational framework using the concrete autoencoder (CAE), a deep learning-based unsupervised feature selection algorithm, to find the cancer-related glycome genes. The criteria for an optimal feature subset in this study are that (a) the number of features should be as small as possible, and (b) the accuracy of classification using the selected features should be > 90%. Results: Our experiment yielded a shortlist of 132 glycome genes that can differentiate the 33 cancer types with an accuracy of 92%. This result suggests that the cancer glycome genes signify the origins of cancer.
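As a rough illustration of how a concrete autoencoder selects actual input genes rather than latent combinations, the sketch below samples the relaxed one-hot weights of a selector layer using the standard Concrete (Gumbel-softmax) formulation. It is a minimal sketch, not the study's implementation: the expression values, the logits, the temperatures, and all function names are hypothetical, and the decoder and training loop are omitted.

```python
import math
import random

def sample_concrete(log_alpha, temperature, rng):
    """Draw one relaxed one-hot vector from a Concrete (Gumbel-softmax) distribution."""
    noisy = []
    for la in log_alpha:
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1), clamped away from 0 and 1
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((la + gumbel) / temperature)
    top = max(noisy)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def select_features(x, selector_log_alphas, temperature, rng):
    """Training-time selector layer: each output is a weighted mix of the input genes."""
    outputs = []
    for log_alpha in selector_log_alphas:
        weights = sample_concrete(log_alpha, temperature, rng)
        outputs.append(sum(w * xi for w, xi in zip(weights, x)))
    return outputs

def chosen_genes(selector_log_alphas):
    """After training: each selector node keeps the single gene with the largest logit."""
    return [max(range(len(la)), key=la.__getitem__) for la in selector_log_alphas]

# Hypothetical toy data: 5 genes, 2 selector nodes (assumed values, not from the study).
rng = random.Random(0)
expression = [2.0, 7.5, 0.3, 4.1, 9.9]
log_alphas = [[0.1, 3.0, 0.0, 0.2, 0.1], [0.0, 0.1, 0.2, 0.1, 2.5]]
warm = sample_concrete(log_alphas[0], 10.0, rng)   # high temperature: diffuse weights
cold = sample_concrete(log_alphas[0], 0.01, rng)   # low temperature: close to one-hot
picked = chosen_genes(log_alphas)
```

During training the temperature is typically annealed from a high value toward a low one, so each selector node gradually commits to a single gene; the indices returned by `chosen_genes` would then be passed to a downstream classifier.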
Protein networks that mirror the transitions between disease stages hold the key to early diagnosis and make it easier to understand the essential mechanisms of disease progression at the protein network level. However, identifying the critical transitions between disease stages, and the corresponding protein networks, during the initiation and progression of a complex disease such as cancer is a challenging task. This preliminary work identifies possible building blocks for disease initiation and progression at the protein network level, based on the biological rationale that a group of proteins is localized at a specific subcellular location to accomplish a function, which may be beneficial to the human body or harmful enough to cause a disease. We found that three graph-theoretic concepts, (i) clique-like structures, (ii) bipartite-like structures, and (iii) diffusion kernels, could serve as building blocks for disease progression at the protein network level. Using these building blocks, disease progression can be modeled as an event-schedule-like structure: each disease stage corresponds to an event, and each event is completed by a set of proteins that form a clique-like structure. Once an event or disease stage is completed by one group of proteins, disease signals pass to the next group of proteins to cause the next event or disease stage, and so on. The transfer of signals can be represented by a bipartite-like structure, and diffusion kernels can be used to measure the strength of the disease signals. Further study is required to fully explore the application of these building blocks to the analysis of disease progression.
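The three building blocks can be made concrete on a toy network. The sketch below is an illustration under stated assumptions, not the authors' method: the protein names P1 to P6 and the edges are hypothetical, the diffusion kernel is taken in its common form K = exp(-beta * L) with L the graph Laplacian, and the matrix exponential is approximated by a truncated Taylor series that is only suitable for very small graphs.

```python
def laplacian(nodes, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def diffusion_kernel(lap, beta, terms=40):
    """Approximate K = exp(-beta * L) with a truncated Taylor series."""
    n = len(lap)
    scaled = [[-beta * lap[i][j] for j in range(n)] for i in range(n)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # term becomes (-beta * L)^k / k!
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel

def is_clique(group, edges):
    """True if every pair of proteins in the group is directly connected."""
    linked = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in linked
               for i, u in enumerate(group) for v in group[i + 1:])

def cross_edges(group_a, group_b, edges):
    """Edges between two groups: the bipartite-like hand-over between stages."""
    a, b = set(group_a), set(group_b)
    return [(u, v) for u, v in edges if (u in a and v in b) or (u in b and v in a)]

# Hypothetical two-stage network: one clique per stage plus two hand-over edges.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
stage1, stage2 = ["P1", "P2", "P3"], ["P4", "P5", "P6"]
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("P4", "P5"), ("P4", "P6"), ("P5", "P6"),
         ("P3", "P4"), ("P2", "P5")]
K = diffusion_kernel(laplacian(proteins, edges), beta=0.5)
bridge = cross_edges(stage1, stage2, edges)
```

In this toy network the kernel value between P1 and P2, which sit in the same clique, is larger than the value between P1 and P6, which sit in different stages. This is the sense in which a diffusion kernel can quantify how strongly a signal spreads from one group of proteins to the next.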
The architecture, engineering, and construction (AEC) industry has a diverse workforce involving multi-stakeholder participation. Building information modelling (BIM) technology has emerged as a cutting-edge platform for collaboration throughout the construction life-cycle. However, many barriers, such as the high cost of implementation, inefficient data interoperability, and a lack of skilled personnel, have made the adoption of BIM technology slower than anticipated. One hidden and overlooked problem is that current BIM technology can only offer its functionality in a strictly uniform manner, regardless of the workforce's knowledge levels, information needs, and level of development (LOD) requirements. To address this gap, we propose a new intelligent BIM companion (iBCom) technology that can assist a diverse workforce to actively seek, efficiently access, and use BIM information through interactive and context-sensitive natural language conversations. The ability to efficiently and accurately retrieve information from BIM models based on a user's information needs is critical to supporting the iBCom technology. This paper thus proposes a novel network-based BIM information extraction approach that automatically extracts building element information from industry foundation classes (IFC) files. The proposed methodology is based on graph theory: it first creates building element networks among the principal (parent), associated (children), and terminal nodes of IFC data instances, and then extracts the information of a particular building element directly from the terminal nodes of its network. The proposed methodology achieved 100% precision and recall when extracting geometry information from two test case models.
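The network idea can be sketched on a few simplified STEP-style lines. This is a minimal sketch under assumptions, not the paper's implementation: the sample instances are hand-written and abbreviated, the regular expression handles only one instance per line, and "terminal node" is interpreted here as an instance that references no other instance.

```python
import re

# One instance per line, e.g. "#4=IFCCARTESIANPOINT((0.,0.,0.));" (simplified assumption).
IFC_LINE = re.compile(r"^#(\d+)\s*=\s*([A-Z0-9_]+)\((.*)\);$")

def parse_instances(step_text):
    """Map instance id -> (type name, raw argument string, referenced instance ids)."""
    instances = {}
    for raw in step_text.splitlines():
        match = IFC_LINE.match(raw.strip())
        if match:
            inst_id, ifc_type, args = match.groups()
            refs = [int(r) for r in re.findall(r"#(\d+)", args)]
            instances[int(inst_id)] = (ifc_type, args, refs)
    return instances

def terminal_nodes(instances, principal_id):
    """Depth-first walk from the principal node; collect instances with no references."""
    seen, stack, terminals = set(), [principal_id], []
    while stack:
        node = stack.pop()
        if node in seen or node not in instances:
            continue
        seen.add(node)
        ifc_type, args, refs = instances[node]
        if refs:
            stack.extend(reversed(refs))  # visit children in their written order
        else:
            terminals.append((node, ifc_type, args))
    return terminals

# Hypothetical, abbreviated wall definition (not taken from the test case models).
SAMPLE = """
#1=IFCWALL('wall-guid',$,'Wall-1',$,$,#2,#5,$);
#2=IFCLOCALPLACEMENT($,#3);
#3=IFCAXIS2PLACEMENT3D(#4,$,$);
#4=IFCCARTESIANPOINT((0.,0.,0.));
#5=IFCPRODUCTDEFINITIONSHAPE($,$,(#6));
#6=IFCSHAPEREPRESENTATION($,'Body','SweptSolid',(#7));
#7=IFCEXTRUDEDAREASOLID(#8,#3,#9,3000.);
#8=IFCRECTANGLEPROFILEDEF(.AREA.,$,$,5000.,200.);
#9=IFCDIRECTION((0.,0.,1.));
"""
instances = parse_instances(SAMPLE)
wall_terminals = terminal_nodes(instances, principal_id=1)
```

Here the wall (#1) is the principal node, the placement and shape instances it reaches are associated nodes, and the point, profile, and direction instances are terminal nodes that hold the raw geometry values. A production parser would also need to handle multi-line instances and `#` characters inside quoted strings.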
Background: In the United States, African American Males (AAM) have higher lung cancer incidence and mortality rates than European American Males (EAM). Cigarette smoking is considered the major risk factor for lung cancer, but smoking alone fails to explain the difference in lung cancer development between AAM and EAM. The higher rates of lung cancer among AAM occur even though they have lower smoking rates, smoke fewer cigarettes per day, and are less likely to be heavy smokers than EAM. Identifying genomic signatures, such as key genes that can differentiate lung cancers between AAM and EAM, will be a stepping stone toward understanding this disparity. Method: The gene expression profiles of whole blood samples from AAM and EAM patients were used to identify the key genes that can differentiate lung cancers between AAM and EAM. Because the US population is itself imbalanced between AAM and EAM, the distribution of samples in the present study is also highly imbalanced (AAM: 15 and EAM: 153). We developed a computational framework using a deep learning-based unsupervised feature selection approach, the concrete autoencoder (CAE), which selects actual features rather than latent features. First, we showed that features such as differentially expressed genes (DEGs) discovered by LIMMA, a supervised statistical approach, could not differentiate lung cancers between AAM and EAM. Then we showed that the CAE could isolate essential features capable of differentiating lung cancers between AAM and EAM. Results: The proposed framework using CAE detected 34 key features/genes, which outperform all sets of DEGs identified using three different thresholds on fold change. Using the selected 34 genes, the Random Forest classifier was able to classify lung cancers between AAM and EAM with 99% accuracy and only one false negative. Conclusion: The proposed framework using CAE reveals the key genes that can differentiate lung tumors between AAM and EAM. These key genes can be used as biomarkers to understand the difference in lung cancer development between AAM and EAM. This study also showed that the CAE is capable of extracting relevant features from a highly imbalanced dataset.
Two graph-theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together by forming a cluster or clique-like structure to initiate a cancer. After initiation, the disease signal passes to the next group of genes, related to the second stage of the cancer; this transfer can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among the genes between two disease stages. To test this hypothesis, gene expression values for three cancers, breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD), and glioblastoma multiforme (GBM), are used for analysis. First, a co-expression gene network is generated from highly correlated gene pairs, those with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated as essential for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists design wet lab experiments to further elucidate the complex mechanism of cancer.
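The first two steps, thresholding pairwise Pearson correlations and isolating cliques, can be sketched as follows. This is a minimal sketch with hypothetical gene names and made-up expression values; it assumes the threshold applies to the signed coefficient (r ≥ 0.9, not |r| ≥ 0.9) and uses the basic Bron-Kerbosch algorithm as one possible way to enumerate maximal cliques.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coexpression_network(expr, threshold=0.9):
    """Connect two genes when their correlation reaches the threshold."""
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Hypothetical expression values for five genes across five samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "G3": [1.1, 2.0, 2.9, 4.2, 5.0],
    "G4": [5.0, 3.0, 4.0, 1.0, 2.0],
    "G5": [5.1, 2.9, 4.2, 0.8, 2.1],
}
network = coexpression_network(expr, threshold=0.9)
cliques = maximal_cliques(network)
```

With these toy values the network splits into two cliques, {G1, G2, G3} and {G4, G5}. In a multi-stage setting, edges running between such cliques would form the bipartite part of a 2-clique-1-bipartite module.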
It is crucial to find prognostic biomarkers that can predict cancer prognosis and estimate risk, as they can be used in clinical settings to guide the treatment of patients. Probing the biomarkers themselves can reveal important insights into the cancer dynamics and molecular pathways underlying pathological behavior. To that end, this work proposes a bioinformatics framework that uses the deep learning-based feature selection method Concrete Autoencoder (CAE) to identify key genes and to build a prognostic score model that can assess the risk of cancer patients. Forty-eight gene pairs were identified to form a prognostic signature model that can significantly differentiate between high-risk and low-risk patients with breast cancer. This prognostic signature comprises 42 genes enriched in cancer-related pathways and molecular functions. The proposed framework and the prognostic model can be used as clinical tools to assess the risk levels of breast cancer patients. The identified genes can be studied further as potential targets for cancer therapy.
Identification of conserved gene network modules in different stages of cancer may help uncover the mechanisms behind cancer initiation and progression. This work is based on two hypotheses. Hypothesis-1: The network modules conserved in all cancer stages are potential biomarkers related to the trajectory of cancer development, that is, the progression of cancer from initiation through successive stages to metastasis. Hypothesis-2: The network modules from a stage that are not conserved in other stages can be considered stage-specific biomarkers for diagnosis. To test the hypotheses, gene expression and clinical data of Breast Invasive Carcinoma (BRCA) from The Cancer Genome Atlas (TCGA) were used for analysis. Gene expression data was divided into five groups: stage I to stage IV and normal tissue samples. First, the co-expression networks for each of the four stages and for the normal samples were generated. Second, the modules from each of the stage-specific networks were discovered using weighted gene co-expression network analysis (WGCNA). Third, survival analysis was performed to identify the prognostically significant modules. Fourth, module preservation analysis was performed to determine whether a module from one stage is preserved in the other cancer stages as well as in normal tissue. Finally, gene ontology and pathway enrichment analyses were performed for the prognostically significant and conserved modules. The present study discovered several gene network modules for breast cancer that are preserved in all cancer stages and are significant for overall survival; hence, they can be considered potential biomarkers related to the trajectory of cancer development. The modules that were found not to be conserved across stages can be considered stage-specific biomarkers.
Finding the biomarkers of cancers and analyzing the cancer-driving genes involved in these biomarkers are essential for understanding the dynamics of cancer. Gene expression profiling has been widely used in cancer research, and its patterns, combined with statistical and computational techniques, have been explored in many types of cancer. Genes with correlated expression may form complexes or pathways, or participate in regulatory and signaling circuits [1]-[3].