Detection of effective genes in colon cancer: A machine learning approach

2021 
Abstract Many types of cancer have become common, and they are the cause of death for many patients. Early detection and diagnosis can significantly improve patient survival and reduce treatment costs. Among cancers, colon cancer is the third leading cause of death in women and the second in men worldwide. Hence, many researchers have been trying to provide new methods for the early diagnosis of colon cancer. In this study, we apply statistical hypothesis tests such as the t-test and the Mann–Whitney–Wilcoxon test, and machine learning methods such as neural networks, k-nearest neighbors (KNN) and decision trees, to detect the genes that most strongly affect the vital status of colon cancer patients. We normalize the dataset using a new two-step method. In the first step, the genes within each sample (patient) are normalized to have zero mean and unit variance. In the second step, normalization is done for each gene across the whole dataset. Our analysis of the results shows that this normalization method is more efficient than the others and improves the overall performance of the study. Afterwards, we apply unsupervised learning methods to find meaningful structures in colon cancer gene expression data. To this end, the dimensionality of the dataset is reduced using Principal Component Analysis (PCA). Next, we cluster the patients according to the features extracted by PCA. We then check the labels produced by the unsupervised learning methods using different supervised learning algorithms. Finally, we determine the genes that have a major impact on colon cancer mortality in each cluster. Our study is the first to suggest that colon cancer patients can be categorized into two clusters. In each cluster, 20 effective genes were extracted, which can be important for the early diagnosis of colon cancer. Many of these genes have been identified for the first time.
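The two-step normalization and the PCA reduction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a samples-by-genes expression matrix, assumes the second step is also a zero-mean, unit-variance standardization (the abstract says only that each gene is normalized across the dataset), and uses synthetic data. The names `two_step_normalize`, `pca_project` and `expr` are hypothetical.

```python
import numpy as np


def two_step_normalize(X, eps=1e-12):
    """Two-step normalization of a (samples x genes) matrix.

    Step 1 standardizes the genes within each sample (row).
    Step 2 standardizes each gene (column) across all samples;
    using z-scores here is an assumption, not stated in the abstract.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: zero mean, unit variance within each sample (row-wise).
    row_mean = X.mean(axis=1, keepdims=True)
    row_std = X.std(axis=1, keepdims=True)
    X1 = (X - row_mean) / (row_std + eps)

    # Step 2: zero mean, unit variance for each gene (column-wise).
    col_mean = X1.mean(axis=0, keepdims=True)
    col_std = X1.std(axis=0, keepdims=True)
    return (X1 - col_mean) / (col_std + eps)


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Synthetic stand-in for an expression matrix: 30 patients, 200 genes.
rng = np.random.default_rng(0)
expr = rng.normal(loc=5.0, scale=2.0, size=(30, 200))

normalized = two_step_normalize(expr)
features = pca_project(normalized, n_components=2)
```

The resulting low-dimensional `features` would then be passed to a clustering algorithm. The abstract does not name the one used, so that step is left out of the sketch.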