Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences

2019 
Lung cancer is a heterogeneous disease characterized by uncontrolled cell growth, and it is a major cause of cancer-related deaths. Early diagnosis is important for treatment and patient survival. In this study, through statistical analysis of cancerous protein sequences, we identified the mutated genes associated with the etiology of lung cancer. Our analysis revealed that the most frequently mutated genes are TP53, EGFR, KMT2D, PDE4DIP, ATM, ZNF521, DICER1, CTNNB1, RUNX1T1, SMARCA4, FBXW7, NF1, PIK3CA, STK11, NTRK3, APC, PTPRB, BRCA2, MYH11 and AMER1. We observed that abnormal mutations in these genes contribute to variations in the composition of amino acid sequences. These variations were described in various feature spaces using statistical and physicochemical properties of amino acids. The resulting features provided sufficient discriminative power for the development of effective lung cancer classification models (LCCMs). The main advantage of the proposed approach is its effective use of the discriminant information of mutated genes. Experimental results showed that the SVM model performed best in the split amino acid composition feature space. The study thus explores a new dimension of early lung cancer classification, based on discriminant information of mutated genes revealed through statistical analysis. It is anticipated that the proposed approach will be useful to practitioners and domain experts for early lung cancer diagnosis and prognosis.
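As an illustration of the split amino acid composition feature space mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation in which a protein sequence is divided into an N-terminal segment, a middle segment and a C-terminal segment, and the 20-residue composition is computed for each, giving 60 features. The function names and the segment lengths (`n_term`, `c_term`) are hypothetical choices made for this example; the abstract does not specify them.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the fraction of each standard amino acid in `segment`.

    The denominator is the full segment length, so non-standard symbols
    (for example 'X') lower the fractions rather than being dropped.
    An empty segment yields a vector of zeros.
    """
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(segment)
    return [counts[aa] / len(segment) for aa in AMINO_ACIDS]


def split_aac(sequence, n_term=25, c_term=25):
    """Split amino acid composition: 20 features per segment, 60 in total.

    `n_term` and `c_term` are assumed segment lengths (hypothetical defaults).
    For sequences shorter than n_term + c_term, the sequence is split in two
    halves and the middle segment is left empty.
    """
    seq = sequence.upper()
    if len(seq) < n_term + c_term:
        n_term = len(seq) // 2
        c_term = len(seq) - n_term
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return composition(head) + composition(middle) + composition(tail)


if __name__ == "__main__":
    # Toy sequence, not real protein data.
    features = split_aac("ACDE" * 20, n_term=4, c_term=4)
    print(len(features))
```

A feature vector of this kind could then be passed to a classifier such as a support vector machine; the kernel and hyperparameters used in the study are not stated in the abstract.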