Comparison of Classification Analysis Using LASSO and Principal Component Analysis for Kidney Cancer

2021 
Advances in biotechnology have recently generated large-scale biological data, increasing the importance of methods for analyzing such data. Numerous data mining methods have been developed in the bioinformatics field for processing biodata. We analyzed gene expression data and clinical data of kidney cancer patients from the TCGA database. To predict the prognosis of kidney cancer patients, we extracted significant genes and then applied data mining-based classification methods to the data. We extracted significant genes using principal component analysis (PCA) and the least absolute shrinkage and selection operator (LASSO), and compared the resulting classification accuracy and performance across classification algorithms. We combined clinical data from patients with kidney cancer with gene data to determine the optimal classification model. We also estimated classification accuracy using sample type and primary diagnosis as risk factors. In our experiments, neural network algorithms and logistic regression achieved the best classification accuracy, and the LASSO method showed better classification performance than the PCA method for significant gene extraction. These results can be applied to extract biomarkers for predicting the prognosis of kidney cancer, which has many causes, and to support its prevention and diagnosis.
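To illustrate why LASSO can serve as a gene extraction step, the following is a minimal sketch, not the authors' pipeline. It assumes synthetic data in place of TCGA expression values, a hypothetical binary outcome driven by two of twenty "genes", and a squared-error LASSO fitted by coordinate descent on the 0/1 labels (a simplification; the abstract does not state the loss or solver used). The penalty value `lam=0.1` and all function names are illustrative assumptions.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def standardize_columns(X):
    """Return X as a list of columns, each with mean 0 and population variance 1."""
    n = len(X)
    cols = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mean) / sd for v in col])
    return cols

def lasso_coordinate_descent(cols, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - mean(y) - X b||^2 + lam*||b||_1 for standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]  # residual when all coefficients are 0
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Correlation of feature j with the partial residual (x_j'x_j / n == 1).
            rho = sum(a * r for a, r in zip(xj, residual)) / n + beta[j]
            new_beta = soft_threshold(rho, lam)
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, xj)]
                beta[j] = new_beta
    return beta

# Synthetic stand-in for expression data (assumption): only genes 0 and 3 matter.
random.seed(0)
n_patients, n_genes = 300, 20
X = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_patients)]
y = [1.0 if 2 * row[0] - 2 * row[3] + random.gauss(0, 0.5) > 0 else 0.0 for row in X]

beta = lasso_coordinate_descent(standardize_columns(X), y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected gene indices:", selected)
```

Because the L1 penalty sets many coefficients exactly to zero, the nonzero entries of `beta` identify individual genes that can be passed to a downstream classifier. PCA, by contrast, produces components that are weighted combinations of all genes, so it reduces dimensionality without naming specific genes.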