The impact of scan number and its preprocessing in micro-FTIR imaging when applying machine learning for breast cancer subtypes classification

2021 
Abstract
The molecular subtype of breast cancer is an important classification for outlining prognosis. The gold-standard assessment, immunohistochemistry, adds subjectivity because of interlaboratory and interobserver variation. To increase diagnostic confidence, other techniques need to be examined. FTIR spectroscopic imaging combined with machine learning may provide additional, quantitative information about molecular composition. However, the impact of the number of co-added scans, an acquisition parameter, on machine learning classification still needs further evaluation. In this study, FTIR images of the Luminal B and HER2 subtypes were acquired with varying scan numbers and preprocessing techniques. Spectral quality improved as the scan number increased, with a lower standard deviation and fewer outliers. Six machine learning models were used to classify the subtypes: Linear Discriminant Analysis, Partial Least Squares Discriminant Analysis, K-Nearest Neighbors, Support Vector Machine, Random Forest, and Extreme Gradient Boosting. The best mean accuracy, 0.995, was achieved by the Extreme Gradient Boosting model. All models achieved similarly high accuracies with groups b256_064 (256 background scans and 64 sample scans), b256_128, and b128_128. Beyond comparing model performance, the study established b256_064 as the optimal group because it requires the shortest acquisition time. This work therefore recommends b256_064 for breast cancer subtype classification and as a basis for other studies that use machine learning for cancer evaluation.
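Why does co-adding more scans reduce the standard deviation? Averaging N independent scans shrinks uncorrelated random noise by roughly a factor of sqrt(N). The sketch below illustrates this with synthetic data. It assumes additive Gaussian noise that is independent between scans. The band shape, the noise level `NOISE_SIGMA`, and the helper names (`acquire_scan`, `co_add`, `residual_std`) are hypothetical and not taken from the study. It does not model outliers, preprocessing, or the background scans.

```python
import math
import random
import statistics

# Hypothetical "true" absorbance spectrum: one Gaussian band
# (illustrative shape only, not real tissue data).
N_POINTS = 400
TRUE_SPECTRUM = [0.5 * math.exp(-((i - 200) / 30.0) ** 2) for i in range(N_POINTS)]
NOISE_SIGMA = 0.05  # assumed per-scan noise standard deviation


def acquire_scan(rng):
    """Simulate one scan: true spectrum plus independent Gaussian noise."""
    return [value + rng.gauss(0.0, NOISE_SIGMA) for value in TRUE_SPECTRUM]


def co_add(n_scans, rng):
    """Average n_scans simulated scans point by point (co-addition)."""
    total = [0.0] * N_POINTS
    for _ in range(n_scans):
        for i, value in enumerate(acquire_scan(rng)):
            total[i] += value
    return [t / n_scans for t in total]


def residual_std(spectrum):
    """Population standard deviation of the deviation from the true spectrum."""
    return statistics.pstdev([s - t for s, t in zip(spectrum, TRUE_SPECTRUM)])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for n in (16, 64, 128, 256):
        noise = residual_std(co_add(n, rng))
        expected = NOISE_SIGMA / math.sqrt(n)
        print(f"{n:4d} scans: residual std = {noise:.5f} (sigma/sqrt(N) = {expected:.5f})")
```

Under these assumptions, the residual standard deviation should track sigma/sqrt(N). Doubling the sample scans from 64 to 128 lowers the noise by only about a factor of 1.41, while roughly doubling the sample-scan time. That diminishing return is consistent with a lower scan count being preferred once classification accuracy has saturated.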