Detecting Rare Cell Populations in Flow Cytometry Data Using UMAP

2021 
We present an approach to detecting small cell populations in flow cytometry (FCM) samples based on the combination of unsupervised manifold embedding and supervised random forest classification. Each sample consists of hundreds of thousands to a million cells, where each cell typically corresponds to a measurement vector with 10 to 50 dimensions. The difficulty of the task is that clusters of measurement vectors formed in the data space according to standard clustering criteria often do not correspond to biologically meaningful sub-populations of cells, because the shape and size of their distributions vary strongly. In many cases the relevant population consists of fewer than 100 scattered events out of millions of events, a setting in which supervised approaches perform better than unsupervised clustering. The aim of this paper is to demonstrate that the performance of the standard supervised classifier can be improved significantly by combining it with a preceding unsupervised learning step based on Uniform Manifold Approximation and Projection (UMAP). We present an experimental evaluation on FCM data from children suffering from Acute Lymphoblastic Leukemia (ALL), showing that the improvement occurs particularly in difficult samples where the relevant population of leukemic cells is small relative to other sub-populations. We show that the positive effect of UMAP becomes more noticeable for smaller training sets. Further, the experiments indicate that in this situation the algorithm also outperforms baseline methods based on Gaussian Mixture Models.
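
The abstract does not state how the embedding and the classifier are coupled, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes that the low-dimensional embedding coordinates are appended to the original marker measurements before the random forest is trained, and that class imbalance is handled with balanced class weights. The function names, the two-dimensional embedding, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_umap_embedder(n_components=2, n_neighbors=15, seed=0):
    """Build a UMAP embedder (requires the umap-learn package).

    The parameter values are illustrative assumptions, not taken from the paper.
    """
    import umap  # imported lazily so the rest of the sketch works without it

    return umap.UMAP(
        n_components=n_components, n_neighbors=n_neighbors, random_state=seed
    )


def fit_embed_then_classify(X_train, y_train, embedder, n_trees=200, seed=0):
    """Unsupervised step first, supervised step second.

    X_train : (n_events, n_markers) array of per-cell measurement vectors
    y_train : (n_events,) array of labels, e.g. 1 = rare population, 0 = other
    embedder: any object offering fit_transform(X) and transform(X)
    """
    # Unsupervised step: learn the embedding from the measurements only.
    Z_train = embedder.fit_transform(X_train)

    # Assumption: the classifier sees the raw markers plus the embedding.
    features = np.hstack([X_train, Z_train])

    # Balanced class weights are one common way to cope with a population
    # that makes up a tiny fraction of all events.
    clf = RandomForestClassifier(
        n_estimators=n_trees, class_weight="balanced", random_state=seed
    )
    clf.fit(features, y_train)
    return clf


def predict_events(clf, embedder, X):
    """Project unseen events with the fitted embedder, then classify them."""
    Z = embedder.transform(X)
    return clf.predict(np.hstack([X, Z]))
```

Under these assumptions, a call such as `clf = fit_embed_then_classify(X_train, y_train, embedder)` with `embedder = make_umap_embedder()` followed by `predict_events(clf, embedder, X_new)` yields one label per event. Passing the embedder in as an argument keeps the supervised step unchanged if a different embedding is substituted for comparison.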