A systematic review of datasets that can help elucidate relationships among gene expression, race, and immunohistochemistry-defined subtypes in breast cancer.

2021 
Scholarly requirements have led to a massive increase in transcriptomic data in the public domain, with millions of samples available for secondary research. We identified gene-expression datasets representing 10,214 breast-cancer patients in public databases. We focused on datasets that included patient metadata on race and/or immunohistochemistry (IHC) profiling of the ER, PR, and HER-2 proteins. This review provides a summary of these datasets and describes findings from 32 research articles associated with the datasets. These studies have helped to elucidate relationships among IHC status, race, and/or treatment options, as well as relationships between IHC status and the breast-cancer intrinsic subtypes. We have also identified broad themes across the analysis methodologies used in these studies, including breast cancer subtyping, deriving predictive biomarkers, identifying differentially expressed genes, and optimizing data processing. Finally, we discuss limitations of prior work and recommend future directions for reusing these datasets in secondary analyses.