Data Reduction with Distance Correlation

2021 
Data reduction is a technique used in big data applications. The volume, velocity, and variety of data introduce time and space complexity problems for computation. While several approaches are used for data reduction, dimension reduction and redundancy removal are among the most common. In those approaches, data are treated as points in a large space. This paper considers the scenario of analyzing a topic for which similar multi-dimensional data are available from different sources. The problem can be stated as data reduction by source selection. This paper examines distance correlation (DC) as a technique for determining similar data sources. For demonstration, COVID-19 in the United States of America (US) is considered as the topic of analysis, as it is a topic of considerable interest. Data reported by the US states are considered as data sources. We define and use a variation of concordance for validation analysis. © 2021, Springer Nature Singapore Pte Ltd.
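As an illustration of the DC idea mentioned above, the following is a minimal sketch of the standard sample distance correlation between two paired multi-dimensional samples. It is not taken from the paper: the function names (`distance_correlation`, `_centered_distances`) and the input layout (one sequence of coordinates per observation) are assumptions made for this example, and the paper's own procedure for source selection and its concordance variant are not reproduced here.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix of `points`, double-centered."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation between paired samples xs and ys.

    Each of xs and ys is a sequence of observations, and each observation
    is a sequence of floats (hypothetical layout chosen for this sketch).
    The two samples may have different dimensions but must have equal length.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    # Squared sample distance covariance and distance variances.
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        # One of the samples is constant; distance correlation is taken as 0.
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy numbers for two hypothetical sources (not real COVID-19 figures).
source_a = [(1.0, 10.0), (2.0, 12.0), (4.0, 15.0), (7.0, 21.0), (11.0, 30.0)]
source_b = [(2.0, 20.0), (4.0, 24.0), (8.0, 30.0), (14.0, 42.0), (22.0, 60.0)]
dc_ab = distance_correlation(source_a, source_b)
```

In a source-selection setting, a value of `dc_ab` close to 1 would suggest that the two sources carry similar information, so one of them might be dropped; the threshold for "similar" is a modeling choice that this sketch does not fix. The double loop costs O(n²) time and memory, which is acceptable for short series but would need a faster method for large samples.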