Eyeing the patterns: Data visualization using doubly-seriated color heatmaps

2020 
Abstract
In current times, data are being generated at an ever-increasing rate, and researchers are challenged to extract meaningful information from this large volume of data. Information extraction can be aided by data visualization, which can be invaluable for deepening insight into trends and exceptions and may facilitate the discovery of patterns, outliers, and other anomalies or characteristics. Such visual depictions are provided by various tools, including heatmaps. Heatmaps are matrices of colored cells that represent underlying numeric data. In a typical heatmap, each row represents an object, each column represents a condition, time point, instance, or other property, and the color of each cell indicates the associated data value. However, randomly ordered data are rarely enlightening to inspect, and rearranging rows and columns so that similar data are clustered together has proven to be of great value. A number of approaches have been developed to tackle this combinatorial problem, including the Bond Energy Algorithm, the Traveling Salesman Problem Model, TSP + k, and Hierarchical Clustering. Identifying optimal solutions for the first three of these approaches is NP-hard, and the fourth requires exponential computation time. Despite these computational demands, optimality is sometimes pursued. However, approximation algorithms have been developed to address the need for more efficient tools. These approximation techniques vary widely in both computation time and quality of results. Importantly, even when solved optimally, these approaches carry assumptions and biases, some of which are quite subtle and commonly overlooked, yet they may affect results significantly. In short, the choice of rearrangement method should take into account the particular characteristics of the data at hand. Another issue in heatmap construction is the failure to properly preprocess data prior to rearrangement.
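As a rough illustration of what the rearrangement problem involves, the following is a minimal Python/NumPy sketch of a simple greedy variant of the Bond Energy Algorithm. It assumes a nonnegative matrix and scores an arrangement by its measure of effectiveness (ME): the sum of products of every pair of horizontally or vertically adjacent cells, so arrangements that place large values next to large values score higher. Columns are inserted one at a time, in index order, at whichever position maximizes the summed bond between adjacent columns; rows are then handled the same way on the transpose. Because permuting columns leaves the bonds between rows unchanged (and vice versa), the two orderings can be optimized separately. The function names and the insertion order are assumptions made for illustration, not the chapter's implementation, and the greedy result is not guaranteed to be optimal.

```python
import numpy as np


def bond(a, j, k):
    """Bond between columns j and k: sum over rows of a[i, j] * a[i, k]."""
    return float(np.dot(a[:, j], a[:, k]))


def column_energy(a, order):
    """Sum of bonds between horizontally adjacent columns in the given order."""
    return sum(bond(a, order[p], order[p + 1]) for p in range(len(order) - 1))


def measure_of_effectiveness(a):
    """ME: sum of products of every pair of horizontally or vertically adjacent cells."""
    a = np.asarray(a, dtype=float)
    return column_energy(a, range(a.shape[1])) + column_energy(a.T, range(a.shape[0]))


def bea_order(a):
    """Greedy column order: insert each column where it maximizes adjacent-column bonds.

    Recomputes the full energy for every trial position: simple rather than efficient.
    """
    order = [0]
    for col in range(1, a.shape[1]):
        best_pos, best_energy = 0, -np.inf
        for pos in range(len(order) + 1):
            trial = order[:pos] + [col] + order[pos:]
            energy = column_energy(a, trial)
            if energy > best_energy:  # ties keep the leftmost position
                best_pos, best_energy = pos, energy
        order.insert(best_pos, col)
    return order


def bea(a):
    """Reorder columns, then rows (via the transpose), of a nonnegative matrix."""
    a = np.asarray(a, dtype=float)
    cols = bea_order(a)
    a = a[:, cols]
    rows = bea_order(a.T)  # column permutation does not change row-to-row bonds
    return a[rows, :], rows, cols


if __name__ == "__main__":
    # Two groups of columns (and rows) interleaved with each other.
    data = np.array([[1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1]])
    out, rows, cols = bea(data)
    print(measure_of_effectiveness(data), measure_of_effectiveness(out))
    print(out)
```

On this small interleaved matrix, the sketch gathers the matching rows and columns into two solid blocks, raising the ME from 0 to 8.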
In this chapter, we summarize the history of heatmaps, scrutinize various aspects of data preprocessing, and then examine several algorithms that reorder data rows and columns so that similar data are clustered together. More specifically, we review the Bond Energy Algorithm, the Traveling Salesman Problem Model, TSP + k, and Hierarchical Clustering, noting the assumptions, strengths, and weaknesses of each approach. The chapter concludes with thoughts on potential future directions for heatmap research.
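The Traveling Salesman Problem model and the role of preprocessing can be illustrated together in a small sketch. Here each row is treated as a city, the distance between two rows is assumed to be the Euclidean distance after each row has been z-score standardized (one common preprocessing choice among several, not necessarily the chapter's), and the row order is the shortest open path that visits every row exactly once. The search enumerates all n! permutations, so it is exact but only practical for a handful of rows; this exponential cost is why heuristic solvers are used in practice. The names `zscore_rows`, `path_length`, and `tsp_row_order` are hypothetical.

```python
import itertools

import numpy as np


def zscore_rows(a):
    """Standardize each row to mean 0 and standard deviation 1.

    Zero-variance rows are only centered (they become all zeros) to avoid division by zero.
    """
    a = np.asarray(a, dtype=float)
    mu = a.mean(axis=1, keepdims=True)
    sd = a.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (a - mu) / sd


def path_length(d, order):
    """Total distance along an open path that visits rows in the given order."""
    return sum(d[order[p], order[p + 1]] for p in range(len(order) - 1))


def tsp_row_order(a):
    """Row order minimizing the total distance between consecutive rows.

    Exhaustive search over all n! permutations: exact, but only usable for very small n.
    """
    a = np.asarray(a, dtype=float)
    # Pairwise Euclidean distances between rows, shape (n, n).
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2)
    best = min(itertools.permutations(range(a.shape[0])),
               key=lambda p: path_length(d, p))
    return list(best)


if __name__ == "__main__":
    # Rows 0 and 2 rise, rows 1 and 3 fall, at very different magnitudes.
    data = np.array([[1, 2, 3], [40, 30, 10], [10, 20, 30], [4, 3, 1]])
    print(tsp_row_order(zscore_rows(data)))
```

Columns can be ordered the same way by calling `tsp_row_order` on the transposed matrix. Note the effect of the preprocessing step: after z-scoring, the rows `[1, 2, 3]` and `[10, 20, 30]` become identical, so the ordering groups rows by the shape of their profiles rather than by their magnitude. Whether that is appropriate depends on the data at hand.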