Exact and Approximate Hierarchical Clustering Using A*

Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Kumar Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer, Andrew McCallum

2021

Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics, and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This results in an exact algorithm that scales beyond the previous state of the art (from search spaces with 10^12 trees to 10^15 trees) and an approximate algorithm that improves over baselines, even in enormous search spaces (containing more than 10^1000 trees). Empirically, we demonstrate that our method achieves substantially higher-quality results than baselines for a particle physics use case and other clustering benchmarks. We also describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
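
To give a sense of the search-space sizes quoted in the abstract, the sketch below counts rooted binary trees over n labeled leaves using the standard double-factorial formula (2n - 3)!!. It assumes that the "trees" being counted are binary hierarchies over the data points, which the abstract does not state explicitly; the function name `num_binary_hierarchies` is hypothetical and chosen only for illustration.

```python
def num_binary_hierarchies(n_leaves: int) -> int:
    """Count rooted binary trees over n labeled leaves: (2n - 3)!!.

    Assumption: each hierarchical clustering is a binary tree whose
    leaves are the n data points. This is an illustrative count, not
    code from the paper.
    """
    if n_leaves < 1:
        raise ValueError("n_leaves must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Print the order of magnitude of the search space for a few sizes.
    for n in (5, 10, 14, 16, 20):
        trees = num_binary_hierarchies(n)
        print(f"n = {n:2d}: about 10^{len(str(trees)) - 1} trees")
```

Under this assumption, the count passes 10^12 at around 14 leaves and 10^15 at around 16 leaves, which illustrates why exhaustive enumeration becomes infeasible even for small datasets.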
