language-icon Old Web
English
Sign In

Ball tree

In computer science, a ball tree, balltree or metric tree, is a space partitioning data structure for organizing points in a multi-dimensional space. The ball tree gets its name from the fact that it partitions data points into a nested set of hyperspheres known as 'balls'. The resulting data structure has characteristics that make it useful for a number of applications, most notably nearest neighbor search. In computer science, a ball tree, balltree or metric tree, is a space partitioning data structure for organizing points in a multi-dimensional space. The ball tree gets its name from the fact that it partitions data points into a nested set of hyperspheres known as 'balls'. The resulting data structure has characteristics that make it useful for a number of applications, most notably nearest neighbor search. A ball tree is a binary tree in which every node defines a D-dimensional hypersphere, or ball, containing a subset of the points to be searched. Each internal node of the tree partitions the data points into two disjoint sets which are associated with different balls. While the balls themselves may intersect, each point is assigned to one or the other ball in the partition according to its distance from the ball's center. Each leaf node in the tree defines a ball and enumerates all data points inside that ball. Each node in the tree defines the smallest ball that contains all data points in its subtree. This gives rise to the useful property that, for a given test point t, the distance to any point in a ball B in the tree is greater than or equal to the distance from t to the ball. Formally: Where D B ( t ) {displaystyle D^{B}(t)} is the minimum possible distance from any point in the ball B to some point t. Ball-trees are related to the M-tree, but only support binary splits, whereas in the M-tree each level splits m {displaystyle m} to 2 m {displaystyle 2m} fold, thus leading to a shallower tree structure, therefore need fewer distance computations, which usually yields faster queries. Furthermore, M-trees can better be stored on disk, which is organized in pages. The M-tree also keeps the distances from the parent node precomputed to speed up queries. Vantage-point trees are also similar, but they binary split into one ball, and the remaining data, instead of using two balls. A number of ball tree construction algorithms are available. The goal of such an algorithm is to produce a tree that will efficiently support queries of the desired type (e.g. nearest-neighbor) efficiently in the average case. The specific criteria of an ideal tree will depend on the type of question being answered and the distribution of the underlying data. However, a generally applicable measure of an efficient tree is one that minimizes the total volume of its internal nodes. Given the varied distributions of real-world data sets, this is a difficult task, but there are several heuristics that partition the data well in practice. In general, there is a tradeoff between the cost of constructing a tree and the efficiency achieved by this metric. This section briefly describes the simplest of these algorithms. A more in-depth discussion of five algorithms was given by Stephen Omohundro. The simplest such procedure is termed the 'k-d Construction Algorithm', by analogy with the process used to construct k-d trees. This is an off-line algorithm, that is, an algorithm that operates on the entire data set at once. The tree is built top-down by recursively splitting the data points into two sets. Splits are chosen along the single dimension with the greatest spread of points, with the sets partitioned by the median value of all points along that dimension. Finding the split for each internal node requires linear time in the number of samples contained in that node, yielding an algorithm with time complexity O ( n log n ) {displaystyle O(n,log ,n)} , where n is the number of data points.

[ "Canopy clustering algorithm", "Large margin nearest neighbor", "Nearest-neighbor chain algorithm", "R-tree", "Best bin first" ]
Parent Topic
Child Topic
    No Parent Topic