language-icon Old Web
English
Sign In

R-tree

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as 'Find all museums within 2 km of my current location', 'retrieve all road segments within 2 km of my location' (to display them in a navigation system) or 'find the nearest gas station' (although not taking roads into account). The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance.Guttman's quadratic split.Pages in this tree overlap a lot.Guttman's linear split.Even worse structure, but also faster to construct.Greene's split. Pages overlap much less than with Guttman's strategy.Ang-Tan linear split.This strategy produces sliced pages, which often yield bad query performance.R* tree topological split. The pages overlap much less since the R*-tree tries to minimize page overlap, and the reinsertions further optimized the tree. The split strategy prefers quadratic pages, which yields better performance for common map applications.Bulk loaded R* tree using Sort-Tile-Recursive (STR).The leaf pages do not overlap at all, and the directory pages overlap only little. This is a very efficient tree, but it requires the data to be completely known beforehand.M-trees are similar to the R-tree, but use nested spherical pages.Splitting these pages is, however, much more complicated and pages usually overlap much more. R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as 'Find all museums within 2 km of my current location', 'retrieve all road segments within 2 km of my location' (to display them in a navigation system) or 'find the nearest gas station' (although not taking roads into account). The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance. The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the 'R' in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation of an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set. Similar to the B-tree, the R-tree is also a balanced search tree (so all leaf nodes are at the same depth), organizes the data in pages, and is designed for storage on disk (as used in databases). Each page can contain a maximum number of entries, often denoted as M {displaystyle M} . It also guarantees a minimum fill (except for the root node), however best performance has been experienced with a minimum fill of 30%–40% of the maximum number of entries (B-trees guarantee 50% page fill, and B*-trees even 66%). The reason for this is the more complex balancing required for spatial data as opposed to linear data stored in B-trees. As with most trees, the searching algorithms (e.g., intersection, containment, nearest neighbor search) are rather simple. The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search. Like B-trees, R-trees are suitable for large data sets and databases, where nodes can be paged to memory when needed, and the whole tree cannot be kept in main memory. Even if data can be fit in memory (or cached), the R-trees in most practical applications will usually provide performance advantages over naive check of all objects when the number of objects is more than few hundred or so. However, for in-memory applications, there are similar alternatives that can provide slightly better performance or be simpler to implement in practice.

[ "k-nearest neighbors algorithm", "Nearest neighbor search", "Spatial database", "Search engine indexing", "R+ tree", "Cover tree", "Fixed-radius near neighbors", "Ball tree" ]
Parent Topic
Child Topic
    No Parent Topic