Optimal binary search tree

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic. In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic. In the static optimality problem, the tree cannot be modified after it has been constructed. In this case, there exists some particular layout of the nodes of the tree which provides the smallest expected search time for the given access probabilities. Various algorithms exist to construct or approximate the statically optimal tree given the information on the access probabilities of the elements. In the dynamic optimality problem, the tree can be modified at any time, typically by permitting tree rotations. The tree is considered to have a cursor starting at the root which it can move or use to perform modifications. In this case, there exists some minimal-cost sequence of these operations which causes the cursor to visit every node in the target access sequence in order. The splay tree is conjectured to have a constant competitive ratio compared to the dynamically optimal tree in all cases, though this has not yet been proven. In the static optimality problem as defined by Knuth, we are given a set of n ordered elements and a set of 2 n + 1 {displaystyle 2n+1} probabilities. We will denote the elements a 1 {displaystyle a_{1}} through a n {displaystyle a_{n}} and the probabilities A 1 {displaystyle A_{1}} through A n {displaystyle A_{n}} and B 0 {displaystyle B_{0}} through B n {displaystyle B_{n}} . A i {displaystyle A_{i}} is the probability of a search being done for element a i {displaystyle a_{i}} . For 1 ≤ i < n {displaystyle 1leq i<n} , B i {displaystyle B_{i}} is the probability of a search being done for an element between a i {displaystyle a_{i}} and a i + 1 {displaystyle a_{i+1}} , B 0 {displaystyle B_{0}} is the probability of a search being done for an element strictly less than a 1 {displaystyle a_{1}} , and B n {displaystyle B_{n}} is the probability of a search being done for an element strictly greater than a n {displaystyle a_{n}} . These 2 n + 1 {displaystyle 2n+1} probabilities cover all possible searches, and therefore add up to one. The static optimality problem is the optimization problem of finding the binary search tree that minimizes the expected search time, given the 2 n + 1 {displaystyle 2n+1} probabilities. As the number of possible trees on a set of n elements is ( 2 n n ) 1 n + 1 {displaystyle {2n choose n}{frac {1}{n+1}}} , which is exponential in n, brute-force search is not usually a feasible solution. In 1971, Knuth published a relatively straightforward dynamic programming algorithm capable of constructing the statically optimal tree in only O(n2) time. Knuth's primary insight was that the static optimality problem exhibits optimal substructure; that is, if a certain tree is statically optimal for a given probability distribution, then its left and right subtrees must also be statically optimal for their appropriate subsets of the distribution. To see this, consider what Knuth calls the 'weighted path length' of a tree. The weighted path length of a tree on n elements is the sum of the lengths of all 2 n + 1 {displaystyle 2n+1} possible search paths, weighted by their respective probabilities. The tree with the minimal weighted path length is, by definition, statically optimal. But weighted path lengths have an interesting property. Let E be the weighted path length of a binary tree, EL be the weighted path length of its left subtree, and ER be the weighted path length of its right subtree. Also let W be the sum of all the probabilities in the tree. Observe that when either subtree is attached to the root, the depth of each of its elements (and thus each of its search paths) is increased by one. Also observe that the root itself has a depth of one. This means that the difference in weighted path length between a tree and its two subtrees is exactly the sum of every single probability in the tree, leading to the following recurrence: This recurrence leads to a natural dynamic programming solution. Let E i j {displaystyle E_{ij}} be the weighted path length of the statically optimal search tree for all values between ai and aj, let W i j {displaystyle W_{ij}} be the total weight of that tree, and let R i j {displaystyle R_{ij}} be the index of its root. The algorithm can be built using the following formulas:

Parent Topic

Child Topic

No Parent Topic