Tunstall coding

In computer science and information theory, Tunstall coding is a form of entropy coding used for lossless data compression. In computer science and information theory, Tunstall coding is a form of entropy coding used for lossless data compression. Tunstall coding was the subject of Brian Parker Tunstall's PhD thesis in 1967, while at Georgia Institute of Technology. The subject of that thesis was 'Synthesis of noiseless compression codes' Its design is a precursor to Lempel-Ziv. Unlike variable-length codes, which include Huffman and Lempel–Ziv coding,Tunstall coding is a code which maps source symbols to a fixed number of bits. Both Tunstall codes and Lempel-Ziv codes represent variable-length words by fixed-length codes. Unlike typical set encoding, Tunstall coding parses a stochastic source with codewords of variable length. It can be shownthat, for a large enough dictionary, the number of bits per source letter can be infinitely close to H ( U ) {displaystyle H(U)} , the entropy of the source. The algorithm requires as input an input alphabet U {displaystyle {mathcal {U}}} , along with a distribution of probabilities for each word input.It also requires an arbitrary constant C {displaystyle C} , which is an upper bound to the size of the dictionary that it will compute.The dictionary in question, D {displaystyle D} , is constructed as a tree of probabilities, in which each edge is associated to a letter from the input alphabet.The algorithm goes like this: Let's imagine that we wish to encode the string 'hello, world'.Let's further assume (somewhat unrealistically) that the input alphabet U {displaystyle {mathcal {U}}} contains only characters from the string 'hello, world' — that is, 'h', 'e', 'l', ',', ' ', 'w', 'o', 'r', 'd'.We can therefore compute the probability of each character based on its statistical appearance in the input string.For instance, the letter L appears thrice in a string of 12 characters: its probability is 3 12 {displaystyle 3 over 12} .

Parent Topic

Child Topic

No Parent Topic