Integer sorting

In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by numeric keys, each of which is an integer. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are. In computer science, integer sorting is the algorithmic problem of sorting a collection of data values by numeric keys, each of which is an integer. Algorithms designed for integer sorting may also often be applied to sorting problems in which the keys are floating point numbers, rational numbers, or text strings. The ability to perform integer arithmetic on the keys allows integer sorting algorithms to be faster than comparison sorting algorithms in many cases, depending on the details of which operations are allowed in the model of computing and how large the integers to be sorted are. Integer sorting algorithms including pigeonhole sort, counting sort, and radix sort are widely used and practical. Other integer sorting algorithms with smaller worst-case time bounds are not believed to be practical for computer architectures with 64 or fewer bits per word. Many such algorithms are known, with performance depending on a combination of the number of items to be sorted, number of bits per key, and number of bits per word of the computer performing the sorting algorithm. Time bounds for integer sorting algorithms typically depend on three parameters: the number n of data values to be sorted, the magnitude K of the largest possible key to be sorted, and the number w of bits that can be represented in a single machine word of the computer on which the algorithm is to be performed. Typically, it is assumed that w ≥ log2(max(n, K)); that is, that machine words are large enough to represent an index into the sequence of input data, and also large enough to represent a single key. Integer sorting algorithms are usually designed to work in either the pointer machine or random access machine models of computing. The main difference between these two models is in how memory may be addressed. The random access machine allows any value that is stored in a register to be used as the address of memory read and write operations, with unit cost per operation. This ability allows certain complex operations on data to be implemented quickly using table lookups. In contrast, in the pointer machine model, read and write operations use addresses stored in pointers, and it is not allowed to perform arithmetic operations on these pointers. In both models, data values may be added, and bitwise Boolean operations and binary shift operations may typically also be performed on them, in unit time per operation. Different integer sorting algorithms make different assumptions, however, about whether integer multiplication is also allowed as a unit-time operation. Other more specialized models of computation such as the parallel random access machine have also been considered. Andersson, Miltersen & Thorup (1999) showed that in some cases the multiplications or table lookups required by some integer sorting algorithms could be replaced by customized operations that would be more easily implemented in hardware but that are not typically available on general-purpose computers. Thorup (2003) improved on this by showing how to replace these special operations by the bit field manipulation instructions already available on Pentium processors. A priority queue is a data structure for maintaining a collection of items with numerical priorities, having operations for finding and removing the item with the minimum priority value. Comparison-based priority queues such as the binary heap take logarithmic time per update, but other structures such as the van Emde Boas tree or bucket queue may be faster for inputs whose priorities are small integers. These data structures can be used in the selection sort algorithm, which sorts a collection of elements by repeatedly finding and removing the smallest element from the collection, and returning the elements in the order they were found. A priority queue can be used to maintain the collection of elements in this algorithm, and the time for this algorithm on a collection of n elements can be bounded by the time to initialize the priority queue and then to perform n find and remove operations. For instance, using a binary heap as a priority queue in selection sort leads to the heap sort algorithm, a comparison sorting algorithm that takes O(n log n) time. Instead, using selection sort with a bucket queue gives a form of pigeonhole sort, and using van Emde Boas trees or other integer priority queues leads to other fast integer sorting algorithms. Instead of using an integer priority queue in a sorting algorithm, it is possible to go the other direction, and use integer sorting algorithms as subroutines within an integer priority queue data structure. Thorup (2007) used this idea to show that, if it is possible to perform integer sorting in time T(n) per key, then the same time bound applies to the time per insertion or deletion operation in a priority queue data structure. Thorup's reduction is complicated and assumes the availability of either fast multiplication operations or table lookups, but he also provides an alternative priority queue using only addition and Boolean operations with time T(n) + T(log n) + T(log log n) + ... per operation, at most multiplying the time by an iterated logarithm. The classical integer sorting algorithms of pigeonhole sort, counting sort, and radix sort are widely used and practical. Much of the subsequent research on integer sorting algorithms has focused less on practicality and more on theoretical improvements in their worst case analysis, and the algorithms that come from this line of research are not believed to be practical for current 64-bit computer architectures, althoughexperiments have shown that some of these methods may be an improvement on radix sorting for data with 128 or more bits per key. Additionally, for large data sets, the near-random memory access patterns of many integer sorting algorithms can handicap them compared to comparison sorting algorithms that have been designed with the memory hierarchy in mind. Integer sorting provides one of the six benchmarks in the DARPA High Productivity Computing Systems Discrete Mathematics benchmark suite, and one of eleven benchmarks in the NAS Parallel Benchmarks suite.

Parent Topic

Child Topic

No Parent Topic