In-place matrix transposition

In-place matrix transposition, also called in-situ matrix transposition, is the problem of transposing an N×M matrix in-place in computer memory, ideally with O(1) (bounded) additional storage, or at most with additional storage much less than NM. Typically, the matrix is assumed to be stored in row-major order or column-major order (i.e., contiguous rows or columns, respectively, arranged consecutively).If 0 ≤ a = Mn + m < MN − 1, then Na mod (MN−1) = MN n + Nm mod (MN − 1) = n + Nm. Let s be the smallest element of the cycle, and d = gcd ( s , M N − 1 ) {displaystyle d=gcd(s,MN-1)} . From the definition of the permutation P above, every other element x of the cycle is obtained by repeatedly multiplying s by N modulo MN−1, and therefore every other element is divisible by d. But, since N and MN − 1 are coprime, x cannot be divisible by any factor of MN − 1 larger than d, and hence d = gcd ( x , M N − 1 ) {displaystyle d=gcd(x,MN-1)} .The length k of the cycle containing s is the smallest k > 0 such that s N k = s mod ( M N − 1 ) {displaystyle sN^{k}=s{mod {(}}MN-1)} . Clearly, this is the same as the smallest k>0 such that ( − s ) N k = − s mod ( M N − 1 ) {displaystyle (-s)N^{k}=-s{mod {(}}MN-1)} , since we are just multiplying both sides by −1, and M N − 1 − s = − s mod ( M N − 1 ) {displaystyle MN-1-s=-s{mod {(}}MN-1)} . In-place matrix transposition, also called in-situ matrix transposition, is the problem of transposing an N×M matrix in-place in computer memory, ideally with O(1) (bounded) additional storage, or at most with additional storage much less than NM. Typically, the matrix is assumed to be stored in row-major order or column-major order (i.e., contiguous rows or columns, respectively, arranged consecutively). Performing an in-place transpose (in-situ transpose) is most difficult when N ≠ M, i.e. for a non-square (rectangular) matrix, where it involves a complicated permutation of the data elements, with many cycles of length greater than 2. In contrast, for a square matrix (N = M), all of the cycles are of length 1 or 2, and the transpose can be achieved by a simple loop to swap the upper triangle of the matrix with the lower triangle. Further complications arise if one wishes to maximize memory locality in order to improve cache line utilization or to operate out-of-core (where the matrix does not fit into main memory), since transposes inherently involve non-consecutive memory accesses.

Parent Topic

Child Topic

No Parent Topic