New distributed algorithms for fast sign detection in residue number systems (RNS)

Dhananjay S. Phatak,Steven D. Houston

New distributed algorithms for fast sign detection in residue number systems (RNS)

2016

We identify a canonical parameter in the Chinese Remainder Theorem (CRT) and call it the "Reconstruction Coefficient", (denoted by " R C "); and introduce the notions of "Partial" and "Full" Reconstruction. If the R C can be determined efficiently, then arithmetic operations that are (relatively) harder to realize in RNS; including Sign Detection, Base change/extension and Scaling or division by a constant can also be implemented efficiently. This paper therefore focuses on and presents two distinct methods to efficiently evaluate the R C at long wordlengths. A straightforward application of these methods leads to ultra-fast sign-detection.An independent contribution of this paper is to illustrate non-trivial trade-offs between run-time computation vs. pre-computation and look-up. We show a simple method to select the moduli which leads to both the (i) number of RNS channelsź n ;źźas well asźź(ii) the largest channel modulusź m n satisfying { O ( n ) , O ( m n ) } ź N ź the full-precision bit-length. The net result is that for many canonical operations; exhaustive look-up covering all possible input values is feasible even at long cryptographic bit-lengths N . Under fairly general and realistic assumptions about the capabilities of current hardware, the memory needed for exhaustive look-up tables is shown to be bounded by a low degree polynomial ofź n .źMoreover, both methods to compute R C can achieve a delay of O ( lg n ) in a RN system with n channels. To the best of our knowledge, no other method published to date has shown a path to achieve that lower bound on the execution delay. Further, small values of channel moduli make it ideal to implement each individual RNS channel on a simple core in a many-core processor or as a distributed node, and our algorithms require a limited number of inter-channel communications, averaging O ( n ) . Results from a multi-core GPU implementation corroborate the theory. Most operations hitherto deemed hard to realize in RNS share the same bottleneck.Finding the LSB of an unknown in the Chinese Remainder Theorem is that bottleneck.Show 2 new fast methods to solve the bottleneck, one meets analytic speed bound.Moduli selection enables exhaustive pre-computation even at very long word lengths.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations