CONNOTECH Experts-conseils Inc.

Software Acceleration for Public Key Cryptography

by Thierry Moreau

May 1997

© 1997 CONNOTECH Experts-conseils Inc.


Public key cryptography implies the use of large integer arithmetic operations, with a high proportion of modulus operations (as a reminder, the modulus operation is the remainder of an integer division). The division is by far the most compute intensive of the four elementary operations to the point where the the performance of a public key cryptosystem is always deceptively low.

To make these computations relatively efficient, the Montgomery modulo reduction algorithm is very useful. The mathematically inclined reader may refer to the original article by Peter L. Montgomery, Modular Multiplication Without Trial Division (in Mathemetics of computations, Vol. 44, no. 170, April 1985, pp 519-522). Two implementations of the Montgomery algorithm are reported, respectively

Simplification and adaptation of these references may be useful to the Montgomery method in the context of a general purpose computer (for software implementation). This is the subject of the present document. The intended readership includes the computer programmers and analysts.

The Montgomery method is useful when repeated modulus operations are needed with a constant divisor. When a public key cryptosystem requires the computation of a "modular inverse", the Montgomery method is less useful because the divisor is not a constant in this computation. Although the Montgomery method is useful for many public key cryptosystems, a particular focus of this document is the PEKE cryptosystem. For one of the two parties taking part of the PEKE cryptosystem protocol, the computation load is relatively small for a public key cryptosystem. The use of a relevant optimization technique may make the difference between unacceptable and bearable performance characteristics for a PEKE cryptosystem implementation.

The computation facilitated with the Montgomery method is repetitive "modular reduction" operations "mod N", where N is odd integer constant. This is because there are precomputations (on N) of which processing cost cannot be neglected. Let b be the natural computer word size of the target computer and n be such that bn>N. The values to be pre-computed are b2n mod N, and also N0' to be specified hereafter.

Montgomery multiplication, in the multiple precision case as with public key cryptography, allows fast modular arithmetic for a modulus N relatively prime to bn, where bn>N, and arithmetic modulo b is easy (e.g. 2767<N<2768, N is odd, b=28, n=96, so bn=2768). Let B=bn (not to be confused with Bi, a notation proper to the PEKE specification). Given T satisfying 0<=T<=B×N, the Montgomery reduction algorithm efficiently computes T×(B-1) mod N. Our focus with PEKE is the modular squaring, but we will show modular multiplication for sake of generality, so let T=x'×y' where x'=x×B mod N and y'=y×B mod N; then the Montgomery reduction algorithm efficiently computes x'×y'×(B-1) mod N=x×y×B mod N.

Let us use the notation MN,B(X,Y) for the result of X×Y×(B-1) mod N, using the Montgomery reduction algorithm. Then x×y×B mod N=MN,B(x×B mod N,y×B mod N). To convert an integer x, 0<=x<N, into x×B mod N, compute MN,B(x,b2n mod N). The value b2n mod N should be pre-computed once and for all; this must be done with a general purpose division operation. See for instance the article by Per Brinch Hansen, Multiple-length Division Revisited: a Tour of the Minefield (in Software Practice and Experience, vol. 24(6), June 1994, pp 579-601). To recover x from x×B mod N, compute MN,B(x×B mod N,1). There are two routes to complete a single modular multiplication: MN,B(MN,B(x,y),b2n mod N)=x×y mod N, or by pre-computing x'=MN,B(x,b2n mod N) and y'=MN,B(y,b2n mod N), and then MN,B(MN,B(x',y'),1)=x×y mod N. Whenever a series of multiplications is performed on a same set of inputs or intermediate results, the latter route is more efficient.

A series of multiplications occurs in the PEKE cryptosystem whenever the PEKE parameter t is greater than 1. Moreover, the PEKE cryptosystem may be changed without altering its security to fully exploit the Montgomery method. If this is done, the PEKE equations are restated as
x'=MN,B(x,b2n mod N),
x0'=MN,B(x',x'), instead of the original x0=x2 mod N,
xi+1'=MN,B(xi',xi'), instead of the original xi+1=xi2 mod N, and
Bi=xi' mod 2k, instead of the original Bi=xi mod 2k.
Formally, this requires an additional public parameter for the PEKE cryptosystem, that is the base bn. The conversion from the value xt'=xt×bn mod N to the value xt is done with the equation
xt=MN,B(xt',1);
it may be assigned to the initiator of the PEKE protocol to further reduce the computation load for the responder.

Moreover, the original PEKE specification requires another integer division, found in the following equation:
x=(|_x/S_|×S×C)+(xA->B×S)+(x mod S),
where |_a/b_| represents the integer division of a by b. This equation is equivalent to
x=((x-(x mod S))×C)+(xA->B×S)+(x mod S).
Then, x mod S can be computed with the Montgomery method,
x mod S=MS,B(MS,B(x,1),b2s mod S).
The conditions on the base B=b2s are b2s>S and b2s>x.

In summary, the PEKE cryptosystem may be made more efficient if the above change is adopted and if the parameters b, n, b2n mod N, s, and b2s mod S are all pre-computed. The efficiency gain from the precomputation of N0' and S0' is less important.

Now, the internals of the multiprecision Montgomery multiplication algorithm may be stated. Again, let B=bn. There is a need for the value of integer N' such that B×(B-1)-N×N'=1, where B×(B-1) mod N=1. Actually, only the least significant part of N' is needed, hence the definition N0'=N' mod b. We reproduce below the simple algorithm from the said article by Dussé and Kaliski to efficiently find N0' from N0 and b, when b is an exact power of two. Obviously, N0' can be computed once and for all.


SAKEM2_6.GIF

In the multiprecision Montgomery multiplication algorithm that follows, capital letters are multi-precision variables. The indices are as expected for natural integers, e.g. N0 is the least significant part of N. The algorithm is an application of the convolution-sum method for the multiplication. The variable c is an accumulator with sufficient capacity for a sum of products and multiple carry bits from the additions (up to 2n of them).


SAKEM2_7.GIF

The variable Rn is actually a local storage area of this algorithm (like Q and lower case variables). Consequently, the storage requirement for R is the same as for X and Y. Moreover, if X and Y are the same variable, as in the modular squaring operation of the PEKE cryptosystem, the storage for X, Y, and R can be the same.


security scheme designalternative to PKIpatent publicationsSAKEMscholarly web contentsconsulting services ]
[ CONNOTECH home page: http://www.connotech.com/about us | e-mail to: info@connotech.com ]

CONNOTECH Experts-conseils Inc.
9130 Place de Montgolfier
Montréal, Québec, Canada, H2M 2A1
Tél.: +1-514-385-5691 Fax: +1-514-385-5900