_T_e_c_h_n_i_q_u_e_s _f_o_r _I_m_p_l_e_m_e_n_t_i_n_g _t_h_e _R_S_A _P_u_b_l_i_c _K_e_y _C_r_y_p_t_o_s_y_s_t_e_m _1_.
_b_y
Lawrence Brown,
Dept. of Computer Science,
University College, UNSW,
Australian Defence Force Academy,
Canberra. ACT. 2600.
AAbbssttrraacctt
The RSA cryptosystem is generating consider-
able interest at present, as a scheme with a wide
range of potential uses. Its practical use has,
however, been limited because of the time taken to
perform its essential encryption and decryption
functions. Whilst the times achieved to date are
sufficiently short for key exchange schemes, they
are not fast enough for encrypting data on a commu-
nications link of any reasonable speed. Much
research is being conducted at present to find
techniques of improving the time efficiency of the
algorithms underlying RSA. This paper summarizes
some of the techniques suggested to date, illus-
trating the advantages and disadvantages of each,
and indicating the improvement in performance
expected from each. It is concluded that with the
development of specially designed chips using
carry-save multiplication techniques, and of algo-
rithms optimized for use on high-speed Digital Sig-
nal Processing (DSP) chips, that the efficiency of
RSA implementations will improve significantly.
____________________
1 This is an expanded version of a paper presented at the "Se-
cure Data Communications Workshop", organized by the IEEE Communi-
cations Section, Victorian Branch, held at the Town House, Mel-
bourne, on 31st July 1987.
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
-- 11 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
11.. IInnttrroodduuccttiioonn
Rivest, Shamir and Adleman [RSA78] outlined the first practi-
cal and fully-fledged public key cryptosystem (RSA) 2, the security
of which is based on the difficulty of factoring large integers.
This system is still the most highly regarded, and is the one most
commonly suggested for use in public key systems. It has been sug-
gested as an ISO standard [|||86], and has been placed as an
appendix in a number of other standards related to key exchange in
Electronic Funds Transfer (EFT) schemes, in the authentication of
certificates issued by an Electronic Directory Service, and in the
authentication of participants in wide-area networks. Zimmerman
[Zim86] outlines in detail a set of protocols which could form the
basis of an industry standard for RSA usage.
The major advantages of RSA are that it does not increase the
size of the message and that it may be used to provide both privacy
and authentication (by providing a digital signature) over communi-
cations links. Its main disadvantage is its reliance on modular
exponentiation, an operation which is still moderately time consum-
ing, though much effort is being expended to improve this.
This paper will outlined how encryption, decryption, and key
generation for RSA are performed in theory; will describe some of
the techniques which may be used in practice; the tradeoff's
between them; and some actual implementations.
22.. RRSSAA -- TThhee TThheeoorryy
22..11.. AAnn OOvveerrvviieeww ooff RRSSAA
In the RSA system, each user wishing to receive encrypted mes-
sages, performs the following steps:
+o selects two large primes _p and _q at random, each about 100
digits long (for key lengths believed secure at present),
+o calculates the modulus of their system _R=_p_q,
+o selects at random a number _e with _e<_R,_G_C_D(_e,|o(_R))=1, where
|o(_R)=(_p-1)(_q-1), is the Euler totient function of _R, (and _e is
usually small, for example 3, and may be fixed for all users
[QuC82], [Wil86], [KBS86b], and [KBS86a])
+o solves for the unique _d in the congruence
_d_e==1(_m_o_d|o(_R)),0<=_d<_R,
+o broadcasts the public key _E=<_e,_R>
____________________
2 for an extensive bibliography of references on Public-Key
schemes, see [Bro87]
-- 22 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
+o and keeps secure the private key _D=<_d>.
A sender can encrypt a message _M (expressed as some large integer
of a size less than the length of the modulus used in the system,
and created by concatenating characters together to form a bit-
string of the appropriate length) to form the ciphertext _C using
the public encryption key <_e,_R> by calculating:
_C=_M_e(_m_o_d_R), 0<=_C<_R
The receiver can then decipher this using his private decryption
key <_d> by calculating:
_M=_C_d(_m_o_d_R)
The system relies on the following well known number theoretic
identity (Euler's generalization of Fermat's theorem):
If_G_C_D(_X,_R)=1, then_X|o(_R)==1(_m_o_d_R)
Using this identity, it may be shown3 that:
_C_d=_M_e_d==_M1+_n|o(_R)==
_M1_M_n|o(_R)==_M1==_M(_m_o_d_R).
RSA can also be used as a signature scheme since the encryp-
tion and decryption operations are commutative. That is, a sender
can create a signature _S for a message _M using their private key
<_d> (which only they have access to), by calculating:
_S=_M_d(_m_o_d_R)
which may be verified by anyone using the sender's public key <_e,_R>
(obtained from the public directory), by calculating:
_M=_S_e(_m_o_d_R),
The signature can then be encrypted using the recipient's public
key, in order to privately transfer it to the recipient.
There is a minor problem known as the signature reblocking
problem, which must be overcome when the same scheme is used for
both secrecy and signatures. It occurs because each user has mod-
ulii _R of different sizes, and hence a block of ciphertext in one
user's field may not be a valid block in the other users field.
This becomes a problem when user _A signs a message for _B, and then
wishes to protect its privacy by encrypting with _B's public key.
If user _A's field is larger than _B's, and the signed message also
____________________
3 see Rivest, Shamir and Adleman's original paper [RSA78] for
the full derivation
-- 33 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
turns out to be too large (ie _R_B<_S_A_B<_R_A), then it cannot be
directly encrypted (ie _S_e_A_B_B(_m_o_d_R_B) cannot be decrypted correctly).
Several strategies are suggested to overcome this problem. These
include using separate modulii for secrecy and for signatures,
guaranteeing that the signature modulus is always smaller than any
secrecy modulus [RSA78], and reversing the order of signature and
secrecy encryptions, since these functions are commutative [Koh78].
Alternatively the problem may be avoided by using a hashing func-
tion to compress the message into a single block, which is then
signed with the private key only. Because a one-way function is
used to perform the hashing operation, no significant information
is conveyed about the content of the message. Hence no secrecy step
is needed. [Dav83], [Den83], [Den84].
22..22.. RRSSAA EEnnccrryyppttiioonn aanndd DDeeccrryyppttiioonn
Fundamental to the RSA algorithm is the modular exponentiation
of very large integers, of up to 200 digits (or 600 bits) in
length. These very large numbers are necessary to ensure the secu-
rity of the scheme, as the effort needed to break the scheme (by
factoring the modulus _R), is of _O(_e_f(_R)), an exponential of a func-
tion _f on the modulus _R. The best known algorithm, at present, for
factorizing a number _R with a large prime factor _p is a variation
on Lenstra's elliptic curve algorithm proposed by Brent [Bre86],
which runs in time
_O(_e___\___/__l__nl___pn__l_p__n__l__n___p__)
and which leads to the choice of R around 200 digits (or 600 bits)
in length [DHS84], [Bre86] (see Table 1 for a comparison of effort
for encryption and factoring for various key lengths)4.
____________________
4 currently 1e+14 operations is regarded as a limit for computa-
tional feasibility. Times may be derived by assuming one instruc-
tion per microsecond on a 1 MIP machine, and knowing that there are
3e+13 microseconds in a year.
-- 44 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
+---------------------------------------------+
| Table 1 - Complexity of Factorization and |
| Exponentiation Operations Modulo R |
+------------+---------------+----------------+
|Digits in R | Operations in | Operations in |
| | Factorization | Exponentiation |
+------------+---------------+----------------+
| 20 | 7.2e+03 | 8.0e+03 |
| 40 | 3.1e+06 | 6.4e+04 |
| 60 | 4.6e+08 | 2.1e+05 |
| 80 | 3.7e+10 | 5.1e+05 |
| 100 | 1.9e+12 | 1.0e+06 |
| 120 | 7.6e+13 | 1.7e+06 |
| 140 | 2.3e+15 | 2.7e+06 |
| 160 | 5.9e+16 | 4.0e+06 |
| 180 | 1.2e+18 | 5.8e+06 |
| 200 | 2.3e+19 | 8.0e+06 |
+------------+---------------+----------------+
Since no conventional computers support integer calculations
of this size, the problem can be approached from two basic direc-
tions. One is to break the numbers into words of a size supported
by the hardware on current machines, and carry out each step of the
calculation in a number of stages using multi-precision arithmetic.
The other technique is to design and build specialized hardware
which supports the large wordsize required. Both of these methods
have been attempted, and both can utilize some of the techniques
described below.
In multi-precision arithmetic the calculation is done on each
digit of the number in succession, with a carry or borrow being
propagated into the next word of the number. The size of the digit
used depends on the complexity of the algorithm, and on the word
size of the machine (or specialized hardware) being used. The algo-
rithms needed to do this are well known, and may be found in Knuth
[Knu81], and also in Schroeder's excellent introductory text to
number theory applications [Sch84]. The basic multi-precision mul-
tiplication algorithm they describe, runs in _O(_n2) time on a number
_n bits in length. This corresponds to performing long-hand multi-
plication of decimal integers (see Fig. 1); which can be done
either right-to-left, or left-to-right. The former is more usual,
though the latter is often more convenient for computer implementa-
tion.
-- 55 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
+--------------------------------------------+
| Fig. 1 - Long-hand Multiplication |
+---------------------+----------------------+
| | |
| 452 | 452 |
| * 125 | * 125 |
|______ | ______ |
| 2260 (* 5*1) | 452 (* 1) |
| | ______ |
| | 4520 (pp *10) |
|+ 9040 (* 2*10) | + 904 (* 2) |
|______ | ______ |
|=11300 (pp step 1) | = 5424 (pp step 2) |
| | ______ |
| | 54240 (pp *10) |
|+45200 (* 1*100) | + 2264 (* 5) |
|______ | ______ |
|=56500 (Result) | =56500 (Result) |
|______ | ______ |
| | |
| Right-To-Left | Left-To_Right |
| Evaluation | Evaluation |
+---------------------+----------------------+
Since in RSA, both encryption and decryption involve modular multi-
plication, a division stage must be performed after the multiplica-
tion to extract the remainder (mod R). Several techniques have
been suggested to minimize the time taken by this process.
One is to precompute a multiplication table for one of the
operands, and use it to perform the multiplication with a series of
additions and table lookup steps, as described by Henry [Hen81] and
Quisquater et al. [QuC82].
Another is to perform the modulo reduction in parallel with
each stage of the multiplication calculation, by removing the most
significant digit, and adding its remainder (mod R) to the partial
sum. This process achieves a saving by a factor of about 6, and I
strongly recommend that it be used. It is based on a result pre-
sented in the paper by Chivers [Chi84], which states that
_n-_1
{ _> _a_i_b_i}(_m_o_d_m)==
_i=0
_n-_2
{ _> _a_i_b_i+_a_n-1_b_n-1(_m_o_d_j_m)}(_m_o_d_m),forall_j
_i=0
The technique is described by Chivers [Chi84], Blakley [Bla83], and
Willoner et al. [WiC81] and produces a result that it is never
larger than _n+ln_n bits long. In more detail, it involves removing
the most significant digit (MSD) of the result, and adding its
remainder mod _m to the number represented by the remaining digits.
To use it, a table is constructed of these remainders for all pos-
sible digits. The size of the table obviously depends on the size
-- 66 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
of the base used. Typically bases of 1, 4, or 8 bits are used,
leading to table sizes of 1, 16, or 256 entries. See Figs. 2 and 3
for examples (which are presented in base 10). This process can be
generalized to an algorithm for reducing any number larger than the
modulus down until it has no more digits than the modulus, by iter-
ating over all digits larger than the modulus length, and scaling
up the remainders appropriately.
+----------------------------------------+
|Fig. 2 - Number and Remainder (mod 517) |
+----------------+-----------------------+
| n | n (mod 517) |
+----------------+-----------------------+
| 1000 | 483 |
| 2000 | 449 |
| 3000 | 415 |
| 4000 | 381 |
| 5000 | 347 |
| 6000 | 313 |
| 7000 | 279 |
| 8000 | 245 |
| 9000 | 211 |
+----------------+-----------------------+
+-----------------------------------------+
| Fig 3 - Example of Modulo Reduction |
+-----------------------------------------+
| 4120 (mod 517) |
|______ |
| 0120 (remove MSD) |
| + 381 (add remainder of 4000 mod 517) |
|______ |
| = 501 (result mod 517) |
| |
+-----------------------------------------+
Some, or all of the above techniques could be used by an
implementor in conjunction with a conventional multi-precision
arithmetic package. Alternatively one of several fast multiplica-
tion algorithms, followed by a fast division algorithm designed for
this application, could be used. This technique is outlined by
Mohan et al. [MoA85]. Asymptotically, the fastest integer multi-
plication algorithm known, is the Schonhage-Strassen Algorithm
[AHU74]. It is one of a family of algorithms which break the num-
ber into blocks, and treat each block as the coefficients of a
polynomial. The polynomials are evaluated at suitable points, the
products calculated, and the product polynomial found by interpola-
tion. Any one of a number of interpolation schemes can be used, the
fastest uses the Discrete Fourier Transform and the Convolution
Theorem, and can multiply two integers of length _n in _O(_nlog_n)
steps. An iterative process is then used to perform the division
step, optimized by careful choice of the modulus R. This technique
-- 77 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
seems particularly well suited for implementation on current digi-
tal signal processing chips, and is at present being investigated
further.
A further alternative, where special hardware is used, is to
have special carry-save multiplication hardware, where the addition
of carries between digits in the partial result is delayed until
the end of the multiplication step by using two registers for each
number. An extension of this technique which incorporates modulo
reduction similar to that described above, is presented by Brickell
[Bri82] and performs modular multiplication in time _O(_n+7) with
_O(_n) gates. This is the preferred approach when special hardware
is to be designed. It is possible to multiply two numbers in
_O(log_n) time, using _O(_n2) gates. Currently for the size of numbers
proposed with RSA, this appears to be unachievable. However, as
circuit densities increase, it will undoubtably lead to faster
implementations.
Modular exponentiation is performed using a series of modular
multiplications, in the well known square and multiply algorithm
(see example below) [Knu81].
8053==((((((802.80)2)2).80)2)2.80)
This appears to be an essentially sequential process whose time
efficiency cannot be generally improved upon beyond the improve-
ments in the underlying multiplication stages.
In the specific case of RSA some optimizations can be made.
Encryption can be performed using a small exponent _e (possibly
fixed), which limits the number of steps needed significantly
(since if _e only has say 8 significant bits, then at most 8 square
and multiply steps need to be done). The decryption key however,
must be long to ensure security. However in the decryption stage
the calculations can be done independently (mod p) and (mod q), and
the results combined using the Chinese Remainder Theorem. Since _p
and _q are each about 1/2 the size of _R, this results in a saving of
1/2 to 3/4 of the time, depending on the multiplication algorithm
used. In more detail, the Chinese Remainder Theorem states that
there can only be one solution to a set of congruences, given that
the modulii are relatively prime [Knu81]. In this specific case,
form:
_M1=_M(_m_o_d_p)=(_C(_m_o_d_p))_d(_m_o_d(_p-1))
_M2=_M(_m_o_d_q)=(_C(_m_o_d_q))_d(_m_o_d(_q-1))
Then the system of equations:
_M=_M1(_m_o_d_p), _M=_M2(_m_o_d_q)
will have a single solution (being the required message M by con-
struction) of the form:
-- 88 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
_M=[((_M2+_q-_M1)_u)(_m_o_d_q)]_p+_M1,
where_p_u(_m_o_d_q)==1
The multiplicative inverse _u is calculated at key generation time,
and is kept along with the secret decryption key. This method is
described by Quisquater et al. [QuC82].
These various choices (outlined above), which can be made in
implementing RSA, and the tradeoff's between them are summarized in
a significant overview paper by Rivest [Riv84]. They are:
IInntteeggeerr MMuullttiipplliiccaattiioonn of two _n-bit integers requires time:
+o _O(_n2) on a sequential computer using a standard algorithm
+o _O(_n) on special purpose serial/parallel multiplication hard-
ware, with _O(_n) gates,
+o _O(log_n) on special purpose parallel/parallel multiplication
hardware with _O(_n2) gates.
MMoodduullaarr MMuullttiipplliiccaattiioonn of two _n-bit integers modulo a third _n-bit
integer requires the same order of time, but with a constant
speedup factor if special algorithms are used as described previ-
ously.
MMoodduullaarr EExxppoonneennttiiaattiioonn requires _O(_n) multiplications. A constant
improvement in time efficiency may be obtained by using either:
+o a left-to-right exponentiation algorithm and precomputing pow-
ers of M, or alternatively
+o a right-to-left exponentiation algorithm and performing the
two modular multiplications in parallel.
IImmpplleemmeennttaattiioonn ttrriicckkss appropriate to performing RSA encryption and
decryption include:
+o using a fast clock rate
+o using a short encryption exponent _e (however the decryption
exponent _d must be long)
+o using the Chinese Remainder Theorem to operate modulo _p and _q
separately.
22..33.. RRSSAA KKeeyy GGeenneerraattiioonn
The first stage of the key generation process for RSA involves
finding two large primes _p and _q, to form the modulus of the system
_R=_p_q. The choice of these primes is critical to the success of RSA
as a secure cryptosystem. Several proposed attacks on RSA are based
on the possibility of a bad choice in these primes. The necessary
-- 99 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
conditions imposed on the primes are summarized in a significant
paper by Gordon [Gor84], which is recommended as a basis on which
the primality selection routines should be based. He designates as
_s_t_r_o_n_g _p_r_i_m_e_s those numbers which satisfy the following condi-
tions:
+o _p is a large prime
+o _p is chosen at random from a random seed
+o _p has a specified number of bits
+o _p-1 has a large prime factor _r
+o _p+1 has a large prime factor _s
+o _r-1 has a large prime factor _t
These conditions imply the following:
_p=2_j_r+1
_p=2_k_s-1
_r=2_L_t+1
for some _j,_k,_L, where _r,_s,_t are primes. The algorithm consists of
finding primes _s,_t, and then constructing _r and _p from them. The
details needed to do this are given in Gordon's paper [Gor84].
One of the crucial steps though, is to first identify a prime
number. Much recent work has been done on primality testing algo-
rithms, and near polynomial-time algorithms have now been devel-
oped. It thus appears at this time, that primality testing is sig-
nificantly easier than factorization of large integers (and hence
that RSA is both reasonably secure and efficient). A survey and
overview of the recent developments in primality testing is given
by Pomerance [Pom81], who outlines the various algorithms in use.
Nearly all of these make use of strong pseudo-prime tests, which if
failed prove that the integer is definitely composite. An analysis
of a couple of pseudo-prime tests in practical use is given in Bond
[Bon84]. The original test is derived from Fermat's Little Theorem,
which states that: If _n is prime, then for all _a:0<_a<_n
_a_n-1=1_m_o_d_n.
All prime numbers will satisfy this test. Some composite numbers,
termed _p_s_e_u_d_o_-_p_r_i_m_e_s, will also satisfy it. However, if the test is
repeated a large (say 100) number of times, then the chance of a
composite being accepted is very small. Nonetheless, there are some
numbers, called _s_t_r_o_n_g _p_s_e_u_d_o_-_p_r_i_m_e_s, which pass this test for all
n. Thus other tests have been suggested, which reduce further the
chance of such numbers being accepted as primes. These include the
SSoolloovvaayy SSttrraasssseenn TTeesstt:
-- 1100 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
Choose _a at random from 1<_a<_n-1
Accept _n if:
_G_C_D(_a,_n)=1,
and (_a_n__)(_m_o_d_n)=_a(___n__-2__1__)__(_m_o_d_n)
where (_a_n__)(_m_o_d_n)istheJacobisymbol
Otherwise reject.
and the RRaabbiinn TTeesstt:
Let _n=2_r_d+1, where _d is odd.
Choose _a at random from 1<_a<_n-1
Accept _n if either:
_a_d=1(_m_o_d_n)
or
_a2_j_d=-1(_m_o_d_n),
forsome_j:0<=_j<_r
Otherwise reject.
These tests may be iterated by repeating them for different, ran-
domly chosen values of _a.
The major algorithms for primality testing are described in
several recent papers:
+o Solovay et al. [SoS77] present the fast Solovay-Strassen prob-
abilistic test for primality.
+o Couvreur et al. [CoQ82] survey all the techniques known in
1982, summarizing their various advantages and disadvantages
(a good overview paper).
+o Adams et al. [AdS82] present a new set of pseudo-prime tests
which result in fewer inaccurate results.
+o Adleman et al. [APR83] present the very fast algorithm for
accurate (non-probabilistic) determination on whether a number
is prime or not. It uses a series of pseudo-prime tests to
narrow the range of possible divisors of a number, which is
then exhaustively searched to determine whether there are any
factors. It has an upper bound on the running time for a num-
ber _p of:
-- 1111 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
(log_p)_clogloglog_p,forsomeconstant_c
An improved version of the algorithm is presented by Cohen et
al. [CoL84], along with results from a practical implementa-
tion.
Having found the primes _p and _q, and obtained _R=_p_q, and
selected an encryption exponent _e (which may well be constant for
all users, but otherwise will usually be some small prime, see
[QuC82], [Wil86], [KBS86b], and [KBS86a]), the final stage of the
key generation process to find a number _d such that
_d_e==1(_m_o_d|o(_R)), 0<=_d<_R.
The usual technique suggested for performing this calculation is to
use an extended Euclid's algorithm for finding the _G_C_D(|o(_R),_e), as
described in Knuth [Knu81]. Alternatives to this include using a
binary algorithm, also described in Knuth [Knu81], and extended by
Purdy [Pur83] who describes a variant using carry-save hardware
which can perform the GCD calculation in time _O(_n) with _O(_n) gates;
or a systolic algorithm developed by Brent and modified by
Bojanczyk [BoB86], which provides a similar level of performance.
33.. RRSSAA IImmpplleemmeennttaattiioonn iinn PPrraaccttiiccee
The better known RSA implementations (either in full, or of
modular exponentiation implementations), may be considered in two
groups, software implementations on existing computers, and spe-
cialized hardware implementations:
33..11.. SSooffttwwaarree IImmpplleemmeennttaattiioonnss ooff RRSSAA
A number of software implementations of RSA, typically
encrypting at 1-10 bits/second, have been attempted. These include:
+o Schaller et al. [ScA83] describe an implementation for a Z-80
microprocessor of a system where RSA is used to distribute
keys for a private-key cryptosystem (DES), along with the
associated name and key servers, and message flow control, as
an experiment in the use of such a system; Similar schemes are
described by Jackson et al. [JMN84] and Muller-Schloer
[Mul83].
+o Coyne [Coy85] describes a full RSA system, used to provide
secure electronic mail, which has been implemented at the
Basser Dept. of Computer Science at the University of Sydney.
+o Aruliah [Aru85] presents an ISO Standard Pascal implementation
which has been fully tested on a number of systems, and which
it is suggested may be used as a comparison suite for other
implementations.
+o Burton [Bur84] gives code in Ratfor which implements a full
RSA system.
-- 1122 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
33..22.. HHaarrddwwaarree IImmpplleemmeennttaattiioonnss ooff RRSSAA
A number of hardware implementations of RSA have been
attempted. These include:
+o Rivest [Riv80]. This is a VLSI implementation using a 512 bit
ALU providing modular arithmetic including exponentiation,
prime number generation, GCD calculation, extraction of a
small common divisor if one exists, and RSA key generation.
It could achieve an encryption rate of 1200 bits/second.
+o Rieden et al. [RSW82] report on the work done by a team at
Sandia National Labs in the US, designing a radiation hardened
LSI chip set, which provided exponentiation of 336-bit mes-
sages at a rate of 430 bits/sec, using two chips.
+o British Telecom have released advanced information on a
256-bit RSA exponentiator chip [BBB86], known as the METEOR
VLSI Public Key Device, which will be capable of encrypting at
9600 bits/second.
+o Serpell et al. [SBC84], describe a hybrid cryptographic system
which uses RSA to distribute keys which are then used in a
private-key (B-Crypt scheme). The RSA calculations are done on
the British Telecom hardware exponentiation chip.
+o Orton et al. [ORS87] present some results from the implementa-
tion of a modular RSA chip which may be stacked to obtain
longer bit-lengths.
+o Janiak [Jan87] describes a chip set, based on the TMS320C10
DSP processor, which can exponentiate 640 bit numbers in 1.7
secs. It is designed to be used as an auxiliary processor in
PC's and Workstations.
44.. CCoonncclluussiioonn
In this paper, various alternative methods for implementing
the RSA cryptosystem have been outlined. These may be generally
classified as conventional multi-precision arithmetic with modulo
reduction, performed either in software on conventional machines,
or specially designed long bit-length hardware; specialized multi-
plication algorithms which may be optimized for use on high-speed
Digital Signal Processing (DSP) chips; and the use of carry-save
multiplication techniques on specially designed chips. The latter
two techniques in particular seem worthy of further investigation,
and hold promise that the efficiency of the RSA system may be sig-
nificantly improved.
RReeffeerreenncceess
[AdS82] W. Adams and D. Shanks, "Strong Primality Tests That Are
Not Sufficient," _M_a_t_h_e_m_a_t_i_c_s _o_f _C_o_m_p_u_t_a_t_i_o_n, vol. 39,
-- 1133 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
no. 159, pp. 255-300, |July| 1982.
[APR83] L. M. Adleman, C. Pomerance and R. S. Rumlely, "On
Distinguishing Prime Numbers from Composite Numbers,"
_A_n_n_a_l_s _o_f _M_a_t_h_e_m_a_t_i_c_s, vol. 117, pp. 173-206, 1983.
[AHU74] A. V. Aho, J. E. Hopcroft and J. D. Ullman, _T_h_e _D_e_s_i_g_n
_a_n_d _A_n_a_l_y_s_i_s _o_f _C_o_m_p_u_t_e_r _A_l_g_o_r_i_t_h_m_s, |Addison Wesley,
Reading, Mass., 1974.
[Aru85] A. A. Aruliah, "_P_a_s_c_a_l _I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _t_h_e _R_S_A
_A_l_g_o_r_i_t_h_m," NPL-DITC-66/85, National Physical Lab., Div.
of Information Technology and Computing, Teddington
(England), |September.| 1985.
[Bla83] G. R. Blakley, "A Computer Algorithm for Calculating the
Product AB Modulo M," _I_E_E_E _|_T_r_a_n_s_._| _o_n _C_o_m_p_u_t_e_r_s, vol.
C-32, no. 5, pp. 497-500, |May| 1983.
[BoB86] A. W. Bojanczyk and R. P. Brent, "A Systolic Algorithm
for Extended GCD Computation," in _|_P_r_o_c_._| _9_t_h _A_u_s_t_r_a_l_i_a_n
_C_o_m_p_u_t_e_r _S_c_i_e_n_c_e _|_C_o_n_f_._|, pp. 129-137, ANU, Canberra,
Australia, |January.| 29-31, 1986.
[Bon84] D. J. Bond, "Practical Primality Testing," in _|_P_r_o_c_._| _o_f
_|_I_n_t_._| _|_C_o_n_f_._| _o_n _S_e_c_u_r_e _C_o_m_m_u_n_i_c_a_t_i_o_n _S_y_s_t_e_m_s, pp.
50-53, IEE, 22-23 |February.| 1984.
[Bre86] R. P. Brent, "Some Integer Factorization Algorithms using
Elliptic Curves," in _|_P_r_o_c_._| _9_t_h _A_u_s_t_r_a_l_i_a_n _C_o_m_p_u_t_e_r
_S_c_i_e_n_c_e _|_C_o_n_f_._|, pp. 149-163, ANU, Canberra, Australia,
|January.| 29-31, 1986.
[Bri82] E. F. Brickell, "A Fast Modular Multiplication Algorithm
with Application to Two Key Cryptography," in _A_d_v_a_n_c_e_s
_i_n _C_r_y_p_t_o_l_o_g_y _- _|_P_r_o_c_._| _o_f _C_r_y_p_t_o _8_2, D. Chaum, R. L.
Rivest and A. T. Sherman (editors), pp. 51-60, Plenum
Press, New York, |August.| 23-25, 1982.
[BBB86] "_A_d_v_a_n_c_e _I_n_f_o_r_m_a_t_i_o_n _o_n _t_h_e _M_e_t_e_o_r _V_L_S_I _P_u_b_l_i_c _K_e_y
_D_e_v_i_c_e," TA10.1.2, British Telecom, Ipswich UK, |April.|
1986.
[Bro87] L. Brown, "_P_u_b_l_i_c_-_K_e_y _C_r_y_p_t_o_s_y_s_t_e_m_s _- _A _B_i_b_l_i_o_g_r_a_p_h_y,"
Tech. Rep. CS87/6, |Dept. of Computer Science, UC UNSW,
Australian Defence Force Academy|, Canberra, Australia,
July 17, 1987.
[Bur84] C. E. Burton, "RSA: A Public Key Cryptosystem, Parts 1
and 2," _D_o_c_t_o_r _D_o_b_b_s _J_o_u_r_n_a_l, vol. 9, no. 3,4, pp.
16-43, 32-59, March - April 1984.
[Chi84] H. R. Chivers, "A Practical Fast Exponentiation Algorithm
for Public-Key," in _|_P_r_o_c_._| _o_f _|_I_n_t_._| _|_C_o_n_f_._| _o_n _S_e_c_u_r_e
_C_o_m_m_u_n_i_c_a_t_i_o_n _S_y_s_t_e_m_s, pp. 54-58, IEE, 22-23 |February.|
1984.
-- 1144 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
[CoL84] H. Cohen and H. W. Lenstra, "Primality Testing and Jacobi
Sums," _M_a_t_h_e_m_a_t_i_c_s _o_f _C_o_m_p_u_t_a_t_i_o_n, vol. 42, no. 165, pp.
297-330, |January.| 1984.
[CoQ82] C. Couvreur and J. J. Quisquater, "An Introduction to
Fast Generation of Large Prime Numbers," _P_h_i_l_i_p_s _J_.
_R_e_s_e_a_r_c_h, vol. 37, no. 5-6, pp. 231-264, Philips Research
Labs, Brussels, Belgium, 1982.
[Coy85] M. Coyne, "_A_n _I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _a _P_u_b_l_i_c _K_e_y
_C_r_y_p_t_o_s_y_s_t_e_m," Honours Thesis, Basser Department of
Computer Science, University of Sydney, Australia,
|November.| 13, 1985.
[Dav83] D. W. Davies, "Applying the RSA Digital Signature to
Electronic Mail," _|_I_E_E_E _C_o_m_p_u_t_e_r_s_|, vol. C-16, no. 2,
pp. 55-62, |February.| 1983.
[DHS84] J. A. Davies, D. B. Holdridge and G. J. Simmons, "Status
Report on Factoring (At the Sandia National
Laboratories)," in _A_d_v_a_n_c_e_s _I_n _C_r_y_p_t_o_l_o_g_y_: _|_P_r_o_c_._| _o_f
_E_u_r_o_c_r_y_p_t _8_4, Lecture Notes in Computer Science, no. 209,
T. Beth, N. Cot and I. Ingemarsson (editors), pp.
183-215, |Springer Verlag|, Berlin, |April.| 1984.
[Den83] D. E. Denning, "Protecting Public Keys and Signature
Keys," _|_I_E_E_E _C_o_m_p_u_t_e_r_s_|, vol. C-16, no. 2, pp. 27-35,
|February.| 1983.
[Den84] D. E. Denning, "Digital Signatures with RSA and Other
Public-Key Cryptosystems," _|_C_o_m_m_. _o_f _t_h_e _A_C_M_|, vol. 27,
no. 4, pp. 388-392, |April.| 1984.
[Gor84] J. Gordon, "Strong RSA keys," _E_l_e_c_t_r_o_n_i_c_s _L_e_t_t_e_r_s, vol.
20, no. 12, pp. 514-516, |June| 1984.
[Hen81] P. S. Henry, "Fast Decryption Algorithm for the Knapsack
Cryptographic System," _B_e_l_l _S_y_s_t_e_m _T_e_c_h_n_i_c_a_l _J_o_u_r_n_a_l,
vol. 60, no. 5, pp. 767-773, Bell Labs, May-June 1981.
[JMN84] A. M. Jackson, N. A. McEvoy and B. B. Newman, "The
Project Universe Experiment," in _|_P_r_o_c_._| _1_9_8_4 _|_I_n_t_._|
_|_C_o_n_f_._| _o_n _S_e_c_u_r_e _C_o_m_m_u_n_i_c_a_t_i_o_n_s _S_y_s_t_e_m_s, pp. 14-19, IEE,
London, 22-23 |February.| 1984.
[Jan87] A. Janiak, "_E_D_S_P_-_2_4_7 _E_n_c_r_y_p_t_i_o_n _C_h_i_p_s_e_t," Product Info,
Kencomp - A Division of Janiak Holdings Pty. Ltd., 2
Regina Street, Kilsyth, Vic 3137, Australia, 1987.
[KBS86a] D. Khoo, D. J. Bird and J. Seberry, "_E_n_c_r_y_p_t_i_o_n _e_x_p_o_n_e_n_t_s
_3 _a_n_d _t_h_e _s_e_c_u_r_i_t_y _o_f _R_S_A," Tech. Rep. 275, Basser Dept.
of Computer Science, Uni. of Sydney, 1986.
[KBS86b] D. Khoo, D. J. Bird and J. Seberry, "Encryption Exponent
3 and the Security of RSA," _E_u_r_o_c_r_y_p_t _8_6 _- _A_b_s_t_r_a_c_t_s _o_f
_P_a_p_e_r_s , Linkoping, Sweden, 20-22 May 1986. also Journal
-- 1155 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
Cryptology (to appear).
[Knu81] D. E. Knuth, _S_e_m_i_n_u_m_e_r_i_c_a_l _A_l_g_o_r_i_t_h_m_s, The Art of
Computer Programming, no. 2, |Addison Wesley, London,
1981.
[Koh78] L. M. Kohnfelder, "On the Signature Reblocking Problem in
Public-Key Cryptosystems," _|_C_o_m_m_. _o_f _t_h_e _A_C_M_|, vol. 21,
no. 2, p. 179, |February.| 1978.
[MoA85] S. B. Mohan and B. S. Adiga, "Fast Algorithms for
Implementing RSA Public Key Cryptosystem," _E_l_e_c_t_r_o_n_i_c_s
_L_e_t_t_e_r_s, vol. 21, no. 17, p. 761, |August.| 1985.
[Mul83] C. Muller-Schloer, "A Microprocessor-based
Cryptoprocessor," _I_E_E_E _M_i_c_r_o , pp. 5-15, |October.|
1983.
[ORS87] G. A. Orton, M. P. Roy, P. A. Scott, L. E. Peppard and S.
E. Tavares, _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y _- _C_R_Y_P_T_O_'_8_6, Lecture
Notes in Computer Science, no. 263, pp. 277-301,
|Springer Verlag|, Berlin, 1987.
[Pom81] C. Pomerance, "Recent Developments in Primality Testing,"
_T_h_e _M_a_t_h_e_m_a_t_i_c_a_l _I_n_t_e_l_l_i_g_e_n_c_e_r, vol. 3, no. 3, pp.
97-105, 1981.
[Pur83] G. B. Purdy, "A Carry-Free Algorithm for Finding the
Greatest Common Divisor of Two Integers," _C_o_m_p_u_t_e_r_s _a_n_d
_M_a_t_h_e_m_a_t_i_c_s _w_i_t_h _A_p_p_l_i_c_a_t_i_o_n_s, vol. 9, no. 2, pp.
311-316, 1983.
[QuC82] J. J. Quisquater and C. Couvreur, "Fast Decipherment
Algorithm for RSA Public-Key Cryptosystem," _E_l_e_c_t_r_o_n_i_c_s
_L_e_t_t_e_r_s, vol. 18, no. 21, pp. 905-907, |October.| 1982.
[RSW82] R. F. Rieden, J. B. Snyder, R. J. Widman and W. J.
Barnard, "_A _T_w_o_-_C_h_i_p _I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _t_h_e _R_S_A _P_u_b_l_i_c_-_K_e_y
_E_n_c_r_y_p_t_i_o_n _A_l_g_o_r_i_t_h_m," SAND82-0994, Sandia National
Labs, Albuquerque, NM 87185, 1982.
[RSA78] R. L. Rivest, A. Shamir and L. Adleman, "A Method for
Obtaining Digital Signatures and Public-Key
Cryptosystems," _|_C_o_m_m_. _o_f _t_h_e _A_C_M_|, vol. 21, no. 2, pp.
120-126, |February.| 1978.
[Riv80] R. L. Rivest, "A Description of a Single-Chip
Implementation of the RSA cipher," _L_a_m_b_d_a, vol. 1, no.
3, pp. 14-18, 1980.
[Riv84] R. L. Rivest, "RSA Chips (Past/Present/Future)," in
_A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y_: _|_P_r_o_c_._| _o_f _E_u_r_o_c_r_y_p_t _8_4, Lecture
Notes In Computer Science, no. 209, T. Beth (editor), pp.
159-165, |Springer Verlag|, Berlin, |April.| 1984.
-- 1166 --
TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm
[ScA83] G. C. Schaller and J. F. Arnold, "Microprocessor Based
Implementation of the Public Key Encryption System," in
_1_9_t_h _|_I_n_t_._| _E_l_e_c_t_r_o_n_i_c_s _C_o_n_v_e_n_t_i_o_n _a_n_d _E_x_h_i_b_i_t_i_o_n_. _D_i_g_e_s_t
_o_f _P_a_p_e_r_s_., pp. 300-302, IREE, 1983.
[Sch84] M. R. Schroeder, _N_u_m_b_e_r _T_h_e_o_r_y _i_n _S_c_i_e_n_c_e _a_n_d
_C_o_m_m_u_n_i_c_a_t_i_o_n, Springer-Verlag, 1984.
[SBC84] S. C. Serpell, C. B. Brookson and B. L. Clark, "A
Prototype Encryption System Using Public Key," in
_A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y_: _|_P_r_o_c_._| _o_f _C_r_y_p_t_o _8_4, Lecture
Notes in Computer Science, no. 196, G. R. Blakley and D.
Chaum (editors), pp. 3-9, |Springer Verlag|, Berlin,
|August.| 1984.
[SoS77] R. Solovay and V. Strassen, "A Fast Monte Carlo Test for
Primality," _S_i_a_m _J_o_u_r_n_a_l _o_n _C_o_m_p_u_t_i_n_g, vol. 6, no. 1,
pp. 84-85, |March.| 1977.
[Wil86] H. C. Williams, "An M^3 Public-Key Encryption Scheme,"
in _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y _- _C_r_y_p_t_o _8_5, Lecture Notes in
Computer Science, no. 218, H. C. Williams (editor), pp.
358-368, |Springer Verlag|, Berlin, 1986.
[WiC81] R. Willoner and I. Chen, "An Algorithm for Modular
Exponentiation," in _|_P_r_o_c_._| _5_t_h _S_y_m_p_o_s_i_u_m _o_n _C_o_m_p_u_t_e_r
_A_r_i_t_h_m_e_t_i_c_, _1_9_8_1_., pp. 135-138, 1981.
[Zim86] P. Zimmermann, "A Proposed Standard Format for RSA
Cryptosystems," _C_o_m_p_u_t_e_r, vol. 19, no. 9, pp. 21-34,
|September.| 1986.
[|||86] "_I_n_f_o_r_m_a_t_i_o_n _P_r_o_c_e_s_s_i_n_g _- _D_a_t_a _C_r_y_p_t_o_g_r_a_p_h_i_c _T_e_c_h_n_i_q_u_e_s _-
_S_p_e_c_i_f_i_c_a_t_i_o_n _o_f _D_E_A _2_, _a _P_u_b_l_i_c _K_e_y _A_l_g_o_r_i_t_h_m," ISO/TC
97/SC 20 N110, |ISO|, Paris, |January.| 1986.
-- 1177 --