_T_e_c_h_n_i_q_u_e_s _f_o_r _I_m_p_l_e_m_e_n_t_i_n_g _t_h_e _R_S_A _P_u_b_l_i_c _K_e_y _C_r_y_p_t_o_s_y_s_t_e_m _1_. _b_y Lawrence Brown, Dept. of Computer Science, University College, UNSW, Australian Defence Force Academy, Canberra. ACT. 2600. AAbbssttrraacctt The RSA cryptosystem is generating consider- able interest at present, as a scheme with a wide range of potential uses. Its practical use has, however, been limited because of the time taken to perform its essential encryption and decryption functions. Whilst the times achieved to date are sufficiently short for key exchange schemes, they are not fast enough for encrypting data on a commu- nications link of any reasonable speed. Much research is being conducted at present to find techniques of improving the time efficiency of the algorithms underlying RSA. This paper summarizes some of the techniques suggested to date, illus- trating the advantages and disadvantages of each, and indicating the improvement in performance expected from each. It is concluded that with the development of specially designed chips using carry-save multiplication techniques, and of algo- rithms optimized for use on high-speed Digital Sig- nal Processing (DSP) chips, that the efficiency of RSA implementations will improve significantly. ____________________ 1 This is an expanded version of a paper presented at the "Se- cure Data Communications Workshop", organized by the IEEE Communi- cations Section, Victorian Branch, held at the Town House, Mel- bourne, on 31st July 1987. TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm -- 11 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm 11.. IInnttrroodduuccttiioonn Rivest, Shamir and Adleman [RSA78] outlined the first practi- cal and fully-fledged public key cryptosystem (RSA) 2, the security of which is based on the difficulty of factoring large integers. This system is still the most highly regarded, and is the one most commonly suggested for use in public key systems. It has been sug- gested as an ISO standard [|||86], and has been placed as an appendix in a number of other standards related to key exchange in Electronic Funds Transfer (EFT) schemes, in the authentication of certificates issued by an Electronic Directory Service, and in the authentication of participants in wide-area networks. Zimmerman [Zim86] outlines in detail a set of protocols which could form the basis of an industry standard for RSA usage. The major advantages of RSA are that it does not increase the size of the message and that it may be used to provide both privacy and authentication (by providing a digital signature) over communi- cations links. Its main disadvantage is its reliance on modular exponentiation, an operation which is still moderately time consum- ing, though much effort is being expended to improve this. This paper will outlined how encryption, decryption, and key generation for RSA are performed in theory; will describe some of the techniques which may be used in practice; the tradeoff's between them; and some actual implementations. 22.. RRSSAA -- TThhee TThheeoorryy 22..11.. AAnn OOvveerrvviieeww ooff RRSSAA In the RSA system, each user wishing to receive encrypted mes- sages, performs the following steps: +o selects two large primes _p and _q at random, each about 100 digits long (for key lengths believed secure at present), +o calculates the modulus of their system _R=_p_q, +o selects at random a number _e with _e<_R,_G_C_D(_e,|o(_R))=1, where |o(_R)=(_p-1)(_q-1), is the Euler totient function of _R, (and _e is usually small, for example 3, and may be fixed for all users [QuC82], [Wil86], [KBS86b], and [KBS86a]) +o solves for the unique _d in the congruence _d_e==1(_m_o_d|o(_R)),0<=_d<_R, +o broadcasts the public key _E=<_e,_R> ____________________ 2 for an extensive bibliography of references on Public-Key schemes, see [Bro87] -- 22 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm +o and keeps secure the private key _D=<_d>. A sender can encrypt a message _M (expressed as some large integer of a size less than the length of the modulus used in the system, and created by concatenating characters together to form a bit- string of the appropriate length) to form the ciphertext _C using the public encryption key <_e,_R> by calculating: _C=_M_e(_m_o_d_R), 0<=_C<_R The receiver can then decipher this using his private decryption key <_d> by calculating: _M=_C_d(_m_o_d_R) The system relies on the following well known number theoretic identity (Euler's generalization of Fermat's theorem): If_G_C_D(_X,_R)=1, then_X|o(_R)==1(_m_o_d_R) Using this identity, it may be shown3 that: _C_d=_M_e_d==_M1+_n|o(_R)== _M1_M_n|o(_R)==_M1==_M(_m_o_d_R). RSA can also be used as a signature scheme since the encryp- tion and decryption operations are commutative. That is, a sender can create a signature _S for a message _M using their private key <_d> (which only they have access to), by calculating: _S=_M_d(_m_o_d_R) which may be verified by anyone using the sender's public key <_e,_R> (obtained from the public directory), by calculating: _M=_S_e(_m_o_d_R), The signature can then be encrypted using the recipient's public key, in order to privately transfer it to the recipient. There is a minor problem known as the signature reblocking problem, which must be overcome when the same scheme is used for both secrecy and signatures. It occurs because each user has mod- ulii _R of different sizes, and hence a block of ciphertext in one user's field may not be a valid block in the other users field. This becomes a problem when user _A signs a message for _B, and then wishes to protect its privacy by encrypting with _B's public key. If user _A's field is larger than _B's, and the signed message also ____________________ 3 see Rivest, Shamir and Adleman's original paper [RSA78] for the full derivation -- 33 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm turns out to be too large (ie _R_B<_S_A_B<_R_A), then it cannot be directly encrypted (ie _S_e_A_B_B(_m_o_d_R_B) cannot be decrypted correctly). Several strategies are suggested to overcome this problem. These include using separate modulii for secrecy and for signatures, guaranteeing that the signature modulus is always smaller than any secrecy modulus [RSA78], and reversing the order of signature and secrecy encryptions, since these functions are commutative [Koh78]. Alternatively the problem may be avoided by using a hashing func- tion to compress the message into a single block, which is then signed with the private key only. Because a one-way function is used to perform the hashing operation, no significant information is conveyed about the content of the message. Hence no secrecy step is needed. [Dav83], [Den83], [Den84]. 22..22.. RRSSAA EEnnccrryyppttiioonn aanndd DDeeccrryyppttiioonn Fundamental to the RSA algorithm is the modular exponentiation of very large integers, of up to 200 digits (or 600 bits) in length. These very large numbers are necessary to ensure the secu- rity of the scheme, as the effort needed to break the scheme (by factoring the modulus _R), is of _O(_e_f(_R)), an exponential of a func- tion _f on the modulus _R. The best known algorithm, at present, for factorizing a number _R with a large prime factor _p is a variation on Lenstra's elliptic curve algorithm proposed by Brent [Bre86], which runs in time _O(_e___\___/__l__nl___pn__l_p__n__l__n___p__) and which leads to the choice of R around 200 digits (or 600 bits) in length [DHS84], [Bre86] (see Table 1 for a comparison of effort for encryption and factoring for various key lengths)4. ____________________ 4 currently 1e+14 operations is regarded as a limit for computa- tional feasibility. Times may be derived by assuming one instruc- tion per microsecond on a 1 MIP machine, and knowing that there are 3e+13 microseconds in a year. -- 44 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm +---------------------------------------------+ | Table 1 - Complexity of Factorization and | | Exponentiation Operations Modulo R | +------------+---------------+----------------+ |Digits in R | Operations in | Operations in | | | Factorization | Exponentiation | +------------+---------------+----------------+ | 20 | 7.2e+03 | 8.0e+03 | | 40 | 3.1e+06 | 6.4e+04 | | 60 | 4.6e+08 | 2.1e+05 | | 80 | 3.7e+10 | 5.1e+05 | | 100 | 1.9e+12 | 1.0e+06 | | 120 | 7.6e+13 | 1.7e+06 | | 140 | 2.3e+15 | 2.7e+06 | | 160 | 5.9e+16 | 4.0e+06 | | 180 | 1.2e+18 | 5.8e+06 | | 200 | 2.3e+19 | 8.0e+06 | +------------+---------------+----------------+ Since no conventional computers support integer calculations of this size, the problem can be approached from two basic direc- tions. One is to break the numbers into words of a size supported by the hardware on current machines, and carry out each step of the calculation in a number of stages using multi-precision arithmetic. The other technique is to design and build specialized hardware which supports the large wordsize required. Both of these methods have been attempted, and both can utilize some of the techniques described below. In multi-precision arithmetic the calculation is done on each digit of the number in succession, with a carry or borrow being propagated into the next word of the number. The size of the digit used depends on the complexity of the algorithm, and on the word size of the machine (or specialized hardware) being used. The algo- rithms needed to do this are well known, and may be found in Knuth [Knu81], and also in Schroeder's excellent introductory text to number theory applications [Sch84]. The basic multi-precision mul- tiplication algorithm they describe, runs in _O(_n2) time on a number _n bits in length. This corresponds to performing long-hand multi- plication of decimal integers (see Fig. 1); which can be done either right-to-left, or left-to-right. The former is more usual, though the latter is often more convenient for computer implementa- tion. -- 55 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm +--------------------------------------------+ | Fig. 1 - Long-hand Multiplication | +---------------------+----------------------+ | | | | 452 | 452 | | * 125 | * 125 | |______ | ______ | | 2260 (* 5*1) | 452 (* 1) | | | ______ | | | 4520 (pp *10) | |+ 9040 (* 2*10) | + 904 (* 2) | |______ | ______ | |=11300 (pp step 1) | = 5424 (pp step 2) | | | ______ | | | 54240 (pp *10) | |+45200 (* 1*100) | + 2264 (* 5) | |______ | ______ | |=56500 (Result) | =56500 (Result) | |______ | ______ | | | | | Right-To-Left | Left-To_Right | | Evaluation | Evaluation | +---------------------+----------------------+ Since in RSA, both encryption and decryption involve modular multi- plication, a division stage must be performed after the multiplica- tion to extract the remainder (mod R). Several techniques have been suggested to minimize the time taken by this process. One is to precompute a multiplication table for one of the operands, and use it to perform the multiplication with a series of additions and table lookup steps, as described by Henry [Hen81] and Quisquater et al. [QuC82]. Another is to perform the modulo reduction in parallel with each stage of the multiplication calculation, by removing the most significant digit, and adding its remainder (mod R) to the partial sum. This process achieves a saving by a factor of about 6, and I strongly recommend that it be used. It is based on a result pre- sented in the paper by Chivers [Chi84], which states that _n-_1 { _> _a_i_b_i}(_m_o_d_m)== _i=0 _n-_2 { _> _a_i_b_i+_a_n-1_b_n-1(_m_o_d_j_m)}(_m_o_d_m),forall_j _i=0 The technique is described by Chivers [Chi84], Blakley [Bla83], and Willoner et al. [WiC81] and produces a result that it is never larger than _n+ln_n bits long. In more detail, it involves removing the most significant digit (MSD) of the result, and adding its remainder mod _m to the number represented by the remaining digits. To use it, a table is constructed of these remainders for all pos- sible digits. The size of the table obviously depends on the size -- 66 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm of the base used. Typically bases of 1, 4, or 8 bits are used, leading to table sizes of 1, 16, or 256 entries. See Figs. 2 and 3 for examples (which are presented in base 10). This process can be generalized to an algorithm for reducing any number larger than the modulus down until it has no more digits than the modulus, by iter- ating over all digits larger than the modulus length, and scaling up the remainders appropriately. +----------------------------------------+ |Fig. 2 - Number and Remainder (mod 517) | +----------------+-----------------------+ | n | n (mod 517) | +----------------+-----------------------+ | 1000 | 483 | | 2000 | 449 | | 3000 | 415 | | 4000 | 381 | | 5000 | 347 | | 6000 | 313 | | 7000 | 279 | | 8000 | 245 | | 9000 | 211 | +----------------+-----------------------+ +-----------------------------------------+ | Fig 3 - Example of Modulo Reduction | +-----------------------------------------+ | 4120 (mod 517) | |______ | | 0120 (remove MSD) | | + 381 (add remainder of 4000 mod 517) | |______ | | = 501 (result mod 517) | | | +-----------------------------------------+ Some, or all of the above techniques could be used by an implementor in conjunction with a conventional multi-precision arithmetic package. Alternatively one of several fast multiplica- tion algorithms, followed by a fast division algorithm designed for this application, could be used. This technique is outlined by Mohan et al. [MoA85]. Asymptotically, the fastest integer multi- plication algorithm known, is the Schonhage-Strassen Algorithm [AHU74]. It is one of a family of algorithms which break the num- ber into blocks, and treat each block as the coefficients of a polynomial. The polynomials are evaluated at suitable points, the products calculated, and the product polynomial found by interpola- tion. Any one of a number of interpolation schemes can be used, the fastest uses the Discrete Fourier Transform and the Convolution Theorem, and can multiply two integers of length _n in _O(_nlog_n) steps. An iterative process is then used to perform the division step, optimized by careful choice of the modulus R. This technique -- 77 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm seems particularly well suited for implementation on current digi- tal signal processing chips, and is at present being investigated further. A further alternative, where special hardware is used, is to have special carry-save multiplication hardware, where the addition of carries between digits in the partial result is delayed until the end of the multiplication step by using two registers for each number. An extension of this technique which incorporates modulo reduction similar to that described above, is presented by Brickell [Bri82] and performs modular multiplication in time _O(_n+7) with _O(_n) gates. This is the preferred approach when special hardware is to be designed. It is possible to multiply two numbers in _O(log_n) time, using _O(_n2) gates. Currently for the size of numbers proposed with RSA, this appears to be unachievable. However, as circuit densities increase, it will undoubtably lead to faster implementations. Modular exponentiation is performed using a series of modular multiplications, in the well known square and multiply algorithm (see example below) [Knu81]. 8053==((((((802.80)2)2).80)2)2.80) This appears to be an essentially sequential process whose time efficiency cannot be generally improved upon beyond the improve- ments in the underlying multiplication stages. In the specific case of RSA some optimizations can be made. Encryption can be performed using a small exponent _e (possibly fixed), which limits the number of steps needed significantly (since if _e only has say 8 significant bits, then at most 8 square and multiply steps need to be done). The decryption key however, must be long to ensure security. However in the decryption stage the calculations can be done independently (mod p) and (mod q), and the results combined using the Chinese Remainder Theorem. Since _p and _q are each about 1/2 the size of _R, this results in a saving of 1/2 to 3/4 of the time, depending on the multiplication algorithm used. In more detail, the Chinese Remainder Theorem states that there can only be one solution to a set of congruences, given that the modulii are relatively prime [Knu81]. In this specific case, form: _M1=_M(_m_o_d_p)=(_C(_m_o_d_p))_d(_m_o_d(_p-1)) _M2=_M(_m_o_d_q)=(_C(_m_o_d_q))_d(_m_o_d(_q-1)) Then the system of equations: _M=_M1(_m_o_d_p), _M=_M2(_m_o_d_q) will have a single solution (being the required message M by con- struction) of the form: -- 88 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm _M=[((_M2+_q-_M1)_u)(_m_o_d_q)]_p+_M1, where_p_u(_m_o_d_q)==1 The multiplicative inverse _u is calculated at key generation time, and is kept along with the secret decryption key. This method is described by Quisquater et al. [QuC82]. These various choices (outlined above), which can be made in implementing RSA, and the tradeoff's between them are summarized in a significant overview paper by Rivest [Riv84]. They are: IInntteeggeerr MMuullttiipplliiccaattiioonn of two _n-bit integers requires time: +o _O(_n2) on a sequential computer using a standard algorithm +o _O(_n) on special purpose serial/parallel multiplication hard- ware, with _O(_n) gates, +o _O(log_n) on special purpose parallel/parallel multiplication hardware with _O(_n2) gates. MMoodduullaarr MMuullttiipplliiccaattiioonn of two _n-bit integers modulo a third _n-bit integer requires the same order of time, but with a constant speedup factor if special algorithms are used as described previ- ously. MMoodduullaarr EExxppoonneennttiiaattiioonn requires _O(_n) multiplications. A constant improvement in time efficiency may be obtained by using either: +o a left-to-right exponentiation algorithm and precomputing pow- ers of M, or alternatively +o a right-to-left exponentiation algorithm and performing the two modular multiplications in parallel. IImmpplleemmeennttaattiioonn ttrriicckkss appropriate to performing RSA encryption and decryption include: +o using a fast clock rate +o using a short encryption exponent _e (however the decryption exponent _d must be long) +o using the Chinese Remainder Theorem to operate modulo _p and _q separately. 22..33.. RRSSAA KKeeyy GGeenneerraattiioonn The first stage of the key generation process for RSA involves finding two large primes _p and _q, to form the modulus of the system _R=_p_q. The choice of these primes is critical to the success of RSA as a secure cryptosystem. Several proposed attacks on RSA are based on the possibility of a bad choice in these primes. The necessary -- 99 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm conditions imposed on the primes are summarized in a significant paper by Gordon [Gor84], which is recommended as a basis on which the primality selection routines should be based. He designates as _s_t_r_o_n_g _p_r_i_m_e_s those numbers which satisfy the following condi- tions: +o _p is a large prime +o _p is chosen at random from a random seed +o _p has a specified number of bits +o _p-1 has a large prime factor _r +o _p+1 has a large prime factor _s +o _r-1 has a large prime factor _t These conditions imply the following: _p=2_j_r+1 _p=2_k_s-1 _r=2_L_t+1 for some _j,_k,_L, where _r,_s,_t are primes. The algorithm consists of finding primes _s,_t, and then constructing _r and _p from them. The details needed to do this are given in Gordon's paper [Gor84]. One of the crucial steps though, is to first identify a prime number. Much recent work has been done on primality testing algo- rithms, and near polynomial-time algorithms have now been devel- oped. It thus appears at this time, that primality testing is sig- nificantly easier than factorization of large integers (and hence that RSA is both reasonably secure and efficient). A survey and overview of the recent developments in primality testing is given by Pomerance [Pom81], who outlines the various algorithms in use. Nearly all of these make use of strong pseudo-prime tests, which if failed prove that the integer is definitely composite. An analysis of a couple of pseudo-prime tests in practical use is given in Bond [Bon84]. The original test is derived from Fermat's Little Theorem, which states that: If _n is prime, then for all _a:0<_a<_n _a_n-1=1_m_o_d_n. All prime numbers will satisfy this test. Some composite numbers, termed _p_s_e_u_d_o_-_p_r_i_m_e_s, will also satisfy it. However, if the test is repeated a large (say 100) number of times, then the chance of a composite being accepted is very small. Nonetheless, there are some numbers, called _s_t_r_o_n_g _p_s_e_u_d_o_-_p_r_i_m_e_s, which pass this test for all n. Thus other tests have been suggested, which reduce further the chance of such numbers being accepted as primes. These include the SSoolloovvaayy SSttrraasssseenn TTeesstt: -- 1100 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm Choose _a at random from 1<_a<_n-1 Accept _n if: _G_C_D(_a,_n)=1, and (_a_n__)(_m_o_d_n)=_a(___n__-2__1__)__(_m_o_d_n) where (_a_n__)(_m_o_d_n)istheJacobisymbol Otherwise reject. and the RRaabbiinn TTeesstt: Let _n=2_r_d+1, where _d is odd. Choose _a at random from 1<_a<_n-1 Accept _n if either: _a_d=1(_m_o_d_n) or _a2_j_d=-1(_m_o_d_n), forsome_j:0<=_j<_r Otherwise reject. These tests may be iterated by repeating them for different, ran- domly chosen values of _a. The major algorithms for primality testing are described in several recent papers: +o Solovay et al. [SoS77] present the fast Solovay-Strassen prob- abilistic test for primality. +o Couvreur et al. [CoQ82] survey all the techniques known in 1982, summarizing their various advantages and disadvantages (a good overview paper). +o Adams et al. [AdS82] present a new set of pseudo-prime tests which result in fewer inaccurate results. +o Adleman et al. [APR83] present the very fast algorithm for accurate (non-probabilistic) determination on whether a number is prime or not. It uses a series of pseudo-prime tests to narrow the range of possible divisors of a number, which is then exhaustively searched to determine whether there are any factors. It has an upper bound on the running time for a num- ber _p of: -- 1111 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm (log_p)_clogloglog_p,forsomeconstant_c An improved version of the algorithm is presented by Cohen et al. [CoL84], along with results from a practical implementa- tion. Having found the primes _p and _q, and obtained _R=_p_q, and selected an encryption exponent _e (which may well be constant for all users, but otherwise will usually be some small prime, see [QuC82], [Wil86], [KBS86b], and [KBS86a]), the final stage of the key generation process to find a number _d such that _d_e==1(_m_o_d|o(_R)), 0<=_d<_R. The usual technique suggested for performing this calculation is to use an extended Euclid's algorithm for finding the _G_C_D(|o(_R),_e), as described in Knuth [Knu81]. Alternatives to this include using a binary algorithm, also described in Knuth [Knu81], and extended by Purdy [Pur83] who describes a variant using carry-save hardware which can perform the GCD calculation in time _O(_n) with _O(_n) gates; or a systolic algorithm developed by Brent and modified by Bojanczyk [BoB86], which provides a similar level of performance. 33.. RRSSAA IImmpplleemmeennttaattiioonn iinn PPrraaccttiiccee The better known RSA implementations (either in full, or of modular exponentiation implementations), may be considered in two groups, software implementations on existing computers, and spe- cialized hardware implementations: 33..11.. SSooffttwwaarree IImmpplleemmeennttaattiioonnss ooff RRSSAA A number of software implementations of RSA, typically encrypting at 1-10 bits/second, have been attempted. These include: +o Schaller et al. [ScA83] describe an implementation for a Z-80 microprocessor of a system where RSA is used to distribute keys for a private-key cryptosystem (DES), along with the associated name and key servers, and message flow control, as an experiment in the use of such a system; Similar schemes are described by Jackson et al. [JMN84] and Muller-Schloer [Mul83]. +o Coyne [Coy85] describes a full RSA system, used to provide secure electronic mail, which has been implemented at the Basser Dept. of Computer Science at the University of Sydney. +o Aruliah [Aru85] presents an ISO Standard Pascal implementation which has been fully tested on a number of systems, and which it is suggested may be used as a comparison suite for other implementations. +o Burton [Bur84] gives code in Ratfor which implements a full RSA system. -- 1122 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm 33..22.. HHaarrddwwaarree IImmpplleemmeennttaattiioonnss ooff RRSSAA A number of hardware implementations of RSA have been attempted. These include: +o Rivest [Riv80]. This is a VLSI implementation using a 512 bit ALU providing modular arithmetic including exponentiation, prime number generation, GCD calculation, extraction of a small common divisor if one exists, and RSA key generation. It could achieve an encryption rate of 1200 bits/second. +o Rieden et al. [RSW82] report on the work done by a team at Sandia National Labs in the US, designing a radiation hardened LSI chip set, which provided exponentiation of 336-bit mes- sages at a rate of 430 bits/sec, using two chips. +o British Telecom have released advanced information on a 256-bit RSA exponentiator chip [BBB86], known as the METEOR VLSI Public Key Device, which will be capable of encrypting at 9600 bits/second. +o Serpell et al. [SBC84], describe a hybrid cryptographic system which uses RSA to distribute keys which are then used in a private-key (B-Crypt scheme). The RSA calculations are done on the British Telecom hardware exponentiation chip. +o Orton et al. [ORS87] present some results from the implementa- tion of a modular RSA chip which may be stacked to obtain longer bit-lengths. +o Janiak [Jan87] describes a chip set, based on the TMS320C10 DSP processor, which can exponentiate 640 bit numbers in 1.7 secs. It is designed to be used as an auxiliary processor in PC's and Workstations. 44.. CCoonncclluussiioonn In this paper, various alternative methods for implementing the RSA cryptosystem have been outlined. These may be generally classified as conventional multi-precision arithmetic with modulo reduction, performed either in software on conventional machines, or specially designed long bit-length hardware; specialized multi- plication algorithms which may be optimized for use on high-speed Digital Signal Processing (DSP) chips; and the use of carry-save multiplication techniques on specially designed chips. The latter two techniques in particular seem worthy of further investigation, and hold promise that the efficiency of the RSA system may be sig- nificantly improved. RReeffeerreenncceess [AdS82] W. Adams and D. Shanks, "Strong Primality Tests That Are Not Sufficient," _M_a_t_h_e_m_a_t_i_c_s _o_f _C_o_m_p_u_t_a_t_i_o_n, vol. 39, -- 1133 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm no. 159, pp. 255-300, |July| 1982. [APR83] L. M. Adleman, C. Pomerance and R. S. Rumlely, "On Distinguishing Prime Numbers from Composite Numbers," _A_n_n_a_l_s _o_f _M_a_t_h_e_m_a_t_i_c_s, vol. 117, pp. 173-206, 1983. [AHU74] A. V. Aho, J. E. Hopcroft and J. D. Ullman, _T_h_e _D_e_s_i_g_n _a_n_d _A_n_a_l_y_s_i_s _o_f _C_o_m_p_u_t_e_r _A_l_g_o_r_i_t_h_m_s, |Addison Wesley, Reading, Mass., 1974. [Aru85] A. A. Aruliah, "_P_a_s_c_a_l _I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _t_h_e _R_S_A _A_l_g_o_r_i_t_h_m," NPL-DITC-66/85, National Physical Lab., Div. of Information Technology and Computing, Teddington (England), |September.| 1985. [Bla83] G. R. Blakley, "A Computer Algorithm for Calculating the Product AB Modulo M," _I_E_E_E _|_T_r_a_n_s_._| _o_n _C_o_m_p_u_t_e_r_s, vol. C-32, no. 5, pp. 497-500, |May| 1983. [BoB86] A. W. Bojanczyk and R. P. Brent, "A Systolic Algorithm for Extended GCD Computation," in _|_P_r_o_c_._| _9_t_h _A_u_s_t_r_a_l_i_a_n _C_o_m_p_u_t_e_r _S_c_i_e_n_c_e _|_C_o_n_f_._|, pp. 129-137, ANU, Canberra, Australia, |January.| 29-31, 1986. [Bon84] D. J. Bond, "Practical Primality Testing," in _|_P_r_o_c_._| _o_f _|_I_n_t_._| _|_C_o_n_f_._| _o_n _S_e_c_u_r_e _C_o_m_m_u_n_i_c_a_t_i_o_n _S_y_s_t_e_m_s, pp. 50-53, IEE, 22-23 |February.| 1984. [Bre86] R. P. Brent, "Some Integer Factorization Algorithms using Elliptic Curves," in _|_P_r_o_c_._| _9_t_h _A_u_s_t_r_a_l_i_a_n _C_o_m_p_u_t_e_r _S_c_i_e_n_c_e _|_C_o_n_f_._|, pp. 149-163, ANU, Canberra, Australia, |January.| 29-31, 1986. [Bri82] E. F. Brickell, "A Fast Modular Multiplication Algorithm with Application to Two Key Cryptography," in _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y _- _|_P_r_o_c_._| _o_f _C_r_y_p_t_o _8_2, D. Chaum, R. L. Rivest and A. T. Sherman (editors), pp. 51-60, Plenum Press, New York, |August.| 23-25, 1982. [BBB86] "_A_d_v_a_n_c_e _I_n_f_o_r_m_a_t_i_o_n _o_n _t_h_e _M_e_t_e_o_r _V_L_S_I _P_u_b_l_i_c _K_e_y _D_e_v_i_c_e," TA10.1.2, British Telecom, Ipswich UK, |April.| 1986. [Bro87] L. Brown, "_P_u_b_l_i_c_-_K_e_y _C_r_y_p_t_o_s_y_s_t_e_m_s _- _A _B_i_b_l_i_o_g_r_a_p_h_y," Tech. Rep. CS87/6, |Dept. of Computer Science, UC UNSW, Australian Defence Force Academy|, Canberra, Australia, July 17, 1987. [Bur84] C. E. Burton, "RSA: A Public Key Cryptosystem, Parts 1 and 2," _D_o_c_t_o_r _D_o_b_b_s _J_o_u_r_n_a_l, vol. 9, no. 3,4, pp. 16-43, 32-59, March - April 1984. [Chi84] H. R. Chivers, "A Practical Fast Exponentiation Algorithm for Public-Key," in _|_P_r_o_c_._| _o_f _|_I_n_t_._| _|_C_o_n_f_._| _o_n _S_e_c_u_r_e _C_o_m_m_u_n_i_c_a_t_i_o_n _S_y_s_t_e_m_s, pp. 54-58, IEE, 22-23 |February.| 1984. -- 1144 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm [CoL84] H. Cohen and H. W. Lenstra, "Primality Testing and Jacobi Sums," _M_a_t_h_e_m_a_t_i_c_s _o_f _C_o_m_p_u_t_a_t_i_o_n, vol. 42, no. 165, pp. 297-330, |January.| 1984. [CoQ82] C. Couvreur and J. J. Quisquater, "An Introduction to Fast Generation of Large Prime Numbers," _P_h_i_l_i_p_s _J_. _R_e_s_e_a_r_c_h, vol. 37, no. 5-6, pp. 231-264, Philips Research Labs, Brussels, Belgium, 1982. [Coy85] M. Coyne, "_A_n _I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _a _P_u_b_l_i_c _K_e_y _C_r_y_p_t_o_s_y_s_t_e_m," Honours Thesis, Basser Department of Computer Science, University of Sydney, Australia, |November.| 13, 1985. [Dav83] D. W. Davies, "Applying the RSA Digital Signature to Electronic Mail," _|_I_E_E_E _C_o_m_p_u_t_e_r_s_|, vol. C-16, no. 2, pp. 55-62, |February.| 1983. [DHS84] J. A. Davies, D. B. Holdridge and G. J. Simmons, "Status Report on Factoring (At the Sandia National Laboratories)," in _A_d_v_a_n_c_e_s _I_n _C_r_y_p_t_o_l_o_g_y_: _|_P_r_o_c_._| _o_f _E_u_r_o_c_r_y_p_t _8_4, Lecture Notes in Computer Science, no. 209, T. Beth, N. Cot and I. Ingemarsson (editors), pp. 183-215, |Springer Verlag|, Berlin, |April.| 1984. [Den83] D. E. Denning, "Protecting Public Keys and Signature Keys," _|_I_E_E_E _C_o_m_p_u_t_e_r_s_|, vol. C-16, no. 2, pp. 27-35, |February.| 1983. [Den84] D. E. Denning, "Digital Signatures with RSA and Other Public-Key Cryptosystems," _|_C_o_m_m_. _o_f _t_h_e _A_C_M_|, vol. 27, no. 4, pp. 388-392, |April.| 1984. [Gor84] J. Gordon, "Strong RSA keys," _E_l_e_c_t_r_o_n_i_c_s _L_e_t_t_e_r_s, vol. 20, no. 12, pp. 514-516, |June| 1984. [Hen81] P. S. Henry, "Fast Decryption Algorithm for the Knapsack Cryptographic System," _B_e_l_l _S_y_s_t_e_m _T_e_c_h_n_i_c_a_l _J_o_u_r_n_a_l, vol. 60, no. 5, pp. 767-773, Bell Labs, May-June 1981. [JMN84] A. M. Jackson, N. A. McEvoy and B. B. Newman, "The Project Universe Experiment," in _|_P_r_o_c_._| _1_9_8_4 _|_I_n_t_._| _|_C_o_n_f_._| _o_n _S_e_c_u_r_e _C_o_m_m_u_n_i_c_a_t_i_o_n_s _S_y_s_t_e_m_s, pp. 14-19, IEE, London, 22-23 |February.| 1984. [Jan87] A. Janiak, "_E_D_S_P_-_2_4_7 _E_n_c_r_y_p_t_i_o_n _C_h_i_p_s_e_t," Product Info, Kencomp - A Division of Janiak Holdings Pty. Ltd., 2 Regina Street, Kilsyth, Vic 3137, Australia, 1987. [KBS86a] D. Khoo, D. J. Bird and J. Seberry, "_E_n_c_r_y_p_t_i_o_n _e_x_p_o_n_e_n_t_s _3 _a_n_d _t_h_e _s_e_c_u_r_i_t_y _o_f _R_S_A," Tech. Rep. 275, Basser Dept. of Computer Science, Uni. of Sydney, 1986. [KBS86b] D. Khoo, D. J. Bird and J. Seberry, "Encryption Exponent 3 and the Security of RSA," _E_u_r_o_c_r_y_p_t _8_6 _- _A_b_s_t_r_a_c_t_s _o_f _P_a_p_e_r_s , Linkoping, Sweden, 20-22 May 1986. also Journal -- 1155 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm Cryptology (to appear). [Knu81] D. E. Knuth, _S_e_m_i_n_u_m_e_r_i_c_a_l _A_l_g_o_r_i_t_h_m_s, The Art of Computer Programming, no. 2, |Addison Wesley, London, 1981. [Koh78] L. M. Kohnfelder, "On the Signature Reblocking Problem in Public-Key Cryptosystems," _|_C_o_m_m_. _o_f _t_h_e _A_C_M_|, vol. 21, no. 2, p. 179, |February.| 1978. [MoA85] S. B. Mohan and B. S. Adiga, "Fast Algorithms for Implementing RSA Public Key Cryptosystem," _E_l_e_c_t_r_o_n_i_c_s _L_e_t_t_e_r_s, vol. 21, no. 17, p. 761, |August.| 1985. [Mul83] C. Muller-Schloer, "A Microprocessor-based Cryptoprocessor," _I_E_E_E _M_i_c_r_o , pp. 5-15, |October.| 1983. [ORS87] G. A. Orton, M. P. Roy, P. A. Scott, L. E. Peppard and S. E. Tavares, _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y _- _C_R_Y_P_T_O_'_8_6, Lecture Notes in Computer Science, no. 263, pp. 277-301, |Springer Verlag|, Berlin, 1987. [Pom81] C. Pomerance, "Recent Developments in Primality Testing," _T_h_e _M_a_t_h_e_m_a_t_i_c_a_l _I_n_t_e_l_l_i_g_e_n_c_e_r, vol. 3, no. 3, pp. 97-105, 1981. [Pur83] G. B. Purdy, "A Carry-Free Algorithm for Finding the Greatest Common Divisor of Two Integers," _C_o_m_p_u_t_e_r_s _a_n_d _M_a_t_h_e_m_a_t_i_c_s _w_i_t_h _A_p_p_l_i_c_a_t_i_o_n_s, vol. 9, no. 2, pp. 311-316, 1983. [QuC82] J. J. Quisquater and C. Couvreur, "Fast Decipherment Algorithm for RSA Public-Key Cryptosystem," _E_l_e_c_t_r_o_n_i_c_s _L_e_t_t_e_r_s, vol. 18, no. 21, pp. 905-907, |October.| 1982. [RSW82] R. F. Rieden, J. B. Snyder, R. J. Widman and W. J. Barnard, "_A _T_w_o_-_C_h_i_p _I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _t_h_e _R_S_A _P_u_b_l_i_c_-_K_e_y _E_n_c_r_y_p_t_i_o_n _A_l_g_o_r_i_t_h_m," SAND82-0994, Sandia National Labs, Albuquerque, NM 87185, 1982. [RSA78] R. L. Rivest, A. Shamir and L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," _|_C_o_m_m_. _o_f _t_h_e _A_C_M_|, vol. 21, no. 2, pp. 120-126, |February.| 1978. [Riv80] R. L. Rivest, "A Description of a Single-Chip Implementation of the RSA cipher," _L_a_m_b_d_a, vol. 1, no. 3, pp. 14-18, 1980. [Riv84] R. L. Rivest, "RSA Chips (Past/Present/Future)," in _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y_: _|_P_r_o_c_._| _o_f _E_u_r_o_c_r_y_p_t _8_4, Lecture Notes In Computer Science, no. 209, T. Beth (editor), pp. 159-165, |Springer Verlag|, Berlin, |April.| 1984. -- 1166 -- TTeecchhnniiqquueess ffoorr IImmpplleemmeennttiinngg tthhee RRSSAA PPuubblliicc KKeeyy CCrryyppttoossyysstteemm [ScA83] G. C. Schaller and J. F. Arnold, "Microprocessor Based Implementation of the Public Key Encryption System," in _1_9_t_h _|_I_n_t_._| _E_l_e_c_t_r_o_n_i_c_s _C_o_n_v_e_n_t_i_o_n _a_n_d _E_x_h_i_b_i_t_i_o_n_. _D_i_g_e_s_t _o_f _P_a_p_e_r_s_., pp. 300-302, IREE, 1983. [Sch84] M. R. Schroeder, _N_u_m_b_e_r _T_h_e_o_r_y _i_n _S_c_i_e_n_c_e _a_n_d _C_o_m_m_u_n_i_c_a_t_i_o_n, Springer-Verlag, 1984. [SBC84] S. C. Serpell, C. B. Brookson and B. L. Clark, "A Prototype Encryption System Using Public Key," in _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y_: _|_P_r_o_c_._| _o_f _C_r_y_p_t_o _8_4, Lecture Notes in Computer Science, no. 196, G. R. Blakley and D. Chaum (editors), pp. 3-9, |Springer Verlag|, Berlin, |August.| 1984. [SoS77] R. Solovay and V. Strassen, "A Fast Monte Carlo Test for Primality," _S_i_a_m _J_o_u_r_n_a_l _o_n _C_o_m_p_u_t_i_n_g, vol. 6, no. 1, pp. 84-85, |March.| 1977. [Wil86] H. C. Williams, "An M^3 Public-Key Encryption Scheme," in _A_d_v_a_n_c_e_s _i_n _C_r_y_p_t_o_l_o_g_y _- _C_r_y_p_t_o _8_5, Lecture Notes in Computer Science, no. 218, H. C. Williams (editor), pp. 358-368, |Springer Verlag|, Berlin, 1986. [WiC81] R. Willoner and I. Chen, "An Algorithm for Modular Exponentiation," in _|_P_r_o_c_._| _5_t_h _S_y_m_p_o_s_i_u_m _o_n _C_o_m_p_u_t_e_r _A_r_i_t_h_m_e_t_i_c_, _1_9_8_1_., pp. 135-138, 1981. [Zim86] P. Zimmermann, "A Proposed Standard Format for RSA Cryptosystems," _C_o_m_p_u_t_e_r, vol. 19, no. 9, pp. 21-34, |September.| 1986. [|||86] "_I_n_f_o_r_m_a_t_i_o_n _P_r_o_c_e_s_s_i_n_g _- _D_a_t_a _C_r_y_p_t_o_g_r_a_p_h_i_c _T_e_c_h_n_i_q_u_e_s _- _S_p_e_c_i_f_i_c_a_t_i_o_n _o_f _D_E_A _2_, _a _P_u_b_l_i_c _K_e_y _A_l_g_o_r_i_t_h_m," ISO/TC 97/SC 20 N110, |ISO|, Paris, |January.| 1986. -- 1177 --