This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

The ECL exposes routines for constructing and converting curve
parameters for internal use.

The floating point code of the ECL provides algorithms for performing
elliptic-curve point multiplications in floating point.

The point multiplication algorithms perform calculations almost
exclusively in floating point for efficiency, but have the same
(integer) interface as the ECL for compatibility and to be easily
wired in to the ECL. Please see the README file (not this README.FP
file) for information on wiring-in.

This has been implemented for 3 curves as specified in [1]:
    secp160r1
    secp192r1
    secp224r1

RATIONALE
=========
Calculations are done in the floating-point unit (FPU) since it gives
better performance on UltraSPARC III chips: the FPU allows for faster
multiplication than the integer unit. The integer unit has a longer
multiplication instruction latency and does not allow full pipelining,
as described in [2]. Since performance is an important selling feature
of Elliptic Curve Cryptography (ECC), this implementation was created.

DATA REPRESENTATION
===================
Data is primarily represented in an array of double-precision floating
point numbers. Generally, each array element has 24 bits of precision
(i.e. it is x * 2^y, where x is an integer of at most 24 bits and y is
a nonnegative integer), although the actual implementation details are
more complicated.
e.g. one way to store an 80-bit number might be:
    double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^72 };
See the section ARITHMETIC OPERATIONS for more details.

This implementation assumes that the floating-point unit rounding mode
is round-to-even as specified in IEEE 754 (as opposed to chopping,
rounding up, or rounding down). When subtracting integers represented
as arrays of floating point numbers, some coefficients (array elements)
may become negative. This effectively gives an extra bit of precision
that is important for correctness in some cases.

The described number representation limits the size of integers to 1023
bits. This is due to the upper bound of 1024 on the exponent of a
double-precision floating point number, as specified in IEEE 754.
However, this is acceptable for ECC key sizes of the foreseeable future.

DATA STRUCTURES
===============
For more information on coordinate representations, see [3].

ecfp_aff_pt
-----------
Affine EC Point Representation. This is the basic representation
(x, y) of an elliptic curve point.

ecfp_jac_pt
-----------
Jacobian EC Point. This stores a point as (X, Y, Z), where the affine
point corresponds to (X/Z^2, Y/Z^3). This allows for fewer inversions
in calculations.

ecfp_chud_pt
------------
Chudnovsky Jacobian Point. This representation stores a point as
(X, Y, Z, Z^2, Z^3), the same as the Jacobian representation but also
storing Z^2 and Z^3 for faster point additions.

ecfp_jm_pt
----------
Modified Jacobian Point.
This representation stores a point as (X, Y, Z, a*Z^4), the same as the
Jacobian representation but also storing a*Z^4 for faster point
doublings. Here "a" is the linear coefficient of x in the curve
equation.

EC_group_fp
-----------
Stores information on the elliptic curve group for floating point
calculations. Contains curve-specific information, as well as function
pointers to routines, allowing different optimizations to be easily
wired in. This should be made accessible from an ECGroup for the
floating point implementations of point multiplication.

POINT MULTIPLICATION ALGORITHMS
===============================
Elliptic curve point multiplication can be done at a higher level,
orthogonal to the implementation of point additions and point
doublings. There are a variety of algorithms that can be used.

The following algorithms have been implemented:

    4-bit Window (Jacobian Coordinates)
    Double & Add (Jacobian & Affine Coordinates)
    5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian)

Currently, the fastest algorithm for multiplying a generic point is the
5-bit Non-Adjacent Form.

See comments in ecp_fp.c for more details and references.

SOURCE / HEADER FILES
=====================

ecp_fp.c
--------
Main source file for floating point calculations. Contains routines to
convert from floating point to integer (mp_int format), point
multiplication algorithms, and several other routines.

ecp_fp.h
--------
Main header file. Contains most constants used and function prototypes.
ecp_fp[160, 192, 224].c
-----------------------
Source files for specific curves. These contain curve-specific code,
such as specialized reduction based on the field-defining prime, as
well as code wiring in different algorithms and optimizations.

ecp_fpinc.c
-----------
Source file that is included by ecp_fp[160, 192, 224].c. This generates
functions with different preprocessor-defined names and loop
iterations, allowing for static linking and strong compiler
optimizations without code duplication.

TESTING
=======
The test suite can be found in ecl/tests/ecp_fpt. It tests and times
the different algorithms for the curves implemented.

ARITHMETIC OPERATIONS
---------------------
The primary operations in ECC over prime fields are modular arithmetic,
i.e. n * m (mod p) and n + m (mod p). In this implementation,
multiplication, addition, and reduction are implemented as separate
functions. This enables computation of formulae with fewer reductions,
e.g.
    (a * b) + (c * d) (mod p)
rather than:
    ((a * b) (mod p)) + ((c * d) (mod p)) (mod p)
This takes advantage of the fact that the double-precision mantissa can
hold numbers up to 2^53, i.e. it has some leeway to store larger
intermediate numbers. See the section FLOATING POINT PRECISION for
further detail.

Multiplication
--------------
Multiplication is implemented in standard polynomial multiplication
fashion: the terms of opposite factors are pairwise multiplied and
added together appropriately.
Note that the result requires twice as many doubles for storage, since
the bit size of the product is twice that of the multiplicands.
e.g. suppose we have double n[3], m[3], r[6] and want to calculate
r = n * m:
    r[0] = n[0] * m[0]
    r[1] = n[0] * m[1] + n[1] * m[0]
    r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0]
    r[3] = n[1] * m[2] + n[2] * m[1]
    r[4] = n[2] * m[2]
    r[5] = 0 (This is used later to hold spillover from r[4]; see
    tidying in the reduction section.)

Addition
--------
Addition is done term by term. The only caveat is to be careful with
the number of terms that need to be added. When adding results of
multiplication (before reduction), twice as many terms need to be added
together. This is done in the addLong function.
e.g. for double n[4], m[4], r[4]: r = n + m
    r[0] = n[0] + m[0]
    r[1] = n[1] + m[1]
    r[2] = n[2] + m[2]
    r[3] = n[3] + m[3]

Modular Reduction
-----------------
For the curves implemented, fast reduction for Generalized Mersenne
Primes is possible, as described in [4]. For the floating point
implementation, a significant step of the reduction process is tidying:
the propagation of carry bits from low-order to high-order coefficients
to reduce the precision of each coefficient to 24 bits. This is done by
adding and then subtracting ecfp_alpha, a large floating point number
that induces precision roundoff. See [5] for more details on tidying
using floating point arithmetic.
e.g.
suppose we have r = 961838 * 2^24 + 519308.
Then if we set alpha = 3 * 2^51 * 2^24,
    FP(FP(r + alpha) - alpha) = 961838 * 2^24,
because the precision of the intermediate results is limited. Our
values of alpha are chosen to truncate to a desired number of bits.

The reduction is then performed as in [4], adding multiples of the
prime p.
e.g. suppose we are working with polynomials in 10^2. Take the number
    2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95,
stored in 5 elements for the coefficients of 10^0, 10^2, ..., 10^8.
We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1.
We can subtract off from the higher terms:
      (2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95)
    - (2 * 10^2) * (10^6 - 2 * 10^4 + 1)
    = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95
    = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - 15 * (10^6 - 2 * 10^4 + 1)
    = 83 * 10^4 + 21 * 10^2 + 80

Integrated Example
------------------
This example shows how multiplication, addition, tidying, and reduction
work together in our modular arithmetic. It is simplified from the
actual implementation, but should convey the main concepts.
Working with polynomials in 10^2 and with p as in the prior example,
let
    a = 16 * 10^4 + 53 * 10^2 + 33
    b = 81 * 10^4 + 31 * 10^2 + 49
    c = 22 * 10^4 +  0 * 10^2 + 95
and suppose we want to compute a * b + c mod p.
We first do a multiplication: a * b =
    0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617
Then we add in c before doing reduction, giving a * b + c =
    0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then perform a tidying on the upper half of the terms:
     0 * 10^10 +        1296 * 10^8 + 4789 * 10^6
     0 * 10^10 + (1296 + 47) * 10^8 +   89 * 10^6
     0 * 10^10 +        1343 * 10^8 +   89 * 10^6
    13 * 10^10 +          43 * 10^8 +   89 * 10^6
which gives us
    13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then reduce modulo p, as in the reduction example above:
      13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
    - (13 * 10^4 * p)
      69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712
    - (69 * 10^2 * p)
      227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712
    - (227 * p)
      5563 * 10^4 + 3551 * 10^2 + 1485
Finally, we tidy to get the precision of each term down to 2 digits:
    5563 * 10^4 + 3565 * 10^2 + 85
    5598 * 10^4 +   65 * 10^2 + 85
    55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85
and perform another reduction step:
    - (55 * p)
      208 * 10^4 + 65 * 10^2 + 30
There may be a small number of further reductions to do at this point,
but these are typically done only at the end, when converting from
floating point to an integer unit representation.

FLOATING POINT PRECISION
========================
This section discusses the precision of floating point numbers, which
anyone writing new formulae or supporting a larger bit size should be
aware of.
The danger is that an intermediate result may be required to store a
mantissa larger than 53 bits, which would cause error by rounding off.

Note that tidying with the IEEE rounding mode set to round-to-even
allows negative numbers, which effectively reduces the size of the
double mantissa to 23 bits, since tidying rounds the mantissa to the
nearest multiple of 2^24, leaving a remainder roughly between -2^23
and 2^23. A multiplication increases the bit size to 2^46 * n, where n
is the number of doubles used to store a number. For the 224-bit
curve, n = 10. This gives doubles of size 5 * 2^47. Adding two of
these doubles gives a result of size 5 * 2^48, which is less than
2^53, so this is safe. Similar analysis can be done for other formulae
to ensure numbers remain below 2^53.

Extended-Precision Floating Point
---------------------------------
Some platforms, notably x86 Linux, may use an extended-precision
floating point representation that has a 64-bit mantissa [6]. Although
this implementation is optimized for the IEEE standard 53-bit
mantissa, it should work with the 64-bit mantissa. A check is done at
run-time in the function ec_set_fp_precision, which detects if the
precision is greater than 53 bits and runs code for the 64-bit
mantissa accordingly.

REFERENCES
==========
[1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain
    Parameters", Sept. 20, 2000. www.secg.org
[2] Sun Microsystems Inc., UltraSPARC III Cu User's Manual, Version
    1.0, May 2002, Table 4.4.
[3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve
    Exponentiation Using Mixed Coordinates".
[4] Henk C.A. van Tilborg, Generalized Mersenne Prime.
    http://www.win.tue.nl/~henkvt/GenMersenne.pdf
[5] Daniel J. Bernstein, Floating-Point Arithmetic and Message
    Authentication, Journal of Cryptology, March 2000, Section 2.
[6] Daniel J. Bernstein, Floating-Point Arithmetic and Message
    Authentication, Journal of Cryptology, March 2000, Section 2
    Notes.