security/nss/lib/freebl/ecl/README.FP

author: Michael Schloh von Bennewitz <michael@schloh.com>
date:   Wed, 31 Dec 2014 06:09:35 +0100

Cloned upstream origin tor-browser at tor-browser-31.3.0esr-4.5-1-build1,
revision ID fc1c9ff7c1b2defdbc039f12214767608f46423f, for hacking purposes.

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

The ECL exposes routines for constructing and converting curve
parameters for internal use.

The floating point code of the ECL provides algorithms for performing
elliptic-curve point multiplications in floating point.

The point multiplication algorithms perform calculations almost
exclusively in floating point for efficiency, but have the same
(integer) interface as the ECL for compatibility, so they can easily
be wired in to the ECL. Please see the README file (not this README.FP
file) for information on wiring-in.

This has been implemented for three curves, as specified in [1]:
	secp160r1
	secp192r1
	secp224r1

RATIONALE
=========
Calculations are done in the floating-point unit (FPU) because it
gives better performance than the integer unit on UltraSPARC III
chips: the FPU multiplies faster, while the integer unit has a longer
multiplication instruction latency and does not allow full pipelining,
as described in [2].
Since performance is an important selling feature of Elliptic Curve
Cryptography (ECC), this implementation was created.

DATA REPRESENTATION
===================
Data is primarily represented in an array of double-precision floating
point numbers. Generally, each array element holds 24 bits of precision
(i.e. it is x * 2^y, where x is an integer of at most 24 bits and y is
a nonnegative integer), although the actual implementation details are
more complicated.

e.g. a way to store an 80 bit number might be:
	double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^72 };
See the section ARITHMETIC OPERATIONS for more details.
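
As a sketch of this representation (Python for illustration, since
Python floats are IEEE 754 doubles; `limbs_to_int` is a hypothetical
helper, and uniform 24-bit limbs at exponents 0, 24, 48, ... are
assumed, whereas the real layout is more complicated as noted above):

```python
# Recombine 24-bit floating point limbs into the integer they represent.
# Each limb is x * 2^(24*i) with x an integer of at most 24 bits, so each
# limb is an exact double and the recombination via Python's
# arbitrary-precision integers loses nothing.
limbs = [632613.0 * 2**0, 329841.0 * 2**24, 9961.0 * 2**48, 51.0 * 2**72]

def limbs_to_int(limbs):
    return sum(int(limb) for limb in limbs)
```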

This implementation assumes that the floating-point unit rounding mode
is round-to-even, as specified in IEEE 754 (as opposed to chopping,
rounding up, or rounding down).
When subtracting integers represented as arrays of floating point
numbers, some coefficients (array elements) may become negative.
This effectively gives an extra bit of precision that is important
for correctness in some cases.

The described number representation limits the size of integers to
1023 bits, due to the upper bound of 1023 on the exponent of a
double-precision floating point number as specified in IEEE 754.
However, this is acceptable for ECC key sizes of the foreseeable future.

DATA STRUCTURES
===============
For more information on coordinate representations, see [3].

ecfp_aff_pt
-----------
Affine EC Point Representation. This is the basic
representation (x, y) of an elliptic curve point.

ecfp_jac_pt
-----------
Jacobian EC Point. This stores a point as (X, Y, Z), where
the affine point corresponds to (X/Z^2, Y/Z^3). This allows
for fewer inversions in calculations.

ecfp_chud_pt
------------
Chudnovsky Jacobian Point. This representation stores a point
as (X, Y, Z, Z^2, Z^3), the same as the Jacobian representation
but also storing Z^2 and Z^3 for faster point additions.

ecfp_jm_pt
----------
Modified Jacobian Point. This representation stores a point
as (X, Y, Z, a*Z^4), the same as the Jacobian representation but
also storing a*Z^4 for faster point doublings. Here "a" denotes
the coefficient of the linear term x in the curve equation.

EC_group_fp
-----------
Stores information about the elliptic curve group for floating
point calculations. Contains curve-specific information, as
well as function pointers to routines, allowing different
optimizations to be easily wired in.
This should be made accessible from an ECGroup for the floating
point implementations of point multiplication.
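
As a concrete illustration of the Jacobian correspondence (X/Z^2,
Y/Z^3), here is a Python sketch; `jac_to_affine` is a hypothetical
helper, the toy modulus and point values are made up for illustration,
and the real code works on floating point limbs rather than Python
integers:

```python
# Convert a Jacobian point (X, Y, Z) back to affine (X/Z^2, Y/Z^3) mod p.
def jac_to_affine(X, Y, Z, p):
    zinv = pow(Z, -1, p)             # modular inverse of Z (Python 3.8+)
    zinv2 = zinv * zinv % p
    return (X * zinv2 % p, Y * zinv2 * zinv % p)

# Round-trip: lift an affine pair (x, y) to Jacobian form with an
# arbitrary Z, then convert back.
p, x, y, Z = 23, 3, 10, 5            # toy values, not a real curve point
X, Y = x * Z * Z % p, y * Z**3 % p
assert jac_to_affine(X, Y, Z, p) == (x, y)
```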

POINT MULTIPLICATION ALGORITHMS
===============================
Elliptic curve point multiplication can be implemented at a higher
level, orthogonal to the implementation of point additions and point
doublings. A variety of algorithms can be used.

The following algorithms have been implemented:

	4-bit Window (Jacobian Coordinates)
	Double & Add (Jacobian & Affine Coordinates)
	5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian)

Currently, the fastest algorithm for multiplying a generic point
is the 5-bit Non-Adjacent Form.

See the comments in ecp_fp.c for more details and references.
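
For illustration, the width-5 NAF recoding that drives the last
algorithm can be sketched as follows (Python; this is a sketch of the
standard windowed-NAF recoding, not the actual ecp_fp.c code, and
`naf` is a hypothetical name):

```python
# Width-w non-adjacent form (NAF) recoding of a scalar k.
# Each nonzero digit is odd and lies in (-2^(w-1), 2^(w-1)), and any two
# nonzero digits are separated by at least w - 1 zeros, which reduces
# the number of point additions needed to compute k*P.
def naf(k, w=5):
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << w)            # k mod 2^w
            if d >= 1 << (w - 1):
                d -= 1 << w             # pick the signed representative
            k -= d                      # now k is divisible by 2^w
        else:
            d = 0
        digits.append(d)                # least significant digit first
        k >>= 1
    return digits
```

Reconstructing sum(d * 2^i) over the digits recovers k, which is a
convenient self-check.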

SOURCE / HEADER FILES
=====================

ecp_fp.c
--------
Main source file for floating point calculations. Contains routines
to convert from floating point to integer (mp_int format), point
multiplication algorithms, and several other routines.

ecp_fp.h
--------
Main header file. Contains most constants used and function prototypes.

ecp_fp[160, 192, 224].c
-----------------------
Source files for specific curves. These contain curve-specific code,
such as specialized reduction based on the field-defining prime, and
code wiring in the different algorithms and optimizations.

ecp_fpinc.c
-----------
Source file that is included by ecp_fp[160, 192, 224].c. This generates
functions with different preprocessor-defined names and loop iterations,
allowing for static linking and strong compiler optimizations without
code duplication.

TESTING
=======
The test suite can be found in ecl/tests/ecp_fpt. It tests the
different algorithms for the implemented curves and collects timings.

ARITHMETIC OPERATIONS
=====================
The primary operations in ECC over prime fields are modular arithmetic,
i.e. n * m (mod p) and n + m (mod p). In this implementation,
multiplication, addition, and reduction are implemented as separate
functions. This enables computation of formulae with fewer reductions,
e.g.
	(a * b) + (c * d) (mod p)
rather than:
	((a * b) (mod p)) + ((c * d) (mod p)) (mod p)
This takes advantage of the fact that the double-precision mantissa
can hold numbers up to 2^53, i.e. it has some leeway to store larger
intermediate numbers. See the section FLOATING POINT PRECISION for
further detail.

Multiplication
--------------
Multiplication is implemented in standard polynomial multiplication
fashion: terms from opposite factors are pairwise multiplied and
added together appropriately. Note that the result requires twice
as many doubles for storage, since the bit size of the product is
twice that of the multiplicands.
e.g. suppose we have double n[3], m[3], r[6], and want to calculate
r = n * m:
	r[0] = n[0] * m[0]
	r[1] = n[0] * m[1] + n[1] * m[0]
	r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0]
	r[3] = n[1] * m[2] + n[2] * m[1]
	r[4] = n[2] * m[2]
	r[5] = 0 (used later to hold spillover from r[4]; see tidying in
	          the Modular Reduction section)
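
The r[] formulas above generalize to a short loop. As a sketch (Python;
`poly_mul` is a hypothetical name, and the sample factors are the limb
values of a and b from the Integrated Example further below):

```python
# Schoolbook polynomial multiplication of two limb arrays.
# r[i+j] accumulates n[i] * m[j]; the result needs len(n) + len(m)
# limbs, with the top limb left as spillover room (the r[5] = 0 above).
def poly_mul(n, m):
    r = [0.0] * (len(n) + len(m))
    for i, ni in enumerate(n):
        for j, mj in enumerate(m):
            r[i + j] += ni * mj
    return r
```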

Addition
--------
Addition is done term by term. The only caveat is to be careful with
the number of terms that need to be added: when adding results of
multiplication (before reduction), twice as many terms need to be
added together. This is done in the addLong function.
e.g. for double n[4], m[4], r[4]: r = n + m
	r[0] = n[0] + m[0]
	r[1] = n[1] + m[1]
	r[2] = n[2] + m[2]
	r[3] = n[3] + m[3]

Modular Reduction
-----------------
For the curves implemented, fast reduction for Generalized Mersenne
Primes is possible, as described in [4]. For the floating point
implementation, a significant step of the reduction process is
tidying: the propagation of carry bits from low-order to high-order
coefficients, to reduce the precision of each coefficient to 24 bits.
This is done by adding and then subtracting ecfp_alpha, a large
floating point number that induces precision roundoff. See [5] for
more details on tidying using floating point arithmetic.
e.g. suppose we have r = 961838 * 2^24 + 519308;
then if we set alpha = 3 * 2^51 * 2^24,
	FP(FP(r + alpha) - alpha) = 961838 * 2^24,
because the precision for the intermediate results is limited. Our
values of alpha are chosen to truncate to a desired number of bits.
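
This trick can be reproduced directly (Python floats are IEEE 754
doubles with round-to-even, matching the assumption stated earlier;
`tidy` is a hypothetical name for the add-then-subtract step):

```python
# Tidying by adding and then subtracting a large constant alpha.
# The addition pushes the low-order bits of r past the 53-bit mantissa,
# so they are rounded away; the subtraction then recovers the truncated
# high part of r exactly.
def tidy(r, alpha):
    return (r + alpha) - alpha

alpha = 3.0 * 2**51 * 2**24          # truncates to multiples of 2^24
r = 961838.0 * 2**24 + 519308.0
high = tidy(r, alpha)                # 961838 * 2^24
low = r - high                       # 519308, the low-order coefficient
```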

The reduction is then performed as in [4], adding multiples of the
prime p.
e.g. suppose we are working with polynomials in 10^2. Take the number
	2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95,
stored in 5 elements for the coefficients of 10^0, 10^2, ..., 10^8.
We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1.
We can subtract off from the higher terms:
	  (2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95)
	- (2 * 10^2) * (10^6 - 2 * 10^4 + 1)
	= 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95
	= 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - 15 * (10^6 - 2 * 10^4 + 1)
	= 83 * 10^4 + 21 * 10^2 + 80
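
The same folding can be written limb by limb. A sketch for this toy
prime (Python; `reduce_limbs` is a hypothetical name, and the limb
array holds the base-100 coefficients from the example above,
low-order first):

```python
# Fast reduction for the toy prime p = 10^6 - 2*10^4 + 1, with limbs
# holding the coefficients of 10^0, 10^2, 10^4, ... (base 100).
# Since 10^6 = 2*10^4 - 1 (mod p), a high limb c at position i can be
# folded down using c * base^i = 2c * base^(i-1) - c * base^(i-3) (mod p).
def reduce_limbs(r, deg=3):
    r = list(r)
    for i in range(len(r) - 1, deg - 1, -1):   # fold limbs of degree >= 3
        c = r[i]
        r[i] = 0
        r[i - 1] += 2 * c
        r[i - 3] -= c
    return r

limbs = [95, 23, 53, 11, 2]    # 2*10^8 + 11*10^6 + 53*10^4 + 23*10^2 + 95
reduced = reduce_limbs(limbs)  # 83*10^4 + 21*10^2 + 80, as derived above
```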

Integrated Example
------------------
This example shows how multiplication, addition, tidying, and reduction
work together in our modular arithmetic. It is simplified from the
actual implementation, but should convey the main concepts.
Working with polynomials in 10^2 and with p as in the prior example, let
	a = 16 * 10^4 + 53 * 10^2 + 33
	b = 81 * 10^4 + 31 * 10^2 + 49
	c = 22 * 10^4 +  0 * 10^2 + 95
and suppose we want to compute a * b + c (mod p).
We first do a multiplication; then a * b =
	0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617
Then we add in c before doing reduction, giving a * b + c =
	0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then perform tidying on the upper half of the terms:
	0 * 10^10 + 1296 * 10^8 + 4789 * 10^6
	= 0 * 10^10 + (1296 + 47) * 10^8 + 89 * 10^6
	= 0 * 10^10 + 1343 * 10^8 + 89 * 10^6
	= 13 * 10^10 + 43 * 10^8 + 89 * 10^6
which then gives us
	13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then reduce modulo p, similarly to the reduction example above:
	13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
	- (13 * 10^4 * p)
	= 69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712
	- (69 * 10^2 * p)
	= 227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712
	- (227 * p)
	= 5563 * 10^4 + 3551 * 10^2 + 1485
Finally, we tidy to get the precision of each term down to 2 digits:
	5563 * 10^4 + 3565 * 10^2 + 85
	= 5598 * 10^4 + 65 * 10^2 + 85
	= 55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85
and perform another reduction step:
	- (55 * p)
	= 208 * 10^4 + 65 * 10^2 + 30
There may be a small number of further reductions still to be done at
this point, but these are typically done only at the end, when
converting from floating point to the integer unit representation.
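
The whole pipeline can be sketched end to end (Python; the helper names
are hypothetical, carries are propagated with exact integer arithmetic
for clarity where the real code uses the floating point alpha trick,
and full tidies are interleaved with reductions rather than following
the exact step order of the worked example, so only congruence mod p
is checked):

```python
# End-to-end sketch of a*b + c (mod p) with base-100 limbs, for the toy
# prime p = 10^6 - 2*10^4 + 1 used in the examples above.
BASE = 100

def mul(n, m):
    r = [0] * (len(n) + len(m))        # schoolbook polynomial product
    for i, ni in enumerate(n):
        for j, mj in enumerate(m):
            r[i + j] += ni * mj
    return r

def add(n, m):
    return [x + y for x, y in zip(n, m)]

def tidy(r):
    r = list(r)                        # carry so each limb has 2 digits
    for i in range(len(r) - 1):
        carry = r[i] // BASE
        r[i] -= carry * BASE
        r[i + 1] += carry
    return r

def reduce_once(r):
    r = list(r)                        # fold degree >= 3 limbs, using
    for i in range(len(r) - 1, 2, -1): # base^3 = 2*base^2 - 1 (mod p)
        c = r[i]
        r[i] = 0
        r[i - 1] += 2 * c
        r[i - 3] -= c
    return r

a = [33, 53, 16]                       # 16*10^4 + 53*10^2 + 33
b = [49, 31, 81]
c = [95, 0, 22, 0, 0, 0]

r = add(mul(a, b), c)
r = reduce_once(tidy(r))
r = reduce_once(tidy(r))               # congruent to a*b + c (mod p)
```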

FLOATING POINT PRECISION
========================
This section discusses the precision of floating point numbers, which
anyone writing new formulae or implementing a larger bit size should
be aware of. The danger is that an intermediate result may require a
mantissa larger than 53 bits, which would introduce error through
rounding.

Note that tidying with the IEEE rounding mode set to round-to-even
allows negative coefficients, which actually reduces the size of the
double mantissa to 23 bits - since it rounds the mantissa to the
nearest number modulo 2^24, i.e. roughly between -2^23 and 2^23.
A multiplication increases the coefficient size to 2^46 * n, where n
is the number of doubles used to store a number. For the 224-bit
curve, n = 10. This gives doubles of size 5 * 2^47. Adding two of
these doubles gives a result of size 5 * 2^48, which is less than
2^53, so this is safe. Similar analysis can be done for other formulae
to ensure numbers remain below 2^53.
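
A quick sanity check of the 53-bit limit (Python floats are IEEE 754
doubles, so the boundary can be observed directly):

```python
# 2^53 is the first point at which consecutive integers stop being
# representable in a double: 2^53 + 1 rounds back to 2^53.
limit = 2.0 ** 53
assert limit + 1.0 == limit            # precision lost above 2^53
assert (limit - 1.0) + 1.0 == limit    # still exact below the limit
```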

Extended-Precision Floating Point
---------------------------------
Some platforms, notably x86 Linux, may use an extended-precision
floating point representation that has a 64-bit mantissa [6]. Although
this implementation is optimized for the IEEE standard 53-bit
mantissa, it should work with the 64-bit mantissa. A check is done at
run-time in the function ec_set_fp_precision that detects if the
precision is greater than 53 bits, and runs code for the 64-bit
mantissa accordingly.

REFERENCES
==========
[1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain Parameters", Sept. 20, 2000. www.secg.org
[2] Sun Microsystems Inc., UltraSPARC III Cu User's Manual, Version 1.0, May 2002, Table 4.4.
[3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates".
[4] Henk C.A. van Tilborg, Generalized Mersenne Prime. http://www.win.tue.nl/~henkvt/GenMersenne.pdf
[5] Daniel J. Bernstein, "Floating-Point Arithmetic and Message Authentication", Journal of Cryptology, March 2000, Section 2.
[6] Daniel J. Bernstein, "Floating-Point Arithmetic and Message Authentication", Journal of Cryptology, March 2000, Section 2 Notes.
