This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

The ECL exposes routines for constructing and converting curve
parameters for internal use.

The floating point code of the ECL provides algorithms for performing
elliptic-curve point multiplications in floating point.

The point multiplication algorithms perform calculations almost
exclusively in floating point for efficiency, but have the same
(integer) interface as the ECL for compatibility and to be easily
wired in to the ECL. Please see the README file (not this README.FP
file) for information on wiring-in.

This has been implemented for 3 curves as specified in [1]:
    secp160r1
    secp192r1
    secp224r1

RATIONALE
=========
Calculations are done in the floating-point unit (FPU) since it
gives better performance on the UltraSPARC III chips. This is
because the FPU allows for faster multiplication than the integer unit.
The integer unit has a longer multiplication instruction latency, and
does not allow full pipelining, as described in [2].
Since performance is an important selling feature of Elliptic Curve
Cryptography (ECC), this implementation was created.

DATA REPRESENTATION
===================
Data is primarily represented in an array of double-precision floating
point numbers. Generally, each array element has 24 bits of precision
(i.e. each element is x * 2^y, where x is an integer of at most 24 bits
and y is a non-negative integer), although the actual implementation
details are more complicated.

e.g. a way to store an 80 bit number might be:
double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^72 };
See the section ARITHMETIC OPERATIONS for more details.
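
As a concrete illustration (not taken from ecp_fp.c; the names below are
hypothetical), such an array can be built with standard C floating point
operations, since each 24-bit limb times a power of two is exactly
representable in a double:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical 80 bit value split into 24-bit limbs, each stored
     * as an exactly representable double x * 2^y. */
    double p[4];
    p[0] = ldexp(632613.0, 0);      /* 632613 * 2^0  */
    p[1] = ldexp(329841.0, 24);     /* 329841 * 2^24 */
    p[2] = ldexp(9961.0, 48);       /*   9961 * 2^48 */
    p[3] = ldexp(51.0, 72);         /*     51 * 2^72 */

    /* The represented integer is the sum of the limbs.  The sum itself
     * no longer fits in a 53-bit mantissa, so this printout is only
     * approximate; the library always works on the limbs. */
    printf("~ %g\n", p[0] + p[1] + p[2] + p[3]);
    return 0;
}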

This implementation assumes that the floating-point unit rounding mode
is round-to-even as specified in IEEE 754
(as opposed to chopping, rounding up, or rounding down).
When subtracting integers represented as arrays of floating point
numbers, some coefficients (array elements) may become negative.
This effectively gives an extra bit of precision that is important
for correctness in some cases.

The described number representation limits the size of integers to 1023 bits.
This is due to an upper bound of 1024 for the exponent of a double precision
floating point number as specified in IEEE 754.
However, this is acceptable for ECC key sizes of the foreseeable future.

DATA STRUCTURES
===============
For more information on coordinate representations, see [3].

ecfp_aff_pt
-----------
Affine EC Point Representation. This is the basic
representation (x, y) of an elliptic curve point.

ecfp_jac_pt
-----------
Jacobian EC Point. This stores a point as (X, Y, Z), where
the affine point corresponds to (X/Z^2, Y/Z^3). This allows
for fewer inversions in calculations.

ecfp_chud_pt
------------
Chudnovsky Jacobian Point. This representation stores a point
as (X, Y, Z, Z^2, Z^3), the same as a Jacobian representation
but also storing Z^2 and Z^3 for faster point additions.

ecfp_jm_pt
----------
Modified Jacobian Point. This representation stores a point
as (X, Y, Z, a*Z^4), the same as the Jacobian representation but
also storing a*Z^4 for faster point doublings. Here "a" is the
coefficient of the linear term x in the curve equation.

EC_group_fp
-----------
Stores information on the elliptic curve group for floating
point calculations. Contains curve specific information, as
well as function pointers to routines, allowing different
optimizations to be easily wired in.
This should be made accessible from an ECGroup for the floating
point implementations of point multiplication.
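
A rough sketch of how these representations could be declared is given
below. Field names, the limb count, and the function pointer signature
are illustrative only and are not the actual declarations in ecp_fp.h:

/* Illustrative only -- see ecp_fp.h for the real declarations. */
#define ECFP_MAXDOUBLES 10  /* enough 24-bit limbs for a 224-bit field */

typedef struct {
    double x[ECFP_MAXDOUBLES], y[ECFP_MAXDOUBLES];
} ecfp_aff_pt;

typedef struct {                               /* affine = (X/Z^2, Y/Z^3) */
    double X[ECFP_MAXDOUBLES], Y[ECFP_MAXDOUBLES], Z[ECFP_MAXDOUBLES];
} ecfp_jac_pt;

typedef struct {                               /* caches Z^2 and Z^3 */
    double X[ECFP_MAXDOUBLES], Y[ECFP_MAXDOUBLES], Z[ECFP_MAXDOUBLES],
           Z2[ECFP_MAXDOUBLES], Z3[ECFP_MAXDOUBLES];
} ecfp_chud_pt;

typedef struct {                               /* caches a*Z^4 */
    double X[ECFP_MAXDOUBLES], Y[ECFP_MAXDOUBLES], Z[ECFP_MAXDOUBLES],
           aZ4[ECFP_MAXDOUBLES];
} ecfp_jm_pt;

typedef struct EC_group_fp_str {
    int numDoubles;                  /* limbs per field element */
    double curvea[ECFP_MAXDOUBLES];  /* curve coefficient a */
    /* curve specific routines are wired in through function pointers,
     * e.g. a specialized reduction for the field prime: */
    void (*ecfp_reduce)(double *r, double *x,
                        const struct EC_group_fp_str *group);
} EC_group_fp;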

POINT MULTIPLICATION ALGORITHMS
===============================
Elliptic curve point multiplication can be done at a higher level,
orthogonal to the implementation of point additions and point doublings.
There are a variety of algorithms that can be used.

The following algorithms have been implemented:

    4-bit Window (Jacobian Coordinates)
    Double & Add (Jacobian & Affine Coordinates)
    5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian)

Currently, the fastest algorithm for multiplying a generic point
is the 5-bit Non-Adjacent Form.

See comments in ecp_fp.c for more details and references.
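
To give a feel for the non-adjacent form, the sketch below shows the
standard width-5 NAF recoding on an ordinary machine integer (the
library recodes multi-precision scalars, so the function name and types
here are illustrative only). Each nonzero digit is odd and lies strictly
between -16 and 16, and any five consecutive digits contain at most one
nonzero entry, which reduces the number of point additions:

#include <stdio.h>

/* Recode k into width-w NAF digits, least significant digit first.
 * Returns the number of digits written. */
static int wnaf(long k, int w, int digits[64])
{
    long pow = 1L << w;              /* 2^w, 32 for w = 5 */
    int n = 0;
    while (k > 0) {
        long d = 0;
        if (k & 1) {
            d = k & (pow - 1);       /* k mod 2^w */
            if (d >= pow / 2)
                d -= pow;            /* pick the negative representative */
            k -= d;                  /* now k is divisible by 2^w */
        }
        digits[n++] = (int)d;
        k >>= 1;
    }
    return n;
}

int main(void)
{
    int digits[64];
    int n = wnaf(7039, 5, digits);
    long check = 0;
    for (int i = n - 1; i >= 0; i--)
        check = 2 * check + digits[i];   /* reconstruct k from its digits */
    printf("digits (most significant first): ");
    for (int i = n - 1; i >= 0; i--)
        printf("%d ", digits[i]);
    printf("  reconstructed: %ld\n", check);   /* prints 7039 */
    return 0;
}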

SOURCE / HEADER FILES
=====================

ecp_fp.c
--------
Main source file for floating point calculations. Contains routines
to convert from floating-point to integer (mp_int format), point
multiplication algorithms, and several other routines.

ecp_fp.h
--------
Main header file. Contains most constants used and function prototypes.

ecp_fp[160, 192, 224].c
-----------------------
Source files for specific curves. These contain curve specific code,
such as specialized reduction based on the field defining prime, as
well as code wiring-in different algorithms and optimizations.

ecp_fpinc.c
-----------
Source file that is included by ecp_fp[160, 192, 224].c. This generates
functions with different preprocessor-defined names and loop iterations,
allowing for static linking and strong compiler optimizations without
code duplication.

TESTING
=======
The test suite can be found in ecl/tests/ecp_fpt. This tests and gets
timings of the different algorithms for the curves implemented.

ARITHMETIC OPERATIONS
=====================
The primary operations in ECC over prime fields are modular multiplication
and addition, i.e. n * m (mod p) and n + m (mod p). In this implementation,
multiplication, addition, and reduction are implemented as separate
functions. This enables computation of formulae with fewer reductions, e.g.
    (a * b) + (c * d) (mod p)
rather than:
    ((a * b) (mod p)) + ((c * d) (mod p)) (mod p)
This takes advantage of the fact that the double precision mantissa in
floating point can hold numbers up to 2^53, i.e. it has some leeway to
store larger intermediate numbers. See further details in the section on
FLOATING POINT PRECISION.


Multiplication
--------------
Multiplication is implemented in a standard polynomial multiplication
fashion. The terms of the two factors are pairwise multiplied and
added together appropriately. Note that the result requires twice
as many doubles for storage, as the bit size of the product is twice
that of the multiplicands.
e.g. suppose we have double n[3], m[3], r[6], and want to calculate r = n * m
    r[0] = n[0] * m[0]
    r[1] = n[0] * m[1] + n[1] * m[0]
    r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0]
    r[3] = n[1] * m[2] + n[2] * m[1]
    r[4] = n[2] * m[2]
    r[5] = 0 (This is used later to hold spillover from r[4], see tidying in
    the reduction section.)
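
The same pattern can be written as a generic double-length schoolbook
loop; this is a simplified sketch with illustrative names (the routines
in ecp_fp.c are unrolled for each curve size rather than written as a
loop):

/* r = n * m, where n and m each have len coefficients and r has room
 * for 2*len coefficients (the top entry is left zero for spillover). */
static void ecfp_mul_sketch(double *r, const double *n, const double *m,
                            int len)
{
    int i, j;
    for (i = 0; i < 2 * len; i++)
        r[i] = 0.0;
    for (i = 0; i < len; i++)
        for (j = 0; j < len; j++)
            r[i + j] += n[i] * m[j];  /* e.g. r[2] gets n[0]*m[2], n[1]*m[1], n[2]*m[0] */
}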

Addition
--------
Addition is done term by term. The only caveat is to be careful with
the number of terms that need to be added. When adding results of
multiplication (before reduction), twice as many terms need to be added
together. This is done in the addLong function.
e.g. for double n[4], m[4], r[4]: r = n + m
    r[0] = n[0] + m[0]
    r[1] = n[1] + m[1]
    r[2] = n[2] + m[2]
    r[3] = n[3] + m[3]
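
In code this is a single term-by-term loop (again a sketch with
illustrative names); for unreduced products the caller simply passes
twice the usual length, which is roughly what the addLong function
mentioned above does:

/* r = n + m over len coefficients. */
static void ecfp_add_sketch(double *r, const double *n, const double *m,
                            int len)
{
    int i;
    for (i = 0; i < len; i++)
        r[i] = n[i] + m[i];
}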

Modular Reduction
-----------------
For the curves implemented, reduction is possible by fast reduction
for Generalized Mersenne Primes, as described in [4]. For the
floating point implementation, a significant step of the reduction
process is tidying: that is, the propagation of carry bits from
low-order to high-order coefficients to reduce the precision of each
coefficient to 24 bits.
This is done by adding and then subtracting ecfp_alpha, a large
floating point number that induces precision roundoff.
See [5] for more details on tidying using floating point arithmetic.
e.g. suppose we have r = 961838 * 2^24 + 519308
then if we set alpha = 3 * 2^51 * 2^24,
FP(FP(r + alpha) - alpha) = 961838 * 2^24, because the precision for
the intermediate results is limited. Our values of alpha are chosen
to truncate to a desired number of bits.
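
The alpha trick can be reproduced directly in C. The sketch below uses
the constants from the example above (not the library's actual
ecfp_alpha values), assumes 53-bit double arithmetic with the default
round-to-even mode, and uses volatile so the compiler does not fold the
additions away:

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* r = 961838 * 2^24 + 519308, exactly representable in a double */
    volatile double r     = ldexp(961838.0, 24) + 519308.0;
    /* alpha = 3 * 2^51 * 2^24; adding it rounds away everything below
     * 2^24, subtracting it leaves the truncated (tidied) value */
    volatile double alpha = ldexp(3.0, 51 + 24);

    volatile double t      = r + alpha;   /* low 24 bits rounded off here */
    double          tidied = t - alpha;

    /* prints the same value twice: the 519308 has been rounded away */
    printf("%.0f %.0f\n", tidied, ldexp(961838.0, 24));
    return 0;
}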

The reduction is then performed as in [4], adding multiples of the prime p.
e.g. suppose we are working in radix 10^2, i.e. each term is a coefficient
times a power of 10^2. Take the number
    2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95
stored in 5 elements for the coefficients of 10^0, 10^2, ..., 10^8.
We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1.
We can subtract off from the higher terms:
    (2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95) - (2 * 10^2) * (10^6 - 2 * 10^4 + 1)
    = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95
    = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - (15) * (10^6 - 2 * 10^4 + 1)
    = 83 * 10^4 + 21 * 10^2 + 80
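
The toy reduction can be written as a short loop over the radix-10^2
coefficients. This only illustrates the "subtract a shifted multiple of
p" step; the real curve-specific reduction lives in ecp_fp160.c,
ecp_fp192.c and ecp_fp224.c:

#include <stdio.h>

/* Reduce in radix 100 modulo p = 10^6 - 2*10^4 + 1, i.e.
 * p(x) = x^3 - 2*x^2 + 1 with x = 100.  Each coefficient of degree
 * >= 3 is cleared by subtracting the matching multiple of p, which
 * pushes adjustments onto lower-degree terms. */
int main(void)
{
    /* 2*10^8 + 11*10^6 + 53*10^4 + 23*10^2 + 95, lowest degree first */
    long r[5] = { 95, 23, 53, 11, 2 };
    int k;

    for (k = 4; k >= 3; k--) {
        long c = r[k];
        r[k]     -= c;         /* subtract c * x^(k-3) * x^3    */
        r[k - 1] += 2 * c;     /* add back  c * x^(k-3) * 2*x^2 */
        r[k - 3] -= c;         /* subtract  c * x^(k-3) * 1     */
    }

    /* prints 0 0 83 21 80, i.e. 83 * 10^4 + 21 * 10^2 + 80 */
    printf("%ld %ld %ld %ld %ld\n", r[4], r[3], r[2], r[1], r[0]);
    return 0;
}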

Integrated Example
------------------
This example shows how multiplication, addition, tidying, and reduction
work together in our modular arithmetic. This is simplified from the
actual implementation, but should convey the main concepts.
Working in radix 10^2 and with p as in the prior example,
    let a = 16 * 10^4 + 53 * 10^2 + 33
    let b = 81 * 10^4 + 31 * 10^2 + 49
    let c = 22 * 10^4 +  0 * 10^2 + 95
and suppose we want to compute a * b + c mod p.
We first do a multiplication; then a * b =
    0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617
Then we add in c before doing reduction, allowing us to get a * b + c =
    0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then perform a tidying on the upper half of the terms:
    0 * 10^10 + 1296 * 10^8 + 4789 * 10^6
    0 * 10^10 + (1296 + 47) * 10^8 + 89 * 10^6
    0 * 10^10 + 1343 * 10^8 + 89 * 10^6
    13 * 10^10 + 43 * 10^8 + 89 * 10^6
which then gives us
    13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then reduce modulo p similarly to the reduction example above:
    13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
    - (13 * 10^4 * p)
    69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712
    - (69 * 10^2 * p)
    227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712
    - (227 * p)
    5563 * 10^4 + 3551 * 10^2 + 1485
Finally, we do tidying to get the precision of each term down to 2 digits:
    5563 * 10^4 + 3565 * 10^2 + 85
    5598 * 10^4 + 65 * 10^2 + 85
    55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85
and perform another reduction step
    - (55 * p)
    208 * 10^4 + 65 * 10^2 + 30
There may be a small number of further reductions that could be done at
this point, but this is typically done only at the end when converting
from floating point to an integer unit representation.
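
Because the toy numbers are small, the whole chain can be sanity-checked
with ordinary integer arithmetic; the result 208 * 10^4 + 65 * 10^2 + 30
is congruent to a * b + c modulo p even though it is not yet fully
reduced:

#include <stdio.h>

int main(void)
{
    long long a = 165333, b = 813149, c = 220095;  /* a, b, c from above */
    long long p = 980001;                          /* 10^6 - 2*10^4 + 1  */
    long long result = 208LL * 10000 + 65 * 100 + 30;

    /* the two values differ by a multiple of p, so this prints 0 */
    printf("%lld\n", (a * b + c - result) % p);
    return 0;
}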

FLOATING POINT PRECISION
========================
This section discusses the precision of floating point numbers, which
anyone writing new formulae or implementing a larger bit size should be
aware of. The danger is that an intermediate result may be required to
store a mantissa larger than 53 bits, which would introduce error through
rounding.

Note that tidying with the IEEE rounding mode set to round-to-even
allows negative coefficients, which actually reduces the magnitude of
each coefficient to 23 bits: each coefficient is rounded to the nearest
multiple of 2^24, so the part that stays behind lies roughly between
-2^23 and 2^23.
A multiplication increases the magnitude of the coefficients to about
n * 2^46, where n is the number of doubles used to store a number. For
the 224 bit curve, n = 10. This gives doubles of size 5 * 2^47. Adding
two of these doubles gives a result of size 5 * 2^48, which is less than
2^53, so this is safe.
Similar analysis can be done for other formulae to ensure numbers remain
below 2^53.
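
The 2^53 ceiling is easy to observe directly; once an intermediate value
needs more mantissa bits than that, information is silently rounded away
(sketch, assuming standard 53-bit doubles):

#include <stdio.h>

int main(void)
{
    double big = 9007199254740992.0;   /* 2^53 */
    /* 2^53 + 1 is not representable with a 53-bit mantissa, so the
     * addition rounds back to 2^53 and the comparison prints 1 */
    printf("%d\n", big + 1.0 == big);
    return 0;
}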

Extended-Precision Floating Point
---------------------------------
Some platforms, notably x86 Linux, may use an extended-precision floating
point representation that has a 64-bit mantissa [6]. Although this
implementation is optimized for the IEEE standard 53-bit mantissa,
it should work with the 64-bit mantissa. A check is done at run time
in the function ec_set_fp_precision, which detects whether the precision
is greater than 53 bits and runs code for the 64-bit mantissa accordingly.
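
The compiler's own description of the evaluation format can also be
inspected from <float.h>. The sketch below is not how ec_set_fp_precision
performs its run-time detection; it only prints the compile-time
indicators of extended-precision evaluation:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* FLT_EVAL_METHOD == 2 means double expressions are evaluated in
     * long double (e.g. x87 extended) precision; 0 means plain 53-bit
     * doubles.  DBL_MANT_DIG / LDBL_MANT_DIG give the mantissa widths
     * of the two types. */
    printf("FLT_EVAL_METHOD=%d DBL_MANT_DIG=%d LDBL_MANT_DIG=%d\n",
           FLT_EVAL_METHOD, DBL_MANT_DIG, LDBL_MANT_DIG);
    return 0;
}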

REFERENCES
==========
[1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain Parameters", Sept. 20, 2000. www.secg.org
[2] Sun Microsystems Inc., UltraSPARC III Cu User's Manual, Version 1.0, May 2002, Table 4.4.
[3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates".
[4] Henk C.A. van Tilborg, Generalized Mersenne Prime. http://www.win.tue.nl/~henkvt/GenMersenne.pdf
[5] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2.
[6] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2 Notes.