This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

The ECL exposes routines for constructing and converting curve
parameters for internal use.

The floating point code of the ECL provides algorithms for performing
elliptic-curve point multiplications in floating point.

The point multiplication algorithms perform calculations almost
exclusively in floating point for efficiency, but have the same
(integer) interface as the ECL for compatibility and to be easily
wired in to the ECL. Please see the README file (not this README.FP
file) for information on wiring-in.

This has been implemented for 3 curves as specified in [1]:
    secp160r1
    secp192r1
    secp224r1

RATIONALE
=========
Calculations are done in the floating-point unit (FPU) since it
gives better performance on the UltraSPARC III chips. This is
because the FPU allows for faster multiplication than the integer
unit: the integer unit has a longer multiplication instruction
latency and does not allow full pipelining, as described in [2].
Since performance is an important selling feature of Elliptic Curve
Cryptography (ECC), this implementation was created.

DATA REPRESENTATION
===================
Data is primarily represented in an array of double-precision floating
point numbers. Generally, each array element has 24 bits of precision
(i.e. it is of the form x * 2^y, where x is an integer of at most 24
bits and y is some non-negative integer), although the actual
implementation details are more complicated.

e.g. a way to store an 80 bit number might be:
double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^64 };
See the section ARITHMETIC OPERATIONS for more details.

This implementation assumes that the floating-point unit rounding mode
is round-to-even as specified in IEEE 754
(as opposed to chopping, rounding up, or rounding down).
When subtracting integers represented as arrays of floating point
numbers, some coefficients (array elements) may become negative.
This effectively gives an extra bit of precision that is important
for correctness in some cases.

The described number representation limits the size of integers to 1023
bits. This is due to the upper bound of 1024 for the exponent of a
double-precision floating point number as specified in IEEE 754.
However, this is acceptable for ECC key sizes of the foreseeable future.
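
As a rough illustration of this representation, the sketch below splits
an ordinary 64-bit integer into coefficients of the form x * 2^(24*i),
each stored exactly in a double. The helper name and fixed limb count
are assumptions made for this example only; they are not part of the
ecl sources, which convert between mp_int values and doubles
internally.

#include <stdio.h>

/* Illustrative sketch: split a 64-bit integer into coefficients of the
 * form x * 2^(24*i), where each x is an integer below 2^24 and is
 * therefore held exactly by a double. */
static void split_to_doubles(unsigned long long n, double d[3])
{
    double scale = 1.0;                        /* 2^(24*i) */
    int i;

    for (i = 0; i < 3; i++) {
        d[i] = (double)(n & 0xFFFFFF) * scale; /* low 24 bits, weighted */
        n >>= 24;
        scale *= 16777216.0;                   /* advance weight by 2^24 */
    }
}

int main(void)
{
    double d[3];

    split_to_doubles(0x0123456789ABCDEFULL, d);
    printf("%.0f + %.0f + %.0f\n", d[0], d[1], d[2]);
    return 0;
}

Mathematically, the three coefficients sum back to the original
integer; the actual code keeps them separate so that later additions
and multiplications stay within the 53-bit double mantissa.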

DATA STRUCTURES
===============
For more information on coordinate representations, see [3].

ecfp_aff_pt
-----------
Affine EC Point Representation. This is the basic
representation (x, y) of an elliptic curve point.

ecfp_jac_pt
-----------
Jacobian EC Point. This stores a point as (X, Y, Z), where
the affine point corresponds to (X/Z^2, Y/Z^3). This allows
for fewer inversions in calculations.

ecfp_chud_pt
------------
Chudnovsky Jacobian Point. This representation stores a point
as (X, Y, Z, Z^2, Z^3), the same as the Jacobian representation
but also storing Z^2 and Z^3 for faster point additions.

ecfp_jm_pt
----------
Modified Jacobian Point. This representation stores a point
as (X, Y, Z, a*Z^4), the same as the Jacobian representation but
also storing a*Z^4 for faster point doublings. Here "a" is the
coefficient of x in the curve equation y^2 = x^3 + a*x + b.

EC_group_fp
-----------
Stores information on the elliptic curve group for floating
point calculations. Contains curve-specific information, as
well as function pointers to routines, allowing different
optimizations to be easily wired in.
This should be made accessible from an ECGroup for the floating
point implementations of point multiplication.
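
As a rough sketch of how these point types can be laid out in C (the
array bound ECFP_NUMDOUBLES and the exact field names below are
illustrative assumptions; the authoritative definitions are in
ecp_fp.h), each coordinate is simply an array of doubles in the
representation described above:

/* Illustrative sketch only; see ecp_fp.h for the actual definitions. */
#define ECFP_NUMDOUBLES 10      /* doubles per field element (assumed) */

typedef struct {                /* affine point (x, y)                 */
    double x[ECFP_NUMDOUBLES];
    double y[ECFP_NUMDOUBLES];
} ecfp_aff_pt;

typedef struct {                /* Jacobian point (X, Y, Z)            */
    double x[ECFP_NUMDOUBLES];
    double y[ECFP_NUMDOUBLES];
    double z[ECFP_NUMDOUBLES];
} ecfp_jac_pt;

typedef struct {                /* Chudnovsky point (X, Y, Z, Z^2, Z^3) */
    double x[ECFP_NUMDOUBLES];
    double y[ECFP_NUMDOUBLES];
    double z[ECFP_NUMDOUBLES];
    double z2[ECFP_NUMDOUBLES];
    double z3[ECFP_NUMDOUBLES];
} ecfp_chud_pt;

typedef struct {                /* modified Jacobian point (X, Y, Z, a*Z^4) */
    double x[ECFP_NUMDOUBLES];
    double y[ECFP_NUMDOUBLES];
    double z[ECFP_NUMDOUBLES];
    double az4[ECFP_NUMDOUBLES];
} ecfp_jm_pt;

Storing each coordinate as a fixed-size array keeps the structures
trivially copyable and lets the point formulas index coefficients
directly.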

POINT MULTIPLICATION ALGORITHMS
===============================
Elliptic curve point multiplication can be done at a higher level,
orthogonal to the implementation of point additions and point
doublings. There are a variety of algorithms that can be used.

The following algorithms have been implemented:

4-bit Window (Jacobian Coordinates)
Double & Add (Jacobian & Affine Coordinates)
5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian)

Currently, the fastest algorithm for multiplying a generic point
is the 5-bit Non-Adjacent Form.

See comments in ecp_fp.c for more details and references.

SOURCE / HEADER FILES
=====================

ecp_fp.c
--------
Main source file for floating point calculations. Contains routines
to convert from floating-point to integer (mp_int format), point
multiplication algorithms, and several other routines.

ecp_fp.h
--------
Main header file. Contains most constants used and function prototypes.

ecp_fp[160, 192, 224].c
-----------------------
Source files for specific curves. These contain curve-specific code,
such as specialized reduction based on the field-defining prime, and
the code wiring-in different algorithms and optimizations.

ecp_fpinc.c
-----------
Source file that is included by ecp_fp[160, 192, 224].c. This generates
functions with different preprocessor-defined names and loop iterations,
allowing for static linking and strong compiler optimizations without
code duplication.

TESTING
=======
The test suite can be found in ecl/tests/ecp_fpt. This tests and gets
timings of the different algorithms for the curves implemented.

ARITHMETIC OPERATIONS
=====================
The primary operations in ECC over the prime fields are modular
arithmetic, i.e. n * m (mod p) and n + m (mod p). In this
implementation, multiplication, addition, and reduction are implemented
as separate functions. This enables computation of formulae with fewer
reductions, e.g.
(a * b) + (c * d) (mod p) rather than:
((a * b) (mod p)) + ((c * d) (mod p)) (mod p)
This takes advantage of the fact that the double precision mantissa in
floating point can hold numbers up to 2^53, i.e. it has some leeway to
store larger intermediate numbers. See further detail in the section
FLOATING POINT PRECISION.

Multiplication
--------------
Multiplication is implemented in standard polynomial multiplication
fashion: the terms in the two factors are pairwise multiplied and
added together appropriately. Note that the result requires twice
as many doubles for storage, as the bit size of the product is twice
that of the multiplicands.
e.g. suppose we have double n[3], m[3], r[6], and want to calculate r = n * m:
r[0] = n[0] * m[0]
r[1] = n[0] * m[1] + n[1] * m[0]
r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0]
r[3] = n[1] * m[2] + n[2] * m[1]
r[4] = n[2] * m[2]
r[5] = 0 (This is used later to hold spillover from r[4]; see tidying in
the reduction section.)

Addition
--------
Addition is done term by term. The only caveat is to be careful with
the number of terms that need to be added. When adding results of
multiplication (before reduction), twice as many terms need to be added
together. This is done in the addLong function.
e.g. for double n[4], m[4], r[4]: r = n + m
r[0] = n[0] + m[0]
r[1] = n[1] + m[1]
r[2] = n[2] + m[2]
r[3] = n[3] + m[3]
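
As a rough sketch (hypothetical function names, not the actual unrolled
routines generated via ecp_fpinc.c), the multiplication and addition
just described might look like this for 3-double operands:

#include <stdio.h>

/* Illustrative schoolbook multiply: r (6 doubles) = n * m (3 doubles
 * each).  The coefficients carry their 2^(24*i) weights, so products
 * of matching total weight are simply summed. */
static void fp_mul_3(double r[6], const double n[3], const double m[3])
{
    r[0] = n[0] * m[0];
    r[1] = n[0] * m[1] + n[1] * m[0];
    r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0];
    r[3] = n[1] * m[2] + n[2] * m[1];
    r[4] = n[2] * m[2];
    r[5] = 0;                   /* room for spillover during tidying */
}

/* Illustrative term-by-term addition of two 6-double (post-multiply)
 * values, in the spirit of the addLong function. */
static void fp_add_long_3(double r[6], const double n[6], const double m[6])
{
    int i;

    for (i = 0; i < 6; i++)
        r[i] = n[i] + m[i];     /* see FLOATING POINT PRECISION for bounds */
}

int main(void)
{
    /* small operands with the 2^24 weights carried in the doubles */
    double n[3] = { 7.0, 5.0 * 16777216.0, 1.0 * 16777216.0 * 16777216.0 };
    double m[3] = { 3.0, 2.0 * 16777216.0, 0.0 };
    double r[6], s[6];

    fp_mul_3(r, n, m);
    fp_add_long_3(s, r, r);     /* e.g. accumulate two products */
    printf("r[0] = %.0f, s[0] = %.0f\n", r[0], s[0]);
    return 0;
}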

Modular Reduction
-----------------
For the curves implemented, reduction is possible by fast reduction
for Generalized Mersenne Primes, as described in [4]. For the
floating point implementation, a significant step of the reduction
process is tidying: that is, the propagation of carry bits from
low-order to high-order coefficients to reduce the precision of each
coefficient to 24 bits.
This is done by adding and then subtracting ecfp_alpha, a large
floating point number that induces precision roundoff.
See [5] for more details on tidying using floating point arithmetic.
e.g. suppose we have r = 961838 * 2^24 + 519308.
Then, if we set alpha = 3 * 2^51 * 2^24,
FP(FP(r + alpha) - alpha) = 961838 * 2^24, because the precision of
the intermediate results is limited. Our values of alpha are chosen
to truncate to a desired number of bits.

The reduction is then performed as in [4], adding or subtracting
multiples of the prime p.
e.g. suppose we are working in base 10^2 (in place of 2^24). Take the
number 2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95, stored in 5
elements for the coefficients of 10^0, 10^2, ..., 10^8.
We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1.
We can subtract off multiples of p from the higher terms:
(2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95) - (2 * 10^2) * (10^6 - 2 * 10^4 + 1)
= 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95
= 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - (15) * (10^6 - 2 * 10^4 + 1)
= 83 * 10^4 + 21 * 10^2 + 80

Integrated Example
------------------
This example shows how multiplication, addition, tidying, and reduction
work together in our modular arithmetic. It is simplified from the
actual implementation, but should convey the main concepts.
Working in base 10^2 with p as in the prior example, let
a = 16 * 10^4 + 53 * 10^2 + 33
b = 81 * 10^4 + 31 * 10^2 + 49
c = 22 * 10^4 +  0 * 10^2 + 95
and suppose we want to compute a * b + c mod p.
We first do a multiplication, giving a * b =
0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617
Then we add in c before doing reduction, which gives a * b + c =
0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then perform a tidying on the upper half of the terms:
0 * 10^10 + 1296 * 10^8 + 4789 * 10^6
0 * 10^10 + (1296 + 47) * 10^8 + 89 * 10^6
0 * 10^10 + 1343 * 10^8 + 89 * 10^6
13 * 10^10 + 43 * 10^8 + 89 * 10^6
which then gives us
13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
We then reduce modulo p, as in the reduction example above:
13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
  - (13 * 10^4 * p)
69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712
  - (69 * 10^2 * p)
227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712
  - (227 * p)
5563 * 10^4 + 3551 * 10^2 + 1485
Finally, we do tidying to get the precision of each term down to 2 digits:
5563 * 10^4 + 3565 * 10^2 + 85
5598 * 10^4 + 65 * 10^2 + 85
55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85
and perform another reduction step:
  - (55 * p)
208 * 10^4 + 65 * 10^2 + 30
A small number of further reductions could be done at this point, but
this is typically done only at the end, when converting from floating
point to the integer unit representation.
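
To make the floating point tidying step concrete, the sketch below
applies the add-then-subtract trick to the example from the Modular
Reduction section. The variable names and single-carry structure are
assumptions for this illustration; the actual code uses precomputed
ecfp_alpha constants scaled for each coefficient.

#include <stdio.h>

int main(void)
{
    /* One untidied coefficient r[0] = 961838 * 2^24 + 519308 and the
     * next-higher coefficient r[1], initially zero. */
    double r[2] = { 961838.0 * 16777216.0 + 519308.0, 0.0 };
    double alpha = 3.0 * 2251799813685248.0 * 16777216.0; /* 3 * 2^51 * 2^24 */
    volatile double t;  /* force rounding to a 53-bit double and keep the
                         * compiler from folding the add/subtract away   */
    double hi;

    t  = r[0] + alpha;  /* rounds away the low 24 bits of r[0]        */
    hi = t - alpha;     /* = 961838 * 2^24, the part to carry upward  */
    r[1] += hi;         /* propagate the carry                        */
    r[0] -= hi;         /* = 519308, now at most 24 bits              */

    printf("low = %.0f, carry = %.0f * 2^24\n", r[0], hi / 16777216.0);
    return 0;
}

The volatile store forces the intermediate sum to be rounded to a
53-bit double; without such a store, a platform that evaluates in
64-bit extended precision would keep extra bits and split at the wrong
place, which is one reason for the run-time precision check described
under Extended-Precision Floating Point below.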

FLOATING POINT PRECISION
========================
This section discusses the precision of floating point numbers, which
anyone writing new formulae or implementing a larger bit size should be
aware of. The danger is that an intermediate result may be required to
store a mantissa larger than 53 bits, which would cause error by
rounding off.

Note that tidying with the IEEE rounding mode set to round-to-even
allows negative coefficients, which actually reduces the size of each
coefficient to 23 bits, since tidying rounds each coefficient to the
nearest value modulo 2^24, i.e. to roughly between -2^23 and 2^23.
A multiplication increases the coefficient size to n * 2^46, where n is
the number of doubles used to store a number. For the 224 bit curve,
n = 10. This gives doubles of size 5 * 2^47. Adding two of these
doubles gives a result of size 5 * 2^48, which is less than 2^53, so
this is safe.
Similar analysis can be done for other formulae to ensure numbers remain
below 2^53.

Extended-Precision Floating Point
---------------------------------
Some platforms, notably x86 Linux, may use an extended-precision
floating point representation that has a 64-bit mantissa [6]. Although
this implementation is optimized for the IEEE standard 53-bit mantissa,
it should work with the 64-bit mantissa. A check is done at run-time
in the function ec_set_fp_precision that detects whether the precision
is greater than 53 bits, and runs code for the 64-bit mantissa
accordingly.

REFERENCES
==========
[1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain Parameters", Sept. 20, 2000. www.secg.org
[2] Sun Microsystems Inc., UltraSPARC III Cu User's Manual, Version 1.0, May 2002, Table 4.4.
[3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates".
[4] Henk C.A. van Tilborg, Generalized Mersenne Prime. http://www.win.tue.nl/~henkvt/GenMersenne.pdf
[5] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2.
[6] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2 Notes.