|
1 This Source Code Form is subject to the terms of the Mozilla Public |
|
2 License, v. 2.0. If a copy of the MPL was not distributed with this |
|
3 file, You can obtain one at http://mozilla.org/MPL/2.0/. |
|
4 |
|
5 The ECL exposes routines for constructing and converting curve |
|
6 parameters for internal use. |
|
7 |
|
8 The floating point code of the ECL provides algorithms for performing |
|
9 elliptic-curve point multiplications in floating point. |
|
10 |
|
11 The point multiplication algorithms perform calculations almost |
|
12 exclusively in floating point for efficiency, but have the same |
|
13 (integer) interface as the ECL for compatibility and to be easily |
|
14 wired-in to the ECL. Please see README file (not this README.FP file) |
|
15 for information on wiring-in. |
|
16 |
|
17 This has been implemented for 3 curves as specified in [1]: |
|
18 secp160r1 |
|
19 secp192r1 |
|
20 secp224r1 |
|
21 |
|
22 RATIONALE |
|
23 ========= |
|
24 Calculations are done in the floating-point unit (FPU) since it |
|
25 gives better performance on the UltraSPARC III chips. This is |
|
26 because the FPU allows for faster multiplication than the integer unit. |
|
27 The integer unit has a longer multiplication instruction latency, and |
|
28 does not allow full pipelining, as described in [2]. |
|
29 Since performance is an important selling feature of Elliptic Curve |
|
30 Cryptography (ECC), this implementation was created. |
|
31 |
|
32 DATA REPRESENTATION |
|
33 =================== |
|
34 Data is primarily represented in an array of double-precision floating |
|
35 point numbers. Generally, each array element has 24 bits of precision |
|
36 (i.e. be x * 2^y, where x is an integer of at most 24 bits, y some positive |
|
37 integer), although the actual implementation details are more complicated. |
|
38 |
|
39 e.g. a way to store an 80 bit number might be: |
|
40 double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^64 }; |
|
41 See section ARITHMETIC OPERATIONS for more details. |
|
42 |
|
43 This implementation assumes that the floating-point unit rounding mode |
|
44 is round-to-even as specified in IEEE 754 |
|
45 (as opposed to chopping, rounding up, or rounding down). |
|
46 When subtracting integers represented as arrays of floating point |
|
47 numbers, some coefficients (array elements) may become negative. |
|
48 This effectively gives an extra bit of precision that is important |
|
49 for correctness in some cases. |
|
50 |
|
51 The described number presentation limits the size of integers to 1023 bits. |
|
52 This is due to an upper bound of 1024 for the exponent of a double precision |
|
53 floating point number as specified in IEEE-754. |
|
54 However, this is acceptable for ECC key sizes of the foreseeable future. |
|
55 |
|
56 DATA STRUCTURES |
|
57 =============== |
|
58 For more information on coordinate representations, see [3]. |
|
59 |
|
60 ecfp_aff_pt |
|
61 ----------- |
|
62 Affine EC Point Representation. This is the basic |
|
63 representation (x, y) of an elliptic curve point. |
|
64 |
|
65 ecfp_jac_pt |
|
66 ----------- |
|
67 Jacobian EC Point. This stores a point as (X, Y, Z), where |
|
68 the affine point corresponds to (X/Z^2, Y/Z^3). This allows |
|
69 for fewer inversions in calculations. |
|
70 |
|
71 ecfp_chud_pt |
|
72 ------------ |
|
73 Chudnovsky Jacobian Point. This representation stores a point |
|
74 as (X, Y, Z, Z^2, Z^3), the same as a Jacobian representation |
|
75 but also storing Z^2 and Z^3 for faster point additions. |
|
76 |
|
77 ecfp_jm_pt |
|
78 ---------- |
|
79 Modified Jacobian Point. This representation stores a point |
|
80 as (X, Y, Z, a*Z^4), the same as Jacobian representation but |
|
81 also storing a*Z^4 for faster point doublings. Here "a" represents |
|
82 the linear coefficient of x defining the curve. |
|
83 |
|
84 EC_group_fp |
|
85 ----------- |
|
86 Stores information on the elliptic curve group for floating |
|
87 point calculations. Contains curve specific information, as |
|
88 well as function pointers to routines, allowing different |
|
89 optimizations to be easily wired in. |
|
90 This should be made accessible from an ECGroup for the floating |
|
91 point implementations of point multiplication. |
|
92 |
|
93 POINT MULTIPLICATION ALGORITHMS |
|
94 =============================== |
|
95 Elliptic Curve Point multiplication can be done at a higher level orthogonal |
|
96 to the implementation of point additions and point doublings. There |
|
97 are a variety of algorithms that can be used. |
|
98 |
|
99 The following algorithms have been implemented: |
|
100 |
|
101 4-bit Window (Jacobian Coordinates) |
|
102 Double & Add (Jacobian & Affine Coordinates) |
|
103 5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian) |
|
104 |
|
105 Currently, the fastest algorithm for multiplying a generic point |
|
106 is the 5-bit Non-Adjacent Form. |
|
107 |
|
108 See comments in ecp_fp.c for more details and references. |
|
109 |
|
110 SOURCE / HEADER FILES |
|
111 ===================== |
|
112 |
|
113 ecp_fp.c |
|
114 -------- |
|
115 Main source file for floating point calculations. Contains routines |
|
116 to convert from floating-point to integer (mp_int format), point |
|
117 multiplication algorithms, and several other routines. |
|
118 |
|
119 ecp_fp.h |
|
120 -------- |
|
121 Main header file. Contains most constants used and function prototypes. |
|
122 |
|
123 ecp_fp[160, 192, 224].c |
|
124 ----------------------- |
|
125 Source files for specific curves. Contains curve specific code such |
|
126 as specialized reduction based on the field defining prime. Contains |
|
127 code wiring-in different algorithms and optimizations. |
|
128 |
|
129 ecp_fpinc.c |
|
130 ----------- |
|
131 Source file that is included by ecp_fp[160, 192, 224].c. This generates |
|
132 functions with different preprocessor-defined names and loop iterations, |
|
133 allowing for static linking and strong compiler optimizations without |
|
134 code duplication. |
|
135 |
|
136 TESTING |
|
137 ======= |
|
138 The test suite can be found in ecl/tests/ecp_fpt. This tests and gets |
|
139 timings of the different algorithms for the curves implemented. |
|
140 |
|
141 ARITHMETIC OPERATIONS |
|
142 --------------------- |
|
143 The primary operations in ECC over the prime fields are modular arithmetic: |
|
144 i.e. n * m (mod p) and n + m (mod p). In this implementation, multiplication, |
|
145 addition, and reduction are implemented as separate functions. This |
|
146 enables computation of formulae with fewer reductions, e.g. |
|
147 (a * b) + (c * d) (mod p) rather than: |
|
148 ((a * b) (mod p)) + ((c * d) (mod p)) (mod p) |
|
149 This takes advantage of the fact that the double precision mantissa in |
|
150 floating point can hold numbers up to 2^53, i.e. it has some leeway to |
|
151 store larger intermediate numbers. See further detail in the section on |
|
152 FLOATING POINT PRECISION. |
|
153 |
|
154 Multiplication |
|
155 -------------- |
|
156 Multiplication is implemented in a standard polynomial multiplication |
|
157 fashion. The terms in opposite factors are pairwise multiplied and |
|
158 added together appropriately. Note that the result requires twice |
|
159 as many doubles for storage, as the bit size of the product is twice |
|
160 that of the multiplicands. |
|
161 e.g. suppose we have double n[3], m[3], r[6], and want to calculate r = n * m |
|
162 r[0] = n[0] * m[0] |
|
163 r[1] = n[0] * m[1] + n[1] * m[0] |
|
164 r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0] |
|
165 r[3] = n[1] * m[2] + n[2] * m[1] |
|
166 r[4] = n[2] * m[2] |
|
167 r[5] = 0 (This is used later to hold spillover from r[4], see tidying in |
|
168 the reduction section.) |
|
169 |
|
170 Addition |
|
171 -------- |
|
172 Addition is done term by term. The only caveat is to be careful with |
|
173 the number of terms that need to be added. When adding results of |
|
174 multiplication (before reduction), twice as many terms need to be added |
|
175 together. This is done in the addLong function. |
|
176 e.g. for double n[4], m[4], r[4]: r = n + m |
|
177 r[0] = n[0] + m[0] |
|
178 r[1] = n[1] + m[1] |
|
179 r[2] = n[2] + m[2] |
|
180 r[3] = n[3] + m[3] |
|
181 |
|
182 Modular Reduction |
|
183 ----------------- |
|
184 For the curves implemented, reduction is possible by fast reduction |
|
185 for Generalized Mersenne Primes, as described in [4]. For the |
|
186 floating point implementation, a significant step of the reduction |
|
187 process is tidying: that is, the propagation of carry bits from |
|
188 low-order to high-order coefficients to reduce the precision of each |
|
189 coefficient to 24 bits. |
|
190 This is done by adding and then subtracting |
|
191 ecfp_alpha, a large floating point number that induces precision roundoff. |
|
192 See [5] for more details on tidying using floating point arithmetic. |
|
193 e.g. suppose we have r = 961838 * 2^24 + 519308 |
|
194 then if we set alpha = 3 * 2^51 * 2^24, |
|
195 FP(FP(r + alpha) - alpha) = 961838 * 2^24, because the precision for |
|
196 the intermediate results is limited. Our values of alpha are chosen |
|
197 to truncate to a desired number of bits. |
|
198 |
|
199 The reduction is then performed as in [4], adding multiples of prime p. |
|
200 e.g. suppose we are working over a polynomial of 10^2. Take the number |
|
201 2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95, stored in 5 elements |
|
202 for coefficients of 10^0, 10^2, ..., 10^8. |
|
203 We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1 |
|
204 We can subtract off from the higher terms |
|
205 (2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95) - (2 * 10^2) * (10^6 - 2 * 10^4 + 1) |
|
206 = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 |
|
207 = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - (15) * (10^6 - 2 * 10^4 + 1) |
|
208 = 83 * 10^4 + 21 * 10^2 + 80 |
|
209 |
|
210 Integrated Example |
|
211 ------------------ |
|
212 This example shows how multiplication, addition, tidying, and reduction |
|
213 work together in our modular arithmetic. This is simplified from the |
|
214 actual implementation, but should convey the main concepts. |
|
215 Working over polynomials of 10^2 and with p as in the prior example, |
|
216 Let a = 16 * 10^4 + 53 * 10^2 + 33 |
|
217 let b = 81 * 10^4 + 31 * 10^2 + 49 |
|
218 let c = 22 * 10^4 + 0 * 10^2 + 95 |
|
219 And suppose we want to compute a * b + c mod p. |
|
220 We first do a multiplication: then a * b = |
|
221 0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617 |
|
222 Then we add in c before doing reduction, allowing us to get a * b + c = |
|
223 0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712 |
|
224 We then perform a tidying on the upper half of the terms: |
|
225 0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 |
|
226 0 * 10^10 + (1296 + 47) * 10^8 + 89 * 10^6 |
|
227 0 * 10^10 + 1343 * 10^8 + 89 * 10^6 |
|
228 13 * 10^10 + 43 * 10^8 + 89 * 10^6 |
|
229 which then gives us |
|
230 13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712 |
|
231 we then reduce modulo p similar to the reduction example above: |
|
232 13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712 |
|
233 - (13 * 10^4 * p) |
|
234 69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712 |
|
235 - (69 * 10^2 * p) |
|
236 227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712 |
|
237 - (227 * p) |
|
238 5563 * 10^4 + 3551 * 10^2 + 1485 |
|
239 finally, we do tidying to get the precision of each term down to 2 digits |
|
240 5563 * 10^4 + 3565 * 10^2 + 85 |
|
241 5598 * 10^4 + 65 * 10^2 + 85 |
|
242 55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85 |
|
243 and perform another reduction step |
|
244 - (55 * p) |
|
245 208 * 10^4 + 65 * 10^2 + 30 |
|
246 There may be a small number of further reductions that could be done at |
|
247 this point, but this is typically done only at the end when converting |
|
248 from floating point to an integer unit representation. |
|
249 |
|
250 FLOATING POINT PRECISION |
|
251 ======================== |
|
252 This section discusses the precision of floating point numbers, which |
|
253 one writing new formulae or a larger bit size should be aware of. The |
|
254 danger is that an intermediate result may be required to store a |
|
255 mantissa larger than 53 bits, which would cause error by rounding off. |
|
256 |
|
257 Note that the tidying with IEEE rounding mode set to round-to-even |
|
258 allows negative numbers, which actually reduces the size of the double |
|
259 mantissa to 23 bits - since it rounds the mantissa to the nearest number |
|
260 modulo 2^24, i.e. roughly between -2^23 and 2^23. |
|
261 A multiplication increases the bit size to 2^46 * n, where n is the number |
|
262 of doubles to store a number. For the 224 bit curve, n = 10. This gives |
|
263 doubles of size 5 * 2^47. Adding two of these doubles gives a result |
|
264 of size 5 * 2^48, which is less than 2^53, so this is safe. |
|
265 Similar analysis can be done for other formulae to ensure numbers remain |
|
266 below 2^53. |
|
267 |
|
268 Extended-Precision Floating Point |
|
269 --------------------------------- |
|
270 Some platforms, notably x86 Linux, may use an extended-precision floating |
|
271 point representation that has a 64-bit mantissa. [6] Although this |
|
272 implementation is optimized for the IEEE standard 53-bit mantissa, |
|
273 it should work with the 64-bit mantissa. A check is done at run-time |
|
274 in the function ec_set_fp_precision that detects if the precision is |
|
275 greater than 53 bits, and runs code for the 64-bit mantissa accordingly. |
|
276 |
|
277 REFERENCES |
|
278 ========== |
|
279 [1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain Parameters", Sept. 20, 2000. www.secg.org |
|
280 [2] Sun Microsystems Inc. UltraSPARC III Cu User's Manual, Version 1.0, May 2002, Table 4.4 |
|
281 [3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates". |
|
282 [4] Henk C.A. van Tilborg, Generalized Mersenne Prime. http://www.win.tue.nl/~henkvt/GenMersenne.pdf |
|
283 [5] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2. |
|
284 [6] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2 Notes. |