security/nss/lib/freebl/ecl/README.FP

changeset 0
6474c204b198
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/security/nss/lib/freebl/ecl/README.FP	Wed Dec 31 06:09:35 2014 +0100
     1.3 @@ -0,0 +1,284 @@
     1.4 +This Source Code Form is subject to the terms of the Mozilla Public
     1.5 +License, v. 2.0. If a copy of the MPL was not distributed with this
     1.6 +file, You can obtain one at http://mozilla.org/MPL/2.0/.
     1.7 +
     1.8 +The ECL exposes routines for constructing and converting curve
     1.9 +parameters for internal use.
    1.10 +
    1.11 +The floating point code of the ECL provides algorithms for performing
    1.12 +elliptic-curve point multiplications in floating point.
    1.13 +
    1.14 +The point multiplication algorithms perform calculations almost
    1.15 +exclusively in floating point for efficiency, but have the same
    1.16 +(integer) interface as the ECL for compatibility and to be easily
    1.17 +wired-in to the ECL.  Please see README file (not this README.FP file)
    1.18 +for information on wiring-in.
    1.19 +
    1.20 +This has been implemented for 3 curves as specified in [1]:
    1.21 +	secp160r1
    1.22 +	secp192r1
    1.23 +	secp224r1
    1.24 +
    1.25 +RATIONALE
    1.26 +=========
    1.27 +Calculations are done in the  floating-point unit (FPU) since it
    1.28 +gives better performance on the UltraSPARC III chips.  This is
    1.29 +because the FPU allows for faster multiplication than the integer unit.
    1.30 +The integer unit has a longer multiplication instruction latency, and
    1.31 +does not allow full pipelining, as described in [2].
    1.32 +Since performance is an important selling feature of Elliptic Curve
    1.33 +Cryptography (ECC), this implementation was created.
    1.34 +
    1.35 +DATA REPRESENTATION
    1.36 +===================
    1.37 +Data is primarily represented in an array of double-precision floating
    1.38 +point numbers.  Generally, each array element has 24 bits of precision
    1.39 +(i.e. be x * 2^y, where x is an integer of at most 24 bits, y some positive
    1.40 +integer), although the actual implementation details are more complicated.
    1.41 +
    1.42 +e.g. a way to store an 80 bit number might be:
    1.43 +double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^64 };
    1.44 +See section ARITHMETIC OPERATIONS for more details.
    1.45 +
    1.46 +This implementation assumes that the floating-point unit rounding mode 
    1.47 +is round-to-even as specified in IEEE 754
    1.48 +(as opposed to chopping, rounding up, or rounding down).
    1.49 +When subtracting integers represented as arrays of floating point 
    1.50 +numbers, some coefficients (array elements) may become negative.
    1.51 +This effectively gives an extra bit of precision that is important
    1.52 +for correctness in some cases.
    1.53 +
    1.54 +The described number presentation limits the size of integers to 1023 bits.
    1.55 +This is due to an upper bound of 1024 for the exponent of a double precision
    1.56 +floating point number as specified in IEEE-754.
    1.57 +However, this is acceptable for ECC key sizes of the foreseeable future.
    1.58 +
    1.59 +DATA STRUCTURES
    1.60 +===============
    1.61 +For more information on coordinate representations, see [3].
    1.62 +
    1.63 +ecfp_aff_pt
    1.64 +-----------
    1.65 +Affine EC Point Representation.  This is the basic 
    1.66 +representation (x, y) of an elliptic curve point.
    1.67 +
    1.68 +ecfp_jac_pt
    1.69 +-----------
    1.70 +Jacobian EC Point.  This stores a point as (X, Y, Z), where
    1.71 +the affine point corresponds to (X/Z^2, Y/Z^3).  This allows
    1.72 +for fewer inversions in calculations.
    1.73 +
    1.74 +ecfp_chud_pt
    1.75 +------------
    1.76 +Chudnovsky Jacobian Point.  This representation stores a point
    1.77 +as (X, Y, Z, Z^2, Z^3), the same as a Jacobian representation
    1.78 +but also storing Z^2 and Z^3 for faster point additions.
    1.79 +
    1.80 +ecfp_jm_pt
    1.81 +----------
    1.82 +Modified Jacobian Point.  This representation stores a point
    1.83 +as (X, Y, Z, a*Z^4), the same as Jacobian representation but
    1.84 +also storing a*Z^4 for faster point doublings.  Here "a" represents
    1.85 +the linear coefficient of x defining the curve.
    1.86 +
    1.87 +EC_group_fp
    1.88 +-----------
    1.89 +Stores information on the elliptic curve group for floating
    1.90 +point calculations.  Contains curve specific information, as
    1.91 +well as function pointers to routines, allowing different
    1.92 +optimizations to be easily wired in.
    1.93 +This should be made accessible from an ECGroup for the floating
    1.94 +point implementations of point multiplication.
    1.95 +
    1.96 +POINT MULTIPLICATION ALGORITHMS
    1.97 +===============================
    1.98 +Elliptic Curve Point multiplication can be done at a higher level orthogonal
    1.99 +to the implementation of point additions and point doublings.  There
   1.100 +are a variety of algorithms that can be used.
   1.101 +
   1.102 +The following algorithms have been implemented:
   1.103 +
   1.104 +4-bit Window (Jacobian Coordinates)
   1.105 +Double & Add (Jacobian & Affine Coordinates)
   1.106 +5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian)
   1.107 +
   1.108 +Currently, the fastest algorithm for multiplying a generic point
   1.109 +is the 5-bit Non-Adjacent Form.
   1.110 +
   1.111 +See comments in ecp_fp.c for more details and references.
   1.112 +
   1.113 +SOURCE / HEADER FILES
   1.114 +=====================
   1.115 +
   1.116 +ecp_fp.c
   1.117 +--------
   1.118 +Main source file for floating point calculations.  Contains routines
   1.119 +to convert from floating-point to integer (mp_int format), point
   1.120 +multiplication algorithms, and several other routines.
   1.121 +
   1.122 +ecp_fp.h
   1.123 +--------
   1.124 +Main header file.  Contains most constants used and function prototypes.
   1.125 +
   1.126 +ecp_fp[160, 192, 224].c
   1.127 +-----------------------
   1.128 +Source files for specific curves.  Contains curve specific code such
   1.129 +as specialized reduction based on the field defining prime.  Contains
   1.130 +code wiring-in different algorithms and optimizations.
   1.131 +
   1.132 +ecp_fpinc.c
   1.133 +-----------
   1.134 +Source file that is included by ecp_fp[160, 192, 224].c.  This generates
   1.135 +functions with different preprocessor-defined names and loop iterations,
   1.136 +allowing for static linking and strong compiler optimizations without
   1.137 +code duplication.
   1.138 +
   1.139 +TESTING
   1.140 +=======
   1.141 +The test suite can be found in ecl/tests/ecp_fpt.  This tests and gets
   1.142 +timings of the different algorithms for the curves implemented.
   1.143 +
   1.144 +ARITHMETIC OPERATIONS
   1.145 +---------------------
   1.146 +The primary operations in ECC over the prime fields are modular arithmetic:
   1.147 +i.e. n * m (mod p) and n + m (mod p).  In this implementation, multiplication,
   1.148 +addition, and reduction are implemented as separate functions.  This
   1.149 +enables computation of formulae with fewer reductions, e.g.
   1.150 +(a * b) + (c * d) (mod p) rather than:
   1.151 +((a * b) (mod p)) + ((c * d) (mod p)) (mod p)
   1.152 +This takes advantage of the fact that the double precision mantissa in
   1.153 +floating point can hold numbers up to 2^53, i.e. it has some leeway to
   1.154 +store larger intermediate numbers.  See further detail in the section on 
   1.155 +FLOATING POINT PRECISION.
   1.156 +
   1.157 +Multiplication
   1.158 +--------------
   1.159 +Multiplication is implemented in a standard polynomial multiplication
   1.160 +fashion.  The terms in opposite factors are pairwise multiplied and
   1.161 +added together appropriately.  Note that the result requires twice 
   1.162 +as many doubles for storage, as the bit size of the product is twice
   1.163 +that of the multiplicands.
   1.164 +e.g. suppose we have double n[3], m[3], r[6], and want to calculate r = n * m
   1.165 +r[0] = n[0] * m[0]
   1.166 +r[1] = n[0] * m[1] + n[1] * m[0]
   1.167 +r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0]
   1.168 +r[3] = n[1] * m[2] + n[2] * m[1]
   1.169 +r[4] = n[2] * m[2]
   1.170 +r[5] = 0 (This is used later to hold spillover from r[4], see tidying in
   1.171 +the reduction section.)
   1.172 +
   1.173 +Addition
   1.174 +--------
   1.175 +Addition is done term by term.  The only caveat is to be careful with
   1.176 +the number of terms that need to be added.  When adding results of
   1.177 +multiplication (before reduction), twice as many terms need to be added
   1.178 +together.  This is done in the addLong function.
   1.179 +e.g. for double n[4], m[4], r[4]:  r = n + m
   1.180 +r[0] = n[0] + m[0]
   1.181 +r[1] = n[1] + m[1]
   1.182 +r[2] = n[2] + m[2]
   1.183 +r[3] = n[3] + m[3]
   1.184 +
   1.185 +Modular Reduction
   1.186 +-----------------
   1.187 +For the curves implemented, reduction is possible by fast reduction
   1.188 +for Generalized Mersenne Primes, as described in [4].  For the 
   1.189 +floating point implementation, a significant step of the reduction 
   1.190 +process is tidying:  that is, the propagation of carry bits from 
   1.191 +low-order to high-order coefficients to reduce the precision of each 
   1.192 +coefficient to 24 bits.
   1.193 +This is done by adding and then subtracting
   1.194 +ecfp_alpha, a large floating point number that induces precision roundoff.
   1.195 +See [5] for more details on tidying using floating point arithmetic.
   1.196 +e.g. suppose we have r = 961838 * 2^24 + 519308
   1.197 +then if we set alpha = 3 * 2^51 * 2^24,
   1.198 +FP(FP(r + alpha) - alpha) = 961838 * 2^24, because the precision for
   1.199 +the intermediate results is limited.  Our values of alpha are chosen
   1.200 +to truncate to a desired number of bits.
   1.201 +
   1.202 +The reduction is then performed as in [4], adding multiples of prime p.
   1.203 +e.g. suppose we are working over a polynomial of 10^2.  Take the number
   1.204 +2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95, stored in 5 elements
   1.205 +for coefficients of 10^0, 10^2, ..., 10^8.
   1.206 +We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1
   1.207 +We can subtract off from the higher terms
   1.208 +(2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95) - (2 * 10^2) * (10^6 - 2 * 10^4 + 1)
   1.209 += 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95
   1.210 += 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - (15) * (10^6 - 2 * 10^4 + 1)
   1.211 += 83 * 10^4 + 21 * 10^2 + 80
   1.212 +
   1.213 +Integrated Example
   1.214 +------------------
   1.215 +This example shows how multiplication, addition, tidying, and reduction
   1.216 +work together in our modular arithmetic.  This is simplified from the
   1.217 +actual implementation, but should convey the main concepts.  
   1.218 +Working over polynomials of 10^2 and with p as in the prior example,
   1.219 +Let a = 16 * 10^4 + 53 * 10^2 + 33
   1.220 +let b = 81 * 10^4 + 31 * 10^2 + 49
   1.221 +let c = 22 * 10^4 +  0 * 10^2 + 95
   1.222 +And suppose we want to compute a * b + c mod p.  
   1.223 +We first do a multiplication:  then a * b =
   1.224 +0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617
   1.225 +Then we add in c before doing reduction, allowing us to get a * b + c =
   1.226 +0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
   1.227 +We then perform a tidying on the upper half of the terms:
   1.228 +0 * 10^10 + 1296 * 10^8 + 4789 * 10^6
   1.229 +0 * 10^10 + (1296 + 47) * 10^8 + 89 * 10^6
   1.230 +0 * 10^10 + 1343 * 10^8 + 89 * 10^6
   1.231 +13 * 10^10 + 43 * 10^8 + 89 * 10^6
   1.232 +which then gives us
   1.233 +13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
   1.234 +we then reduce modulo p similar to the reduction example above:
   1.235 +13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712 
   1.236 +	- (13 * 10^4 * p)
   1.237 +69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712
   1.238 +	- (69 * 10^2 * p)
   1.239 +227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712
   1.240 +	- (227 * p)
   1.241 +5563 * 10^4 + 3551 * 10^2 + 1485
   1.242 +finally, we do tidying to get the precision of each term down to 2 digits
   1.243 +5563 * 10^4 + 3565 * 10^2 + 85
   1.244 +5598 * 10^4 + 65 * 10^2 + 85
   1.245 +55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85
   1.246 +and perform another reduction step
   1.247 +	- (55 * p)
   1.248 +208 * 10^4 + 65 * 10^2 + 30
   1.249 +There may be a small number of further reductions that could be done at
   1.250 +this point, but this is typically done only at the end when converting
   1.251 +from floating point  to an integer unit representation.
   1.252 +
   1.253 +FLOATING POINT PRECISION
   1.254 +========================
   1.255 +This section discusses the precision of floating point numbers, which
   1.256 +one writing new formulae or a larger bit size should be aware of.  The
   1.257 +danger is that an intermediate result may be required to store a
   1.258 +mantissa larger than 53 bits, which would cause error by rounding off.
   1.259 +
   1.260 +Note that the tidying with IEEE rounding mode set to round-to-even
   1.261 +allows negative numbers, which actually reduces the size of the double
   1.262 +mantissa to 23 bits - since it rounds the mantissa to the nearest number 
   1.263 +modulo 2^24, i.e. roughly between -2^23 and 2^23.
   1.264 +A multiplication increases the bit size to 2^46 * n, where n is the number
   1.265 +of doubles to store a number.  For the 224 bit curve, n = 10.  This gives 
   1.266 +doubles of size 5 * 2^47.  Adding two of these doubles gives a result
   1.267 +of size 5 * 2^48, which is less than 2^53, so this is safe.  
   1.268 +Similar analysis can be done for other formulae to ensure numbers remain
   1.269 +below 2^53.
   1.270 +
   1.271 +Extended-Precision Floating Point
   1.272 +---------------------------------
   1.273 +Some platforms, notably x86 Linux, may use an extended-precision floating
   1.274 +point representation that has a 64-bit mantissa.  [6]  Although this
   1.275 +implementation is optimized for the IEEE standard 53-bit mantissa,
   1.276 +it should work with the 64-bit mantissa.  A check is done at run-time
   1.277 +in the function ec_set_fp_precision that detects if the precision is
   1.278 +greater than 53 bits, and runs code for the 64-bit mantissa accordingly.
   1.279 +
   1.280 +REFERENCES
   1.281 +==========
   1.282 +[1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain Parameters", Sept. 20, 2000.  www.secg.org
   1.283 +[2] Sun Microsystems Inc.  UltraSPARC III Cu User's Manual, Version 1.0, May 2002, Table 4.4
   1.284 +[3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates".
   1.285 +[4] Henk C.A. van Tilborg, Generalized Mersenne Prime.  http://www.win.tue.nl/~henkvt/GenMersenne.pdf
   1.286 +[5] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2.
   1.287 +[6] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2 Notes.

mercurial