michael@0: Squaring Algorithm
michael@0: 
michael@0: When you are squaring a value, you can take advantage of the fact that
michael@0: half the multiplications performed by the more general multiplication
michael@0: algorithm (see 'mul.txt' for a description) are redundant when the
michael@0: multiplicand equals the multiplier.
michael@0: 
michael@0: In particular, the modified algorithm is:
michael@0: 
michael@0: k = 0
michael@0: for j <- 0 to (#a - 1)
michael@0:   w = c[2*j] + (a[j] ^ 2);
michael@0:   k = w div R
michael@0: 
michael@0:   for i <- j+1 to (#a - 1)
michael@0:     w = (2 * a[j] * a[i]) + k + c[i+j]
michael@0:     c[i+j] = w mod R
michael@0:     k = w div R
michael@0:   endfor
michael@0:   c[i+j] = k;
michael@0:   k = 0;
michael@0: endfor
michael@0: 
michael@0: On the surface, this looks identical to the multiplication algorithm;
michael@0: however, note the following differences:
michael@0: 
michael@0:   - precomputation of the leading term in the outer loop
michael@0: 
michael@0:   - i runs from j+1 instead of from zero
michael@0: 
michael@0:   - doubling of a[i] * a[j] in the inner product
michael@0: 
michael@0: Unfortunately, the construction of the inner product is such that we
michael@0: need more than two digits to represent the inner product, in some
michael@0: cases.  In a C implementation, this means that some gymnastics must be
michael@0: performed in order to handle overflow, for which C has no direct
michael@0: abstraction.  We do this by observing the following:
michael@0: 
michael@0: If we have multiplied a[i] and a[j], and the product is more than half
michael@0: the maximum value expressible in two digits, then doubling this result
michael@0: will overflow into a third digit.  If this occurs, we take note of the
michael@0: overflow, and double it anyway -- C integer arithmetic ignores
michael@0: overflow, so the two digits we get back should still be valid, modulo
michael@0: the overflow.
michael@0: 
michael@0: Having doubled this value, we now have to add in the remainders and
michael@0: the digits already computed by earlier steps.  If we did not overflow
michael@0: in the previous step, we might still cause an overflow here.  That
michael@0: will happen whenever the maximum value expressible in two digits, less
michael@0: the amount we have to add, is greater than the result of the previous
michael@0: step.  Thus, the overflow computation is:
michael@0: 
michael@0: 
michael@0:   u = 0
michael@0:   w = a[i] * a[j]
michael@0: 
michael@0:   if(w > (R - 1)/ 2)
michael@0:     u = 1;
michael@0: 
michael@0:   w = w * 2
michael@0:   v = c[i + j] + k
michael@0: 
michael@0:   if(u == 0 && (R - 1 - v) < w)
michael@0:     u = 1
michael@0: 
michael@0: If there is an overflow, u will be 1, otherwise u will be 0.  The rest
michael@0: of the parameters are the same as they are in the above description.
michael@0: 
michael@0: ------------------------------------------------------------------
michael@0:  This Source Code Form is subject to the terms of the Mozilla Public
michael@0:  # License, v. 2.0. If a copy of the MPL was not distributed with this
michael@0:  # file, You can obtain one at http://mozilla.org/MPL/2.0/.