media/kiss_fft/README.simd

Tue, 06 Jan 2015 21:39:09 +0100

author
Michael Schloh von Bennewitz <michael@schloh.com>
branch
TOR_BUG_9701
changeset 8
97036ab72558

Conditionally force memory storage according to privacy.thirdparty.isolate;
This solves Tor bug #9701, complying with disk avoidance documented in
https://www.torproject.org/projects/torbrowser/design/#disk-avoidance.

If you are reading this, you are probably interested in using the SIMD extensions in kissfft
to do four *separate* FFTs at once.

Beware! Beyond here there be dragons!

This API is not easy to use, is not well documented, and breaks the KISS principle.


Still reading? Okay, you may be rewarded for your patience with a considerable speedup
(2-3x) on Intel x86 machines with SSE, if you are willing to jump through some hoops.

The basic idea is to use the packed four-float __m128 data type as the scalar element.
Each fft call therefore performs four FFTs, on signals A, B, C and D, and the data format
is correspondingly convoluted.
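
To illustrate the idea without SSE hardware, here is a minimal portable C++ sketch. `Vec4` is a
hypothetical stand-in for __m128, `Cpx4` plays the role of kiss_fft_cpx with a four-lane scalar,
and `dft4` is a naive O(N^2) DFT, not kissfft's FFT. Because every operation is lane-wise and each
twiddle factor is broadcast to all four lanes, lane j of the output is the transform of signal j alone.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical portable stand-in for __m128: four independent float lanes.
struct Vec4 {
    float lane[4];
};

// Lane-wise arithmetic: lane j of the result depends only on lane j of the inputs.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int j = 0; j < 4; ++j) r.lane[j] = a.lane[j] + b.lane[j];
    return r;
}

inline Vec4 operator-(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int j = 0; j < 4; ++j) r.lane[j] = a.lane[j] - b.lane[j];
    return r;
}

inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int j = 0; j < 4; ++j) r.lane[j] = a.lane[j] * b.lane[j];
    return r;
}

// Broadcast one float to all four lanes (the role _mm_set1_ps plays in SSE).
inline Vec4 splat(float x) { return Vec4{{x, x, x, x}}; }

// Complex value whose parts are four-lane "scalars", analogous to
// kiss_fft_cpx when kiss_fft_scalar is __m128.
struct Cpx4 {
    Vec4 r;
    Vec4 i;
};

// Naive O(N^2) forward DFT (not kissfft's FFT), written once against the
// four-lane type: a single call transforms signals A, B, C, D in lanes 0..3.
std::vector<Cpx4> dft4(const std::vector<Cpx4>& in) {
    const std::size_t n = in.size();
    const double pi = 3.14159265358979323846;
    std::vector<Cpx4> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Cpx4 acc{splat(0.0f), splat(0.0f)};
        for (std::size_t t = 0; t < n; ++t) {
            // The twiddle e^(-2*pi*i*k*t/n) is identical for all lanes.
            const double angle = -2.0 * pi * double((k * t) % n) / double(n);
            const Vec4 wr = splat(float(std::cos(angle)));
            const Vec4 wi = splat(float(std::sin(angle)));
            // acc += in[t] * w, as a lane-wise complex multiply-add.
            acc.r = acc.r + (in[t].r * wr - in[t].i * wi);
            acc.i = acc.i + (in[t].r * wi + in[t].i * wr);
        }
        out[k] = acc;
    }
    return out;
}
```

The real SIMD build gets the same effect by making kiss_fft_scalar an __m128, so the existing
scalar FFT code runs unchanged on four signals.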

Complex data is interleaved as follows:
    rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1, ...
where "rA0" is the real part of the zeroth sample of signal A and "iA0" is its imaginary part.
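
A minimal sketch of addressing this layout, assuming the buffer is viewed as plain floats.
Each complex sample occupies 8 floats, so the real part of sample n of signal s is at 8*n + s
and the imaginary part at 8*n + 4 + s. `complex_sample` is a hypothetical helper, not part of kissfft.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: read sample n of signal s (0 = A, 1 = B, 2 = C, 3 = D)
// from a buffer in the interleaved complex layout
//   rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,...
// Four real parts come first, then the four matching imaginary parts.
std::complex<float> complex_sample(const std::vector<float>& buf,
                                   std::size_t n, std::size_t s) {
    assert(s < 4 && 8 * n + 7 < buf.size());
    return {buf[8 * n + s], buf[8 * n + 4 + s]};
}
```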

Real-only data is laid out as:
    rA0,rB0,rC0,rD0, rA1,rB1,rC1,rD1, ...
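
Indexing the real-only layout is simpler: sample n of signal s sits at offset 4*n + s.
A sketch, with `extract_real_signal` as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy signal s (0 = A, 1 = B, 2 = C, 3 = D) out of a
// buffer in the real-only layout rA0,rB0,rC0,rD0, rA1,rB1,rC1,rD1, ...
// Sample n of signal s lives at offset 4*n + s.
std::vector<float> extract_real_signal(const std::vector<float>& packed,
                                       std::size_t s) {
    assert(s < 4 && packed.size() % 4 == 0);
    std::vector<float> out(packed.size() / 4);
    for (std::size_t n = 0; n < out.size(); ++n)
        out[n] = packed[4 * n + s];
    return out;
}
```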

Compile with gcc flags such as
    -O3 -mpreferred-stack-boundary=4 -DUSE_SIMD=1 -msse
(-mpreferred-stack-boundary=4 asks gcc to keep the stack aligned to 2^4 = 16 bytes.)

Be aware of SIMD alignment; it is the most likely cause of segfaults.
The code within kissfft uses scratch variables on the stack, and with SIMD
these must have addresses on 16-byte boundaries. Search on "SIMD alignment" for more info.
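
Buffers you allocate yourself for __m128-based data need the same 16-byte alignment; this does not
change kissfft's own stack scratch variables, which depend on the compiler flags above. A sketch of
checking and obtaining the alignment, assuming a C++17 standard library that provides
std::aligned_alloc (MSVC does not; _aligned_malloc is the usual substitute there).
`is_aligned16`, `alloc_floats16` and `stack_scratch_is_aligned` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True if p lies on a 16-byte boundary, as SSE aligned loads/stores require.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: allocate `count` floats on a 16-byte boundary.
// std::aligned_alloc (C++17) needs a size that is a multiple of the
// alignment, so the byte count is rounded up. Release with std::free.
inline float* alloc_floats16(std::size_t count) {
    assert(count <= SIZE_MAX / sizeof(float) - 16);  // avoid size overflow
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + 15) / 16 * 16;
    if (bytes == 0) bytes = 16;
    return static_cast<float*>(std::aligned_alloc(16, bytes));
}

// Stack buffers can request the same alignment with alignas.
inline bool stack_scratch_is_aligned() {
    alignas(16) float scratch[8] = {};
    return is_aligned16(scratch);
}
```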


Robin at Divide Concept was kind enough to share his code for formatting data to and from the SIMD kissfft layout.
I have not run it, so use it at your own risk. It appears to perform out-of-place 4xN and Nx4 transpositions.

#include <xmmintrin.h>  // __m128, _mm_set_ps (SSE)

// Interleave four consecutive arrays of size128 floats each (signal A, then
// B, C, D) into size128 __m128 values: A0,B0,C0,D0, A1,B1,C1,D1, ...
// target must be 16-byte aligned because it is written through __m128*.
void SSETools::pack128(float* target, float* source, unsigned long size128)
{
    __m128* pDest = (__m128*)target;
    __m128* pDestEnd = pDest+size128;
    float* source0=source;
    float* source1=source0+size128;
    float* source2=source1+size128;
    float* source3=source2+size128;

    while(pDest<pDestEnd)
    {
        // _mm_set_ps takes its arguments from the highest lane down,
        // so *source0 ends up in lane 0.
        *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
        source0++;
        source1++;
        source2++;
        source3++;
        pDest++;
    }
}

// Inverse of pack128: split size128 groups of four interleaved floats
// (A0,B0,C0,D0, A1,...) from source into four consecutive arrays of
// size128 floats each in target. Both are accessed as plain floats.
void SSETools::unpack128(float* target, float* source, unsigned long size128)
{
    float* pSrc = source;
    float* pSrcEnd = pSrc+size128*4;
    float* target0=target;
    float* target1=target0+size128;
    float* target2=target1+size128;
    float* target3=target2+size128;

    while(pSrc<pSrcEnd)
    {
        *target0=pSrc[0];
        *target1=pSrc[1];
        *target2=pSrc[2];
        *target3=pSrc[3];
        target0++;
        target1++;
        target2++;
        target3++;
        pSrc+=4;
    }
}
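
For machines without SSE, or to check the layout before using the intrinsics, here is a portable
scalar sketch of the same out-of-place 4xN/Nx4 transposition that pack128 and unpack128 appear to
perform. `pack4` and `unpack4` are hypothetical names, and this has not been benchmarked against
the SSE version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar equivalent of pack128: `source` holds four signals back to back
// (A, then B, C, D), each `len` floats long. The result is interleaved
// A0,B0,C0,D0, A1,B1,C1,D1, ... (4 * len floats).
std::vector<float> pack4(const std::vector<float>& source, std::size_t len) {
    assert(source.size() == 4 * len);
    std::vector<float> out(4 * len);
    for (std::size_t n = 0; n < len; ++n)
        for (std::size_t s = 0; s < 4; ++s)
            out[4 * n + s] = source[s * len + n];
    return out;
}

// Scalar equivalent of unpack128: the inverse of pack4.
std::vector<float> unpack4(const std::vector<float>& packed, std::size_t len) {
    assert(packed.size() == 4 * len);
    std::vector<float> out(4 * len);
    for (std::size_t n = 0; n < len; ++n)
        for (std::size_t s = 0; s < 4; ++s)
            out[s * len + n] = packed[4 * n + s];
    return out;
}
```

The same transposition appears to build the complex layout too: if each signal is stored as
r0,i0,r1,i1,... and len (or size128) is 2*N, the output is rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, ...
This follows from the layouts described earlier rather than from any documented kissfft guarantee.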
