If you are reading this, it means you think you may be interested in using the SIMD extensions in kissfft
to do 4 *separate* FFTs at once.

Beware! Beyond here there be dragons!

This API is not easy to use, is not well documented, and breaks the KISS principle.


Still reading? Okay, you may get rewarded for your patience with a considerable speedup
(2-3x) on Intel x86 machines with SSE if you are willing to jump through some hoops.

The basic idea is to use the packed 4-float __m128 data type as the scalar element.
This means that the data format is pretty convoluted: each fft call performs 4 FFTs, one on each of the signals A,B,C,D.
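
To make the idea concrete, here is a minimal portable sketch. Lane4 is a hypothetical stand-in for __m128 (it is not a kissfft type), and Cpx4 and cmul are likewise made-up names. It only illustrates that once the scalar type carries four lanes, each arithmetic step works on four signals at once; it is not how kissfft itself is written.

```cpp
// Hypothetical portable stand-in for __m128: four independent float lanes.
struct Lane4 {
    float v[4];
};

// Lane-wise arithmetic: lane k of the result depends only on lane k of the inputs.
inline Lane4 operator+(const Lane4& a, const Lane4& b) {
    Lane4 r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] + b.v[k];
    return r;
}

inline Lane4 operator-(const Lane4& a, const Lane4& b) {
    Lane4 r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] - b.v[k];
    return r;
}

inline Lane4 operator*(const Lane4& a, const Lane4& b) {
    Lane4 r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] * b.v[k];
    return r;
}

// A complex value whose real and imaginary parts each carry four lanes,
// i.e. one sample from each of the signals A, B, C and D.
struct Cpx4 {
    Lane4 r;
    Lane4 i;
};

// One call performs four independent complex multiplications, one per lane.
inline Cpx4 cmul(const Cpx4& a, const Cpx4& b) {
    return Cpx4{ a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
}
```

With real SSE the loops above collapse into single instructions, which is where the speedup comes from.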

For complex data, the data is interlaced as follows:
    rA0,rB0,rC0,rD0,  iA0,iB0,iC0,iD0,  rA1,rB1,rC1,rD1,  iA1,iB1,iC1,iD1 ...
where "rA0" is the real part of the zeroth sample for signal A and "iA0" is its imaginary part.

Real-only data is laid out:
    rA0,rB0,rC0,rD0,  rA1,rB1,rC1,rD1, ...
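
The complex layout above boils down to an index rule: for sample k of signal s (0..3), the real part lands at float index 8*k + s and the imaginary part at 8*k + 4 + s. A small sketch of that rule follows. The function name interlace_complex4 and the use of std::vector are assumptions made for illustration, not part of kissfft. Note that std::vector<float> does not promise 16-byte alignment, so a buffer actually handed to the SIMD build would need an aligned allocation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Interlace four equal-length complex signals A, B, C, D into
//   rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,...
// Hypothetical helper; shows the layout only, not an aligned allocation.
std::vector<float> interlace_complex4(const std::vector<std::complex<float>>& a,
                                      const std::vector<std::complex<float>>& b,
                                      const std::vector<std::complex<float>>& c,
                                      const std::vector<std::complex<float>>& d)
{
    const std::size_t n = a.size();
    assert(b.size() == n && c.size() == n && d.size() == n);

    const std::vector<std::complex<float>>* sig[4] = { &a, &b, &c, &d };
    std::vector<float> out(8 * n);

    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t s = 0; s < 4; ++s) {
            out[8 * k + s]     = (*sig[s])[k].real();  // group of 4 real parts
            out[8 * k + 4 + s] = (*sig[s])[k].imag();  // group of 4 imaginary parts
        }
    }
    return out;
}
```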

Compile with gcc flags something like
    -O3 -mpreferred-stack-boundary=4  -DUSE_SIMD=1 -msse

Be aware of SIMD alignment.  This is the most likely cause of segfaults.
The code within kissfft uses scratch variables on the stack.
With SIMD, these must have addresses on 16-byte boundaries.
Search on "SIMD alignment" for more info.
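
When chasing an alignment segfault, it can help to check the addresses of the buffers you pass in. The sketch below is one way to do that. The names is_aligned16 and Scratch are hypothetical, and alignas needs C++11 or later. It only checks your own buffers; it does nothing for the scratch variables inside kissfft, which is what the stack-boundary flag is for.

```cpp
#include <cstdint>

// True when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical scratch buffer. alignas(16) asks the compiler to place every
// object of this type on a 16-byte boundary, including ones on the stack.
struct alignas(16) Scratch {
    float data[8];
};
```

A float pointer that is off by one element (4 bytes) from an aligned address fails the check, which is a typical way to end up with a misaligned __m128 access.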


Robin at Divide Concept was kind enough to share his code for formatting to/from the SIMD kissfft.
I have not run it -- use it at your own risk.  It appears to do 4xN and Nx4 transpositions
(out of place).

// Requires <xmmintrin.h> for __m128 and _mm_set_ps.
// source holds 4 signals of size128 floats each, back to back (A, then B, C, D).
// target receives size128 __m128 values and must be 16-byte aligned.
void SSETools::pack128(float* target, float* source, unsigned long size128)
{
    __m128* pDest = (__m128*)target;
    __m128* pDestEnd = pDest+size128;
    float* source0=source;
    float* source1=source0+size128;
    float* source2=source1+size128;
    float* source3=source2+size128;

    while(pDest<pDestEnd)
    {
        // _mm_set_ps takes its arguments from the highest lane down,
        // so lane 0 receives *source0 (signal A).
        *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
        source0++;
        source1++;
        source2++;
        source3++;
        pDest++;
    }
}

// Inverse of pack128: source holds size128 packed groups of 4 floats;
// target receives the 4 signals back to back (A, then B, C, D), size128 floats each.
void SSETools::unpack128(float* target, float* source, unsigned long size128)
{
    float* pSrc = source;
    float* pSrcEnd = pSrc+size128*4;
    float* target0=target;
    float* target1=target0+size128;
    float* target2=target1+size128;
    float* target3=target2+size128;

    while(pSrc<pSrcEnd)
    {
        // each group of 4 consecutive floats holds one sample from each signal
        *target0=pSrc[0];
        *target1=pSrc[1];
        *target2=pSrc[2];
        *target3=pSrc[3];
        target0++;
        target1++;
        target2++;
        target3++;
        pSrc+=4;
    }
}
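
Since the code above is untested, a scalar reference of the same two transpositions may be handy for comparing results. This is a sketch under assumptions: pack_ref and unpack_ref are hypothetical names, they use std::vector instead of raw pointers, and they need no SSE at all. They follow the same convention as the code above: the planar buffer holds signal A, then B, C, D, each n floats long.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 4xN -> Nx4: planar[s*n + k] (signal s, sample k) moves to packed[4*k + s].
std::vector<float> pack_ref(const std::vector<float>& planar)
{
    assert(planar.size() % 4 == 0);
    const std::size_t n = planar.size() / 4;
    std::vector<float> packed(planar.size());
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t s = 0; s < 4; ++s)
            packed[4 * k + s] = planar[s * n + k];
    return packed;
}

// Nx4 -> 4xN: the exact inverse of pack_ref.
std::vector<float> unpack_ref(const std::vector<float>& packed)
{
    assert(packed.size() % 4 == 0);
    const std::size_t n = packed.size() / 4;
    std::vector<float> planar(packed.size());
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t s = 0; s < 4; ++s)
            planar[s * n + k] = packed[4 * k + s];
    return planar;
}
```

Packing and then unpacking should give back the original buffer, which makes a quick sanity check for any SIMD version.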