Algorithms: FFT-I

The idea of the FFT is actually attributable to Gauss, as if we should be surprised by this, but the computational potential was not realized until the mid nineteen sixties, when the FFT was rediscovered.

The basic notion is to perform the Fourier transform recursively to cut down the complexity or computation step number of the process. The algorithm is far from obvious if we restrict ourselves to expressing indices of arrays in base ten, but jumps out at us if we label the data array using base two number representations.

Consider a set of data

    \[f[ \, ]=\{ f_0, f_1, f_2, f_3, \cdots, f_{N} \}\]

The discrete Fourier transform will be a sum of N terms, each a product

    \[g_k=\sum_{n=0}^{N-1} f_n \, e^{-2\pi i {nk\over N}}\]

FFT develops a recursion for doing the sum in M=\ln_2 (N) steps, each a product, for a dataset with N=2^M. Label the indices in base two

    \[n=n_{N-1} \, 2^{N-1} + n_{N-2} \, 2^{N-2} +\cdots + n_2 \, 2^2 + n_1 \, 2^1 +n_0 \, 2^0\]

    \[\rightarrow (n_{N-1}, n_{N-2},\cdots, n_1, n_0)\]

consistent with how we read our numbers, largest exponent (most significant) digits leftmost in the string. In binary each n_i is zero or one. Then if we call

    \[\omega_k=e^{-2\pi i {k\over N}}\]

the Discrete Fourier transform is

    \[g_k=\sum_{n_0, n_1, \cdots, n_{N-1}=0,1} f_{(n_{N-1}, \cdots, n_1, n_0)} (\omega_k^{2^0})^{n_0} \, (\omega_k^{2^1})^{n_1} \, \cdots (\omega_k^{2^{N-1}})^{n_{N-1}}\]

This formula can be expanded from the inside out; for example consider the case of N=2^3=8 data \{ f_0, f_1, f_2, f_3, f_4, f_5, f_6, f_7\}, then

    \[g_k=\sum_{n_0, n_1,n_2=0,1} f_{(n_2, n_1, n_0)} (\omega_k^{2^0})^{n_0} \, (\omega_k^{2^1})^{n_1} \, (\omega_k^{2^2})^{n_2}\]

\bullet expand the n_2=0,1 sum:

    \[g_k=\sum_{n_0, n_1=0,1} [f_{(0, n_1, n_0)} + (\omega_k^{2^2}) \, f_{(1, n_1, n_0)}] \,  (\omega_k^{2^0})^{n_0} \, (\omega_k^{2^1})^{n_1}\]

\bullet expand the n_1=0,1 sum;

    \[g_k=\sum_{n_0=0,1} \big( ([f_{(0, 0, n_0)} + (\omega_k^{2^2}) \, f_{(1, 0, n_0)}])+(\omega_k^{2^1}) \, ([f_{(0, 1, n_0)}\]

    \[+ (\omega_k^{2^2}) \, f_{(1, 1, n_0)}])\big) \,  (\omega_k^{2^0})^{n_0}\]

\bullet expand the n_0=0,1 sum;

    \[g_k=\big( ([f_{(0, 0, 0)} + (\omega_k^{2^2}) \, f_{(1, 0, 0)}])+(\omega_k^{2^1}) \, ([f_{(0, 1, 0)} + (\omega_k^{2^2}) \, f_{(1, 1, 0)}])\big)\]

    \[+(\omega_k^{2^0}) \, \big( ([f_{(0, 0, 1)} + (\omega_k^{2^2}) \, f_{(1, 0, 1)}])+(\omega_k^{2^1}) \, ([f_{(0, 1, 1)} + (\omega_k^{2^2}) \, f_{(1, 1, 1)}])\big)\]

Examine the order in which f_n are added into the sum; they are added in pairs: we rearrange them in reverse bit-order, take the first two and combine them, then the next two and combine them, then the next two, then the next two to create four objects

    \[z_0^0= f_{(0, 0, 0)} + (\omega_k^{2^2}) \, f_{(1, 0, 0)}, \quad z_1^0=f_{(0, 1, 0)} + (\omega_k^{2^2}) \, f_{(1, 1, 0)}\]

    \[z_2^0=f_{(0, 0, 1)} + (\omega_k^{2^2}) \, f_{(1, 0, 1)}, \quad z_3^0= f_{(0, 1, 1)} + (\omega_k^{2^2}) \, f_{(1, 1, 1)}\]

each term has the n_2 slot on the two summands be 0,1 and n_1,n_0 slots run over the four possibilities that give us the four terms. Now we have four terms that can be binary-labeled, and we repeat the process with these four objects z_0,z_1,z_2,z_3 by making

    \[z_0^1=z_0^0+(\omega_k^{2^1})z_1^0, \qquad z_1^1=z_2^0+(\omega_k^{2^1})z_3^0\]

finally

    \[g_k=z_0^2=z_0^1+(\omega_k^{2^0})z_1^1\]

You can see that the order in which the f_i are added in is the reverse of their binary ordering, this is called reverse bit ordering, read the binary number from right to left rather than from left to right

    \[\begin{array}{lll} \mbox{Decimal order} & \mbox{reverse bit order} & \mbox{bit order} \\ \hline 0 & (0,0,0) & (0,0,0)\\ 4 & (0,0,1) & (1,0,0)\\ 2 & (0,1,0) & (0,1,0)\\ 6 & (0,1,1) & (1,1,0)\\ 1 & (1,0,0) & (0,0,1)\\ 5 & (1,0,1) & (1,0,1)\\ 3 & (1,1,0) & (0,1,1)\\ 7 & (1,1,1) & (1,1,1)\\ \end{array}\]

If the original vector of data is first reverse bit ordered, we can perform an iterative summation, looping only \ln_2 (N) rather than N times, each loop containing one addition and one multiplication. This means far fewer computations than straightforward application of the defining formula for Fourier transformation in terms of a sum up to N.
This is why the algorithm is faster.

The looping and assignment can be illustrated in the following way

    \[\left(\begin{array}{c} z_0^0=f_0\\ z_1^0=f_4\\ z_2^0=f_2\\ z_3^0=f_6\\ z_4^0=f_1\\ z_5^0=f_5\\ z_6^0=f_3\\ z_7^0=f_7\\ \end{array}\right)\rightarrow \left(\begin{array}{c} z_0^1=z_0^0+\omega_k^4 z_1^0\\ z_1^1=z_2^0+\omega_k^4 z_3^0\\ z_2^1=z_4^0+\omega_k^4 z_5^0\\ z_3^1=z_6^0+\omega_k^4 z_7^0\\ z_4^1=0\\ z_5^1=0\\ z_6^1=0\\ z_7^1=0\\ \end{array}\right)\rightarrow \left(\begin{array}{c} z_0^2=z_0^1+\omega_k^2 z_1^1\\ z_1^2=z_2^1+\omega_k^2 z_3^1\\ z_2^2=0\\ z_3^2=0\\ z_4^2=0\\ z_5^2=0\\ z_6^2=0\\ z_7^2=0\\ \end{array}\right)\] \[\rightarrow \left(\begin{array}{c} z_0^3=z_0^2+\omega_k^1 z_1^2\\ z_1^3=0\\ z_2^3=0\\ z_3^3=0\\ z_4^3=0\\ z_5^3=0\\ z_6^3=0\\ z_7^3=0\\ \end{array}\right)\]

in which

    \[g_k=z_0^3\]

and the subscripts on the z_n label the iteration number, not an exponent. Each loop runs half as many cycles as its predecessor.

Let’s first code this up in C, the lowest level language that we are using in these posts, therefore the fastest but most difficult to use.

#include<stdio.h>
#include<math.h>
#define PI 3.1415926
#define N 64
#define M 6        /* this is log_2(N) */

   
int i1, i2, i3;
int i, j, k, l, m;
double f_imag[N],f_real[N], theta, ctheta, stheta;
int l1, l2;
double tmp1, tmp2;

main()
{

    for(i=0;i<N;i++){
f_real[i]=cos(PI*(double)i/64.0); 
f_imag[i]=0.0;}

/*     bit reversal */

    l = 1;
    for (k = 0; k <N-1; k++) {
	if (k < l) {
	    tmp1 = f_real[l-1];
	    tmp2 = f_imag[l-1];
	    f_real[l-1] = f_real[k];
	    f_imag[l-1] = f_imag[k];
	    f_real[k] = tmp1;
	    f_imag[k] = tmp2;
	}
	j = N/2;
	while(j < l) {
	    l = l - j;
	    j = j/2;
	}
	l = l + j;
    }

/*     actual FFT routine */

    l2 = 0;
    i1 = M;
    for (l = 0; l < i1; l++) {
	theta = 0.0;
	l1 = l2;
	l2 = 2*l1 ;   /* bit shift to right = multiply by two */
	for (k = 0; k <i2; k++) {
	    ctheta = cos(theta);
	    stheta = -sin(theta);
	    theta = theta + PI / (double)l1;
	    i3 = l2;
	    for (j = k; i3 < 0 ? j > N : j < N; j += i3) {
		i = j + l1;
		tmp1 = f_real[i] * ctheta - f_imag[i] * stheta;
		tmp2 = f_real[i ] * stheta + f_imag[i] * ctheta;
		f_real[i ] = f_real[j ] - tmp1;
		f_real[j ] = f_real[j ] + tmp1;
		f_imag[i ] = f_imag[j ] - tmp2;
		f_imag[j ] = f_imag[j ] + tmp2;

	    }
	}
    }
    /* the end */
    for(i=0;i<N;i++)
      printf("%d\t%f\n", i, f_real[i]);
} 

Next time: implemented in julia and R, canned libraries.

Home 2.0
error: Content is protected !!