Some thoughts on how to parallelize the general Cooley–Tukey FFT algorithm, mainly to benchmark the performance of different parallel programming tools such as OpenMP, Intel TBB, OpenCL, and C++11 threads.
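
To make the idea concrete, here is a minimal sketch of one way the recursion could be parallelized with C++11 facilities. It assumes the radix-2 decimation-in-time variant of Cooley–Tukey and a power-of-two input length. The names `fft_radix2` and `depth` are hypothetical, chosen for illustration only, and this is not necessarily how the benchmarked implementations are structured.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <future>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 decimation-in-time FFT, computed in place on `a`.
// Assumption: a.size() is a power of two.
// `depth` (hypothetical parameter) is the number of recursion levels that
// are allowed to spawn an extra task; below that the recursion is serial.
void fft_radix2(std::vector<cd>& a, int depth) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    assert((n & (n - 1)) == 0 && "length must be a power of two");

    // Split the input into even- and odd-indexed samples.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }

    if (depth > 0) {
        // The two half-size transforms touch disjoint data, so one of them
        // can run as an asynchronous task while this thread handles the other.
        auto pending = std::async(std::launch::async,
                                  [&even, depth] { fft_radix2(even, depth - 1); });
        fft_radix2(odd, depth - 1);
        pending.get();  // wait for the even half before combining
    } else {
        fft_radix2(even, 0);
        fft_radix2(odd, 0);
    }

    // Butterfly step: combine the half-size results using the twiddle factors.
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle =
            -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const cd t = std::polar(1.0, angle) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}
```

With `depth = d`, up to `2^d` half-transforms run concurrently, so a small value on the order of log2 of the core count is usually enough. The same split maps naturally onto OpenMP tasks or a TBB `parallel_invoke`, which is what makes it a convenient basis for comparison.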