[Editor's note: Part 2 shows how to optimize DSP kernels (i.e., inner loops), and how to write fast floating-point and fractional code. Part 4 explains why it is important to optimize “control code,” ...