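In this interesting article, Richard discusses a very simple yet important principle: Amdahl's Law (http://en.wikipedia.org/wiki/Amdahl's_law#Parallelization).
Below is the formula, where T is the original execution time, f is the fraction of that time that can run in parallel, N is the number of processors, and S is the resulting speedup: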
S = T/((1-f)T+fT/N) = 1/(1-f+f/N)
He gives an example: if we can get half of the executable code to run in parallel while the other half runs on a single processor, what speedup can we expect on a 64-processor system?
Plugging in f = 0.5 and N = 64 gives S = 1/(1 - 0.5 + 0.5/64) ≈ 1.97. That is a speedup of only 1.97!
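To make the arithmetic concrete, here is a minimal Python sketch of the formula above. The function name amdahl_speedup is a hypothetical choice for illustration. The last part assumes, purely hypothetically, that scalar optimization makes the serial half run twice as fast, to show why the serial part matters so much.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl's Law: S = 1 / ((1 - f) + f / n).

    f is the fraction of execution time that runs in parallel,
    n is the number of processors.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Richard's example: half the code runs in parallel on 64 processors.
    print(f"f=0.5, N=64: S = {amdahl_speedup(0.5, 64):.2f}")  # S = 1.97

    # Adding processors barely helps: S can never exceed 1 / (1 - f) = 2.
    for n in (2, 4, 16, 64, 1024):
        print(f"N={n:5d}: S = {amdahl_speedup(0.5, n):.3f}")

    # Hypothetical: scalar optimization halves the serial half's time
    # (0.5 -> 0.25 of the original T), still on 64 processors.
    serial_opt = 1.0 / (0.25 + 0.5 / 64)
    print(f"serial part 2x faster, N=64: S = {serial_opt:.2f}")
```

With the serial half made twice as fast, the speedup roughly doubles (about 3.88), which no number of extra processors alone could achieve here.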
The basic point here is that we should take scalar (single-thread) latency optimization seriously when tuning the whole system. However, we should not let the author mislead us into underestimating the importance of parallelism, because it is an important factor along another axis of system performance: throughput. Without enough throughput, low latency can fall apart when the system is under heavy load.
Richard's original post was at http://blogs.sun.com/rchrd/entry/why_scalar_optimization_is_important, but it can no longer be accessed or found on the web. It has since been collected in the book "The Developer's Edge", which is an interesting read.
The authors have generously published this book for free online at http://www.oracle.com/technetwork/server-storage/archive/r11-005-sys-edge-archive-495362.pdf
One of the authors, Darryl Gove, blogs at http://www.darrylgove.com/, where you can always find the current download link.