Saturday, October 8, 2011

Sequential processing will always be faster

Consider some code that sums a square array:
[code]   for (row = 0; row < N; ++row)
      for (col = 0; col < N; ++col)
         sum += A[row][col];[/code] 
Or you can do it the other way round:
[code]   for (col = 0; col < N; ++col)
      for (row = 0; row < N; ++row)
         sum += A[row][col];[/code]
So does it matter? Indeed it does!
In C++, arrays are stored row-wise in contiguous memory, so if you traverse the rows first you read the data sequentially. That means the next value you need is likely to be already in the pipeline, the cache, RAM, or the next hard drive sector before you ask for it. But if you traverse columns first, you repeatedly read just one item from each row before jumping to the next row. Your system's caching and lookahead then fail, and you waste a lot of time waiting for RAM, which is about three to five times slower than cache. Even worse, you may end up waiting for the hard drive, which can be millions of times slower than RAM when accessed in large random jumps.
How much time is wasted? On my machine, summing a billion bytes row-wise takes 9 seconds, whereas summing them column-wise takes 51 seconds. The less sequential your data access, the slower your program will run.
