Today I was testing the performance of a piece of code that basically accesses each element of a container in a for loop. The result was quite shocking: the std::for_each version was more than 10 times faster than the raw loop. What?
I was using VS2015 with a Debug build. Here’s the output:
Raw loop: 978
Range-based for loop: 426
std::for_each: 66
However, when I switched to a Release build:
Raw loop: 2
Range-based for loop: 2
std::for_each: 5
That’s what I had expected. And when I changed the time precision to nanoseconds, it turned out that the raw loop is slightly faster than the range-based for loop.
Lesson learned
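The compiler sure knows how to optimize your code, and a Debug build tells you very little about how it will really perform. A likely reason for the odd Debug numbers is the standard library's iterator debugging in Debug builds, which checks every element access in the hand-written loops, while std::for_each can validate the range once and then iterate without per-element checks. So do your profiling with optimization on.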