C 64-bit loop performance on x86


I had a similar issue before, and I couldn't find anything wrong in either my code or yours. What worked for me was changing the compiler.

My guess is that GCC is generating outdated or inefficient assembly for this loop.

If you can disassemble your application, we could shed more light on the issue, but there just isn't enough information to go on here.

When I disassembled my code, I found that the compiler was duplicating an entire function several times, but that might just be my case.

Hope this helps; there is little to no information about this anywhere.

If I had to guess, I would agree with Learner: I am pretty certain that the disassembled code would point to the for loop. I am very interested in this issue, so please comment back.


Probable answer: the condition "i < size - 1" may compile and execute more efficiently than "i < size - 3". The first needs only a decrement instruction, whereas the second also requires the constant 3 to be loaded somewhere. Unless the compiler hoists it, this calculation is executed on every iteration, so you should compute the result once before the loop and store it in a variable.

This has nothing to do with the while loop. When you rewrote the while loop, you also changed the iteration condition and thereby eliminated the cause described above.
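As a rough illustration of storing the loop bound outside the loop, here is a minimal sketch. The function `sum_hoisted`, its byte-summing body, and the stride of 4 are hypothetical stand-ins, since the question's actual code is not shown here. Note that with an unsigned `size`, "size - 3" wraps around for small inputs, so the sketch guards against that first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the question's loop: sums bytes, four per iteration. */
uint32_t sum_hoisted(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);

    uint32_t total = 0;
    size_t i = 0;

    /* size is unsigned, so size - 3 would wrap around for size < 3; guard first. */
    if (size >= 4) {
        const size_t limit = size - 3;  /* computed once, not on every iteration */
        for (; i < limit; i += 4) {
            total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        }
    }

    /* Handle the remaining 0..3 bytes one at a time. */
    for (; i < size; i++) {
        total += data[i];
    }
    return total;
}
```

An optimizing compiler will often do this hoisting on its own, so comparing the generated assembly with and without the change is the only way to know whether it matters in your case.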

I would also prefer doing the type casting outside the loop, but that also reveals one restriction - your data must


You may be making the compiler's job difficult. In the inner loop, you're calculating the byte offset yourself through your choice of index stride and the cast. This might prevent loop unrolling or any other optimization that relies on alignment assumptions. It might also keep the compiler from using the CPU's addressing modes, forcing it to calculate the effective address itself (or use LEA).

If I were doing this, I'd cast the data pointer at the top of the loop to your stride type and increment your loop counter by 1. The compiler might be a little happier.
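A minimal sketch of that idea, assuming a 64-bit stride: the cast happens once before the loop, and the counter then steps by 1 over whole words. The function `xor_words` and its XOR body are hypothetical, chosen only to give the loop something to do. The sketch also assumes the buffer is 8-byte aligned and really holds `uint64_t` objects; if it does not, the cast is not safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: XORs together all complete 64-bit words in a buffer.
 * Assumption: data is 8-byte aligned and was originally uint64_t storage. */
uint64_t xor_words(const void *data, size_t size_bytes)
{
    assert(data != NULL || size_bytes == 0);

    /* Cast once, outside the loop, to the stride type. */
    const uint64_t *words = (const uint64_t *)data;
    const size_t count = size_bytes / sizeof *words;  /* whole words only */

    uint64_t acc = 0;
    /* Plain unit-stride indexing: the compiler scales the index by 8 itself. */
    for (size_t i = 0; i < count; i++) {
        acc ^= words[i];
    }
    return acc;  /* any trailing bytes (size_bytes % 8) are ignored here */
}
```

If the data is really a byte buffer of unknown alignment, copying each word out with `memcpy` into a local `uint64_t` is the portable alternative, and compilers usually turn that into a single load.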