Is it better to avoid using the mod operator when possible?

c performance optimization modulo

My general advice is as follows. Use whichever version you think is easier on the eye, and then profile your entire system. Only optimize those parts of the code that the profiler flags up as bottlenecks. I'll bet my bottom dollar that the modulo operator isn't going to be among them.

As far as the specific example goes, only benchmarking can tell which is faster on your specific architecture using your specific compiler. You are potentially replacing modulo with branching, and it's anything but obvious which would be faster.


Some simple measurements:

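```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc < 4) {
        fprintf(stderr, "usage: %s test divisor iterations\n", argv[0]);
        return 1;
    }
    int test = atoi(argv[1]);       /* 0 = modulo version, 1 = comparison version */
    int divisor = atoi(argv[2]);    /* read at run time, so it is not a compile-time constant */
    int iterations = atoi(argv[3]);
    int a = 0;

    if (test == 0) {
        for (int i = 0; i < iterations; i++)
            a = (a + 1) % divisor;
    } else if (test == 1) {
        for (int i = 0; i < iterations; i++)
            a = a + 1 == divisor ? 0 : a + 1;
    }
    printf("%d\n", a);              /* use the result so the loops are not optimized away */
    return 0;
}
```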

Compiling with either gcc or clang with -O3, and running time ./a.out 0 42 1000000000 (modulo version) or time ./a.out 1 42 1000000000 (comparison version), gives:

6.25 seconds user runtime for the modulo version,
1.03 seconds for the comparison version.

(using gcc 5.2.1 or clang 3.6.2; Intel Core i5-4690K @ 3.50GHz; 64-bit Linux)

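Since divisor is only known at run time, the modulo version has to execute a real division instruction (idiv on x86-64) on every iteration. So, on this setup at least, it is probably a good idea to use the comparison version.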
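As a rough sanity check, the timings can be converted into cycles per loop iteration. This assumes the CPU ran near its nominal 3.5 GHz for the whole run, which turbo boost may not respect, so treat the figures as approximate:

$$
\frac{6.25\ \text{s}}{10^{9}\ \text{iterations}} \times 3.5\times10^{9}\ \text{cycles/s} \approx 21.9\ \text{cycles/iteration},
\qquad
\frac{1.03\ \text{s}}{10^{9}\ \text{iterations}} \times 3.5\times10^{9}\ \text{cycles/s} \approx 3.6\ \text{cycles/iteration}
$$

Because each iteration needs the previous value of a, the modulo loop cannot overlap successive divisions and pays the full division latency every time. A figure in the low twenties of cycles is in the same ballpark as published latencies for 32-bit idiv on Haswell-class cores, though that is an interpretation of the numbers, not something this measurement proves.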


Have a look at two ways to get the next value of a "modulo 3" cyclic counter:

```c
int next1(int n) {
    return (n + 1) % 3;
}

int next2(int n) {
    return n == 2 ? 0 : n + 1;
}
```

I compiled them with gcc -O3 (for the common x86-64 architecture), using -S to get the assembly code.

The code for the first function does some seemingly unexplainable magic (*) to avoid a division, though it still needs a multiplication:

```
addl    $1, %edi
movl    $1431655766, %edx
movl    %edi, %eax
imull   %edx
movl    %edi, %eax
sarl    $31, %eax
subl    %eax, %edx
leal    (%rdx,%rdx,2), %eax
subl    %eax, %edi
movl    %edi, %eax
ret
```

It is much longer (and, I bet, slower) than the code for the second function:

```
leal    1(%rdi), %eax
cmpl    $2, %edi
movl    $0, %edx
cmove   %edx, %eax
ret
```

So it is not always true that "the (modern) compiler does a better job than you anyway".
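To be fair to the compiler, the two functions are only interchangeable while n stays in 0..2: next1 must return the right answer for every int, and next2 does not. A minimal sketch that makes the difference visible (check_valid_states is a hypothetical helper added for illustration):

```c
#include <assert.h>

/* The two counters from above, repeated so this sketch compiles on its own. */
int next1(int n) { return (n + 1) % 3; }
int next2(int n) { return n == 2 ? 0 : n + 1; }

/* Hypothetical helper: the two agree on the counter's valid states 0, 1, 2. */
void check_valid_states(void)
{
    for (int n = 0; n < 3; n++)
        assert(next1(n) == next2(n));
}

/* Outside that range they diverge, e.g.:
     next1(5)  == 0,  next2(5)  == 6
     next1(-4) == 0,  next2(-4) == -3 */
```

If the counter is guaranteed to stay in range, next2 is a legitimate replacement; the compiler simply cannot infer that guarantee from the function signature.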

Interestingly, the same experiment with 4 instead of 3 leads to an AND mask in the first function:

```
addl    $1, %edi
movl    %edi, %edx
sarl    $31, %edx
shrl    $30, %edx
leal    (%rdi,%rdx), %eax
andl    $3, %eax
subl    %edx, %eax
ret
```

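but it is still, by far, inferior to the second version. The extra sarl/shrl/subl instructions are needed because C's % truncates toward zero: when n + 1 is negative, (n + 1) % 4 is negative as well, so a bare mask would give the wrong answer.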
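Here is a C sketch of what that instruction sequence computes: add a bias for negative inputs, mask, then subtract the bias again. The function names are hypothetical, the bias is written as a comparison rather than the shifts gcc uses, and two's-complement ints are assumed (as on every mainstream target):

```c
#include <assert.h>

/* Hypothetical helper mirroring gcc's output for x % 4 with a signed x.
   gcc computes the same bias with sarl $31 / shrl $30. */
int mod4_like_gcc(int x)
{
    int bias = x < 0 ? 3 : 0;        /* 3 for negative x, 0 otherwise */
    return ((x + bias) & 3) - bias;  /* mask, then undo the bias */
}

/* Plain masking: matches x % 4 only when x >= 0. */
int mod4_mask_only(int x)
{
    return x & 3;
}
```

For example, with x = -1 the mask alone yields 3, while the biased version yields -1, which is what -1 % 4 evaluates to in C.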

Being explicit about what we actually want, for a counter that never goes negative,

```c
int next3(int n) {
    return (n + 1) & 3;
}
```

yields much better code:

```
leal    1(%rdi), %eax
andl    $3, %eax
ret
```
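The same masking idea extends to any power-of-two period. A minimal sketch, assuming the counter always stays in 0..k-1; the function name next_pow2 and its parameter k are hypothetical:

```c
#include <assert.h>

/* Hypothetical generalization of next3: advance a cyclic counter whose
   period k is a power of two. Assumes 0 <= n < k, so masking with k - 1
   gives the same result as (n + 1) % k. */
unsigned next_pow2(unsigned n, unsigned k)
{
    assert(k != 0 && (k & (k - 1)) == 0);  /* k must be a power of two */
    return (n + 1) & (k - 1);
}
```

Using unsigned sidesteps the sign question entirely; with unsigned operands, gcc and clang should also compile a constant (n + 1) % 4 down to a single and, since no rounding fixup is needed.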

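(*) Well, not that complicated: it is multiplication by the reciprocal. Compute the integer constant K = (2^N)/3, rounded up, for some large enough N (here N = 32, so K = 1431655766 = 0x55555556, the constant in the code above). Then, when you want X/3, instead of dividing by 3 compute X*K and shift it N positions to the right; imull leaves exactly that high half in %edx. The sarl/subl pair adds 1 when X is negative, so the quotient rounds toward zero as C's / does, and the leal/subl pair computes the remainder X - 3*(X/3).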
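A C sketch of the same computation, to check it against the built-in operators. The names div3 and mod3 are hypothetical, and the code assumes that >> on a negative int64_t is an arithmetic shift (true for gcc and clang, implementation-defined in standard C):

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-by-reciprocal division by 3 for 32-bit ints, following the
   gcc output above: K = ceil(2^32 / 3) = 1431655766 = 0x55555556. */
int32_t div3(int32_t x)
{
    const int64_t K = 1431655766;          /* the constant loaded by movl */
    int64_t product = (int64_t)x * K;      /* imull: full 64-bit product */
    int32_t q = (int32_t)(product >> 32);  /* high half (%edx), i.e. floor(x*K / 2^32) */
    q += (x < 0);                          /* sarl/subl: round toward zero like C's / */
    return q;
}

/* Remainder as X - 3*(X/3): the leal (%rdx,%rdx,2) / subl pair. */
int32_t mod3(int32_t x)
{
    return x - 3 * div3(x);
}
```

Looping div3 and mod3 over a range of inputs and comparing with x / 3 and x % 3 is a quick way to convince yourself that the constant and the sign correction are right.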
