Measuring amount of CPU time taken by a piece of code, in C on Unix/Linux


On recent Linux systems (*), you can get this information from the /proc filesystem. In the file /proc/PID/stat, the 14th entry is the number of jiffies spent in userland code and the 15th entry is the number of jiffies spent in system (kernel) code.

If you want to see the data on a per-thread basis, you should reference the file /proc/PID/task/TID/stat instead.
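As a rough sketch of reading those two entries, something like the following should work. The function name read_cpu_ticks is made up for illustration, and the field layout is assumed to follow the proc(5) man page. The one subtlety is that the second field (the command name) is wrapped in parentheses and may contain spaces, so the sketch resumes parsing after the last ')'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads field 14 (utime) and field 15 (stime), both in
 * jiffies, from a stat file such as "/proc/self/stat" or
 * "/proc/PID/task/TID/stat". Returns 0 on success, -1 on failure. */
int read_cpu_ticks(const char *path, unsigned long *utime, unsigned long *stime)
{
    char buf[1024];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Field 2 (the command name) is in parentheses and may itself contain
     * spaces or ')', so continue after the last ')' in the line. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;

    /* Skip field 3 (state), fields 4-8 (ints), field 9 (flags) and
     * fields 10-13 (fault counters), then read fields 14 and 15. */
    int matched = sscanf(p + 1,
                         " %*c %*d %*d %*d %*d %*d %*u"
                         " %*lu %*lu %*lu %*lu %lu %lu",
                         utime, stime);
    return matched == 2 ? 0 : -1;
}
```

Calling read_cpu_ticks("/proc/self/stat", &u, &s) before and after the code of interest and subtracting gives the jiffies consumed in between. The same call with a /proc/PID/task/TID/stat path gives the per-thread numbers.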

To convert jiffies to microseconds, you can use the following:

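    #include <unistd.h>   /* sysconf, _SC_CLK_TCK */

    #define USEC_PER_SEC 1000000UL

    /* Convert a count of jiffies (clock ticks) to microseconds. */
    long long jiffies_to_microsecond(long long jiffies)
    {
        long hz = sysconf(_SC_CLK_TCK);   /* ticks per second */

        if (hz <= USEC_PER_SEC && !(USEC_PER_SEC % hz))
        {
            /* hz divides one million evenly: multiply by a whole factor. */
            return (USEC_PER_SEC / hz) * jiffies;
        }
        else if (hz > USEC_PER_SEC && !(hz % USEC_PER_SEC))
        {
            /* More than one tick per microsecond: divide, rounding up. */
            return (jiffies + (hz / USEC_PER_SEC) - 1) / (hz / USEC_PER_SEC);
        }
        else
        {
            /* General case: scale first, then divide. */
            return (jiffies * USEC_PER_SEC) / hz;
        }
    }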

If all you care about is the per-process statistics, getrusage is easier. But if you want to be prepared to do this on a per-thread basis, this technique is better: other than the file name, the code is identical for getting the data per-process or per-thread.

* - I'm not sure exactly when the stat file was introduced. You will need to verify your system has it.


I would give getrusage a try and check both the system and user time.

Also check with gettimeofday to compare with wall clock time.
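A minimal sketch of that comparison, assuming the code under test can be wrapped in a function: time_call and burn_cpu are made-up names for illustration, not part of any library. It takes getrusage and gettimeofday readings around the call and reports the differences in seconds.

```c
#define _DEFAULT_SOURCE   /* exposes gettimeofday under strict -std= modes */
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Converts a struct timeval to seconds. */
static double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical workload: sums the first *arg integers. The volatile
 * accumulator keeps the compiler from deleting the loop. */
void burn_cpu(void *arg)
{
    volatile unsigned long sum = 0;
    unsigned long n = *(unsigned long *)arg;
    for (unsigned long i = 0; i < n; i++)
        sum += i;
}

/* Runs work(arg) once and reports user CPU, system CPU and wall-clock
 * time in seconds. Returns 0 on success, -1 if a timing call fails. */
int time_call(void (*work)(void *), void *arg,
              double *user_s, double *sys_s, double *wall_s)
{
    struct rusage before, after;
    struct timeval wall_before, wall_after;

    if (getrusage(RUSAGE_SELF, &before) != 0 ||
        gettimeofday(&wall_before, NULL) != 0)
        return -1;

    work(arg);

    if (gettimeofday(&wall_after, NULL) != 0 ||
        getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    *user_s = tv_to_seconds(after.ru_utime) - tv_to_seconds(before.ru_utime);
    *sys_s  = tv_to_seconds(after.ru_stime) - tv_to_seconds(before.ru_stime);
    *wall_s = tv_to_seconds(wall_after) - tv_to_seconds(wall_before);
    return 0;
}
```

If the wall-clock figure is much larger than user plus system time, the process was probably waiting or descheduled rather than computing. If all three come back as zero, the work is likely too short for the timer resolution and should be repeated more times.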


I would try to correlate the time with the shell's time command, as a sanity check.

You should also consider that the compiler may be optimizing the loop. Since the memset does not depend on the loop variable, the compiler will certainly be tempted to apply an optimization known as loop-invariant code motion, which would hoist the memset out of the loop so that it runs only once.
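One way to guard against that, sketched here under the assumption that you are using GCC or Clang, is to put an empty inline-asm barrier after each memset. The function name clear_repeatedly and its parameters are made up for illustration; the asm statement is a compiler extension, not standard C.

```c
#include <stddef.h>
#include <string.h>

/* Clears buf `iterations` times. The empty asm statement (a GCC/Clang
 * extension) tells the compiler that the memory behind buf may be read
 * or modified at that point, so each memset has to be carried out
 * instead of being hoisted out of the loop or merged with the others. */
void clear_repeatedly(char *buf, size_t size, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        memset(buf, 0, size);
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
}
```

The barrier emits no instructions of its own, so it should not distort the measurement. It is still worth looking at the generated assembly (for example with gcc -S) to confirm the loop survived.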

I would also caution that a 10MB, possibly in-cache, clear will really be 1.25 or 2.5 million CPU operations, as memset almost certainly writes in 8-byte or 4-byte quantities. While I rather doubt that this could be done in less than a millisecond, as stores are a bit expensive and 100K adds some L1 cache pressure, you are talking about not much more than one operation per nanosecond, which is not that hard to sustain for a multi-GHz CPU.
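To make that arithmetic concrete, here is a small sketch. The helper names are invented, 10MB is taken as 10,000,000 bytes, and the time budget is assumed to be one millisecond (1,000,000 ns), which is the budget that "not much more than one operation per nanosecond" implies.

```c
/* Number of stores needed to clear `bytes` bytes when each store
 * writes `store_width` bytes. */
unsigned long stores_needed(unsigned long bytes, unsigned long store_width)
{
    return bytes / store_width;
}

/* Average number of stores per nanosecond required to finish
 * `stores` stores within `budget_ns` nanoseconds. */
double stores_per_ns(unsigned long stores, double budget_ns)
{
    return (double)stores / budget_ns;
}
```

With 8-byte stores, stores_needed(10000000, 8) gives 1,250,000, and fitting that into 1,000,000 ns needs 1.25 stores per nanosecond. With 4-byte stores the count doubles to 2,500,000, or 2.5 stores per nanosecond.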

One imagines that 600 ns would round off to 1 clock tick, but I would worry about that as well.