Is clock_gettime() adequate for submicrosecond timing?
Tags: linux


I ran some benchmarks on my system, a quad-core Xeon E5645 with a constant TSC, running kernel 3.2.54. The results were:

    clock_gettime(CLOCK_MONOTONIC_RAW)       100ns/call
    clock_gettime(CLOCK_MONOTONIC)           25ns/call
    clock_gettime(CLOCK_REALTIME)            25ns/call
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID)  400ns/call
    rdtsc (implementation @DavidSchwarz)     600ns/call

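So on a reasonably modern system, the rdtsc approach from the accepted answer looks like the slowest option of the five. Note that the rdtsc figure is for that specific implementation, which executes a serializing cpuid before every read.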
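The benchmark code itself is not shown. As a rough idea of how such per-call costs could be measured, here is a minimal sketch, assuming Linux with glibc. The helper names `timespec_to_ns` and `ns_per_call` are hypothetical and not taken from the original benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
static uint64_t timespec_to_ns(const struct timespec *ts) {
    return (uint64_t)ts->tv_sec * 1000000000ull + (uint64_t)ts->tv_nsec;
}

/* Average cost, in nanoseconds, of one clock_gettime(clk) call,
   measured over `iters` back-to-back calls.
   Returns -1.0 if the kernel does not support the requested clock. */
static double ns_per_call(clockid_t clk, long iters) {
    struct timespec start, end, scratch;

    assert(iters > 0);
    if (clock_gettime(clk, &scratch) != 0)
        return -1.0;

    /* CLOCK_MONOTONIC is used as the reference clock for the whole loop. */
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        clock_gettime(clk, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(timespec_to_ns(&end) - timespec_to_ns(&start))
           / (double)iters;
}
```

Calling `ns_per_call(CLOCK_MONOTONIC_RAW, 1000000)`, and the same for the other clock IDs, should give figures of the same kind as those listed above. The absolute values will depend on the CPU, the kernel version, the clocksource in use, and whether the clock is served through the vDSO or needs a real system call.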


No. You'll have to use platform-specific code to do it. On x86 and x86-64, you can use 'rdtsc' to read the Time Stamp Counter.

Just port the rdtsc assembly you're using.

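    #include <stdint.h>

    /* Read the Time Stamp Counter on x86-64, serializing with cpuid first. */
    static __inline__ uint64_t rdtsc(void) {
      uint32_t lo, hi;
      /* cpuid is a serializing instruction: earlier instructions must
         complete before the rdtsc below executes. */
      __asm__ __volatile__ (
        "xorl %%eax,%%eax \n\t"
        "cpuid"
        ::: "%rax", "%rbx", "%rcx", "%rdx");
      /* We cannot use "=A", since this would use %rax on x86_64 and
         return only the lower 32 bits of the TSC */
      __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      return (uint64_t)hi << 32 | lo;
    }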
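The TSC counts ticks, not nanoseconds, so a profiler needs the tick rate before it can report times. A minimal sketch of one way to estimate it, assuming GCC or Clang on x86 with a constant TSC: it uses the `__rdtsc()` compiler intrinsic from `<x86intrin.h>` instead of the inline assembly above (so it is not serialized with cpuid), and `tsc_ticks_per_ns` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>   /* __rdtsc(); GCC/Clang on x86 only */

/* Estimate TSC ticks per nanosecond by comparing the TSC against
   CLOCK_MONOTONIC across a short sleep. Assumes a constant TSC and
   that the thread stays on cores with synchronized counters. */
static double tsc_ticks_per_ns(void) {
    struct timespec t0, t1;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 }; /* 10 ms */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t c0 = __rdtsc();
    nanosleep(&pause, NULL);
    uint64_t c1 = __rdtsc();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Elapsed wall time in nanoseconds between the two clock reads. */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    assert(ns > 0.0);

    return (double)(c1 - c0) / ns;
}
```

A tick delta measured around a scope can then be divided by this ratio to get an approximate duration in nanoseconds. The estimate is only as good as the calibration interval; a longer sleep reduces the relative error introduced by the two `clock_gettime()` calls.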


I need a high-resolution timer for the embedded profiler in the Linux build of our application. Our profiler measures scopes as small as individual functions, so it needs a timer precision of better than 25 nanoseconds.

Have you considered oprofile or perf? You can use the performance counter hardware on your CPU to get profiling data without adding instrumentation to the code itself. You can see data per-function, or even per-line-of-code. The "only" drawback is that it won't measure wall clock time consumed, it will measure CPU time consumed, so it's not appropriate for all investigations.