Profiling CPU Cache/Memory from the OS/Application?


You might want to look at Intel's Performance Monitoring Unit (PMU), which some processors provide. It is a set of special-purpose registers (Intel calls them Model Specific Registers, or MSRs) that you can program to count events such as cache misses. MSRs are written with the WRMSR instruction and read with RDMSR; both are privileged instructions, so they have to run in kernel mode (for example, from a driver).

Here is a document about Performance Analysis on i7 and Xeon 5500.

You might also want to check out Intel's Performance Counter Monitor (PCM), a set of routines that abstract the PMU. You can use it in a C++ application to measure several performance metrics live, including cache misses. It also comes with GUI and command-line tools for standalone use.

The Linux kernel also has a facility for manipulating MSRs: the msr driver, which exposes each CPU's registers through a /dev/cpu/N/msr device file.
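As a rough illustration of that Linux facility, here is a minimal sketch that reads one MSR through the msr driver's device file. It assumes the driver is loaded (modprobe msr) and that the script runs as root. The function name read_msr and its device parameter are my own, added so the path can be overridden; they are not part of any library.

```python
import os
import struct


def read_msr(register, cpu=0, device=None):
    """Read one 64-bit MSR through the Linux msr driver.

    The driver exposes /dev/cpu/<cpu>/msr; the file offset selects the
    register number and every read returns 8 bytes.
    `device` is a hypothetical override for the path, useful for testing.
    """
    path = device or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, register)  # offset == MSR address
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at offset {register:#x}")
    return struct.unpack("<Q", data)[0]  # little-endian unsigned 64-bit


# Example (needs root and the msr module):
#   tsc = read_msr(0x10)   # 0x10 is IA32_TIME_STAMP_COUNTER
```

Programming the actual cache-miss counters means writing event-select MSRs whose addresses and encodings depend on the processor model, so look those up in Intel's manuals rather than hard-coding guesses.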

There are other utilities/APIs that also use the PMU: perf, PAPI.


Cache performance is generally measured in terms of hit rate and miss rate.
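To make the two metrics concrete, here is a small sketch that turns raw counter values into rates. The function name cache_rates is made up for illustration; the inputs would come from whichever tool you use to count accesses and misses.

```python
def cache_rates(accesses, misses):
    """Return (hit_rate, miss_rate) as fractions of all cache accesses."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    if not 0 <= misses <= accesses:
        raise ValueError("misses must be between 0 and accesses")
    hits = accesses - misses
    return hits / accesses, misses / accesses


hit_rate, miss_rate = cache_rates(accesses=1000, misses=50)
print(f"hit rate {hit_rate:.1%}, miss rate {miss_rate:.1%}")
```

The two rates always add up to 1, so tools usually report only one of them.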

There are many tools to do this for you. Check how Valgrind does cache profiling.

Also, cache performance is generally measured on a per-program basis. Well-written programs incur fewer cache misses and get better cache performance; poorly written code does the opposite.
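To show how much the access pattern alone can matter, here is a toy model: a tiny direct-mapped cache simulated in Python, fed the addresses touched by walking the same 2D array row by row and column by column. The cache geometry (64 lines of 64 bytes), the 8-byte element size and all function names are assumptions picked for the example; real caches are set-associative and real tools model them in more detail.

```python
def count_misses(addresses, num_lines=64, line_size=64):
    """Simulate a direct-mapped cache and return the number of misses."""
    tags = [None] * num_lines          # which memory block each line holds
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # the one cache line it can live in
        if tags[index] != block:       # line holds something else: miss
            tags[index] = block
            misses += 1
    return misses


def row_major(rows, cols, elem_size=8):
    """Addresses touched when walking a row-major array row by row."""
    for r in range(rows):
        for c in range(cols):
            yield (r * cols + c) * elem_size


def column_major(rows, cols, elem_size=8):
    """Addresses touched when walking the same array column by column."""
    for c in range(cols):
        for r in range(rows):
            yield (r * cols + c) * elem_size


total = 256 * 256
print("row-major misses:   ", count_misses(row_major(256, 256)), "of", total)
print("column-major misses:", count_misses(column_major(256, 256)), "of", total)
```

In this model the row-by-row walk misses once per cache line (one access in eight), while the column-by-column walk misses on every access, even though both loops read exactly the same data.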

Measuring the raw speed of the cache itself is the hardware manufacturers' concern; you can refer to their manuals for those figures.

The Callgrind/Cachegrind combination can help you track cache hits and misses.


This has some examples. TAU, an open-source profiler that works on top of PAPI, can also be used.

If, however, you want to write your own code to measure cache statistics, you can write a program using PAPI. PAPI lets you access the hardware counters without needing to know the system architecture. The PMU, by contrast, is programmed through Model Specific Registers, so you must know which registers to use on your processor.

Perf can measure L1 and LLC events, where LLC is the last-level cache (L2 or L3, depending on the processor). Cachegrind likewise reports on L1 and LLC (again L2 or L3, whichever is the highest-level cache), but it does so by simulating the cache rather than reading hardware counters. Use Cachegrind only if you do not need fast results, because it runs the program about 10x slower, and sometimes considerably more.