Why does perf stat show "stalled-cycles-backend" as <not supported>?



It looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately there is a generic, albeit cumbersome, interface that gives you access to the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed it, or maybe it has been broken down into the individual events that can stall the backend.

We start with

perf list --help

...shows the following NOTE

    1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual
       Volume 3B: System Programming Guide
       http://www.intel.com/Assets/PDF/manual/253669.pdf

...armed with that URL you end up in

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf

...you want section 19.3

19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS 3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.
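To check whether a given machine matches the 06_3AH signature quoted above, something like the following sketch may help. It assumes a Linux /proc/cpuinfo in which the "cpu family" and "model" fields already hold the display family and display model as decimal numbers; the function name display_signature is made up for this illustration.

```python
import re


def display_signature(cpuinfo_text):
    """Return DisplayFamily_DisplayModel (e.g. '06_3AH') from /proc/cpuinfo text.

    Assumption: 'cpu family' and 'model' are the display values, in decimal.
    """
    def field(name):
        # Match the exact field name so "model" does not pick up "model name".
        pattern = r"^%s\s*:\s*(\d+)\s*$" % re.escape(name)
        match = re.search(pattern, cpuinfo_text, re.MULTILINE)
        if match is None:
            raise ValueError("field %r not found" % name)
        return int(match.group(1))

    return "%02X_%02XH" % (field("cpu family"), field("model"))


# Hypothetical excerpt of /proc/cpuinfo for an Ivy Bridge part (model 58 = 0x3A).
EXAMPLE_CPUINFO = "processor\t: 0\ncpu family\t: 6\nmodel\t\t: 58\nmodel name\t: example CPU\n"

if __name__ == "__main__":
    print(display_signature(EXAMPLE_CPUINFO))
```

On a real machine you would pass the contents of /proc/cpuinfo instead of the example string and compare the result with the signature given for the table you are reading.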

...so for architectural events you need Table 19-1

19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.

**Table 19-1. Architectural Performance Events**

[Images of Table 19-1 not reproduced.]

...now comes the tricky part: take the UMask Value as the upper 2 hex digits and the Event Num as the lower 2 hex digits of a 4-digit hex number, and give that number to perf stat as a raw event.
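The digit shuffling described above can be sketched in a few lines of Python. This is only an illustration: the helper names raw_event and split_raw_event are made up here, and the sketch covers just the plain umask+event case from the text, not any further bits a raw event code could carry.

```python
def raw_event(umask, event_num):
    """Build the rNNNN string for perf from a UMask Value and an Event Num."""
    if not (0 <= umask <= 0xFF and 0 <= event_num <= 0xFF):
        raise ValueError("umask and event_num must each fit in one byte")
    # UMask forms the upper two hex digits, Event Num the lower two.
    return "r%02x%02x" % (umask, event_num)


def split_raw_event(code):
    """Inverse of raw_event: split e.g. 'r412e' into (umask, event_num)."""
    digits = code[1:] if code.startswith("r") else code
    value = int(digits, 16)
    return (value >> 8) & 0xFF, value & 0xFF


if __name__ == "__main__":
    # 0x41 and 0x2E are the two halves of the r412e code used in this answer.
    print(raw_event(0x41, 0x2E))
    print(split_raw_event("r412e"))
```

The resulting string is what goes after -e on the perf stat command line.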

perf stat --help
   -e, --event=
       Select the PMU event. Selection can be a symbolic event name (use
       perf list to list all events) or a raw PMU event (eventsel+umask) in
       the form of rNNN where NNN is a hexadecimal event descriptor.

...it says NNN, but you can give it NNNN. Let's verify that this works by asking perf stat for cache-misses both as a symbolic event name and as a hex number from Table 19-1. We'll invoke the date command for simplicity.

    $ perf stat -e r412e -e cache-misses date
    Fri Mar 28 09:28:52 CDT 2014

     Performance counter stats for 'date':

                  2292 r412e
                  2292 cache-misses

           0.003322663 seconds time elapsed

    $
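If you want to make this comparison in a script rather than by eye, a rough sketch follows. It parses the human-readable perf stat report shown above; the function name parse_perf_stat is made up, the regular expression is an assumption about the report layout (it may differ between perf versions), and the `<not supported>` line in the sample is a hypothetical addition modelled on the question, not part of the original run.

```python
import re

# A counter line is assumed to start with a count (digits, optional commas)
# or a <not supported>/<not counted> marker, followed by the event name.
_COUNTER_LINE = re.compile(r"^\s*(<not supported>|<not counted>|[\d,]+)\s+(\S+)")


def parse_perf_stat(report):
    """Map event name -> count, or None where perf printed a <...> marker."""
    counts = {}
    for line in report.splitlines():
        match = _COUNTER_LINE.match(line)
        if match is None:
            continue  # banner, blank line, elapsed time, program output
        value, event = match.groups()
        counts[event] = None if value.startswith("<") else int(value.replace(",", ""))
    return counts


SAMPLE = """\
 Performance counter stats for 'date':

              2292 r412e
              2292 cache-misses
   <not supported> stalled-cycles-backend

       0.003322663 seconds time elapsed
"""

if __name__ == "__main__":
    counts = parse_perf_stat(SAMPLE)
    print(counts["r412e"] == counts["cache-misses"])
```

perf stat normally writes its report to stderr, so when running it from a script that is the stream to capture. Output from the measured program that happens to start with a number could be mistaken for a counter line by this simple pattern.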

As you can see, both reported the same number, so far so good. Now we go to Table 19-5 for the non-architectural events, of which there are too many to list here, but I'll list a few:

[Image of an excerpt from Table 19-5 not reproduced.]


perf (or its in-kernel part) was not updated to support your CPU, so it is unable to map the generic event name "stalled-cycles-backend" to an actual HW event.

In that case it can be easier to look up the CPU-specific event names, e.g. for Intel CPUs in Intel's optimization manual http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf (which groups events by type and explains how to use them to measure various parts). I don't know of a similar document for AMD.

To use event names with perf without manually converting them into raw event IDs (as amdn describes in his answer), you can use the converter scripts showevtinfo and check_events from perfmon2 (libpfm4, examples folder), as explained in the article "How to monitor the full range of CPU performance events" by Bojan Nikolic ( http://www.bnikolic.co.uk/blog/hpc-prof-events.html ). perfmon2 knows AMD and Intel CPUs and is written in C/C++.

For Intel CPUs the easiest way is to use ocperf, a wrapper around perf from Intel's open source Python project "pmu-tools" by Andi Kleen, hosted on GitHub at https://github.com/andikleen/pmu-tools and introduced on the mailing list ( https://lwn.net/Articles/556983/ ) and in Andi's blog ( http://halobates.de/blog/p/245 ).

ocperf understands all Intel event names from Intel's optimization manual.

ocperf also supports every HW event on older Linux kernels. It has its own database, in TSV or JSON format, with all HW events and their codes at https://download.01.org/perfmon/ (pmu-tools includes an auto-downloader), and the database is constantly updated by Intel employees. The format of the database is documented in the readme: https://download.01.org/perfmon/readme.txt

For Sandy Bridge/Ivy Bridge or Haswell, and kernels 3.10 or newer, you can also use the toplev.py script from "pmu-tools" to investigate performance. Its author, Andi Kleen, describes it in "pmu-tools, part II: toplev" ( http://halobates.de/blog/p/262 ). It is based on the "TopDown" method from Ahmad Yasin's "How to Tune Applications Using a Top-Down Characterization of Microarchitectural Issues" and "Top Down Analysis. Never lost with performance counters".


Just found Re: perf, x86: Add parts of the remaining haswell PMU functionality:

> AFAICS backend stall cycles are documented to work on Ivy Bridge.

I'm not aware of any documentation that presents these events as accurate frontend/backend stalls without using the full TopDown methodology (Optimization manual B.3.2)

So IIUC stalled-cycles-backend counters are too unreliable on Ivy Bridge, and that's why the kernel devs have decided to not support them.

And sure enough, Linux's perf_event_intel.c supports PERF_COUNT_HW_STALLED_CYCLES_BACKEND for Nehalem, Xeon E7 and SandyBridge, but not for IvyBridge. PERF_COUNT_HW_STALLED_CYCLES_FRONTEND is supported for IvyBridge, though.

So I guess there won't be a way to get this counter on my current CPU - either switch CPUs or use the full top-down methodology mentioned in the mail (and described here and here)