
How to implement efficient C++ runtime statistics


I would recommend that in your code, you maintain counters that get incremented. The counters can be static class members or globals. If you use a class to define your counter, you can have the constructor register your counter with a single repository along with a name. Then, you can query and reset your counters by consulting the repository.

#include <string>

struct Counter {
    unsigned long c_;
    unsigned long operator++ () { return ++c_; }
    operator unsigned long () const { return c_; }
    // Subtract the value we observed rather than writing 0, so increments
    // that race with reset() are not lost.
    void reset () { unsigned long c = c_; ATOMIC_DECREMENT(c_, c); }
    Counter (std::string name);
};

struct CounterAtomic : public Counter {
    unsigned long operator++ () { return ATOMIC_INCREMENT(c_, 1); }
    CounterAtomic (std::string name) : Counter(name) {}
};

ATOMIC_INCREMENT would be a platform-specific mechanism to increment the counter atomically. GCC provides the built-in __sync_add_and_fetch for this purpose. ATOMIC_DECREMENT is similar, with the GCC built-in __sync_sub_and_fetch.
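For example, on GCC these placeholders might be defined roughly like this (the macro names are just the placeholders used above, not a standard API):

// A possible definition of the placeholders above, assuming GCC's legacy
// __sync builtins; on C++11 and later, std::atomic<unsigned long> is the
// portable alternative.
#define ATOMIC_INCREMENT(var, amount) __sync_add_and_fetch(&(var), (amount))
#define ATOMIC_DECREMENT(var, amount) __sync_sub_and_fetch(&(var), (amount))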

#include <map>
#include <string>

struct CounterRepository {
    typedef std::map<std::string, Counter *> MapType;
    mutable Mutex lock_;          // Mutex and ScopedLock are your platform's wrappers
    MapType map_;

    void add (std::string n, Counter &c) {
        ScopedLock<Mutex> sl(lock_);
        if (map_.find(n) != map_.end()) throw n;   // duplicate counter name
        map_[n] = &c;
    }

    Counter & get (std::string n) const {
        ScopedLock<Mutex> sl(lock_);
        MapType::const_iterator i = map_.find(n);
        if (i == map_.end()) throw n;              // unknown counter name
        return *(i->second);
    }
};

CounterRepository counterRepository;

Counter::Counter (std::string name) : c_(0) {
    counterRepository.add(name, *this);
}

If you know the same counter will be incremented by more than one thread, then use CounterAtomic. For counters that are specific to a thread, just use Counter.
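For illustration, usage could look something like this. The counter names and the onPacketSent/printStats functions are invented for the example, and the globals are assumed to be defined after counterRepository in the same translation unit so the repository is constructed first:

#include <iostream>

CounterAtomic packetsSent("packets_sent");   // incremented from several threads
Counter parseErrors("parse_errors");         // only ever touched by one thread

void onPacketSent () { ++packetsSent; }

void printStats () {
    // get() returns a Counter&, which converts to unsigned long for printing
    std::cout << "packets_sent = "
              << counterRepository.get("packets_sent") << std::endl;
}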


I gather you are trying to implement the gathering of run-time statistics -- things like how many bytes you sent, how long you've been running, and how many times the user has activated a particular function.

Typically, in order to compile run-time statistics such as these from a variety of sources (like worker threads), I would have each source (thread) increment its own local counters of the most fundamental data, but not perform any lengthy math or analysis on that data yet.

Then, back in the main thread (or wherever you want these stats analyzed & displayed), I send a RequestProgress-type message to each of the worker threads. In response, the worker threads gather up all the fundamental data and perhaps perform some simple analysis. This data, along with the results of the basic analysis, is sent back to the requesting (main) thread in a ProgressReport message. The main thread then aggregates all this data and does additional (perhaps costly) analysis, formatting and display to the user or logging.

The main thread sends this RequestProgress message either on user request (like when they press the S key) or on a timed interval. If a timed interval is what I'm going for, I'll typically implement a separate "heartbeat" thread. All this thread does is Sleep() for a specified time, then send a Heartbeat message to the main thread. The main thread in turn acts on this Heartbeat message by sending RequestProgress messages to every worker thread the statistics are to be gathered from.
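To make the flow concrete, here is a minimal sketch of that request/report exchange using C++11 threads. The queue, message encoding, counter values, and timings are invented for the example and are not part of any particular framework:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
struct MessageQueue {                              // tiny thread-safe queue
    std::mutex m;
    std::queue<T> q;
    void push (T v) { std::lock_guard<std::mutex> l(m); q.push(std::move(v)); }
    bool try_pop (T &out) {
        std::lock_guard<std::mutex> l(m);
        if (q.empty()) return false;
        out = std::move(q.front());
        q.pop();
        return true;
    }
};

struct ProgressReport { std::uint64_t bytesSent; double seconds; };

int main () {
    MessageQueue<int> requests;                    // 1 = RequestProgress, 0 = stop
    MessageQueue<ProgressReport> reports;

    std::thread worker([&requests, &reports] {     // worker does only cheap counting
        std::uint64_t bytes = 0;
        auto start = std::chrono::steady_clock::now();
        for (;;) {
            bytes += 1500;                         // pretend we just sent a packet
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            int cmd;
            if (requests.try_pop(cmd)) {
                if (cmd == 0) break;               // stop request
                double secs = std::chrono::duration<double>(
                    std::chrono::steady_clock::now() - start).count();
                reports.push(ProgressReport{bytes, secs});   // raw data only
            }
        }
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    requests.push(1);                              // "RequestProgress"
    ProgressReport r;
    while (!reports.try_pop(r))                    // wait for the "ProgressReport"
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    std::cout << (r.bytesSent / r.seconds)         // costly math in the main thread
              << " bytes/sec\n";
    requests.push(0);                              // tell the worker to stop
    worker.join();
    return 0;
}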

The act of gathering statistics seems like it should be fairly straightforward. So why such a complex mechanism? The answer is two-fold.

First, the worker threads have a job to do, and computing usage statistics isn't it. Trying to refactor these threads to take on a second responsibility orthogonal to their main purpose is a little like trying to jam a square peg into a round hole. They weren't built to do that, so the code will resist being written.

Second, the computation of run-time statistics can be costly if you try to do too much, too often. Suppose, for example, you have a worker thread that sends multicast data on the network, and you want to gather throughput data: how many bytes, over how long a time period, and the average number of bytes per second. You could have the worker thread compute all this on the fly itself, but it's a lot of work, and that CPU time is better spent by the worker thread doing what it's supposed to be doing -- sending multicast data. If instead you simply increment a counter for how many bytes you've sent every time you send a message, the counting has minimal impact on the performance of the thread. Then, in response to the occasional RequestProgress message, you can figure out the start & stop times and send just those along, letting the main thread do all the division etc.


Use shared memory (POSIX, System V, mmap or whatever you have available). Put a fixed-length array of volatile unsigned 32- or 64-bit integers (i.e. the largest size you can atomically increment on your platform) in there by casting the raw block of memory to your array definition. Note that the volatile doesn't get you atomicity; it prevents compiler optimizations that might trash your stats values. For the increments themselves, use intrinsics like GCC's __sync_add_and_fetch() or the newer C++11 std::atomic<> types.
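As a sketch of what that could look like with POSIX shared memory and GCC's __sync builtin -- the segment name "/my_app_stats", the stat indices, and the attach_stats helper are all invented for the example:

// stats_shm.h -- shared between the main program and the stats reader.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>

enum StatIndex { STAT_BYTES_SENT = 0, STAT_MSGS_SENT, STAT_ERRORS, STAT_COUNT };

inline volatile std::uint64_t * attach_stats (bool create) {
    int flags = create ? (O_CREAT | O_RDWR) : O_RDWR;
    int fd = shm_open("/my_app_stats", flags, 0600);
    if (fd < 0) return 0;
    if (create) ftruncate(fd, STAT_COUNT * sizeof(std::uint64_t));  // zero-fills
    void *p = mmap(0, STAT_COUNT * sizeof(std::uint64_t),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? 0 : static_cast<volatile std::uint64_t *>(p);
}

// In the main program, at each instrumented point:
//     volatile std::uint64_t *stats = attach_stats(true);
//     __sync_add_and_fetch(&stats[STAT_BYTES_SENT], len);

On Linux, shm_open may require linking with -lrt, depending on the glibc version.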

You can then write a small program that attaches to the same block of shared memory and can print out one or all of the stats. This small stats-reader program and your main program would have to share a common header file that enforces the position of each stat in the array.
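The reader side could then be as small as this, reusing the hypothetical stats_shm.h header from the sketch above:

// stats_reader.cpp -- attaches to the same segment and dumps the counters.
#include <cstdio>
#include "stats_shm.h"

int main () {
    volatile std::uint64_t *stats = attach_stats(false);   // attach, don't create
    if (!stats) { std::fprintf(stderr, "stats segment not found\n"); return 1; }
    std::printf("bytes_sent: %llu\n", (unsigned long long) stats[STAT_BYTES_SENT]);
    std::printf("msgs_sent:  %llu\n", (unsigned long long) stats[STAT_MSGS_SENT]);
    std::printf("errors:     %llu\n", (unsigned long long) stats[STAT_ERRORS]);
    return 0;
}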

The obvious drawback here is that you're stuck with a fixed number of counters. But it's hard to beat performance-wise: the run-time impact is a single atomic increment of an integer at each instrumented point in your program.