Memory Leak Detectors Working Principle Memory Leak Detectors Working Principle c c

Memory Leak Detectors Working Principle


There are a couple of different ways that leak detectors work. You can replace the implementation of malloc and free with ones that can track more information during allocation and are not concerned with performance. This is similar to how dmalloc works. In general, any address that is malloc'ed but not free'd is leaked.

The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed. This is even harder when there are multiple pointers to the same address.

In practice, you'll probably want more than just the single line number, but rather a stack trace for the lost allocations.

Another approach is how valgrind works which implements an entire virtual machine to keep track of addresses and memory references and associated bookkeeping. The valgrind approach is much more expensive, but also much more effective as it can also tell you about other types of memory errors like out of bounds reads or writes.

Valgrind essentially instruments the underlying instructions and can track when a given memory address has no more references. It can do this by tracking assignments of addresses, and so it can tell you not just that a piece of memory was lost, but exactly when it became lost.

C++ makes things a little harder for both types of leak detectors because it adds the new and delete operators. Technically new can be a completely different source of memory than malloc. However, in practice many real C++ implementations just use malloc to implement new or have an option to use malloc instead of the alternate approach.

Also higher level languages like C++ tend to have alternative higher level ways of allocating memory like std::vector or std::list. A basic leak detector would report the potentially many allocations made by the higher level modes separately. That's much less useful than saying the entire container was lost.


Here's a published technical paper on how our CheckPointer tool works.

Fundamentally it tracks the lifetimes of all values (heap and stack), and their sizes according their types as defined by the language. This allows CheckPointer to find not only leaks, but out-of-array bound accesses, even for arrays in the stack, which valgrind won't do.

In particular, it analyzes the source code to find all pointer uses.(This is quite the task just by itself).

It keeps track of pointer meta data for each pointer, consisting of

  • A reference to the object meta data for the heap-allocated object or global or local variable orfunction pointed to by the pointer and
  • The address range of the (sub)object of the object that the pointer may currently access. This may be smaller than the address range of thewhole object; e.g. if you take the address of a struct member, the instrumented source code will only allow access to that member when using the resulting pointer.

It also tracks the kind and location of each object, i.e. whetherit is a function, a global, thread-local or local variable, heap-allocated memory, or a string literal constant:

  • The address range of the object that may be safely accessed, and
  • For each pointer stored in the heap-allocated object or variable, a reference to the pointer metadata for that pointer.

All this tracking is accomplished by transforming the original program source, into a program which does what the original program does, and interleaves various meta-data checking or updating routines. The resulting program is compiled and run. Where a meta-data check fails at runtime, a backtrace is provided with a report of the type of failure (invalid pointer, pointer outside valid bounds, ...)


This is tagged C and C++ and no operating system is mentioned. This answer is for Windows.

C

Windows has the concept of virtual memory. Any memory a process can get is virtual memory. This is done through VirtualAlloc() [MSDN]. You can imagine the leak detector to put a breakpoint on that function and whenever it is called, it gets the callstack and saves it somewhere. Then it can do similar for VirtualFree()[MSDN].

The difference can then be identified and shown along with the callstacks that have been saved.

C++

C++ has a different concept: it takes the large 64kb blocks which it gets from VirtualAlloc() and splits it into smaller pieces, called the Heap. The C++ heap manager comes from Microsoft and offers new methods HeapAlloc() [MSDN] and HeapFree()[MSDN].

Then, you could do the same as before, but actually, that feature is already built-in. Microsoft's GFlags [MSDN] tool can enable the tracking:

Screenshot: GFlags enabled for Notepad

In this case it will save up to 50 MB of callstack information for C++ heap manager calls.

Since that settings can also be enabled via the Windows Registry, a memory leak detector can make use of it easily.

General concept

As you can see, the general concept is to keep track of allocations and deallocations, compare them and show the callstacks of the difference.