How to proceed to make a software which tracks the function calls of another software? How to proceed to make a software which tracks the function calls of another software? linux linux

How to proceed to make a software which tracks the function calls of another software?


You can do this before runtime by static analysis, or at runtime by dynamic analysis.

You have only two ways to do this at runtime, and they both amount to instrumenting code:

Instrument the source code

Modify ("instrument") the source code so it keeps track of what you want, e.g., modifyevery call site to pass an additional argument effectively containing the calling function name, and every function entry to record the entry to that function (e.g., effectively the called function name) and the calling function name. The instrumentation process needs to build a cross-reference from points that are instrumented, back to source file and line to enable later reporting.

You can do this in a completely automated way by using a Program Transformation System (PTS), which can parse source code to make ASTs, modify the ASTs, and then regenerate source. Often a PTS will let you write the changes as source-level patterns of the form, "if you see this, replace it by that". (No, regexes can't do this).

An extended example of how do this can be found in my technical paper, Branch Coverage for Arbitrary Languages. The paper shows discusses what is in a PTS, and then shows typical instrumentation transformations.

Getting this done "right" for a language like C can be difficult, because you need as a foundation, a full C parser, which is a major technical feat in its own right. Best to get such a beast in completed form (some of the PTS have this, consider Concinnelle or DMS) or you'll never get around to doing the instrumentation part. (Another answer suggests that GCC has much of this instrumentation functionality built-in).

Done reasonably, the execution cost of the additional instrumentation is pretty modest (10-30%). This concept has been used to implement test coverage tools and timing profilers, which operate by tracking the dynamic set of calls, which seems to be exactly what you want.

Instrument the object code

Modify the execution engine that runs the code to keep track of the details you want. Given that you want to do this with C programs, the execution engine is essentially the underlying instruction set.

Since you can't reasonably modify the processor chip itself, you'll have to either:

  • modify the object code (using object-code equivalents of program transformations, see preceding paragraph for concepts, or see tools like PIN and Valgrind for details of implementation),

  • or write some kind of clever instruction set interpreter that collects the information it needs as it encounters call-instructions (this is likely to be rather slow, because of its interpretive nature).

Implementing these for real instruction sets is complicated by the instruction sets themselves (Intel's X86 instruction set is huge), object file formats, etc. Don't expect this path to be easier than the source instrumentation path; just expect it to have a different set of problems.

One standard problem is knowing what to instrument; the object code space is full of code (libraries, system routines, ...) that aren't part of the application of interest, and thus aren't interesting to collect test coverage data for. For this, you need a symbol table ( pairs) to tell what is application code, and what is not. You also need to be able to trace from the name to a point in the original source file(s),in order to report the results.

One other difficulty with object code instrumentation is that it is difficult to tell if a function foo, which has been inlined, has been executed ("covered"). The source instrumentation scheme does not have this problem, because foo gets instrumented before it gets compiled/inlined, so the instrumentation gets inlined too.

Threads

Handling threads is an orthogonal issue. At each place where you record a fact X-calls-Y, you simply attach the thread ID (or its equivalent) T, obtained from the OS, to produce X-calls-Y-in-T. It isn't clear from your question why you want to do this.

Static Analysis

You could decide to skip all the hassles of runtime implementation, and build/get a static analyzer that can produce a call graph. Getting of these for C might be possible using Clang or DMS. Building it yourself is likely difficult; now you need a full C parser, complete data flow and points-to analysis, and call-graph construction. The details won't fit in this paragraph.


This is very much platform dependent. What you need is to do more or less what a debugger does.

The way I'd approach this (I actually implemented all this once when I had to debug tools needed to build gdb on an new architecture):

Read the symbol table of the program you want to debug. You said Linux in a comment, so to start you need a library that reads ELF files or read the ELF specification and implement something yourself.

Use the ptrace syscall to create a breakpoint at the function main. If you're lucky, your system has a ptrace facility to create breakpoints and keeps the bookkeeping in the kernel. If it doesn't, you need to figure out the breakpoint instruction for your cpu architecture and implement breakpoints yourself.

Next, you need to figure out the debugging hooks in your dynamic loader so that you know when shared libraries are loaded. You'll need the symbol tables of the libraries as well.

Now that you have all the symbols (admittedly this is after your program has run for a while, since it had to load the dynamic libraries, but we'll pretend that the program starts at main) create breakpoints at all functions you've gotten from the symbol table.

Let the program run. Every time you hit a breakpoint, reverse look up the instruction pointer in the symbol table. Save the name of the function to a file or wherever you want to save it. You can now track function calls.

Or just use a debugger. gdb can probably be scripted to do something like this.


I apologize that these are not open source tools, but I have used them for research in the past and they enable the functionality you require, so you might get some hints from playing with them.

On Linux, try taking a look at how Pin works.

On Windows, look at Detours.