How does one go about understanding GNU source code?


GNU programs are big and complicated. The size of GNU Hello World shows that even the simplest GNU project needs a lot of code and configuration around it.

The autotools are hard to understand for a beginner, but you don't need to understand them to read the code. Even if you modify the code, most of the time you can simply run make to compile your changes.

To read code, you need a good editor (Vim, Emacs) or IDE (Eclipse) and some tools to navigate through the source. The tar project contains a src directory, which is a good place to start. A program always starts with the main function, so run

grep main *.c

or use your IDE to search for this function. It is in tar.c. Now skip all the initialization code until you reach the comment

/* Main command execution.  */

There you see a switch on the subcommand: if you pass -x it does one thing, if you pass -c it does another, and so on. This is the branching structure for those commands. If you want to know what these uppercase names (such as EXTRACT_SUBCOMMAND) are, run

grep EXTRACT_SUBCOMMAND *.h

There you can see that they are listed in common.h.
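
To make that branching structure concrete, here is a minimal sketch of the same pattern: a list of subcommand names plus a switch that dispatches on them. It is not tar's actual code. The enumeration is shortened, I am assuming the names are enumeration constants, and subcommand_from_option and run_subcommand are hypothetical helpers invented for this illustration.

```c
#include <stdio.h>

/* Shortened, hypothetical list of subcommands for this sketch.  */
enum subcommand
{
  UNKNOWN_SUBCOMMAND,
  CREATE_SUBCOMMAND,    /* selected by -c */
  EXTRACT_SUBCOMMAND,   /* selected by -x */
  LIST_SUBCOMMAND       /* selected by -t */
};

/* Hypothetical helper: map one option letter to a subcommand.  */
enum subcommand
subcommand_from_option (char option)
{
  switch (option)
    {
    case 'c': return CREATE_SUBCOMMAND;
    case 'x': return EXTRACT_SUBCOMMAND;
    case 't': return LIST_SUBCOMMAND;
    default:  return UNKNOWN_SUBCOMMAND;
    }
}

/* Hypothetical helper: branch on the subcommand the way the switch
   in main does.  It only returns a description instead of doing
   any real work.  */
const char *
run_subcommand (enum subcommand sub)
{
  switch (sub)
    {
    case CREATE_SUBCOMMAND:  return "create archive";
    case EXTRACT_SUBCOMMAND: return "extract archive";
    case LIST_SUBCOMMAND:    return "list archive";
    default:                 return "unknown subcommand";
    }
}
```

Calling run_subcommand (subcommand_from_option ('x')) lands in the EXTRACT_SUBCOMMAND case, which is the branch you want to follow next in the real source.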

Below EXTRACT_SUBCOMMAND you see something funny:

read_and (extract_archive);

The definition of read_and() (again obtained with grep):

read_and (void (*do_something) (void))

The single parameter is a function pointer, used like a callback, so read_and presumably reads something and then calls the function it was given, here extract_archive. Again, grep for it and you will see this:

  if (prepare_to_extract (current_stat_info.file_name, typeflag, &fun))
    {
      if (fun && (*fun) (current_stat_info.file_name, typeflag)
          && backup_option)
        undo_last_backup ();
    }
  else
    skip_member ();

Note that the real work happens when calling fun. fun is again a function pointer, which is set in prepare_to_extract. fun may point to extract_file, which does the actual writing.
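
If the two layers of function pointers are confusing, a small self-contained sketch may help. It mimics the shape of what is described above: a read_and-style loop that takes a callback, and a callback that asks a prepare-style function to pick a handler through an out-parameter. Everything ending in _sketch, the extractor_t type, the fake archive and the counters are hypothetical names made up for this example and do not come from tar. The only borrowed facts are the type flags '0' (regular file) and '5' (directory) from the ustar format.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler type: gets a member name and its type flag,
   returns 0 on success.  */
typedef int (*extractor_t) (const char *file_name, int typeflag);

/* Hypothetical shared state, standing in for current_stat_info.  */
struct member { const char *file_name; int typeflag; };
const struct member archive_sketch[] = {
  { "dir", '5' }, { "dir/a.txt", '0' }, { "dir/odd", '?' }
};
const struct member *current_member;
int files_written, dirs_made, members_skipped;

/* Stand-in handlers: they only count how often they were called.  */
int
extract_file_sketch (const char *file_name, int typeflag)
{
  (void) file_name; (void) typeflag;
  files_written++;
  return 0;
}

int
extract_dir_sketch (const char *file_name, int typeflag)
{
  (void) file_name; (void) typeflag;
  dirs_made++;
  return 0;
}

/* Pick a handler through the out-parameter FUN.
   Returning false means "skip this member".  */
bool
prepare_sketch (int typeflag, extractor_t *fun)
{
  switch (typeflag)
    {
    case '0': *fun = extract_file_sketch; return true;
    case '5': *fun = extract_dir_sketch;  return true;
    default:  *fun = NULL;                return false;
    }
}

/* Callback with the same void (void) shape as extract_archive.
   It takes no arguments, so it reads its input from shared state.  */
void
extract_member_sketch (void)
{
  extractor_t fun;
  if (prepare_sketch (current_member->typeflag, &fun))
    (*fun) (current_member->file_name, current_member->typeflag);
  else
    members_skipped++;
}

/* Same shape as read_and: visit every member, then call the callback.  */
void
read_and_sketch (void (*do_something) (void))
{
  size_t i;
  for (i = 0; i < sizeof archive_sketch / sizeof archive_sketch[0]; i++)
    {
      current_member = &archive_sketch[i];
      (*do_something) ();
    }
}
```

Calling read_and_sketch (extract_member_sketch) walks the three fake members and ends with one file written, one directory made and one member skipped. The point of the design is that the loop knows nothing about extracting, so the same loop could be reused with a different callback for another subcommand.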

I hope this walk-through has shown you how I navigate through source code. Feel free to contact me if you have related questions.


The problem with programs like tar and sed is twofold (this is just my opinion, of course!). First of all, they're both really old. That means they've had multiple people maintain them over the years, with different coding styles and different personalities. For GNU utilities it's usually not too bad, because they enforce a reasonably consistent coding style, but it's still an issue. The other problem is that they're unbelievably portable. Usually "portability" is seen as a good thing, but when taken to extremes, it means your codebase ends up full of little hacks and tricks to work around obscure bugs and corner cases in particular pieces of hardware and systems. And for programs as widely ported as tar and sed, that means there are a lot of corner cases and obscure hardware/compilers/OSes to take into account.
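
To give an idea of what such a portability hack looks like, here is a small made-up example in the usual style: a preprocessor conditional that picks a different directory separator depending on the system. DIR_SEPARATOR and last_component_sketch are hypothetical names for this illustration and are not taken from tar or sed. The real code bases have many more conditionals of this kind.

```c
#include <stddef.h>

/* The directory separator differs between systems, so it is chosen
   at compile time.  _WIN32 and __MSDOS__ are macros predefined by
   compilers that target those systems.  */
#if defined _WIN32 || defined __MSDOS__
# define DIR_SEPARATOR '\\'
#else
# define DIR_SEPARATOR '/'
#endif

/* Hypothetical helper: return a pointer to the last component of
   PATH, i.e. the part after the final separator.  */
const char *
last_component_sketch (const char *path)
{
  const char *base = path;
  const char *p;

  for (p = path; *p != '\0'; p++)
    if (*p == DIR_SEPARATOR)
      base = p + 1;
  return base;
}
```

One conditional like this is easy to read. The difficulty comes from having dozens of them stacked on top of each other, each one there because some system once needed it.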

If you want to learn C, then I would say the best place to start is not trying to study code that others have written. Rather, try to write code yourself. If you really want to start with an existing codebase, choose one that's being actively maintained where you can see the changes that other people are making as they make them, follow along in the discussions on the mailing lists and so on.

With well-established programs like tar and sed, you see the result of the discussions that would've happened, but you can't see how software design decisions and changes are being made in real-time. That can only happen with actively-maintained software.

That's just my opinion of course, and you can take it with a grain of salt if you like :)


Why not download the source of the coreutils (http://ftp.gnu.org/gnu/coreutils/) and take a look at tools like yes? Less than 100 lines of C code and a fully functional, useful and really basic piece of GNU software.
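
To show how little logic is at the heart of such a tool, here is a rough sketch of the idea behind yes: write the same line over and over. This is not the coreutils source. repeat_line_sketch is a hypothetical function, and the count parameter is an assumption added so the sketch terminates, whereas the real program keeps going until it is killed or its output is closed.

```c
#include <stdio.h>

/* Write LINE followed by a newline to OUT, COUNT times.
   Returns 0 on success and -1 as soon as a write fails.  */
int
repeat_line_sketch (FILE *out, const char *line, unsigned long count)
{
  unsigned long i;

  for (i = 0; i < count; i++)
    if (fputs (line, out) == EOF || fputc ('\n', out) == EOF)
      return -1;
  return 0;
}
```

Calling repeat_line_sketch (stdout, "y", 3) prints three lines that each contain y. Reading the real yes.c next to a sketch like this shows you what the rest of the file is for: option parsing, error handling and the GNU conventions around them.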