
Get start and end of process segments C/C++


I'm not sure that the data or the heap segment is well defined and unique (in particular in multi-threaded applications, or simply in applications using dynamic libraries, including libc.so). In other words, there is no longer a single well-defined start and end of the text, data, or heap segment, because today a process has many such segments. So your question doesn't really make sense in the general case.

Most malloc implementations use mmap(2) and munmap(2) much more than sbrk(2).

You should read more about proc(5). In particular, your application could read /proc/self/maps (or /proc/1234/maps for the process with pid 1234) or /proc/self/smaps; try cat /proc/self/maps and consider using fopen(3) on "/proc/self/maps" (then a loop on fgets(3) or getline(3), and finally, and quickly, fclose(3)). Perhaps dladdr(3) might be relevant.

You could also read the ELF headers of your program, e.g. of /proc/self/exe. See also readelf(1), objdump(1), execve(2), elf(5), ld.so(8), and libelf. Read also Levine's Linkers & Loaders book and Drepper's paper: How To Write Shared Libraries.

See also this answer to a related question (and also that question). Notice that recent Linux systems have ASLR, so the address layout of two similar processes running the same program in the same environment would be different.

Try also to strace(1) some simple command or your program. You'll understand the relevant syscalls(2) a bit better. Read also Advanced Linux Programming.


See man 3 end for some help:

    #include <stdio.h>

    /* Defined by the linker; only their addresses are meaningful. */
    extern char etext, edata, end;

    int
    main(int ac, char **av, char **env)
    {
        /* %p expects a void *, hence the casts. */
        printf("main  %p\n", (void *)main);
        printf("etext %p\n", (void *)&etext);
        printf("edata %p\n", (void *)&edata);
        printf("end   %p\n", (void *)&end);
        return 0;
    }

The addresses of those 3 symbols are the first addresses past the end of the text, initialized data, and uninitialized data (BSS) segments, respectively.

You can get at the environment variables via a 3rd parameter to main() as in the example code above, but you can also walk up the stack starting at the address of the first argument pointer, &argv[0] (&av[0] in the example). There's a NULL word (32 or 64 bits wide depending on the CPU) after the last pointer to a command line argument string. After that NULL lies the array of environment pointers.

The top of the stack is nearly impossible to get programmatically: modern OSes all do "Address Space Layout Randomization" (ASLR) to provide some mitigation of buffer overflows. The "end" of the stack is hazy, as you can allocate on the stack (via recursion or alloca()) until you hit the stack size limit or run into the top of the heap. So the "end" of the stack depends on the allocation patterns of the program in question.

You should also be aware of the ELF auxiliary vector. See man getauxval for a C language interface, and this article for some explanation. User programs rarely have a direct use for the ELF auxiliary vector, but it's intimately tied up with dynamic linking.


As said in another comment, the notion of a single text, data, and stack segment does not really exist on Linux today. Program text is spread over shared libraries, and memory allocation is mostly done with mmap() instead of brk(), so allocated data ends up spread all over the address space of a program.

That said, you can call sbrk(0), a library wrapper around the brk() system call, to find the current end of the data segment, and you can use the symbols etext, edata, and end to find the boundaries of the executable. The beginning of the text segment is traditionally fixed (also called the “loading address”) and depends on the architecture and linker configuration, although position-independent executables under ASLR are loaded at a randomized base. Notice that your program will most likely execute code outside the text section of your binary and will most likely not allocate any dynamic memory with brk.

See the corresponding man pages for more details.