How to find the main function's entry point in an ELF executable without any symbol information?
Locating main() in a stripped Linux ELF binary is straightforward. No symbol information is required.
The prototype for __libc_start_main is:

    int __libc_start_main(int (*main) (int, char**, char**), int argc, char *__unbounded *__unbounded ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (*__unbounded stack_end));
The runtime memory address of main() is the argument corresponding to the first parameter, int (*main) (int, char**, char**). With the 32-bit x86 calling convention, this means that the last value pushed onto the runtime stack before the call to __libc_start_main is the memory address of main(), since arguments are pushed onto the stack in the reverse order of their corresponding parameters in the function definition.
One can enter main() in gdb in 4 steps:
- Find the program entry point
- Find where __libc_start_main is called
- Set a breakpoint at the address last saved on the stack prior to the call to __libc_start_main
- Let program execution continue until the breakpoint for main() is hit
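The process is the same for both 32-bit and 64-bit ELF binaries, with one difference: on x86-64 the first argument is passed in the %rdi register rather than on the stack, so the address of main() is the value loaded into %rdi just before the call.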
Entering main() in an example stripped 32-bit ELF binary called "test_32":
    $ gdb -q -nh test_32
    Reading symbols from test_32...(no debugging symbols found)...done.
    (gdb) info file                          # step 1
    Symbols from "/home/c/test_32".
    Local exec file:
    `/home/c/test_32', file type elf32-i386.
    Entry point: 0x8048310
    < output snipped >
    (gdb) break *0x8048310
    Breakpoint 1 at 0x8048310
    (gdb) run
    Starting program: /home/c/test_32

    Breakpoint 1, 0x08048310 in ?? ()
    (gdb) x/13i $eip                         # step 2
    => 0x8048310: xor %ebp,%ebp
       0x8048312: pop %esi
       0x8048313: mov %esp,%ecx
       0x8048315: and $0xfffffff0,%esp
       0x8048318: push %eax
       0x8048319: push %esp
       0x804831a: push %edx
       0x804831b: push $0x80484a0
       0x8048320: push $0x8048440
       0x8048325: push %ecx
       0x8048326: push %esi
       0x8048327: push $0x804840b            # address of main()
       0x804832c: call 0x80482f0 <__libc_start_main@plt>
    (gdb) break *0x804840b                   # step 3
    Breakpoint 2 at 0x804840b
    (gdb) continue                           # step 4
    Continuing.

    Breakpoint 2, 0x0804840b in ?? ()        # now in main()
    (gdb) x/x $esp+4
    0xffffd110: 0x00000001                   # argc = 1
    (gdb) x/s **(char ***) ($esp+8)
    0xffffd35c: "/home/c/test_32"            # argv[0]
    (gdb)
Entering main() in an example stripped 64-bit ELF binary called "test_64":
    $ gdb -q -nh test_64
    Reading symbols from test_64...(no debugging symbols found)...done.
    (gdb) info file                          # step 1
    Symbols from "/home/c/test_64".
    Local exec file:
    `/home/c/test_64', file type elf64-x86-64.
    Entry point: 0x400430
    < output snipped >
    (gdb) break *0x400430
    Breakpoint 1 at 0x400430
    (gdb) run
    Starting program: /home/c/test_64

    Breakpoint 1, 0x0000000000400430 in ?? ()
    (gdb) x/11i $rip                         # step 2
    => 0x400430: xor %ebp,%ebp
       0x400432: mov %rdx,%r9
       0x400435: pop %rsi
       0x400436: mov %rsp,%rdx
       0x400439: and $0xfffffffffffffff0,%rsp
       0x40043d: push %rax
       0x40043e: push %rsp
       0x40043f: mov $0x4005c0,%r8
       0x400446: mov $0x400550,%rcx
       0x40044d: mov $0x400526,%rdi          # address of main()
       0x400454: callq 0x400410 <__libc_start_main@plt>
    (gdb) break *0x400526                    # step 3
    Breakpoint 2 at 0x400526
    (gdb) continue                           # step 4
    Continuing.

    Breakpoint 2, 0x0000000000400526 in ?? () # now in main()
    (gdb) print $rdi
    $3 = 1                                   # argc = 1
    (gdb) x/s **(char ***) ($rsp+16)
    0x7fffffffe35c: "/home/c/test_64"        # argv[0]
    (gdb)
A detailed treatment of program initialization, what occurs before main() is called, and how to get to main() can be found in Patrick Horgan's tutorial "Linux x86 Program Start Up or - How the heck do we get to main()?"
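The manual reading of the start-up code in step 2 can be approximated with a byte-pattern scan. What follows is a minimal sketch, not a disassembler: it assumes glibc-style start-up code like that in the two sessions above, and the function names find_main_i386 and find_main_x86_64 are made up for this illustration. The lea form (position-independent executables) and the indirect call opcode are assumptions that do not appear in the sessions above.

```python
import struct
from typing import Optional


def find_main_i386(code: bytes) -> Optional[int]:
    """Return the immediate of a 'push imm32' that is directly followed
    by 'call rel32' (bytes: 68 imm32 e8), or None if no match is found."""
    for off in range(len(code) - 5):
        if code[off] == 0x68 and code[off + 5] == 0xE8:
            return struct.unpack_from("<I", code, off + 1)[0]
    return None


def find_main_x86_64(code: bytes, base: int) -> Optional[int]:
    """Return the address loaded into %rdi right before a call, or None.

    code is the raw bytes starting at the entry point and base is the
    address of code[0]. Two encodings are recognised:
      48 c7 c7 imm32    mov $imm32,%rdi         (absolute address)
      48 8d 3d disp32   lea disp32(%rip),%rdi   (assumed PIE form)
    """
    for off in range(len(code) - 7):
        # Both instructions are 7 bytes long; require a call opcode
        # (e8 = direct, ff = indirect, an assumption) right after them.
        if code[off + 7] not in (0xE8, 0xFF):
            continue
        prefix = code[off:off + 3]
        value = struct.unpack_from("<i", code, off + 3)[0]
        if prefix == b"\x48\xc7\xc7":
            # The immediate is sign-extended to 64 bits.
            return value & 0xFFFFFFFFFFFFFFFF
        if prefix == b"\x48\x8d\x3d":
            # RIP-relative: displacement counts from the next instruction.
            return base + off + 7 + value
    return None


if __name__ == "__main__":
    # Last instructions of the 64-bit session above:
    # mov $0x400526,%rdi ; callq 0x400410
    tail = bytes.fromhex("48c7c726054000" "e8b7ffffff")
    print(hex(find_main_x86_64(tail, 0x40044D)))  # 0x400526
```

A hit from either function is only a candidate address. It should still be confirmed in gdb by setting a breakpoint on it, as in steps 3 and 4.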
If you have a heavily stripped binary, or even a packed one (for example with UPX), you can still debug it in gdb the hard way. First find the entry point:
    $ readelf -h echo | grep Entry
    Entry point address: 0x103120
And then you can break at it in GDB as:
    $ gdb mybinary
    (gdb) break * 0x103120
    Breakpoint 1 at 0x103120
    (gdb) r
    Starting program: mybinary

    Breakpoint 1, 0x0000000000103120 in ?? ()
and then, you can see the entry instructions:
    (gdb) x/10i 0x0000000000103120
    => 0x103120: bl 0x103394
       0x103124: dcbtst 0,r5
       0x103128: mflr r13
       0x10312c: cmplwi r7,2
       0x103130: bne 0x103214
       0x103134: stw r5,0(r6)
       0x103138: add r4,r4,r3
       0x10313c: lis r0,-32768
       0x103140: lis r9,-32768
       0x103144: addi r3,r3,-1
I hope it helps
As far as I know, once a program has been stripped, there is no straightforward way to locate the function that the symbol main would have otherwise referenced.
The value of the symbol main is not required for program start-up: in the ELF format, the start of the program is specified by the e_entry field of the ELF executable header. This field normally points to the C library's initialization code, and not directly to main.
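To illustrate where e_entry lives, here is a minimal sketch that reads it straight from the file header using only the Python standard library. The function names read_elf_entry and entry_point_of are made up for this example. It relies on the ELF header layout, in which e_entry follows e_ident (16 bytes), e_type (2 bytes), e_machine (2 bytes) and e_version (4 bytes), so it starts at offset 24 in both 32-bit and 64-bit files.

```python
import struct


def read_elf_entry(header: bytes) -> int:
    """Return e_entry given at least the first 32 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    width = {1: "I", 2: "Q"}.get(ei_class)
    endian = {1: "<", 2: ">"}.get(ei_data)
    if width is None or endian is None:
        raise ValueError("unknown ELF class or data encoding")
    # e_entry is 4 bytes wide in ELF32 and 8 bytes wide in ELF64,
    # but sits at offset 24 in both.
    return struct.unpack_from(endian + width, header, 24)[0]


def entry_point_of(path: str) -> int:
    """Read the header of the file at path and return its entry point."""
    with open(path, "rb") as f:
        return read_elf_entry(f.read(64))
```

Calling hex(entry_point_of("/bin/ls")), for instance, should report the same value as the "Entry point address" line of readelf -h for that file. For a position-independent executable this is an offset that still has to be added to the load address at run time.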
While the C library's initialization code does call main() after it has set up the C run time environment, this call is a normal function call that gets fully resolved at link time.
In some cases, implementation-specific heuristics (that is, specific knowledge of the internals of the C runtime) could be used to determine the location of main in a stripped executable. However, I am not aware of a portable way to do so.