Why unused objects in STATIC lib included in final binary when SHARED lib reference them?
Here is a much simplified illustration of the linker behaviour that is puzzling you:
main.c
extern void foo(void);

int main(void)
{
    foo();
    return 0;
}
foo.c
#include <stdio.h>

void foo(void)
{
    puts(__func__);
}
bar.c
#include <stdio.h>

extern void do_bar(void);

void bar(void)
{
    do_bar();
}
do_bar.c
#include <stdio.h>

void do_bar(void)
{
    puts(__func__);
}
Let's compile all those source files to object files:
$ gcc -Wall -c main.c foo.c bar.c do_bar.c
Now we'll try to link a program, like so:
$ gcc -o prog main.o foo.o bar.o
bar.o: In function `bar':
bar.c:(.text+0x5): undefined reference to `do_bar'
The undefined function do_bar is referenced only in the definition of bar, and bar is not referenced in the program at all. Why then the linkage failure?
Quite simply, this linkage failed because we told the linker to link bar.o into the program; so it did; and bar.o contains the definition of bar, which references do_bar, which is not defined in the linkage. bar is not referenced, but do_bar is - by bar, which is linked in the program.
By default, the linker demands that any symbol that is referenced in the linkage of a program is defined in the linkage. If we compel it to link the definition of bar, then it will demand a definition of do_bar, because without a definition of do_bar it hasn't actually got a definition of bar. If it links a definition of bar, it does not question whether we need to link it, and then permit undefined references to do_bar if the answer is No.
The linkage failure is of course fixable with:
$ gcc -o prog main.o foo.o bar.o do_bar.o
$ ./prog
foo
Now in this illustration, linking bar.o in the program is simply gratuitous. We can also link successfully just by not telling the linker to link bar.o.
$ gcc -o prog main.o foo.o
$ ./prog
foo
bar.o and do_bar.o are both superfluous for executing main, but the program can only be linked with both, or with neither.
But suppose foo and bar were defined in the same file?
They might be defined in the same object file, foobar.o:
$ ld -r -o foobar.o foo.o bar.o
And then:
$ gcc -o prog main.o foobar.o
foobar.o: In function `bar':
(.text+0x18): undefined reference to `do_bar'
collect2: error: ld returned 1 exit status
Now, the linker cannot link the definition of foo without also linking the definition of bar. So once again, we have to link a definition of do_bar:
$ gcc -o prog main.o foobar.o do_bar.o
$ ./prog
foo
Linked like this, prog contains definitions of foo, bar and do_bar:
$ nm prog | grep -e foo -e bar
000000000000065d T bar
0000000000000669 T do_bar
000000000000064a T foo
(T = defined function symbol).
Equally, foo and bar might be defined in the same shared library:
$ gcc -Wall -fPIC -c foo.c bar.c
$ gcc -shared -o libfoobar.so foo.o bar.o
and then this linkage:
$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
./libfoobar.so: undefined reference to `do_bar'
collect2: error: ld returned 1 exit status
fails just as before, and is fixable in the same way:
$ gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath=$(pwd)
$ ./prog
foo
When we link the shared library libfoobar.so rather than the object file foobar.o, our prog has a different symbol table:
$ nm prog | grep -e foo -e bar
00000000000007aa T do_bar
                 U foo
This time, prog does not contain definitions of either foo or bar. It contains an undefined reference (U) to foo, because it calls foo, and of course that reference will now be satisfied, at runtime, by the definition in libfoobar.so. There's not even an undefined reference to bar, nor should there be, since the program never calls bar.
But still, prog contains the definition of do_bar, which is now unreferenced from all functions in the symbol table.
This echoes your own SSCCE, but in a less convoluted way. In your case:
- The object file libsub.a(shared2.o) is linked into the program to provide definitions for func2a and func2b.
- Those definitions must be found and linked because they are referenced, respectively, in the definitions of Client_func2a and Client_func2b, which are defined in libcshared.so.
- libcshared.so must be linked to provide a definition of Client_func1a.
- A definition of Client_func1a must be found and linked because it is referenced from the definition of func1a.
- And func1a is called by main.
That's why we see:
$ nm main | grep func2
                 U Client_func2a
                 U Client_func2b
00000000004009f7 T func2a
0000000000400a30 T func2b
in the symbol table of your program.
It is not at all unusual for definitions to be linked into a program for functions that it does not call. It usually happens in the way we've seen: the linkage, recursively resolving symbol references starting with main, discovers that it needs a definition of f, which it can only get by linking some object file file.o, and with file.o it also links a definition of function g, which is never called.
What is rather odd is to end up with a program like your main and like my last version of prog, which contains a definition of an uncalled function (e.g. do_bar) that is linked to resolve references from the definition of another uncalled function (e.g. bar) that is not defined in the program. Even if there are redundant function definitions, usually we can chain them back to one or more object files in the linkage where the first redundant definitions are pulled in along with some necessary definitions.
This oddity is caused, in a case like:
$ gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath=$(pwd)
because the first redundant function definition that must be linked (bar) is provided by linking a shared library, libfoobar.so, while the definition of do_bar that is demanded by bar is not in that shared library, or any other shared library, but in an object file.
The definition of bar that's provided by libfoobar.so will stay there when the program is linked with that shared library. It won't be physically linked into the program. That's the nature of dynamic linkage. But any object file required by the linkage - whether it's a free-standing object file like do_bar.o or one that the linker extracts from an archive like libsub.a(shared2.o) - can only be linked physically into the program. So the redundant do_bar appears in the symbol table of prog. But the redundant bar, which explains why do_bar is there, isn't there. It is in the symbol table of libfoobar.so.
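To see both halves of that picture side by side, one can compare the dynamic symbol table of the library with the symbol table of the program. This is a sketch under the assumption of a GNU/Linux toolchain (gcc, GNU nm), rebuilding the example in a scratch directory:

```shell
work=$(mktemp -d) && cd "$work"

cat > main.c <<'EOF'
extern void foo(void);
int main(void) { foo(); return 0; }
EOF
cat > foo.c <<'EOF'
#include <stdio.h>
void foo(void) { puts(__func__); }
EOF
cat > bar.c <<'EOF'
extern void do_bar(void);
void bar(void) { do_bar(); }
EOF
cat > do_bar.c <<'EOF'
#include <stdio.h>
void do_bar(void) { puts(__func__); }
EOF

gcc -Wall -c main.c do_bar.c
gcc -Wall -fPIC -c foo.c bar.c
gcc -shared -o libfoobar.so foo.o bar.o
gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath="$PWD"

# The library defines foo and bar and still needs do_bar.
lib_syms=$(nm -D libfoobar.so | grep -e foo -e bar)
# The program defines do_bar and needs foo; bar does not appear at all.
prog_syms=$(nm prog | grep -e foo -e bar)
printf 'libfoobar.so:\n%s\nprog:\n%s\n' "$lib_syms" "$prog_syms"
```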
When you discover dead code in your program, you might like the linker to be smarter. Usually, it can be smarter, at the cost of some extra effort. You need to ask it to garbage-collect sections, and before that, you need to ask the compiler to prepare the way by generating data-sections and function-sections in the object files. See "How to remove unused C/C++ symbols with GCC and ld?" and its answer.
But this way of pruning dead code will not work in the unusual case where the dead code is linked in the program to satisfy redundant references from a shared library required by the linkage. The linker can only recursively garbage-collect unused sections from the ones that it outputs into the program, and it only outputs sections that are input from object files, not from shared libraries that are to be dynamically linked.
The right way to avoid the dead code in your main and my prog is not to do that peculiar kind of linkage in which a shared library will contain undefined references that the program does not call but that have to be resolved by linking dead object code into your program.
Instead, when you build a shared library, either don't leave any undefined references in it, or else leave only undefined references that shall be satisfied by its own dynamic dependencies.
So, the proper way to build my libfoobar.so is:
$ gcc -shared -o libfoobar.so foo.o bar.o do_bar.o
This gives me a shared library that has an API of:
void foo(void);
void bar(void);
for whoever wants either or both of them, and no undefined references. Then I build my program that is a client just of foo:
$ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
$ ./prog
foo
And it contains no dead code:
$ nm prog | grep -e foo -e bar
                 U foo
Similarly, if you build your libcshared.so without undefined references, like:
$ gcc -c -fPIC shared2.c shared1.c
$ ar -crs libsub.a shared1.o shared2.o
$ gcc -shared -o libcshared.so cshared1.o cshared2.o -L. -lsub
and then link your program:
$ gcc -o main main.c libcmain.so libcshared.so
it too will have no dead code:
$ nm main | grep func
                 U func1a
If you dislike the fact that libsub.a(shared1.o) and libsub.a(shared2.o) become physically linked into libcshared.so by this solution, then take the other orthodox approach to linking a shared library: leave all the func* functions undefined in libcshared.so, and make libsub also a shared library, which then is a dynamic dependency of libcshared.so.
If you're just looking to get rid of unused functions, you may not need to use a shared library. For GCC, try this. For XL, replace -fdata-sections -ffunction-sections with -qfuncsect. An important related topic is the use of export/import lists and visibility options. These control whether extra symbols linked into your library are exported outside your library or not. See here for more information.
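As one illustration of the visibility options mentioned above, here is a GCC-specific sketch (the XL equivalents are not shown, and foobar.c is a hypothetical file made up for this example). It hides everything by default with -fvisibility=hidden and marks only foo for export:

```shell
work=$(mktemp -d) && cd "$work"

cat > foobar.c <<'EOF'
#include <stdio.h>
/* Only foo is exported; bar is hidden by -fvisibility=hidden. */
__attribute__((visibility("default"))) void foo(void) { puts(__func__); }
void bar(void) { puts(__func__); }
EOF

gcc -Wall -fPIC -fvisibility=hidden -c foobar.c
gcc -shared -o libfoobar.so foobar.o

# List the symbols that the library exports to its clients.
exported=$(nm -D --defined-only libfoobar.so | grep -e foo -e bar)
printf '%s\n' "$exported"
```

bar is still present inside the library, but clients can no longer link against it.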