How to remove strings from a compiled binary (.so)


These strings are the names of entries in the dynamic symbol table, which is used when the library is loaded at runtime. The names themselves are stored in the .dynstr section; readelf -p .dynstr mylib.so will show these entries.

strip -g will remove debugging symbols, but it can't remove entries from the dynamic symbol table, as these may be needed at runtime. Your problem is that you have entries in the dynamic symbol table for functions which are never going to be called from outside your library. Unless you tell it, the compiler/linker has no way of knowing which functions form part of the external API (and therefore need entries in the dynamic symbol table) and which functions are private to your library (and so don't need entries in the dynamic symbol table), so it just creates dynamic symbol table entries for all non-static functions.

There are two main ways you can inform the compiler which functions are private.

  1. Mark the private functions static. Obviously, this only works for functions only needed within a single compilation unit, though for some libraries this technique might be sufficient.

  2. Use the gcc "visibility" attribute to mark the functions as visible or hidden. You have two options: either mark all the private functions as hidden, or change the default visibility to hidden using the -fvisibility=hidden compiler option and mark all the public functions as visible. The latter is probably the best option for you, as it means that you don't have to worry about accidentally adding a function and forgetting to mark it as hidden.

If you have a function:

int foo(int a, int b);

then the syntax for marking it hidden is:

int foo(int a, int b) __attribute__((visibility("hidden")));

and the syntax for marking it visible is:

int foo(int a, int b) __attribute__((visibility("default")));
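As a rough sketch of how the two approaches fit together in one source file: the example below assumes the library is built with -fvisibility=hidden and exports a single function. The macro MYLIB_API and all function names are hypothetical, chosen only for illustration.

```c
/* mylib.c: could be built with, for example:
 *   gcc -shared -fPIC -fvisibility=hidden -o mylib.so mylib.c
 * MYLIB_API and every function name below are hypothetical. */

/* Marks a function as part of the exported API. */
#define MYLIB_API __attribute__((visibility("default")))

/* static: internal linkage, so this function is never entered in the
 * dynamic symbol table. Only usable within this compilation unit. */
static int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Non-static helper that other source files of the library may call.
 * The explicit attribute keeps it hidden even without -fvisibility=hidden. */
__attribute__((visibility("hidden")))
int mylib_internal_scale(int v, int factor)
{
    return v * factor;
}

/* Public entry point: the only function in this file that is exported
 * through the dynamic symbol table. */
MYLIB_API int mylib_scale_percent(int v, int factor)
{
    return clamp(mylib_internal_scale(v, factor), 0, 100);
}
```

After building, readelf -p .dynstr mylib.so should list mylib_scale_percent but neither of the two helpers.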

For further details, see this document, which is an excellent source of information on this subject.


There are some commercial obfuscators which accomplish this. Basically, they rewrite all of the symbols on the fly. Something like this:

void foo()

becomes

void EEhj_y33() // usually much, much longer and clobbered

Variable names are also given the same treatment, as are members of structures / unions (depending on what level of obfuscation you set).

Most of them work by scanning your code base, establishing a dictionary, and then substituting garbled messes for symbol names in the output, which can then be compiled as usual.
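To make the substitution step concrete, here is a minimal sketch of how a replacement name could be derived. It is not how any particular obfuscator works: the function obfuscate_name and the "z_" prefix are hypothetical, and a 32-bit FNV-1a hash stands in for the dictionary a real tool would build. Because the hash is deterministic, the same symbol always maps to the same garbled name; a real tool would also have to deal with collisions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: writes a replacement identifier of the form
 * "z_" followed by 8 hex digits into out.
 * Returns 0 on success, -1 if out_size is too small (needs 11 bytes). */
int obfuscate_name(const char *name, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "z_%08x", (unsigned)fnv1a(name));
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}
```

An obfuscator along these lines would call obfuscate_name for every identifier it finds and write the renamed source to a separate tree for compilation.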

I don't recommend using them, but they are available. Simply obfuscating meaningful symbol names is not going to stop someone who is determined to discover how your library / program works. Additionally, you aren't going to be able to do anything about someone who traces system calls. Really, what's the point? Some argue that it helps keep the 'casual observer' at bay; I argue that someone running ltrace, strace and strings is typically anything but casual.

Unless you mean string literals, not symbols? There's nothing you can do about them, unless you store the literals in an encrypted format that your code has to decrypt before using. That is not just a waste, but an egregious waste that provides no benefit whatsoever.
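For illustration only, this is roughly what the decrypt-before-use approach involves. The sketch uses a single-byte XOR, which is obfuscation rather than real encryption; the key, the array and the function names are all hypothetical. Note that the decoded text still sits in memory at runtime, where a debugger can read it.

```c
#include <stddef.h>

/* Hypothetical single-byte key. XOR is obfuscation, not encryption. */
#define OBF_KEY 0x5A

/* The bytes of "hello", each XORed with OBF_KEY, so the plain text
 * does not appear in the binary. */
static const unsigned char obf_hello[] = { 0x32, 0x3F, 0x36, 0x36, 0x35 };

/* Decodes len bytes from src into dst and NUL-terminates the result.
 * dst must have room for len + 1 bytes. */
void obf_decode(const unsigned char *src, size_t len, char *dst)
{
    for (size_t i = 0; i < len; ++i)
        dst[i] = (char)(src[i] ^ OBF_KEY);
    dst[len] = '\0';
}

/* Convenience wrapper: dst must have room for 6 bytes. */
void obf_hello_text(char *dst)
{
    obf_decode(obf_hello, sizeof obf_hello, dst);
}
```

Every use of the literal now costs a decode and a buffer, which is the overhead the paragraph above objects to.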


Assuming you are correctly specifying hidden visibility to g++ for all of your source files (as other posters have recommended), there's a chance you might be running into this GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38643

Try dumping the symbols in your binary that are showing up (readelf -Wa mylib.so | c++filt | less); if you see only vtable and VTT symbols after demangling, then the gcc bug might be your problem.

Edit: if you can, try GCC 4.4.0 or later, as it appears to be fixed there.