
String literals: Where do they go?


A common technique is for string literals to be put in a "read-only data" section, which gets mapped into the process address space as read-only (which is why you can't change them).

It does vary by platform. For example, simpler chip architectures may not support read-only memory segments, so the data segment will be writable.

Rather than try to figure out a trick to make string literals changeable (it will be highly dependent on your platform and could change over time), just use arrays:

char foo[] = "...";

The compiler will arrange for the array to get initialized from the literal and you can modify the array.


Why should I not try to alter it?

Because it is undefined behavior. Quote from C99 N1256 draft 6.7.8/32 "Initialization":

EXAMPLE 8: The declaration

char s[] = "abc", t[3] = "abc";

defines "plain" char array objects s and t whose elements are initialized with character string literals.

This declaration is identical to

char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' };

The contents of the arrays are modifiable. On the other hand, the declaration

char *p = "abc";

defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

Where do they go?

GCC 4.8 x86-64 ELF Ubuntu 14.04:

  • char s[]: stack
  • char *s:
    • in the object file: the .rodata section
    • in the executable: the same segment that the .text section gets dumped into, which has Read and Exec permissions, but not Write

Program:

#include <stdio.h>

int main() {
    char *s = "abc";
    printf("%s\n", s);
    return 0;
}

Compile and decompile:

gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o

Output contains:

 char *s = "abc";
   8:  48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
   f:  00
                  c: R_X86_64_32S .rodata

So the string is stored in the .rodata section.

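Then link the program into an executable (by default gcc writes it to a.out) and inspect its program headers:

gcc -ggdb -std=c99 main.c
readelf -l a.out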

Contains (simplified):

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000704 0x0000000000000704  R E    200000

 Section to Segment mapping:
  Segment Sections...
   02     .text .rodata

This means that the default linker script dumps both .text and .rodata into a segment that can be read and executed but not written (Flags = R E). Attempting to write to such a segment causes a segmentation fault on Linux.

If we do the same for char[]:

 char s[] = "abc";

we obtain:

17:   c7 45 f0 61 62 63 00    movl   $0x636261,-0x10(%rbp)

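so the bytes of the literal are encoded directly in the instruction as the immediate 0x636261 (on little-endian x86-64 these are 'a', 'b', 'c', '\0') and copied to the stack (relative to %rbp), where we can of course modify them.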


There is no one answer to this. The C and C++ standards just say that string literals have static storage duration, any attempt at modifying them gives undefined behavior, and multiple string literals with the same contents may or may not share the same storage.

Depending on the system you're writing for, and the capabilities of the executable file format it uses, they may be stored along with the program code in the text segment, or they may have a separate segment for initialized data.

Determining the details will vary by platform as well; most toolchains probably include tools that can tell you where the literals end up. Some even give you control over details like that, if you want it (e.g. GNU ld lets you supply a linker script that tells it how to group data, code, etc.).