
String literals vs array of char when initializing a pointer


I think you're confused because char *p = "ab"; and char p[] = "ab"; have similar syntax but different meanings.

I believe that the latter case (char p[] = "ab";) is best regarded as a short-hand notation for char p[] = {'a', 'b', '\0'}; (initializes an array with the size determined by the initializer). Actually, in this case, you could say "ab" is not really used as a string literal.

However, the former case (char *p = "ab";) is different in that it simply initializes the pointer p to point to the first element of the string literal "ab", which must not be modified (it is typically placed in read-only memory).

I hope you see the difference. While char p[] = "ab"; can be written as the kind of initialization you described, char *p = "ab"; cannot: pointers are, well, not arrays, and initializing one with an array initializer does something entirely different (compilers that accept it at all typically warn and give the pointer the value of the first element, 0x61 in your case).

Long story short, C compilers only "replace" a string literal with a char array initializer if it is suitable to do so, i.e. it is being used to initialize a char array.
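To make the difference concrete, here is a minimal sketch, assuming a hosted C99 compiler. The function names are made up for illustration and are not part of the question.

```c
#include <stddef.h>
#include <string.h>

/* Array initialized from "ab": holds 'a', 'b' and the terminating '\0'. */
size_t array_object_size(void)
{
    char p[] = "ab";
    return sizeof p;            /* size of the whole array */
}

/* Pointer initialized from "ab": only the address is stored in p. */
size_t pointer_object_size(void)
{
    const char *p = "ab";
    return sizeof p;            /* size of the pointer, not of the text */
}

/* The array is our own copy of the characters, so writing to it is fine. */
int array_is_writable(void)
{
    char p[] = "ab";
    p[0] = 'x';
    return strcmp(p, "xb") == 0;
}
```

Doing the same write through a pointer to the literal (p[0] = 'x' with char *p = "ab";) is undefined behavior, which is why the pointer version above is declared const char *.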


The second example is not valid C: {'a', 'b', '\0'} can be used to initialize an array, but not a pointer.

Instead, you can use a C99 compound literal (also available as an extension in some compilers, e.g. GCC) like this:

char *p = (char []){'a', 'b', '\0'};

Note that this is more flexible than a string literal: the array is modifiable, and it doesn't have to be null-terminated.
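A small sketch of both points, assuming a C99 compiler; the function names are hypothetical:

```c
#include <stddef.h>

/* A compound literal without a terminating '\0' is just two bytes. */
size_t unterminated_size(void)
{
    return sizeof ((char []){'a', 'b'});
}

/* The compound literal is an unnamed, modifiable array;
   p points at its first element. */
char second_after_write(void)
{
    char *p = (char []){'a', 'b', '\0'};
    p[1] = 'z';                 /* allowed: this is not a string literal */
    return p[1];
}
```

Keep in mind that a compound literal written inside a function only lives until the enclosing block ends, so a pointer to it should not be returned from the function.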


String literals have a "magical" status in C. They're unlike anything else. To understand why, it's useful to think about this in terms of memory management. For example, ask yourself, "Where is a string literal stored in memory? When is it freed from memory?" and things will start making sense.

They're unlike numeric literals which translate easily to machine instructions. For a simplified example, something like this:

int x = 123;

... might translate to something like this at the machine level:

mov ecx, 123

When we do something like:

const char* str = "hello";

... we now have a dilemma:

mov ecx, ???

There's not necessarily any native understanding in the hardware of what a multi-byte, variable-length string actually is. It mainly knows about bits and bytes and numbers and has registers designed to store these things, yet a string is a memory block containing several of those.

So compilers have to put that string's memory block somewhere, and they typically emit it into a globally accessible place in the binary (usually a read-only memory segment or the data segment). They might also coalesce identical literal strings into the same memory region to avoid redundancy. Now the compiler can generate a mov/load instruction that loads the address of the literal string, and you can then work with it indirectly through a pointer.

Another scenario we might run into is this:

static const char* some_global_ptr = "blah";

int main()
{
    if (...)
    {
        const char* ptr = "hello";
        ...
        some_global_ptr = ptr;
    }
    printf("%s\n", some_global_ptr);
}

Naturally ptr goes out of scope, but we need that literal string's memory to linger around for this program to have well-defined behavior. So literal strings translate not only to addresses to globally-accessible memory blocks, but they also don't get freed as long as your binary/program is loaded/running so that you don't have to worry about their memory management. [Edit: excluding potential optimizations: for the C programmer, we never have to worry about the memory management of a literal string, so the effect is like it's always there].
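The same idea as a compact sketch, with hypothetical function names: the pointer variable is local, but the literal it points to has static storage duration, so handing the address out of the function is fine.

```c
#include <stddef.h>
#include <string.h>

/* ptr itself disappears when the function returns, but the literal
   it points to stays valid for the whole run of the program. */
const char *greeting(void)
{
    const char *ptr = "hello";
    return ptr;
}

/* The caller can keep using the returned address afterwards. */
size_t greeting_length(void)
{
    return strlen(greeting());
}
```

Returning the address of a local char array would not be safe in the same way, because the array's storage ends with the function call.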

Now about character arrays: a literal string does have an array type (sizeof "hello" is 6), but it has no name, so we can't hold on to it as an array. Almost everywhere we use it, it decays to a pointer to its first character, so the only handle we can keep to that memory is a char*/const char*, and sizeof applied to that pointer gives the size of the pointer, not the number of bytes in the string.

This code actually gives us a handle to such an array without involving a pointer:

char str[] = "hello";

Something interesting happens here. A production compiler is likely going to apply all kinds of optimizations, but excluding those, at a basic level such code might create two separate memory blocks.

The first block is going to be persistent for the duration of the program, and will contain that literal string, "hello". The second block will be for the actual str array, and it's not necessarily persistent. If we wrote such code inside a function, it's going to allocate memory on the stack, copy that literal string to the stack, and then free the memory from the stack when str goes out of scope. To put it another way, the address of str is not going to match the address of the literal string.
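A rough way to observe this from C itself, as a sketch with made-up function names (it only checks what the language guarantees, not what a particular compiler emits):

```c
#include <string.h>

/* The local array has the same characters as the literal,
   but it is a different object at a different address. */
int array_is_separate_copy(void)
{
    const char *literal = "hello";
    char str[] = "hello";
    return strcmp(str, literal) == 0 && (const char *)str != literal;
}

/* Writing to the array changes only the copy, never the literal. */
int write_does_not_touch_literal(void)
{
    const char *literal = "hello";
    char str[] = "hello";
    str[0] = 'j';
    return strcmp(str, "jello") == 0 && strcmp(literal, "hello") == 0;
}
```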

Finally, when we write something like this:

char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};

... it's not necessarily equivalent, as here there are no literal strings involved. Of course an optimizer is allowed to do all kinds of things, but in this scenario, it is possible that we will simply create a single memory block (allocated on the stack and freed from the stack if we're inside a function) with instructions to move all these numbers (characters) you specified to the stack.

So while we're effectively achieving the same effect as the previous version as far as the logic of the software is concerned, we're actually doing something subtly different when we don't specify a literal string. Again, optimizers can recognize when doing something different can have the same logical effect, so they might get fancy here and make these two effectively the same thing in terms of machine instructions. But short of that, this is subtly different code we're writing.
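As far as the program's logic goes, the two spellings can be checked to produce identical arrays. A minimal sketch, with a hypothetical function name:

```c
#include <string.h>

/* Both arrays have six elements with the same bytes,
   including the terminating '\0'. */
int same_contents(void)
{
    char from_literal[] = "hello";
    char from_list[]    = {'h', 'e', 'l', 'l', 'o', '\0'};
    return sizeof from_literal == sizeof from_list
        && memcmp(from_literal, from_list, sizeof from_list) == 0;
}
```

Whether the generated machine code differs between the two is up to the compiler and its optimization settings.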

Last but not least, when we use a brace-enclosed initializer list like {...} with several elements, the compiler expects you to be initializing an aggregate (an array or struct), an object with room for all of them whose memory is allocated and freed at some point when things go out of scope. A scalar such as a single pointer can take only one value, so that's why you're getting the error trying to initialize a pointer with such a list.