Does the C standard guarantee buffers are not touched past their null terminator? Does the C standard guarantee buffers are not touched past their null terminator? c c

Does the C standard guarantee buffers are not touched past their null terminator?


The C99 draft standard does not explicitly state what should happen in those cases, but by considering multiple variations, you can show that it must work a certain way so that it meets the specification in all cases.

The standard says:

%s - Matches a sequence of non-white-space characters.252)

If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

Here's a pair of examples that show it must work the way you are proposing to meet the standard.

Example A:

char buffer[4] = "abcd";char buffer2[10];  // Note the this could be placed at what would be buffer+4sscanf("123 4", "%s %s", buffer, buffer2);// Result is buffer =  "123\0"//           buffer2 = "4\0"

Example B:

char buffer[17] = "abcdefghijklmnop";char* buffer2 = &buffer[4];sscanf("123 4", "%s %s", buffer, buffer2);// Result is buffer = "123\04\0"

Note that the interface of sscanf doesn't provide enough information to really know that these were different. So, if Example B is to work properly, it must not mess with the bytes after the null character in Example A. This is because it must work in both cases according to this bit of spec.

So implicitly it must work as you stated due to the spec.

Similar arguments can be placed for other functions, but I think you can see the idea from this example.

NOTE:Providing size limits in the format, such as "%16s", could change the behavior. By the specification, it would be functionally acceptable for sscanf to zero out a buffer to its limits before writing the data into the buffer. In practice, most implementations opt for performance, which means they leave the remainder alone.

When the intent of the specification is to do this sort of zeroing out, it is usually explicitly specified. strncpy is an example. If the length of the string is less than the maximum buffer length specified, it will fill the rest of the space with null characters. The fact that this same "string" function could return a non-terminated string as well makes this one of the most common functions for people to roll their own version.

As far as fgets, a similar situation could arise. The only gotcha is that the specification explicitly states that if nothing is read in, the buffer remains untouched. An acceptable functional implementation could sidestep this by checking to see if there is at least one byte to read before zeroing out the buffer.


Each individual byte in the buffer is an object. Unless some part of the function description of sscanf or fgets mentions modifying those bytes, or even implies their values may change e.g. by stating their values become unspecified, then the general rule applies: (emphasis mine)

6.2.4 Storage durations of objects

2 [...] An object exists, has a constant address, and retains its last-stored value throughout its lifetime. [...]

It's this same principle that guarantees that

#include <stdio.h>int a = 1;int main() {  printf ("%d\n", a);  printf ("%d\n", a);}

attempts to print 1 twice. Even though a is global, printf can access global variables, and the description of printf doesn't mention not modifying a.

Neither the description of fgets nor that of sscanf mentions modifying buffers past the bytes that actually were supposed to be written (except in the case of a read error), so those bytes don't get modified.


The standard is somewhat ambiguous on this, but I think a reasonable reading of it is that the answer is: yes, it's not allowed to write more bytes to the buffer than it read+null. On the other hand, a stricter reading/interpretation of the text could conclude that the answer is no, there's no guarantee. Here's what a publicly avaialble draft says about fgets.

char *fgets(char * restrict s, int n, FILE * restrict stream);

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

There's a guarantee about how much it is supposed to read from the input, i.e. stop reading at newline or EOF and not read more than n-1 bytes. Although nothing is said explicitly about how much it's allowed to write to the buffer, the common knowledge is that fgets's n parameter is used to prevent buffer overflows. It's a little strange that the standard uses the ambiguous term read, which may not necessarily imply that gets can't write to the buffer more than n bytes, if you want to nitpick on the terminology it uses. But note that the same "read" terminology is used about both issues: the n-limit and the EOF/newline limit. So if you interpret the n-related "read" as a buffer-write limit, then [for consistency] you can/should interpret the other "read" the same way, i.e. not write more than what it read when string is shorter than the buffer.

On the other hand, if you distinguish between the uses of the phrase-verb "read into" (="write") and just "read", then you can't read the committee's text the same way. You are guaranteed that it won't "read into" (="write to") the array more than n bytes, but if the input string is terminated sooner by newline or EOF you're only guaranteed the rest (of the input) won't be "read", but whether that implies in won't be "read into" (="written to") the buffer is unclear under this stricter reading. The crucial issue is keyword is "into", which is elided, so the problem is whether the completion given by me in brackets in the following modified quote is the intended interpretation:

No additional characters are read [into the array] after a new-line character (which is retained) or after end-of-file.

Frankly a single postcondition stated as a formula (and would be pretty short in this case) would have been a lot more helpful than the verbiage I quoted...

I can't be bothered to try and analyze their writeup about the *scanf family, because I suspect it's going to be even more complicated given all the other things that happen in those functions; their writeup for fscanf is about five pages long... But I suspect a similar logic applies.