Realistic usage of the C99 'restrict' keyword?

c gcc c99 restrict-qualifier

restrict says that the pointer is the only thing that accesses the underlying object. It eliminates the potential for pointer aliasing, enabling better optimization by the compiler.

For instance, suppose I have a machine with specialized instructions that can multiply vectors of numbers in memory, and I have the following code:

void MultiplyArrays(int* dest, int* src1, int* src2, int n)
{
    for(int i = 0; i < n; i++)
    {
        dest[i] = src1[i]*src2[i];
    }
}

Without restrict, the compiler must correctly handle the case where dest, src1, and src2 overlap, so it has to perform one multiplication at a time, from start to end. With restrict, the compiler is free to optimize this code by using the vector instructions.
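For comparison, here is a minimal sketch of what the restrict-qualified version might look like. The name MultiplyArraysRestrict and the const qualifiers are assumptions made for this illustration, and whether a given compiler actually emits vector instructions depends on the target and the optimization flags.

```c
/* Hypothetical restrict-qualified variant of MultiplyArrays.
 * The caller promises that the n elements written through dest
 * are not accessed through src1 or src2. */
void MultiplyArraysRestrict(int *restrict dest,
                            const int *restrict src1,
                            const int *restrict src2,
                            int n)
{
    for (int i = 0; i < n; i++) {
        dest[i] = src1[i] * src2[i];
    }
}
```

The loop body is unchanged; only the declaration carries the extra promise.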

Wikipedia has an entry on restrict, with another example.

The Wikipedia example is very illuminating.

It clearly shows how restrict lets the compiler save one assembly instruction.

Without restrict:

void f(int *a, int *b, int *x) {
  *a += *x;
  *b += *x;
}

Pseudo assembly:

load R1 ← *x    ; Load the value of x pointer
load R2 ← *a    ; Load the value of a pointer
add R2 += R1    ; Perform Addition
set R2 → *a     ; Update the value of a pointer
; Similarly for b, note that x is loaded twice,
; because x may point to a (a aliased by x) thus
; the value of x will change when the value of a
; changes.
load R1 ← *x
load R2 ← *b
add R2 += R1
set R2 → *b

With restrict:

void fr(int *restrict a, int *restrict b, int *restrict x);

Pseudo assembly:

load R1 ← *x
load R2 ← *a
add R2 += R1
set R2 → *a
; Note that x is not reloaded,
; because the compiler knows it is unchanged
; "load R1 ← *x" is no longer needed.
load R2 ← *b
add R2 += R1
set R2 → *b

Does GCC really do it?

GCC 4.8 Linux x86-64:

gcc -g -std=c99 -O0 -c main.c
objdump -S main.o

With -O0, they are the same.

With -O3:

void f(int *a, int *b, int *x) {
    *a += *x;
   0:   8b 02                   mov    (%rdx),%eax
   2:   01 07                   add    %eax,(%rdi)
    *b += *x;
   4:   8b 02                   mov    (%rdx),%eax
   6:   01 06                   add    %eax,(%rsi)

void fr(int *restrict a, int *restrict b, int *restrict x) {
    *a += *x;
  10:   8b 02                   mov    (%rdx),%eax
  12:   01 07                   add    %eax,(%rdi)
    *b += *x;
  14:   01 06                   add    %eax,(%rsi)

For the uninitiated, the calling convention is:

rdi = first parameter
rsi = second parameter
rdx = third parameter

GCC output was even clearer than the wiki article: 4 instructions vs 3 instructions.
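To reproduce the comparison, both functions can live in the same source file. This is a sketch that assumes the body of fr mirrors the body of f; the comments restate why the second load can or cannot be dropped.

```c
/* Without restrict: a, b and x are allowed to alias one another. */
void f(int *a, int *b, int *x) {
    *a += *x;
    *b += *x;   /* *x must be reloaded: the store to *a may have changed it */
}

/* With restrict: the caller promises the three objects are distinct. */
void fr(int *restrict a, int *restrict b, int *restrict x) {
    *a += *x;
    *b += *x;   /* the value loaded for the first statement can be reused */
}
```

The reload in f is not wasted work in general. Calling f(&v, &b, &v) with v = 1 and b = 10 is legal and leaves v = 2 and b = 12, because the second addition sees the updated v. Passing the same arguments to fr would break the restrict contract.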

Arrays

So far we have single-instruction savings, but if the pointers represent arrays to be looped over, a common use case, then a bunch of instructions could be saved, as mentioned by supercat.

Consider for example:

void f(char *restrict p1, char *restrict p2) {
    for (int i = 0; i < 50; i++) {
        p1[i] = 4;
        p2[i] = 9;
    }
}

Because of restrict, a smart compiler (or human) could optimize that to:

memset(p1, 4, 50);
memset(p2, 9, 50);

which is potentially much more efficient as it may be assembly optimized on a decent libc implementation (like glibc): Is it better to use std::memcpy() or std::copy() in terms to performance?
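To see why the rewrite is only valid when the two regions do not overlap, here is a small sketch that runs the interleaved loop and the two-memset form on overlapping buffers. The helper names fill_loop and fill_memset are made up for this illustration, and neither uses restrict, so the overlapping calls are well defined.

```c
#include <string.h>

/* Same loop as above, but WITHOUT restrict, so overlapping
 * arguments are allowed and give well-defined results. */
void fill_loop(char *p1, char *p2) {
    for (int i = 0; i < 50; i++) {
        p1[i] = 4;
        p2[i] = 9;
    }
}

/* The memset form, which a compiler may substitute for the loop
 * only when it knows p1 and p2 do not overlap. */
void fill_memset(char *p1, char *p2) {
    memset(p1, 4, 50);
    memset(p2, 9, 50);
}
```

With a 51-byte buffer buf and the call fill_loop(buf, buf + 1), buf[1] ends up as 4, because the next iteration overwrites the 9. With fill_memset(buf, buf + 1), buf[1] ends up as 9. For disjoint buffers the two functions give identical results, which is the case restrict promises.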

Does GCC really do it?

GCC 5.2.1, Linux x86-64, Ubuntu 15.10:

gcc -g -std=c99 -O0 -c main.c
objdump -dr main.o

With -O0, both are the same.

With -O3:

with restrict:

3f0:   48 85 d2                test   %rdx,%rdx
3f3:   74 33                   je     428 <fr+0x38>
3f5:   55                      push   %rbp
3f6:   53                      push   %rbx
3f7:   48 89 f5                mov    %rsi,%rbp
3fa:   be 04 00 00 00          mov    $0x4,%esi
3ff:   48 89 d3                mov    %rdx,%rbx
402:   48 83 ec 08             sub    $0x8,%rsp
406:   e8 00 00 00 00          callq  40b <fr+0x1b>
                        407: R_X86_64_PC32      memset-0x4
40b:   48 83 c4 08             add    $0x8,%rsp
40f:   48 89 da                mov    %rbx,%rdx
412:   48 89 ef                mov    %rbp,%rdi
415:   5b                      pop    %rbx
416:   5d                      pop    %rbp
417:   be 09 00 00 00          mov    $0x9,%esi
41c:   e9 00 00 00 00          jmpq   421 <fr+0x31>
                        41d: R_X86_64_PC32      memset-0x4
421:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
428:   f3 c3                   repz retq

Two memset calls as expected.

without restrict: no stdlib calls, just a 16 iteration wide loop unrolling which I do not intend to reproduce here :-)

I haven't had the patience to benchmark them, but I believe that the restrict version will be faster.

C99

Let's look at the standard for completeness' sake.

restrict says that two pointers cannot point to overlapping memory regions. The most common usage is for function arguments.

This restricts how the function can be called, but allows for more compile-time optimizations.

If the caller does not follow the restrict contract, the behavior is undefined.
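As an illustration of the contract from the caller's side, here is a minimal sketch. The helper add_into is hypothetical and exists only to show one call that honors the promise and one that would not.

```c
#include <stddef.h>

/* Hypothetical helper: adds src[i] into dst[i] for n elements.
 * restrict on both parameters is a promise the caller must keep. */
void add_into(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}

/* Given:
 *     int a[4] = {1, 2, 3, 4};
 *     int b[4] = {10, 20, 30, 40};
 *
 *     add_into(a, b, 4);        OK: a and b are separate arrays
 *     add_into(a, a + 1, 3);    undefined behavior: elements written
 *                               through dst are also read through src
 */
```

The compiler does not diagnose the second call in general; the burden is entirely on the caller.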

The C99 N1256 draft 6.7.3/7 "Type qualifiers" says:

The intended use of the restrict qualifier (like the register storage class) is to promote optimization, and deleting all instances of the qualifier from all preprocessing translation units composing a conforming program does not change its meaning (i.e., observable behavior).

and 6.7.3.1 "Formal definition of restrict" gives the gory details.

Strict aliasing rule

The restrict keyword only affects pointers of compatible types (e.g. two int*) because the strict aliasing rule says that aliasing incompatible types is undefined behavior by default, and so compilers can assume it does not happen and optimize accordingly. Character types are the usual exception, since a char* may alias any object.

See: What is the strict aliasing rule?
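A small sketch of the point about incompatible types follows. The function name store_both is made up, and whether a particular compiler performs the folding described in the comment depends on its version and flags (for GCC, strict aliasing is normally enabled at -O2 and above).

```c
/* No restrict here, yet under the strict aliasing rule the compiler
 * may assume that an int* and a float* never refer to the same object. */
int store_both(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;   /* may be folded to the constant 1 without reloading *i */
}
```

If both parameters were int*, the compiler would have to reload *i before returning unless restrict were added.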

See also

C++14 does not yet have an analogue for restrict, but GCC has __restrict__ as an extension: What does the restrict keyword mean in C++?
Many questions that ask: according to the gory details, is this code UB or not?
A "when to use" question: When to use restrict and when not to
The related GCC __attribute__((malloc)), which says that the return value of a function is not aliased to anything: GCC: __attribute__((malloc))
