Can a compiler automatically detect pure functions without the type information about purity?



Sure, you can detect pure functions in some cases. For instance,

    int f(int x)
    {
        return x * 2;
    }

can be detected as pure with simple static analysis. The difficulty is doing this in general, and detecting interfaces which use "internal" state but are externally pure is basically impossible.
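To illustrate the "internal state" case, here is a minimal sketch of a function that is externally pure but hard to classify from its body alone. The name fib_memo and the cache layout are made up for this example. A simple analysis sees reads and writes of a static variable and has to give up, even though every call with the same argument returns the same value.

```c
#include <assert.h>
#include <stdint.h>

/* fib(93) is the largest Fibonacci number that fits in uint64_t. */
#define FIB_CACHE_SIZE 94

/* Hypothetical example: the result depends only on n, yet the body
   reads and writes hidden static state (a memoisation cache). */
uint64_t fib_memo(unsigned n)
{
    /* Zero-initialised; 0 means "not computed yet", which is safe
       because fib(n) >= 1 for every n >= 2. */
    static uint64_t cache[FIB_CACHE_SIZE];

    assert(n < FIB_CACHE_SIZE);
    if (n < 2)
        return n;
    if (cache[n] == 0)
        cache[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return cache[n];
}
```

Calling fib_memo(50) twice gives 12586269025 both times; only the second call is served from the cache. Note that this sketch is not thread-safe, which is one more reason an optimizer cannot simply treat it as pure.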

GCC does have the warning options -Wsuggest-attribute=pure and -Wsuggest-attribute=const, which suggest functions that might be candidates for the pure and const attributes. I'm not sure whether it opts to be conservative (i.e. missing many pure functions, but never suggesting it for a non-pure function) or lets the user decide.

Note that GCC's definition of pure is "depending only on arguments and global variables":

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.

GCC manual

Strict purity, i.e. the same results for the same arguments in all circumstances, is represented by the const attribute, but such a function cannot even dereference a pointer passed to it. So the parallelisation opportunities for GCC's pure functions are limited, because they may still read global variables, while far fewer functions qualify as const than the pure functions you can write in a language like Haskell.
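As a rough sketch of the difference between the two attributes, assuming GCC or Clang (the function names square and count_char are invented for this example):

```c
#include <stddef.h>

/* const: the result depends only on the argument value;
   the function reads no memory at all. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* pure: no side effects, but the function reads memory through a
   pointer, so its result can change if the string is modified
   between two calls. That rules out the const attribute. */
__attribute__((pure)) size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}
```

With these declarations the compiler is allowed to merge repeated calls such as square(7) + square(7), and to merge repeated count_char calls as long as it can see that no store to memory happens in between.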

By the way, automatically parallelising pure functions is not as easy as you might think; the hard part becomes deciding what to parallelise. Parallelise computations that are too cheap, and overhead makes it pointless. Don't parallelise enough, and you don't reap the benefits. I don't know of any practical functional language implementation that does automatic parallelisation for this reason, although libraries like repa parallelise many operations behind the scenes without explicit parallelism in the user code.


There is another problem. Consider

    #include <stdbool.h>  /* false */
    #include <stdio.h>    /* getchar */

    int isthispure(int i)
    {
        if (false)
            return getchar();  /* impure, but unreachable */
        return i + 42;
    }

The function is effectively pure: it contains impure code, but that code can never be reached. Now suppose false is replaced by g(i), where we are fairly sure that g(i) is always false (for example, g might check whether its argument is a Lychrel number). To prove that isthispure is indeed pure, the compiler would have to prove that no Lychrel numbers exist.
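Here is a sketch of what such a g could look like. Everything in it is an assumption made for illustration: the names looks_lychrel and isthispure_g, the step bound of 30, and the 10^18 size cutoff. A real Lychrel test has no step bound, and that is exactly the part no compiler can decide; this bounded stand-in merely reports "no palindrome found yet", so it returns true for candidates such as 196.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Reverse the decimal digits of n (e.g. 120 -> 21). */
uint64_t reverse_digits(uint64_t n)
{
    uint64_t r = 0;
    while (n > 0) {
        r = r * 10 + n % 10;
        n /= 10;
    }
    return r;
}

/* Bounded stand-in for a Lychrel test (hypothetical helper):
   returns false as soon as reverse-and-add produces a palindrome,
   true if none appears within max_steps or the value gets too big. */
bool looks_lychrel(uint64_t n, int max_steps)
{
    const uint64_t limit = 1000000000000000000ULL; /* 10^18, keeps sums in range */

    for (int step = 0; step < max_steps; ++step) {
        if (n >= limit)
            return true;              /* unresolved: give up */
        n += reverse_digits(n);
        if (n == reverse_digits(n))
            return false;             /* palindrome reached */
    }
    return true;                      /* unresolved within the bound */
}

/* Variant of isthispure with the constant replaced by a predicate. */
int isthispure_g(int i)
{
    if (i >= 0 && looks_lychrel((uint64_t)i, 30))
        return getchar();             /* impure branch */
    return i + 42;
}
```

For 56 the predicate is false after one step (56 + 65 = 121), so isthispure_g(56) returns 98 without touching stdin. Whether the getchar branch is reachable at all for an unbounded test is an open mathematical question, not something static analysis can settle.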

(I admit that this is a rather theoretical consideration. One could also decide that a function containing any impure code is itself impure. But this is not justified by the C type system, IMHO.)


Determining whether an arbitrary function is pure (even in the limited sense used by GCC) is undecidable, just like the halting problem, so the answer is "not for arbitrary functions." It is possible to automatically detect that some functions are pure and others are not, and to flag the rest as "unknown", which allows for automatic parallelization in some cases.

In my experience, even programmers aren't very good at figuring out such things, so I want the type system to help keep track of it for me, not just for the optimizer.