T* versus char* pointer arithmetic T* versus char* pointer arithmetic arrays arrays

T* versus char* pointer arithmetic


In [dcl.array]:

An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.

Contiguous implies that the offset between any consecutive subobjects of type T is sizeof(T), which implies that the offset of the nth subobject is n*sizeof(T).

The upper bound of n < N comes from [expr.add]:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j < n; otherwise, the behavior is undefined.


It's always true, but instead of looking at the rules for pointer arithmetic you must rely on the semantics given for the sizeof operator (5.3.3 [expr.sizeof]):

When applied to a reference or a reference type, the result is the size of the referenced type. When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. The size of a most derived class shall be greater than zero. The result of applying sizeof to a base class subobject is the size of the base class type. When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.

It should be clear that there's only one packing that puts n non-overlapping elements in space of n * sizeof(element), namely that they are regularly spaced sizeof (element) bytes apart. And only one ordering is allowed by the pointer comparison rules found under the relational operator section (5.9 [expr.rel]):

Comparing pointers to objects is defined as follows:

  • If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.


The declaration in the first line is also a definition. (§3.1(2))It creates the array object. (§1.8(1))

An object can be accessed via multiple lvalues due to the aliasing rules. (§3.10(10)) In particular, the objects on the right hand side may be legally accessed (aliased) through char pointers.

Lets look at a sentence in the array definition and then disambiguate 'contiguous'.

"An object of array type contains a contiguously allocated non-empty set of N subobjects of type T." [dcl.array] §8.3.4.


Disambiguation

We start from the binary symmetric relation 'contiguous' for char objects, which should be obvious. ('iff' is short for 'if and only if', sets and sequences are mathematical ones, not C++ containers) If you canlink to a better or more acknowledged definition, comment.

A sequence x_1 ... x_N of char objects is contiguous iffx_i and x_{i+1} are contiguous in memory for all i=1...N-1.

A set M of char objects is contiguous iff the objects in M can be numbered, x_1 ...x_N, say, such that the sequence (x_i)_i is contiguous.That is, iff M is the image of a contiguous, injective sequence.

Two sets M_1, M_2 of char objects are contiguous iff thereexist x_1 in M_1 and x_2 in M_2 such that x_1 and x_2 are contiguous.

A sequence M_1 ... M_N of sets of char objects is contiguous iffM_i and M_{i+1} are contiguous for all i=1...N-1.

A set of sets of char objects is contiguous iff it is the image ofa contiguous, injective sequence of sets of char objects.

Now which version of 'contiguous' to apply? Linguistic overload resolution:

1) 'contiguous' may refer to 'allocation'. As an allocation function call provides asubset of the available char objects, this would invoke the set-of-chars variant. That is,the set of all char objects that occur in any of the N subobjects would be meant to be contiguous.

2) 'contiguous' may refer to 'set'. This would invoke the set-of-sets-of-chars variant with every subobject considered as a set of char objects.


What does this mean? First, while the authors numbered the array subobjects a[0] ... a[N-1], they chose not to say anything about theorder of subobjects in memory: they used 'set' instead of 'sequence'.They described the allocation as contiguous, but they do not say that a[j] and a[j+1] are contiguous in memory. Also, they chose not to write down the straightforward formula involving (char*) pointers and sizeof(). While it looks like they deliberately separated contiguity from ordering concerns, §5.9 (3) requires one and the same ordering for array subobjects of all types.

If pointers point to two different elements of the same array, or a subobject thereof, the pointer to the element with the higher subscript compares greater.

Now do the bytes that make up the array subobjects qualify assubobjects in the sense of the above quote? Reading §1.8(2) and Complete object or subobject?the answer is: No, at least not for arrays whose elements don't contain subobjects and are no arrays of chars, e.g. arrays of ints. So we may find examples where no particular ordering is imposed on the array elements.

But for the moment let's assume that our array subobjects are populated with chars only. What does this mean considering the two possible interpretations of 'contiguous'?

1) We have a contiguous set of bytes that coincides with an ordered set of subobjects.Then the claim in the OP is unconditionally true.

2) We have a contiguous sequence of subobjects, each of which may be non-contiguous individually.This may happen in two ways: either the subobjects may have gaps, that is, they contain two char objects at distance greater than sizeof(subobject)-1. Or thesubobjects may be distributed among different sequences of contiguous bytes.

In case 2) there is no guarantee that that the claim in the OP is true.

Therefore, it is important to be clear about what 'contiguous' means.


Finally, here's an example of an implementation where no obvious ordering is imposed on the array subobjects by §5.9 because the array subobjects don't have subobjects themselves. Readers raised concerns that this would contradict the standard in other places, but no definite contradiction has been demonstrated yet.

Assume T is int, and we have one particular conforming implementation that behaves as expected naively with one exception:

It allocates arrays of ints in reversed memory order,putting the array's first element at the high-memory-address end of the object:

a[N-1], a[N-2], ... a[0]  

instead of

a[0], a[1],   ... a[N-1]  

This implementation satisfies any reasonable contiguityrequirement, so we don't have to agree on a single interpretation of 'contiguous' to proceed with the argument.

Then if p points to a, mapping p to &a[0] (invoking [conv.array]) would make the pointer jump near the high memory end of a.As array arithmetic has to be compatible with pointer arithmetic, we'd also have

int * p= &intVariable;(char*)(p+1) + sizeof(int) == (char*)p

and

int a[N];(char*)(void*)&a[n] + n*sizeof(int)==(char*)(void*)&a[0],  (0<=n<N)

Then, for T=int, there is no guarantee that the claim in the original post is true.


edit history: removed and reintroduced in modified form a possibly erroneous shortcut that was due to not applying a relevant part of the pointer < relation specification. It has not been determined yet whether this was justified or not, but the main argument about contiguity comes through anyway.