How do I use arrays in C++? How do I use arrays in C++? arrays arrays

How do I use arrays in C++?


Arrays on the type level

An array type is denoted as T[n] where T is the element type and n is a positive size, the number of elements in the array. The array type is a product type of the element type and the size. If one or both of those ingredients differ, you get a distinct type:

#include <type_traits>static_assert(!std::is_same<int[8], float[8]>::value, "distinct element type");static_assert(!std::is_same<int[8],   int[9]>::value, "distinct size");

Note that the size is part of the type, that is, array types of different size are incompatible types that have absolutely nothing to do with each other. sizeof(T[n]) is equivalent to n * sizeof(T).

Array-to-pointer decay

The only "connection" between T[n] and T[m] is that both types can implicitly be converted to T*, and the result of this conversion is a pointer to the first element of the array. That is, anywhere a T* is required, you can provide a T[n], and the compiler will silently provide that pointer:

                  +---+---+---+---+---+---+---+---+the_actual_array: |   |   |   |   |   |   |   |   |   int[8]                  +---+---+---+---+---+---+---+---+                    ^                    |                    |                    |                    |  pointer_to_the_first_element   int*

This conversion is known as "array-to-pointer decay", and it is a major source of confusion. The size of the array is lost in this process, since it is no longer part of the type (T*). Pro: Forgetting the size of an array on the type level allows a pointer to point to the first element of an array of any size. Con: Given a pointer to the first (or any other) element of an array, there is no way to detect how large that array is or where exactly the pointer points to relative to the bounds of the array. Pointers are extremely stupid.

Arrays are not pointers

The compiler will silently generate a pointer to the first element of an array whenever it is deemed useful, that is, whenever an operation would fail on an array but succeed on a pointer. This conversion from array to pointer is trivial, since the resulting pointer value is simply the address of the array. Note that the pointer is not stored as part of the array itself (or anywhere else in memory). An array is not a pointer.

static_assert(!std::is_same<int[8], int*>::value, "an array is not a pointer");

One important context in which an array does not decay into a pointer to its first element is when the & operator is applied to it. In that case, the & operator yields a pointer to the entire array, not just a pointer to its first element. Although in that case the values (the addresses) are the same, a pointer to the first element of an array and a pointer to the entire array are completely distinct types:

static_assert(!std::is_same<int*, int(*)[8]>::value, "distinct element type");

The following ASCII art explains this distinction:

      +-----------------------------------+      | +---+---+---+---+---+---+---+---+ |+---> | |   |   |   |   |   |   |   |   | | int[8]|     | +---+---+---+---+---+---+---+---+ ||     +---^-------------------------------+|         ||         ||         ||         |  pointer_to_the_first_element   int*||  pointer_to_the_entire_array              int(*)[8]

Note how the pointer to the first element only points to a single integer (depicted as a small box), whereas the pointer to the entire array points to an array of 8 integers (depicted as a large box).

The same situation arises in classes and is maybe more obvious. A pointer to an object and a pointer to its first data member have the same value (the same address), yet they are completely distinct types.

If you are unfamiliar with the C declarator syntax, the parenthesis in the type int(*)[8] are essential:

  • int(*)[8] is a pointer to an array of 8 integers.
  • int*[8] is an array of 8 pointers, each element of type int*.

Accessing elements

C++ provides two syntactic variations to access individual elements of an array.Neither of them is superior to the other, and you should familiarize yourself with both.

Pointer arithmetic

Given a pointer p to the first element of an array, the expression p+i yields a pointer to the i-th element of the array. By dereferencing that pointer afterwards, one can access individual elements:

std::cout << *(x+3) << ", " << *(x+7) << std::endl;

If x denotes an array, then array-to-pointer decay will kick in, because adding an array and an integer is meaningless (there is no plus operation on arrays), but adding a pointer and an integer makes sense:

   +---+---+---+---+---+---+---+---+x: |   |   |   |   |   |   |   |   |   int[8]   +---+---+---+---+---+---+---+---+     ^           ^               ^     |           |               |     |           |               |     |           |               |x+0  |      x+3  |          x+7  |     int*

(Note that the implicitly generated pointer has no name, so I wrote x+0 in order to identify it.)

If, on the other hand, x denotes a pointer to the first (or any other) element of an array, then array-to-pointer decay is not necessary, because the pointer on which i is going to be added already exists:

   +---+---+---+---+---+---+---+---+   |   |   |   |   |   |   |   |   |   int[8]   +---+---+---+---+---+---+---+---+     ^           ^               ^     |           |               |     |           |               |   +-|-+         |               |x: | | |    x+3  |          x+7  |     int*   +---+

Note that in the depicted case, x is a pointer variable (discernible by the small box next to x), but it could just as well be the result of a function returning a pointer (or any other expression of type T*).

Indexing operator

Since the syntax *(x+i) is a bit clumsy, C++ provides the alternative syntax x[i]:

std::cout << x[3] << ", " << x[7] << std::endl;

Due to the fact that addition is commutative, the following code does exactly the same:

std::cout << 3[x] << ", " << 7[x] << std::endl;

The definition of the indexing operator leads to the following interesting equivalence:

&x[i]  ==  &*(x+i)  ==  x+i

However, &x[0] is generally not equivalent to x. The former is a pointer, the latter an array. Only when the context triggers array-to-pointer decay can x and &x[0] be used interchangeably. For example:

T* p = &array[0];  // rewritten as &*(array+0), decay happens due to the additionT* q = array;      // decay happens due to the assignment

On the first line, the compiler detects an assignment from a pointer to a pointer, which trivially succeeds. On the second line, it detects an assignment from an array to a pointer. Since this is meaningless (but pointer to pointer assignment makes sense), array-to-pointer decay kicks in as usual.

Ranges

An array of type T[n] has n elements, indexed from 0 to n-1; there is no element n. And yet, to support half-open ranges (where the beginning is inclusive and the end is exclusive), C++ allows the computation of a pointer to the (non-existent) n-th element, but it is illegal to dereference that pointer:

   +---+---+---+---+---+---+---+---+....x: |   |   |   |   |   |   |   |   |   .   int[8]   +---+---+---+---+---+---+---+---+....     ^                               ^     |                               |     |                               |     |                               |x+0  |                          x+8  |     int*

For example, if you want to sort an array, both of the following would work equally well:

std::sort(x + 0, x + n);std::sort(&x[0], &x[0] + n);

Note that it is illegal to provide &x[n] as the second argument since this is equivalent to &*(x+n), and the sub-expression *(x+n) technically invokes undefined behavior in C++ (but not in C99).

Also note that you could simply provide x as the first argument. That is a little too terse for my taste, and it also makes template argument deduction a bit harder for the compiler, because in that case the first argument is an array but the second argument is a pointer. (Again, array-to-pointer decay kicks in.)


Programmers often confuse multidimensional arrays with arrays of pointers.

Multidimensional arrays

Most programmers are familiar with named multidimensional arrays, but many are unaware of the fact that multidimensional array can also be created anonymously. Multidimensional arrays are often referred to as "arrays of arrays" or "true multidimensional arrays".

Named multidimensional arrays

When using named multidimensional arrays, all dimensions must be known at compile time:

int H = read_int();int W = read_int();int connect_four[6][7];   // okayint connect_four[H][7];   // ISO C++ forbids variable length arrayint connect_four[6][W];   // ISO C++ forbids variable length arrayint connect_four[H][W];   // ISO C++ forbids variable length array

This is how a named multidimensional array looks like in memory:

              +---+---+---+---+---+---+---+connect_four: |   |   |   |   |   |   |   |              +---+---+---+---+---+---+---+              |   |   |   |   |   |   |   |              +---+---+---+---+---+---+---+              |   |   |   |   |   |   |   |              +---+---+---+---+---+---+---+              |   |   |   |   |   |   |   |              +---+---+---+---+---+---+---+              |   |   |   |   |   |   |   |              +---+---+---+---+---+---+---+              |   |   |   |   |   |   |   |              +---+---+---+---+---+---+---+

Note that 2D grids such as the above are merely helpful visualizations. From the point of view of C++, memory is a "flat" sequence of bytes. The elements of a multidimensional array are stored in row-major order. That is, connect_four[0][6] and connect_four[1][0] are neighbors in memory. In fact, connect_four[0][7] and connect_four[1][0] denote the same element! This means that you can take multi-dimensional arrays and treat them as large, one-dimensional arrays:

int* p = &connect_four[0][0];int* q = p + 42;some_int_sequence_algorithm(p, q);

Anonymous multidimensional arrays

With anonymous multidimensional arrays, all dimensions except the first must be known at compile time:

int (*p)[7] = new int[6][7];   // okayint (*p)[7] = new int[H][7];   // okayint (*p)[W] = new int[6][W];   // ISO C++ forbids variable length arrayint (*p)[W] = new int[H][W];   // ISO C++ forbids variable length array

This is how an anonymous multidimensional array looks like in memory:

              +---+---+---+---+---+---+---+        +---> |   |   |   |   |   |   |   |        |     +---+---+---+---+---+---+---+        |     |   |   |   |   |   |   |   |        |     +---+---+---+---+---+---+---+        |     |   |   |   |   |   |   |   |        |     +---+---+---+---+---+---+---+        |     |   |   |   |   |   |   |   |        |     +---+---+---+---+---+---+---+        |     |   |   |   |   |   |   |   |        |     +---+---+---+---+---+---+---+        |     |   |   |   |   |   |   |   |        |     +---+---+---+---+---+---+---+        |      +-|-+   p: | | |      +---+

Note that the array itself is still allocated as a single block in memory.

Arrays of pointers

You can overcome the restriction of fixed width by introducing another level of indirection.

Named arrays of pointers

Here is a named array of five pointers which are initialized with anonymous arrays of different lengths:

int* triangle[5];for (int i = 0; i < 5; ++i){    triangle[i] = new int[5 - i];}// ...for (int i = 0; i < 5; ++i){    delete[] triangle[i];}

And here is how it looks like in memory:

          +---+---+---+---+---+          |   |   |   |   |   |          +---+---+---+---+---+            ^            | +---+---+---+---+            | |   |   |   |   |            | +---+---+---+---+            |   ^            |   | +---+---+---+            |   | |   |   |   |            |   | +---+---+---+            |   |   ^            |   |   | +---+---+            |   |   | |   |   |            |   |   | +---+---+            |   |   |   ^            |   |   |   | +---+            |   |   |   | |   |            |   |   |   | +---+            |   |   |   |   ^            |   |   |   |   |            |   |   |   |   |          +-|-+-|-+-|-+-|-+-|-+triangle: | | | | | | | | | | |          +---+---+---+---+---+

Since each line is allocated individually now, viewing 2D arrays as 1D arrays does not work anymore.

Anonymous arrays of pointers

Here is an anonymous array of 5 (or any other number of) pointers which are initialized with anonymous arrays of different lengths:

int n = calculate_five();   // or any other numberint** p = new int*[n];for (int i = 0; i < n; ++i){    p[i] = new int[n - i];}// ...for (int i = 0; i < n; ++i){    delete[] p[i];}delete[] p;   // note the extra delete[] !

And here is how it looks like in memory:

          +---+---+---+---+---+          |   |   |   |   |   |          +---+---+---+---+---+            ^            | +---+---+---+---+            | |   |   |   |   |            | +---+---+---+---+            |   ^            |   | +---+---+---+            |   | |   |   |   |            |   | +---+---+---+            |   |   ^            |   |   | +---+---+            |   |   | |   |   |            |   |   | +---+---+            |   |   |   ^            |   |   |   | +---+            |   |   |   | |   |            |   |   |   | +---+            |   |   |   |   ^            |   |   |   |   |            |   |   |   |   |          +-|-+-|-+-|-+-|-+-|-+          | | | | | | | | | | |          +---+---+---+---+---+            ^            |            |          +-|-+       p: | | |          +---+

Conversions

Array-to-pointer decay naturally extends to arrays of arrays and arrays of pointers:

int array_of_arrays[6][7];int (*pointer_to_array)[7] = array_of_arrays;int* array_of_pointers[6];int** pointer_to_pointer = array_of_pointers;

However, there is no implicit conversion from T[h][w] to T**. If such an implicit conversion did exist, the result would be a pointer to the first element of an array of h pointers to T (each pointing to the first element of a line in the original 2D array), but that pointer array does not exist anywhere in memory yet. If you want such a conversion, you must create and fill the required pointer array manually:

int connect_four[6][7];int** p = new int*[6];for (int i = 0; i < 6; ++i){    p[i] = connect_four[i];}// ...delete[] p;

Note that this generates a view of the original multidimensional array. If you need a copy instead, you must create extra arrays and copy the data yourself:

int connect_four[6][7];int** p = new int*[6];for (int i = 0; i < 6; ++i){    p[i] = new int[7];    std::copy(connect_four[i], connect_four[i + 1], p[i]);}// ...for (int i = 0; i < 6; ++i){    delete[] p[i];}delete[] p;


Assignment

For no particular reason, arrays cannot be assigned to one another. Use std::copy instead:

#include <algorithm>// ...int a[8] = {2, 3, 5, 7, 11, 13, 17, 19};int b[8];std::copy(a + 0, a + 8, b);

This is more flexible than what true array assignment could provide because it is possible to copy slices of larger arrays into smaller arrays.std::copy is usually specialized for primitive types to give maximum performance. It is unlikely that std::memcpy performs better. If in doubt, measure.

Although you cannot assign arrays directly, you can assign structs and classes which contain array members. That is because array members are copied memberwise by the assignment operator which is provided as a default by the compiler. If you define the assignment operator manually for your own struct or class types, you must fall back to manual copying for the array members.

Parameter passing

Arrays cannot be passed by value. You can either pass them by pointer or by reference.

Pass by pointer

Since arrays themselves cannot be passed by value, usually a pointer to their first element is passed by value instead. This is often called "pass by pointer". Since the size of the array is not retrievable via that pointer, you have to pass a second parameter indicating the size of the array (the classic C solution) or a second pointer pointing after the last element of the array (the C++ iterator solution):

#include <numeric>#include <cstddef>int sum(const int* p, std::size_t n){    return std::accumulate(p, p + n, 0);}int sum(const int* p, const int* q){    return std::accumulate(p, q, 0);}

As a syntactic alternative, you can also declare parameters as T p[], and it means the exact same thing as T* p in the context of parameter lists only:

int sum(const int p[], std::size_t n){    return std::accumulate(p, p + n, 0);}

You can think of the compiler as rewriting T p[] to T *p in the context of parameter lists only. This special rule is partly responsible for the whole confusion about arrays and pointers. In every other context, declaring something as an array or as a pointer makes a huge difference.

Unfortunately, you can also provide a size in an array parameter which is silently ignored by the compiler. That is, the following three signatures are exactly equivalent, as indicated by the compiler errors:

int sum(const int* p, std::size_t n)// error: redefinition of 'int sum(const int*, size_t)'int sum(const int p[], std::size_t n)// error: redefinition of 'int sum(const int*, size_t)'int sum(const int p[8], std::size_t n)   // the 8 has no meaning here

Pass by reference

Arrays can also be passed by reference:

int sum(const int (&a)[8]){    return std::accumulate(a + 0, a + 8, 0);}

In this case, the array size is significant. Since writing a function that only accepts arrays of exactly 8 elements is of little use, programmers usually write such functions as templates:

template <std::size_t n>int sum(const int (&a)[n]){    return std::accumulate(a + 0, a + n, 0);}

Note that you can only call such a function template with an actual array of integers, not with a pointer to an integer. The size of the array is automatically inferred, and for every size n, a different function is instantiated from the template. You can also write quite useful function templates that abstract from both the element type and from the size.