Should I always use size_t when indexing arrays?
size_t is an unsigned integer type capable of holding the size of the largest object you can allocate. That makes it useful for indexing, because it can index into the largest array you can allocate.
This does not mean it is required or even necessarily recommended for indexing. You can use any integer type that is large enough to index the array. int_fast32_t might be faster, uint_least16_t might be smaller in a structure, and so on. Know your data, and you can make a good choice.
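As a rough illustration of the "smaller in a structure" point, the sketch below stores an index in two hypothetical structs, one using size_t and one using uint_least16_t. The names WideRef, NarrowRef and lookup are made up for this example, and the actual sizes depend on the platform and its padding rules.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record holding an index into some table.
struct WideRef {
    std::size_t index;          // can index any array; commonly 8 bytes on 64-bit targets
    std::uint_least16_t tag;
};

// Same record, assuming the table never has more than 65535 entries.
struct NarrowRef {
    std::uint_least16_t index;  // at least 16 bits wide
    std::uint_least16_t tag;
};

// Any integer type large enough for the array can serve as the index.
inline int lookup(const int *table, std::uint_least16_t idx) {
    return table[idx];
}
```

On a typical 64-bit platform WideRef is likely to occupy 16 bytes and NarrowRef 4, which can matter when many such records are stored. The narrow version is only safe if the table size really is bounded as assumed.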
One thing to consider is that on some platforms, using a signed index might require an extra sign-extension instruction. As an example, here is x86-64:
    // movzx eax, BYTE PTR [rcx+rdx]
    // ret
    char get_index(char *ptr, unsigned idx)
    {
        return ptr[idx];
    }
    // ; sign extending idx from 32 bits to 64 bits with movsx here.
    // movsx rdx, edx
    // movzx eax, BYTE PTR [rcx+rdx]
    // ret
    char get_index(char *ptr, int idx)
    {
        return ptr[idx];
    }
Virtual memory is outside the scope of C or C++. From their point of view, you simply index into memory and it's up to your platform to make it work. In practice your app only uses virtual addresses; your CPU/OS is translating the virtual address to a physical address behind the scenes. It is not something you need to worry about.
To avoid program failures, always use an index type that is at least as large as the type returned by the size() method. This ensures that the index can represent every possible size of the array. Array implementations usually make sure that the runtime size never overflows the type returned by size(). This means the index type should be:
- size_t in case of char[N], uint8_t[N], int[N], etc.
- size_t in case of std::vector and std::list
- int in case of QList and QVector (in Qt 5; in Qt 6, size() returns qsizetype)
- an arbitrary precision integer (aint) in case of bitarrays (if the bitarray's size() method returns an aint)
- aint in case of arrays compressed in memory (if the array's size() method returns an aint)
- aint in case of arrays spanning multiple machines (if the array's size() method returns an aint)
- Languages other than C++:
  - int in case of java.util.Collection and its subclasses
In summary: A safe index type is the type returned by the size() method.
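A minimal sketch of that rule in C++: the loop index takes the container's own size_type, so it can represent every value size() can return. The function names sum_by_index and sum_array are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Uses the vector's own size type as the index type.
inline long long sum_by_index(const std::vector<int> &v) {
    long long total = 0;
    for (std::vector<int>::size_type i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// For a built-in array the element count is a size_t, so size_t is the matching index type.
template <std::size_t N>
long long sum_array(const int (&a)[N]) {
    long long total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += a[i];
    }
    return total;
}
```

For other containers, decltype(c.size()) gives the same type without having to spell it out.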
Note: If the size() method returns the unsigned size_t, then the signed int and ssize_t aren't safe index types. With gcc and clang, the compiler flags -Wsign-compare (enabled by -Wall for C++ in gcc, and by -Wextra otherwise) and -Wconversion can be used to catch most of these cases.
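To show why mixing a signed index with an unsigned size is risky, here is a small sketch. The helper names less_naive and less_checked are made up for this example; less_naive spells out with an explicit cast the conversion that a plain `i < n` comparison would perform implicitly.

```cpp
#include <cstddef>

// Mirrors what `i < n` does when i is int and n is size_t:
// the signed value is converted to unsigned before comparing.
inline bool less_naive(int i, std::size_t n) {
    return static_cast<std::size_t>(i) < n;
}

// Checks the sign first, so a negative value always compares as smaller.
inline bool less_checked(int i, std::size_t n) {
    return i < 0 || static_cast<std::size_t>(i) < n;
}
```

With i equal to -1 and n equal to 3, less_naive returns false because -1 converts to a very large unsigned value, while less_checked returns true. The implicit form of this comparison is the kind of code -Wsign-compare is meant to flag.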