Why use _mm_malloc? (as opposed to _aligned_malloc, aligned_alloc, or posix_memalign)
Intel compilers support both POSIX (Linux) and non-POSIX (Windows) operating systems, and hence cannot rely on either the POSIX or the Windows function. Thus, a compiler-specific but OS-agnostic solution was chosen.
C11's aligned_alloc is a great solution, but Microsoft doesn't even support C99 yet, so who knows if they will ever support C11.
Update: Unlike the C11 and POSIX allocation functions, whose memory is released with plain free(), the ICC intrinsics include a dedicated deallocation function (as does Windows, with _aligned_free). This allows the API to use a heap manager separate from the default one. I don't know if or when it actually does that, but it can be useful to support this model.
Disclaimer: I work for Intel but have no special knowledge of these decisions, which happened long before I joined the company.
It's possible to take an existing C compiler which does not presently happen to use the identifiers _mm_malloc and _mm_free and define functions with those names which behave as required. This could be done either by having _mm_malloc function as a wrapper on malloc() which asks for a slightly oversized allocation, constructs a pointer to the first suitably aligned address within it that's at least one byte from the beginning, and stores the number of bytes skipped immediately before that address, or by having _mm_malloc request large chunks of memory from malloc() and then dispense them piecemeal. In either case, the pointers returned by _mm_malloc() would not be pointers that free() would generally know how to do anything with. In the first design, _mm_free would use the byte immediately preceding the allocation as an aid to finding the real start of the allocation received from malloc, and then pass that to free.
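The first approach described above (over-allocate, align, and record the number of skipped bytes) can be sketched roughly as follows. This is only an illustration, not how any real _mm_malloc is implemented: the names sketch_aligned_malloc and sketch_aligned_free are hypothetical, and the sketch assumes the alignment is a power of two no larger than 128 so that the skipped-byte count fits in a single byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper over malloc(). Assumption: align is a power of two
   and at most 128, so the skipped-byte count (1..align) fits in one byte. */
static void *sketch_aligned_malloc(size_t size, size_t align)
{
    if (align == 0 || align > 128 || (align & (align - 1)) != 0)
        return NULL;                       /* unsupported alignment */
    if (size > SIZE_MAX - align)
        return NULL;                       /* size + align would overflow */

    unsigned char *raw = malloc(size + align);   /* slightly oversized */
    if (raw == NULL)
        return NULL;

    /* First aligned address that is at least one byte past raw. */
    uintptr_t addr = ((uintptr_t)raw + align) & ~(uintptr_t)(align - 1);
    unsigned char *user = raw + (addr - (uintptr_t)raw);

    /* Record how many bytes were skipped in the byte just before user. */
    user[-1] = (unsigned char)(user - raw);
    return user;
}

/* Recovers the pointer malloc() originally returned and frees that. */
static void sketch_aligned_free(void *ptr)
{
    if (ptr == NULL)
        return;
    unsigned char *user = ptr;
    unsigned char skipped = user[-1];
    assert(skipped != 0);                  /* always at least one byte */
    free(user - skipped);
}
```

Passing a pointer from sketch_aligned_malloc directly to free() would hand free() an address it never issued, which is why a matching deallocation function is needed in this design.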
If an aligned-allocate function is allowed to use the internals of the malloc and free functions, however, that may eliminate the need for the extra layer of wrapping. It's possible to write _mm_malloc()/_mm_free() functions which wrap malloc/free without knowing anything about their internals, but doing so requires that _mm_malloc() keep bookkeeping information separate from that used by malloc/free.
If the author of an aligned-allocate function knows how malloc and free are implemented, it will often be possible to coordinate the design of all the allocation/free functions so that free can distinguish all kinds of allocations and handle them appropriately. No single aligned-allocate implementation would be usable on all malloc/free implementations, however.
I would suggest that the most portable way to write code would probably be to select a couple of symbols that are not used anywhere else for your own allocate and free functions, so that you could then say, e.g.
    #define a_alloc(align,sz) _mm_malloc((sz),(align))
    #define a_free(ptr) _mm_free((ptr))
on compilers that support that, or
    static inline void *aa_alloc(size_t align, size_t size)
    {
      void *ret = 0;
      /* posix_memalign returns nonzero on failure; report that as a null pointer */
      if (posix_memalign(&ret, align, size) != 0)
        ret = 0;
      return ret;
    }
    #define a_alloc(align,sz) aa_alloc((align),(sz))
    #define a_free(ptr) free((ptr))
on POSIX systems, etc. For every system it should be possible to define macros or functions that will yield the necessary behavior. [I think it's probably better to use macros consistently than to sometimes use macros and sometimes functions, so as to allow #if defined macroname to test whether things are defined yet.]
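As a rough illustration of how such a selection might look in one place, here is a sketch that picks an implementation with the preprocessor and then uses only a_alloc and a_free. The Windows branch (_aligned_malloc/_aligned_free from <malloc.h>) is my addition for illustration, and the helper demo_alloc_is_aligned is a hypothetical name; treat the whole thing as a starting point under those assumptions rather than a tested portability layer.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L   /* ask for the posix_memalign declaration */
#endif

#include <stdint.h>
#include <stdlib.h>

#if defined(_WIN32)
/* Assumed Windows branch: note that _aligned_malloc takes (size, alignment). */
#include <malloc.h>
#define a_alloc(align, sz) _aligned_malloc((sz), (align))
#define a_free(ptr)        _aligned_free((ptr))
#else
/* POSIX branch: memory from posix_memalign is released with plain free(). */
static inline void *aa_alloc(size_t align, size_t size)
{
    void *ret = NULL;
    if (posix_memalign(&ret, align, size) != 0)
        ret = NULL;               /* nonzero return means failure */
    return ret;
}
#define a_alloc(align, sz) aa_alloc((align), (sz))
#define a_free(ptr)        free((ptr))
#endif

/* Hypothetical helper: allocates n bytes at the given alignment, checks the
   address, releases the block, and returns 1 if it was suitably aligned. */
static int demo_alloc_is_aligned(size_t align, size_t n)
{
    void *p = a_alloc(align, n);
    if (p == NULL)
        return 0;
    int ok = ((uintptr_t)p % align) == 0;
    a_free(p);
    return ok;
}
```

Because the calling code names only a_alloc and a_free, each allocation is always paired with the deallocation function that matches it, whichever branch was selected.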
_mm_malloc seems to have been created before there was a standard aligned_alloc function, and the need to use _mm_free is a quirk of the implementation.
My guess is that, unlike when using posix_memalign, it doesn't need to over-allocate in order to guarantee alignment; instead, it uses a separate alignment-aware allocator. This would save memory when allocating types whose alignment differs from the default alignment (typically 8 or 16 bytes).