How does memchr() work under the hood? How does memchr() work under the hood? c c

How does memchr() work under the hood?


I would suggest taking a look at GNU libc's source. As for most functions, it will contain both a generic optimized C version of the function, and optimized assembly language versions for as many supported architectures as possible, taking advantage of machine specific tricks.

The x86-64 SSE2 version combines the results from pcmpeqb on a whole cache-line of data at once (four 16B vectors), to amortize the overhead of the early-exit pmovmskb/test/jcc.

gcc and clang are currently incapable of auto-vectorizing loops with if() break early-exit conditions, so they make naive byte-at-a-time asm from the obvious C implementation.


This implementation of memchr from newlib is one example of someone's optimizing memchr:it's reading and testing 4 bytes at a time (apart from memchr, other functions in the newlib library are here).

Incidentally, most of the the source code for the MSVC run-time library is available, as an optional part of the MSVC installation (so, you could look at that).


Here is FreeBSD's (BSD-licensed) memchr() from memchr.c. FreeBSD's online source code browser is a good reference for time-tested, BSD-licensed code examples.

void *memchr(s, c, n)    const void *s;    unsigned char c;    size_t n;{    if (n != 0) {        const unsigned char *p = s;        do {            if (*p++ == c)                return ((void *)(p - 1));        } while (--n != 0);    }    return (NULL);}