Split string with delimiters in C
You can use the strtok()
function to split a string (and specify the delimiter to use). Note that strtok()
will modify the string passed into it. If the original string is required elsewhere make a copy of it and pass the copy to strtok()
.
EDIT:
Example (note it does not handle consecutive delimiters, "JAN,,,FEB,MAR" for example):
#include <stdio.h>#include <stdlib.h>#include <string.h>#include <assert.h>char** str_split(char* a_str, const char a_delim){ char** result = 0; size_t count = 0; char* tmp = a_str; char* last_comma = 0; char delim[2]; delim[0] = a_delim; delim[1] = 0; /* Count how many elements will be extracted. */ while (*tmp) { if (a_delim == *tmp) { count++; last_comma = tmp; } tmp++; } /* Add space for trailing token. */ count += last_comma < (a_str + strlen(a_str) - 1); /* Add space for terminating null string so caller knows where the list of returned strings ends. */ count++; result = malloc(sizeof(char*) * count); if (result) { size_t idx = 0; char* token = strtok(a_str, delim); while (token) { assert(idx < count); *(result + idx++) = strdup(token); token = strtok(0, delim); } assert(idx == count - 1); *(result + idx) = 0; } return result;}int main(){ char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC"; char** tokens; printf("months=[%s]\n\n", months); tokens = str_split(months, ','); if (tokens) { int i; for (i = 0; *(tokens + i); i++) { printf("month=[%s]\n", *(tokens + i)); free(*(tokens + i)); } printf("\n"); free(tokens); } return 0;}
Output:
$ ./main.exemonths=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]month=[JAN]month=[FEB]month=[MAR]month=[APR]month=[MAY]month=[JUN]month=[JUL]month=[AUG]month=[SEP]month=[OCT]month=[NOV]month=[DEC]
I think strsep
is still the best tool for this:
while ((token = strsep(&str, ","))) my_fn(token);
That is literally one line that splits a string.
The extra parentheses are a stylistic element to indicate that we're intentionally testing the result of an assignment, not an equality operator ==
.
For that pattern to work, token
and str
both have type char *
. If you started with a string literal, then you'd want to make a copy of it first:
// More general pattern:const char *my_str_literal = "JAN,FEB,MAR";char *token, *str, *tofree;tofree = str = strdup(my_str_literal); // We own str's memory now.while ((token = strsep(&str, ","))) my_fn(token);free(tofree);
If two delimiters appear together in str
, you'll get a token
value that's the empty string. The value of str
is modified in that each delimiter encountered is overwritten with a zero byte - another good reason to copy the string being parsed first.
In a comment, someone suggested that strtok
is better than strsep
because strtok
is more portable. Ubuntu and Mac OS X have strsep
; it's safe to guess that other unixy systems do as well. Windows lacks strsep
, but it has strbrk
which enables this short and sweet strsep
replacement:
char *strsep(char **stringp, const char *delim) { if (*stringp == NULL) { return NULL; } char *token_start = *stringp; *stringp = strpbrk(token_start, delim); if (*stringp) { **stringp = '\0'; (*stringp)++; } return token_start;}
Here is a good explanation of strsep
vs strtok
. The pros and cons may be judged subjectively; however, I think it's a telling sign that strsep
was designed as a replacement for strtok
.