How to split a delimited string into an array in awk?


Have you tried:

echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'


To split a string to an array in awk we use the function split():

awk '{split($0, a, ":")}'
#           ^^  ^  ^^^
#            |  |   |
#       string  |   delimiter
#               |
#               array to store the pieces

If no separator is given, split() uses FS, which defaults to a single space:

$ awk '{split($0, a); print a[2]}' <<< "a:b c:d e"
c:d
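As a small sketch of what the default means in practice: with FS left at a single space, split() treats any run of blanks and tabs as one separator and ignores leading and trailing whitespace, just as awk does when it splits a record into fields. The input string below is made up for illustration.

```shell
# Default FS: runs of spaces/tabs act as one separator, and
# leading/trailing whitespace does not create empty elements.
out=$(printf '  a:b   c:d\te  \n' |
      awk '{ n = split($0, a); print n, a[1], a[3] }')
echo "$out"   # element count, first piece, last piece
```

Here the three pieces are "a:b", "c:d" and "e", so the colons stay inside the pieces because they are not the separator.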

We can give a separator explicitly, for example ":":

$ awk '{split($0, a, ":"); print a[2]}' <<< "a:b c:d e"
b c

Which is equivalent to setting it through the FS:

$ awk -F: '{split($0, a); print a[2]}' <<< "a:b c:d e"
b c

In gawk you can also provide the separator as a regexp:

$ awk '{split($0, a, ":*"); print a[2]}' <<< "a:::b c::d e" # note multiple :
b c

And even see what the delimiter was on every step by using its fourth parameter:

$ awk '{split($0, a, ":*", sep); print a[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::

Let's quote the man page of GNU awk:

split(string, array [, fieldsep [, seps ] ])

Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
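A minimal sketch of how the return value is typically used: since split() returns the number of pieces and stores them at indices 1..n, a counted loop visits them in order. The example also shows one point not spelled out in the quoted text above but specified by POSIX: the array is emptied before the split, so stale elements do not survive. The index 7 and the "stale" value are arbitrary, chosen only for the demonstration.

```shell
out=$(awk 'BEGIN {
    a[7] = "stale"                      # leftover element from an earlier use
    n = split("12|23|11", a, "|")       # n is the number of pieces
    for (i = 1; i <= n; i++)            # pieces live at a[1] .. a[n]
        printf "%d=%s ", i, a[i]
    state = (7 in a) ? "kept" : "gone"  # was the old element cleared?
    print "stale " state
}')
echo "$out"
```

Looping from 1 to n, rather than with for (i in a), keeps the pieces in their original order, which awk does not guarantee for the in form.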


Please be more specific! What do you mean by "it doesn't work"? Post the exact output (or error message), your OS, and your awk version:

% awk -F\| '{
  for (i = 0; ++i <= NF;)
    print i, $i
  }' <<<'12|23|11'
1 12
2 23
3 11

Or, using split:

% awk '{
  n = split($0, t, "|")
  for (i = 0; ++i <= n;)
    print i, t[i]
  }' <<<'12|23|11'
1 12
2 23
3 11

Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.
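One way to check whether a given awk copes with that many fields, as a rough sketch: build a synthetic 4000-element string inside awk, split it, and look at the count and the last element. The string is generated only for this test. On Solaris you would presumably run it with /usr/xpg4/bin/awk in place of awk, as noted above.

```shell
out=$(awk 'BEGIN {
    s = "1"
    for (i = 2; i <= 4000; i++)   # build "1|2|...|4000"
        s = s "|" i
    n = split(s, t, "|")
    print n, t[1], t[4000]        # count, first piece, last piece
}')
echo "$out"
```

An awk that handles the input correctly reports 4000 pieces with 1 and 4000 at the two ends. A lower count, an empty last piece or an error message would point to a field limit.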