Get the right columns when input contains consecutive tabs using read in shell
You're correct that a sequence of IFS characters can be treated as a single delimiter, namely when they are all whitespace, or when a non-whitespace character is surrounded by whitespace (quoting the Bash manual):
Word Splitting
[...]
Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
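The three cases in that rule can be tried on in-memory strings. The following is only a small sketch: `show_fields` is a hypothetical helper name, and it is written without Bash-only syntax so it should behave the same in any POSIX shell.

```shell
# Hypothetical helper: read one line from stdin, split it on the
# delimiter characters given in "$1", and print the first three fields.
show_fields() {
    IFS=$1 read -r f1 f2 f3
    printf '[%s] [%s] [%s]\n' "$f1" "$f2" "$f3"
}

tab=$(printf '\t')

# Tab is IFS whitespace: two tabs in a row form one delimiter.
printf 'a\t\tc\n' | show_fields "$tab"    # prints [a] [c] []

# Comma is not IFS whitespace: every comma delimits, so the empty field stays.
printf 'a,,c\n' | show_fields ','         # prints [a] [] [c]

# A comma with adjacent spaces (both in IFS) still counts as one delimiter.
printf 'a , c\n' | show_fields ' ,'       # prints [a] [c] []
```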
One way I can think of to deal with this is preprocessing to insert a space between any two consecutive tab characters.
Without space:
while IFS=$'\t' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < input_file
Output:
1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[column3] 3:[]
Space added with sed:
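# The :a ... ta loop repeats the substitution so that runs of three or more
# tabs are padded too (a single s///g pass misses overlapping pairs).
# Note that \t inside a sed expression is a GNU extension.
while IFS=$'\t' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < <(sed -e ':a' -e 's/\t\t/\t \t/' -e 'ta' input_file)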
Output:
1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[ ] 3:[column3]
This works if you're okay with having a space instead of the empty string in c2 for the second line.
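If the placeholder space is not acceptable, it can be stripped again after read has split the line. The sketch below assumes that no genuine column consists of exactly one space; `print_padded_columns` is a hypothetical name, and the tab is passed to sed literally so the script does not depend on GNU sed's `\t`.

```shell
# Hypothetical helper: read tab-separated lines from stdin and print three
# columns, treating a padded (single-space) column as empty.
print_padded_columns() {
    tab=$(printf '\t')
    # Pad every pair of adjacent tabs; the :a ... ta loop repeats the
    # substitution until no adjacent pair is left on the line.
    sed -e ':a' -e "s/$tab$tab/$tab $tab/" -e 'ta' |
    while IFS=$tab read -r c1 c2 c3; do
        # A column holding exactly one space is the placeholder: empty it.
        # Assumption: real data never contains a column that is a lone space.
        [ "$c1" = ' ' ] && c1=''
        [ "$c2" = ' ' ] && c2=''
        [ "$c3" = ' ' ] && c3=''
        printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
    done
}
```

It would be called as `print_padded_columns < input_file`. An empty first column is still lost, because leading IFS whitespace is stripped before splitting.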
Another option is to use a non-whitespace character in your IFS, as those (see the manual snippet above) are not squeezed when delimiting fields:
while IFS='~' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < <(tr $'\t' '~' < input_file)
Output:
1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[] 3:[column3]
Now, c2 in the second line is the empty string, but the downside is that we have to find a character for IFS that doesn't appear in our file.
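One way to reduce the risk of a collision is to pick a control character that text files rarely contain and to check for it before converting. The sketch below uses the ASCII unit separator (octal 037) as the stand-in; both that choice and the name `print_columns` are assumptions made for illustration.

```shell
# Hypothetical helper: print the three tab-separated columns of each line
# of the file "$1", keeping empty columns.
print_columns() {
    sep=$(printf '\037')    # ASCII unit separator, not IFS whitespace
    # Refuse to continue if the stand-in delimiter already occurs in the data.
    if grep -q "$sep" "$1"; then
        echo "stand-in delimiter already occurs in $1" >&2
        return 1
    fi
    tr '\t' '\037' < "$1" |
    while IFS=$sep read -r c1 c2 c3; do
        printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
    done
}
```

Calling `print_columns input_file` should then give the same output as the `~` version, without having to guess whether `~` appears in the data.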
Note that the process substitution (<(...)) requires Bash, but the IFS-related points apply to the POSIX shell as well; see the specification.
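Where process substitution is not available, a here-document wrapped around a command substitution is one possible replacement that also keeps the loop in the current shell, so variables set inside it are still visible afterwards. This is a sketch only: `count_empty_c2` is a hypothetical helper, and it assumes `~` does not occur in the file.

```shell
# Hypothetical helper: count the lines of file "$1" that have a non-empty
# first column and an empty second column.
count_empty_c2() {
    count=0
    while IFS='~' read -r c1 c2 c3; do
        if [ -n "$c1" ] && [ -z "$c2" ]; then
            count=$((count + 1))
        fi
    done <<EOF
$(tr '\t' '~' < "$1")
EOF
    # The loop ran in the current shell, so count kept its value.
    echo "$count"
}
```

A plain pipe (`tr ... | while ...`) is also POSIX, but in Bash and dash the loop then runs in a subshell and `count` would still be 0 after it ends.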
If consecutive delimiters in the whitespace class are merged, another option is to convert the delimiters to a non-whitespace character before using Bash's read. For example, convert the tabs to pipes:
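#!/bin/bash
# '|' is not IFS whitespace, so consecutive delimiters are not merged.
# Reading line by line avoids globbing and the global IFS change that
# looping over $(...) would need.
tr '\t' '|' < your_tab_delim_file.txt |
while IFS='|' read -r field1 field2 field3; do
    echo "[$field1] [$field2] [$field3]"
done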