Get the right columns when input contains consecutive tabs using read in shell


You're correct in that a sequence of IFS characters can be counted as a single delimiter, namely when they're whitespace, or a non-whitespace character surrounded by whitespace (from the Bash manual – emphasis mine):

Word Splitting

[...]

Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
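To make the quoted rule concrete, here is a minimal sketch that splits the same kind of line once on tab (IFS whitespace) and once on a comma (not IFS whitespace). The helper name `split_on` and the sample strings are made up for illustration.

```shell
#!/usr/bin/env bash
# split_on DELIM LINE: hypothetical helper that splits LINE into three fields with read.
split_on() {
    local a b c
    # The IFS assignment applies to this read call only.
    IFS=$1 read -r a b c <<< "$2"
    printf '[%s] [%s] [%s]\n' "$a" "$b" "$c"
}

# Tab is IFS whitespace, so the two adjacent tabs act as one delimiter.
split_on $'\t' $'column1\t\tcolumn3'   # prints: [column1] [column3] []

# A comma is not IFS whitespace, so each comma delimits a field.
split_on ',' 'column1,,column3'        # prints: [column1] [] [column3]
```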

One way I can think of to deal with this is preprocessing to insert a space between any two consecutive tab characters.

Without space:

while IFS=$'\t' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < input_file

Output:

1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[column3] 3:[]

Space added with sed:

while IFS=$'\t' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < <(sed 's/\t\t/\t \t/g' input_file)

Output:

1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[ ] 3:[column3]

This works if you're okay with having the space instead of the empty string in c2 for the second line.
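One caveat: a single pass of `s/\t\t/\t \t/g` does not revisit a tab it has just consumed, so a run of three or more tabs still leaves two tabs adjacent. A possible variation, sketched here with a hypothetical filter named `pad_empty_fields` and made-up four-column input, loops the substitution until no adjacent tabs remain. It passes literal tabs from the shell, so it does not depend on sed understanding `\t`.

```shell
#!/usr/bin/env bash
# pad_empty_fields: hypothetical filter that puts a space between every pair of adjacent tabs.
pad_empty_fields() {
    local tab=$'\t'
    # Label "a", one substitution, then branch back to "a" while a substitution succeeded.
    sed -e ':a' -e "s/${tab}${tab}/${tab} ${tab}/" -e 'ta'
}

# Sample line with two empty fields in a row (three consecutive tabs).
printf 'column1\t\t\tcolumn4\n' | pad_empty_fields |
while IFS=$'\t' read -r c1 c2 c3 c4; do
    printf '1:[%s] 2:[%s] 3:[%s] 4:[%s]\n' "$c1" "$c2" "$c3" "$c4"
done
# prints: 1:[column1] 2:[ ] 3:[ ] 4:[column4]
```

A line that starts with a tab still loses its empty first field, because read strips leading IFS whitespace before splitting.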

Another option is to use a non-whitespace character in your IFS, as those (see manual snippet above) are not squeezed when delimiting fields:

while IFS='~' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < <(tr $'\t' '~' < input_file)

Output:

1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[] 3:[column3]

Now, c2 in the second line is the empty string, but the downside is that we have to find a character for IFS that doesn't appear in our file.
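One way to reduce the risk of a clash is to pick a control character that rarely occurs in text files and to check for it first. The sketch below assumes the ASCII unit separator (octal 037) is absent from the data; the helper name `read_tsv` is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the data never contains the ASCII unit separator (octal 037).
sep=$'\037'

# read_tsv FILE: hypothetical helper that prints three tab-separated columns per line.
read_tsv() {
    # Refuse to run if the chosen delimiter already occurs in the file.
    if grep -qF -- "$sep" "$1"; then
        echo "delimiter already present in $1" >&2
        return 1
    fi
    # The unit separator is not IFS whitespace, so empty fields survive.
    tr '\t' '\037' < "$1" |
    while IFS=$sep read -r c1 c2 c3; do
        printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
    done
}
```

Calling `read_tsv input_file` on the two-line sample should give the same output as the `~` version.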

Note that process substitution (<(...)) requires Bash, but the IFS-related points apply to the POSIX shell as well; see the specification.
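For a shell without process substitution, a plain pipeline is one possible substitute. This is a sketch only: the function name `print_columns` and the sample line are made up, and `~` is assumed not to occur in the data.

```shell
# print_columns: hypothetical helper; reads tab-separated lines on stdin
# and uses no Bash-only syntax.
print_columns() {
    tr '\t' '~' |
    while IFS='~' read -r c1 c2 c3; do
        printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
    done
}

printf 'column1\t\tcolumn3\n' | print_columns
# prints: 1:[column1] 2:[] 3:[column3]
```

The trade-off is that in most shells the loop on the right of the pipe runs in a subshell, so variables assigned inside it are not visible after the loop ends.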


You can use awk:

awk '{
    if(NF == 4)
        print $3
    else
        print ""
}' text.txt

Output:

column2
#empty line
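The snippet above counts fields with awk's default splitting, so it depends on the exact layout of the input. An alternative sketch sets the field separator to a single tab, in which case awk treats every tab as a delimiter and keeps empty fields. The helper name `second_column` and the two sample lines are assumptions for illustration.

```shell
# second_column: hypothetical helper that prints column 2 of tab-separated stdin.
second_column() {
    # With a single-character separator other than a space, awk does not merge repeats.
    awk -F'\t' '{ print $2 }'
}

printf 'column1\tcolumn2\tcolumn3\ncolumn1\t\tcolumn3\n' | second_column
# prints "column2", then an empty line
```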


If consecutive delimiters in the whitespace class are consolidated, another option is to convert them to a non-whitespace delimiter before using Bash's read. For example, convert each tab to a pipe:

#!/bin/bash
IFS=$'\n'
for txt_record in $(tr "\t" "|" < your_tab_delim_file.txt)
do
  IFS=$'|' read -r field1 field2 field3 <<< "$txt_record"
  echo "[$field1] [$field2] [$field3]"
done
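A possible variation on the same idea reads the file line by line and converts the tabs with parameter expansion, so no `tr` call is needed. The unquoted `$(...)` in a for loop is subject to pathname expansion and drops blank lines, while a while-read loop avoids both. The function name `print_fields` is hypothetical, and the sketch still assumes the data contains no pipe characters.

```shell
#!/bin/bash
# print_fields FILE: hypothetical helper; converts tabs to pipes one line at a time.
print_fields() {
    local tab=$'\t' line converted field1 field2 field3
    # IFS= keeps each line intact, including leading and repeated tabs.
    while IFS= read -r line; do
        converted=${line//"$tab"/|}
        IFS='|' read -r field1 field2 field3 <<< "$converted"
        echo "[$field1] [$field2] [$field3]"
    done < "$1"
}
```

Usage would be `print_fields your_tab_delim_file.txt`. Unlike the for loop, a blank input line is kept and printed as `[] [] []`.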