Get the right columns when input contains consecutive tabs using read in shell

shell

You're correct in that a sequence of IFS characters can be counted as a single delimiter, namely when they're whitespace, or a non-whitespace character surrounded by whitespace (from the Bash manual – emphasis mine):

Word Splitting
[...]
Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
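To make the quoted rule concrete, here is a minimal sketch that splits the same kind of line once on a tab (IFS whitespace) and once on a tilde (not IFS whitespace). The helper name show_fields is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: split one line on the given IFS and show three fields.
show_fields() {
    local ifs=$1 line=$2 c1 c2 c3
    IFS=$ifs read -r c1 c2 c3 <<< "$line"
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
}

# Tab is IFS whitespace: the two adjacent tabs act as a single delimiter.
show_fields $'\t' $'column1\t\tcolumn3'   # 1:[column1] 2:[column3] 3:[]

# Tilde is not IFS whitespace: each one delimits a field, so the empty
# field in the middle survives.
show_fields '~' 'column1~~column3'        # 1:[column1] 2:[] 3:[column3]
```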

One way I can think of to deal with this is preprocessing to insert a space between any two consecutive tab characters.

Without space:

while IFS=$'\t' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < input_file

Output:

1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[column3] 3:[]

Space added with sed:

while IFS=$'\t' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < <(sed 's/\t\t/\t \t/g' input_file)

Output:

1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[ ] 3:[column3]

This works if you're okay with having the space instead of the empty string in c2 for the second line.
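One caveat: a single pass of the sed substitution only handles non-overlapping pairs, so a run of three or more tabs would still leave two tabs next to each other. A possible sketch that repeats the substitution until no pair is left follows. The helper name pad_empty_fields is hypothetical, and the tab is passed in through a shell variable so the script does not depend on how a particular sed treats \t.

```shell
#!/bin/bash
# Hypothetical helper: put a space between every pair of adjacent tabs.
# The ':a ... ta' loop re-runs the substitution until nothing changes,
# which covers runs of three or more tabs as well.
pad_empty_fields() {
    local tab=$'\t'
    sed -e ':a' -e "s/${tab}${tab}/${tab} ${tab}/" -e 'ta'
}

printf 'column1\t\t\tcolumn4\n' | pad_empty_fields |
while IFS=$'\t' read -r c1 c2 c3 c4; do
    # Empty fields show up as a single space, as in the example above.
    printf '1:[%s] 2:[%s] 3:[%s] 4:[%s]\n' "$c1" "$c2" "$c3" "$c4"
done
# 1:[column1] 2:[ ] 3:[ ] 4:[column4]
```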

Another option is to use a non-whitespace character in your IFS, as those (see manual snippet above) are not squeezed when delimiting fields:

while IFS='~' read -r c1 c2 c3; do
    printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
done < <(tr $'\t' '~' < input_file)

Output:

1:[column1] 2:[column2] 3:[column3]
1:[column1] 2:[] 3:[column3]

Now, c2 in the second line is the empty string, but the downside is that we have to find a character for IFS that doesn't appear in our file.
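If no printable character is safe, one candidate is a control character that text files rarely contain. The sketch below assumes the ASCII unit separator (octal 037) never occurs in the data; that is an assumption about your input, not a guarantee. The function name print_columns is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the first three tab-separated columns of a file,
# keeping empty fields.
# Assumption: the file never contains the ASCII unit separator (octal 037).
print_columns() {
    local c1 c2 c3
    # 037 is not IFS whitespace, so adjacent separators are not squeezed.
    while IFS=$'\037' read -r c1 c2 c3; do
        printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
    done < <(tr '\t' '\037' < "$1")
}
```

Calling print_columns input_file should then give the same output as the tilde version.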

Note that the process substitution (<(...)) requires Bash, but the IFS related points apply to the POSIX shell as well, see the specification.
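For a shell without process substitution, a pipeline is one way to get the same effect. This is only a sketch: the function name print_columns_posix is hypothetical, and it assumes "~" never occurs in the data. Keep in mind that in most shells the loop at the end of a pipeline runs in a subshell, so variables set inside it are not visible afterwards.

```shell
#!/bin/sh
# Hypothetical helper for a POSIX shell: no <(...) and no $'\t'.
# Assumption: "~" never occurs in the data.
print_columns_posix() {
    # tr understands the '\t' escape itself, so the shell does not need to.
    tr '\t' '~' < "$1" |
    while IFS='~' read -r c1 c2 c3; do
        printf '1:[%s] 2:[%s] 3:[%s]\n' "$c1" "$c2" "$c3"
    done
}
```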

shell

You can use awk:

awk '{
    if(NF == 4)
        print $3
    else
        print ""
}' text.txt

Output:

column2
#empty line
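As a related sketch (not the command from this answer), awk can also be told to split on tabs explicitly. When the field separator is a single character other than a space, awk treats every occurrence as a separator, so empty fields are kept. The helper name tab_columns is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: show the first three tab-separated fields of each
# line read from stdin. With -F'\t', every tab separates two fields, so an
# empty field between two adjacent tabs is preserved.
tab_columns() {
    awk -F'\t' '{ printf "1:[%s] 2:[%s] 3:[%s]\n", $1, $2, $3 }'
}

printf 'column1\tcolumn2\tcolumn3\ncolumn1\t\tcolumn3\n' | tab_columns
# 1:[column1] 2:[column2] 3:[column3]
# 1:[column1] 2:[] 3:[column3]
```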

shell

If consecutive delimiters in the whitespace class are consolidated, another option is to convert the delimiters to a non-whitespace character before using Bash's read. For example, convert the tabs to pipes:

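#!/bin/bash
# Split the converted file into records on newlines only.
IFS=$'\n'
# Convert tabs to pipes so that empty fields are preserved.
for txt_record in $(tr "\t" "|" < your_tab_delim_file.txt)
do
  IFS='|' read -r field1 field2 field3 <<< "$txt_record"
  echo "[$field1] [$field2] [$field3]"
done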
