Fix Mismatch Between Data And Locale In Awk Command


Set the locale to C, which uses only the ASCII character set with a single-byte encoding, by passing LC_ALL=C in awk's environment (note that length() then counts bytes rather than characters):

LC_ALL=C awk 'length($0)<10000' file.txt >output-file.txt

Also, you don't need cat: awk takes filename(s) as arguments.
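A small self-contained sketch of the LC_ALL=C fix. The contents of file.txt are hypothetical test data containing a byte that is never valid in UTF-8, and the expected counts in the comments assume gawk or mawk:

```shell
#!/bin/sh
# Hypothetical sample input: line 2 contains the byte 0xFF (octal 377),
# which can never appear in valid UTF-8.
printf 'plain ascii\n\377\377 stray bytes\n' > file.txt

# The fix: with LC_ALL=C awk reads every byte as one single-byte
# character, so there is no multibyte decoding that can fail.
LC_ALL=C awk 'length($0) < 10000' file.txt > output-file.txt
lines=$(wc -l < output-file.txt | tr -d ' ')
echo "lines kept: $lines"            # lines kept: 2

# In the C locale length() counts bytes: "é" in UTF-8 is 0xC3 0xA9.
n=$(printf '\303\251\n' | LC_ALL=C awk '{ print length($0) }')
echo "length of UTF-8 e-acute: $n"   # length of UTF-8 e-acute: 2
```

The byte count matters for a filter like length($0)<10000: lines with many multibyte characters are measured in bytes, so they reach the limit sooner than they would in a UTF-8 locale.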


I've found three solutions on my machines:

Change environment variable

This is the approach from the accepted answer above.

Export the variable, so it applies to every command subsequently run from that shell, not just awk:

export LC_ALL=C
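A short sketch of the difference in scope between the prefix form (LC_ALL=C awk ...) and export, assuming a POSIX sh and an awk that provides the ENVIRON array:

```shell
#!/bin/sh
# Start from a known state for the demonstration.
unset LC_ALL

# Prefix form: LC_ALL=C is placed only in this awk process's environment.
seen=$(LC_ALL=C awk 'BEGIN { print ENVIRON["LC_ALL"] }')
after_prefix=${LC_ALL:-unset}
echo "awk saw: $seen"                          # awk saw: C
echo "shell after prefix form: $after_prefix"  # shell after prefix form: unset

# export form: LC_ALL=C stays set and is inherited by every later command.
export LC_ALL=C
after_export=${LC_ALL:-unset}
echo "shell after export: $after_export"       # shell after export: C
```

The prefix form is usually the safer choice in scripts, because exporting LC_ALL=C also changes the behaviour of sort, grep and other locale-aware tools run later.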

Add a parameter (gawk only)

Add the -b (--characters-as-bytes) option, for example:

awk -b 'length($0)<10000' file.txt > output-file.txt
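A quick way to see what -b changes, sketched under the assumption that gawk is installed under the name gawk (the script skips the check otherwise, since other awks reject -b):

```shell
#!/bin/sh
# -b (--characters-as-bytes) is a gawk extension, so only run the demo
# when gawk is on PATH.
if command -v gawk >/dev/null 2>&1; then
  # "é" in UTF-8 is the two bytes 0xC3 0xA9; with -b each byte counts
  # as one character, whatever the current locale is.
  n=$(printf '\303\251\n' | gawk -b '{ print length($0) }')
  echo "length with -b: $n"    # length with -b: 2
else
  n=skipped
  echo "gawk not installed, skipping"
fi
```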

Use mawk instead of gawk

On Linux, you can check whether your awk is the gawk or the mawk implementation (on Ubuntu, gawk is installed by the package of the same name). On Ubuntu, you can list the installed alternatives and switch between them with:

sudo update-alternatives --config awk
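Without relying on update-alternatives, a rough way to see which awk is on PATH is to follow its symlinks and look at the resolved file name. This is a heuristic sketch for Linux that assumes readlink -f is available (GNU coreutils or BusyBox); anything it does not recognise, such as BusyBox awk, is reported as "other":

```shell
#!/bin/sh
# Resolve what "awk" on PATH actually points to; on Debian/Ubuntu this is
# usually a symlink chain through /etc/alternatives.
awk_path=$(command -v awk)
real=$(readlink -f "$awk_path")
echo "awk resolves to: $real"

# Guess the implementation from the resolved file name only.
case "$real" in
  *gawk*) impl=gawk ;;
  *mawk*) impl=mawk ;;
  *)      impl=other ;;
esac
echo "implementation: $impl"
```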
