
Assign string containing null-character (\0) to a variable in Bash


In Bash, you can't store the NUL character (\0) in a variable.

You may, however, store a plain hex dump of the data (and later reverse this operation again) by using the xxd command.

VAR1=`echo -ne "n\0m\0k" | xxd -p | tr -d '\n'`
echo -ne "$VAR1" | xxd -r -p | od -c   # -> 0000000    n  \0   m  \0   k


As others have already stated, you can't store/use NUL char:

  • in a variable
  • in an argument of the command line.

However, you can handle any binary data (including NUL char):

  • in pipes
  • in files

So to answer your last question:

can anybody give me a hint how strings containing \0 chars can be stored or handled efficiently without losing any (meta-) characters?

You can use files or pipes to store and efficiently handle any string, whatever meta-characters it contains.
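As a minimal sketch of that point (the variable names tmpfile, file_bytes and pipe_bytes are only illustrative), the following writes three bytes, one of them NUL, to a temporary file and through a pipe, then counts them on both paths:

```shell
#!/usr/bin/env bash
# Write three bytes (a, NUL, b) to a temporary file; files keep NUL bytes intact.
tmpfile=$(mktemp)
printf 'a\0b' > "$tmpfile"

# Count the bytes stored in the file and the bytes passing through a pipe.
# tr strips the padding that some wc implementations add around the number.
file_bytes=$(wc -c < "$tmpfile" | tr -d ' ')
pipe_bytes=$(printf 'a\0b' | wc -c | tr -d ' ')

echo "file: $file_bytes bytes, pipe: $pipe_bytes bytes"
rm -f "$tmpfile"
```

Both counts should be 3: the NUL byte survives because it never has to be held in a shell variable.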

If you plan to handle data, you should note additionally that:

  • Only the NUL char is dropped by variables and command-line arguments; every other byte survives. You can check this yourself.
  • Be wary that command substitution (as $(command..) or `command..`) has an additional twist on top of storing into a variable: it also strips all trailing newlines.
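Both effects can be checked with a short Bash sketch like the one below. The variable names are made up for the illustration, and the sentinel trick at the end is a common workaround rather than something the answers above prescribe:

```shell
#!/usr/bin/env bash
# The NUL byte is dropped when the output lands in a variable
# (Bash 4.4 and later also print a warning on stderr).
with_nul=$(printf 'a\0b')
echo "length with NUL dropped: ${#with_nul}"

# Command substitution also strips every trailing newline.
stripped=$(printf 'x\n\n\n')
echo "length after stripping: ${#stripped}"

# Common workaround: append a sentinel character, then remove it.
kept=$(printf 'x\n\n\n'; printf '.')
kept=${kept%.}
echo "length with newlines kept: ${#kept}"
```

In Bash the three lengths should come out as 2, 1 and 4 respectively.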

Bypassing limitations

If you want to use variables, then you must get rid of the NUL char by encoding it, and various other solutions here give clever ways to do that (an obvious way is to use for example base64 encoding/decoding).
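A minimal sketch of the base64 route, assuming the base64 tool from GNU coreutils (other implementations may spell the decode flag differently); the variable names are illustrative only:

```shell
#!/usr/bin/env bash
# Encode binary data (including NUL bytes) as plain text so it fits in a variable.
encoded=$(printf 'n\0m\0k' | base64)
echo "stored in variable: $encoded"

# Decode back into a pipe; the NUL bytes reappear on the other side.
printf '%s' "$encoded" | base64 -d | od -c

# Sanity check: the decoded stream should again be 5 bytes long.
decoded_bytes=$(printf '%s' "$encoded" | base64 -d | wc -c | tr -d ' ')
echo "decoded length: $decoded_bytes"
```

The price is roughly a third more storage than the raw data, which is what motivates the lighter quoting approach.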

If you are concerned about memory or speed, you'll probably want to use a minimal parser and quote only the NUL character (and the quoting character itself). In this case this would help you:

quote() { sed 's/\\/\\\\/g;s/\x0/\\x00/g'; }

Then, you can secure your data before storing it in variables and command-line arguments by piping your sensitive data into quote, which will output a safe data stream without NUL chars. You can get back the original string (with NUL chars) by using echo -en "$var_quoted", which will send the correct string to standard output.

Example:

## Our example output generator, with NUL chars
ascii_table() { echo -en "$(echo '\'0{0..3}{0..7}{0..7} | tr -d " ")"; }
## store
myvar_quoted=$(ascii_table | quote)
## use
echo -en "$myvar_quoted"

Note: use | hd to get a clean view of your data in hexadecimal and check that you didn't lose any NUL chars.

Changing tools

Remember that you can go pretty far with pipes alone, using neither variables nor command-line arguments. Don't forget, for instance, the <(command ...) construct (process substitution), which exposes a command's output through a named pipe or /dev/fd entry (sort of a temporary file).
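As an illustration, here is a small Bash sketch that consumes NUL-delimited records directly from a process substitution. The sample records and the names count and record are invented for the example:

```shell
#!/usr/bin/env bash
# Read NUL-delimited records straight from a process substitution.
# read -d '' uses NUL as the record delimiter, so the NUL bytes
# themselves never have to be stored in a variable.
count=0
while IFS= read -r -d '' record; do
    count=$((count + 1))
    printf 'record %d: %s\n' "$count" "$record"
done < <(printf 'first\0second line\nwith a newline\0third\0')

echo "records read: $count"
```

Because the loop is fed by redirection rather than by a pipeline, it runs in the current shell and count is still set once the loop ends.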

EDIT: the first implementation of quote was incorrect and would not deal correctly with \ special characters interpreted by echo -en. Thanks @xhienne for spotting that.

EDIT2: the second implementation of quote had a bug because it used only \0, which would actually eat up any zeroes that followed, as \0, \00, \000 and \0000 are equivalent. So \0 was replaced by \x00. Thanks to @MatthijsSteen for spotting this one.


Use uuencode and uudecode for POSIX portability

xxd and base64 are not POSIX 7 but uuencode is.

VAR="$(uuencode -m <(printf "a\0\n") /dev/stdout)"
uudecode -o /dev/stdout <(printf "$VAR") | od -tx1

Output:

0000000 61 00 0a
0000003

Unfortunately, I don't see a POSIX 7 alternative to the Bash process substitution extension <() other than writing to a file, and uuencode and uudecode are not installed by default on Ubuntu 12.04 (they come with the sharutils package).

So I guess that the real answer is: don't use Bash for this, use Python or some other saner interpreted language.