Snappy or LZO for logs then consumed by Hadoop


Maybe it's too late, but python-snappy provides a command-line tool for Snappy compression/decompression:

Compressing and decompressing a file:

$ python -m snappy -c uncompressed_file compressed_file.snappy

$ python -m snappy -d compressed_file.snappy uncompressed_file

Compressing and decompressing a stream:

$ cat uncompressed_data | python -m snappy -c > compressed_data.snappy

$ cat compressed_data.snappy | python -m snappy -d > uncompressed_data

Snappy also consistently decompresses 20%+ faster than LZO, which is a pretty big win if you want it for files you're reading a lot over Hadoop. Finally, it's used by Google in systems like BigTable and MapReduce, which is a really important endorsement, for me at least.