Python equivalent of unix "strings" utility


Here's a generator that yields all the strings of printable characters >= min (4 by default) in length that it finds in filename:

    import string

    def strings(filename, min=4):
        with open(filename, errors="ignore") as f:  # Python 3.x
        # with open(filename, "rb") as f:           # Python 2.x
            result = ""
            for c in f.read():
                if c in string.printable:
                    result += c
                    continue
                if len(result) >= min:
                    yield result
                result = ""
            if len(result) >= min:  # catch result at EOF
                yield result

Which you can iterate over:

    for s in strings("something.bin"):
        print(s)  # do something with s

... or store in a list:

    sl = list(strings("something.bin"))

I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
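As a rough sketch of how the memory concern could be addressed, the variant below reads the file in fixed-size binary chunks instead of all at once. The names `strings_chunked`, `min_len`, `chunk_size` and `PRINTABLE` are made up for illustration and are not part of the answer above. Because it works on raw bytes, an undecodable byte ends a run instead of being dropped by `errors="ignore"`, so its output may differ slightly from the generator above.

```python
import string

# Byte values treated as printable; mirrors string.printable (an assumption,
# the Unix strings utility may use a slightly different set).
PRINTABLE = frozenset(string.printable.encode("ascii"))


def strings_chunked(filename, min_len=4, chunk_size=64 * 1024):
    """Yield runs of printable ASCII at least min_len long, reading in chunks."""
    run = bytearray()
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for b in chunk:  # iterating over bytes yields ints in Python 3
                if b in PRINTABLE:
                    run.append(b)
                    continue
                if len(run) >= min_len:
                    yield run.decode("ascii")
                run.clear()
    if len(run) >= min_len:  # catch a run that ends at EOF
        yield run.decode("ascii")
```

The current run is kept across chunk boundaries, so a string that straddles two chunks is still reported as one. Only one chunk and the current run are held in memory at a time.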


To quote man strings:

    STRINGS(1)                   GNU Development Tools                  STRINGS(1)

    NAME
           strings - print the strings of printable characters in files.

    [...]

    DESCRIPTION
           For each file given, GNU strings prints the printable character
           sequences that are at least 4 characters long (or the number given with
           the options below) and are followed by an unprintable character.  By
           default, it only prints the strings from the initialized and loaded
           sections of object files; for other types of files, it prints the
           strings from the whole file.

You could achieve a similar result with a regex that matches runs of at least 4 printable characters. Something like this:

    >>> import re
    >>> content = "hello,\x02World\x88!"
    >>> re.findall("[^\x00-\x1F\x7F-\xFF]{4,}", content)
    ['hello,', 'World']

Please note that this solution requires the entire file content to be loaded into memory.
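One possible way around that, sketched here under assumptions, is to memory-map the file and run a bytes regex over the mapping, so the operating system pages the data in as needed and the script never reads the whole file into a Python object. The function name `strings_mmap` and the parameter `min_len` are hypothetical. The character class `\x20-\x7E` is the same set as in the regex above, written for bytes.

```python
import mmap
import re


def strings_mmap(filename, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, via mmap + regex."""
    # Space through tilde; an assumption about what counts as "printable".
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_len)
    with open(filename, "rb") as f:
        # Note: mmap.mmap raises ValueError when the file is empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [m.group().decode("ascii") for m in pattern.finditer(mm)]
```

The result list still holds every matched string, so this helps mainly when the file is large but the printable portion is comparatively small.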


You can also call the strings utility directly, for example like this:

    import subprocess

    def strings(bytestring: bytes, min: int = 10) -> str:
        # Argument list instead of shell=True, so no shell is involved.
        cmd = ["strings", "-n", str(min)]
        process = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, stdin=subprocess.PIPE)
        # Feed stdin through communicate() to avoid deadlocking on large inputs.
        output = process.communicate(input=bytestring)[0]
        return output.decode("ascii")