Selecting specific columns from df -h output in python


You can use os.popen to run the command and retrieve its output, then use splitlines to split the output into lines and split to break each line into fields. Run df -Ph rather than df -h so that a line is not wrapped when a column is too long.

import os

df_output_lines = [s.split() for s in os.popen("df -Ph").read().splitlines()]

The result is a list of lines, each of which is a list of fields. To extract the first column, you can use [line[0] for line in df_output_lines] (note that columns are numbered from 0), and so on. You may want to use df_output_lines[1:] instead of df_output_lines to strip the title line.
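As a small sketch of the column selection described above, here it is run on a hard-coded sample instead of a live df call, so the result is reproducible. The sample text and the helper name select_columns are made up for illustration.

```python
# Hypothetical sample of `df -Ph` output; real output differs per machine.
SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   20G   28G  42% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
"""


def select_columns(text, indices, skip_header=True):
    """Return the requested 0-based columns from df-style text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if skip_header:
        rows = rows[1:]  # drop the title line
    return [[row[i] for i in indices] for row in rows]


# Filesystem (column 0) and Use% (column 4) for every data row.
print(select_columns(SAMPLE, [0, 4]))
```

This still relies on neither the filesystem name nor the mount point containing whitespace.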

If you already have the output of df -h stored in a file somewhere, you'll need to join the wrapped lines first (here raw_df_output is the open file).

import re

fixed_df_output = re.sub(r'\n\s+', ' ', raw_df_output.read())
df_output_lines = [s.split() for s in fixed_df_output.splitlines()]
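To see what the joining step does, here is a minimal sketch on an invented df -h capture in which a long device name has pushed the remaining fields onto an indented continuation line. The device name and sizes are made up.

```python
import re

# Hypothetical `df -h` output with one wrapped entry.
RAW = (
    "Filesystem            Size  Used Avail Use% Mounted on\n"
    "/dev/mapper/vg-a_very_long_volume_name\n"
    "                       50G   20G   28G  42% /data\n"
    "tmpfs                 7.8G     0  7.8G   0% /dev/shm\n"
)

# A newline followed by whitespace marks a continuation line; fold it back
# onto the line that holds the device name.
fixed = re.sub(r"\n\s+", " ", RAW)
rows = [line.split() for line in fixed.splitlines()]

# Every data row now has the same six fields.
for row in rows[1:]:
    print(row[0], row[-1])
```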

Note that this assumes that neither the filesystem name nor the mount point contains whitespace. If they do (which is possible with some setups on some Unix variants), it's practically impossible to parse the output of df, even df -P. You can use os.statvfs to obtain information on a given filesystem (this is the Python interface to the C function that df calls internally for each filesystem), but there's no portable way of enumerating the filesystems.
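For a single known path, a rough sketch of the os.statvfs route could look like this. The helper name fs_usage and the choice of "/" as the path are just for illustration, and it only works on Unix-like systems.

```python
import os


def fs_usage(path):
    """Return (total, available) in bytes for the filesystem holding path."""
    st = os.statvfs(path)
    # f_blocks and f_bavail are counted in units of f_frsize.
    total = st.f_blocks * st.f_frsize
    available = st.f_bavail * st.f_frsize  # space usable by unprivileged users
    return total, available


total, available = fs_usage("/")
print("total: {0:.2f} GiB".format(total / 2**30))
print("available: {0:.2f} GiB".format(available / 2**30))
```

Because the path is passed as an argument rather than parsed out of text, whitespace in mount points is not a problem here.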


Not an answer to the question, but I tried to solve the problem. :)

from os import statvfs

with open("/proc/mounts", "r") as mounts:
    split_mounts = [s.split() for s in mounts.read().splitlines()]

    print("{0:24} {1:24} {2:16} {3:16} {4:15} {5:13}".format(
            "FS", "Mountpoint", "Blocks", "Blocks Free", "Size", "Free"))

    for p in split_mounts:
        stat = statvfs(p[1])
        # f_blocks and f_bavail are counted in units of f_frsize.
        block_size = stat.f_frsize
        blocks_total = stat.f_blocks
        blocks_free = stat.f_bavail

        size_mb = float(blocks_total * block_size) / 1024 / 1024
        free_mb = float(blocks_free * block_size) / 1024 / 1024

        print("{0:24} {1:24} {2:16} {3:16} {4:10.2f}MiB {5:10.2f}MiB".format(
                p[0], p[1], blocks_total, blocks_free, size_mb, free_mb))


Here is the complete example:

import subprocess
import re

p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True)
dfdata, _ = p.communicate()

# Make the last header a single word so that it splits into one field.
dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")

columns = [list() for i in range(10)]
for line in dfdata.splitlines():
    line = re.sub(" +", " ", line)
    for i, l in enumerate(line.split(" ")):
        columns[i].append(l)

print(columns[0])

It assumes that mount points do not contain spaces.

Here is a more complete (and more complicated) solution that does not hard-code the number of columns:

import subprocess
import re

def yield_lines(data):
    for line in data.split("\n"):
        yield line

def line_to_list(line):
    return re.sub(" +", " ", line).split()

p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True)
dfdata, _ = p.communicate()

dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")

lines = yield_lines(dfdata)

# Parse the header line to learn how many columns there are.
headers = line_to_list(next(lines))

columns = [list() for i in range(len(headers))]
for i, h in enumerate(headers):
    columns[i].append(h)

for line in lines:
    for i, l in enumerate(line_to_list(line)):
        columns[i].append(l)

print(columns[0])
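A different way to avoid fixing the number of columns, not the one used in the answer above, is to split every line into a row and transpose the rows with zip. This is only a sketch on a made-up sample. It still assumes mount points without spaces, and zip silently truncates to the shortest row.

```python
# Hypothetical sample of `df -h` output; real output differs per machine.
SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   20G   28G  42% /
tmpfs           7.8G     0  7.8G   0% /dev/shm
"""

# Make the last header a single word so the header row has six fields too.
text = SAMPLE.replace("Mounted on", "Mounted_on")

rows = [line.split() for line in text.splitlines() if line.strip()]

# zip(*rows) turns a list of rows into a sequence of columns.
columns = [list(col) for col in zip(*rows)]

print(columns[0])
```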