
Filtering os.walk() dirs and files


This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that includes applies to files only):

import fnmatch
import os
import os.path
import re

includes = ['*.doc', '*.odt']  # for files only
excludes = ['/home/paulo-freitas/Documents']  # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'  # '$.' never matches

for root, dirs, files in os.walk('/home/paulo-freitas'):
    # exclude dirs: test the full path, but keep bare names in dirs
    # so os.walk can still descend into them
    dirs[:] = [d for d in dirs if not re.match(excludes, os.path.join(root, d))]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
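To make the pattern-joining step easier to see in isolation, here is a minimal sketch. The helper name compile_globs is hypothetical (it is not part of fnmatch or re), and the paths are made up for illustration. It assumes a Python 3 version where fnmatch.translate returns a self-contained group, so the alternatives can be joined with '|'.

```python
import fnmatch
import re

def compile_globs(patterns):
    """Combine glob patterns into one compiled regex (hypothetical helper).

    An empty pattern list yields a regex that can never match.
    """
    if not patterns:
        # '$' followed by any character: nothing can satisfy this
        return re.compile(r'$.')
    return re.compile('|'.join(fnmatch.translate(p) for p in patterns))

includes = compile_globs(['*.doc', '*.odt'])
excludes = compile_globs([])

print(bool(includes.match('/tmp/report.doc')))  # matches '*.doc'
print(bool(includes.match('/tmp/report.txt')))  # matches neither pattern
print(bool(excludes.match('/tmp/report.doc')))  # empty excludes match nothing
```

Because the translated '*' also matches path separators, the same regex can be applied to full paths as well as bare file names.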


From docs.python.org:

os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …
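The in-place requirement is easy to miss, so here is a small sketch of it. The function walk_pruned and the directory names are hypothetical, and a temporary directory stands in for a real tree.

```python
import os
import tempfile

def walk_pruned(top, skip_names):
    """Return the directories visited when names in skip_names are pruned."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=True):
        visited.append(os.path.relpath(root, top))
        # Slice assignment mutates the list that os.walk holds.
        # Writing 'dirs = [...]' would only rebind the local name
        # and prune nothing.
        dirs[:] = [d for d in dirs if d not in skip_names]
    return sorted(visited)

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'keep', 'nested'))
    os.makedirs(os.path.join(top, 'skip', 'nested'))
    # 'skip' and everything below it is never visited
    print(walk_pruned(top, {'skip'}))
```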

import fnmatch
import os

# includes is a list of glob patterns; excludes is a list of directory names
for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes]
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))

I should point out that the above code assumes excludes contains bare directory names, not full paths. To match the OP's case, change the condition in the list comprehension to os.path.join(root, d) not in excludes.
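A minimal sketch of that full-path variant follows. The generator find_files is a hypothetical wrapper, and normalizing both sides with os.path.normpath is an added assumption so that a trailing slash in an exclude entry does not defeat the comparison.

```python
import fnmatch
import os

def find_files(top, includes, excludes):
    """Yield paths under top that match any glob in includes, skipping
    directories whose full path is listed in excludes (hypothetical helper)."""
    excluded = {os.path.normpath(p) for p in excludes}
    for root, dirs, files in os.walk(top, topdown=True):
        # Compare full paths, but keep bare names in dirs for os.walk
        dirs[:] = [d for d in dirs
                   if os.path.normpath(os.path.join(root, d)) not in excluded]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                yield os.path.join(root, f)

if __name__ == '__main__':
    for path in find_files('/home/paulo-freitas',
                           ['*.doc', '*.odt'],
                           ['/home/paulo-freitas/Documents']):
        print(path)
```

Note that a file matching more than one include pattern would be yielded once per pattern; with disjoint extensions like these that does not happen.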


why fnmatch?

import os

excludes = ...  # directory names to skip

for root, dirs, files in os.walk("/path"):
    for file in files:
        # include the dot so that names like "mydoc" are not matched
        if file.endswith(('.doc', '.odt')):
            print(file)
    for directory in dirs:
        if directory not in excludes:
            print(directory)

Not exhaustively tested.
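One thing to watch with this approach: a suffix test without the leading dot, such as endswith(('doc', 'odt')), also accepts names that merely end in those letters. A minimal sketch that compares the actual extension instead is below. The helper has_extension is hypothetical, and the case-insensitive comparison is an assumption that may not suit every file system.

```python
import os

def has_extension(filename, extensions=('.doc', '.odt')):
    """Return True if filename's extension is one of extensions.

    The comparison is case-insensitive (an assumption of this sketch).
    """
    ext = os.path.splitext(filename)[1].lower()
    return ext in extensions

print(has_extension('report.doc'))       # extension is '.doc'
print(has_extension('mydoc'))            # no extension at all
print('mydoc'.endswith(('doc', 'odt')))  # the bare suffix test accepts it
```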