Filtering os.walk() dirs and files
This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that includes applies only to files):
    import fnmatch
    import os
    import os.path
    import re

    includes = ['*.doc', '*.odt']  # for files only
    excludes = ['/home/paulo-freitas/Documents']  # for dirs and files

    # transform glob patterns to regular expressions
    includes = r'|'.join([fnmatch.translate(x) for x in includes])
    # r'$.' can never match, so an empty excludes list excludes nothing
    excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

    for root, dirs, files in os.walk('/home/paulo-freitas'):

        # exclude dirs: test the full path, but keep bare names in dirs
        # so that os.walk also works when the top directory is relative
        dirs[:] = [d for d in dirs
                   if not re.match(excludes, os.path.join(root, d))]

        # exclude/include files
        files = [os.path.join(root, f) for f in files]
        files = [f for f in files if not re.match(excludes, f)]
        files = [f for f in files if re.match(includes, f)]

        for fname in files:
            print(fname)
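To see what the combined pattern actually matches, here is a minimal sketch. The helper name is_included and the sample paths are made up for illustration and are not part of the answer above. It shows that a translated glob is anchored at the end of the string, so a path with a trailing suffix is rejected, and that the r'$.' fallback never matches anything.

```python
import fnmatch
import re

# Hypothetical patterns, mirroring the includes list used above.
include_globs = ['*.doc', '*.odt']

# Join the translated globs into one alternation and compile it once.
include_re = re.compile('|'.join(fnmatch.translate(g) for g in include_globs))

# Fallback used when there are no exclude patterns: '$' asserts end of
# string and '.' then requires one more character, so it cannot match.
never_re = re.compile(r'$.')


def is_included(path):
    """Return True if the whole path matches any of the include globs."""
    return include_re.match(path) is not None


print(is_included('/tmp/report.doc'))      # glob '*' also spans '/'
print(is_included('/tmp/report.doc.bak'))  # rejected: pattern is end-anchored
print(never_re.match('/tmp/report.doc'))   # no match object
```

Because fnmatch's * also matches path separators, the same patterns work whether you test bare file names or full paths.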
From docs.python.org:
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …
    import fnmatch
    import os

    # includes: list of glob patterns; excludes: list of bare directory names
    for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
        # excludes can be done with fnmatch.filter and complementary set,
        # but it's more annoying to read.
        dirs[:] = [d for d in dirs if d not in excludes]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                print(os.path.join(root, f))
I should point out that the above code assumes excludes holds bare directory names, not full paths. To match the OP's case, you would need to change the filter in the list comprehension to: if os.path.join(root, d) not in excludes
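For the full-path case, a sketch of that adjustment might look like the following. The function name find_files and the temporary directory layout are assumptions made up for this example. It only relies on os.walk, fnmatch.filter and os.path functions from the standard library.

```python
import fnmatch
import os
import tempfile


def find_files(top, include_globs, exclude_paths):
    """Yield files under top that match include_globs, pruning exclude_paths."""
    excluded = set(os.path.normpath(p) for p in exclude_paths)
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list os.walk holds, so the excluded
        # directories are never descended into.
        dirs[:] = [d for d in dirs
                   if os.path.normpath(os.path.join(root, d)) not in excluded]
        for pat in include_globs:
            for name in fnmatch.filter(files, pat):
                yield os.path.join(root, name)


# Build a throwaway tree to try it out (hypothetical layout).
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'Documents'))
os.makedirs(os.path.join(top, 'Work'))
for rel in ('Documents/skip.doc', 'Work/keep.doc', 'Work/notes.txt', 'top.odt'):
    open(os.path.join(top, *rel.split('/')), 'w').close()

found = sorted(find_files(top, ['*.doc', '*.odt'],
                          [os.path.join(top, 'Documents')]))
for path in found:
    print(os.path.relpath(path, top))
```

Note that a plain dirs = [...] would only rebind the local name and leave the walk unpruned, which is why the slice assignment matters. A file that matches more than one include pattern would be yielded once per pattern. If that matters, collect the results in a set.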