Use find command but exclude files in two directories Use find command but exclude files in two directories unix unix

Use find command but exclude files in two directories


Here's how you can specify that with find:

find . -type f -name "*_peaks.bed" ! -path "./tmp/*" ! -path "./scripts/*"

Explanation:

  • find . - Start find from current working directory (recursively by default)
  • -type f - Specify to find that you only want files in the results
  • -name "*_peaks.bed" - Look for files with the name ending in _peaks.bed
  • ! -path "./tmp/*" - Exclude all results whose path starts with ./tmp/
  • ! -path "./scripts/*" - Also exclude all results whose path starts with ./scripts/

Testing the Solution:

$ mkdir a b c d e$ touch a/1 b/2 c/3 d/4 e/5 e/a e/b$ find . -type f ! -path "./a/*" ! -path "./b/*"./d/4./c/3./e/a./e/b./e/5

You were pretty close, the -name option only considers the basename, where as -path considers the entire path =)


Here is one way you could do it...

find . -type f -name "*_peaks.bed" | egrep -v "^(./tmp/|./scripts/)"


Use

find \( -path "./tmp" -o -path "./scripts" \) -prune -o  -name "*_peaks.bed" -print

or

find \( -path "./tmp" -o -path "./scripts" \) -prune -false -o  -name "*_peaks.bed"

or

find \( -path "./tmp" -path "./scripts" \) ! -prune -o  -name "*_peaks.bed"

The order is important. It evaluates from left to right.Always begin with the path exclusion.

Explanation

Do not use -not (or !) to exclude whole directory. Use -prune.As explained in the manual:

−prune    The primary shall always evaluate as  true;  it          shall  cause  find  not  to descend the current          pathname if it is a directory.  If  the  −depth          primary  is specified, the −prune primary shall          have no effect.

and in the GNU find manual:

-path pattern              [...]              To ignore  a  whole              directory  tree,  use  -prune rather than checking              every file in the tree.

Indeed, if you use -not -path "./pathname",find will evaluate the expression for each node under "./pathname".

find expressions are just condition evaluation.

  • \( \) - groups operation (you can use -path "./tmp" -prune -o -path "./scripts" -prune -o, but it is more verbose).
  • -path "./script" -prune - if -path returns true and is a directory, return true for that directory and do not descend into it.
  • -path "./script" ! -prune - it evaluates as (-path "./script") AND (! -prune). It revert the "always true" of prune to always false. It avoids printing "./script" as a match.
  • -path "./script" -prune -false - since -prune always returns true, you can follow it with -false to do the same than !.
  • -o - OR operator. If no operator is specified between two expressions, it defaults to AND operator.

Hence, \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" -print is expanded to:

[ (-path "./tmp" OR -path "./script") AND -prune ] OR ( -name "*_peaks.bed" AND print )

The print is important here because without it is expanded to:

{ [ (-path "./tmp" OR -path "./script" )  AND -prune ]  OR (-name "*_peaks.bed" ) } AND print

-print is added by find - that is why most of the time, you do not need to add it in you expression. And since -prune returns true, it will print "./script" and "./tmp".

It is not necessary in the others because we switched -prune to always return false.

Hint: You can use find -D opt expr 2>&1 1>/dev/null to see how it is optimized and expanded,
find -D search expr 2>&1 1>/dev/null to see which path is checked.