How to grep for a pattern in the files in tar archive without filling up disk space


Seems like nobody posted this simple solution that processes the archive only once:

tar xzf archive.tgz --to-command \
    'grep --label="$TAR_FILENAME" -H PATTERN ; true'

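Here GNU tar passes the name of each file to the command in the TAR_FILENAME environment variable (see the tar docs on --to-command), and grep picks it up via --label so that -H prints it with each match. The trailing true is there so that tar doesn't complain about failing to extract files that don't match, since grep exits non-zero when it finds nothing.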
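If you'd rather not retype the quoting each time, the same single-pass idea can be wrapped in a small function. This is only a sketch: the name targrep_once and the TARGREP_PATTERN variable are made up for illustration, and it assumes GNU tar (for --to-command and automatic compression detection when reading from a file) plus a grep that supports --label.

```shell
# Hypothetical helper: grep every regular file in an archive in one pass.
# Assumes GNU tar (--to-command) and a grep with --label support.
targrep_once() {
  local pattern="$1" archive="$2"
  # Hand the pattern over through the environment so the command string
  # needs no extra quoting; tar runs that string once per regular file
  # and sets TAR_FILENAME for each one.
  TARGREP_PATTERN="$pattern" tar -xf "$archive" \
    --to-command 'grep --label="$TAR_FILENAME" -H -e "$TARGREP_PATTERN"; true'
}

# Usage: targrep_once PATTERN archive.tgz
```

Passing the pattern with -e also keeps patterns that start with a dash from being read as grep options.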


Here's my take on this:

while read filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')

Broken out for explanation:

  • while read filename; do -- it's a loop...
  • tar -xOf file.tar "$filename" -- this extracts each file...
  • | grep 'pattern' -- here's where you put your pattern...
  • | sed "s|^|$filename:|"; -- prepend the filename, so the output looks like grep's. Salt to taste.
  • done < <(tar -tf file.tar | grep -v '/$') -- end the loop, and get the list of files (minus directories) to feed to your while read.

One proviso: this breaks if you have pipe characters (|) in your filenames, because | is the delimiter in the sed expression.
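One way around that proviso is to skip sed and do the prefixing in plain shell, so nothing in the filename is treated as special. A minimal sketch, where prefix_lines is a hypothetical helper name and not part of the one-liner above:

```shell
# Hypothetical helper: prefix every line on stdin with "$1:".
# printf receives the prefix as data, so characters like | or & are safe.
prefix_lines() {
  local prefix="$1" line
  # The extra test keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s:%s\n' "$prefix" "$line"
  done
}

# Usage inside the loop, in place of the sed stage:
#   tar -xOf file.tar "$filename" | grep 'pattern' | prefix_lines "$filename"
```

It is slower than sed on large outputs, but it only ever sees the lines grep already matched.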

Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:

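targrep() {
  local pattern="$1" archive taropt filename
  if (( $# < 2 )); then
    echo "Usage: targrep pattern file ..." >&2
    return 1
  fi
  shift   # drop the pattern; the remaining arguments are archives
  for archive in "$@"; do
    if [[ ! -f "$archive" ]]; then
      echo "targrep: $archive: No such file" >&2
      continue
    fi
    case "$archive" in
      *.tar.gz|*.tgz) taropt="-z" ;;
      *) taropt="" ;;
    esac
    # $taropt is left unquoted on purpose so it vanishes when empty
    while IFS= read -r filename; do
      tar $taropt -xOf "$archive" "$filename" \
        | grep "$pattern" \
        | sed "s|^|$filename:|"
    done < <(tar $taropt -tf "$archive" | grep -v '/$')
  done
}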


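Here's a bash function that may work for you. Add the following to your ~/.bashrc:

targrep () {
    local archive="$1" pattern="$2" member
    # Read member names line by line so names with spaces survive
    while IFS= read -r member; do
        tar -Oxzf "$archive" "$member" | grep --label="$member" -H "$pattern"
    done < <(tar -tzf "$archive" | grep -v '/$')
}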

Usage:

targrep archive.tar.gz "pattern"