How to use grep with large (millions) number of files to search for string and get result in few minutes


You should remove the -0 argument to xargs and increase the -n parameter instead:

... | xargs -n16 ...
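To see what -n actually changes, you can swap grep for echo and count how often xargs starts the command. This is just a small sketch: the helper name count_invocations and the 64 dummy names are made up for illustration.

```shell
# Hypothetical helper: count how many times xargs launches the command
# for a given batch size. 'echo' stands in for grep here, and each echo
# invocation prints exactly one line, so wc -l counts the invocations.
count_invocations() {
    # $1 = value passed to xargs -n
    seq 1 64 | sed 's/^/dummy_/' | xargs -n"$1" echo | wc -l
}

count_invocations 1     # 64 invocations, one per file name
count_invocations 16    # 4 invocations, 16 file names each
```

Fewer invocations means fewer process start-ups, which is presumably where most of the time goes when every file is tiny.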


It's not as big a stack of files (kudos to 10⁷ files, a messie's dream), but I created 100k files (400 MB overall) with

for i in {1..100000}; do head -c 10 /dev/urandom > dummy_$i; done

and ran some tests out of pure curiosity (the keyword 10 I searched for was chosen arbitrarily):

> time find . | xargs -n1 -P8 grep -H "10"
real 0m22.626s
user 0m0.572s
sys  0m5.800s

> time find . | xargs -n8 -P8 grep -H "10"
real 0m3.195s
user 0m0.180s
sys  0m0.748s

> time grep "10" *
real 0m0.879s
user 0m0.512s
sys  0m0.328s

> time awk '/10/' *
real 0m1.123s
user 0m0.760s
sys  0m0.348s

> time sed -n '/10/p' *
real 0m1.531s
user 0m0.896s
sys  0m0.616s

> time perl -ne 'print if /10/' *
real 0m1.428s
user 0m1.004s
sys  0m0.408s

Btw., there isn't a big difference in running time if I suppress the output by piping STDOUT to /dev/null. I am using Ubuntu 12.04 on a not so powerful laptop ;) My CPU is an Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz.

More curiosity:

> time find . | xargs -n1 -P8 grep -H "10" 1>/dev/null
real 0m22.590s
user 0m0.616s
sys  0m5.876s

> time find . | xargs -n4 -P8 grep -H "10" 1>/dev/null
real 0m5.604s
user 0m0.196s
sys  0m1.488s

> time find . | xargs -n8 -P8 grep -H "10" 1>/dev/null
real 0m2.939s
user 0m0.140s
sys  0m0.784s

> time find . | xargs -n16 -P8 grep -H "10" 1>/dev/null
real 0m1.574s
user 0m0.108s
sys  0m0.428s

> time find . | xargs -n32 -P8 grep -H "10" 1>/dev/null
real 0m0.907s
user 0m0.084s
sys  0m0.264s

> time find . | xargs -n1024 -P8 grep -H "10" 1>/dev/null
real 0m0.245s
user 0m0.136s
sys  0m0.404s

> time find . | xargs -n100000 -P8 grep -H "10" 1>/dev/null
real 0m0.224s
user 0m0.100s
sys  0m0.520s
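If you want to repeat the batch-size sweep on your own machine, something like the following might do. It is only a sketch: the temp directory, NFILES=2000 and the list of -n values are my assumptions, not the setup behind the numbers above, and I added -type f so that find does not hand the directory itself to grep.

```shell
#!/usr/bin/env bash
# Sketch: time the same search for several xargs batch sizes.
# WORKDIR, NFILES and the batch sizes below are illustrative assumptions.
WORKDIR=$(mktemp -d)
NFILES=2000
cd "$WORKDIR" || exit 1

# Create small files with random content, as in the test above.
for i in $(seq 1 "$NFILES"); do
    head -c 10 /dev/urandom > "dummy_$i"
done

# Run the identical search with different batch sizes.
# In bash, 'time' covers the whole pipeline and reports on stderr.
for n in 1 8 32 1024; do
    echo "batch size -n$n:"
    time find . -type f | xargs -n"$n" -P8 grep -H "10" > /dev/null
done

# Clean up when done:
# cd / && rm -rf "$WORKDIR"
```

The absolute numbers will differ from mine, but the real time should shrink as -n grows, since fewer grep processes get started.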


8 million files is a lot for one directory! However, 8 million times 2 KB is 16 GB, and you have 50 GB of RAM. I am thinking of a RAM disk...
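A quick check of that arithmetic, plus a rough idea of what the RAM disk setup could look like. The mount point, the size and the inode limit are guesses on my part (as is the source path placeholder), and the mount commands are left commented out because they need root.

```shell
# Back-of-the-envelope estimate, assuming decimal units (1 GB = 10^6 KB).
files=8000000
kb_per_file=2
total_gb=$(( files * kb_per_file / 1000 / 1000 ))
echo "${total_gb} GB"    # prints "16 GB"

# Hypothetical tmpfs setup (needs root; all values are assumptions).
# tmpfs stores data in whole memory pages, so real usage may be higher
# than the estimate, and the default inode limit may be too low for
# 8 million files, hence the explicit nr_inodes.
# sudo mkdir -p /mnt/ramdisk
# sudo mount -t tmpfs -o size=40G,nr_inodes=10m tmpfs /mnt/ramdisk
# cp -r /path/to/files /mnt/ramdisk/
```

Once the files live in RAM, the search should no longer be limited by disk reads, though I have not measured how much that buys you.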