How to use grep to search a large number (millions) of files for a string and get results in a few minutes
My test set isn't as big as yours (kudos for 10⁷ files, a hoarder's dream), but I created 100k files (400 MB overall) with
for i in {1..100000}; do head -c 10 /dev/urandom > dummy_$i; done
and ran some tests out of pure curiosity (the search keyword 10 was chosen arbitrarily):
> time find . | xargs -n1 -P8 grep -H "10"
real 0m22.626s
user 0m0.572s
sys 0m5.800s

> time find . | xargs -n8 -P8 grep -H "10"
real 0m3.195s
user 0m0.180s
sys 0m0.748s

> time grep "10" *
real 0m0.879s
user 0m0.512s
sys 0m0.328s

> time awk '/10/' *
real 0m1.123s
user 0m0.760s
sys 0m0.348s

> time sed -n '/10/p' *
real 0m1.531s
user 0m0.896s
sys 0m0.616s

> time perl -ne 'print if /10/' *
real 0m1.428s
user 0m1.004s
sys 0m0.408s
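In case it helps, here is a hedged sketch of a more defensive variant of the find/xargs pipeline. It assumes GNU findutils (for -print0, -0 and -P); the scratch directory and file names are made up for the demo. The idea is that -type f keeps directories such as "." away from grep, null-delimited names survive spaces, and a larger -n gives each grep many files instead of one.

```shell
# Scratch directory so the sketch does not touch real data (hypothetical names).
workdir=$(mktemp -d)
printf 'abc10def\n' > "$workdir/has match"      # file name with a space on purpose
printf 'nothing here\n' > "$workdir/no_match"

# -type f     : only regular files, so grep never sees a directory
# -print0/-0  : null-delimited names, safe for spaces and newlines
# -n 1000     : up to 1000 files per grep invocation instead of one
# -P 8        : up to 8 grep processes in parallel
matches=$(find "$workdir" -type f -print0 | xargs -0 -n 1000 -P 8 grep -H "10")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

With millions of files a plain `grep "10" *` can also fail with "Argument list too long", which is one reason to feed the names through xargs in the first place.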
Btw. there isn't a big difference in running time if I suppress the output by redirecting STDOUT to /dev/null. I am using Ubuntu 12.04 on a not so powerful laptop ;) My CPU is an Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz.
More curiosity:
> time find . | xargs -n1 -P8 grep -H "10" 1>/dev/null
real 0m22.590s
user 0m0.616s
sys 0m5.876s

> time find . | xargs -n4 -P8 grep -H "10" 1>/dev/null
real 0m5.604s
user 0m0.196s
sys 0m1.488s

> time find . | xargs -n8 -P8 grep -H "10" 1>/dev/null
real 0m2.939s
user 0m0.140s
sys 0m0.784s

> time find . | xargs -n16 -P8 grep -H "10" 1>/dev/null
real 0m1.574s
user 0m0.108s
sys 0m0.428s

> time find . | xargs -n32 -P8 grep -H "10" 1>/dev/null
real 0m0.907s
user 0m0.084s
sys 0m0.264s

> time find . | xargs -n1024 -P8 grep -H "10" 1>/dev/null
real 0m0.245s
user 0m0.136s
sys 0m0.404s

> time find . | xargs -n100000 -P8 grep -H "10" 1>/dev/null
real 0m0.224s
user 0m0.100s
sys 0m0.520s
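My reading of these numbers is that the runtime mostly follows how many grep processes get started, which is roughly ceil(files / n). A small sketch to make that visible: it swaps grep for a command that prints one line per invocation and counts the lines. The scratch directory, the 100 stand-in files and the count_runs helper are assumptions for the demo, not part of the measurements above.

```shell
# Scratch directory with 100 small files (hypothetical stand-ins for dummy_*).
workdir=$(mktemp -d)
for i in $(seq 1 100); do printf 'x' > "$workdir/dummy_$i"; done

# Run a command that prints exactly one line per invocation in place of grep;
# the number of lines is the number of processes xargs started.
count_runs() {
    find "$workdir" -type f -print0 |
        xargs -0 -n "$1" sh -c 'echo run' sh |
        wc -l | tr -d ' '
}

runs_n1=$(count_runs 1)        # one process per file
runs_n8=$(count_runs 8)        # ceil(100 / 8) processes
runs_n1024=$(count_runs 1024)  # all files fit into a single process

echo "n=1: $runs_n1  n=8: $runs_n8  n=1024: $runs_n1024"
rm -rf "$workdir"
```

As far as I know, xargs also limits the total command-line length (see its -s option), so a very large -n gets split into several invocations anyway. That would fit with -n1024 and -n100000 ending up almost identical.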