How to use grep efficiently?


If you have xargs installed and a multi-core processor, you can benefit from the following, in case anyone is interested.

Environment:

Processor: dual quad-core, 2.4 GHz
Memory: 32 GB
Number of files: 584,450
Total size: ~35 GB

Tests:

1. Find the necessary files, pipe them to xargs, and tell it to execute 8 instances.

time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8

real    3m24.358s
user    1m27.654s
sys     9m40.316s

2. Find the necessary files, pipe them to xargs, and tell it to execute 4 instances.

time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P4 grep -H "string" >> Strings

real    16m3.051s
user    0m56.012s
sys     8m42.540s

3. Suggested by @Stephen: find the necessary files and use -exec with + instead of xargs.

time find ./ -name "*.ext" -exec grep -H "string" {} \+ >> Strings

real    53m45.438s
user    0m5.829s
sys     0m40.778s
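For context on this variant: with -exec ... +, find itself batches as many file names as fit onto each grep command line, similar to xargs but without parallelism. A minimal sketch, using a throwaway directory and made-up file names:

```shell
# Build a tiny sandbox to search (hypothetical files).
dir=$(mktemp -d)
printf 'string here\n' > "$dir/a.ext"
printf 'no match\n'    > "$dir/b.ext"
printf 'string too\n'  > "$dir/c.ext"

# With '+', find appends many file names to one grep invocation,
# so grep is forked once per batch rather than once per file
# (which is what '\;' would do).
find "$dir" -name '*.ext' -exec grep -H "string" {} + | sort

rm -rf "$dir"
```

Because all files typically land in a single grep process, this avoids per-file fork overhead but cannot use more than one core, which matches the high wall-clock time seen above.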

4. Regular recursive grep.

time grep -R "string" >> Strings

real    235m12.823s
user    38m57.763s
sys     38m8.301s

For my purposes, the first command worked just fine.


I'm wondering why -n1 is used below; wouldn't it be faster to use a higher value (say -n8, or to leave it out so xargs will do the right thing)?

xargs -0 -n1 -P8 grep -H "string"

It seems more efficient to give each forked grep more than one file to process (I assume -n1 passes only a single file name in argv to each grep). As I see it, we should be able to pass the largest n the system allows (based on the limit on argc/argv length), so the setup cost of spawning a new grep process is incurred less often.
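The batching this suggests can be sketched as follows (the sandbox directory, file names, and the -n 100 batch size are hypothetical; -H is kept so grep prints file names even when a batch happens to contain only one file):

```shell
# Build a small throwaway corpus to search (hypothetical files).
dir=$(mktemp -d)
for i in $(seq 1 20); do printf 'needle %s\n' "$i" > "$dir/file$i.ext"; done

# -n 100 hands each grep up to 100 file names per invocation,
# while -P 8 still keeps up to 8 grep processes running at once,
# so far fewer processes are forked than with -n1.
find "$dir" -name '*.ext' -print0 \
  | xargs -0 -n 100 -P 8 grep -H "needle" > "$dir/matches.txt"

wc -l < "$dir/matches.txt"

rm -rf "$dir"
```

One trade-off to keep in mind: very large batches can reduce parallelism, since xargs only starts a new grep when a full batch of file names has been read, so there is a balance between fork overhead and keeping all cores busy.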