Why is grep so slow and memory intensive with -w (--word-regexp) flag?
```shell
grep -F string file
```

is simply looking for occurrences of `string` in the file, but

```shell
grep -w -F string file
```

also has to check the character before and after each occurrence of `string` to see whether they are word characters or not. That's a lot of extra work, and one possible implementation would be to first split each line into every possible non-word-character-delimited substring (with overlaps, of course), which could take up a lot of memory, but I don't know if that's what's causing your memory usage or not.
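You can see that extra word-boundary requirement in a tiny example (the file name and contents here are made up):

```shell
# "foobar" contains the substring "foo" but not the *word* "foo";
# -w requires non-word characters (or line edges) on both sides of the match.
printf 'foobar\nfoo bar\n' > demo.txt

grep -c -F foo demo.txt      # 2: both lines contain the substring
grep -c -w -F foo demo.txt   # 1: only "foo bar" has "foo" as a whole word
```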
In any case, grep is simply the wrong tool for this job: since you only want to match against a specific field in the input file, you should be using awk instead:
```shell
$ awk 'NR==FNR{ids[$0];next} /^>/{f=($1 in ids)} f' file.ids file.data
>EA4 text
data
>EA9 text_again
data_here
```
The above assumes your "data" lines cannot start with `>`. If they can, then tell us how to identify data lines vs id lines.
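In case it helps, here is the same script spread out with comments (file names as in your question):

```shell
awk '
    NR==FNR { ids[$0]; next }    # 1st file (file.ids): store each id line as an array key
    /^>/    { f = ($1 in ids) }  # id line in 2nd file: set flag if its first field is a wanted id
    f                            # while the flag is set, print the current line
' file.ids file.data
```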
Note that the above will work no matter how many data lines you have between id lines, even if there are 0 or 100:
```shell
$ cat file.data
>EA4 text
>E40 blah
more_data
>EA9 text_again
data 1
data 2
data 3
$ awk 'NR==FNR{ids[$0];next} /^>/{f=($1 in ids)} f' file.ids file.data
>EA4 text
>EA9 text_again
data 1
data 2
data 3
```
Also, you don't need to pipe the output to `grep -v`:
```shell
grep -A1 -Ff file.ids file.data | grep -v "^-" > output.data
```
just do it all in the one script:
```shell
awk 'NR==FNR{ids[$0];next} /^>/{f=($1 in ids)} f && !/^-/' file.ids file.data
```
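For example, with a made-up `file.ids` and `file.data` (this assumes your id lines include the leading `>`, as the earlier examples do):

```shell
cd "$(mktemp -d)"                    # work in a scratch directory
printf '>EA4\n>EA9\n' > file.ids     # hypothetical ids to keep
printf '>EA4 text\ndata_here\n>E40 blah\nmore_data\n' > file.data

awk 'NR==FNR{ids[$0];next} /^>/{f=($1 in ids)} f && !/^-/' file.ids file.data > output.data
cat output.data   # prints ">EA4 text" and "data_here"
```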