Faster Alternative to Unix Grep

unix grep

Run each of these under the time command to compare:

$> time grep ">" file.fasta > output.txt
$> time egrep ">" file.fasta > output.txt
$> time awk '/^>/{print $0}' file.fasta > output.txt

The awk version only matches when ">" is the first character of the line.

If you compare the timings, they are almost the same.

In my opinion, if the data is in columnar format, use awk to search.


Hand-built state machine. If you only want '>' to be accepted at the beginning of the line, you'll need one more state. If you need to recognise '\r' too, you will need a few more states.

#include <stdio.h>

int main(void)
{
    int state, ch;

    for (state = 0; (ch = getc(stdin)) != EOF; ) {
        switch (state) {
        case 0: /* start */
            if (ch == '>') state = 1;
            else break;
            /* fall through */
        case 1: /* echo */
            fputc(ch, stdout);
            if (ch == '\n') state = 0;
            break;
        }
    }
    if (state == 1) fputc('\n', stdout);
    return 0;
}

If you want real speed, you could replace fgetc() and fputc() with their macro equivalents getc() and putc(); the code above already uses getc(). (But I think trivial programs like this will be I/O bound anyway.)
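As a rough sketch of the "one more state" variant mentioned above, the version below only echoes lines whose first character is '>'. The function name copy_header_lines and the state names are made up for illustration, and it makes no attempt to handle '\r'.

```c
#include <stdio.h>

/* Hypothetical state names: where we are within the current line. */
enum state { LINE_START, ECHO, SKIP };

/* Copy only the lines that begin with '>' from in to out. */
void copy_header_lines(FILE *in, FILE *out)
{
    enum state state = LINE_START;
    int ch;

    while ((ch = getc(in)) != EOF) {
        switch (state) {
        case LINE_START:            /* first character of a line */
            if (ch == '>') {
                putc(ch, out);
                state = ECHO;
            } else if (ch != '\n') {
                state = SKIP;       /* not a header line: drop the rest */
            }
            break;
        case ECHO:                  /* inside a header line */
            putc(ch, out);
            if (ch == '\n') state = LINE_START;
            break;
        case SKIP:                  /* inside a non-header line */
            if (ch == '\n') state = LINE_START;
            break;
        }
    }
    /* Terminate a final header line that lacked a newline. */
    if (state == ECHO) putc('\n', out);
}
```

To use it as a filter, call copy_header_lines(stdin, stdout) from main(). Unlike the two-state version, a '>' in the middle of a line is ignored, because the SKIP state swallows everything up to the next newline.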


For big files, the fastest possible grep can be accomplished with GNU parallel. An example using parallel and grep can be found here.

For your purposes, you may like to try:

cat file.fasta | parallel -j 4 --pipe --block 10M grep "^>" > output.txt

The above will use four cores and pipe 10 MB blocks to grep. The block size is optional, but I find a 10 MB block size quite a bit faster on my system. YMMV.

HTH
