Why is "grep --ignore-case" 50 times slower?

I think this bug report helps in understanding why it is slow:

This slowness is due to grep (in a UTF-8 locale) constantly accessing the files "/usr/lib/locale/locale-archive" and "/usr/lib/gconv/gconv-modules.cache".

This can be observed with the strace utility. Both files belong to glibc.
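The strace observation can be reproduced roughly as follows. This is a minimal sketch, assuming Linux, an installed strace, and a generated en_US.UTF-8 locale (all assumptions, not part of the original report); the sample file contents are made up. It counts the trace lines that mention either glibc file, once under a UTF-8 locale and once under C. If the behavior from the bug report is present, the UTF-8 count should be noticeably higher; on a current grep the two numbers may be close.

```bash
#!/usr/bin/env bash
# Sketch: count how often grep -i touches the two glibc files named in the
# bug report. Assumptions: Linux, strace installed, and an en_US.UTF-8 locale
# generated on the machine (otherwise setlocale falls back to C).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'Jack is no fun\nJack is no Fun\n' > "$tmp/test.txt"

count_locale_accesses() {
    # $1 = locale to run grep under. -e trace=file traces only syscalls that
    # take a path (open, openat, stat, ...); -o writes the trace to a file.
    LC_ALL="$1" strace -e trace=file -o "$tmp/trace.$1" \
        grep -i fun "$tmp/test.txt" > /dev/null
    # Count trace lines mentioning either file; print 0 if no trace exists.
    n=$(grep -c -e locale-archive -e gconv-modules.cache "$tmp/trace.$1" 2>/dev/null)
    echo "${n:-0}"
}

utf8=$(count_locale_accesses en_US.UTF-8)
c=$(count_locale_accesses C)
echo "UTF-8 locale: $utf8 accesses, C locale: $c accesses"
```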


The reason is that it needs to do a Unicode-aware comparison for the current locale, and judging by Marat's answer, it's not very efficient at doing so.

This shows how much faster it is when Unicode is not taken into consideration:

    $ time LC_CTYPE=C grep -i fun test.txt
    all work and no pl
    Jack is no fun
    Jack is no Fun

    real    0m0.192s

Of course, this alternative won't match non-ASCII letters case-insensitively, such as Ñ/ñ, Ø/ø, Ð/ð, Æ/æ and so on.
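Because the fix is just an environment variable, it wraps easily into a shell function. A minimal sketch follows; the name `cgrep` is hypothetical, the sample file is made up, and blanking `LC_ALL` is an addition of this sketch (a non-empty exported `LC_ALL` would otherwise override `LC_CTYPE`).

```bash
#!/usr/bin/env bash
# Hypothetical helper name: cgrep. It runs grep under the C character-type
# locale, so -i folds only the ASCII letters A-Z/a-z.
cgrep() {
    # An exported non-empty LC_ALL overrides LC_CTYPE, so blank it for this
    # command only; all arguments go to grep unchanged.
    LC_ALL= LC_CTYPE=C grep "$@"
}

# Usage: same arguments as grep, on a made-up sample file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'nothing to see here\nJack is no fun\nJack is no Fun\n' > "$tmp/test.txt"
cgrep -i fun "$tmp/test.txt"
```

In bash, `alias cgrep='LC_CTYPE=C grep'` gives the same effect for the common case, minus the `LC_ALL` handling.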

Another alternative is to modify the regex so that it matches case-insensitively by itself:

    $ time grep '[Ff][Uu][Nn]' test.txt
    all work and no pl
    Jack is no fun
    Jack is no Fun

    real    0m0.193s

This is reasonably fast, but of course it's a pain to convert each character into a class by hand, and unlike the LC_CTYPE=C approach above, it's not easy to turn into an alias or an sh script.
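The conversion can at least be scripted for plain words. The sketch below uses a hypothetical helper, `ci_pattern`, that wraps each ASCII letter in a two-letter bracket class with awk and copies every other byte unchanged. It is not meant for patterns that already contain bracket expressions or escapes such as `\w`, and like the `LC_CTYPE=C` trick it only handles ASCII letters; the sample file is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ci_pattern turns a plain word into bracket classes,
# e.g. fun -> [Ff][Uu][Nn], so grep can be run without -i.
# Only ASCII letters are expanded; every other byte is copied unchanged.
ci_pattern() {
    # LC_ALL=C makes awk work byte by byte and keeps toupper/tolower ASCII-only.
    printf '%s\n' "$1" | LC_ALL=C awk '{
        out = ""
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (ch ~ /[A-Za-z]/)
                out = out "[" toupper(ch) tolower(ch) "]"
            else
                out = out ch
        }
        print out
    }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'nothing to see here\nJack is no fun\nJack is no Fun\n' > "$tmp/test.txt"
ci_pattern fun                              # prints [Ff][Uu][Nn]
grep "$(ci_pattern fun)" "$tmp/test.txt"    # matches both Jack lines
```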

For comparison, on my system:

    $ time grep fun test.txt
    all work and no pl
    Jack is no fun

    real    0m0.085s

    $ time grep -i fun test.txt
    all work and no pl
    Jack is no fun
    Jack is no Fun

    real    0m3.810s
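To rerun the comparison, something like the bash script below can be used. It is a sketch: the generated file's size and contents are made up, `grep -c` is used so the output stays short (counting rather than printing matches, which may shift timings slightly), and the numbers will depend on the grep version and locale rather than match the figures above.

```bash
#!/usr/bin/env bash
# Sketch for reproducing the comparison with bash's time keyword.
# The sample file (size and contents) is made up for illustration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/test.txt"
{
    yes 'nothing to see here' | head -n 100000
    yes 'Jack is no fun' | head -n 100000
    yes 'Jack is no Fun' | head -n 100000
} > "$f"

TIMEFORMAT='  real %3Rs'   # bash: print only the elapsed time
echo 'grep fun';               time grep -c fun "$f"
echo 'grep -i fun';            time grep -ci fun "$f"
echo 'LC_CTYPE=C grep -i fun'; time LC_CTYPE=C grep -ci fun "$f"
echo "grep '[Ff][Uu][Nn]'";    time grep -c '[Ff][Uu][Nn]' "$f"
```

The last three commands should report the same count on this ASCII-only file; only their timings are expected to differ.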

