Ignore case in Python strings [duplicate]

python string case-insensitive

Here is a benchmark showing that using str.lower is faster than the accepted answer's proposed method (libc.strcasecmp):

    #!/usr/bin/env python2.7
    import random
    import timeit
    from ctypes import CDLL

    libc = CDLL('libc.dylib')  # change to 'libc.so.6' on linux

    with open('/usr/share/dict/words', 'r') as wordlist:
        words = wordlist.read().splitlines()
    random.shuffle(words)
    print '%i words in list' % len(words)

    setup = 'from __main__ import words, libc; gc.enable()'
    stmts = [
        ('simple sort', 'sorted(words)'),
        ('sort with key=str.lower', 'sorted(words, key=str.lower)'),
        ('sort with cmp=libc.strcasecmp', 'sorted(words, cmp=libc.strcasecmp)'),
    ]

    for (comment, stmt) in stmts:
        t = timeit.Timer(stmt=stmt, setup=setup)
        print '%s: %.2f msec/pass' % (comment, (1000*t.timeit(10)/10))

Typical times on my machine:

    235886 words in list
    simple sort: 483.59 msec/pass
    sort with key=str.lower: 1064.70 msec/pass
    sort with cmp=libc.strcasecmp: 5487.86 msec/pass

So, the version with str.lower is not only the fastest case-insensitive sort by far (roughly five times faster than cmp=libc.strcasecmp in this run), but also the most portable and pythonic of all the proposed solutions here.

I have not profiled memory usage, but the original poster has still not given a compelling reason to worry about it. Also, who says that a call into the libc module doesn't duplicate any strings?
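One likely contributor to the gap, sketched below in Python 3 (an assumption on my part, since the benchmark above targets Python 2.7): a key function is called once per element, while a comparison function is called once per comparison. The helper name count_calls and the word list are made up for illustration, and functools.cmp_to_key stands in for the cmp= argument that Python 3 no longer accepts. This does not measure the ctypes call overhead, which probably matters as well.

```python
from functools import cmp_to_key


def count_calls(words):
    """Sort case-insensitively two ways and count the callback invocations."""
    key_calls = 0
    cmp_calls = 0

    def lower_key(s):
        nonlocal key_calls
        key_calls += 1
        return s.lower()

    def lower_cmp(a, b):
        nonlocal cmp_calls
        cmp_calls += 1
        a, b = a.lower(), b.lower()
        # Same contract as the old cmp= functions: negative, zero or positive.
        return (a > b) - (a < b)

    by_key = sorted(words, key=lower_key)
    by_cmp = sorted(words, key=cmp_to_key(lower_cmp))
    return by_key, by_cmp, key_calls, cmp_calls


# Hypothetical sample data, not the dictionary file used in the benchmark.
words = ["pear", "Apple", "banana", "Cherry", "apple", "Banana"]
by_key, by_cmp, key_calls, cmp_calls = count_calls(words)
print(by_key)
print("key calls:", key_calls, "cmp calls:", cmp_calls)
```

Both sorts give the same order; the key function runs exactly len(words) times, while the number of comparison calls grows faster than the list does.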

NB: The lower() string method also has the advantage of being locale-dependent, which is something you will probably not get right when writing your own "optimised" solution. Even so, due to bugs and missing features in Python, this kind of comparison may give you wrong results in a Unicode context.
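To make the Unicode caveat concrete, here is a small sketch that assumes Python 3.3 or later, where str.casefold() exists (it is not available in the Python 2 used above). The helper name equal_ignore_case is made up for illustration. It does not handle normalization of combining characters, so treat it as a starting point only.

```python
def equal_ignore_case(a, b):
    """Caseless equality using str.casefold(), which is more aggressive than lower()."""
    return a.casefold() == b.casefold()


# The German sharp s is already lowercase, so lower() leaves it untouched
# and the two spellings of the same word do not match.
print("straße".lower() == "STRASSE".lower())
# casefold() maps the sharp s to "ss", so this comparison matches.
print(equal_ignore_case("straße", "STRASSE"))
```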

Your question implies that you don't need Unicode. Try the following code snippet; if it works for you, you're done:

    Python 2.5.2 (r252:60911, Aug 22 2008, 02:34:17)
    [GCC 4.3.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.setlocale(locale.LC_COLLATE, "en_US")
    'en_US'
    >>> sorted("ABCabc", key=locale.strxfrm)
    ['a', 'A', 'b', 'B', 'c', 'C']
    >>> sorted("ABCabc", cmp=locale.strcoll)
    ['a', 'A', 'b', 'B', 'c', 'C']

Clarification: in case it is not obvious at first sight, locale.strcoll seems to be the function you need, since it compares the strings directly and avoids the "duplicate" strings that str.lower or locale.strxfrm create.
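For anyone on Python 3, where sorted() no longer takes cmp=, the same idea could look roughly like the sketch below. It wraps locale.strcoll with functools.cmp_to_key. The locale name "en_US.UTF-8" and the helper names are my assumptions; the locale may not be installed on your system, in which case the sketch keeps whatever collation is already active and the resulting order will differ from the session above.

```python
import locale
from functools import cmp_to_key


def set_collation(name="en_US.UTF-8"):
    """Try to switch LC_COLLATE; return False if the locale is not installed."""
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        return False


def sort_with_strxfrm(items):
    # Builds one transformed key per item.
    return sorted(items, key=locale.strxfrm)


def sort_with_strcoll(items):
    # Compares pairs directly; cmp_to_key adapts the old cmp-style function.
    return sorted(items, key=cmp_to_key(locale.strcoll))


if not set_collation():
    print("en_US.UTF-8 not available, using the current collation instead")
print(sort_with_strxfrm("ABCabc"))
print(sort_with_strcoll("ABCabc"))
```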

Are you using this compare in a very-frequently-executed path of a highly-performance-sensitive application? Alternatively, are you running this on strings which are megabytes in size? If not, then you shouldn't worry about the performance and just use the .lower() method.

The following code demonstrates that doing a case-insensitive compare by calling .lower() on two strings which are each almost a megabyte in size takes about 0.009 seconds on my 1.8GHz desktop computer:

    from timeit import Timer

    s1 = "1234567890" * 100000 + "a"
    s2 = "1234567890" * 100000 + "B"
    code = "s1.lower() < s2.lower()"
    time = Timer(code, "from __main__ import s1, s2").timeit(1000)
    print time / 1000   # 0.00920499992371 on my machine
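If you want to repeat the measurement on Python 3 (an assumption, the timing code above is Python 2), a rough equivalent might look like this. The function names less_ignore_case and seconds_per_compare are made up, and the number of repetitions is kept small so it finishes quickly. I am not quoting a timing because it depends entirely on your machine and interpreter.

```python
from timeit import timeit

# Two strings of almost a megabyte each that differ only in the last character.
s1 = "1234567890" * 100000 + "a"
s2 = "1234567890" * 100000 + "B"


def less_ignore_case(a, b):
    """Case-insensitive 'less than' using the plain .lower() approach."""
    return a.lower() < b.lower()


def seconds_per_compare(number=100):
    """Average wall-clock seconds for one case-insensitive comparison."""
    total = timeit(lambda: less_ignore_case(s1, s2), number=number)
    return total / number


print(less_ignore_case(s1, s2))
print(seconds_per_compare(), "seconds per comparison")
```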

If indeed this is an extremely significant, performance-critical section of code, then I recommend writing a function in C and calling it from your Python code, since that will allow you to do a truly efficient case-insensitive search. Details on writing C extension modules can be found here: https://docs.python.org/extending/extending.html
