
How to extract numbers from a string in Python?


If you want to extract only positive integers, try the following:

>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example: you don't need to import another module, and it's more readable because you don't need to parse (or learn) the regex mini-language.

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
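If you'd still rather avoid regex, here is a minimal sketch of one way to lift those limitations: try to convert each whitespace-separated token and keep whatever converts. The helper name `parse_number` is made up for illustration, and it assumes numbers are separated from other text by whitespace.

```python
def parse_number(token):
    """Return token as an int or float, or None if it is not a number.

    Hypothetical helper for illustration, not part of the answer above.
    """
    try:
        # Base 0 infers the base from the prefix, so "0x1f" and "-11" both work.
        return int(token, 0)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return None


txt = "h3110 23 cat 444.4 rabbit -11 0x1f dog"
numbers = [n for n in map(parse_number, txt.split()) if n is not None]
print(numbers)  # [23, 444.4, -11, 31]
```

Keep in mind that `float` also accepts tokens such as `nan` and `inf`, so you may want to filter those out if they can appear in your text.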


I'd use a regexp:

>>> import re
>>> re.findall(r'\d+', 'hello 42 I\'m a 32 string 30')
['42', '32', '30']

This would also match 42 from bla42bla. If you only want numbers delimited by word boundaries (space, period, comma), you can use \b:

>>> re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string 30')
['42', '32', '30']
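To make the difference between the two patterns concrete, here is a small side-by-side sketch on a made-up sentence. It also shows two things that are easy to miss: `\b` treats the underscore as a word character, so the 7 in `id_7` is skipped, and a period counts as a boundary, so `3.14` comes back as two separate matches.

```python
import re

# Sample text invented for this comparison.
text = "he33llo 42, pi is 3.14; id_7 and 30."

plain = re.findall(r"\d+", text)        # every run of digits
bounded = re.findall(r"\b\d+\b", text)  # only runs delimited by word boundaries

print(plain)    # ['33', '42', '3', '14', '7', '30']
print(bounded)  # ['42', '3', '14', '30']
```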

To end up with a list of numbers instead of a list of strings:

>>> [int(s) for s in re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string 30')]
[42, 32, 30]


This is more than a bit late, but you can extend the regex to account for scientific notation too.

import re

# Format is [(<string>, <expected output>), ...]
ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
       ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
      ('hello X42 I\'m a Y-32.35 string Z30',
       ['42', '-32.35', '30']),
      ('he33llo 42 I\'m a 32 string -30',
       ['33', '42', '32', '-30']),
      ('h3110 23 cat 444.4 rabbit 11 2 dog',
       ['3110', '23', '444.4', '11', '2']),
      ('hello 12 hi 89',
       ['12', '89']),
      ('4',
       ['4']),
      ('I like 74,600 commas not,500',
       ['74,600', '500']),
      ('I like bad math 1+2=.001',
       ['1', '+2', '.001'])]

for s, r in ss:
    # Raw string so that \d and \. reach the regex engine unescaped
    rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print('GOOD')
    else:
        print('WRONG', rr, 'should be', r)

Gives all good!
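The pattern returns strings, so if you want actual numbers you still have to convert them. A rough sketch of that step follows. The names `NUMBER_RE` and `extract_floats` are made up here, and it assumes commas are thousands separators that can simply be dropped before calling `float`.

```python
import re

# Same pattern as above, compiled once and written as a raw string.
NUMBER_RE = re.compile(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?")


def extract_floats(text):
    """Return the numbers found in text as floats (hypothetical helper)."""
    values = []
    for match in NUMBER_RE.findall(text):
        try:
            # Assumption: commas are thousands separators, so drop them.
            values.append(float(match.replace(",", "")))
        except ValueError:
            # The pattern can match oddities such as ".5.3"; skip those.
            continue
    return values


print(extract_floats("I like 74,600 commas and 1.5e3 or -.25"))
# [74600.0, 1500.0, -0.25]
```

Skipping matches that `float` rejects is just one choice; depending on your data you might prefer to raise instead, so that malformed input doesn't go unnoticed.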

Additionally, you can look at the AWS Glue built-in regex.

