How to account for accent characters for regex in Python?

python regex django hashtag non-ascii-characters

Try the following:

hashtags = re.findall(r'#(\w+)', str1, re.UNICODE)

Regex101 Demo

EDIT: Check the useful comment below from Martijn Pieters.
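As a rough illustration of why the flag matters, the sketch below runs the same pattern with and without Unicode-aware matching. It assumes Python 3, where str patterns are already Unicode-aware and re.UNICODE is redundant, so re.ASCII is used to show the failing case. The contents of str1 are made up for the example; the original input is not shown in the question.

```python
import re

# Hypothetical input; the original question's string is not shown here.
str1 = "Trying #yogenfrüz and #café_au_lait today"

# Python 3: str patterns are Unicode-aware by default, so \w matches ü and é.
unicode_tags = re.findall(r'#(\w+)', str1)

# re.ASCII restricts \w to [a-zA-Z0-9_], so each match stops at the first accent.
ascii_tags = re.findall(r'#(\w+)', str1, re.ASCII)

print(unicode_tags)
print(ascii_tags)
```

On Python 2 the situation is reversed: the pattern only matches accented letters if str1 is a unicode object and re.UNICODE is passed, as in the line above.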


You may also want to use

import unicodedata
output = unicodedata.normalize('NFD', my_unicode).encode('ascii', 'ignore')

How do I convert all those escape characters into their respective characters? For example, if there is a Unicode à, how do I convert that into a standard a?

Assume you have loaded your Unicode string into a variable called my_unicode. Normalizing à into a is this simple:

import unicodedata
output = unicodedata.normalize('NFD', my_unicode).encode('ascii', 'ignore')

Explicit example:

>>> myfoo = u'àà'
>>> myfoo
u'\xe0\xe0'
>>> unicodedata.normalize('NFD', myfoo).encode('ascii', 'ignore')
'aa'

Check this answer, it helped me a lot: How to convert unicode accented characters to pure ascii without accents?
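The session above looks like Python 2 output. A minimal Python 3 sketch of the same idea might look like the following; strip_accents and the sample text are hypothetical names chosen for the example, not part of the original answer. In Python 3, encode() returns bytes, so the result is decoded back to str before the regex is applied.

```python
import re
import unicodedata

def strip_accents(text):
    """Hypothetical helper: drop accents by decomposing and discarding non-ASCII."""
    # NFD splits 'ü' into 'u' plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize('NFD', text)
    # Encoding with 'ignore' discards the combining marks; decode back to str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# Made-up sample input for illustration.
plain = strip_accents("Trying #yogenfrüz today")
hashtags = re.findall(r'#(\w+)', plain)

print(plain)
print(hashtags)
```

One caveat with this approach: characters that have no decomposition, such as ø or ß, are removed entirely rather than replaced with a base letter, so it changes the hashtag text and is best used only when an ASCII-only result is actually wanted.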


I know this question is a little outdated, but you may also consider adding the range of accented characters from À (code point 192) to ÿ (code point 255) to your original regex.

hashtags = re.findall(r'#([A-Za-z0-9_À-ÿ]+)', str1)

which will return ['yogenfrüz']

Hope this'll help anyone else.
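One thing worth knowing about the À-ÿ range is that it also covers two non-letter symbols, × (U+00D7) and ÷ (U+00F7). The sketch below compares the range as written with a narrower variant that skips those two code points. The narrower pattern and the sample string are assumptions added for illustration, not part of the original answer.

```python
import re

# The range from the answer: every code point from U+00C0 (À) to U+00FF (ÿ).
broad = re.compile(r'#([A-Za-z0-9_À-ÿ]+)')

# Assumed refinement: split the range to leave out U+00D7 (×) and U+00F7 (÷).
letters_only = re.compile(r'#([A-Za-z0-9_À-ÖØ-öø-ÿ]+)')

# Made-up sample input for illustration.
sample = "#yogenfrüz #3×4"

broad_tags = broad.findall(sample)
narrow_tags = letters_only.findall(sample)

print(broad_tags)
print(narrow_tags)
```

Both patterns only cover the Latin-1 block, so letters outside it (for example ő or ş) are still missed; \w with Unicode matching handles those cases.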
