removing emojis from a string in Python


On Python 2, you have to use a u'' literal to create a Unicode string. You should also pass the re.UNICODE flag and convert your input data to Unicode first (e.g., text = data.decode('utf-8')):

#!/usr/bin/env python
import re

text = u'This dog \U0001f602'
print(text)  # with emoji

emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           "]+", flags=re.UNICODE)
print(emoji_pattern.sub(r'', text))  # no emoji

Output

This dog 😂
This dog 
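The decoding step mentioned above can be sketched as follows. This is a minimal illustration on Python 3, assuming the input arrives as UTF-8 encoded bytes; the `raw` value and the `strip_emoji` helper name are hypothetical and not part of the answer.

```python
import re

# Same four ranges as in the answer above.
emoji_pattern = re.compile(
    u"["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"]+",
    flags=re.UNICODE,
)


def strip_emoji(raw):
    """Decode UTF-8 bytes (hypothetical input) and drop the matched emoji."""
    text = raw.decode("utf-8")  # bytes -> Unicode string
    return emoji_pattern.sub(u"", text)


raw = b"This dog \xf0\x9f\x98\x82"  # UTF-8 bytes for u'This dog \U0001f602'
print(repr(strip_emoji(raw)))
```

Decoding before matching matters because the pattern is written in terms of code points, not bytes.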

Note: emoji_pattern matches only some emoji (not all). See Which Characters are Emoji.
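To see what "only some emoji" means in practice, here is a small sketch that runs the same four ranges against a few sample characters. The sample characters are my own choice for illustration, not taken from the answer.

```python
import re

# Same four ranges as in the answer above.
emoji_pattern = re.compile(
    u"["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"]+",
    flags=re.UNICODE,
)

# Illustrative samples (assumed, not from the original answer).
samples = [
    ("grinning face", u"\U0001F600"),  # inside the emoticons range
    ("face palm", u"\U0001F926"),      # outside all four ranges
    ("sun", u"\u2600"),                # outside all four ranges
]

for name, ch in samples:
    removed = emoji_pattern.sub(u"", ch) == u""
    print(name, "removed" if removed else "kept")
```

Anything outside the listed ranges passes through untouched, so the ranges have to be extended for each additional block of emoji you care about.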


I am updating my answer to follow the one by @jfs, because my previous answer failed to account for non-ASCII text in other scripts and alphabets, such as accented Latin or Greek characters. Stack Overflow doesn't allow me to delete my previous answer, so I am updating it to match the most accepted answer to the question.

#!/usr/bin/env python
import re

text = u'This is a smiley face \U0001f602'
print(text)  # with emoji

def deEmojify(text):
    regrex_pattern = re.compile(pattern="["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           "]+", flags=re.UNICODE)
    return regrex_pattern.sub(r'', text)

print(deEmojify(text))

This was my previous answer. Do not use it: it strips every non-ASCII character, not just emoji.

def deEmojify(inputString):
    return inputString.encode('ascii', 'ignore').decode('ascii')
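A short sketch of why the ASCII round trip is a problem. The sample strings are assumptions chosen for illustration: one with an accented Latin letter and one in Greek.

```python
def deEmojify(inputString):
    # The previous approach: silently drop every character that is not ASCII.
    return inputString.encode('ascii', 'ignore').decode('ascii')


# The emoji is removed, but so is the accented letter.
print(repr(deEmojify(u'caf\u00e9 \U0001f602')))

# A purely Greek word disappears entirely.
print(repr(deEmojify(u'\u0393\u03b5\u03b9\u03b1')))
```

The function cannot tell emoji apart from any other non-ASCII character, which is why the regex-based version replaced it.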


Complete version for removing emojis
✍ 🌷 📌 👈🏻 🖥

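import re

def remove_emojis(data):
    emoj = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002500-\U00002BEF"  # box drawing through misc symbols and arrows
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # very broad: also covers CJK, Hangul, and other non-emoji text
        u"\U0001f926-\U0001f937"  # face palm through shrug
        u"\U00010000-\U0010ffff"  # every character outside the Basic Multilingual Plane
        u"\u2640-\u2642"  # female sign through male sign
        u"\u2600-\u2B55"  # misc symbols through heavy large circle
        u"\u200d"  # zero width joiner
        u"\u23cf"  # eject symbol
        u"\u23e9"  # fast-forward
        u"\u231a"  # watch
        u"\ufe0f"  # variation selector-16
        u"\u3030"  # wavy dash
                      "]+", re.UNICODE)
    return re.sub(emoj, '', data)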
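One caveat worth checking before adopting the complete version: two of its ranges are wide enough to match ordinary text in some scripts. The sketch below isolates only those two ranges to show the effect. The `broad` name and the sample strings are hypothetical and chosen for illustration.

```python
import re

# Only the two widest ranges from remove_emojis, isolated for illustration.
broad = re.compile(
    u"["
    u"\U000024C2-\U0001F251"  # spans most of the BMP above U+24C2, including CJK and Hangul
    u"\U00010000-\U0010ffff"  # every supplementary-plane character
    u"]+",
    re.UNICODE,
)

samples = [
    u"\u4f60\u597d",  # Chinese greeting
    u"\uc548\ub155",  # Korean greeting
    u"caf\u00e9",     # accented Latin, below U+24C2
    u"\U0001f602",    # face with tears of joy
]

for s in samples:
    print(repr(s), "->", repr(broad.sub(u"", s)))
```

If text in those scripts has to survive, a narrower set of ranges would be needed. Which ranges to keep depends on the data, and the answer above does not settle that.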