Retrieve all hashtags from a tweet in a PHP function
I wrote my own solution. It:
- Finds all hashtags in a string
- Removes duplicates
- Sorts the hashtags by how often they occur in the text
- Supports Unicode characters
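function getHashtags($string) {
    $hashtags = FALSE;
    // With the /u modifier, \w also matches Unicode letters and digits.
    preg_match_all("/(#\w+)/u", $string, $matches);
    // $matches itself is never empty, so check the list of full matches.
    if (!empty($matches[0])) {
        // Counting occurrences also collapses duplicates into one key each.
        $hashtagsArray = array_count_values($matches[0]);
        // Sort by count, most frequent first.
        arsort($hashtagsArray);
        $hashtags = array_keys($hashtagsArray);
    }
    return $hashtags;
}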
Output is like this:
Array
(
    [0] => #_ƒOllOw_
    [1] => #FF
    [2] => #neslitükendi
    [3] => #F_0_L_L_O_W_
    [4] => #takipedeğerdost
    [5] => #GönüldenTakipleşiyorum
)
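To make the count-and-sort step easier to see, here is a minimal sketch that returns the counts themselves instead of only the tag names. The function name `countHashtags` and the sample string are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): maps each hashtag
// to the number of times it occurs, most frequent first.
function countHashtags(string $text): array
{
    preg_match_all('/#\w+/u', $text, $matches);
    // array_count_values() collapses duplicates and records how often each tag occurs.
    $counts = array_count_values($matches[0] ?? []);
    // arsort() orders by count, highest first, and keeps the tags as keys.
    arsort($counts);
    return $counts;
}

$counts = countHashtags('#regex is fun, #php too, and #php again');
print_r($counts);             // '#php' => 2 comes before '#regex' => 1
print_r(array_keys($counts)); // just the tag names, in the same order
```

Note that the comparison is case-sensitive, so #PHP and #php would be counted as two different tags.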
Don't forget about hashtags that contain Unicode characters, digits, and underscores:
$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!";
preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);
print_r($matches);
\p{Pc} - connector punctuation, which includes the underscore
\p{N} - numeric character in any script
\p{L} - letter from any language
\p{Mn} - any non-spacing mark (combining accents, umlauts, etc.)
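As a rough sketch, the same pattern can be wrapped in a small function that returns each tag once, without the leading #. The name `extractUnicodeHashtags` and the choice to deduplicate are assumptions added for illustration, not part of the snippet above.

```php
<?php
// Hypothetical wrapper (name and return shape are assumptions).
function extractUnicodeHashtags(string $text): array
{
    // Group 1 captures the tag body without the leading '#'.
    $pattern = '/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u';
    preg_match_all($pattern, $text, $matches);
    // array_unique() keeps the first occurrence of each tag;
    // array_values() reindexes the result from 0.
    return array_values(array_unique($matches[1] ?? []));
}

$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet! #hashtag";
print_r(extractUnicodeHashtags($tweet));
// The trailing "!" is not part of the match, and the repeated #hashtag appears once.
```

The full matches including the # are still available in $matches[0] if that form is preferred.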