What is the best way to split a string into an array of Unicode characters in PHP?


You could use the 'u' modifier with a PCRE regex; see Pattern Modifiers (quoting):

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

For instance, consider this code:

header('Content-type: text/html; charset=UTF-8');  // So the browser doesn't make our lives harder
$str = "abc 文字化け, efg";
$results = array();
preg_match_all('/./', $str, $results);
var_dump($results[0]);

You'll get an unusable result:

array
  0 => string 'a' (length=1)
  1 => string 'b' (length=1)
  2 => string 'c' (length=1)
  3 => string ' ' (length=1)
  4 => string '�' (length=1)
  5 => string '�' (length=1)
  6 => string '�' (length=1)
  7 => string '�' (length=1)
  8 => string '�' (length=1)
  9 => string '�' (length=1)
  10 => string '�' (length=1)
  11 => string '�' (length=1)
  12 => string '�' (length=1)
  13 => string '�' (length=1)
  14 => string '�' (length=1)
  15 => string '�' (length=1)
  16 => string ',' (length=1)
  17 => string ' ' (length=1)
  18 => string 'e' (length=1)
  19 => string 'f' (length=1)
  20 => string 'g' (length=1)

But with this code:

header('Content-type: text/html; charset=UTF-8');  // So the browser doesn't make our lives harder
$str = "abc 文字化け, efg";
$results = array();
preg_match_all('/./u', $str, $results);
var_dump($results[0]);

(Notice the 'u' at the end of the regex.)

You get what you want:

array
  0 => string 'a' (length=1)
  1 => string 'b' (length=1)
  2 => string 'c' (length=1)
  3 => string ' ' (length=1)
  4 => string '文' (length=3)
  5 => string '字' (length=3)
  6 => string '化' (length=3)
  7 => string 'け' (length=3)
  8 => string ',' (length=1)
  9 => string ' ' (length=1)
  10 => string 'e' (length=1)
  11 => string 'f' (length=1)
  12 => string 'g' (length=1)
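To see why the two outputs differ, it may help to look at a single character. The sketch below is only an illustration (the variable names are made up, and it assumes the script file itself is saved as UTF-8): without 'u' the dot matches one byte at a time, while with 'u' it matches one whole code point.

```php
<?php
// Assumes this file is saved as UTF-8, so "文" is stored as three bytes.
$char = "文";

echo bin2hex($char), "\n";                        // e69687: the three UTF-8 bytes
echo strlen($char), "\n";                         // 3: strlen() counts bytes
echo preg_match_all('/./', $char, $m), "\n";      // 3: one match per byte
echo preg_match_all('/./u', $char, $m), "\n";     // 1: one match per code point
```

That is where the twelve broken entries in the first dump come from: four characters of three bytes each.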

Hope this helps :-)


Slightly simpler than preg_match_all:

preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY)

This returns a one-dimensional array of characters, so there is no need for a separate matches array.
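As a rough sketch of how that call could be used (the sample string and variable names are just for illustration, and the file is assumed to be saved as UTF-8):

```php
<?php
// Assumes this file is saved as UTF-8.
$str = "abc 文字化け";

// The empty pattern matches at every character boundary; with 'u' those
// boundaries lie between UTF-8 code points rather than between bytes.
// PREG_SPLIT_NO_EMPTY drops the empty strings produced at both ends.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo count($chars), "\n";          // 8
echo implode('|', $chars), "\n";   // a|b|c| |文|字|化|け
```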


Try this:

preg_match_all('/./u', $text, $array);
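Two things are worth knowing when using this one-liner; the sketch below shows both, with made-up sample input and assuming a reasonably recent PHP version. First, the characters land in $array[0], and '.' skips newlines unless the 's' modifier is added. Second, with 'u' the call fails outright if the input is not valid UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8.
$text = "文字\n化け";

// The characters end up in $array[0], the list of full-pattern matches.
$count = preg_match_all('/./u', $text, $array);
var_dump($count);   // int(4): '.' does not match the newline by default

// Adding the 's' (dotall) modifier makes '.' match newlines too.
$count = preg_match_all('/./su', $text, $array);
var_dump($count);   // int(5)

// With 'u', a subject that is not valid UTF-8 makes the call fail:
// the byte 0xFF can never appear in well-formed UTF-8.
$result = preg_match_all('/./u', "abc\xFF", $ignored);
if ($result === false && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    echo "Input is not valid UTF-8\n";
}
```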