
How to replace Microsoft-encoded quotes in PHP


I have found an answer to this question. You need just one line of code, using PHP's iconv() function:

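    // Replace the Microsoft Word versions of single and double quotation marks (“ ” ‘ ’)
    // with regular quotes (' and ")
    $output = iconv('UTF-8', 'ASCII//TRANSLIT', $input);

Note that ASCII//TRANSLIT transliterates every non-ASCII character, not only the quotes, and that the exact result depends on the system's iconv implementation (with glibc, also on the current LC_CTYPE locale, which you may need to set with setlocale()).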


Considering you only want to replace a few specific, well-identified characters, I would go for str_replace with an array: you obviously don't need the heavy artillery a regex would bring ;-)

And if you encounter other special characters (damn copy-paste from Microsoft Word...), you can just add them to that array as you identify them.
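A minimal sketch of that str_replace approach, assuming the input is UTF-8. The function name normalize_word_quotes is hypothetical, and only the four curly quotes are covered. The two arrays are matched by position, so each new character goes at the same index in both.

```php
<?php
// Hypothetical helper: replace the curly quotes Word inserts with plain ASCII quotes.
// Assumes $text is UTF-8; the "\u{...}" escapes (PHP 7+) produce UTF-8 bytes.
function normalize_word_quotes(string $text): string
{
    $search = [
        "\u{2018}", // ‘ left single quotation mark
        "\u{2019}", // ’ right single quotation mark
        "\u{201C}", // “ left double quotation mark
        "\u{201D}", // ” right double quotation mark
    ];
    $replace = ["'", "'", '"', '"']; // same order as $search

    return str_replace($search, $replace, $text);
}

echo normalize_word_quotes("\u{201C}It\u{2019}s done\u{201D}"), "\n"; // prints "It's done"
```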


The best answer I can give to your comment is probably this link: Convert Smart Quotes with PHP

And the associated code (quoting that page):

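    function convert_smart_quotes($string) {
        // Single-byte codes in Windows-1252 (cp1252):
        $search = array(chr(145),  // ‘ left single quote
                        chr(146),  // ’ right single quote
                        chr(147),  // “ left double quote
                        chr(148),  // ” right double quote
                        chr(151)); // em dash (U+2014)

        $replace = array("'",
                         "'",
                         '"',
                         '"',
                         '-');

        return str_replace($search, $replace, $string);
    }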

(I don't have Microsoft Word on this computer, so I can't test it myself.)

I don't remember exactly what we used at work (I was not the one who had to deal with that kind of input), but it was something along those lines...
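The function above compares single bytes, so it only works when the input is Windows-1252 (cp1252), where bytes 0x91 to 0x94 are the curly quotes and 0x97 is the em dash. In UTF-8 the same characters are three-byte sequences that it never matches, and some UTF-8 sequences even contain those byte values (the en dash U+2013 is E2 80 93), so a byte-level replacement can corrupt them. A minimal sketch of that difference, using a hypothetical helper cp1252_quotes_to_ascii that applies the same mapping through strtr:

```php
<?php
// Hypothetical helper: the same byte mapping as convert_smart_quotes(), written with strtr().
// Only meaningful for Windows-1252 (cp1252) encoded input.
function cp1252_quotes_to_ascii(string $s): string
{
    return strtr($s, [
        "\x91" => "'", // ‘ in Windows-1252
        "\x92" => "'", // ’
        "\x93" => '"', // “
        "\x94" => '"', // ”
        "\x97" => '-', // em dash (U+2014)
    ]);
}

// The same text, “Hi” it’s me, in two encodings.
$cp1252 = "\x93Hi\x94 it\x92s me";
$utf8   = "\u{201C}Hi\u{201D} it\u{2019}s me";

echo cp1252_quotes_to_ascii($cp1252), "\n"; // prints "Hi" it's me

// UTF-8 quotes are three-byte sequences that contain none of the mapped bytes.
echo var_export(cp1252_quotes_to_ascii($utf8) === $utf8, true), "\n"; // prints true

// The UTF-8 en dash (E2 80 93) ends in 0x93, so its last byte gets replaced.
echo bin2hex(cp1252_quotes_to_ascii("\u{2013}")), "\n"; // prints e28022
```

If your input may be either encoding, detect or convert it first rather than running the byte mapping blindly.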


Your Microsoft-encoded quotes are probably the typographic quotation marks. You can simply replace them with str_replace if you know the encoding of the string in which you want to replace them.

Here’s an example for UTF-8, using a single mapping array with strtr instead:

    $quotes = array(
        "\xC2\xAB"     => '"', // « (U+00AB) in UTF-8
        "\xC2\xBB"     => '"', // » (U+00BB) in UTF-8
        "\xE2\x80\x98" => "'", // ‘ (U+2018) in UTF-8
        "\xE2\x80\x99" => "'", // ’ (U+2019) in UTF-8
        "\xE2\x80\x9A" => "'", // ‚ (U+201A) in UTF-8
        "\xE2\x80\x9B" => "'", // ‛ (U+201B) in UTF-8
        "\xE2\x80\x9C" => '"', // “ (U+201C) in UTF-8
        "\xE2\x80\x9D" => '"', // ” (U+201D) in UTF-8
        "\xE2\x80\x9E" => '"', // „ (U+201E) in UTF-8
        "\xE2\x80\x9F" => '"', // ‟ (U+201F) in UTF-8
        "\xE2\x80\xB9" => "'", // ‹ (U+2039) in UTF-8
        "\xE2\x80\xBA" => "'", // › (U+203A) in UTF-8
    );
    $str = strtr($str, $quotes);

If you need another encoding, you can use mb_convert_encoding to convert the keys.
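A minimal sketch of converting the keys, assuming the mbstring extension is loaded; convert_quote_map is a hypothetical helper name. Not every quote exists in every encoding (Windows-1252 has no ‛ or ‟, and ISO-8859-1 has only « and »), and mb_convert_encoding substitutes characters it cannot map (with ? by default). The sketch therefore keeps only keys that survive a round trip back to UTF-8; otherwise a stray ? key would make strtr rewrite ordinary question marks.

```php
<?php
// Hypothetical helper: re-encode the keys of a UTF-8 replacement map for another encoding.
// Requires the mbstring extension.
function convert_quote_map(array $utf8Map, string $encoding): array
{
    $converted = [];
    foreach ($utf8Map as $utf8Key => $replacement) {
        $key = mb_convert_encoding($utf8Key, $encoding, 'UTF-8');
        // Skip characters the target encoding cannot represent: mb_convert_encoding
        // substitutes them (with "?" by default), and a "?" key would make strtr()
        // rewrite ordinary question marks.
        if (mb_convert_encoding($key, 'UTF-8', $encoding) !== $utf8Key) {
            continue;
        }
        $converted[$key] = $replacement;
    }
    return $converted;
}

// A few entries from the UTF-8 map above.
$utf8Quotes = [
    "\u{2018}" => "'", // ‘
    "\u{201B}" => "'", // ‛ (not in Windows-1252, so it is skipped)
    "\u{201C}" => '"', // “
    "\u{201D}" => '"', // ”
];

$cp1252Quotes = convert_quote_map($utf8Quotes, 'Windows-1252');
echo strtr("\x93Really?\x94", $cp1252Quotes), "\n"; // prints "Really?"
```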