Curly quotes, or “smart quotes,” generated by Microsoft Word and other applications can be a real headache for developers. If you’ve built an administration area for your content publishers, and they frequently compose their posts in Word and then copy and paste them into your form to publish to the web, you may find the curly quotes replaced by your browser’s symbol for an unrecognized character, often a question mark. This is particularly frustrating when Word-generated characters such as curly quotes or em dashes break XML feeds generated from your content, even after you’ve been careful to convert the “normal” HTML special characters so that your XML would be valid. Fortunately, there is an easy workaround.
Rather than trying to convince your publishers to stop using Word to compose their content, the easier (and more effective) solution is to replace the curly quotes with “normal” quotes before the data is inserted into the database.
The function below converts curly quotes and em dashes into standard quotes and hyphens (-). It matches the characters by their Windows-1252 byte values: 145 through 148 for the quotes and 151 for the em dash. If you’ve got a handful of classes or functions that you routinely use as part of your data scrubbing process (to clean data before it gets sent to the server), you may want to include this function in that group so that you never have to think about it again.
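[sourcecode language='php']
function convert_smart_quotes($string) {
    // Windows-1252 byte values of the characters Word inserts
    $search = array(chr(145),  // left single quote
                    chr(146),  // right single quote
                    chr(147),  // left double quote
                    chr(148),  // right double quote
                    chr(151)); // em dash
    // Plain ASCII equivalents, in the same order as $search
    $replace = array("'",
                     "'",
                     '"',
                     '"',
                     '-');
    return str_replace($search, $replace, $string);
}
[/sourcecode]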
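One caveat: the byte values above only match when the submitted text is encoded as Windows-1252. If your form and database use UTF-8, each smart character arrives as a three-byte sequence instead, and a single-byte replacement can even damage other characters (for example, the second byte of a UTF-8 “Ñ” is 145). Below is a minimal sketch of a UTF-8 variant, assuming your input really is UTF-8. The name `convert_smart_quotes_utf8` is hypothetical and not part of the original function.

```php
<?php
// Hypothetical UTF-8 companion to convert_smart_quotes().
// Assumes $string is UTF-8 encoded; each key is the three-byte
// UTF-8 sequence for one of Word's "smart" characters.
function convert_smart_quotes_utf8($string) {
    $map = array(
        "\xE2\x80\x98" => "'", // U+2018 left single quote
        "\xE2\x80\x99" => "'", // U+2019 right single quote
        "\xE2\x80\x9C" => '"', // U+201C left double quote
        "\xE2\x80\x9D" => '"', // U+201D right double quote
        "\xE2\x80\x94" => '-', // U+2014 em dash
    );
    // strtr() with an array replaces every key with its value in one pass
    return strtr($string, $map);
}

// Usage: the escaped bytes spell out curly-quoted text with an em dash
echo convert_smart_quotes_utf8("\xE2\x80\x9CHello\xE2\x80\x9D \xE2\x80\x94 it\xE2\x80\x99s me"), "\n";
// prints: "Hello" - it's me
```

Which version you need depends on the character encoding your form submits, so it is worth checking that before adding either function to your scrubbing routine.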