Curly quotes, or “smart quotes,” generated by Microsoft Word and other applications can be a real headache for developers. If you’ve built an administration area for your content publishers, and they frequently compose their posts in Word and then copy and paste them into your form to publish to the web, you may find that the curly quotes are replaced by your browser’s symbol for an unrecognized character, often a question mark. This is particularly frustrating when Word-generated characters such as curly quotes or em dashes break XML feeds generated from that content, even after you’ve been careful to convert the “normal” HTML special characters so that your XML would be valid. Fortunately, there is an easy workaround.


Rather than try to convince your publishers to stop using Word to compose their content, the easier (and more effective) solution will be to replace the curly quotes with “normal” quotes before the data is inserted into the database.

The function below converts curly quotes and em dashes into standard quotes and hyphens (“-”). If you’ve got a handful of classes or functions that you routinely use as part of your data scrubbing process (to clean data before it gets sent to the server), you may want to include this function in that group. That way you never have to think about it again.

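function convert_smart_quotes($string)
{
    // Single-byte Windows-1252 codes for Word's curly punctuation.
    // This matches raw bytes, so it is meant for Windows-1252 input.
    $search = array(chr(145),  // left single quote
                    chr(146),  // right single quote
                    chr(147),  // left double quote
                    chr(148),  // right double quote
                    chr(151)); // em dash

    // Plain ASCII replacements, in the same order as $search.
    $replace = array("'",
                     "'",
                     '"',
                     '"',
                     '-');

    return str_replace($search, $replace, $string);
}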
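
As a rough usage sketch, you would call the function on the submitted text before it goes anywhere near the database. The `$post_body` value below is a made-up stand-in for form input, built from Windows-1252 bytes, and the function is repeated in compact form only so the snippet runs on its own.

```php
<?php
// Compact copy of the function above so this snippet is self-contained.
function convert_smart_quotes($string)
{
    $search  = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');

    return str_replace($search, $replace, $string);
}

// Hypothetical form input, simulated as Windows-1252 bytes:
// chr(147)/chr(148) are curly double quotes, chr(146) is a curly
// apostrophe, and chr(151) is an em dash.
$post_body = chr(147) . 'Hello' . chr(148) . ' it' . chr(146) . 's here'
           . chr(151) . 'finally';

// Clean the text first, then hand $clean_body to your insert query.
$clean_body = convert_smart_quotes($post_body);

echo $clean_body, PHP_EOL; // prints: "Hello" it's here-finally
```

Note that this only catches the single-byte Windows-1252 forms. Text that arrives as UTF-8 stores the same characters as three-byte sequences, which this search list will not match.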

snipe

I’m a tech geek/dev/infosec-nerd/scuba diver/blacksmith/sword-fighter/crime fighter/ENTP/warcrafter/activist. I'm the CTO at Mass Mosaic and the CEO of Grokability, Inc. in San Diego, CA. Tweet at me @snipeyhead.

  • Couldn’t you also just make sure you’re using Unicode? If you’re both receiving data in Unicode and putting it out to the browser in Unicode, then everything should work properly. (It’s possible I’m misunderstanding how this works, since I’m no expert on Unicode).

  • You probably could, but in some cases (existing system, etc.) there may be reasons why you can’t just switch to Unicode. Also, for me, I prefer to clean out characters like that before the data hits the database. If the system changes, or we decide to no longer show curly quotes (but now have a database full of them), this can create problems. I’m a bit of a purist in that respect – I’d rather have clean data and then decide how I want to display it, so that it’s standard regardless of the encoding, style, etc.
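
    For anyone who does want to go the Unicode route discussed above, here is a minimal sketch of normalizing incoming text to UTF-8. It assumes the mbstring extension is available and that any input which is not already valid UTF-8 is Windows-1252, which will not hold for every system. The helper name `to_utf8` is made up for this example.

```php
<?php
// Hypothetical helper: normalize incoming text to UTF-8.
// Assumption: anything that is not valid UTF-8 is Windows-1252.
function to_utf8($string)
{
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string; // already valid UTF-8 (plain ASCII included)
    }

    return mb_convert_encoding($string, 'UTF-8', 'Windows-1252');
}

// Simulated Word paste in Windows-1252: curly double quotes around "Hi".
$from_word = chr(147) . 'Hi' . chr(148);
$utf8      = to_utf8($from_word);

// Show the bytes: each curly quote is now a three-byte UTF-8 sequence.
echo bin2hex($utf8), PHP_EOL; // prints: e2809c4869e2809d
```

    The page and the database connection would also have to declare UTF-8 for the characters to display correctly end to end.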

  • hel

    Great Post

  • Don De.

    I don’t get it. How do I use this function? Do I use it on the page where the text is? Do I say “” etc. Sorry I’m so stupid.

    Don

  • @Don – you could either include the function on the same page as the script where your text would be displayed, or within a functions file that you include in that file. Check out this tutorial on using functions in PHP for more info: http://www.php-mysql-tutorial.com/wikis/php-tutorial/php-functions.aspx

  • Scott

    There's no reason to stop with just smart quotes. You can fix all manner of illegal characters with the following:

    $allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
    $noTags = array_diff($allEntities, $specialEntities);

    And, that will leave tags alone, assuming that's what you want.
    $valid = strtr($invalid, $noTags);

  • Iuaspel

    Thanks. But for me it doesn’t work.

    I get “ before “ or “ (left double quotation mark) and ” after ” or ” (right double quotation mark).

    This function works well for me:

    Adapted from http://www.toao.net/48-replacing-smart-quotes-and-em-dashes-in-mysql
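
    The commenter's code did not survive on this page. Stray characters around the quotes usually mean the text arrived as UTF-8, where each curly quote is a three-byte sequence that the `chr()` codes in the post never match. The following is a sketch of a UTF-8 variant under that assumption. It is not the commenter's original function, and the name `convert_smart_quotes_utf8` is made up here.

```php
<?php
// Sketch of a UTF-8 variant of the post's function (hypothetical name).
// Each key is the UTF-8 byte sequence for one curly character.
function convert_smart_quotes_utf8($string)
{
    $map = array(
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x94" => '-', // U+2014 em dash
    );

    // strtr() with an array replaces each key with its value in one pass.
    return strtr($string, $map);
}

// Simulated UTF-8 input containing curly quotes and an em dash.
$input = "\xE2\x80\x9CQuoted\xE2\x80\x9D \xE2\x80\x94 it\xE2\x80\x99s fine";

echo convert_smart_quotes_utf8($input), PHP_EOL; // prints: "Quoted" - it's fine
```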

  • Justwrote

    If the code doesn’t appear well you can copy it from: http://notepub.com/#note=197525

    Just remove the blank space after the first character (the less-than sign, which looks like an angle bracket).

    • Just so you know, using a fake email address means I can’t whitelist you so your comments can’t be automatically approved. I don’t much care one way or another, but I thought you should know.

  • Justwrittennow

    This simpler version also does it: http://notepub.com/#note=197532

  • Falyuteis

    This can also be done on the client side with JavaScript, as shown at http://www.kevinkorb.com/post/37

    The problem is that many blogs and other websites do the opposite: you enter text with the keyboard's default straight quotes, and on submission they are converted to curly ones (for example, with http://pastebin.com/CEK0NN43 ). If the submitted text is computer code, you normally have to make the quotes straight again for the code to work.

    On the client side, that opposite conversion is shown in http://stackoverflow.com/questions/2202811/converting-straight-quotes-to-curly-quotes

  • Aslasfweutt

    I’ve copied the code that worked for me: https://pzt.me/9l95

  • Monty

    Thank you so much for this. Looked high and low for a fix for this troublesome problem and yours worked like a charm. God bless!