While working on a client project, I ended up having to send HTML email notifications to users. During testing, I discovered some stray characters at the beginning of the email.
If you have to send HTML email, there are certainly lots of good libraries available for download to make your life easier. I opted to use PHPMailer by Worx International, since it’s one I have used in the past, and getting up and running with it takes about 3 minutes, tops.
What makes it particularly nice is that you don’t have to do anything special to get your HTML ready for mailing. If your workflow is anything like mine, you design the email, code it in HTML, and then post it somewhere for the client to review. Because that HTML file already exists on your server, using PHPMailer is a breeze: all you have to do to start sending HTML email is point the program at the file you already created.
After you upload the PHPMailer library, all it takes is a few lines of code, and you’re done:
[sourcecode lang=php]
require_once $_SERVER['DOCUMENT_ROOT'] . '/phpMailer/class.phpmailer.php';

// $from_email, $from_name, $to_email and $to_name are assumed to be set earlier.
$mail = new PHPMailer(); // defaults to using PHP's mail()

// Read in the HTML file that was already built for the client to review.
$body = $mail->getFile('../invites/email.html');

$mail->From     = $from_email;
$mail->FromName = $from_name;
$mail->Subject  = "Test email subject";
$mail->AltBody  = "To view the message, please use an HTML compatible email viewer!";
$mail->MsgHTML($body);
$mail->AddAddress($to_email, $to_name);

if (!$mail->Send()) {
    echo "Mailer Error: " . $mail->ErrorInfo;
} else {
    echo "Message sent!";
}
[/sourcecode]
While sending out the test emails, I noticed weird characters (specifically ï»¿) showing up at the very beginning of the HTML email, even though there were no stray characters or blank spaces at the top of the HTML source file.
I verified that the HTML email file was in UTF-8 with Unix line endings – still those funky characters remained. They were appearing in the emails regardless of email client – Entourage, Thunderbird, Postbox, everything.
I removed the encoding and doctype declarations from the HTML email file. No joy. Googling led me to lots of interesting articles on encoding and character sets for PHPMailer, but nothing particularly useful.
Finally, I happened upon an FAQ article on the W3C website titled “Display problems caused by the UTF-8 BOM”. Of course! The byte order mark. I hadn’t had to send out HTML emails in a long time, had just switched from Coda to BBEdit for code editing, and had completely forgotten about it.
Some applications insert a particular combination of bytes at the beginning of a file to indicate that the text contained in the file is Unicode. This combination of bytes is known as a signature or Byte Order Mark (BOM). Some applications – such as a text editor or a browser – will display the BOM as an extra line in the file, others will display unexpected characters, such as ï»¿.
The BOM is always at the beginning of the file, and so you would normally expect to see the display issues at the top of a page. However, you may also find blank lines appearing within the page if you include text from a separate file that begins with a UTF-8 signature.
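Those particular characters are no accident. The UTF-8 BOM is the byte sequence EF BB BF, and the snippet above never changes PHPMailer’s `CharSet`, whose default is `iso-8859-1`. Read as ISO-8859-1, each of those three bytes is a character in its own right: ï, », and ¿. The sketch below, which assumes PHP 7 or later and uses a hypothetical `latin1ToUtf8()` helper so it doesn’t need mbstring or iconv, shows the raw bytes and what a Latin-1 reader makes of them.

```php
<?php
// Sketch (assumes PHP 7+): why a UTF-8 BOM shows up as "ï»¿" when a mail
// client decodes the message as ISO-8859-1.
// latin1ToUtf8() is a hypothetical helper written for this example only.

const UTF8_BOM = "\xEF\xBB\xBF";

// Re-encode an ISO-8859-1 byte string as UTF-8 without mbstring or iconv.
// In ISO-8859-1 every byte value is also its Unicode code point (U+0000 to U+00FF).
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $cp = ord($bytes[$i]);
        if ($cp < 0x80) {
            $out .= $bytes[$i];               // ASCII is identical in both encodings
        } else {
            $out .= chr(0xC0 | ($cp >> 6))    // code points 0x80-0xFF need two UTF-8 bytes
                  . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

echo bin2hex(UTF8_BOM), PHP_EOL;      // prints "efbbbf"
echo latin1ToUtf8(UTF8_BOM), PHP_EOL; // prints "ï»¿" on a UTF-8 terminal
```

In other words, nothing was corrupting the email; the file simply started with three invisible bytes that the mail client was faithfully rendering.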
After further investigation, I realized that BBEdit offers an encoding option called “Unicode (UTF-8, no BOM)”. I switched the HTML email source file to this encoding, re-saved it, sent another test, and all was right with the world.
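Re-saving without the BOM fixes the file itself. If you would rather not depend on every future save getting the encoding right (or you assemble the email from several included files, as the W3C note above warns about), another option is to strip the signature in PHP before handing the HTML to PHPMailer. This is a minimal sketch, assuming PHP 7+; `stripUtf8Bom()` is a hypothetical helper, not part of PHPMailer.

```php
<?php
// Sketch (assumes PHP 7+): drop a leading UTF-8 BOM from a string.
// stripUtf8Bom() is a hypothetical helper, not a PHPMailer method.

function stripUtf8Bom(string $text): string
{
    // The UTF-8 signature is exactly the bytes EF BB BF at offset 0.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    // No signature: return the text untouched.
    return $text;
}
```

In the snippet from earlier, that would mean passing the string returned by `getFile('../invites/email.html')` (or `file_get_contents()`) through `stripUtf8Bom()` and only then calling `$mail->MsgHTML($body)`. Only a signature at the very start is removed, so the rest of the HTML is left exactly as you wrote it.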
Some additional notes from W3C
If you have an editor which shows the characters that make up the UTF-8 signature you may be able to delete them by hand. Chances are, however, that the BOM is there in the first place because you didn’t see it.
Check whether your editor allows you to specify whether a UTF-8 signature is added or kept during a save. Such an editor provides a way of removing the signature by simply reading the file in then saving it out again. For example, if Dreamweaver detects a BOM the Save As dialogue box will have a check mark alongside the text “Include Unicode Signature (BOM)”. Just uncheck the box and save.
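If your editor doesn’t offer such a checkbox, the same “read it in, save it out” approach can be scripted. The following is a rough sketch, assuming PHP 7+ run from the command line; `removeBomFromFile()` is a hypothetical helper, and because it overwrites the file in place you’ll want a backup or version control before pointing it at real templates.

```php
<?php
// Sketch (assumes PHP 7+): remove a UTF-8 BOM from a file by reading it in
// and writing it back out without the first three bytes.
// removeBomFromFile() is a hypothetical helper; it overwrites the file in place.

function removeBomFromFile(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // no signature, file left untouched
    }
    if (file_put_contents($path, substr($contents, 3)) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return true; // signature found and removed
}
```

For example, `removeBomFromFile('../invites/email.html')` returns `true` when it found and removed a signature and `false` when the file was already clean, so it is safe to run more than once.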
You will find that some text editors such as Windows Notepad will automatically add a UTF-8 signature to any file you save as UTF-8.
A UTF-8 signature at the beginning of a CSS file can sometimes cause the initial rules in the file to fail on certain user agents.
In some browsers, the presence of a UTF-8 signature will cause the browser to interpret the text as UTF-8 regardless of any character encoding declarations to the contrary.
So there you have it. Not exactly rocket surgery, but it was frustrating for the short time I was trying to troubleshoot, so I’m putting it into the internet ether to hopefully save someone else a few minutes.