Handling special characters in Drupal mail headers

Drupal is usually great at handling special characters. Unlike with other systems, everything has always worked out of the box for me in Drupal. Recently, however, I ran into an edge case involving mail headers.

What I wanted was to send a nicely formatted From-header, so that the site name would show up instead of just the e-mail address. The header could, for example, look like this:

From: Mÿ Fäncÿ Sitë <mail@example.com>

As e-mail headers can only contain US-ASCII characters, any special characters need to be escaped. But Drupal does this automatically, right? Yes and no. Drupal does run all mail headers through mime_header_encode() to escape non-US-ASCII characters, but it encodes the entire header value. In our case that results in:

From: =?UTF-8?B?TcO/IEbDpG5jw78gU2l0w6sgPG1haWxAZXhhbXBsZS5jb20+?=

The whole value has been encoded, e-mail address included, which is more than it should be. A correctly formatted header in our case would be:

From: =?UTF-8?B?TcO/IEbDpG5jw78gU2l0w6s=?= <mail@example.com>

As you can see, only the display name of the From-address should be encoded with the RFC 2047 encoded-word syntax (=?UTF-8?B?...?=); the e-mail address itself must not be encoded. E-mail clients handle malformed headers like the first one badly, often showing no From-name at all or showing part of the following header as the From-name.
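The difference between the two headers can be reproduced with a minimal PHP sketch. The helper encode_word_b() is hypothetical, not a Drupal API: it assumes a single encoded-word (no splitting into 75-character chunks), PHP 7 or later, and a script file saved as UTF-8.

```php
<?php
// Hypothetical helper (not a Drupal API): wraps a string in one RFC 2047
// "B" encoded-word, but only if it contains bytes outside printable
// US-ASCII (0x20-0x7E). Splitting long values into chunks is omitted.
function encode_word_b(string $s): string
{
    if (!preg_match('/[^\x20-\x7E]/', $s)) {
        return $s; // plain ASCII passes through untouched
    }
    return '=?UTF-8?B?' . base64_encode($s) . '?=';
}

$name    = 'Mÿ Fäncÿ Sitë';
$address = 'mail@example.com';

// Broken: the angle brackets and the address end up inside the base64
// payload, so a mail client can no longer find the addr-spec.
$broken = encode_word_b($name . ' <' . $address . '>');

// Correct: encode only the display name, then append the address as ASCII.
$fixed = encode_word_b($name) . ' <' . $address . '>';

echo 'From: ', $broken, "\n";
echo 'From: ', $fixed, "\n";
```

The first line printed matches the garbled header Drupal produced, the second the correctly formatted one.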

The solution is to do the encoding yourself before sending the header off to drupal_mail(). In our case it would look like this:

// Encode only the display name; the address stays plain US-ASCII.
$from = mime_header_encode('Mÿ Fäncÿ Sitë') . ' <mail@example.com>';

This of course means that the name runs through the encoder twice: once by you and once by drupal_mail(). Fortunately, mime_header_encode() only encodes strings that contain non-US-ASCII characters. Since its output is plain ASCII, you can safely run a string through it multiple times without risking double encoding.
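Why the second pass is harmless can be illustrated with a small PHP sketch. encode_word_b() is a hypothetical, single-chunk stand-in modeled only on the behavior described above (it is not Drupal's actual mime_header_encode(), which also splits long values); PHP 7+ and a UTF-8 source file are assumed.

```php
<?php
// Hypothetical single-chunk stand-in for mime_header_encode(): printable
// US-ASCII input is returned unchanged, anything else becomes one RFC 2047
// "B" encoded-word.
function encode_word_b(string $s): string
{
    if (!preg_match('/[^\x20-\x7E]/', $s)) {
        return $s;
    }
    return '=?UTF-8?B?' . base64_encode($s) . '?=';
}

// First pass: you encode just the display name and append the plain address.
$from = encode_word_b('Mÿ Fäncÿ Sitë') . ' <mail@example.com>';

// Second pass: the whole header value goes through the encoder again, as it
// does inside drupal_mail(). The value is now pure ASCII (the base64
// alphabet, "=", "?" and the address), so nothing is changed.
$sent = encode_word_b($from);

var_dump($sent === $from); // bool(true)
```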


1 comment

Well, I happen to know a bit about this, in particular that it's defined by RFC 2047, and a bit about the implementation, as I had to fix SquirrelMail's implementation at one point...

Using mime_header_encode on the whole from header is a bug. From contains an addr-spec, so the whole header can't be encoded (as you've found out). The display-name, however, can.

Using the base64 encoding (B) is a bit of a cop-out; the Q encoding produces results that are a tad more readable in the raw. Sadly, there's no implementation in PHP, so you have to do it by hand (I think SquirrelMail's implementation was 80 lines of PHP code, but I can't remember if that was before or after I cut it in half).
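A hand-rolled Q encoder for a display name could be as small as the following PHP sketch. encode_word_q() is a hypothetical name; the sketch follows the RFC 2047 rule for encoded-words in a phrase (letters, digits and ! * + - / literal, space as "_", every other byte as "=" plus two hex digits) and, as a simplification, does not split results longer than 75 characters. PHP 7+ and a UTF-8 source file are assumed.

```php
<?php
// Hypothetical helper: encode a display name as one RFC 2047 "Q"
// encoded-word, using the stricter character set allowed in a phrase.
function encode_word_q(string $s): string
{
    if (!preg_match('/[^\x20-\x7E]/', $s)) {
        return $s; // plain ASCII needs no encoding
    }
    $out = '';
    $len = strlen($s); // byte length, not character length
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($c === ' ') {
            $out .= '_'; // space is written as underscore
        } elseif (preg_match('/[A-Za-z0-9!*+\/-]/', $c)) {
            $out .= $c; // safe character, kept as-is
        } else {
            $out .= sprintf('=%02X', ord($c)); // any other byte as =XX
        }
    }
    return '=?UTF-8?Q?' . $out . '?=';
}

echo 'From: ' . encode_word_q('Mÿ Fäncÿ Sitë') . " <mail@example.com>\n";
// From: =?UTF-8?Q?M=C3=BF_F=C3=A4nc=C3=BF_Sit=C3=AB?= <mail@example.com>
```

The ASCII letters of the name stay readable in the raw header, which is the advantage of Q over B.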

Thirdly, I seem to recall that the spec used to say that one should attempt to encode as little as possible. So "Åge Andersen" should ideally have only "Åge" encoded, leaving the rest in ASCII.
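Encoding as little as possible could be sketched in PHP as follows. encode_name_minimal() is a hypothetical helper that uses B encoding for brevity. Runs of adjacent non-ASCII words are merged into one encoded-word, because RFC 2047 says whitespace between two adjacent encoded-words is dropped when decoding; keeping the run together preserves its spaces. The sketch assumes words are separated by single spaces and ignores quoting of ASCII specials such as commas or parentheses.

```php
<?php
// Hypothetical helper: encode only the words of a display name that need it.
function encode_name_minimal(string $name): string
{
    $parts = [];
    $run = []; // consecutive words that contain non-ASCII bytes
    foreach (explode(' ', $name) as $word) {
        if (preg_match('/[^\x20-\x7E]/', $word)) {
            $run[] = $word;
            continue;
        }
        if ($run) { // flush the pending run as a single encoded-word
            $parts[] = '=?UTF-8?B?' . base64_encode(implode(' ', $run)) . '?=';
            $run = [];
        }
        $parts[] = $word; // ASCII word stays readable
    }
    if ($run) {
        $parts[] = '=?UTF-8?B?' . base64_encode(implode(' ', $run)) . '?=';
    }
    return implode(' ', $parts);
}

echo encode_name_minimal('Åge Andersen'), "\n";  // =?UTF-8?B?w4VnZQ==?= Andersen
echo encode_name_minimal('Mÿ Fäncÿ Sitë'), "\n"; // =?UTF-8?B?TcO/IEbDpG5jw78gU2l0w6s=?=
```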

In any case, you should report it as a bug, as the proper behavior is clearly defined by the RFCs (see RFC 2822 for all the gory details).


About the blog

This is the personal website of Andreas Haugstrup Pedersen: commentary on media, communication, culture and technology.