RFC 2047: MIME Part 3 — Message Header Extensions for Non-ASCII Text

Standards Track | MIME (Multipurpose Internet Mail Extensions) | Published November 1996

ELI5: Email headers like Subject and From are restricted to plain ASCII — English letters and basic punctuation. RFC 2047 is the trick that lets you write a Subject line in Japanese, a sender name in Arabic, or an accented name in French. It encodes non-ASCII characters into an ASCII-safe wrapper that any mail server can transport, and mail clients decode it back for display.

Why This Exists

RFC 2045 and RFC 2046 solved the problem of carrying non-ASCII content in message bodies, using the Content-Type charset parameter and Content-Transfer-Encoding. But email headers (Subject, From display names, To display names, and others) are governed by RFC 5322 (RFC 822 when RFC 2047 was written), which restricts them to 7-bit US-ASCII.

This creates an obvious problem: billions of email users write in languages that require non-ASCII characters. Without RFC 2047, you could not send an email with:

A Subject line in Chinese, Japanese, Korean, Arabic, Hebrew, or Thai
A sender display name with accented characters (René, Müller, Björk)
Any header field containing characters outside the ASCII range

RFC 2047 defines the encoded-word syntax: a compact way to embed non-ASCII text inside ASCII-only headers, readable by any MIME-aware mail client.

How It Works

The Encoded-Word Format

An encoded-word has this structure:

=?charset?encoding?encoded-text?=

The three components are:

Component	Purpose	Values
`charset`	The character set of the original text	`UTF-8`, `ISO-8859-1`, `ISO-2022-JP`, etc.
`encoding`	How the text is encoded into ASCII	`B` (base64) or `Q` (quoted-printable variant)
`encoded-text`	The encoded representation	ASCII characters only
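The three components can be pulled apart with a regular expression. The sketch below is a simplified reading of the grammar: it only checks that the charset and encoded-text contain no `?` or whitespace and that the encoding is `B` or `Q` (in either case). `parse_encoded_word` and `EncodedWord` are hypothetical names, not a standard API.

```python
import re
from typing import NamedTuple, Optional

# Simplified encoded-word pattern: charset and encoded-text may not contain "?"
# or whitespace, and the encoding must be B or Q (either case).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


class EncodedWord(NamedTuple):
    charset: str
    encoding: str       # normalized to "B" or "Q"
    encoded_text: str


def parse_encoded_word(token: str) -> Optional[EncodedWord]:
    """Split one =?charset?encoding?encoded-text?= token into its three parts."""
    m = ENCODED_WORD.fullmatch(token)
    if m is None:
        return None  # not a syntactically valid encoded-word
    charset, encoding, text = m.groups()
    return EncodedWord(charset, encoding.upper(), text)


print(parse_encoded_word("=?UTF-8?Q?Caf=C3=A9_menu?="))
# EncodedWord(charset='UTF-8', encoding='Q', encoded_text='Caf=C3=A9_menu')
```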

B Encoding (Base64)

Uses standard base64 encoding. Best for text that is heavily non-ASCII, such as CJK scripts:

; Subject: "Meeting confirmation" in Japanese
Subject: =?UTF-8?B?5Lya6K2w44Gu56K66KqN?=

; From display name in Chinese
From: =?UTF-8?B?5byg5LiJ?= <zhang@example.com>
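Producing a B encoded-word comes down to three steps: encode the text to bytes in the declared charset, base64 those bytes, and wrap the result. A minimal sketch, assuming UTF-8 and short input (it does not enforce the 75-character limit discussed under Length Limits); `b_encode` is a hypothetical helper name:

```python
import base64


def b_encode(text: str, charset: str = "UTF-8") -> str:
    """Wrap text as a single B encoded-word (no length check)."""
    raw = text.encode(charset)                        # characters -> bytes in the declared charset
    payload = base64.b64encode(raw).decode("ascii")   # bytes -> ASCII-only base64
    return f"=?{charset}?B?{payload}?="


print(b_encode("会議の確認"))  # =?UTF-8?B?5Lya6K2w44Gu56K66KqN?=
print(b_encode("张三"))        # =?UTF-8?B?5byg5LiJ?=
```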

Q Encoding (Quoted-Printable Variant)

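A modified quoted-printable encoding designed for headers. As in body quoted-printable, non-ASCII bytes become =XX hex pairs. The key difference is that a space is written as an underscore (_), so a literal underscore must itself be encoded (=5F), just like = and ?: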

; Subject: "Café menu" with accented e
Subject: =?UTF-8?Q?Caf=C3=A9_menu?=

; From display name: "René Dupont"
From: =?UTF-8?Q?Ren=C3=A9_Dupont?= <rene@example.com>

; Subject: "Grüße aus Berlin" (German greetings)
Subject: =?UTF-8?Q?Gr=C3=BC=C3=9Fe_aus_Berlin?=

Q encoding is more human-readable when most of the text is ASCII with just a few non-ASCII characters. B encoding is more compact when most characters are non-ASCII.
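Q encoding can be sketched the same way. This version assumes the most conservative rule in RFC 2047 (section 5): only letters, digits, and `! * + - /` are written literally, a space becomes `_`, and every other byte (including `=`, `?`, `_`, and all non-ASCII bytes) becomes `=XX`. That set is safe in every position where encoded-words are allowed; Subject text permits a few more literal characters. `q_encode` is a hypothetical name.

```python
# Bytes that may appear literally in an encoded-word in any allowed position.
SAFE = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode(text: str, charset: str = "UTF-8") -> str:
    """Wrap text as a single Q encoded-word (no length check)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")             # space is written as an underscore
        elif byte in SAFE:
            out.append(chr(byte))       # conservative literal set
        else:
            out.append(f"={byte:02X}")  # everything else, incl. = ? _ and non-ASCII
    return f"=?{charset}?Q?{''.join(out)}?="


print(q_encode("Café menu"))    # =?UTF-8?Q?Caf=C3=A9_menu?=
print(q_encode("René Dupont"))  # =?UTF-8?Q?Ren=C3=A9_Dupont?=
```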

Where Encoded-Words Can Appear

Encoded-words are allowed in specific positions within headers:

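Subject, Comments, Keywords: In place of ordinary words in the text (for Keywords, inside each comma-separated phrase). The same applies to other unstructured fields, such as extension headers whose value is free text.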
From, To, Cc, Bcc, Reply-To, Sender: Only in the display name portion, never in the email address itself.
Content-Description: Allowed for describing MIME parts.
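Python's standard library follows the display-name rule in `email.utils.formataddr`: given a `(display name, address)` pair, it RFC 2047-encodes only a non-ASCII display name and leaves the address untouched, raising `UnicodeEncodeError` if the address itself is not ASCII. Whether it picks B or Q, and how it spells the charset label, are library details, so this sketch checks the result by decoding it rather than comparing strings.

```python
from email.header import decode_header
from email.utils import formataddr

# Only the display name is RFC 2047-encoded; the address is left as plain ASCII.
field = formataddr(("René Dupont", "rene@example.com"))
print(field)

# decode_header returns (bytes, charset) chunks; the first one is the display name.
raw, charset = decode_header(field)[0]
print(raw.decode(charset))  # René Dupont

# A non-ASCII address is refused rather than encoded.
try:
    formataddr(("Björk", "björk@example.com"))
except UnicodeEncodeError:
    print("non-ASCII address rejected")
```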

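Encoded-words are not allowed inside quoted-strings, in the local-part or domain of an email address, or as parameter values in structured headers like Content-Type and Content-Disposition (use RFC 2231 for those). Elsewhere in a structured field, an encoded-word may appear only in a display name or inside a parenthesized comment.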
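For comparison, RFC 2231 parameter encoding uses a different shape: `charset'language'` followed by percent-encoded bytes. Python's `email.utils.encode_rfc2231` produces that form; the `Content-Disposition` line printed below is an illustrative assumption about where the value would be used.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

# RFC 2231 form: charset'language'percent-encoded-bytes (language left empty).
value = encode_rfc2231("Résumé.pdf", "UTF-8")
print(f"Content-Disposition: attachment; filename*={value}")
# Content-Disposition: attachment; filename*=UTF-8''R%C3%A9sum%C3%A9.pdf

# Reversing it: split off charset and language, then percent-decode.
charset, _language, encoded = value.split("'", 2)
print(unquote(encoded, encoding=charset))  # Résumé.pdf
```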

Key Technical Details

Length Limits

Each encoded-word, counting its =? and ?= delimiters, must not exceed 75 characters. If the encoded text is longer, it must be split into multiple encoded-words separated by folding whitespace (CRLF followed by a space or tab):

; Long subject split across two encoded-words
Subject: =?UTF-8?B?5LuK5pel44Gu5Lya6K2w44Gr44Gk44GE44Gm?=
 =?UTF-8?B?44GU5qGI5YaF44GE44Gf44GX44G+44GZ?=

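When two adjacent encoded-words are separated only by linear whitespace (including a fold), that whitespace is ignored during decoding, so long text can be split across several encoded-words without introducing spaces. Whitespace between an encoded-word and ordinary text, by contrast, is kept. Each encoded-word must also decode on its own: a multi-byte character, such as a three-byte UTF-8 kanji, must never be divided between two encoded-words.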
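A splitter has to respect both rules at once: no word over 75 characters, and no character divided between words. This sketch assumes UTF-8 and B encoding and fills each word with whole characters up to a 45-byte budget (45 bytes become 60 base64 characters, plus 12 for `=?UTF-8?B?` and `?=`, giving 72). It does not account for the header name on the first line, so it is not a complete folding implementation; `split_b_words` is a hypothetical name.

```python
import base64

PREFIX, SUFFIX = "=?UTF-8?B?", "?="
MAX_WORD = 75
# Base64 maps 3 bytes to 4 characters: fit the largest multiple of 4 characters
# into the space left after the delimiters, then convert back to a byte budget.
MAX_PAYLOAD = (MAX_WORD - len(PREFIX) - len(SUFFIX)) // 4 * 4   # 60 characters
MAX_CHUNK_BYTES = MAX_PAYLOAD // 4 * 3                            # 45 bytes


def split_b_words(text: str) -> list[str]:
    """Split text into UTF-8 B encoded-words of at most MAX_WORD characters.

    Chunks are built from whole characters, so no multi-byte UTF-8 sequence
    is divided between two encoded-words.
    """
    words, chunk = [], b""
    for ch in text:
        encoded = ch.encode("utf-8")
        if chunk and len(chunk) + len(encoded) > MAX_CHUNK_BYTES:
            words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
            chunk = b""
        chunk += encoded
    if chunk:
        words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
    return words


subject = "今日の会議についてご案内いたします" * 2
words = split_b_words(subject)
# Fold between encoded-words: CRLF followed by a space.
print("Subject: " + "\r\n ".join(words))
```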

Charset Selection

Always use UTF-8 for new messages. The other charsets exist for legacy reasons:

Charset	Use Case	Recommendation
`UTF-8`	Covers all Unicode characters	Always use this
`ISO-8859-1`	Western European legacy	Do not use in new messages
`ISO-2022-JP`	Japanese legacy encoding	Still seen from some Japanese mail clients
`GB2312`	Simplified Chinese legacy	Do not use in new messages

Interaction with Header Folding

RFC 5322 limits header lines to 998 characters (excluding CRLF) and recommends keeping them to 78. RFC 2047 is stricter: any header line that contains an encoded-word must be no longer than 76 characters. You can fold between encoded-words at whitespace boundaries, but you must never break in the middle of an encoded-word. The =?...?= wrapper must be on a single line.
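Python's standard `email.header.Header` class performs this kind of folding: it splits the text into several encoded-words on character boundaries and puts each continuation on its own line. A short sketch, assuming a long Japanese Subject; passing `header_name` lets the class reserve room for `Subject: ` on the first line, and `maxlinelen=76` matches the RFC 2047 limit. Which encoding it picks and the lowercase charset label are library details.

```python
from email.header import Header, decode_header

subject = "今日の会議についてご案内いたします" * 3

# header_name makes the folder reserve room for "Subject: " on the first line;
# maxlinelen=76 matches the RFC 2047 limit for lines holding encoded-words.
h = Header(subject, charset="utf-8", header_name="Subject", maxlinelen=76)
folded = h.encode(linesep="\r\n")
print("Subject: " + folded)

lines = ("Subject: " + folded).split("\r\n")
print(max(len(line) for line in lines))  # longest physical line, at most 76

# Decoding concatenates adjacent encoded-words and drops the folding whitespace.
raw, charset = decode_header(folded)[0]
print(raw.decode(charset) == subject)    # True
```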

Decoding Rules

When a mail client encounters an encoded-word, it:

Extracts the charset, encoding type, and encoded text from the =?charset?encoding?text?= wrapper.
Decodes the encoded text as base64 (B) or as the Q variant of quoted-printable, in which _ stands for a space.
Interprets the resulting bytes according to the declared charset.
Displays the decoded Unicode text to the user.

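If the client does not support the declared charset, RFC 2047 allows it to display the encoded-word as-is, make a best effort with the characters it can show, or substitute a note that the text could not be displayed; any of these is preferable to garbled text. An encoded-word that is not syntactically valid should be displayed as ordinary text.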
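The decoding steps above, together with the whitespace rule from Length Limits, fit in a short function. This is a simplified sketch, not a full RFC 2047 parser: it assumes Python's codec registry decides which charsets are supported, treats base64 with bad padding as undecodable, and leaves any word it cannot decode exactly as received. `decode_header_value` and `decode_word` are hypothetical names; the standard library's `email.header.decode_header` covers similar ground.

```python
import base64
import binascii
import codecs
import re
from typing import Optional

ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> Optional[str]:
    """Undo B or Q, then interpret the bytes in the declared charset."""
    try:
        codecs.lookup(charset)  # unsupported charset -> LookupError
        if encoding.upper() == "B":
            raw = base64.b64decode(text, validate=True)
        else:
            # header=True turns "_" into a space while decoding =XX pairs.
            raw = binascii.a2b_qp(text.encode("ascii"), header=True)
        return raw.decode(charset)
    except (LookupError, ValueError):  # binascii.Error and UnicodeError are ValueErrors
        return None


def decode_header_value(value: str) -> str:
    value = re.sub(r"\r?\n(?=[ \t])", "", value)  # unfold: remove CRLF before WSP
    out, pos, prev_decoded = [], 0, False
    for m in ENCODED_WORD.finditer(value):        # find and split each encoded-word
        gap = value[pos:m.start()]
        decoded = decode_word(*m.groups())
        # Whitespace between two decoded encoded-words is dropped.
        if not (prev_decoded and decoded is not None and gap.strip() == ""):
            out.append(gap)
        # Words that cannot be decoded are shown exactly as received.
        out.append(decoded if decoded is not None else m.group(0))
        prev_decoded = decoded is not None
        pos = m.end()
    out.append(value[pos:])
    return "".join(out)


print(decode_header_value("=?UTF-8?Q?Ren=C3=A9_Dupont?= <rene@example.com>"))
# René Dupont <rene@example.com>
print(decode_header_value("Hello =?X-UNKNOWN?Q?abc?="))
# Hello =?X-UNKNOWN?Q?abc?=
```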

Examples

A Complete Message with Encoded Headers

MIME-Version: 1.0
From: =?UTF-8?Q?Ren=C3=A9_Dupont?= <rene@example.fr>
To: =?UTF-8?B?5bGx55Sw5aSq6YOO?= <yamada@example.jp>
Subject: =?UTF-8?Q?Re:_R=C3=A9union_du_15_mars?=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Bonjour Taro,

Confirmons la r=C3=A9union pour le 15 mars.

Note: the header uses RFC 2047 encoded-words (=?...?=), while the body uses regular quoted-printable encoding (=XX without the wrapper). These are different mechanisms for different parts of the message.
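With Python's `email` package, this header encoding does not have to be written by hand: assigning Unicode values to headers and serializing with `policy.SMTP` (which does not allow raw UTF-8 headers) produces RFC 2047 encoded-words automatically. The sketch below rebuilds a similar message; the exact encoded-words the library emits (which words it encodes, B or Q, lowercase `utf-8`) may differ from the hand-written example.

```python
from email import policy
from email.headerregistry import Address
from email.message import EmailMessage
from email.parser import BytesParser

msg = EmailMessage()
msg["From"] = Address("René Dupont", "rene", "example.fr")
msg["To"] = Address("山田太郎", "yamada", "example.jp")
msg["Subject"] = "Re: Réunion du 15 mars"
# cte="quoted-printable" requests a QP body; set_content also adds MIME-Version.
msg.set_content("Bonjour Taro,\n\nConfirmons la réunion pour le 15 mars.\n",
                cte="quoted-printable")

# policy.SMTP: CRLF line endings, non-ASCII headers become encoded-words.
raw = msg.as_bytes(policy=policy.SMTP)
print(raw.decode("ascii"))

# Parsing with policy.default decodes the encoded-words back to Unicode.
parsed = BytesParser(policy=policy.default).parsebytes(raw)
print(parsed["Subject"], "|", parsed["To"].addresses[0].display_name)
```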

Encoding Comparison

The same text — "München" — encoded both ways:

; Q encoding: readable, good for mostly-ASCII text
=?UTF-8?Q?M=C3=BCnchen?=

; B encoding: opaque; the same length here, but more compact when most characters are non-ASCII
=?UTF-8?B?TcO8bmNoZW4=?=
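Choosing between the two can be automated by encoding both ways and keeping the shorter result. This sketch prefers Q on a tie, so readable Q wins for "München" (12 payload characters either way) while B wins for Japanese text. It uses a conservative Q alphabet (only ASCII letters and digits literal); `shortest_encoded_word` and `q_text` are hypothetical names.

```python
import base64


def q_text(raw: bytes) -> str:
    """Conservative Q payload: ASCII letters/digits literal, space -> _, rest -> =XX."""
    return "".join(
        "_" if b == 0x20
        else chr(b) if chr(b).isascii() and chr(b).isalnum()
        else f"={b:02X}"
        for b in raw
    )


def shortest_encoded_word(text: str, charset: str = "UTF-8") -> str:
    raw = text.encode(charset)
    b64 = base64.b64encode(raw).decode("ascii")
    qp = q_text(raw)
    # On a tie, prefer Q because it stays partly readable.
    if len(qp) <= len(b64):
        return f"=?{charset}?Q?{qp}?="
    return f"=?{charset}?B?{b64}?="


print(shortest_encoded_word("München"))     # =?UTF-8?Q?M=C3=BCnchen?=
print(shortest_encoded_word("会議の確認"))  # =?UTF-8?B?5Lya6K2w44Gu56K66KqN?=
```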

Common Mistakes

Encoding the email address itself. Only the display name can be encoded. =?UTF-8?Q?user?=@example.com is invalid and will be rejected or misinterpreted. For internationalized email addresses, see RFC 6531/6532.
Missing space between encoded-words and regular text. An encoded-word must be separated from adjacent text by whitespace. Hello=?UTF-8?Q?World?= is malformed; it should be Hello =?UTF-8?Q?World?=.
Breaking an encoded-word across lines. The entire =?...?= token must fit on one line. If you need to fold, split into multiple encoded-words at word boundaries.
Using RFC 2047 in Content-Type parameters. Encoded-words are not valid in structured header parameters like filename= or name=. Use RFC 2231 parameter encoding instead: filename*=UTF-8''R%C3%A9sum%C3%A9.pdf.
Exceeding the 75-character limit. Each encoded-word must be 75 characters or fewer. Long text must be split into multiple encoded-words. Oversized encoded-words violate RFC 2047, and strict decoders may leave them undecoded, showing raw =?...?= text.
Double-encoding. Encoding text that is already encoded produces garbage like =?UTF-8?Q?=3D=3FUTF-8=3FQ=3F...?=. Ensure your encoding pipeline runs exactly once.
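Several of these mistakes can be detected mechanically. The hypothetical `lint_header_value` below checks one unfolded header value for three of them: encoded-words longer than 75 characters, encoded-words not separated from neighboring text by whitespace, and double-encoding (a payload that itself decodes to an encoded-word). Its separation test is deliberately conservative, and it cannot tell whether an encoded-word sits inside an address or was broken across lines, since that needs the full header structure.

```python
import base64
import binascii
import re

ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def lint_header_value(value: str) -> list[str]:
    """Flag a few common RFC 2047 mistakes in one unfolded header value."""
    problems = []
    for m in ENCODED_WORD.finditer(value):
        word = m.group(0)
        if len(word) > 75:
            problems.append(f"encoded-word longer than 75 characters: {word[:20]}...")
        before = value[m.start() - 1] if m.start() > 0 else " "
        after = value[m.end()] if m.end() < len(value) else " "
        # Conservative: require whitespace (or a comment parenthesis) on both sides.
        if not (before.isspace() or before == "(") or not (after.isspace() or after == ")"):
            problems.append(f"encoded-word not separated by whitespace: {word}")
        charset, encoding, text = m.groups()
        try:
            if encoding.upper() == "B":
                raw = base64.b64decode(text, validate=True)
            else:
                raw = binascii.a2b_qp(text.encode("ascii"), header=True)
            decoded = raw.decode(charset)
        except (LookupError, ValueError):
            problems.append(f"cannot decode: {word}")
            continue
        if ENCODED_WORD.search(decoded):  # payload is itself an encoded-word
            problems.append(f"double-encoded: {word}")
    return problems


print(lint_header_value("Hello=?UTF-8?Q?World?="))
print(lint_header_value("=?UTF-8?Q?=3D=3FUTF-8=3FQ=3FCaf=3DC3=3DA9=3F=3D?="))
```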

Deliverability Impact

Incorrect encoding can raise spam scores. Malformed encoded-words in Subject lines are a red flag, because spam filters have seen decades of broken encoding from spam tools. Clean, standards-compliant encoding signals legitimate sending software.
Display name encoding affects trust. If a non-ASCII From display name is sent unencoded, recipients may see mojibake; if it is encoded incorrectly (for example, an encoded-word placed inside a quoted-string), many clients show raw =?UTF-8?Q?...?= text instead of a readable name. Either looks suspicious and can hurt open rates.
Subject line rendering is critical for engagement. A Subject line garbled by a wrong charset or broken encoding is unreadable, so the email is more likely to be ignored or reported as spam.
Always use UTF-8. Legacy charsets like ISO-8859-1 cannot represent all characters. If a system mixes charsets across different headers, clients may display some correctly and others as mojibake. Standardize on UTF-8 everywhere.
Test across clients. Outlook, Gmail, Apple Mail, and Thunderbird all have slightly different RFC 2047 decoding behaviors, especially around edge cases like long encoded-words and mixed encoding/non-encoding in a single header.