HTML Entities Deep Dive
When to use HTML entities, when not to, and how UTF-8 changed everything.
HTML entities like `&amp;` and `&copy;` were once essential. With UTF-8 ubiquitous, most are unnecessary, but a few remain critical.
What Entities Are
Three forms encode characters:
- `&amp;` named entity
- `&#38;` decimal numeric
- `&#x26;` hex numeric
All three render as &.
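To see why the decimal and hex forms are interchangeable, it can help to decode one by hand. The sketch below is only an illustration: `decodeNumericEntity` is a hypothetical helper, not a standard API, and it ignores the special cases a real HTML parser applies to some code points.

```javascript
// Hypothetical helper: decodes one numeric character reference such as
// "&#38;" or "&#x26;". Returns null for anything else (including named entities).
function decodeNumericEntity(ref) {
  const match = /^&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));$/.exec(ref);
  if (!match) return null;

  // Group 1 holds hex digits, group 2 holds decimal digits.
  const codePoint = match[1] !== undefined
    ? parseInt(match[1], 16)
    : parseInt(match[2], 10);

  // Outside the Unicode range; String.fromCodePoint would throw.
  if (codePoint > 0x10FFFF) return null;

  return String.fromCodePoint(codePoint);
}

console.log(decodeNumericEntity('&#38;'));  // "&"
console.log(decodeNumericEntity('&#x26;')); // "&"
```

Both calls resolve to code point 38; only the notation differs.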
The Five You Must Use
These special characters always need escaping in HTML:
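| Character | Entity |
|-----------|--------|
| & | `&amp;` |
| < | `&lt;` |
| > | `&gt;` |
| " | `&quot;` (in attributes) |
| ' | `&#39;` or `&apos;` (in attributes) |

Left unescaped, the first three can be misread as markup or as the start of an entity. The last two only matter inside attribute values.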
Everything Else: Use UTF-8
Once HTML pages are served as UTF-8 (the dominant encoding on the web for well over a decade), entities are obsolete for typography:
- `&copy;` → ©
- `&hellip;` → …
- `&mdash;` → —
- `&euro;` → €
- `&hearts;` → ♥
Just type the character. Source files are easier to read; output is identical.
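If an older source file is full of typographic entities, a small script can swap them for the literal characters. This is a minimal sketch under stated assumptions: `inlineTypographicEntities` and its lookup table are hypothetical, the table covers only a handful of names (HTML defines many more), and the escapes that must stay, such as `&amp;`, are deliberately left alone.

```javascript
// Assumed, deliberately tiny lookup table; extend as needed.
const TYPOGRAPHIC = {
  '&copy;': '\u00A9',   // copyright sign
  '&hellip;': '\u2026', // horizontal ellipsis
  '&mdash;': '\u2014',  // em dash
  '&euro;': '\u20AC',   // euro sign
  '&hearts;': '\u2665', // black heart suit
};

// Replaces known typographic entities with literal characters.
// Unknown references (and the must-escape ones like &amp;) pass through unchanged.
function inlineTypographicEntities(source) {
  return source.replace(/&[a-z]+;/g, ref => TYPOGRAPHIC[ref] ?? ref);
}

console.log(inlineTypographicEntities('&copy; Acme &amp; Sons &hearts;'));
```

The output should only be saved and served as UTF-8; otherwise the literal characters may be mangled.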
When Entities Still Matter
- Whitespace control: `&nbsp;` (non-breaking space), `&shy;` (soft hyphen), and `&zwj;` (zero-width joiner) are invisible in source if you type them directly.
- Encoding boundaries: when content is generated by a system that may not preserve UTF-8.
- Email HTML: some clients still struggle with non-ASCII bytes; entities are safer.
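For the encoding-boundary and email cases, one possible approach is to rewrite every non-ASCII character as a numeric reference so that the output is pure ASCII. The helper name `toAsciiSafe` is hypothetical, and the sketch assumes its input is already valid, escaped HTML.

```javascript
// Hypothetical helper: converts each non-ASCII character to a hex numeric
// reference, leaving ASCII markup untouched.
function toAsciiSafe(html) {
  // The `u` flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane (e.g. emoji) yield one reference
  // rather than two surrogate halves.
  return html.replace(/[^\x00-\x7F]/gu, ch =>
    `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`);
}

console.log(toAsciiSafe('<p>Caf\u00E9 \u2665</p>'));
// <p>Caf&#xE9; &#x2665;</p>
```

Numeric references are used instead of named ones because they need no lookup table and are understood wherever HTML is parsed.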
Escaping User Input
For any user-supplied string rendered into HTML:
// Named escapeHtml to avoid shadowing the legacy global escape().
const escapeHtml = s => String(s).replace(/[&<>"']/g, c => ({
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
}[c]));
Frameworks (React, Vue, Svelte) do this automatically for text bindings. You only need to worry about the raw-HTML escape hatches: `v-html`, `{@html}`, and `dangerouslySetInnerHTML`.
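Outside a framework, one way to make escaping the default rather than something to remember is a tagged template. The `html` tag below is a hypothetical sketch, not a library API: the literal parts of the template pass through as written and every interpolated value is escaped.

```javascript
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const escapeHtml = s => String(s).replace(/[&<>"']/g, c => ENTITIES[c]);

// Hypothetical `html` tag: trusted template text is kept verbatim,
// interpolated values are always escaped.
function html(strings, ...values) {
  return strings.reduce(
    (out, part, i) => out + part + (i < values.length ? escapeHtml(values[i]) : ''),
    ''
  );
}

const comment = '<img src=x onerror="alert(1)">'; // untrusted input
console.log(html`<p>${comment}</p>`);
// <p>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

This only covers text and quoted attribute values. It does not make it safe to interpolate into `<script>`, `<style>`, or unquoted attributes.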
URL Encoding Is Different
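HTML entities are not URL encoding. `%20` and `&amp;` solve different problems: percent-encoding protects characters inside a URL, while entities protect characters inside HTML markup. URL-decode query strings; HTML-decode text for display.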
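The two encodings often appear together, for instance when a user-supplied value goes into a link. The sketch below assumes a hypothetical `/search` endpoint with `q` and `lang` parameters. The value is percent-encoded for the URL first, then the finished URL is HTML-escaped for the attribute.

```javascript
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const escapeHtml = s => String(s).replace(/[&<>"']/g, c => ENTITIES[c]);

// Hypothetical example: the endpoint and parameter names are assumptions.
function searchLink(query) {
  // Layer 1: percent-encode the value so it cannot break the query string.
  const url = `/search?q=${encodeURIComponent(query)}&lang=en`;
  // Layer 2: HTML-escape the whole URL for the attribute, and the label for text.
  return `<a href="${escapeHtml(url)}">${escapeHtml(query)}</a>`;
}

console.log(searchLink('fish & chips'));
// <a href="/search?q=fish%20%26%20chips&amp;lang=en">fish &amp; chips</a>
```

The `&` inside the query value becomes `%26` (URL layer), while the `&` separating parameters becomes `&amp;` (HTML layer). The browser reverses the layers in the opposite order.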
Decode and inspect with the [HTML Entity Decoder](https://sdk.is/html-entity-decoder).