HTML Entities Deep Dive

HTML entities like `&amp;` and `&copy;` were once essential. With UTF-8 ubiquitous, most are unnecessary, but a few remain critical.

What Entities Are

Three forms encode characters:

    &amp;    named entity
    &#38;    decimal numeric
    &#x26;   hex numeric

All three render as &.
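As a rough illustration of how the two numeric forms resolve to the same character, the sketch below decodes them by hand. `decodeNumeric` is a hypothetical helper name, not a standard API, and named entities are deliberately out of scope because resolving them needs the full lookup table that browsers ship.

```javascript
// Hypothetical helper: decode one decimal (&#38;) or hex (&#x26;) reference.
// Returns null for anything else, including named entities such as &amp;.
function decodeNumeric(ref) {
  const m = /^&#(?:x([0-9a-f]+)|([0-9]+));$/i.exec(ref);
  if (!m) return null;
  const codePoint = m[1] !== undefined ? parseInt(m[1], 16) : parseInt(m[2], 10);
  // String.fromCodePoint throws for values above U+10FFFF, so guard first.
  if (codePoint > 0x10ffff) return null;
  return String.fromCodePoint(codePoint);
}

console.log(decodeNumeric('&#38;'));  // &
console.log(decodeNumeric('&#x26;')); // &
console.log(decodeNumeric('&amp;'));  // null (named form not handled here)
```

Decimal 38 and hex 26 are the same code point, U+0026, which is why both forms produce an ampersand.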

These special characters always need escaping in HTML:

| Character | Entity |
|-----------|--------|
| `&` | `&amp;` |
| `<` | `&lt;` |
| `>` | `&gt;` |
| `"` | `&quot;` (in attributes) |
| `'` | `&#39;` or `&apos;` (in attributes) |

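If you forget the first three, the parser can misread your text as markup or as a character reference. `&` and `<` are the real hazards; `>` is escaped mainly for safety and symmetry. The last two only matter inside attribute values delimited by that same quote character.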

Once a page is served as UTF-8 (the encoding the HTML standard requires), entities such as `&copy;`, `&mdash;`, and `&eacute;` are unnecessary for typography.

Just type the character. Source files are easier to read; output is identical.

Entities are still worth using in a few cases:

- Whitespace control: `&nbsp;` (non-breaking space), `&shy;` (soft hyphen), and `&zwj;` (zero-width joiner) are invisible in source if you type the characters directly.
- Encoding boundaries: when content is generated by a system that may not preserve UTF-8.
- Email HTML: some clients still struggle with non-ASCII bytes; entities are safer.
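For the encoding-boundary and email cases, one possible approach is to rewrite every non-ASCII character as a decimal numeric entity so the output is pure ASCII. The sketch below assumes that is acceptable for your content; `toAsciiEntities` is a hypothetical name introduced here for illustration.

```javascript
// Hypothetical helper: replace each non-ASCII code point with &#N;.
// Array.from iterates by code point, so astral characters (emoji)
// become one entity rather than two surrogate halves.
function toAsciiEntities(s) {
  return Array.from(s, ch => {
    const cp = ch.codePointAt(0);
    return cp > 0x7f ? `&#${cp};` : ch;
  }).join('');
}

// Input written with \u escapes so this file itself stays ASCII.
console.log(toAsciiEntities('\u00A9 2024 \u2014 caf\u00E9'));
// &#169; 2024 &#8212; caf&#233;
```

This does not escape `&`, `<`, or `>`. If the text also needs HTML escaping, escape first and convert to ASCII second; doing it the other way round would double-escape the ampersands this step introduces.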

For any user-supplied string rendered into HTML:

// Replace each HTML-special character with its entity.
const escape = s => s.replace(/[&<>"']/g, c => ({
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
}[c]));
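To see the effect on hostile input, here is a self-contained sketch of the same technique with a usage example. The name `escapeHtml` is an assumption made here (it avoids shadowing the legacy global `escape`), and the sample payload is illustrative only.

```javascript
// Map of the five HTML-special characters to their entities.
const HTML_ESCAPES = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
};

// Hypothetical name; one pass over the string, so '&' is never escaped twice.
function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, c => HTML_ESCAPES[c]);
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;

// Escaping already-escaped text escapes it again, so escape exactly once,
// at the point where the string is written into HTML.
console.log(escapeHtml('&amp;'));
// &amp;amp;
```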

Frameworks (React, Vue, Svelte) do this automatically for text bindings. You only need to worry when you bypass that escaping with v-html (Vue), {@html} (Svelte), or dangerouslySetInnerHTML (React).

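HTML entities are not URL encoding. `%20` and `&amp;` solve different problems: percent-encoding makes a value safe inside a URL, while entities make text safe inside HTML markup. URL-decode query strings; HTML-decode text for display.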
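The two encodings often apply to the same string, one after the other. The sketch below builds a link: the query value is percent-encoded for the URL, then the finished URL is entity-escaped for the attribute. `escapeAttr`, `buildSearchLink`, and the example.com address are all hypothetical names used only for illustration.

```javascript
// Hypothetical helper: entity-escape a value for a double-quoted attribute.
function escapeAttr(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(s).replace(/[&<>"']/g, c => map[c]);
}

// Hypothetical helper: URL-encode first, HTML-escape second.
function buildSearchLink(query) {
  // Layer 1: percent-encoding makes the value safe inside the URL.
  const url = 'https://example.com/search?q=' + encodeURIComponent(query) + '&lang=en';
  // Layer 2: entity escaping makes the URL safe inside the HTML attribute.
  return `<a href="${escapeAttr(url)}">Search</a>`;
}

console.log(buildSearchLink('fish & chips'));
// <a href="https://example.com/search?q=fish%20%26%20chips&amp;lang=en">Search</a>
```

The ampersand inside the query value becomes `%26`, while the ampersand that separates parameters becomes `&amp;`. A browser reverses the layers in the opposite order: it decodes the entity when parsing the attribute, and the server decodes the percent-escapes when reading the query string.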

Decode and inspect with the [HTML Entity Decoder](https://sdk.is/html-entity-decoder).