🔢 Emoji Unicode Converter – Encode & Decode Emoji Code Points
The Emoji Unicode Converter is a bidirectional tool that translates between visible emoji characters and their underlying Unicode code point representations. Whether you are debugging emoji in a database, embedding emoji safely in HTML, CSS, or JavaScript, or simply exploring how the Unicode standard works, this tool provides instant conversions in eight different output formats — all without leaving your browser.
What Is a Unicode Code Point?
Every character in the Unicode standard, including the characters that make up its 3,600+ emoji, is assigned a unique number called a code point. Code points are written in U+ notation: for example, the grinning face emoji 😀 is U+1F600. Many emoji are a single code point, while others are sequences of several. When you need to include emoji in source code, HTML, or data transfer formats, you often cannot use the raw character; instead you use an escape sequence derived from the code point.
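As a minimal sketch of how a code point can be read from a string in JavaScript, the snippet below uses the standard String.prototype.codePointAt method. The helper name toUPlus is hypothetical and is not taken from the converter's own source.

```javascript
// Hypothetical helper: format the first code point of a string in U+ notation.
function toUPlus(str) {
  const codePoint = str.codePointAt(0); // full code point, even above U+FFFF
  if (codePoint === undefined) {
    throw new RangeError("expected a non-empty string");
  }
  // Unicode convention: upper-case hex, padded to at least four digits.
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

console.log(toUPlus("\u{1F600}")); // U+1F600
console.log(toUPlus("A"));         // U+0041
```

Only the first code point is inspected here, so a multi-code-point emoji would need to be iterated code point by code point.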
Supported Output Formats
| Format | Example (😀 = U+1F600) | Use Case |
|---|---|---|
| U+ Notation | U+1F600 | Unicode documentation, databases |
| HTML Hex Entity | &#x1F600; | HTML pages, XML |
| HTML Decimal Entity | &#128512; | Legacy HTML, email templates |
| CSS Escape | \1F600 | CSS content property |
| JS ES6 | \u{1F600} | Modern JavaScript/TypeScript strings |
| JS ES5 Surrogate | \uD83D\uDE00 | Legacy JS engines, JSON |
| Python | \U0001F600 | Python 3 string literals |
| Raw Hex | 1F600 | Low-level programming, fonts |
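The formats in the table can all be derived from the numeric code point. The following is a rough sketch rather than the converter's actual code: formatCodePoint and its property names are hypothetical, and the handling of code points at or below U+FFFF (a single \uXXXX escape for the ES5 and Python forms) is an assumption.

```javascript
// Hypothetical helper: render one code point in the eight formats from the table.
function formatCodePoint(cp) {
  const hex = cp.toString(16).toUpperCase();
  // One UTF-16 code unit as a \uXXXX escape.
  const unit = (n) => "\\u" + n.toString(16).toUpperCase().padStart(4, "0");

  let es5;
  if (cp > 0xffff) {
    // Above the Basic Multilingual Plane: split into a surrogate pair.
    const offset = cp - 0x10000;
    es5 = unit(0xd800 + (offset >> 10)) + unit(0xdc00 + (offset & 0x3ff));
  } else {
    es5 = unit(cp);
  }

  return {
    uPlus: "U+" + hex.padStart(4, "0"),
    htmlHex: "&#x" + hex + ";",
    htmlDecimal: "&#" + cp + ";",
    css: "\\" + hex,
    jsEs6: "\\u{" + hex + "}",
    jsEs5: es5,
    // Python uses \UXXXXXXXX (eight digits) above U+FFFF, \uXXXX otherwise.
    python: cp > 0xffff ? "\\U" + hex.padStart(8, "0") : unit(cp),
    rawHex: hex,
  };
}

console.log(formatCodePoint(0x1f600).jsEs5);  // \uD83D\uDE00
console.log(formatCodePoint(0x1f600).python); // \U0001F600
```

Working from the number rather than from the string keeps every format consistent with the others, since each is just a different spelling of the same value.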
ZWJ Sequences Explained
Some of the most complex emoji, such as family groups (👨‍👩‍👧‍👦), profession emoji (👩‍💻), and a few flags such as the rainbow flag (🏳️‍🌈), are built by joining simpler emoji with the Zero-Width Joiner character (U+200D, ZWJ). Most country flags are not ZWJ sequences; they are pairs of regional indicator symbols. Visually a ZWJ sequence appears as a single glyph, but it is actually a sequence of multiple code points: the woman technologist 👩‍💻, for example, is U+1F469 U+200D U+1F4BB. The converter uses the Intl.Segmenter API to identify these clusters and shows each component emoji and its ZWJ connectors in the breakdown panel.
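A grapheme-level breakdown of this kind can be sketched with Intl.Segmenter as follows. The function name breakDown and the shape of its return value are assumptions made for illustration; the converter's real breakdown panel may be structured differently.

```javascript
// Sketch: split text into grapheme clusters, then list each cluster's code points.
function breakDown(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), ({ segment }) => ({
    cluster: segment,
    // Spreading a string iterates by code point, not by UTF-16 code unit.
    codePoints: [...segment].map(
      (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    ),
  }));
}

// Woman technologist: WOMAN + ZWJ + PERSONAL COMPUTER, written with escapes
// so the invisible joiner is explicit.
const result = breakDown("\u{1F469}\u200D\u{1F4BB}");
console.log(result.length);        // 1
console.log(result[0].codePoints); // [ 'U+1F469', 'U+200D', 'U+1F4BB' ]
```

Segmenting by grapheme first is what keeps the ZWJ sequence together as one entry; iterating the raw string by code point alone would report three unrelated characters.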
Surrogate Pairs in JavaScript
JavaScript exposes strings as sequences of UTF-16 code units. Characters above U+FFFF, which includes the large majority of emoji, cannot fit in a single 16-bit code unit and require a surrogate pair: two 16-bit values (a high surrogate in 0xD800–0xDBFF followed by a low surrogate in 0xDC00–0xDFFF) that together encode the full code point. The formula is:
high = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800
low = ((codePoint - 0x10000) % 0x400) + 0xDC00
Modern JavaScript (ES6+) avoids this complexity with the \u{XXXXX} syntax, which directly accepts the full code point. The converter outputs both representations so you can pick the right one for your target environment.
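The surrogate pair formula can be checked with a short runnable sketch. The function name toSurrogatePair is hypothetical, and the range check is an addition for safety rather than part of the formula itself.

```javascript
// Sketch of the surrogate pair formula; toSurrogatePair is a hypothetical name.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("only U+10000..U+10FFFF are encoded as surrogate pairs");
  }
  const offset = codePoint - 0x10000;
  const high = Math.floor(offset / 0x400) + 0xd800; // top 10 bits of the offset
  const low = (offset % 0x400) + 0xdc00;            // bottom 10 bits of the offset
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
// The pair and the ES6 escape describe the same string.
console.log(String.fromCharCode(high, low) === "\u{1F600}"); // true
```

For U+1F600 the offset is 0xF600, which splits into 0x3D and 0x200, giving the pair \uD83D\uDE00 shown in the format table.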
UTF-8 Byte Encoding
UTF-8 uses a variable number of bytes per character, from one to four. Most emoji fall in the range U+1F000–U+1FAFF, and every code point above U+FFFF takes 4 bytes in UTF-8; for this range the first byte is always F0. The per-emoji table shows the exact hex byte sequence for every character, which is useful when working with databases, file I/O, or network protocols that operate at the byte level.
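One way to obtain such a byte sequence in a browser or in Node.js is the standard TextEncoder API, which always produces UTF-8. The helper name utf8Hex below is hypothetical, and the space-separated upper-case output is just one possible presentation.

```javascript
// Sketch: hex UTF-8 bytes of a string, using the standard TextEncoder API.
function utf8Hex(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(utf8Hex("\u{1F600}")); // F0 9F 98 80
console.log(utf8Hex("A"));         // 41
```

The four bytes for 😀 are what a column or protocol must be able to store, which is why a 3-byte-only charset cannot hold it.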
Common Use Cases
- Database debugging: convert emoji to code points to check whether your database column is using a UTF-8 4-byte charset (e.g., utf8mb4 in MySQL).
- HTML templates: use HTML entity notation to safely embed emoji in HTML without worrying about file encoding.
- CSS icons: paste the CSS escape into the content property of a ::before pseudo-element.
- API payloads: sanitise emoji in JSON by replacing raw characters with \u escapes before serialisation.
- Accessibility & NLP: identify and strip emoji from text pipelines using the per-emoji table.
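For the API payload case, one possible approach in JavaScript is sketched below. The helper name toAsciiJson is hypothetical. Note that this sketch escapes the output of JSON.stringify rather than its input, because backslashes inserted into a string beforehand would themselves be escaped by the serialiser.

```javascript
// Hypothetical helper: JSON text with every non-ASCII UTF-16 code unit escaped.
function toAsciiJson(value) {
  // Without the "u" flag the regex matches one code unit at a time,
  // so each half of a surrogate pair gets its own \uXXXX escape.
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const payload = toAsciiJson({ msg: "hi \u{1F600}" });
console.log(payload); // {"msg":"hi \uD83D\uDE00"}
console.log(JSON.parse(payload).msg === "hi \u{1F600}"); // true
```

The result is pure ASCII and still valid JSON, so any compliant parser restores the original emoji.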
Limitations
Emoji names shown in the breakdown are derived from Unicode block ranges and a built-in lookup table. They give the Unicode block name (e.g., "EMOTICONS", "TRANSPORT & MAP SYMBOLS") rather than the full CLDR annotation (e.g., "grinning face") for every single character. For the full official CLDR name of any specific emoji, cross-reference with the Unicode Emoji Chart. Input is capped at 10,000 characters to maintain browser performance.
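A block-range lookup of the kind described might look like the sketch below. The table is partial and purely illustrative: EMOJI_BLOCKS and blockName are hypothetical names, and the converter's built-in table is assumed to be larger and may be organised differently. The ranges themselves are the standard Unicode block boundaries.

```javascript
// Partial, illustrative table of Unicode blocks that contain emoji:
// [first code point, last code point, block name].
const EMOJI_BLOCKS = [
  [0x2600, 0x26ff, "Miscellaneous Symbols"],
  [0x2700, 0x27bf, "Dingbats"],
  [0x1f300, 0x1f5ff, "Miscellaneous Symbols and Pictographs"],
  [0x1f600, 0x1f64f, "Emoticons"],
  [0x1f680, 0x1f6ff, "Transport and Map Symbols"],
  [0x1f900, 0x1f9ff, "Supplemental Symbols and Pictographs"],
  [0x1fa70, 0x1faff, "Symbols and Pictographs Extended-A"],
];

// Hypothetical helper: name of the block containing a code point, if listed.
function blockName(codePoint) {
  const hit = EMOJI_BLOCKS.find(([start, end]) => codePoint >= start && codePoint <= end);
  return hit ? hit[2] : "Unknown block";
}

console.log(blockName(0x1f600)); // Emoticons
console.log(blockName(0x1f680)); // Transport and Map Symbols
```

A range table like this stays small because it names whole blocks; per-character CLDR names would need one entry per emoji.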