About This Tool

🔡 Unicode Character Lookup – Explore Every Code Point

The Unicode Character Lookup tool lets you explore any of the 150,000+ characters in the Unicode standard. Paste a character, enter a code point, or analyse an entire string to retrieve its official Unicode metadata — name, category, block, plane, and a full suite of encoding representations — all computed instantly in your browser.

Three Lookup Modes

Character Mode

Paste or type any single character into the input field. The tool extracts its code point and displays everything you need to know: the official Unicode name (e.g., EURO SIGN), the general category (e.g., Sc – Currency Symbol), the Unicode block (e.g., Currency Symbols), the plane, and eight different encoding formats from UTF-8 bytes through CSS and JavaScript escape sequences.
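The tool's own implementation is not shown here, but the same kind of lookup can be sketched with Python's standard unicodedata module. The function name describe_char and the shape of the returned dictionary are illustrative assumptions. The standard library does not expose block names, so the block is left out of this sketch.

```python
import unicodedata

def describe_char(ch: str) -> dict:
    """Return basic Unicode metadata for a single code point (illustrative helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        # Unnamed code points (e.g. controls) fall back to a placeholder.
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "plane": cp >> 16,
    }

print(describe_char("€"))
# {'code_point': 'U+20AC', 'name': 'EURO SIGN', 'category': 'Sc', 'plane': 0}
```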

Code Point Mode

Enter a code point in any of these formats and the tool decodes it into the corresponding character and full metadata:

  • U+20AC — standard U+ notation
  • 0x20AC — C-style hex prefix
  • 8364 — decimal integer
  • 20AC — plain 4–6 digit hex

Valid code points range from U+0000 to U+10FFFF. Surrogate code points (U+D800–U+DFFF) are detected and flagged as invalid for standalone use.
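A minimal sketch of how such input could be parsed is shown below. The function name parse_code_point is hypothetical, and one rule is an assumption rather than documented behaviour: input made up only of the digits 0–9 is treated as decimal (as in the 8364 example), so plain hex is recognised only when it is 4–6 hex digits long and contains at least one letter A–F.

```python
import re

MAX_CODE_POINT = 0x10FFFF

def parse_code_point(text: str) -> int:
    """Parse 'U+20AC', '0x20AC', '8364', or '20AC' into an integer code point."""
    s = text.strip()
    if re.fullmatch(r"[Uu]\+[0-9A-Fa-f]{1,6}", s):
        cp = int(s[2:], 16)
    elif re.fullmatch(r"0[xX][0-9A-Fa-f]{1,6}", s):
        cp = int(s[2:], 16)
    elif re.fullmatch(r"[0-9]+", s):
        # Assumption: digits-only input is decimal, not hex.
        cp = int(s, 10)
    elif re.fullmatch(r"[0-9A-Fa-f]{4,6}", s):
        cp = int(s, 16)
    else:
        raise ValueError(f"unrecognised code point format: {text!r}")
    if cp > MAX_CODE_POINT:
        raise ValueError("code point is above U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not valid on their own")
    return cp

for sample in ("U+20AC", "0x20AC", "8364", "20AC"):
    print(sample, "->", chr(parse_code_point(sample)))
```

All four sample inputs resolve to the same character, €.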

String Analysis Mode

Enter a multi-character string (up to 200 code points) and the tool generates a per-character breakdown table. Each row shows the glyph, code point badge, Unicode name, category, and UTF-8 bytes. A summary footer reports the total number of code points, the UTF-8 byte length, whether the string contains surrogate pairs, and whether it includes right-to-left characters from Arabic, Hebrew, or Syriac scripts.
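The breakdown described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's code: the name analyse is hypothetical, right-to-left detection uses the bidirectional classes R and AL (which cover Hebrew, Arabic, and Syriac letters) instead of an explicit script list, and because Python strings hold code points rather than UTF-16 units, the surrogate-pair flag reports whether any character would need a surrogate pair in UTF-16.

```python
import unicodedata

RTL_BIDI_CLASSES = {"R", "AL"}  # assumption: bidi class stands in for a script check

def analyse(text: str, limit: int = 200) -> dict:
    """Build a per-code-point table plus a summary for a short string."""
    if len(text) > limit:
        raise ValueError(f"input is longer than {limit} code points")
    rows = []
    for ch in text:
        rows.append({
            "glyph": ch,
            "code_point": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "<unnamed>"),
            "category": unicodedata.category(ch),
            "utf8": ch.encode("utf-8").hex(" ").upper(),
        })
    return {
        "rows": rows,
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # True when at least one character lies outside the BMP.
        "needs_surrogate_pairs": any(ord(ch) > 0xFFFF for ch in text),
        "has_rtl": any(unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES for ch in text),
    }

report = analyse("a€😀")
for row in report["rows"]:
    print(row["code_point"], row["category"], row["utf8"], row["name"])
print(report["code_points"], report["utf8_bytes"], report["needs_surrogate_pairs"])
```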

Encoding Formats Explained

| Format | Example (€ = U+20AC) | Use case |
|---|---|---|
| UTF-8 Bytes | E2 82 AC | File storage, network transmission, databases |
| UTF-16 Code Unit(s) | 0x20AC | JavaScript, Java, Windows APIs |
| UTF-32 | 0x000020AC | Internal processing, simple indexing |
| HTML Entity | &euro; / &#8364; | HTML source code, XML documents |
| URL Encoded | %E2%82%AC | Query strings, URI components |
| CSS Escape | \20AC | CSS content property, selector escaping |
| JS / Python Escape | \u20AC | String literals in JavaScript, Python, Java |
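As a rough illustration, each representation in the table can be derived from the code point with the Python standard library. The function name encodings and its dictionary keys are hypothetical. The named entity comes from html.entities.codepoint2name, which covers only the HTML 4 entity set, so many characters will have no named form in this sketch.

```python
from html.entities import codepoint2name
from urllib.parse import quote

def encodings(ch: str) -> dict:
    """Compute common encoded forms of a single code point (illustrative helper)."""
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    name = codepoint2name.get(cp)
    return {
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        "utf16": " ".join(f"0x{u}" for u in units),
        "utf32": f"0x{cp:08X}",
        "html_named": f"&{name};" if name else None,
        "html_numeric": f"&#{cp};",
        "url": quote(ch, safe=""),
        "css": f"\\{cp:X}",
        # JavaScript style: \uXXXX in the BMP, \u{XXXXX} above it.
        "js": f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}",
    }

for key, value in encodings("€").items():
    print(f"{key:13} {value}")
```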

Understanding Unicode Structure

Unicode organises its 1,114,112 code points into 17 planes of 65,536 code points each. Plane 0 (the Basic Multilingual Plane or BMP, U+0000–U+FFFF) covers almost every character in modern use: Latin alphabets, CJK ideographs, Greek, Cyrillic, Arabic, Hebrew, and the bulk of symbols and punctuation. Plane 1 (the Supplementary Multilingual Plane) holds emoji, historic scripts, and musical notation. Planes 2–3 extend CJK coverage with rare ideographs.

Within each plane, characters are grouped into named blocks (e.g., Currency Symbols, Emoticons, Hiragana). The tool identifies the block for every code point so you can immediately understand the script or symbol category.
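Because every plane spans exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. Block lookup needs a range table. The sketch below uses a four-entry excerpt for illustration only, whereas a real lookup would load the full list from the Unicode Character Database file Blocks.txt. The names plane_of, block_of, and BLOCKS are hypothetical.

```python
# Tiny excerpt of block ranges, for illustration only.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x20A0, 0x20CF, "Currency Symbols"),
    (0x3040, 0x309F, "Hiragana"),
    (0x1F600, 0x1F64F, "Emoticons"),
]

def plane_of(cp: int) -> int:
    """Return the plane number (0-16) for a code point."""
    return cp >> 16

def block_of(cp: int):
    """Return the block name, or None if the excerpt does not cover it."""
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None

for ch in "€あ😀":
    cp = ord(ch)
    print(f"U+{cp:04X} plane {plane_of(cp)} block {block_of(cp)}")
```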

Each character also carries a general category property: a two-letter code such as Lu (Uppercase Letter), Nd (Decimal Digit), Sc (Currency Symbol), or So (Other Symbol). These categories feed text-processing algorithms, regular expression character classes such as \p{Lu}, and the identifier rules of many programming languages.
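To see general categories in practice, the short sketch below tallies the categories found in a string using Python's unicodedata module. The helper name category_counts is an illustrative choice.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count code points per two-letter general category."""
    return Counter(unicodedata.category(ch) for ch in text)

# 'A' is Lu, 'b' is Ll, '5' is Nd, '€' is Sc, '©' is So.
print(dict(category_counts("Ab5€©")))
```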

UTF-8 and Surrogate Pairs

UTF-8 encodes BMP characters (U+0000–U+FFFF) in 1–3 bytes and supplementary characters (U+10000–U+10FFFF) in 4 bytes, using a variable-length scheme designed for ASCII compatibility. UTF-16, used internally by JavaScript and Java, represents supplementary characters as surrogate pairs — two consecutive 16-bit code units in the ranges U+D800–U+DBFF (high surrogate) and U+DC00–U+DFFF (low surrogate). The tool detects when a supplementary character requires a surrogate pair and shows both code units.
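The byte-length thresholds and the surrogate-pair arithmetic can be written out directly. This sketch follows the standard UTF-8 and UTF-16 definitions, and the function names are illustrative.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 uses for a code point."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4

def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = cp - 0x10000                # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = surrogate_pair(0x1F600)
print(utf8_length(0x1F600), f"{high:04X} {low:04X}")  # 4 D83D DE00
```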

Common Use Cases

  • Debugging encoding issues — find why a character displays as a replacement symbol or garbled text by checking its UTF-8 byte sequence.
  • Web development — copy the HTML entity or CSS escape ready to paste into source code without worrying about character encoding in the file.
  • Internationalisation (i18n) — verify that a string contains the expected code points and does not accidentally include lookalike characters from a different script.
  • Security research — identify homoglyph characters (e.g., Cyrillic а U+0430 vs. Latin a U+0061) that could be used in phishing domain names or IDN homograph attacks.
  • Learning Unicode — explore how emoji, mathematical symbols, or rare scripts are encoded and what their official Unicode names are.
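For the homoglyph case, a rough mixed-script check can be sketched as follows. Python's standard library does not expose the Unicode Script property, so this example assumes the first word of a letter's official name (LATIN, CYRILLIC, GREEK, and so on) is a usable stand-in. That heuristic is approximate and is not how the tool is known to work. The name rough_scripts is hypothetical.

```python
import unicodedata

def rough_scripts(text: str) -> set:
    """Collect the first word of each letter's Unicode name as a script hint."""
    scripts = set()
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

print(rough_scripts("paypal"))       # all Latin letters
print(rough_scripts("p\u0430ypal"))  # second letter is Cyrillic а (U+0430)
```

A label that mixes more than one script hint is worth a closer look before it is trusted.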

Frequently Asked Questions

Is the Unicode Character Lookup free?

Yes, Unicode Character Lookup is completely free to use.

Can I use the Unicode Character Lookup offline?

Yes, you can install the web app as a PWA (Progressive Web App).

Is it safe to use Unicode Character Lookup?

Yes. Any data related to Unicode Character Lookup is stored only in your browser, and only if storage is required. You can clear your browser cache to remove all stored data. We do not store any data on a server.

How does the Unicode Character Lookup tool work?

Paste any character or enter a Unicode code point (e.g., U+20AC) to instantly retrieve its official Unicode metadata: name, category, block, plane, and all encoding representations. In String Analysis mode, every code point in the input string is analysed row by row. All processing happens locally in your browser — nothing is sent to a server.

What encodings does the tool show?

For each character the tool shows: UTF-8 bytes (hex), UTF-16 code unit(s) (hex), UTF-32 code unit (hex), HTML named entity or numeric character reference (&#DDDDD;), URL percent-encoding, CSS escape (\XXXXXX), and JavaScript/Python Unicode escape (\uXXXX, or \u{XXXXX} for astral characters). Note that the \u{XXXXX} form is JavaScript syntax; Python string literals write astral characters as \UXXXXXXXX with eight hex digits.

What code point input formats are accepted?

You can enter a code point as U+20AC notation, a plain hex number like 20AC or 0x20AC, or a decimal integer like 8364. All formats are parsed automatically. The valid range is U+0000 to U+10FFFF; surrogate code points (U+D800–U+DFFF) are flagged as invalid for standalone use.

What is String Analysis mode?

String Analysis mode accepts a multi-character string (up to 200 code points) and breaks it into a table of individual code points, each showing the glyph, code point, name, category, and UTF-8 bytes. A summary footer shows total code points, UTF-8 byte length, whether the string contains surrogate pairs, and whether it contains right-to-left characters.

Why do some characters show a placeholder instead of a glyph?

Control characters (U+0000–U+001F, U+007F), unassigned code points, and characters that have no standalone visual form (e.g., combining marks in isolation) do not render a visible glyph. The tool shows a [no glyph] placeholder in those cases to make it clear that the code point exists in Unicode but has nothing to display on its own.
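One way to approximate this rule is to key off the general category. The category set below (Cc for controls, Cn for unassigned, Mn and Me for non-spacing and enclosing marks) is an assumption chosen to match the cases listed above; the exact rule the tool applies is not documented here. The name display_glyph is hypothetical.

```python
import unicodedata

# Assumed categories that get a placeholder instead of a glyph.
NO_GLYPH_CATEGORIES = {"Cc", "Cn", "Mn", "Me"}

def display_glyph(ch: str) -> str:
    """Return the character itself, or a placeholder if it has no standalone glyph."""
    if unicodedata.category(ch) in NO_GLYPH_CATEGORIES:
        return "[no glyph]"
    return ch

for ch in ("\x00", "\u0301", "€"):
    print(f"U+{ord(ch):04X}", display_glyph(ch))
```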

What Unicode plane and block information is provided?

The tool identifies the Unicode plane (0 = Basic Multilingual Plane, 1 = Supplementary Multilingual Plane, etc.) and the named Unicode block (e.g., Currency Symbols, Emoticons, CJK Unified Ideographs). This helps you understand which script or symbol category a character belongs to and how it is encoded.