🔤 Base62 Encoding – Compact, URL-Safe Alphanumeric Representation
Base62 encoding converts arbitrary binary data, text strings, integers, or byte sequences into a compact alphanumeric string using only the 62 printable characters [0-9A-Za-z]. Unlike Base64, Base62 contains no special characters such as +, /, or =, making encoded output inherently safe for URLs, database keys, filenames, and query strings — with zero escaping required.
🔍 Why Base62 Instead of Base64?
Base64 is the dominant binary-to-text encoding, but its standard alphabet includes three characters that require percent-encoding in URLs (+ → %2B, / → %2F, = → %3D). Base62 sidesteps this problem entirely by restricting the alphabet to digits and ASCII letters only. The trade-off is a slightly longer output (roughly 1% more characters on long inputs), but the URL-safety and readability gains are often worth it.
| Property | Base62 | Base64 |
|---|---|---|
| Alphabet size | 62 characters | 64 characters |
| Characters used | 0-9 A-Z a-z | 0-9 A-Z a-z + / = |
| URL-safe (no escaping) | ✅ Yes | ❌ No (standard) / ✅ (URL variant) |
| Padding characters | None | = padding required |
| Size overhead vs raw bytes | ~34% larger | ~33% larger |
| UUID (128-bit) encoding | 22 characters | 24 characters (with padding) |
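To make the escaping difference concrete, the short sketch below uses only Python's standard library (`base64.b64encode` and `urllib.parse.quote`). The two-byte sample input is an assumption chosen for illustration, because it happens to produce all three problematic characters at once.

```python
import base64
from urllib.parse import quote

# Illustrative input: two bytes chosen so the Base64 output contains +, / and =.
raw = bytes([0xFB, 0xFF])

b64 = base64.b64encode(raw).decode("ascii")
# safe="" asks quote() to percent-encode every reserved character, including "/".
escaped = quote(b64, safe="")

print(b64)      # +/8=
print(escaped)  # %2B%2F8%3D

# A purely alphanumeric string passes through unchanged.
alnum = "5TP3P3v"
print(quote(alnum, safe="") == alnum)  # True
```

The escaped form is more than twice as long as the raw Base64 string here, which is the overhead an alphanumeric-only alphabet avoids.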
⚙️ How Base62 Encoding Works
Base62 is a positional numeral system, not a fixed-width block encoding like Base64, which maps every 3 input bytes to 4 output characters. The algorithm treats the input bytes as a single big-endian integer and converts it to base 62 using repeated integer division:
1. Convert input to bytes (UTF-8 for text, raw bytes for hex/UUID)
2. Interpret byte array as a big-endian integer N
3. Repeat until N = 0:
   remainder = N mod 62
   N = N div 62
   prepend ALPHABET[remainder] to result
4. Pad to minimum length if specified
For example, encoding the text "Hello" (bytes [72, 101, 108, 108, 111]):
Bytes: 72 101 108 108 111
BigInt: 310939249775
Base62: 5TP3P3v
🧩 Encoding Modes Explained
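Every mode described in this section feeds into the same conversion routine outlined in the steps above. A minimal Python sketch of that routine follows; it is an illustration rather than a reference implementation. The names `ALPHABET` and `base62_encode` and the `min_length` parameter are assumptions made for the example, and the alphabet is the digits, uppercase, lowercase ordering.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(data: bytes, min_length: int = 0) -> str:
    """Encode bytes as Base62 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")           # step 2: bytes -> integer N
    digits = []
    while n > 0:                              # step 3: repeated division by 62
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])    # collected least significant first
    # Reverse to most-significant-first; an input of zero encodes as "0".
    encoded = "".join(reversed(digits)) or ALPHABET[0]
    # step 4: left-pad with the zero digit up to the requested length
    return encoded.rjust(min_length, ALPHABET[0])

print(int.from_bytes(b"Hello", "big"))           # 310939249775
print(base62_encode("Hello".encode("utf-8")))    # 5TP3P3v
print(base62_encode(b"\x01", min_length=4))      # 0001
```

Because the bytes are read as a single integer, leading zero bytes do not survive the conversion (`b"\x00A"` and `b"A"` produce the same output). Padding to a fixed length is one common way to keep fixed-size inputs unambiguous.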
Text: The input string is encoded as UTF-8 bytes and then converted to Base62. This is the most common mode for encoding short strings, tokens, or identifiers.
Integer: A non-negative integer is directly expressed in base-62 notation. This is the classic use case for URL shorteners: map a numeric database row ID (e.g., 9999999) to a short slug like fxSJ.
Hex: A hexadecimal byte string (e.g., output from a hash function or random byte generator) is parsed as raw bytes and encoded. Useful for turning SHA or MD5 digests into alphanumeric strings shorter than their hex form.
UUID: A standard UUID (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx) is compressed to a fixed 22-character Base62 string. This is a popular technique for using UUIDs as primary keys while keeping URLs short and clean.
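The four modes differ only in how the input is turned into an integer before conversion. The sketch below shows one possible mapping. The function names (`encode_int`, `encode_text`, `encode_hex`, `encode_uuid`) are hypothetical, and the sample UUID is an arbitrary illustrative value.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_int(n: int, min_length: int = 0) -> str:
    """Integer mode: express a non-negative integer in base 62."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = ""
    while n > 0:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
    return (out or ALPHABET[0]).rjust(min_length, ALPHABET[0])

def encode_text(text: str) -> str:
    """Text mode: UTF-8 bytes read as a big-endian integer."""
    return encode_int(int.from_bytes(text.encode("utf-8"), "big"))

def encode_hex(hex_string: str) -> str:
    """Hex mode: parse the hex digits as raw bytes, then encode."""
    return encode_int(int.from_bytes(bytes.fromhex(hex_string), "big"))

def encode_uuid(value: str) -> str:
    """UUID mode: encode the 128-bit value, zero-padded to 22 characters."""
    return encode_int(uuid.UUID(value).int, min_length=22)

print(encode_int(9999999))       # fxSJ
print(encode_text("Hello"))      # 5TP3P3v
print(encode_hex("48656c6c6f"))  # 5TP3P3v (same bytes as "Hello")
print(len(encode_uuid("123e4567-e89b-12d3-a456-426614174000")))  # 22
```

Padding in `encode_uuid` keeps the output length constant even when the UUID's leading bits are zero; without it, small values would produce shorter strings.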
📏 Output Size Estimation
| Input Type | Formula | Example |
|---|---|---|
| n bytes of binary data | ⌈n × log(256) / log(62)⌉ ≈ n × 1.344 | 10 bytes → ~14 chars |
| Integer value k ≥ 1 | ⌊log₆₂(k)⌋ + 1 = ⌊log(k) / log(62)⌋ + 1 | 1,000,000 → 4 chars |
| UUID (128-bit) | Fixed | Always 22 characters |
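The estimates in the table can be checked with a few lines of Python. This is a rough sketch: the helper names `max_chars_for_bytes` and `chars_for_int` are made up for the example, and the integer case counts digits by repeated division rather than a floating-point logarithm to avoid rounding surprises near exact powers of 62.

```python
import math

def max_chars_for_bytes(n_bytes: int) -> int:
    """Upper bound on the Base62 length of n_bytes of arbitrary data."""
    return math.ceil(n_bytes * math.log(256) / math.log(62))

def chars_for_int(k: int) -> int:
    """Exact number of base-62 digits needed for a non-negative integer."""
    if k < 0:
        raise ValueError("only non-negative integers are supported")
    digits = 1
    while k >= 62:
        k //= 62
        digits += 1
    return digits

print(max_chars_for_bytes(10))               # 14
print(max_chars_for_bytes(16))               # 22 (a 128-bit UUID)
print(chars_for_int(1_000_000))              # 4
print(chars_for_int(61), chars_for_int(62))  # 1 2
```

The byte formula gives a worst-case length; inputs whose leading bytes are zero or small can encode to fewer characters unless padding is applied.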
🔧 Custom Alphabet Support
The standard alphabet orders digits before uppercase before lowercase: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz. Some applications use alternative orderings:
- Flickr / YouTube style (lowercase first, then uppercase): 0-9a-zA-Z
- Case-insensitive friendly: digits and uppercase letters only, extended with symbols (requires a non-standard Base62)
- Lexicographic sort-safe: use the standard 0-9A-Za-z ordering so that encoded strings of equal length (shorter values padded with leading 0) sort the same as the original integer values
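The effect of alphabet ordering can be seen by making the alphabet a parameter. The sketch below is illustrative only: `encode`, `STANDARD`, `LOWER_FIRST`, and the `width` parameter are names invented for the example, and the sort check assumes plain ASCII (byte-wise) string comparison.

```python
import string

STANDARD = string.digits + string.ascii_uppercase + string.ascii_lowercase
LOWER_FIRST = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n: int, alphabet: str = STANDARD, width: int = 0) -> str:
    """Encode a non-negative integer using any 62-character alphabet."""
    if len(alphabet) != 62 or len(set(alphabet)) != 62:
        raise ValueError("alphabet must contain 62 distinct characters")
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = ""
    while n > 0:
        n, r = divmod(n, 62)
        out = alphabet[r] + out
    return (out or alphabet[0]).rjust(width, alphabet[0])

# Same value, different alphabets, different strings.
print(encode(9999999))               # fxSJ
print(encode(9999999, LOWER_FIRST))  # FXsj

# With the standard ordering and fixed-width padding, string order
# matches numeric order.
values = [5, 61, 62, 3843, 3844, 9999999]
padded = [encode(v, width=4) for v in values]
print(sorted(padded) == padded)      # True

# With lowercase first, 10 -> "a" and 36 -> "A", but "A" sorts before "a".
pair = [encode(10, LOWER_FIRST), encode(36, LOWER_FIRST)]
print(sorted(pair))                  # ['A', 'a']
```

An encoder and decoder must agree on the alphabet: a string produced with one ordering decodes to a different integer under the other.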
🌐 Common Use Cases
- URL shorteners — convert a numeric row ID to a 3–6 character slug (e.g., bit.ly, TinyURL)
- Short unique IDs — compress UUIDs or random byte arrays into compact tokens for APIs and databases
- Session tokens — generate readable, URL-safe tokens from random bytes
- Query string parameters — embed encoded data in URLs without percent-encoding overhead
- Hash abbreviation — shorten SHA-256 digests for display or logging while maintaining uniqueness