🕵️ Zero Width Character Encoder – Text Steganography with Unicode
The Zero Width Character Encoder lets you hide secret messages, watermarks, or fingerprint tokens inside ordinary visible text using invisible Unicode characters. The encoded output looks normal to a reader but carries a hidden payload that a decoder using the same scheme can recover. This technique is known as Unicode steganography.
What Are Zero-Width Characters?
Zero-width characters (ZWCs) are Unicode code points that render without any visible glyph or horizontal space. They are legitimate Unicode characters used for linguistic purposes — for example, the Zero Width Joiner controls ligature formation in scripts like Arabic and Devanagari — but they are invisible in most contexts.
| Character | Code Point | Name | Binary Role |
|---|---|---|---|
| (invisible) | U+200B | Zero Width Space (ZWS) | Bit 0 |
| (invisible) | U+200C | Zero Width Non-Joiner (ZWNJ) | Bit 1 |
| (invisible) | U+200D | Zero Width Joiner (ZWJ) | Byte delimiter |
| (invisible) | U+FEFF | Zero Width No-Break Space (BOM) | 4th symbol (quaternary) |
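
Because these characters cannot be seen, it can help to inspect them programmatically. The following is a small sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of the tool.

```python
import unicodedata

# The four code points listed in the table above.
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in ZERO_WIDTH:
    print(describe(ch))
```

All four report the general category `Cf` (format), which is one way to spot them in text that otherwise looks clean.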
How the Encoding Algorithm Works
The tool implements a binary ZWC steganography algorithm:
- Convert the secret payload to a UTF-8 byte array.
- Convert each byte to its 8-bit binary string (e.g., `65` → `01000001`).
- Map each bit: `0` → U+200B, `1` → U+200C.
- Use U+200D (ZWJ) as a byte delimiter between groups of 8 bits.
- Distribute the resulting ZWC stream evenly through the carrier text.
The output text is visually identical to the carrier. Only a decoder that knows to look for ZWC sequences — and interprets them using the same scheme — can recover the hidden payload.
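The steps above can be sketched in Python as follows. This is an illustration under assumptions, not the tool's actual source: the function names `to_zwc_stream` and `embed` are hypothetical, and the "even distribution" is interpreted here as spreading the stream across the gaps between carrier characters, which may differ from what the tool does.

```python
ZWS, ZWNJ, ZWJ = "\u200b", "\u200c", "\u200d"


def to_zwc_stream(payload: str) -> str:
    """Encode a payload as ZWS/ZWNJ bits with ZWJ between bytes."""
    groups = []
    for byte in payload.encode("utf-8"):
        bits = format(byte, "08b")  # e.g. 65 -> "01000001"
        groups.append("".join(ZWNJ if b == "1" else ZWS for b in bits))
    return ZWJ.join(groups)


def embed(carrier: str, payload: str) -> str:
    """Spread the ZWC stream across the gaps between carrier characters.

    Assumption: the placement rule is a guess at what "evenly" means.
    """
    if not carrier:
        raise ValueError("carrier text must not be empty")
    stream = to_zwc_stream(payload)
    slots = max(len(carrier) - 1, 1)  # gaps between visible characters
    n = len(stream)
    out = []
    for i, ch in enumerate(carrier):
        out.append(ch)
        if i < slots:
            # Slot i receives its proportional slice of the stream.
            out.append(stream[i * n // slots:(i + 1) * n // slots])
    return "".join(out)


carrier = "Meet me at noon."
stego = embed(carrier, "Hi")
print(len(carrier), len(stego))  # 16 33
```

The two-byte payload costs 17 invisible characters here (16 bits plus one delimiter). Note that this sketch iterates over code points, so a carrier containing combining marks or emoji sequences could be altered visually; a more careful version would split on grapheme clusters.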
Three Operating Modes
1. Encode Mode
Enter your carrier text (the visible host text) and your secret payload. Choose an encoding scheme and click Encode. The tool distributes invisible ZWCs throughout the carrier and shows you the encoded result — which looks identical to the original carrier text but contains the hidden message. A capacity meter shows how much of the carrier is used.
2. Decode Mode
Paste any text that may contain hidden ZWCs (such as output from Encode mode). Click Decode to reveal the hidden payload. The tool also reports the ZWC count, their positions, and the carrier text with all invisible characters stripped.
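A decoder for the binary scheme might look like the sketch below. It assumes the mapping described earlier (U+200B as 0, U+200C as 1, U+200D as the byte delimiter); the function name `decode` and the shape of the returned dictionary are illustrative choices, not the tool's actual interface.

```python
ZWS, ZWNJ, ZWJ = "\u200b", "\u200c", "\u200d"
ZWCS = {ZWS, ZWNJ, ZWJ}


def decode(text: str) -> dict:
    """Recover a binary-scheme payload and report where the ZWCs were found."""
    positions = [i for i, ch in enumerate(text) if ch in ZWCS]
    stream = "".join(text[i] for i in positions)

    data = bytearray()
    for group in stream.split(ZWJ):
        if not group:
            continue  # tolerate leading, trailing, or doubled delimiters
        if len(group) != 8:
            raise ValueError(f"expected 8 bits per byte, got {len(group)}")
        bits = "".join("1" if ch == ZWNJ else "0" for ch in group)
        data.append(int(bits, 2))

    return {
        "payload": data.decode("utf-8", errors="replace"),
        "zwc_count": len(positions),
        "positions": positions,
        "carrier": "".join(ch for ch in text if ch not in ZWCS),
    }
```

Because the decoder only looks at the order of the invisible characters, it does not matter where in the carrier they were placed.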
3. Sanitize Mode
Paste any suspicious text and click Sanitize to remove all zero-width characters. This is useful for cleaning text copied from web pages, chat applications, or documents that may have been watermarked or contain maliciously injected ZWCs.
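Sanitizing amounts to deleting a fixed set of code points. The sketch below assumes that set is the four characters listed in the table earlier; the tool may strip a broader set, and the name `sanitize` is hypothetical.

```python
# Assumption: only the four code points from the table are removed.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"
_STRIP_TABLE = str.maketrans("", "", ZERO_WIDTH)


def sanitize(text: str) -> tuple[str, int]:
    """Return the cleaned text and the number of characters removed."""
    cleaned = text.translate(_STRIP_TABLE)
    return cleaned, len(text) - len(cleaned)
```

Stripping these characters unconditionally can change legitimate text, since ZWJ and ZWNJ are meaningful in some scripts and in emoji sequences, so a sanitizer intended for multilingual content may need to be more selective.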
Encoding Schemes Compared
| Scheme | Symbols | ZWCs per ASCII char | Best for |
|---|---|---|---|
| Binary | 2 (ZWS, ZWNJ) | 9 (8 bits + 1 delimiter) | Maximum compatibility |
| Ternary | 3 (ZWS, ZWNJ, ZWJ) | ~6 (base-3 encoding) | Longer payloads in short carriers |
| Quaternary | 4 (ZWS, ZWNJ, ZWJ, FEFF) | ~5 (base-4 encoding) | Most compact embedding |
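
The per-character costs follow from how many base-k digits are needed to represent one byte (256 values). The sketch below computes that minimum; the function name is illustrative. The figures in the table may additionally include delimiter or framing overhead, as the binary row does with its extra delimiter, and how the tool frames ternary and quaternary output is not specified here.

```python
def symbols_per_byte(base: int) -> int:
    """Smallest number of base-`base` digits that can represent 256 values."""
    if base < 2:
        raise ValueError("base must be at least 2")
    digits, capacity = 0, 1
    while capacity < 256:
        capacity *= base
        digits += 1
    return digits


for base in (2, 3, 4):
    print(base, symbols_per_byte(base))
```

This gives 8 symbols per byte for binary, 6 for ternary, and 4 for quaternary, before any overhead.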
Common Use Cases
- Digital watermarking – Embed a unique identity token in published text to trace unauthorized copying or leaks.
- Document fingerprinting – Tag each copy of a document sent to different recipients so you can identify the source of a leak.
- Steganography education – Demonstrate how data can be hidden in plain sight using Unicode's invisible character set.
- Security research – Study how ZWCs can bypass naive text filters or be used in phishing and spoofing attacks.
- Text sanitization – Clean text of hidden characters before processing or publishing.
Zero-width character steganography is not encryption. The hidden text is unprotected — anyone with a decoder can read it. ZWCs are also used maliciously in phishing attacks (to bypass spam filters) and text spoofing. Always sanitize untrusted text before processing. For sensitive communications, use proper end-to-end encryption.