🔍 Text Encoding Detector – Identify Charset from Bytes
Character encoding is the mapping between raw bytes and the human-readable characters they represent. When the wrong encoding is assumed, text turns into garbled symbols — a phenomenon known as mojibake. The Text Encoding Detector analyzes raw bytes from pasted text, uploaded files, or hex sequences and returns a ranked list of the most likely encodings with per-candidate confidence scores.
📥 Three Input Modes
The tool supports three ways to provide input data, each suited to a different workflow:
- Text — paste any string directly. The tool re-encodes it to UTF-8 bytes and runs full analysis. Ideal for confirming that a copied snippet is valid UTF-8 or pure ASCII.
- File Upload — upload a .txt, .csv, .html, .xml, .json, or any other text-based file up to 10 MB. Bytes are read locally via the FileReader API — nothing leaves your browser.
- Hex Bytes — enter a raw hex byte string (e.g., EF BB BF 48 65 6C 6C 6F). Spaces, colons, and dashes are stripped automatically. Useful for debugging binary data or inspecting BOM prefixes.
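The hex-input normalization described above is straightforward to sketch. This is an illustrative Python version (the tool itself runs in the browser); the function name `parse_hex_bytes` is an assumption, not part of the tool's API:

```python
import re

def parse_hex_bytes(s: str) -> bytes:
    """Strip whitespace, colons, and dashes, then decode hex pairs to bytes."""
    cleaned = re.sub(r"[\s:\-]", "", s)
    if len(cleaned) % 2 != 0:
        raise ValueError("hex string must contain an even number of digits")
    return bytes.fromhex(cleaned)

# All three separator styles yield the same bytes:
parse_hex_bytes("EF:BB:BF")      # b'\xef\xbb\xbf'
parse_hex_bytes("EF-BB-BF")      # b'\xef\xbb\xbf'
parse_hex_bytes("EF BB BF 48")   # b'\xef\xbb\xbfH'
```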
🔬 How Detection Works
The algorithm runs three phases in sequence:
Phase 1 — BOM Sniffing
The first 2–4 bytes are compared against known Byte Order Marks. A BOM match yields 100% confidence with no further analysis needed.
Phase 2 — Multi-byte Sequence Validation
For encodings without a BOM (plain UTF-8, CJK charsets), the tool validates multi-byte sequences according to each encoding's rules and counts valid vs. invalid sequences to compute a confidence score.
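For UTF-8, "validating multi-byte sequences" means checking that each lead byte is followed by the right number of continuation bytes in the 0x80–0xBF range. A simplified sketch of the valid-vs-invalid tally for UTF-8 only (the tool's actual scoring, and its handlers for the CJK charsets, are not shown in the source):

```python
def utf8_confidence(data: bytes) -> float:
    """Walk the byte stream, counting valid vs. invalid UTF-8 sequences."""
    valid = invalid = 0
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # single-byte ASCII
            n = 0
        elif 0xC2 <= b <= 0xDF:      # lead byte of a 2-byte sequence
            n = 1
        elif 0xE0 <= b <= 0xEF:      # lead byte of a 3-byte sequence
            n = 2
        elif 0xF0 <= b <= 0xF4:      # lead byte of a 4-byte sequence
            n = 3
        else:                        # stray continuation or illegal byte
            invalid += 1
            i += 1
            continue
        tail = data[i + 1:i + 1 + n]
        if len(tail) == n and all(0x80 <= t <= 0xBF for t in tail):
            valid += 1
            i += 1 + n               # consume the whole sequence
        else:
            invalid += 1
            i += 1                   # resynchronize on the next byte
    total = valid + invalid
    return valid / total if total else 1.0
```

A confidence of 1.0 means every sequence validated; bytes like 0xFF or a lead byte without its continuations drag the score down.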
Phase 3 — Byte-Frequency Analysis
For single-byte legacy charsets (ISO-8859, Windows-125x, KOI8), byte-frequency histograms are compared against known encoding fingerprints. Higher ratios of bytes in the characteristic range of a charset raise its confidence score.
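The "characteristic range" idea can be sketched as a ratio: of the high bytes (≥ 0x80) present, how many fall inside the band where a given charset places its letters? The helper below is an illustration only; the tool's real fingerprints and ranges are assumptions here (0xC0–0xFF is used as an example band, where KOI8-R and Windows-1251 place most Cyrillic letters):

```python
def range_ratio(data: bytes, lo: int, hi: int) -> float:
    """Fraction of high bytes (>= 0x80) that fall within [lo, hi]."""
    high = [b for b in data if b >= 0x80]
    if not high:
        return 0.0
    return sum(lo <= b <= hi for b in high) / len(high)

# Example: high bytes clustered in 0xC0-0xFF are consistent with
# Cyrillic letters in KOI8-R or Windows-1251 text.
cyrillic_like = bytes([0x41, 0xC1, 0xD2, 0xE5, 0xF0])
score = range_ratio(cyrillic_like, 0xC0, 0xFF)  # 1.0: all high bytes in band
```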
📊 Supported Encoding Families
| Category | Encodings |
|---|---|
| Unicode | UTF-8, UTF-8 BOM, UTF-16 LE/BE, UTF-32 LE/BE, UTF-7 |
| ASCII | US-ASCII (7-bit pure) |
| Legacy Western | ISO-8859-1, Windows-1252, ISO-8859-15, MacRoman |
| Cyrillic | Windows-1251, ISO-8859-5, KOI8-R, KOI8-U |
| Other Legacy | ISO-8859-2/7/8/9, Windows-1250/1253/1254/1255/1256 |
| CJK | Shift-JIS, EUC-JP, ISO-2022-JP, GBK, GB2312, Big5, EUC-KR |
🎯 Reading the Results
After clicking Detect Encoding, the results panel shows:
- Primary Encoding — the highest-confidence candidate displayed in a prominent card with encoding name, confidence badge, BOM status, and recommended action.
- Ranked Candidates — all plausible encodings sorted by confidence. Each row has a colour-coded confidence bar (green ≥ 90%, yellow 60–89%, red < 60%).
- Byte Statistics — total byte count, unique byte values, null bytes, high bytes (0x80–0xFF), and ASCII bytes for a quick fingerprint of the data.
- Multi-Encoding Preview — the input decoded using the top 4 candidate encodings so you can visually confirm which rendering looks correct.
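The Byte Statistics panel boils down to a handful of counts over the raw bytes. A minimal sketch of those five figures (the function name and dict keys are assumptions, not the tool's API):

```python
def byte_stats(data: bytes) -> dict:
    """Compute the quick-fingerprint statistics shown in the results panel."""
    return {
        "total": len(data),                      # total byte count
        "unique": len(set(data)),                # distinct byte values
        "null": data.count(0),                   # null bytes (0x00)
        "high": sum(b >= 0x80 for b in data),    # high bytes (0x80-0xFF)
        "ascii": sum(b < 0x80 for b in data),    # 7-bit ASCII bytes
    }
```

Many null bytes suggest UTF-16/UTF-32; zero high bytes suggest pure ASCII; a large high-byte share points toward a legacy single-byte or CJK charset.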
💡 Common Use Cases
- Diagnosing mojibake in exported CSV or database dumps where the wrong encoding was assumed during import.
- Verifying that a file is genuine UTF-8 before loading it into a UTF-8–only parser or database.
- Inspecting BOM prefixes in Windows-generated text files that cause issues in Unix environments (EF BB BF at the start of a file is often invisible but causes parse errors).
- Helping data engineers migrate legacy CJK content from Shift-JIS or GBK to UTF-8 by confirming the source encoding before conversion.