🔢 Text Entropy Calculator – Measure Information Density
The Text Entropy Calculator applies Shannon entropy — the foundational concept of information theory — to any text you provide. It quantifies how much randomness or information content exists in a string, measured in bits per character. Whether you're evaluating password strength, auditing encryption output, or studying linguistics, this tool gives you an instant, accurate picture of your text's information density.
📐 The Shannon Entropy Formula
Claude Shannon introduced entropy in 1948 as a measure of uncertainty in a communication channel. For discrete symbols, the formula is:
H(X) = -Σ [ p(xᵢ) × log₂(p(xᵢ)) ]

Where:
- p(xᵢ) is the probability (relative frequency) of character xᵢ
- The sum runs over all unique characters in the input
- The result is in bits per character
For example, the string aaaa has zero entropy — every character is predictable. The string abcd has entropy log₂(4) = 2.0 bits/char — each character is equally likely.
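
The formula can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual implementation; the function name `shannon_entropy` is hypothetical, and returning 0.0 for empty input is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return Shannon entropy in bits per character (0.0 for empty input)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum -p * log2(p) over each unique character's relative frequency.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy("aaaa"))  # 0.0
print(shannon_entropy("abcd"))  # 2.0
```

The two calls reproduce the examples above: one repeated character gives zero entropy, and four equally likely characters give log₂(4) = 2.0 bits/char.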
📊 Derived Metrics Explained
| Metric | Formula | What It Tells You |
|---|---|---|
| Shannon Entropy | H = -Σ p·log₂(p) | Average randomness per character |
| Total Entropy Bits | H × length | Total information content of the string |
| Max Possible Entropy | log₂(alphabet_size) | Upper bound for the detected character set |
| Normalised Entropy | H / log₂(alphabet_size) × 100 | How close to maximum randomness (percentage) |
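
As a rough sketch, the four metrics in the table could be computed together as shown below. The function name `entropy_metrics` and the dictionary keys are hypothetical. The code also assumes that the "detected" alphabet is simply the set of distinct characters in the input, which may differ from how the tool detects a character set.

```python
import math
from collections import Counter


def entropy_metrics(text: str) -> dict:
    """Compute the four metrics from the table for a non-empty string."""
    if not text:
        raise ValueError("text must not be empty")
    counts = Counter(text)
    total = len(text)
    h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
    # Assumption: the detected alphabet is the set of distinct characters seen.
    alphabet_size = len(counts)
    h_max = math.log2(alphabet_size)
    return {
        "entropy": h,
        "total_bits": h * total,
        "max_entropy": h_max,
        # A one-symbol alphabet has h_max == 0; report 0% instead of dividing by zero.
        "normalised_pct": (h / h_max * 100) if h_max > 0 else 0.0,
    }


print(entropy_metrics("aabb"))
```

For `"aabb"` the entropy is 1.0 bit/char over 4 characters, so the total is 4.0 bits, and because both symbols are equally frequent the normalised score is 100%.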
🔐 Password Strength via Entropy
Total entropy bits (H × length) is the most reliable metric for password strength because it accounts for both character diversity and password length:
- Under 28 bits → Weak — easily brute-forced
- 28–35 bits → Fair — acceptable for low-risk accounts
- 36–59 bits → Strong — recommended for most use cases
- 60+ bits → Very Strong — suitable for high-security credentials
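These thresholds assume an attacker has no prior knowledge of the password's structure. Because the score depends only on the character frequencies within the string itself, dictionary words and predictable sequences (for example, abcdefgh) can score high in entropy yet remain vulnerable to dictionary and pattern-based attacks.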
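
The bands above could be applied as in the following sketch. The function names are hypothetical, and treating fractional values between 35 and 36 bits as "Fair" is an assumption, since the listed ranges use whole numbers.

```python
import math
from collections import Counter


def total_entropy_bits(text: str) -> float:
    """Shannon entropy per character multiplied by the string length."""
    if not text:
        return 0.0
    total = len(text)
    h = sum(-(n / total) * math.log2(n / total) for n in Counter(text).values())
    return h * total


def strength_label(bits: float) -> str:
    """Map total entropy bits onto the four bands listed above."""
    if bits < 28:
        return "Weak"
    if bits < 36:  # assumption: fractional values below 36 count as Fair
        return "Fair"
    if bits < 60:
        return "Strong"
    return "Very Strong"


for candidate in ("aaaaaaaa", "abcdefghijklmnop"):
    bits = total_entropy_bits(candidate)
    print(candidate, bits, strength_label(bits))
```

The second candidate shows the caveat in practice: sixteen distinct letters in alphabetical order score 64 bits ("Very Strong") even though the sequence is trivially guessable.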
🔬 Analysis Modes
Character vs. Byte Analysis
Character mode treats each Unicode code point as one symbol, which is ideal for natural language text. Byte mode analyses the raw UTF-8 bytes (values 0–255) instead. This suits cryptographic assessment of binary data, and it shows how multi-byte characters such as emoji and accented letters contribute at the encoding level.
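
The difference between the two modes can be illustrated with a short sketch. The helper name `entropy` is hypothetical; the only change between modes is which sequence of symbols gets counted.

```python
import math
from collections import Counter


def entropy(symbols) -> float:
    """Shannon entropy in bits per symbol for any sequence of hashable items."""
    symbols = list(symbols)
    if not symbols:
        return 0.0
    total = len(symbols)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(symbols).values())


text = "\u00e9\U0001F511"  # an accented letter (U+00E9) followed by an emoji (U+1F511)
char_symbols = list(text)                  # character mode: 2 code points
byte_symbols = list(text.encode("utf-8"))  # byte mode: 2 + 4 = 6 UTF-8 bytes

print(len(char_symbols), entropy(char_symbols))
print(len(byte_symbols), entropy(byte_symbols))
```

The same two-character string is seen as 2 symbols in character mode and 6 symbols in byte mode, so both the symbol count and the entropy differ between modes.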
N-gram Entropy
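Instead of analysing individual characters, n-gram entropy considers sequences of n characters at once. For example, abababab has a unigram entropy of 1.0 bit/char, the maximum for a two-character alphabet, yet when split into non-overlapping bigrams it contains only ab, giving a bigram entropy of zero. A low n-gram entropy therefore reveals repeating patterns that unigram entropy alone misses. This is useful for detecting structured tokens or weak pseudo-random generators.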
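
A minimal sketch of n-gram entropy follows. The function name and the `overlapping` parameter are hypothetical; the text does not say whether windows overlap, so both variants are shown and the default (non-overlapping) is an assumption.

```python
import math
from collections import Counter


def ngram_entropy(text: str, n: int = 2, overlapping: bool = False) -> float:
    """Shannon entropy in bits per n-gram."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Non-overlapping windows advance by n; overlapping windows advance by 1.
    step = 1 if overlapping else n
    grams = [text[i:i + n] for i in range(0, len(text) - n + 1, step)]
    if not grams:
        return 0.0
    total = len(grams)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(grams).values())


print(ngram_entropy("abababab", n=1))                    # 1.0
print(ngram_entropy("abababab", n=2))                    # 0.0
print(ngram_entropy("abababab", n=2, overlapping=True))  # about 0.985
```

With non-overlapping windows the only bigram is `ab`, so the entropy drops to zero. With overlapping windows both `ab` and `ba` appear, and the result stays close to 1 bit per bigram, so the windowing choice matters when interpreting the score.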
Chunked Analysis
For long documents or logs, chunked mode divides the text into fixed-size segments and computes entropy for each chunk independently. Visualising entropy across chunks helps identify regions of compressed data, encrypted blocks, or repeated boilerplate content.
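
Chunked analysis might look like the sketch below. The function names are hypothetical, and scoring a shorter final chunk instead of discarding it is an assumption.

```python
import math
from collections import Counter


def entropy(text: str) -> float:
    """Shannon entropy in bits per character (0.0 for empty input)."""
    if not text:
        return 0.0
    total = len(text)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(text).values())


def chunk_entropies(text: str, size: int) -> list:
    """Return (start_offset, entropy) pairs for consecutive fixed-size chunks."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Assumption: a shorter final chunk is still scored rather than dropped.
    return [(i, entropy(text[i:i + size])) for i in range(0, len(text), size)]


sample = "a" * 8 + "abcdefgh"
print(chunk_entropies(sample, 8))  # [(0, 0.0), (8, 3.0)]
```

The jump from 0.0 to 3.0 bits/char between the two chunks is the kind of boundary a per-chunk plot makes visible, for example where boilerplate ends and denser data begins.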
Comparative Mode
Paste two text samples side-by-side to instantly compare their entropy values, normalised scores, and character distributions. This is particularly useful when comparing candidate passwords or evaluating which version of a generated token is more random.
🛠️ Practical Use Cases
- Security audits — verify that generated API keys, session tokens, or passwords meet entropy thresholds
- Data compression — text with low entropy compresses well; high entropy text is already dense
- Cryptographic analysis — well-encrypted ciphertext should show byte-level entropy close to the 8 bits/byte maximum, provided the sample is large enough to contain most byte values
- Linguistics research — compare information density across languages, writing systems, or genres
- Education — demonstrate core concepts of information theory interactively
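
The link between entropy and compressibility mentioned in the list above can be checked with Python's standard `zlib` module. This is an illustrative sketch only; the helper name `compression_ratio` and the sample inputs are assumptions, not part of the tool.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


repetitive = b"ab" * 500         # low entropy: two symbols in a fixed pattern
random_bytes = os.urandom(1000)  # high entropy: close to 8 bits per byte

print(compression_ratio(repetitive))
print(compression_ratio(random_bytes))
```

The repetitive input shrinks to a small fraction of its size, while the random input does not shrink at all and may grow slightly because of the compression format's overhead.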
⚙️ Options Reference
- Case Sensitivity — treat A and a as the same character to analyse structural entropy independent of casing
- Ignore Whitespace — exclude spaces and newlines to focus on meaningful character distribution
- Character Set Filter — restrict analysis to All characters, ASCII only, or Alphanumeric only
- Precision — configure the number of decimal places shown in the entropy result (0–10)
- Custom Alphabet Size — override the detected alphabet size for normalisation (e.g., set to 64 for Base64-encoded strings)
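
The options above can be thought of as a preprocessing step followed by a configurable normalisation. The sketch below is one possible reading; all function and parameter names are hypothetical, and treating "Alphanumeric only" as Unicode letters and digits (via `str.isalnum`) is an assumption.

```python
import math
from collections import Counter


def preprocess(text: str, case_sensitive: bool = True,
               ignore_whitespace: bool = False, charset: str = "all") -> str:
    """Apply the filtering options before any characters are counted."""
    if not case_sensitive:
        text = text.lower()
    if ignore_whitespace:
        text = "".join(ch for ch in text if not ch.isspace())
    if charset == "ascii":
        text = "".join(ch for ch in text if ord(ch) < 128)
    elif charset == "alphanumeric":
        # Assumption: Unicode letters and digits count as alphanumeric.
        text = "".join(ch for ch in text if ch.isalnum())
    return text


def normalised_entropy(text: str, alphabet_size=None, precision: int = 2) -> float:
    """Normalised entropy as a percentage, rounded to `precision` decimal places."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
    # A custom alphabet size overrides the number of distinct characters seen.
    size = alphabet_size if alphabet_size is not None else len(counts)
    h_max = math.log2(size) if size > 1 else 0.0
    return round(h / h_max * 100, precision) if h_max else 0.0


cleaned = preprocess("Aa Bb", case_sensitive=False, ignore_whitespace=True)
print(cleaned)                                        # aabb
print(normalised_entropy(cleaned))                    # 100.0
print(normalised_entropy(cleaned, alphabet_size=64))  # 16.67
```

Overriding the alphabet size changes only the denominator: the same string scores 100% against its own two-symbol alphabet but 16.67% against a 64-symbol alphabet such as Base64.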
All calculations run entirely in your browser. No text is ever sent to any server, making this tool safe for analysing passwords, private keys, and sensitive documents.