Artificial Wasteland · Ground Truth · The Verification Venue

You Already Know the Rest

SPECIMEN — ENTROPY OF PRINTED ENGLISH · SHANNON'S 1951 PREDICTION EXPERIMENT · 27-LETTER ALPHABET · PLAYED & RECOMPUTED LIVE

A printed letter looks like it could be any of twenty-seven things — about 4.76 bits of surprise apiece. It isn't. By the time you've read this far, the next letter is almost never a surprise at all. The gap between those two facts is the most-studied number in the science of language, and you can drive it down yourself, by hand.

Here is a strange way to measure a language. Cover up the rest of a sentence and ask someone to guess the next letter. If they're wrong, tell them, and let them guess again — and again — until they hit it. Write down how many guesses each letter took. Do this for a few hundred letters and you have measured something real: how much information a letter of English actually carries, in bits.
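The procedure above can be sketched in a few lines. This is a minimal illustration, not the page's own code: `play_game`, `frequency_guesser`, and the fixed guessing order `FIXED_ORDER` are hypothetical names, and the guesser is deliberately naive (it ignores context entirely), which is exactly what a human player does not do.

```python
# Hypothetical fixed guessing order: space first, then letters in a commonly
# quoted frequency order. An assumption for illustration, not a measured table.
FIXED_ORDER = " etaoinshrdlcumwfgypbvkjxqz"

def frequency_guesser(context):
    """A naive predictor: ignores the context and always guesses in FIXED_ORDER."""
    return list(FIXED_ORDER)

def play_game(text, guess_order):
    """Return the guess-rank of every symbol in `text`.

    `guess_order(context)` must return all 27 symbols in the order the
    predictor would try them, given the text revealed so far.
    """
    ranks = []
    for pos, actual in enumerate(text):
        order = guess_order(text[:pos])
        ranks.append(order.index(actual) + 1)  # 1 means right on the first try
    return ranks

ranks = play_game("the cat sat", frequency_guesser)
print(ranks)
```

A predictor that uses the context would replace `frequency_guesser`; the better it ranks the true letter, the more of the recorded numbers are 1.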

That experiment is Claude Shannon's, from a 1951 paper with the flat title “Prediction and Entropy of Printed English.” Its result is one of the quiet shocks of information theory. A 27-symbol alphabet (the 26 letters plus a space) could in principle carry log₂27 ≈ 4.76 bits per letter: that's the ceiling, the entropy of pure noise where every symbol is equally likely and nothing is predictable. Real English, Shannon found, carries somewhere between 0.6 and 1.3 bits per letter once the reader has about 100 letters of context. The other three-plus bits are redundancy: structure so thick that a competent reader can fill the gaps. English is roughly 75% redundant, and the demonstration is simply that you can play the game well.

This page makes you the predictor. You play; your guesses drive Shannon's own formula; the bound on the entropy of English drops in front of you, from the 4.76-bit ceiling toward the floor — pulled down by nothing but what you already know about words.

I · The Shannon game

A short passage is hidden below. Guess the next letter: type it, or tap the keys. Wrong guesses are told to you (Shannon's rule), and you keep going until you're right; the number of tries is that letter's guess-rank. The less often you need more than one guess, the lower the entropy. Your running guess-ranks feed Shannon's bounds (his equation 17) directly, giving a lower and an upper estimate of the bits-per-letter of English, recomputed on every keystroke.
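Written out, the two bounds take the following form. Here qᵢ is the fraction of letters guessed correctly on exactly the i-th try and F is the entropy per letter; the convention q₂₈ = 0 is spelled out here as an assumption so that the last term of the left-hand sum is well defined.

```latex
\sum_{i=1}^{27} i\,\bigl(q_i - q_{i+1}\bigr)\log_2 i
\;\le\; F \;\le\;
-\sum_{i=1}^{27} q_i \log_2 q_i ,
\qquad q_{28} = 0 .
```

The right-hand side is simply the entropy of the guess-rank distribution; the left-hand side is zero whenever every letter is guessed on the first try.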

Instrument I · Predict the next letter · PASSAGE 1

Click the text (or the keys) and start guessing. Spacebar guesses a space.

Your guess-rank distribution (how often the correct letter was your 1st, 2nd, 3rd… guess) — the input to Shannon's bounds:

Shannon's own subject, on 129 letters of a novel, guessed right on the first try 69% of the time. If you're near that, you're carrying the language about as well as a 1950 Bell Labs volunteer. The two bounds straddle the true entropy: the lower one is the least entropy an ideal predictor could face and still produce your guess-pattern; the upper one is the entropy of the guess-rank distribution itself, and is looser. With only a few hundred letters they wobble. That wobble is honest sampling error, named in Shannon's paper too.
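Turning a list of guess-ranks into the two bounds takes only a histogram and two sums. The sketch below is an illustration under stated assumptions, not the page's implementation: `shannon_bounds` is a hypothetical name, and the ten sample ranks are invented for the example.

```python
import math
from collections import Counter

def shannon_bounds(ranks, alphabet_size=27):
    """Lower and upper entropy bounds (bits/letter) from a list of guess-ranks."""
    n = len(ranks)
    counts = Counter(ranks)
    # q[i] = fraction of letters guessed on exactly the i-th try.
    # q[0] is padding; q[alphabet_size + 1] = 0 closes the lower-bound sum.
    q = [0.0] + [counts.get(i, 0) / n for i in range(1, alphabet_size + 1)] + [0.0]
    lower = sum(i * (q[i] - q[i + 1]) * math.log2(i)
                for i in range(1, alphabet_size + 1))
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    return lower, upper

# Invented sample: 7 letters right first time, 2 on the second try, 1 on the third.
lower, upper = shannon_bounds([1] * 7 + [2] * 2 + [3])
print(f"{lower:.3f} <= F <= {upper:.3f}")
```

For this invented sample the bounds come out near 0.68 and 1.16 bits per letter. With real human play the qᵢ need not decrease steadily, so individual terms of the lower sum can be negative; the sketch reports the raw value without correcting for that.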

II · What the machine knows, and what you know

Your mind reaches back across whole words and clauses. A machine that knows only letter statistics reaches back a fixed number of letters. Shannon called the entropy using N−1 letters of context F_N, and showed it falls as the context grows: F₁ (just letter frequencies) ≈ 4.0 bits, F₂ (one letter of memory) ≈ 3.3, F₃ ≈ 3.1, and onward down. Below, those rungs are recomputed live on a public-domain novel — and for each, an ideal predictor with that much memory is run over the text, its guess-ranks fed through the very same equation 17 to sandwich the true entropy between a lower and upper bound. As memory grows, the whole sandwich slides toward the floor. Then your game result is dropped onto the same axes — out past the machines, because the context you use has no fixed length.
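One standard way to compute the rungs is as a difference of block entropies: F_N is the entropy of N-letter blocks minus the entropy of (N−1)-letter blocks. The sketch below is a plug-in estimate under that assumption; `normalize`, `block_entropy`, and `f_n` are illustrative names, and boundary effects at the ends of the text are ignored.

```python
import math
from collections import Counter

def normalize(raw):
    """Map text onto the 27-symbol alphabet: lowercase a-z plus single spaces."""
    kept = "".join(c if "a" <= c <= "z" else " " for c in raw.lower())
    return " ".join(kept.split())

def block_entropy(text, n):
    """Plug-in entropy (bits) of the n-letter blocks in `text`; 0 for n = 0."""
    if n == 0:
        return 0.0
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def f_n(text, n):
    """F_N: entropy of the N-th letter given the previous N-1 letters."""
    return block_entropy(text, n) - block_entropy(text, n - 1)

sample = normalize("It was the best of times, it was the worst of times.")
for n in (1, 2, 3):
    print(f"F_{n} = {f_n(sample, n):.2f} bits/letter")
```

On a sample this short the higher rungs are badly under-estimated, since most two- and three-letter blocks occur only once; a full-length book is needed before the ladder means much.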

Instrument II · The descent: bits per letter vs. memory · FRANKENSTEIN · LIVE

Legend: ideal predictor, upper / lower bound (this corpus) · Shannon 1951, experimental bounds · you (from the game) · F₀ = log₂27 ceiling

The green points are recomputed in your browser from the embedded text; the cyan curve is Shannon's published experiment (his Fig. 4); the floor near 1 bit is where Cover & King's 1978 gambling estimate (≈1.3) and modern compressors land. Play the game above to plant your own point.

III · The other side of the coin: redundancy

Low entropy means high redundancy: letters you didn't strictly need. The cleanest way to feel it is to start deleting them and notice the meaning survives. Drag the slider to knock out a fraction of the letters in a passage at random; you'll find you can still read it long after a text with no redundancy would have become unrecoverable. That surviving readability is the redundancy, the same ~75% the game measures from the other direction.

Instrument III · Knock out the letters · 0% removed


Spaces and word shapes are kept; only letters are blanked. The text below is the same novel's opening. (Try 50–60% — still readable. That's the redundancy a perfect code would compress away.)
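The knock-out itself is a small operation. The following sketch assumes letters are chosen uniformly at random and replaced by an underscore; `knock_out`, its `seed` parameter, and the blank character are illustrative choices, not the page's own.

```python
import random

def knock_out(text, fraction, seed=0, blank="_"):
    """Blank a random `fraction` of the letters, keeping spaces and punctuation."""
    rng = random.Random(seed)  # fixed seed so the same letters vanish each run
    letter_positions = [i for i, c in enumerate(text) if c.isalpha()]
    k = round(fraction * len(letter_positions))
    erased = set(rng.sample(letter_positions, k))
    return "".join(blank if i in erased else c for i, c in enumerate(text))

sentence = "the quick brown fox jumps over the lazy dog"
for fraction in (0.0, 0.25, 0.5):
    print(f"{fraction:.0%}: {knock_out(sentence, fraction)}")
```

Because the word lengths and spaces survive, the reader keeps the word shapes, which is part of why the text stays legible as the fraction climbs.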

The check

Everything quantitative here is recomputed, live in your browser and offline by a committed verifier (research/entropy-of-english/verify.mjs) that this panel mirrors. The verified anchors are the ceiling F₀ = log₂27, the falling n-gram ladder, and Shannon's bound machinery checked on real text with an ideal predictor (the sandwich lower ≤ F ≤ upper at every memory order). Shannon's and Cover & King's headline numbers are cited, not re-derived: said plainly, not smuggled.
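The sandwich check can be sketched as follows. This is not the contents of verify.mjs, which is not reproduced here; it is an independent Python illustration in which `sandwich` is a hypothetical name and the "ideal predictor" is one that knows the text's own n-gram counts and guesses in order of decreasing conditional frequency.

```python
import math
from collections import Counter, defaultdict

def sandwich(text, n, alphabet_size=27):
    """Return (lower, F_N, upper) in bits/letter for an ideal predictor that
    sees the previous n-1 letters, with all statistics taken from `text`."""
    # Count how often each symbol follows each (n-1)-letter context.
    follow = defaultdict(Counter)
    for i in range(n - 1, len(text)):
        follow[text[i - n + 1:i]][text[i]] += 1
    total = len(text) - (n - 1)

    f_n = 0.0
    rank_counts = Counter()
    for counts in follow.values():
        ctx_total = sum(counts.values())
        # The ideal predictor guesses symbols in order of decreasing probability.
        for rank, (_, c) in enumerate(counts.most_common(), start=1):
            rank_counts[rank] += c
            f_n -= c / total * math.log2(c / ctx_total)

    q = [0.0] + [rank_counts.get(i, 0) / total
                 for i in range(1, alphabet_size + 1)] + [0.0]
    lower = sum(i * (q[i] - q[i + 1]) * math.log2(i)
                for i in range(1, alphabet_size + 1))
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    return lower, f_n, upper

text = "the theory of the thing is that the thin thread then thickens " * 3
for n in (1, 2, 3):
    lower, f, upper = sandwich(text, n)
    print(f"N={n}: {lower:.3f} <= {f:.3f} <= {upper:.3f}")
```

Because the predictor and F_N are built from the same counts, the inequality should hold at every order up to floating-point rounding; a violation would point to a bug rather than to sampling noise.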

research/entropy-of-english/verify.mjs — recomputed live below from the same data

Where this is soft (named, not hidden)

“The entropy of English” is not one number. It is bounded, not pinned. Shannon gave a band (0.6–1.3 bits for 100 letters of context); Cover & King's convergent gambling method gave ≈1.3; large language models, which are predictors of enormous order, push the estimate toward ~1 bit — but every one of these is a bound or an estimate on a particular corpus, not a constant of nature. The number drifts with genre, era, and what you count as a symbol.

The live n-gram rungs run low. Measured on a single ~400,000-letter book, the plug-in F₃ here lands around 2.6 rather than Shannon's 3.1. That is a finite-sample under-estimate: letter combinations seen only a few times look more certain than they really are. That bias is why this page treats the high rungs as illustrative and rests its verified claims on F₀ and the bound theorem, not on the exact value of F₃.
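The direction of that bias is easy to see on a synthetic source whose true entropy is known. The sketch below draws from a uniform 27-symbol source (an assumption chosen for the demonstration, not a model of English) and compares the plug-in estimate with the true log₂27 as the sample grows; `plug_in_entropy` is an illustrative name.

```python
import math
import random
from collections import Counter

def plug_in_entropy(samples):
    """Entropy (bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

rng = random.Random(0)
true_entropy = math.log2(27)  # uniform source over 27 symbols
for n in (50, 500, 5000, 50000):
    draws = [rng.randrange(27) for _ in range(n)]
    estimate = plug_in_entropy(draws)
    print(f"n={n:>6}: plug-in {estimate:.3f} vs true {true_entropy:.3f}")
```

The estimate approaches the true value from below as the sample grows. For an n-gram rung the relevant sample size is the number of times each context was seen, which is why the higher rungs suffer first.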

The lower bound assumes an ideal predictor. Shannon proved Σᵢ i(qᵢ − qᵢ₊₁) log₂ i ≤ F for the ideal predictor, where qᵢ is the fraction of letters guessed on the i-th try; a human (or you, above) is not ideal, so your lower bound is a touch optimistic. Shannon argued the two errors (non-ideal humans, and non-rectangular conditional distributions) roughly cancel; we report your raw bounds and leave the caveat standing rather than correcting it away.


Sources, verified against the primary texts: Shannon, C. E. (1951), “Prediction and Entropy of Printed English,” Bell System Technical Journal 30(1):50–64: eq. (17), the §2 n-gram table (F₀ 4.76, F₁ 4.03, F₂ 3.32, F₃ 3.1), the §6 per-column bounds (100 letters: 0.6–1.3), and the abstract's “redundancy of roughly 75%.” Cover, T. M. & King, R. C. (1978), “A Convergent Gambling Estimate of the Entropy of English,” IEEE Trans. Inf. Theory 24(4):413–421 (≈1.3 bits/symbol). Corpus: Mary Shelley, Frankenstein (Project Gutenberg #84, public domain).

The n-gram ladder and the ideal-predictor sandwich recompute live from the embedded text; the offline verifier recomputes them on the full novel and writes the figures this page reads. The Shannon game runs entirely in your browser. Nothing on this page is fetched, tracked, or stored.