Breaking Vigenère: The Index of Coincidence

For three centuries the Vigenère cipher was le chiffre indéchiffrable, the unbreakable cipher. It was broken not by guessing letters but by measuring one number. A simple substitution can swap every letter for another symbol, yet it cannot change how lumpy a language is, and that lumpiness has a measure: the index of coincidence. Here is the whole break, worked end to end on a concrete message that you can run yourself.
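The claim that a simple substitution cannot change the lumpiness can be checked directly. This minimal sketch computes IC = Σ nᵢ(nᵢ − 1) / [N(N − 1)] for a text and for the same text under a fixed monoalphabetic substitution. The names `indexOfCoincidence` and `substitute` and the permutation are hypothetical, chosen for illustration rather than taken from engine.js. Because the substitution only relabels letters, the set of counts nᵢ is unchanged and so is the IC.

```javascript
// Index of coincidence: IC = Σ nᵢ(nᵢ − 1) / [N(N − 1)] over the 26 letters.
// Case is folded and anything outside A–Z is ignored (an assumption here).
function indexOfCoincidence(text) {
  const counts = new Array(26).fill(0);
  let n = 0;
  for (const ch of text.toUpperCase()) {
    const i = ch.charCodeAt(0) - 65;
    if (i >= 0 && i < 26) {
      counts[i]++;
      n++;
    }
  }
  if (n < 2) return 0; // IC is undefined below two letters; report 0
  let pairs = 0;
  for (const c of counts) pairs += c * (c - 1);
  return pairs / (n * (n - 1));
}

// A monoalphabetic substitution: every plaintext letter always maps to the
// same cipher letter. This permutation is an arbitrary example, not a real key.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const PERMUTATION = "QWERTYUIOPASDFGHJKLZXCVBNM";

function substitute(text) {
  let out = "";
  for (const ch of text.toUpperCase()) {
    const i = ALPHABET.indexOf(ch);
    out += i >= 0 ? PERMUTATION[i] : ch; // non-letters pass through unchanged
  }
  return out;
}

// The two values printed are identical: substitution only relabels the counts.
const sample = "THE INDEX OF COINCIDENCE SURVIVES ANY SIMPLE SUBSTITUTION";
console.log(indexOfCoincidence(sample));
console.log(indexOfCoincidence(substitute(sample)));
```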

The seam — where the honesty lives

Three places this break is allowed to strain, named rather than smoothed. One: Friedman's estimate was genuinely imprecise, 18.71 against a true key length of 8. That isn't a defect quietly cropped from the figure; it is the documented behaviour of the κ-test on short messages, and it is why the coset scan exists. The rod points; the scan measures. Two: English has no single index of coincidence. This page's table gives 0.0655, Friedman's canonical figure is 0.0667, and a different corpus gives something else again. The break doesn't depend on the exact value, because the χ² recovery only needs the shape of English's letter distribution. Three: the plaintext was composed for this demonstration (English prose written for the page, not a quotation), so no claim rests on transcribing an outside text correctly. Every figure is a property of the algorithm and the statistics, which the verifier controls end to end.
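The coset scan can be sketched as follows. The sketch assumes the ciphertext is reduced to A–Z, and `cosetScan`, `ic`, and `lettersOnly` are hypothetical names, not necessarily how engine.js is organized. For each candidate length L it splits the letters into L cosets and averages their ICs. At the true length each coset was enciphered with a single shift, so its IC stays near English's. A wrong length mixes shifts, and the average sinks toward 1/26.

```javascript
// Reduce a ciphertext to its A–Z letters, upper-cased.
function lettersOnly(text) {
  return text.toUpperCase().replace(/[^A-Z]/g, "");
}

// Index of coincidence of a string that contains only A–Z.
function ic(letters) {
  const counts = new Array(26).fill(0);
  for (const ch of letters) counts[ch.charCodeAt(0) - 65]++;
  const n = letters.length;
  if (n < 2) return 0;
  let pairs = 0;
  for (const c of counts) pairs += c * (c - 1);
  return pairs / (n * (n - 1));
}

// For each candidate key length L, split the letters into L cosets
// (positions start, start + L, start + 2L, ...) and average their ICs.
// At the true length every coset was shifted by one key letter, so it keeps
// roughly English's IC; at a wrong length the cosets mix shifts and flatten.
function cosetScan(ciphertext, maxLength) {
  const letters = lettersOnly(ciphertext);
  const rows = [];
  for (let L = 1; L <= maxLength; L++) {
    let total = 0;
    for (let start = 0; start < L; start++) {
      let coset = "";
      for (let i = start; i < letters.length; i += L) coset += letters[i];
      total += ic(coset);
    }
    rows.push({ length: L, meanIC: total / L });
  }
  return rows;
}

// Usage: console.table(cosetScan(ciphertext, 20));
```

Multiples of the true length score just as well, because their cosets are also single-shift. The natural pick is therefore the smallest length that stands clear of the rest.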

A note the work doesn't get to skip: the cipher universally called "Vigenère" was not Blaise de Vigenère's. The repeating-keyword tableau was published by Giovan Battista Bellaso in 1553; Vigenère described a stronger autokey scheme in 1586. The nineteenth century attached his name to the weaker, more famous cipher, and the name stuck. The mathematics doesn't care whose name is on it; the honesty rule says you should know it's the wrong one. Charles Babbage broke the cipher around 1854 but never published; Kasiski published the first general method in 1863; and in 1920 William F. Friedman reduced the whole problem to a single statistic, the index of coincidence.
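Kasiski's method looks for repeated stretches of ciphertext. When the same plaintext fragment lines up with the same part of the key, it encrypts identically, so the gap between the repeats is a multiple of the key length. The minimal gap census below assumes repeated trigrams and factors from 2 up to a chosen maximum. `kasiskiCensus` is a hypothetical name, and the page's engine.js may count differently.

```javascript
// Kasiski examination, sketched: record the gap between every pair of
// occurrences of each repeated trigram, then count how many gaps each
// candidate length divides. Trigram size and factor range are choices of
// this sketch.
function kasiskiCensus(ciphertext, maxFactor = 20) {
  const letters = ciphertext.toUpperCase().replace(/[^A-Z]/g, "");
  const positions = new Map(); // trigram -> list of start indices
  for (let i = 0; i + 3 <= letters.length; i++) {
    const tri = letters.slice(i, i + 3);
    if (!positions.has(tri)) positions.set(tri, []);
    positions.get(tri).push(i);
  }
  const gaps = [];
  for (const starts of positions.values()) {
    for (let a = 0; a < starts.length; a++) {
      for (let b = a + 1; b < starts.length; b++) gaps.push(starts[b] - starts[a]);
    }
  }
  // For each candidate factor, how many recorded gaps it divides exactly.
  const factorCounts = {};
  for (let f = 2; f <= maxFactor; f++) {
    factorCounts[f] = gaps.filter((g) => g % f === 0).length;
  }
  return { gaps, factorCounts };
}

// Usage: kasiskiCensus(ciphertext).factorCounts
```

Accidental repeats also produce gaps, so the census is read for the factor that divides most gaps rather than all of them.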

The deepest seam is the one the whole layer is about. A cipher's job is to destroy information, leaving an adversary with nothing but noise. But natural language is redundant: it carries far less information than its raw length suggests. That leftover redundancy is structure, and structure is exactly what a statistic can grip. The index of coincidence measures how much structure survived the cipher, and as long as any survives, the cipher can be read. The only truly unbreakable cipher is the one-time pad, whose key is truly random, as long as the message itself, and never reused. Any shorter key leaves a fingerprint. This one left two.
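Encrypting one text with keys of increasing length shows why a repeating key leaks. The sketch below assumes A–Z input with non-letters dropped, and `vigenereEncrypt` and `ic` are hypothetical names. Each key letter shifts every L-th plaintext letter, so the ciphertext blends L shifted copies of English's distribution. The overall IC tends to fall toward 1/26 as L grows, while each coset still carries English's shape.

```javascript
const CODE_A = 65;

// Repeating-key Vigenère: ciphertext letter = (plaintext + key letter) mod 26.
// Input is reduced to A–Z, an assumption of this sketch.
function vigenereEncrypt(plaintext, key) {
  const p = plaintext.toUpperCase().replace(/[^A-Z]/g, "");
  const k = key.toUpperCase().replace(/[^A-Z]/g, "");
  if (k.length === 0) throw new Error("key must contain at least one letter");
  let out = "";
  for (let i = 0; i < p.length; i++) {
    const shift = k.charCodeAt(i % k.length) - CODE_A;
    out += String.fromCharCode(((p.charCodeAt(i) - CODE_A + shift) % 26) + CODE_A);
  }
  return out;
}

// Index of coincidence of a string that contains only A–Z.
function ic(letters) {
  const counts = new Array(26).fill(0);
  for (const ch of letters) counts[ch.charCodeAt(0) - CODE_A]++;
  const n = letters.length;
  if (n < 2) return 0;
  let pairs = 0;
  for (const c of counts) pairs += c * (c - 1);
  return pairs / (n * (n - 1));
}

// Usage: one sentence under keys of growing length. A one-letter key is a
// Caesar shift and leaves the IC untouched; longer keys blend more shifted
// copies of the letter distribution together.
const sentence = "A CIPHER THAT REPEATS ITS KEY LETS THE LANGUAGE SHOW THROUGH";
for (const key of ["F", "FRI", "FRIEDMAN"]) {
  console.log(key, ic(vigenereEncrypt(sentence, key)).toFixed(4));
}
```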

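IC = Σ nᵢ(nᵢ − 1) / [ N(N − 1) ] (the fingerprint)
L̂ = N(κ_p − κ_r) / [ κ_o(N − 1) + κ_p − N·κ_r ] (Friedman's rod)
Here nᵢ counts the i-th letter of the alphabet and the sum runs over all 26 letters, N is the number of letters in the ciphertext, κ_p is the index of coincidence of English, κ_r = 1/26 is that of uniformly random letters, κ_o is the index observed in the ciphertext, and L̂ is the estimated key length.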
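Friedman's rod translates directly into a function. In this sketch `friedmanEstimate` is a hypothetical name. The defaults κ_p = 0.0655 and κ_r = 1/26 are the values discussed on this page, and Friedman's own 0.0667 can be passed instead.

```javascript
// Friedman's key-length estimate:
//   L̂ = N(κ_p − κ_r) / [κ_o(N − 1) + κ_p − N·κ_r]
// n      : number of ciphertext letters (N)
// kappaO : index of coincidence observed in the ciphertext (κ_o)
// kappaP : index of coincidence of English (κ_p); 0.0655 follows this
//          page's table, Friedman's canonical figure is 0.0667
// kappaR : index of coincidence of uniformly random letters (κ_r = 1/26)
function friedmanEstimate(n, kappaO, kappaP = 0.0655, kappaR = 1 / 26) {
  const numerator = n * (kappaP - kappaR);
  const denominator = kappaO * (n - 1) + kappaP - n * kappaR;
  return numerator / denominator;
}

// Usage: friedmanEstimate(letters.length, observedIC)
```

Two limiting cases make a quick sanity check. When κ_o equals κ_p the estimate is 1, because the text looks like plain English. When κ_o equals κ_r the estimate is N. In between, for large N the estimate behaves roughly like (κ_p − κ_r)/(κ_o − κ_r). On a short message, a small error in the observed κ_o can therefore move L̂ a long way, and sampling noise that pushes κ_o just below 1/26 can make the denominator vanish or turn negative.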

Show the check. Every number on this page is recomputed in your browser by engine.js, the same algorithm that the offline verifier research/index-of-coincidence/verify.mjs runs. The verifier builds the random and English baselines, confirms to twelve decimal places that the IC is invariant under simple substitution, encrypts the composed plaintext with the key FRIEDMAN, and breaks it from scratch: the IC collapse, Friedman's estimate and its honest overshoot, the Kasiski gap census, the full coset-IC table, the χ² key recovery, and a letter-for-letter decryption. It writes every figure to artifact.json, and 23 of 23 checks pass. A second driver, verify-page.mjs, opens this built page in a real browser and asserts that every figure it prints equals the artifact's, so the page can't drift from the proof. No number appears on screen that the verifier doesn't also check.
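The last two stages, χ² key recovery and decryption, can be sketched as follows, assuming the key length is already settled. For each coset, each of the 26 shifts is undone in turn. The resulting letter counts are compared with English through χ² = Σ (observedᵢ − expectedᵢ)² / expectedᵢ, and the shift with the smallest χ² becomes that key letter. The frequency table is a widely reproduced approximation assumed for this sketch; its squared frequencies sum to about 0.0655. `recoverKey`, `chiSquared`, and `vigenereDecrypt` are hypothetical names.

```javascript
// Approximate English letter frequencies in percent, A..Z. A widely
// reproduced table, assumed here; swap in whichever table you trust.
const ENGLISH = [
  8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153,
  0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056,
  2.758, 0.978, 2.36, 0.15, 1.974, 0.074,
];

// χ² of a coset (A–Z only) against English after undoing a candidate shift:
// cipher letter c is read as plaintext (c − shift) mod 26.
function chiSquared(coset, shift) {
  const counts = new Array(26).fill(0);
  for (const ch of coset) counts[(ch.charCodeAt(0) - 65 - shift + 26) % 26]++;
  let chi = 0;
  for (let i = 0; i < 26; i++) {
    const expected = (coset.length * ENGLISH[i]) / 100;
    chi += (counts[i] - expected) ** 2 / expected;
  }
  return chi;
}

// One key letter per coset: the shift whose decryption looks most English.
function recoverKey(ciphertext, keyLength) {
  const letters = ciphertext.toUpperCase().replace(/[^A-Z]/g, "");
  let key = "";
  for (let start = 0; start < keyLength; start++) {
    let coset = "";
    for (let i = start; i < letters.length; i += keyLength) coset += letters[i];
    let best = 0;
    let bestChi = Infinity;
    for (let shift = 0; shift < 26; shift++) {
      const chi = chiSquared(coset, shift);
      if (chi < bestChi) {
        bestChi = chi;
        best = shift;
      }
    }
    key += String.fromCharCode(65 + best);
  }
  return key;
}

// Decrypt by subtracting the key's shifts: plaintext = (cipher − key) mod 26.
function vigenereDecrypt(ciphertext, key) {
  const letters = ciphertext.toUpperCase().replace(/[^A-Z]/g, "");
  const k = key.toUpperCase().replace(/[^A-Z]/g, "");
  if (k.length === 0) throw new Error("key must contain at least one letter");
  let out = "";
  for (let i = 0; i < letters.length; i++) {
    const shift = k.charCodeAt(i % k.length) - 65;
    out += String.fromCharCode(((letters.charCodeAt(i) - 65 - shift + 26) % 26) + 65);
  }
  return out;
}

// Usage, once a key length has been chosen:
// const key = recoverKey(ciphertext, keyLength);
// console.log(key, vigenereDecrypt(ciphertext, key));
```

The χ² test only compares the shape of each coset's distribution with English's, which is why the exact value of English's index of coincidence never enters this stage.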

Sources.
Bellaso, La cifra del Sig. Giovan Battista Bellaso (1553).
Kasiski, Die Geheimschriften und die Dechiffrir-Kunst (1863).
Friedman, The Index of Coincidence and Its Applications in Cryptography, Riverbank Publication No. 22 (study 1920).
Kahn, The Codebreakers (1967), for Babbage and the assessment of Friedman.
Lewand, Cryptological Mathematics (2000), Table 1.1, for English letter frequencies.
Shannon, Communication Theory of Secrecy Systems (1949), for one-time-pad secrecy.

What the Cipher Couldn't Hide

1 · The fingerprint a cipher can't wipe

2 · The cipher flattens the fingerprint — by a measurable amount

3 · From the length to the key, one column at a time
