Built to Be Misread

Copy three billion letters and you will drop some. A point mutation, a ribosome's slip — one base in a codon turns into another. The remarkable thing is how little it usually costs: the genetic code is arranged so that a single wrong letter lands, far more often than chance allows, on an amino acid with nearly the same chemistry. Below, the code recomputed from primary data — and the famous claim that it is “one in a million” taken apart honestly, because it is true only under the right ruler.

There are 64 codons — three letters drawn from {A, C, G, U} — and only 20 amino acids plus a stop signal. That surplus is the first defence: most amino acids own several codons, so a great many single-letter slips (especially in the third position) change nothing at all. They are synonymous. But the deeper trick is in the placement. Where a slip does change the amino acid, the code tends to swap in one that is chemically close — a hydrophobic residue for another hydrophobic residue — so the protein barely flinches. The map of which codon means which amino acid is, in this precise sense, error-tolerant.
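The redundancy can be counted directly. The sketch below rebuilds the standard code from its usual 64-letter string (bases in the order U, C, A, G; `*` marks a stop) and sorts every single-letter substitution out of a coding codon into synonymous, amino-acid-changing, or stop-producing. The names `CODE`, `neighbours` and `tally` are illustrative and are not taken from the page's model.js.

```javascript
// Standard genetic code (NCBI translation table 1), bases in the order U, C, A, G.
// '*' marks a stop codon.
const BASES = "UCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Build { codon: one-letter amino acid } for all 64 codons.
const CODE = {};
let k = 0;
for (const b1 of BASES)
  for (const b2 of BASES)
    for (const b3 of BASES) CODE[b1 + b2 + b3] = AMINO[k++];

// The nine single-letter neighbours of a codon: 3 positions x 3 other bases.
function neighbours(codon) {
  const out = [];
  for (let pos = 0; pos < 3; pos++)
    for (const b of BASES)
      if (b !== codon[pos])
        out.push({ pos, codon: codon.slice(0, pos) + b + codon.slice(pos + 1) });
  return out;
}

// Classify every substitution that starts from a coding (sense) codon.
const tally = { synonymous: 0, missense: 0, toStop: 0 };
const synonymousByPosition = [0, 0, 0];
for (const [codon, aa] of Object.entries(CODE)) {
  if (aa === "*") continue;
  for (const n of neighbours(codon)) {
    const to = CODE[n.codon];
    if (to === "*") tally.toStop++;
    else if (to === aa) { tally.synonymous++; synonymousByPosition[n.pos]++; }
    else tally.missense++;
  }
}

console.log(tally);                // { synonymous: 134, missense: 392, toStop: 23 }
console.log(synonymousByPosition); // [ 8, 0, 126 ]
```

Of the 134 silent substitutions, 126 sit in the third position, which is the "especially in the third position" of the paragraph above in numbers.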

Is that arrangement special, or would any old code do as well? The way to find out is the way Haig & Hurst found it in 1991: measure how badly single-letter errors shift an amino-acid property in the real code, then compare against the same code with the amino acids randomly reshuffled onto their codons. Everything below runs that experiment live in your browser — and the offline verifier runs the identical arithmetic, so the page cannot quietly cheat.

Instrument I · the map: One letter off, almost the same colour

The standard code, every codon coloured by one chemical property of the amino acid it spells. Hover or tap a codon to light its nine single-letter neighbours: green = the slip is synonymous (same amino acid), pink = it changes the amino acid. Notice how rarely the colour jumps far.

Hover a codon to trace its single-letter mutations. Each codon has nine of them — three positions × three other bases.

Instrument II · the contest: The natural code against a million impostors

For a given property, give every code an error score: average the squared change in that property over all single-letter mutations between coding codons. Synonymous slips score zero, and mutations to a stop are set aside: of the 549 substitutions that start from one of the 61 coding codons, 23 land on a stop and 526 remain. Lower is better. Now keep the code's exact redundancy structure but shuffle the 20 amino acids onto its 20 codon-blocks at random, many thousands of times, and see where the real code falls.
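A minimal sketch of that contest, not the page's model.js. It uses the Kyte & Doolittle hydropathy values, one of the three scales the page cites, leaves them un-normalised, and draws only 2,000 random codes with `Math.random` rather than a million with a fixed seed. The names `errorScore` and `shuffledCode` are illustrative.

```javascript
const BASES = "UCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const CODONS = [];
for (const a of BASES) for (const b of BASES) for (const c of BASES) CODONS.push(a + b + c);
const NATURAL = Object.fromEntries(CODONS.map((codon, i) => [codon, AMINO[i]]));

// Kyte-Doolittle hydropathy (1982).
const HYDROPATHY = { A: 1.8, R: -4.5, N: -3.5, D: -3.5, C: 2.5, Q: -3.5, E: -3.5, G: -0.4,
  H: -3.2, I: 4.5, L: 3.8, K: -3.9, M: 1.9, F: 2.8, P: -1.6, S: -0.8, T: -0.7, W: -0.9,
  Y: -1.3, V: 4.2 };

// All directed single-base substitutions between two sense codons.
// Stops never move in the null model, so this list is the same for every code.
const MUTATIONS = [];
for (const from of CODONS) {
  if (NATURAL[from] === "*") continue;
  for (let pos = 0; pos < 3; pos++)
    for (const b of BASES) {
      const to = from.slice(0, pos) + b + from.slice(pos + 1);
      if (b !== from[pos] && NATURAL[to] !== "*") MUTATIONS.push([from, to]);
    }
}

// Error score: mean squared property change over those substitutions.
function errorScore(code, prop) {
  let sum = 0;
  for (const [from, to] of MUTATIONS) sum += (prop[code[from]] - prop[code[to]]) ** 2;
  return sum / MUTATIONS.length;
}

// Block-preserving shuffle: permute the 20 amino acids among the 20 codon blocks,
// leaving the stops in place. `rand` is any function returning a number in [0, 1).
function shuffledCode(rand) {
  const aas = Object.keys(HYDROPATHY);
  const perm = aas.slice();
  for (let i = perm.length - 1; i > 0; i--) { // Fisher-Yates
    const j = Math.floor(rand() * (i + 1));
    [perm[i], perm[j]] = [perm[j], perm[i]];
  }
  const relabel = Object.fromEntries(aas.map((aa, i) => [aa, perm[i]]));
  return Object.fromEntries(
    CODONS.map(c => [c, NATURAL[c] === "*" ? "*" : relabel[NATURAL[c]]]));
}

const natural = errorScore(NATURAL, HYDROPATHY);
const random = Array.from({ length: 2000 },
  () => errorScore(shuffledCode(Math.random), HYDROPATHY));
const better = random.filter(s => s < natural).length;
console.log(MUTATIONS.length, natural.toFixed(3),
  `${better} of ${random.length} random codes score lower`);
```

Relabelling whole blocks, rather than scattering amino acids over individual codons, is what keeps the redundancy structure fixed, so any advantage the natural code shows comes from where the amino acids sit and not from how many codons each one owns.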

The live histogram resamples a fresh 30,000 random codes each time you change a setting (press "resample" for another draw). The "Reproduce the verifier" button runs the authoritative 1,000,000-code computation with the same fixed seed the offline check uses. Under polar requirement it therefore finds, in your browser, exactly 137 random codes that score better than nature, the same number verify.mjs prints.
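Why a fixed seed makes the browser and the verifier agree can be shown in a few lines. The text does not say which generator the page uses, so the sketch below substitutes mulberry32, a small, widely used seedable generator, and the seed value is hypothetical.

```javascript
// mulberry32: a small seedable 32-bit generator. A stand-in only; the page's
// own generator is not specified in the text.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the supplied generator.
function shuffle(items, rand) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY".split("");
const SEED = 12345; // hypothetical seed, for illustration only

const runA = shuffle(AMINO_ACIDS, mulberry32(SEED)).join("");
const runB = shuffle(AMINO_ACIDS, mulberry32(SEED)).join("");
console.log(runA === runB); // true: same seed, same sequence of draws
```

`Math.random` cannot be seeded, which is why a reproducible run needs its own generator.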

Instrument III · the dare: Try to build a better code

If the code is merely good, not special, you should be able to improve it. Swap the codon assignments of any two amino acids and watch the error score move. Keep going. (Under polar requirement, with all mutations equal, fewer than two in every ten thousand random codes beat nature — so you are unlikely to.)
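The dare can also be run exhaustively for single swaps. This is a sketch under stated assumptions: it scores against Kyte & Doolittle hydropathy rather than the polar requirement scale the instrument uses, so its count will not match the page's, and `swapAminoAcids` is an illustrative name.

```javascript
const BASES = "UCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const CODONS = [];
for (const a of BASES) for (const b of BASES) for (const c of BASES) CODONS.push(a + b + c);
const NATURAL = Object.fromEntries(CODONS.map((codon, i) => [codon, AMINO[i]]));

// Kyte-Doolittle hydropathy (1982); an assumption, the page's dare uses polar requirement.
const HYDROPATHY = { A: 1.8, R: -4.5, N: -3.5, D: -3.5, C: 2.5, Q: -3.5, E: -3.5, G: -0.4,
  H: -3.2, I: 4.5, L: 3.8, K: -3.9, M: 1.9, F: 2.8, P: -1.6, S: -0.8, T: -0.7, W: -0.9,
  Y: -1.3, V: 4.2 };

// Mean squared hydropathy change over single-base substitutions between sense codons.
function errorScore(code) {
  let sum = 0, n = 0;
  for (const from of CODONS) {
    if (code[from] === "*") continue;
    for (let pos = 0; pos < 3; pos++)
      for (const b of BASES) {
        if (b === from[pos]) continue;
        const to = from.slice(0, pos) + b + from.slice(pos + 1);
        if (code[to] === "*") continue;
        sum += (HYDROPATHY[code[from]] - HYDROPATHY[code[to]]) ** 2;
        n++;
      }
  }
  return sum / n;
}

// Exchange the codon blocks of two amino acids; everything else is untouched.
function swapAminoAcids(code, x, y) {
  const out = {};
  for (const [codon, aa] of Object.entries(code))
    out[codon] = aa === x ? y : aa === y ? x : aa;
  return out;
}

// Try all 190 pairs of amino acids and count the swaps that lower the score.
const natureScore = errorScore(NATURAL);
const residues = Object.keys(HYDROPATHY);
let swapsTried = 0, swapsThatBeatNature = 0;
for (let i = 0; i < residues.length; i++)
  for (let j = i + 1; j < residues.length; j++) {
    swapsTried++;
    const candidate = swapAminoAcids(NATURAL, residues[i], residues[j]);
    if (errorScore(candidate) < natureScore) swapsThatBeatNature++;
  }
console.log(`${swapsThatBeatNature} of ${swapsTried} single swaps lower the hydropathy score`);
```

A single swap is a much smaller step than a full reshuffle, so some swaps may well improve on nature under a given ruler; the rarity quoted above is for random codes drawn from the whole space of reshufflings.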

The apparatus: What is true, and under which ruler

The headline depends on the chemical property you measure errors against — and that dependence is the honest heart of the story. Polar requirement (Woese's scale of how an amino acid behaves in water) shows the strongest signal; residue volume shows a real but much weaker one. All figures below are from one million block-preserving random codes at a fixed seed, reproduced by the verifier:

property	mutation model	share of random codes the natural code beats	random codes that beat it, roughly	natural score below the random mean by
polar requirement	all equal	99.986 %	1 in 7,300	2.79 σ
polar requirement	transitions ×2	99.9976 %	1 in 42,000	2.93 σ
hydropathy	all equal	99.22 %	1 in 128	2.30 σ
residue volume	all equal	93.0 %	1 in 14	1.39 σ
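The columns are simple functions of one count: how many of the random codes scored lower than nature. A sketch of that arithmetic, using the 137-in-a-million figure quoted for polar requirement; the function names are illustrative, and the use of the population (rather than sample) standard deviation for the σ column is an assumption.

```javascript
// Turn "k of N random codes beat nature" into the table's two headline columns.
function headline(betterCount, total) {
  return {
    beatsPercent: 100 * (1 - betterCount / total),
    oneIn: Math.round(total / betterCount),
  };
}

const row = headline(137, 1e6);
console.log(row.beatsPercent.toFixed(3) + " %", "1 in " + row.oneIn);
// 99.986 % 1 in 7299

// How far the natural score sits below the random mean, in standard deviations.
// Assumption: population standard deviation of the random scores.
function sigmasBelowMean(naturalScore, randomScores) {
  const n = randomScores.length;
  const mean = randomScores.reduce((a, b) => a + b, 0) / n;
  const variance = randomScores.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  return (mean - naturalScore) / Math.sqrt(variance);
}

console.log(sigmasBelowMean(0, [1, 3])); // 2
```

The table rounds 1 in 7,299 to 1 in 7,300.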

So the real claim is layered. The code is error-minimising — under every chemistry tested it scores below the random average, and under polar requirement it is genuinely extreme. The famous figure — Freeland & Hurst's 1998 “the genetic code is one in a million” — is real, but it needed their fuller model: not just transitions weighted over transversions, but per-position, per-direction mistranslation frequencies measured from real ribosomes. This page does not independently reproduce the million; it reproduces the Haig–Hurst measure (≈ 1 in 7,300 unweighted), and shows that a crude 2× transition weight already pushes it toward 1 in 42,000 — the same direction. The million is theirs, cited, not claimed here.
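One plausible reading of the crude 2× transition weight is a weighted mean in which each transition (A↔G or C↔U) counts twice and each transversion once. The page does not spell out its exact weighting, so treat the sketch below as an assumption; it again uses Kyte & Doolittle hydropathy, and `weightedScore` is an illustrative name.

```javascript
const BASES = "UCAG";
const AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
const CODONS = [];
for (const a of BASES) for (const b of BASES) for (const c of BASES) CODONS.push(a + b + c);
const NATURAL = Object.fromEntries(CODONS.map((codon, i) => [codon, AMINO[i]]));

// Kyte-Doolittle hydropathy (1982).
const HYDROPATHY = { A: 1.8, R: -4.5, N: -3.5, D: -3.5, C: 2.5, Q: -3.5, E: -3.5, G: -0.4,
  H: -3.2, I: 4.5, L: 3.8, K: -3.9, M: 1.9, F: 2.8, P: -1.6, S: -0.8, T: -0.7, W: -0.9,
  Y: -1.3, V: 4.2 };

// A transition keeps the base class: purine for purine (A, G) or
// pyrimidine for pyrimidine (C, U). Only called with two different bases.
const isTransition = (x, y) => "AG".includes(x) === "AG".includes(y);

// Weighted mean squared change; transitionWeight = 1 recovers the unweighted score.
function weightedScore(code, prop, transitionWeight) {
  let sum = 0, weightTotal = 0;
  const counts = { transitions: 0, transversions: 0 };
  for (const from of CODONS) {
    if (code[from] === "*") continue;
    for (let pos = 0; pos < 3; pos++)
      for (const b of BASES) {
        if (b === from[pos]) continue;
        const to = from.slice(0, pos) + b + from.slice(pos + 1);
        if (code[to] === "*") continue;
        const ts = isTransition(from[pos], b);
        const w = ts ? transitionWeight : 1;
        if (ts) counts.transitions++; else counts.transversions++;
        sum += w * (prop[code[from]] - prop[code[to]]) ** 2;
        weightTotal += w;
      }
  }
  return { score: sum / weightTotal, counts };
}

const unweighted = weightedScore(NATURAL, HYDROPATHY, 1);
const weighted = weightedScore(NATURAL, HYDROPATHY, 2);
console.log(unweighted.counts); // { transitions: 178, transversions: 348 }
console.log(unweighted.score.toFixed(3), weighted.score.toFixed(3));
```

Of the 526 substitutions, 178 are transitions and 348 are transversions, so doubling the transition weight shifts roughly a third of the graph's edges.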

Why the code is this way is not settled, and the page does not pretend to settle it. Candidates: natural selection for error tolerance (the adaptive reading); stereochemical affinity between amino acids and their codons; coevolution, where the code grew alongside amino-acid biosynthesis so chemically-related residues inherited neighbouring codons; and Crick's 1968 “frozen accident” — that some of the structure is frozen history, not optimisation. The measurement here is agnostic about cause. What it shows is only this: given the code's redundancy, the assignment of amino acids to codons is far better at cushioning errors than almost any reshuffling of it.

Modelling choices, stated plainly so they can be argued with: errors are single-base substitutions between two sense codons; nonsense mutations (to/from a stop) are excluded from the average, as in the standard measure; the null model fixes the code's degeneracy blocks and the three stop codons, permuting only the 20 amino acids among the 20 blocks; properties are z-scored so scores compare across metrics (a linear rescale leaves every code's rank unchanged, so it cannot affect the headline). Property scales: polar requirement (Woese et al. 1966), residue volume (Zamyatnin 1972), hydropathy (Kyte & Doolittle 1982). Full provenance: research/built-to-be-misread/SOURCES.md.
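The claim that z-scoring cannot change any code's rank follows from the score being built from squared differences: subtracting the mean cancels inside each difference, and dividing by the standard deviation divides every score by the same constant. A toy check, using four real hydropathy values but two made-up substitution lists standing in for codes (all names are illustrative, and the population standard deviation is an assumption):

```javascript
// z-score a property table: subtract the mean, divide by the standard deviation.
function zScore(prop) {
  const vals = Object.values(prop);
  const mean = vals.reduce((a, b) => a + b, 0) / vals.length;
  const sd = Math.sqrt(vals.reduce((a, b) => a + (b - mean) ** 2, 0) / vals.length);
  return Object.fromEntries(Object.entries(prop).map(([aa, v]) => [aa, (v - mean) / sd]));
}

// Mean squared property change over a list of [from, to] amino-acid pairs.
const meanSquaredChange = (pairs, prop) =>
  pairs.reduce((s, [a, b]) => s + (prop[a] - prop[b]) ** 2, 0) / pairs.length;

// Four Kyte-Doolittle values and two hypothetical "codes" as substitution lists.
const raw = { A: 1.8, D: -3.5, L: 3.8, K: -3.9 };
const codeX = [["A", "L"], ["D", "K"], ["A", "D"]];
const codeY = [["A", "K"], ["L", "D"], ["L", "K"]];

const z = zScore(raw);
const ratioRaw = meanSquaredChange(codeX, raw) / meanSquaredChange(codeY, raw);
const ratioZ = meanSquaredChange(codeX, z) / meanSquaredChange(codeY, z);
console.log(Math.abs(ratioRaw - ratioZ) < 1e-12); // true: both scores shrink by the same factor
```

Because the ratio between any two codes' scores is unchanged, so is their ordering, and with it every percentile in the table.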

node research/built-to-be-misread/verify.mjs → 30/30 checks passed — the codon table re-derived and checked against independent biochemistry, the 526-substitution mutation graph, the three published scales loaded verbatim, and every figure in the table above reproduced bit-for-bit (137 of a million under polar requirement, and the rest). The page and the verifier load the same model.js, so they cannot drift apart — and the button above lets you rerun the million yourself and land on the same number.