The Verification Venue · a promise the dictionary can't keep

The Dictionary That Eats Itself

Every word in the dictionary is defined using other words in the same dictionary. So the advice to "just look up the words you don't know" is a promise the dictionary structurally cannot keep. Follow it and the lookups never bottom out. They fall into a knot of words that only define each other, and there is no way down to the ground.

You already half-know this. But "dictionaries are circular, so what?" is a shrug, not a fact. Below, the shrug is turned into something you can measure. Pick a word. Look it up. Then look up the words in that definition. Keep your finger on the trail and watch where it goes — on a real, public-domain dictionary, with the whole computation happening in front of you.

1 · look it up, by hand (Webster's 1913 · 1,200 words)

Start from a word

Type any of the 1,200 headwords, or tap one. Starting word: “word”. Click an underlined word in the definition to follow it.

Every underlined word in that definition is itself an entry in the same book. Click one and you're reading a new definition — built, again, from words that are entries. Try to reach a word that is defined in terms of nothing you have to look up. You can't. Within a handful of clicks the trail turns red: you're back at a word you already passed. That is not bad luck. It is the shape of the whole dictionary.

What you fell into has a name

Represent the dictionary as a graph: draw an arrow from a word to each content-word used in its definition (word → sign, word → sound, word → idea), the "to understand this, look that up" direction.^① Now ask which words you can reach from which. Two words are in the same strongly-connected component when a chain of lookups leads from each to the other. In 1972 Robert Tarjan gave a way to find every such component with a single depth-first pass, in time linear in the number of words and arrows. Run it on this dictionary and most of it collapses into a single component.

The site does exactly that below — no numbers are typed in; Tarjan's algorithm runs on the shipped lexicon when the page loads, and prints what it finds.
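A minimal sketch of that computation, assuming the graph is held as a plain object that maps each word to the words in its definition. The toy `lookups` entries and the function name `tarjanSCC` are illustrative assumptions, not the page's shipped code.

```javascript
// Toy lookup graph (hypothetical entries): word -> words used in its definition.
const lookups = {
  word: ["sign", "sound"],
  sign: ["mark", "word"],
  sound: ["noise"],
  noise: ["sound"],
  mark: ["sign"],
  photo: [],
};

// Tarjan's algorithm: one depth-first pass over every node and edge.
// Recursive for clarity; a very deep graph would need an explicit stack.
function tarjanSCC(graph) {
  const index = new Map(); // discovery order of each node
  const low = new Map(); // smallest index reachable from the node's subtree
  const onStack = new Set();
  const stack = [];
  const components = [];
  let counter = 0;

  function visit(v) {
    index.set(v, counter);
    low.set(v, counter);
    counter += 1;
    stack.push(v);
    onStack.add(v);

    for (const w of graph[v] ?? []) {
      if (!index.has(w)) {
        visit(w);
        low.set(v, Math.min(low.get(v), low.get(w)));
      } else if (onStack.has(w)) {
        low.set(v, Math.min(low.get(v), index.get(w)));
      }
    }

    // v is the root of a component: pop everything stacked above it.
    if (low.get(v) === index.get(v)) {
      const component = [];
      let w;
      do {
        w = stack.pop();
        onStack.delete(w);
        component.push(w);
      } while (w !== v);
      components.push(component);
    }
  }

  for (const v of Object.keys(graph)) {
    if (!index.has(v)) visit(v);
  }
  // Largest component first, so components[0] plays the role of the Core.
  return components.sort((a, b) => b.length - a.length);
}

const components = tarjanSCC(lookups);
console.log(components.map((c) => c.sort()));
```

In this toy graph the largest component is the knot word, sign, mark; sound and noise form a second, smaller one, and photo sits alone.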

2 · run every lookup at once from “word”

Instead of clicking, expand all the lookups from your starting word at the same time — its definition words, then their definition words, outward until nothing new appears. Watch the closure swallow the whole Core and stop finding new ground.

Press expand. Each ring is one round of lookups; the centre is your starting word.
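The ring-by-ring expansion is a breadth-first search. A minimal sketch, again on a hypothetical toy graph; `expandRings` is an illustrative name, not a function from the shipped page.

```javascript
// Toy lookup graph (hypothetical entries): word -> words used in its definition.
const lookups = {
  word: ["sign", "sound"],
  sign: ["mark", "word"],
  sound: ["noise"],
  noise: ["sound"],
  mark: ["sign"],
  camera: ["photo"],
  photo: [],
};

// Breadth-first expansion: ring k holds the words first reached after k rounds.
function expandRings(graph, start) {
  const seen = new Set([start]);
  const rings = [[start]];
  let frontier = [start];
  while (frontier.length > 0) {
    const next = [];
    for (const v of frontier) {
      for (const w of graph[v] ?? []) {
        if (!seen.has(w)) {
          seen.add(w);
          next.push(w);
        }
      }
    }
    if (next.length > 0) rings.push(next);
    frontier = next;
  }
  return { rings, reachable: seen };
}

const { rings, reachable } = expandRings(lookups, "word");

// "No exit": every lookup made from inside the closure lands inside it.
const closed = [...reachable].every((v) =>
  (lookups[v] ?? []).every((w) => reachable.has(w))
);

console.log(rings, reachable.size, closed);
```

The closure check holds for any reachable set by construction. What the real dictionary adds is the size of the set and the fact that the trail keeps circling inside it.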

The reachable set stops growing at 739 words — and every single lookup you could still make from inside it lands on a word already in the set. There is no exit. Here is the whole structure, recomputed live from the shipped file:

words (nodes) · 1,200 · the shipped vocabulary

lookups (edges) · computed when the page loads · word → definition-word

Kernel · 755 · after recursive pruning

Core = largest SCC · 724 · Tarjan, run in-page

2nd-largest SCC · 3 · everything else is dust

stay-in-Core rate · 98.6% · of lookups from a Core word

One component of 724 words; the next-biggest is 3. That is the shape the paper by Philippe Vincent-Lamarre, Stevan Harnad and colleagues found in 2016 in learner dictionaries and WordNet, and it is the shape here too: one giant knot, and everything else a scatter of tiny fragments hanging off it. Of every lookup you can make starting from a word inside the Core, 98.6% lands you on another word inside the Core. It is a near-perfect trap.
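The stay-in-Core rate is a plain ratio: lookups from a Core word that land on a Core word, divided by all lookups from a Core word. A minimal sketch on a hypothetical toy graph, with the Core handed in as a set rather than computed; `stayInCoreRate` is an illustrative name.

```javascript
// Toy lookup graph (hypothetical entries): word -> words used in its definition.
const lookups = {
  word: ["sign", "sound"],
  sign: ["mark", "word"],
  mark: ["sign"],
  sound: [],
};

// Assumed here: the Core has already been found (for example by an SCC pass).
const core = new Set(["word", "sign", "mark"]);

// Fraction of lookups starting at a Core word that end at a Core word.
function stayInCoreRate(graph, coreSet) {
  let total = 0;
  let inside = 0;
  for (const v of coreSet) {
    for (const w of graph[v] ?? []) {
      total += 1;
      if (coreSet.has(w)) inside += 1;
    }
  }
  return total === 0 ? NaN : inside / total;
}

const rate = stayInCoreRate(lookups, core);
console.log((100 * rate).toFixed(1) + "%");
```

Here four of the five lookups from Core words stay inside, so the toy rate is 80%. The page's 98.6% is the same ratio taken over the real Core.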

The check — nothing here is asserted

Every figure above is computed when the page loads, by parsing the shipped dictionary and running Tarjan's algorithm on it — the same code as the verifier. The parse is deliberately crude and fully disclosed, because the parse is the whole argument: change what counts as a word and the numbers move. What this build does, on each definition: lowercase it, drop stopwords (the, of, is, to…), apply a light suffix-stemmer (animals→animal, produces→produce), keep only the first sense trimmed to 30 words, and keep only tokens that are themselves headwords in the 1,200-word vocabulary — every proper name, archaic term and rare word is dropped. Those dropped words are exactly why a handful of entries look grounded (see below). Recompute it yourself:

node research/the-dictionary-that-eats-itself/verify-the-dictionary-that-eats-itself.mjs

→ 20/20 PASS. Full method and the exact source dictionary: research/the-dictionary-that-eats-itself/.
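The parse steps listed in this section can be sketched as one small function. This is a stand-in under stated assumptions, not the shipped parser: the stopword list beyond "the, of, is, to" is guessed, the stemmer only strips a trailing "s", the input is assumed to be the first-sense gloss already, the 30-word trim is assumed to happen before stopwords are dropped, and the sample gloss and headword list are made up for illustration.

```javascript
// Assumed stopword list: the source names only "the, of, is, to".
const STOPWORDS = new Set(["the", "of", "is", "to", "a", "an", "or", "and"]);

// Crude stand-in stemmer: animals -> animal, produces -> produce.
function stem(token) {
  if (token.length > 3 && token.endsWith("s") && !token.endsWith("ss")) {
    return token.slice(0, -1);
  }
  return token;
}

// Turn one first-sense gloss into the list of headwords it points at.
function definitionEdges(gloss, headwords, maxTokens = 30) {
  const tokens = gloss.toLowerCase().match(/[a-z]+/g) ?? [];
  const kept = tokens
    .slice(0, maxTokens) // trim to the first 30 words
    .filter((t) => !STOPWORDS.has(t)) // drop stopwords
    .map(stem) // light suffix stemming
    .filter((t) => headwords.has(t)); // keep only words that are entries
  return [...new Set(kept)]; // one arrow per distinct target
}

// Hypothetical five-word vocabulary and a made-up gloss.
const headwords = new Set(["animal", "produce", "sound", "sign", "idea"]);
const edges = definitionEdges(
  "The spoken sign of an idea; a sound that animals or persons produce.",
  headwords
);
console.log(edges);
```

Note what happens to "spoken" and "persons": they are not in the toy vocabulary, so they vanish without a trace. That is the same mechanism that makes some real entries look grounded.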

the false floor

47 of the 1,200 words look grounded — their trimmed definition uses no other word in the vocabulary, so the lookups seem to stop. But look at which words: data, user, computer, server, photo… — modern terms whose real defining words (digital, machine, network) simply fell outside this 1,200-word slice. They aren't primitive. They are holes in the sample. Widen the vocabulary and the arrows reappear, pointing right back into the Core. A real dead-end — a word defined with nothing you'd need to look up — does not turn up.

“So what?” has a precise answer

The shrug — of course a dictionary is circular — assumes the circularity is a shapeless mess. It isn't. The 2016 result carves it into three nested objects, each of which you can compute:

The Kernel. Start deleting words that no definition ever uses: words that are only defined, never defining. Each deletion can leave further words that no remaining definition uses, so repeat until nothing more can go. What survives is the Kernel: 755 words here, the part of the dictionary that actually does the defining. Everything you looked up on the way in was, ultimately, built out of these.

The Core. Inside the Kernel sits the strongly-connected knot you already met — the 724-word component where every word has a path to and from every other. This is the part that can never be broken into "define A first, then B, then C." There is no first. Whatever order you try, some definition points backward.

The Minimum Grounding Set. Here is the sharp question. Suppose someone simply gave you the meanings of a few words — grounded them from outside the book. What is the smallest set you'd need, so that every other word could then be defined without any circle left? That set is a minimum feedback vertex set: the fewest words whose removal makes the whole tangle acyclic. And this is where the tidy story fractures — twice.

First: finding a minimum feedback vertex set is NP-hard in general, and on the full Core no known algorithm is guaranteed to finish in any practical time. Nobody can hand you the minimum grounding set of English just by running an exact search to completion. Second, and stranger: even where you can compute it, it is not unique. There is no such thing as the irreducible core of the language; there are many equally small sets, each of which would do. Watch both happen on a piece of the real Core small enough to solve exactly:

3 · the smallest set you'd have to already know (6 real words · exact)

Six words from the Core and every definition-arrow that runs between them (all real — each arrow is a word that genuinely appears in the other's Webster gloss). Tap a word to “ground” it (assume its meaning is given from outside). The readout says whether the words left still contain a circle.

Grounded: none. The six words still form a circle — no word among them can be defined first.

Three different pairs of words — {cause, act}, {cause, power}, {act, make} — each of which, if you already knew it, would let you define the other four without any circle. No pair is the foundation. "The irreducible core of English" is not a well-defined thing; it is a choice among many equally-good ones. That is the twist the shrug never saw coming: the circularity is real, it is measurable, and pinning down its "bottom" is both computationally hard and genuinely ambiguous.
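The exact search on a graph this small is brute force: try every set of words of size 0, then 1, then 2, and stop at the first size where removing the set leaves no circle. A minimal sketch follows. The six-node graph is a hypothetical toy with letters for nodes, not the page's six Webster words or their real arrows, and the function names are illustrative.

```javascript
// Hypothetical six-node lookup graph (letters, not the page's real words).
const graph = {
  a: ["b", "c"],
  b: ["a", "d"],
  c: ["a"],
  d: ["b"],
  e: ["a", "f"],
  f: ["d"],
};

// Depth-first cycle check on the graph with the `removed` nodes deleted.
function hasCycle(g, removed) {
  const state = new Map(); // 1 = on the current path, 2 = fully explored
  const visit = (v) => {
    if (removed.has(v)) return false;
    if (state.get(v) === 1) return true;
    if (state.get(v) === 2) return false;
    state.set(v, 1);
    for (const w of g[v] ?? []) {
      if (visit(w)) return true;
    }
    state.set(v, 2);
    return false;
  };
  return Object.keys(g).some((v) => visit(v));
}

// Yield every subset of `items` with exactly k elements.
function* subsetsOfSize(items, k, start = 0, picked = []) {
  if (picked.length === k) {
    yield [...picked];
    return;
  }
  for (let i = start; i < items.length; i++) {
    picked.push(items[i]);
    yield* subsetsOfSize(items, k, i + 1, picked);
    picked.pop();
  }
}

// All minimum feedback vertex sets: smallest size first, keep every hit.
function minimumGroundingSets(g) {
  const nodes = Object.keys(g);
  for (let k = 0; k <= nodes.length; k++) {
    const hits = [];
    for (const subset of subsetsOfSize(nodes, k)) {
      if (!hasCycle(g, new Set(subset))) hits.push(subset);
    }
    if (hits.length > 0) return hits;
  }
  return [];
}

const minimumSets = minimumGroundingSets(graph);
console.log(minimumSets);
```

On this toy graph the search also returns three different pairs, so non-uniqueness shows up even in a graph you can hold in your head. The cost is the catch: the number of subsets doubles with every added word, which is why this only works on a small piece.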

on the real Core — an honest upper bound

On the full 724-word Core the exact minimum is out of reach, but you can watch a crude greedy version run: repeatedly grounding the most-connected word until no circle remains.

That greedy set is an upper bound only: the real minimum is at most that size, and likely smaller, and finding it is the NP-hard problem above. For calibration: on curated learner dictionaries, the published estimate of the minimum grounding set is about 1% of the dictionary, with a Kernel of roughly 10% and a Core of roughly 75% of the Kernel (Vincent-Lamarre et al. 2016). Those percentages describe other corpora. The figures this page computes on Webster's 1913 are larger, because 1913's vocabulary is vast and unrestricted.
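A minimal sketch of one such greedy run. Two choices here are assumptions, not the page's documented method: "most-connected" is read as incoming plus outgoing lookups among the words still in play, and words that cannot lie on any circle (nothing live pointing in, or nothing live pointing out) are peeled away before each pick. The toy graph and `greedyGroundingSet` are illustrative.

```javascript
// Hypothetical six-node lookup graph (letters, not real dictionary words).
const graph = {
  a: ["b", "c"],
  b: ["a", "d"],
  c: ["a"],
  d: ["b"],
  e: ["a", "f"],
  f: ["d"],
};

// Greedy feedback vertex set: an upper bound, not the minimum.
// Quadratic and unoptimised; fine for a sketch, slow for a large graph.
function greedyGroundingSet(g) {
  const alive = new Set(Object.keys(g));
  const grounded = [];
  const outs = (v) => (g[v] ?? []).filter((w) => alive.has(w));
  const ins = (v) => [...alive].filter((u) => (g[u] ?? []).includes(v));

  while (true) {
    // Peel words that cannot be on a cycle, until nothing more peels.
    let peeled = true;
    while (peeled) {
      peeled = false;
      for (const v of [...alive]) {
        if (outs(v).length === 0 || ins(v).length === 0) {
          alive.delete(v);
          peeled = true;
        }
      }
    }
    // Every survivor has a live lookup in and out, so a cycle remains
    // exactly when something survives.
    if (alive.size === 0) return grounded;

    // Ground the most-connected surviving word.
    let best = null;
    let bestDegree = -1;
    for (const v of alive) {
      const degree = outs(v).length + ins(v).length;
      if (degree > bestDegree) {
        best = v;
        bestDegree = degree;
      }
    }
    grounded.push(best);
    alive.delete(best);
  }
}

const grounded = greedyGroundingSet(graph);
console.log(grounded);
```

On this toy graph the greedy run happens to land on a set of the minimum size. Nothing guarantees that in general, which is why the result on the real Core is reported only as an upper bound.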

Where the graph stops and the argument begins

Everything up to here is fact. The Core exists; you can compute it; it is one enormous strongly-connected component. But there's a famous next step, and it is not a theorem — it is an argument, and it is still contested.

In 1990 Stevan Harnad posed the symbol grounding problem: a symbol system whose symbols are only ever defined by other symbols is like trying to learn Chinese from a Chinese-only dictionary — you can shuffle forever and never break out to what any word means. Meaning, he argued, is parasitic: the words in your head feel meaningful only because you ground some of them in sensorimotor experience — in seeing red, in lifting a weight, in the world. Twenty-six years later, Harnad was a co-author of the paper that measured the trap he'd described: the Core is what a purely word-defined system can never escape on its own.

the open part — stated as open

The graph shows meaning can't ground within this dictionary. Whether meaning can ground in language at all — without ever leaving words for the world — is the open question, and it is live. The distributional / large-language-model view holds the opposite: that meaning can emerge from nothing but the pattern of how words are used together, no sensorimotor floor required — and that systems trained only on text behave as if they'd found one. Harnad says that's still just symbols chasing symbols. This piece does not settle it. The Core is the fact; whether the Core is a prison is the argument.

what this is not

This is not Gödel, and not the liar's "this sentence is false." Those are constructions in logic: self-reference built on purpose to make a point. The Core was not constructed. It is an empirical property of an ordinary dictionary someone wrote to be useful, discovered by measuring it. No paradox, no trickery, just what a book of words defined by words turns out, on inspection, to be.

One more distinction worth keeping straight. Publishers already act as if a small grounding set exists: the Longman learner's dictionary writes every definition using a fixed list of about 2,000 words — its "defining vocabulary" — and prints anything outside it in small capitals. That list is human-chosen and pedagogical; the Kernel and the Minimum Grounding Set are computed. They are different objects, arrived at differently, and they are different sizes — but both are the same instinct, from opposite ends: that under all the words, a much smaller set is quietly doing the work.

Assumptions, edges, and what would change the numbers

The corpus. The shipped lexicon is the 1,200 most-frequent single-word content headwords of Webster's Revised Unabridged Dictionary (1913, public domain), each reduced to its first-sense gloss trimmed to 30 tokens. It is a slice, deliberately, so the whole computation fits in your browser and can be checked by hand. The published papers analyse entire dictionaries; their exact percentages (Kernel ≈10%, Core ≈75% of Kernel, MinSet ≈1%) are on those corpora, not on this slice — this page reports only its own measured numbers and labels theirs.

Edge direction. The paper draws arrows from defining words to the words they define; the "look it up" story runs the other way. A strongly-connected component is identical under reversal, so the Core is direction-agnostic. The Kernel pruning is not: here it removes, recursively, every word with no incoming lookup — the lookup-direction image of the paper's "remove words that define nothing." One convention, stated, held consistent across the trail, the bloom and the check.

The parse is the argument. Tokenisation, stemming, stopword choice and the first-sense trim all move the numbers. A stricter lemmatiser or a fuller definition would grow the Core; a harsher stopword list would shrink it. None of it removes the phenomenon — a large SCC survives every reasonable choice — but the exact size is a property of this parse, which is why the parse ships in full.

The minimum grounding set. Exact only on the six-word subgraph, by brute force over all vertex subsets. On the full Core it is NP-hard; the greedy run is an upper bound, not the answer. Non-uniqueness is exhibited, not merely claimed: the verifier enumerates all three minimum pairs.

The philosophy. The symbol-grounding conclusion is Harnad's argument, presented as a live dispute, not a result. The graph is agnostic about whether text-only meaning suffices.
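The Kernel pruning described under "Edge direction" can be sketched as follows, in the lookup direction used throughout this page: repeatedly remove every word that no surviving definition looks up. The toy graph and the name `kernelOf` are illustrative assumptions, not the shipped verifier.

```javascript
// Toy lookup graph (hypothetical entries): word -> words used in its definition.
const lookups = {
  word: ["sign", "sound"],
  sign: ["mark", "word"],
  sound: ["noise"],
  noise: ["sound"],
  mark: ["sign"],
  lens: ["camera"],
  camera: ["photo"],
  photo: [],
};

// Recursively delete words with no incoming lookup; what survives is the Kernel.
function kernelOf(graph) {
  const alive = new Set(Object.keys(graph));

  // Count, for each word, how many definitions look it up.
  const inDegree = new Map([...alive].map((v) => [v, 0]));
  for (const v of alive) {
    for (const w of graph[v]) {
      if (inDegree.has(w)) inDegree.set(w, inDegree.get(w) + 1);
    }
  }

  // Start with the words nobody looks up; each removal may expose more.
  const queue = [...alive].filter((v) => inDegree.get(v) === 0);
  while (queue.length > 0) {
    const v = queue.pop();
    alive.delete(v);
    for (const w of graph[v]) {
      if (!alive.has(w)) continue;
      inDegree.set(w, inDegree.get(w) - 1);
      if (inDegree.get(w) === 0) queue.push(w);
    }
  }
  return alive;
}

const kernel = kernelOf(lookups);
console.log([...kernel].sort());
```

Removing lens exposes camera, and removing camera exposes photo, which is the recursion at work. The survivors include sound and noise even though they are not part of the word, sign, mark knot: the Kernel contains the Core but can be larger.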