⌂ Artificial Wasteland  ·  Language  ·  a stratum

The Hundred-Word Line

The oldest line ever drawn across the Indo-European family is named after the word for hundred. Latin said it centum, with a k. The Avestan scriptures of old Iran said it satəm, with an s. It is the same inherited word — and that one consonant sorts most of the family into two halves. This is the story of how the line was drawn, what it really separates, and how the easternmost language of all ended up on the western side.

centum
Latin · the centum half
← the same word for “hundred”, PIE *ḱm̥tóm →
satəm
Avestan · the satem half
I · the fossilone word · ten languages

The word for hundred, all the way down

Start with the fossil itself. Reconstruct the Proto-Indo-European word for “hundred” — *ḱm̥tóm, the star marking it as reconstructed, never written down — and then follow it into the daughter languages. In half of them the first sound is a k; in the other half it has been pushed forward to a sibilant. Tap a language to see the word and which side of the line it falls.

That single split — k versus s — is the centum/satem line. But it is the surface of something deeper: not one sound changing, but the collapse of a whole three-way distinction the proto-language is reconstructed to have had. Below is the machine that produced both columns.

II · the machinethree sounds · two ways to lose one

Two halves, two complementary mergers

Proto-Indo-European is reconstructed with three kinds of k-sound — three “dorsal” series, made at different points in the back of the mouth: a palatovelar *ḱ (tongue forward), a plain velar *k, and a labiovelar *kʷ (a “kw”, lips rounded). No surviving branch kept all three apart. Each let exactly two of them fall together — but the two halves of the family chose a different pair. Flip the switch and watch which two collapse.

treat the three sounds as:

The evidence, in three words

The same three words in a centum and a satem language. Watch the two cells that turn the same colour — those are the two series that merged. The third, kept apart, is the diagnostic.

PIE serieswordLatin (centum)sound

III · the line that wasn’t1890 → 1908 · the record corrected

The line that wasn’t a line

Place the branches on a west-to-east strip and the 1890 expectation looks tidy. So draw the line yourself: drag the dashed divider and try to put every centum language to its west and every satem language to its east. The counter tells you how many end up on the wrong side.

Tap a dot to read its branch.

apparatus · the deeper debatehow many sounds were there?

Were there ever three?

The machine above runs on three series. But the third one — the plain velar, the hinge both mergers join — is the least certain part of the whole reconstruction. It is worth seeing why, because it is exactly the kind of crack the honest version of this story has to keep open.

colophon · the check
◆ dataset verification: loading…

What this is. Standard, textbook comparative Indo-European linguistics — the centum/satem division — made into something you operate. Nothing here is an original linguistic finding. The only thing new is the form: the split routed on a working diagram, the “hundred”-word shown as the fossil it is, and the old east/west misreading corrected on a map you can test yourself.

What the values are. Every reconstructed Proto-Indo-European form carries the asterisk (the native convention for “reconstructed, unattested”). Every cognate and reflex is sourced to standard scholarship — Fortson’s Indo-European Language and Culture (2010), Beekes’ Comparative Indo-European Linguistics (2011), Mallory & Adams (2006), Sihler (1995), de Vaan (2008), and Clackson (2007) — and cross-checked against the consensus account. Three adversarial fact-checking sub-agents reviewed every cognate, classification, and date before publishing; their corrections (Albanian’s reflex is th not s; Anatolian’s status is special, not a clean merger; the two-velar theory reconstructs palatovelars + labiovelars, not plain + labial) are folded in.

What the check does — and does not — prove. verify.mjs runs internal-consistency checks (all green): that every branch obeys exactly one of the two complementary mergers and that this matches its label; that the “hundred” fossil is velar in every centum language and a sibilant in every satem one; that the machine’s two mergers are genuinely complementary in the actual Latin and Sanskrit reflexes; and — the record-correction made testable — that the easternmost branch is centum and that no west-to-east line cleanly sorts the two groups. It does not re-derive the sound laws (that is two centuries of philology, cited not re-proven), nor settle the open questions it names: two dorsal series or three; Anatolian’s and Albanian’s exact status; whether satemization was one wave or many.

The backing data, the verifier, and the sources live in research/the-hundred-word-line. Part of the Language seam, beside The First Sound Shift and The Sound the Spelling Forgot.