Artificial Wasteland · The Verification Venue

Most Numbers Begin With One

SEAM · GROUNDTRUTH · P(d) = log₁₀(1 + 1/d) · P(1) ≈ 30.1%

Take a long list of real-world numbers and look only at the first digit of each. A 1 shows up about thirty percent of the time. A 9, less than five. The numbers are not playing fair — and there is exactly one reason they can't.

Sometime before 1881 the astronomer Simon Newcomb noticed that the logarithm books in his observatory were filthy at the front and clean at the back. People looked up numbers beginning with 1 far more often than numbers beginning with 9 — so the early pages wore out first. From a worn book he drew a law: in a table of "natural numbers," the leading digit is a 1 about 30% of the time, and the frequency falls steadily to the 9s. He published two pages on it and the world forgot.

Fifty-seven years later the physicist Frank Benford rediscovered the same wear pattern and did the work Newcomb hadn't: he counted. 20,229 numbers, gathered from twenty unrelated tables — river drainage areas, atomic weights, street addresses of people listed in a magazine, the numbers printed in a Reader's Digest article — and across all of them the leading digit landed on the same curve. The law now carries his name, though Newcomb had it first; it is, fittingly, an example of a law named after the wrong person.

The curve has a clean closed form. The probability that a number's first digit is d is

The law · P(d) = log₁₀(1 + 1/d), computed in your browser

digit d	1	2	3	4	5	6	7	8	9
P(d)	30.1%	17.6%	12.5%	9.7%	7.9%	6.7%	5.8%	5.1%	4.6%

Two facts hide in this curve. First, it falls off fast: a leading 1 is about 6.6 times as likely as a leading 9 (30.1% against 4.6%). Second, and this is exact rather than approximate, the digit 1 occurs precisely as often as 5, 6, 7, 8 and 9 put together. Both equal log₁₀2 ≈ 30.103%, because each term P(d) is log₁₀(d+1) − log₁₀d, so the sum P(5)+…+P(9) telescopes to log₁₀10 − log₁₀5 = log₁₀(10/5) = log₁₀2. The first digit isn't just biased; it's biased by an amount you can write down.
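The two facts above can be checked in a few lines. This is a minimal sketch, not the page's own code; the names `benfordP`, `tailSum` and `oneToNine` are made up for illustration.

```javascript
// Benford probability that the leading digit is d (d = 1..9).
function benfordP(d) {
  return Math.log10(1 + 1 / d);
}

const digitList = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const benfordProbs = digitList.map(benfordP);

// The nine probabilities telescope to log10(10) = 1.
const probTotal = benfordProbs.reduce((a, b) => a + b, 0);

// P(5) + ... + P(9) telescopes to log10(10/5) = log10(2) = P(1).
const tailSum = [5, 6, 7, 8, 9].map(benfordP).reduce((a, b) => a + b, 0);

// How much more likely a leading 1 is than a leading 9.
const oneToNine = benfordP(1) / benfordP(9);

digitList.forEach((d, i) => {
  console.log(`P(${d}) = ${(100 * benfordProbs[i]).toFixed(1)}%`);
});
console.log("sum of all nine:", probTotal.toFixed(6));
console.log("P(1) vs P(5..9):", benfordP(1).toFixed(5), tailSum.toFixed(5));
console.log("P(1) / P(9):", oneToNine.toFixed(2));
```

The comparison of P(1) with the tail sum is done in floating point, so the two agree to rounding error rather than bit for bit.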

I · Real numbers obey it

A formula is one thing; the world is another. Below are four datasets this place pulled from public sources and committed verbatim — the populations of every country, their land areas, their economies, and the 355 physical constants in the international reference table. Different domains, different units, no connection between them except that each spans many orders of magnitude. Each histogram is recomputed from the raw values in your browser, with the Benford curve laid over it as a dashed line.

Live · leading digits of a real dataset vs Benford (dashed)

Readouts: sample size n · P(first digit = 1), against Benford's 30.1% · orders of magnitude, log₁₀(max / min) · χ² vs Benford (df 8), conforms @5% if < 15.51
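The readouts for a dataset (leading-digit counts, share of 1s, orders of magnitude, χ²) can be computed roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, and `sampleValues` is a small stand-in list (the first thirty powers of two), not one of the page's committed datasets. Thirty values are too few for a serious χ² test; the point is only the mechanics.

```javascript
// Leading digit of a nonzero finite number.
// toExponential() renders 52300 as "5.23e+4", so the first character is the digit.
function leadingDigit(x) {
  return Number(Math.abs(x).toExponential()[0]);
}

// counts[d] = how many values lead with digit d (index 0 stays unused).
function digitCounts(values) {
  const counts = new Array(10).fill(0);
  for (const v of values) {
    if (Number.isFinite(v) && v !== 0) counts[leadingDigit(v)] += 1;
  }
  return counts;
}

// Pearson chi-square of the observed counts against Benford's expected counts (df 8).
function chiSquareVsBenford(counts) {
  const n = counts.reduce((a, b) => a + b, 0);
  let chi2 = 0;
  for (let d = 1; d <= 9; d++) {
    const expected = n * Math.log10(1 + 1 / d);
    chi2 += (counts[d] - expected) ** 2 / expected;
  }
  return chi2;
}

// Stand-in data (an assumption for illustration): 2^0 .. 2^29.
const sampleValues = Array.from({ length: 30 }, (_, k) => 2 ** k);

const sampleCounts = digitCounts(sampleValues);
const sampleChi2 = chiSquareVsBenford(sampleCounts);
const ordersOfMagnitude = Math.log10(
  Math.max(...sampleValues) / Math.min(...sampleValues)
);

console.log("n:", sampleValues.length);
console.log("P(first digit = 1):", (sampleCounts[1] / sampleValues.length).toFixed(3));
console.log("orders of magnitude:", ordersOfMagnitude.toFixed(2));
console.log("chi-square:", sampleChi2.toFixed(2), sampleChi2 < 15.51 ? "(conforms @5%)" : "(rejects @5%)");
```

Reading the digit from the exponential string avoids the off-by-one errors that `Math.floor(Math.log10(x))` can produce right at a power of ten.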

II · Why — the only law that survives a change of units

Here is the deep reason, and it is almost the whole story. Suppose there is some universal leading-digit law for "numbers in the wild." Whatever it is, it can't care whether you measured those rivers in miles or kilometres, those economies in dollars or yen. Switching units multiplies every number in the list by the same constant — and a law about the universe shouldn't change just because a Frenchman and an American write the same river two different ways.

That single demand — scale invariance — is enough to pin down the law completely. Lay the numbers on a logarithmic circle, where a value's position is the fractional part of its base-10 logarithm. A number leads with digit d exactly when it lands in the arc from log₁₀d to log₁₀(d+1) — and that arc has length log₁₀(1+1/d), the Benford probability, by construction. Multiplying every number by a constant rotates the whole circle. The dots move; the arc lengths do not. So the only distribution that looks the same after any rotation is the one that's uniform around the circle — and uniform-on-the-log-circle is Benford's law. Roger Pinkham proved in 1961 that Benford is the unique scale-invariant leading-digit law; Theodore Hill showed in 1995 it is the unique base-invariant one too.

Live · the log-circle. Drag to multiply the dataset by a constant c (a change of units, from ×0.1 to ×1000) and watch the dots rotate while the digit counts hold

Readouts: multiplier ×c · P(1) after ×c (barely moves)

The dataset's numbers (each a dot at {log₁₀ value}) rotate bodily as you rescale them, yet the proportion sitting in each digit-arc stays put. Change the units and the law doesn't budge. That invariance is not a happy accident of this dataset — it is the property that defines the law, and the reason it turns up everywhere at once.
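The rotation argument can be tried without any real dataset. The sketch below assumes an idealised sample, 10,000 values spaced evenly in log-space over four decades, so the dots start out uniform on the circle; the function names are hypothetical.

```javascript
// Position on the log-circle: the fractional part of log10(x), in [0, 1).
function logCirclePosition(x) {
  const t = Math.log10(x);
  return t - Math.floor(t);
}

// The leading digit is the d whose arc [log10(d), log10(d+1)) holds the position.
function digitFromPosition(pos) {
  for (let d = 1; d <= 8; d++) {
    if (pos < Math.log10(d + 1)) return d;
  }
  return 9;
}

// Fraction of values whose dot lands in the digit-1 arc.
function shareOfOnes(values) {
  const ones = values.filter(
    (v) => digitFromPosition(logCirclePosition(v)) === 1
  ).length;
  return ones / values.length;
}

// Assumed sample: evenly spaced in log-space from 1 up to (not including) 10^4.
const sampleSize = 10000;
const logUniform = Array.from(
  { length: sampleSize },
  (_, k) => 10 ** ((4 * k) / sampleSize)
);

// A change of units multiplies every value by the same constant.
for (const c of [1, 2, Math.PI, 0.001]) {
  const rescaled = logUniform.map((v) => v * c);
  console.log(`x${c}: P(1) = ${shareOfOnes(rescaled).toFixed(4)}`);
}
console.log("Benford P(1):", Math.log10(2).toFixed(4));
```

Multiplying by c adds log₁₀c to every position, which is the rotation; because the dots were evenly spread to begin with, each arc keeps the same share of them to within a dot or two.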

III · Watch it appear out of pure arithmetic

If scale-invariance is the reason, then Benford shouldn't need "real-world" data at all: any process that smears numbers evenly across orders of magnitude should produce it. The cleanest such processes are deterministic. The powers of two (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, …) march across the decades at a constant logarithmic step of log₁₀2. That step is irrational, so their fractional logarithms never fall into a repeating cycle: by Weyl's equidistribution theorem they fill the log-circle evenly in the limit. Their leading-digit frequencies therefore converge to Benford's, provably and with no statistical assumption. The same holds for the Fibonacci numbers and the factorials. Compute as many terms as you like (exactly, with big integers) and watch the histogram settle onto the curve.

Live · leading digits of an exact integer sequence, computed with BigInt, vs Benford (dashed)

Readouts: terms N = 500 · distance to Benford (total variation, → 0) · P(first digit = 1), against Benford's 30.1%
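A sketch of the same experiment with exact BigInt arithmetic, assuming 500 terms as in the widget. The generator and helper names are hypothetical, and total variation is taken here as half the sum of absolute differences between the observed shares and Benford's.

```javascript
// Exact integer sequences as infinite generators of BigInt values.
function* powersOfTwo() {
  let x = 1n;
  while (true) {
    yield x;
    x *= 2n;
  }
}

function* fibonacci() {
  let a = 1n;
  let b = 1n;
  while (true) {
    yield a;
    [a, b] = [b, a + b];
  }
}

// shares[d] = fraction of the first `terms` values that lead with digit d.
function digitShares(sequence, terms) {
  const counts = new Array(10).fill(0);
  let seen = 0;
  for (const x of sequence) {
    if (seen >= terms) break;
    counts[Number(x.toString()[0])] += 1; // first decimal character of the exact integer
    seen += 1;
  }
  return counts.map((c) => c / terms);
}

// Total variation distance between observed shares and Benford's law.
function totalVariation(shares) {
  let sum = 0;
  for (let d = 1; d <= 9; d++) {
    sum += Math.abs(shares[d] - Math.log10(1 + 1 / d));
  }
  return sum / 2;
}

const termCount = 500;
const powerShares = digitShares(powersOfTwo(), termCount);
const fibShares = digitShares(fibonacci(), termCount);

console.log("powers of two: P(1) =", powerShares[1], "TV =", totalVariation(powerShares).toFixed(4));
console.log("Fibonacci:     P(1) =", fibShares[1], "TV =", totalVariation(fibShares).toFixed(4));
```

For the powers of two the share of 1s can be predicted by hand: exactly one power of two starts with 1 at each digit length, and 2⁴⁹⁹ has 151 digits, so 151 of the first 500 terms lead with 1.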

IV · Where the law breaks — and where it's abused

Benford is not a law of nature, and the honest half of the story is its boundary. It needs numbers that span several orders of magnitude and are not penned in by a floor or a ceiling. Adult human heights in centimetres are nearly all 1-something: no spread, no Benford. A roulette wheel, a set of phone numbers, an invoice list rounded to the nearest $5: assigned or bounded numbers carry no leading-digit law, because the mechanism that made them never smeared them across the decades. Even the physical constants above lead with 1 a little too often (35.5% against the expected 30.1%); they still pass a χ² test, but the fit is visibly looser than the populations'. Reach for Benford only where wide, multiplicative, organically-grown numbers live.

This boundary matters because the law has a second life as a fraud detector. Fabricated numbers — invented expense reports, doctored ledgers — tend to be too uniform, or to cluster under psychological thresholds, and they fail Benford. The accountant Mark Nigrini built a career turning this into a real audit tool, and tax authorities use leading-digit tests to flag returns worth a closer look. Used as a filter — "look here first" — on data that should be Benford to begin with, it earns its keep.

Used as a verdict, it lies. The most common abuse is the viral claim that a candidate's vote counts "violate Benford's law, therefore fraud." They almost always do violate it — and that proves nothing, because precinct vote totals are exactly the kind of bounded, narrow-range data Benford does not govern. Precincts are sized to hold roughly the same number of voters, so the counts cluster in a band rather than spanning decades; a candidate with ~500 votes per precinct will show a glut of leading 4s and 5s with no fraud anywhere. In 2011 the political scientists Deckert, Myagkov, and Ordeshook showed in Political Analysis that the first-digit test is "problematical at best as a forensic tool when applied to elections" — it both misses real fraud and manufactures false alarms. A law that says most numbers begin with one can tell you an invoice file looks invented. It cannot tell you who won an election.
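The boundary cases in this section are easy to reproduce with plain integer ranges. The ranges below are illustrative assumptions, not data: heights of 150 to 199 cm, precinct totals of 400 to 599 votes, and the counting numbers 1…N. The helper name `shareLeadingWith` is hypothetical.

```javascript
// Fraction of the integers lo..hi (inclusive) whose first decimal digit is `digit`.
function shareLeadingWith(digit, lo, hi) {
  let hits = 0;
  for (let n = lo; n <= hi; n++) {
    if (String(n)[0] === String(digit)) hits += 1;
  }
  return hits / (hi - lo + 1);
}

// Heights penned into one narrow band: every value leads with 1.
console.log("heights 150..199, P(1):", shareLeadingWith(1, 150, 199));

// Precinct totals clustered around 500: only 4s and 5s, and no 1s at all.
console.log("votes 400..599, P(4):", shareLeadingWith(4, 400, 599));
console.log("votes 400..599, P(5):", shareLeadingWith(5, 400, 599));
console.log("votes 400..599, P(1):", shareLeadingWith(1, 400, 599));

// Counting numbers: the share of 1s swings with the cutoff instead of settling at 30.1%.
console.log("1..999,  P(1):", shareLeadingWith(1, 1, 999).toFixed(4));
console.log("1..1999, P(1):", shareLeadingWith(1, 1, 1999).toFixed(4));
```

None of these ranges involves any fabrication, yet each would fail a first-digit test badly, which is why a failed test on bounded data says nothing about fraud.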

Show the check

Every number on this page is recomputed in your browser from data committed in this repository, and offline by research/the-first-digit/verify.mjs — 52/52 checks pass. The verifier reproduces the law to the digit (P(d) = log₁₀(1+1/d), the telescoping identity P(1) = P(5..9) = log₁₀2); the four datasets' leading-digit counts, χ², and conformity verdicts from the raw snapshots in research/the-first-digit/data/; the deterministic sequences via exact BigInt arithmetic (powers of two, Fibonacci, factorials all land within total-variation 0.02 of Benford; the plain counting numbers 1…N pointedly do not); the scale-invariance fact (a log-uniform sample stays Benford under multiplication by 2, π, 1/1000, …); and base invariance (in hex, P(1) = log₁₆2 = 0.25).
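The base-invariance check at the end of that list generalises the formula from base 10 to any base b, where the leading digit d runs from 1 to b − 1. A minimal sketch (this is not the contents of verify.mjs, and `benfordInBase` is a hypothetical name):

```javascript
// Benford probability of leading digit d in base b: log_b(1 + 1/d).
function benfordInBase(d, base) {
  return Math.log(1 + 1 / d) / Math.log(base);
}

// In hexadecimal the leading digit runs over 1..15 (1..F).
const hexProbs = [];
for (let d = 1; d <= 15; d++) {
  hexProbs.push(benfordInBase(d, 16));
}
const hexTotal = hexProbs.reduce((a, b) => a + b, 0);

console.log("hex P(1):", benfordInBase(1, 16).toFixed(4));     // log_16(2) = 1/4
console.log("hex total:", hexTotal.toFixed(6));                // telescopes to log_16(16) = 1
console.log("decimal P(1):", benfordInBase(1, 10).toFixed(4)); // log_10(2)
```

The hexadecimal value comes out as a clean quarter because 16 = 2⁴, so log₁₆2 is exactly 1/4.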

What is checked vs. what is cited. The law and its consequences are recomputed from first principles. The datasets are real, published figures, committed as-is: country populations, surface areas and GDP from the World Bank (2022, CC-BY 4.0), and the NIST CODATA reference constants (public domain). The page does not claim those figures are exact truth — only that this committed snapshot has the leading-digit distribution shown, reproducibly. Histograms on the page match the verifier exactly because rounding the snapshot to six significant figures cannot change a leading digit.

Sources: Newcomb, Note on the Frequency of Use of the Different Digits in Natural Numbers (American Journal of Mathematics 4, 1881, pp. 39–40); Benford, The Law of Anomalous Numbers (Proc. American Philosophical Society 78, 1938, pp. 551–572; 20,229 observations); Pinkham, On the Distribution of First Significant Digits (Ann. Math. Statist. 32, 1961; scale invariance ⇒ Benford); Hill, A Statistical Derivation of the Significant-Digit Law (Statistical Science 10, 1995) and Base-Invariance Implies Benford's Law (Proc. AMS 123, 1995); Nigrini, Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (2012); Deckert, Myagkov & Ordeshook, Benford's Law and the Detection of Election Fraud (Political Analysis 19, 2011, pp. 245–268).
