In 1940 the Allies needed a number they could not see: how many tanks was Germany building each month? Their spies guessed four to eight times too high. A handful of statisticians read the serial numbers off the captured tanks — and were right.
In the summer of 1940 the Western Allies needed a number they could not see: how many tanks was Germany actually building each month? Their intelligence services, working from agent reports and captured documents, answered with figures that turned out to be four to eight times too high. A different answer came from a small group of economists and statisticians, who did not ask the spies anything. They read the serial numbers stamped on the tanks the army captured and destroyed — gearboxes, chassis, engines, the moulds that cast the road wheels — and from those numbers estimated the whole. After the war the German production ledgers were opened, and it was the statisticians, not the spies, who had been close. This layer is that method, worked end to end, with the real wartime figures recomputed beside it — and made playable, so you can draw a month of captured serials and watch the estimate land.
You are an Allied analyst in 1941. Somewhere in the Reich a factory is turning out tanks, each stamped at assembly with a serial number. You will never see the factory, the ledger, or most of the tanks. What you will see, over the months, is a scatter of captured and knocked-out machines, and you can read the numbers off them. From that handful of numbers you must estimate a quantity nobody on your side knows: how many were made.
Strip it to the model that makes it solvable. Suppose a month's tanks are numbered 1, 2, …, N with N unknown, and that the ones you observe are a fair sample of distinct serials drawn from that run. You see k of them; the largest is m. Estimate N.
It sounds like there is nothing to work with — you have a few numbers and want the size of a set you have mostly never seen. But the numbers are not arbitrary. They are consecutive, assigned in order by the enemy's own bookkeeping, and that order is a gift. It means the serials you hold are spread through the whole run, and the spacing between them is itself a measurement. The serial number, meant only to track a part through a factory, quietly carries the size of the factory's output. The tanks count themselves.
Below is one captured month, the same one the argument follows: k = 15 serials drawn from a run whose true size only the postwar ledgers would reveal. Draw a fresh month whenever you like, or change how many were built and how many you caught. Everything in the three instruments recomputes in your browser by the same formulas the offline verifier runs.
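For readers without the instruments to hand, the sampling model itself fits in a few lines. What follows is a minimal sketch, not the page's own code: the function name `capture_month` and the seed are invented for illustration, and the run size of 270 is simply the figure the worked month uses.

```python
import random

def capture_month(n_built, k_caught, rng):
    """Draw k_caught distinct serials uniformly from 1..n_built (no repeats)."""
    return sorted(rng.sample(range(1, n_built + 1), k_caught))

# Hypothetical seed, chosen only so the draw is reproducible.
rng = random.Random(1941)

# Assumed values: 270 built, 15 caught, as in the worked month.
serials = capture_month(n_built=270, k_caught=15, rng=rng)
k = len(serials)   # how many serials the analyst holds
m = max(serials)   # the largest serial observed

print(serials)
print(f"k = {k}, m = {m}")
```

Changing the seed plays the part of drawing a fresh month; the analyst only ever sees `serials`, never `n_built`.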
The first instinct is to say: I have seen number 243, so there were at least 243; my best single guess is 243. This is, in fact, the maximum-likelihood estimate — the value of N under which your exact sample is most probable is N = m, because any larger N only spreads the probability thinner. It is also obviously an underestimate, and not by a little. You did not happen to capture the very last tank off the line; the true maximum is almost certainly above the largest you saw. The MLE can never exceed what you have observed, and the truth almost always does.
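The claim that the likelihood peaks at N = m can be checked directly. The sketch below assumes the uniform model above, under which every particular set of k serials has probability 1 / C(N, k) when N is at least m, and probability zero otherwise. The helper name `likelihood` and the range of candidates scanned are illustrative choices.

```python
from math import comb

def likelihood(n, m, k):
    """P(this exact set of k serials | N = n) under the uniform model."""
    if n < m:
        return 0.0            # a run shorter than m cannot contain serial m
    return 1.0 / comb(n, k)   # every k-subset of 1..n is equally likely

m, k = 243, 15                     # the worked month
candidates = range(m - 5, m + 60)  # arbitrary window around m

# The candidate run size under which the observed sample is most probable.
mle = max(candidates, key=lambda n: likelihood(n, m, k))
print(mle)

# How much less probable the same sample is if the run was really 270 long.
relative = likelihood(270, m, k) / likelihood(m, m, k)
print(relative)
```

Every step past m multiplies the likelihood by a factor below one, which is why the maximum sits at the left edge of the permitted range and can only ever be too low.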
The good news is that this bias is not a vague worry — it is an exact, computable quantity. Over all possible samples of size k from 1…N, the expected value of the sample maximum is
E[max] = k · (N + 1) / (k + 1).
For N = 270, k = 15 that is 254.06 — so on average the largest serial you see falls about sixteen short of the truth. The bias is exactly
E[max] − N = −(N − k) / (k + 1) = −15.94 here,
and a Monte-Carlo run of 400,000 simulated months lands the average observed maximum at 254.07, dead on the formula. The biggest number you've seen is a biased estimator, and we know the size and the sign of the lie: on average it leans low, by exactly (N − k)/(k + 1). But an error you can write down is an error you can subtract off. That is the whole move.
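The expectation formula can be checked two ways: exactly, by summing over the distribution of the maximum, and approximately, by simulation. This sketch relies on the standard fact that P(max = j) = C(j − 1, k − 1) / C(N, k) for a sample drawn without replacement. The trial count is far smaller than the 400,000 quoted above and the seed is arbitrary, so the simulated figure will differ slightly from the one in the text.

```python
import random
from fractions import Fraction
from math import comb

def expected_max(n, k):
    """Exact E[max] of k distinct draws from 1..n, summed over the pmf of the max."""
    total = comb(n, k)
    return sum(Fraction(j * comb(j - 1, k - 1), total) for j in range(k, n + 1))

N, K = 270, 15

exact = expected_max(N, K)              # 4065/16 = 254.0625
closed = Fraction(K * (N + 1), K + 1)   # the closed form k(N + 1)/(k + 1)
bias = exact - N                        # -(N - k)/(k + 1) = -255/16

print(exact == closed, float(exact), float(bias))

# A small Monte-Carlo check (assumed: 20,000 trials, arbitrary seed).
rng = random.Random(0)
trials = 20_000
simulated = sum(max(rng.sample(range(1, N + 1), K)) for _ in range(trials)) / trials
print(simulated)
```

Using `Fraction` keeps the enumeration exact, so the agreement with the closed form is an equality rather than a rounding coincidence.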
Reverse the bias formula. On average the observed maximum comes to k/(k+1) of N + 1, so scaling it back up by (k+1)/k and subtracting 1 gives an estimator with the bias removed:
N̂ = m · (k + 1) / k − 1.
There is a second way to write the very same expression that says, in plain English, what it is doing:
N̂ = m + (m − k) / k = (the largest serial) + (the average gap between serials).
You have seen k serials, the biggest being m; that means m − k of the numbers at or below m are ones you didn't see, spread as gaps among the k you did — an average gap of (m − k)/k. The line you can see ends at m; the estimator adds one more average gap to reach for the line you can't. On the worked month — m = 243, average gap (243 − 15)/15 = 15.2 — that is N̂ = 243 + 15.2 = 258.2 against a truth of 270. And crucially it is unbiased: average it over all possible samples and it lands exactly on N, with no systematic lean. The 400,000-month simulation confirms it: mean estimate 270.00 against a true 270. The correction is not a fudge; it is the exact inverse of a known bias.
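Both forms of the estimator, and the claim that it is exactly unbiased, can be verified in exact arithmetic. In this sketch the function names are illustrative, and the unbiasedness check again assumes P(max = j) = C(j − 1, k − 1) / C(N, k).

```python
from fractions import Fraction
from math import comb

def n_hat(m, k):
    """Bias-corrected estimate: m(k + 1)/k - 1."""
    return Fraction(m * (k + 1), k) - 1

def n_hat_gap_form(m, k):
    """The same estimate written as largest serial plus average gap."""
    return m + Fraction(m - k, k)

def mean_of_estimator(n, k):
    """Exact average of n_hat over every possible sample of size k from 1..n."""
    total = comb(n, k)
    return sum(Fraction(comb(j - 1, k - 1), total) * n_hat(j, k)
               for j in range(k, n + 1))

estimate = n_hat(243, 15)
print(float(estimate))                        # 258.2, as in the worked month
print(n_hat_gap_form(243, 15) == estimate)    # the two forms agree
print(mean_of_estimator(270, 15))             # lands exactly on N = 270
```

The last line is the unbiasedness claim made concrete: averaged over all C(270, 15) possible months, the estimate neither leans high nor low.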
This is the estimator the Allied analysts used, and it is the minimum-variance unbiased estimator of N — the provably best one of its kind. That last claim is the subject of the next section, because "unbiased" alone is a much weaker virtue than it sounds, and the real reason this estimator won is hiding inside the word minimum-variance.
Unbiasedness is cheap. Here is a different unbiased estimator that looks more sophisticated — it uses all the data, not just the maximum. The sample mean of a run 1…N has expectation (N + 1)/2, so Ñ = 2 · (sample mean) − 1 is also exactly unbiased. It "feels" better: surely using all fifteen numbers beats leaning on one of them? On the worked month it gives 2 × 116.6 − 1 = 232.2 — further from the truth than the maximum-based estimate, and that is no accident.
The thing that separates a good estimator from a merely-unbiased one is its variance — how much it jumps around from sample to sample. The maximum-based MVUE has variance (N − k)(N + 1) / [k(k + 2)], which for the worked numbers is exactly 271, a standard deviation of 16.5. The "twice the mean" estimator has, for sampling without replacement, the variance (N + 1)(N − k) / (3k) — here 1535.7, a standard deviation of 39.2. Divide one variance by the other and almost everything cancels:
Var(twice the mean) / Var(N̂) = (k + 2) / 3.
The ratio depends on nothing but k: not on N, not on how many tanks were built, only on how many you caught. At k = 15 it is exactly 17/3 ≈ 5.667: the estimator that throws away everything but the maximum has less than a fifth of the mean-squared error (3/17 ≈ 0.18) of the one that averages all the evidence. (The Monte-Carlo run lands on 5.65, the closed form's sampling shadow.) Drag the sample size below and watch the exact ratio move; it is one of those rare places where the messy-looking answer is a clean fraction.
The reason is structural. For a uniform run, the sample maximum is a sufficient statistic: it carries every drop of information the sample holds about N, and the other serials add nothing once you know the largest. The mean, by contrast, throws away exactly the information that matters — the upper edge — and pays for it in variance. This is the quiet lesson the analysts were living: against a uniform run, the right move is not to average your evidence but to take its extreme and correct it. The biggest serial you have seen is almost the whole story; the average gap finishes it.
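The two variance formulas and their ratio are easy to put side by side with a simulation. The sketch below uses the closed forms quoted above; the trial count and seed are arbitrary assumptions, so the simulated ratio will land near, but not exactly on, the 5.65 reported in the text.

```python
import random
from fractions import Fraction

def var_max_based(n, k):
    """Variance of m(k + 1)/k - 1: (N - k)(N + 1) / [k(k + 2)]."""
    return Fraction((n - k) * (n + 1), k * (k + 2))

def var_twice_mean(n, k):
    """Variance of 2 * mean - 1 without replacement: (N + 1)(N - k) / (3k)."""
    return Fraction((n + 1) * (n - k), 3 * k)

def variance(xs):
    """Plain population variance of a list of floats."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

N, K = 270, 15
ratio = var_twice_mean(N, K) / var_max_based(N, K)
print(var_max_based(N, K), float(var_twice_mean(N, K)), ratio)

# Simulated check (assumed: 40,000 months, arbitrary seed).
rng = random.Random(7)
max_estimates, mean_estimates = [], []
for _ in range(40_000):
    sample = rng.sample(range(1, N + 1), K)
    max_estimates.append(max(sample) * (K + 1) / K - 1)
    mean_estimates.append(2 * sum(sample) / K - 1)

sim_ratio = variance(mean_estimates) / variance(max_estimates)
print(sim_ratio)
```

Trying other values of `N` with `K` held fixed leaves `ratio` unchanged, which is the cancellation the text describes.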
A frequentist returns a single best estimate. A Bayesian returns the whole shape of what the data permit — and on this problem the two agree, which is itself a small reassurance that neither is doing something strange.
Put a flat prior over every N ≥ m (every production figure at least as large as the largest serial you saw is, a priori, equally plausible). The likelihood of having drawn your particular sample given N is 1 / C(N, k), one over the number of ways to choose k serials from N, so the posterior is proportional to 1/C(N, k), peaking at N = m and decaying as N grows past it. For k > 2 the posterior mean has a clean closed form,
E[N | m, k] = (m − 1)(k − 1) / (k − 2),
which for the worked month gives 260.6. The frequentist's 258.2 and the Bayesian's 260.6 sit within a percent of each other, both a little under the true 270. The Bayesian's bonus is the part a point estimate cannot give: a 95% credible interval, read straight off the posterior — for the worked month, [243, 313], with the truth sitting comfortably inside. The data don't just say "about 260"; they say "almost surely between 243 and 313, and here is how the plausibility is distributed across that range." From fifteen serial numbers.
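The posterior can also be built numerically and read off directly. This is a sketch under two assumptions the text does not state: the infinite support is truncated at 20 times m, where the remaining tail is negligible for k = 15, and the 95% interval is taken as equal-tailed, running from the 2.5% to the 97.5% posterior quantile. The function names are illustrative.

```python
from math import comb

def posterior(m, k, cap_factor=20):
    """Normalised posterior over N = m..cap_factor*m, proportional to 1 / C(N, k)."""
    support = range(m, cap_factor * m + 1)   # assumed truncation of the tail
    weights = [1.0 / comb(n, k) for n in support]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}

def quantile(post, q):
    """Smallest N whose cumulative posterior probability reaches q."""
    running = 0.0
    for n, p in post.items():
        running += p
        if running >= q:
            return n
    return max(post)

m, k = 243, 15
post = posterior(m, k)

mean = sum(n * p for n, p in post.items())
closed_form = (m - 1) * (k - 1) / (k - 2)   # the closed form quoted above
lower, upper = quantile(post, 0.025), quantile(post, 0.975)

print(round(mean, 1), round(closed_form, 1))
print(lower, upper)
```

The lower end sits at m itself because the single value N = m already carries more than 2.5% of the posterior mass, so no smaller quantile exists to report.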
None of this would be more than a tidy exercise if the German records had stayed sealed. They did not. After 1945 the Allies recovered the production ledgers of the Reich's war economy, and could finally mark their wartime guesses against the truth. The serial-number statisticians — their method written up in 1947 by Richard Ruggles and Henry Brodie in the Journal of the American Statistical Association — turned out to have been startlingly close, where conventional intelligence had been wildly, consistently high. These three months are cited data; only the error columns are recomputed here, live:
| Month | Statistical | Intelligence | German records | Stat. error | Intel. overshoot |
|---|---|---|---|---|---|
This is a clean, almost magical-feeling result, which is exactly the kind that needs its assumptions read aloud rather than smoothed away. Five places the method strains, named:
And the line under all of it: the question "how many were built?" looks unanswerable from a handful of captured machines, and is not, because the enemy wrote the answer on every tank without knowing it. The serial number is administrative exhaust — a clerk's tracking mark — and it turned out to carry, in the spacing between the few you could catch, the size of the whole. The deepest moat of a real-world statistic is that it reads a signal the adversary did not know he was sending.
Ruggles, R. & Brodie, H. (1947). An Empirical Approach to Economic Intelligence in World War II. Journal of the American Statistical Association 42(237):72–91 — the wartime serial-number method and the production-comparison table.
The estimator. The minimum-variance unbiased estimator N̂ = m(k+1)/k − 1, its variance (N−k)(N+1)/[k(k+2)], and the Bayesian posterior mean (m−1)(k−1)/(k−2) are standard results in the theory of order statistics and point estimation; see any mathematical-statistics text under "German tank problem" / "estimating the maximum of a discrete uniform distribution." The closed-form efficiency (k+2)/3 over the twice-the-mean estimator, and the without-replacement variance (N+1)(N−k)/(3k) it rests on, are derived and checked in this stratum's verifier.
The Panther road-wheel example (≈270 estimated vs. 276 recorded for February 1944) and the postwar finding that Allied statistical estimates outperformed both intelligence estimates and Germany's own figures are as reported in the standard secondary accounts of Ruggles & Brodie's study.