Artificial Wasteland — a portal across three layers

Every Number Honest

portal · Will Rogers, Simpson, and Berkson — three ways a statistic lies with no number wrong

Three layers of this place end on the same sentence: every underlying number is correct — and the conclusion still lies. They do not lie the same way.

I · Three layers, one closing line

Read three of this archive's layers to their last paragraph and you keep hitting the same chord, almost word for word. The Migration That Heals No One ends on a scanner that lifts cancer survival in every stage at once while the whole group does not move — and not one number anywhere is altered. The Bias in the Sum ends on Berkeley's 1973 admissions, where men were favoured overall and women favoured in four of six departments — both true, from the same table, no figure wrong. The Bias in the Sample ends on two diseases that are independent in the city and locked in a tradeoff in the hospital — every count honest, the link spurious.

Each says the same impossible-sounding thing: I can lie to you without writing down a single false number. It is tempting to file all three under one heading (statistics deceive) and stop. That is the mistake this portal exists to correct. The three are not the same lie. Lay them on one axis and a structure appears: there are exactly three honest operations you can perform on a grouping, and each layer performs a different one.

A group statistic is built in two strokes: you partition a population into cells, then you summarise within and across them. Corrupt the first stroke and you get reclassification; corrupt the recombination and you get aggregation; corrupt the very domain the partition acts on and you get selection. So this is the claim, recomputed in your browser and re-checked offline: the three layers are one move seen three ways — move the boundary, collapse across it, choose who falls inside — and in none of them is any number wrong. And then the part none of the three states alone: which grouped figure is the truth is not a question the data can answer. It is fixed by the arrow.

II · Move the boundary — reclassification

The first operation touches the partition itself. Keep every patient and every survival time untouched; only redraw the line between two groups, and watch both groups' averages climb. This is the Will Rogers phenomenon, after his line: "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states." A sharper scanner finds hidden metastases in patients who looked localized and moves them to the worse stage: out of the "localized" column, into the "metastatic" one. Below, dial a scanner from old film to new: every migrating patient (gold) had a worse prognosis than the localized average and a better one than the metastatic average, so leaving lifts the first mean and arriving lifts the second.

the scanner — reclassify, and watch both means rise

[interactive: slider for scanner precision, from old film to new CT/PET; two panels of survival chips, localized and metastatic]

Each chip is a survival time in months; none ever changes. A migrating patient's survival x lies strictly between the two group means, so removing it raises the localized mean (x was below it) and adding it raises the metastatic mean (x is above it). The exact band is mean(metastatic) < x < mean(localized), checked live on the member page over 30,000 random configurations with zero exceptions. The grand mean over everyone cannot move: no one left the room.
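A minimal sketch of one such migration, in exact arithmetic. Only the group means (localized 42 to 48, metastatic 7.5 to 9.6, grand mean 240/9) come from the worked example quoted in the apparatus notes; the individual survival times below are hypothetical, invented so that the means match.

```python
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))

# Hypothetical survival times in months. Only the resulting group means
# (42 and 7.5) are taken from the worked example; the values are invented.
localized = [18, 36, 44, 52, 60]
metastatic = [3, 6, 9, 12]
migrant = 18  # the patient the sharper scanner moves to "metastatic"

before = (mean(localized), mean(metastatic), mean(localized + metastatic))

new_localized = [t for t in localized if t != migrant]
new_metastatic = metastatic + [migrant]
after = (mean(new_localized), mean(new_metastatic),
         mean(new_localized + new_metastatic))

# Closed-form change in each mean when x leaves A and joins B.
delta_localized = (before[0] - migrant) / (len(localized) - 1)
delta_metastatic = (migrant - before[1]) / (len(metastatic) + 1)

print("localized :", float(before[0]), "->", float(after[0]))
print("metastatic:", float(before[1]), "->", float(after[1]))
print("grand mean:", before[2], "->", after[2])
```

Both group means rise while the grand mean stays put, because the migrant sits inside the band between the two original means.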

III · Collapse across the boundary — aggregation

The second operation leaves the partition alone and corrupts the recombination. Compute a fair rate inside every cell, then pool the cells into one number, and the pooled comparison can point the opposite way from every cell. This is Simpson's paradox, and its most famous real instance is the University of California, Berkeley graduate admissions of fall 1973. Pool the six largest departments and men were admitted at 44.5%, women at 30.4% — the gap that starts lawsuits. Split by department and the gap softens, then reverses: women were admitted at the higher rate in four of the six. Then press standardize — hold both groups to one common mix of departments — and the fourteen-point deficit changes sign.

berkeley 1973 — pool, split, standardize

The hidden mechanism, exact. A pooled rate is just the applicant-weighted average of the within-department rates — that identity holds to the last digit. So the pooled comparison can differ from the within comparison for one reason only: men and women carried different department weights. Women applied in force to the hardest departments (C–F admitted under 37%); men clustered in the easy ones (A, B, over 60%). Set every department to admit both sexes at the same rate and a "men-favoured" gap of several points still appears, built from the weights alone — a covariance between how hard a department is and who applied there. Counts transcribed from R's UCBAdmissions (Bickel, Hammel & O'Connell, Science 1975); every rate recomputed here in exact integer arithmetic.
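The three views (pool, split, standardize) can be sketched in exact rationals as below. The counts are the six-department table from R's UCBAdmissions; the helper names are hypothetical, and using each department's overall admission rate for the "same rate for both sexes" counterfactual is an assumption of this sketch, not necessarily the construction the member page uses.

```python
from fractions import Fraction

# (admitted, applicants) per department, from R's UCBAdmissions dataset.
MEN = {"A": (512, 825), "B": (353, 560), "C": (120, 325),
       "D": (138, 417), "E": (53, 191), "F": (22, 373)}
WOMEN = {"A": (89, 108), "B": (17, 25), "C": (202, 593),
         "D": (131, 375), "E": (94, 393), "F": (24, 341)}

def pooled_rate(group):
    """Admission rate with all departments lumped together."""
    admitted = sum(a for a, _ in group.values())
    applicants = sum(n for _, n in group.values())
    return Fraction(admitted, applicants)

def within_rates(group):
    """Admission rate inside each department."""
    return {d: Fraction(a, n) for d, (a, n) in group.items()}

def weighted_rate(rates, weights):
    """Average of per-department rates under a given applicant mix."""
    total = sum(weights.values())
    return sum(rates[d] * weights[d] for d in rates) / total

men_rates, women_rates = within_rates(MEN), within_rates(WOMEN)
men_mix = {d: n for d, (_, n) in MEN.items()}
women_mix = {d: n for d, (_, n) in WOMEN.items()}

# Identity: the pooled rate is the applicant-weighted within-department rate.
identity_holds = (weighted_rate(men_rates, men_mix) == pooled_rate(MEN)
                  and weighted_rate(women_rates, women_mix) == pooled_rate(WOMEN))

pooled_gap = pooled_rate(MEN) - pooled_rate(WOMEN)
women_higher = [d for d in MEN if women_rates[d] > men_rates[d]]

# Standardize: hold both sexes to the combined applicant mix.
combined_mix = {d: men_mix[d] + women_mix[d] for d in MEN}
standardized_gap = (weighted_rate(women_rates, combined_mix)
                    - weighted_rate(men_rates, combined_mix))

# Counterfactual (assumed form): both sexes admitted at the department's
# overall rate, so any remaining gap comes from the weights alone.
dept_rates = {d: Fraction(MEN[d][0] + WOMEN[d][0], combined_mix[d]) for d in MEN}
weights_only_gap = (weighted_rate(dept_rates, men_mix)
                    - weighted_rate(dept_rates, women_mix))

print("pooled gap (men - women):", round(float(pooled_gap) * 100, 1), "points")
print("women ahead in:", women_higher)
print("standardized gap (women - men):", round(float(standardized_gap) * 100, 1), "points")
print("weights-only gap still favours men:", weights_only_gap > 0)
```

The only inputs that differ between `pooled_gap` and `standardized_gap` are the department weights; the within-department rates are identical in both.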

The honest footnote the member page insists on: conditioning on department explains the gap; it does not certify that no bias existed. If women were steered toward the harder, less-funded departments earlier on, that is a real bias the within-department view cannot see — it has been moved upstream, not erased. Which is the first hint of the resolution this portal is walking toward: the arithmetic alone never tells you which level to believe.

IV · Choose who falls inside — selection

The third operation corrupts the domain: it never touches the partition or the summary, only restricts which individuals are in the population at all. This is Berkson's paradox, or collider bias. Give everyone two independent scores — two talents, two symptoms, two anything — drawn with no relationship whatsoever. Now keep only those whose scores sum past a bar: the people good enough overall to make the hospital, the shortlist, the dating pool. Inside that selected slice the two scores are anticorrelated, sharply — because to clear a high bar with a low score on one, you needed a high score on the other. Drag the bar and watch a correlation appear from nothing.

the selected slice — two independent scores, conditioned on their sum

[interactive: slider for the selection bar, with stops at keep everyone, raise the bar, and top sliver]

The full population correlation is zero — the two scores are independent by construction. Among the kept (oxide) points it is negative, and deepens as the bar rises: at the median cut the exact value is −1/(π−1) ≈ −0.467; tighter cuts approach −1 but never reach it. The discrete twin — two independent diseases, kept only if either lands you in hospital — gives the same spurious link, with an odds ratio among the admitted equal to the odds ratio of the admission rule itself, prevalence nowhere in it. Both are recomputed offline; the constant is confirmed here against four million simulated draws.
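Both halves of this claim can be sketched in a few lines. The simulation below uses far fewer draws than the four million quoted, and the admission probabilities in `S` and the prevalences passed to `admitted_odds_ratio` are hypothetical values chosen for illustration; none of the names come from the portal's own code.

```python
import math
import random
from fractions import Fraction

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def selected_correlation(n_draws, bar, seed=0):
    """Correlation of two independent N(0,1) scores among pairs with x + y > bar."""
    rng = random.Random(seed)
    kept_x, kept_y = [], []
    for _ in range(n_draws):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if x + y > bar:
            kept_x.append(x)
            kept_y.append(y)
    return pearson(kept_x, kept_y)

closed_form = -1.0 / (math.pi - 1.0)            # median cut, about -0.4669
simulated = selected_correlation(200_000, bar=0.0)

def admitted_odds_ratio(p1, p2, s):
    """Odds ratio between two independent diseases among the admitted.

    p1, p2 are the prevalences; s[a, b] is the probability of admission
    for someone with disease indicators (a, b).
    """
    cell = {}
    for a in (0, 1):
        for b in (0, 1):
            pa = p1 if a else 1 - p1
            pb = p2 if b else 1 - p2
            cell[a, b] = pa * pb * s[a, b]
    return (cell[1, 1] * cell[0, 0]) / (cell[1, 0] * cell[0, 1])

# Hypothetical admission rule: either disease makes admission more likely.
S = {(1, 1): Fraction(9, 10), (1, 0): Fraction(6, 10),
     (0, 1): Fraction(5, 10), (0, 0): Fraction(1, 10)}
rule_or = S[1, 1] * S[0, 0] / (S[1, 0] * S[0, 1])

print("simulated:", round(simulated, 3), "closed form:", round(closed_form, 4))
print("admitted OR:", admitted_odds_ratio(Fraction(1, 10), Fraction(3, 10), S),
      "rule OR:", rule_or)
```

Changing the prevalences passed to `admitted_odds_ratio` leaves the result unchanged, because each prevalence appears once in the numerator and once in the denominator and cancels.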

V · The same numbers, three roles — and the arrow that decides

So three operations, three lies, no number wrong. But the portal's real claim is sharper than "grouping is dangerous," and none of the three layers states it, because each holds only its own corner. Here it is. Take a single table, one fixed set of honest counts, in which a treatment beats a placebo within mild cases and within severe cases, yet loses when you pool them. This is a true Simpson reversal; the rates are those of the kidney-stone study of Charig et al. (1986), which compared two treatments across small and large stones, relabelled here as treatment versus placebo and mild versus severe. You now have two answers, the within and the pooled, and they disagree about whether the treatment works. Which one is true?

The unsettling answer: the table cannot tell you. The choice is decided entirely by what the grouping variable is — by where its arrow points — and the very same numbers give opposite advice depending on the answer. Pick the role below.

one table, three causal roles — the verdict flips, the numbers do not

group	treatment	placebo	winner
mild	93%	87%	treatment
severe	73%	69%	treatment
pooled	78%	83%	placebo

Read the dial and the whole portal resolves. When the grouping variable is a common cause of both treatment and outcome — a confounder, severity driving both who got the drug and who survived — you must split, and the within answer is the true one: the treatment works. When it is a common effect — a collider, the thing you accidentally selected on — you must not split; conditioning on it is what manufactured the reversal, and the pooled answer is true. When it is a step on the causal path — a mediator — splitting answers a different question than the one you asked. Three diagrams, one table, opposite advice. No statistic computed from the numbers can distinguish the three — this is the content of the result the member layers circle and never name: the resolution of every grouping paradox lives outside the data, in the causal structure you bring to it (Pearl, 2009).
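A small sketch of the dial, under stated assumptions. The counts are the commonly quoted ones from Charig et al. (1986), which reproduce the percentages the portal reports, with the portal's relabelling into treatment/placebo and mild/severe. The function names are hypothetical, and reading the pooled comparison as the total effect in the mediator case is one standard convention adopted here, not something the portal specifies.

```python
from fractions import Fraction

ARMS = ("treatment", "placebo")

# (successes, patients), relabelled as in the portal.
TABLE = {
    "mild":   {"treatment": (81, 87),   "placebo": (234, 270)},
    "severe": {"treatment": (192, 263), "placebo": (55, 80)},
}

def within(group, arm):
    """Success rate for one arm inside one group."""
    successes, patients = TABLE[group][arm]
    return Fraction(successes, patients)

def pooled(arm):
    """Success rate for one arm with the groups lumped together."""
    successes = sum(TABLE[g][arm][0] for g in TABLE)
    patients = sum(TABLE[g][arm][1] for g in TABLE)
    return Fraction(successes, patients)

def adjusted(arm):
    """Within-group rates averaged over the combined group mix."""
    sizes = {g: sum(n for _, n in TABLE[g].values()) for g in TABLE}
    total = sum(sizes.values())
    return sum(within(g, arm) * sizes[g] for g in TABLE) / total

def winner(compare):
    """Arm with the higher rate under the given comparison."""
    return max(ARMS, key=compare)

def verdict(role):
    """Which comparison to read, given the ASSUMED role of the grouping variable."""
    if role == "confounder":   # common cause: split and adjust
        return "within", winner(adjusted)
    if role == "collider":     # common effect: do not condition on it
        return "pooled", winner(pooled)
    if role == "mediator":     # on the causal path: pooled read as total effect
        return "pooled (total effect)", winner(pooled)
    raise ValueError(f"unknown role: {role!r}")

for role in ("confounder", "collider", "mediator"):
    print(role, "->", verdict(role))
```

The role is an argument to `verdict`, not something computed from `TABLE`: the same counts return opposite winners depending on what the caller asserts about the world.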

That is where the three operations meet. Reclassification moves the boundary, aggregation collapses across it, selection chooses who's inside — and each can make an honest table say a false thing. What rescues you is never a better number. It is the arrow: knowing whether the thing you grouped by is a cause, an effect, or a step between. Every number can be honest, and the answer still a lie, until you say which way the world runs.

move the boundary

The Migration That Heals No One

Will Rogers. Reclassify patients and every group's average rises; the whole stays fixed.

collapse across

The Bias in the Sum

Simpson at Berkeley. Women ahead in four of six departments, men ahead when pooled.

choose who's inside

The Bias in the Sample

Berkson. Two independent traits, anticorrelated once you select on their sum.

The apparatus — what was checked

Recomputed in your browser and offline (32/32 machine checks pass — research/every-number-honest/verify.mjs):

— Reclassification. The Will Rogers identity is proved exactly: moving one element x from set A to set B gives Δmean(A) = (mean(A)−x)/(|A|−1) and Δmean(B) = (x−mean(B))/(|B|+1), so both means rise ⇔ mean(B) < x < mean(A) — checked against directly recomputed means over 30,000 random configurations with zero disagreements, and the grand mean shown invariant. The clinical worked example (localized 42→48, metastatic 7.5→9.6, grand mean 240/9 fixed) is reproduced to the digit.

— Aggregation. Berkeley's six departments are transcribed verbatim from R's UCBAdmissions (the canonical form of Bickel, Hammel & O'Connell, Science 1975). Recomputed in exact rationals: pooled men 1198/2691 = 44.5% vs women 557/1835 = 30.4% (a 14.2-point gap); women's rate higher in 4 of 6 departments (A, B, D, F); standardizing to the combined applicant mix flips the sign to women +4.3 points. The collapsibility identity (pooled = applicant-weighted within average) is checked to the digit, and the "all rates equal" counterfactual shows the gap is a positive department-weight covariance.

— Selection. The Gaussian sum-threshold correlation is checked two ways. The closed form is −δ/(2−δ), where δ is the fraction of the standardized sum's variance removed by the cut; at the median cut δ = 2/π, so the correlation equals −1/(π−1) ≈ −0.4669, and this is confirmed against a 4,000,000-draw Monte Carlo run. The discrete identity (odds ratio among the admitted = (s₁₁s₀₀)/(s₁₀s₀₁), the odds ratio of the admission rule) is shown invariant to both disease prevalences to machine precision, turning two independent diseases (population OR = 1) into a spurious negative link.

— The umbrella. The kidney-stone table (Charig et al., 1986) is verified to be a genuine Simpson reversal: treatment beats placebo within mild (93% vs 87%) and within severe (73% vs 69%), yet loses pooled (78% vs 83%).
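The randomized check described in the reclassification item above might look like the following. This is a sketch of that kind of check, not the contents of verify.mjs: the group sizes, value range, seed, and function names are all assumptions.

```python
import random
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))

def check_configuration(rng):
    """Move one random element from A to B and test the stated identities."""
    a = [rng.randint(0, 120) for _ in range(rng.randint(2, 8))]
    b = [rng.randint(0, 120) for _ in range(rng.randint(1, 8))]
    i = rng.randrange(len(a))
    x = a[i]
    a_after = a[:i] + a[i + 1:]
    b_after = b + [x]

    # Directly recomputed changes in each group mean.
    d_a = mean(a_after) - mean(a)
    d_b = mean(b_after) - mean(b)

    formula_ok = (d_a == (mean(a) - x) / (len(a) - 1)
                  and d_b == (x - mean(b)) / (len(b) + 1))
    both_rise = d_a > 0 and d_b > 0
    band_ok = both_rise == (mean(b) < x < mean(a))
    grand_ok = mean(a + b) == mean(a_after + b_after)
    return formula_ok and band_ok and grand_ok

def count_disagreements(n_configs=30_000, seed=0):
    """Number of random configurations that violate any of the identities."""
    rng = random.Random(seed)
    return sum(not check_configuration(rng) for _ in range(n_configs))

disagreements = count_disagreements()
print("disagreements:", disagreements)
```

Integer inputs and `Fraction` arithmetic keep every comparison exact, so a nonzero count would indicate a real counterexample rather than rounding noise.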

Cited to primary sources, not recomputed (and marked as such):

— The Will Rogers phenomenon in oncology was named and measured by Feinstein, Sosin & Wells [NEJM 1985], with a national-scale replication in Chee et al. [2008, 12,395 lung-cancer patients].

— The Berkeley reanalysis and its "small but statistically significant bias in favor of women" are due to Bickel, Hammel & O'Connell [Science 1975]; the standardized figure here uses a combined-applicant standard population and so differs in magnitude from any particular published adjustment, while agreeing in sign.

— Berkson's paradox is Berkson [Biometrics Bulletin 1946]; the birth-weight crossing it explains is Yerushalmy [1971] / Wilcox [2001], attributed to collider bias by Hernández-Díaz, Schisterman & Hernán [2006]. The kidney-stone Simpson reversal is Charig et al. [BMJ 1986].

— The load-bearing reduction — that confounder, collider and mediator are causal roles a single table cannot distinguish, so the resolution of a grouping paradox is not in the data — is Pearl [Causality, 2009; "Simpson's Paradox: An Anatomy"].

The honesty line. The portal asserts exactly one thing the three member layers do not: that their shared closing note — every number honest, the conclusion false — resolves into three distinct operations on a grouping (move / collapse / restrict), and that which level reads true is fixed not by arithmetic but by the causal role of the grouping variable. Everything numerical under that claim is recomputed above; everything historical is cited; and the borrowed reduction (the three causal junctions) is attributed to the literature that earns it.