The Shape the Numbers Can't See

Four datasets. The same mean, the same variance, the same correlation, the same line of best fit — and four completely different pictures. Recomputed live, in front of you.

A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.
F. J. Anscombe, Graphs in Statistical Analysis, 1973

In 1973 the statistician Francis Anscombe published four small tables of numbers. He had built them by hand to carry a single, slightly mischievous lesson — and the lesson has outlived almost everything else he wrote.

Each of his four datasets has eleven points. Run the standard summary statistics on any of them and you get, to the precision he reported, the same answers: the average x is 9, the average y is 7.5, the variances match, the correlation is 0.816, and the least-squares line of best fit is the same line in all four — ŷ = 3.00 + 0.50x. By every number a 1973 statistics package would print, the four are identical twins.
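The claim is easy to recheck outside the page. What follows is a minimal sketch in Python, assuming the eleven points of each set as they appear in the standard published table (transcribed here, not read from this page's embedded data). The names `QUARTET` and `summarize` are illustrative and are not taken from the page's own verify.mjs.

```python
from math import sqrt

# Anscombe's quartet, transcribed from the standard published table.
# Assumption: these values match the points embedded in the page.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, correlation and least-squares line for one set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```

Raising the rounding from two digits to four or five is the quickest way to see where the four sets stop agreeing.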

Then you plot them. Here they are, with their shared regression line drawn through each. Read the table of statistics underneath: the columns are the same. The pictures are not.

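Instrument I · Anscombe's quartet · recomputed live
[Interactive figure: the four scatterplots with their shared regression line, above a table of statistics with columns I, II, III and IV.]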

The summary statistics are blind to shape. They were never a description of the data — only a few of its moments, and a great many different clouds share the same moments.

II · What the line refuses to show

The four are not random near-misses. Anscombe engineered each to fail a different assumption that the regression line silently makes — so the quartet is really a checklist of the ways a single fitted line can be a lie.

I is the honest one: a roughly linear scatter with ordinary noise, the case the line was made for. II is a perfect downward parabola — the relationship is real and strong, but it is not a line, and the straight fit averages the curve into nonsense (fit a quadratic and the curvature it finds is some four times larger than in I). III is a tidy straight line with one outlier dragging the slope; delete that single point and the eleven become ten almost-collinear points whose true slope is gentler — the line you fitted belongs to no one. IV is the cruelest: ten points stacked at x = 8 and a single point far out at x = 19. The entire slope, the entire correlation, rests on that one high-leverage point. Move it and the line swings freely; the other ten say nothing at all about slope.
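The fragility of III and IV can be sketched in a few lines. This is an illustration only, assuming the standard published values for sets III and IV; `fit_line` is a hypothetical helper, not the page's code. It drops one point from each set and refits.

```python
def fit_line(xs, ys):
    """Least-squares (slope, intercept), or None when every x is identical."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # no spread in x, so the slope is undefined
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


# Assumed values: sets III and IV from the standard published table.
X3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Set III: find the point with the largest absolute residual, drop it, refit.
slope3, intercept3 = fit_line(X3, Y3)
resid3 = [y - (intercept3 + slope3 * x) for x, y in zip(X3, Y3)]
worst = max(range(len(X3)), key=lambda i: abs(resid3[i]))
fit3_trimmed = fit_line(X3[:worst] + X3[worst + 1:], Y3[:worst] + Y3[worst + 1:])

# Set IV: drop the lone point at x = 19 and try to refit the remaining ten.
far = X4.index(19)
fit4_trimmed = fit_line(X4[:far] + X4[far + 1:], Y4[:far] + Y4[far + 1:])

print("III, all eleven points:", (slope3, intercept3))
print("III, outlier removed:  ", fit3_trimmed)
print("IV, far point removed: ", fit4_trimmed)
```

Under these assumed values the trimmed fit for III has a slope near 0.345 rather than 0.50, and for IV the refit returns None: with ten points at the same x there is no slope left to estimate.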

Turn on residuals above and the four diverge instantly: in I they scatter, in II they bow into a clean arch, in III one spike towers over an otherwise tidy, slightly tilted row, and in IV ten of them pile into a single vertical stack at x = 8 while the lone point at x = 19 sits exactly on the line. Every one of those signatures is invisible in the table of summary numbers. That was Anscombe's whole point, and it is why he wanted the computer to draw the picture, not only print the digits.
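Two of those residual signatures can be read without a plot. The sketch below assumes the standard published values for sets II and IV; `residuals` is an illustrative helper. For II it lists the sign of each residual in order of increasing x; for IV it separates the stack at x = 8 from the single far point.

```python
def residuals(xs, ys):
    """Residuals y - (intercept + slope * x) from the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Assumed values: sets II and IV from the standard published table.
X2 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Set II: residual signs from the smallest x to the largest.
res2_by_x = [r for _, r in sorted(zip(X2, residuals(X2, Y2)))]
signs2 = "".join("+" if r > 0 else "-" for r in res2_by_x)

# Set IV: ten residuals stacked at x = 8, one residual at x = 19.
res4 = residuals(X4, Y4)
stack = [r for x, r in zip(X4, res4) if x == 8]
far_residual = res4[X4.index(19)]

print("II signs by increasing x:", signs2)
print("IV stack spread at x = 8:", max(stack) - min(stack))
print("IV residual at x = 19:  ", far_residual)
```

The run of signs for II is negative at both ends and positive through the middle, which is the arch. In IV the far point has a residual of zero up to floating-point error, because a line fitted to only two distinct x values must pass through the mean of y at each.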

live check · recomputed in your browser from the eleven points of each set

What is identical, and to what precision

Honesty is the venue's one rule, and the popular telling slightly overstates this one. Computed exactly (the x's are integers and the y's have two decimals, so every mean, variance and slope is an exact fraction; see research/anscombes-quartet/verify.mjs), mean(x) = 9 and var(x) = 11 are exactly identical in all four sets. The rest match only to roughly the precision Anscombe reported: mean(y), slope and intercept agree to two decimals (7.50, 0.50, 3.00); the correlation runs from 0.8162 to 0.8165, usually quoted as 0.816, although set IV's 0.81652 strictly rounds to 0.817; the variance of y agrees only to about 4.12–4.13. They are not bit-for-bit twins. They are twins to the resolution of a 1973 printout, which was exactly enough to fool the reader who never looked.
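Exact arithmetic makes the distinction concrete. The following is a sketch using Python's `fractions.Fraction`, assuming the standard published values; it is not the page's verify.mjs, and `exact_stats` is an illustrative name. The y values are parsed from decimal strings so that no floating-point rounding enters. The correlation is left out because it involves a square root and is not, in general, a fraction.

```python
from fractions import Fraction

# Assumed values: the standard published table, with y kept as decimal text.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, "8.04 6.95 7.58 8.81 8.33 9.96 7.24 4.26 10.84 4.82 5.68"),
    "II": (X123, "9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74"),
    "III": (X123, "7.46 6.77 12.74 7.11 7.81 8.84 6.08 5.39 8.15 6.42 5.73"),
    "IV": (X4, "6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.50 5.56 7.91 6.89"),
}


def exact_stats(xs, y_text):
    """Means, sample variances and slope as exact rational numbers."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(token) for token in y_text.split()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "slope": sxy / sxx,
    }


EXACT = {name: exact_stats(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in EXACT.items():
    print(name, {key: f"{value} ~ {float(value):.6f}" for key, value in s.items()})
```

Comparing the fractions with `==` rather than comparing rounded floats is what separates "identical" from "identical as printed".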

III · Forty-four years later, a dinosaur

Anscombe drew his quartet by hand. In 2017 Justin Matejka and George Fitzmaurice asked a sharper question: given a target picture, can you force a dataset into that shape while holding its summary statistics fixed? Their method is a small, stubborn loop: nudge a point a hair in a random direction; keep the nudge only if it leaves the means, the standard deviations and the correlation essentially unchanged and moves the cloud a little closer to the target outline; repeat a few hundred thousand times. The statistics are the conserved quantity; the shape is free.
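The loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the target is assumed to be a circle, the starting cloud is random, every name and parameter (`morph`, `tol`, `step_size`, the circle's centre and radius) is hypothetical, and the simulated-annealing schedule of the published method, which occasionally accepts a nudge that moves away from the target, is left out.

```python
import math
import random


def stats(points):
    """(mean x, mean y, sd x, sd y, correlation) of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    syy = sum((p[1] - mean_y) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    return (mean_x, mean_y, math.sqrt(sxx / (n - 1)),
            math.sqrt(syy / (n - 1)), sxy / math.sqrt(sxx * syy))


def dist_to_circle(point, cx, cy, radius):
    """Distance from a point to the outline of the target circle."""
    return abs(math.hypot(point[0] - cx, point[1] - cy) - radius)


def mean_dist(points, circle):
    """Average distance of the cloud from the target outline."""
    return sum(dist_to_circle(p, *circle) for p in points) / len(points)


def morph(points, circle, steps, tol=0.005, step_size=0.3, seed=0):
    """Nudge points toward the circle while each statistic stays within tol."""
    rng = random.Random(seed)
    target = stats(points)  # the conserved quantities, fixed at the start
    pts = list(points)
    for _ in range(steps):
        i = rng.randrange(len(pts))
        old = pts[i]
        new = (old[0] + rng.gauss(0, step_size), old[1] + rng.gauss(0, step_size))
        if dist_to_circle(new, *circle) >= dist_to_circle(old, *circle):
            continue  # the nudge does not bring the point closer: discard it
        pts[i] = new
        if any(abs(a - b) > tol for a, b in zip(stats(pts), target)):
            pts[i] = old  # a statistic drifted past the tolerance: undo
    return pts


CIRCLE = (50.0, 50.0, 30.0)  # hypothetical target: centre (50, 50), radius 30
_rng = random.Random(1)
START = [(_rng.uniform(20, 80), _rng.uniform(20, 80)) for _ in range(142)]
RESULT = morph(START, CIRCLE, steps=5000)

print("stats before:", [round(v, 3) for v in stats(START)])
print("stats after: ", [round(v, 3) for v in stats(RESULT)])
print("mean distance to outline:", mean_dist(START, CIRCLE), "->", mean_dist(RESULT, CIRCLE))
```

The tolerance is measured against the statistics of the starting cloud, not the previous step, so small drifts cannot accumulate over many iterations.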

The target they made famous was a dinosaur — the Datasaurus, drawn by Alberto Cairo — and twelve companions. All thirteen carry the same summary statistics. Here they are. The numbers on the right are recomputed from the 142 points of whichever shape is showing. Press play, or pick a shape, and watch the picture change while the numbers hold still.

Instrument II · the Datasaurus Dozen · 13 shapes · one set of statistics
[Interactive figure: now showing dino · 142 points]

The statistics do not budge — to within a hundredth — across a dinosaur, a star, a circle, a bullseye, and nine other shapes. That is the same trick Anscombe pulled by hand, run by a machine to its logical extreme: the summary is so loose a description that you can hide a dinosaur inside it.

A small correction, since we recompute everything

The Datasaurus Dozen is usually described as sharing its statistics "to two decimal places." Recomputed from the published point sets (research/anscombes-quartet/verify.mjs), that is very nearly, but not exactly, true. Each statistic is held constant to within about ±0.005, which is the resolution the authors' search enforced; but a few values straddle a rounding boundary, so the two-decimal figure is not unique. The mean of x, for instance, runs from 54.260 to 54.270 across the thirteen sets, rounding to both 54.26 and 54.27. The honest statement is: the statistics agree to within a hundredth, not that they print the same two decimals every time. The headline survives intact; the footnote is where the truth lives.
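The difference between "agree to within a hundredth" and "print the same two decimals" fits in a few lines. The values below are hypothetical, chosen only to sit inside the 54.260 to 54.270 range quoted above; they are not the recomputed Datasaurus means, and both function names are illustrative.

```python
def agree_within(values, width):
    """True when the largest and smallest value differ by less than width."""
    return max(values) - min(values) < width


def printed_forms(values, places=2):
    """The distinct strings the values produce when printed to `places` decimals."""
    return sorted({f"{v:.{places}f}" for v in values})


# Hypothetical means of x for three shapes (illustration only).
means_x = [54.2611, 54.2634, 54.2688]

print(agree_within(means_x, 0.01))  # the values span less than a hundredth
print(printed_forms(means_x))       # yet they do not all print alike
```

The first check passes while the second returns two different strings, which is the situation the correction describes: a tolerance is a statement about distance, and rounding is a statement about which side of a boundary a value falls on.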

IV · The moral, and its modern shape

Anscombe's lesson is old and simple: plot your data. A handful of summary numbers, the things a regression spits out, are projections, and a projection throws away almost everything. Two datasets can agree on their first two moments and their covariance and still disagree about whether the relationship is a line, a curve, a fluke driven by one point, or nothing at all.

The modern shape of the lesson is less comforting. We have automated the summarising — dashboards, model metrics, a single correlation reported in a paper, a leaderboard score, a learned embedding's cosine — and the Datasaurus is the proof that any such summary has a dinosaur in its preimage: a dataset that scores identically and means something entirely different. The summary is necessary; it is never sufficient. Somewhere upstream, someone still has to look at the shape.

This is the venue's recurring confession, in a new register. The Bias in the Sum, The Migration and The Bias in the Sample each show a grouped statistic lying while every number stays honest. Those lies live in how the groups are handled — pooled, reclassified, selected. This one lives earlier, in the act of summarising at all: before any group is formed, the moments have already let the shape slip away.

How grouping lies — the trilogy

Three siblings where each individual figure is correct and the conclusion is still wrong: aggregation, reclassification, selection. This page is the prequel: the loss that happens before grouping, when the data is replaced by its moments.

The check, shown

Like The Fixed Point, this page recomputes its own central claim in front of you before asking you to believe it — here, the four columns of statistics and the Datasaurus's frozen numbers, live.

The same blindness in AI

A single eval score, a leaderboard rank, an embedding's cosine similarity — all summaries with enormous preimages. Two models, two corpora, two encodings can match on the number and diverge in shape. The Datasaurus is the existence proof.

Try it yourself

The eleven points of each Anscombe set and the 1,846 Datasaurus points are embedded in this page and re-summed in your browser. Change nothing, and the numbers still hold; that is the point. The offline recomputation is in research/anscombes-quartet/.

V · Sources