You want the whole set — all n coupons, drawn one at a time at random, with repeats. How long until you have them all? And why does it always feel like the last one will never come?
Every child who collected stickers knows the shape of this. The first few come easily; the album fills fast. Then it slows. You keep tearing open packets and getting ones you already have, and one stubborn coupon — number 47, say — simply will not appear. This is not bad luck or a rigged print run. It is arithmetic, and the arithmetic is exact: it has a name, the coupon collector's problem, and an answer written in the harmonic numbers.
The rules are as plain as they sound. There are n distinct coupons. Each draw gives you one of them, chosen uniformly at random, independent of everything before — so duplicates are not just possible, they are the whole problem. You stop the instant you hold a complete set. The question: how many draws does that take? Every number on this page is recomputed in your browser by the same routine the offline verifier runs.
Watch a collection fill. Each draw lands on a random slot; a slot you already own just flashes and is wasted. The bar at the top counts draws; the slots count distinct coupons. Notice how the two diverge — and how the very end crawls.
Here is the closed form, and it is short. Split the collection into phases: phase k begins the moment you hold k distinct coupons and ends when you find a new one. During phase k there are n−k coupons you still need, so each draw is new with probability (n−k)/n. The wait for a single success at probability p is, on average, 1/p — so phase k costs n/(n−k) draws on average. Add them up:
That Hn = 1 + ½ + ⅓ + … + 1/n is the harmonic number, and it grows with maddening slowness — like ln n + γ (γ ≈ 0.577, the Euler–Mascheroni constant). A standard die (n=6) takes 6·H6 = 14.7 rolls on average to show all six faces. A deck of 52 distinct cards: 236 draws. A set of 100: 519.
Read the phase costs and the cruelty is obvious. The first coupon costs exactly one draw — you cannot miss when everything is new. The last single coupon costs, on average, a full n draws, because at the end only one in n draws helps you. The collection time is not spread evenly; it is back-loaded, and brutally so.
Averages hide the drama. How variable is the collection time? Because the phases are independent, their variances add, and the total comes out to n²·(1 + ¼ + 1/9 + …) − n·Hn, whose leading term is n²·π²/6. So the standard deviation grows like n·π/√6 ≈ 1.28 n — only linearly, while the mean grows like n ln n. The distribution is sharp relative to its mean, but it has a long, heavy right tail: the runs that go on and on.
And there is a clean law waiting at the limit. Rescale the collection time by subtracting n ln n and dividing by n. As n grows, that standardised quantity stops depending on n at all and converges to a single fixed shape — the Gumbel distribution, the law of extremes (Erdős & Rényi, 1961):
The clean answer rests on three assumptions, each named here. Coupons are drawn uniformly — every coupon equally likely. (Real print runs deliberately make the last few rare; with unequal probabilities the expected time only goes up, and the formula becomes a sum over subsets.) Draws are independent, with replacement. And you need one of each; the cousin problem of collecting m copies of every coupon — the “double Dixie cup” problem (Newman & Shepp, 1960) — costs n(ln n + (m−1)·ln ln n + …), which this page does not derive.
One last connection, because it is the reason this layer sits beside two others. To collect the second half of the set — to go from n/2 distinct coupons to all n — costs n·(Hn − Hn/2) draws on average, which tends to n·ln 2. That very sum, Hn − Hn/2, is exactly the chance the hundred prisoners are doomed by an over-long loop. The same harmonic tail decides both. A portal walks that spine.