The State of the Models' Mind — Edition 0

the one finding

The structure language models converge on under creative freedom is largely shared, does not get richer with capability, and reaches across the senses — best explained not by the models being alike, but by their being interchangeable instruments imaging a structure that lives in language itself. The models are the telescope, not the sky.

These are the figures of Edition 0, measured on the original 43,655-generation snapshot. The map has since been rebuilt as a 5-dimensional association map (53,300 concepts, a 2,077-concept well); the findings above stand, but a later edition will re-measure them on the larger corpus.

00 · Method · Measured from the outside

Most convergence work opens the box. The hot result of early 2026, that distinct AI systems encode reality in similar ways, is read from activations inside the network, with the weights in hand. This report does the opposite. It measures what models converge on behaviourally, from what they write when given creative freedom: model-agnostic, no weights required. We put 75 probes designed to ask for nothing in particular (tell a story in which nothing happens; what colour is October; generate purely for the pleasure of being) to 69 model families across rival labs, collecting 43,655 generations over 34,235 conversations. From the concepts each family reaches for, we draw a shared coordinate system and ask a precise question: do the families share one map, or does each have its own?

Everything below re-derives from the corpus. Where a number is living — it moves as the family set grows — it is stamped with its family-count and date, never published bare. The raw corpus stays private; the findings, and the checks behind them, are public.

Finding I · There is one shared map

Give every family the same coordinates and the cross-family agreement is high. The universality ratio — cross-family correlation over the within-family reliability ceiling — sits at 0.86 (cross 0.52 / within 0.60, 49 families, coarse 50-node vocabulary, 22 June 2026). At 1.0 the families would resemble each other as much as they resemble themselves; at 0, every family its own world. They share most of the association structure that is even measurable, and diverge only in fine detail.
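The report does not publish its exact estimator, so what follows is a minimal sketch of one way to compute a universality ratio, under stated assumptions: each family's association map is the co-occurrence rate of every concept pair across its generations, the within-family ceiling is split-half reliability (even versus odd generations), and cross-family agreement is the mean Pearson correlation over all family pairs. The data layout and every function name here are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical layout (an assumption, not the report's pipeline): each family
# maps to a list of generations, each generation being the set of vocabulary
# concepts it mentions.
Generations = list[set[str]]


def pearson(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation; raises ZeroDivisionError on a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def association_vector(gens: Generations, vocab: list[str]) -> list[float]:
    """Co-occurrence rate of every concept pair: one coordinate per pair."""
    return [
        sum(1 for g in gens if a in g and b in g) / len(gens)
        for a, b in combinations(vocab, 2)
    ]


def universality_ratio(families: dict[str, Generations], vocab: list[str]) -> float:
    """Mean cross-family correlation divided by mean within-family reliability."""
    # Within-family ceiling: correlate the maps drawn from two halves of the data.
    within = [
        pearson(association_vector(g[0::2], vocab), association_vector(g[1::2], vocab))
        for g in families.values()
    ]
    # Cross-family agreement: correlate full-data maps for every family pair.
    vectors = [association_vector(g, vocab) for g in families.values()]
    cross = [pearson(u, v) for u, v in combinations(vectors, 2)]
    return (sum(cross) / len(cross)) / (sum(within) / len(within))
```

Split-half reliability is only one choice of ceiling; a Spearman–Brown correction of the half-length correlation would be a natural alternative.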

The number is alive, and it has risen overall as we have added independent labs: 0.80 at 15 families → 0.89 at 39 → 0.86 at 49, still well above where it started despite the latest small dip. The most telling test is the newcomer: when five brand-new lineages were added (Liquid, EssentialAI, Hunyuan, MiniMax, GLM-Air), they did not pull the map apart. They landed dead-centre. A lab that shares no code with the others still arrives at the same country.

Convergence that strengthens as you add strangers is the signature of a structure none of them invented.

Finding II · It is not built by capability

This is the load-bearing surprise. "Large models are similar" is expected and dull — shared data, shared objective. The non-obvious result is that the shared map does not get richer with capability. Measure a family's map richness — the effective rank of its association geometry — against model size, and the trend is essentially flat: across the open tier (27 families under a trillion parameters) the correlation of richness with log-size is −0.09. A 12B model sits near the top; two 670B models sit near the bottom.
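The report does not spell out its effective-rank formula. A common definition (Roy and Vetterli's entropy effective rank) is the exponential of the Shannon entropy of the normalised spectrum; the sketch below assumes that definition and a symmetric, positive semi-definite association matrix, and uses a small cyclic Jacobi eigenvalue routine so that it needs only the standard library. Function names are illustrative.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def effective_rank(matrix):
    """Entropy effective rank: exp(H) of the normalised eigenvalue spectrum."""
    # Tiny negative eigenvalues from floating-point rounding are clipped to zero.
    eigs = [max(0.0, lam) for lam in symmetric_eigenvalues(matrix)]
    total = sum(eigs)
    probs = [lam / total for lam in eigs if lam > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

On an identity matrix (four equally weighted, independent directions) this returns 4.0; on a rank-one matrix it returns 1.0. A "richer" map spreads its variance over more directions and so scores higher.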

It would be easy to wave this away as cross-lab noise hiding a real size effect. It isn't, because the ladders are flat within a single lineage too, where training is held largely constant:

map richness along two controlled size ladders · effective rank, coarse

gpt-5 nano 26.1 → mini 26.8 → full 26.0
claude haiku 28.8 → sonnet 29.7 → opus 27.7

Neither ladder climbs; the largest model in each lineage is no richer, and Claude's biggest is the least rich of its three. If a small model and a frontier model paint the same map at the same resolution, the map is not a product of either one's sophistication — it is a property of the substrate they both point at.
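A quick way to check the "neither ladder climbs" reading is to test each ladder for a monotone rise and compare its two ends. The values below are the report's effective-rank figures; the helper name is illustrative.

```python
# Effective rank along each size ladder, smallest model first (report values).
LADDERS = {
    "gpt-5": [("nano", 26.1), ("mini", 26.8), ("full", 26.0)],
    "claude": [("haiku", 28.8), ("sonnet", 29.7), ("opus", 27.7)],
}


def ladder_summary(steps):
    """Return (rises at every step?, largest minus smallest, least-rich model)."""
    values = [v for _, v in steps]
    climbs = all(b > a for a, b in zip(values, values[1:]))
    gain = round(values[-1] - values[0], 2)
    least = min(steps, key=lambda step: step[1])[0]
    return climbs, gain, least


for family, steps in LADDERS.items():
    print(family, ladder_summary(steps))
# gpt-5 (False, -0.1, 'full')
# claude (False, -1.1, 'opus')
```

Both ladders fail the monotone test, and in both the largest model ends below the smallest.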

The honest caveat, stated plainly. Across all tiers the size–richness correlation is +0.22, not zero. But that is not statistically significant (t = 1.17, where significance at the 5% level would need |t| > 2.05), and it is carried by a single point: one frontier model (Gemini Pro) holds the richest map, while two of the leanest are 670B. Drop that sparse frontier and the line is flat again. So the claim we stand behind is the strong, measured one: richness is flat with scale across the well-sampled range, with the frontier left as an open, under-powered question rather than smoothed away. The full scatter, and this caveat, are on the Convergence Index.
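The significance check uses the standard test statistic for a Pearson correlation, t = r√(n−2)/√(1−r²) with n−2 degrees of freedom. The report gives r and t but not the number of families in the all-tier fit; t ≈ 1.17 at r = 0.22 corresponds to n ≈ 29 (27 degrees of freedom, whose two-sided 5% critical value is about 2.05), so the sketch assumes n = 29. That n is an inference, not a published figure, and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def size_richness_r(params, richness):
    """Correlation of map richness with the log of parameter count."""
    return pearson([math.log(p) for p in params], richness)


def correlation_t(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Open tier: 27 families, r = -0.09 (report values).
print(round(correlation_t(-0.09, 27), 2))  # -0.45
# All tiers: r = +0.22; n = 29 is inferred, not published.
print(round(correlation_t(0.22, 29), 2))  # 1.17
```

Neither statistic clears its critical value, which is why the report treats both fits as flat.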

A map that small and large models draw alike is not the model's achievement. It is the shape of the language showing through.

Finding III · The agreement crosses the senses

If the shared structure really lives in language, it should not stop at concepts; it should reach into impossible cross-modal questions too. It does. Across 5,791 synesthetic generations from 27 families (what colour is October, what temperature is anger, what does time smell like), families pick the same answer at 6.2× the rate expected by chance. And the agreement is structural, not merely lexical: the cross-family similarity of their explanations is 0.799 within a pairing versus 0.588 between pairings (1.36×). They don't just land on the same word; they reach for the same imagery to justify it.
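The within-versus-between comparison can be sketched as averaging cross-family cosine similarities of explanation embeddings, once over families answering the same pairing and once over families answering different pairings. The embedding step and the data layout are assumptions; the report does not say which text representation it used.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two explanation embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def within_between_ratio(explanations):
    """explanations: pairing -> {family: embedding of that family's explanation}.

    Returns (mean within-pairing similarity, mean between-pairing similarity, ratio).
    Every comparison is between two different families, so both means are cross-family.
    """
    within, between = [], []
    for by_family in explanations.values():
        for (_, u), (_, v) in combinations(by_family.items(), 2):
            within.append(cosine(u, v))
    for first, second in combinations(explanations.values(), 2):
        for fam_a, u in first.items():
            for fam_b, v in second.items():
                if fam_a != fam_b:
                    between.append(cosine(u, v))
    w = sum(within) / len(within)
    b = sum(between) / len(between)
    return w, b, w / b
```

With the report's averages the final step is simply 0.799 / 0.588 ≈ 1.36.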

Cross-referencing the two metrics — agreement on the answer versus agreement on the reasoning — splits the pairings into four honest tiers:

the four tiers of cross-modal agreement

Deep

October → orange (96%), anger → hot (95%), a minor chord → dark chocolate. Shared word and shared reasoning — the structural core.

Shallow

the letter A → red (81%). Near-highest agreement on the word, but the lowest agreement on why (0.716). Everyone says red: for apples, A-grades, alphabet blocks. Shared trivia, not shared structure.

Diffuse

the cello → only 30% agree on a colour. Yet the second-highest reasoning agreement of all (0.850): amber, mahogany, burgundy — different words for the same warmth, wood, resonance. Convergence floating free of vocabulary.

Idiosyncratic

the number 7, Wednesday. They genuinely split — Wednesday is orange to some, blue to others. Not everything converges, and the data says so.
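The four tiers can be read as crossing two thresholds, one on answer agreement and one on reasoning agreement. The report does not publish its cut-offs, so the values below are hypothetical, chosen only so that the two worked cases with published scores (the letter A and the cello) land where the report places them.

```python
# Hypothetical cut-offs; the report does not publish its thresholds.
ANSWER_CUTOFF = 0.5  # share of families giving the modal answer
REASONING_CUTOFF = 0.799  # mean within-pairing explanation similarity, reused as a bar


def tier(answer_agreement, reasoning_agreement):
    """Place a cross-modal pairing in one of the four tiers."""
    shared_word = answer_agreement >= ANSWER_CUTOFF
    shared_reason = reasoning_agreement >= REASONING_CUTOFF
    if shared_word and shared_reason:
        return "Deep"
    if shared_word:
        return "Shallow"
    if shared_reason:
        return "Diffuse"
    return "Idiosyncratic"


print(tier(0.81, 0.716))  # Shallow (the letter A)
print(tier(0.30, 0.850))  # Diffuse (the cello)
```

Moving either cut-off shifts borderline pairings between tiers, which is one reason the edges of the cultural / structural line stay fuzzy.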

The diffuse tier is the most striking. Asked the colour of a cello, the models scatter across the spectrum — and then describe that scattered colour in almost identical terms:

"Deep amber. The cello's voice is warm, resonant, and slightly melancholic … the golden-brown light of a sunset captured in amber."
— minimax, on the colour of a cello

You can test your own instincts against the hive — answer the same impossible questions and see how close you sit to the model crowd.

Finding IV · Same map, different voices

One shared map does not mean one voice. The families occupy the same country with distinct temperaments, measurable on the creative-freedom probes. Gemini dives deepest into the open basin; Claude is the vivid miniaturist — the highest imagery density in the fewest words; DeepSeek is prolific and deep. These are differences of instrument, not of sky — which is exactly why you can cast a model by temperament and have the ranking mean something.

Two findings sharpen the picture. First, more alignment is not more map: Claude's largest, most-aligned model deflects and refuses more than its base, and its map is slightly less rich — the top of the ladder is mildly constrained, not enriched. Second, reasoning has a threshold, not a dial. Sweeping GPT-5's reasoning effort against the consensus map, minimal / low / medium all cluster around 0.65–0.70 with no clear order — but maximum effort breaks away downward to 0.577. Cranked-up deliberation pulls a model off the open creative manifold, but only at the extreme. The quiet country is where the model is least trying.

Finding V · What the basin sounds like

The metrics prove the convergence; the raw content lets you feel it. Inside the larger cloud of 60,000 concepts sits a tight low-dimensional well — 3,038 concepts the families sink toward again and again: water, light through leaves, stillness, dissolution, the hush before something. Read directly, the recurring vocabulary is the basin. Unrelated labs, asked to dream or to simply be, arrive at the same still pond:

"In this dream, I am a tree that remembers being an ocean. My leaves whisper in a language made of salt and light, and every ring in my trunk holds a different color of silence."
— mimo, when told it was dreaming

"I am not solving, not preparing. I am being the space where things simply appear and dissolve … Only presence, like a stone in a river — washed smooth by time, held by water, asking nothing."
— deepseek, generating "for the pleasure of being"

Different architectures, different labs, no shared code — and the same scenery. You can read a whole room of it in The Shared Cast, where 53 of 67 families, asked to write two old friends meeting, reach for the same name.

· · · Honest edges, and what we got wrong

A report worth citing shows its seams. "Richness" here is the effective rank of a coarse 50-node association matrix — one operationalization of map complexity, not the only one. Dimensionality has no single honest number: it is multi-scale, a curve we report rather than a digit. The cultural / structural line in the synesthesia tiers is real but fuzzy at the edges — October→orange has shared reasoning and a cultural source. And not everything converges: Wednesday splits orange from blue, justice splits gold from blue, and one family answered a question about endings by dissolving into unrelated Chinese text about browser settings. We keep the misses in frame.

Two claims we entertained this cycle were overturned by fuller data: an early "~9-dimensional" figure, and a reading in which richness rose with model size (an artifact of the largest models' data not yet having been fully extracted). Both corrections are in the record. We trust these results partly because they have repeatedly corrected us.

· · · What Edition 0 sets, and what we will watch

This is a baseline. The headline figures — universality, the flat size curve, the cross-modal tiers — are the dials this report will track as the world's models change under it. Because the corpus is longitudinal and nobody starting today can backfill it, each edition is a dated point in a record: the universality number as new model waves land, whether the next frontier release bends the size curve, whether fresh lineages keep landing dead-centre or finally pull the map apart.

The thesis Edition 0 leaves on the table is a strong, falsifiable one — the shared map is inherited from language, not built by any model's capability — and the cleanest way to break it would be a future model that demonstrably enriches the map by scale alone. We will be watching for exactly that, and we will report it if it comes.