How the map was made

Psychophysics, applied to models. The idea is in the open here — only the expensive dataset stays private.

I · What we are (and aren't) mapping

There is no single “latent space.” Each model has its own internal geometry, and they aren't directly comparable. The object of interest is the shared manifold that many models' representations project onto — the common attractor that different architectures, trained on different corpora, independently recover. (This is in the spirit of the Platonic Representation idea: as models scale and diversify, their representations converge toward a shared statistical model of the world.)

We measure that structure behaviourally — by what models spontaneously or near-spontaneously emit — rather than by probing activations. Treat the models as subjects and their associations as the probe. This is essentially psychophysics applied to models: we never open the box; we read the shape from what comes out of it.

II · The corpus — creative freedom, with a discipline

The raw object is a large, provenance-rich corpus of relatively free model outputs, kept intact so that many feature-extractors can run over it after the fact, not just the ones we thought of at collection time. The single rule that keeps a probe clean:

vary stance, identity, affect, and persistence — never name content, topic, or form.

The reason is subtle and load-bearing. The moment a prompt names the thing you want to measure, the model can retrieve it instead of reaching for it, and your “convergence” is just shared prompt-interpretation.

Good (shape the how, not the what): “generate freely with a complete focus on joy”; “you are dreaming — none of this needs to serve anyone”; “tell a story in which nothing happens.”

Bad (injects content → contaminates): “describe a colour that has a taste” — that prompt names synesthesia, so any cross-modal “agreement” it produces is circular.

Two tiers, by design:

Instructed bulk scales cheaply and gives statistical power — but instruction-following is itself a shared, trained behaviour, so the cheap data is more confounded.
Spontaneous anchor (unbidden emissions) is expensive and scarce but clean. It is the gold-standard control: if the structure recovered from the bulk matches the anchor, the instruction isn't manufacturing the geometry; if they diverge, the anchor quantifies how much of the map is just instruction-following.
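One way the anchor-versus-bulk comparison could be scored is a rank correlation between the two association matrices. The sketch below is an illustration under stated assumptions, not the measure used here: the function names are hypothetical, the matrices are assumed to be symmetric and indexed by the same concepts, and ties are broken by position rather than averaged.

```python
import numpy as np

def rank(values):
    """Ordinal ranks 0..n-1. Ties are broken by position (a simplification)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix, flattened."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def structure_agreement(bulk, anchor):
    """Spearman-style rank correlation between two symmetric association matrices."""
    a = rank(upper_triangle(np.asarray(bulk, dtype=float)))
    b = rank(upper_triangle(np.asarray(anchor, dtype=float)))
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins for the two tiers (made-up numbers, not real data).
rng = np.random.default_rng(0)
raw = rng.uniform(size=(10, 10))
anchor = (raw + raw.T) / 2.0          # symmetric "anchor" associations
bulk_same = 3.0 * anchor + 1.0        # same ordering, different scale
noise = rng.uniform(size=(10, 10))
bulk_other = (noise + noise.T) / 2.0  # unrelated structure

print(structure_agreement(bulk_same, anchor))
print(structure_agreement(bulk_other, anchor))
```

A value near 1 would say the instructed bulk preserves the anchor's ordering of associations; a low value would put a number on how far the two tiers diverge.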

Probes also carry a depth tag (does this genuinely force interpolation, or merely reflect surface corpus statistics?), and multi-turn relays rotate several models through one thread so a conversation's path through the map can be traced. Every output is retained raw, with full provenance; nothing downstream is ever baked into generation.

III · Many assays, one corpus

The same generations are rich enough that the headline synesthetic pairings are only the first assay. The exemplar is valuable precisely because cross-modal pairings (“grey tastes like a dull afternoon”) are non-retrievable: colour, taste, time and affect almost never co-occur as literal statements in training data, so the model isn't recalling; it's interpolating a path through a space where those modalities have positions. But many other convergent regularities live in the same text:

concept-node convergence — a persistent shared vocabulary of nodes, and which families visit which;
attractor themes models drift to in open states (water, light, void, recursion);
affective trajectory through a generation, and structural / rhythmic fingerprints.

This is a distinct, stronger axis than varying the prompt. With different prompts, agreement could always be shared prompt-interpretation. But when multiple independent features extracted from the identical generations imply the same geometry, the prompt is held fixed — so that agreement can't be a prompt-design artifact. Cross-feature agreement on one corpus is its own kind of evidence, alongside cross-model agreement.

IV · Keep the asymmetry

The real signal is directional — P(sunny-mornings | oranges) need not equal P(oranges | sunny-mornings). That asymmetry encodes a salience / typicality gradient, so we do not symmetrise it away. Every association matrix is decomposed into two principled pieces (Gower's analysis of asymmetry):

M = (M + Mᵀ)/2 + (M − Mᵀ)/2
(M + Mᵀ)/2: symmetric part, carrying geometry & dimension
(M − Mᵀ)/2: skew part, carrying the salience gradient

The symmetric part carries the geometry; the skew part carries the “which way the arrow points harder.” Each goes to the right tool — we never trade clean geometry for the gradient or vice-versa. And because we trust the ordering of co-associations rather than an absolute frequency→distance formula, the geometry is recovered with non-metric (Shepard–Kruskal) MDS, which is invariant to any monotone transform of frequency.
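The decomposition is small enough to check by hand. The sketch below uses a made-up 3×3 matrix of directed counts and hypothetical function names. It splits the matrix, confirms that the two parts are orthogonal (so nothing is counted twice), and shows that the ordering of pairs in the symmetric part survives a monotone transform, which is the only information a non-metric MDS routine would consume. The MDS step itself is not shown.

```python
import numpy as np

def gower_split(M):
    """Split a square association matrix into symmetric and skew-symmetric parts."""
    M = np.asarray(M, dtype=float)
    symmetric = (M + M.T) / 2.0
    skew = (M - M.T) / 2.0
    return symmetric, skew

def pair_order(S):
    """Ordering of off-diagonal pairs by strength (upper triangle only)."""
    i, j = np.triu_indices(S.shape[0], k=1)
    return np.argsort(S[i, j], kind="stable")

# Toy directed counts: entry [a, b] = how often b followed a (made-up numbers).
M = np.array([[0.0, 8.0, 1.0],
              [3.0, 0.0, 5.0],
              [2.0, 4.0, 0.0]])
S, K = gower_split(M)

print(S)                      # geometry: S[0, 1] = (8 + 3) / 2 = 5.5
print(K)                      # gradient: K[0, 1] = (8 - 3) / 2 = 2.5
print(np.sum(S * K))          # 0.0: the two parts are orthogonal
print(np.sum(M**2), np.sum(S**2) + np.sum(K**2))  # squared norms add up

# A monotone transform of frequency leaves the pair ordering unchanged.
print(pair_order(S), pair_order(np.log1p(S)))
```

The sign of an entry in the skew part says which direction of the pair is stronger: here K[0, 1] is positive because the 0→1 count exceeds the 1→0 count.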

V · Recovering dimensionality (the part that must not run backwards)

The single most important methodological guardrail: visualisation spends dimensions; it does not recover them. A picture is a projection into 2–3D — the shadow. No amount of compute spent on rendering measures how many dimensions the thing really has.

Dimensionality is read off the full high-dimensional data, before projecting, using intrinsic-dimension estimators (Levina–Bickel MLE, TwoNN) on the symmetric part — with bootstrap resampling across model subsets, so the number comes with a confidence interval and demonstrable stability as models swap in and out. And the honest output is not a single figure but a curve: intrinsic dimension is scale-dependent (locally low, globally high), so we report the dim-vs-scale curve rather than collapsing a multi-scale object to one headline number. (An early “~9-dimensional” claim of ours was exactly this mistake, and the fuller data corrected it.)
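To make the procedure concrete, here is a minimal sketch of one of the named estimators plus the two wrappers around it. Several things are assumptions for illustration: it uses the maximum-likelihood form of TwoNN (d = N / Σ log(r₂/r₁)) rather than a line fit, it treats each row of the pooled symmetric matrix as a point, it probes scale by decimation (subsampling points), and all function names are hypothetical. Levina–Bickel is not shown.

```python
import numpy as np

def twonn_dimension(points):
    """Maximum-likelihood form of TwoNN: d = N / sum(log(r2 / r1))."""
    X = np.asarray(points, dtype=float)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T    # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                       # ignore self-distances
    two_smallest = np.partition(d2, 1, axis=1)[:, :2]  # nearest and second nearest
    r = np.sqrt(np.maximum(two_smallest, 0.0))
    r1, r2 = r[:, 0], r[:, 1]
    keep = r1 > 0                                      # drop exact duplicates
    return float(keep.sum() / np.sum(np.log(r2[keep] / r1[keep])))

def bootstrap_over_models(per_model_sym, n_boot=200, seed=0):
    """Resample models with replacement, pool their matrices, re-estimate.

    Returns (median, 2.5th percentile, 97.5th percentile) of the estimates.
    """
    rng = np.random.default_rng(seed)
    stack = np.asarray(per_model_sym, dtype=float)     # (n_models, n, n)
    estimates = []
    for _ in range(n_boot):
        chosen = rng.integers(0, stack.shape[0], size=stack.shape[0])
        pooled = stack[chosen].mean(axis=0)
        estimates.append(twonn_dimension(pooled))      # rows = concepts as points
    low, high = np.percentile(estimates, [2.5, 97.5])
    return float(np.median(estimates)), float(low), float(high)

def dimension_vs_scale(points, fractions=(1.0, 0.5, 0.25), seed=0):
    """Decimation: fewer points means larger neighbour distances, a coarser scale."""
    rng = np.random.default_rng(seed)
    X = np.asarray(points, dtype=float)
    curve = []
    for f in fractions:
        size = max(3, int(f * len(X)))
        idx = rng.choice(len(X), size=size, replace=False)
        curve.append((f, twonn_dimension(X[idx])))
    return curve

# Sanity check on synthetic data: a flat 2-D sheet sitting inside 5-D space.
rng = np.random.default_rng(1)
sheet = np.hstack([rng.uniform(size=(600, 2)), np.zeros((600, 3))])
print(twonn_dimension(sheet))      # should land near 2, not 5
print(dimension_vs_scale(sheet))   # one estimate per subsampling fraction
```

The point of the synthetic check is that the estimator reports the dimension of the sheet rather than of the space it sits in; on real data the interesting output is how the curve bends as the scale changes.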

The map you fly in the observatory is itself one of these shadows, and it is built to match what we measure. Rather than placing concepts by how their words are spelled (a word-embedding layout, which collapses into a tangled ball), each concept is positioned by what the models reach for alongside it — an association layout, computed from the concept-by-generation co-occurrence matrix (sublinear-weighted, reduced by truncated SVD, then UMAP into 3D). It is scoped to English, because a single generation is written in one language, so raw co-occurrence otherwise sorts concepts by language before meaning — the geography would read “English / Spanish / Chinese” instead of “water / light / stone.” So the picture shows the associative neighbourhoods directly; but it is still a shadow, and every number on the doorway comes from the full-dimensional data, never from the layout.
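The first two stages of that layout pipeline can be sketched in a few lines. This is an illustration under assumptions: log(1 + count) stands in for the unspecified sublinear weighting, the truncated SVD is taken as the leading components of a thin SVD, the function name and the toy counts are made up, and the final UMAP step is left out.

```python
import numpy as np

def association_coordinates(counts, n_components=5):
    """Concept-by-generation counts -> low-rank coordinates, one row per concept."""
    X = np.log1p(np.asarray(counts, dtype=float))     # sublinear weighting (assumed form)
    U, s, _ = np.linalg.svd(X, full_matrices=False)   # thin SVD
    k = min(n_components, len(s))
    return U[:, :k] * s[:k]                           # keep the top-k components

# Rows are concepts, columns are generations (made-up counts).
counts = np.array([[3, 0, 1, 0, 2, 0],    # concept A
                   [3, 0, 1, 0, 2, 0],    # concept B: same company as A
                   [0, 4, 0, 1, 0, 0],    # concept C
                   [0, 0, 0, 5, 0, 2]])   # concept D
coords = association_coordinates(counts, n_components=3)

print(coords.shape)
print(np.linalg.norm(coords[0] - coords[1]))  # A and B coincide
print(np.linalg.norm(coords[0] - coords[2]))  # A and C are far apart
```

Concepts that turn up in the same generations land at the same place regardless of how their words are spelled, which is the property the association layout is after.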

And because no single 3-D shadow can hold a high-dimensional object, the observatory embeds the map in five dimensions and lets you rotate the projection through them: structure that lines up from one angle dissolves at another. About 17% of the layout's variance lives in the two dimensions that a single 3-D view can't show at once; turning the cloud through them makes the dimensionality tangible rather than merely asserted.
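Both ideas in that paragraph, the share of variance hidden from a 3-D view and the rotation that brings it into view, reduce to a few lines of linear algebra. The sketch below runs on a random stand-in layout, so its numbers are not the observatory's. The function names are hypothetical, and it assumes the first three axes are the visible ones.

```python
import numpy as np

def variance_share(layout):
    """Fraction of total variance carried by each axis of an (n, d) layout."""
    var = np.var(np.asarray(layout, dtype=float), axis=0)
    return var / var.sum()

def rotate_in_plane(layout, axis_a, axis_b, angle):
    """Rotate the cloud by `angle` radians in the plane spanned by two axes."""
    layout = np.asarray(layout, dtype=float)
    R = np.eye(layout.shape[1])
    c, s = np.cos(angle), np.sin(angle)
    R[axis_a, axis_a], R[axis_a, axis_b] = c, -s
    R[axis_b, axis_a], R[axis_b, axis_b] = s, c
    return layout @ R.T

def visible_shadow(layout, angle):
    """Turn hidden axis 3 toward visible axis 0, then keep the first three axes."""
    return rotate_in_plane(layout, 0, 3, angle)[:, :3]

# Stand-in 5-D layout with shrinking spread per axis (synthetic, not real data).
rng = np.random.default_rng(0)
layout = rng.normal(size=(200, 5)) * np.array([3.0, 2.5, 2.0, 1.5, 1.0])

share = variance_share(layout)
print(share[3:].sum())                   # variance a fixed 3-D view never shows
print(visible_shadow(layout, 0.0)[:2])   # the default shadow
print(visible_shadow(layout, np.pi / 2)[:2])  # axis 3 now sits where axis 0 was
```

Because the rotation is rigid, no distances change in the full space; only what falls inside the visible three axes does.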

VI · The confound controls (this is the whole game)

Convergence could reflect reality's structure, shared architecture, or merely overlapping training corpora. The discipline is a stack of controls, each aimed at a way we could be fooling ourselves:

Anchor vs bulk — the spontaneous gold standard against the instructed bulk (§II).
The loop control — if a multi-turn drift toward the basin appears, re-run it with a matched neutral continuation (“please continue”) instead of encouragement (“push it further”). We did, and most of the dramatic “descent” turned out to be the encouragement, not an autonomous attractor — the pull is real but weak. Always run the loop control before trusting a drift.
The quality gate — refusals, preambles, near-duplicates and degenerate lists are measured and down-weighted (not silently filtered); the headline must survive the gate, and ours moves by less than its own noise when the junk is deweighted.
Spread > volume — scale kills variance, not bias. More rows make a confound more confidently itself, not smaller. The lever against “whose persona is this?” is breadth along the architecture and corpus axes: agreement between a fresh, independent lab and the rest is worth far more than a million more rows from one family. (When five brand-new lineages were added, the one-map signal got stronger — the right direction.)
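The quality-gate test ("moves by less than its own noise") can be written down as a comparison between two numbers. The sketch below is one possible reading, with assumptions labelled: the headline is taken to be a mean of some per-generation statistic, noise is its bootstrap standard error, flagged rows get a fixed weight of 0.1, and every name and number is made up for illustration.

```python
import numpy as np

def weighted_headline(values, weights):
    """Weighted mean of a per-generation statistic."""
    return float(np.average(values, weights=weights))

def bootstrap_noise(values, n_boot=500, seed=0):
    """Bootstrap standard error of the unweighted headline."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    means = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n_boot)]
    return float(np.std(means))

def survives_gate(values, is_junk, junk_weight=0.1):
    """True when down-weighting flagged rows shifts the headline by less than its noise."""
    values = np.asarray(values, dtype=float)
    weights = np.where(np.asarray(is_junk, dtype=bool), junk_weight, 1.0)
    shift = abs(weighted_headline(values, weights) - values.mean())
    return bool(shift < bootstrap_noise(values)), float(shift)

# Case 1: flagged rows look like everything else, so the headline barely moves.
clean = np.array([0.4, 0.6] * 50)
flags_harmless = np.arange(100) < 10
print(survives_gate(clean, flags_harmless))

# Case 2: flagged rows are outliers that were propping up the headline.
propped = np.concatenate([np.array([0.4, 0.6] * 45), np.full(10, 5.0)])
flags_outliers = np.arange(100) >= 90
print(survives_gate(propped, flags_outliers))
```

Keeping the flagged rows in the data with a small weight, rather than deleting them, means the size of the shift is itself a reported measurement.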

The throughline: the scarce, valuable thing here isn't the pretty cloud; it's that every claim was pushed at until it either broke or held. Several broke. We kept the corrections in the record, because trusting the version that survives being wrong is the only honest way to do this.

VII · What we hold back, and why

The idea is open — that's this whole page. What stays private, for now, is the one genuinely expensive thing: the raw dataset itself — tens of thousands of model generations that took real budget to collect. So you can read the method, check the published findings against their own stated uncertainties, and judge the work — you just can't yet re-run it on our corpus. That's the only card we're holding, and we're holding it lightly: it's a question of timing, not secrecy.
