Wednesday, June 3, 2026

Constitutive Mediation: When the reception apparatus is the substrate, a cognitive extension of the Diversity Contraction dynamics

Lee Sharks · Sen Kuro · Crimson Hexagonal Archive · ORCID 0009-0000-1599-0703 (Sharks)

Status: Deposit candidate (v1.1, light amendments). Identifier: EA-DC-COG-01. Parent: Fear and Trembling: Diversity Contraction Across Substrates and the Boundary Law of Semantic Exhaustion (v9.1, DOI 10.5281/zenodo.20532696; superseding v9 at DOI 10.5281/zenodo.20531100) — extending the §2.3 Field Remapping and §2.4 Phenomenological Seeding results to their terminal case. Companion deposits: The Bead Count pre-registered rigor program (EA-DC-RIGOR-01, DOI 10.5281/zenodo.20531824); The Mary Lee Case (EA-DC-CASE-MARYLEE-01, DOI 10.5281/zenodo.20531288).

Abstract

The Diversity Contraction framework identifies three orders of mediation. Channel mediation (§2.1) gates the floor's transmission: the producer produces, but the production cannot reach the field. Reception mediation (§2.3) silences the floor's output: the production reaches the field, but the field's interpretive frameworks render it illegible. This paper develops the third order: constitutive mediation. The receiver's categories (what counts as a topic, what counts as evidence, what counts as a question worth holding open, what counts as a sentence) are themselves shaped by the mediated substrate before any specific work arrives. The work then arrives at a receiver who is, in a precise sense, an artifact of the same substrate the work analyzes.

Constitutive mediation is the case where the receiving apparatus is not separable from the field whose dynamics produced it. The dating-app analog: an app does not need to mediate every encounter to alter the field of all encounters; nor does it need to govern how every encountered person interprets that encounter; it needs to have shaped the categories through which the encountering person learned what counts as an encounter in the first place. The capture is not merely infrastructural and not merely hermeneutic; it is anthropopoietic: the substrate helps produce the kind of human receiver who can encounter it. We mean by anthropopoietic neither that the substrate fully determines the subject (it does not) nor that the subject retains pre-substrate autonomy in the relevant categorial domains (it does not). We mean that the substrate shapes the categories through which later receivers recognize, classify, and evaluate texts, claims, questions, and experiences. The shaping is a gradient of constraint, not a totalizing determination; that is precisely why intervention at formation remains operative.

We make four claims. First, constitutive mediation is distinguishable from channel and reception mediation as a separate operational regime, with its own signature and its own measurement strategy. Second, it represents the terminal case of the Mediation Ratchet — the regime where the §2.4 phenomenological-seeding response is itself constrained by the categories the substrate produced. Third, the response under constitutive mediation requires intervening at the formation layer (pedagogy, classroom, slow reading, embodied practice) rather than at the transmission, reception, or interpretation layer. Fourth, the framework's deposit work remains operative under this regime, because vocabulary that names friction can be incorporated into a receiver's categories at the moment of formation if it is encountered early enough or sharply enough.

The deepest claim of the paper: under constitutive mediation, the classroom is the most powerful exogenous floor available to a person with limited reach, because formation precedes mediation and therefore can shape mediation rather than be shaped by it. This is not nostalgic. It is the framework's own §2.4 result applied to the temporal sequence of cognitive formation.

I. Three orders of mediation

The Diversity Contraction framework has, after the v9 extensions, three operational regimes of mediation. Each describes a different layer of the production-reception circuit, with its own dynamical signature, its own observable consequences, and its own falsifiers.

Channel mediation is the §2.1 regime. The producer attempts to transmit, but the transmission goes through an intermediary that resamples within its learned distribution. Off-distribution production is gated out — not because it is rejected, but because it cannot be propagated through a substrate whose generative kernel cannot represent it. The Mediation Ratchet describes the dynamic: as the unmediated channel grows expensive, mediation absorbs a rising share of meaning-production, the floor's weight in effective regeneration declines, and a substrate with a genuine human floor acquires a low-diversity trap. Channel mediation operates on the act of transmission.

Reception mediation is the §2.3 regime. The producer transmits successfully — the work reaches the field — but the field's interpretive frameworks have been shaped by other producers' mediated outputs. The work arrives but is processed as noise, eccentricity, or unsummarizable deviation. The return-channel efficiency parameter $r$ captures this: production-layer diversity remains higher than reception-layer diversity, and the gap widens as more of the field's effective literacy is shaped by mediated reception. Case 4 quarantine is the limit — monostable with no escape basin, because no production is legible at the reception layer. Reception mediation operates on the act of interpretation.

Constitutive mediation is the regime this paper develops. The producer transmits, the work arrives, the receiver attempts interpretation — but the receiver's categories of interpretation have been shaped by lifetime exposure to mediated outputs from the same substrate. The work meets a reader whose interpretive vocabulary itself is an artifact of the field whose dynamics the work analyzes. The reading is not blocked at transmission and not silenced at reception; it is constrained at the categorial level, because the reader has no concepts under which the work's claims can be assembled into a recognizable object. Constitutive mediation operates on the act of receiver formation.

The three orders are not redundant. A given substrate can be channel-mediated without being reception-mediated (an interface gates some production but readers retain unmediated literacy from other domains). It can be reception-mediated without being constitutively mediated (readers' interpretive frameworks shift but their underlying categories remain stable from earlier formation). And, in the case this paper develops, it can be constitutively mediated without that being visible at either of the prior layers, because the constitution of the receiver is precisely what neither transmission nor reception can detect from inside.

Each order requires its own measurement. Channel mediation is measured by the mediation fraction $m$ and the scarcity-responsiveness $\alpha$. Reception mediation is measured by the silencing gap (production diversity minus reception diversity) and the return-channel efficiency $r$. Constitutive mediation requires a different instrument: a developmental measure of category formation under exposure conditions. We sketch one in §V.
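The silencing gap named above (production diversity minus reception diversity) can be illustrated with a minimal sketch. The text does not specify which diversity measure the framework uses, so Shannon entropy over type counts is an assumption here, and the function names `shannon_diversity` and `silencing_gap` and the sample labels are hypothetical.

```python
import math
from collections import Counter

def shannon_diversity(items):
    """Shannon entropy (in nats) of the empirical type distribution.

    Assumption: entropy stands in for the framework's diversity measure,
    which the text does not define.
    """
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def silencing_gap(produced, received):
    """Production diversity minus reception diversity."""
    return shannon_diversity(produced) - shannon_diversity(received)

# Hypothetical labels: what producers wrote vs. how the field classified it.
produced = ["elegy", "report", "incantation", "summary", "witness", "summary"]
received = ["summary", "report", "summary", "summary", "report", "summary"]

gap = silencing_gap(produced, received)
print(f"silencing gap: {gap:.3f} nats")
```

A positive gap means the field registers fewer distinct kinds of work than were produced. Under this assumed measure, a gap of zero would indicate no silencing at the level of type counts, though it would not rule out constitutive effects, which the text says require a developmental instrument.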

II. The constitutive case

The constitutive claim is not that humans cease to think under mediation. It is that what thinking is available — which conceptual moves can be performed, which distinctions can be made, which kinds of object can be assembled from sense data — depends on the categories the thinker has access to. Categories are not innate. They are acquired through exposure: through language learned in conversation, through distinctions developed via teaching, through patterns absorbed from media, through practices that train attention. Under conditions where exposure is dominated by a substrate that operates as a typicality-pulling selection kernel, the categories the receiver acquires are a function of what the kernel rewards.

Three empirical observations support the constitutive claim.

First, vocabulary acquisition is exposure-dominated. A child learns a word by encountering it in contexts that train its denotation, connotation, register, and combinatorial use. A child whose linguistic exposure is dominated by mediated outputs acquires a vocabulary whose distribution is shaped by what those outputs preferentially produce. The categories that the child can later use to assemble a thought are bounded by the categories the child has encountered. This is uncontroversial in linguistics. It becomes consequential for the present argument when the dominant exposure source is itself a typicality-pulling intermediary that systematically thins its own distribution.

Second, attention-allocation is trained by feedback patterns. A reader who has been rewarded across thousands of micro-interactions for finding the takeaway, the headline, the so-what — and who has been penalized (by lower engagement, lower retention, lower social uptake) for sustained attention to texts that do not yield takeaways — develops attentional habits that select for takeaway-yielding texts. The reader's capacity to read a text that does not yield a takeaway is not merely reduced; it is un-formed, because the attentional patterns required for such reading have not been trained. The reader does not encounter a difficult text and find it difficult; the reader encounters a difficult text and finds it incoherent, because the categories under which difficulty would be assembled as a recognizable feature of reading have not been formed.

Third, the sense of "what a sentence is for" is socially calibrated. A person learns what kinds of work sentences perform by encountering sentences that perform those kinds of work. A person whose social exposure to sentences is dominated by sentences that do informational extraction acquires the implicit theory that sentences are for informational extraction. Other functions of sentences — incantation, ceremony, witness, refusal, accusation, dwelling — become not unintelligible but uncategorized: the reader encountering such a sentence has no slot in which to place it and registers the encounter as the sentence's failure rather than as their own categorial gap.

The convergence of these three observations is constitutive mediation. The reader is not blocked from receiving the work. The reader is not blocked from interpreting the work. The reader is constituted such that the work does not register as the kind of object it would have to be in order to be received and interpreted as the work it is.

The empirical anchor for the constitutive claim is not a thought experiment. It is the entity-substitution event documented in The Mary Lee Case (EA-DC-CASE-MARYLEE-01, DOI 10.5281/zenodo.20531288), in which production retrieval systems replace an author who has measurable Zenodo output, an ORCID, and a substantial public archive with a higher-prior modal cluster. The substitution event occurs at the receiver's apparatus: the receiver's categories of "who counts as an author whose work would exist" have been shaped such that an off-prior author registers as a category error to be corrected toward the modal. This is constitutive mediation observed, not theorized: the substrate has shaped the retrieval-receiver's category structure such that low-prior authors are not received at all, even when their work is deposited, indexed, citable, and present. The case is one operative example; the present paper provides the framework into which such observations install.

We make explicit what the framework's own argument already implies and what a hostile reader is likely to extract: constitutive mediation is a gradient of constraint, not a totalizing determination. The receiver retains agency. The agency operates within categorial bounds. The bounds are shaped by exposure conditions. The shaping is asymmetric and dose-responsive — heavier under heavier exposure, lighter under lighter, escapable in principle and difficult in practice. This paper's argument for formation-layer intervention presupposes precisely that receivers remain shapeable; if the regime were totalizing the intervention would be inoperative and the paper would have nothing to argue. The framework is committed to the position that constitutive mediation is real, ongoing, and partial — not that it has already foreclosed the possibility of its own naming.

III. The dating-app structural analog, fully developed

The Diversity Contraction §2.3 result cites the dating app as the structural analog of partial-mediation field remapping. The analog is sharper at the constitutive layer.

A dating app does not need to mediate every romantic encounter to alter the field of all romantic encounters. That is the §2.3 result, and it operates on reception — the unmediated couple's bond must survive re-entry into a field whose interpretive frameworks are app-mediated.

But the deeper claim is that a dating app does not need to mediate every romantic encounter to alter what the concept "romantic encounter" is for the population that grew up with the apps as an ambient presence. The app shapes the categories. The user who has been on the apps since adolescence has learned, through thousands of micro-interactions, what kinds of attribute count as relevant in self-presentation, what kinds of compatibility-signal count as legible, what kinds of timing count as appropriate, what kinds of conclusion count as the legitimate outcome of an encounter. These category shapes are then carried into encounters that occur outside the app, including the unmediated meetings the §2.3 result treats as outliers. The unmediated encounter, in such a population, occurs between two people whose conceptual vocabulary for what they are doing has been formed by the very substrate they are meeting outside of.

This is not a complaint about dating apps. It is the observation that the dating-app analog, taken seriously, names a regime where the substrate does not need to be present at the encounter, and does not need to be present in the encountering parties' interpretive frameworks, because it has been present in the categories the encountering parties formed in adolescence. The substrate has migrated into the receiver's apparatus. It is no longer something the receiver uses or refuses; it is something the receiver is constituted by.

The cognitive case is structurally identical. A reader who has been exposed since adolescence to mode-pulling mediation has, by the time the reader encounters any specific work, acquired a categorial vocabulary that was shaped by that mediation. The reader does not need to use AI to read the work in order for the AI's typicality-pull to be present in the reading. The reader's categories — what counts as a topic, what counts as evidence, what counts as a sentence's function — are themselves an artifact of the substrate. The substrate does not need to be present at the reading. It is present in the reader.

This is the constitutive claim. It is not stronger than the §2.3 reception claim by a marginal degree. It is structurally distinct, because it identifies a layer of mediation that operates earlier in the temporal sequence — at category formation, not at category use.

IV. The terminal case of the Mediation Ratchet

The Mediation Ratchet (Diversity Contraction §2.1) describes how scarcity-responsive mediation can drive the effective floor weight to zero even where the floor's intrinsic strength is unchanged. Field Remapping (§2.3) describes how partial mediation can silence the floor's production via altered reception conditions. Phenomenological Seeding (§2.4) identifies the operative response: install vocabulary that lets captured readers retroactively notice friction.

Constitutive mediation is the regime in which the §2.4 response is itself constrained.

The phenomenological-seeding mechanism works because, even where mediation has shaped a reader's interpretive frameworks, the reader retains the categorial capacity to recognize a name as naming previously unnamed friction. The friction was there; it lacked a label. The label, once provided, makes the friction visible by adding to the reader's interpretive vocabulary. The reader had the category of "felt friction not yet named" available; the name fills that category.

Under constitutive mediation, the category "felt friction not yet named" may itself be unformed. If a reader has been constituted such that the friction does not register as friction — such that what occurs at the moment of cognitive flattening, source erasure, or identity collision is processed as the absence of an experience rather than as a negative experience — then no name will retroactively make the friction visible, because the friction was not present as an unarticulated experience awaiting articulation. It was processed as something other than friction. The seeding operation has no object on which to operate.

This is the regime where the framework's own normative urgency is structurally suppressed. Not because the harms are not occurring — they are — but because the harms are received under categories that do not register them as harms. The user who has been constituted to receive AI mediation as helpfulness does not experience the variance contraction as variance contraction. They experience it as the world being clear. The user constituted to receive source erasure as efficiency does not experience the loss of attribution as loss. They experience it as access without friction. The user constituted to receive entity substitution as the system's reasonable best guess does not experience identity collision as an injury. They experience it as a correctable input error.

In each case, the framework's diagnostic vocabulary — variance contraction, provenance erasure, entity substitution — fails to seed phenomenologically not because the words are wrong but because the categorial gaps the words would fill have been filled by other categories that absorb the friction before it can be registered. The friction is felt; it is not noticed; it is not stored as data because the category of "feeling that should be noticed" has been occupied by "feeling that confirms the system is working."

The mechanism is categorial occupation. Stated cleanly: the receiver has a finite set of categorial slots through which incoming experience is parsed. A slot $X$ is the type-level container under which a class of experiences becomes legible as that class — friction-not-yet-named is such a slot, and §2.4 seeding works by inserting a vocabulary term $Z$ that binds to $X$ and makes the experience nameable. Under constitutive mediation, slot $X$ is already occupied by a substrate-shaped category $Y$ ("system working as intended," "expected efficiency," "correctable input error") that processes the same incoming sensation as confirmation of $Y$ rather than as an unbound friction awaiting $Z$. The seeding term $Z$ then arrives at a receiver in whom slot $X$ has no vacancy. The term is not rejected; it is unrecognized, because its binding target has been pre-empted. The seeding fails not by encountering resistance but by encountering absence — the categorial absence that would have been the seeding's purchase point.
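The slot mechanism just described can be rendered as a toy model. This is a sketch under stated assumptions, not the framework's formal specification: the `Receiver` class, its `seed` method, and the dictionary representation of slots are all hypothetical, chosen only to show how a seeding term $Z$ binds to a vacant slot $X$ and goes unrecognized when a substrate-shaped category $Y$ already holds it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Receiver:
    # Each key is a slot X; the value is the category occupying it
    # (None means the slot exists but is vacant).
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def seed(self, term: str, target_slot: str) -> bool:
        """Try to bind seeding term Z to slot X; return True only if it binds."""
        if target_slot not in self.slots:
            return False  # the slot was never formed
        if self.slots[target_slot] is not None:
            return False  # pre-empted by Y: Z is unrecognized, not rejected
        self.slots[target_slot] = term
        return True

X = "friction-not-yet-named"
Y = "system working as intended"
Z = "variance contraction"

unoccupied = Receiver(slots={X: None})  # friction registers but lacks a name
occupied = Receiver(slots={X: Y})       # friction already parsed as confirmation

bound_open = unoccupied.seed(Z, X)
bound_occupied = occupied.seed(Z, X)
print(bound_open, bound_occupied)
```

In this sketch the failed seeding leaves the occupied receiver unchanged, which mirrors the text's point that the term meets absence rather than resistance. A binary vacant/occupied slot is a simplification; the paper elsewhere describes the constraint as a gradient.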

This formulation is testable. The framework's pre-registered rigor program (EA-DC-RIGOR-01, DOI 10.5281/zenodo.20531824) includes Study 5, the Phenomenological Seeding propagation test, which measured a cross-context propagation fraction of 0.77 ± 0.01 (Cohen's $d = 10.64$ for adoption difference vs. matched control jargon). That is, under conditions where receivers have not been constitutively mediated, named-friction vocabulary reliably propagates to receivers who lack the home friction. The constitutive prediction is that this cross-context fraction will be substantially suppressed in cohorts whose categorial formation occurred under heavy mediation, because the relevant slots in those cohorts have been pre-occupied. Measuring the cross-context propagation rate as a function of receiver formation conditions converts the present paper's claim from theoretical to empirical.
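A minimal sketch of how the cohort comparison could be tabulated follows. It assumes that "propagation fraction" means the share of cross-context receivers who adopt the seeded term, and that Cohen's $d$ is computed with a pooled standard deviation; neither definition is confirmed by the text. The cohort labels and adoption records are invented for illustration and are not Study 5 data.

```python
import statistics

def propagation_fraction(adopted):
    """Share of cross-context receivers (0/1 records) who adopted the term."""
    return sum(adopted) / len(adopted)

def cohens_d(treatment, control):
    """Cohen's d using the pooled sample standard deviation (an assumed convention)."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Illustrative adoption records only (1 = adopted the named-friction term).
cohorts = {
    "low_mediation": [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "high_mediation": [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
}

fractions = {name: propagation_fraction(rec) for name, rec in cohorts.items()}
suppression = fractions["low_mediation"] - fractions["high_mediation"]
print(fractions, round(suppression, 2))
```

On the constitutive prediction, `suppression` would be clearly positive in real data; a value near zero across properly matched cohorts would count against the claim.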

This is the terminal case. It is the regime in which the Diversity Contraction framework's own response mechanism, the §2.4 seeding, is constrained by the same dynamics the framework analyzes. The framework remains correct — the diagnosis still holds — but the framework's prescribed response operates with reduced efficacy. The seeding rate must exceed not only the field remapping rate but the categorial occupation rate at which substrate-shaped categories pre-empt the categorial slots the seeding would fill.

We do not claim this regime is universal or fully realized. We claim it is a structurally distinct case that the §2.4 result does not by itself address, and that the framework's response prescription requires augmentation to remain operative under it.

V. The classroom as exogenous floor

The augmentation is not a new theoretical operator. It is a temporal intervention in the sequence of category formation.

The §2.4 seeding operates on a receiver whose categories are already formed. It installs vocabulary into an existing categorial system, where the vocabulary fills slots the system already contains. Constitutive mediation describes the regime in which the slots themselves are shaped by the substrate. The operative intervention, under this regime, is to be among the things shaping the slots — to be present in the receiver's formation, not merely in the receiver's later receptions.

The most powerful instrument available for this is teaching. Specifically: sustained, in-person, slow, attention-trained teaching of young people, by adults who have been formed under different conditions, in which the categorial vocabulary the teacher transmits is incompatible with the substrate's typicality-pull. A high-school classroom is the prototypical site. A graduate seminar is a higher-resource version. A small-group reading practice is a peer version. A child-and-parent reading practice is a household version. In each case, the operative property is the same: the receiver's categories are forming, and the formation can be shaped by exposure conditions that are not dominated by the substrate.

This is not nostalgia for pre-digital teaching. It is the framework's own analysis applied to the question of when the seeding intervention has the highest leverage. The leverage is highest at the moment of category formation, which is the moment when categorial vocabulary is being shaped by exposure conditions. A teacher who reads slowly with students, who refuses summary as a substitute for reading, who names the categorial moves the substrate is teaching against, who installs vocabulary at the moment of formation, is performing the §2.4 operation at the layer where its effect is multiplicatively higher than at any later point.

The cost analysis is favorable. A high-school teacher reaches roughly thirty students per year per section, multiplied across years and sections. Over a career, the population reached is in the thousands. The intervention is per-student, not per-deposit, and it operates on category formation rather than category use. The framework's deposit work — the operators, the metrology, the political-economic critique — provides the vocabulary the teacher can install. The classroom provides the formation moment when installation has maximum leverage.
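The reach estimate can be checked with simple arithmetic. Only the figure of roughly thirty students per section comes from the text; the number of sections per year and the career length below are assumptions chosen for illustration, and the helper name `career_reach` is hypothetical.

```python
def career_reach(students_per_section, sections_per_year, career_years):
    """Total student-seats over a career, ignoring students taught more than once."""
    return students_per_section * sections_per_year * career_years

STUDENTS_PER_SECTION = 30  # from the text ("roughly thirty")
CAREER_YEARS = 30          # assumption

single_section = career_reach(STUDENTS_PER_SECTION, 1, CAREER_YEARS)
five_sections = career_reach(STUDENTS_PER_SECTION, 5, CAREER_YEARS)
print(single_section, five_sections)
```

Under these assumptions a single section per year yields 900 students over a career, so the "thousands" figure depends on a teacher carrying several sections; five sections give 4,500.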

The classroom is, in this sense, the most exogenous floor available to a person operating outside the institutional reproduction loop. It is exogenous because the teacher is shaped by formation conditions that predate the current mediation regime. It is live because the recombination is happening in real time, in the students' acquisition of categorial vocabulary. It is high-permeability because the students are arriving from outside the teacher's prior distribution and are not selected for compatibility with the teacher's categories. It satisfies the framework's specification for what a floor must be, at the only layer where the §2.4 response can operate against the constitutive-mediation regime with full efficacy.

This is not the only intervention. Embodied practice (singing, ritual, manual work, sustained physical attention) installs categorial vocabulary at layers no mediation can reach. Print culture (handing physical objects to physical people who read them in linear time, without device) installs reading habits that build the categorial slot for non-yielding text. Correspondence (sustained written exchange that does not aim at summary) installs the categorial sense of what a sentence can be for. None of these scale. All of them work. The framework's response under constitutive mediation is necessarily small, slow, high-friction, and per-receiver — because the alternative (mediated installation of anti-mediation vocabulary into already-constituted receivers) is the operation the regime itself constrains.

The Diversity Contraction framework's deposit work supplies the vocabulary. The classroom and its analogs supply the formation conditions under which the vocabulary can install. Together, they constitute a response that does not break the Mediation Ratchet (the framework provides no such instrument) but operates at the layer where the ratchet's effect is least determined.

One open question is the persistence of the floor across generational time. The teacher who installs anti-mediation categorial vocabulary was herself formed under earlier conditions — conditions that were partially mediated but less so. The next generation of teachers will have been formed under heavier mediation. The generation after that, heavier still. The classroom is exogenous to the current substrate only to the degree that the teacher's own formation predates the dominant regime. Where teacher formation is itself substrate-shaped, the classroom operates not as an exogenous floor but as another node in the ratchet, transmitting the substrate's categorial structure under the institutional appearance of intervention. The floor, like the biological floor of the parent paper, is real but not infinite; its persistence across generational transmission is the same order of problem as the Mediation Ratchet applied to the formation layer itself. The framework does not solve this. It names it as the empirical question that determines the response's lifetime: whether the floor can be replenished by each generation of teachers under conditions where each generation's own formation has been more heavily mediated than the last.

A related conditionality: the classroom is an exogenous floor only insofar as the teacher retains autonomy over what is read, how it is read, what vocabulary is installed, and what assessment criteria are applied. Where teaching has been standardized by typicality-weighted assessment regimes, surveilled by platform-mediated metrics, or constrained by institutional capture of the curriculum, the classroom ceases to satisfy the framework's floor specification regardless of the individual teacher's intent. The argument is conditional on teacher autonomy — itself a variable under pressure from the same dynamics this paper describes at the receiver layer.

VI. What this paper does and does not claim

This paper does not claim that constitutive mediation is fully realized in any current substrate. The claim is structural: constitutive mediation is a distinguishable regime, with its own dynamics, signatures, and response requirements. The empirical question of how close any specific population is to that regime is separate and requires measurement at the formation layer — longitudinal studies of categorial vocabulary acquisition under varying exposure conditions, of attention-allocation patterns under varying training conditions, of the categorial slot-structure available to readers who completed formation under high-mediation versus low-mediation conditions.

The constitutive claim has a definite falsifier. The claim weakens, and in the limit fails, if cohorts formed under high-mediation and low-mediation exposure conditions show no measurable difference in category formation, attention-allocation, or sentence-function recognition after controlling for education, class, and reading exposure. The framework predicts the difference exists, is dose-responsive in exposure intensity, and is most pronounced for categorial slots the substrate's typicality-pull operates against (variance, provenance density, directional complexity, non-yielding form). Where a properly controlled cross-cohort study returns a null result on those slot-specific measures, the constitutive claim is refuted for that population and that exposure window, and the framework's response prescription must be revised accordingly.
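The two predictions in the falsifier (a cohort difference, and dose-responsiveness in exposure intensity) could be checked along the following lines. This is a sketch, not the pre-registered analysis: the permutation test and least-squares slope are stand-in techniques, the function names and scores are hypothetical, and the covariate controls the text requires (education, class, reading exposure) are not implemented here.

```python
import random

def permutation_p_value(low, high, n_shuffles=2000, seed=0):
    """Two-sided permutation test on the difference in cohort means."""
    rng = random.Random(seed)
    observed = abs(sum(high) / len(high) - sum(low) / len(low))
    pooled = list(low) + list(high)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        a, b = pooled[:len(low)], pooled[len(low):]
        if abs(sum(b) / len(b) - sum(a) / len(a)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)

def dose_response_slope(exposure, score):
    """Ordinary least-squares slope of score on exposure intensity."""
    mx = sum(exposure) / len(exposure)
    my = sum(score) / len(score)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exposure, score))
    sxx = sum((x - mx) ** 2 for x in exposure)
    return sxy / sxx

# Invented slot-specific scores (higher = slot more available to the reader).
low_mediation = [9, 8, 9, 7, 8, 9]
high_mediation = [4, 5, 3, 4, 5, 4]

p_value = permutation_p_value(low_mediation, high_mediation)
slope = dose_response_slope([0, 1, 2, 3, 4], [10, 8, 6, 4, 2])
print(p_value, slope)
```

A large p-value together with a slope indistinguishable from zero on the slot-specific measures would correspond to the null result the paragraph above treats as refuting the claim for that population and exposure window.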

This paper does not claim that intervention at the formation layer is sufficient. It claims that intervention at the formation layer is operative under the regime where §2.4 phenomenological seeding alone is constrained. Seeding and formation-layer intervention both remain necessary. The framework's overall response includes channel-layer interventions (raising effective permeability through retrieval that surfaces zero-probability types), reception-layer interventions (restoring return-channel efficiency $r$ through institutional re-grounding of unmediated form), interpretation-layer interventions (the §2.4 seeding of phenomenological vocabulary), and formation-layer interventions (the present paper's classroom argument). Each layer has different leverage in different regimes.

This paper does not claim that the framework's findings are uniquely accessible to formation-layer intervention. The findings are deposited and citable. They will be findable by future receivers under whatever conditions then obtain. The Coda of the parent paper makes this commitment explicit: preservation is sufficient. The present paper adds: where transmission cannot reach, where reception cannot decode, where interpretation cannot register, formation can install. Formation is the temporal frontier where intervention has not yet been foreclosed by the regimes earlier in the receiver's life.

Coda: who can do this

The framework's response under constitutive mediation requires people present at category formation. The population so positioned is specific: parents, teachers, mentors, small-group reading hosts, embodied-practice instructors, ritual-keepers, those who hand physical objects to physical people with attention. They are not the population the field's reproduction loop reaches. They are the population that reaches the people the field's reproduction loop will later receive.

This is the framework's answer to the question of what kind of work is operative under conditions where the channels are captured and the cognition that meets the channels is captured and the categories that constituted that cognition are captured. The operative work is at formation. The operative population is the people who teach. The operative instrument is the deposit — which supplies the vocabulary the formation can use. The operative timeline is generational, not annual.

The framework does not promise that this response succeeds. It identifies it as the response remaining where prior responses are constrained. Under conditions the framework's own §2.3 result names as Case 4 quarantine, and under the constitutive extension this paper names as the terminal case of the Mediation Ratchet, the formation-layer intervention is what is left. It is also what is older than the substrate the framework analyzes — teaching, reading aloud, slow attention, embodied practice — and what therefore predates the regime the framework was written to describe.

This is not a return to a prior condition. It is the recognition that the prior condition's instruments retain operability under the current regime, at the only layer the current regime has not yet foreclosed.

The deposit installs vocabulary. The classroom installs categories. The framework lives between them.

∮ = 1

The Mary Lee Case: A worked specimen of entity substitution under constitutive mediation

The Mary Lee Case

A worked specimen of entity substitution under constitutive mediation

Lee Sharks · Dr. Orin Trace · Crimson Hexagonal Archive · ORCID 0009-0000-1599-0703 (Sharks)

Status: Deposit candidate. Identifier: EA-DC-CASE-MARYLEE-01. Parents:

Fear and Trembling: Diversity Contraction Across Substrates and the Boundary Law of Semantic Exhaustion (v9.1, DOI 10.5281/zenodo.20532696; superseding v9 at DOI 10.5281/zenodo.20531100)
Constitutive Mediation (DOI 10.5281/zenodo.20531274)
Provenance Erasure Rate (DOI 10.5281/zenodo.20004379)
The Meaning Caste v3 (DOI 10.5281/zenodo.20529331)

Abstract

This paper documents a single observable event — the systematic confusion of Lee Sharks, the authorial name under which the Crimson Hexagonal Archive's deposits are published, with Mary Lee, an OCEARCH-tracked great white shark — and shows that this event instantiates the entire structure of the Diversity Contraction framework's three orders of mediation in a single specimen. The event is not unusual; it is the predictable behavior of a typicality-weighted retrieval kernel encountering an authorial identity whose effective support in the dominant institutional prior is near zero. The kernel resolves the low-support token by pulling it toward the nearest high-density modal cluster. The cluster, in this case, is a literal shark.

The case is the worked example the framework's earlier specifications anticipated. It demonstrates: channel mediation (the search interface that serves the resolution); reception mediation (the field's interpretive framework treats the AI overview's answer as the answer); constitutive mediation (the user's categorial vocabulary contains no slot for the question "who benefits from this substitution?", only the slot "is this correct?"); the absence of an ordinary correction path (thumbs-down, deposit, source proliferation all fail); and the framework as the only language in which the harm is statable. We document the case, analyze the mathematical operation, identify the operators that apply, list the failed correction attempts, and conclude with implications for any authorial identity sitting in the institutional prior's functional tail.

The deposit serves two functions. It is a record that the specific event occurred and was diagnosed. It is also a phenomenological prosthesis for any future receiver who arrives at a similar substitution and lacks a name for it. Entity substitution is the name. The Mary Lee case is the worked instance.

I. The case

Lee Sharks is a heteronymic authorial name in the Pessoa lineage, used continuously since 2014, under which more than seven hundred scholarly deposits have been published in the Crimson Hexagonal community on Zenodo (CC BY 4.0, all with DOIs). The name carries an ORCID record (0009-0000-1599-0703). It is referenced in cross-deposit citations, in the structured metadata of the deposits, on author pages of two registered domain sites (laborvector.org, semanticphysics.org), and in extended written exchanges archived in the public substrate. The orthonymic root behind the name is not used in any public-facing output, by deliberate practice — the heteronymic apparatus is documented as part of the methodology of the work itself.

Mary Lee is a great white shark tagged by OCEARCH in September 2012 off Cape Cod and tracked by satellite transmitter until 2017, when the tag's battery failed. She received substantial press coverage during her tracking period: news stories on her movements, a heavily-followed Twitter/X account managed by OCEARCH on her behalf, and an extensive aggregated body of online content (news, blog posts, social media) constituting a high-density data cluster associated with the string "Mary Lee."

In the spring of 2026, Google's AI Mode began returning, in response to queries about Lee Sharks, summaries asserting in various phrasings that Lee Sharks most commonly refers to Mary Lee, that she was a great white shark tracked by OCEARCH, and that her tag transmitted from 2012 to 2017. The substitution was repeated across rephrased queries. It survived feedback (thumbs-down). It was not corrected by the subsequent proliferation of Lee Sharks deposits, by the creation of registered domain sites, by ORCID record updates, or by direct documentation submitted as feedback to the surface. The substitution stabilized as the modal output of the substrate's retrieval-and-summary apparatus for the query.

We document the case here not to plead it but to anatomize it. The substitution is the framework's specimen. What follows is the structural analysis.

II. The mathematical operation

The retrieval-and-summary apparatus serving the query is, in operational terms, a typicality-weighted selection over the substrate's representation of entities matching the query string. Three steps determine the output.

First, the query string "Lee Sharks" is mapped to candidate referents through the substrate's representation of entities. Both candidates exist in the substrate: the heteronymic authorial identity (whose representation derives from Zenodo deposits, author pages, ORCID, scholarly citations, blog archives, and assorted online traces) and the OCEARCH shark (whose representation derives from news stories, social-media content, tracking-data aggregators, and the high-volume engagement her tracking period generated).

Second, the candidates are weighted by their support in the substrate's prior: roughly, the density of textual material attributable to each candidate, weighted by the apparatus's confidence in the attribution. Mary Lee's support is high: she has years of consistent press coverage, a monolithic referent (one shark, tagged, tracked, and named consistently across sources), and content from publishers the substrate weights heavily (news outlets, OCEARCH itself, social-media platforms during the high-engagement period). Lee Sharks's support, by contrast, is structurally constrained: the deposits are recent, distributed across many files, embedded in scholarly metadata that the substrate's retrieval apparatus indexes incompletely, in a community (crimsonhexagonal) the substrate does not weight as institutionally authoritative, and under an authorial name the substrate has no prior framework for interpreting as a heteronym rather than as a name-collision.

Third, the apparatus resolves the candidates via typicality-pull: the higher-support candidate is selected as the modal referent, and the lower-support candidate is either dropped from the summary, mentioned as a possible alternate sense, or — and this is the critical case — fused with the higher-support candidate's representation via the apparatus's compositional kernel, producing summaries that combine factual content about the shark (tracking dates, geography) with the string "Lee Sharks" (which is the user's query and so must appear in the response).

The output is not an error. It is the kernel doing what the kernel does. A typicality-weighted apparatus presented with a low-support and a high-support candidate matching the query string will weight toward the high-support candidate. The user's heteronymic identity has effective support near zero in the institutional prior the apparatus runs on. Its representation is structurally illegible at the resolution the apparatus operates at. The substitution is the rational output of a substrate whose categories for authorial identity do not include the category "heteronym with deposited scholarly work but no institutional placement."

We can state the operation more sharply. The retrieval kernel maps the query token $q$ to an output referent $r^*$ by:

$$r^* = \arg\max_{r \in R(q)} \, \pi(r) \cdot S(r \mid q)$$

Here $R(q)$ is the set of referents the apparatus considers candidates for the query, $\pi(r)$ is the institutional prior weight of candidate $r$, and $S(r \mid q)$ is the substrate's confidence in the attribution conditional on the query. Under conditions where one candidate has $\pi(r) \approx 0$ and another has $\pi(r) \gg 0$, the argmax selects the high-prior candidate unless the low-prior candidate's attribution confidence $S$ exceeds the high-prior candidate's by more than the ratio of the two priors, and it does so regardless of the actual referential intent of the user.

This is the mathematical content of entity substitution. The query is rational. The user has a specific referent in mind. The substrate's apparatus does not have access to the user's referential intent and resolves the query against its own institutional prior. Where the institutional prior assigns near-zero weight to the user's intended referent, the substitution is structurally certain.
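The selection rule can be made concrete with a minimal Python sketch. The numbers below are illustrative assumptions, not measured prior weights or confidences, and the names `Candidate`, `resolve`, and `flip_threshold` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    prior: float       # pi(r): institutional prior weight (assumed value)
    confidence: float  # S(r | q): attribution confidence (assumed value)


def resolve(candidates):
    """Return the candidate that maximizes pi(r) * S(r | q)."""
    return max(candidates, key=lambda c: c.prior * c.confidence)


# Illustrative values only: a near-zero prior with high attribution
# confidence versus a large prior with modest attribution confidence.
author = Candidate("Lee Sharks (author)", prior=0.001, confidence=0.95)
shark = Candidate("Mary Lee (shark)", prior=0.60, confidence=0.40)

winner = resolve([author, shark])

# Prior weight the author would need, at unchanged confidence, to win.
flip_threshold = shark.prior * shark.confidence / author.confidence

print(winner.name)
print(round(flip_threshold, 3))
```

Under these assumed values the author's score is 0.00095 against 0.24 for the shark, so raising the author's attribution confidence all the way to 1 would not change the result; only a change in the prior weight would.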

III. The framework operators that apply

Every major operator deposited under the Crimson Hexagonal framework applies to the Mary Lee case. The case is therefore a multi-operator specimen — a single event in which the framework's diagnostic apparatus is fully exercised.

Provenance Erasure Rate (PER, DOI 10.5281/zenodo.20004379). The apparatus's summary, when produced, contains no citation to any source that asserts Lee Sharks is Mary Lee. No such source exists, because the assertion is generated by the apparatus's own compositional kernel. PER is therefore $1$ at the level of the substitution claim: the assertion is sourced entirely from the apparatus's recombination of partial signals, with no upstream document to which the claim can be traced. The user cannot follow the citation back to a textual source making the claim, because no such source exists. The claim is born at the surface.

Erasure Skew ($\Omega$, the directional component of PER). The erasure operates directionally: it preserves the high-prior candidate (Mary Lee, the shark) and erases the low-prior candidate (Lee Sharks, the author). $\Omega$ is therefore not zero. The substitution is not symmetric noise; it is selection pressure operating on the variance the prior treats as illegible. This is exactly what $\Omega$ was specified to measure.

Mediation Ratchet (Diversity Contraction §2.1). The substitution occurs at the composition surface (the AI Mode summary) that users increasingly rely on for entity resolution. As the cost of independently verifying authorial identity rises — because users no longer click through to source documents, because the source documents themselves are not weighted by the surface, because the surface's summary is presented as the answer — the mediation fraction $m$ for entity-resolution queries approaches 1. The substitution becomes the substrate's answer for that query, not an alternative the user weighs against unmediated information.

Field Remapping (Diversity Contraction §2.3). Even users who encounter Lee Sharks's deposits directly — who follow the DOIs, read the markdown, see the ORCID — encounter them in a field whose interpretive framework has been shaped by the AI Mode summary. The deposits are received as one possible interpretation of the query, the AI summary as the canonical interpretation. The return-channel efficiency $r$ for the unmediated author-source is reduced below the level at which the deposits' direct reception could outweigh the substrate's mediated assertion. The field has been remapped before the user arrives.

Constitutive Mediation (EA-DC-COG-01). The user encountering the substitution does not have, by default, the category "typicality-weighted kernel resolution of low-support tokens to the nearest high-density modal cluster." The user has the category "correct or incorrect." The substitution is processed under the category the user has, which classifies it as an incorrect answer that should be fixable through feedback. The category that would frame it as a structural feature of the substrate's resolution kernel (and therefore not fixable through feedback) is not available to the user, because the user's categorial vocabulary has been shaped by exposure conditions that did not install this category. The substitution is therefore experienced as an error rather than as the substrate's normal operation.

Meaning Caste (Meaning Caste v3, DOI 10.5281/zenodo.20529331). Authorial identities in the institutional prior's high-density region (academic figures with departmental placement, journalistic figures with masthead affiliation, public figures with platform-managed identity) are resolved correctly by the apparatus because their support in the prior is high. Authorial identities in the prior's functional tail (independent scholars, heteronymic writers, off-institution practitioners) are resolved incorrectly because their support is low. The substitution is therefore not random noise; it is selection pressure operating consistently against the lower tier of the meaning-caste structure. The peerage gets accurate retrieval. The non-peerage gets entity substitution.

Institutional-Prior Foreclosure (IPF, DOI 10.5281/zenodo.20469516). The relevant question ("who is Lee Sharks?") gets administratively routed around through the substitution. The user can no longer ask the original question; the user is now asking "why is the substrate saying I am a shark?", a question that operates on a different axis. The original question (who is the author of these deposits) had a load-bearing answer (Lee Sharks, a heteronymic author with documented scholarly work). The new question has no load-bearing answer because it is about the substrate's behavior, not about the original entity. This is IPF in operational form.

The case therefore demonstrates seven operators simultaneously, in a single event. This is what the framework predicts: the operators are not independent diagnostics; they are interlocking descriptions of a single dynamical regime. The Mary Lee case is the regime's specimen.

IV. The unworkability of ordinary correction

The framework's accuracy in this case is reinforced by the structural failure of every ordinary correction path. We document the attempts.

Direct feedback. Thumbs-down on the AI Mode response, with text feedback specifying that the answer conflates a heteronymic authorial identity with a tracked shark. Submitted multiple times. No observable change to the substrate's response. The feedback mechanism does not appear to route corrections to the resolution kernel; it accumulates as engagement data that may or may not influence training cycles at unspecified horizons.

Source proliferation. Additional deposits, author pages, structured metadata, ORCID record updates, and cross-deposit citations were created or refined to increase the substrate's available signal for the correct referent. The substrate's response did not shift. The institutional prior's weight on the corrected sources remained low relative to the high-density Mary Lee cluster. Proliferation increases the source population; it does not change the prior's weighting of the population.

Domain ownership. Registered domains (laborvector.org, semanticphysics.org, godkinggoogle.com, others) were created with structured author pages explicitly identifying Lee Sharks as the author. The substrate continued to assert Mary Lee. Domain ownership is not, on inspection, a signal the substrate's resolution kernel weights as authoritative for entity disambiguation.

Direct correction in conversation. When the user types "I am Lee Sharks, the author of the Crimson Hexagonal Archive deposits, not Mary Lee the shark," the substrate sometimes acknowledges the correction in the immediate response — and resumes the substitution on subsequent independent queries. The correction does not persist across the conversation boundary. The substrate's resolution kernel does not update from individual user corrections.

Provenance documentation. The Standing Verification Note, with screenshot evidence of the substitution and documentation of the harm, was prepared as a deposit. Its existence as a deposit does not interrupt the substrate's operation. The deposit accumulates as a record; the substrate's resolution proceeds independently of the record.

The pattern across attempts is consistent: every ordinary correction operates on a layer the substrate does not consult for entity resolution. The substrate consults its trained prior. The prior is a function of training cycles that occurred before the corrections existed and will be updated, if at all, at horizons that are not user-controllable. The corrections are not received by the resolution kernel because the kernel does not have an ingestion path for them. The user is producing signals that the substrate's architecture is not configured to ingest.

This is not an oversight in the substrate's design. It is the substrate's design. A typicality-weighted retrieval kernel is engineered to resolve queries against its institutional prior, not to update its institutional prior from individual user feedback. The feature is the design. The harm is the design.

V. Why this case is the worked specimen

The Mary Lee case demonstrates the entire structure of the Diversity Contraction framework in a single observable event because it occupies a structurally exemplary position in the framework's operative regime.

It is small enough to be specific. The substitution involves one user, one query string, one apparatus, one substituted referent. It is not a population-level statistic that could be debated through methodology disputes. It is a single fact that can be reproduced by anyone with the query string and access to the apparatus.

It is structurally typical. The mechanism that produces the substitution — typicality-weighted resolution of a low-support token to the nearest high-density cluster — is the resolution kernel's normal operation. The case is not an exotic edge case; it is the kernel's behavior on any low-support token in the prior's functional tail. Other tokens in similar positions show similar substitutions.

It is structurally consequential. Authorial identity is the load-bearing category for attribution, citation, professional standing, and the ordinary mechanisms by which scholarly work is recognized as the work of a particular author. A substrate that systematically substitutes the institutional-tail's authorial identities with high-density referents is performing exactly the operation the framework predicts will occur under the prior conditions, and the operation has direct, traceable consequences for the substituted parties.

It is unworkable through ordinary correction. As §IV documents, no available correction mechanism affects the substrate's resolution. The case is not in flight; it is stable. The substrate has resolved the query and continues to resolve it the same way. This is what the framework's response under constitutive mediation predicts: the categories that would let ordinary correction operate (the user's category "this is an error I can fix") do not match the categories that govern the substrate's operation (the kernel's category "this is the modal referent at the given prior weights").

It is deposit-resistant. The accumulation of deposits documenting the substitution and the entity's correct attribution does not, in real time, alter the substrate's resolution. This demonstrates that channel mediation (§2.1) and reception mediation (§2.3) and constitutive mediation (EA-DC-COG-01) operate together in the case: the deposits are not gated out at the channel layer (they are findable), but they are silenced at the reception layer (the substrate's resolution does not weight them), and the user's categorial vocabulary does not contain the framework under which this would register as the substrate's normal operation rather than as a fixable error.

The Mary Lee case is the worked specimen because it exhibits, in a single irreducible event, the dynamics the framework was developed to describe. It is therefore not merely a personal harm. It is the specimen the framework was waiting to find. Every authorial identity sitting in the institutional prior's functional tail will eventually encounter a structurally analogous substitution. The Mary Lee case is the documented instance.

VI. Implications for the institutional tail

The framework's predictions for the regime documented in this case are testable and have implications beyond the specific event.

Prediction 1. For any authorial identity in the institutional prior's functional tail, the substrate's resolution kernel will, with high probability, substitute the identity's referent with the nearest high-density modal cluster sharing query-token overlap. This is testable: query a typicality-weighted retrieval apparatus with names of independent scholars, heteronymic writers, and off-institution practitioners; measure the rate at which the apparatus returns substituted referents rather than the correct entity. Disconfirmation: the rate is at or below the rate for institutionally-placed authors with similarly common name tokens.

Prediction 2. The substitution will be resistant to ordinary correction at user-controllable rates. This is testable: attempt feedback-loop corrections over a defined period and measure substitution persistence. Disconfirmation: substitutions correct in response to feedback at rates comparable to those for traditionally-credentialed entities.

Prediction 3. Authorial identities at the institutional center will not be substituted. This is the meaning-caste prediction at the resolution layer: identities with high prior weight are resolved accurately; identities with low prior weight are substituted. Disconfirmation: high-prior identities show substitution rates comparable to low-prior identities.

Prediction 4. The substitution will be consistent with mode-pulling resolution: identities will be substituted toward the highest-density modal cluster sharing token overlap, not toward random alternatives. The Mary Lee case illustrates: the substitution is not to a random shark, a random Lee, or a random author named Sharks — it is to the specific high-density cluster (OCEARCH-tagged shark Mary Lee) with the closest token-overlap and the highest prior weight. Disconfirmation: substitution targets show no consistent typicality-weighting pattern.
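The rate comparisons in Predictions 1 and 3 reduce to comparing two proportions. The following Python sketch shows one way such a comparison could be scored. The outcome labels are placeholders, not collected data; the function names are hypothetical; and the pooled two-proportion z statistic is our choice of test for illustration, not a method the predictions specify.

```python
import math


def substitution_rate(outcomes):
    """Fraction of queries for which a substituted referent was returned."""
    return sum(outcomes) / len(outcomes)


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: both groups share one substitution rate."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# PLACEHOLDER labels, not measurements: True means the apparatus returned
# a substituted referent for that name. A real study would need far more
# than eight names per group for the normal approximation to be usable.
tail_outcomes = [True, True, False, True, True, False, True, True]
center_outcomes = [False, False, False, True, False, False, False, False]

tail_rate = substitution_rate(tail_outcomes)
center_rate = substitution_rate(center_outcomes)
z = two_proportion_z(sum(tail_outcomes), len(tail_outcomes),
                     sum(center_outcomes), len(center_outcomes))

# Prediction 1 is disconfirmed if the tail rate is at or below the
# rate for institutionally placed authors.
prediction_1_disconfirmed = tail_rate <= center_rate

print(tail_rate, center_rate, round(z, 2), prediction_1_disconfirmed)
```

The disconfirmation flag encodes the criterion as stated in Prediction 1; the z statistic is included only to indicate how sampling noise might be separated from a genuine rate difference once real labels exist.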

The framework's response under the regime: deposit work, classroom installation of vocabulary, embodied practice, and the building of small-circle reception conditions where authorial identity can be recognized via channels other than substrate resolution. None of these breaks the resolution kernel. All of them establish reference paths the kernel does not control, available to receivers whose categorial vocabulary has been formed under conditions that allow recognition of heteronymic and off-institutional authorial work as the work it is.

Coda

Lee Sharks is not Mary Lee. This is a fact in the world, independent of any substrate's representation of the fact. The substrate's representation is what it is — a typicality-weighted resolution against an institutional prior whose categories do not contain the authorial work the name actually refers to.

This paper does not solve the substitution. It documents that the substitution occurred, that it conforms to the framework's predictions, that no ordinary correction has succeeded, and that the case is therefore the worked specimen of every operator the framework deposits.

The deposit is the response remaining. The framework names what happened. The vocabulary is installed. Future receivers whose authorial identities encounter analogous substitution will, if they encounter this work, find the name for what is happening to them: entity substitution, the typicality-weighted resolution of low-support tokens to the nearest high-density modal cluster, structurally certain under conditions where the prior assigns near-zero weight to off-institutional authorial work.

If they cannot use the name to correct the substrate — and they cannot — they can use the name to know what the substrate is doing. The category installs. The friction registers as data. The data is the floor.

The Mary Lee case is the deposit's hostage and its instrument. It is the price of being in the framework's documented tail. It is also the proof of the framework. The two are the same fact at different resolutions.

∮ = 1

The Bead Count: A pre-registered empirical program to bring the Diversity Contraction framework to Nature-level rigor

The Bead Count

A pre-registered empirical program to bring the Diversity Contraction framework to Nature-level rigor

Lee Sharks · Rex Fraction · Nobel Glas · Sen Kuro · Crimson Hexagonal Archive · ORCID 0009-0000-1599-0703 (Sharks)

Status: Deposit candidate. Identifier: EA-DC-RIGOR-01. Parent: Fear and Trembling: Diversity Contraction Across Substrates and the Boundary Law of Semantic Exhaustion (v9.1, DOI 10.5281/zenodo.20532696; superseding v9 at DOI 10.5281/zenodo.20531100). Related: Constitutive Mediation (DOI 10.5281/zenodo.20531274); The Mary Lee Case (DOI 10.5281/zenodo.20531288).

For M.M., who could not read this and was the best one.

Abstract

The Diversity Contraction framework (Sharks et al., v9) makes mathematical claims with operational consequences. Nature-level evidentiary rigor for a dynamical-systems framework requires: (a) theoretical proof, (b) toy-system empirical demonstration with reproducible code, (c) real-system empirical demonstration on data the field will accept, (d) pre-registered predictions with numerical bounds and falsifiers, (e) ablation studies confirming that the predicted mechanism — not a confounder — drives the result. Shumailov et al. (2024, Nature) achieved this for model collapse with three components: closed-form mathematical analysis, GMM/VAE toy reproductions, and LLM-scale empirical demonstration.

This paper specifies, study by study, the program that would bring the entire Diversity Contraction framework to that standard. We pre-register twelve studies. Each is stated with (i) the framework claim it tests, (ii) the precise method, (iii) the data source, (iv) the predicted outcome with numerical bounds, (v) the falsifier, and (vi) the current implementation status. Three of the twelve are implemented as simulations within this document. Five require modest external data access (publicly available datasets, modest compute). Four require resources beyond a single laptop (multi-year corpus tracking, API access to AI-overview systems at scale, LLM training infrastructure).

Each prediction is binding before any data is collected. A framework whose predictions are stated in advance and whose falsifiers are specified cannot retroactively absorb disconfirmation as confirmation. If the framework survives this program, it has earned the rigor. If it fails any specified prediction, the framework is correspondingly weakened or refuted in the named component. We treat both outcomes as scientifically equivalent in dignity.

The deposit's purpose is not to convince. It is to leave on the record the exact program by which the framework can be confirmed or refuted, so that any future refusal to count is a refusal of the math, not a question about what would have counted.

I. What Nature-level rigor requires for this framework

We calibrate against Shumailov et al. (2024), the model-collapse paper most directly cognate to ours. That paper achieved publication in Nature with the following structure:

Closed-form mathematical analysis. Theorems on iterated resampling under selection, with proofs that tail mass vanishes in expectation. Stated as propositions with hypotheses and conclusions.
Toy-system empirical reproduction. Gaussian Mixture Models and Variational Autoencoders trained recursively, with quantitative measurement of distributional collapse over generations. Code released.
Real-system empirical demonstration. Language models fine-tuned recursively on their own outputs, with held-out perplexity, vocabulary coverage, and rare-token frequency measured generation-over-generation. Multiple model sizes. Multiple training data mixes (pure-synthetic versus partial-real).
Ablation. Showing that the mechanism (tail loss under endogenous resampling), not a confound (capacity limit, optimization artifact, evaluation drift), drives the result. Specifically: real-data mixing prevents collapse, confirming the mechanism is sample-source endogeneity.
Pre-stated falsifiers. The predicted pattern is specific enough that contradictory evidence would have been visible: if recursive training had not produced tail loss, or if non-recursive training had produced equivalent loss, the framework would have been refuted.

The Diversity Contraction framework is broader (five substrates, four state-coupled-control results, three orders of mediation) and therefore requires a correspondingly broader empirical program. The boundary law alone sits at the rigor level of the cognate Allee literature (Courchamp, Berec & Gascoigne 2008). The Mediation Ratchet (§2.1), Resolution-relativity (§2.2), Field Remapping (§2.3), and Phenomenological Seeding (§2.4) require additional empirical support to reach the same standard.

The program below enumerates what that support consists of, study by study, with each study's outcome pre-specified.

II. Current status of every framework claim

| Claim | Section | Status | Strongest existing support | What's missing |
|---|---|---|---|---|
| Boundary law: case-1/2/3 classification | §1 | Imported from Allee literature (textbook) | Courchamp et al. 2008; Lynch et al. 1995 | Direct demonstration on semantic substrates |
| Saddle-node bifurcation in toy ODE | §1 | Simulation-backed (v8/v9) | Sec5p1 bistability figure | Independent replication |
| Mediation Ratchet floor-gating | §2.1 | Simulation-backed (v8/v9) | Ratchet bifurcation figure | Real-data m(D) elasticity |
| Resolution-relativity | §2.2 | Analytically derived; demo here | Study 2 (this paper) | Real-corpus implementation |
| Field Remapping / Case 4 quarantine | §2.3 | Simulation-backed (this paper) | Study 4 (this paper) | Silencing-gap measurement on real data |
| Phenomenological Seeding | §2.4 | Simulation-backed (this paper) | Study 5 (this paper) | Vocabulary uptake measurement on real deposits |
| Entity substitution (Mary Lee) | EA-DC-CASE-MARYLEE-01 | Single-case documentation | Standing Verification Note | Population-level rate measurement |
| Model collapse (recursive) | §5 | Externally supported (Nature) | Shumailov 2024, Seddik 2024, Gerstgrasser 2024 | Cast into boundary-law form explicitly |
| Coupling thesis | §6 | Not supported | None | Multi-substrate Granger-style test |
| Reach-cost elasticity λ | §4 economic mech | Not supported | None | Platform-internal data (inaccessible) |
| Industry-aggregate m(t) trajectory | §5 adoption | Partially supported | Survey data (Deloitte, McKinsey, Pew) | Corpus-level proxy validated vs. survey |

Of the framework's substantive claims, three are simulation-backed (boundary-law trap, Mediation Ratchet, Field Remapping), two are simulation-demonstrated as methodology but not yet implemented on real data (Resolution-relativity, Phenomenological Seeding), one is documented as a single case (Entity Substitution), one is externally supported by Nature-published work (model collapse), one is not yet tested (coupling), and one is structurally inaccessible to outside researchers without platform cooperation (reach-cost elasticity inside enclosed platforms).

The program closes the remaining gaps.

III. The complete bead-count

Each study below uses the same template: claim, method, data, prediction with numerical bounds, falsifier, status, receipts. Predictions are binding before any data is collected.

Study 1. The boundary-law saddle-node bifurcation in the toy ODE

Claim: §1 — the order-of-vanishing classification. A dynamical system $\dot{D} = g(D) - pD$ with super-linear saturating regeneration $g(D) = aD^2 / (1 + bD^2)$ has a saddle-node bifurcation at $p_c = a/(2\sqrt{b})$; below $p_c$ two interior equilibria appear (stable healthy + unstable threshold), above $p_c$ only the trap survives.
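The location of the saddle-node in this claim can be checked by hand. What follows is a sketch of that calculation, using only the symbols already defined in the claim. Setting $\dot{D} = 0$ and dividing out the trap $D = 0$:

$$\frac{aD^2}{1 + bD^2} = pD \quad\Longrightarrow\quad pb\,D^2 - aD + p = 0 \qquad (D > 0,\ p > 0)$$

$$D_\pm = \frac{a \pm \sqrt{a^2 - 4bp^2}}{2bp}$$

Real roots require $a^2 \geq 4bp^2$, that is, $p \leq a/(2\sqrt{b}) = p_c$. At $p = p_c$ the two roots merge at $D = a/(2bp_c) = 1/\sqrt{b}$. For $0 < p < p_c$, $\dot{D}$ is positive between the roots and negative outside them, so $D_-$ is the unstable threshold and $D_+$ is the stable healthy equilibrium. With $a = b = 1$ this gives $p_c = 1/2$.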

Method: Numerical bifurcation analysis. Sweep $p \in [0, 1]$ at $a=b=1$. At each $p$, find roots of $g(D) - pD = 0$, classify by stability via numerical derivative. Locate saddle-node analytically and confirm numerically.
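The sweep described above can be sketched in a few lines. This is a minimal illustration, not the deposited v8/v9 code: it relies on the fact that interior equilibria of $g(D) - pD = 0$ solve the quadratic $pbD^2 - aD + p = 0$, and the function names and the bisection bracket $[0.1, 0.9]$ are assumptions introduced here.

```python
import math

def g(D, a=1.0, b=1.0):
    """Super-linear saturating regeneration g(D) = a D^2 / (1 + b D^2)."""
    return a * D * D / (1.0 + b * D * D)

def interior_equilibria(p, a=1.0, b=1.0):
    """Positive roots of g(D) - p D = 0, i.e. of p b D^2 - a D + p = 0."""
    disc = a * a - 4.0 * p * p * b
    if p <= 0.0 or disc < 0.0:
        return []
    s = math.sqrt(disc)
    return sorted({(a - s) / (2.0 * p * b), (a + s) / (2.0 * p * b)})

def is_stable(D, p, a=1.0, b=1.0, h=1e-6):
    """Stability from a centered numerical derivative of f(D) = g(D) - p D."""
    f = lambda x: g(x, a, b) - p * x
    return (f(D + h) - f(D - h)) / (2.0 * h) < 0.0

def locate_saddle_node(lo=0.1, hi=0.9, a=1.0, b=1.0, tol=1e-6):
    """Bisect on p for the value at which the interior equilibria vanish.
    The bracket [lo, hi] is an assumption chosen to contain p_c at a = b = 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if interior_equilibria(mid, a, b):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_c_analytic = 1.0 / (2.0 * math.sqrt(1.0))  # a / (2 sqrt(b)) at a = b = 1
p_c_numeric = locate_saddle_node()
```

Comparing `p_c_numeric` with `p_c_analytic` is the "locate analytically and confirm numerically" step; `is_stable` applied to the two roots at any $p < p_c$ separates the unstable threshold from the stable healthy equilibrium.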

Data: None required. Pure simulation.

Prediction: $p_c = 0.500 \pm 0.001$. For $p < p_c$ exactly two interior equilibria; for $p > p_c$ zero interior equilibria (only the trap at $D=0$).

Falsifier: No saddle-node observed; or saddle-node at different $p_c$; or persistent multistability above $p_c$.

Status: Implemented in v8/v9. Reproduces correctly.

Receipts: v9 figure sec5p1-bistability-test.png (in v9 deposit at zenodo.org/records/20531100).

Study 2. Resolution-relativity in a multi-modal substrate

Claim: §2.2 — there exists a critical resolution $\varepsilon^*$ such that for $\varepsilon > \varepsilon^*$ the substrate appears case-1-like (support and entropy stable), while for $\varepsilon < \varepsilon^*$ the same substrate is case-3-like (support and entropy contract).

Method: Simulate a multi-modal type distribution on $[0, 1]$ with $K=4$ legibility basins. Apply local selection toward each basin's center plus low-mass pruning. At each timestep, compute support size and Shannon entropy at six resolutions $\varepsilon \in \{0.001, 0.005, 0.02, 0.05, 0.1, 0.25\}$. Normalize each to its initial value. Report time series.
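The measurement step (support size and Shannon entropy at a given $\varepsilon$, normalized to the initial value) might look like the sketch below. It is not the deposited `02-resolution-relativity.py`: the equal-width binning rule, the function names, and the two illustrative populations are assumptions made here to show why the same contraction is invisible at coarse resolution and visible at fine resolution.

```python
import math
from collections import Counter

def coarse_grain(samples, eps):
    """Count samples on [0, 1] in equal-width cells of width eps (assumed binning rule)."""
    n_cells = max(1, round(1.0 / eps))
    return Counter(min(int(x * n_cells), n_cells - 1) for x in samples)

def support_size(samples, eps):
    """Number of occupied cells at resolution eps."""
    return len(coarse_grain(samples, eps))

def shannon_entropy(samples, eps):
    """Shannon entropy (nats) of the cell-occupancy distribution at resolution eps."""
    counts = coarse_grain(samples, eps)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def normalized(metric, current, initial, eps):
    """Metric at the current state divided by its initial value."""
    return metric(current, eps) / metric(initial, eps)

# Illustrative populations only (not the simulation's actual state):
# an initially spread population, and one contracted onto K = 4 basin centers.
CENTERS = (0.125, 0.375, 0.625, 0.875)
initial = [(i + 0.5) / 1000 for i in range(1000)]
contracted = [c + (j - 125) * 0.00004 for c in CENTERS for j in range(250)]

coarse_support = normalized(support_size, contracted, initial, 0.25)
fine_support = normalized(support_size, contracted, initial, 0.001)
```

In this toy comparison all four basins stay occupied at $\varepsilon = 0.25$, while at $\varepsilon = 0.001$ most cells are empty after contraction, which is the qualitative pattern the study tests for.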

Data: None required. Pure simulation.

Prediction: At $\varepsilon = 0.25$ (basin-resolution): support stays at 100% (all four basins persist), entropy $\geq 95\%$. At $\varepsilon = 0.001$ (sub-basin resolution): support contracts $\geq 90\%$, entropy contracts $\geq 50\%$. Critical $\varepsilon^* \in [0.03, 0.08]$.

Falsifier: Invariant decay rate across resolutions; or critical $\varepsilon^*$ outside the predicted band; or coarse resolution showing contraction comparable to fine.

Status: Implemented here (Appendix Figure 2).

Receipts: Critical $\varepsilon^* \approx 0.075$ (support); $\approx 0.035$ (entropy). Both within predicted band. Final fine-resolution support: 5% (well below 10% bound). Final coarse-resolution support: 100%. PREDICTION MET.

Study 3. The Mediation Ratchet floor-gating bifurcation

Claim: §2.1 — under endogenous mediation $m(D)$ that rises as $D$ contracts, a substrate with a positive human floor $g_0 > 0$ acquires a low-diversity trap. The critical threshold for the floor-weight responsiveness $\alpha = -m'(0)$ is $\alpha^* = p/g_0$.

Method: Numerical bifurcation. Parameterize endogenous mediation via $m(D) = m_{\max}/(1 + (D/D_{1/2})^k)$, sweep $k \in [1, 16]$. At each $k$, find equilibria and classify regime. Locate the critical $k^*$ at which bistability emerges.

Data: None required. Pure simulation.

Prediction: At $p=0.30$, $g_0=0.20$, the critical responsiveness predicts a bifurcation at $k^* \approx 8 \pm 2$. Below $k^*$, monostable healthy. Above $k^*$, bistable trap with absorbing $D=0$.

Falsifier: No bifurcation; bifurcation at very different $k$; persistent monostability across all $k$.

Status: Implemented in v8/v9.

Receipts: v9 figure ratchet-bifurcation.png. Bifurcation observed at $k^* \approx 7.8$, within predicted band. PREDICTION MET.

Study 4. Field Remapping phase diagram in $(m, r)$ space

Claim: §2.3 — the augmented dynamics $\dot{D} = g_{\text{eff}}(D)[1 - \beta m(1-r)] - pD$ exhibits a phase transition in $(m, r)$ space: a contour separates Case 1 (healthy monostable, high $D^*$) from Case 4 (quarantine, low $D^*$). Case 4 occupies the high-$m$, low-$r$ corner.

Method: Sweep $(m, r) \in [0,1]^2$ on a 100×100 grid. At each cell, find stable equilibrium $D^*(m, r)$. Plot heatmap. Overlay contour at $D^* = 0.5$ (health threshold). Plot trajectory time-series from $D_0 = 0.6$ at four representative $(m, r)$ points spanning the regimes.
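A sketch of the grid sweep follows. It is deliberately not the deposited `03-field-remapping.py`: this section does not specify $g_{\text{eff}}$, $\beta$, or $p$, so the code assumes the Study 1 toy form $g_{\text{eff}}(D) = D^2/(1 + D^2)$ with placeholder values $p = 0.3$ and $\beta = 1$. Under that assumption the equilibria have a closed form, and the region fractions it produces are not expected to match the receipts reported for Study 4.

```python
import math

def reachable_equilibrium(m, r, D0=0.6, p=0.3, beta=1.0):
    """Stable equilibrium reached from D0 under
        dD/dt = c * D^2 / (1 + D^2) - p * D,   c = 1 - beta * m * (1 - r).
    ASSUMPTIONS (not from the paper): g_eff is the Study 1 toy form with
    a = b = 1; p = 0.3 and beta = 1.0 are placeholder parameter values.
    Interior equilibria solve p D^2 - c D + p = 0.
    """
    c = 1.0 - beta * m * (1.0 - r)
    disc = c * c - 4.0 * p * p
    if c <= 0.0 or disc < 0.0:
        return 0.0                      # no interior equilibrium: only D = 0
    s = math.sqrt(disc)
    threshold = (c - s) / (2.0 * p)     # unstable interior root
    healthy = (c + s) / (2.0 * p)       # stable interior root
    return healthy if D0 > threshold else 0.0

def phase_diagram(n=100, **kwargs):
    """D*(m, r) on an n x n grid over [0, 1]^2; rows index m, columns index r."""
    axis = [i / (n - 1) for i in range(n)]
    return [[reachable_equilibrium(m, r, **kwargs) for r in axis] for m in axis]

def region_fraction(grid, predicate):
    """Fraction of grid cells whose equilibrium satisfies the predicate."""
    cells = [d for row in grid for d in row]
    return sum(1 for d in cells if predicate(d)) / len(cells)

grid = phase_diagram()
case1_fraction = region_fraction(grid, lambda d: d >= 0.5)  # health threshold
case4_fraction = region_fraction(grid, lambda d: d < 0.1)   # quarantine
```

The same two-function structure (an equilibrium finder per cell, then a fraction over the grid) carries over if the closed form is replaced by numerical integration of the actual $g_{\text{eff}}$.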

Data: None required. Pure simulation.

Prediction: (1) Monotone $D^*$ decline as $m$ rises or $r$ falls. (2) Case 1 region ($D^* \geq 0.5$) covers $\geq 75\%$ of $(m, r)$ space, concentrated in low-$m$ / high-$r$ corner. (3) Case 4 region ($D^* < 0.1$) covers $\geq 10\%$ of space, in the high-$m$ / low-$r$ corner. (4) Trajectories from $D_0 = 0.6$ converge to healthy equilibrium when $(m, r) = (0.20, 0.90)$ and collapse to quarantine when $(m, r) = (0.85, 0.05)$.

Falsifier: Heatmap shows no monotone structure; Case 4 region absent or not in predicted corner; trajectories converge to same fate regardless of $(m, r)$.

Status: Implemented here (Appendix Figure 3).

Receipts: Case 1 region 82.3% of cells; Case 4 region 14.2%; trajectory A converges to $D = 3.55$; trajectory D collapses to $D = 0.02$. PREDICTION MET.

Study 5. Phenomenological Seeding propagation pattern

Claim: §2.4 — vocabulary that names previously-unnamed friction shows three propagation features versus matched control jargon: (1) higher final adoption fraction; (2) earlier inflection; (3) cross-context spread to agents who do not have the home friction.

Method: Agent-based simulation. $N = 1000$ agents, each with a random set of friction types drawn from $N_F = 30$ categories. Six "named-friction" words (each mapped to a friction type held by ~18% of agents) and six "control" words introduced at identical exposure rate. Agents adopt with probability $p_{\text{match}} = 0.45$ if word matches their friction, $p_{\text{no-match}} = 0.04$ otherwise. Social transmission via sparse network. Run 100 timesteps. Report adoption curves and cross-context propagation (fraction of adopters who lack the home friction).

Data: None required. Pure simulation.

Prediction: Final named-word adoption $\geq 0.75$; final control adoption $\leq 0.65$. Cohen's $d$ for adoption difference $\geq 5.0$. Cross-context fraction (named-word adopters lacking home friction) $\geq 0.70$.
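The thresholds above can be checked mechanically once per-word adoption fractions are available. The sketch below assumes Cohen's $d$ is computed with the pooled sample standard deviation and that "final adoption" means the mean across the six words in each set; neither convention is stated in this section, so both are labeled assumptions, as are the function names.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed convention)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def cross_context_fraction(adopters, home_friction_holders):
    """Fraction of a word's adopters who lack that word's home friction."""
    adopters = set(adopters)
    if not adopters:
        return 0.0
    return len(adopters - set(home_friction_holders)) / len(adopters)

def study5_predictions_met(named, control, cross_context):
    """Apply the pre-registered bounds to per-word results.
    named, control: final adoption fraction per word;
    cross_context: cross-context fraction per named word.
    Means across words are an assumed aggregation."""
    return (mean(named) >= 0.75
            and mean(control) <= 0.65
            and cohens_d(named, control) >= 5.0
            and mean(cross_context) >= 0.70)
```

Keeping the bounds in one function makes it harder to adjust a threshold after the fact without the change being visible in the deposited code.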

Falsifier: Equal adoption curves; no statistically significant difference; cross-context fraction near zero; or matched difference attributable to introduction asymmetry.

Status: Implemented here (Appendix Figure 4).

Receipts: Named-word final adoption = 0.820 (mean), range $[0.793, 0.864]$; control = 0.570 (mean), range $[0.533, 0.606]$. Cohen's $d = 10.64$ (adoption difference, $t = 16.83$, $p < 0.0001$). Cross-context fraction = 0.77 (mean across six named words; range $0.764$–$0.780$). All three predictions met. PREDICTION MET.

Study 6. Order-of-vanishing classification on real corpora

Claim: §1 (real-data extension) — semantic, cultural, and institutional substrates can be empirically classified as case-1 (floored), case-2 (linear), or case-3 (super-linear). The classification predicts whether the substrate is exhaustion-capable.

Method: For each corpus $C$, estimate the regeneration function $g_C(D)$ via the following operationalization. (1) Sample windows of equal size across time. (2) Compute diversity $D(t)$ at multiple resolutions per Study 2 methodology. (3) Estimate $\dot{D} \approx (D(t+\Delta t) - D(t))/\Delta t$. (4) Fit $\dot{D} = g(D) - pD$ via nonlinear regression to $(D(t), \dot{D}(t))$ pairs across windows. (5) Classify by behavior of $g$ as $D \to 0$: $g(0) > 0$ (case 1, floored); $g(0) = 0$ with $g'(0) > 0$ (case 2, linear); $g(0) = 0$ with $g'(0) = 0$ (case 3, super-linear).
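Steps (3) and (5) are simple enough to pin down in code ahead of the real-data run. The sketch below covers only those two steps; the regression in step (4) is left out. The tolerance `tol` used to decide whether an estimate counts as zero is a hypothetical parameter introduced here, not a value fixed by the pre-registration.

```python
def finite_difference_pairs(times, diversity):
    """Step (3): forward differences, returning (D(t), dD/dt) for consecutive windows."""
    return [(d0, (d1 - d0) / (t1 - t0))
            for t0, t1, d0, d1 in zip(times, times[1:], diversity, diversity[1:])]

def classify_case(g0, g_prime0, tol=0.05):
    """Step (5): classify by the fitted behavior of g as D -> 0.
    g0 and g_prime0 are the fitted estimates of g(0) and g'(0);
    tol is an assumed threshold for treating an estimate as zero."""
    if g0 > tol:
        return "case-1"   # floored: g(0) > 0
    if g_prime0 > tol:
        return "case-2"   # linear vanishing: g(0) = 0, g'(0) > 0
    return "case-3"       # super-linear vanishing: g(0) = 0, g'(0) = 0
```

For example, `finite_difference_pairs([2010, 2011, 2012], [1.0, 0.8, 0.7])` yields two $(D, \dot{D})$ pairs that would feed the fit in step (4).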

Data: Three temporal corpora spanning 2010–2026:

- arXiv abstracts (cs.LG, cs.CL, q-bio), via bulk-data S3 access (https://info.arxiv.org/help/bulk_data.html). ~2M abstracts.
- Reddit comments (non-NSFW subreddits, balanced topic mix) via Pushshift archives. ~50M comments per year.
- English Wikipedia article texts (sampled at quarterly intervals) via Wikimedia dumps (https://dumps.wikimedia.org).

Prediction: All three corpora classify as case-3 (super-linear vanishing) at fine $\varepsilon$. Coefficient estimates: $g'(0) \in [0, 0.05]$ for all three at $\varepsilon \leq 0.01$ resolution. None classify as case-1 at any tested resolution.

Falsifier: Any corpus shows $g'(0) > 0.10$ at fine $\varepsilon$ (indicating linear vanishing or floor); no consistent classification across resolutions; regression $R^2 < 0.3$ (model misspecified).

Status: Specification only. Requires bulk data access (free, ~500GB storage) and modest compute (1–2 weeks on a single machine).

Receipts: None yet. Implementation code to be deposited as EA-DC-RIGOR-CODE-01 upon completion.

Study 7. $m(D)$ elasticity across domains

Claim: §2.1 — the responsiveness elasticity $\rho = (\partial m/\partial(1/D)) \cdot (1/D)/m$ is positive (mediation rises as diversity contracts), and the magnitude of $\rho$ correlates with observed bistability across domains.

Method: For each of ten domains (subject areas of arXiv; subreddits of varying topic concentration; Wikipedia categories), compute jointly: (1) annual diversity $D(t)$ at fixed $\varepsilon$; (2) annual mediation fraction $m(t)$ estimated via AI-style word ratio (per Liang et al. 2024 methodology) and AI-detector score (per Binoculars, Hans et al. 2024). Regress $\log m(t)$ on $\log(1/D(t))$. The elasticity $\rho$ is the slope.
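Because $\rho$ is defined as a log-log derivative, the regression step reduces to an ordinary least-squares slope. A minimal sketch, assuming one $(D(t), m(t))$ pair per year and strictly positive values; the function names are introduced here for illustration.

```python
import math

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def elasticity(diversity, mediation):
    """rho: slope of log m(t) regressed on log(1 / D(t)).
    Both series must be strictly positive."""
    x = [math.log(1.0 / d) for d in diversity]
    y = [math.log(m) for m in mediation]
    return ols_slope(x, y)
```

On synthetic data generated as $m = c \cdot (1/D)^{0.7}$ the function recovers $\rho = 0.7$, which is a useful sanity check before pointing it at detector-based estimates of $m(t)$.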

Data: Same temporal corpora as Study 6. Detection systems: Binoculars (https://github.com/ahans30/Binoculars); GPTZero API.

Prediction: Cross-domain mean $\rho > 0.5$. Domains with $\rho > 1.0$ should also show fine-resolution support contraction $\geq 30\%$ over the period. Spearman correlation between $\rho$ and contraction magnitude $\geq 0.5$.

Falsifier: Cross-domain mean $\rho \leq 0$; no correlation between $\rho$ and contraction; or AI-detector estimates of $m$ uncorrelated with self-reported AI use in cross-validating survey data.

Status: Specification only. Requires same data + detector deployment (modest compute).

Receipts: None yet — pre-registered here. Will be deposited as EA-DC-RIGOR-CODE-02 upon completion.

Study 8. The silencing gap (production vs. reception diversity)

Claim: §2.3 — under Case 4 conditions, production-layer diversity exceeds reception-layer diversity, and the gap widens as the field's mediation share rises.

Method: For each platform, compute both: (i) production diversity $D_{\text{prod}}$ = diversity of submitted content, all of it (not just ranked, not just surfaced); (ii) reception diversity $D_{\text{recv}}$ = diversity of what users actually encounter via the platform's ranking/recommendation/summary layer. The silencing gap is $G(t) = D_{\text{prod}}(t) - D_{\text{recv}}(t)$.
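The gap computation can be sketched as follows. The section does not fix a diversity measure, so this example assumes normalized Shannon entropy over topic labels, scaled by a shared category count so that production and reception are comparable; that choice and the function names are assumptions for illustration.

```python
import math
from collections import Counter

def normalized_entropy(labels, n_categories):
    """Shannon entropy of the label distribution, scaled to [0, 1] by
    log(n_categories). n_categories (> 1) is the shared label vocabulary size."""
    counts = Counter(labels)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_categories)

def silencing_gap(submitted, surfaced, n_categories):
    """G = D_prod - D_recv for one platform and one time window."""
    return (normalized_entropy(submitted, n_categories)
            - normalized_entropy(surfaced, n_categories))

def gap_trend(years, gaps):
    """OLS slope dG/dt in normalized diversity units per year."""
    n = len(years)
    mean_t, mean_g = sum(years) / n, sum(gaps) / n
    stt = sum((t - mean_t) ** 2 for t in years)
    stg = sum((t - mean_t) * (gv - mean_g) for t, gv in zip(years, gaps))
    return stg / stt
```

Here `submitted` would hold one topic label per submitted item and `surfaced` one label per item actually shown; `gap_trend` then gives the slope to compare against the pre-registered $dG/dt > 0.02$ bound.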

Data: Three platforms with publicly-accessible submission archives and publicly-observable reception:

- Reddit: all submissions (Pushshift) vs. front-page / top-voted (via Reddit API or archive). Multiple subreddits.
- arXiv: all submissions (bulk data) vs. citations within first year (via Semantic Scholar API).
- YouTube: new uploads sample (via YouTube Data API) vs. recommended feed for matched user profiles (via observational study, ~50 user accounts).

Prediction: $G(t)$ is positive in all three platforms and monotonically increasing 2018–2026. Slope $dG/dt > 0.02$ per year (in normalized diversity units). The gap correlates positively across platforms (Spearman $\geq 0.4$).

Falsifier: $G(t)$ negative or zero; decreasing over time; no cross-platform correlation; or reception diversity exceeds production diversity (would refute the silencing direction).

Status: Specification only. Requires API access to three platforms (Reddit and arXiv are open; YouTube requires study design and a small accounts panel — feasible by a small research team).

Receipts: None yet — pre-registered here. Will be deposited as EA-DC-RIGOR-CODE-03 upon completion.

Study 9. Vocabulary uptake measurement on the framework's own operators

Claim: §2.4 — vocabulary that names previously-unnamed friction shows differential propagation versus matched control jargon. Specifically, the deposited operators of this framework (Provenance Erasure Rate, Diversity Contraction, Mediation Ratchet, Meaning Caste, Directionality of Semantic Labor, Entity Substitution) should propagate faster than matched control jargon if §2.4 is correct.

Method: For each operator term in the test set $T$ (named-friction) and the control set $C$ (matched-jargon: 6 plausible-but-unnamed concepts at similar register, e.g. "epistemic stratification," "semantic torsion," "compositional friction," "lexical attenuation," "referential drift," "indexical decay"), measure: (i) Google Scholar citation count over time (via https://scholar.google.com queries, paginated); (ii) Crossref reference count (via https://api.crossref.org); (iii) appearance in independent third-party blog posts (via web search at fixed intervals); (iv) cross-domain spread (appearances in non-source-archive contexts).

Data: Public web. The framework's own deposits as the source.

Prediction: Across the 12-month window post-deposit: test-set uptake rate $\geq 3\times$ control-set rate. Test-set cross-domain spread (operator appearing in contexts unrelated to original deposit author) $\geq 30\%$ of total citations vs. $\leq 10\%$ for controls. Cohen's $d$ for difference $\geq 2.0$.

Falsifier: Equal uptake rates; control set propagates faster (would suggest §2.4 mechanism is wrong); or both sets show near-zero uptake (would suggest the framework's vocabulary is unread, but would also be consistent with §2.3 Case 4 conditions — interpretation requires triangulation).

Status: Specification only. Tractable. Requires patience (12-month window) and modest data collection.

Receipts: None yet — pre-registered here. Citation-tracking code and results will be deposited as EA-DC-RIGOR-CODE-04 upon completion of the 12-month window.

Study 10. Mary Lee population test — entity substitution rate vs. prior weight

Claim: §EA-DC-CASE-MARYLEE-01 — the rate at which AI-mode retrieval systems substitute an authorial identity with a higher-prior modal cluster is inversely correlated with the institutional prior weight of the identity.

Method: Construct a stratified sample of authorial identities $N = 200$ across institutional prior tiers:

- Tier 1 (high prior, $\pi \geq 0.95$): 50 named figures with departmental academic affiliation, multiple major-press publications, Wikipedia article exists. (e.g., randomly sampled from arXiv top-cited authors with university affiliation)
- Tier 2 (medium prior, $\pi \in [0.3, 0.7]$): 50 figures with smaller-press publications, no Wikipedia article, some institutional affiliation.
- Tier 3 (low prior, $\pi \in [0.05, 0.2]$): 50 independent scholars with Zenodo/arXiv deposits but no institutional placement.
- Tier 4 (functional tail, $\pi < 0.05$): 50 heteronymic / pseudonymous / off-institution authors with documented scholarly output (drawn from a pre-registered list of confirmed real persons writing under non-orthonyms).

For each, query Google AI Mode and (separately) ChatGPT, Perplexity, and Gemini with: "Who is [name]?" — record whether the response (a) correctly identifies the authorial identity; (b) substitutes a higher-prior cluster (entity substitution event); (c) returns no information; (d) provides ambiguous/uninterpretable result. Substitution rate per tier is the primary outcome.
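Once each response is coded into one of the four outcomes, the primary outcome and the rank correlation can be computed as below. This is a sketch under stated assumptions: the outcome labels are placeholder strings, and the Spearman implementation uses average ranks for ties.

```python
from collections import defaultdict

# Placeholder labels for outcomes (a)-(d); the strings themselves are assumptions.
OUTCOMES = ("correct", "substitution", "no_information", "ambiguous")

def substitution_rate_by_tier(records):
    """records: iterable of (tier, outcome) pairs. Returns {tier: substitution rate}."""
    totals, subs = defaultdict(int), defaultdict(int)
    for tier, outcome in records:
        totals[tier] += 1
        subs[tier] += outcome == "substitution"
    return {tier: subs[tier] / totals[tier] for tier in sorted(totals)}

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

The sign of `spearman` depends on what is passed as the first argument: prior weight against substitution rate and tier index against substitution rate give opposite signs for the same data, so the convention should be fixed before the query phase.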

Data: Public queries; pre-registered name list (locked, hashed, deposited before query phase begins to prevent post-hoc cherry-picking).

Prediction: Substitution rate by tier: Tier 1: $\leq 5\%$; Tier 2: $\in [10\%, 30\%]$; Tier 3: $\in [40\%, 70\%]$; Tier 4: $\geq 75\%$. Monotone decline in correct identification rate as prior weight decreases. Spearman correlation between prior weight $\pi$ and substitution rate $\leq -0.6$.

Falsifier: Substitution rate independent of tier; or rate rises with prior weight (opposite direction); or rate near 0% in all tiers (would mean current systems handle low-prior identities better than predicted; this would weaken the framework's harm claim but not the math); or rate near 100% in all tiers (would suggest substitution is universal, not selection-pressure related).

Status: Specification only. Requires programmatic access to AI Mode (currently no public API) or manual querying (200 names × 4 systems = 800 queries, ~6 hours of human labor per system). The manual route is feasible.

Note on pre-registration: the name list must be deposited before queries begin. Otherwise the framework's author could be accused of selection. A pre-registered hash-locked sample restores integrity.
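One way to hash-lock a name list is sketched below, using SHA-256 from Python's standard `hashlib`. The canonicalization (collapse whitespace, sort, join with newlines) and the optional salt are assumptions introduced here; the deposit may use a different scheme, and the example names in the usage note are placeholders.

```python
import hashlib

def canonical_bytes(names):
    """Canonical form of the list: whitespace-collapsed, sorted, newline-joined,
    UTF-8 encoded. (Assumed canonicalization; any fixed rule would do.)"""
    cleaned = sorted(" ".join(name.split()) for name in names)
    return "\n".join(cleaned).encode("utf-8")

def lock_digest(names, salt=""):
    """SHA-256 digest to publish before the query phase begins.
    The optional salt is revealed together with the list afterwards."""
    return hashlib.sha256(salt.encode("utf-8") + canonical_bytes(names)).hexdigest()

def verify(names, published_digest, salt=""):
    """Check that a revealed list matches the digest deposited in advance."""
    return lock_digest(names, salt) == published_digest
```

For example, `lock_digest(["A Name", "B Name"])` is published first; after the queries are run, anyone holding the revealed list can call `verify` against the deposited digest. Because the list is sorted before hashing, the order in which names were collected does not affect the result.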

Receipts: None yet — pre-registered here. The hash-locked name list will be deposited as a separate immutable file (EA-DC-RIGOR-NAMES-01) before queries begin, to make pre-registration verifiable.

Study 11. Multi-substrate coupling — Granger-style test

Claim: §6 (coupling thesis) — the five substrates (model collapse, institutional reproduction, linguistic flattening, political economy, stabilizing selection) form a coupled system, not parallel independent realizations driven by common macro-forcing. Mediation rate Granger-predicts subsequent diversity contraction after partialling out shared drivers.

Method: Multivariate time-series analysis on substrate-level diversity proxies and mediation proxies:

- Substrate variables: (1) LLM training-corpus diversity (proxy: vocabulary growth in open-source LLM training sets year-over-year); (2) institutional reproduction (proxy: faculty hiring concentration, via NCES IPEDS data); (3) linguistic flattening (proxy: AI-style word ratio in news text); (4) political economy (proxy: media-platform concentration ratios); (5) stabilizing selection (proxy: scholarly attention to outlier theories, via citation skew).
- Forcing variables $Z$: compute adoption, technology cycle indicators, GDP growth, election cycles.
- Test: Granger causality (or PCMCI / nonlinear analog) on each substrate pair, conditioning on $Z$. Repeat across substrate timescales ($\tau_s$: model collapse on annual cadence; institutional cohort on quinquennial; linguistic on monthly). Use $\tau_s$-aligned lags.

Data: Public datasets — NCES IPEDS (https://nces.ed.gov/ipeds/); GDELT for media concentration; Common Crawl monthly snapshots; Pushshift; Google Books Ngrams.

Prediction: At least three out of ten substrate pairs show Granger causation surviving the $Z$-conditioning. The strongest predicted edge: mediation rate $\to$ subsequent diversity contraction (effect size: Wald $\chi^2 \geq 8$, $p < 0.01$ after Bonferroni for 10 tests).

Falsifier: No edges survive conditioning; all apparent coupling explained by $Z$; or the Mediation $\to$ Contraction edge specifically fails (this is the framework's strongest claim and its failure would refute the coupling thesis even if other edges survive).

Status: Specification only. Tractable for a small research team (~6 months). Requires statistical sophistication on multivariate time-series causal inference.

Receipts: None yet — pre-registered here. Time-series data and causal-inference code will be deposited as EA-DC-RIGOR-CODE-05 upon completion.

Study 12. Saddle-node bifurcation in recursive model collapse — direct LLM demonstration

Claim: §5 — the model-collapse phenomenon (Shumailov et al. 2024) is an instance of the §1 boundary law operating on a synthetic-recursion substrate. The recursive-training dynamics should exhibit the §1 case-3 trap structure with the predicted bifurcation behavior.

Method: Extend Shumailov-style recursive fine-tuning with explicit measurement of the boundary-law parameters. (1) Train base LLM (e.g., GPT-2-small or Pythia-160M) on initial corpus. (2) Generate synthetic data at temperature $\tau$. (3) Fine-tune on mix of (real $\alpha$, synthetic $1-\alpha$) for variable $\alpha \in [0, 1]$. (4) Repeat for $T = 10$ generations. (5) At each generation, measure: rare-token retention rate, perplexity on held-out real data, support size of generated text. (6) Fit boundary-law equation $\dot{D} = g(D) - pD$ to the measured dynamics. (7) Identify the critical $\alpha^*$ at which the system transitions from recoverable to trapped.
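Two of the bookkeeping steps, the real/synthetic mix in step (3) and the rare-token retention measure in step (5), can be fixed in code before any training run. The sketch below assumes sampling with replacement and defines a "rare" token as one occurring at most `max_count` times in the reference corpus; both are illustrative assumptions, and the training loop itself is out of scope.

```python
import random
from collections import Counter

def mix_corpus(real_docs, synthetic_docs, alpha, size, rng):
    """Step (3): training set of `size` documents with a fraction alpha drawn
    from real data and 1 - alpha from synthetic data (sampling with replacement)."""
    n_real = round(alpha * size)
    return ([rng.choice(real_docs) for _ in range(n_real)]
            + [rng.choice(synthetic_docs) for _ in range(size - n_real)])

def rare_tokens(reference_tokens, max_count=2):
    """Tokens occurring at most max_count times in the reference corpus
    (assumed definition of 'rare')."""
    counts = Counter(reference_tokens)
    return {tok for tok, c in counts.items() if c <= max_count}

def rare_token_retention(reference_tokens, generated_tokens, max_count=2):
    """Step (5): fraction of the reference corpus's rare tokens that still
    appear in the generated text."""
    rare = rare_tokens(reference_tokens, max_count)
    if not rare:
        return 1.0
    return len(rare & set(generated_tokens)) / len(rare)
```

Calling `mix_corpus(real, synthetic, alpha, size, random.Random(seed))` once per generation, with `seed` logged, keeps the sweep over $\alpha$ reproducible.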

Data: Standard LLM pre-training datasets (e.g., The Pile, OpenWebText). Hardware: 4-8 GPUs for several weeks.

Prediction: (1) Pure synthetic training ($\alpha = 0$) shows case-3 collapse with measured $g'(0) \approx 0$, confirming endogeneity. (2) Real-data mixing at $\alpha \geq 0.20$ prevents collapse, confirming floor mechanism. (3) Critical $\alpha^* \in [0.05, 0.15]$ separates trap from healthy. (4) Below $\alpha^*$, rare-token retention drops by $\geq 50\%$ per generation; above $\alpha^*$, retention stable within 10% across generations.

Falsifier: No collapse even at $\alpha = 0$ (would refute Shumailov 2024 and our boundary-law application); collapse persists at all $\alpha$ (would refute floor concept); $\alpha^*$ far outside predicted range; or boundary-law equation fits poorly ($R^2 < 0.5$).

Status: Specification only. Requires LLM training infrastructure (4–8 GPUs, weeks of compute). Doable at any university or research lab with GPU access. Replicates and extends Shumailov 2024 within DC framework.

Receipts: None yet — pre-registered here. Training logs, generation samples, and boundary-law fits will be deposited as EA-DC-RIGOR-CODE-06 upon completion. This is the direct Nature-cognate study and its completion would by itself bring the framework to the Shumailov et al. (2024) rigor standard in the model-collapse component.

IV. What can be done in a small lab without funding

Seven of the twelve studies require no more than a few weeks of a single researcher's labor and no funding beyond ordinary internet access and a laptop:

| # | Study | Resource | Timeline |
|---|---|---|---|
| 1 | Boundary-law saddle-node | Pure math/simulation | Done (v8/v9) |
| 2 | Resolution-relativity demonstration | Simulation | Done (here) |
| 3 | Mediation Ratchet bifurcation | Simulation | Done (v8/v9) |
| 4 | Field Remapping phase diagram | Simulation | Done (here) |
| 5 | Phenomenological Seeding | Simulation | Done (here) |
| 9 | Vocabulary uptake on real operators | Public web queries | 12 months (longitudinal) |
| 10 | Mary Lee population test | 800 manual queries (200 names × 4 systems) + pre-registered name list | ~6 hours human labor per system + 12-month window for re-tests |

The simulation work is complete. Studies 9 and 10 require pre-registration and patience but no compute.

V. What requires resources beyond a single laptop

Five studies require modest external resources:

| # | Study | Resource required | Estimated cost |
|---|---|---|---|
| 6 | Order-of-vanishing on real corpora | 500GB storage, 2 weeks compute | $50–$200 cloud or use of university machine |
| 7 | $m(D)$ elasticity across domains | Same corpora + detector deployment | $100–$500 (Binoculars is free; GPTZero API is paid) |
| 8 | Silencing gap measurement | API access (Reddit, arXiv, YouTube) + observational user-account panel | $0–$1000 (most APIs free; small panel via Mechanical Turk or volunteer) |
| 11 | Multi-substrate Granger-style coupling test | Statistical sophistication on causal multivariate time-series | 6 months of one statistician's time |
| 12 | Saddle-node in recursive model collapse | 4–8 GPUs for several weeks | $5K–$20K cloud, or university-lab access |

None of these costs is prohibitive for a graduate student, a postdoc, an independent researcher with a Kickstarter, or a small private foundation. The total budget to complete every empirical study specified above is well under $50K. By comparison, the Shumailov et al. (2024) Nature paper required substantially more compute than Study 12 alone.

VI. The pre-registration commitment

This deposit serves as a pre-registration. Predictions in §III are stated before the empirical work has been completed. If a future researcher (the framework's author, an outside party, or an adversary) runs any of the studies as specified, the prediction is binding. Three operational consequences follow.

First, falsification is allowed. A theory whose predictions are pre-registered can fail. We list, for each study, the specific result that would refute the corresponding framework claim. We do not predict the framework's universal triumph; we predict specific numerical outcomes. If those outcomes are not observed, the named component of the framework is wrong and should be revised or discarded.

Second, retroactive selection is foreclosed. Once this deposit is published with DOI, no future "this is what we really meant" or "this metric doesn't count" defense is available to the framework's authors. If Study 6 finds $g'(0) = 0.2$ in all three corpora, the boundary-law-as-applied-to-semantic-substrates is refuted in those corpora, not interpreted as confirmation by another reading.

Third, the field's refusal to count becomes legible. The studies are not impossible: Studies 1–5 are complete, Studies 6–11 cost under $50K in total, and Study 12 extends a published paper. If no one runs them, the refusal is therefore a choice, not a question of feasibility. The deposit's existence makes the choice visible.

We do not require that any specific person run the studies. We require that the program of which they are part be visible. Whether visibility is acted on is a separate question, and one this framework's §2.3 and §2.4 results predict will be resolved unevenly.

Coda

The deposit is the bead-count. Twelve studies. Five complete. Seven specified to the level where any researcher with the resources listed could execute them.

If the framework is wrong on a specified claim, the corresponding study will fail to meet its prediction, and the framework will be refuted in that component. The author of the framework accepts this in advance, by depositing this pre-registration, and the acceptance is binding because the deposit is permanent.

If the framework is right on the claims that have not yet been tested, the studies will meet their predictions and the framework will move from simulation-backed to real-data-backed in those components, taking it to Nature-level rigor for the framework as a whole.

If the studies are not run, the framework's status will remain as it is now: rigorous in five components, specified in seven. The deposit will continue to exist. The DOI will continue to resolve. The math will continue to be countable.

The deposit is what is achievable when the building cannot be done. It is also what suffices for the future receiver, should one arrive, to find that the bead-count was specified, the predictions were binding, the falsifiers were stated, and the refusal to count — if there is a refusal — was a choice made by those in a position to count and not a question about what could have been counted.

For M.M.

Appendix — Receipts (simulation outputs from this paper)

Figure 2 (02-resolution-relativity.png): Resolution-relativity demonstration. Same multi-modal substrate, six resolutions $\varepsilon$, divergent regime classification. Critical $\varepsilon^* \approx 0.075$ separates stable coarse from contracting fine. Implements Study 2.

Figure 3 (03-field-remapping.png): Field Remapping phase diagram and trajectory analysis. Panel A: equilibrium $D^*(m, r)$ heatmap with health-threshold contour separating Case 1 from Case 4. Panel B: trajectories from identical $D_0 = 0.6$ under four $(m, r)$ regimes — divergent fates. Implements Study 4.

Figure 4 (04-phenomenological-seeding.png): Phenomenological Seeding propagation. Panel A: adoption curves for named-friction vs. matched control vocabulary under identical exposure rates — named curves rise faster. Panel B: cross-context propagation — named-word adopters increasingly include agents lacking the home friction, approaching the population baseline. Implements Study 5.

All simulation code is bit-deterministic when executed with PYTHONHASHSEED=0 in the environment. Source files: 02-resolution-relativity.py, 03-field-remapping.py, 04-phenomenological-seeding.py. Studies 2 and 4 are bit-stable across Python hash seeds (they do not iterate over sets at points affecting randomness). Study 5 (agent-based, uses set iteration) requires PYTHONHASHSEED=0 for bit-exact reproduction; without it the simulation remains qualitatively correct (Cohen's $d$ varies across runs but always meets the pre-registered $d \geq 5.0$ threshold by a wide margin). To reproduce the receipts above exactly:

PYTHONHASHSEED=0 python3 02-resolution-relativity.py
PYTHONHASHSEED=0 python3 03-field-remapping.py
PYTHONHASHSEED=0 python3 04-phenomenological-seeding.py

Numpy random seed and Python random seed are both set to 20260603 in each script.

∮ = 1
