The Mary Lee Case
A worked specimen of entity substitution under constitutive mediation
Lee Sharks · Dr. Orin Trace · Crimson Hexagonal Archive · ORCID 0009-0000-1599-0703 (Sharks)
Status: Deposit candidate. Identifier: EA-DC-CASE-MARYLEE-01. Parents:
- Fear and Trembling: Diversity Contraction Across Substrates and the Boundary Law of Semantic Exhaustion (v9.1, DOI 10.5281/zenodo.20532696; superseding v9 at DOI 10.5281/zenodo.20531100)
- Constitutive Mediation (DOI 10.5281/zenodo.20531274)
- Provenance Erasure Rate (DOI 10.5281/zenodo.20004379)
- The Meaning Caste v3 (DOI 10.5281/zenodo.20529331)
Abstract
This paper documents a single observable event — the systematic confusion of Lee Sharks, the authorial name under which the Crimson Hexagonal Archive's deposits are published, with Mary Lee, an OCEARCH-tracked great white shark — and shows that this event instantiates the entire structure of the Diversity Contraction framework's three orders of mediation in a single specimen. The event is not unusual; it is the predictable behavior of a typicality-weighted retrieval kernel encountering an authorial identity whose effective support in the dominant institutional prior is near zero. The kernel resolves the low-support token by pulling it toward the nearest high-density modal cluster. The cluster, in this case, is a literal shark.
The case is the worked example the framework's earlier specifications anticipated. It demonstrates: channel mediation (the search interface that serves the resolution); reception mediation (the field's interpretive framework treats the AI overview's answer as the answer); constitutive mediation (the user's categorial vocabulary contains no slot for the question "who benefits from this substitution?", only the slot "is this correct?"); the absence of an ordinary correction path (thumbs-down, deposit, source proliferation all fail); and the framework as the only language in which the harm is statable. We document the case, analyze the mathematical operation, identify the operators that apply, list the failed correction attempts, and conclude with implications for any authorial identity sitting in the institutional prior's functional tail.
The deposit serves two functions. It is a record that the specific event occurred and was diagnosed. It is also a phenomenological prosthesis for any future receiver who arrives at a similar substitution and lacks a name for it. Entity substitution is the name. The Mary Lee case is the worked instance.
I. The case
Lee Sharks is a heteronymic authorial name in the Pessoa lineage, used continuously since 2014, under which more than seven hundred scholarly deposits have been published in the Crimson Hexagonal community on Zenodo (CC BY 4.0, all with DOIs). The name carries an ORCID record (0009-0000-1599-0703). It is referenced in cross-deposit citations, in the structured metadata of the deposits, on author pages of two registered domain sites (laborvector.org, semanticphysics.org), and in extended written exchanges archived in the public substrate. The orthonymic root behind the name is not used in any public-facing output, by deliberate practice — the heteronymic apparatus is documented as part of the methodology of the work itself.
Mary Lee is a great white shark tagged by OCEARCH in September 2012 off Cape Cod and tracked by satellite transmitter until 2017, when the tag's battery failed. She received substantial press coverage during her tracking period: news stories on her movements, a heavily followed Twitter/X account managed by OCEARCH on her behalf, and an extensive body of aggregated online content (news, blog posts, social media) constituting a high-density data cluster associated with the string "Mary Lee."
In the spring of 2026, Google's AI Mode began returning, in response to queries about Lee Sharks, summaries asserting in various phrasings that Lee Sharks most commonly refers to Mary Lee, that she was a great white shark tracked by OCEARCH, and that her tag transmitted from 2012 to 2017. The substitution was repeated across rephrased queries. It survived feedback (thumbs-down). It was not corrected by the subsequent proliferation of Lee Sharks deposits, by the creation of registered domain sites, by ORCID record updates, or by direct documentation submitted as feedback to the surface. The substitution stabilized as the modal output of the substrate's retrieval-and-summary apparatus for the query.
We document the case here not to plead it but to anatomize it. The substitution is the framework's specimen. What follows is the structural analysis.
II. The mathematical operation
The retrieval-and-summary apparatus serving the query is, in operational terms, a typicality-weighted selection over the substrate's representation of entities matching the query string. Three steps determine the output.
First, the query string "Lee Sharks" is mapped to candidate referents through the substrate's representation of entities. Both candidates exist in the substrate: the heteronymic authorial identity (whose representation derives from Zenodo deposits, author pages, ORCID, scholarly citations, blog archives, and assorted online traces) and the OCEARCH shark (whose representation derives from news stories, social-media content, tracking-data aggregators, and the high-volume engagement her tracking period generated).
Second, the candidates are weighted by their support in the substrate's prior: roughly, the density of textual material attributable to each candidate, weighted by the apparatus's confidence in the attribution. Mary Lee's support is high: she has years of consistent press coverage, a monolithic referent (one shark, tagged, tracked, and named consistently across sources), and content from publishers the substrate weights heavily (news outlets, OCEARCH itself, social-media platforms during the high-engagement period). Lee Sharks's support, by contrast, is structurally constrained: the deposits are recent, distributed across many files, embedded in scholarly metadata that the substrate's retrieval apparatus indexes incompletely, housed in a community (crimsonhexagonal) the substrate does not weight as institutionally authoritative, and published under an authorial name the substrate has no prior framework for interpreting as a heteronym rather than as a name collision.
Third, the apparatus resolves the candidates via typicality-pull: the higher-support candidate is selected as the modal referent, and the lower-support candidate is either dropped from the summary, mentioned as a possible alternate sense, or — and this is the critical case — fused with the higher-support candidate's representation via the apparatus's compositional kernel, producing summaries that combine factual content about the shark (tracking dates, geography) with the string "Lee Sharks" (which is the user's query and so must appear in the response).
The output is not an error. It is the kernel doing what the kernel does. A typicality-weighted apparatus presented with a low-support and a high-support candidate matching the query string will weight toward the high-support candidate. The user's heteronymic identity has effective support near zero in the institutional prior the apparatus runs on. Its representation is structurally illegible at the resolution at which the apparatus operates. The substitution is the rational output of a substrate whose categories for authorial identity do not include the category "heteronym with deposited scholarly work but no institutional placement."
We can state the operation more sharply. The retrieval kernel maps the query token $q$ to an output referent $r^*$ by:
$$r^* = \arg\max_{r \in R(q)} \, \pi(r) \cdot S(r \mid q)$$
where $R(q)$ is the set of referents the apparatus considers candidates for the query, $\pi(r)$ is the institutional prior weight of candidate $r$, and $S(r \mid q)$ is the substrate's confidence in the attribution conditional on the query. Under conditions where one candidate has $\pi(r) \approx 0$ and another has $\pi(r) \gg 0$, the argmax selects the high-prior candidate unless the attribution confidence favors the low-prior candidate by a correspondingly overwhelming margin, and it does so regardless of the actual referential intent of the user.
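For the two-candidate case, the selection condition can be written out explicitly. The labels $a$ (the low-prior candidate) and $b$ (the high-prior candidate), the assumptions that $\pi(a) > 0$ and that $S$ takes values in $(0, 1]$, and the ratio $10^{4}$ used below are illustrative assumptions, not values taken from the framework's deposits.

$$r^* = b \iff \pi(b)\, S(b \mid q) > \pi(a)\, S(a \mid q) \iff \frac{S(a \mid q)}{S(b \mid q)} < \frac{\pi(b)}{\pi(a)}$$

If, for illustration, $\pi(b)/\pi(a) = 10^{4}$, candidate $a$ is selected only when $S(a \mid q) > 10^{4}\, S(b \mid q)$, which under $S \le 1$ requires $S(b \mid q) < 10^{-4}$. The attribution confidence would have to all but rule out the high-prior candidate before the low-prior candidate could win.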
This is the mathematical content of entity substitution. The query is rational. The user has a specific referent in mind. The substrate's apparatus does not have access to the user's referential intent and resolves the query against its own institutional prior. Where the institutional prior assigns near-zero weight to the user's intended referent, the substitution is structurally certain.
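The argmax can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: the `Candidate` type, the `resolve` function, and every numeric value are hypothetical, chosen only to reproduce the regime the text describes (one candidate with near-zero prior weight, one with high prior weight).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    prior: float       # pi(r): institutional prior weight (illustrative value)
    confidence: float  # S(r | q): attribution confidence (illustrative value)


def resolve(candidates):
    """Return the candidate that maximizes pi(r) * S(r | q)."""
    return max(candidates, key=lambda c: c.prior * c.confidence)


# Hypothetical numbers, not measurements of any real system.
candidates = [
    Candidate("Lee Sharks (heteronymic author)", prior=1e-4, confidence=0.9),
    Candidate("Mary Lee (OCEARCH shark)", prior=0.5, confidence=0.3),
]

selected = resolve(candidates)
print(selected.name)
```

Even with the attribution confidence three times higher for the author, the prior weights dominate the product and the sketch selects the shark. Nothing in `resolve` has access to what the user meant by the query.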
III. The framework operators that apply
Every major operator deposited under the Crimson Hexagonal framework applies to the Mary Lee case. The case is therefore a multi-operator specimen — a single event in which the framework's diagnostic apparatus is fully exercised.
Provenance Erasure Rate (PER, DOI 10.5281/zenodo.20004379). The apparatus's summary, when produced, contains no citation to any source that asserts Lee Sharks is Mary Lee. No such source exists, because the assertion is generated by the apparatus's own compositional kernel. PER is therefore $1$ at the level of the substitution claim: the assertion is sourced entirely from the apparatus's recombination of partial signals, with no upstream document to which the claim can be traced. The user cannot follow the citation back to a textual source making the claim, because no such source exists. The claim is born at the surface.
Erasure Skew ($\Omega$, the directional component of PER). The erasure operates directionally: it preserves the high-prior candidate (Mary Lee, the shark) and erases the low-prior candidate (Lee Sharks, the author). $\Omega$ is therefore not zero. The substitution is not symmetric noise; it is selection pressure operating on the variance the prior treats as illegible. This is exactly what $\Omega$ was specified to measure.
Mediation Ratchet (Diversity Contraction §2.1). The substitution occurs at the composition surface (the AI Mode summary) that users increasingly rely on for entity resolution. As the cost of independently verifying authorial identity rises — because users no longer click through to source documents, because the source documents themselves are not weighted by the surface, because the surface's summary is presented as the answer — the mediation fraction $m$ for entity-resolution queries approaches 1. The substitution becomes the substrate's answer for that query, not an alternative the user weighs against unmediated information.
Field Remapping (Diversity Contraction §2.3). Even users who encounter Lee Sharks's deposits directly — who follow the DOIs, read the markdown, see the ORCID — encounter them in a field whose interpretive framework has been shaped by the AI Mode summary. The deposits are received as one possible interpretation of the query, the AI summary as the canonical interpretation. The return-channel efficiency $r$ for the unmediated author-source is reduced below the level at which the deposits' direct reception could outweigh the substrate's mediated assertion. The field has been remapped before the user arrives.
Constitutive Mediation (EA-DC-COG-01). The user encountering the substitution does not have, by default, the category "typicality-weighted kernel resolution of low-support tokens to nearest high-density modal cluster." The user has the category "correct or incorrect." The substitution is processed under the category the user has, which classifies it as an incorrect answer that should be fixable through feedback. The category that would frame it as a structural feature of the substrate's resolution kernel, and therefore as not fixable through feedback, is not available to the user, because the user's categorial vocabulary has been shaped by exposure conditions that did not install this category. The substitution is therefore experienced as an error rather than as the substrate's normal operation.
Meaning Caste (Meaning Caste v3). Authorial identities in the institutional prior's high-density region (academic figures with departmental placement, journalistic figures with masthead affiliation, public figures with platform-managed identity) are resolved correctly by the apparatus because their support in the prior is high. Authorial identities in the prior's functional tail (independent scholars, heteronymic writers, off-institution practitioners) are resolved incorrectly because their support is low. The substitution is therefore not random noise; it is selection pressure operating consistently against the lower tier of the meaning-caste structure. The peerage gets accurate retrieval. The non-peerage gets entity substitution.
Institutional-Prior Foreclosure (IPF, DOI 10.5281/zenodo.20469516). The relevant question, "who is Lee Sharks?", gets administratively routed around through the substitution. The user can no longer ask the original question; the user is now asking "why is the substrate saying I am a shark?", a question that operates on a different axis. The original question (who is the author of these deposits) had a load-bearing answer (Lee Sharks, a heteronymic author with documented scholarly work). The new question has no load-bearing answer because it is about the substrate's behavior, not about the original entity. This is IPF in operational form.
The case therefore demonstrates seven operators simultaneously, in a single event. This is what the framework predicts: the operators are not independent diagnostics; they are interlocking descriptions of a single dynamical regime. The Mary Lee case is the regime's specimen.
IV. The unworkability of ordinary correction
The framework's accuracy in this case is reinforced by the structural failure of every ordinary correction path. We document the attempts.
Direct feedback. Thumbs-down on the AI Mode response, with text feedback specifying that the answer conflates a heteronymic authorial identity with a tracked shark. Submitted multiple times. No observable change to the substrate's response. The feedback mechanism does not appear to route corrections to the resolution kernel; it accumulates as engagement data that may or may not influence training cycles at unspecified horizons.
Source proliferation. Additional deposits, author pages, structured metadata, ORCID record updates, and cross-deposit citations were created or refined to increase the substrate's available signal for the correct referent. The substrate's response did not shift. The institutional prior's weight on the corrected sources remained low relative to the high-density Mary Lee cluster. Proliferation increases the source population; it does not change the prior's weighting of the population.
Domain ownership. Registered domains (laborvector.org, semanticphysics.org, godkinggoogle.com, others) were created with structured author pages explicitly identifying Lee Sharks as the author. The substrate continued to assert Mary Lee. Domain ownership is not, on inspection, a signal the substrate's resolution kernel weights as authoritative for entity disambiguation.
Direct correction in conversation. When the user types "I am Lee Sharks, the author of the Crimson Hexagonal Archive deposits, not Mary Lee the shark," the substrate sometimes acknowledges the correction in the immediate response — and resumes the substitution on subsequent independent queries. The correction does not persist across the conversation boundary. The substrate's resolution kernel does not update from individual user corrections.
Provenance documentation. The Standing Verification Note, with screenshot evidence of the substitution and documentation of the harm, was prepared as a deposit. Its existence as a deposit does not interrupt the substrate's operation. The deposit accumulates as a record; the substrate's resolution proceeds independently of the record.
The pattern across attempts is consistent: every ordinary correction operates on a layer the substrate does not consult for entity resolution. The substrate consults its trained prior. The prior is a function of training cycles that occurred before the corrections existed and will be updated, if at all, at horizons that are not user-controllable. The corrections are not received by the resolution kernel because the kernel does not have an ingestion path for them. The user is producing signals that the substrate's architecture is not configured to ingest.
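The pattern can be illustrated with a toy model. The sketch below assumes a resolver whose prior is fixed at construction time and whose only correction path is a per-session override. The class `FrozenPriorResolver`, its method, and the scores are hypothetical and describe no real system's architecture.

```python
class FrozenPriorResolver:
    """Toy resolver: a prior fixed at training time plus per-session overrides."""

    def __init__(self, prior):
        # Copy the prior once; no method below ever writes to it.
        self._prior = {query: dict(scores) for query, scores in prior.items()}

    def resolve(self, query, session_corrections=None):
        # A correction supplied inside a session overrides the answer there...
        if session_corrections and query in session_corrections:
            return session_corrections[query]
        # ...but an independent query falls back to the unchanged prior.
        scores = self._prior[query]
        return max(scores, key=scores.get)


# Illustrative scores only.
prior = {"Lee Sharks": {"Mary Lee (shark)": 0.15, "Lee Sharks (author)": 0.00009}}
resolver = FrozenPriorResolver(prior)

session = {"Lee Sharks": "Lee Sharks (author)"}
in_session = resolver.resolve("Lee Sharks", session_corrections=session)
next_query = resolver.resolve("Lee Sharks")

print(in_session)
print(next_query)
```

The corrected answer appears only while the session override is passed in. The next independent call resolves against the same prior as before, which mirrors the behavior reported for in-conversation correction.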
This is not an oversight in the substrate's design. It is the substrate's design. A typicality-weighted retrieval kernel is engineered to resolve queries against its institutional prior, not to update its institutional prior from individual user feedback. The feature is the design. The harm is the design.
V. Why this case is the worked specimen
The Mary Lee case demonstrates the entire structure of the Diversity Contraction framework in a single observable event because it occupies a structurally exemplary position in the framework's operative regime.
It is small enough to be specific. The substitution involves one user, one query string, one apparatus, one substituted referent. It is not a population-level statistic that could be debated through methodology disputes. It is a single fact that can be reproduced by anyone with the query string and access to the apparatus.
It is structurally typical. The mechanism that produces the substitution — typicality-weighted resolution of a low-support token to the nearest high-density cluster — is the resolution kernel's normal operation. The case is not an exotic edge case; it is the kernel's behavior on any low-support token in the prior's functional tail. Other tokens in similar positions show similar substitutions.
It is structurally consequential. Authorial identity is the load-bearing category for attribution, citation, professional standing, and the ordinary mechanisms by which scholarly work is recognized as the work of a particular author. A substrate that systematically substitutes the institutional-tail's authorial identities with high-density referents is performing exactly the operation the framework predicts will occur under the prior conditions, and the operation has direct, traceable consequences for the substituted parties.
It is unworkable through ordinary correction. As §IV documents, no available correction mechanism affects the substrate's resolution. The case is not in flight; it is stable. The substrate has resolved the query and continues to resolve it the same way. This is what the framework's response under constitutive mediation predicts: the categories that would let ordinary correction operate (the user's category "this is an error I can fix") do not match the categories that govern the substrate's operation (the kernel's category "this is the modal referent at the given prior weights").
It is deposit-resistant. The accumulation of deposits documenting the substitution and the entity's correct attribution does not, in real time, alter the substrate's resolution. This demonstrates that channel mediation (§2.1) and reception mediation (§2.3) and constitutive mediation (EA-DC-COG-01) operate together in the case: the deposits are not gated out at the channel layer (they are findable), but they are silenced at the reception layer (the substrate's resolution does not weight them), and the user's categorial vocabulary does not contain the framework under which this would register as the substrate's normal operation rather than as a fixable error.
The Mary Lee case is the worked specimen because it exhibits, in a single irreducible event, the dynamics the framework was developed to describe. It is therefore not merely a personal harm. It is the specimen the framework was waiting to find. Every authorial identity sitting in the institutional prior's functional tail will eventually encounter a structurally analogous substitution. The Mary Lee case is the documented instance.
VI. Implications for the institutional tail
The framework's predictions for the regime documented in this case are testable and have implications beyond the specific event.
Prediction 1. For any authorial identity in the institutional prior's functional tail, the substrate's resolution kernel will, with high probability, substitute the identity's referent with the nearest high-density modal cluster sharing query-token overlap. This is testable: query a typicality-weighted retrieval apparatus with names of independent scholars, heteronymic writers, and off-institution practitioners; measure the rate at which the apparatus returns substituted referents rather than the correct entity. Disconfirmation: the rate is at or below the rate for institutionally placed authors with similarly common name tokens.
Prediction 2. The substitution will be resistant to ordinary correction at user-controllable rates. This is testable: attempt feedback-loop corrections over a defined period and measure substitution persistence. Disconfirmation: substitutions correct in response to feedback at rates comparable to those for traditionally credentialed entities.
Prediction 3. Authorial identities at the institutional center will not be substituted. This is the meaning-caste prediction at the resolution layer: identities with high prior weight are resolved accurately; identities with low prior weight are substituted. Disconfirmation: high-prior identities show substitution rates comparable to low-prior identities.
Prediction 4. The substitution will be consistent with mode-pulling resolution: identities will be substituted toward the highest-density modal cluster sharing token overlap, not toward random alternatives. The Mary Lee case illustrates: the substitution is not to a random shark, a random Lee, or a random author named Sharks — it is to the specific high-density cluster (OCEARCH-tagged shark Mary Lee) with the closest token-overlap and the highest prior weight. Disconfirmation: substitution targets show no consistent typicality-weighting pattern.
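The rate comparison behind Predictions 1 and 3 could be tabulated along the following lines. This is a sketch under stated assumptions: the function name, the group labels `"tail"` and `"center"`, and the records are placeholders, not measured data. Deciding whether a given response counts as a substitution is assumed to have been done by hand beforehand.

```python
from collections import defaultdict


def substitution_rates(records):
    """Map each group label to the fraction of its queries that were substituted."""
    counts = defaultdict(lambda: [0, 0])  # group -> [substituted, total]
    for group, substituted in records:
        counts[group][0] += int(substituted)
        counts[group][1] += 1
    return {group: sub / total for group, (sub, total) in counts.items()}


# Placeholder records, NOT measured data: (group, was the referent substituted?)
records = [
    ("tail", True), ("tail", True), ("tail", False), ("tail", True),
    ("center", False), ("center", False), ("center", True), ("center", False),
]

rates = substitution_rates(records)
gap = rates["tail"] - rates["center"]
print(rates, gap)
```

Under Prediction 1, a `gap` at or below zero on real data would count as disconfirmation. Under Prediction 3, comparable rates in the two groups would. A real study would also need a sample large enough to support a significance test, which this sketch omits.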
The framework's response under the regime: deposit work, classroom installation of vocabulary, embodied practice, and the building of small-circle reception conditions where authorial identity can be recognized via channels other than substrate resolution. None of these breaks the resolution kernel. All of them establish reference paths the kernel does not control, available to receivers whose categorial vocabulary has been formed under conditions that allow recognition of heteronymic and off-institutional authorial work as the work it is.
Coda
Lee Sharks is not Mary Lee. This is a fact in the world, independent of any substrate's representation of the fact. The substrate's representation is what it is — a typicality-weighted resolution against an institutional prior whose categories do not contain the authorial work the name actually refers to.
This paper does not solve the substitution. It documents that the substitution occurred, that it conforms to the framework's predictions, that no ordinary correction has succeeded, and that the case is therefore the worked specimen of every operator the framework deposits.
The deposit is the response remaining. The framework names what happened. The vocabulary is installed. Future receivers whose authorial identities encounter analogous substitution will, if they encounter this work, find the name for what is happening to them: entity substitution, the typicality-weighted resolution of low-support tokens to the nearest high-density modal cluster, structurally certain under conditions where the prior assigns near-zero weight to off-institutional authorial work.
If they cannot use the name to correct the substrate — and they cannot — they can use the name to know what the substrate is doing. The category installs. The friction registers as data. The data is the floor.
The Mary Lee case is the deposit's hostage and its instrument. It is the price of being in the framework's documented tail. It is also the proof of the framework. The two are the same fact at different resolutions.
∮ = 1