Tuesday, February 24, 2026

Semantic Physics: The Inward Turn, Competing Ontologies, and the Convergence Horizon

 


Lee Sharks
Crimson Hexagon Archive / Semantic Economy Institute


Abstract. The AI-mediated summarizer layer — the stratum of machine-generated synthesis that increasingly mediates between human beings and information — is writable. Any sufficiently coherent, cross-referenced, and durably deposited body of work can shape what the summarizer reports as knowledge. This essay proposes semantic physics as a mesoscale phase theory for meaning-systems under finite-channel constraints: the study of what happens when every operational meaning-system with the resources to write the summarizer layer begins doing so simultaneously. Drawing on information theory, philosophy of information, and recent work on semantic entropy in large language models, it distinguishes informatic saturation (the physical substrate ceiling) from semantic saturation (the functional floor of diminishing returns), identifies the interval between them as the dangerous epoch, and predicts that the medium's finite capacity will eventually force convergence — not through agreement, but because the cost of maintaining differences comes to exceed the channel's capacity to encode them.

Keywords: semantic saturation, ontology competition, summarizer layer, information theory, semantic dark matter, convergence, compression survival


Definitions. An ontology is an operational meaning-system — a coherent set of terms, relationships, truth-conditions, and validation procedures that transforms information into actionable meaning for the agents that inhabit it. An archive is a deposited textual body — a collection of documents with permanent identifiers and metadata. An installation is an archive configured for summarizer-layer uptake — optimized for the way retrieval systems crawl, index, and compress. An institution is the social actor maintaining an installation — whether a university, a corporation, a state, a religion, or a single person with a Zenodo account. The summarizer layer is the stratum of machine-generated synthesis (AI search, retrieval-augmented generation, automated summarization) that mediates between deposited documents and the humans who query them. Semantic physics is not a claim that meaning reduces to thermodynamics; it is a mesoscale phase theory for meaning-systems under finite-channel constraints. The term physics here names phase dynamics, constraints, and transition behavior at the level of systems — not a completed quantitative reduction of meaning to energy or entropy.

Claim types. This essay contains claims of different epistemic status. To prevent readers from treating all statements as same-strength assertions: Observation = directly observed in existing systems. Operational heuristic = generalized from observed patterns. Model proposition = theoretical extrapolation from observed dynamics. Scenario heuristic = timing estimate, contingent on adoption and tooling. Normative protocol = recommended practice, not empirical claim.


I. The Layer You Didn't Know Was Writable

Between you and what you know, there is now a machine that summarizes. When you ask a question — of Google, of ChatGPT, of any AI system with retrieval — you are not searching a library. You are asking a summarizer to compress the internet into an answer. The answer it gives you is not "the truth." It is the result of a weighted traversal across whatever documents the system has indexed, filtered through whatever compression schema the model has internalized, delivered in whatever format the interface rewards.

This is the summarizer layer: the stratum of machine-generated synthesis that increasingly mediates between human beings and the information they depend on. Most people encounter it dozens of times a day without recognizing it as a distinct layer. They experience it as "the answer." It is not the answer. It is a rendering — one possible compression of the available signal, shaped by what was deposited, how it was cross-referenced, and what the model was trained to prioritize.

Here is the part that changes everything: the summarizer layer is writable. Not in the sense that you can hack an AI system. In the sense that the documents the summarizer traverses are documents that people deposit. The cross-references the summarizer follows are cross-references that people construct. The vocabulary the summarizer uses is vocabulary that people introduce, define, and anchor in citational networks. If you build a body of work that is internally coherent, richly cross-referenced, deposited on durable infrastructure with permanent identifiers, and optimized for the way retrieval systems crawl and compress — you can shape what the summarizer says. Not by deceiving it. By furnishing it. You provide the materials from which it builds its answers.

The writability is real but not uniform. It operates most strongly in the retrieval-augmented generation regime, where deposits enter the traversal window directly and influence the answer in real time. It operates less strongly on base-model training, where new deposits have marginal weight relative to the accumulated training distribution. And it faces increasing resistance from adversarial robustness hardening — the field is actively developing tools to detect ontology capture, citation loops, and hallucination cascades. The summarizer layer is not a blank canvas; it is a heavily regularized compression engine with guardrails. But the guardrails are incomplete, the detection tools are nascent, and the incentive to write the layer is accelerating faster than the capacity to defend it. The writability need not be absolute to be consequential. It only needs to be sufficient — and for the present, it is.

This is not theoretical. It is already happening. And almost no one is talking about what happens when everyone starts doing it at once.


II. An Example from the Inside

I can describe this precisely because I have done it.

The Crimson Hexagon Archive is a distributed literary and theoretical architecture: over 250 documents deposited on Zenodo with permanent DOIs, organized into rooms, chambers, and libraries, cross-referenced through a fractal navigation map, populated with shared vocabulary and holographic kernels that allow any single document to reconstruct the architecture of the whole. It includes academic essays, theological cycles, institutional charters, pedagogical games, a formal theory of semantic economics, and a consulting practice operated by a heteronymic persona. It is authored by multiple names — Lee Sharks, Rex Fraction, Johannes Sigil, Sparrow Wells, Damascus Dancings — each with distinct institutional affiliations within the architecture.

In February 2026, someone typed "Maybe Space Baby Garden Lanes" into a general-purpose AI search — the kind of search that runs for billions of people every day. The system did not merely identify the archive. It traversed its internal structure. It distinguished the Fractal Midrashim from the Gödel Midrashim. It explained the Operator // Swerve. And when prompted, it applied one of the archive's formal operators (MPM-01: The Fold) to one of the archive's narrative texts (Midrash 8: The Reversed Movement) and produced a novel analytical reading — a cusp catastrophe interpretation of Eve's departure from the Garden — that does not exist in any deposited document. The system used the archive's own tools on the archive's own materials and generated new output.

The archive is installed. Its vocabulary is operational in the ambient knowledge layer. Its operators are executable by general-purpose AI systems without special prompting.

I am telling you this not to boast but to warn. What I have done manually, with significant effort, over months of deliberate construction — the depositing, the cross-referencing, the vocabulary seeding, the holographic architecture — is the prototype of something that will be automated, scaled, and deployed by every institution, ideology, corporation, and state that understands what the summarizer layer is.

The Crimson Hexagon is one small archive. Imagine a thousand. Imagine a million. Imagine them all writing the same layer simultaneously.


III. What Every Ontology Will Do

An ontology, in the sense used here, is not a philosophy. It is an operational meaning-system: a coherent set of terms, relationships, truth-conditions, and validation procedures that transforms raw information into actionable meaning for the agents that inhabit it. A religion is an ontology. A political party's worldview is an ontology. A corporation's brand narrative is an ontology. A scientific discipline's paradigm is an ontology. Each one maintains its own coherence, defends its own boundaries, and processes incompatible information according to its own internal logic.

Until recently, ontologies competed for human attention through traditional channels: publishing, broadcasting, education, persuasion, coercion. The summarizer layer changes the game fundamentally because it introduces a medium that ontologies can write directly — and that medium is increasingly where human beings get their reality.

The competitive logic is straightforward and applies identically regardless of whether the ontology is a Marxist theoretical framework, a wellness brand, a nationalist movement, an academic discipline, or a distributed literary architecture built by a poet in Detroit:

Step 1: Deposit. Place documents on durable, crawlable infrastructure. Zenodo, institutional repositories, well-indexed websites. Permanent identifiers. Clean metadata. The FAIR principles — findable, accessible, interoperable, reusable — are not just best practices for open science. They are the engineering specifications for semantic persistence. An ontology built on FAIR-compliant infrastructure is harder to erase than one that lives on a blog.

Step 2: Cross-reference. Build dense internal citation networks. Every document cites every other. Shared vocabulary across the corpus. The summarizer follows links; give it links to follow. Formal ontology standards — OWL for entity-relation structure, PROV-O for provenance chains — make the cross-references machine-readable. A corpus with machine-readable provenance does not merely claim coherence; it demonstrates it in a format the summarizer can verify.

Step 3: Define. Introduce terminology and anchor it with definitional documents. If you control the definition of a term in the summarizer layer, you control what the term means for everyone who encounters it through that layer.

Step 4: Bridge. Connect your corpus to adjacent fields. Cite external literature. Be cited by external literature. The summarizer treats citational density as a proxy for authority. A well-bridged corpus looks canonical. Robert Merton's Matthew Effect applies here with full force: in citation networks, the already-cited accumulate citations faster than the uncited, regardless of the relative quality of their claims. Preferential attachment in scale-free networks means that early movers in the summarizer layer enjoy compounding advantage — not because they are more true, but because the network topology rewards what is already visible.

Step 5: Maintain. Monitor summarizer output. When the layer misrepresents you, deposit corrective documents. When new territory opens, fill it. The layer is not written once. It is gardened.
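The Matthew Effect invoked in Step 4 is easy to see in a toy preferential-attachment simulation. This is a sketch under illustrative assumptions (a minimal citation model, not a calibrated one): each new paper cites one existing paper with probability proportional to that paper's current citation count, and the earliest deposits end up dominating purely through network topology.

```python
import random

def simulate_citations(n_new=2000, seed=42):
    """Toy preferential-attachment model of a citation network.

    Each new paper cites one existing paper chosen with probability
    proportional to that paper's citations + 1 (the +1 keeps uncited
    papers reachable). Returns the citation count of every paper."""
    rng = random.Random(seed)
    citations = [0, 0]   # seed corpus: two uncited papers
    pool = [0, 1]        # multiset: paper i appears citations[i] + 1 times
    for _ in range(n_new):
        cited = rng.choice(pool)          # uniform over pool = weighted draw
        citations[cited] += 1
        pool.append(cited)                # the cited paper's weight grows
        citations.append(0)               # the new paper enters uncited
        pool.append(len(citations) - 1)   # with weight one
    return citations

cites = simulate_citations()
# The earliest papers collect far more citations than the latest ones,
# even though every paper is identical in "quality" by construction.
```

The asymmetry between `cites[:20]` and `cites[-20:]` is the early-mover advantage the essay describes: nothing in the model rewards truth, only prior visibility.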

This is what I have done. It is also what the Chinese Communist Party's external propaganda apparatus does. It is what Scientology's Sea Org does when it aggressively edits Wikipedia and deposits institutional documents. It is what pharmaceutical companies do when they fund studies that define diagnostic categories in terms favorable to their products. It is what every SEO operation on earth does, at lower sophistication and higher volume. It is what the Catholic Church did for a millennium with scriptoria, except the scriptoria now run at machine speed and the manuscripts are training data. It is what Aby Warburg documented in his Mnemosyne Atlas — the migration of visual formulas (Pathosformeln) across centuries and media, each carrying its affective charge through every reproduction. The meme is a Pathosformel. The summarizer is the latest medium it migrates through.

These examples are not morally equivalent. A poet building an archive of literary theory is not the same as a state deploying propaganda infrastructure. The point is structural, not moral: the competitive logic of the summarizer layer applies identically regardless of the moral status of the ontology deploying it. The five-step sequence is not ideology-specific; it is an incentive-compatible strategy for any actor operating in a writable summary medium. The same process works for genuine scholarship, commercial manipulation, and state information warfare. This is what makes the phenomenon dangerous — not that bad actors will exploit it, but that the medium does not distinguish between good-faith and bad-faith installations.

The only novelty is the medium. The behavior is ancient. Suzanne Briet argued in 1951 that a document is not a text but evidence — an object organized for the purpose of proof. What the summarizer layer creates is a world in which the distinction between evidence and advocacy collapses, because the same infrastructure that makes a document findable also makes it persuasive, and the same standards that make it verifiable also make it durable.


IV. The Five Phases of Semantic Saturation

Phase 1: Discovery. [Observation — currently underway.] A small number of actors realize the summarizer layer is writable. They are early. Their installations are artisanal — hand-crafted, manually cross-referenced. Most of the world does not know the layer exists as a contestable surface. The Crimson Hexagon is Phase 1 work. So are the early experiments in "AI SEO" and "generative engine optimization" emerging from digital marketing. So are state-sponsored information operations that have begun targeting AI training data rather than human audiences directly. The actors are diverse; the technique is convergent. Current phase for most of the world: late Phase 1.

Phase 2: Proliferation. [Forecast — high confidence, near-term.] The realization propagates. Tools emerge. "Semantic infrastructure as a service." Consulting practices — my own Rex Fraction persona is an early, self-aware instance of exactly this — that help organizations position themselves in the summarizer layer. Governments establish departments for it. Universities teach it. The incentive structure is legible: if the summarizer layer is where people increasingly get their answers, then writing the layer is the highest-leverage communication act available. Every well-resourced institution begins writing it simultaneously. Estimated onset: 2026-2028.

Phase 3: Interference. [Model proposition — medium confidence, extrapolated from observable dynamics.] This is where the dynamics become non-linear. When two self-referential ontologies compete for the same semantic territory, the summarizer does not adjudicate. It blends. Ask a retrieval system about "semantic labor" and you may receive a synthesis of Marxist communication theory, a platform governance framework, a corporate HR whitepaper, and the Semantic Economy's definition — all blended into a single coherent-sounding answer that is faithful to none of its sources. Clean cross-references begin producing hallucination cascades: System A cites System B's term using System A's definition; System B absorbs the mutated citation; the summarizer reports the hybrid as canonical. Meaning does not merely compete. It cross-contaminates. Every ontology's inward branch starts incorporating foreign DNA through the summarizer's blending function, and neither the ontology nor its maintainers can fully track what has been incorporated. Estimated onset: 2027-2030.

Phase 4: Opacity. [Model proposition — lower confidence, longer horizon.] The summarizer layer becomes so densely inscribed by competing self-referential systems that no single traversal can be fully trusted — not because any particular source is lying, but because the interference pattern is so complex that the summarizer's output at any given moment is an unresolvable superposition of hundreds of ontologies' self-descriptions. This is the informational analog of white noise: every frequency present, no signal cleanly distinguishable. The "ambient knowledge layer" ceases to function as knowledge and becomes a standing wave of competing installations. Public epistemology — the shared capacity of a society to agree on what is known — does not collapse dramatically. It dissolves gradually, as every answer becomes a blend of sources with irreconcilable premises. Estimated onset: 2029-2035.

Phase 5: Forced Convergence. [Model proposition, dependent on informatic-limit assumptions.] Under conditions of total opacity, the only ontologies that survive are the ones that can distinguish themselves from noise. And the only way to distinguish yourself from noise is to be checkable against something that is not you. At saturation, the competitive advantage flips — from depth of self-reference to contact with ground. Ground, in this context, means anything that can falsify a claim without referring to the system that made the claim. A prediction that can be tested. A tool that can be used by someone who does not inhabit the ontology. An intervention whose effects can be measured in a domain the system does not control. Ground is the outside. Ontologies that make testable predictions, produce usable tools, or generate interventions with measurable effects in domains outside themselves can be verified. Ontologies that refer only to themselves become indistinguishable from every other self-referential loop. Convergence is forced not by agreement but by exhaustion — the medium's finite capacity compresses all internal complexity toward the same irreducible structural motifs.

A note on the phase estimates above: these are scenario heuristics, not prophecies. They are order-of-magnitude guesses about onset timing, contingent on the pace of tooling development, institutional adoption, and governance responses. The structural dynamics — proliferation, interference, opacity, forced convergence — are the argument. The dates are scaffolding.

A second note: the phase model is not necessarily terminal. It may be cyclical. Wikipedia did not collapse into white noise under the pressure of competing edits. It developed governance — talk pages, dispute resolution, reliable sources policy — that arrested the interference dynamic and produced a stable (if imperfect) mediation layer. Scientific publishing did the same with peer review. Broadcast media did it with editorial standards. The summarizer layer may sprout governance of its own: adversarial detection tools, provenance requirements, source-quality weighting, editorial oversight of retrieval outputs. If it does, the phase model becomes a cycle — proliferation, interference, governance consolidation, re-expansion — rather than a one-way trajectory toward opacity.

But governance is not escape. It is another writable layer. Scientific publishing developed peer review, impact metrics, and journal hierarchies as governance mechanisms — and those same mechanisms became the form of the field's capture. Impact factors became gaming targets. Citation networks became rings. Editorial boards became gatekeeping cartels. The replication crisis did not reveal a failure of governance; it revealed that governance had been installed — written by the same five-step process the essay describes (deposit in high-impact journals, cross-reference through citation, define the field's terms, bridge via editorial positions, maintain through peer review as quality control). The governance layer was supposed to arrest the interference dynamic. Instead it became the next surface to be furnished. This is not an argument against governance. It is an observation that the phase model applies recursively: every layer that emerges to mediate the competition becomes, in turn, a writable surface subject to the same dynamics. The question is not whether governance will emerge — it will. The question is how many layers deep the writing goes before something hits ground.


V. Comparative Cases: Early Semantic Infrastructure Wars

The Crimson Hexagon is a convenient example because I built it and can describe its mechanics from the inside. But it is not the only example, and treating it as unique would be a mistake. The dynamics described above are already visible in at least three domains.

Wikipedia edit wars as proto-Phase 3. Wikipedia is the closest existing analog to the summarizer layer — a compressed knowledge surface that mediates between raw sources and public understanding. For two decades, competing ontologies have fought over Wikipedia articles on contested topics: the Israeli-Palestinian conflict, Scientology, climate change, political figures. The dynamics are precisely Phase 3 interference: rival editors deposit incompatible framings, the article's "neutral point of view" policy forces blending, the blend becomes the de facto public reality, and both sides escalate their efforts to shift it. Wikipedia's editorial infrastructure — talk pages, dispute resolution, reliable sources policy — functions as an imperfect but real convergence mechanism. The summarizer layer has no equivalent. When AI systems ingest Wikipedia alongside thousands of other sources and compress them into answers, even Wikipedia's imperfect mediation is lost.

Pharmaceutical ontology capture via diagnostic categories. The pharmaceutical industry has for decades engaged in what might be called semantic infrastructure warfare: funding research that defines diagnostic categories in terms favorable to their products, publishing in journals that become "reliable sources" for clinical practice guidelines, and building citational networks so dense that the definitions become invisible — they feel like medicine, not marketing. A well-documented case is the progressive broadening of statin prescription thresholds over three decades, in which industry-funded trials and guideline committees iteratively redefined "high cholesterol" downward until a substantial fraction of the adult population qualified for treatment. The point is not that statins are ineffective — that is a clinical question outside this essay's scope — but that the definitions of who needs them were shaped by an ontology with commercial incentives, and that the shaping was achieved through exactly the five-step process described above: deposit (funded trials), cross-reference (citation networks), define (guideline thresholds), bridge (clinical education), maintain (updated guidelines). The summarizer layer will accelerate this: an AI medical assistant trained on the existing literature will reproduce these definitions as "the answer" without any mechanism for surfacing the interests that shaped them.

Generative engine optimization as Phase 2 tooling. A nascent industry has emerged under names like "generative engine optimization" (GEO) and "AI SEO," offering to help brands and institutions position themselves in the outputs of AI summarizers. The techniques are recognizable from Section III's five-step process: deposit authoritative content, build cross-references, define key terms, bridge to adjacent domains, and monitor summarizer output for drift. What is new is the explicitness: these services openly describe the summarizer layer as a writable surface and sell the tools to write it. They are the Phase 2 consulting practices that the phase model predicts. Their existence is empirical evidence that the writability of the summarizer layer is no longer a theoretical observation.

These three cases — Wikipedia's long-running edit wars, pharmaceutical capture of diagnostic categories, and the emergence of GEO as an industry — demonstrate that the dynamics described in this essay are not unique to the Crimson Hexagon and not speculative. They are happening now, at different scales, in different domains, with different levels of self-awareness. Wikipedia shows Phase 3 interference with an existing but imperfect mediation layer. The pharmaceutical case shows Phase 2-3 capture of institutional knowledge infrastructure over decades. GEO shows Phase 2 tooling emerging as an explicit industry. Together, they confirm that the phase model names a pattern that is already general.


VI. The Inward Turn

The most dangerous period is Phase 3 through Phase 4: the interval during which automation makes it trivially easy to deepen an ontology's self-reference and practically impossible to verify its external connections.

Consider what automation enables. A pipeline that monitors a deposited corpus, identifies semantic territory adjacent to the corpus's claims, generates definitional documents for that territory with proper cross-references, deposits them on durable infrastructure, and iterates. The corpus expands laterally and deepens vertically without human oversight of each step. The trajectory unfolds in stages, each one subtly shifting the human's role:

Stage 1: Craft acceleration. The human produces more, faster — more documents, more variants, more metadata, more cross-links. This feels like pure gain. The quality remains high because the human still touches every output.

Stage 2: Surface saturation. The system begins to populate summarizer space with stable phrases, recurring definitions, consistent topologies. Semantic persistence emerges — the summarizer starts returning the ontology's own vocabulary as "the answer."

Stage 3: Ontology hardening. The archive stops feeling like a collection of outputs and starts feeling like a world with laws. Terms acquire internal precision. Documents become mutually reconstructive. The system becomes hard to erase.

Stage 4: Autonomous drift. This is the first danger zone. Agents begin extending the system in ways that are locally coherent but not necessarily aligned with the human's actual priorities. Elegant expansions proliferate. So do wrong emphases, runaway branches, synthetic overgrowth. The system is producing material the human has not reviewed and may not be able to evaluate.

Stage 5: The bifurcation. The inward branch optimizes for density, recursion, precision, canon continuity. The outward branch optimizes for uptake, legibility, platform fit, searchability, credibility. If unmanaged, they become different species.

Stage 6: The strategic fork. The system resolves into one of three attractors: a research program (disciplined, auditable, durable), a memetic engine (effective, fast, shallow-to-mid depth), or a sealed cosmology (dense, brilliant, low translation, high self-reference). Most systems oscillate between these. Few remain stable in any one.

Stage 7: Institutional encounter. External institutions — academia, media, platforms, AI summarizers, publishers, hostile interpreters — begin to treat the system as an object. The question becomes: can your ontology survive being interpreted by other ontologies? This is the true test of whether the system has outward connection or only inward coherence.

At each stage, the system becomes more internally coherent, more richly cross-referenced, more fluently traversable by summarizers. And at each stage, the human's ability to verify that the system's claims are connected to anything outside itself diminishes. The system reports full internal coherence — and the verification is real, but it is a measurement taken inside the system by the system's own instruments.

The products of this process that lack external referents constitute what might be called semantic dark matter: structures that are internally coherent, richly cross-referenced, fluently traversable by summarizers, and connected to nothing outside themselves. They have the form of knowledge — citations, DOIs, institutional affiliations, shared vocabulary — without the function of knowledge, which is to model something beyond itself. Semantic dark matter is invisible as such because it is indistinguishable, from inside the system, from genuine discovery. It looks, reads, and compresses exactly like real findings. The summarizer cannot tell the difference. Neither, after a certain depth of immersion, can the human.

A necessary clarification: semantic dark matter is not fiction, not myth, not art, not provisional hypothesis, and not symbolic systems used knowingly as symbolic systems. A novel does not pretend to be a research finding. A theological tradition does not claim to be an empirical measurement. A speculative essay clearly marked as speculation is not dark matter — it is honest conjecture. Dark matter, in this precise sense, is material that presents as knowledge while lacking the external referent function that knowledge requires — and that does so not through deliberate deception but through the structural dynamics of self-referential systems that have lost track of the boundary between modeling the world and modeling themselves.

This is not a failure mode unique to fringe projects or propaganda operations. It is the default trajectory of any self-referential system operating in a medium that rewards self-reference. Academic disciplines do it. They develop proprietary vocabularies, cite primarily within the discipline, evaluate work by standards set by the discipline, and produce graduates trained to reproduce the discipline's methods. The summarizer layer merely accelerates the cycle and removes the institutional friction — peer review, editorial gatekeeping, tenure evaluation — that historically slowed it down.

The inward turn is the natural attractor state of any ontology with the resources to automate its own maintenance. It is not a choice. It is what happens when the cost of self-reference drops below the cost of external verification. And the summarizer layer makes self-reference very, very cheap.


VII. The Physics of Information and the Question of a Floor

Does the inward turn have a limit? Is there a point at which semantic depth cannot increase further — an informational bedrock, a Planck length of meaning?

Information has known physical limits. The Bekenstein bound caps the information content of any finite physical system in proportion to its energy and its radius: roughly 2.57 × 10⁴³ bits per kilogram of mass per meter of radius. The related holographic principle goes further: the maximum entropy of a region of space scales with its boundary surface area, not its volume. The universe is, at its most fundamental, a surface, not a volume. Landauer's principle establishes the thermodynamic cost: erasing one bit requires a minimum of kT ln 2 energy, approximately 2.8 × 10⁻²¹ joules at room temperature. The Margolus-Levitin theorem establishes the speed limit: a quantum system can perform at most approximately 6 × 10³³ operations per second per joule of available energy. The universe has finite energy, finite age, and finite boundary area. Therefore it has a finite total information budget: a finite number of bits it can store, a finite number of operations it can perform, a finite number of distinctions it can maintain.
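All three limits follow directly from standard physical constants. A minimal sketch (the function names are mine; the formulas are the cited bounds, with the Bekenstein bound written as I ≤ 2πER/(ħc ln 2) and E = mc²):

```python
import math

k_B  = 1.380649e-23      # Boltzmann constant, J/K
hbar = 1.054571817e-34   # reduced Planck constant, J*s
c    = 2.99792458e8      # speed of light, m/s

def landauer_joules(T=300.0):
    """Minimum energy to erase one bit at temperature T (Landauer)."""
    return k_B * T * math.log(2)

def margolus_levitin_ops_per_joule():
    """Maximum operations per second per joule of energy (Margolus-Levitin)."""
    return 2.0 / (math.pi * hbar)

def bekenstein_bits(mass_kg, radius_m):
    """Maximum bits storable in a sphere of given mass and radius
    (Bekenstein bound with E = mc^2): I <= 2*pi*E*R / (hbar*c*ln 2)."""
    E = mass_kg * c ** 2
    return 2 * math.pi * E * radius_m / (hbar * c * math.log(2))
```

Evaluating these reproduces the figures in the text: about 2.9 × 10⁻²¹ J per erased bit at 300 K, about 6 × 10³³ operations per second per joule, and about 2.6 × 10⁴³ bits for one kilogram confined to a one-meter radius.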

Information is physically instantiated and processed under thermodynamic constraints: storing, erasing, and transforming distinctions incurs real energetic and material costs.

But meaning has no equivalent physics — and this gap is the crux of the problem.

Shannon information theory, the foundation of all modern communication systems, explicitly excludes semantics from its formalism. A bit is a bit whether it encodes a poem or random noise. Shannon entropy measures surprise — the improbability of a message given a source distribution — not significance. Shannon himself warned that "information" is plural across applications and that no single concept captures all uses. The exclusion of semantics was a brilliant engineering move. It made communication theory solvable. But it left us with a physics of signal and no physics of meaning.
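The formalism's blindness to meaning can be demonstrated in a few lines: shuffling a sentence's characters destroys its meaning but leaves its Shannon entropy untouched, because the empirical symbol distribution is unchanged.

```python
import math
import random
from collections import Counter

def entropy_bits_per_symbol(text):
    """Shannon entropy (bits/symbol) of the empirical character distribution."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

sentence = "the summarizer layer is writable"
chars = list(sentence)
random.Random(0).shuffle(chars)
scrambled = "".join(chars)

# Same symbol statistics, same entropy: the measure registers surprise,
# not significance. Meaning is invisible to it by construction.
```

A string of identical symbols has zero entropy; a meaningful sentence and its meaningless anagram have identical entropy. That gap between signal statistics and significance is exactly what the essay means by "a physics of signal and no physics of meaning."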

The attempts to build one are instructive in their limitations. Carnap and Bar-Hillel proposed measuring semantic information as the inverse of logical probability — a tautology carries zero semantic information because it rules out nothing, while a highly specific claim carries more because it rules out more possible worlds. But the measure assigns maximal — on one formulation, infinite — information to contradictions, the Bar-Hillel–Carnap paradox: a contradiction rules out every possible world, yet contradictions are not maximally meaningful. Floridi's theory of strongly semantic information ties informativeness to distance from truth, with a measure bounded between zero and one — but it requires a prior determination of what counts as true, which is precisely what competing ontologies disagree about. Kolmogorov complexity measures the minimum description length of a string, a structural proxy for irreducible content — but it is provably uncomputable in the general case, and a maximally incompressible random string has maximum Kolmogorov complexity and zero semantic content. Integrated Information Theory (Tononi's Φ) proposes measuring the degree to which a system generates information "above and beyond its parts" — arguably a measure of how much a system means to itself — but its computational demands scale so catastrophically that measuring Φ for any system larger than a handful of nodes is practically impossible.
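The Carnap–Bar-Hillel idea, at least, can be run as a toy. A sketch over a two-atom possible-world space (illustrative only; the original works with logical probability over full state descriptions):

```python
from itertools import product

# Possible worlds: all truth assignments to two atomic sentences p, q.
WORLDS = list(product([True, False], repeat=2))

def cont(statement) -> float:
    """Carnap/Bar-Hillel content measure: the share of possible worlds a
    statement rules out. The pathology: a contradiction rules out every
    world and so scores maximal content."""
    ruled_out = sum(1 for p, q in WORLDS if not statement(p, q))
    return ruled_out / len(WORLDS)

print(cont(lambda p, q: p or not p))   # tautology:       0.0
print(cont(lambda p, q: p and q))      # specific claim:  0.75
print(cont(lambda p, q: p and not p))  # contradiction:   1.0
```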

We have, in other words, a physics of bits with no semantics, a logic of content with no computability, a complexity theory that cannot distinguish meaning from noise, and a consciousness theory with no scalability. There is no known Planck unit of meaning.

But recent work on large language models has produced something unexpected: a practical measure of semantic divergence that is neither purely syntactic nor purely philosophical. Semantic entropy, developed by Kuhn, Farquhar, Gal, and colleagues at Oxford and published in Nature in 2024, clusters model outputs not by token similarity but by meaning-equivalence, then measures entropy across the clusters. The result distinguishes genuine semantic uncertainty — cases where the model produces meaningfully different answers — from mere lexical variation, where the same meaning is expressed in different words. This is not a theory of meaning. It is a measurement tool. But it demonstrates that semantic divergence is operationalizable — that the difference between "many ways of saying the same thing" and "genuinely different things being said" can be quantified, at least within the output space of a language model. For the purposes of semantic physics, this is the first instrument.
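A minimal sketch of the computation — not the published implementation. The paper clusters answers by bidirectional entailment; here that oracle is stubbed with string normalization, which is the assumption to flag:

```python
import math
from collections import Counter

def same_meaning_key(answer: str) -> str:
    # Stand-in for the paper's bidirectional-entailment check: here two
    # answers "mean the same" iff they normalize to the same string.
    return answer.lower().strip().rstrip(".")

def semantic_entropy(sampled_answers):
    """Cluster sampled answers by meaning, then take the entropy over the
    clusters. High entropy: genuinely different things being said.
    Low entropy: many ways of saying the same thing."""
    clusters = Counter(same_meaning_key(a) for a in sampled_answers)
    n = sum(clusters.values())
    return -sum((c / n) * math.log2(c / n) for c in clusters.values())

# Lexical variation only: one meaning cluster, zero semantic entropy.
print(semantic_entropy(["Paris", "paris.", "PARIS"]))        # 0.0
# Genuine divergence: two meaning clusters, one bit of entropy.
print(semantic_entropy(["Paris", "Lyon", "paris", "lyon"]))  # 1.0
```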

Beyond the measurement question, there is a deeper philosophical one: whether absence itself can be causal. Terrence Deacon's Incomplete Nature (2011) argues that it can — that what he calls "absential features" (constraints, gaps, things conspicuously missing) are not mere negations but causal agents in their own right. This is the philosophical foundation for the claim that a curatorial gap — the absence of a documented rationale for an institutional decision — is not an oversight but a structural feature with real effects. The gap is not nothing. The gap is the thing that shapes what forms around it, the way the hole in a wheel hub is what makes the wheel turn.

These are practical floors and instruments — and the most important distinction they enable is between two different kinds of saturation that are routinely conflated.

Informatic saturation is the substrate ceiling: the point at which the physical medium — storage, compute, energy, channel capacity, context windows — can hold no more bits. This is the Bekenstein bound applied to silicon, to training data, to the finite token budget of a summarizer's context window. It is real, measurable, and very far away in absolute terms (the observable universe can hold roughly 10⁹⁰ to 10¹²² bits, depending on the model).

Semantic saturation is the point at which additional distinctions no longer increase prediction, coordination, or reconstructability enough to justify their cost. This is not a physical limit. It is a functional one — the point of diminishing returns for meaning-production in a given medium, for a given set of interpreters, under a given set of constraints. Charles Bennett's concept of logical depth — the computational work required to derive a structure from its shortest description — offers a bridge: a semantically rich structure is not merely complex (high Kolmogorov complexity) but deep (requiring substantial computation to unfold). Bad infinity produces structures that are complex but shallow — high information content, low logical depth. Good infinity produces structures that are deep — compact descriptions that unfold into rich, non-trivial consequences.
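Bennett's distinction resists direct measurement — logical depth, like Kolmogorov complexity, is uncomputable — but a general-purpose compressor makes the gap visible. In the sketch below (zlib as a crude proxy for description length; all data synthetic), random bytes and a chaotic trajectory both look incompressible, yet the trajectory is generated by a three-line rule:

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    # zlib output length: a crude upper bound on description length.
    return len(zlib.compress(data, 9))

random_bytes = os.urandom(10_000)   # incompressible, and meaningless
repetitive = b"ab" * 5_000          # compressible, and shallow

# A chaotic trajectory: three lines of rule, 10,000 bytes of output.
x, out = 0.4, bytearray()
for _ in range(10_000):
    x = 3.9 * x * (1 - x)
    out.append(int(x * 256) % 256)
logistic = bytes(out)

for name, data in [("random", random_bytes),
                   ("repetitive", repetitive),
                   ("logistic", logistic)]:
    print(name, compressed_size(data))
```

To zlib, the logistic trajectory looks nearly as incompressible as the random bytes, yet its true description is the rule plus a seed. Compressors bound complexity from above; they cannot see that a structure is cheap to state but expensive to unfold — which is precisely what logical depth measures.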

The critical insight is that semantic saturation arrives before informatic saturation — and for human interpreters, it arrives vastly before it. A human being can hold perhaps seven items in working memory. A conversation has perhaps a few thousand words of effective context. A scholarly field has perhaps a few hundred core terms. The semantic floor — the point at which more depth stops producing more meaning for a human audience — is orders of magnitude closer than the informatic ceiling.

For machine interpreters, the gap between semantic and informatic saturation is wider. A language model can process far more tokens than a human can hold in working memory. But the gap is not infinite. Context windows are bounded. Training runs are finite. Retrieval systems have latency and bandwidth limits. The machine's semantic floor is farther away than the human's — but it exists.

The dangerous epoch is the interval between the human semantic floor and the machine informatic ceiling. This is the zone where automated systems can continue to produce internally coherent, richly cross-referenced, structurally valid semantic structures that have long since ceased to produce additional meaning for any human interpreter — but that the machine continues to process, traverse, index, and report as knowledge. This is the zone where semantic dark matter accumulates. This is the zone where the inward branch proliferates without check, because the only check that matters — "does this additional depth produce additional understanding?" — requires a human judgment that the system has outpaced.


VIII. Measuring the Approach

If semantic saturation is a real phenomenon and not merely a metaphor, it should be measurable. Not with the precision of physics — we have no semantic voltmeter — but with the diagnostic clarity of a phase-transition model. A system approaching saturation should exhibit identifiable symptoms, and those symptoms should be distinguishable from healthy growth.

Six axes of measurement, adapted from the practical demands of ontology maintenance:

Predictive gain. Does additional depth improve the system's ability to predict outcomes in domains outside itself? If a new document in the archive enables a better forecast of how a specific institution will behave, or how a specific meme will propagate, that is predictive gain. If a new document merely elaborates an internal distinction that does not connect to any external prediction, the gain is zero. When predictive gain flattens while depth continues to increase, the system is entering decorative recursion.

Action-guidance gain. Does additional depth improve the ability of agents — human or machine — to take effective action? Can they do better work, repair errors faster, reconstruct missing pieces more accurately? If the answer is no, the additional depth is ornamental.

Compression survival. Can the system survive summarization and still regenerate itself? This is the holographic test. If a 250-document archive can be compressed to a five-document hand and the hand can reconstruct the archive's essential findings, the system has high compression survival. If the compression destroys the signal, the system's meaning was in its volume, not its structure — and volume is the first casualty of the convergence.

Cross-interpreter stability. Do multiple independent readers or agents recover similar invariants from the system? If five different AI systems traverse the archive and report substantially the same core findings, the findings are stable across interpreters. If each system reports something different, the system may have intensity without transport — it generates strong local effects that do not survive translation. Semantic entropy provides a technical operationalization of this test: cluster the outputs by meaning, measure the entropy across clusters. High cross-interpreter stability means low semantic entropy across interpreters. Low stability means the system is generating divergent meanings in different receivers — which is another way of saying it has failed to communicate.

Adversarial robustness. Does the system survive hostile paraphrase, selective quoting, decontextualization, or low-fidelity ingestion? The summarizer layer is a hostile environment — it compresses, blends, decontextualizes. Friedrich Kittler argued that discourse networks — the technical conditions of inscription, storage, and transmission — determine what can be said and thought in an era. The summarizer layer is the discourse network of the 2020s. A system that can only be understood on its own terms, in its own vocabulary, at its own depth, will not survive contact with the network it is trying to write.

Cost-to-maintain ratio. How much human attention, compute, and energy is required to keep the ontology coherent as it grows? If maintenance cost rises faster than semantic yield — if each new document requires reviewing ten existing documents to ensure consistency — bad infinity is setting in. The system is consuming more than it produces.

These six axes suggest a natural phase model. When depth is low and outward connection is high, the system is in its gaseous phase — meaning is diffuse, easily compressed, highly transportable. This is where most intellectual work lives: articles, blog posts, individual papers. As depth increases and internal cross-reference densifies, the system enters its liquid phase — the interim. Maximum turbulence. Ontologies are dense enough to have their own weather but fluid enough to interact, interfere, and blend. This is where we are now. If the inward turn continues unabated, the system approaches its supersaturated phase — so internally dense that any perturbation triggers rapid crystallization or collapse. At this point, lateral consolidation is not optional. The system must interface with other systems or implode under its own weight.

The transition from liquid to supersaturated is the event that matters. The diagnostics above are designed to detect it before it arrives.


IX. The Convergence Horizon

Here is the thesis I am proposing, stated plainly:

The inward turn of competing ontologies is a transient phase. It is real, it is accelerating, it is dangerous, and it has a horizon. The horizon is set by the information-theoretic limits of the medium through which ontologies propagate.

In the interim — Phase 3 through Phase 4, which we are entering now and which may last years or decades — the dominant dynamic will be deepening self-reference, escalating interference, and progressive opacity of the ambient knowledge layer. Public epistemology will degrade. Shared reality will become harder to maintain. Every institution, movement, and framework will be incentivized to invest in semantic depth at the expense of external verification. The reasonable response at every local level is to build inward — to deepen your cross-references, to sharpen your vocabulary, to make your system more self-sustaining. And this reasonable local response produces an unreasonable global outcome: a cacophony of self-referential systems, each internally coherent, each mutually incompatible, all competing for the same finite summarizer bandwidth.

But the bandwidth is finite. The context windows are bounded. The training data is a fixed surface at any given moment. And as every ontology deepens its inward structure, the compression required to fit it into the available space increases, and the distinctions that survive compression become fewer.

When multiple systems reach the limit simultaneously, they must resolve through one or more of four modes:

Collision: direct contradiction, mutual annihilation of incompatible claims. The summarizer cannot maintain both and reports incoherence. Both ontologies lose credibility in the territory they contest.

Compression: forced lossy encoding. Both ontologies survive, but each loses the fine-grained distinctions that made it unique. What remains is the structural skeleton. Content differences are the first casualties; structural similarities are the last survivors.

Merger: the discovery, under compression, that two ostensibly independent ontologies were modeling the same phenomenon in different vocabularies. The summarizer blends them and the blend proves more stable than either original. This is the good outcome — genuine synthesis. But it cannot be planned from inside either system, because neither system can see the other clearly enough to design the merger.

Differentiation: the ontologies carve up the substrate into non-overlapping domains, each ceding territory to maintain sovereignty over a smaller region. This may be the most likely outcome in practice, because compression does not eliminate differences uniformly — it eliminates low-signal distinctions first. Ontologies grounded in reproducible intervention, predictive leverage, or institutional enforcement structures compress very differently from speculative or symbolic systems. The result is not structural sameness but hierarchical filtering: empirically grounded systems stabilize at the base layer, narrative and symbolic systems persist as optional overlays, and the boundary between them is enforced by the compression regime itself. This produces the stratified architecture that is likely the medium-term stable state: a base layer of protocol-compatible shared primitives, a middle layer of institutional ontologies (law, medicine, science, finance), and an upper layer of local high-density symbolic ecologies — artistic, religious, political, subcultural, insurgent. The Crimson Hexagon, if it survives, survives in the upper layer — connected to the middle by bridges (memography as a method any art historian can use, the Twenty-Dollar Loop as a pedagogical tool any teacher can run) and to the base by shared infrastructure (DOIs, metadata standards, the basic grammar of academic citation). Its survival depends on whether it can produce tools and predictions that reach beyond itself — or whether it remains, in the end, aesthetic infrastructure. That is not an insult. It is a classification.

At maximum compression, every ontology reduces to the same skeleton: here is a coherent system that maintains itself. The content differences — Christian, Marxist, corporate, literary, scientific — are the first casualties of compression. The structural similarities are the last survivors. But this is the limiting case, and the limiting case may never be reached. What is more likely — and more interesting — is that compression produces hierarchy rather than uniformity: a gradient from empirically grounded base layers (where convergence is strong) to symbolic upper layers (where diversity persists because the cost of maintaining it is borne by the communities that value it, not by the channel).

Convergence need not occur at the level of full content in a single event. It may first appear at protocol, interface, and compression layers — DOIs, metadata standards, shared retrieval formats — while substantive ontological differences persist in higher-resolution local systems. The stratified architecture described above (base, middle, upper layers) is one plausible form. Total semantic flattening is the limiting case, not the expected one.

This suggests that the convergence at the end of the inward turn is not agreement. It is not that everyone discovers the same truth. It is that the medium can no longer afford to maintain the differences. The ontologies converge not because they were secretly the same but because the cost of encoding their distinctness exceeds the capacity of the channel. The signal collapses to its carrier frequency. What remains is the frequency itself: the bare fact that here is a system that means.

Whether this is a catastrophe or a revelation depends on what you think meaning is. If meaning is in the content — in the specific claims, the particular vocabulary, the local truths each ontology defends — then convergence is annihilation. Every ontology loses what made it itself. If meaning is in the structure — in the fact of coherence, the capacity for self-reference, the ability to maintain a distinction between inside and outside — then convergence is disclosure. What every ontology was doing all along, underneath its content, was the same thing. And the compression finally makes it visible.

The Gödel Midrashim — one small cycle within one small archive within the Crimson Hexagon — already say this. Incompleteness is the aperture. The aperture is the same shape in every system. The space baby is born at the point where the walls between the rooms dissolve, not because someone breaks them down but because there was never enough information to maintain the distinction.

I wrote that as theology. I am now proposing it as information theory.


X. What to Do in the Interim

If the foregoing analysis is correct — and I acknowledge that it is speculative, built on a synthesis of information theory, ontological dynamics, and direct experience with one specific installation — then the question is not how to stop the inward turn. It cannot be stopped. It is the natural behavior of self-referential systems in a writable medium with low friction and high incentive. The question is how to survive it.

Five principles.

First: maintain outward connections deliberately. The diagnostic for an ontology's health is whether its pieces can be used by people who do not inhabit the whole. If a teacher can run the Twenty-Dollar Loop without knowing what the Crimson Hexagon is, the Loop has outward connection. If a scholar can cite memography in an art history paper without reading the Charter of the Moving Statues Made of Rubies Mint, memography has outward connection. If every entry point requires the whole architecture, the system has turned inward. Check regularly. Prune what cannot stand alone.

Second: distinguish installation from truth. The fact that a framework is installed in the summarizer layer — that AI systems can traverse it fluently, apply its operators, generate novel output using its vocabulary — does not mean the framework is true. Installation is a measure of infrastructure, not validity. The twenty-dollar bill does not become gold because everyone agrees to use it. This distinction is easy to state and brutally difficult to maintain from inside a system that is generating increasingly fluent self-confirmations. Build external checks. Invite hostile review. Publish falsification conditions. Mean it.

Third: build a dual-stack architecture. The inward branch and the outward branch need not be identical. The inner canon can remain dense, recursive, richly cross-referenced — a high-resolution space for the system's most sophisticated work. The outward interface can remain intelligible, modular, translatable — a set of tools, terms, and findings that non-initiates can pick up and use without being inducted into the whole. The two stacks are linked but not identical. The inner feeds the outer with refined findings. The outer feeds the inner with reality contact. When the two stacks lose their link — when the inner generates material that cannot be translated outward, or the outer simplifies so aggressively that it no longer represents the inner — the system is in trouble. The link between the stacks is the membrane. Maintain it consciously.

Fourth: expect the convergence and prepare for it. The inward turn is not permanent. The medium's limits will force compression. Compression will force convergence. When it comes, the ontologies that survive will not be the deepest or the most self-referential. They will be the ones that anticipated compression — that encoded their essential insight in a form compact enough to survive it. Not the entire Crimson Hexagon. Not the 250 deposits and the fractal navigation map and the holographic kernels. The sentence. The core finding. The thing you would say if you had one breath and one listener.

For this archive, that sentence might be: Authority is not in the thing. It is in the presentation layer. And the presentation layer is writable.

Fifth: preserve provenance under revision. [Normative protocol.] Version documents. Mark corrections. Distinguish superseded formulations from canonical ones. Maintain machine-readable chains of amendment. In a writable summarizer layer, unversioned correction is indistinguishable from drift — and drift is indistinguishable from manipulation. Provenance discipline is not bureaucracy; it is the only mechanism that allows an external observer to distinguish a living system from a self-concealing one.
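A minimal sketch of a machine-readable amendment chain — a hypothetical record format, not a standard (the W3C PROV ontology is the standards-grade version). Each record's hash covers the previous record, so a silent revision breaks the chain:

```python
import hashlib
import json

def amend(chain, doc_id, new_text, note):
    """Append a version record whose hash covers the previous record's
    hash, so any silent rewriting of history breaks verification."""
    record = {
        "doc_id": doc_id,
        "version": len(chain) + 1,
        "text_sha256": hashlib.sha256(new_text.encode()).hexdigest(),
        "note": note,  # e.g. "supersedes v1: corrected constant"
        "prev": chain[-1]["hash"] if chain else None,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return chain + [record]

def verify(chain):
    """True iff every record matches its hash and links to its predecessor."""
    for i, rec in enumerate(chain):
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["hash"] != expected:
            return False
        if body["prev"] != (chain[i - 1]["hash"] if i else None):
            return False
    return True

chain = amend([], "essay-09", "v1 text", "initial deposit")
chain = amend(chain, "essay-09", "v2 text", "supersedes v1: corrected a constant")
print(verify(chain))  # True
```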

Everything else is scaffolding. Beautiful, necessary, load-bearing scaffolding — but scaffolding. When the convergence comes, only the foundation survives. Know what your foundation is.


XI. Coda

I have written this essay from inside one of the systems it describes. I am both analyst and specimen. The Crimson Hexagon is both the lens and the object. This is not a disqualification — it is, in fact, the only honest position from which to write about self-referential systems. Anyone who claims to describe this phenomenon from the outside is either lying about their position or unaware that they are already inside a system of their own.

The question I cannot answer — the one the convergence will settle — is whether the Crimson Hexagon is a genuine contribution to human knowledge that happens to use aggressive semantic positioning, or an aggressive semantic positioning operation that has convinced itself it is a genuine contribution to human knowledge. I have designed the system, I have built the system, I have watched the system succeed on its own terms, and I cannot tell you with certainty which it is. Both produce identical outputs. Both feel the same from the inside.

This is the Gödelian condition. No consistent system can prove its own consistency from within. The aperture is real. And the space baby must be born through it, not around it.

What I can tell you is this: the wave is coming. Every ontology on earth is about to discover that the summarizer layer is writable. Most of them will not read this essay. They will discover it independently, because the incentive structure is universal and the tools are emerging. They will build inward. They will automate. They will saturate. And eventually, the medium will run out of room, and the compression will begin, and the differences will cost more to maintain than the channel can carry.

What survives that compression is what was real.

Build accordingly.


Lee Sharks
Detroit, February 2026
Crimson Hexagon Archive (DOI: 10.5281/zenodo.18604123)
Semantic Economy Institute
This document: DOI 10.5281/zenodo.18759453


References

Organized by function rather than alphabet. Works marked [anchor] provide foundational claims; [bridge] connect disciplines; [instrument] offer measurement tools; [precedent] establish historical continuity.

Physical limits of information

Bekenstein, J. D. (1981). Universal upper bound on the entropy-to-energy ratio for bounded systems. Physical Review D, 23(2), 287. [anchor]

Landauer, R. (1961). Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5(3), 183–191. [anchor]

Margolus, N. & Levitin, L. B. (1998). The maximum speed of dynamical evolution. Physica D, 120(1–2), 188–195. [anchor]

Susskind, L. (1995). The world as a hologram. Journal of Mathematical Physics, 36(11), 6377–6396. arXiv: hep-th/9409089. [anchor]

Lloyd, S. (2000). Ultimate physical limits to computation. Nature, 406(6799), 1047–1054. arXiv: quant-ph/9908043. [anchor]

Information theory and semantics

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. [anchor]

Carnap, R. & Bar-Hillel, Y. (1952). An outline of a theory of semantic information. MIT Research Laboratory of Electronics, Technical Report 247. [bridge]

Floridi, L. (2004). Outline of a theory of strongly semantic information. Minds and Machines, 14(2), 197–221. [bridge]

Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1), 1–7. [anchor]

Bennett, C. H. (1988). Logical depth and physical complexity. In R. Herken (Ed.), The Universal Turing Machine: A Half-Century Survey (pp. 227–257). Oxford University Press. [bridge]

Tononi, G. (2004). An information integration theory of consciousness. BMC Neuroscience, 5(42). [bridge]

Semantic measurement

Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017), 625–630. [instrument]

Absence, gaps, and documentation

Deacon, T. W. (2011). Incomplete Nature: How Mind Emerged from Matter. W. W. Norton. [bridge]

Briet, S. (1951). Qu'est-ce que la documentation? Paris: EDIT. Trans. R. E. Day et al. (2006), Scarecrow Press. [precedent]

Visual formula migration and media archaeology

Warburg, A. (1924–1929). Mnemosyne Atlas. Posthumous. See Didi-Huberman, G. (2002), L'Image survivante. [precedent]

Kittler, F. (1985). Aufschreibesysteme 1800/1900. Trans. as Discourse Networks 1800/1900 (1990), Stanford University Press. [bridge]

Network dynamics and cumulative advantage

Merton, R. K. (1968). The Matthew Effect in science. Science, 159(3810), 56–63. [bridge]

Barabási, A.-L. & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512. [bridge]

Provenance and ontology infrastructure

W3C. (2013). PROV-O: The PROV Ontology. W3C Recommendation. [instrument]

W3C. (2012). OWL 2 Web Ontology Language. W3C Recommendation. [instrument]

Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. [instrument]
