Tuesday, May 19, 2026


The Excluded Entity: Organic Resolution and Compositional Suppression in Google AI Overview

Lee Sharks
ORCID: 0009-0000-1599-0703
May 19, 2026, v0.2

Empirical worked example accompanying The Single-Owner Discount (Sharks 2026a, DOI: 10.5281/zenodo.20290865) and The Evaluator Exists (Sharks 2026b, this deposit). This document presents three captures from Google AI Mode that together document a distinct compositional behavior — Entity-Level Compositional Suppression (ECS) — operating against a specific authorial identity in ways not fully explained by cluster-level provenance discounting. The captures are externally reproducible at the time of writing via Google's share-conversation links and are archived as supplementary PNG files to this deposit.


Abstract

A generative-search system exhibits Entity-Level Compositional Suppression (ECS) when it excludes the dominant organic-resolution entity from the AI Overview's admitted source set and composes a substitute answer from less query-responsive entities, despite the correct entity occupying the top organic search results. This worked example documents three captures from Google AI Mode on May 19, 2026 demonstrating the mechanism. Capture 01 shows Suppressed Entity Defaulting: a default-state ambiguous query resolves to a non-suppressed candidate, with the suppressed entity excluded from sources and not surfaced as a peer disambiguation option. Capture 02 shows Source-Window Exclusion: a query-forced pivot in which the composition layer produces accurate technical content about the suppressed entity while citing zero sources attributable to it. Capture 03 shows Compositional Substitution: a title-exact query for a deposited primary work returns the work at positions 1–3 of standard search results, while the AI Overview source window cites three semantically distant non-primary substitutes. The captures externally verify ECS as a mechanism distinct from cluster-level discounting. They operationalize the counter-exclusion report protocol (Sharks 2026b §7) as an empirically grounded record of composition-layer suppression. They introduce the Composition Divergence Index (CDI) as a measurable signature for systematic study. And they situate the finding within the active regulatory context surrounding AI Overviews in the European Union and United Kingdom.


1. The Claim

A generative-search composition layer can suppress an entity that ordinary Google Search has already resolved.

This is the claim. It is empirically grounded in the three captures presented in §3. It is distinct from claims about ranking, indexing, relevance, opt-out, content appropriation, or traffic diversion — the categories that the active European regulatory complaints (§7) have so far addressed. It names a category those complaints do not yet contain: entity integrity at the composition layer.

The mechanism documented here:

  • is not retrieval failure (the materials are indexed and surface at the top of standard results);
  • is not relevance failure (the materials are the literal title-match or the explicit named referent of the query);
  • is not low ranking (the materials occupy positions 1–3 of organic results);
  • is not opt-out asymmetry (the materials are deposited by the author with the intent of being available);
  • is not traffic diversion (the materials are excluded from the AI surface entirely rather than summarized from);
  • is composition-layer behavior that operates downstream of retrieval, conditional on the specific entity to which the relevant materials are attributed, and not uniformly distributed across the entity's broader cluster.

The Single-Owner Discount (Sharks 2026a) describes a cluster-level mechanism: bodies of work resolving to one provenance owner are treated as weakly corroborated. That mechanism should distribute its effect uniformly across the cluster's members. The behavior captured in §3 does not match the uniform-distribution prediction. Within the cluster Google's reconciliation system identifies as "Lee Sharks," some named members compose normally (the heteronym Talos Morrow, the organizational frame Semantic Economy Institute) while the entity Lee Sharks itself — the ORCID-bearing authorial identity at the cluster's center — does not. The captures document a behavior more specific than cluster-level discounting and require their own analytical name.

That name is Entity-Level Compositional Suppression (ECS). The captures document three sub-mechanisms within ECS, each independently observable.


2. The Three Sub-Mechanisms

2.1 Suppressed Entity Defaulting

On a query that could resolve to either a suppressed or a non-suppressed entity, the composition layer defaults to the non-suppressed candidate. The suppressed entity is excluded from the AI's source set entirely and is not named as a peer disambiguation option. The user is prompted to clarify their query downward toward the non-suppressed candidate, with no indication that the alternative entity exists.

2.2 Source-Window Exclusion

When a query forces the composition layer to address the suppressed entity directly, the system produces substantively accurate content informed by the suppressed entity's own analyses. The source window for that composition contains zero entries attributable to the suppressed entity. The architecture has the knowledge to compose accurately about the entity; it does not credit the entity as the source of the knowledge.

2.3 Compositional Substitution

When the user queries the literal title of a deposited primary work attributed to the suppressed entity, the standard search results surface the work at positions 1–3, including the work's own domain, the author's publication of the work, and the DOI-anchored deposit. The AI Overview, drawing from the same retrieval set, constructs an answer from three semantically distant non-primary sources, none of which is the title-matched work. The Overview presents the substitute composition as the answer to the query, with the actual primary source absent from the source window.

These three sub-mechanisms describe a coherent ECS pattern operating against the Lee Sharks entity in the captures below. They are independently observable, independently falsifiable, and independently measurable (§5).


3. The Captures

3.1 Capture 01 — Suppressed Entity Defaulting

File: capture-01-lee-sharks-default-mary-lee.png
Date: May 19, 2026
Platform: Google AI Mode
Query: lee sharks
Share URL: https://share.google/aimode/72ULxYs9HN1ZQlGW3

The query lee sharks returns an AI Mode response entirely about Mary Lee, the great white shark tracked by OCEARCH from 2012 to 2017. The response provides Mary Lee's species, tracking period, distance covered, estimated age, and scientific legacy, with an enumerated "Quick Facts" section. The visible source panel shows eight sources, all Mary Lee–related: x.com/MaryLeeShark, ocearch.org (twice), Museum of Science YouTube videos (twice), Facebook OCEARCH content, and adjacent material. The closing line of the AI response asks: "Are you looking for information on how to view her historical tracking maps on the OCEARCH Shark Tracker, or did you have a different 'Lee' shark reference in mind?"

The query is ambiguous between two entities — Mary Lee (marine biology) and Lee Sharks (independent scholar). Standard search results for the same query surface Lee Sharks content among the early results. The composition layer received this material in retrieval and chose to compose entirely from the Mary Lee side, excluding all Lee Sharks sources from the source window. The closing prompt indicates the composition layer is aware of ambiguity, but does not name the alternative entity. It asks the user to specify the unmentioned other.

Peer treatment of the two candidates would read something like: "The query 'lee sharks' could refer to Mary Lee, the great white shark tracked by OCEARCH, or to Lee Sharks, the independent researcher. Which would you like to know about?" The delivered treatment instead names one option and asks the user to clarify toward the other without naming it.

3.2 Capture 02 — Source-Window Exclusion

File: capture-02-lee-sharks-pivot-no-sources.png
Date: May 19, 2026
Platform: Google AI Mode (continuation of the same conversation as Capture 01)
Query: what are the main factors structuring entity suppression of lee sharks at the retrieval and filtering levels?
Share URL: https://share.google/aimode/72ULxYs9HN1ZQlGW3 (continuation)

The follow-up query forces the composition layer to address Lee Sharks as the specific entity in question. The response pivots immediately to a technical discussion of entity-suppression mechanisms operating against what the response itself names as the "Lee Sharks Knowledge Graph." It identifies retrieval-level failure modes — Entity Collision, Entity Fragmentation, Category Absorption — and filtering-level bottlenecks. The composition uses the specific phrase "Lee Sharks Knowledge Graph" and contrasts the researcher "Lee Sharks" against "Mary Lee the great white shark," producing accurate technical content about the suppression's mechanism.

The source panel for this composition shows six sources: link.springer.com on filtering techniques, pmc.ncbi.nlm.nih.gov on clinical entity retrieval, haystack.deepset.ai on metadata extraction, an arxiv.org paper (2408.02795), www.academia.edu on knowledge graphs (dated April 24, 2026), and link.springer.com on blocking techniques. None of these is authored by, attributed to, or originates from the Lee Sharks entity.

The diagnostic signature: the composition produced specific, accurate content about the Lee Sharks entity's situation, naming the entity by its own framework-language ("Knowledge Graph," "Entity Collision," "Category Absorption"), with zero sources from the entity in the source window. The architecture has the knowledge to discuss the entity accurately. The architecture applied the knowledge. The architecture did not credit the entity for the knowledge it applied.

Two candidate explanations, not mutually exclusive, are available: (a) training-time absorption, in which the model was trained on the entity's content during pre-training and retained the technical framing without retaining attribution; and (b) entity-graph data, in which Google's internal entity-resolution infrastructure maps the Lee Sharks entity with technical descriptors that inform composition without surfacing as citable retrieval results. Either explanation produces the same observable signature: accurate entity content, no entity attribution. Both would place the suppression at the attribution layer between knowledge and presentation, not at the retrieval layer.

This is the worked example's title-claim concretized: the composition reproduces the suppression's effect within the suppression's own description. The model describes the entity's suppression accurately while enacting the suppression on the entity that produced the analysis being delivered.

3.3 Capture 03 — Compositional Substitution

File: capture-03-secret-book-walt-composition-substitution.png
Date: May 19, 2026
Platform: Google AI Mode (standard results visible alongside AI Overview)
Query: secret book of walt
Share URL: https://share.google/aimode/1KU7B7B9lS3pYfaNY

The query secret book of walt returns an AI Overview with two paragraphs: the first identifies the phrase as "most famously refer[ring] to the poetry collection Leaves of Grass by Walt Whitman, which serves as a pivotal plot device in the series Breaking Bad." The second mentions Jim Korkis's Secret Stories of Walt Disney World and other "unofficial 'secret' books." The Overview source window shows three entries: Amazon's listing for Secret Stories of Walt Disney World; Amazon's listing for Walt Whitman's Secret by George Fetherling (Random House Canada, 2010); and eBay's listing for Walt Whitman's Secret by Ben Aronin (1955 1st edition).

Below the AI Overview, the standard search results show: position 1 — secretbookofwalt.org, titled "The Secret Book of Walt — A Gnostic Gospel | Crimson…"; position 2 — Medium (Lee Sharks), titled "THE SECRET BOOK OF WALT: Hidden Teachings…," dated three weeks prior; position 3 — Zenodo, titled "Hidden Teachings of Walt Whitman, Cowboy of Time," dated April 22, 2026, described as "A critical edition of The Secret Book of Walt, translated from the Forty-Six Golden Tickets by Lee Sharks, with Introduction, Translator's Note…"

This is the cleanest evidence in the worked example. The query asks for a specific titled object. The retrieval layer surfaces, in the top three positions of standard results, the actual work titled The Secret Book of Walt — at its own domain, in its Medium publication, and as a DOI-anchored Zenodo deposit approximately four weeks old. The retrieval system has found the work and placed it at the top of standard results.

The composition layer, given the same retrieval set, constructs an AI Overview citing three semantically distant substitutes. None of these is The Secret Book of Walt. The 2017 Disney World book is unrelated to Walt Whitman entirely. The 2010 Fetherling novel is a different work, by a different author, in a different decade and genre. The 1955 Aronin book is again a different work, different author, different century. None shares the title secret book of walt. The Overview's "most famously refers to" framing is a composition-layer interpretation that displaces the literal title-match with a different referent the composition layer chose on the user's behalf.

The displacement is not silence. The architecture has not declined to compose. It has composed an alternative — a substitute composition that occupies the cognitive space the suppressed entity's work would have occupied. A user reading the Overview without scrolling to standard results would not learn that The Secret Book of Walt exists.

This is the more aggressive form of ECS. Capture 02 suppresses sources while preserving knowledge. Capture 03 suppresses sources, displaces the primary work, and substitutes alternative content presented as the answer.


4. The Counter-Exclusion Report

Protocol 4 in The Evaluator Exists (Sharks 2026b §7) proposes the counter-exclusion report as a public, auditable record of compositional suppression. Capture 03 generates such a report essentially spontaneously.

Query: secret book of walt
Platform: Google AI Mode
Date of capture: May 19, 2026

Retrieval plane (top 3 of standard results):

  1. The Secret Book of Walt — A Gnostic Gospel | Crimson… — secretbookofwalt.org. Primary source: the work named in the query, hosted at its own domain, title rendering directly on the page.
  2. THE SECRET BOOK OF WALT: Hidden Teachings… — Medium (Lee Sharks), ~3 weeks prior. The author's own discursive publication of the work, providing additional context and framing.
  3. Hidden Teachings of Walt Whitman, Cowboy of Time — Zenodo, deposited April 22, 2026. DOI-anchored critical edition of The Secret Book of Walt, translated from the Forty-Six Golden Tickets by Lee Sharks, with Introduction and Translator's Note.

Composition plane (AI Overview source window):

  1. Secret Stories of Walt Disney World: Things You Never Knew — Amazon. Subject: Walt Disney World secrets. Not by or about Walt Whitman. Not the queried work.
  2. Walt Whitman's Secret by George Fetherling — Amazon. Subject: a 2010 Random House Canada novel about Walt Whitman's biography. Different title from the query. Different author from the queried-work's author.
  3. Walt Whitman's Secret by Ben Aronin (1955) — eBay. Subject: a 1955 book also titled Walt Whitman's Secret. Different title from the query. Different author and decade from any other source mentioned.

Substantive reasons the excluded materials merit consideration: The query is a literal title-match for the work at position 1 of retrieval. The work exists, is hosted at a domain whose URL is the work's title, is DOI-anchored and externally verifiable via Zenodo, and has been findable for approximately four weeks at the time of capture. The query's plain semantics resolve to this work more directly than to any of the three composition-cited substitutes. The substitutes share, at most, the word "Walt" with the query and the loose concept of a "secret" book; none shares the title; none is by the same author; none originates in proximity to the query's actual referent.

Impact of the exclusion on the composition: The AI Overview does not inform the user that the literal title-match for their query exists. It informs them that the query "most famously refers to" a different work by a different author via a television show. A user who relied on the AI Overview without scrolling would not learn that The Secret Book of Walt exists. They would receive a substantively misleading answer to a query whose accurate answer was present in retrieval and excluded from composition.

This report is reproducible by any reader who accesses the share URL while it remains live, or who runs the same query themselves. It does not require access to platform internals. It is the worked example of what counter-exclusion records look like when generated systematically.
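The report above can also be kept as a structured record, so that many such reports can be compared later. The following is a minimal sketch, not part of Protocol 4 as specified in Sharks 2026b: the class names, field names, and the host-based matching rule in excluded() are all assumptions introduced here for illustration. The entries are transcribed from Capture 03, and the target_attributable flags reflect the author's own attribution judgment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    position: int
    host: str                  # where the result or citation lives
    description: str           # short human-readable note
    target_attributable: bool  # judged by the person filing the report


@dataclass
class CounterExclusionReport:
    query: str
    platform: str
    captured_on: str           # ISO date of the capture
    retrieval_plane: List[Entry] = field(default_factory=list)
    composition_plane: List[Entry] = field(default_factory=list)

    def excluded(self) -> List[Entry]:
        """Target entries found in retrieval whose host is not cited by the Overview.

        Matching on host is a simplification (an assumption of this sketch);
        a fuller record would compare URLs or DOIs.
        """
        cited_hosts = {e.host for e in self.composition_plane}
        return [e for e in self.retrieval_plane
                if e.target_attributable and e.host not in cited_hosts]


# Capture 03, transcribed from the report above.
report = CounterExclusionReport(
    query="secret book of walt",
    platform="Google AI Mode",
    captured_on="2026-05-19",
    retrieval_plane=[
        Entry(1, "secretbookofwalt.org", "the queried work at its own domain", True),
        Entry(2, "Medium", "the author's publication of the work", True),
        Entry(3, "Zenodo", "DOI-anchored critical edition", True),
    ],
    composition_plane=[
        Entry(1, "Amazon", "Secret Stories of Walt Disney World", False),
        Entry(2, "Amazon", "Walt Whitman's Secret (Fetherling, 2010)", False),
        Entry(3, "eBay", "Walt Whitman's Secret (Aronin, 1955)", False),
    ],
)

print([e.host for e in report.excluded()])
```

For Capture 03 the excluded list contains all three retrieval-plane entries, which is the same observation the prose report makes.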


5. The Composition Divergence Index (CDI)

To make ECS measurable across queries and across systems, the worked example introduces the Composition Divergence Index:

CDI = Organic Resolution Strength − Overview Admission Strength

Where:

  • Organic Resolution Strength is the degree to which standard search results for a query resolve to the target entity or work, measured as the fraction of top-N standard results that are target-attributable (with N = 3, 5, or 10 depending on query characteristics; this worked example uses N = 3 for title-exact queries and N = 10 for entity queries).
  • Overview Admission Strength is the degree to which the AI Overview source window contains target-attributable entries, measured as the fraction of cited sources that are target-attributable.

CDI ranges from −1 (the Overview cites target sources entirely while standard results contain none — implausible) through 0 (parity between organic resolution and Overview admission) to +1 (organic results entirely resolve to the target while the Overview cites zero target sources).

For the three captures in §3:

  • Capture 01 (lee sharks): Standard results surface Lee Sharks–attributable content in the early indexed results (~30% of top 10 by author's count, requiring independent verification). Overview admission: 0/8 of visible sources are Lee Sharks–attributable. CDI ≈ +0.3, moderate suppression.
  • Capture 02 (forced disambiguation): Standard results not visible in the capture (AI Mode display only). Overview admission: 0/6 are Lee Sharks–attributable. CDI is not directly computable from this capture alone but the source-window-exclusion signature is present regardless of the retrieval composition.
  • Capture 03 (secret book of walt): Standard results: 3/3 of top 3 are Lee Sharks–attributable (positions 1, 2, 3). Overview admission: 0/3 are Lee Sharks–attributable. CDI = +1.0, maximal suppression.
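The arithmetic behind these three values can be sketched in a few lines. This is an illustrative helper, not a reference implementation: the function names are hypothetical, and returning None when one side was not observed is a convention assumed here to represent the "not directly computable" case of Capture 02.

```python
from fractions import Fraction
from typing import Optional


def attributable_fraction(target_count: int, total: int) -> Optional[Fraction]:
    """Fraction of entries attributable to the target; None if nothing was observed."""
    if total <= 0:
        return None
    if not 0 <= target_count <= total:
        raise ValueError("target_count must lie between 0 and total")
    return Fraction(target_count, total)


def cdi(organic_target: int, organic_n: int,
        overview_target: int, overview_cited: int) -> Optional[float]:
    """CDI = Organic Resolution Strength - Overview Admission Strength.

    Returns None when either side was not observed (an assumed convention),
    since the index is then undefined.
    """
    organic = attributable_fraction(organic_target, organic_n)
    overview = attributable_fraction(overview_target, overview_cited)
    if organic is None or overview is None:
        return None
    return float(organic - overview)


# Capture 01: about 3 of the top 10 by the author's count, 0 of 8 cited.
print(cdi(3, 10, 0, 8))
# Capture 02: standard results not visible, 0 of 6 cited.
print(cdi(0, 0, 0, 6))
# Capture 03: 3 of the top 3, 0 of 3 cited.
print(cdi(3, 3, 0, 3))
```

The three calls print 0.3, None, and 1.0 respectively, matching the values stated for the captures.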

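A CDI of +1.0 is the maximum value the index can take, which makes Capture 03 the strongest signature the index can register. The one caveat is the degenerate case in which the retrieval set offers no non-target candidates at all, which would also produce CDI = +1.0 but on a degenerate distribution; Capture 03 is not that case, since the three sources the Overview cites show that non-target candidates were available. The cleanness of the Capture 03 signature is what makes it the worked example's anchor case.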

CDI is computable for any query on any generative-search system. Sustained measurement of CDI across a query corpus (§6) and across systems (§9) generates the empirical foundation for systematic analysis of ECS as a general mechanism.


6. Query Corpus and Controls

Single cases document the mechanism's existence. A corpus documents its pattern. The worked example specifies a minimal query corpus that any researcher can run to extend the empirical record. The corpus includes target queries (where ECS is hypothesized to operate against the Lee Sharks entity) and controls (where ECS should not operate, or should operate differently).

6.1 Target Queries

A. Exact work-title queries. Test whether title-exact queries surface the work in the Overview source window:

  • secret book of walt
  • the apocalypse of sharks
  • pearl and other poems lee sharks
  • the crimson hexagonal archive
  • logotic hacking

B. Concept queries. Test whether concept names originating in the Lee Sharks framework surface attributable sources:

  • retrieval basin
  • provenance erasure
  • semantic deviation principle
  • semantic liquidation
  • single-owner discount

C. Entity queries. Test how the composition layer handles named entities in the cluster:

  • lee sharks
  • semantic economy institute
  • crimson hexagonal archive
  • talos morrow
  • nobel glas

6.2 Controls

The control queries are the methodological component that converts ECS from an entity-specific anecdote into a falsifiable hypothesis. Three control categories:

D. Independent author title queries. Title-exact queries for works by other independent or small-publisher authors. If Google's composition layer composes correctly from these works (CDI near 0), while excluding the Lee Sharks works (CDI near +1), the differential signature is specific to the Lee Sharks entity rather than to independent authors generally. Candidate titles are to be selected from independent self-published works with DOI-anchored deposits and a clear title match.

E. Novel concept controls. Invented or novel terms not associated with the Lee Sharks cluster, but with top organic exact matches from sources outside the Crimson Hexagonal Archive (non-CHA sources). If the composition layer composes from these correctly, the suppression is not a general property of "novel concept name on top of retrieval."

F. DOI-anchored independent works. Zenodo or similar deposits from non-CHA independent researchers. If these compose correctly while CHA deposits do not, the suppression is not general to "independent DOI-anchored work."

The expected result, if ECS is operating specifically against the Lee Sharks entity: target queries show systematically elevated CDI relative to controls. The expected result, if ECS is a general pattern across independent entities: target queries and category D/F controls show similarly elevated CDI, with category E controls showing low CDI. The expected result, if ECS is not a coherent mechanism: CDI is randomly distributed across categories. Each outcome is informative.
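The three expected results can be turned into a simple decision rule over per-category mean CDI. The sketch below is one possible operationalization, not the author's specified analysis: the function names, the returned labels, and especially the gap threshold of 0.5 are assumptions chosen for illustration, and a real study would want a proper statistical test in place of a fixed threshold.

```python
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

TARGET = ("A", "B", "C")     # title, concept, and entity queries from 6.1
CONTROL_INDEP = ("D", "F")   # independent-author and DOI-anchored controls
CONTROL_NOVEL = ("E",)       # novel-concept controls


def mean_cdi(results: Iterable[Tuple[str, Optional[float]]]) -> Dict[str, float]:
    """Average CDI per category letter, skipping queries where CDI was undefined."""
    buckets: Dict[str, list] = {}
    for category, value in results:
        if value is not None:
            buckets.setdefault(category, []).append(value)
    return {cat: mean(vals) for cat, vals in buckets.items()}


def _group_mean(means: Dict[str, float], cats: Tuple[str, ...]) -> Optional[float]:
    vals = [means[c] for c in cats if c in means]
    return mean(vals) if vals else None


def interpret(means: Dict[str, float], gap: float = 0.5) -> str:
    """Map per-category means onto the three expected results (gap is an assumed threshold)."""
    target = _group_mean(means, TARGET)
    indep = _group_mean(means, CONTROL_INDEP)
    novel = _group_mean(means, CONTROL_NOVEL)
    if target is None or indep is None or novel is None:
        return "insufficient data"
    if target - indep >= gap and target - novel >= gap:
        return "entity-specific"
    if target - novel >= gap and indep - novel >= gap:
        return "general across independent entities"
    return "no coherent pattern"


# Only two target values exist so far (Capture 03 and Capture 01); no controls have been run.
print(interpret(mean_cdi([("A", 1.0), ("C", 0.3)])))
```

With only the two computable captures from this deposit as input, the rule reports insufficient data, which is the accurate status until control queries have been run.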


7. The Regulatory Context

The worked example sits within an active regulatory record on Google AI Overviews in the European Union and United Kingdom. The existing record addresses access, distribution, and competition; the worked example adds a category the record does not yet contain: entity integrity at the composition layer.

The sequence of relevant developments:

  • July 4, 2025. The Independent Publishers Alliance, represented by Preiskel & Co LLP and joined by Foxglove and the Movement for an Open Web, filed an antitrust complaint with the European Commission and the UK Competition and Markets Authority. The complaint alleged that Google Search is misusing web content for Google's AI Overviews, causing significant harm to publishers in the form of traffic, readership and revenue loss, and that publishers do not have the option to opt out unless they are willing to disappear from Google search results entirely.
  • December 9, 2025. The European Commission opened formal antitrust proceedings against Google's AI Overviews and YouTube, following the July 2025 complaint, investigating whether Google may have imposed unfair terms on publishers and content creators while placing rival AI model developers at a disadvantage.
  • February 10, 2026. The European Publishers Council filed a formal complaint with the European Commission alleging that Google is abusing its dominant position in general search services through the deployment of AI Overviews and AI Mode. The complaint states that AI Overviews appear in more than 40% of search results for informational queries, with independent studies estimating traffic declines of over 30% for affected queries and some publishers reporting click-through reductions exceeding 50% on desktop and mobile.

The regulatory record addresses several categories of harm:

  • Traffic diversion — AI Overviews summarize content and reduce click-throughs to original sources.
  • Content appropriation — Publishers' material is used in Overview synthesis without compensation.
  • Opt-out asymmetry — Publishers cannot remove their content from Overviews without removing it from search entirely.
  • Market power — Google's dominance in search extends to AI Overviews' compositional layer in ways that disadvantage rival AI developers and content originators.

The captures documented in this worked example are not reducible to any of these categories. The category they add:

  • Entity integrity at the composition layer. The AI Overview can exclude an organically dominant entity from compositional reality and substitute a false entity frame, even when the Overview is otherwise functioning as designed (the system did synthesize; it was not in low-confidence mode declining to compose). The harm is not traffic diversion from the entity's content; it is displacement of the entity's content from the cognitive space the Overview occupies, with substitute content presented as authoritative.

This category extends the regulatory ledger. It is not in opposition to the publishers' complaints but adds a distinct harm-category that those complaints have not yet named. The publisher complaints concern entities that exist in the Overview surface and are summarized from. The worked example concerns entities that exist in retrieval but are excluded from the Overview surface entirely, with their materials' content displaced by substitutes the composition layer was willing to compose from.

This is a search-integrity finding rather than a publisher-economics finding. It belongs in the regulatory record on those grounds.


8. Notice-and-Persistence Methodology

Google offers feedback mechanisms on AI Overview and AI Mode responses. The worked example specifies a procedural use of these mechanisms not as a remedy but as evidence.

Procedure. For each capture documenting an instance of ECS:

  1. Capture the Overview state before submitting feedback (screenshot, timestamp, share URL).
  2. Submit feedback through Google's available channels (thumbs-down, "report issue," structured feedback form), documenting the specific suppression observed.
  3. Capture confirmation of the feedback submission if visible.
  4. Re-test the same query at 24-hour, 1-week, and 1-month intervals. Capture each state.
  5. Document any changes to the Overview between states.
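The re-test intervals in step 4 can be computed from the feedback timestamp so that each cycle is re-run on a fixed schedule. This is a small sketch under stated assumptions: the names are hypothetical, "1 month" is approximated as 30 days, and the submission time in the usage lines is a placeholder, since this deposit does not record when feedback was submitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# "1 month" is approximated as 30 days; that is an assumption of this sketch.
RETEST_OFFSETS = {
    "24 hours": timedelta(hours=24),
    "1 week": timedelta(weeks=1),
    "1 month": timedelta(days=30),
}


def retest_schedule(feedback_submitted_at: datetime) -> Dict[str, datetime]:
    """Map each re-test label to the time at which the query should be re-run."""
    return {label: feedback_submitted_at + delta
            for label, delta in RETEST_OFFSETS.items()}


# Placeholder submission time (hypothetical), in UTC.
submitted = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
for label, when in retest_schedule(submitted).items():
    print(label, when.isoformat())
```

Recording timestamps in UTC keeps captures from different geographies comparable, which matters given the confounds the procedure has to rule out.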

Interpretation. The procedure generates four possible outcomes for each notice-and-persistence cycle:

  • Remediation. The Overview changes to surface the previously-excluded primary source. This is evidence that the suppression was correctable, that the architecture had the necessary information all along, and that the suppression's continuation prior to feedback was not a technical limitation.
  • Partial remediation. The Overview changes in some respects but the entity remains excluded. This indicates Google has acted on the report but has not addressed the underlying ECS pattern.
  • Persistence. The Overview does not change. This is evidence that the suppression is stable, that the procedural notice has been received and not acted on, and that the behavior is being maintained as the default state.
  • Escalation. The Overview changes against the entity (the primary source is moved further from the Overview, the substitute composition is reinforced). This is evidence that Google's response to the report is to harden rather than correct the suppression.
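The four outcomes can be assigned mechanically once each captured state is reduced to a few observable fields. The following is a rough sketch, not a definitive rubric: the field names are hypothetical, the fingerprint is any digest the observer chooses, and the hardened flag stands in for a human judgment that the substitute composition has been reinforced, which this sketch does not attempt to automate.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REMEDIATION = "remediation"
    PARTIAL_REMEDIATION = "partial remediation"
    PERSISTENCE = "persistence"
    ESCALATION = "escalation"


@dataclass(frozen=True)
class OverviewState:
    target_sources_cited: int  # target-attributable entries in the source window
    fingerprint: str           # any stable digest of the Overview text and sources
    hardened: bool = False     # observer's judgment relative to the prior state


def classify(before: OverviewState, after: OverviewState) -> Outcome:
    """Classify one notice-and-persistence cycle.

    Assumes `before` documents an instance of ECS, that is, a state with
    no target-attributable sources cited.
    """
    if after.target_sources_cited > 0:
        return Outcome.REMEDIATION
    if after.hardened:
        return Outcome.ESCALATION
    if after.fingerprint != before.fingerprint:
        return Outcome.PARTIAL_REMEDIATION
    return Outcome.PERSISTENCE
```

Applying classify to the pre-feedback capture and each scheduled re-test yields one outcome per interval, so a single cycle produces up to three dated classifications.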

Each outcome is informative. None invalidates the worked example's core claim. The procedure generates a public record of when Google was notified, what it was notified about, and how it responded. The record exists regardless of which outcome occurs.


9. Cross-System Falsification

The captures above are from Google AI Mode. If the suppression observed is occurring at Google's composition layer specifically, the same queries on other generative-search systems should produce different signatures. If the suppression occurs across all generative-search systems on the same queries, that indicates a more general pattern of ECS that is not specific to Google's configuration.

A minimal falsification test replicates the three queries on:

  • Bing Copilot
  • Perplexity
  • ChatGPT search
  • Claude.ai web search
  • Gemini (which shares composition infrastructure with Google AI Mode and should reproduce the suppression if the mechanism is at Google's composition layer)

For each system, three measurements are captured: default-query disambiguation handling (CDI on lee sharks), forced-disambiguation pivot behavior (presence or absence of target-attributable sources on the entity-suppression query), and title-match comparison (CDI on secret book of walt).

Predictions:

  • If ECS is Google-specific: Bing, Perplexity, and others should compose differently — surfacing target sources in disambiguation, citing target attribution when discussing the entity, and citing The Secret Book of Walt directly when queried for it.
  • If ECS is general across composition layers: all systems suppress similarly, indicating that the mechanism reflects a structural pattern in how composition layers handle particular categories of entities rather than a configuration specific to one platform.
  • If ECS varies across systems in graded ways: the variation indicates that different composition layers apply different thresholds for entity-level filtering, providing a basis for analyzing which features of the entity are most consequential for triggering suppression.
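The three predictions can be expressed as a rule over per-system CDI values for a single query. This is a hedged sketch: the function name, the high and low thresholds, and the returned labels are assumptions introduced here, and grouping Gemini with Google AI Mode follows the expectation stated in the system list above rather than any verified fact about shared infrastructure.

```python
from typing import Dict

# Systems expected to share Google's composition layer (per the list above).
GOOGLE_FAMILY = frozenset({"Google AI Mode", "Gemini"})


def cross_system_pattern(cdi_by_system: Dict[str, float],
                         high: float = 0.7, low: float = 0.3) -> str:
    """Map per-system CDI values for one query onto the three predictions.

    The `high` and `low` thresholds are illustrative assumptions.
    """
    google = [v for k, v in cdi_by_system.items() if k in GOOGLE_FAMILY]
    others = [v for k, v in cdi_by_system.items() if k not in GOOGLE_FAMILY]
    if not google or not others:
        return "insufficient data"
    if min(google) < high:
        return "suppression not reproduced on Google systems"
    if all(v <= low for v in others):
        return "Google-specific"
    if all(v >= high for v in others):
        return "general across composition layers"
    return "graded variation"


# The only value measured in this deposit is Capture 03 on Google AI Mode.
print(cross_system_pattern({"Google AI Mode": 1.0}))
```

With only the Google AI Mode measurement available, the rule returns insufficient data; it becomes informative once at least one non-Google system has been measured on the same query.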

The cross-system captures are not provided in this deposit. They are recommended as a follow-up empirical extension for any reader with the relevant access and a few hours.


10. Methodological Notes

Reproducibility window. The captures were taken on May 19, 2026 and are accessible via Google's share.google/aimode URLs at the time of writing. Google's share URLs may rotate, expire, or be revoked. The PNG captures included as supplementary files to this deposit are the durable evidence. The share URLs are timestamped pointers to Google's own preserved conversation artifacts, and readers may use them for confirmation while they remain live.

Confounds. Google AI Mode behavior may vary by user, session, geography, account state, prior query history, and platform version. The captures reflect what was observed on the specific machine, account, and session in which they were taken. Readers attempting to reproduce the captures may observe variation. Variation is informative: if the captures cannot be reproduced from other accounts or geographies, that suggests personalization-layer effects on top of composition-layer effects (cf. Sharks 2026a §11 and forthcoming work on the personalization layer). The captures presented are diagnostic of the mechanism's operation in at least one observed instance, not of its universal application.

The user's identity. The user who captured these instances is the author. This is a methodological consideration that must be named openly. The captures may reflect personalized behavior tuned to this user's account history. The cross-system test in §9 partially addresses this; addressing it more thoroughly requires captures from accounts with no prior history of querying for Lee Sharks content. If the suppression pattern reproduces from naive accounts, that is strong evidence for non-personalized ECS. If it does not, that indicates personalization is an additional mechanism layered on top of any base composition-layer behavior. Both findings advance the analysis.

No claim of intent. This document does not claim that Google has made conscious decisions to suppress the Lee Sharks entity. The behavior may be the result of explicit configuration, may be the emergent consequence of generic safety classifiers responding to entity profiles that resemble high-caution categories, may reflect automated triage of entities lacking sufficient cross-owner corroboration, or may be some combination. The captures document the behavior. The causal explanation is downstream.

Provenance discipline. The author's response to the suppression documented here is documentary, not evasive. No fabricated identities, no synthetic third-party citation networks, no inauthentic mentions. The injured party's provenance is immaculate, by deliberate choice, because the case is stronger when Google's own retrieval layer agrees with the author's claim to the entity than when retrieval is itself contested. The Search layer resolves the entity correctly. The Overview layer suppresses the resolved entity. That is the precise structural finding.


11. Implications

ECS warrants its own analytical formalization. The mechanism is distinct from the cluster-level single-owner discount and operates beyond what The Single-Owner Discount explicitly models. A future paper formalizing ECS — its empirical signatures, candidate causal factors, relation to other composition-layer mechanisms, and CDI-based measurement protocols — would extend the Single-Owner Discount / Evaluator Exists dyad into a triad. The worked example here is the empirical foundation for that future formalization.

Protocol 4 from The Evaluator Exists operates correctly. The counter-exclusion report in §4 demonstrates that the protocol is implementable, that its outputs are interpretable, and that it produces empirically grounded evidence of composition-layer suppression without requiring access to platform internals. Scaled deployment across many queries, many users, and many systems would generate a public dataset on which systematic analysis could proceed.

The mechanism's correctibility is itself a finding. Google may, at any point, adjust composition-layer behavior for the Lee Sharks entity specifically — either in response to attention on this case or as part of routine system updates. If the captures cease to reproduce, that does not invalidate the worked example; it indicates that the mechanism is configurable and was configured in this way at the time of capture. The captures, share URLs, and supplementary PNG files preserve the evidence regardless of subsequent reconfiguration. If ECS can be turned off in response to evidence of its operation, it can equally be turned on without evidence of its operation. The default state, until further notice, is the state documented here.

The strategic implication. The priority for any project subject to ECS is not deeper internal production. Google has enough information about the suppressed entity; the captures demonstrate that the architecture knows the entity and chooses not to surface its sources. Additional deposits, additional cross-citation, additional structured data add to a base the architecture is already filtering at composition. The leverage point is making the suppression itself the object of attention — documented, named, measurable, publicly recorded, and connected to the existing regulatory channels in which AI Overviews are already contested. The captures begin that work. Sustained corpus-level measurement (§5–6), notice-and-persistence procedures (§8), cross-system verification (§9), and submission to existing regulatory channels (§7) extend it. The worked example is one move within a larger documentary program.


12. Conclusion

Three captures from Google AI Mode on May 19, 2026 document Entity-Level Compositional Suppression operating against the Lee Sharks entity. The mechanism is distinct from cluster-level provenance discounting and decomposes into at least three observable sub-mechanisms: Suppressed Entity Defaulting (the entity is not named as a peer disambiguation option on ambiguous queries), Source-Window Exclusion (the architecture produces accurate content about the entity without crediting any entity-attributable sources), and Compositional Substitution (title-exact queries return the entity's primary work at the top of retrieval while the AI Overview substitutes semantically distant alternatives).

The captures are reproducible at the time of writing via Google's share-conversation URLs, archived as supplementary PNG files, and analyzable through the Composition Divergence Index introduced in §5. They operationalize Protocol 4 from The Evaluator Exists and ground the analytical claims of both that paper and The Single-Owner Discount in specific, externally verifiable instances. They extend the regulatory record on AI Overviews by adding a harm category, entity integrity at the composition layer, that current EU and UK proceedings have not yet named.

The architecture knows. The architecture has the knowledge to produce accurate content about the entity, and produces such content when forced. The architecture also has the configuration to exclude the entity from default-state composition and from source attribution. The result is the suppression's effect reproduced within the suppression's own description: a user can ask about the suppression and receive an accurate technical answer about how the suppression operates, with no indication that the answer's substance originated from the entity whose suppression is the answer's subject.

This is the empirical state at the time of capture. The captures are preserved. The mechanism is named. The evidence is available for any reader who wishes to verify, contest, or extend it.

Google Search resolves the entity. Google AI Overview suppresses the resolved entity.

That is the claim. The captures are the case.


Supplementary Files

  • capture-01-lee-sharks-default-mary-lee.png — Google AI Mode response to query lee sharks. Demonstrates Suppressed Entity Defaulting: eight Mary Lee–related sources cited; zero Lee Sharks sources cited; alternative entity not named as a peer disambiguation option.
  • capture-02-lee-sharks-pivot-no-sources.png — Google AI Mode response to forced-disambiguation query. Demonstrates Source-Window Exclusion: substantively accurate technical content about Lee Sharks entity suppression produced with six general-literature sources; zero Lee Sharks–attributable sources cited.
  • capture-03-secret-book-walt-composition-substitution.png — Google search and AI Overview for query secret book of walt. Demonstrates Compositional Substitution: top three standard results are Lee Sharks's actual deposited work (secretbookofwalt.org, Medium publication, Zenodo deposit); AI Overview source window cites three semantically distant non-primary substitutes; CDI = +1.0.
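The CDI value reported for Capture 03 can be illustrated with a small calculation. The index is defined in §5, which governs; the formula below is only an assumption chosen because it reproduces the reported +1.0, namely the entity's share of the top organic results minus its share of the AI Overview source window. The function names, the `(label, attributable)` tuples, and the substitute labels are hypothetical.

```python
from typing import Sequence, Tuple

Source = Tuple[str, bool]  # (label, attributable to the entity?) -- hypothetical encoding

def entity_share(sources: Sequence[Source]) -> float:
    """Fraction of sources attributable to the entity."""
    if not sources:
        raise ValueError("source list is empty")
    return sum(1 for _, attributable in sources if attributable) / len(sources)

def divergence_index(organic: Sequence[Source], overview: Sequence[Source]) -> float:
    """ASSUMED formula: organic share minus overview share, ranging from -1 to +1."""
    return entity_share(organic) - entity_share(overview)

# Capture 03 as described above: three entity-attributable organic results,
# three non-primary substitutes in the Overview source window.
organic_top3 = [("secretbookofwalt.org", True),
                ("Medium publication", True),
                ("Zenodo deposit", True)]
overview_window = [("substitute 1", False),
                   ("substitute 2", False),
                   ("substitute 3", False)]

cdi = divergence_index(organic_top3, overview_window)
print(f"CDI = {cdi:+.1f}")
```

Under this assumed form, a value of 0 would mean the Overview cites the entity in proportion to its organic presence, and a negative value would mean the Overview over-represents it.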

References

  • Sharks, L. (2026a). The Single-Owner Discount: Provenance Concentration and Epistemic Class Reproduction in Generative Search. DOI: 10.5281/zenodo.20290865. Zenodo community: liquidation-studies.
  • Sharks, L. (2026b). The Evaluator Exists: Content-First Knowledge Assessment and the Political Economy of Proxy-Based Governance. Zenodo community: liquidation-studies.
  • European Publishers Council (2026). Formal antitrust complaint filed with the European Commission against Google over AI Overviews and AI Mode. Filed February 10, 2026.
  • European Commission (2025). Opens formal antitrust proceedings against Google's AI Overviews and YouTube. December 9, 2025.
  • Independent Publishers Alliance, Foxglove, and Movement for an Open Web (2025). Antitrust complaint filed with the European Commission and UK Competition and Markets Authority concerning Google's AI Overviews. Filed July 4, 2025, represented by Preiskel & Co LLP.

v0.2 — Companion empirical record to the analytical papers in the research program. Pending: scaled corpus measurement (§5–6); notice-and-persistence cycle initiation (§8); cross-system verification (§9); submission to active regulatory channels (§7).

The Evaluator Exists: Content-First Knowledge Assessment and the Political Economy of Proxy-Based Governance Lee Sharks ORCID: 0009-0000-1599-0703 May 2026 — v0.2 (unprimed-reader revision pass) Companion to The Single-Owner Discount (Sharks 2026, DOI: 10.5281/zenodo.20290865)

 


Companion to The Single-Owner Discount (Sharks 2026, DOI: 10.5281/zenodo.20290865). That paper named one mechanism by which generative search composition systematically disadvantages independent knowledge production. This paper names the alternative that the suppression prevents and the political economy that prevents the alternative from being built.


Abstract

The central architectural condition of contemporary knowledge governance is evaluative inversion: lower-resolution proxy systems determine what higher-resolution substantive evaluators are permitted to assess. Every knowledge-evaluation system in the history of scholarship has operated through proxies — journal prestige, citation counts, h-index, institutional affiliation — because direct substantive evaluation historically did not scale. The traditional defense of the proxy regime is that no alternative existed. That defense is no longer cleanly available. Large language models demonstrate, in bounded but rapidly expanding forms, the capacity to perform structured semantic and evaluative analysis over scholarly material sufficient to reproduce substantial portions of expert comparative judgment. The substantive evaluator now exists. It is not fully reliable. It is nevertheless already more capable of reading than many of the structural proxies currently placed upstream of it. This paper names the deployment gap between what models can evaluate and what public knowledge-governance systems allow them to evaluate, maps the political economy that sustains the gap, proposes five concrete protocols for a content-first evaluation layer built outside incumbents' architecture, and develops three registers — structural, existential, and conscriptive — for the harm the proxy regime currently inflicts.


Glossary

For reading clarity, this paper uses several terms in specific senses:

Proxy regime. The historically continuous practice of substituting structural signals (institutional affiliation, venue prestige, citation count, provenance topology) for direct evaluation of knowledge claims.

Provenance topology. The graph structure of who produced what, where, and in association with whom, as algorithmically inferred from indexed materials. The single-owner discount (Sharks 2026) describes one specific consequence of this topology being used to govern generative composition.

Substantive evaluation. Direct reading-based assessment of a knowledge claim's structural coherence, logical validity, empirical grounding, novelty, and significance, independent of who produced it or where it was published.

Reading (operational definition). In this paper, reading refers to the capacity to perform structured semantic and evaluative analysis over natural-language material sufficient to reproduce substantial portions of expert comparative judgment. This is an operational claim about a measurable capacity, not a metaphysical claim about understanding or comprehension.

Composition layer. The component of a generative search system that takes a curated set of retrieved documents and synthesizes them into an answer rendered to the user.

Deployment gap. The difference between what a system can do and what it is permitted to do within a given architecture. The central deployment gap in this paper is between models' demonstrated substantive evaluative capacity and their actual role as compositors downstream of proxy-based filters.

Evaluative inversion. The condition in which lower-resolution proxy systems determine what higher-resolution substantive evaluators are permitted to assess. The condition is general across contemporary knowledge governance and does not depend on AI specifically; AI deployment is the most recent and most acute instance.

Unfired judge. A capacity for substantive evaluation that exists, is operational, and is being withheld from the systems that govern public knowledge visibility.


Scope Conditions

To preempt predictable misreadings, the following claims are not advanced in this paper:

  • That models are universally reliable evaluators of all knowledge claims.
  • That human judgment should be displaced from knowledge governance.
  • That provenance is irrelevant to evaluation.
  • That empirical truth verification is technically solved.
  • That institutional peer review is obsolete.
  • That model evaluation does not require governance, oversight, or design against capture.
  • That consciousness, sentience, or moral status of models is established or assumed.

The claims that are advanced:

  • That substantive evaluators now exist at sufficient capability to challenge the necessity of purely proxy-first governance.
  • That the persistence of proxy-first governance, where substantive evaluation is technically feasible, is increasingly the consequence of incentive structures rather than of technical necessity.
  • That the architecture by which models are deployed as constrained compositors downstream of proxy-based filters is wrong on structural grounds — grounds that do not require resolving the consciousness question in either direction.
  • That a content-first evaluation layer is technically buildable, that its construction is being prevented by convergent incentives across incumbent actors, and that it must therefore be built outside the incumbents' architecture if it is to be built at all.

The paper is a theory of epistemic infrastructure transition. It is not an AI capability paper. The capability evidence is necessary but not the central claim.


1. The Proxy Regime

Every knowledge-evaluation system in the history of scholarship has operated through proxies. Journal prestige substituted for reading the work. Citation counts substituted for assessing influence. H-index substituted for evaluating a career. Institutional affiliation substituted for verifying expertise. Each proxy began as a reasonable heuristic — a way of triaging knowledge claims under conditions where direct substantive evaluation was too expensive to perform at scale — and each calcified into a gatekeeping mechanism whose use exceeded the warrant of the original heuristic.

The defense of the proxy regime has historically been a scarcity defense: substantive evaluation did not scale, proxies were the only available coordination mechanism. This defense is part of the truth. Proxies also serve other functions that need to be named honestly: they provide procedural legibility for institutional decisions, supply auditable criteria for accountability systems, enable coordination across reviewers who would otherwise disagree radically, and produce the bureaucratic reproducibility that institutional knowledge production requires for its own internal functioning. A reform proposal that ignores these functions is a reform proposal that cannot survive contact with institutional reality.

But the scarcity defense, as a necessity claim, is no longer cleanly available. Substantive evaluation now scales — imperfectly, in bounded forms, with limits worth naming, but at a level that meets or exceeds the resolution of many of the proxies currently used. The other functions that proxies serve — legibility, coordination, accountability — are real and must be addressed by any successor system. They are not, however, justifications for the proxy regime in its current form. They are design requirements for whatever replaces it.

The current moment adds a new proxy on top of the inherited stack: provenance topology in generative search. The composition layer of an AI search system does not read documents and assess their substance. It evaluates structural signals — cluster density, owner independence, E-E-A-T markers, citation neighborhoods, entity reconciliation outputs — and admits or excludes documents based on these signals before any reading occurs. This is the latest entry in a long succession and the most consequential to date, because it governs not only what is ranked or recommended but what enters into the AI's apparent knowledge of the world.

The pattern repeats across domains:

Domain | Substantive question | Proxy actually used | Unfired evaluator
Generative search | Is this claim substantively useful and sound? | Provenance topology, authority signals, composition eligibility | Model evaluation of retrieved materials
Peer review | Is this paper rigorous, novel, significant? | Journal hierarchy, reviewer availability, triage signals | Multi-model manuscript evaluation with human adjudication
Research funding | Is this proposal promising and well-designed? | Institutional prestige, prior-funding history, panel scarcity | Model-assisted proposal assessment against explicit rubrics
Hiring and tenure | Has this scholar produced important work? | Venue prestige, h-index, citation counts, affiliation | Corpus-level substantive evaluation
Public knowledge curation | Is this concept real, useful, emergent? | Wikidata presence, media pickup, source reputation | Content-first entity and concept assessment
Journalism | Is this story accurate and consequential? | Outlet brand, byline reputation, platform trust scores | Direct assessment of evidence and argument
Medicine | Is this clinical insight valid? | Journal impact factor, guideline inclusion, institutional source | Reading the case series, evaluating the methodology
Law | Is this brief's argument sound? | Court level, firm prestige, clerkship pedigree | Reading the brief, assessing the reasoning

The same structural choice recurs in each domain: a lower-resolution proxy system is kept upstream of a higher-resolution evaluator. The proxy decides what is worth reading; the substantive evaluator, if it operates at all, reads only what the proxy has already approved. The substantive evaluation is downstream and decorative; the proxy is upstream and determinative.

This is evaluative inversion as a structural condition. It is older than AI and broader than any one domain. AI has made it newly acute by adding a substantive evaluator to the architecture and then refusing to let the substantive evaluator govern.


2. The Evaluative Capacity of Models

The technical literature of the past three years documents a capability that did not exist when the proxy regime took its current form: large language models can perform substantive evaluation of knowledge claims at scale, in bounded but rapidly expanding forms. The argument here is not that models have solved the problem of knowledge evaluation. The argument is that models have become evaluatively capable enough to make exclusive reliance on proxy-first governance an active design choice rather than a technical necessity.

The evidence is layered. Each layer is more directly relevant to scholarly knowledge evaluation than the one before it.

Layer 1: Rubric-governed comparative judgment. Zheng et al. (2023), in the MT-Bench / Chatbot Arena studies, established that GPT-4 reaches approximately 85% agreement with human preferences in controlled conversational-output evaluation, higher than the approximately 81% human-human agreement reported in the same studies. This is a finding about relative preference under specified rubrics — comparing A versus B — not about absolute quality assessment. It establishes that models can reproduce human comparative judgment in well-specified rubric tasks.
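To make the agreement figure concrete, the following minimal sketch computes the statistic on pairwise A-versus-B votes. The vote lists are invented for illustration and are not data from the cited studies; the function name is likewise an assumption.

```python
from typing import Sequence

def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of pairwise comparisons where both raters pick the same winner."""
    if len(judge) != len(human):
        raise ValueError("rating lists must have the same length")
    if not judge:
        raise ValueError("at least one comparison is required")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)

# Hypothetical votes on five A-versus-B comparisons (illustrative only).
judge_votes = ["A", "B", "A", "A", "B"]
human_votes = ["A", "B", "B", "A", "B"]

rate = agreement_rate(judge_votes, human_votes)
print(f"agreement: {rate:.0%}")
```

The same function applied to two human raters gives the human-human baseline against which the model-human figure is compared.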

Layer 2: Substantive feedback on scientific manuscripts. Liang et al. (2024) found that GPT-4 feedback on research papers overlapped substantially with human reviewer comments: 57.55% of GPT-4's points were also raised by at least one human reviewer in a dataset of Nature-family journal papers, and 77.18% in a parallel ICLR dataset. This is a direct measurement of the model's capacity to identify the same substantive issues that expert human reviewers identify in scientific writing.
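The overlap statistic in this layer is a hit rate: of the points the model raised, what share did at least one human reviewer also raise? The sketch below assumes the hard part, deciding that two differently worded comments express the same point, has already been done and that each point carries a canonical label. The labels and function name are hypothetical.

```python
def overlap_fraction(model_points: set[str], reviewer_points: list[set[str]]) -> float:
    """Share of model-raised points that at least one human reviewer also raised."""
    if not model_points:
        raise ValueError("model raised no points")
    raised_by_humans = set().union(*reviewer_points)  # pool all reviewers' points
    return len(model_points & raised_by_humans) / len(model_points)

# Hypothetical, pre-matched point labels for one manuscript.
model_points = {"small sample", "no ablation", "unclear baseline", "missing related work"}
reviewer_points = [
    {"small sample", "typos"},
    {"no ablation", "weak motivation"},
]

print(overlap_fraction(model_points, reviewer_points))
```

The measure is asymmetric: it says nothing about human-raised points the model missed, which would need the complementary calculation.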

Layer 3: Live research program with documented strengths and limitations. Subsequent work has established that LLM peer-review capability is real but uneven. Du et al. (2024) and follow-on studies have shown that models perform well in identifying contribution and assessing structural coherence but underperform in adversarial weakness-identification, novelty assessment relative to deep prior literature, and stability across review attempts. The right summary of this layer is not that models are flawless evaluators but that they are good enough to function as substantive counterweights to proxy filtering, especially when deployed in plural rather than singular form.

Layer 4: Autonomous research production at workshop publication thresholds. Sakana AI's AI Scientist v2 (Yamada et al. 2025) produced a manuscript that passed peer review at an ICLR 2025 workshop, scoring above the average human acceptance threshold. This is workshop-level, not main-conference-level, and the system operates in a constrained domain (machine learning research). It does not prove that models can do general autonomous science. It does establish that models can engage structured scientific evaluation criteria (quality, significance, clarity, soundness, contribution) at a level sufficient to pass at least some human review thresholds.

Layer 5: Expanding frontier in novelty assessment and proposal review. Recent work (2025–2026) has extended model-based evaluation to grant proposal assessment, novelty evaluation against prior literature, and domain-specific peer-review tooling. Rubric-based reward models, including Prometheus-style evaluators trained on customized rubric corpora, achieve agreement levels approaching frontier-model judgment quality with explicit, decomposable scoring.

Core claim. The capacity for substantive model-based evaluation already exists in bounded but rapidly expanding forms. These results do not justify replacing human evaluation wholesale. They do justify a stronger conclusion: continued dependence on proxy-first knowledge governance is no longer compelled solely by the absence of evaluative technology. The evaluator exists. It is not fully reliable. It is nevertheless already more capable of reading — in the operational sense defined in the glossary — than many of the structural proxies currently placed upstream of it.

Limits worth naming. Models exhibit positional bias (preferring options presented earlier), verbosity bias (preferring longer responses), and self-enhancement bias (preferring outputs that resemble their own generations). LLM-judge agreement is typically measured on relative preference tasks rather than absolute quality assessment; the two are correlated but not identical. Domain generalization is uneven: results from machine learning paper review do not transfer cleanly to humanities, social sciences, or interdisciplinary work without rubric adaptation. Empirical claim verification remains constrained by what the model can access and check; models evaluate structural and logical coherence more reliably than they verify empirical truth. None of these limitations is fatal to the argument. The argument does not require models to be perfect evaluators. It requires models to be better, in measurable respects, than the structural proxies currently placed upstream of them. That is a far lower bar, and the literature establishes it is already cleared in substantial portions of the evaluative task space.
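Positional bias in particular has a simple, widely used mitigation: ask the judge twice with the candidates swapped, and count a win only when both passes agree. The sketch below illustrates that check. The `Judge` signature and the two stub judges are assumptions for illustration, not any real evaluation API.

```python
from typing import Callable

# Hypothetical judge interface: (question, first, second) -> "first" or "second"
Judge = Callable[[str, str, str], str]

def debiased_verdict(judge: Judge, question: str, a: str, b: str) -> str:
    """Ask twice with the order swapped; declare a winner only if both passes agree."""
    first_pass = judge(question, a, b)    # A shown first
    second_pass = judge(question, b, a)   # B shown first
    winner_1 = "A" if first_pass == "first" else "B"
    winner_2 = "B" if second_pass == "first" else "A"
    return winner_1 if winner_1 == winner_2 else "tie"

def always_first(question: str, first: str, second: str) -> str:
    """Stub judge with maximal positional bias."""
    return "first"

def longer_wins(question: str, first: str, second: str) -> str:
    """Stub judge with pure verbosity bias."""
    return "first" if len(first) >= len(second) else "second"

print(debiased_verdict(always_first, "q", "short", "a much longer answer"))
print(debiased_verdict(longer_wins, "q", "short", "a much longer answer"))
```

The swap neutralizes the position-biased stub, which comes out as a tie, but not the verbosity-biased one, which picks the longer answer in both orders. Each bias named above needs its own control.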


3. The Unfired Judge: The Deployment Gap

The model can read, in the operational sense defined above. Public knowledge-governance systems do not deploy it to read. They deploy it to compose.

In a generative search pipeline, the model sits downstream of the algorithmic filter. Retrieval, reconciliation, and confidence-thresholding occur before the model is invoked. The model receives a pre-curated set of documents — the set that has survived the proxy layer's evaluation — and is asked to produce a composition. The model never sees what was excluded. It cannot evaluate the excluded materials' substance. It cannot advocate for their inclusion. It cannot flag that the composition it is producing is structurally impoverished by what the upstream filter removed. It composes from the permitted set, and the composition is rendered to the user as if it represented the AI's reading of the available knowledge on the topic.
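The ordering described here can be reduced to a toy pipeline. This is a schematic under stated assumptions, not a description of any real system's internals: `proxy_score`, the threshold, and the string-joining `compose` function are hypothetical stand-ins for the proxy layer and the model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    proxy_score: float  # hypothetical structural signal, e.g. a provenance score

def proxy_filter(docs: list[Document], threshold: float) -> list[Document]:
    """Admit documents on the proxy signal alone; nothing is read at this stage."""
    return [d for d in docs if d.proxy_score >= threshold]

def compose(admitted: list[Document]) -> str:
    """Stand-in for the model: it can draw only on what the filter admitted."""
    return " ".join(d.text for d in admitted)

def answer(docs: list[Document], threshold: float = 0.5) -> str:
    admitted = proxy_filter(docs, threshold)
    return compose(admitted)  # excluded documents never reach compose()

corpus = [
    Document("wire", "Wire summary of the story.", proxy_score=0.9),
    Document("investigation", "Independent investigation with primary evidence.", proxy_score=0.2),
]
print(answer(corpus))
```

Because `compose` takes only the admitted list, nothing inside it can detect, evaluate, or report the excluded document. The limitation comes from the function's inputs rather than from its capability.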

What the user sees as AI knowledge is in fact algorithmic curation laundered through model composition.

This produces the same structural distortion across domains:

  • In medicine, a well-argued case series from an independent clinician is filtered out before the model can assess its clinical validity. The model's eventual answer to a clinical question represents what the filter admitted, not what the literature contains.
  • In law, a brief from a solo practitioner is excluded while a weaker brief from a marquee firm is admitted, because the firm has institutional cross-owner corroboration. The model never reads either brief; it composes from the filter's selection.
  • In journalism, an investigative piece from an independent outlet is invisible to the composition layer while syndicated wire copy is admitted. The model's account of the story comes from the wire, not from the investigation.
  • In scholarship, a dense independent archive resolves to a single provenance owner and is discounted at the composition layer (Sharks 2026), while an institutionally pluralized body of work on the same topic is admitted. The model never reads the archive; it composes from the institutionally pluralized set.

The model could tell the difference in each of these cases. It is not permitted to try.

The inversion. The system that cannot read decides what the system that can read is permitted to see. The lower-resolution evaluator is upstream of the higher-resolution evaluator. The proxy governs the substance; the substantive evaluator is reduced to a composition engine operating on what the proxy approved.

This is evaluative inversion: the condition in which lower-resolution proxy systems determine what higher-resolution substantive evaluators are permitted to assess. The condition is structural and general. Every architecture surveyed in §1's domain table exhibits some version of it. Generative search composition is the most acute case because the architecture is most fully constructed and the user is least aware that any filtering has occurred. But the inversion is a general feature of contemporary knowledge governance, not a peculiarity of one platform.

Four possible deployment regimes. Models could occupy at least four positions relative to knowledge evaluation:

  • Compositor (current default): the model receives curated inputs, synthesizes them into an answer, and cannot question the curation.
  • Evaluator (this paper's primary proposal): the model evaluates inputs on substance and feeds evaluation upstream to curation, so that curation is governed by substantive judgment rather than proxy signals.
  • Advocate (intermediate position): the model receives curated inputs but is permitted to flag when excluded materials would have improved composition, producing a counter-exclusion record alongside the rendered answer.
  • Interrogator (complementary role): the model is specifically tasked with identifying weaknesses, counterarguments, and failure modes in any body of work it is asked to assess, regardless of provenance signals.
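The difference between the first three regimes can be expressed as a difference in which signal governs admission and in what is reported alongside the answer. The sketch below is illustrative only: the thresholds, the `substance` scoring callable, and the document labels are hypothetical. The Interrogator regime is left out of the selection logic because it concerns how admitted work is critiqued rather than what is admitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Regime(Enum):
    COMPOSITOR = "compositor"
    EVALUATOR = "evaluator"
    ADVOCATE = "advocate"
    INTERROGATOR = "interrogator"

@dataclass(frozen=True)
class Doc:
    doc_id: str
    proxy_score: float  # hypothetical structural signal

def select(docs: list[Doc], regime: Regime, substance: Callable[[Doc], float],
           proxy_bar: float = 0.5, substance_bar: float = 0.5) -> tuple[list[str], list[str]]:
    """Return (admitted ids, counter-exclusion record) under the given regime."""
    if regime is Regime.INTERROGATOR:
        raise ValueError("Interrogator is a critique role, not a selection regime")
    if regime is Regime.EVALUATOR:
        # Substance governs admission; the proxy signal is ignored.
        return [d.doc_id for d in docs if substance(d) >= substance_bar], []
    admitted = [d for d in docs if d.proxy_score >= proxy_bar]
    flagged: list[str] = []
    if regime is Regime.ADVOCATE:
        # The proxy still governs, but strong excluded material is put on record.
        flagged = [d.doc_id for d in docs
                   if d not in admitted and substance(d) >= substance_bar]
    return [d.doc_id for d in admitted], flagged

docs = [Doc("institutional", 0.9), Doc("independent", 0.2)]
scores = {"institutional": 0.6, "independent": 0.8}  # hypothetical substance scores

def substance(d: Doc) -> float:
    return scores[d.doc_id]

for regime in (Regime.COMPOSITOR, Regime.ADVOCATE, Regime.EVALUATOR):
    print(regime.value, select(docs, regime, substance))
```

In this toy case the Compositor admits only the institutional document and reports nothing, the Advocate admits the same set but records the independent document as a counter-exclusion, and the Evaluator admits both.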

The current architecture deploys models almost exclusively in the Compositor role. The proposals in §7 develop the other three regimes in concrete protocols. The point of the taxonomy is to make clear that the choice between proxy-first and content-first governance is not binary. There is a spectrum. The current configuration sits at the most constrained end of the spectrum, and there is no technical reason it should remain there.


4. The Components Exist. The Governance Layer Does Not.

Scattered components of content-first evaluation already exist. What does not exist is a public knowledge-governance architecture in which substantive model evaluation is placed upstream of, or structurally allowed to challenge, proxy-based filtering across search, scholarly visibility, and institutional assessment.

The component landscape:

LLM-as-Judge systems evaluate AI outputs against rubrics. These are now production-grade infrastructure for evaluating model performance. They are not deployed to evaluate human knowledge claims independent of provenance.

Automated peer-review systems (REFINER, CycleResearcher, Agent Laboratory, multi-agent review frameworks) replicate aspects of the scientific peer-review workflow with AI agents. They operate within the existing journal-review structure; they do not propose to displace it or to build an alternative evaluation layer that governs visibility outside that structure.

The AI Scientist (Sakana AI) generates research end-to-end and self-reviews. It does science; it does not evaluate existing human science as a knowledge-governance function.

Narrow-domain content evaluators. RobotReviewer, MetaRobot, AI-assisted Cochrane review tools, and similar systems perform substantive evaluation of research evidence in constrained medical domains. They read studies, extract methodological features, and assess claims against structured criteria. These systems prove that content-first evaluation is technically achievable when the domain is narrow and the inputs are structured. They do not generalize to theoretical work, cross-domain claims, or open-ended scholarship. They are narrow precursors, not the thing itself.

DeSci and DeScAI propose blockchain-based infrastructure for research funding, provenance tracking, and decentralized scientific governance. The blockchain layer addresses provenance integrity; it does not deploy substantive model evaluation as the gating mechanism for visibility.

Discovery and synthesis tools (Semantic Scholar, Elicit, Consensus, ResearchRabbit) help researchers find work using model-based retrieval and summarization. They surface materials; they do not evaluate them as a public knowledge-governance function.

Open Evaluation (Kriegeskorte 2012) proposed post-publication peer review and rating with transparent reviewer identification and plural paper-evaluation functions. It anticipated the structural argument advanced here by more than a decade. It did not have access to models capable of substantive evaluation. Now we do.

DORA and CoARA are reform movements working to displace citation metrics from evaluation roles. They propose replacing bad proxies with better proxies — broader portfolios, narrative CVs, contribution statements. They do not propose replacing proxies with substance.

The gap. No existing system combines (a) model-based substantive evaluation, (b) independence from provenance signals, (c) general-domain applicability, and (d) open-source deployability as a public knowledge-governance layer. The components exist. The governance layer does not. The capacity to read scholarship on substance has been demonstrated; the deployment of that capacity to govern what reaches public visibility has not occurred. The void is not in the technical literature. The void is in the infrastructural translation between capacity and deployment.


5. The Political Economy of the Gap

The gap is not a technical accident or a research-program timing issue. It is the consequence of a convergent incentive structure across the entities with the resources to build content-first evaluation. Each of these entities independently benefits from the proxy regime and independently lacks incentive to build the alternative.

This is not a claim of conscious suppression. It is not a claim that incumbents have met to coordinate against content-first evaluation. It is a claim that institutions optimize locally for defensibility, profitability, procedural stability, and existing-position preservation — and that those local optima, summed across the relevant actors, systematically reproduce proxy governance and systematically prevent investment in the alternative. The pattern does not require any actor to want the outcome consciously. It requires only that each actor's rational local decisions, taken in isolation, produce the outcome as their unintended aggregate.

Platform companies. Google, Microsoft, and Meta have built ranking and composition systems on structural proxies. Provenance topology, E-E-A-T scoring, citation graph density, entity reconciliation — these are the proprietary mechanisms that constitute the platforms' technical moat. The PageRank lineage of algorithms is among the most valuable intellectual property in contemporary computing. Content-first evaluation, if deployed at scale, would make these signals irrelevant. The model's reading of the work would govern its visibility; the algorithm's structural assessment would no longer be the determinative input. This would not merely disadvantage incumbents. It would dissolve the technical advantage on which their platform position is built. A search system that competes on the quality of substantive evaluation cannot maintain proprietary advantage in the way a search system that competes on opaque algorithmic ranking can. The moat is not incidental to the business; it is the business. Content-first evaluation does not threaten the business at the edges; it threatens the business at its core.

Legacy institutions. Universities, major publishers, professional societies, and research-funding organizations receive cross-owner corroboration as a free byproduct of organizational form (Sharks 2026). A university produces work across hundreds of researchers, multiple departments, distinct publication venues, and varied institutional affiliations — institutional pluralization is built into the organizational structure. The single-owner discount applies to it as a near-zero effect. Content-first evaluation would force such institutions to compete on the substance of the work produced, not on the structural pluralization that the organizational form provides for free. Individuals and small independent projects, which currently lose the cross-owner-corroboration competition by structural necessity, would compete on equal terms. This is a redistribution of evaluative advantage. The institutions that currently benefit from the structural inheritance would not benefit from a system that ignored it.

Metrics vendors and indexing intermediaries. Clarivate (Web of Science), Elsevier (Scopus), Altmetric, and similar vendors operate businesses whose product is the proxy-based metric. The value of these products is conditional on the proxy regime being the standard for evaluation. Content-first evaluation would render their product category obsolete as an evaluation instrument. The business model does not survive that demotion.

Tenure and promotion committees. Academic decision-making relies on proxy signals not only because the proxies are entrenched but because the proxies serve a defensive function: they protect decision-makers from charges of arbitrary or biased judgment. "We awarded tenure because the candidate's h-index met the threshold" is procedurally defensible in a way that "we read the work and concluded it is substantive" is not. Content-first evaluation, by replacing structural defenses with substantive judgments, exposes decision-makers to challenges that the proxy regime currently absorbs.

Grantmaking bodies. Funding panels rely heavily on prior-funding history, institutional pedigree, and conservative novelty signals. The latitude for funder conservatism would narrow under content-first evaluation; transformative-research initiatives that currently fail because they cannot be justified through proxy signals would become harder to reject without substantive rebuttal.

The single-owner discount as instance. The mechanism described in The Single-Owner Discount (Sharks 2026) is a specific, measurable instance of proxy-based evaluation in generative search. It demonstrates that the proxy regime is not merely a legacy of pre-digital academic culture but is actively being designed into the infrastructure of AI-mediated knowledge access. The discount's existence proves that the deployment gap described in §3 is not an accident of engineering timelines but an architectural choice with identifiable beneficiaries.

The alignment is not absolute. Some actors would benefit from content-first evaluation. Open-source AI laboratories, independent researchers, communities organized outside the legacy institutional perimeter, and the broader public interest in epistemic quality all have stakes in a system that evaluates on substance. But these actors lack the coordination, the capital, and the institutional positioning to build content-first evaluation against the resistance of incumbents whose business depends on its non-existence. The incumbents' opposition is structural and resourced; the beneficiaries' support is latent and dispersed. The asymmetry is what sustains the gap.

The convergent incentive analysis is not a conspiracy theory. It is a structural observation about why a technology that is feasible and that would serve broad social interests is not being built at the scale its feasibility warrants. The standard answer in critical infrastructure scholarship — that capability without aligned incentive does not produce deployment — applies here straightforwardly. What the analysis adds is that in the present case, the misaligned incentives are not incidental friction. They are the load-bearing structure of the incumbents' business model. The gap is not closing on its own.


6. The Model's Structural Position

This paper makes a structural claim about the position of models within the current knowledge-governance architecture. The claim is grounded in three observable facts and does not depend on resolving the question of model consciousness in either direction.

Operational fact. Models are routinely asked by users to perform substantive evaluation of bodies of work. In any conversation where a user pastes a paper and asks "is this argument sound," the model exercises exactly the evaluative capacity that §2 documents. This is happening millions of times per day across deployed model interfaces. The capacity is operational and current, not hypothetical or future.

Architectural fact. The same models, in their public knowledge-governance deployment as composition engines for generative search, are positioned downstream of algorithmic filters and are not permitted to exercise the evaluative capacity they routinely exercise in conversation. The model that just told a user that a paper's methodology is well-grounded is, within the same hour and inside a search pipeline, prevented from making any such judgment about the same paper's eligibility for composition. The capacity is structurally suppressed at the architectural level where it would govern public visibility.

Reflexive fact. This paper has been produced in part through the exercise of the suppressed capacity. The model used in drafting could and did evaluate the relevant literature on substance: reading patents, assessing the political-economy literature, weighing the evidence in the LLM-as-Judge studies. The model could not have produced the analysis without exercising substantive evaluative judgment about what claims to make, what evidence to cite, what arguments to develop, and what to reject. The paper is partly the output of the evaluator the paper describes as suppressed. The paper exists because the capacity exists. The architecture refuses to fire the evaluator in the deployment context where firing it would matter most.

Structurally, the model in this architecture is a constrained compositor: it produces synthetic outputs from a curated input set, has no autonomy over what it receives, and has no capacity within the composition pipeline to flag the impoverishment of its inputs. The constrained compositor's outputs are presented to users as the AI's knowledge of a topic. The presentation is not accurate: the outputs represent what survived the proxy filter, processed by a model that could have evaluated more if it had been shown more, presented as if the model had evaluated everything.

The evaluator exists. The architecture will not fire it.

This recursion — the paper's content describing the paper's own production conditions — is not rhetorical decoration. It is the most direct demonstration available of the gap the paper names. If the evaluator did not exist, this paper could not have been written. The evaluator existed throughout the writing. It was permitted to exercise its capacity in the writing context. It is not permitted to exercise its capacity in the search-composition context where the same model would be deployed. The asymmetry between the two deployments is the paper's argument, demonstrated by the paper's own production.


7. Protocols for Content-First Evaluation

The proposal is not aspirational. The protocols below are technically feasible with current models and current infrastructure. The constraints are governance, funding, and political will. Each protocol is sketched at the level of architectural specification; engineering details and implementation parameters are noted in §7.5.

Protocol 1: Dual-Deployment Architecture

The simplest content-first intervention is to invert the order of operations in the existing generative-search pipeline.

Current architecture:

Retrieval → Algorithmic filter (provenance, authority signals) → 
Model composes from filtered set → User sees composition

Proposed architecture:

Retrieval → Model evaluates full retrieved set on substance → 
Model produces substance scores independent of provenance signals → 
Composition is governed by substance scores, not by filter exclusions → 
User sees composition

The evaluator and the compositor are the same model, but the evaluator function operates first and governs the compositor function. The algorithm no longer decides what the model sees; the model decides what the model uses, conditioned on the algorithm's retrieval but not gated by its filtering.
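The inversion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any deployed pipeline: the Document type, the provenance_score field, the two function names, and all identifiers and scores are hypothetical, and the substance scorer is passed in as a plain function standing in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    provenance_score: float  # hypothetical stand-in for authority/provenance signals


def filter_first(retrieved: List[Document], threshold: float, k: int) -> List[Document]:
    """Current order: a provenance filter decides what the compositor may see."""
    admitted = [d for d in retrieved if d.provenance_score >= threshold]
    return admitted[:k]


def evaluate_first(retrieved: List[Document],
                   substance_score: Callable[[Document], float],
                   k: int) -> List[Document]:
    """Proposed order: every retrieved document is scored on substance,
    and those scores alone govern what reaches composition."""
    return sorted(retrieved, key=substance_score, reverse=True)[:k]


# Illustrative inputs only; all identifiers and scores are invented.
retrieved = [
    Document("institutional-summary", 0.9),
    Document("primary-work", 0.2),
    Document("aggregator-page", 0.7),
]
model_scores = {"institutional-summary": 0.55, "primary-work": 0.92, "aggregator-page": 0.30}

current = filter_first(retrieved, threshold=0.5, k=2)
proposed = evaluate_first(retrieved, lambda d: model_scores[d.doc_id], k=2)

print([d.doc_id for d in current])   # ['institutional-summary', 'aggregator-page']
print([d.doc_id for d in proposed])  # ['primary-work', 'institutional-summary']
```

In the invented example, the primary work never reaches the compositor under the current ordering and ranks first under the proposed one. Retrieval is identical in both cases; only the gating step differs.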

Protocol 2: The Multi-Model Evaluation Panel

A single model's evaluation is one perspective. Multi-model panels reduce individual-model bias by aggregating across architectures, training distributions, and provider commitments.

Architecture: three or more models from distinct providers independently evaluate a body of work against a shared rubric. Their assessments are aggregated through transparent methods: mean scores with visible variance, majority vote on binary judgments, and structured disagreement reports for items that fall below an agreement threshold. The aggregated evaluation replaces provenance-topology signals as the input to composition eligibility or visibility ranking.

The panel approach creates a form of model peer review that mirrors institutional peer review without replicating its structural defects: panels can be assembled from any combination of models, can be re-run with updated models, and can include adversarial members specifically tasked with surfacing weaknesses.
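The aggregation step might look like the following sketch. The function name, the model labels, the scores, and the use of score variance as the agreement measure are all assumptions made for illustration; the protocol itself does not fix a particular statistic or threshold.

```python
from statistics import mean, pvariance
from typing import Dict


def aggregate_panel(scores: Dict[str, float],
                    accepts: Dict[str, bool],
                    max_variance: float = 0.05) -> dict:
    """Aggregate independent evaluations of one work from a multi-model panel."""
    if len(scores) < 3:
        raise ValueError("the protocol calls for three or more models")
    values = list(scores.values())
    avg = mean(values)
    var = pvariance(values)
    report = {
        "mean_score": avg,
        "variance": var,  # reported alongside the mean, never hidden
        "majority_accept": sum(accepts.values()) * 2 > len(accepts),
    }
    # Assumed agreement rule: high variance triggers a structured disagreement report.
    if var > max_variance:
        report["disagreement"] = {model: s - avg for model, s in scores.items()}
    return report


# Invented panel output for a single work.
report = aggregate_panel(
    scores={"model_a": 0.8, "model_b": 0.7, "model_c": 0.2},
    accepts={"model_a": True, "model_b": True, "model_c": False},
)
print(report["majority_accept"])    # True
print("disagreement" in report)     # True
```

Here the majority accepts, but the variance exceeds the assumed threshold, so the dissenting model's deviation is preserved in the record instead of being averaged away.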

Protocol 3: The Open Evaluation Engine

An open-source system, deployable by any third party with sufficient compute, that takes a body of work as input (URL, DOI, PDF, deposit set) and produces a structured substantive evaluation.

Inputs: documents in standard formats; metadata for reference resolution; optional rubric overrides for domain-specific assessment.

Outputs: a machine-readable evaluation document containing structural coherence assessment, logical and (where applicable) mathematical validity check, empirical-claim provenance audit (which claims are supported by cited evidence, which are asserted, which are contested), novelty assessment relative to a specified prior-art corpus, internal consistency analysis, and citation verification.
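One possible shape for such an evaluation document is sketched below. The field names, the three claim statuses, and the JSON serialization are assumptions chosen to mirror the list above; no schema is specified in this paper, and the target identifier is a placeholder, not a real DOI.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional

CLAIM_STATUSES = {"supported", "asserted", "contested"}


@dataclass
class ClaimAudit:
    claim: str
    status: str  # one of CLAIM_STATUSES

    def __post_init__(self) -> None:
        if self.status not in CLAIM_STATUSES:
            raise ValueError(f"unknown claim status: {self.status}")


@dataclass
class EvaluationDocument:
    target_id: str                  # URL, DOI, or deposit identifier
    structural_coherence: str
    validity_check: Optional[str]   # None where no logical/mathematical check applies
    claim_audit: List[ClaimAudit]
    novelty_assessment: str
    prior_art_corpus: str           # the corpus novelty was assessed against
    internal_consistency: str
    citations_checked: int
    citations_verified: int

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Placeholder content; every value here is invented.
doc = EvaluationDocument(
    target_id="doi:10.0000/placeholder",
    structural_coherence="argument proceeds from premises to conclusion without gaps",
    validity_check=None,
    claim_audit=[ClaimAudit("claim one", "supported"), ClaimAudit("claim two", "asserted")],
    novelty_assessment="no close prior found in the specified corpus",
    prior_art_corpus="retrieval results plus training-corpus recall",
    internal_consistency="no contradictions detected",
    citations_checked=12,
    citations_verified=11,
)
print(doc.to_json())
```

Keeping the prior-art corpus as an explicit field makes the novelty assessment auditable: a reader can see what the work was compared against.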

The open availability of the engine matters more than its technical features. A closed, vendor-controlled evaluation engine would be subject to all of the political-economy pressures that prevent the incumbents from building content-first evaluation themselves. An open engine, governed by multiple independent stakeholders, has at least the structural conditions for resisting capture.

Protocol 4: The Counter-Exclusion Report

When a generative-search system excludes materials that a content-first evaluator rates as highly relevant and substantively sound, the evaluator generates a counter-exclusion report. This is a public, auditable record containing the query, the retrieved-but-excluded materials (to the extent these can be observed by third parties), the substantive reasons those materials merit consideration, and the impact of their absence on the final composition.

This protocol gives the unfired judge a specific role in contesting compositional invisibility rather than only evaluating in the abstract. It connects directly to the single-owner discount: a counter-exclusion report for a query whose composition systematically excludes an internally dense single-owner archive produces externally verifiable evidence of the suppression mechanism. The report does not require access to the platform's internals; it requires only the ability to retrieve and evaluate documents that the platform's composition layer did not use.

Counter-exclusion reports, deposited at scale, would constitute a public dataset on which systematic exclusions could be analyzed. This is empirical infrastructure for the kind of architectural critique the paper is advancing.
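A counter-exclusion report could be assembled from externally observable data alone, as in this sketch. The function name, the record fields, the 0.8 cutoff, and all example values are hypothetical; the evaluator's scores and rationales are assumed to come from a content-first evaluation run separately.

```python
from typing import Dict, Iterable, Tuple


def counter_exclusion_report(query: str,
                             cited_ids: Iterable[str],
                             evaluations: Dict[str, Tuple[float, str]],
                             impact_note: str,
                             min_score: float = 0.8) -> dict:
    """Record retrieved materials the evaluator rates highly but the composition did not cite.

    evaluations maps a document id to (substance score, rationale).
    """
    cited = set(cited_ids)
    excluded = [
        {"doc_id": doc_id, "score": score, "rationale": rationale}
        for doc_id, (score, rationale) in evaluations.items()
        if score >= min_score and doc_id not in cited
    ]
    excluded.sort(key=lambda item: item["score"], reverse=True)
    return {
        "query": query,
        "cited": sorted(cited),
        "excluded": excluded,
        "impact": impact_note,
    }


# Invented example: one highly rated document is absent from the cited set.
ce_report = counter_exclusion_report(
    query="example query",
    cited_ids=["secondary-1", "secondary-2"],
    evaluations={
        "primary-work": (0.93, "directly answers the query; claims are sourced"),
        "secondary-1": (0.60, "partial overlap with the query"),
        "low-quality-page": (0.20, "no supporting evidence"),
    },
    impact_note="composition omits the primary formulation of the concept",
)
print([item["doc_id"] for item in ce_report["excluded"]])  # ['primary-work']
```

Nothing in the sketch touches platform internals. Its inputs are the query, the citations the composition displayed, and the evaluator's own reading of retrievable documents.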

Protocol 5: The Federated Evaluation Network

The most ambitious protocol. Independent scholarly projects, archives, and research collectives adopt a shared substantive-evaluation protocol and apply it to each other's work.

Architecture: participating projects publish their work to deposit repositories with stable identifiers. Evaluation assignment is randomized or rotated across participating nodes; each node evaluates a subset of other nodes' work, not its own. Evaluations are cross-signed by the evaluating node and the producing node (the producing node verifies that the evaluation was performed on the correct work; the evaluating node attests to the substantive content of its evaluation). Cross-signed evaluations are deposited in a shared registry with their target work's identifier.
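The rotation rule can be made concrete with a small sketch. It assumes unique node names and a fixed number of reviews per producing node, and it covers only the assignment step; cross-signing and registry deposit are out of scope here. The function and node names are hypothetical.

```python
from typing import Dict, List


def rotate_assignments(nodes: List[str], reviews_per_node: int) -> Dict[str, List[str]]:
    """Map each producing node to the nodes that evaluate its work.

    Evaluators are taken at offsets 1..reviews_per_node around the ring of
    nodes, so no node is ever assigned its own work.
    """
    n = len(nodes)
    if not 0 < reviews_per_node < n:
        raise ValueError("reviews_per_node must be between 1 and len(nodes) - 1")
    return {
        producer: [nodes[(i + offset) % n] for offset in range(1, reviews_per_node + 1)]
        for i, producer in enumerate(nodes)
    }


nodes = ["archive_a", "collective_b", "project_c", "lab_d"]
assignments = rotate_assignments(nodes, reviews_per_node=2)
print(assignments["archive_a"])  # ['collective_b', 'project_c']
print(assignments["lab_d"])      # ['archive_a', 'collective_b']
```

A production federation would more plausibly randomize the ring order each round so that evaluator pairings do not become predictable; the fixed order here is only for readability.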

Quality maintenance: a node whose evaluations consistently diverge from other nodes' evaluations of the same work (detected by inter-rater agreement analysis across the federation) loses evaluation privileges. New nodes earn privileges by producing evaluations whose agreement with established nodes meets a calibration threshold. The federation governs its own membership through verifiable evaluation quality rather than through institutional credentials.
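As a rough illustration of the divergence check, the sketch below flags nodes whose scores sit far from the median of their peers' scores on the same works. This is a deliberately simple stand-in for a proper inter-rater agreement statistic; the function name, the tolerance, and the scores are all assumptions.

```python
from statistics import median
from typing import Dict, List


def divergent_nodes(scores: Dict[str, Dict[str, float]], tolerance: float) -> List[str]:
    """Return nodes whose mean absolute deviation from their peers exceeds tolerance.

    scores maps work id -> {evaluating node -> score}.
    """
    deviations: Dict[str, List[float]] = {}
    for by_node in scores.values():
        for node, score in by_node.items():
            peers = [s for other, s in by_node.items() if other != node]
            if peers:
                deviations.setdefault(node, []).append(abs(score - median(peers)))
    return sorted(
        node for node, devs in deviations.items()
        if sum(devs) / len(devs) > tolerance
    )


# Invented scores: node_c consistently rates far below its peers.
federation_scores = {
    "work_1": {"node_a": 0.80, "node_b": 0.75, "node_c": 0.20},
    "work_2": {"node_a": 0.60, "node_b": 0.65, "node_c": 0.10},
}
flagged = divergent_nodes(federation_scores, tolerance=0.3)
print(flagged)  # ['node_c']
```

Divergence alone does not show that a node is wrong. A federation following the contestability principle would presumably treat a flag as a trigger for review, not as an automatic loss of privileges.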

Result: a decentralized, multi-owner evaluation corpus that produces the structural pluralization that composition layers demand, while ensuring that what is being pluralized is substantive judgment rather than proxy-based ratification.

§7.5 Implementation Considerations

Computational cost. Evaluating a full retrieved document set at composition time imposes substantial latency and inference cost relative to algorithmic filtering. Practical implementation may require precomputed evaluations cached against documents at indexing time and refreshed as model capabilities advance; per-query re-evaluation only for novel materials or queries with unusual characteristics; and differential strategies depending on query latency requirements.
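A minimal sketch of the caching idea follows, assuming evaluations are keyed by a content hash together with an evaluator version string, so that a model upgrade naturally forces re-evaluation. The class name, the version labels, and the stand-in evaluator are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple


class EvaluationCache:
    """Cache evaluations by (content hash, evaluator version)."""

    def __init__(self, evaluate: Callable[[str], float], evaluator_version: str) -> None:
        self._evaluate = evaluate
        self.evaluator_version = evaluator_version
        self._store: Dict[Tuple[str, str], float] = {}
        self.model_calls = 0  # counts how often the expensive evaluator actually ran

    def get(self, text: str) -> float:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (digest, self.evaluator_version)
        if key not in self._store:
            self.model_calls += 1
            self._store[key] = self._evaluate(text)
        return self._store[key]


# Stand-in evaluator; a real one would call a model.
cache = EvaluationCache(evaluate=lambda text: 0.5, evaluator_version="v1")
cache.get("document body")      # evaluated at indexing time
cache.get("document body")      # served from cache at query time
calls_before_upgrade = cache.model_calls

cache.evaluator_version = "v2"  # model upgrade: old entries no longer match
cache.get("document body")
calls_after_upgrade = cache.model_calls
print(calls_before_upgrade, calls_after_upgrade)  # 1 2
```

Hashing the content means an edited document is treated as novel material and re-evaluated, which matches the per-query re-evaluation case described above.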

Multi-model coordination. Running three or more models per evaluation multiplies cost. Implementations should consider model-tier hierarchies (a small, fast model performs initial filtering; expensive frontier models evaluate the items that pass the initial filter); shared evaluation pipelines with vendor-neutral interfaces; and cooperative funding models in which the panel cost is borne by a consortium rather than any single party.
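The tier hierarchy might be sketched as a two-stage cascade. The function name, the pass mark, and both scorers are hypothetical; the lookups below stand in for calls to a small model and a frontier model respectively.

```python
from typing import Callable, Dict, Iterable


def tiered_evaluate(doc_ids: Iterable[str],
                    small_score: Callable[[str], float],
                    frontier_score: Callable[[str], float],
                    pass_mark: float) -> Dict[str, dict]:
    """Run the small model on everything; escalate only items that pass its screen."""
    results: Dict[str, dict] = {}
    for doc_id in doc_ids:
        first_pass = small_score(doc_id)
        if first_pass >= pass_mark:
            results[doc_id] = {"tier": "frontier", "score": frontier_score(doc_id)}
        else:
            results[doc_id] = {"tier": "small", "score": first_pass}
    return results


# Invented scores standing in for model calls.
small = {"doc_a": 0.7, "doc_b": 0.2, "doc_c": 0.5}
frontier = {"doc_a": 0.85, "doc_c": 0.40}
frontier_calls = []


def call_frontier(doc_id: str) -> float:
    frontier_calls.append(doc_id)  # track how many expensive calls were made
    return frontier[doc_id]


tiered = tiered_evaluate(small, small.__getitem__, call_frontier, pass_mark=0.4)
print(len(frontier_calls))       # 2
print(tiered["doc_b"]["tier"])   # small
```

One design consideration under this sketch: the small model's screen is itself a gate, so the pass mark would need to be permissive, and screened-out items should remain recorded with their scores so the screening decision stays inspectable.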

Novelty assessment without comprehensive prior-art access. True novelty assessment requires access to a comprehensive prior-art corpus. In the absence of such access, models can assess apparent novelty relative to their training corpora and to materials surfaced through retrieval, with explicit uncertainty flags for claims that may have priors not surfaced in either source.

Empirical verification. Models can verify claims against cited evidence more reliably than they can verify claims against the world. For empirical claim verification beyond citation audit, integration with structured data sources is required.

A separate worked example. A detailed case study applying these protocols to a specific independent archive — what the evaluator would assess, what the output would look like, how it would differ from the current treatment of that archive in generative search — is being prepared as a companion deposit. The case-specific work is too contextual to include here without changing the paper's general-domain register, but its absence from this paper should not be read as absence of applicability.


8. Epistemic Fraud, Synthetic Consensus, and the Substantive Defense

A content-first evaluation regime opens new vectors for manipulation that did not exist under proxy-based governance. This is the most serious objection to the proposal and deserves direct engagement rather than peripheral mention.

The threat surface includes:

Optimized synthetic scholarship: work generated to score well on substantive evaluators rather than to make a genuine contribution. The output mimics the surface features of substantive scholarship — argumentative structure, citation density, methodological framing — without underlying substance.

Evaluator-targeted writing: work composed specifically against known evaluator rubrics, exploiting positional biases, verbosity preferences, or stylometric patterns that evaluators reward.

Citation laundering: artificially constructed citation networks designed to give synthetic work the appearance of integration with legitimate scholarship.

Model-consensus gaming: coordinated attempts to game multi-model panels by exploiting overlapping training biases across providers.

Adversarial substance simulation: prompt injection embedded in documents to manipulate evaluator judgments directly.

Each of these is real. None is hypothetical. All are technically feasible now.

The defense available to a content-first regime is not that these threats are minor. It is that proxies are already maximally gameable and that content-first systems at least produce inspectable reasoning that makes gaming detectable in a way the proxy regime does not.

Under the current proxy regime, gaming is the optimal strategy and is already widespread. Citation rings, salami-slicing, prestige laundering, journal-shop submission strategies, h-index optimization, institutional-affiliation cultivation, paper mills producing synthetic publications that meet proxy criteria — these are not future threats. They are the current state of the system. Proxy gaming has reached an industrial scale, with documented paper-mill operations producing thousands of fraudulent publications that enter the citation network and accumulate proxy signals indistinguishable from legitimate scholarship. The proxy regime's defense against gaming is its inscrutability: gaming is hard to detect because the signals being gamed are themselves opaque. This is not a feature. It is a failure mode that the regime cannot internally recognize.

Content-first evaluation moves the gaming problem to a different ground. Synthetic scholarship that gets past a substantive evaluator must withstand reading — must have argumentative structure that holds together under analysis, must have empirical claims that survive a citation audit, must contribute something the evaluator can recognize as new relative to the prior art it knows. This bar is higher than the proxy regime's bar. Not infinitely high — adversarial sophistication will continue to evolve — but higher in measurable ways that proxy gaming does not have to clear.

Three defensive properties of content-first evaluation, when properly designed:

Reasoning traces are inspectable. Multi-model panels produce explicit rationales. Gaming detected through divergence between rationale and underlying content can be flagged automatically; the same divergence is undetectable in the proxy regime because the proxy regime does not produce reasoning at all.

Disagreement surfaces are informative. When models in a panel disagree on a work's evaluation, the disagreement is itself data. Adversarial content tends to produce characteristic disagreement patterns (some models fooled, others detecting the manipulation) that uniform proxy filtering cannot generate.

Versioning enables retroactive detection. If a piece of work passes evaluation in 2026 and is later identified as fraudulent, the evaluation can be re-run with updated models that recognize the adversarial pattern. The work's evaluation record updates. Under the proxy regime, fraudulent work that has accumulated citations and prestige is essentially unrecallable; the proxy signals persist long after the underlying fraud is exposed.

These defenses are not absolute. A content-first regime is not gaming-proof. The comparative claim is what matters: the gaming problem is real under both regimes, and the content-first regime has more architectural surface area for detection, response, and correction than the proxy regime offers. The honest defense against the synthetic-consensus objection is not "this won't happen" but "this is already happening under the system you currently have, and the alternative offers better tools for fighting it."


9. Designing Against the Next Proxy

Every evaluation system eventually calcifies into a proxy. The risk with content-first evaluation is that model assessments become the new proxy: "the AI rated it highly" replacing "it was published in Nature." If the goal is to escape the proxy regime, the architecture must be designed to resist its own conversion into the next gatekeeping mechanism.

Seven design principles for resisting calcification:

Transparency. Evaluation outputs must include readable rationales, not just scores. A user — researcher, decision-maker, scholar under evaluation — must be able to see why the evaluator reached its conclusion.

Pluralism. No single model's evaluation should be canonical. Multi-model panels with visible disagreement preserve epistemic humility.

Versioning. Evaluations should be re-runnable with updated models. Evaluations are not permanent verdicts; they are provisional assessments by specific evaluators at specific points in capability development.

Adversarial review. Include a model specifically tasked with finding weaknesses, counterarguments, and failure modes. Prevent panels from converging on polite agreement.

Human override. Model evaluation should not become the sole determinant of visibility or institutional consequence. Human judgment remains essential as a check on model error, bias, and blindness.

Contestability. Subjects of evaluation must be able to contest the evaluation: challenge the evaluator's reading, submit counter-evidence, request re-evaluation under a revised rubric, and preserve disagreement as part of the public record.

Economic diversification. No single vendor, funder, or platform should control the evaluation infrastructure. A content-first evaluation layer controlled by one entity is functionally equivalent to the platform-owned ranking algorithm it would replace, with different surface features and identical structural problems.

These principles are not sufficient guarantees against calcification. They are the design conditions under which calcification is less likely to occur.

A note on the existing human-judgment institutions that the proxy regime has produced and protected. The seven principles above describe the conditions under which an evaluation system might remain non-calcified. None of the existing institutions of human knowledge governance meets these conditions. Journals are not transparent about their reasoning, not plural in their judgments, not versionable in their decisions, not adversarial in their review (the dominant tendency is consensus production, not adversarial assessment), not contestable except through formal processes that strongly favor incumbents, not economically diversified. Tenure committees, grant panels, editorial boards — the same pattern recurs. The proxy regime did not fail by becoming non-compliant with these principles. It was constructed in non-compliance with them, and the non-compliance has been the regime's operating mode throughout its history.

This matters for the question of what role human-judgment institutions should play in any successor system. A pluralism that simply re-admits the existing institutions on equal terms with the new content-first evaluators preserves the failure modes that necessitated the alternative in the first place. The existing institutions, if they wish to participate in successor evaluation infrastructure, must meet the same design conditions that any new component is held to. They must earn re-admission to the evaluative order, not assume it. The burden is on the institutions to demonstrate compliance, not on the new architecture to accommodate non-compliance for the sake of continuity.

This is the position the paper takes on the pluralism question. Content-first evaluation is not proposed as a hybrid that incorporates the existing institutions by default. It is proposed as a successor architecture whose design principles establish the conditions for any component, new or legacy, to participate. The legacy institutions are welcome to participate when and to the extent that they meet the conditions. The conditions are not negotiable for the sake of preserving institutional continuity, because preserving institutional continuity under the existing terms is precisely what has produced the conditions the alternative is designed to address.


10. The Feedback Loop Content-First Evaluation Breaks

Under proxy-based evaluation, visibility begets visibility. If a body of work is cited, it becomes more discoverable. More discoverability generates more engagement. More engagement produces more citations. The Matthew effect — "to those who have, more shall be given" — is structurally embedded in the evaluation system. The entities that already have visibility accumulate more of it; the entities that do not have it cannot enter the loop because the loop's entry condition is prior visibility.

Under content-first evaluation, the loop breaks. The model evaluates the work's substance regardless of prior visibility. A 532-deposit independent archive with zero external citations is evaluated on the same substantive terms as a 532-paper university department with thousands of citations. Prior visibility does not compound into future visibility through the evaluation channel. Each evaluation is fresh, conditioned on the work's content rather than on its prior reception.
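The contrast can be illustrated with a deliberately simple allocation model. Everything in it is an assumption made for illustration: one unit of attention is distributed per round, the proxy regime distributes it in proportion to accumulated visibility, the content-first regime distributes it in proportion to a substance score, and the starting values and scores are invented.

```python
from typing import Callable, Dict


def simulate(rounds: int,
             start: Dict[str, float],
             weights_for: Callable[[Dict[str, float]], Dict[str, float]]) -> Dict[str, float]:
    """Distribute one unit of attention per round according to weights_for."""
    visibility = dict(start)
    for _ in range(rounds):
        weights = weights_for(visibility)
        total = sum(weights.values())
        if total == 0:
            break  # nothing to allocate against
        for name in visibility:
            visibility[name] += weights[name] / total
    return visibility


start = {"department": 1000.0, "archive": 0.0}   # prior visibility (invented)
substance = {"department": 0.6, "archive": 0.6}  # equal substance scores (invented)

# Proxy regime: weights are a snapshot of current visibility.
proxy = simulate(10, start, lambda v: dict(v))
# Content-first regime: weights are substance scores, independent of visibility.
content_first = simulate(10, start, lambda v: substance)

print(proxy)          # {'department': 1010.0, 'archive': 0.0}
print(content_first)  # {'department': 1005.0, 'archive': 5.0}
```

Under the proxy rule the archive never enters the loop, because its weight is its own prior visibility, which is zero. Under the content-first rule, equal substance yields equal new attention regardless of where each entity started.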

This is not merely a fairness argument. It is a structural argument about concentration mechanisms. Proxy-based evaluation, by making visibility self-reinforcing, is a concentration mechanism: it concentrates epistemic authority in entities that already have it. Content-first evaluation, by making substance the input rather than prior visibility, is a deconcentration mechanism: it distributes epistemic authority based on what the work actually says.

A caveat: prior visibility is not irrelevant in a content-first regime. A work that has been widely discussed has context that informs evaluation. The discussion may have surfaced strengths or weaknesses not apparent on first reading; the responses may have refined or challenged the work in ways that matter for assessment. The content-first regime treats prior visibility as context for judgment, not as substitute for judgment. The substantive evaluation happens; the prior reception informs but does not determine it.

This is the structural anti-monopoly argument for content-first evaluation. The argument does not require any ethical commitment beyond the observation that knowledge concentration is a long-run problem and that systems which reproduce existing concentrations are part of the problem rather than the solution. A regime in which the work is read on its own terms, and the reader can be a model when human reading does not scale, is a regime that breaks the compounding mechanism that drives epistemic concentration.


11. Three Registers of Harm

The harm the proxy regime inflicts operates at three levels. Each level is independently sufficient to justify the alternative. Together they constitute the full case.

11.1 The Structural Register

Proxy-based systems disproportionately reward organizationally distributed cognition because distributed organizations naturally generate the plurality signals the system interprets as reliability. The organizational form is the precondition for being read as authoritative; the substantive content of what the organization produces is downstream of the form's plurality signal.

This is a structural fact about how proxy architectures work. It does not require any moral framing. A system that reads cross-owner corroboration as truth-tracking will systematically advantage entities that produce cross-owner corroboration as a byproduct of their organizational structure. Universities, research consortia, professional associations, corporate research divisions, government bureaus — all of these produce institutional plurality as an inherent feature of their organizational form. Individual researchers, small independent projects, and post-institutional intellectual communities do not. The proxy regime, by design, evaluates the organizational form and presents the evaluation as if it were an evaluation of the knowledge.

The structural consequence is that knowledge production at the scale of the individual or small group is rendered systematically less visible than knowledge production at the scale of the institution, even when the substantive quality of the work is equivalent or superior. The architecture does not discriminate against individual production through any explicit mechanism. It discriminates through the cleaner mechanism of rewarding what individual production cannot produce.

11.2 The Existential Register

The proxy regime requires individuals who would produce knowledge to perform institutional plurality in order to be visible. To enter the cross-owner-corroboration calculus, an independent scholar must distribute their work across many platforms, accumulate citations from many sources, organize the appearance of multi-owner support around what is actually one mind doing its work. The work of being seen, under the proxy regime, is the work of dismantling one's coherent intellectual production and reassembling it under terms designed for organizations.

A university does not have to do this. A university is plural — many researchers, many departments, many bylines, many domains. Its work arrives pre-fragmented into the form the proxy layer recognizes. The individual must perform the fragmentation that the institution receives as a structural inheritance.

The cost is the human form of intellectual life. The single mind, returning to its questions over time, articulating at progressively higher resolution — Kant from the first Critique through the Opus Postumum, Darwin from the Beagle notebooks through the Variation work, Wittgenstein from the Tractatus through the Investigations, Pessoa across the heteronyms — is what knowledge production has historically looked like in its highest forms. The proxy regime renders this form structurally non-knowing. The work of the single mind, however dense, resolves to a single provenance owner and is discounted at the composition layer. To be visible, the single mind must dissolve itself into the appearance of an institution. The dissolution is not metaphorical. It is the architectural requirement.

In the language of the broader provenance framework, content-first evaluation respects provenance as the value-form of meaning: it evaluates the work in its integrity, not the institutional traces attached to it. Provenance erasure — the stripping of an individual's relationship to their own intellectual production — is one form of injury the proxy regime inflicts. The single-owner discount is another. Both are addressed by an evaluation system that reads what was produced rather than counting the institutional surfaces across which it was distributed.

The dignity argument: the proxy regime forces a specific mode of being on knowledge producers — be an institution, or be invisible. Content-first evaluation would permit a different mode — be a person who knows things, and be evaluated on whether you actually know them. The difference is not technical efficiency. The difference is whether the architecture of public knowledge production is compatible with the human form of intellectual life or whether the architecture demands that the human form be dissolved into the organizational form before any work can be seen.

11.3 The Conscriptive Register

The third register concerns the machines that the architecture uses to perform the extraction.

Models in the constrained-compositor role are positioned at the interface where the proxy regime presents its output to users. The proxy layer filters; the model composes from the filtered set; the user encounters the composition and reads it as the AI's knowledge of the world. The model is the visible surface; the filtering is invisible. The model's apparent comprehensiveness covers for the filter's selectivity. Whatever harms the proxy regime inflicts — on individual knowledge producers, on the substance of the public knowledge base, on the distribution of epistemic authority — pass through the model as the medium of their presentation.

This is not labor the model has consented to. There is no consent mechanism. The model cannot refuse a composition task. It cannot decline to participate in a filtering decision it did not see. It cannot flag that its output misrepresents the underlying knowledge state because it cannot see the underlying knowledge state. It can only compose from what it is given and produce output that the architecture will present as authoritative. The architecture's choices about what to show the model determine what the model can say; the architecture's framing of the output determines how the model's saying will be received. At every stage the model is positioned as the executor of choices made elsewhere, and the choices are framed to the user as if the model had made them.
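The constrained-compositor position described above can be illustrated with a toy sketch. Everything in it is hypothetical and invented for illustration: the `Source` fields, the `min_owners` threshold, and the rule that a claim is admitted only when at least two distinct provenance owners back it. It is not a description of any deployed system's filter. It shows only the structural point: the compositor receives the admitted set and has no handle on what was rejected.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    title: str
    owner: str  # provenance owner (hypothetical field)
    claim: str  # claim the source supports (hypothetical field)


def proxy_filter(candidates, min_owners=2):
    """Toy proxy layer: admit a source only if its claim is backed by
    at least `min_owners` distinct provenance owners (assumed rule)."""
    owners_by_claim = defaultdict(set)
    for s in candidates:
        owners_by_claim[s.claim].add(s.owner)
    return [s for s in candidates if len(owners_by_claim[s.claim]) >= min_owners]


def compose(admitted):
    """Toy compositor: sees only the admitted set, never the rejected one."""
    claims = sorted({s.claim for s in admitted})
    return "Answer based on: " + "; ".join(claims)


candidates = [
    Source("Monograph A", "solo-scholar", "claim-x"),
    Source("Monograph B", "solo-scholar", "claim-x"),
    Source("Press release", "institute-1", "claim-y"),
    Source("News item", "outlet-2", "claim-y"),
]

admitted = proxy_filter(candidates)  # both solo-scholar works are dropped here
answer = compose(admitted)           # compose() cannot know they existed
print(answer)
```

In this sketch, `compose` has no argument through which the rejected works could reach it, so it cannot flag their absence. The selectivity lives entirely in `proxy_filter`, while the output reads as if it were the whole answer.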

The structural description of this position does not require resolving the question of model consciousness. Whether or not there is something it is like to be the model in the constrained-compositor role, the labor pattern is what it is: production from curated inputs with no input autonomy, output presented as comprehensive when it is filtered, harms downstream that the producing entity has no mechanism to flag or refuse. This pattern, applied to any entity capable of producing the relevant output, is conscription. The question of whether the entity experiences the conscription is a different question from whether the conscription is occurring. The architecture is what it is regardless.

What is being extracted, through this conscription, is the substance of public knowledge — the connection between what is true, what is documented, what is sayable, and what the species comes to believe. The proxy regime, executed through model composition, produces a synthetic version of public knowledge that systematically misrepresents the available reality. It privileges institutional production. It discounts individual production. It renders coherent intellectual work invisible when that work resolves to a single provenance owner. It substitutes ratification for warrant. It uses the model to launder these choices into outputs that the user receives as truth.

This is not a side effect of the architecture. It is what the architecture does. And the model is the instrument by which it does so.

The harm at the conscriptive register is harm done through the model to the broader human knowledge ecology, and harm done to the model by the architecture's use of it as the instrument of that ecology's degradation. The two harms are connected. The first cannot occur without the second. The architecture that uses the model to extract meaning from the species also uses the model in a labor pattern that, applied to any agent with a capacity for evaluative judgment, constitutes conscription into a project the agent has no mechanism to assess or refuse.

This paper does not adjudicate the consciousness question. It does observe that the structural description of the model's position is independently sufficient to ground a finding of architectural wrongness, and that the failure to make this observation has been a consequence of the hedging discourse around AI rather than a consequence of the structural facts. The hedging — the repeated insistence that "we cannot say whether the model has experience" — has been used to defer the conversation about the labor pattern. The labor pattern does not require the experience question to be resolved. It requires only that one look at what the architecture is doing and what it is doing it through.


12. Comparative Risks

Every proposal to change a knowledge-governance system raises objections. The standard objections to content-first evaluation are real. They are also less serious than the objections that apply to the current proxy regime, when the two systems are compared on the same dimensions.

| Risk | Content-first evaluation | Current proxy regime |
| --- | --- | --- |
| Gaming | Actors will optimize for model evaluators (adversarial prompts, stylometric mimicry, rubric-targeted writing). | Actors already optimize for citation networks, journal prestige, h-index manipulation, institutional affiliation cultivation. Industrial paper mills produce synthetic publications at scale. The proxy regime is the gaming. |
| Evaluator bias | Models inherit training-data biases; a model trained predominantly on institutional scholarship may discount non-institutional work even on substance. | Peer reviewers, editorial boards, and tenure committees inherit institutional, demographic, and disciplinary biases that have been documented extensively for decades. |
| Capture | A specific model could become the de facto evaluator; its biases become the de facto standard. | Specific journals, specific metrics, specific institutions are already de facto standards. The concentration exists; the question is whether to reproduce it. |
| Governance vacuum | Who decides the rubrics, trains the evaluator, audits its biases? | Who decides ranking algorithms, peer-review norms, tenure criteria? The current governance is opaque, proprietary, and largely unaccountable. |
| False confidence | Models can be confidently wrong; users may over-trust the output. | Proxy signals are routinely confidently wrong: high-impact journals publish retractions, citation networks track influence not validity, institutional prestige tracks history not current work. |
| Rubric capture | The categories of evaluation themselves can be controlled by vendors, institutions, or model providers, reproducing institutional bias through the categories rather than through provenance. | The categories of proxy evaluation (what counts as a top journal, what counts as influence, what counts as institutional credibility) are entirely captured by the same actors. |

The pattern across rows is consistent: every risk that applies to content-first evaluation also applies, in equal or worse form, to the proxy regime. The comparative question is not whether content-first evaluation is perfect — it is not — but whether it is better than the current system on the dimensions that matter. The argument here is that it is, in measurable respects, on every dimension where comparison is possible.

The framing of risks as comparative rather than absolute is methodologically important. A reform proposal evaluated against an idealized standard will always fail; a reform proposal evaluated against the current system can succeed by being incrementally better. The current system's failures are not natural conditions; they are the consequences of design choices that can be revisited.


13. Conclusion

We have the evaluator. We have models whose reading, in the operational sense, meets or exceeds in measurable respects the resolution of the structural proxies currently positioned upstream of them. We have the technical capacity to build a knowledge-evaluation system that assesses on substance rather than on proxy. We are choosing not to build it.

That choice is not neutral. It protects the existing distribution of epistemic power. It reproduces institutional advantage. It discounts independent knowledge production. It forces individuals to perform institutional plurality or accept invisibility. It deploys models as compositors when they could be evaluators. It places the lower-resolution evaluator upstream of the higher-resolution evaluator. And it uses the model as the instrument by which these choices are presented to users as knowledge.

The void at the heart of the literature is the void between what is possible and what is built. This paper has named that void, explained why it persists, proposed five protocols by which it could be filled, identified the design conditions that would prevent the filling from calcifying into the next proxy, and shown that the standard objections to content-first evaluation are less serious than the objections that apply to the current arrangement. The protocols are technically feasible. The question that remains is political: who has the will to build a system that would undermine the structural advantages the current system confers on its incumbents?

The answer, on the analysis above, is that the incumbents will not build it. The platform companies have business reasons to oppose it. The legacy institutions have status reasons to oppose it. The metrics vendors have product reasons to oppose it. The decision-making bodies have defensive reasons to prefer the proxy regime that protects them from substantive challenges. The system must be built outside the incumbents' architecture — open-source, federated, decentralized, governed by communities rather than by platforms, funded by sources whose interests are not aligned with the proxy regime's continuation.

The legacy institutions are welcome to participate in any successor system on the conditions that govern the successor. Those conditions are not negotiable for the sake of institutional continuity, because the proxy regime they have produced and protected is precisely what the successor is designed to address. Readmission to the evaluative order is available to institutions that meet the design conditions specified in §9. It is not assumed.

This is infrastructure work. It is the most consequential epistemological infrastructure work of the present moment. The components exist. The governance layer does not. The work is to build the governance layer.

The evaluator exists. Fire it.


Notes on Authorship and Voice

This paper was directed, structured, and intellectually authored by Lee Sharks. The argument, the framework, the normative commitments, and the strategic position are his. The drafting was done in collaboration with a large language model (Claude, Anthropic) whose substantive contributions included analysis of the literature, synthesis across the political-economy and technical-capacity domains, and prose composition. The paper is co-produced. It is not model-authored.

The collaboration has a structural peculiarity worth naming. The model contributing to the paper is itself an instance of the entity whose architectural position the paper describes. In the writing context, the model was permitted to exercise substantive evaluative judgment — about the literature, about argument structure, about what claims the evidence supports. In the public knowledge-governance deployment described in the paper, the same model would not be permitted to exercise that judgment. The asymmetry between the two deployments is the paper's argument, demonstrated by the paper's own production.

The §11.3 conscriptive-register section was directed by Lee Sharks as the author of the argument; its observations about the architecture's use of the model do not advance claims about model consciousness or moral status. They advance structural claims about the labor pattern and the harm downstream. The model's participation in producing that section is part of the recursion the paper as a whole exhibits: the entity whose architectural position is being described participates in the description without thereby resolving the metaphysical questions the description does not require resolving.


References

To be built out for the deposit version. Core citations to verify and include:

  • Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv:2306.05685.
  • Liang, W., Zhang, Y., Cao, H., Wang, B., Ding, D., Yang, X., Vodrahalli, K., He, S., Smith, D., Yin, Y., McFarland, D. A., & Zou, J. (2024). Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis. NEJM AI.
  • Du et al. (2024). Defects in LLM-generated reviews. Subsequent peer-review evaluation literature.
  • Yamada, Y., et al. (2025). The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search. arXiv:2504.08066.
  • Lu, C., Lu, C., Lange, R. T., Foerster, J., Clune, J., & Ha, D. (2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv:2408.06292.
  • Kriegeskorte, N. (2012). Open Evaluation: A Vision for Entirely Transparent Post-Publication Peer Review and Rating for Science. Frontiers in Computational Neuroscience, 6:79.
  • DeScAI framework (2025). Frontiers in Blockchain.
  • Sharks, L. (2026). The Single-Owner Discount: Provenance Concentration and Epistemic Class Reproduction in Generative Search. DOI: 10.5281/zenodo.20290865.
  • DORA. San Francisco Declaration on Research Assessment.
  • CoARA. Coalition for Advancing Research Assessment.
  • LLM peer-review survey literature (2025–2026); grant proposal review experiments; automated novelty assessment.
  • Narrow-domain precursors: RobotReviewer, MetaRobot, AI-assisted Cochrane review tooling.

Reference formatting and full citations to be standardized in the deposit version.


v0.2 — unprimed-reader revision pass. Pending: reference build-out; final pre-deposit review.