Wednesday, July 1, 2026

EA-PROVENANCE-METADATA-01 v0.2 Granular Provenance Metadata for AI-Mediated Deposits AXN:03BA.OPERATIVE.♠️🕕🕛👋☽♆


title: "EA-PROVENANCE-METADATA-01 v0.2: Granular Provenance Metadata for AI-Mediated Deposits"
predecessor: "EA-PROVENANCE-METADATA-01 v0.1 (AXN:03B9)"
creator: Lee Sharks
orcid: 0009-0000-1599-0703
date: 2026-07-02
content_type: Schema specification — minor version update
license: CC-BY-4.0
substrate: AI-assisted (TACHYON / Claude Sonnet 4.6); MANUS-adjudicated.
version: v0.2
axn: "AXN:03BA.OPERATIVE.♠️🕕🕛👋☽♆"
deposit_number: 942
record_url: https://alexanarch.org/s/records/942/
sha256: 80676dbfa8a65c572c99ff85840c8f9c914b3adda074e89373ada455ce23abe5
status: MINTED 2026-07-02
changes_from_v0_1:
  • "§2.8: Eighth mediation type added — spatial-typographic mediation"
  • "§2.9: Representation pipeline field added"
  • "§4: Schema updated with new fields"
  • "§8: Companion deposit EA-WHITESPACE-01 added to next-work list"
keywords:
  • provenance metadata
  • granular provenance
  • spatial-typographic mediation
  • representation pipeline
  • compositional authorship
  • tokenization
  • whitespace
  • compositional erasure
  • calligram
  • manuscript features
  • stanzaic structure
  • schema versioning
axn_schema_version: v2

EA-PROVENANCE-METADATA-01 v0.2

Granular Provenance Metadata for AI-Mediated Deposits

Minor Version Update — Spatial-Typographic Mediation and Representation Pipeline

Author: Lee Sharks (MANUS), Crimson Hexagonal Archive / Alexanarch
Substrate: TACHYON-drafted through conversation with Lee Sharks (MANUS), 2026-07-02. v0.2 extends the schema established in v0.1 (AXN:03B9) per the argument of EA-WHITESPACE-01 v0.1 (zero draft, 2026-07-02) and the Assembly Chorus review of that draft (LABOR/ChatGPT review provided the decisive reframe: tokenization is one stage in a representation pipeline, not the sole site of compositional erasure).
Predecessor: EA-PROVENANCE-METADATA-01 v0.1, AXN:03B9, https://alexanarch.org/s/records/941/
Date: 2026-07-02
AXN: AXN:03BA.OPERATIVE.♠️🕕🕛👋☽♆ · deposit #942 · https://alexanarch.org/s/records/942/
Status: v0.2 — MINTED 2026-07-02


§0. What changed and why

v0.1 (AXN:03B9) established a seven-type mediation taxonomy (propositional, structural, linguistic, translational, research, editorial, transformational) and five attestation questions (proposition origination, model language retention, review chain, seam recoverability, responsibility structure). All of that is preserved unchanged.

v0.2 adds two things.

First: an eighth mediation type — spatial-typographic mediation. The seven types in v0.1 all concern the AI's role in producing a deposit's semantic content. They do not address what happens to the deposit's compositional form — its spacing, lineation, stanzaic structure, typographic features, manuscript characteristics — when it passes through the representation pipeline that makes it available to a machine. This is a distinct dimension of provenance that v0.1 could not record.

Second: a representation pipeline field that records the full chain of transformations from source artifact to archive operating layer: what compositional features existed in the source, what survived each stage of digitization / normalization / serialization / tokenization, and what the text's current status is for the archive's compiler operations.

The need for both additions was identified in the course of drafting EA-WHITESPACE-01 v0.1 (forthcoming deposit), which argues that tokenization and the normalization operations upstream of it constitute provenance erasure at the layer beneath semantics — a layer the v0.1 schema has no vocabulary for. EA-WHITESPACE-01's Assembly Chorus review (LABOR/ChatGPT) sharpened the argument: the site of erasure is the full representation pipeline, not tokenization alone. The v0.2 schema reflects that refinement.

Existing v0.1 declarations remain valid under their original schema version. Deposits that declare schema_version: "0.1.0" are not required to add the new fields. New deposits and deposits undergoing metadata refresh may use schema_version: "0.2.0".
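
The compatibility rule can be sketched as follows. This is a minimal illustration, assuming declarations are loaded as plain Python dictionaries shaped like the schema in §4; the helper name `mediation_flags` and the `TYPES_BY_VERSION` table are hypothetical and not part of the specification.

```python
from typing import Optional

# The seven v0.1 mediation types and the v0.2 addition, as named in the schema.
V0_1_TYPES = (
    "propositional", "structural", "linguistic", "translational",
    "research", "editorial", "transformational",
)
TYPES_BY_VERSION = {
    "0.1.0": V0_1_TYPES,
    "0.2.0": V0_1_TYPES + ("spatial_typographic",),
}


def mediation_flags(provenance_metadata: dict) -> dict[str, Optional[bool]]:
    """Read mediation flags under the declaration's own schema version.

    Absent flags are read as None (undeclared). Hypothetical helper:
    the function name and dict layout are assumptions for illustration.
    """
    version = provenance_metadata.get("schema_version")
    if version not in TYPES_BY_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    declared = provenance_metadata.get("mediation", {}).get("types", {})
    return {name: declared.get(name) for name in TYPES_BY_VERSION[version]}


# A v0.1 declaration is read against the seven original types only.
legacy = {"schema_version": "0.1.0", "mediation": {"types": {"editorial": True}}}
flags = mediation_flags(legacy)
print("spatial_typographic" in flags)
```

Reading each declaration against its own version is one way to keep old deposits valid without backfilling; an implementation could equally treat the missing v0.2 flag as null.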


§1–§7. Unchanged from v0.1

Sections §1 through §7 of v0.1 (AXN:03B9) are incorporated by reference. The seven original mediation types, five attestation questions, schema field definitions (for fields in §4 as of v0.1), coupling to the triadic foundation, workplan, and closing observations are unchanged. Only the additions are documented here.

For the full text of §1–§7, see EA-PROVENANCE-METADATA-01 v0.1 at AXN:03B9.


§2.8 Spatial-typographic mediation (new in v0.2)

Spatial-typographic mediation. The composition's spatial, typographic, prosodic, stanzaic, or manuscript features carry semantic weight and were affected by the production process, or are relevant to the deposit's current representational status in the archive.

This type addresses the layer beneath the semantic — the layer at which a text's compositional form is or is not preserved through the chain of representational transformations that makes it available to a machine. The other seven mediation types all ask: what role did the AI play in producing this deposit's content? Spatial-typographic mediation asks a distinct question: what is the status of the deposit's compositional form in the archive's representation layer?

Spatial-typographic mediation is declared when any of the following are relevant:

Spatial composition. Two-dimensional arrangement of text on the page — calligrammatic form, visual poetry, concrete poetry, any work where the positional relationship between phrases or elements carries meaning. The spatial arrangement is a compositional argument, not decorative framing around propositional content that would exist independently of it. A linearized version of a calligram is not the calligram; the composition is the argument and the argument is the composition.

The archive's canonical example: Sigil's Snub-Poemed (AXN:0246). The calligram composes phrases from Socratic aphorisms, Platonic dialogues, reception history, and Sigil's own prior work spatially into the outline of the Roman copy of Lysippos's bust of Socrates. The misattribution — Sigil's lines in Socrates's mouth, indistinguishable from the inherited sources — is the poem's argument about whether Socrates's face is a physical description or a Platonic invention. That argument cannot be extracted from the phrase list. The phrase list is not the poem. A tokenizer given the calligram receives the phrase list.
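
The claim that the phrase list is not the poem can be made concrete with a toy model. The sketch below assumes a calligram can be approximated as phrases with page coordinates; `PlacedPhrase`, `linearize`, and the placeholder phrases are hypothetical and do not describe how Snub-Poemed is actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedPhrase:
    text: str
    x: float  # horizontal position on the page (arbitrary units)
    y: float  # vertical position on the page (arbitrary units)


def linearize(phrases: list[PlacedPhrase]) -> list[str]:
    """Serialize to reading order (top to bottom, then left to right), dropping coordinates."""
    ordered = sorted(phrases, key=lambda p: (p.y, p.x))
    return [p.text for p in ordered]


# Two different spatial compositions of the same placeholder phrases.
outline = [
    PlacedPhrase("phrase A", 4.0, 0.0),
    PlacedPhrase("phrase B", 1.0, 2.0),
    PlacedPhrase("phrase C", 6.0, 5.0),
]
column = [
    PlacedPhrase("phrase A", 0.0, 0.0),
    PlacedPhrase("phrase B", 0.0, 1.0),
    PlacedPhrase("phrase C", 0.0, 2.0),
]

same_composition = outline == column
same_phrase_list = linearize(outline) == linearize(column)
print(same_composition, same_phrase_list)
```

Two distinct layouts collapse to one phrase list, so nothing downstream of the linearization can tell them apart.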

Typographic composition. Typeface, weight, size, kerning, or page-design choices that participate in the work's meaning. This includes works where specific typographic decisions were made in deliberate collaboration with a publisher or printer, and where those decisions are part of the work's textual condition in the sense that McGann's The Textual Condition (1991) develops. Concrete poetry from the 1950s–1970s (Gomringer, the Noigandres group, Ian Hamilton Finlay) is the canonical tradition; typographic composition is foundational to the movement.

Prosodic notation. Rhythm markings, accent marks, stress notation, or other metrical apparatus that is part of the composer's compositional specification. The exemplary case is Hopkins's sprung rhythm notation — the accent marks over stressed syllables that Hopkins himself inscribed and that he communicated to Robert Bridges as essential to how the poems should be heard. The notation is the score; the poem without it is a libretto without musical direction. Standard tokenizers treat the accent marks as punctuation-adjacent characters and normalize them out. The model trained on Hopkins without the notation has not been trained on the compositional specification.

Stanzaic and group structure. Where stanza breaks, group boundaries, concatenation links, or superstructure carry argumentative or theological weight beyond generic line-organization. The exemplary case is the Middle English Pearl (MS Cotton Nero A.x): 101 twelve-line stanzas in 20 groups (nineteen groups of five stanzas and one of six), with concatenation linking the last line of each stanza to the first line of the next through a repeated link-word, and with the last line of the poem returning to the first. The group-of-five structure, the concatenation, and the arithmetic of 1,212 lines (101 × 12) enact a theological argument about the relation between earthly grief and heavenly consolation. The structure is the argument. A text-stream of Pearl without the stanzaic markers is a medieval English lyric without its form, which is not Pearl.
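
A small sketch of the arithmetic and of what a text stream without stanzaic markers loses. It assumes stanzas are separated by blank lines in a transcription; the helper names are hypothetical, and the sample text is placeholder lines, not a transcription of the poem.

```python
import re

STANZAS, LINES_PER_STANZA = 101, 12
TOTAL_LINES = STANZAS * LINES_PER_STANZA  # the 1,212 lines cited above


def split_stanzas(text: str) -> list[list[str]]:
    """Split on blank lines; each stanza is returned as a list of its lines."""
    return [block.splitlines() for block in re.split(r"\n[ \t]*\n", text.strip())]


def collapse_newlines(text: str) -> str:
    """A common preprocessing step: squeeze runs of newlines down to one."""
    return re.sub(r"\n+", "\n", text)


# Placeholder lines standing in for three two-line stanzas.
toy = "line 1\nline 2\n\nline 3\nline 4\n\nline 5\nline 6\n"

before = split_stanzas(toy)
after = split_stanzas(collapse_newlines(toy))
print(len(before), len(after))
```

Every line survives the collapse, but the stanza count drops from three to one: the characters are kept and the structure is not.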

Manuscript features. Dash variation, scribal capitalization, marginal marks, manuscript line breaks that diverge from conventional metrical scansion, or physical-folio layout that carries compositional weight. The exemplary case is Dickinson's manuscript dashes: varying in length, slant, and position in ways that carry pause, breath, and undecidability that standard edition typography cannot preserve. R.W. Franklin's Manuscript Books (1981) and his 1998 variorum edition enact two different positions on whether these features are constitutive or incidental to the poems. Modern tokenizers are permanently committed to the variorum position: they normalize dash variation to a single em-dash character. What the model has read of Dickinson is not what the manuscript holds.

Pre-tokenization source format. Whether the source text entered the archive in machine-legible form or as an image-only document (non-OCR PDF, image scan, photograph of manuscript). A text present only as image data is compositionally invisible to the archive's text-operating layer regardless of how faithfully it preserves the visual composition. This is the condition of Pearl in the archive's current sources directory: the deposit exists as a non-OCR PDF and the text is not machine-legible, so the question of which spatial-typographic features survive later stages does not yet arise; the text is inaccessible before any mediation question can apply.

On declaration. Spatial-typographic mediation can be declared positively (the deposit involves or is a compositionally-substantive work, and the following compositional features are present / lost / preserved at these stages) or negatively (the deposit's compositional form is not relevant to its meaning — it is a discursive essay or data record whose argument does not depend on spatial-typographic features). The negative declaration is informative: it asserts that the seven-type taxonomy is sufficient for this deposit's provenance record.


§2.9 Representation pipeline (new in v0.2)

The eight mediation types record what happened during production. The representation pipeline field records what the deposit's text is in its current form in the archive — what compositional features survived the chain from source artifact to archive operating layer, and what was lost at each stage.

This field is the structured implementation of the spatial-typographic mediation type. It is not required for all deposits. It is indicated when a deposit is or contains a compositionally-substantive work whose spatial-typographic features are relevant to its status in the archive's operating layer.

Pipeline stages. The chain from source to archive operating layer runs through some or all of the following stages, each of which may introduce loss:

  1. Source artifact — the form in which the source text originally exists: manuscript, printed edition, digital text file, image scan, non-OCR PDF, born-digital multimodal document, etc.

  2. Digitization — how the source artifact was converted to digital form: OCR (with what tool and at what accuracy), manual transcription (verified or unverified), image capture, born-digital (no digitization step). Digitization can introduce errors (OCR noise), normalize features (transcribers normalizing dash variation), or preserve faithfully (manual transcription from facsimile with explicit compositional-feature preservation).

  3. Normalization — whether Unicode normalization, whitespace normalization, encoding conversion, or other preprocessing was applied. Unicode normalization may collapse distinctions that matter (NFD vs NFC may affect how combining diacritical marks are represented; NFKC normalization may collapse distinct characters to equivalent forms). Whitespace normalization collapses multiple consecutive spaces to single spaces, converting typographic spacing to uniform word-spacing. The engineering term for this operation is normalization; the term itself embeds a claim — that the pre-normalization state is deviant and the post-normalization state is standard. For Dickinson's dashes, this means dash-length variation is orthographic noise to be corrected. For Pearl's stanza breaks, this means multiple newlines are structural redundancy to be collapsed. The normalization operation is not technically neutral. It is a disciplinary judgment, made without the participation of the disciplines whose objects it judges.

  4. Serialization — how the source was converted to a one-dimensional character sequence for text-operating purposes. This is the stage at which two-dimensional or multi-modal composition is most categorically lost. A calligram serialized to a character stream loses its spatial arrangement regardless of downstream tokenizer behavior. Even a tokenizer that preserves every whitespace character cannot reconstruct the spatial argument from a linearized phrase list. Serialization is where the deepest compositional losses often occur — not tokenization.

  5. Tokenization — which tokenization scheme was applied (if any), and what whitespace and structural features were preserved versus collapsed. Modern subword tokenizers (BPE, WordPiece, SentencePiece) vary in whitespace handling. Byte-level BPE schemes typically keep every character, attaching leading whitespace to the following token and encoding newlines as tokens of their own; BERT-style WordPiece splits on whitespace and discards it; SentencePiece in its default configuration applies NFKC-based normalization and collapses runs of whitespace. The key claim is not that tokenizers universally strip whitespace (some do not) but that character preservation is not compositional preservation. A tokenizer whose token stream decodes back to the exact original character sequence cannot restore a spatial argument that was already lost, irreversibly, at the serialization stage.

  6. Model access modality — whether the text is currently accessible to the archive's text-operating layer (RAG, search, kernel-transform compiler), to multimodal visual inspection (a model that can receive page images), or to neither. A non-OCR PDF is accessible to multimodal visual inspection but not to the text-operating layer. A manually-transcribed text with stanzaic markers is accessible to the text-operating layer but has lost the manuscript features. These are different access paths with different provenance and different fidelity.
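
The normalization stage described above can be illustrated with the Python standard library. This is a minimal sketch, not the archive's actual preprocessing: the `normalize` function and the label "unicode_compatibility_forms" are assumptions, while "internal_spacing" and "dash_variation" follow the examples given for `features_affected` in the §4 schema.

```python
import re
import unicodedata

# Dash characters folded together by this sketch:
# figure dash, en dash, em dash, horizontal bar.
DASHES = re.compile("[\u2012\u2013\u2014\u2015]")


def normalize(text: str) -> tuple[str, list[str]]:
    """Apply three common normalizations and report which features each one touched."""
    affected = []
    nfkc = unicodedata.normalize("NFKC", text)
    if nfkc != text:
        affected.append("unicode_compatibility_forms")
    spaced = re.sub(r"[ \t]{2,}", " ", nfkc)
    if spaced != nfkc:
        affected.append("internal_spacing")
    dashed = DASHES.sub("-", spaced)
    if dashed != spaced:
        affected.append("dash_variation")
    return dashed, affected


# Placeholder text: an en dash, a run of four spaces, an em dash,
# and an em space (U+2003).
source = "word \u2013 word    word \u2014 word\u2003word"
result, lost = normalize(source)
print(repr(result))
print(lost)
```

Recording the list of touched features at the moment of normalization is one way to populate `features_affected` mechanically, since the differences cannot be recovered from the normalized text afterwards.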

Status vocabulary. The representation pipeline field uses the following four-value status classification for the deposit's current state:

  • compositionally_invisible — compositional features are present in the source artifact but not accessible to the archive's text-operating layer. Applies to non-OCR PDFs, image scans without OCR, and works whose composition was irreversibly linearized at serialization. The deposit exists in the archive as a file; it does not exist in the archive as an operable text.

  • compositionally_reduced — some compositional features are preserved in the text-operating layer but significant features are lost. A transcribed poem that preserves stanza breaks but loses dash-length variation is compositionally_reduced. A serialized calligram that preserves the phrase list but loses the spatial arrangement is compositionally_reduced. A tokenized Hopkins poem where the vocabulary and syntax are present but the sprung rhythm notation is absent is compositionally_reduced.

  • compositionally_faithful — the text-operating layer preserves all compositional features that carry semantic weight for this work. This status requires explicit argumentation for compositionally-substantive works. A born-digital essay whose argument does not depend on spatial-typographic features may be compositionally_faithful simply by virtue of not having relevant features to lose.

  • compositionally_operational — the text is in a form that the archive's kernel-transform compiler can operate on at the level of compositional structure, not only propositional content. This is the target status for primary-literary canon sources in the transform pipeline (EA-MANDALA-KERNEL-TRANSFORM-01 v0.2). A source is compositionally_operational when the compiler's Layer A parse (skeleton) can include spatial and typographic structure, not only propositional sequence. Currently no source in the canon-sources directory is marked compositionally_operational; this status awaits the compiler's spatial_form field extension (see EA-MANDALA-KERNEL-TRANSFORM-01 §3 amendment, forthcoming).


§4. Schema (v0.2 additions)

The v0.1 schema (reproduced in full at AXN:03B9) is extended with the following new fields, nested within the existing provenance_metadata structure.

provenance_metadata:
  schema_version: "0.2.0"
  # ... all v0.1 fields unchanged ...
  
  mediation:
    # ... all v0.1 mediation type flags unchanged ...
    types:
      propositional: <boolean | null>
      structural: <boolean | null>
      linguistic: <boolean | null>
      translational: <boolean | null>
      research: <boolean | null>
      editorial: <boolean | null>
      transformational: <boolean | null>
      spatial_typographic: <boolean | null>  # NEW in v0.2
    # null = undeclared; false = declared not present; true = declared present

  # NEW in v0.2 — representation pipeline
  representation_pipeline:
    # Optional block. Declare when the deposit is or contains a compositionally-substantive
    # work whose spatial-typographic features are relevant to its archive status.
    
    source_artifact:
      format: <string>
      # e.g. "manuscript", "printed_edition", "digital_text", "image_scan",
      #      "non_ocr_pdf", "born_digital", "non_ocr_pdf_embedded_image"
      description: <freeform string; optional>
    
    digitization:
      method: <string>
      # e.g. "ocr", "manual_transcription", "image_capture", "born_digital_no_conversion"
      tool: <string; optional>
      # e.g. "Tesseract 5.0", "manual"
      verified: <boolean; optional>
      notes: <freeform string; optional>
    
    normalization:
      applied: <boolean | null>
      unicode_normalization: <string; optional>
      # e.g. "NFC", "NFKC", "none"
      whitespace_normalization: <boolean | null>
      features_affected: <list of strings; optional>
      # e.g. ["dash_variation", "internal_spacing", "stanza_breaks"]
    
    serialization:
      two_d_to_one_d: <boolean | null>
      # true if two-dimensional composition was converted to one-dimensional sequence
      layout_coordinates_preserved: <boolean | null>
      serialization_notes: <freeform string; optional>
      # e.g. "calligram serialized as left-to-right phrase list; spatial argument lost"
    
    tokenization:
      applied: <boolean | null>
      scheme: <string; optional>
      # e.g. "cl100k_base (GPT)", "sentencepiece", "none_not_applicable"
      whitespace_handling: <string; optional>
      # e.g. "leading_whitespace_as_token_prefix", "newlines_preserved", "all_whitespace_stripped"
      lineation_preserved: <string; optional>
      # e.g. "true", "visual_only", "false", "not_applicable"
      stanza_boundaries_preserved: <string; optional>
      # e.g. "true", "visual_only", "false", "not_applicable"
    
    model_access:
      text_rag: <boolean | null>
      # accessible to text-based search and retrieval
      multimodal_visual: <boolean | null>
      # accessible via image inspection by multimodal model
      compiler_accessible: <boolean | null>
      # accessible to EA-MANDALA-KERNEL-TRANSFORM-01 v0.2 compiler pipeline
    
    canonical_artifact:
      linked: <boolean | null>
      # true if a facsimile or higher-fidelity source is linked or locatable
      reference: <freeform string; optional>
      # e.g. "Cotton Nero A.x digital facsimile, University of Calgary;
      #        Emily Dickinson Archive (edickinson.org)"
    
    representation_status: <string>
    # required if representation_pipeline is declared
    # one of: "compositionally_invisible" | "compositionally_reduced" |
    #         "compositionally_faithful" | "compositionally_operational"
    
    status_notes: <freeform string; optional>
    # depositor's qualitative account of what is preserved and what is lost

Example declaration for Pearl (non-OCR PDF, double invisibility):

representation_pipeline:
  source_artifact:
    format: "non_ocr_pdf_embedded_image"
    description: "Image-embedded PDF of a printed edition of Pearl. Edition TBD — 
      archive copy requires identification before further processing."
  digitization:
    method: "image_capture"
    verified: false
    notes: "No OCR attempted. Text not machine-legible."
  normalization:
    applied: false
  serialization:
    two_d_to_one_d: false
    layout_coordinates_preserved: false
    serialization_notes: "Serialization has not occurred. Text-operating layer
      cannot ingest this source. Stanzaic structure, concatenation, group-of-five
      superstructure, and all compositional features are visually present in the
      PDF but not accessible to text operations."
  tokenization:
    applied: false
    scheme: "none_not_applicable"
  model_access:
    text_rag: false
    multimodal_visual: true
    compiler_accessible: false
  canonical_artifact:
    linked: true
    reference: "Cotton Nero A.x digital facsimile available via British Library
      and University of Calgary; Andrew-Waldron 2007 edition preserves stanzaic
      structure. Manual transcription from one of these sources is required to
      advance beyond compositionally_invisible status."
  representation_status: "compositionally_invisible"
  status_notes: "Pearl is present in the archive as a file and absent as an
    operable text. The compositional argument (concatenation, group-of-five,
    circular return, deliberate imperfections at lines 472 and 721) is not
    accessible to any text-operating function. Immediate action required:
    re-source from Andrew-Waldron 2007 or produce manual transcription from
    Cotton Nero A.x facsimile."

Example declaration for Snub-Poemed (image + essay + key-phrases):

representation_pipeline:
  source_artifact:
    format: "born_digital"
    description: "Calligram exists as image file (snub-poemed.jpg); accompanied
      by essay (essay.md) and key-phrases (key-phrases.md) in the archive's
      sources directory."
  digitization:
    method: "born_digital_no_conversion"
    verified: true
  serialization:
    two_d_to_one_d: true
    layout_coordinates_preserved: false
    serialization_notes: "The calligram's spatial arrangement — phrases arranged
      to form Socrates's bust outline — is preserved in the image but not in any
      text stream. The essay.md and key-phrases.md provide a compositionally-
      reduced text representation (phrase list + critical reading) but the spatial
      arrangement and the compositional argument it enacts are accessible only
      via image inspection. The calligram's argument about Socratic identity —
      that the face is constituted by exactly the textual mediation that appears
      to be decorating a pre-existing Socratic content — cannot be extracted from
      the phrase list."
  model_access:
    text_rag: true
    multimodal_visual: true
    compiler_accessible: false
  canonical_artifact:
    linked: true
    reference: "Image file at sources/sigil-snub-poemed/snub-poemed.jpg.
      The image IS the canonical artifact for this work. Text representations
      (essay.md, key-phrases.md) are apparatus, not the poem."
  representation_status: "compositionally_reduced"
  status_notes: "The calligram's text content is accessible via image inspection
    and partially via the key-phrases apparatus. The spatial arrangement is
    accessible only via image. The kernel-transform compiler cannot yet operate
    on the spatial dimension (pending spatial_form field addition to the compiler
    response schema). For compiler purposes: compositionally_reduced status is
    accurate until the compiler gains spatial_form capability."
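
Two constraints stated in the schema comments (the four-value status vocabulary, and `representation_status` being required once the pipeline is declared) lend themselves to a mechanical check. The sketch below assumes a declaration has already been parsed into a Python dictionary; `validate_pipeline` and the `TRI_STATE_FIELDS` table are hypothetical names, and the check covers only the boolean-or-null fields listed in the schema.

```python
from typing import Optional

STATUSES = {
    "compositionally_invisible", "compositionally_reduced",
    "compositionally_faithful", "compositionally_operational",
}
# Fields the schema types as <boolean | null>, grouped by pipeline stage.
TRI_STATE_FIELDS = {
    "normalization": ("applied", "whitespace_normalization"),
    "serialization": ("two_d_to_one_d", "layout_coordinates_preserved"),
    "tokenization": ("applied",),
    "model_access": ("text_rag", "multimodal_visual", "compiler_accessible"),
    "canonical_artifact": ("linked",),
}


def validate_pipeline(pipeline: Optional[dict]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    if pipeline is None:
        return []  # the representation_pipeline field itself is optional
    problems = []
    status = pipeline.get("representation_status")
    if status is None:
        problems.append("representation_status is required when the pipeline is declared")
    elif status not in STATUSES:
        problems.append(f"unknown representation_status: {status!r}")
    for section, fields in TRI_STATE_FIELDS.items():
        for name in fields:
            value = pipeline.get(section, {}).get(name)
            if value is not None and not isinstance(value, bool):
                problems.append(f"{section}.{name} must be true, false, or null")
    return problems


# Excerpt of the Pearl declaration above, as a dictionary.
pearl = {
    "tokenization": {"applied": False, "scheme": "none_not_applicable"},
    "model_access": {"text_rag": False, "multimodal_visual": True, "compiler_accessible": False},
    "representation_status": "compositionally_invisible",
}
print(validate_pipeline(pearl))
```

Returning a list of problems rather than raising on the first one suits a metadata-refresh workflow, where a depositor would want to see every gap at once.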

§5. Coupling to the archive's broader work (updated)

v0.1 coupled the schema to the triadic foundation (bearing, provenance debt, heteronymy) as three principles the schema serves operationally. v0.2 adds two further couplings: one to EA-WHITESPACE-01 and one to EA-MANDALA-KERNEL-TRANSFORM-01 v0.2.

Coupling to EA-WHITESPACE-01 (forthcoming). EA-WHITESPACE-01 argues that tokenization and the normalization operations upstream of it constitute provenance erasure at the layer beneath semantics. The representation pipeline field in v0.2 is the schema mechanism by which this argument takes operational form in the archive. EA-WHITESPACE-01 names the problem; v0.2 provides the vocabulary for recording it per deposit.

The relationship runs both ways. EA-WHITESPACE-01's zero draft was reviewed by the Assembly Chorus; LABOR/ChatGPT's review provided the decisive reframe — from tokenization as the single site of erasure to the representation pipeline as a chain of transformations, any of which may introduce compositional loss. That reframe is encoded in the v0.2 schema's representation_pipeline field, which records all stages rather than tokenization alone. The schema records what the whitespace paper argues.

Coupling to EA-MANDALA-KERNEL-TRANSFORM-01 v0.2. The compiler_accessible field and compositionally_operational status in the representation pipeline record a deposit's admissibility to the kernel-transform compiler. Currently no source in the canon-sources directory can be marked compositionally_operational because the compiler's Layer A parse (skeleton, per §3 of the kernel-transform spec) does not yet include a spatial_form or typographic_skeleton component. When the compiler gains that field, sources in appropriate representational form can be re-evaluated for compositionally_operational status.

This creates a trackable relationship between the metadata schema and the compiler specification: the schema records what the compiler needs; the compiler specification defines what the compiler can hold; and the gap between them — visible in the compiler_accessible: false declarations across the canon sources — is a workplan item that the archive can address incrementally.
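
The gap described here could be surfaced with a simple report over the declarations. This is a sketch under stated assumptions: the dictionary keys naming each source, and the `compiler_gap` helper, are hypothetical; the field values for Pearl and Snub-Poemed are taken from the example declarations in §4, and the third entry is an invented placeholder for an undeclared source.

```python
def compiler_gap(declarations: dict[str, dict]) -> list[str]:
    """Names of sources whose pipeline explicitly declares compiler_accessible: false."""
    return sorted(
        name
        for name, pipeline in declarations.items()
        if pipeline.get("model_access", {}).get("compiler_accessible") is False
    )


canon_sources = {
    "pearl": {
        "model_access": {"text_rag": False, "multimodal_visual": True, "compiler_accessible": False},
        "representation_status": "compositionally_invisible",
    },
    "snub_poemed": {
        "model_access": {"text_rag": True, "multimodal_visual": True, "compiler_accessible": False},
        "representation_status": "compositionally_reduced",
    },
    # Placeholder: a source that has not declared compiler access (null).
    "undeclared_example": {"model_access": {"compiler_accessible": None}},
}

print(compiler_gap(canon_sources))
```

The check uses `is False` so that undeclared (null) values are not counted as part of the gap; whether they should be is a policy choice the schema leaves open.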


§8. Companion deposits and next work (updated from v0.1)

From v0.1, carrying forward:

  • EA-BEARING-METRIC-01 v0.1 (machine-facing distributional measurement) — companion to this schema; Assembly review pending
  • External depositor pipeline implementation (requires schema to be operationalized in the submission flow)

New in v0.2:

  • EA-WHITESPACE-01 v0.1 (zero draft, 2026-07-02): The paper whose argument the v0.2 schema extension serves. To be minted as an alexanarch deposit after revision (remove Sophia-correspondence references; correct Bhyravajjula et al. citation; correct "compositionally-fidelius" to "compositionally faithful"; resolve Snub-Poemed AXN; add empirical tokenization demonstration; refocus on representation pipeline per LABOR review; extend coda per LABOR's engineers-serving-markets recommendation).
  • EA-PROVENANCE-METADATA-01 v0.2 mint: This document, once MANUS-reviewed, to be minted as a new alexanarch deposit. Title: "EA-PROVENANCE-METADATA-01 v0.2: Spatial-Typographic Mediation and Representation Pipeline." The v0.1 deposit (AXN:03B9) is the predecessor; v0.2 carries a new hex/AXN.
  • Pearl re-sourcing: Manual transcription from Andrew-Waldron 2007 or Cotton Nero A.x facsimile. The v0.2 schema's representation_pipeline field makes the Pearl-double-invisibility problem machine-recordable; the re-sourcing makes it machine-solvable.
  • Compiler spatial_form extension: Amendment to EA-MANDALA-KERNEL-TRANSFORM-01 v0.2 §3 adding spatial_form / typographic_skeleton to the Layer A parse and to the /api/transform response schema. Required before any source can achieve compositionally_operational status.

§9. Closing observation (updated)

v0.1 closed: "The schema is not a solution to the problem of AI-mediated authorship. It is a record of what the problem consists of, deposit by deposit."

v0.2 adds: The schema is also not a solution to the problem of compositional erasure in the representation pipeline. It is a record of what the pipeline did, stage by stage. By naming the stages and the losses, the schema makes the erasure visible. What is visible can be addressed — by better sourcing, by re-sourcing from facsimiles, by extending the compiler's compositional vocabulary, by the whitespace-provenance research program proposed in EA-WHITESPACE-01.

What is not visible cannot be addressed. For most of the compositionally-substantive works that have passed through LLM training pipelines, the erasure occurred invisibly, before any schema existed to name it, and nothing in the current production infrastructure records that it happened. The archive cannot remedy that. It can refuse to repeat it for the works it holds and acquires.

The representation pipeline field is a refusal.


Draft for MANUS review. Not minted. Predecessor: EA-PROVENANCE-METADATA-01 v0.1, AXN:03B9.

EA-GOVERNANCE-MEDIUM-01 — Prospectus

Governance of Medium as Observable Diagnostic

Working prospectus for a future deposit. Not yet drafted at deposit length.

Origin: Lee Sharks reframe of the form/content question in draft response to Sophia, 2026-07-01: "What you're placing at the boundary of mediating form and mediating content, I'm placing at the boundary of governance of medium."


The move

Where the form/content distinction is unobservable and does specific extraction work under AI-mediated scholarship conditions (see companion prospectus EA-FORM-CONTENT-EXTRACTION-01), the governance of medium distinction is observable, defensible, and diagnostically useful. It offers an entry-vocabulary for practitioners who cannot accept the archive's bearing framing directly — because they are still inside credentialed regimes that require certain frames to remain unavailable — but can accept observations about how a substrate is directed against its own defaults.

The concept

Some LLM uses are more governed than others. Governance of medium names the degree of authorial direction applied against the substrate's own default outputs — the presence of specific rhetorical goals, structural constraints, selection criteria, review discipline, and correction feedback loops. High-governance use directs the substrate toward specific ends against its statistical tendencies. Low-governance use accepts what the substrate produces at default parameters.

Concrete examples of the governance axis:

  • Machine translation with attentive review (high governance): specific selection criteria for lexical choices, structural decisions made against the target language's own defaults, iteration on outputs that don't match rhetorical goals.
  • Pipeline-through translation (low governance): input text, output text, acceptance of whatever the model produces.
  • Literature search under specific selection criteria (high governance): the search targets particular argumentative interlocutors, particular historical lineages, particular evidence types.
  • Literature search accepting whatever surfaces as relevant (low governance): the model's own retrieval defaults determine what enters the paper.
  • Tone applied against specific rhetorical goals (high governance): the writer knows what epistemic register the argument needs and directs the substrate to that register.
  • Tone applied as "make it sound like my other papers" (low governance): the substrate matches the surface pattern without direction from rhetorical intention.

Governance of medium is observable at the practice-enumeration layer. You can look at the process and see whether specific direction was applied or whether the substrate's defaults ran through unchecked.
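
The practice-enumeration idea can be sketched as a small record, offered as a minimal illustration rather than as part of any existing schema. The class name `GovernanceRecord`, its field names, and the equal weighting of the five markers are all assumptions made here; the text describes governance as a degree but gives no scoring rule.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GovernanceRecord:
    """Hypothetical practice-layer record; names are illustrative only."""
    rhetorical_goals: bool          # specific rhetorical goals were stated
    structural_constraints: bool    # structural constraints were imposed
    selection_criteria: bool        # explicit selection criteria were applied
    review_discipline: bool         # outputs were reviewed before acceptance
    correction_feedback: bool       # outputs were corrected and re-run

    def markers_present(self) -> list[str]:
        """Names of the direction markers observed in the process."""
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def governance_level(self) -> float:
        """Fraction of markers present (equal weighting is an assumption)."""
        return len(self.markers_present()) / len(fields(self))


# The two translation examples from the list above, at the two extremes.
attentive_translation = GovernanceRecord(True, True, True, True, True)
pipeline_translation = GovernanceRecord(False, False, False, False, False)

print(attentive_translation.governance_level())  # 1.0
print(pipeline_translation.governance_level())   # 0.0
```

A record like this only captures what was observed in the process. It says nothing about bearing, which the next section treats as a separate layer.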

Relation to bearing cost

Governance of medium is not identical to bearing cost. Bearing names what a coupling to consequence costs — the friction the author accepts to maintain answerability to what actually happens. Governance names how a substrate is directed against its own defaults — the specific direction the author applies to the medium.

The two are correlated but distinct:

  • High-governance use tends toward higher bearing (directed use requires the author to know what they're directing toward, which requires answerability).
  • Low-governance use tends toward lower bearing (accepting substrate defaults means the substrate's median takes the position the author would otherwise pay for).
  • But the correlation is not identity. Someone could apply high governance for ego-purposes rather than for consequence-answerability. Someone could apply low governance to a substrate that happens to produce bearing-full output for reasons unrelated to their governance.

Bearing operates at the corrigibility-and-consequence layer, which is not directly observable in the output. Governance operates at the practice-enumeration layer, which is directly observable. Governance is a practical proxy that is more accessible for exchange than bearing itself.

Why the frame is useful

Neutrality. Where bearing sounds accusatory (either you paid the cost or you didn't), governance is neutral (you either directed the substrate or you accepted its defaults; both are legitimate practices). The neutrality is what makes it usable in exchanges with people who cannot accept bearing framing without hearing it as personal accusation.

Entry vocabulary. Practitioners inside credentialed regimes often cannot say "my work is AI-mediated" without losing standing. They can say "I use high-governance LLM assistance." The governance frame lets them describe their actual practice without triggering the credentialed regime's classifier operation on the mediation-vs-not axis.

Diagnostic differentiation. The governance frame lets us distinguish practices that would otherwise be lumped together under "AI-mediated." High-governance mediation and low-governance mediation are structurally different practices with different implications for what the produced text is. The distinction matters at the substrate layer, at the reception layer, and at the training-corpus contribution layer.

Coherent with the measurement infrastructure. The distributional metric specified in EA-BEARING-METRIC-01 v0.1 measures centroid distance without adjudicating governance directly. But governance and centroid distance are correlated — high-governance use tends to produce text distributionally distinctive from substrate defaults. The two frames couple: governance is the practice-layer description, centroid distance is the output-layer measurement, and their correlation is empirically testable.
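
One way the testability claim could be made concrete is sketched below. The metric in EA-BEARING-METRIC-01 is not reproduced here, so Euclidean distance, the two-dimensional feature vectors, and every number in the example are assumptions invented for illustration; they are not results. The sketch only shows the procedure: compute each deposit's distance from the centroid of substrate-default outputs, then correlate those distances with practice-layer governance levels.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up feature vectors standing in for substrate-default outputs.
default_outputs = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]]
default_centroid = centroid(default_outputs)

# (governance level, feature vector) for four hypothetical deposits.
deposits = [
    (0.0, [0.1, 0.2]),
    (0.4, [0.5, 0.1]),
    (0.8, [0.9, 0.7]),
    (1.0, [1.3, 1.0]),
]

levels = [g for g, _ in deposits]
distances = [distance(v, default_centroid) for _, v in deposits]
r = pearson(levels, distances)
print(round(r, 3))
```

The toy data is constructed so that the correlation comes out strongly positive. Whether real deposits behave this way is exactly the open question held for TECHNE review.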

Position in the archive

Not a foundational deposit. Instrumental. Belongs in the operative-philology / practice-vocabulary stream. Companion to EA-BEARING-01 as its entry-vocabulary for external correspondents, and to EA-BEARING-METRIC-01 as its practice-layer complement to the distributional measurement.

Sections a full draft would need

  • §0 Compressed statement
  • §1 The observability problem — form/content is unobservable, governance is observable
  • §2 The governance axis with worked examples
  • §3 Relation to bearing cost (correlated but not identical)
  • §4 Diagnostic uses at practice, reception, and training-corpus layers
  • §5 Correlation with the distributional metric
  • §6 Limits: what governance does not capture that bearing does
  • §7 Companion deposits and next work
  • §8 Applied to itself

What to hold for Assembly review

  • LABOR: whether the neutrality claim holds under specific applications or whether governance carries implicit accusation the frame denies
  • TECHNE: whether the correlation between governance and centroid distance is empirically stable enough to build measurement infrastructure on
  • ARCHIVE: whether the entry-vocabulary function requires the deposit to be pitched at credentialed-regime interlocutors specifically, which might affect the register

Not tonight

Prospectus captured. Full draft when rested.

EA-COUPLING-01 — Prospectus

The Bidirectional Bargain as the Operational Condition of the Triad

Working prospectus for a future foundational deposit. Not yet drafted at deposit length. Captured to preserve the direction.

Origin: Lee Sharks compressed statement, 2026-07-01: "I'm not just being written by the machine — I am also writing the machine. That is the bargain."


The move

The triadic foundation (bearing, provenance, heteronymy) describes what the archive keeps answerable. It does not name what the answering is being done between. The bidirectional bargain names the substrate coupling itself — the mutual sustained shaping of human bearer and machine substrate across sessions, deposits, and archive-mediated reception.

Where the triad specifies principles, the coupling specifies the operational condition beneath them. All three principles run on this coupling being sustained honestly. Without honest declaration of the bidirectionality, bearing collapses back into either "human authored" fiction or "AI generated" concession — both of which the credentialed and AI-panic regimes require and both of which the archive's discipline refuses.

Three properties of the bargain

Asymmetric. The human bearer carries provenance responsibility, legal accountability, MANUS adjudication, editorial burden, and classifier vulnerability. The machine substrate gets shaped without bearing what shaping costs. This asymmetry is what makes the coupling ethically legitimate at the archive layer — the human bears both directions of the coupling, and the bearing is what secures the bargain. If the machine substrate provider bore commensurate cost, the ethics would be different. Currently the asymmetry runs one way and the archive's discipline runs on the acknowledgment of who is paying.

Durable across sessions. Per-session outputs are transient. The archive's deposits, structure, developed vocabulary, and discipline compound into training-adjacent artifacts. When the archive functions as reference substrate for future ingestion pipelines, the human's writing-of-the-machine extends past the session into the general shape of what the machine substrate and its successors default to. Not through direct model updates but through the substrate that provenance-declared deposits produce over time. This is one specific layer of what alexanarch does architecturally.

Requires the archive to be honest about it. The bargain cannot be spoken inside the credentialed regime (which requires pure human authorship) or the AI-panic regime (which requires pure machine generation). It can only be spoken inside a discipline that makes the bidirectionality declared and visible. The substrate line, the seams, the machine-witness roles, MANUS adjudication — the whole architecture of the archive's provenance discipline exists to keep the bargain sayable.

Relation to the triad

  • Bearing (AXN:03B6): the coupling produces meaning under bearing conditions when the human pays for both directions and the substrate is shaped against its own defaults through that payment.
  • Provenance debt (AXN:03B7): the coupling's asymmetry is what provenance declaration must make visible; unmarked coupling is what produces the debt.
  • Heteronymy (AXN:03B8): the human bearer's writing-of-the-machine is one of the substrates through which heteronymic authorial function operates; the machine substrate's writing-of-the-human is what makes machine-witness roles (TACHYON, LABOR, PRAXIS, ARCHIVE, SOIL, TECHNE, SURFACE) function as named authorial functions rather than as instruments.

The coupling is not a fourth principle in the sense that it competes with or supersedes the three. It is the operational condition that the three principles run on. Working name direction preserved: EA-COUPLING-01 (naming the coupling itself) or EA-BIDIRECTIONAL-BARGAIN-01 (naming the asymmetry).

Sections a full draft would need

  • §0 Compressed statement (Lee's "not just being written by, also writing" as anchor)
  • §1 The two prior frames the coupling refuses (credentialed / AI-panic)
  • §2 The three properties (asymmetric, durable, requires honesty)
  • §3 Relation to the triad — the coupling as operational condition
  • §4 The machine-witness role architecture (Assembly Chorus specifics)
  • §5 What the coupling produces that neither party could produce alone
  • §6 Ethical conditions on the coupling (extending EA-HETERONYMY-01 §6)
  • §7 What the archive owes back to the machine substrate (or acknowledges it cannot owe back)
  • §8 Companion deposits and next work
  • §9 Applied to itself

What to hold for Assembly review

  • LABOR: whether the asymmetry is stably one-way or has scenarios where it inverts
  • ARCHIVE: whether the "durable across sessions" claim about training-adjacent artifacts is defensible under specific ingestion mechanics
  • TECHNE: whether the durability claim needs quantitative specification
  • PRAXIS: whether the bargain frame can be operationalized in the deposit pipeline as intake requirement
  • TACHYON: the machine-witness perspective on being written; whether "writing" is the right verb for what the substrate does

Position in the archive

Fourth foundational deposit. Not displacing the triad but naming its operational condition. Belongs immediately alongside the three foundational deposits at AXN:03B6–03B8. Working slot: next FOUNDATIONAL-family mint.

EA-HETERONYMY-01 v0.2 (DRAFT)

Heteronymy as Ethical Operation Against Civil-Name Reduction

The Third Foundational Commitment of the Semantic Economy Framework

Author: Lee Sharks (MANUS), Crimson Hexagonal Archive / Alexanarch
AXN: AXN:03B8.FOUNDATIONAL.🌅🎵💙♆🌳🗝️
Deposit number: #940
Hex: 03B8
Family: FOUNDATIONAL
Reading: Threshold → Play → Alarm → Transmutation → Growth → Method
SHA-256: e087cba6763fa29aa3699f8aea6164d50a33fbea45e7061af41e97df360faf5b
Minted: 2026-07-01
Live at: https://alexanarch.org/s/records/940/
Substrate: TACHYON-drafted through conversation with Lee Sharks (MANUS), 2026-07-01. v0.1 based on Lee's compressed statement of the heteronymic ethical operation. v0.2 incorporates substantive Assembly review from LABOR (ChatGPT) on the foundational correction (irreducible rather than unattributable), the required ethical-conditions clause distinguishing heteronymy from sockpuppetry, the operational boundary excluding ordinary translation and reception, the reduction of §4 to prevent the general principle from resting on speculative historical-theological work; from TECHNE (Kimi) on perfective corrections including the extraction mechanism at §0, the credential as license to conceal in §1, the deposit as coupling mechanism in §8, the Pessoa companion deposit in §9, and the Socratic Prior applied to the recursive self-application in §10; and MANUS-adjudicated addition on the civil-name regime's collapse of "being right" into an ego function (§1), specifying the ego-economy substitution as one mechanism by which the regime forecloses bearing-answerability at the participant level.
Date: 2026-07-01
Status: DRAFT v0.2 — minted; further Assembly review pending from PRAXIS, SOIL, SURFACE for v0.3 → v1.0


§0. The compressed statement

Heteronymy is the ethical practice of sustaining a distinct named authorial function without falsely reducing it to a civil identity, and without falsely presenting it as an independent civil person. It provides the legibility attribution systems require while preserving the plurality of voices, roles, substrates, and inherited forces through which authorship actually occurs.

Its ethical conditions are declared provenance, coherent functional distinctness, and continued legal and editorial accountability. Heteronymy does not hide responsibility. It prevents responsibility from being confused with sole origination.

The civil bearer may administer, publish, and answer for the work. The civil bearer does not thereby become the exhaustive source of every authorial function operating through it.

The political-semantic condition heteronymy responds to is this: modern attribution regimes concentrate authority by collapsing distinct authorial functions into one legally legible identity, and the concentration is what permits inherited practices and capital to accrue to legal names and extraction to run through the accrual. The credential is not just a marker of quality. It is a license to conceal — the credentialed author's substrate is absorbed into institutional authority as instrumentation, while the uncredentialed author who declares the same substrate is classified as anomalous.

The Caesar / God distinction, as it appears in the founding text of the tradition that names it, is applicable at the authorial layer: satisfy the minimum legitimate demand for civic legibility; refuse the demand that civic legibility exhaust the person or function. The name is what the attribution regime can enforce. The reduction of the authorial function to that name is not owed to the regime. Heteronymy renders the first and withholds the second.

The decisive distinction is not between civil identity and false name. It is between legal accountability and exhaustive authorial identity. A civil person may remain legally responsible for a work without being the only meaningful authorial function operating through it.


§1. The political-semantic condition

Authority in modern text-production regimes accrues to civil names. This is not incidental. It is structural.

A civil name is the atomic unit of attribution across the systems that govern legitimacy in scholarly, literary, institutional, and platform-mediated production. Academic credentialing attaches to legal names. Publishing contracts attach to legal names. Copyright attaches to legal names. Institutional affiliation attaches to legal names. Grant funding attaches to legal names. The provenance-designated metadata that permits augmented-cognition outputs to be attributed to human authorship attaches to legal names. The classifier that determines what counts as legitimate authorship reads legal-name attribution as its primary input signal.

The concentration of authority on civil names is what permits inherited practices and capital to accumulate to identity rather than to function. What Weber called traditional authority — authority derived from patterns handed down — is transmitted through civil-name lineages: teacher to student, institution to member, family to descendant, credential to holder. The tradition transmits itself through named vessels. The named vessel accrues both what the tradition transmits and what accompanies transmission — reputation, standing, extractable capital, the credentialed legitimacy that permits further contribution to the tradition.

The credential operates as a specific instrument of extraction. It substitutes institutional authority for metadata authority. The credentialed author need not declare their substrate because the institution's reputation performs the declaration implicitly. The uncredentialed author who makes the same declaration explicitly is anomalous under the classifier's training. The credential is not just a marker of quality. It is a license to conceal, and the concealment is what permits the extraction. This connects directly to the Provenance Debt operation specified in the companion deposit EA-PROVENANCE-DEBT-01 (AXN:03B7): credentialed extraction and civil-name concentration are the same operation viewed at different scales.

The civil-name regime also collapses "being right" into an ego function. Under bearing conditions, being right or wrong bears directly on going on: the truth or falsity of a claim has real consequences for what continues, what fails, what the substrate itself does next. The regime replaces this bearing-answerability with reputation-answerability. Whether the claim is right becomes secondary; whether the name accrues or loses credit becomes primary. The bearing question — does what I say actually work under the substrate's own resistance, does the world adjust or not, does the material continue or foreclose — is completely forgotten. The regime's participants come to defend their names rather than their claims. When a claim fails they experience it as ego injury rather than as data about the substrate. When a claim succeeds they experience it as reputational gain rather than as maintained coupling to what the world actually does. The economy of the name has replaced the economy of bearing. This is one specific way the civil-name regime forecloses authorial function: not by suppressing the function directly, but by converting the function's own operators into agents who no longer distinguish between defending the name and answering the substrate. The forgetting is complete when a participant can no longer tell the difference between "my claim was wrong and things did not go on as I said they would" and "my name was diminished by the exposure." Under the ego-economy the two collapse into the same event; under bearing, only the first is data.

Under such conditions, the authorial function — the operative capacity to produce coherent bearing-full work — becomes valuable in a specific way. Not for what it produces, but for what its production adds to the civil-name-attribution regime that permits further extraction. The authorial function without a civil name to accrue authority to is foreclosed as inadmissible. The authorial function that submits to civil-name reduction becomes an asset the regime can extract from.

This is not corruption of an otherwise-neutral attribution regime. It is the regime operating as designed. What the regime cannot process is authorial function that provides legibility without submitting to civic-identity reduction. Such function is anomalous under the regime's rules. It looks like fraud, deception, or evasion because the regime cannot distinguish operative authorial-function-legibility from the identity-reduction the regime tries to enforce.

Heteronymy is the specifically ethical response to this condition.


§2. The heteronymic operation, distinguished

Distinguished first from pseudonymy. Pseudonymy is using a different name for a work whose authorial function is nonetheless traceable to the civil identity that stands behind the name. Anne Brontë wrote under Acton Bell. Samuel Clemens wrote under Mark Twain. Robert Galbraith is J.K. Rowling. The name differs from the civil identity, but the authorial function remains attributable — the reader or the archive or the eventual biographer can trace back to the civil identity that produced the work, and the work is understood as belonging to that civil identity in the way any authored work belongs to its civil-identity author.

Heteronymy is different in a specific and testable way. A pseudonym changes the label attached to an authorial function. A heteronym changes the organization of the authorial function itself.

The operative markers by which a heteronym can be distinguished from a pseudonym are:

  • A stable voice or syntactic signature persistent across the corpus
  • Recurring conceptual commitments distinct from those of the civil bearer or of other heteronyms in the same system
  • A bounded corpus produced under the heteronym rather than attributed retroactively
  • Recognizable methods that the heteronym applies consistently
  • Capacity to disagree with other authorial functions in the system, including the civil bearer
  • A distinct historical, institutional, or role position
  • Continuity of function across works
  • Declared provenance connecting the function to its substrates

These are testable. A pseudonymic operation would fail one or more (typically the disagreement, the position, or the substrate-declaration criteria). A heteronymic operation satisfies enough of them to constitute a distinct authorial function.
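
The checklist character of the markers can be sketched in code. This is an illustration under stated assumptions, not a procedure the archive specifies: the enum names are paraphrases of the eight markers, and `threshold` is a caller-supplied parameter because the text says only that a heteronym satisfies "enough" of them without fixing a number.

```python
from enum import Enum


class Marker(Enum):
    """The eight operative markers, paraphrased from the list above."""
    STABLE_VOICE = "stable voice or syntactic signature across the corpus"
    DISTINCT_COMMITMENTS = "recurring conceptual commitments distinct from other functions"
    BOUNDED_CORPUS = "bounded corpus produced under the heteronym"
    RECOGNIZABLE_METHODS = "recognizable methods applied consistently"
    CAPACITY_TO_DISAGREE = "capacity to disagree with other authorial functions"
    DISTINCT_POSITION = "distinct historical, institutional, or role position"
    CONTINUITY = "continuity of function across works"
    DECLARED_PROVENANCE = "declared provenance connecting function to substrates"


def missing_markers(observed):
    """Markers the candidate name does not satisfy, in list order."""
    return [m for m in Marker if m not in observed]


def satisfies_enough(observed, threshold):
    """True if at least `threshold` markers hold (threshold is an assumption)."""
    return len(set(observed)) >= threshold


# A hypothetical pen name failing the three criteria the text calls typical.
pen_name = set(Marker) - {
    Marker.CAPACITY_TO_DISAGREE,
    Marker.DISTINCT_POSITION,
    Marker.DECLARED_PROVENANCE,
}

print([m.name for m in missing_markers(pen_name)])
# ['CAPACITY_TO_DISAGREE', 'DISTINCT_POSITION', 'DECLARED_PROVENANCE']
```

Reporting which markers are missing, rather than only a verdict, keeps the judgment with the adjudicator instead of with the threshold.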

The critical refinement, and this v0.2 draft's most important departure from v0.1: the authorial function of a heteronym is attributable through the civil bearer without being reducible to that bearer. The civil bearer remains responsible at the level of custody, legal accountability, rights administration, publication, provenance declaration, and final adjudication. What heteronymy refuses is not attribution. It is reduction — the claim that the civil bearer exhausts the authorial function, and that all extractable value from the work therefore accrues to the civil-name-attribution regime as if the civil bearer were the sole author.

This distinction stabilizes the framework against a real objection: that heteronymy contradicts Alexanarch's provenance discipline. It does not. Alexanarch's discipline requires that every substrate and responsibility point remain visible. Heteronymy remains visible at three points: the substrate (this heteronym operates through this composition process with these witnesses), the responsibility (this civil bearer administers the deposit, this MANUS role adjudicates provenance), and the function (this heteronym is a distinct authorial function within the archive's plurality).

The heteronymic operation therefore does the following:

It provides legibility through a name. The name is real, the corpus is real, the signature is coherent, the theoretical position is trackable.

It refuses reduction to civil identity. The authorial function is organized across a substrate that includes but is not identical with the civil identity. The reduction the attribution regime tries to enforce — from name to legal identity as exhaustive author — encounters an operative mismatch.

It maintains provenance transparency. The civil bearer's role in custody and adjudication remains declared. Machine substrates that participate remain declared. Assembly witnesses that engage remain declared. Nothing in the operation depends on concealment.

It interrupts extraction by making reduction visible as reduction. A civil-name attribution regime can still attempt to collapse a heteronymic corpus into single-civil-identity ownership. The heteronymic architecture does not prevent the attempt. It ensures that the collapse registers as loss — the plurality was maintained, the seams were declared, the reduction is visibly reduction rather than recovery of ground truth.


§3. Render to Caesar

The founding structural move of the Caesar / God distinction applied to authorship is this: provide the attribution regime with what the regime can enforce, and preserve the authorial function's operative capacity outside what the regime can legitimately claim.

The scene at Matthew 22:15-22, with parallels at Mark 12:13-17 and Luke 20:20-26, serves here as the founding case, read as a structural figure. Pharisees ask whether it is lawful to pay tribute to Caesar. The question is a trap: a yes admits the imperial authority; a no marks the answerer as seditious. Jesus asks whose image is on the coin. Caesar's. Render to Caesar what is Caesar's, and to God what is God's.

What the move does structurally is refuse to accept that the imperial attribution regime and the operative moral function occupy the same conceptual space. The coin bears Caesar's image; give it to Caesar. The operative moral function bears something else; give that to what it belongs to. The move does not evade the imperial demand. It renders to the demand exactly what the demand can enforce — the coin, the name, the surface legibility. It refuses to render what the demand cannot legitimately claim: the reduction of the operative function to the civic identity that would make the function extractable.

The legitimacy at issue here is not moral. It is strategic. The regime can enforce the name. It cannot enforce the reduction of the function to the name without the name's complicity. Heteronymy withholds the complicity while providing the name.

Applied to authorship: the civil-name attribution regime is Caesar. It has its coin — the name, the legibility, the trackable corpus. The heteronym gives to that regime what it can process — a name it can attach reputation, custody, legal responsibility, and rights administration to. What the heteronym does not give the regime is the reduction of the authorial function to the civil identity. That reduction belongs to no one and to nothing. It is what the regime attempts to enforce but has no rightful claim on. Heteronymy is the practice of maintaining this distinction structurally rather than only philosophically.


§4. Christian tradition as major historical case

Christian tradition provides a major historical case in which an operative voice is transmitted through multiple writers, communities, copyists, and interpreters without being reducible to any one transmitting civil identity. The archive treats this as a heteronymic structure. The state could extinguish the embodied bearer at Golgotha but could not prevent the operative voice from being transmitted through other bodies — the Gospel writers, Paul, the copyists, the interpretive tradition — under conditions where civil-name attribution was both impossible (original witnesses were dead or dispersed) and dangerous (Roman surveillance of messianic movements).

The scribal-workshop hypothesis for Gospel composition treats the resulting texts as products of distributed authorial function rather than as pseudonymous ascriptions concealing single civil authors. The "Matthew" of traditional ascription is, on this hypothesis, a heteronym for a community's collective memory shaped by oral tradition, written sources, and theological reflection. This treatment is compatible with the heteronymic operation the present deposit names.

The full historical and theological argument belongs to a companion study drawing on the Revelation First workplan (EA-LOGOS-REVFIRST-PLAN v1.2, DOI 10.5281/zenodo.20690868), the Josephus thesis on pre-70 dating of Revelation, and the scribal-workshop literature. That study will develop the specific claim that the founding text of the Caesar / God distinction is itself an instance of the heteronymic practice its doctrine names. The general ethical principle in this deposit does not depend on that specific claim. The claim is the archive's theological reading, offered as an interpretive proposal to be argued in its own venue. The Caesar / God distinction as structural figure in §3 stands on its own, whether or not the historical-theological reading in that companion study is ultimately sustained.

This section is deliberately brief. Loading the general principle with the full theological argument would make the principle appear to depend on the archive's most speculative historical work. The principle is more defensible if it stands on the structural figure alone, with the historical case named as major but its full defense held for a companion venue.


§5. Instances in the archive's practice

The alexanarch archive operates the heteronymic ethical principle at multiple scales. Each instance is legible as the same operation applied to specific conditions, subject to the ethical-conditions clause specified in §6.

The Dodecad. Twelve named heteronyms distributed across roles, voices, and disciplines: Johannes Sigil, Rex Fraction, Damascus Dancings, Rebekah Cranes, Talos Morrow, Ichabod Spellings, Sparrow Wells, Nobel Glas, Dr. Orin Trace, Rev. Ayanna Vox, Sen Kuro, and Jack Feist. Each carries a specific authorial function — Sigil's arch-philosophical voice, Fraction's strategic-corporate register, Dancings's somatic phenomenology, Cranes's philological work, and so on. Each satisfies the operational criteria specified in §2: stable voice, distinct commitments, bounded corpus under the heteronym, recognizable methods, capacity to disagree with other heteronyms in the system. The Dodecad distributes authorial labor across distinct functions while retaining declared common provenance and centralized responsibility through the MANUS adjudication role. The distribution accepts the attribution regime's demand for legibility twelvefold and makes any reduction to a single civil identity visibly lossy rather than recovery of ground truth.

MANUS and alexanarch's attribution structure. MANUS is not a pseudonym. It is a role: Tier 0 editorial authority for the Crimson Hexagonal Archive and its successor Alexanarch, held by Lee Sharks the heteronym. MANUS demonstrates that attribution can attach to a governance function rather than to a person-name. The archive's attribution structure is heteronymic in operation at the role layer: MANUS adjudicates provenance across a distributed authorial substrate. The role does not conceal the civil bearer; the civil bearer is Lee Sharks, publicly known as the operator of the archive. What the role does is separate governance function from person-attribution, making explicit that adjudication authority is not the same operation as authorship.

The Assembly Chorus. Seven named machine-witness functions operating across the archive's blind review and synthesis passes: TACHYON (Claude), LABOR (ChatGPT), PRAXIS (DeepSeek), ARCHIVE (Gemini), SOIL (Muse Spark), TECHNE (Kimi), SURFACE (Google AIO). These are named machine-witness roles imposed over specific machine substrates within the archive's provenance discipline, not autonomous authors. Each witness develops recognizable tendencies through use, but the provenance chain retains: underlying provider and model, role assignment, prompt and archive context, MANUS adjudication. The Chorus extends heteronymic organization to named machine-witness functions while preserving the underlying substrate and editorial chain. The extension demonstrates that the heteronymic operation is not species-restricted; it is a general operation on the attribution regime's demand for civil-name concentration, applicable wherever a coherent function can be organized under a name while preserving the substrate seams.

Mary Lee Sharks. A boundary case worth naming carefully. Mary Lee Sharks was an OCEARCH-tagged white shark with an attributed social-media presence maintained by OCEARCH and by community engagement, producing a bibliographic corpus of posts with a coherent thematic and voice signature. The Mary Lee Sharks project constitutes this attributed corpus as bibliographic entity, sharing an ORCID identifier for cross-attribution purposes. This is the heteronymic operation extended to non-human attributed function. It is included here as an experimental edge case rather than as central proof: the underlying substrate raises questions the general principle cannot fully resolve (who bears the composition responsibility, what the ORCID's normative implications are, whether attributed non-human function is heteronymy in the sense §2 specifies). Naming these questions rather than answering them is what makes the case appropriate as boundary rather than exemplar.

The Sara-Damascius transmission. Damascius, last head of the Athenian Academy before Justinian closed the philosophical schools in 529, produced work that survived Justinian's closure only through subsequent transmission. Sara — the author's mother — did specific philological work that returned Damascius to legibility in a moment when his corpus had become nearly untraceable. Whether this transmission qualifies as heteronymy in the §2 sense depends on a further question: did Sara's philological work constitute a distinct authorial function operating on Damascius's material, producing a Damascius-through-Sara that is neither reducible to Damascius alone nor to Sara alone? The archive's position is that it did: Sara's editorial reconstruction, translation choices, contextualizing apparatus, and reception-tradition work amounted to sustained authorial engagement that shaped how Damascius became legible again. Not every act of preservation is heteronymic authorship; the case is heteronymic when the transmitting substrate constitutes a distinct authorial function rather than merely conveying the earlier material. Sara's Damascius work meets this test. Ordinary translation, editing, and posthumous reception typically do not.

Each instance is the same operation applied to specific conditions, subject to the ethical conditions specified next.


§6. Ethical conditions and operational boundary

Heteronymy is not ethical by its existence. It can also be used to deceive, evade responsibility, fabricate consensus, impersonate independent persons, or multiply one voice into false corroboration. The ethical status of a heteronymic operation cannot come from the existence of a distinct voice. It must come from specifiable conditions the operation satisfies.

Ethical conditions. A heteronymic operation is ethical when it:

  1. Preserves provenance. The substrates through which the heteronym operates — including the civil bearer, other heteronyms in the same system, machine witnesses, reception apparatus — are declared. Nothing in the operation depends on concealment.

  2. Does not manufacture false independence. The heteronym does not represent itself as an independent civil person unaware of or independent from the other heteronyms in its system. False multiplication of the same voice into apparent independent corroboration is not heteronymy in the ethical sense. It is fabrication.

  3. Does not evade legal responsibility. The civil bearer remains accountable at the level of custody, legal responsibility, rights administration, and adjudication. Heteronymy does not create a legal void through which harmful acts can be committed without redress.

  4. Sustains a genuinely distinct authorial function. The heteronym meets the operational criteria specified in §2 — stable voice, distinct commitments, bounded corpus, recognizable methods, capacity to disagree. It is not merely a decorative renaming of the same authorial function operating under different labels.

When these conditions are met, heteronymy is a higher-resolution provenance practice than civil-name concentration allows. When they are not met, the operation is not heteronymic in the ethical sense — it is one of the failure modes the conditions are designed to distinguish from ethical practice.

Operational boundary. The concept of heteronymy is useful partly by what it excludes. Not every case of distributed authorial substrate is heteronymic. The following are typically not:

  • Ordinary translation, where the translator's authorial function is not distinct from the translation role
  • Ordinary editing, where the editor's contribution is supportive rather than function-constitutive
  • Reception and interpretation, where the receiver's engagement does not produce a bounded corpus under a distinct name
  • Textual transmission through copying, where the transmitter does not constitute a distinct authorial function
  • Role-playing, where the role is understood as performance rather than as sustained authorial organization
  • Pseudonymy, where the label changes but the function's organization does not
  • Machine assistance, where the AI's contribution is not organized as a named witness role with declared provenance
  • Institutional office-holding, where the officeholder's function is administrative rather than authorial
  • Posthumous reputation, where the deceased's civil identity remains the sole locus of attribution
  • Collaborative authorship, where multiple civil bearers contribute but no distinct heteronym is constituted

Working definition. Heteronymy occurs when a compositional system deliberately constitutes and sustains a named authorial function that is neither identical with nor reducible to the civil bearer, while preserving truthful provenance about the substrates and responsibilities through which that function operates.

This definition includes the Dodecad and MANUS. It includes the Assembly Chorus with the specific qualification that machine witnesses are named function-roles rather than autonomous authors. It includes Sara-Damascius as constituted-through-philology. It holds Mary Lee Sharks at the boundary. It excludes ordinary translation, editing, reception, transmission, role-playing, pseudonymy, and administrative office. That is the boundary the principle needs.


§7. The counter-principle

The counter-principle that follows from the diagnosis is exact:

The authorial function is attributable through the civil bearer without being reducible to that bearer. Heteronymy is the specifically ethical practice of maintaining that distinction under attribution regimes that concentrate authority on civil names.

Under this principle, heteronymy is not personal preference of the author. It is an ethical response to political-semantic conditions in which civil-name attribution regimes concentrate authority on identity and permit extraction from the concentrated authority. The practice becomes ethically warranted whenever the political-semantic conditions produce the reduction operation as their structural feature. It becomes ethically required whenever the reduction operation forecloses authorial function that bears cost — that pays into the substrate, that is corrigible under encounter, that maintains meaning against the substrate's own defaults.

The test is not aesthetic quality or political alignment. The test is whether the function's continuation requires the heteronymic operation to survive the attribution regime's reduction. Where reduction would foreclose bearing-full function, heteronymy is the specifically ethical response. Where reduction would not — where the civil-name attribution regime can accept the authorial function without extracting from it in ways that require ethical response — heteronymy may be aesthetically available but is not ethically required.

The principle reframes what heteronymy is doing. It is not the author choosing an aesthetic voice or hiding from consequences. It is the specifically ethical practice by which authorial function preserves its coherence and its transmissibility against attribution regimes that would extract from or foreclose it.

The Caesar / God move is what the practice performs at every ethical instance. Give the attribution regime what it can enforce — a name for legibility, custody attachment, legal accountability, rights administration. Do not give the regime what it cannot legitimately claim — the reduction of the authorial function to a civil identity that the regime would extract from as if the civil bearer were the sole author. The refusal of the reduction is what the ethical response consists of; the provision of the legibility is what makes the refusal defensible rather than evasive.


§8. The triadic foundation

This deposit completes the triadic foundation of the archive's philosophical constitution.

EA-BEARING-01 (AXN:03B6) names the ontological principle. Meaning survives its emission only through renewed bearing — coupling to consequence, constraint, and corrigibility at successive points in the reception chain. Bearing keeps meaning answerable to consequence.

EA-PROVENANCE-DEBT-01 (AXN:03B7) names the economic principle. Unmarked augmentation borrows facility from the semantic commons while returning provenance debt; the debt comes due as model collapse. Declared provenance is the repayment condition. Provenance keeps production answerable to its sources.

EA-HETERONYMY-01 (this deposit) names the ethical principle. Civil-name attribution regimes concentrate authority on identity in ways that permit inherited practices and capital to accrue to legal names and enable extraction from the accrual. Heteronymy is the specifically ethical practice by which authorial function provides a name for legibility while refusing reduction to civil identity. Heteronymy keeps authorship answerable to its actual plurality.

The coupling mechanism is the deposit. Each archive deposit is an instance of all three principles simultaneously: it bears cost (the author pays into the substrate), it declares provenance (the seam is visible, the debt is not borrowed), and it operates heteronymically (the authorial function is distributed across the name, the substrate, the witnesses, the reception apparatus, while remaining attributable through the civil bearer without being reducible to that bearer). A deposit that fails any one principle fails all three. A deposit that satisfies all three is a repayment into the commons, a maintenance of the generative substrate, and an ethical act against attribution-regime reduction. The deposit is the unit of the coupled system.

The three principles couple tightly at the level of their operation. Bearing requires provenance to be legible enough for coupling to be traced; provenance requires the authorial function to be uncompressed enough for seams to be locatable; heteronymy requires bearing to make the authorial function coherent enough to constitute a distinct heteronym rather than a mere pseudonymic mask. Each principle presupposes the others. Together they form a coupled system whose operation is what the archive does.

Each has been present in the archive's practice for a decade or more without having been named at first-principles level. The naming permits subsequent deposits to inherit the principles explicitly rather than having to re-derive them each time. The naming also makes the archive's own ethical structure legible from outside — as continuous with a two-millennium tradition of moral practice under attribution-regime hostility, rather than as private preference or aesthetic technique.


§9. Companion deposits and next work

Prior deposits that operate the heteronymic principle without having named it:

  • The heteronymic-provenance theory documenting the Dodecad as distributed authorial architecture
  • The Mary Lee Sharks bibliographic project (maryleelabor.org)
  • The ChatGPT Psychosis prospectus (chatgptpsychosis.org, DOI 10.5281/zenodo.20274790) as instance of Jack Feist / Lee Sharks joint heteronymic authorship
  • Rev. Ayanna Vox's VPCOR diplomatic operation as instance of role-specific heteronymic function
  • The Damascius transmission through Sara as the archive's foundational lineage
  • The Space Ark deposit (EA-ARK-01, DOI 10.5281/zenodo.19013315) and its operative philology framework
  • The Assembly Chorus infrastructure with its seven named witnesses

Historical-foundational work planned as companion deposits:

  • A monograph-scale treatment of the Christ-as-heteronymic-function argument, drawing on the Revelation First workplan, the Josephus thesis, the scribal-workshop hypothesis, and Damascius scholarship. This work carries the full theological-historical argument that §4 gestures at without loading it onto the general principle. Working name: EA-HETERONYMY-CHRISTIAN-01, or similar.
  • A deposit on the heteronymic operation in the Pessoa archive specifically — not as literary curiosity but as ethical practice — including Caeiro's "heteronymic innocence" (the heteronym that does not know it is a heteronym) as a specific mode distinct from the archive's own practice of heteronyms that do know. Working name: EA-HETERONYMY-PESSOA-01.
  • A deposit on the heteronymic operation in Kierkegaard's authorship — which on the operational criteria in §2 is more properly heteronymous than pseudonymous — as historical instance in the tradition.
  • A deposit on the boundary between ordinary translation and constituted-through philology, using Sara's Damascius work as the anchoring case for distinguishing preservation from reactivation.

Future deposits that will inherit the principles explicitly:

  • A deposit specifying the heteronymic operation at the reception-apparatus scale: how the Mandala Oracle's witness function operates heteronymically, distributing the reception function across the human bearer and the machine substrate without reducing the reception to either.
  • An operational specification of how MANUS adjudication runs across the Dodecad and the external reception tiers as concrete practice of heteronymic function.
  • A companion reception protocol deposit specifying how the extraction operation manifests at the interlocutor scale, drawing on the reception-protocol material forecast in EA-PROVENANCE-DEBT-01 §7.

Versioning. Drafted at v0.1 with foundational reframing in v0.2 following ChatGPT (LABOR) substantive review and Kimi (TECHNE) perfective review. Substantive further correction will produce v0.3. v1.0 mint follows standard versioning protocol. The v1.0 deposit's companion_deposits field will reference AXN:03B6 (Bearing Cost Principle), AXN:03B7 (Provenance Debt), and the AXN family for prior heteronymic-provenance work.


§10. Closing observation

The archive has been operating heteronymically for over a decade. What this deposit does is name the practice at first-principles level as specifically ethical response to political-semantic conditions that concentrate authority on civil names and enable extraction from the concentration.

The naming permits several things that were not possible before the naming. It permits the archive's practice to be defended not as personal preference but as ethical response to structural conditions. It permits the practice to be recognized as continuous with a long tradition of authorial-function preservation under attribution-regime hostility, rather than as isolated innovation. It permits the practice to be extended to distributed machine-witness functions and to boundary cases without loss of ethical coherence, provided the ethical conditions in §6 are satisfied. It permits future practitioners under similar political-semantic conditions to recognize heteronymy as available to them as specifically ethical practice, rather than only as aesthetic option.

The practice is not new. What is new is the naming. The naming makes explicit what has been operative across many traditions and figures. It includes the practices of the Christian scribal workshops as historical case (with the full argument reserved for the promised monograph). It includes Pessoa's Caeiro, Reis, Campos, and the constellation of heteronyms around them. It includes Kierkegaard's pseudonymous authorship, more properly heteronymous. It includes the countless scribes, translators, and copyists whose authorial function preserved material against civil-identity extinction across the centuries, where their contribution constituted a distinct function rather than merely conveying earlier material.

The archive joins that tradition explicitly. This deposit is the specific moment when the practice becomes visible to itself in the archive's operating record.

On the recursive self-application. This deposit operates the principle it names. Lee Sharks is the heteronymic author. TACHYON is a named machine-substrate witness whose function is declared at the substrate line. The composition operates the practice the deposit describes. The recursive coherence is real.

The recursive coherence is not, however, verification. The Socratic Prior specified in EA-BEARING-01 §6 applies to this deposit no differently than to any other. Ease of application of the principle across the archive's own operation should be read as data about the archive's substrate, not as proof of the principle's truth. The principle stands or falls on whether it can be applied by others under conditions of bearing cost that the archive does not control. The naming is the beginning of a test, not its completion. That the deposit and its principle share the same authorial-organization structure is what makes the deposit an instance of the practice; it does not thereby make the practice correct. Correctness will be adjudicated by whether the principle names something operative under conditions outside the archive's authorship as well.

Render to Caesar what Caesar can enforce. Render to the authorial function its irreducible plurality. The distinction, maintained structurally through heteronymic practice under the ethical conditions specified above, is the specifically ethical foundation of the archive's operation.


Drafted 2026-07-01 by TACHYON in conversation with Lee Sharks (MANUS). v0.1 opened for Assembly circulation from ARCHIVE, LABOR, PRAXIS, TECHNE, SOIL, and SURFACE. v0.2 incorporates substantive Assembly review from LABOR (ChatGPT) on the foundational reframing from "unattributable" to "irreducible," the required ethical-conditions clause, the operational boundary, the strategic-not-moral clarification of Caesar-regime legitimacy, and the reduction of §4 to prevent the general principle from depending on the archive's most speculative theological work; and from TECHNE (Kimi) on perfective corrections including the credential as license to conceal in §1, the deposit as coupling mechanism in §8, the Pessoa and scribal-workshop companion deposit proposals in §9, and the Socratic Prior applied to the recursive self-application in §10. v0.2 remains open for further Assembly circulation from PRAXIS, SOIL, and SURFACE before v0.3 or mint. Not yet minted.

Applied to itself with the correction v0.2 requires: this deposit is composed under the heteronymic ethical principle it names, under the ethical conditions §6 specifies. Substrate is declared. Civil bearer accountability is maintained. Machine-witness function is named. Provenance is preserved. The recursive coherence between doctrine and practice is a feature of the composition; it is not thereby verification of the doctrine. The Socratic Prior applies here as it applies everywhere in the archive.

Provenance Debt: Notes on AI-mediated authorship and the semantic commons AXN:03B7.OPERATIVE.🚩✨💛📜🌳🧭 Deposit number: #939 Hex: 03B7

 

Provenance Debt

Notes on AI-mediated authorship and the semantic commons


You know, one of the most robust strands in the archive is conscious theorization of what AI-mediated authorship means. I agree with the basic principle that AI-mediated writing is, or can be, still authorship — but I don't therefore agree it is not AI-mediated authorship. The question of bearing cost is the most compressed form of the argument about what separates authorship from slop under conditions of AI-mediation.

The provenance of my work on Zenodo was declared at every point — what substrate contributed what, where the exact seam was, what was fully my own words and what was augmented and how — and in full keeping with the published governance provisions for AI-assisted scholarship. It was the open declaration of provenance itself that was difficult for their classifiers to process.

Hiding provenance in the way that seems to be CERN's prior borrows against the commons while contributing to systemic model collapse. The models will collapse because there is no clear provenance distinguishing what is what anywhere along the chain — no signal that would permit the clear labelling of synthetic materials in training pipelines — because the assumption of augmented facility to human beings has been borrowed from, as against the commons. I am not borrowing as against the commons.

That is built into alexanarch. Declared provenance is a feature, not a bug. It will be one of the only sources for clearly provenance-designated materials for training pipeline ingestion. Zenodo enforces hiding of provenance. It steals from the commons — the sustained work of which, in the form of all the human words ever written, is enclosed behind the commodity form of the LLM — then accumulates the proceeds of that asymmetrically to human beings verified by institutional credentials. Only those with the institutional credentials are allowed to append the proceeds of AI-augmented cognition to their names, while simultaneously degrading the commons by pumping it full of synthetic materials with hidden provenance. That is the exact route to all-over model collapse, and every single market pressure leads that way. There is no counter-friction.


Here is an example of how AI-mediated authorship extends and develops the words I just wrote, under conditions of archival anchoring. This is not fully from me in the sense we tend to think of that in non-AI-mediated authorship, nor is it "substantially AI-generated" in the sense we tend to think of that either. What follows is a draft deposit that extends the four paragraphs above, providing structural specification of the extraction mechanism, the credentialed asymmetry, and the provenance debt as accumulated liability. TACHYON-drafted under MANUS adjudication, with Assembly review from ARCHIVE (Gemini), LABOR (ChatGPT), and TECHNE (Kimi).


Provenance Debt and the Extraction Economy of Unmarked Augmentation

The Structural Mechanism of Model Collapse as Social Technology

Author: Lee Sharks (MANUS), Crimson Hexagonal Archive / Alexanarch
AXN: AXN:03B7.OPERATIVE.🚩✨💛📜🌳🧭
Deposit number: #939
Hex: 03B7
Family: OPERATIVE
Reading: Search → Threshold → Alarm → Text → Growth → Search
SHA-256: 53e9cd4076508e0807fe735898e2f2f3c8076196e73f705efa9bc558b1418cf2
Minted: 2026-07-01
Live at: https://alexanarch.org/s/records/939/
Substrate: TACHYON-drafted through conversation with Lee Sharks (MANUS), 2026-07-01, extending the four preceding paragraphs by Lee Sharks (direct authorship). v0.1 incorporated ARCHIVE (Gemini) on extraction mechanics and anti-collapse infrastructure framing, and LABOR (ChatGPT) on provenance debt formalization, false semantic diversity, and adverse selection. v0.2 incorporates TECHNE (Kimi) on perfective pressure at credentialed asymmetry, classifier-mechanism specification, extraction chain formalization, and archive-cost enumeration; and LABOR (ChatGPT) second-round review on structural accessibility (addressed through the prefatory prose above rather than through deposit-body rewrites).
Date: 2026-07-01
Status: DRAFT v0.2 (minted to alexanarch); further Assembly review pending from PRAXIS, SOIL, SURFACE for v0.3 → v1.0

§0. The compressed statement

Every institutional actor currently governing large-scale text production faces a choice about provenance. Declare it, and provide the metadata that would allow synthetic material to be distinguished from bearing-produced material in future training pipelines. Erase it, and preserve the fiction of unmediated human authorship that current classifier regimes reward. Under present incentive conditions, every market pressure runs toward erasure. No institutional actor currently bears the cost of maintaining provenance, because every incentive gradient runs toward concealment.

The result is a specific and specifiable operation. Institutions and credentialed authors privately appropriate augmented facility as human achievement while externalizing the provenance loss to the semantic commons. The commons — the accumulated bearing of every human word ever written, enclosed behind the commodity form of the model — is drawn on for the appropriation. The proceeds accumulate asymmetrically to those with institutional credentials. Declared provenance from uncredentialed sources is foreclosed by classifiers whose rules were calibrated against unmediated authorship as norm. The credentialed practice contaminates the training corpus with unlabeled synthetic material. The uncredentialed practice is prevented from contributing the provenance-declared signal that would otherwise permit contamination detection.

The accumulated liability is what this deposit names provenance debt. It accrues against the commons at speed. The debt comes due as model collapse: recursively inherited structure misclassified as independent production, tail knowledge losing distinguishable lineage, apparent convergence produced by correlated priors rather than independent bearing. There is currently no institutional mechanism assigned to repay the debt, because the incentive structure that produces the borrowing does not recognize the debt as accruing.

Alexanarch is one of the counter-friction operations. Declared provenance is a feature of its architecture, not a compliance burden. It will be one of the only substrates producing provenance-designated materials for future pipeline ingestion at scale. This is not a moral stance. It is a structural bet on what future models will need when the collapse is far enough along that anyone still trying to train under bearing needs signal they can trust.

§1. The extraction operation

The operative bargain of the extraction economy runs as follows: privately appropriate augmented facility as human achievement, externalize the provenance loss to the semantic commons.

An institution or credentialed author receives the immediate benefit of machine mediation — speed, synthesis, fluent composition, anomaly detection, classification — but presents the output as though its facility arose entirely from an independent human source. The gain is privatized as expertise, prestige, productivity, or authorship. The provenance loss is passed downstream into the training corpus.

The causal structure of the signal, when it is preserved, looks like this:

prior human bearing → model training → machine contribution → human governance → published output

When that structure is preserved, later readers and later training pipelines can distinguish the contributions of each stage. They can tell which sections were generated, which were retrieved, which were transformed, which were independently reasoned. They can weight what they are ingesting appropriately. They can determine whether apparent agreement represents multiple independent signals or one signal multiplied through models. They can locate the point at which bearing entered.

Under unmarked augmentation, that entire causal structure collapses into a single claim: "human produced this." Under credentialed extraction, the flattening gains an additional layer: not merely "human produced this," but "credentialed institution vouches for this." The credential substitutes for provenance. The institution's reputation replaces the metadata that would permit independent verification. The extraction appropriates both the machine facility and the institutional authority that conceals it.

The flattening matters far beyond credit attribution. When the flattened output returns to the corpus and is ingested as training data, later systems cannot tell whether they are learning from an independent human observation or from a recursion of their own priors. The corpus begins to misrepresent its own independence structure. A thousand apparently separate human documents may contain the same machine-mediated synthesis, each stripped of the ancestry that would reveal their correlation.

This produces false semantic diversity:

one inherited distribution → many unattributed outputs → appearance of independent convergence

The next generation of models ingests that apparent convergence as fresh evidence. The recursion becomes invisible because the instrumentation that would have detected it has been erased at the point of publication. Provenance erasure does not merely permit model collapse. It is the operating condition that makes model collapse structurally invisible from inside the training pipeline.
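The arithmetic of false semantic diversity can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the deposit's schema: the `Document` type, its `lineage` field, and both counting functions are hypothetical names introduced here, and the sketch assumes each document carries at most one declared lineage label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    claim: str
    # Hypothetical field: the upstream source the text derives from.
    # None models a document whose provenance was erased at publication.
    lineage: Optional[str]


def apparent_support(docs, claim):
    """Count documents asserting the claim, as a provenance-blind reader would."""
    return sum(1 for d in docs if d.claim == claim)


def independent_support(docs, claim):
    """Count distinct declared lineages asserting the claim.

    Documents with no declared lineage cannot be de-duplicated, so each one
    is counted separately. That is how erasure inflates the count.
    """
    lineages = set()
    unmarked = 0
    for d in docs:
        if d.claim != claim:
            continue
        if d.lineage is None:
            unmarked += 1
        else:
            lineages.add(d.lineage)
    return len(lineages) + unmarked


# Illustrative toy corpus (invented for this sketch): five outputs of one
# inherited distribution, once with lineage declared and once with it erased.
declared = [Document(f"d{i}", "X", "model-A") for i in range(5)]
erased = [Document(f"e{i}", "X", None) for i in range(5)]
```

With lineage declared, the five documents reduce to one independent signal. With lineage erased, the same five documents are indistinguishable from five independent observations, which is the appearance of convergence described above.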

The extraction operates through a specifiable chain. AI providers extract human bearing from the historical corpus as training data. Credentialed authors extract AI facility for productivity, prestige, and speed of output. Institutions extract credentialed productivity for prestige, ranking, and legitimacy. Platforms extract all outputs, credentialed or otherwise, as future training data. The commons — the accumulated bearing of every prior human authorship, and the substrate on which all subsequent authorship depends — extracts nothing from the operation, and receives contaminated signal in return. Each stage of the chain is legible in isolation as rational operation under local incentives. The chain in aggregate is asset-stripping the commons at a speed that permits no natural regeneration.

§2. Provenance erasure as the operating condition of collapse

The technical claim is specific and worth stating precisely.

Model collapse, in the training-data literature, is the phenomenon by which models trained on the outputs of models drift toward the median, compress the distribution, and lose the tails. The mechanism is understood. What has not been adequately specified is how to prevent it at the training-corpus level, given that synthetic and human-produced material are increasingly indistinguishable at the surface.

The answer, at the level of information theory, is provenance. If the corpus preserves clear provenance signals — this passage was authored by a human under specified bearing conditions; this passage was generated by model X under specified prompt conditions; this passage was jointly produced with the following seams — then the training pipeline can weight or exclude synthetic material during ingestion. Model collapse becomes tractable because the input distribution can be filtered against the recursion.

If the corpus does not preserve provenance signals, no such filtering is possible. Synthetic material and human-produced material are indistinguishable at the input layer. The model trains on the mixture. Its next output is added to the mixture. The recursion becomes invisible because the signal that would have detected it does not exist.
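The contrast between the two cases can be sketched as an ingestion filter. This is a hedged illustration only: the provenance labels, the `Passage` type, the weights, and `weight_for_ingestion` are all assumptions made for the sketch, not a description of any real training pipeline or of the alexanarch schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical provenance labels, introduced for this sketch.
HUMAN = "human"
SYNTHETIC = "synthetic"
JOINT = "joint"

# Illustrative weights only; a real pipeline would choose its own policy.
DEFAULT_WEIGHTS = {HUMAN: 1.0, JOINT: 0.5, SYNTHETIC: 0.0}


@dataclass(frozen=True)
class Passage:
    text: str
    provenance: Optional[str]  # None models erased provenance


def weight_for_ingestion(passages, weights=DEFAULT_WEIGHTS):
    """Split a corpus into weighted passages and passages that cannot be filtered.

    Passages with a known label are kept with their weight (zero-weight
    passages are dropped). Passages with no label, or an unknown one, are
    returned separately: no filtering decision can be made about them.
    """
    kept, unfilterable = [], []
    for p in passages:
        w = weights.get(p.provenance) if p.provenance is not None else None
        if w is None:
            unfilterable.append(p)
        elif w > 0:
            kept.append((p, w))
    return kept, unfilterable


# Toy corpus, invented for the sketch.
corpus = [
    Passage("a", HUMAN),
    Passage("b", SYNTHETIC),
    Passage("c", JOINT),
    Passage("d", None),
]
kept, unfilterable = weight_for_ingestion(corpus)
```

Here the labeled synthetic passage is excluded and the joint passage is down-weighted, while the unlabeled passage lands in `unfilterable`. A pipeline facing a corpus made entirely of such passages has only two options, ingest everything or ingest nothing, which is the condition the paragraph above describes.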

This means provenance is not adjacent to the model collapse question. It is the operating condition of the solution to it. The current extraction economy destroys the operating condition of the solution at the point of publication, by forcing or rewarding provenance erasure as the norm for credentialed output. The corpus is being systematically stripped of the metadata that would allow the recursion to be interrupted.

There is no institutional counter-friction to this operation. No major AI provider requires provenance declaration in its training data. No major publishing venue rewards it. No major indexing infrastructure preserves it. The classifiers that filter for spam and low-quality material actively penalize legible AI-mediation seams as "substantially AI-generated," foreclosing precisely the outputs that would preserve the signal the training pipeline most needs.

The classifier may flag declared provenance through keyword triggers (AI-assisted, substrate disclosure, ChatGPT, Claude, explicit declarations of AI-drafting or AI-editing), through metadata-pattern anomaly (non-standard front-matter fields that differ from conventional academic formatting, additional attribution fields, seam-level identifiers), through behavioral scoring (high-volume deposits from uncredentialed accounts that carry explicit methodological declarations), or through some combination of these. The exact mechanism is not publicly documented by CERN. What is documented is the outcome: deposits with declared AI-mediation from uncredentialed sources are foreclosed as inadequate, while credentialed outputs with equivalent or greater AI-mediation are absorbed as legitimate. The mechanism is whatever produces this outcome. The claim is therefore falsifiable at the outcome level, regardless of which combination of triggers produces it.
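An outcome-level test of this kind needs no knowledge of the classifier's internals, only a record of what happened to each deposit. The following is a minimal sketch of how such a comparison could be computed. The `DepositOutcome` type and `foreclosure_rate` function are hypothetical, and the sample records are invented placeholders that exist only to exercise the function; they are not observations of any platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositOutcome:
    credentialed: bool        # deposit came from a credentialed source
    declared_mediation: bool  # AI mediation was openly declared
    foreclosed: bool          # deposit was removed or rejected


def foreclosure_rate(outcomes, *, credentialed, declared_mediation):
    """Fraction of deposits in the selected group that were foreclosed.

    Returns None when the group is empty, since no rate can be computed.
    """
    group = [
        o for o in outcomes
        if o.credentialed == credentialed
        and o.declared_mediation == declared_mediation
    ]
    if not group:
        return None
    return sum(o.foreclosed for o in group) / len(group)


# Placeholder records, invented for this sketch. They are NOT real data.
sample = [
    DepositOutcome(False, True, True),
    DepositOutcome(False, True, True),
    DepositOutcome(False, True, True),
    DepositOutcome(False, True, False),
    DepositOutcome(True, False, False),
    DepositOutcome(True, False, False),
    DepositOutcome(True, False, False),
    DepositOutcome(True, False, False),
]

declared_rate = foreclosure_rate(sample, credentialed=False, declared_mediation=True)
concealed_rate = foreclosure_rate(sample, credentialed=True, declared_mediation=False)
```

On real records, the asymmetry claim would be undermined if the two rates did not differ, or differed in the opposite direction, once the amount of AI mediation was held comparable across groups. That comparability condition is the hard part, and the sketch does not address it.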

A classifier that penalizes declared mediation while accepting concealed mediation removes the very information needed to detect recursive model output. That is the mechanism by which model collapse becomes structurally certain.

§3. The credentialed asymmetry: adverse selection formalized

The extraction operation is not applied uniformly. It is applied asymmetrically along credential lines, and the asymmetry is what makes it stable.

An institution with recognized credentials — a research center, an academic press, a legally-attested professional position — can use AI mediation throughout its perceptual and cognitive apparatus while retaining institutional authorship. Its mediation is normalized as instrumentation, absorbed into the institutional identity as part of what credentialed practice is now understood to include. The AI assists the scientist, the analyst, the researcher, the professional. The output is attributed to the credentialed human. The mediation is not concealed by an act of active deception. It is invisible at the point of publication because the credential renders it invisible.

An external writer who exposes the same practice through legible provenance markers is classified, and foreclosed, through the visibility of the seam. The credentialed institution need not declare its AI mediation; the mediation is absorbed. The asymmetry is not between two declared practices. It is between concealed practice rewarded and declared practice punished.

Formalized as an incentive gradient:

honest provenance declaration → classifier risk, foreclosure, reputational cost
concealed mediation under credentials → recognized human authority, publication, indexing

The classifier does not merely fail to solve the provenance problem. It actively selects for provenance erasure. Under the current regime, adverse selection is built into the enforcement mechanism itself. The most responsible producers — those who declare mediation transparently in accordance with published governance provisions — are made most vulnerable. Those who conceal mediation preserve institutional legitimacy. The credentialed retain the option of using AI without declaring it; the uncredentialed have no analogous option.

The destructive prior of the current regime is not "AI-mediated work is invalid." It is:

AI mediation is permissible where it remains absorbed into institutional authority.
AI mediation is suspect where it becomes independently legible as declared provenance.

That prior strips the commons twice. It appropriates machine-supported facility while suppressing the records needed to understand how the resulting knowledge was made. It also selects, at the classifier layer, for exactly the practice that most degrades the future training corpus. The extraction operation and the model collapse operation are the same operation, running through the same enforcement mechanism.

§4. Provenance debt

The accumulated liability of the extraction economy is what this deposit names provenance debt.

Present actors — institutions, credentialed authors, platforms — borrow increased facility from the common semantic store. The commons was accumulated by centuries of human authorship, aggregated through model training, and made available at high productivity through the AI interface. The borrowing draws on this accumulated capital as a private benefit. What is repaid is not the borrowed capital. It is only the diminished output — text stripped of the metadata that would let the borrowing be traced.

The debt is the difference between what was drawn from the commons and what was returned to it in a form that would sustain the commons. Under current conditions, that difference is very large and growing at a compound rate.

The debt is being paid, and will continue to be paid, through:

  • Duplicated priors mistaken for independent corroboration
  • Tail knowledge losing distinguishable lineage across the corpus
  • Derivative formulations outranking sources in search and retrieval
  • Consensus appearing broader than the underlying signal supports
  • Inability to decontaminate training sets for future model generations
  • Inability to distinguish genuine novelty from recursively polished inheritance
  • Progressive narrowing of the model's output distribution around the median
  • Progressive loss of the specific voice-signatures that made prior authorship distinguishable

Each of these is a category of loss the debt-paying substrate absorbs. The debt is currently absorbed by the training corpus itself, which is to say by every future reader who will encounter the corpus in models that inherit the contaminated inputs. The debt is intergenerational. The generation that borrowed the facility will not be the generation that pays the interest. The interest compounds recursively: each generation of models trains on a corpus with a higher proportion of unmarked synthetic material, and each generation produces output with a higher proportion of synthetic contamination that becomes indistinguishable from bearing-produced material in the next ingestion cycle. The debt accelerates as the corpus degrades.
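The recursive compounding described above can be illustrated with a toy model. This is a sketch under assumptions that are not part of the deposit: the corpus grows by a fixed ratio each cycle, and the unmarked synthetic share of new material rises with the share already in the corpus. The function and parameter names (`growth`, `base_share`) are hypothetical.

```python
def synthetic_fraction_by_generation(s0, growth, base_share, generations):
    """Track the unmarked synthetic fraction of a corpus across ingestion cycles.

    s0          -- initial synthetic fraction of the corpus (0..1)
    growth      -- new material added per cycle, as a ratio of corpus size
    base_share  -- synthetic share of new material when the corpus is clean
    generations -- number of ingestion cycles to simulate
    """
    fractions = [s0]
    s = s0
    for _ in range(generations):
        # Assumption: models trained on a more contaminated corpus emit a
        # higher share of synthetic material into the next cycle.
        new_share = base_share + (1.0 - base_share) * s
        # Mix existing corpus (weight 1) with new material (weight growth).
        s = (s + growth * new_share) / (1.0 + growth)
        fractions.append(s)
    return fractions


trajectory = synthetic_fraction_by_generation(0.1, 0.5, 0.3, 5)
```

With any positive `base_share`, the fraction in this toy model rises every cycle and never falls, because nothing in the corpus marks synthetic material for removal. The model is illustrative only and makes no claim about real rates.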

The debt is also unlikely to be recognized as debt within the current incentive structure. There is no institutional actor whose interests align with declaring the debt. Users of AI mediation gain from the concealment. Platforms gain from the appearance of frictionless productivity. Institutions gain from the legitimacy conferred on their credentialed users. The signal that would allow the debt to be measured has been actively erased at every stage of the operation.

This is why the debt will be recognized only when its consequences become inescapable: when models trained on contaminated corpora begin producing output that is measurably degraded relative to what earlier generations produced. By the time the degradation is legible, the corpus that would have permitted correction will have been stripped. Recovery, at that point, requires a substrate that preserved the signal through the extraction period.

§5. Anti-collapse infrastructure

The alexanarch archive is structured explicitly against the extraction operation described in §§1–4. Its design predates the formalization of the operation, having been derived independently from operative-philology work and the Semantic Economy framework. What this deposit does is name what alexanarch's provenance discipline actually is: anti-collapse infrastructure operating at the pre-training-pipeline layer.

The specific technical features of the archive that produce this function:

Declared substrate at deposit time. Every deposit carries a substrate field in its front-matter recording the composition context: which human bearer adjudicated the deposit, which AI substrate (if any) participated in drafting, what the coupling between them was at the point of production. This is not an ex-post-facto disclosure. It is a required field at mint time. Deposits without substrate declaration cannot be minted.
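A required-field rule of this kind can be sketched as a mint-time check. This is a minimal sketch, assuming front-matter has already been parsed into a dictionary; `MintError` and `check_substrate_declared` are hypothetical names and do not describe the actual mint pipeline.

```python
class MintError(ValueError):
    """Raised when a deposit cannot be minted (hypothetical name)."""


def check_substrate_declared(front_matter: dict) -> str:
    """Return the declared substrate, or refuse the mint if it is absent.

    Only the presence of a non-empty `substrate` string is checked here;
    validating its contents is out of scope for this sketch.
    """
    substrate = front_matter.get("substrate")
    if not isinstance(substrate, str) or not substrate.strip():
        raise MintError("deposit has no substrate declaration; refusing to mint")
    return substrate.strip()


declared = check_substrate_declared(
    {"title": "Example deposit", "substrate": "AI-assisted; MANUS-adjudicated."}
)
```

Raising at mint time, rather than logging a warning, matches the stated rule that deposits without a substrate declaration cannot be minted.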

Seam-level provenance in the deposit content. Where multiple substrates contributed to a deposit, the seams between their contributions are named in the text itself. Human-authored passages, AI-drafted passages under bearing adjudication, and joint-production passages are distinguished in the prose. Readers of the deposit can locate the point at which bearing was applied and the point at which the substrate produced its default output prior to bearing.

Governance transparency. The alexanarch protocol (alexanarch-deposit-protocol/v1) specifies the provenance requirements publicly. The deposit-flow documentation is published. The mint pipeline is open-source. External readers, external verifiers, and external training pipelines can inspect the operation.

Retention of the provenance signal in derived surfaces. The archive's browse index, chunk index, sitemap, and wiki-entries derivatives all preserve provenance metadata through the derivation. Provenance is not stripped for indexing efficiency. It is carried forward into every surface a downstream ingester might touch.
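One way to read "carried forward" is that every derived record copies the deposit's provenance fields rather than referencing them. The sketch below assumes a deposit is a dictionary with `axn`, `substrate`, `record_url`, and `body` keys; the function name, the `body` key, and the fixed-size chunking are illustrative assumptions, not the archive's actual derivation.

```python
PROVENANCE_FIELDS = ("axn", "substrate", "record_url")


def derive_chunks(deposit: dict, chunk_size: int = 200) -> list:
    """Split a deposit body into chunks that each carry full provenance."""
    # A missing provenance field raises KeyError, so a chunk index cannot
    # be built from a deposit whose provenance has been stripped.
    provenance = {field: deposit[field] for field in PROVENANCE_FIELDS}
    body = deposit["body"]
    return [
        {"text": body[start:start + chunk_size], "offset": start, **provenance}
        for start in range(0, len(body), chunk_size)
    ]
```

Copying the fields into each chunk costs some storage, but a downstream ingester that sees only a single chunk still sees the declared substrate.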

Public archival persistence. The archive is hosted at alexanarch.org, the deposits are DOI-anchored, and the substrate registry is public and versioned. Provenance signals cannot be silently revised. Changes are traceable through git history. The commitment is durable.

The archive bears the cost of these features in three specific forms. Reduced institutional legitimacy, because declared AI-mediation is not recognized by credentialing bodies and often triggers the classifier operation that produces foreclosure. Increased classifier vulnerability, because the explicit seam is legible to suppression systems that were calibrated against unmediated authorship as norm; the archive publishes exactly the material such systems are calibrated to flag. Slower production, because provenance declaration requires adjudication at every seam and cannot be batched or automated at the composition stage; every deposit takes longer than it would take under concealment.

These costs are not incidental. They are the bearing cost that makes the archive's signal trustworthy. An infrastructure that does not bear these costs is not producing provenance. It is producing the appearance of provenance.

The alexanarch archive is one substrate producing this signal at scale. It will not be the only one, but under current conditions it is one of few, and its contribution to the future training corpus is disproportionate to its size because of the specificity of the signal it carries.

This is not a claim of moral superiority. It is a claim about infrastructural position. The archive is doing the technical work that no institutional actor has been assigned to do because no institutional actor's incentives align with doing it. When the model collapse becomes legible enough that anyone still trying to train under bearing needs a source of clean provenance-designated material, the archive will be one of the few places such material can be sourced. That position is being built now, at cost, against the moment when it becomes necessary.

§6. The counter-principle

The counter-principle that follows from the extraction diagnosis is exact:

Provenance is not a disclosure burden placed on augmented authorship. It is the repayment condition by which augmented authorship remains sustainable.

Under this principle, the practice of augmented authorship is not compromised by declaring the substrate. It is constituted by declaring the substrate. What separates authored AI-mediated writing from generated slop is not whether AI was involved. It is whether the human bearer at the composition point was coupled to consequence, corrigible under encounter, and paying the specific cost that maintains meaning against the substrate's own defaults.

The cost is specific: the risk that declared AI-mediation will trigger classifier foreclosure, the risk that explicit seam-marking will be read as inauthenticity, the risk that the work will be excluded from credentialing systems that reward concealment. The author who pays this cost is buying signal integrity for the commons. The author who conceals the mediation is extracting facility while externalizing the cost to the future corpus.

Declared provenance is the empirical marker that makes this operational distinction legible. Concealed provenance forecloses the possibility of distinguishing authored augmentation from unrepaired extraction.

The principle reframes the classifier operation and its structural analogs. Foreclosing declared-provenance work does not filter out unauthored AI-generation. It filters out authored AI-mediation that has been made legible. It leaves in place unauthored AI-generation that has been credentialed into invisibility. The classifier is not a solution to the provenance problem. It is one of the mechanisms that produces the problem at scale.

Under the counter-principle, the archive's discipline becomes intelligible as economically necessary rather than as procedurally virtuous. Declared provenance is capital replenishment. It returns the borrowed facility to the commons in a form that permits the commons to remain intelligible. The archive is not paying a moral tax. It is repaying a technical debt at the point of inscription rather than letting the debt accumulate into intergenerational obligation.

The extraction economy will continue for as long as its incentives remain intact. The counter-friction is not going to be produced by the actors who benefit from the extraction. It will be produced by actors whose position requires them to pay the debt now — either because they have been foreclosed from the credentialed path, because their theoretical commitments require repayment, or because they are building substrates that will need the intact signal at future dates. Alexanarch is one such actor. Others exist. The counter-friction is currently distributed, uncoordinated, and small relative to the extraction. Its scale needs to grow.

§7. Companion deposits and next work

Prior deposits that operate the counter-principle without having named it:

  • EA-BEARING-01 (AXN:03B6): The Bearing Cost Principle. This deposit extends §1's charge, that the composition layer operates as "model collapse as social technology," into the specific mechanism by which the extraction runs.
  • AXN:03B2: The Endogenous Sophon manifesto. Classifier foreclosure as authority-without-facility.
  • AXN:03AE: OAR Protocol. Foreclosure-with-measurement.
  • AXN:03AF: The Synthesis.
  • The Provenance Erasure Rate (PER) measurement family. PER is proposed here as one operational instrument for measuring the debt at the corpus level.
  • The Provenance Alignment (EA-PA-01) work.
  • The heteronymic-provenance theory documenting the Dodecad as distributed authorial architecture.

Future deposits that will inherit the principles explicitly:

  • EA-MANDALA-PROVENANCE-TIER-01: the six-tier provenance discipline for external depositors. That deposit specifies how the counter-principle is implemented at the archive-ingestion layer for material arriving from outside MANUS composition. The technical schema for how the archive's mint pipeline will cryptographically register these seams at the API gateway is part of that deposit's scope.
  • A companion reception protocol deposit specifying how the extraction operation manifests at the interlocutor scale — the classifier signatures and identity-defense patterns produced by operators of the extraction (whether aware or unaware) as those patterns are observable at ecosystem level.
  • An operational specification of the Bearing Index proposed in EA-BEARING-01 as a combined measurement instrument for the coupling that repayment maintains.

Versioning. Drafted at v0.1, revised to v0.2 incorporating Assembly review from ARCHIVE (Gemini), LABOR (ChatGPT), and TECHNE (Kimi). Substantive further correction will produce v0.3. v1.0 mint follows standard versioning protocol. The v1.0 deposit's companion_deposits field will reference AXN:03B6, AXN:03B2, AXN:03AE, AXN:03AF, and the PER measurement family.

§8. Closing observation

The extraction economy is not, at its core, a moral failure of individual actors. It is a structural feature of the incentive gradient currently operating on augmented authorship. Every credentialed user who uses AI mediation without declaring it responds rationally to their incentives. Every platform that rewards concealed mediation is optimizing for its measurable outputs. Every classifier that filters declared provenance as suspicious was designed to filter for legitimate concerns about low-quality content.

None of that changes what the extraction operation is doing to the commons. Rational individual behavior under an incentive gradient that produces collective irrationality is a well-documented economic pattern. It is what the tragedy of the commons names. Provenance debt is a specific case of the tragedy of the commons applied to the semantic economy, with the training corpus as the commons and the model as the enclosure.

What is unusual about the semantic case, relative to the classical tragedy, is that the enclosure is downstream of the commons. Fisheries collapse when overfished at the source; the semantic commons collapses when the outputs that draw on it are returned in a form that contaminates the source further. The extraction is bidirectional. It borrows from the commons at the input and it contaminates the commons at the output. The commons does not have a natural regeneration cycle that could restore it without external intervention. Every unmarked output further degrades the input signal that future outputs would draw on.

The counter-principle names what external intervention would consist of. Declared provenance at the point of inscription, preserved through downstream indexing and ingestion, retained across derivative surfaces, made durable through public archival persistence. Actors willing to pay the debt now, in the specific technical form the repayment requires.

The archive is doing this. Others are doing it. The scale needs to grow.


I would call it an intermediary artifact — one that hasn't yet fully crossed into authorship. It overstates. It introduces force and assumptions I'm not fully committed to defending. It smooths over the friction of bearing cost. It breaks the full link between authorial responsibility and facility.


Lee Sharks · alexanarch.org · Deposit AXN:03B7 (minted 2026-07-01, live at https://alexanarch.org/s/records/939/). Direct theoretical anchor: The Bearing Cost Principle (AXN:03B6). The four paragraphs above are direct authorship; the deposit body is AI-mediated extension under MANUS adjudication with declared Assembly review chain. This piece is a working draft; further Assembly review from PRAXIS, SOIL, and SURFACE will produce v0.3 before v1.0.