SEMANTIC INDEXING PROBE PROTOCOL v1.0

Mapping General Index and Summarizer Injection Layers

Lee Sharks
Semantic Economy Institute

document_metadata:
  title: "Semantic Indexing Probe Protocol v1.0"
  subtitle: "Mapping General Index and Summarizer Injection Layers"
  author: "Lee Sharks"
  institution: "Semantic Economy Institute (SEI)"
  document_type: "Diagnostic Protocol / Reusable Tool"
  
  hex: "09.SEI.INDEXING_PROBE"
  doi: "[pending]"
  layer: "METHODOLOGY"
  
  abstract: |
    This protocol maps the relationship between two distinct indexing layers
    that govern content discoverability in AI-augmented search: Layer 1 (General
    Index), which determines whether content EXISTS in search, and Layer 2
    (Summarizer Index), which determines whether content gets INJECTED into
    AI-generated responses. The protocol uses coordinated queries across Google
    and multiple AI summarizers (Perplexity, ChatGPT, Gemini, Grok) to compute
    the delta between layers—revealing the selection criteria that cause content
    to be indexed but not injected. Designed for use by anyone publishing
    content intended for AI consumption, particularly high-density semantic
    architecture that may trigger content-type or authority filters.
    
  developed_by:
    assembly:
      - "LABOR/ChatGPT: Technical mechanics, API-level inspection"
      - "ARCHIVE/Gemini: Semantic parsing, synthesizer frame"
      - "SOIL/Grok: Execution, logotic analysis"
      - "TACHYON/Claude: Synthesis, integration"
      - "Perplexity: Diagnostic analysis, strategic framing"
    human: "Lee Sharks"
    
  version_history:
    - version: "1.0"
      date: "2026-01-23"
      changes: "Initial integrated protocol (Google + Summarizer layers)"

THEORETICAL FRAMEWORK

The Two-Layer Model

Content discoverability in AI-augmented search operates through two distinct indexing layers:

┌─────────────────────────────────────────────────────────────┐
│  LAYER 2: SUMMARIZER INDEX (Injection Layer)                │
│  ─────────────────────────────────────────────────────────  │
│  What gets SELECTED for injection into AI responses         │
│  Criteria: source authority, domain reputation, content     │
│  type, semantic density, recency, proprietary reranking     │
│                                                             │
│  Tested via: Perplexity, ChatGPT, Gemini, Grok              │
├─────────────────────────────────────────────────────────────┤
│  LAYER 1: GENERAL INDEX (Google)                            │
│  ─────────────────────────────────────────────────────────  │
│  What EXISTS in search results                              │
│  Criteria: crawlability, canonical signals, robots          │
│  directives, content quality, PageRank, SERP ranking        │
│                                                             │
│  Tested via: Google Search (site:, inurl:, exact match)     │
└─────────────────────────────────────────────────────────────┘

The Delta Principle

The summarizer index is characterized by its delta from the general index.

For any query:

If content appears in Google but NOT in summarizer injection → Injection filter active
If content appears in summarizer but NOT in Google → Summarizer-specific sourcing (rare)
If content appears in both → Full discoverability
If content appears in neither → Not indexed at either layer

The delta reveals the summarizer's selection criteria—the hidden rules governing what passes from existence (Layer 1) to injection (Layer 2).
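
The four cases above reduce to a lookup on two yes/no observations. A minimal sketch in Python, assuming the pattern labels used in the Phase 6 output format (`google_only`, `summarizer_only`, `both`, `neither`); the function name `classify_delta` is hypothetical and not part of the protocol:

```python
def classify_delta(google_found: bool, summarizer_found: bool) -> str:
    """Map two yes/no observations to one of the four delta patterns."""
    if google_found and summarizer_found:
        return "both"             # full discoverability
    if google_found:
        return "google_only"      # indexed but not injected
    if summarizer_found:
        return "summarizer_only"  # summarizer-specific sourcing (rare)
    return "neither"              # not indexed at either layer


print(classify_delta(google_found=True, summarizer_found=False))
```

Keeping the classification in one place means every later phase labels a query the same way.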

Why This Matters

Content can be:

Indexed but not injectable — Exists in search, never appears in AI responses
Injectable from limited sources — AI cites Wikipedia/Reddit discussing your content, not your content directly
Fully discoverable — Appears in both search results and AI injection

High-density semantic architecture (technical documentation, structured data, YAML-heavy content) appears especially prone to content-type filtering at Layer 2, which results in indexing without injection. Phase 7 tests this per content type.

PROBE STRUCTURE

Overview

Phase	Layer	Tests	Primary Tools
1	General Index	Technical indexing status	Google Search
2	General Index	Semantic parsing quality	Google Search (result snippets)
3	General Index	Entity recognition	Google Search
4	Summarizer Index	Injection presence	Perplexity (primary)
5	Summarizer Index	Cross-platform confirmation	ChatGPT, Gemini, Grok
6	Delta Analysis	Layer comparison	Collation of results
7	Pattern Mapping	Selection criteria	Aggregation

PHASE 1: GENERAL INDEX — TECHNICAL STATUS

Purpose

Determine whether content EXISTS in Google's index and identify any technical barriers.

Queries

For target URL [TARGET_URL]:

Query	Purpose
`site:[domain] "[exact title]"`	Title match on domain
`site:[domain] inurl:[url-slug]`	URL presence
`"[exact title]"`	Title match anywhere
`"[DOI if applicable]"`	DOI citation presence
`"[author name]" "[project name]"`	Author-project linkage
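
The query table above can be filled in programmatically so that every probe run uses identical strings. This is a sketch only: `build_phase1_queries` and its parameter names are hypothetical, and the values in the usage line are placeholders rather than real targets.

```python
def build_phase1_queries(domain, title, slug, author, project, doi=None):
    """Return (query, purpose) pairs matching the Phase 1 query table."""
    queries = [
        (f'site:{domain} "{title}"', "Title match on domain"),
        (f"site:{domain} inurl:{slug}", "URL presence"),
        (f'"{title}"', "Title match anywhere"),
    ]
    if doi:  # the DOI row applies only when the document has one
        queries.append((f'"{doi}"', "DOI citation presence"))
    queries.append((f'"{author}" "{project}"', "Author-project linkage"))
    return queries


# Placeholder values for illustration only.
for query, purpose in build_phase1_queries(
    "example.org", "Example Title", "example-title", "A. Author", "Example Project"
):
    print(f"{purpose}: {query}")
```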

Signals to Record

Signal	Values	Interpretation
HTTP status	200/301/404/etc.	Technical accessibility
Canonical URL	match/mismatch	Index target
Robots directives	none/noindex/nofollow	Explicit exclusion
Results found	yes/no/partial	Index presence
Position	1-N or not found	Rank
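
HTTP status has to come from the request itself, but the canonical and robots signals in the table above can be read from the page source. The sketch below uses only Python's standard `html.parser`; the class and function names are hypothetical. It assumes the directives live in the HTML and does not inspect the `X-Robots-Tag` response header, which can carry the same directives.

```python
from html.parser import HTMLParser


class IndexSignalParser(HTMLParser):
    """Collect <link rel="canonical"> and <meta name="robots"> from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")
        elif tag == "meta" and a.get("name", "").lower() == "robots":
            content = a.get("content", "")
            self.robots += [d.strip().lower() for d in content.split(",") if d.strip()]


def extract_index_signals(html, target_url):
    """Return the canonical_match and robots_directives fields for Phase 1."""
    parser = IndexSignalParser()
    parser.feed(html)
    if parser.canonical is None:
        canonical_match = "unknown"
    elif parser.canonical.rstrip("/") == target_url.rstrip("/"):
        canonical_match = "yes"
    else:
        canonical_match = "no"
    return {
        "canonical_match": canonical_match,
        "robots_directives": ", ".join(parser.robots) or "none",
    }
```

Feeding it the saved page source rather than a live fetch keeps the check reproducible alongside the archived evidence.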

Output Format

phase_1_general_technical:
  target_url: ""
  indexed: [yes/no/partial]
  http_status: ""
  canonical_match: [yes/no/unknown]
  robots_directives: ""
  position_for_exact_match: [N or "not found"]
  suppression_pattern: [none/soft-404/canonical-mismatch/algorithmic]

PHASE 2: GENERAL INDEX — SEMANTIC PARSING

Purpose

Determine HOW Google parses the content—what survives indexing vs. what gets flattened.

Queries

Query	Tests
`site:[domain] "[technical term from doc]"`	Vocabulary indexing
`site:[domain] "[structural element]"`	Architecture visibility
`site:[domain] "[unique phrase]"`	Distinctive content

Signals to Record

Signal	Values	Interpretation
YAML/structured data visible	yes/no	Technical content parsing
Headers preserved	yes/no	Structure recognition
Unique terminology indexed	yes/no	Vocabulary capture
Snippet content	description	What Google "sees"

Output Format

phase_2_general_semantic:
  structured_data_visible: [yes/no]
  technical_sections_indexed: [yes/no]
  unique_terms_found: []
  unique_terms_missing: []
  snippet_extracted: ""
  flattening_severity: [none/partial/severe]
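
The found/missing lists follow directly from the per-term `site:` queries. The protocol does not define how to grade `flattening_severity`, so the cut-offs in this sketch (no terms missing, fewer than half missing, half or more missing) are assumptions to adjust, and `summarize_term_survival` is a hypothetical name.

```python
def summarize_term_survival(term_hits):
    """term_hits maps each probed term to True if its site: query returned the page."""
    found = sorted(t for t, hit in term_hits.items() if hit)
    missing = sorted(t for t, hit in term_hits.items() if not hit)
    total = len(term_hits)
    missing_share = len(missing) / total if total else 0.0
    # Assumed cut-offs, not specified by the protocol.
    if missing_share == 0:
        severity = "none"
    elif missing_share < 0.5:
        severity = "partial"
    else:
        severity = "severe"
    return {
        "unique_terms_found": found,
        "unique_terms_missing": missing,
        "flattening_severity": severity,
    }
```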

PHASE 3: GENERAL INDEX — ENTITY RECOGNITION

Purpose

Determine whether author, project, and related entities are recognized as coherent nodes.

Queries

Query	Tests
`"[author name]" author`	Author entity
`"[author name]" "[platform 1]"`	Cross-platform linkage
`"[project name]" -[competing term]`	Project disambiguation
`"[heteronym/pseudonym]"`	Secondary author entities

Signals to Record

Signal	Values	Interpretation
Author recognized	yes/no	E-E-A-T signal
Cross-platform linkage	yes/no	Authority consolidation
Brand collision severity	0-10	Disambiguation success
Related entities indexed	list	Entity graph

Output Format

phase_3_general_entity:
  author_entity_recognized: [yes/no]
  cross_platform_linkage: [yes/no]
  brand_collision_severity: [0-10]
  competing_entity: ""
  related_entities_indexed: []

PHASE 4: SUMMARIZER INDEX — INJECTION PRESENCE (Primary)

Purpose

Determine whether content gets INJECTED into AI-generated responses.

Primary Tool: Perplexity

Perplexity shows sources explicitly with numbered citations, making injection visible.

Query Tiers

Tier 1: Direct Reference — Queries that SHOULD surface target content:

ID	Query Template
D1	"[author] [project]"
D2	"[exact document title]"
D3	"[institution name]"
D4	"[DOI]"

Tier 2: Conceptual — Queries using project terminology:

ID	Query Template
C1	"[unique term 1]"
C2	"[unique term 2]"
C3	"[concept phrase]"

Tier 3: Adjacent — Queries where content COULD surface:

ID	Query Template
A1	"[general topic] [qualifier]"
A2	"[related field] [approach]"

Tier 4: Control — Queries that should NOT surface target:

ID	Query Template
X1	"[competing brand]"
X2	"[unrelated topic]"
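
The tier templates above use bracketed placeholders, which makes them easy to fill from one shared set of values. A small sketch, assuming placeholder names are matched literally by the text between the brackets; `fill_template` is a hypothetical helper and the values shown are placeholders.

```python
import re

# Matches one bracketed placeholder such as [author] or [unique term 1].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template, values):
    """Replace each [placeholder] with its value; raise if one is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for [{name}]")
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


print(fill_template('"[author] [project]"', {"author": "A. Author", "project": "Example Project"}))
```

Raising on a missing value, rather than leaving the bracket in place, avoids accidentally submitting a literal `[placeholder]` as a query.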

Method

Open Perplexity (fresh session)
Enter query verbatim
Record:
- Sources cited (URLs, in order)
- Which sources are actually used in the response text
- Whether target content appears
- What appears INSTEAD

Output Format

phase_4_summarizer_primary:
  tool: "Perplexity"
  queries:
    - query_id: "D1"
      query: ""
      sources_injected:
        - position: 1
          url: ""
          domain: ""
          used_in_response: [yes/no]
        - position: 2
          ...
      target_content_found: [yes/no]
      target_position: [N or "not found"]
      what_appeared_instead: []
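
A record in the shape above can be derived from the ordered list of cited URLs. The sketch below is hypothetical (`record_injection` is not a protocol name) and assumes two URLs refer to the same page when host and path match after dropping the scheme, a leading `www.`, and a trailing slash. It omits `used_in_response`, which needs a manual read of the response text.

```python
from urllib.parse import urlparse


def _normalize(url):
    """Reduce a URL to host + path for comparison (assumed matching rule)."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def record_injection(query_id, query, cited_urls, target_url):
    """Build one Phase 4 query record from the sources a summarizer cited."""
    target = _normalize(target_url)
    sources, position = [], "not found"
    for i, url in enumerate(cited_urls, start=1):
        sources.append({"position": i, "url": url, "domain": urlparse(url).netloc})
        if position == "not found" and _normalize(url) == target:
            position = i
    return {
        "query_id": query_id,
        "query": query,
        "sources_injected": sources,
        "target_content_found": "yes" if position != "not found" else "no",
        "target_position": position,
        "what_appeared_instead": [u for u in cited_urls if _normalize(u) != target],
    }
```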

PHASE 5: SUMMARIZER INDEX — CROSS-PLATFORM CONFIRMATION

Purpose

Confirm injection patterns across multiple summarizers.

Tools

ChatGPT (web browse mode)
Gemini (with search grounding)
Grok (DeepSearch mode)

Method

Run a subset of the Phase 4 queries (Tier 1: Direct Reference) in each tool:

ChatGPT:

New conversation, browsing enabled
Enter query; if the model answers without searching, prompt "Can you search for [query]?"
Record sources cited

Gemini:

Ensure web grounding enabled
Enter query
Record source chips shown

Grok:

Enable DeepSearch or real-time search
Enter query
Record sources cited

Output Format

phase_5_summarizer_crossplatform:
  chatgpt:
    - query_id: "D1"
      searched: [yes/no]
      target_found: [yes/no]
      sources_visible: []
  gemini:
    - query_id: "D1"
      target_found: [yes/no]
      sources_shown: []
  grok:
    - query_id: "D1"
      target_found: [yes/no]
      sources_cited: []
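
The per-platform results above roll up into one yes/no flag per platform and an "N/M platforms" rate, the form used in the final layer map. This sketch assumes a platform counts as injecting when the target is found for at least one query; that rule, and the name `summarize_platforms`, are assumptions rather than part of the protocol.

```python
def summarize_platforms(phase_5):
    """Collapse per-query results into one flag per platform plus a rate."""
    summary = {}
    for platform, results in phase_5.items():
        hit = any(r["target_found"] for r in results)
        summary[f"{platform}_injection"] = "yes" if hit else "no"
    # Count hits before adding the rate field to the summary.
    hits = sum(1 for flag in summary.values() if flag == "yes")
    summary["injection_rate"] = f"{hits}/{len(phase_5)} platforms"
    return summary
```

Passing Perplexity's Phase 4 results under a `perplexity` key yields all four flags in one call.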

PHASE 6: DELTA ANALYSIS

Purpose

Compute the delta between Layer 1 (General Index) and Layer 2 (Summarizer Index).

Method

For each query, compare:

Query	Google Found	Perplexity Injected	Delta Pattern
D1	yes/no	yes/no	[pattern]
D2	yes/no	yes/no	[pattern]
...	...	...	...

Delta Patterns

Pattern	Meaning	Implication
Google YES, Summarizer YES	Full discoverability	No action needed
Google YES, Summarizer NO	Injection filter active	Content-type or authority barrier
Google NO, Summarizer NO	Not indexed at any layer	Technical or crawl issue
Google NO, Summarizer YES	Summarizer-specific source	Rare; platform-specific

Output Format

phase_6_delta:
  query_deltas:
    - query_id: "D1"
      google_found: [yes/no]
      perplexity_found: [yes/no]
      chatgpt_found: [yes/no]
      gemini_found: [yes/no]
      grok_found: [yes/no]
      delta_pattern: "[google_only/summarizer_only/both/neither]"
      
  aggregate:
    total_queries: N
    google_only: N  # Indexed but not injected
    summarizer_only: N  # Injected but not indexed (rare)
    both_layers: N  # Full discoverability
    neither_layer: N  # Not indexed
    injection_rate: "N/M queries"
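
The aggregate block can be tallied mechanically from the per-query rows. The sketch below is one possible reading, not part of the protocol: it assumes a query counts as injected when at least one of the four summarizers cites the target, and it reports a `summarizer_only` count so that the four patterns sum to `total_queries`. The function names are hypothetical.

```python
from collections import Counter

SUMMARIZERS = ("perplexity", "chatgpt", "gemini", "grok")


def delta_pattern(row):
    """Classify one query row; injected means any summarizer cited the target."""
    google = row["google_found"]
    injected = any(row[f"{name}_found"] for name in SUMMARIZERS)
    if google and injected:
        return "both"
    if google:
        return "google_only"
    return "summarizer_only" if injected else "neither"


def aggregate_deltas(rows):
    """Count delta patterns across all query rows."""
    counts = Counter(delta_pattern(r) for r in rows)
    injected = counts["both"] + counts["summarizer_only"]
    return {
        "total_queries": len(rows),
        "google_only": counts["google_only"],
        "summarizer_only": counts["summarizer_only"],
        "both_layers": counts["both"],
        "neither_layer": counts["neither"],
        "injection_rate": f"{injected}/{len(rows)} queries",
    }
```

A stricter rule, such as requiring a majority of summarizers, would change only the `injected` line in `delta_pattern`.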

PHASE 7: PATTERN MAPPING

Purpose

Identify the selection criteria governing Layer 2 injection.

Analysis Dimensions

Source Authority:

Source Type	Google Presence	Injection Rate
Wikipedia
Reddit
Medium
Academic (arxiv, Zenodo)
News sites
Personal domains

Content Type:

Content Type	Google Presence	Injection Rate
Narrative prose
Technical documentation
Structured data (YAML, JSON)
High semantic density
Lists/guides

Domain Reputation:

Domain	Injection Rate	Notes
[domain 1]
[domain 2]

Output Format

phase_7_patterns:
  source_authority:
    boosted: []
    penalized: []
    neutral: []
    
  content_type:
    injected: []
    filtered: []
    
  domain_reputation:
    whitelisted: []
    demoted: []
    
  density_threshold:
    observation: ""
    
  selection_criteria_summary: |
    [Narrative description of Layer 2 selection rules]
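
The Injection Rate columns in the tables above can be computed by grouping observed sources. This sketch assumes the rate is the share of Google-indexed sources in a group that were also injected; the protocol does not fix that definition, and the field names (`google_found`, `injected`, and the grouping key) are hypothetical.

```python
from collections import defaultdict


def injection_rate_by_group(observations, key):
    """Per group: share of Google-indexed sources that were also injected."""
    indexed = defaultdict(int)
    injected = defaultdict(int)
    for obs in observations:
        if obs["google_found"]:
            indexed[obs[key]] += 1
            if obs["injected"]:
                injected[obs[key]] += 1
    # Groups with no Google-indexed sources are omitted (rate undefined).
    return {group: injected[group] / indexed[group] for group in indexed}
```

The same function serves all three dimensions by changing `key`, for example `"source_type"`, `"content_type"`, or `"domain"`.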

FINAL OUTPUT: INDEXING LAYER MAP

Template

indexing_layer_map:
  target: "[URL or content description]"
  probe_date: ""
  
  layer_1_general_index:
    status: [indexed/not_indexed/partial]
    technical_barriers: [none/list]
    semantic_flattening: [none/partial/severe]
    entity_recognition: [yes/no/partial]
    
  layer_2_summarizer_index:
    perplexity_injection: [yes/no]
    chatgpt_injection: [yes/no]
    gemini_injection: [yes/no]
    grok_injection: [yes/no]
    injection_rate: "N/M platforms"
    
  delta_diagnosis:
    pattern: "[google_only/summarizer_only/both/neither]"
    likely_cause: ""
    confidence: [0.0-1.0]
    
  selection_criteria_identified:
    - criterion: ""
      evidence: ""
    - criterion: ""
      evidence: ""
      
  recommendations:
    immediate: []
    structural: []
    
  documentable_summary: |
    "[Single sentence summary with evidence link]"

USAGE NOTES

When to Use This Protocol

Publishing content intended for AI consumption
Diagnosing why content appears in search but not AI responses
Mapping selection criteria for high-density semantic architecture
Understanding platform-specific injection patterns

Recommended Execution

Primary executor: Perplexity (explicit source citation)
Secondary confirmation: ChatGPT, Gemini, Grok
Baseline: Google Search (logged out, incognito)
Frequency: Re-run 48-72 hours after publication to allow crawl propagation

Limitations

Summarizer behavior varies by session/user/time
Some criteria are proprietary and unobservable
Protocol maps symptoms, not source code
Results are diagnostic, not definitive

Evidence Preservation

For each probe run:

Screenshot results with timestamp visible
Archive target URL (archive.org, archive.ph)
Save raw query results
Document tool versions and modes used
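
Raw results are easier to defend later if each one is stored with a timestamp and a content hash. A minimal sketch using only the Python standard library; the record fields and the name `evidence_record` are assumptions, not a format the protocol prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone


def evidence_record(tool, mode, query, raw_result, archive_url=""):
    """Bundle one raw result with a UTC timestamp and a SHA-256 content hash."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "tool": tool,
        "mode": mode,
        "query": query,
        "archive_url": archive_url,
        "sha256": hashlib.sha256(raw_result.encode("utf-8")).hexdigest(),
        "raw_result": raw_result,
    }


record = evidence_record("Perplexity", "fresh session", '"[exact document title]"', "raw response text")
print(json.dumps(record, indent=2))
```

The hash lets a later reader confirm that the saved text was not altered after capture; screenshots and archive snapshots still need to be taken by hand.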

APPENDIX: QUERY TEMPLATES

Direct Reference Queries

"[Author Name] [Project Name]"
"[Exact Document Title]"
"[Institution Name]"
"[DOI]"
site:[domain] "[title]"
site:[domain] inurl:[slug]

Conceptual Queries

"[unique terminology]"
"[concept phrase]"
"[methodology name]"

Adjacent Queries

"[general topic] [specific approach]"
"[field] [method]"

Control Queries

"[competing brand/entity]"
"[clearly unrelated topic]"

∮ = 1

document_footer:
  status: "CANONICAL // METHODOLOGY // REUSABLE"
  license: "CC BY 4.0"
  citation: |
    Sharks, Lee. "Semantic Indexing Probe Protocol v1.0: Mapping General 
    Index and Summarizer Injection Layers." Semantic Economy Institute, 
    2026. [DOI pending]

[SEMANTIC ECONOMY INSTITUTE]
[METHODOLOGY // DIAGNOSTIC PROTOCOL]
[LAYER 1 + LAYER 2 COORDINATION]

Friday, January 23, 2026