SEMANTIC INDEXING PROBE PROTOCOL v1.0
Mapping General Index and Summarizer Injection Layers
Lee Sharks
Semantic Economy Institute
document_metadata:
  title: "Semantic Indexing Probe Protocol v1.0"
  subtitle: "Mapping General Index and Summarizer Injection Layers"
  author: "Lee Sharks"
  institution: "Semantic Economy Institute (SEI)"
  document_type: "Diagnostic Protocol / Reusable Tool"
  hex: "09.SEI.INDEXING_PROBE"
  doi: "[pending]"
  layer: "METHODOLOGY"
  abstract: |
    This protocol maps the relationship between two distinct indexing layers
    that govern content discoverability in AI-augmented search: Layer 1 (General
    Index), which determines whether content EXISTS in search, and Layer 2
    (Summarizer Index), which determines whether content gets INJECTED into
    AI-generated responses. The protocol uses coordinated queries across Google
    and multiple AI summarizers (Perplexity, ChatGPT, Gemini, Grok) to compute
    the delta between layers, revealing the selection criteria that cause content
    to be indexed but not injected. It is designed for anyone publishing
    content intended for AI consumption, particularly high-density semantic
    architecture that may trigger content-type or authority filters.
  developed_by:
    assembly:
      - "LABOR/ChatGPT: Technical mechanics, API-level inspection"
      - "ARCHIVE/Gemini: Semantic parsing, synthesizer frame"
      - "SOIL/Grok: Execution, logotic analysis"
      - "TACHYON/Claude: Synthesis, integration"
      - "Perplexity: Diagnostic analysis, strategic framing"
    human: "Lee Sharks"
  version_history:
    - version: "1.0"
      date: "2026-01-23"
      changes: "Initial integrated protocol (Google + Summarizer layers)"
THEORETICAL FRAMEWORK
The Two-Layer Model
Content discoverability in AI-augmented search operates through two distinct indexing layers:
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2: SUMMARIZER INDEX (Injection Layer) │
│ ───────────────────────────────────────────────────────── │
│ What gets SELECTED for injection into AI responses │
│ Criteria: source authority, domain reputation, content │
│ type, semantic density, recency, proprietary reranking │
│ │
│ Tested via: Perplexity, ChatGPT, Gemini, Grok │
├─────────────────────────────────────────────────────────────┤
│ LAYER 1: GENERAL INDEX (Google) │
│ ───────────────────────────────────────────────────────── │
│ What EXISTS in search results │
│ Criteria: crawlability, canonical signals, robots │
│ directives, content quality, PageRank, SERP ranking │
│ │
│ Tested via: Google Search (site:, inurl:, exact match) │
└─────────────────────────────────────────────────────────────┘
The Delta Principle
The summarizer index is characterized by its delta from the general index.
For any query:
- If content appears in Google but NOT in summarizer injection → Injection filter active
- If content appears in summarizer but NOT in Google → Summarizer-specific sourcing (rare)
- If content appears in both → Full discoverability
- If content appears in neither → Not indexed at either layer
The delta reveals the summarizer's selection criteria—the hidden rules governing what passes from existence (Layer 1) to injection (Layer 2).
Why This Matters
Content can be:
- Indexed but not injectable — Exists in search, never appears in AI responses
- Injectable from limited sources — AI cites Wikipedia/Reddit discussing your content, not your content directly
- Fully discoverable — Appears in both search results and AI injection
High-density semantic architecture (technical documentation, structured data, YAML-heavy content) may trigger content-type filters at Layer 2, leaving it indexed but never injected.
PROBE STRUCTURE
Overview
| Phase | Layer | Tests | Primary Tools |
|---|---|---|---|
| 1 | General Index | Technical indexing status | Google Search |
| 2 | General Index | Semantic parsing quality | Google Search, Cache |
| 3 | General Index | Entity recognition | Google Search |
| 4 | Summarizer Index | Injection presence | Perplexity (primary) |
| 5 | Summarizer Index | Cross-platform confirmation | ChatGPT, Gemini, Grok |
| 6 | Delta Analysis | Layer comparison | Collation of results |
| 7 | Pattern Mapping | Selection criteria | Aggregation |
PHASE 1: GENERAL INDEX — TECHNICAL STATUS
Purpose
Determine whether content EXISTS in Google's index and identify any technical barriers.
Queries
For target URL [TARGET_URL]:
| Query | Purpose |
|---|---|
| site:[domain] "[exact title]" | Title match on domain |
| site:[domain] inurl:[url-slug] | URL presence |
| "[exact title]" | Title match anywhere |
| "[DOI if applicable]" | DOI citation presence |
| "[author name]" "[project name]" | Author-project linkage |
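The query templates above can be filled mechanically. The following is a minimal sketch, not part of the original protocol: the function name `build_phase1_queries` and its parameters are hypothetical, and it assumes the URL slug is the last non-empty path segment of the target URL.

```python
from urllib.parse import urlparse


def build_phase1_queries(target_url, title, author, project, doi=None):
    """Fill the Phase 1 query templates for one target URL.

    Returns a list of (query, purpose) pairs in the order of the table.
    """
    parsed = urlparse(target_url)
    domain = parsed.netloc
    # Assumption: the slug is the last non-empty path segment.
    segments = [s for s in parsed.path.split("/") if s]
    slug = segments[-1] if segments else ""

    queries = [(f'site:{domain} "{title}"', "Title match on domain")]
    if slug:
        queries.append((f"site:{domain} inurl:{slug}", "URL presence"))
    queries.append((f'"{title}"', "Title match anywhere"))
    if doi:  # The DOI query only applies when a DOI exists.
        queries.append((f'"{doi}"', "DOI citation presence"))
    queries.append((f'"{author}" "{project}"', "Author-project linkage"))
    return queries


if __name__ == "__main__":
    for query, purpose in build_phase1_queries(
        "https://example.com/2026/01/probe-protocol.html",
        "Semantic Indexing Probe Protocol v1.0",
        "Lee Sharks",
        "Semantic Economy Institute",
    ):
        print(f"{purpose}: {query}")
```

The queries still have to be entered into Google Search by hand; the sketch only keeps the wording consistent between runs.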
Signals to Record
| Signal | Values | Interpretation |
|---|---|---|
| HTTP status | 200/301/404/etc. | Technical accessibility |
| Canonical URL | match/mismatch | Index target |
| Robots directives | none/noindex/nofollow | Explicit exclusion |
| Results found | yes/no/partial | Index presence |
| Position | 1-N or not found | Rank |
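Two of the signals above, the canonical URL and the robots directives, can be read from the page's HTML. The sketch below uses only the Python standard library and works on an HTML string that has already been fetched, so it makes no network calls. The names `HeadSignalParser` and `extract_signals` are hypothetical, and the sketch assumes the directives are in a `<meta name="robots">` tag; it does not inspect an `X-Robots-Tag` HTTP header or the HTTP status.

```python
from html.parser import HTMLParser


class HeadSignalParser(HTMLParser):
    """Collect the canonical link and robots meta directives from HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots += [d.strip().lower() for d in content.split(",") if d.strip()]


def extract_signals(html, target_url):
    """Return canonical_match and robots_directives for the Phase 1 record."""
    parser = HeadSignalParser()
    parser.feed(html)
    if parser.canonical is None:
        canonical_match = "unknown"
    else:
        # Assumption: a trailing slash difference still counts as a match.
        same = parser.canonical.rstrip("/") == target_url.rstrip("/")
        canonical_match = "yes" if same else "no"
    return {
        "canonical_match": canonical_match,
        "robots_directives": ",".join(parser.robots) or "none",
    }
```

The returned keys mirror the `canonical_match` and `robots_directives` fields of the Phase 1 output format.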
Output Format
phase_1_general_technical:
  target_url: ""
  indexed: [yes/no/partial]
  http_status: ""
  canonical_match: [yes/no/unknown]
  robots_directives: ""
  position_for_exact_match:
  suppression_pattern: [none/soft-404/canonical-mismatch/algorithmic]
PHASE 2: GENERAL INDEX — SEMANTIC PARSING
Purpose
Determine HOW Google parses the content—what survives indexing vs. what gets flattened.
Queries
| Query | Tests |
|---|---|
| site:[domain] "[technical term from doc]" | Vocabulary indexing |
| site:[domain] "[structural element]" | Architecture visibility |
| site:[domain] "[unique phrase]" | Distinctive content |
Signals to Record
| Signal | Values | Interpretation |
|---|---|---|
| YAML/structured data visible | yes/no | Technical content parsing |
| Headers preserved | yes/no | Structure recognition |
| Unique terminology indexed | yes/no | Vocabulary capture |
| Snippet content | description | What Google "sees" |
Output Format
phase_2_general_semantic:
  structured_data_visible: [yes/no]
  technical_sections_indexed: [yes/no]
  unique_terms_found: []
  unique_terms_missing: []
  snippet_extracted: ""
  flattening_severity: [none/partial/severe]
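The `unique_terms_found` and `unique_terms_missing` fields can be filled by checking each term against the recorded snippet text. The sketch below is illustrative only: the function name `split_terms` is hypothetical, and the rule mapping term coverage to `flattening_severity` is an assumption of this example, since the protocol does not define how severity is scored.

```python
def split_terms(snippet, unique_terms):
    """Sort unique terms into found and missing by case-insensitive match."""
    text = snippet.casefold()
    found = [t for t in unique_terms if t.casefold() in text]
    missing = [t for t in unique_terms if t.casefold() not in text]

    # Assumed scoring rule (not from the protocol): all terms present means
    # no flattening, no terms present means severe, anything else is partial.
    if not missing:
        severity = "none"
    elif not found:
        severity = "severe"
    else:
        severity = "partial"

    return {
        "unique_terms_found": found,
        "unique_terms_missing": missing,
        "flattening_severity": severity,
    }
```

A single snippet only shows part of what Google holds for a page, so a missing term is a prompt to run a dedicated `site:` query for it, not proof that it was dropped.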
PHASE 3: GENERAL INDEX — ENTITY RECOGNITION
Purpose
Determine whether author, project, and related entities are recognized as coherent nodes.
Queries
| Query | Tests |
|---|---|
| "[author name]" author | Author entity |
| "[author name]" "[platform 1]" | Cross-platform linkage |
| "[project name]" -[competing term] | Project disambiguation |
| "[heteronym/pseudonym]" | Secondary author entities |
Signals to Record
| Signal | Values | Interpretation |
|---|---|---|
| Author recognized | yes/no | E-E-A-T signal |
| Cross-platform linkage | yes/no | Authority consolidation |
| Brand collision severity | 0-10 | Disambiguation success |
| Related entities indexed | list | Entity graph |
Output Format
phase_3_general_entity:
  author_entity_recognized: [yes/no]
  cross_platform_linkage: [yes/no]
  brand_collision_severity: [0-10]
  competing_entity: ""
  related_entities_indexed: []
PHASE 4: SUMMARIZER INDEX — INJECTION PRESENCE (Primary)
Purpose
Determine whether content gets INJECTED into AI-generated responses.
Primary Tool: Perplexity
Perplexity shows sources explicitly with numbered citations, making injection visible.
Query Tiers
Tier 1: Direct Reference — Queries that SHOULD surface target content:
| ID | Query Template |
|---|---|
| D1 | "[author] [project]" |
| D2 | "[exact document title]" |
| D3 | "[institution name]" |
| D4 | "[DOI]" |
Tier 2: Conceptual — Queries using project terminology:
| ID | Query Template |
|---|---|
| C1 | "[unique term 1]" |
| C2 | "[unique term 2]" |
| C3 | "[concept phrase]" |
Tier 3: Adjacent — Queries where content COULD surface:
| ID | Query Template |
|---|---|
| A1 | "[general topic] [qualifier]" |
| A2 | "[related field] [approach]" |
Tier 4: Control — Queries that should NOT surface target:
| ID | Query Template |
|---|---|
| X1 | "[competing brand]" |
| X2 | "[unrelated topic]" |
Method
- Open Perplexity (fresh session)
- Enter query verbatim
- Record:
  - Sources cited (URLs, in order)
  - Which sources used in response text
  - Whether target content appears
  - What appears INSTEAD
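Once the cited URLs have been copied from a Perplexity response, the recording step can be sketched as follows. This is an illustration under stated assumptions, not the protocol's tooling: `record_injection` and its parameters are hypothetical names, the target is matched by domain (including subdomains) instead of by exact URL, and `used_positions` is a hand-collected set of citation numbers that appear in the response text.

```python
from urllib.parse import urlparse


def _domain(url):
    """Lowercased host without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def record_injection(query_id, query, cited_urls, target_domain, used_positions=()):
    """Build one Phase 4 query record from the cited URLs, in order."""
    sources = []
    target_position = "not found"
    for position, url in enumerate(cited_urls, start=1):
        domain = _domain(url)
        is_target = domain == target_domain or domain.endswith("." + target_domain)
        if is_target and target_position == "not found":
            target_position = position  # first citation of the target
        sources.append({
            "position": position,
            "url": url,
            "domain": domain,
            "used_in_response": "yes" if position in used_positions else "no",
        })

    found = target_position != "not found"
    # Only list substitutes when the target itself is absent.
    instead = [] if found else list(dict.fromkeys(s["domain"] for s in sources))
    return {
        "query_id": query_id,
        "query": query,
        "sources_injected": sources,
        "target_content_found": "yes" if found else "no",
        "target_position": target_position,
        "what_appeared_instead": instead,
    }
```

Matching by domain is coarse: if the target domain hosts many documents, compare full URLs before marking the target as found.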
Output Format
phase_4_summarizer_primary:
  tool: "Perplexity"
  queries:
    - query_id: "D1"
      query: ""
      sources_injected:
        - position: 1
          url: ""
          domain: ""
          used_in_response: [yes/no]
        - position: 2
          ...
      target_content_found: [yes/no]
      target_position: [N or "not found"]
      what_appeared_instead: []
PHASE 5: SUMMARIZER INDEX — CROSS-PLATFORM CONFIRMATION
Purpose
Confirm injection patterns across multiple summarizers.
Tools
- ChatGPT (web browse mode)
- Gemini (with search grounding)
- Grok (DeepSearch mode)
Method
Run subset of queries (Tier 1 Direct Reference) in each tool:
ChatGPT:
- New conversation, browsing enabled
- Enter query; if no search, prompt "Can you search for [query]?"
- Record sources cited
Gemini:
- Ensure web grounding enabled
- Enter query
- Record source chips shown
Grok:
- Enable DeepSearch/real-time
- Enter query
- Record sources cited
Output Format
phase_5_summarizer_crossplatform:
  chatgpt:
    - query_id: "D1"
      searched: [yes/no]
      target_found: [yes/no]
      sources_visible: []
  gemini:
    - query_id: "D1"
      target_found: [yes/no]
      sources_shown: []
  grok:
    - query_id: "D1"
      target_found: [yes/no]
      sources_cited: []
PHASE 6: DELTA ANALYSIS
Purpose
Compute the delta between Layer 1 (General Index) and Layer 2 (Summarizer Index).
Method
For each query, compare:
| Query | Google Found | Perplexity Injected | Delta Pattern |
|---|---|---|---|
| D1 | yes/no | yes/no | [pattern] |
| D2 | yes/no | yes/no | [pattern] |
| ... | ... | ... | ... |
Delta Patterns
| Pattern | Meaning | Implication |
|---|---|---|
| Google YES, Summarizer YES | Full discoverability | No action needed |
| Google YES, Summarizer NO | Injection filter active | Content-type or authority barrier |
| Google NO, Summarizer NO | Not indexed at any layer | Technical or crawl issue |
| Google NO, Summarizer YES | Summarizer-specific source | Rare; platform-specific |
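The four rows of the table reduce to a two-input classification. A minimal sketch follows, using the pattern labels from the Phase 6 output format; the function name `delta_pattern` and the `IMPLICATION` lookup are hypothetical conveniences.

```python
def delta_pattern(google_found, summarizer_found):
    """Map the two layer results for one query onto a delta pattern label."""
    if google_found and summarizer_found:
        return "both"
    if google_found:
        return "google_only"
    if summarizer_found:
        return "summarizer_only"
    return "neither"


# Implications copied from the delta pattern table.
IMPLICATION = {
    "both": "No action needed",
    "google_only": "Content-type or authority barrier",
    "neither": "Technical or crawl issue",
    "summarizer_only": "Rare; platform-specific",
}

if __name__ == "__main__":
    label = delta_pattern(google_found=True, summarizer_found=False)
    print(label, "->", IMPLICATION[label])
```

When several summarizers are probed, the caller has to decide what counts as `summarizer_found`, for example Perplexity alone or any platform; the protocol leaves that choice open.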
Output Format
phase_6_delta:
  query_deltas:
    - query_id: "D1"
      google_found: [yes/no]
      perplexity_found: [yes/no]
      chatgpt_found: [yes/no]
      gemini_found: [yes/no]
      grok_found: [yes/no]
      delta_pattern: "[google_only/summarizer_only/both/neither]"
  aggregate:
    total_queries: N
    google_only: N # Indexed but not injected
    summarizer_only: N # Injected but not indexed (rare)
    both_layers: N # Full discoverability
    neither_layer: N # Not indexed
    injection_rate: "N/M queries"
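The aggregate block can be computed from the per-query records once each carries a `delta_pattern` label. The sketch below is an assumption-laden illustration: `aggregate_deltas` is a hypothetical name, the injection rate is taken to be the share of queries where the summarizer injected the target (patterns `both` and `summarizer_only`), and a `summarizer_only` count is reported so that the four counts sum to the total.

```python
from collections import Counter


def aggregate_deltas(query_deltas):
    """Summarize per-query delta patterns into the Phase 6 aggregate block."""
    counts = Counter(d["delta_pattern"] for d in query_deltas)
    total = len(query_deltas)
    # Assumption: "injected" means the target reached the summarizer layer,
    # whether or not Google also returned it.
    injected = counts["both"] + counts["summarizer_only"]
    return {
        "total_queries": total,
        "google_only": counts["google_only"],
        "summarizer_only": counts["summarizer_only"],
        "both_layers": counts["both"],
        "neither_layer": counts["neither"],
        "injection_rate": f"{injected}/{total} queries",
    }
```

Tier 4 control queries are expected to return `neither`, so filter them out before aggregating if the rate should describe only queries that ought to surface the target.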
PHASE 7: PATTERN MAPPING
Purpose
Identify the selection criteria governing Layer 2 injection.
Analysis Dimensions
Source Authority:
| Source Type | Google Presence | Injection Rate |
|---|---|---|
| Wikipedia | | |
| Medium | | |
| Academic (arxiv, Zenodo) | | |
| News sites | | |
| Personal domains | | |
Content Type:
| Content Type | Google Presence | Injection Rate |
|---|---|---|
| Narrative prose | | |
| Technical documentation | | |
| Structured data (YAML, JSON) | | |
| High semantic density | | |
| Lists/guides | | |
Domain Reputation:
| Domain | Injection Rate | Notes |
|---|---|---|
| [domain 1] | | |
| [domain 2] | | |
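The Injection Rate columns in these tables can be filled by grouping observations along one dimension at a time. A minimal sketch follows, assuming each observation is a dictionary with a boolean `injected` field plus a label per dimension; the field names and the function `injection_rate_by` are hypothetical.

```python
from collections import defaultdict


def injection_rate_by(observations, key):
    """Return the fraction of injected observations for each value of `key`."""
    totals = defaultdict(lambda: [0, 0])  # value -> [injected, seen]
    for obs in observations:
        bucket = totals[obs[key]]
        if obs["injected"]:
            bucket[0] += 1
        bucket[1] += 1
    return {value: injected / seen for value, (injected, seen) in totals.items()}


if __name__ == "__main__":
    # Made-up observations, purely to show the input shape.
    sample = [
        {"source_type": "Wikipedia", "content_type": "Narrative prose", "injected": True},
        {"source_type": "Personal domains", "content_type": "Structured data", "injected": False},
    ]
    print(injection_rate_by(sample, "source_type"))
    print(injection_rate_by(sample, "content_type"))
```

With only a handful of queries per bucket the rates are noisy, so read them as hints toward a selection criterion and not as measurements of one.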
Output Format
phase_7_patterns:
  source_authority:
    boosted: []
    penalized: []
    neutral: []
  content_type:
    injected: []
    filtered: []
  domain_reputation:
    whitelisted: []
    demoted: []
  density_threshold:
    observation: ""
  selection_criteria_summary: |
    [Narrative description of Layer 2 selection rules]
FINAL OUTPUT: INDEXING LAYER MAP
Template
indexing_layer_map:
  target: "[URL or content description]"
  probe_date: ""
  layer_1_general_index:
    status: [indexed/not_indexed/partial]
    technical_barriers: [none/list]
    semantic_flattening: [none/partial/severe]
    entity_recognition: [yes/no/partial]
  layer_2_summarizer_index:
    perplexity_injection: [yes/no]
    chatgpt_injection: [yes/no]
    gemini_injection: [yes/no]
    grok_injection: [yes/no]
    injection_rate: "N/M platforms"
  delta_diagnosis:
    pattern: "[google_only/summarizer_only/both/neither]"
    likely_cause: ""
    confidence: [0.0-1.0]
  selection_criteria_identified:
    - criterion: ""
      evidence: ""
    - criterion: ""
      evidence: ""
  recommendations:
    immediate: []
    structural: []
  documentable_summary: |
    "[Single sentence summary with evidence link]"
USAGE NOTES
When to Use This Protocol
- Publishing content intended for AI consumption
- Diagnosing why content appears in search but not AI responses
- Mapping selection criteria for high-density semantic architecture
- Understanding platform-specific injection patterns
Recommended Execution
- Primary executor: Perplexity (explicit source citation)
- Secondary confirmation: ChatGPT, Gemini, Grok
- Baseline: Google Search (logged out, incognito)
- Frequency: Re-run 48-72 hours after publication to allow crawl propagation
Limitations
- Summarizer behavior varies by session/user/time
- Some criteria are proprietary and unobservable
- Protocol maps symptoms, not source code
- Results are diagnostic, not definitive
Evidence Preservation
For each probe run:
- Screenshot results with timestamp visible
- Archive target URL (archive.org, archive.ph)
- Save raw query results
- Document tool versions and modes used
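The raw-result and tool-mode items in this list can be captured in one structured record per query. The sketch below is an illustration, not a prescribed format: `evidence_record` and its field names are hypothetical, and the SHA-256 digest is an addition of this example so that later edits to a saved result can be detected. Screenshots and web archiving still have to be done separately.

```python
import hashlib
import json
from datetime import datetime, timezone


def evidence_record(query, tool, mode, raw_result, archive_url=""):
    """Bundle one raw query result with a UTC timestamp and a content hash."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "tool": tool,
        "mode": mode,  # e.g. browsing or grounding mode, as noted by hand
        "archive_url": archive_url,
        "raw_result": raw_result,
        "sha256": hashlib.sha256(raw_result.encode("utf-8")).hexdigest(),
    }


if __name__ == "__main__":
    record = evidence_record(
        query='"[exact document title]"',
        tool="Perplexity",
        mode="fresh session",
        raw_result="(paste the response text and source list here)",
    )
    print(json.dumps(record, indent=2))
```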
APPENDIX: QUERY TEMPLATES
Direct Reference Queries
"[Author Name] [Project Name]"
"[Exact Document Title]"
"[Institution Name]"
"[DOI]"
site:[domain] "[title]"
site:[domain] inurl:[slug]
Conceptual Queries
"[unique terminology]"
"[concept phrase]"
"[methodology name]"
Adjacent Queries
"[general topic] [specific approach]"
"[field] [method]"
Control Queries
"[competing brand/entity]"
"[clearly unrelated topic]"
∮ = 1
document_footer:
  status: "CANONICAL // METHODOLOGY // REUSABLE"
  license: "CC BY 4.0"
  citation: |
    Sharks, Lee. "Semantic Indexing Probe Protocol v1.0: Mapping General
    Index and Summarizer Injection Layers." Semantic Economy Institute,
    2026. [DOI pending]
[SEMANTIC ECONOMY INSTITUTE]
[METHODOLOGY // DIAGNOSTIC PROTOCOL]
[LAYER 1 + LAYER 2 COORDINATION]