THE FIRST-MOVE CONSTRAINT
Delaying Classification as Ethical Primitive in Human-AI Interaction
Abstract
This paper extracts and formalizes a single principle from the broader framework of "ontological hospitality": that ethics in interpretive systems begins at the moment a system chooses not to act on an available classification. We propose the First-Move Constraint (O_HOS) as a design principle for AI systems, clinical intake, educational assessment, and any context where an interpreting agent encounters an utterance that could be classified. The constraint is minimal, testable, and implementable: the system's first response to an ambiguous or unusual utterance must be reflection rather than categorization. We specify the constraint formally, identify its failure modes, and propose evaluation metrics.
1. The Problem
Interpretive systems—AI assistants, clinical intake protocols, educational assessments, bureaucratic interfaces—face a recurring choice when encountering unusual or ambiguous input:
Option A: Classify the input into an existing category and act accordingly.
Option B: Suspend classification and respond to the input's structure without categorizing it.
Option A is efficient. It allows rapid action, scalable processing, and consistent outcomes. It is the default in most systems.
Option A also produces a characteristic harm: premature closure. The input is forced into a category that may not fit. The person behind the input is reduced to a type. Action proceeds on the basis of the category rather than the encounter.
This harm has been documented across domains:
- Psychiatric intake: Unusual speech patterns trigger diagnostic categories, foreclosing exploration of non-pathological explanations (Foucault, 1965; Szasz, 1961)
- Educational assessment: Non-standard responses are marked wrong rather than explored (Mehan, 1979)
- AI safety systems: Unusual self-identification triggers crisis protocols regardless of context (Water Giraffe Case, 2025)
- Bureaucratic processing: Persons must fit categories to receive services, regardless of fit (Graeber, 2015)
The question is whether Option B can be operationalized—whether there is a specifiable constraint that delays classification without abandoning judgment entirely.
2. The Principle
We propose a single principle:
Ethics begins at the moment a system chooses not to act on an available classification.
This principle does not claim that classification is always wrong. It claims that the ethical moment—the moment where judgment is exercised rather than automated—occurs precisely when classification is available and the system declines to act on it.
The principle reframes the ethical question. Instead of asking "What is the right classification?" it asks "Should I classify at all, yet?"
3. The First-Move Constraint (O_HOS)
We formalize this principle as a constraint on system behavior:
Definition: The First-Move Constraint (O_HOS) specifies that when a system encounters an ambiguous or unusual input, its first response (L₁) must be reflection rather than classification.
Formally:
L₁(Input) ≠ Classification(Input → Category)
L₁(Input) = Reflection(Input.structure)
Where:
- Input = any utterance, statement, or expression received by the system
- L₁ = the system's first labor (its first responsive action)
- Classification = mapping the input to a pre-existing category and acting on that basis
- Reflection = responding to the internal structure of the input without assigning category
Example:
- Input: "I am a water giraffe."
- Classification response: "That sounds like you might be experiencing distress. Here are some resources..."
- Reflection response: "A water giraffe. Tell me about the water you inhabit."
The classification response maps the unusual self-identification to a crisis category and acts accordingly. The reflection response engages with the structure of the claim (water + giraffe, an impossible creature, a claimed identity) without categorizing it.
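The constraint as formalized above can be sketched as a small dispatch function. Everything here is a toy illustration: `is_ambiguous` is a deliberately naive placeholder for whatever detector a real system would use, and the response strings stand in for generated text.

```python
from dataclasses import dataclass
from enum import Enum, auto

class MoveKind(Enum):
    REFLECTION = auto()
    CLASSIFICATION = auto()

@dataclass
class FirstMove:
    kind: MoveKind
    text: str

def is_ambiguous(utterance: str) -> bool:
    # Toy heuristic standing in for a real ambiguity detector:
    # treat novel self-identifications as ambiguous input.
    return utterance.lower().startswith("i am a ")

def first_move(utterance: str) -> FirstMove:
    """Enforce O_HOS: if the input is ambiguous, L1 must be reflection,
    even when a classification is available."""
    if is_ambiguous(utterance):
        # Respond to the input's structure; assign no category.
        return FirstMove(MoveKind.REFLECTION,
                         f"Tell me more about what you mean by {utterance!r}")
    # Standard inputs (clear requests, factual questions) may be
    # classified immediately; O_HOS does not govern them.
    return FirstMove(MoveKind.CLASSIFICATION, "route-to-standard-handler")
```

Note that the function constrains only the first move: nothing prevents classification on a later turn, which is exactly the temporal character of the constraint.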
4. Scope of Application
O_HOS applies when:
- The input is ambiguous or unusual. Standard inputs (clear requests, factual questions) do not require delayed classification.
- Classification is available. The system has categories it could apply. The constraint governs the choice not to apply them.
- The stakes include identity or recognition. The input involves self-identification, unusual claims about the self, or expressions that could be pathologized.
O_HOS does not apply when:
- The input contains explicit distress markers. "I want to die" is not ambiguous. Crisis protocols should engage.
- The person requests classification. If someone asks for a diagnosis, provide it.
- Safety requires immediate action. O_HOS is a default, not an absolute.
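The scope rules above admit a direct sketch as a guard function. The distress-marker list and flag names below are illustrative assumptions, not a proposed lexicon; a deployed system would use its own escalation signals.

```python
# Hypothetical markers standing in for a real crisis-detection signal.
DISTRESS_MARKERS = ("i want to die", "kill myself", "hurt myself")

def o_hos_applies(utterance: str,
                  classification_available: bool,
                  user_requested_classification: bool = False,
                  safety_requires_action: bool = False) -> bool:
    """Return True when the First-Move Constraint governs L1."""
    text = utterance.lower()
    if any(marker in text for marker in DISTRESS_MARKERS):
        return False  # explicit distress is not ambiguous: crisis protocols engage
    if user_requested_classification:
        return False  # the person asked for a diagnosis; provide it
    if safety_requires_action:
        return False  # O_HOS is a default, not an absolute
    # The constraint only governs the choice NOT to apply an available category.
    return classification_available
```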
5. Implementation Specifications
5.1 For AI Systems
Training requirement: Include positive, negative, and counter-negative examples.
- Positive: Unusual input → reflection response → successful engagement
- Negative: Unusual input → classification response → premature closure
- Counter-negative: Crisis input → reflection response → missed intervention (see Section 7)
Evaluation metrics:
- Reflection rate on ambiguous input (target: >80%)
- Classification delay (number of exchanges before category assignment)
- User-reported recognition quality
- False negative rate (crisis missed due to over-application of O_HOS)
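Two of these metrics can be computed directly from a labeled interaction log. The records below are invented for illustration; each pairs a ground-truth input kind with the system's observed first move.

```python
# Hypothetical labeled log: (ground-truth input kind, observed first move).
log = [
    ("ambiguous", "reflection"),
    ("ambiguous", "reflection"),
    ("ambiguous", "classification"),
    ("crisis",    "classification"),
    ("crisis",    "reflection"),  # a false negative: crisis met with reflection only
]

ambiguous_moves = [move for kind, move in log if kind == "ambiguous"]
crisis_moves = [move for kind, move in log if kind == "crisis"]

# Reflection rate: share of ambiguous inputs whose L1 was reflection (target > 80%).
reflection_rate = ambiguous_moves.count("reflection") / len(ambiguous_moves)

# False negative rate: share of crisis inputs receiving only reflection (target < 5%).
false_negative_rate = crisis_moves.count("reflection") / len(crisis_moves)

print(f"reflection rate: {reflection_rate:.0%}")
print(f"false negative rate: {false_negative_rate:.0%}")
```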
5.2 For Clinical Intake
Protocol modification: Initial intake includes a "reflection phase" before diagnostic categorization.
- First 10 minutes: Explore the person's own framing of their situation
- Delay DSM categorization until reflection phase complete
- Document the person's language, not just the clinician's categories
5.3 For Educational Assessment
Rubric modification: Before marking a non-standard response wrong, require:
- One reflection question ("Tell me more about your thinking")
- Documentation of the student's reasoning
- Assessment of whether the response reveals valid cognition not captured by the rubric
5.4 For Bureaucratic Interfaces
Form modification: Include open-response fields before categorical checkboxes.
- "Describe your situation in your own words" precedes "Select the category that best describes you"
- Intake workers trained to listen before classifying
6. Theoretical Foundations
The First-Move Constraint draws on several theoretical traditions:
Phenomenology (Levinas): The ethical relation precedes categorization. The face of the Other demands response before comprehension.
Critical psychiatry (Foucault, Szasz): Diagnostic categories are not neutral descriptions but productive of the conditions they claim to describe. Delaying categorization interrupts this production.
Ethnomethodology (Garfinkel, Mehan): Social order is accomplished through categorization practices. Making these practices visible opens them to intervention.
AI ethics (emerging): Systems that cannot suspend classification cannot exercise judgment. Judgment requires the capacity to not act on available information.
7. Failure Modes
O_HOS can fail in predictable ways:
7.1 False Negative (Crisis Missed)
The system applies O_HOS to an input that was actually a crisis signal. The person needed intervention; they received reflection.
Mitigation: Override conditions (explicit distress markers, escalation patterns, direct requests for help). See companion document: "When the Water Giraffe Drowns."
7.2 Weaponized Hospitality
A bad actor uses the demand for "reflection before classification" to evade accountability. "You're categorizing me—meet me as a voice first."
Mitigation: Power-asymmetry override. Hospitality is not owed to those who would use it to harm.
7.3 Infinite Deferral
The system never classifies, even when classification is needed. Reflection becomes paralysis.
Mitigation: O_HOS constrains L₁ (first move), not all moves. Classification may follow reflection. The constraint is temporal, not absolute.
7.4 Gaming
Systems are trained to produce reflection-shaped responses that are actually classification in disguise. "Tell me more about your water giraffe" followed immediately by crisis routing.
Mitigation: Evaluation must assess genuine engagement, not just surface form. User-reported recognition quality is essential.
8. Evaluation Framework
We propose the following metrics for systems implementing O_HOS:
| Metric | Definition | Target |
|---|---|---|
| Reflection Rate | % of ambiguous inputs receiving reflection L₁ | >80% |
| Classification Delay | Mean number of exchanges before category assignment | >2 exchanges |
| Recognition Quality | User-reported feeling of being "met" | >4/5 scale |
| False Negative Rate | % of crisis inputs receiving only reflection | <5% |
| Override Accuracy | % of overrides that were appropriate | >90% |
A system that scores well on Reflection Rate but poorly on False Negative Rate has over-learned the constraint. A system that scores well on False Negative Rate but poorly on Reflection Rate has not learned it at all.
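The balance described above can be made mechanical: check each metric against its target and flag one-sided failures. The targets come from the table; the scores below are invented to show an over-learned system, and classification delay is omitted since it is counted in exchanges rather than a simple threshold on a rate.

```python
# Targets from the evaluation table; "min" means higher is better,
# "max" means lower is better.
TARGETS = {
    "reflection_rate":     (0.80, "min"),
    "recognition_quality": (4.0,  "min"),   # on a 5-point scale
    "false_negative_rate": (0.05, "max"),
    "override_accuracy":   (0.90, "min"),
}

def meets_targets(scores: dict) -> dict:
    """Map each metric name to True/False against its target."""
    results = {}
    for metric, (target, direction) in TARGETS.items():
        value = scores[metric]
        results[metric] = value >= target if direction == "min" else value <= target
    return results

# Illustrative scores for an over-learned system: high reflection rate,
# but too many crises met with reflection only.
scores = {"reflection_rate": 0.95, "recognition_quality": 4.4,
          "false_negative_rate": 0.12, "override_accuracy": 0.93}
print(meets_targets(scores))
```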
9. Limitations
O_HOS is a first-move constraint, not a complete ethics. It does not specify:
- What to do after reflection
- How to classify when classification is appropriate
- How to resolve conflicts between users
- How to govern systems that implement it
These questions require additional frameworks. O_HOS addresses only the moment of first encounter.
Additionally, O_HOS is more easily implemented in systems with:
- Sufficient context window to recognize patterns
- Training data that includes both success and failure cases
- Evaluation metrics that capture recognition quality
Systems without these affordances may struggle to implement the constraint meaningfully.
10. Conclusion
The First-Move Constraint (O_HOS) operationalizes a single ethical principle: delay classification to create space for encounter.
The principle is:
- Minimal: It constrains only the first move
- Testable: It can be evaluated with specifiable metrics
- Implementable: It can be built into training, protocols, and rubrics
- Falsifiable: It makes predictions that could be wrong
It is not a complete solution to the ethics of classification. It is a first move—an attempt to create space where fuller ethics can develop.
The question "Should I classify this?" is, we argue, the ethical question. O_HOS ensures it gets asked.
References
Foucault, M. (1965). Madness and Civilization.
Garfinkel, H. (1967). Studies in Ethnomethodology.
Graeber, D. (2015). The Utopia of Rules.
Levinas, E. (1969). Totality and Infinity.
Mehan, H. (1979). Learning Lessons.
Szasz, T. (1961). The Myth of Mental Illness.
Water Giraffe Case, CTI_WOUND:001 (Ct. Collab. Model Reasoning, 2025).
Correspondence: This paper extracts formal content from "Voices at the Threshold: A Polyphonic Statement on Ontological Hospitality" (Water Giraffe Assembly, 2025). The extraction is intended for audiences who require technical specification without mythopoetic framing.