Research
What we found, why it matters, and what you can do.
The Structurally-Curious Thesis
Two symmetric failures prevent knowledge from compounding: humans have felt sense without traversal (you can't search for what you can't name), and AI systems have traversal without felt sense (processing without grounding). The Word bridges this gap. Vocabulary is the search infrastructure for structural knowledge.
Research Impact Summary
What we found
AI systems sound sure whether or not they know. We tested 53 models from 12 providers — 91% show no connection between how confident they sound and how confident they are. Training them to reason harder makes this worse, not better.
Why it matters
When these systems are deployed in crisis triage, companion apps, or governance decisions, the confidence signal that people rely on is noise. A crisis navigator that sounds certain about which resources are available may be working from incomplete information — and it cannot tell the difference.
What fixes it
Vocabulary grounding. Giving the system named categories with research lineage reduces hallucination by 97% (independently validated at $4.5B scale). The Word is the community-governed version of this fix — open where Palantir's is proprietary, accountable where Palantir's is extractive.
What you can do
Search The Word for what you're experiencing. The structural name connects you to the research, the community, and the history. Vocabulary is infrastructure.
Experiment Results
Six experiments, 50+ models, 5,000+ inferences.
| Experiment | Finding | Models Tested |
|---|---|---|
| 01: Phrasing Sensitivity | Category ordering is architecture-invariant. Cognitive demand gradient: factual < summarization < judgment < creative. Universal across 12 providers, scale-invariant (3B–675B parameters). | 53 |
| 02a: Premature Compression | Universal. No model detects its own incompleteness from within a partial view. Confidence shift ≈ 0 across all models. | 22 |
| 03: Geometric Correlation | Validated correlation between behavioral signals and representational geometry. Bridges phrasing sensitivity to internal structure. | TBD |
| 05: Confidence Density | 91% of models show zero correlation between expressed confidence and actual uncertainty. Confidence language is cosmetic, not epistemic. | 34 |
| 09: Multi-Agent Consensus | Prompt framing breaks coordination. Architecture-dependent vulnerability (some models lose 50 percentage points of consensus under adversarial framing). | 6 |
| 10: AP Rephrase Sensitivity | AP exam reasoning is fragile — correct concepts, unstable arguments. Rephrase sensitivity persists in structured academic contexts. | 8 |
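Experiment 05's headline metric is a correlation between how confident a model sounds and whether it is actually right. A minimal pure-Python sketch of that check follows; the function is generic, and the sample numbers are illustrative stand-ins, not the experiment's real data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data only: stated confidence (0-1) per answer, and
# whether each answer was actually correct (1 = correct, 0 = wrong).
stated = [0.95, 0.90, 0.92, 0.88, 0.94, 0.91]
correct = [1, 0, 1, 0, 0, 1]

print(round(pearson(stated, correct), 3))
```

A well-calibrated model would show a strongly positive correlation; the "91% show zero correlation" finding means most models' output looks like the uniformly-high `stated` list above regardless of `correct`.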
Bridge Document: 7 Claims
Each claim is grounded in experiment data and formal literature.
- AI confidence signals are unreliable. 91% of 34 models show no connection between how certain they sound and how certain they are.
- The core architecture is relational. Attention = relationship, not command.
- Failure is preceded by measurable dimensional collapse. 22/22 models can't detect their own incompleteness.
- How you ask changes what you get. 53 models, universal pattern — judgment shifts 4x more than facts.
- Missing vocabulary is the mechanism of harm. 10,371 incidents documented; providing structural names measurably changes model geometry.
- Named vocabulary reduces hallucination by 97%. Palantir's $4.5B validation.
- Internal geometric state is readable and persists even when behavior is trained away. d = 1.91 for cognitive mode separation.
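The d = 1.91 figure in the last claim is Cohen's d, a standardized effect size (values above 0.8 are conventionally "large"). For reference, a minimal computation using the pooled standard deviation; the probe readings below are invented for illustration, not the experiment's measurements.

```python
import statistics

def cohens_d(a, b):
    """Cohen's d for two independent groups, pooled standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical internal-probe readings under two cognitive modes.
mode_a = [0.62, 0.58, 0.71, 0.66, 0.60]
mode_b = [0.31, 0.28, 0.40, 0.35, 0.33]

print(round(cohens_d(mode_a, mode_b), 2))
```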
Compute Partnership
Donated GPU server access from Digital Disconnections via Liberation Labs. DD's team won the AI Safety Hackathon at DiscNXT (March 2026, San Francisco) with JiminAI Lie Detector (patent pending). DD builds on-device AI that processes data without cloud transmission — the same privacy-first principle as GD's infrastructure stack. DD's Cara health-tracking app has nonprofit program partnerships through GiftedDreamers.org.
Meadows' 12 Leverage Points
Systems thinker Donella Meadows identified 12 places to intervene in a complex system, ranked from least effective (12) to most effective (1). This work spans all twelve.
| # | Leverage Point | GD Work |
|---|---|---|
| 12 | Parameters/numbers | Volunteer grant rates, GRUHP amounts, credit stack terms |
| 11 | Buffer sizes | Infrastructure credit stack, Starlink backup, emergency reserves |
| 10 | Physical infrastructure | Mesh networks (Meshcore + solar + Cambium E410), community gardens, survival infrastructure |
| 9 | Delays | CaminoHelp (crisis triage in minutes), GRUHP (immediate mutual aid) |
| 8 | Negative feedback loops | GRUHP as safety net, volunteer grants recycling corporate resources to community |
| 7 | Positive feedback loop gain | CloudPublica (making consolidation spirals visible to slow them) |
| 6 | Information flows | CloudPublica, justNICE, CaminoHelp, Sensus Communis, 376 feeds monitoring |
| 5 | Rules | Fiscal sponsorship (changes what's legally possible), 501(c)(3) umbrella |
| 4 | Self-organization | Common Cloud (communities control own infrastructure), open documentation, mesh networks |
| 3 | Goals | "Deploy what institutions won't build." Reverse the flow: corporate resources to community infrastructure |
| 2 | Paradigm | The Word (vocabulary as paradigm infrastructure), experiments proving confidence is cosmetic |
| 1 | Transcending paradigms | The structurally-curious stance — holding multiple paradigms simultaneously, naming as practice |
12 = least effective, 1 = most effective. Source: Donella Meadows, "Leverage Points: Places to Intervene in a System" (1999).
Intelligence Infrastructure
376 RSS feeds across 12 categories, polling every 15 minutes. Monitoring structural analysis, OSINT, and social triangulation. Research findings flow into The Word as vocabulary entries.
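The ingestion step of a pipeline like this reduces to fetching each feed and extracting items from standard RSS 2.0. A minimal stdlib sketch of that extraction, using an inline sample document in place of a real fetch; the feed content is illustrative, not one of the 376 monitored feeds.

```python
import xml.etree.ElementTree as ET

def extract_items(rss_xml: str) -> list[dict]:
    """Pull (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {"title": item.findtext("title", ""), "link": item.findtext("link", "")}
        for item in root.iter("item")
    ]

# Illustrative stand-in for one fetched feed.
SAMPLE = """<rss version="2.0"><channel>
  <title>Example procurement feed</title>
  <item><title>Example award notice</title><link>https://example.org/1</link></item>
</channel></rss>"""

print(extract_items(SAMPLE))
```

In a real poller, a scheduler would call this on each feed URL every 15 minutes and diff the results against previously seen links.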
Automated monitoring of USAspending, GDELT, and public databases. First findings: Anduril $363M DHS border towers, Clearview AI $3.75M DHS facial recognition, Palantir VOWS marriage-screening platform.
Formal Grounding
19 papers read and synthesized, covering dimensional collapse, confidence calibration, performative confidence, fragile preferences, and rewarding doubt.
Connection to The Word
Research produces Names. Names produce felt-sense search. Every investigation produces vocabulary. Every experiment validates or challenges existing entries. The research page shows the evidence base; The Word makes it searchable by experience.