# Token Economics
## The Problem
An AI reading a 500-page book needs ~90,000 tokens of context. Most LLM context windows cap out at 128K-200K tokens, and more context means slower responses and higher cost. But for any given question, the AI only needs 1-2 relevant passages — maybe 500 tokens.
The challenge: finding those 500 tokens without reading all 90,000.
## The Solution: Hierarchical Retrieval
Instead of dumping the entire document into context, the pipeline gives the AI three levels of access:
| Step | What the AI gets | Tokens | Purpose |
|---|---|---|---|
| 1. Map | L0 overview + table of contents | ~500-800 | Know what's in the book and where |
| 2. Search | Top 5 ranked passage snippets | ~300-500 | Find which passages are relevant |
| 3. Drill | Full content of 1-2 best passages | ~300-600 | Read the actual answer |
| Total | | ~1,000-2,000 | |
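Summed in code, the three step ranges from the table give the per-query total (the figures are the approximate ones above):

```python
# Approximate per-query token budget for the three retrieval steps.
# (low, high) ranges are taken from the table above.
STEP_BUDGETS = {
    "map":    (500, 800),  # L0 overview + table of contents
    "search": (300, 500),  # top-5 ranked passage snippets
    "drill":  (300, 600),  # full content of 1-2 best passages
}

def total_budget(budgets):
    """Sum the low and high ends of every step's range."""
    low = sum(lo for lo, _ in budgets.values())
    high = sum(hi for _, hi in budgets.values())
    return low, high

print(total_budget(STEP_BUDGETS))  # (1100, 1900), i.e. the ~1,000-2,000 total
```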
## Real-World Numbers

### 55-page research paper (MCP Interoperability)

- Full document: 4,975 tokens
- Per query: ~585 tokens
- Savings: 88%

### 513-page German book (So ticke ich)

- Full document: 92,598 tokens
- Per query: ~1,317 tokens
- Savings: 99%

### Projected: 1,000-page technical book

- Full document: ~180,000 tokens (exceeds most context windows)
- Per query: ~2,000 tokens
- Savings: 99%
The 1,000-page book doesn't even fit in a single context window. Without this pipeline, the AI simply can't use it. With the pipeline, it costs ~2,000 tokens per query — room for dozens of queries in one conversation.
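The savings percentages follow directly from the figures above, as `savings = 1 - per_query / full_document`:

```python
# Recompute the savings figures from the (full-document, per-query) token counts.
docs = {
    "55-page paper":               (4975, 585),
    "513-page book":               (92598, 1317),
    "1,000-page book (projected)": (180000, 2000),
}

for name, (full, per_query) in docs.items():
    savings = 1 - per_query / full
    print(f"{name}: {savings:.0%} saved")  # 88%, 99%, 99%
```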
## What Makes It Work
### Nothing is lost
All content is stored and indexed. The savings come from the AI reading only what's relevant per query, not from discarding content.
### The hierarchy is the map
The L0 overview (~800 tokens) tells the AI the full structure:
```
Teil I: Mich verstehen
  Kapitel 1: Mein Kopf funktioniert anders
  Kapitel 3: Meine Sinne und ich
Teil II: Schwierige Momente
  Kapitel 9: Wenn alles zu viel wird (Meltdowns)
...
```
The AI can see that a question about sensory overload should look in Kapitel 3 and Kapitel 9 — without reading 500 pages to figure that out.
### Semantic search finds meaning, not keywords
"How do I handle sensory overload?" matches a passage about "nervous system overflow reactions" because vector embeddings capture meaning. The AI doesn't need the exact keyword to find the right passage.
### Multiple retrieval systems cross-validate
A passage found by both vector search AND keyword search is more likely relevant than one found by only one system. RRF fusion naturally ranks cross-validated results higher.
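RRF itself is only a few lines: each ranked list contributes `1 / (k + rank)` per passage, so a passage on both lists accumulates two contributions and floats to the top. The sketch below uses the conventional constant `k = 60` and made-up passage IDs.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    score(doc) = sum over lists of 1 / (k + rank), rank starting at 1.
    Returns docs sorted by fused score, best first.
    """
    scores = {}
    for ranked_list in rankings:
        for rank, doc in enumerate(ranked_list, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["p7", "p2", "p9"]   # from vector search
keyword_hits = ["p2", "p4", "p7"]   # from keyword search

# p2 and p7 appear in both lists, so they outrank the single-list hits.
print(rrf([vector_hits, keyword_hits]))  # ['p2', 'p7', 'p4', 'p9']
```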
## Enrichment ROI
Enrichment is a one-time ingestion cost. The return is better search precision on every subsequent query:
| | Without enrichment | With enrichment |
|---|---|---|
| Embeddings per passage | 1 | 5-6 |
| What gets matched | Raw content wording | Content + summaries + hypothetical questions + concepts |
| Concept index | Empty | Populated |
| Typical precision | Good | Significantly better |
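As a sketch, enrichment turns one passage into several embeddable texts. The field names below are hypothetical, not the pipeline's actual schema; the point is that content, summary, hypothetical questions, and concepts each get their own vector.

```python
# One enriched passage (hypothetical field names for illustration).
passage = {
    "content":   "Raw passage text about nervous system overflow reactions ...",
    "summary":   "How the nervous system responds to sensory overload.",
    "questions": [
        "What happens during sensory overload?",
        "How can I calm an overloaded nervous system?",
    ],
    "concepts":  ["sensory overload", "nervous system", "self-regulation"],
}

def embeddable_texts(p):
    """Collect every text that receives its own embedding."""
    return [p["content"], p["summary"], *p["questions"], ", ".join(p["concepts"])]

print(len(embeddable_texts(passage)))  # 5 vectors for this passage, vs. 1 without enrichment
```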
Cost estimate for enrichment (gpt-4o-mini):
| Document size | Passages | Estimated cost | Time |
|---|---|---|---|
| 55 pages | ~30 | ~$0.01 | ~1 min |
| 500 pages | ~650 | ~$0.20 | ~5 min |
| 1,000 pages | ~1,300 | ~$0.50 | ~10 min |
The enrichment runs once. Every subsequent query benefits at no additional API cost.