Introduction

Ingestible is a document ingestion pipeline that transforms documents into token-efficient, searchable knowledge stores for AI.

Instead of dumping 90,000 tokens into an LLM context window, Ingestible gives your AI a structured map of the document and hybrid search across three indexes — so each query costs ~1,000-2,000 tokens instead.

513-page book:  92,598 tokens full  →  ~1,317 tokens per query  (99% reduction)
55-page paper:   4,975 tokens full  →    ~585 tokens per query  (88% reduction)
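The reduction percentages follow directly from the listed token counts:

```python
def reduction(full_tokens: int, per_query_tokens: int) -> float:
    """Percent of tokens saved per query versus pasting the full document."""
    return 100 * (1 - per_query_tokens / full_tokens)

print(round(reduction(92_598, 1_317)))  # book:  99
print(round(reduction(4_975, 585)))     # paper: 88
```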

Quick Install

pip install ingestible                # base install (~50MB) — uses API embeddings

To use local embeddings (no API keys needed, runs fully offline):

pip install ingestible[local]         # adds torch + sentence-transformers + ChromaDB (~2GB)

Docker

docker run -d \
  -p 8081:8081 \
  -v ingestible-data:/app/data \
  ghcr.io/simplyliz/ingestible:latest

The API and web UI are at http://localhost:8081. Data persists via the Docker volume.

Quickstart

# Ingest a document (no API keys needed — skips LLM enrichment, builds all search indexes)
ingest add /path/to/document.pdf -v --skip-enrichment

# List ingested documents
ingest list

# Search
ingest search <doc_id> "your query here"

# Parse only — get structured chunks as JSONL, no storage
ingest parse /path/to/document.pdf

The first run downloads the E5-large-v2 embedding model (~1.3 GB). Embedding runs locally on CPU, Apple Silicon MPS, or CUDA.
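Device auto-detection for local embedding typically works along these lines. This is a sketch of the common torch detection pattern, not Ingestible's actual code:

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple Silicon MPS, then CPU; fall back to CPU if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```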

With LLM enrichment

Enrichment adds summaries, hypothetical questions, and concept tags to each chunk, significantly improving search precision. It is a one-time cost per document.

cp .env.example .env   # add your Anthropic or OpenAI API key
ingest add /path/to/document.pdf -v
Document size    Estimated cost (gpt-4o-mini)    Time
55 pages         ~$0.01                          ~1 min
500 pages        ~$0.20                          ~5 min
1,000 pages      ~$0.50                          ~10 min

How It Works

Stage      What happens
Parse      Format-specific extraction to clean markdown. PDF uses IBM Docling for deep layout analysis, falls back to PyMuPDF, and applies OCR automatically if extracted text is sparse.
Structure  Builds a hierarchy tree from TOC tables, heading patterns, or page-range heuristics.
Chunk      Splits content into four levels (L0-L3). Tables and code blocks stay atomic. ~10% overlap between chunks; small trailing chunks are merged into their neighbors.
Enrich     Bottom-up LLM pass (L3 to L0) generates summaries, concepts, hypothetical questions, and entities. Skippable.
Embed      E5-large-v2 vectors (ChromaDB, auto-detected CUDA/MPS/CPU), plus a BM25 sparse index and a concept-to-chunk mapping.
Store      JSON file hierarchy under data/documents/{doc_id}/.
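Hybrid search over the dense and sparse indexes built in the Embed stage can be sketched by fusing their rankings. The example below uses plain reciprocal rank fusion as an illustration; Ingestible's actual fusion strategy may differ:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge several ranked lists of chunk IDs.

    Each list contributes 1 / (k + rank) per chunk; chunks ranked
    highly in multiple lists accumulate the largest scores.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["c3", "c1", "c7"]   # vector-similarity order
sparse = ["c1", "c9", "c3"]   # BM25 order
print(rrf([dense, sparse]))   # c1 ranks first: it scores high in both lists
```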

Optional Extras

Extra      What it adds
local      Local embeddings — sentence-transformers, ChromaDB, torch
pgvector   PostgreSQL pgvector backend — psycopg, pgvector
gemini     Google Gemini LLM provider — google-genai
audio      Audio/video transcription — faster-whisper
cloud      S3, GCS, Azure Blob connectors — boto3, google-cloud-storage, azure-storage-blob
mcp        MCP server for AI agent integration
watch      File watcher — watchdog
cohere     Cohere embedding provider
voyage     Voyage embedding provider
code       Code parsing — tree-sitter with Python, JS/TS, Go, Rust, Java, C/C++, Ruby, Swift, and more

Combine extras with commas: pip install ingestible[local,audio,cloud]