Introduction

Ingestible is a document ingestion pipeline that transforms documents into token-efficient, searchable knowledge stores for AI.

Instead of dumping 90,000 tokens into an LLM context window, Ingestible gives your AI a structured map of the document and hybrid search across three indexes — so each query costs ~1,000-2,000 tokens instead.

513-page book:  92,598 tokens full  →  ~1,317 tokens per query  (99% reduction)
55-page paper:   4,975 tokens full  →    ~585 tokens per query  (88% reduction)
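The reduction percentages follow directly from the listed token counts:

```python
def reduction(full_tokens: int, per_query_tokens: int) -> float:
    """Percent of tokens saved per query versus pasting the full document."""
    return 100 * (1 - per_query_tokens / full_tokens)

print(round(reduction(92_598, 1_317)))  # book:  99
print(round(reduction(4_975, 585)))     # paper: 88
```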

Quick Install

pip install ingestible                # base install (~50MB) — uses API embeddings

To use local embeddings (no API keys needed, runs fully offline):

pip install ingestible[local]         # adds torch + sentence-transformers + ChromaDB (~2GB)

Docker

docker run -d \
  -p 8081:8081 \
  -v ingestible-data:/app/data \
  ghcr.io/simplyliz/ingestible:latest

The API and web UI are at http://localhost:8081. Data persists via the Docker volume.

Quickstart

# Ingest a document (no API keys needed — skips LLM enrichment, builds all search indexes)
ingest add /path/to/document.pdf -v --skip-enrichment

# List ingested documents
ingest list

# Search
ingest search <doc_id> "your query here"

# Parse only — get structured chunks as JSONL, no storage
ingest parse /path/to/document.pdf

The first run downloads the E5-large-v2 embedding model (~1.3 GB). Embedding runs locally on CPU, Apple Silicon MPS, or CUDA.
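Device auto-detection for local embedding typically works along these lines. This is a sketch of the common torch detection pattern, not Ingestible's actual code:

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple Silicon MPS, then CPU; fall back to CPU if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(pick_device())
```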

With LLM enrichment

Enrichment adds summaries, hypothetical questions, and concept tags to each chunk, significantly improving search precision. It is a one-time cost per document.

cp .env.example .env   # add your Anthropic or OpenAI API key
ingest add /path/to/document.pdf -v
Document size    Estimated cost (gpt-4o-mini)    Time
55 pages         ~$0.01                          ~1 min
500 pages        ~$0.20                          ~5 min
1,000 pages      ~$0.50                          ~10 min

How It Works

Stage      What happens
Parse      Format-specific extraction to clean markdown. PDF uses IBM Docling for deep layout analysis, falls back to PyMuPDF, and applies OCR automatically if extracted text is sparse.
Structure  Builds a hierarchy tree from TOC tables, heading patterns, or page-range heuristics.
Chunk      Splits content into four levels (L0-L3). Tables and code blocks stay atomic. ~10% overlap between chunks; small trailing chunks are merged into their neighbors.
Enrich     Bottom-up LLM pass (L3 to L0) generates summaries, concepts, hypothetical questions, and entities. Skippable.
Embed      E5-large-v2 vectors (ChromaDB, auto-detected CUDA/MPS/CPU), plus a BM25 sparse index and a concept-to-chunk mapping.
Store      JSON file hierarchy under data/documents/{doc_id}/.
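Hybrid search over the dense and sparse indexes built in the Embed stage can be sketched by fusing their rankings. The example below uses plain reciprocal rank fusion as an illustration; Ingestible's actual fusion strategy may differ:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: merge several ranked lists of chunk IDs.

    Each list contributes 1 / (k + rank) per chunk; chunks ranked
    highly in multiple lists accumulate the largest scores.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["c3", "c1", "c7"]   # vector-similarity order
sparse = ["c1", "c9", "c3"]   # BM25 order
print(rrf([dense, sparse]))   # c1 ranks first: it scores high in both lists
```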

Optional Extras

Extra      What it adds
local      Local embeddings — sentence-transformers, ChromaDB, torch
pgvector   PostgreSQL pgvector backend — psycopg, pgvector
gemini     Google Gemini LLM provider — google-genai
audio      Audio/video transcription — faster-whisper
cloud      S3, GCS, Azure Blob connectors — boto3, google-cloud-storage, azure-storage-blob
mcp        MCP server for AI agent integration
watch      File watcher — watchdog
cohere     Cohere embedding provider
voyage     Voyage embedding provider
code       Code parsing — tree-sitter with Python, JS/TS, Go, Rust, Java, C/C++, Ruby, Swift, and more

Combine extras with commas: pip install ingestible[local,audio,cloud]