Deployment

Ingestible runs as a FastAPI application behind gunicorn. The options below cover local development with Docker Compose, production with pgvector, standalone Docker, Fly.io, and running directly without Docker.

Development

cp .env.example .env
# Edit .env with your API keys

docker compose up -d

This builds with the local extras (ChromaDB file-based storage, local embeddings) and exposes the API and web UI at http://localhost:8081.

Production with pgvector

docker compose -f docker-compose.prod.yml up -d

Required environment variables in .env:

INGEST_PGVECTOR_URL=postgresql://user:pass@host:5432/ingestible
INGEST_API_KEYS=your-api-key-here
INGEST_ANTHROPIC_API_KEY=sk-ant-...

The production compose file uses the local,pgvector extras and sets INGEST_VECTOR_BACKEND=pgvector.
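
Before starting the production stack, it can help to confirm that .env actually defines the three variables listed above. The following is a minimal sketch, not part of Ingestible: the helper names parse_env and missing_required are hypothetical, and the parser assumes plain KEY=VALUE lines with no quoting or inline comments.

```python
from typing import Dict, List, Sequence

# Variables the production section lists as required.
REQUIRED = ("INGEST_PGVECTOR_URL", "INGEST_API_KEYS", "INGEST_ANTHROPIC_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(text: str, required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required names that are absent or empty in the given text."""
    env = parse_env(text)
    return [name for name in required if not env.get(name)]
```

Calling missing_required(open(".env").read()) returns an empty list when all three variables are present, and the names of the missing ones otherwise.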

Docker run (standalone)

docker run -d \
  -p 8081:8081 \
  -v ingestible-data:/app/data \
  --env-file .env \
  ghcr.io/simplyliz/ingestible:latest

Two image variants are published:

  • ghcr.io/simplyliz/ingestible:latest — full image (~3GB) with all extras
  • ghcr.io/simplyliz/ingestible:latest-slim — thin API image (~500MB)

Custom Docker build

The Dockerfile supports a build argument to control which extras are installed:

docker build --build-arg EXTRAS="local,pgvector,gemini" -t ingestible .

Fly.io

Ingestible includes a fly.toml for deployment to Fly.io. After the one-time setup below, each deployment is a single fly deploy.

Initial setup

fly launch --no-deploy
fly volumes create ingestible_data --region fra --size 10

Set secrets

fly secrets set INGEST_API_KEYS=your-api-key
fly secrets set INGEST_GEMINI_API_KEY=your-gemini-key
# Or use your preferred LLM provider:
# fly secrets set INGEST_ANTHROPIC_API_KEY=sk-ant-...
# fly secrets set INGEST_OPENAI_API_KEY=sk-...

Deploy

fly deploy

Configuration

The included fly.toml configures:

Setting            Value
Region             fra (Frankfurt)
VM                 shared-cpu-1x, 2GB RAM
Port               8081 (HTTPS enforced)
Health check       GET /health every 30s
Persistent volume  /app/data
Auto-stop          Machines stop when idle, start on request
Concurrency        Soft 20, hard 25 requests
Extras             local,pgvector,gemini

Scaling

For higher throughput, adjust workers via the WEB_CONCURRENCY environment variable:

fly secrets set WEB_CONCURRENCY=4

To upgrade the VM:

fly scale vm shared-cpu-2x --memory 4096

Gunicorn Configuration

The gunicorn.conf.py included in the repository configures:

Setting       Value                          Notes
Workers       min(cpu_count * 2 + 1, 8)      Override with WEB_CONCURRENCY
Worker class  uvicorn.workers.UvicornWorker  Async ASGI
Timeout       600s (10 min)                  Ingestion can be slow for large documents
Preload       true                           Shares model memory across workers. Set GUNICORN_PRELOAD=false for low-memory environments
Max requests  1000 + jitter                  Workers restart periodically to bound memory growth from leaks
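
For orientation, the settings in the table could be expressed in a gunicorn config file roughly as follows. This is a sketch reconstructed from the table, not the repository's actual gunicorn.conf.py: the helpers default_workers and env_flag are hypothetical, and the jitter value of 50 is an assumption, since the table only says "+ jitter".

```python
import multiprocessing
import os


def default_workers(cpu_count: int) -> int:
    """Worker count from the table: min(cpu_count * 2 + 1, 8)."""
    return min(cpu_count * 2 + 1, 8)


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean environment variable; 'false', '0', 'no' or empty disable it."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in ("false", "0", "no", "")


# WEB_CONCURRENCY, when set, overrides the computed default.
workers = int(os.environ.get("WEB_CONCURRENCY") or default_workers(multiprocessing.cpu_count()))
worker_class = "uvicorn.workers.UvicornWorker"  # async ASGI worker
timeout = 600                                   # seconds; large documents ingest slowly
preload_app = env_flag("GUNICORN_PRELOAD", True)
max_requests = 1000                             # recycle a worker after this many requests
max_requests_jitter = 50                        # assumed value, staggers worker restarts
```

With this formula a 1-CPU machine gets 3 workers, a 2-CPU machine gets 5, and anything with 4 or more CPUs is capped at 8.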

Direct (uvicorn)

For development or simple deployments:

ingest serve                            # default: localhost:8081
ingest serve --host 0.0.0.0 --port 9000
ingest serve --reload                   # auto-reload on code changes

For production without Docker:

gunicorn ingestible.api:app -c gunicorn.conf.py

Health Checks

Two endpoints are available for orchestrators and load balancers:

  • GET /health — fast liveness probe, returns {"status": "ok"}. No dependency checks.
  • GET /health/ready — deep readiness check. Verifies data directory, disk space, embedding model, and LLM API. Returns "ready" or "degraded" with per-check details.
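
A liveness probe against GET /health can be scripted with the standard library alone. The sketch below assumes the default address http://localhost:8081 and that the endpoint needs no API key. The function names is_live and check_liveness are hypothetical. It deliberately does not parse /health/ready, because the field names of that response are not documented here.

```python
import json
import urllib.request


def is_live(body: bytes) -> bool:
    """True when the body is the documented liveness payload {"status": "ok"}."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers invalid JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_liveness(base_url: str = "http://localhost:8081", timeout: float = 5.0) -> bool:
    """GET /health and report whether the service answered as live."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_live(resp.read())
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False
```

check_liveness() returns False rather than raising when the service is unreachable, which makes it easy to call from a retry loop or a deployment script.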

Monitoring

Prometheus metrics are exposed at GET /metrics:

Metric                              Type       Description
ingestible_ingest_duration_seconds  histogram  Ingestion latency by stage
ingestible_active_ingestions        gauge      Currently running ingestion jobs
ingestible_llm_calls_total          counter    LLM API calls by provider and status
ingestible_search_duration_seconds  histogram  Search query latency
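
To read a single value out of the /metrics response without a Prometheus server, the text exposition format can be scanned line by line. This is a minimal sketch: read_sample is a hypothetical helper, and it assumes the sample is exposed without labels, which may not hold for every metric in the table (for example, the histograms and the per-provider counter carry labels).

```python
from typing import Optional


def read_sample(metrics_text: str, name: str) -> Optional[float]:
    """Return the value of an unlabelled sample, or None if it is absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # An unlabelled sample looks like: metric_name value [timestamp]
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None
```

Passing the body of GET /metrics and the name ingestible_active_ingestions would return the number of ingestion jobs currently running, provided the gauge is unlabelled.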

Environment Variables

See the Usage Guide for the full configuration reference. Key production settings:

INGEST_API_KEYS=key1,key2               # API authentication (comma-separated)
INGEST_CORS_ORIGINS=https://app.example.com
INGEST_RATE_LIMIT_INGEST=10/minute
INGEST_RATE_LIMIT_SEARCH=60/minute
INGEST_LOG_JSON=true
INGEST_LOG_LEVEL=INFO
INGEST_MAX_UPLOAD_BYTES=500000000
INGEST_MAX_CONCURRENT_INGESTIONS=2
INGEST_AUDIT_ENABLED=true