Security & Privacy
Your documents contain your most sensitive knowledge. Data governance isn’t an afterthought in Ingestible — it’s a first-class design constraint. Deploy on your infrastructure, control every data flow, meet any compliance requirement.
Your data, your infrastructure
Deploy Ingestible wherever your security policy demands — on-premise, private cloud, or VPC. No vendor trust required.
No telemetry. No phone-home. No external dependencies.
Self-hosted deployment
Run on your own servers via Docker, Kubernetes, or bare metal. One command to deploy, same codebase as cloud.
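As a sketch, a single-command deployment might look like the following. The image name, tag, port, and volume path here are illustrative placeholders, not the actual published image or defaults:

```shell
# Hypothetical self-hosted deployment; image name, port, and data
# path are placeholders for illustration only.
docker run -d \
  --name ingestible \
  -p 8080:8080 \
  -v /srv/ingestible/data:/data \
  ingestible/ingestible:latest
```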
Documents never leave
All parsing, chunking, embedding, and storage happens inside your perimeter. Zero data exfiltration surface.
Air-gapped support
No external API calls required. Use local embedding models and skip LLM enrichment for fully isolated operation.
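An air-gapped setup could be sketched with environment variables like these. The variable names are hypothetical, not Ingestible's real configuration keys; the local embedding model (E5-large-v2) is the one the pipeline ships with:

```shell
# Illustrative air-gapped configuration -- variable names are
# hypothetical placeholders, not real Ingestible settings.
export INGESTIBLE_EMBEDDING_MODEL="local:e5-large-v2"   # local model, no API calls
export INGESTIBLE_ENRICHMENT_ENABLED="false"            # skip the LLM stage entirely
```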
One codebase
The self-hosted version is the full product — not a stripped-down or feature-gated fork. Same repo, same releases.
Data governance
Full control over where your data lives, how it moves, and who can access it.
Data residency
Deploy in any region or jurisdiction. Meet GDPR, LGPD, PIPEDA, CCPA, and sector-specific residency requirements by running Ingestible where your data already lives.
No vendor lock-in
Standard JSON output. Pluggable vector backends — ChromaDB, pgvector, or Qdrant. Switch storage, switch providers, or go fully offline at any time.
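Switching storage might be as simple as a configuration change. This sketch assumes hypothetical variable names and a placeholder connection string; the three backends named are the ones listed above:

```shell
# Illustrative backend selection -- variable names and the DSN are
# hypothetical. Supported stores: ChromaDB, pgvector, Qdrant.
export INGESTIBLE_VECTOR_BACKEND="pgvector"
export INGESTIBLE_PG_DSN="postgres://ingest:changeme@db.internal:5432/vectors"
```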
Encryption
TLS for all data in transit. At rest, use your infrastructure's native mechanisms (EBS, LUKS, BitLocker) or database-level encryption.
Access control
API key authentication, per-key rate limiting, role-based access, and full audit logging. Every document operation is traceable to a principal.
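An authenticated request could look like this sketch. The endpoint path and header scheme are assumptions for illustration, not the documented API:

```shell
# Hypothetical authenticated request -- endpoint and header are
# placeholders; every call like this lands in the audit log.
curl -H "Authorization: Bearer $INGESTIBLE_API_KEY" \
  https://ingestible.internal/api/v1/documents
```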
Security practices
Built in the open with defense-in-depth. Trust but verify — the code is right there.
Open source
Every line of the pipeline is inspectable. No proprietary black boxes processing your documents.
Automated security scanning
CI runs Bandit (SAST), dependency vulnerability scanning, and import-level security checks on every commit.
No telemetry
Zero analytics, tracking, or phone-home behavior. The self-hosted binary makes no outbound connections unless you configure LLM enrichment.
Enterprise compliance
SOC 2 and HIPAA compliance documentation available for Enterprise SLA customers. Security review and custom DPA on request.
LLM data handling
Full transparency on what leaves your infrastructure during the enrichment stage — and how to prevent it entirely.
Minimal data exposure
When enrichment calls external LLM APIs (Anthropic, OpenAI), only individual chunk text is sent — never full documents, metadata, or file names.
Self-hosted LLM option
Point the enrichment stage at your own model endpoint (vLLM, Ollama, TGI). Documents never leave your network.
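Redirecting enrichment to a local endpoint could be sketched like this. The variable names are hypothetical; the base URL reflects Ollama's default port and its OpenAI-compatible API, and the model name is a placeholder:

```shell
# Illustrative enrichment endpoint override -- variable names are
# hypothetical. vLLM and Ollama expose OpenAI-compatible endpoints.
export INGESTIBLE_LLM_BASE_URL="http://ollama.internal:11434/v1"
export INGESTIBLE_LLM_MODEL="llama3"
```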
BYOK — bring your own keys
Your API keys, your accounts, your usage tracking. Ingestible never proxies through our servers on self-hosted deployments.
Skip enrichment entirely
For maximum isolation, disable the LLM enrichment stage. You still get parsing, hierarchical chunking, embedding, and hybrid search — no external calls at all.
Data flow: what stays local vs. what can leave

Stays local (always):
- Document parsing & cleaning
- Structure analysis & hierarchy
- Chunking (all strategies)
- Embedding (E5-large-v2)
- Index building (vector + BM25)
- Search & retrieval
- Storage (JSON files / vector DB)

Can leave (only if external LLM APIs are configured):
- LLM enrichment (summaries, concepts)
- Hypothetical question generation
- Knowledge graph triple extraction
- HyDE query expansion
- LLM-as-judge evaluation
These LLM-backed stages are all optional. Disable enrichment or use a self-hosted LLM to eliminate external calls entirely.
Ready to deploy on your infrastructure?
Get started with a single Docker command, or contact our team for enterprise deployment planning, custom SLAs, and compliance documentation.