Private Cloud Deployment Details (Enterprise only)
All about deploying Sapience Enterprise in your datacenter, behind the firewall (or in your private cloud environment).
Overview
Sapience can be deployed as a fully self-hosted solution within your own private cloud or on-premises data center. This guide is for IT leaders, security teams, and infrastructure architects evaluating Sapience for environments where data sovereignty, regulatory compliance, and network isolation are non-negotiable.
When you deploy Sapience in your own infrastructure, no data leaves your network. There are no callbacks to Sapience Cloud, no telemetry, and no shared infrastructure with other customers. You own and control every layer of the stack.
Deployment Architecture
Sapience runs on Kubernetes and is composed of five containerized services that work together as a single platform. All containers are provided as standard OCI-compliant images and are orchestrated via Kubernetes manifests (Helm charts available).
Why Kubernetes?
Sapience requires Kubernetes (v1.26+) for orchestration. This is not optional — the platform relies on Kubernetes-native features for:
- Service discovery: Containers communicate via internal cluster DNS, not public endpoints
- Health management: Liveness and readiness probes ensure automatic recovery from failures
- Scaling: Horizontal pod autoscaling allows you to scale individual services based on demand
- Secrets management: Kubernetes Secrets or your preferred vault integration (HashiCorp Vault, Azure Key Vault, AWS Secrets Manager) for credential storage
- Shared storage: Persistent Volume Claims for file handoff between services
- Network policies: Fine-grained control over inter-service communication
Any conformant Kubernetes distribution is supported: AKS, EKS, GKE, OpenShift, Rancher, or bare-metal K8s.
The Five Core Services
Sapience is composed of five containers, each with a single responsibility. Only one service is exposed externally — the rest communicate exclusively over the internal cluster network.
1. Application Server
The primary API and business logic service. This is the only container that handles external traffic.
| Property | Detail |
| --- | --- |
| Role | API gateway, business logic, authentication, agent orchestration |
| Exposed | Yes — the only externally facing service. Serves browser traffic from users on your private network or, if you choose, a publicly available HTTPS endpoint on the internet (optionally behind a VPN gateway). |
| Port | 9000 (HTTPS only) |
| Resources | 2 CPU / 8 GiB RAM (recommended) |
| Scaling | Horizontal — multiple replicas behind a load balancer |
| Health checks | Startup probe (/startup), liveness probe (/health) |
The application server handles all user-facing API requests, manages authentication and authorization (JWT-based), orchestrates AI agent interactions, and coordinates with the other four services over the internal network.
Networking: Place behind your ingress controller (NGINX, Traefik, or cloud-native) with TLS termination. All other services should be unreachable from outside the cluster.
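As an illustration, a minimal Ingress resource for the application server might look like the sketch below. The service name (app-server), hostname, and TLS secret name are placeholders — substitute the names from your actual deployment. The backend-protocol annotation is needed here because the application server listens on HTTPS only:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sapience-app
  annotations:
    # Port 9000 speaks HTTPS, so the ingress must proxy with TLS to the backend
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - sapience.example.internal   # placeholder hostname
      secretName: sapience-tls        # your TLS certificate secret
  rules:
    - host: sapience.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-server      # placeholder service name
                port:
                  number: 9000
```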
2. Document Processing Service
Handles optical character recognition (OCR) and text extraction from uploaded documents such as PDFs, scanned images, Word documents, and spreadsheets.
| Property | Detail |
| --- | --- |
| Role | OCR, text extraction, document parsing |
| Exposed | No — internal only |
| Port | 9998 (HTTPS, cluster-internal) |
| Resources | 1 CPU / 2 GiB RAM (recommended) |
| Scaling | Horizontal — can scale to zero when idle, scales up on demand |
| Health checks | Liveness and readiness probes on port 9998 |
This service supports multilingual OCR across English, Arabic, Chinese (simplified and traditional), Spanish, French, German, Italian, Japanese, Portuguese, and Russian. Documents are processed in-cluster and extracted text is passed back to the application server — no document content is transmitted externally.
3. Document Generation Service
Creates formatted business documents (presentations, spreadsheets, reports) on behalf of AI agents.
| Property | Detail |
| --- | --- |
| Role | Programmatic creation of PPTX, XLSX, and other office-format documents |
| Exposed | No — internal only |
| Port | 8000 (HTTPS, cluster-internal) |
| Resources | 1 CPU / 2 GiB RAM (recommended) |
| Scaling | Horizontal — can scale to zero when idle |
| Health checks | Liveness probe (/health) on port 8000 |
The document generation service receives structured instructions from the application server and produces professional-grade documents. Generated files are written to shared persistent storage, where the application server retrieves them for delivery to users.
Authentication: This service authenticates all requests using short-lived JWT tokens issued by the application server. Even within the cluster, inter-service communication is authenticated.
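Sapience's actual token format is internal to the product, but the general pattern — a short-lived HS256 JWT minted by one service and verified by another on every request — can be sketched in stdlib-only Python. All names, claims, and the shared secret below are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    # Compact JWS serialization: header.payload.signature
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims

# The issuing service mints a token valid for 60 seconds...
secret = b"shared-cluster-secret"  # illustrative only; store in a real secret manager
token = sign_jwt(
    {"iss": "app-server", "aud": "doc-gen", "exp": int(time.time()) + 60},
    secret,
)
# ...and the receiving service verifies it before doing any work.
claims = verify_jwt(token, secret)
```

The short expiry is the key property: even if a token leaks inside the cluster, its usefulness is measured in seconds.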
4. Database
PostgreSQL serves as the primary data store for all application data including user accounts, organizations, agent configurations, conversation history, audit logs, and access control policies.
| Property | Detail |
| --- | --- |
| Role | Persistent data storage (relational) |
| Exposed | No — internal only |
| Port | 5432 (PostgreSQL protocol, cluster-internal) |
| Resources | 2 CPU / 4 GiB RAM minimum (scale based on user count) |
| Scaling | Vertical — increase resources as needed; read replicas optional |
| Storage | Persistent Volume with at least 50 GiB (SSD recommended) |
You may use the containerized PostgreSQL image provided, or connect Sapience to your existing managed PostgreSQL service (AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL, or your own HA cluster). PostgreSQL 17+ is required.
Backup and recovery: Sapience does not manage database backups. You are responsible for implementing backup, point-in-time recovery, and disaster recovery strategies appropriate for your environment and compliance requirements.
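One common approach for the containerized option is a Kubernetes CronJob that runs pg_dump on a schedule. The sketch below is not a Sapience-provided manifest — the secret name, PVC name, connection variable, and schedule are all placeholders to adapt to your environment (and managed services like RDS typically bring their own snapshot tooling instead):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-nightly-dump
spec:
  schedule: "0 2 * * *"            # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:17
              command: ["/bin/sh", "-c"]
              args:
                # DATABASE_URL is assumed to live in the referenced secret
                - pg_dump "$DATABASE_URL" | gzip > /backup/sapience-$(date +%F).sql.gz
              envFrom:
                - secretRef:
                    name: sapience-db-credentials   # placeholder secret name
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: db-backups               # placeholder PVC name
```

Logical dumps alone do not give you point-in-time recovery; for that, pair them with WAL archiving (e.g. via a tool like pgBackRest) per your RPO requirements.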
5. Cache Layer (Redis-compatible, in-memory)
An in-memory data store used for session management, rate limiting, real-time state, and performance caching.
| Property | Detail |
| --- | --- |
| Role | Session cache, rate limiting, temporary data store |
| Exposed | No — internal only |
| Port | 6379 (Redis protocol, cluster-internal) |
| Resources | 0.5 CPU / 1 GiB RAM (recommended) |
| Scaling | Single instance sufficient for most deployments |
You may use the containerized cache image provided, or connect Sapience to your existing managed Redis-compatible service (AWS ElastiCache, Azure Cache for Redis, Google Memorystore). Redis 7+ is required.
Network Architecture
┌─────────────────────────────────────────────┐
│ Your Network / VPC │
│ │
HTTPS ──────────┤► Ingress / Load Balancer (TLS termination) │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Application │◄──── Only external-facing │
│ │ Server │ service │
│ └──────┬───────┘ │
│ │ Internal cluster network only │
│ ┌────┼────────┬──────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌────┐ ┌────┐ ┌──────┐ ┌───────┐ │
│ │Doc │ │Doc │ │ Data-│ │ Cache │ │
│ │Proc│ │Gen │ │ base │ │ Layer │ │
│ └────┘ └────┘ └──────┘ └───────┘ │
│ │
│ Shared Persistent Volume │
│ (file handoff between services) │
└─────────────────────────────────────────────┘

Key Network Rules
- Only the Application Server is exposed to your corporate network or the internet. All other services communicate exclusively over the internal Kubernetes cluster network.
- Note: If you expose the application on an internet-addressable URL, we recommend putting it behind a Cloudflare Web Application Firewall (WAF) or equivalent protection.
- No outbound internet required for core platform operation — with the exception of AI model API calls (see "External Dependencies" below).
- Inter-service authentication: The application server authenticates to internal services using short-lived, signed JWT tokens — even within the trusted cluster network.
- Network policies recommended: We strongly recommend Kubernetes NetworkPolicy resources to restrict pod-to-pod communication to only the paths shown above.
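For example, a NetworkPolicy that allows only the application server to reach the database might look like the following. The pod labels (app: database, app: app-server) are assumptions — use the labels from your actual manifests:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-from-app-server-only
spec:
  podSelector:
    matchLabels:
      app: database          # placeholder label for the PostgreSQL pod
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: app-server   # only the application server may connect
      ports:
        - protocol: TCP
          port: 5432
```

Analogous policies for the document processing, document generation, and cache pods complete the topology shown in the diagram above.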
Shared Storage
The Application Server and Document Generation Service share a Persistent Volume for file handoff. This is how generated documents (presentations, spreadsheets) are passed from the generation service back to the application server for delivery to users.
| Requirement | Detail |
| --- | --- |
| Type | ReadWriteMany (RWX) PVC — NFS, Azure Files, EFS, or equivalent |
| Size | 50 GiB minimum (scale based on document volume) |
| Mounted by | Application Server and Document Generation Service |
| Contains | Temporary generated files (automatically cleaned up) |
Important: Only these two services need access to the shared volume. The document processing service, database, and cache do not require shared storage.
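A minimal RWX claim might look like this — the storageClassName depends entirely on your platform (azurefile on AKS, an EFS class on EKS, an NFS provisioner on bare metal), so treat the value below as a placeholder:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sapience-shared-files
spec:
  accessModes:
    - ReadWriteMany            # both services mount the same volume
  storageClassName: azurefile  # placeholder; use your RWX-capable class
  resources:
    requests:
      storage: 50Gi
```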
External Dependencies
While Sapience runs entirely within your infrastructure, AI agent functionality requires API access to large language model providers. At least one of the following must be reachable for AI features to work:

| Destination | Purpose | Can Be Restricted To |
| --- | --- | --- |
| OpenAI API (api.openai.com) | LLM inference for AI agents | Specific API endpoint IPs |
| Anthropic API (api.anthropic.com) | LLM inference for AI agents | Specific API endpoint IPs |
| Google AI APIs | LLM inference for AI agents | Specific API endpoint IPs |
| Your own AI provider | LLM inference for AI agents | Any completions-API-compatible LLM host (Alibaba, Tencent, Azure, etc.) |
| Locally hosted LLMs | LLM inference for AI agents | Alibaba Qwen, DeepSeek, etc. The only hard requirement is support for a completions-compatible API. |
You provide your own API keys. Sapience does not proxy through any Sapience-operated infrastructure. API calls go directly from your cluster to the model provider.
For Air-Gapped Environments
If your security requirements prohibit any outbound internet access, Sapience supports integration with self-hosted LLM endpoints that expose an OpenAI-compatible API. This includes:
- Azure OpenAI Service (deployed in your own Azure subscription)
- AWS Bedrock with compatible endpoints
- Self-hosted open-source models (via vLLM, TGI, or similar)
Contact the Sapience team for guidance on configuring air-gapped LLM connectivity.
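As an illustration of what "OpenAI-compatible API" means in practice, the sketch below builds a chat-completions request against a hypothetical in-cluster vLLM endpoint. The base URL and model name are placeholders, not Sapience configuration values:

```python
import json
from urllib import request

# Hypothetical in-cluster endpoint; vLLM serves an OpenAI-compatible API under /v1
BASE_URL = "http://vllm.llm.svc.cluster.local:8000/v1"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Construct a POST to the standard /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("qwen2.5-7b-instruct", "Summarize this quarter's pipeline.")
# In an air-gapped deployment, request.urlopen(req) never leaves the cluster.
```

Any backend that accepts this request shape — vLLM, TGI, Azure OpenAI, and many others — can serve as the inference endpoint.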
Security Controls in Private Cloud
Deploying Sapience in your own infrastructure gives you complete control over every security layer. Here is how each concern maps to your environment:
Data Sovereignty
| Control | Your Responsibility |
| --- | --- |
| Data residency | You choose the physical location of all infrastructure |
| Encryption at rest | Configure disk encryption on your Kubernetes nodes and persistent volumes |
| Encryption in transit | TLS at the ingress; internal mTLS optional via service mesh |
| Key management | Your KMS — Sapience has no access to your encryption keys |
Access Control
| Control | How Sapience Supports It |
| --- | --- |
| Authentication | JWT-based; integrates with your IdP via OIDC/SAML |
| Authorization | Role-based access control (RBAC) with org, project, and user scopes |
| Audit logging | All API access, agent interactions, and administrative actions are logged |
| Session management | Configurable session timeouts; sessions stored in your cache layer |
Network Security
| Control | Recommendation |
| --- | --- |
| Ingress | TLS 1.2+ only; restrict to your corporate network or VPN |
| Internal traffic | Kubernetes NetworkPolicy to limit pod-to-pod communication |
| Egress | Allow only LLM provider API endpoints; block all other outbound traffic |
| DNS | Internal cluster DNS only; no external DNS resolution required for internal services |
Compliance
Sapience's private cloud deployment model supports compliance with:
- HIPAA — PHI never leaves your controlled environment; you manage BAAs directly with your infrastructure and LLM providers
- NIST CSF 2.0 — Sapience was built with the NIST Cybersecurity Framework as its overarching security paradigm
- GDPR — Full data residency control; right-to-erasure supported at the database level; no data processing by Sapience
- SOC 2 — Audit logs, access controls, and encryption are all within your control boundary
- FedRAMP / IL4+ — Air-gapped deployment option eliminates external dependencies (with self-hosted LLMs)
- Industry-specific (FINRA, PCI-DSS, etc.) — Your security team controls all infrastructure configuration
For a detailed breakdown of Sapience's data handling practices, see Sapience Data Security, HIPAA & GDPR.
Resource Summary
Minimum recommended resources for a production private cloud deployment:
| Service | CPU | Memory | Storage | Replicas |
| --- | --- | --- | --- | --- |
| Application Server | 2 cores | 8 GiB | — | 2+ (HA) |
| Document Processing | 1 core | 2 GiB | — | 1+ |
| Document Generation | 1 core | 2 GiB | — | 1+ |
| Database (PostgreSQL) | 2 cores | 4 GiB | 50 GiB SSD | 1 (+ replica for HA) |
| Cache (Redis) | 0.5 cores | 1 GiB | — | 1 |
| Shared Storage (PVC) | — | — | 50 GiB | — |
| Total (minimum) | 6.5 cores | 17 GiB | 100 GiB | — |
These are starting-point recommendations. Scale the Application Server horizontally and the Database vertically as your user count grows.
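Horizontal scaling of the Application Server can be automated with a HorizontalPodAutoscaler. The sketch below is a starting point only — the Deployment name and utilization threshold are assumptions to tune for your workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-server        # placeholder Deployment name
  minReplicas: 2            # keep 2 for HA, per the table above
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # tune to your traffic profile
```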
Deployment Checklist
Use this checklist when planning your private cloud deployment:
Frequently Asked Questions
Can I use my existing PostgreSQL / Redis instead of the provided containers?
Yes. Sapience supports connecting to any PostgreSQL 17+ or Redis 7+ instance. Simply configure the connection strings in the environment variables. Many customers use managed database services (RDS, Cloud SQL, Azure Database) for production deployments.
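A typical configuration might look like the following. Note that these environment variable names are assumptions for illustration — consult the Sapience configuration reference for the exact keys used by your release:

```shell
# Managed PostgreSQL (RDS, Cloud SQL, Azure Database); TLS enforced
export DATABASE_URL="postgresql://sapience:<password>@db.example.internal:5432/sapience?sslmode=require"

# Managed Redis-compatible cache; rediss:// for TLS
export REDIS_URL="rediss://cache.example.internal:6379/0"
```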
What if I need to run in a completely air-gapped environment?
Sapience supports air-gapped deployment when paired with a self-hosted LLM endpoint that exposes an OpenAI-compatible API. All container images can be loaded from a private registry with no internet access required at runtime.
How are updates delivered?
Sapience provides updated container images on a regular release cadence. You pull new images to your internal registry and roll them out on your schedule using standard Kubernetes rolling update strategies. There is no automatic update mechanism — you control when and how updates are applied.
What monitoring and observability is built in?
The Application Server exposes health check endpoints (/startup, /health) compatible with Kubernetes probes. Application logs are written to stdout/stderr in structured format, compatible with any log aggregation system (ELK, Splunk, CloudWatch, etc.). You are responsible for integrating with your monitoring stack.
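One plausible wiring of those endpoints into Kubernetes probes is sketched below as a pod spec fragment. The thresholds are placeholder values, and the HTTPS scheme matters because port 9000 does not serve plain HTTP:

```yaml
startupProbe:
  httpGet:
    path: /startup
    port: 9000
    scheme: HTTPS
  failureThreshold: 30   # allow up to 150 s for first boot
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 9000
    scheme: HTTPS
  periodSeconds: 10
```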
Does Sapience support multi-tenant isolation within a single deployment?
Yes. Sapience has built-in multi-organization support with role-based access control. Users, agents, data sources, and conversations are scoped to organizations. A single deployment can serve multiple isolated tenants.
Have questions about private cloud deployment? Contact the Sapience team at sales@sapiencecloud.ai for a deployment planning session.