
Private Cloud Deployment Details (Enterprise only)

All about deploying Sapience Enterprise in your data center, behind the firewall (or in your private cloud environment).


Overview

Sapience can be deployed as a fully self-hosted solution within your own private cloud or on-premises data center. This guide is for IT leaders, security teams, and infrastructure architects evaluating Sapience for environments where data sovereignty, regulatory compliance, and network isolation are non-negotiable.

When you deploy Sapience in your own infrastructure, no data leaves your network. There are no callbacks to Sapience Cloud, no telemetry, and no shared infrastructure with other customers. You own and control every layer of the stack.


Deployment Architecture

Sapience runs on Kubernetes and is composed of five containerized services that work together as a single platform. All containers are provided as standard OCI-compliant images and are orchestrated via Kubernetes manifests (Helm charts available).

Why Kubernetes?

Sapience requires Kubernetes (v1.26+) for orchestration. This is not optional — the platform relies on Kubernetes-native features for:

  • Service discovery: Containers communicate via internal cluster DNS, not public endpoints
  • Health management: Liveness and readiness probes ensure automatic recovery from failures
  • Scaling: Horizontal pod autoscaling allows you to scale individual services based on demand
  • Secrets management: Kubernetes Secrets or your preferred vault integration (HashiCorp Vault, Azure Key Vault, AWS Secrets Manager) for credential storage
  • Shared storage: Persistent Volume Claims for file handoff between services
  • Network policies: Fine-grained control over inter-service communication

Any conformant Kubernetes distribution is supported: AKS, EKS, GKE, OpenShift, Rancher, or bare-metal K8s.


The Five Core Services

Sapience is composed of five containers, each with a single responsibility. Only one service is exposed externally — the rest communicate exclusively over the internal cluster network.

1. Application Server

The primary API and business logic service. This is the only container that handles external traffic.

| Property | Detail |
| --- | --- |
| Role | API gateway, business logic, authentication, agent orchestration |
| Exposed | Yes — the only externally facing service. Exposed to browser traffic from users on your private network or, if you choose, as a publicly available HTTPS endpoint on the internet (or behind a VPN gate). |
| Port | 9000 (HTTPS only) |
| Resources | 2 CPU / 8 GiB RAM (recommended) |
| Scaling | Horizontal — multiple replicas behind a load balancer |
| Health checks | Startup probe (/startup), liveness probe (/startup) |

The application server handles all user-facing API requests, manages authentication and authorization (JWT-based), orchestrates AI agent interactions, and coordinates with the other four services over the internal network.

Networking: Place behind your ingress controller (NGINX, Traefik, or cloud-native) with TLS termination. All other services should be unreachable from outside the cluster.
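As a concrete sketch of the ingress setup described above, the manifest below routes external HTTPS traffic to the application server. The names, namespace, hostname, and TLS secret are placeholders — substitute your own, and adjust the annotation if you use Traefik or a cloud-native ingress controller instead of NGINX.

```yaml
# Hypothetical Ingress for the Application Server (names and host are placeholders).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sapience-app
  namespace: sapience
  annotations:
    # The app server itself serves HTTPS on port 9000, so the backend protocol is HTTPS.
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [sapience.example.internal]
      secretName: sapience-tls        # your TLS certificate for termination at the ingress
  rules:
    - host: sapience.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sapience-app    # Service fronting the app server pods
                port:
                  number: 9000
```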

2. Document Processing Service

Handles optical character recognition (OCR) and text extraction from uploaded documents such as PDFs, scanned images, Word documents, and spreadsheets.

| Property | Detail |
| --- | --- |
| Role | OCR, text extraction, document parsing |
| Exposed | No — internal only |
| Port | 9998 (HTTPS, cluster-internal) |
| Resources | 1 CPU / 2 GiB RAM (recommended) |
| Scaling | Horizontal — can scale to zero when idle, scales up on demand |
| Health checks | Liveness and readiness probes on port 9998 |

This service supports multilingual OCR across English, Arabic, Chinese (simplified and traditional), Spanish, French, German, Italian, Japanese, Portuguese, and Russian. Documents are processed in-cluster and extracted text is passed back to the application server — no document content is transmitted externally.
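The on-demand scaling described above can be expressed with a standard HorizontalPodAutoscaler, sketched below under assumed names. Note that a plain HPA scales down to a minimum of one replica; true scale-to-zero typically requires an event-driven autoscaler such as KEDA on top.

```yaml
# Illustrative HPA for the document processing service (names and thresholds are placeholders).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: doc-processing
  namespace: sapience
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: doc-processing
  minReplicas: 1          # scale-to-zero needs an event-driven autoscaler such as KEDA
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas when average CPU exceeds 70%
```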

3. Document Generation Service

Creates formatted business documents (presentations, spreadsheets, reports) on behalf of AI agents.

| Property | Detail |
| --- | --- |
| Role | Programmatic creation of PPTX, XLSX, and other office-format documents |
| Exposed | No — internal only |
| Port | 8000 (HTTPS, cluster-internal) |
| Resources | 1 CPU / 2 GiB RAM (recommended) |
| Scaling | Horizontal — can scale to zero when idle |
| Health checks | Liveness probe (/health) on port 8000 |

The document generation service receives structured instructions from the application server and produces professional-grade documents. Generated files are written to shared persistent storage, where the application server retrieves them for delivery to users.

Authentication: This service authenticates all requests using short-lived JWT tokens issued by the application server. Even within the cluster, inter-service communication is authenticated.

4. Database

PostgreSQL serves as the primary data store for all application data including user accounts, organizations, agent configurations, conversation history, audit logs, and access control policies.

| Property | Detail |
| --- | --- |
| Role | Persistent data storage (relational) |
| Exposed | No — internal only |
| Port | 5432 (PostgreSQL protocol, cluster-internal) |
| Resources | 2 CPU / 4 GiB RAM minimum (scale based on user count) |
| Scaling | Vertical — increase resources as needed; read replicas optional |
| Storage | Persistent Volume with at least 50 GiB (SSD recommended) |

You may use the containerized PostgreSQL image provided, or connect Sapience to your existing managed PostgreSQL service (AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL, or your own HA cluster). PostgreSQL 17+ is required.

Backup and recovery: Sapience does not manage database backups. You are responsible for implementing backup, point-in-time recovery, and disaster recovery strategies appropriate for your environment and compliance requirements.
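Since backups are your responsibility, one minimal approach for the containerized database is a nightly logical dump via a Kubernetes CronJob, sketched below. All names (secret, claim) are placeholders, and a `pg_dump` dump alone does not give you point-in-time recovery — that requires WAL archiving (e.g. with a tool like pgBackRest) or your managed provider's PITR feature.

```yaml
# Illustrative nightly pg_dump CronJob (names are placeholders; not a full DR strategy).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: sapience
spec:
  schedule: "0 2 * * *"              # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:17
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump "$DATABASE_URL" | gzip > /backup/sapience-$(date +%F).sql.gz
              env:
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: sapience-db      # hypothetical secret holding the connection string
                      key: url
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgres-backups  # dedicated backup volume, separate from the data PV
```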

5. Cache Layer (Redis-compatible in-memory store)

An in-memory data store used for session management, rate limiting, real-time state, and performance caching.

| Property | Detail |
| --- | --- |
| Role | Session cache, rate limiting, temporary data store |
| Exposed | No — internal only |
| Port | 6379 (Redis protocol, cluster-internal) |
| Resources | 0.5 CPU / 1 GiB RAM (recommended) |
| Scaling | Single instance sufficient for most deployments |

You may use the containerized cache image provided, or connect Sapience to your existing managed Redis-compatible service (AWS ElastiCache, Azure Cache for Redis, Google Memorystore). Redis 7+ is required.


Network Architecture

                    ┌─────────────────────────────────────────────┐
                    │              Your Network / VPC             │
                    │                                             │
    HTTPS ──────────┤► Ingress / Load Balancer (TLS termination)  │
                    │         │                                   │
                    │         ▼                                   │
                    │  ┌──────────────┐                           │
                    │  │  Application │◄──── Only external-facing │
                    │  │    Server    │       service             │
                    │  └──────┬───────┘                           │
                    │         │ Internal cluster network only     │
                    │    ┌────┼────────┬──────────┐               │
                    │    ▼    ▼        ▼          ▼               │
                    │  ┌────┐ ┌────┐ ┌──────┐ ┌───────┐           │
                    │  │Doc │ │Doc │ │ Data-│ │ Cache │           │
                    │  │Proc│ │Gen │ │ base │ │ Layer │           │
                    │  └────┘ └────┘ └──────┘ └───────┘           │
                    │                                             │
                    │       Shared Persistent Volume              │
                    │       (file handoff between services)       │
                    └─────────────────────────────────────────────┘

Key Network Rules

  1. Only the Application Server is exposed to your corporate network or the internet. All other services communicate exclusively over the internal Kubernetes cluster network.
     Note: if you expose the app at an internet-addressable URL, we recommend putting it behind a Cloudflare Web Application Firewall (WAF) or equivalent protection.
  2. No outbound internet access is required for core platform operation, with the exception of AI model API calls (see "External Dependencies" below).
  3. Inter-service authentication: The application server authenticates to internal services using short-lived, signed JWT tokens — even within the trusted cluster network.
  4. Network policies recommended: We strongly recommend Kubernetes NetworkPolicy resources to restrict pod-to-pod communication to only the paths shown above.
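As one example of the recommended NetworkPolicy approach, the manifest below restricts database ingress to the application server only. The namespace and pod labels are placeholders — match them to however your deployment labels its pods — and you would write analogous policies for the other internal services.

```yaml
# Illustrative policy: only the application server may reach PostgreSQL (labels are placeholders).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-app-only
  namespace: sapience
spec:
  podSelector:
    matchLabels:
      app: sapience-db          # applies to the database pods
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: sapience-app # only app server pods may connect
      ports:
        - protocol: TCP
          port: 5432
```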

Shared Storage

The Application Server and Document Generation Service share a Persistent Volume for file handoff. This is how generated documents (presentations, spreadsheets) are passed from the generation service back to the application server for delivery to users.

| Requirement | Detail |
| --- | --- |
| Type | ReadWriteMany (RWX) PVC — NFS, Azure Files, EFS, or equivalent |
| Size | 50 GiB minimum (scale based on document volume) |
| Mounted by | Application Server and Document Generation Service |
| Contains | Temporary generated files (automatically cleaned up) |

Important: Only these two services need access to the shared volume. The document processing service, database, and cache do not require shared storage.
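A minimal claim for this shared volume might look like the following. The claim name and storage class are placeholders — use whichever RWX-capable class your cluster provides (NFS, EFS, Azure Files, etc.).

```yaml
# Illustrative RWX claim for the file-handoff volume (name and storageClassName are placeholders).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sapience-shared
  namespace: sapience
spec:
  accessModes: [ReadWriteMany]   # RWX: mountable by both services simultaneously
  storageClassName: azurefile    # substitute your NFS/EFS/Azure Files class
  resources:
    requests:
      storage: 50Gi
```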


External Dependencies

While Sapience runs entirely within your infrastructure, AI agent functionality requires API access to large language model providers. At least one of the following must be available for AI functionality to work:

| Destination | Purpose | Can Be Restricted To |
| --- | --- | --- |
| OpenAI API (api.openai.com) | LLM inference for AI agents | Specific API endpoint IPs |
| Anthropic API (api.anthropic.com) | LLM inference for AI agents | Specific API endpoint IPs |
| Google AI APIs | LLM inference for AI agents | Specific API endpoint IPs |
| Your own AI provider | LLM inference for AI agents | Any completions-API-compatible LLM host (Alibaba, Tencent, Azure, etc.) |
| Locally hosted LLMs | LLM inference for AI agents | Alibaba Qwen, DeepSeek, etc. The only hard requirement is support for a completions-compatible API. |

You provide your own API keys. Sapience does not proxy through any Sapience-operated infrastructure. API calls go directly from your cluster to the model provider.

For Air-Gapped Environments

If your security requirements prohibit any outbound internet access, Sapience supports integration with self-hosted LLM endpoints that expose an OpenAI-compatible API. This includes:

  • Azure OpenAI Service (deployed in your own Azure subscription)
  • AWS Bedrock with compatible endpoints
  • Self-hosted open-source models (via vLLM, TGI, or similar)

Contact the Sapience team for guidance on configuring air-gapped LLM connectivity.
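To illustrate what pointing the platform at a self-hosted, OpenAI-compatible endpoint might look like, the Deployment fragment below routes inference to an in-cluster vLLM service. The environment variable names (`OPENAI_BASE_URL`, `OPENAI_API_KEY`) and the service address are illustrative assumptions — consult Sapience's configuration reference for the actual keys.

```yaml
# Fragment of the application server's container spec; variable names are illustrative.
env:
  - name: OPENAI_BASE_URL
    value: "http://vllm.llm.svc.cluster.local:8000/v1"  # hypothetical in-cluster vLLM endpoint
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: llm-credentials   # hypothetical secret holding the local endpoint's token
        key: api-key
```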


Security Controls in Private Cloud

Deploying Sapience in your own infrastructure gives you complete control over every security layer. Here is how each concern maps to your environment:

Data Sovereignty

| Control | Your Responsibility |
| --- | --- |
| Data residency | You choose the physical location of all infrastructure |
| Encryption at rest | Configure disk encryption on your Kubernetes nodes and persistent volumes |
| Encryption in transit | TLS at the ingress; internal mTLS optional via service mesh |
| Key management | Your KMS — Sapience has no access to your encryption keys |

Access Control

| Control | How Sapience Supports It |
| --- | --- |
| Authentication | JWT-based; integrates with your IdP via OIDC/SAML |
| Authorization | Role-based access control (RBAC) with org, project, and user scopes |
| Audit logging | All API access, agent interactions, and administrative actions are logged |
| Session management | Configurable session timeouts; sessions stored in your cache layer |

Network Security

| Control | Recommendation |
| --- | --- |
| Ingress | TLS 1.2+ only; restrict to your corporate network or VPN |
| Internal traffic | Kubernetes NetworkPolicy to limit pod-to-pod communication |
| Egress | Allow only LLM provider API endpoints; block all other outbound traffic |
| DNS | Internal cluster DNS only; no external DNS resolution required for internal services |
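The egress recommendation can be sketched as a NetworkPolicy that denies all outbound traffic from the application server except cluster DNS, in-namespace services, and HTTPS to your LLM provider. The labels are placeholders, and the CIDR shown is a documentation range — substitute your provider's published IP ranges.

```yaml
# Illustrative egress lockdown for the app server (labels and CIDR are placeholders).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-egress-llm-only
  namespace: sapience
spec:
  podSelector:
    matchLabels:
      app: sapience-app
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector: {}     # allow cluster DNS lookups
      ports:
        - protocol: UDP
          port: 53
    - to:
        - podSelector: {}           # allow the internal services in this namespace
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24    # placeholder: your LLM provider's published IP range
      ports:
        - protocol: TCP
          port: 443                 # HTTPS to the LLM API only
```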

Compliance

Sapience's private cloud deployment model supports compliance with:

  • HIPAA — PHI never leaves your controlled environment; you manage BAAs directly with your infrastructure and LLM providers
  • NIST CSF 2.0 — Sapience was built with the NIST Cybersecurity Framework as its overarching security paradigm
  • GDPR — Full data residency control; right-to-erasure supported at the database level; no data processing by Sapience
  • SOC 2 — Audit logs, access controls, and encryption are all within your control boundary
  • FedRAMP / IL4+ — Air-gapped deployment option eliminates external dependencies (with self-hosted LLMs)
  • Industry-specific (FINRA, PCI-DSS, etc.) — Your security team controls all infrastructure configuration

For a detailed breakdown of Sapience's data handling practices, see Sapience Data Security, HIPAA & GDPR.


Resource Summary

Minimum recommended resources for a production private cloud deployment:

| Service | CPU | Memory | Storage | Replicas |
| --- | --- | --- | --- | --- |
| Application Server | 2 cores | 8 GiB | — | 2+ (HA) |
| Document Processing | 1 core | 2 GiB | — | 1+ |
| Document Generation | 1 core | 2 GiB | — | 1+ |
| Database (PostgreSQL) | 2 cores | 4 GiB | 50 GiB SSD | 1 (+ replica for HA) |
| Cache (Redis) | 0.5 cores | 1 GiB | — | 1 |
| Shared Storage (PVC) | — | — | 50 GiB | — |
| Total (minimum) | 6.5 cores | 17 GiB | 100 GiB | — |

These are starting-point recommendations. Scale the Application Server horizontally and the Database vertically as your user count grows.
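In pod specs, these starting points translate into resource requests, as in the application server fragment below. The limits shown are illustrative assumptions, not Sapience-published values — tune them to your workload.

```yaml
# Fragment of the application server's container spec; limits are illustrative.
resources:
  requests:
    cpu: "2"        # matches the recommended 2 cores
    memory: 8Gi     # matches the recommended 8 GiB
  limits:
    cpu: "4"        # assumed headroom for bursts; adjust to your workload
    memory: 8Gi
```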


Deployment Checklist

Use this checklist when planning your private cloud deployment:


Frequently Asked Questions

Can I use my existing PostgreSQL / Redis instead of the provided containers?

Yes. Sapience supports connecting to any PostgreSQL 17+ or Redis 7+ instance. Simply configure the connection strings in the environment variables. Many customers use managed database services (RDS, Cloud SQL, Azure Database) for production deployments.
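One common pattern is to keep those connection strings in a Kubernetes Secret and surface them as environment variables. The variable names below (`DATABASE_URL`, `REDIS_URL`) are illustrative placeholders — check Sapience's configuration reference for the names it actually reads.

```yaml
# Illustrative Secret holding connection strings for external managed services.
apiVersion: v1
kind: Secret
metadata:
  name: sapience-datastores
  namespace: sapience
type: Opaque
stringData:
  DATABASE_URL: "postgresql://sapience:CHANGE_ME@mydb.rds.amazonaws.com:5432/sapience?sslmode=require"
  REDIS_URL: "rediss://mycache.example.com:6379"   # TLS-enabled Redis endpoint
```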

What if I need to run in a completely air-gapped environment?

Sapience supports air-gapped deployment when paired with a self-hosted LLM endpoint that exposes an OpenAI-compatible API. All container images can be loaded from a private registry with no internet access required at runtime.

How are updates delivered?

Sapience provides updated container images on a regular release cadence. You pull new images to your internal registry and roll them out on your schedule using standard Kubernetes rolling update strategies. There is no automatic update mechanism — you control when and how updates are applied.
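A conservative way to express "roll out on your schedule" in the Deployment itself is a rolling-update strategy that keeps full capacity during the rollout, sketched below.

```yaml
# Deployment fragment: zero-downtime rolling updates for the app server (values illustrative).
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # never drop below the current replica count
    maxSurge: 1         # add one new-version pod at a time
```

After pushing a new image tag to your internal registry, updating the Deployment's image field triggers the rollout, and `kubectl rollout undo` reverts it if needed.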

What monitoring and observability is built in?

The Application Server exposes health check endpoints (/startup, /health) compatible with Kubernetes probes. Application logs are written to stdout/stderr in structured format, compatible with any log aggregation system (ELK, Splunk, CloudWatch, etc.). You are responsible for integrating with your monitoring stack.
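Wired into the pod spec, those endpoints look like the fragment below, mirroring the health-check row in the Application Server table. The thresholds and periods are illustrative assumptions.

```yaml
# Probe fragment for the application server container (timings are illustrative).
startupProbe:
  httpGet:
    path: /startup
    port: 9000
    scheme: HTTPS       # the app server serves HTTPS on 9000
  failureThreshold: 30  # allow up to ~5 minutes for startup
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /startup      # liveness path per the service table above
    port: 9000
    scheme: HTTPS
  periodSeconds: 15
```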

Does Sapience support multi-tenant isolation within a single deployment?

Yes. Sapience has built-in multi-organization support with role-based access control. Users, agents, data sources, and conversations are scoped to organizations. A single deployment can serve multiple isolated tenants.


Have questions about private cloud deployment? Contact the Sapience team at sales@sapiencecloud.ai for a deployment planning session.
