Private Cloud Deployment Details (Enterprise only)
All about deploying Sapience Enterprise in your datacenter, behind the firewall (or in your private cloud environment).
Overview
Sapience can be deployed as a fully self-hosted solution within your own private cloud or on-premises data center. This guide is for IT leaders, security teams, and infrastructure architects evaluating Sapience for environments where data sovereignty, regulatory compliance, and network isolation are non-negotiable.
When you deploy Sapience in your own infrastructure, no data leaves your network. There are no callbacks to Sapience Cloud, no telemetry, and no shared infrastructure with other customers. You own and control every layer of the stack.
Deployment Architecture
Sapience runs on Kubernetes and is composed of five containerized services that work together as a single platform. All containers are provided as standard OCI-compliant images and are orchestrated via Kubernetes manifests (Helm charts available).
Why Kubernetes?
Sapience requires Kubernetes (v1.26+) for orchestration. This is not optional — the platform relies on Kubernetes-native features for:
- Service discovery: Containers communicate via internal cluster DNS, not public endpoints
- Health management: Liveness and readiness probes ensure automatic recovery from failures
- Scaling: Horizontal pod autoscaling allows you to scale individual services based on demand
- Secrets management: Kubernetes Secrets or your preferred vault integration (HashiCorp Vault, Azure Key Vault, AWS Secrets Manager) for credential storage
- Shared storage: Persistent Volume Claims for file handoff between services
- Network policies: Fine-grained control over inter-service communication
Any conformant Kubernetes distribution is supported: AKS, EKS, GKE, OpenShift, Rancher, or bare-metal K8s.
The Five Core Services
Sapience is composed of five containers, each with a single responsibility. Only one service is exposed externally — the rest communicate exclusively over the internal cluster network.
1. Application Server
The primary API and business logic service. This is the only container that handles external traffic.
| Property | Detail |
| --- | --- |
| Role | API gateway, business logic, authentication, agent orchestration |
| Exposed | Yes — the only externally facing service. Serves browser traffic from users on your private network or, if you choose, a publicly available HTTPS endpoint on the internet (optionally behind a VPN gateway). |
| Port | 9000 (HTTPS only) |
| Resources | 2 CPU / 8 GiB RAM (recommended) |
| Scaling | Horizontal — multiple replicas behind a load balancer |
| Health checks | Startup probe (/startup), liveness probe (/health) |
The application server handles all user-facing API requests, manages authentication and authorization (JWT-based), orchestrates AI agent interactions, and coordinates with the other four services over the internal network.
Networking: Place behind your ingress controller (NGINX, Traefik, or cloud-native) with TLS termination. All other services should be unreachable from outside the cluster.
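As an illustration, a minimal Ingress resource for the application server might look like the sketch below. The service name (app-server), hostname, and TLS secret name are placeholders — substitute the names from your actual deployment. The backend-protocol annotation is needed here because the application server listens on HTTPS only:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sapience-app
  annotations:
    # Port 9000 speaks HTTPS, so the ingress must proxy with TLS to the backend
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - sapience.example.internal   # placeholder hostname
      secretName: sapience-tls        # your TLS certificate secret
  rules:
    - host: sapience.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-server      # placeholder service name
                port:
                  number: 9000
```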
2. Document Processing Service
Handles optical character recognition (OCR) and text extraction from uploaded documents such as PDFs, scanned images, Word documents, and spreadsheets.
| Property | Detail |
| --- | --- |
| Role | OCR, text extraction, document parsing |
| Exposed | No — internal only |
| Port | 9998 (HTTPS, cluster-internal) |
| Resources | 1 CPU / 2 GiB RAM (recommended) |
| Scaling | Horizontal — can scale to zero when idle, scales up on demand |
| Health checks | Liveness and readiness probes on port 9998 |
This service supports multilingual OCR across English, Arabic, Chinese (simplified and traditional), Spanish, French, German, Italian, Japanese, Portuguese, and Russian. Documents are processed in-cluster and extracted text is passed back to the application server — no document content is transmitted externally.
3. Document Generation Service
Creates formatted business documents (presentations, spreadsheets, reports) on behalf of AI agents.
| Property | Detail |
| --- | --- |
| Role | Programmatic creation of PPTX, XLSX, and other office-format documents |
| Exposed | No — internal only |
| Port | 8000 (HTTPS, cluster-internal) |
| Resources | 1 CPU / 2 GiB RAM (recommended) |
| Scaling | Horizontal — can scale to zero when idle |
| Health checks | Liveness probe (/health) on port 8000 |
The document generation service receives structured instructions from the application server and produces professional-grade documents. Generated files are written to shared persistent storage, where the application server retrieves them for delivery to users.
Authentication: This service authenticates all requests using short-lived JWT tokens issued by the application server. Even within the cluster, inter-service communication is authenticated.
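Sapience's actual token format is internal to the product, but the general pattern — a short-lived HS256 JWT minted by one service and verified by another on every request — can be sketched in stdlib-only Python. All names, claims, and the shared secret below are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    # Compact JWS serialization: header.payload.signature
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> dict:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims

# The issuing service mints a token valid for 60 seconds...
secret = b"shared-cluster-secret"  # illustrative only; store in a real secret manager
token = sign_jwt(
    {"iss": "app-server", "aud": "doc-gen", "exp": int(time.time()) + 60},
    secret,
)
# ...and the receiving service verifies it before doing any work.
claims = verify_jwt(token, secret)
```

The short expiry is the key property: even if a token leaks inside the cluster, its usefulness is measured in seconds.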
4. Database
PostgreSQL serves as the primary data store for all application data including user accounts, organizations, agent configurations, conversation history, audit logs, and access control policies.
| Property | Detail |
| --- | --- |
| Role | Persistent data storage (relational) |
| Exposed | No — internal only |
| Port | 5432 (PostgreSQL protocol, cluster-internal) |
| Resources | 2 CPU / 4 GiB RAM minimum (scale based on user count) |
| Scaling | Vertical — increase resources as needed; read replicas optional |
| Storage | Persistent Volume with at least 50 GiB (SSD recommended) |
You may use the containerized PostgreSQL image provided, or connect Sapience to your existing managed PostgreSQL service (AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL, or your own HA cluster). PostgreSQL 17+ is required.
Backup and recovery: Sapience does not manage database backups. You are responsible for implementing backup, point-in-time recovery, and disaster recovery strategies appropriate for your environment and compliance requirements.
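One common approach for the containerized option is a Kubernetes CronJob that runs pg_dump on a schedule. The sketch below is not a Sapience-provided manifest — the secret name, PVC name, connection variable, and schedule are all placeholders to adapt to your environment (and managed services like RDS typically bring their own snapshot tooling instead):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-nightly-dump
spec:
  schedule: "0 2 * * *"            # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:17
              command: ["/bin/sh", "-c"]
              args:
                # DATABASE_URL is assumed to live in the referenced secret
                - pg_dump "$DATABASE_URL" | gzip > /backup/sapience-$(date +%F).sql.gz
              envFrom:
                - secretRef:
                    name: sapience-db-credentials   # placeholder secret name
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: db-backups               # placeholder PVC name
```

Logical dumps alone do not give you point-in-time recovery; for that, pair them with WAL archiving (e.g. via a tool like pgBackRest) per your RPO requirements.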
5. Cache Layer (Redis-compatible, in-memory)
An in-memory data store used for session management, rate limiting, real-time state, and performance caching.
| Property | Detail |
| --- | --- |
| Role | Session cache, rate limiting, temporary data store |
| Exposed | No — internal only |
| Port | 6379 (Redis protocol, cluster-internal) |
| Resources | 0.5 CPU / 1 GiB RAM (recommended) |
| Scaling | Single instance sufficient for most deployments |
You may use the containerized cache image provided, or connect Sapience to your existing managed Redis-compatible service (AWS ElastiCache, Azure Cache for Redis, Google Memorystore). Redis 7+ is required.
Network Architecture
┌─────────────────────────────────────────────┐
│ Your Network / VPC │
│ │
HTTPS ──────────┤► Ingress / Load Balancer (TLS termination) │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Application │◄──── Only external-facing │
│ │ Server │ service │
│ └──────┬───────┘ │
│ │ Internal cluster network only │
│ ┌────┼────────┬──────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌────┐ ┌────┐ ┌──────┐ ┌───────┐ │
│ │Doc │ │Doc │ │ Data-│ │ Cache │ │
│ │Proc│ │Gen │ │ base │ │ Layer │ │
│ └────┘ └────┘ └──────┘ └───────┘ │
│ │
│ Shared Persistent Volume │
│ (file handoff between services) │
└─────────────────────────────────────────────┘

Key Network Rules
- Only the Application Server is exposed to your corporate network or the internet. All other services communicate exclusively over the internal Kubernetes cluster network.
- Note: If you expose the application on an internet-addressable URL, we recommend putting it behind a Cloudflare Web Application Firewall (WAF) or equivalent protection.
- No outbound internet required for core platform operation — with the exception of AI model API calls (see "External Dependencies" below).
- Inter-service authentication: The application server authenticates to internal services using short-lived, signed JWT tokens — even within the trusted cluster network.
- Network policies recommended: We strongly recommend Kubernetes NetworkPolicy resources to restrict pod-to-pod communication to only the paths shown above.
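For example, a NetworkPolicy that allows only the application server to reach the database might look like the following. The pod labels (app: database, app: app-server) are assumptions — use the labels from your actual manifests:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-from-app-server-only
spec:
  podSelector:
    matchLabels:
      app: database          # placeholder label for the PostgreSQL pod
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: app-server   # only the application server may connect
      ports:
        - protocol: TCP
          port: 5432
```

Analogous policies for the document processing, document generation, and cache pods complete the topology shown in the diagram above.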
Shared Storage
The Application Server and Document Generation Service share a Persistent Volume for file handoff. This is how generated documents (presentations, spreadsheets) are passed from the generation service back to the application server for delivery to users.
| Requirement | Detail |
| --- | --- |
| Type | ReadWriteMany (RWX) PVC — NFS, Azure Files, EFS, or equivalent |
| Size | 50 GiB minimum (scale based on document volume) |
| Mounted by | Application Server and Document Generation Service |
| Contains | Temporary generated files (automatically cleaned up) |
Important: Only these two services need access to the shared volume. The document processing service, database, and cache do not require shared storage.
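A minimal RWX claim might look like this — the storageClassName depends entirely on your platform (azurefile on AKS, an EFS class on EKS, an NFS provisioner on bare metal), so treat the value below as a placeholder:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sapience-shared-files
spec:
  accessModes:
    - ReadWriteMany            # both services mount the same volume
  storageClassName: azurefile  # placeholder; use your RWX-capable class
  resources:
    requests:
      storage: 50Gi
```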
External Dependencies
While Sapience runs entirely within your infrastructure, AI agent functionality requires API access to large language model providers. At least one of the following must be reachable for AI features to work:

| Destination | Purpose | Can Be Restricted To |
| --- | --- | --- |
| OpenAI API (api.openai.com) | LLM inference for AI agents | Specific API endpoint IPs |
| Anthropic API (api.anthropic.com) | LLM inference for AI agents | Specific API endpoint IPs |
| Google AI APIs | LLM inference for AI agents | Specific API endpoint IPs |
| Your own AI provider | LLM inference for AI agents | Any completions-API-compatible LLM host (Alibaba, Tencent, Azure, etc.) |
| Locally hosted LLMs | LLM inference for AI agents | Alibaba Qwen, DeepSeek, etc. The only hard requirement is support for a completions-compatible API. |
You provide your own API keys. Sapience does not proxy through any Sapience-operated infrastructure. API calls go directly from your cluster to the model provider.
For Air-Gapped Environments
If your security requirements prohibit any outbound internet access, Sapience supports integration with self-hosted LLM endpoints that expose an OpenAI-compatible API. This includes:
- Azure OpenAI Service (deployed in your own Azure subscription)
- AWS Bedrock with compatible endpoints
- Self-hosted open-source models (via vLLM, TGI, or similar)
Contact the Sapience team for guidance on configuring air-gapped LLM connectivity.
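As an illustration of what "OpenAI-compatible API" means in practice, the sketch below builds a chat-completions request against a hypothetical in-cluster vLLM endpoint. The base URL and model name are placeholders, not Sapience configuration values:

```python
import json
from urllib import request

# Hypothetical in-cluster endpoint; vLLM serves an OpenAI-compatible API under /v1
BASE_URL = "http://vllm.llm.svc.cluster.local:8000/v1"

def build_chat_request(model: str, prompt: str) -> request.Request:
    """Construct a POST to the standard /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("qwen2.5-7b-instruct", "Summarize this quarter's pipeline.")
# In an air-gapped deployment, request.urlopen(req) never leaves the cluster.
```

Any backend that accepts this request shape — vLLM, TGI, Azure OpenAI, and many others — can serve as the inference endpoint.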
Security Controls in Private Cloud
Deploying Sapience in your own infrastructure gives you complete control over every security layer. Here is how each concern maps to your environment:
Data Sovereignty
| Control | Your Responsibility |
| --- | --- |
| Data residency | You choose the physical location of all infrastructure |
| Encryption at rest | Configure disk encryption on your Kubernetes nodes and persistent volumes |
| Encryption in transit | TLS at the ingress; internal mTLS optional via service mesh |
| Key management | Your KMS — Sapience has no access to your encryption keys |
Access Control
| Control | How Sapience Supports It |
| --- | --- |
| Authentication | JWT-based; integrates with your IdP via OIDC/SAML |
| Authorization | Role-based access control (RBAC) with org, project, and user scopes |
| Audit logging | All API access, agent interactions, and administrative actions are logged |
| Session management | Configurable session timeouts; sessions stored in your cache layer |
Network Security
| Control | Recommendation |
| --- | --- |
| Ingress | TLS 1.2+ only; restrict to your corporate network or VPN |
| Internal traffic | Kubernetes NetworkPolicy to limit pod-to-pod communication |
| Egress | Allow only LLM provider API endpoints; block all other outbound traffic |
| DNS | Internal cluster DNS only; no external DNS resolution required for internal services |
Compliance
Sapience's private cloud deployment model supports compliance with:
- HIPAA — PHI never leaves your controlled environment; you manage BAAs directly with your infrastructure and LLM providers
- NIST CSF 2.0 — Sapience was built with the NIST Cybersecurity Framework as its overarching security paradigm
- GDPR — Full data residency control; right-to-erasure supported at the database level; no data processing by Sapience
- SOC 2 — Audit logs, access controls, and encryption are all within your control boundary
- FedRAMP / IL4+ — Air-gapped deployment option eliminates external dependencies (with self-hosted LLMs)
- Industry-specific (FINRA, PCI-DSS, etc.) — Your security team controls all infrastructure configuration
For a detailed breakdown of Sapience's data handling practices, see Sapience Data Security, HIPAA & GDPR.
Resource Summary
Minimum recommended resources for a production private cloud deployment:
| Service | CPU | Memory | Storage | Replicas |
| --- | --- | --- | --- | --- |
| Application Server | 2 cores | 8 GiB | — | 2+ (HA) |
| Document Processing | 1 core | 2 GiB | — | 1+ |
| Document Generation | 1 core | 2 GiB | — | 1+ |
| Database (PostgreSQL) | 2 cores | 4 GiB | 50 GiB SSD | 1 (+ replica for HA) |
| Cache (Redis) | 0.5 cores | 1 GiB | — | 1 |
| Shared Storage (PVC) | — | — | 50 GiB | — |
| Total (minimum) | 6.5 cores | 17 GiB | 100 GiB | — |
These are starting-point recommendations. Scale the Application Server horizontally and the Database vertically as your user count grows.
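Horizontal scaling of the Application Server can be automated with a HorizontalPodAutoscaler. The sketch below is a starting point only — the Deployment name and utilization threshold are assumptions to tune for your workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-server        # placeholder Deployment name
  minReplicas: 2            # keep 2 for HA, per the table above
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # tune to your traffic profile
```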
Deployment Checklist
Use this checklist when planning your private cloud deployment:
Frequently Asked Questions
Can I use my existing PostgreSQL / Redis instead of the provided containers?
Yes. Sapience supports connecting to any PostgreSQL 17+ or Redis 7+ instance. Simply configure the connection strings in the environment variables. Many customers use managed database services (RDS, Cloud SQL, Azure Database) for production deployments.
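A typical configuration might look like the following. Note that these environment variable names are assumptions for illustration — consult the Sapience configuration reference for the exact keys used by your release:

```shell
# Managed PostgreSQL (RDS, Cloud SQL, Azure Database); TLS enforced
export DATABASE_URL="postgresql://sapience:<password>@db.example.internal:5432/sapience?sslmode=require"

# Managed Redis-compatible cache; rediss:// for TLS
export REDIS_URL="rediss://cache.example.internal:6379/0"
```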
What if I need to run in a completely air-gapped environment?
Sapience supports air-gapped deployment when paired with a self-hosted LLM endpoint that exposes an OpenAI-compatible API. All container images can be loaded from a private registry with no internet access required at runtime.
How are updates delivered?
Sapience provides updated container images on a regular release cadence. You pull new images to your internal registry and roll them out on your schedule using standard Kubernetes rolling update strategies. There is no automatic update mechanism — you control when and how updates are applied.
What monitoring and observability is built in?
The Application Server exposes health check endpoints (/startup, /health) compatible with Kubernetes probes. Application logs are written to stdout/stderr in structured format, compatible with any log aggregation system (ELK, Splunk, CloudWatch, etc.). You are responsible for integrating with your monitoring stack.
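One plausible wiring of those endpoints into Kubernetes probes is sketched below as a pod spec fragment. The thresholds are placeholder values, and the HTTPS scheme matters because port 9000 does not serve plain HTTP:

```yaml
startupProbe:
  httpGet:
    path: /startup
    port: 9000
    scheme: HTTPS
  failureThreshold: 30   # allow up to 150 s for first boot
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 9000
    scheme: HTTPS
  periodSeconds: 10
```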
Does Sapience support multi-tenant isolation within a single deployment?
Yes. Sapience has built-in multi-organization support with role-based access control. Users, agents, data sources, and conversations are scoped to organizations. A single deployment can serve multiple isolated tenants.
Have questions about private cloud deployment? Contact the Sapience team at sales@sapiencecloud.ai for a deployment planning session.