MULTI-TENANT SAAS AI PLATFORM

Per-Org Tool Registry, Zero-Trust Isolation, Sub-200ms Routing

The multi-tenant AI platform B2B SaaS founders build when they realize that one forgotten WHERE org_id = filter is the difference between a feature and a breach. Built on LangGraph, Postgres row-level security, and KMS-backed per-tenant credentials. 4-week MVP.

< 200ms tenant routing p95
99.99% tenant data isolation
60% lower per-tenant infra cost
New tenant in < 5 minutes

FIVE WAYS MULTI-TENANT AI FAILS

The Problems That Bite At Tenant 20

The first ten tenants forgive a lot. The next ninety do not. These are the failure modes we have rebuilt around at Cognilium since 2019.

Tenant Data Leaks Are One Forgotten WHERE Clause Away

Single-database multi-tenant code without row-level security leaks tenant A’s data into tenant B’s response the first time an engineer forgets to scope a query.

Per-Tenant Credentials Stored In .env Files

Stripe keys, Salesforce tokens, and tenant-owned API credentials end up in environment variables, secrets managers without scope, or — worst — checked into git.

Every New Tenant Requires Engineering

Tools, models, and prompts are hardcoded per customer with if-statements. Onboarding tenant 20 means a deploy. Onboarding tenant 100 means a war room.

AI Costs Are Unbounded And Untracked Per-Tenant

One tenant runs a runaway agent loop and burns $40K in model spend before anyone notices. Usage is not attributed to org_id, so you cannot bill or cap.

Logs And Metrics Are Not Tenant-Filterable

When tenant A reports a bug, your on-call engineer greps for their org_id across plaintext logs that were never indexed by tenant. Datadog dashboards aggregate every tenant into one number.

Tenant Data Leaks Are One Forgotten WHERE Clause Away

Business Impact

One leak triggers SOC 2 nonconformity, breach disclosure, and contract penalties from every enterprise tenant you have.

Real Cost

Average B2B SaaS breach disclosure: $4.1M direct, plus 18-24 months of stalled enterprise sales.

Bottom line: Every one of these is a configuration problem at week one and a rewrite at month twelve.

THE COGNILIUM BUILD

Six Capabilities, Engineered To Survive Tenant 100

Each capability solves one of the failure modes above. None of them is optional once you have enterprise tenants.

Zero-Trust Tenant Isolation

Postgres row-level security policies keyed on org_id. Vector store metadata filters enforced in a wrapper the application cannot bypass. LangGraph state namespaced per tenant. A forgotten scope returns zero rows — not someone else’s data.
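A minimal sketch of the wrapper idea for the vector store, assuming a toy in-memory store in place of a real vector database. `InMemoryVectorStore` and `TenantScopedStore` are hypothetical names chosen for illustration, not the actual build. The point it shows: the org_id filter is applied last, so a caller-supplied filter cannot override it.

```python
from typing import Any, Optional


class InMemoryVectorStore:
    """Stand-in for a real vector store: holds (vector, metadata) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], dict[str, Any]]] = []

    def add(self, vector: list[float], metadata: dict[str, Any]) -> None:
        self.rows.append((vector, metadata))

    def search(self, query: list[float], k: int,
               metadata_filter: dict[str, Any]) -> list[dict[str, Any]]:
        def matches(meta: dict[str, Any]) -> bool:
            return all(meta.get(key) == val for key, val in metadata_filter.items())

        def score(vec: list[float]) -> float:
            # Dot product as a toy similarity measure.
            return sum(a * b for a, b in zip(query, vec))

        hits = [(score(vec), meta) for vec, meta in self.rows if matches(meta)]
        hits.sort(key=lambda hit: hit[0], reverse=True)
        return [meta for _, meta in hits[:k]]


class TenantScopedStore:
    """Wrapper bound to one org_id; every read and write is scoped to it."""

    def __init__(self, store: InMemoryVectorStore, org_id: str) -> None:
        if not org_id:
            raise ValueError("org_id is required")
        self._store = store
        self._org_id = org_id

    def add(self, vector: list[float], metadata: dict[str, Any]) -> None:
        # The tenant tag is stamped by the wrapper, never by the caller.
        self._store.add(vector, {**metadata, "org_id": self._org_id})

    def search(self, query: list[float], k: int = 5,
               metadata_filter: Optional[dict[str, Any]] = None) -> list[dict[str, Any]]:
        # org_id is merged last, so a caller-supplied org_id is overwritten.
        merged = {**(metadata_filter or {}), "org_id": self._org_id}
        return self._store.search(query, k, merged)
```

Python cannot truly hide `_store` from determined code, so in practice the guarantee would also depend on keeping the raw client out of application modules.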

Per-Org Tool Registry

Each tenant has a row-level allow-list of tools, models, prompts, and data sources. The LangGraph supervisor literally cannot dispatch to a tool the tenant does not own. Onboarding a new tenant or changing a plan is a row insert, not a deploy.
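The allow-list dispatch can be sketched roughly as below. This is an illustration under assumptions: the registry is a plain dict standing in for the database table, and the tool names, `resolve_tools`, and `dispatch` are hypothetical.

```python
from typing import Callable


class ToolNotEntitled(Exception):
    """Raised when a tenant asks for a tool outside its allow-list."""


# Hypothetical tool implementations shared by all tenants.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_docs": lambda arg: f"docs for {arg}",
    "create_invoice": lambda arg: f"invoice for {arg}",
}

# Stand-in for the registry table: one allow-list per org_id.
REGISTRY: dict[str, set[str]] = {
    "org_a": {"search_docs"},
    "org_b": {"search_docs", "create_invoice"},
}


def resolve_tools(org_id: str) -> dict[str, Callable[[str], str]]:
    """Return only the tools this tenant is entitled to (empty if unknown)."""
    allowed = REGISTRY.get(org_id, set())
    return {name: fn for name, fn in TOOLS.items() if name in allowed}


def dispatch(org_id: str, tool_name: str, arg: str) -> str:
    """Dispatch against the resolved set, so unowned tools are never reachable."""
    tools = resolve_tools(org_id)
    if tool_name not in tools:
        raise ToolNotEntitled(f"{org_id} is not entitled to {tool_name}")
    return tools[tool_name](arg)
```

Onboarding in this sketch is a data change: `REGISTRY["org_c"] = {"search_docs"}` makes the tool available to the new tenant without touching `dispatch`.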

Per-Tenant Credential Vault

Tenant credentials for Stripe, Salesforce, vendor APIs, and the tenant’s own database are encrypted under AWS KMS keys or stored in HashiCorp Vault under an org_id-scoped path. Credentials are decrypted per-call and dropped from heap. Stolen memory yields one credential, not all.
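A rough sketch of the fetch-use-drop pattern, assuming a toy `FakeVault` in place of a real KMS or Vault client. The path layout `tenants/<org_id>/<name>` and every function name here are assumptions made for illustration. Zeroing a `bytearray` is best effort in Python, since the runtime may hold other copies.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeVault:
    """Stand-in for a secrets backend; a real client would call KMS or Vault."""

    def __init__(self) -> None:
        self._secrets: dict[str, bytes] = {}

    def write(self, path: str, value: bytes) -> None:
        self._secrets[path] = value

    def read(self, path: str) -> bytes:
        return self._secrets[path]  # KeyError if the path does not exist


def credential_path(org_id: str, name: str) -> str:
    """Build the tenant-scoped path; reject values that could escape the scope."""
    for part in (org_id, name):
        if not part or "/" in part or ".." in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"tenants/{org_id}/{name}"


@contextmanager
def tenant_credential(vault: FakeVault, org_id: str, name: str) -> Iterator[bytearray]:
    """Yield the secret for one call, then overwrite the buffer."""
    secret = bytearray(vault.read(credential_path(org_id, name)))
    try:
        yield secret
    finally:
        # Best-effort wipe of this buffer once the call is done.
        for i in range(len(secret)):
            secret[i] = 0
```

A tool call would wrap its outbound request in `with tenant_credential(vault, org_id, "stripe") as key:` so the plaintext buffer exists only for the duration of that block.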

Sub-200ms Tenant Routing

Identity middleware, registry lookup, rate-limit check, and cost-cap check stay under 200ms at p95. JWKS cached, Redis Lua for atomic limiter + ledger, prefetch deferred to the tool step. Multi-tenant machinery is invisible to perceived latency.

Usage-Based Billing Wired Through

Every model and tool call emits a usage event tagged with org_id, model id, and token count. A Redis cost ledger tracks spend in real time against hard caps. Stripe Billing metered subscriptions consume the same ledger your in-app dashboard renders.
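To make the usage-event flow concrete, here is a small in-memory sketch of a ledger keyed by org_id. It stands in for the Redis ledger described above; the model names and per-token prices are hypothetical placeholders, and amounts are kept as integer micro-dollars to avoid float drift.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in micro-dollars per token; real prices vary by model.
PRICE_MICRO_USD_PER_TOKEN: dict[str, int] = {"model-small": 1, "model-large": 10}


@dataclass(frozen=True)
class UsageEvent:
    """One model or tool call, attributed to a tenant."""
    org_id: str
    model_id: str
    tokens: int


class CostLedger:
    """Tracks spend per org_id against a per-tenant hard cap."""

    def __init__(self, hard_caps_micro_usd: dict[str, int]) -> None:
        self._caps = hard_caps_micro_usd
        self._spend: defaultdict[str, int] = defaultdict(int)

    def record(self, event: UsageEvent) -> int:
        """Add the event's cost to the tenant's total and return the new total."""
        cost = PRICE_MICRO_USD_PER_TOKEN[event.model_id] * event.tokens
        self._spend[event.org_id] += cost
        return self._spend[event.org_id]

    def spend(self, org_id: str) -> int:
        return self._spend[org_id]

    def remaining(self, org_id: str) -> int:
        """Budget left before the hard cap; zero or negative means over cap."""
        return self._caps[org_id] - self._spend[org_id]
```

Because both billing and the dashboard would read the same totals, a single `record` call per event keeps the two views from drifting apart.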

Per-Tenant Observability

OpenTelemetry spans tagged with org_id, user_id, and agent_run_id flow into Datadog and Sentry with per-tenant dashboards and alert routes. Signed, append-only audit log exportable to each tenant for their own SOC 2 evidence pack.

REQUEST PIPELINE

How A Single Request Becomes A Safe Tenant Response

Request → tenant identity → per-org tool registry → LangGraph supervisor → rate limit + cost cap → tool execution with HSM-backed credentials → response with audit trail.

< 10ms

Tenant Identity Middleware

JWT validated against cached JWKS from Auth0, Clerk, or WorkOS. Immutable org_id, plan, and region claims signed at issue time so they cannot be spoofed downstream.

1
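One way to carry the tenant identity downstream is an immutable context object built from already-verified claims. The sketch below assumes signature verification against the cached JWKS has happened earlier in a JWT library (not shown); `TenantContext` and `tenant_from_claims` are hypothetical names, and the claim keys follow the ones named above.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED_CLAIMS = ("org_id", "plan", "region")


@dataclass(frozen=True)
class TenantContext:
    """Immutable tenant identity passed to every later pipeline step."""
    org_id: str
    plan: str
    region: str


def tenant_from_claims(claims: Mapping[str, object]) -> TenantContext:
    """Build the context from verified JWT claims; fail closed if any is missing."""
    missing = [key for key in REQUIRED_CLAIMS if not claims.get(key)]
    if missing:
        raise ValueError(f"missing tenant claims: {missing}")
    return TenantContext(
        org_id=str(claims["org_id"]),
        plan=str(claims["plan"]),
        region=str(claims["region"]),
    )
```

Freezing the dataclass means later steps can read `ctx.org_id` but an accidental reassignment raises instead of silently switching tenants.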
< 5ms

Per-Org Tool Registry Lookup

Redis-cached allow-list resolves which tools, models, and data sources this tenant is entitled to. 60-second TTL with plan-change invalidation. No per-request database hop.

2
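The 60-second TTL with plan-change invalidation can be sketched with an in-process cache. This is a simplification: a dict stands in for Redis, and `RegistryCache`, its `load` callback, and the injectable `clock` are assumptions made so the behavior is easy to test.

```python
import time
from typing import Callable, Iterable


class RegistryCache:
    """Caches each tenant's allow-list for ttl_seconds; invalidated on plan change."""

    def __init__(self, load: Callable[[str], Iterable[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # slow path, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, frozenset[str]]] = {}

    def get(self, org_id: str) -> frozenset[str]:
        now = self._clock()
        entry = self._entries.get(org_id)
        if entry is not None and now < entry[0]:
            return entry[1]        # cache hit: no database hop
        allowed = frozenset(self._load(org_id))
        self._entries[org_id] = (now + self._ttl, allowed)
        return allowed

    def invalidate(self, org_id: str) -> None:
        """Call when a tenant's plan changes so the next get() reloads."""
        self._entries.pop(org_id, None)
```

Without the explicit `invalidate`, a plan downgrade could stay visible for up to one TTL, which is why the invalidation hook matters as much as the TTL itself.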
Dispatch < 20ms

Agent Dispatch (LangGraph Supervisor)

Supervisor graph only sees the tenant’s resolved tools. State namespaced by org_id. Checkpoints written to a Postgres table with row-level security policies.

3
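For the row-level security piece, a sketch of what the policy and the per-transaction scoping might look like. Everything named here is an assumption: the table `agent_checkpoints`, a text `org_id` column, the custom setting `app.current_org_id`, and the `%s` placeholder style used by psycopg. The SQL relies on standard PostgreSQL features (`CREATE POLICY`, `set_config`, `current_setting` with `missing_ok`).

```python
# DDL to run once per tenant-scoped table (hypothetical table name).
CHECKPOINT_RLS_DDL = """
ALTER TABLE agent_checkpoints ENABLE ROW LEVEL SECURITY;
ALTER TABLE agent_checkpoints FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON agent_checkpoints
    USING (org_id = current_setting('app.current_org_id', true))
    WITH CHECK (org_id = current_setting('app.current_org_id', true));
"""

# Third argument true = the setting lasts only for the current transaction.
SET_TENANT_SQL = "SELECT set_config('app.current_org_id', %s, true)"


def scoped_statements(org_id: str, query: str,
                      params: tuple = ()) -> list[tuple[str, tuple]]:
    """Return (sql, params) pairs to execute in order inside one transaction."""
    if not org_id:
        raise ValueError("org_id is required")
    return [(SET_TENANT_SQL, (org_id,)), (query, params)]
```

If the setting is never set, `current_setting(..., true)` yields no matching value and the policy filters out every row. Superusers and roles with BYPASSRLS still skip policies, so the application role should have neither.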
< 3ms

Per-Tenant Rate Limit + Cost Cap

Redis Lua script atomically checks sliding-window quota and remaining spend against the tenant’s plan cap. Soft caps emit 429 + Retry-After. Hard caps halt execution.

4
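The decision logic of that check can be sketched in plain Python. This is not the Redis Lua script: it is a single-process illustration with no atomicity across workers, and it assumes the sliding-window quota is the soft limit (429 with a retry delay) while the spend cap is the hard one. `TenantLimiter` and `Decision` are hypothetical names.

```python
from collections import deque
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    RATE_LIMITED = "rate_limited"  # respond 429 with Retry-After
    HARD_CAP = "hard_cap"          # halt execution


class TenantLimiter:
    """Sliding-window request quota plus a spend cap for one tenant."""

    def __init__(self, max_requests: int, window_seconds: float, spend_cap: int) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.spend_cap = spend_cap
        self._hits: deque[float] = deque()  # timestamps of allowed requests

    def check(self, now: float, spent: int) -> tuple[Decision, Optional[float]]:
        """Return the decision and, when rate limited, seconds until a retry."""
        if spent >= self.spend_cap:
            return Decision.HARD_CAP, None
        # Drop timestamps that have slid out of the window.
        while self._hits and self._hits[0] <= now - self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            retry_after = self._hits[0] + self.window_seconds - now
            return Decision.RATE_LIMITED, retry_after
        self._hits.append(now)
        return Decision.ALLOW, None
```

Doing both checks in one step is what the Lua script buys in production: no other request can slip in between reading the quota and recording the hit.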
Per tool call

Tool Execution (Per-Tenant Credentials From HSM)

Outbound credential fetched at tool-call time from AWS KMS or HashiCorp Vault under the tenant’s scoped path. Decrypted, used once, dropped from heap. No long-lived plaintext.

5
Async write

Response With Full Audit Trail

Signed, append-only audit log records prompt, tool calls, and response, keyed by org_id. OpenTelemetry spans land in Datadog and Sentry tagged per-tenant. Exportable as SOC 2 evidence.

6
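A signed, append-only log is often built as a hash chain, where each entry's signature covers the previous entry's signature. The sketch below assumes HMAC-SHA256 with a single server-side key and one chain per org_id; the source does not say which signing scheme is used, and all names here are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any


class AuditLog:
    """Per-tenant append-only chains; each entry signs its predecessor."""

    def __init__(self, signing_key: bytes) -> None:
        self._key = signing_key
        self._chains: dict[str, list[dict[str, Any]]] = {}

    def _sign(self, body: dict[str, Any]) -> str:
        # Canonical JSON so the same body always signs to the same value.
        payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, org_id: str, event: dict[str, Any]) -> dict[str, Any]:
        chain = self._chains.setdefault(org_id, [])
        prev = chain[-1]["sig"] if chain else ""
        body = {"org_id": org_id, "event": event, "prev": prev}
        entry = {**body, "sig": self._sign(body)}
        chain.append(entry)
        return entry

    def verify(self, org_id: str) -> bool:
        """Recompute every signature and link; any edit or deletion breaks it."""
        prev = ""
        for entry in self._chains.get(org_id, []):
            body = {key: entry[key] for key in ("org_id", "event", "prev")}
            if entry["prev"] != prev:
                return False
            if not hmac.compare_digest(entry["sig"], self._sign(body)):
                return False
            prev = entry["sig"]
        return True

    def export(self, org_id: str) -> list[dict[str, Any]]:
        """Return one tenant's entries only."""
        return list(self._chains.get(org_id, []))
```

With HMAC, only the key holder can verify the chain. Letting a tenant verify an export independently would need an asymmetric signature instead, which this sketch does not cover.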

WHO BUILDS THIS WITH US

Five Companies That Need This Stack

The shape of the problem repeats. The tech choices we make on day one are what decide whether you scale to a hundred tenants or rewrite at twenty.

Vertical AI Startups

A vertical AI startup serving 40+ professional-services firms needs each firm to bring its own document corpus, its own SSO, and its own audit trail — without forking the codebase.

Stack

Auth0 + pgvector with RLS + per-org tool registry

Professional-Services Platforms

A legal-tech, HR-tech, or fin-tech platform shipping AI copilots to enterprise tenants who demand tenant-side SAML, dedicated KMS keys, and exportable audit logs for their own SOC 2 reports.

Stack

WorkOS SSO + AWS KMS bring-your-own-key + signed audit log

B2B SaaS Founders Adding AI

A B2B SaaS scaling AI features across the existing tenant base needs per-plan tool entitlements and metered billing so the AI line item shows up on the existing Stripe invoice — not a separate bill.

Stack

Clerk + Stripe metered billing + Redis cost ledger

White-Label AI Offerings

A multi-family-office SaaS resells an AI assistant to its own customer banks. Each bank needs its branding, its own credential vault, and its own per-end-user quotas under one umbrella contract.

Stack

Custom OIDC + Vault per-tenant paths + nested org_id model

Enterprise Tenants On Shared SaaS

A Fortune-class manufacturer signs onto your platform and demands their data live in their AWS account behind their VPC, while still consuming the shared control plane for identity and billing.

Stack

Single-tenant dedicated stack + peered shared control plane

ENGINEERING OUTCOMES

The Four Numbers That Decide If You Have A Multi-Tenant Platform

We hold the build to these metrics. If we miss one, it is not done.

< 200ms
Tenant-routing overhead at p95
Before us: Unmeasured
99.99%
Tenant data isolation (RLS + wrapper-enforced metadata filters)
Before us: One forgotten WHERE clause
60%
Lower per-tenant infra cost vs single-tenant clones
Before us: 1 stack per tenant
< 5 min
Deploy a new tenant (row insert, not a release)
Before us: 2-4 engineering weeks

Want These Numbers On Your Stack?

Bring a tenant model, a target plan structure, and a date — we will tell you the shortest path.

IMPLEMENTATION TIMELINE

From Empty Repo To Production Multi-Tenant In 8 Weeks

MVP at week 4, production guarantees at week 8, scale-out from week 9. Same codebase from one tenant to one hundred.

1
Week 1

Architecture & Identity (Week 1)

Tenant model (org_id, plan, region) finalized. Auth0 / Clerk / WorkOS wired with signed claims. Row-level security policies drafted for every tenant-scoped table. Decision: shared plane vs dedicated stacks for top tenants.

2
Weeks 2-4

4-Week MVP With One Tenant

Per-org tool registry (one tenant, three tools). LangGraph supervisor with RLS-backed state. Redis sliding-window rate limiter. Single audit-log table. End-to-end test: tenant A request never touches tenant B data.

3
Weeks 5-8

8-Week Production With Multi-Tenant Guarantees

AWS KMS / HashiCorp Vault for per-tenant credentials. Stripe metered billing wired to Redis cost ledger. Datadog + Sentry with per-tenant tags. Signed append-only audit log with tenant-side export. Sub-200ms p95 verified under load.

4
Week 9+

Scale-Out (Post Week 8)

Onboarding a new tenant is a configuration change. Enterprise patterns wired in as needed: tenant-side SSO federation, bring-your-own KMS key, dedicated single-tenant stacks for top accounts on the same codebase.

ENGINEERING FAQ

Eight Questions Every Founder Should Ask Before Build Starts

These come up on every multi-tenant project. The answers below are the answers we ship to.

A multi-tenant AI platform is one running instance that serves many customer organizations while each tenant believes it is the only user. It makes three guarantees: data isolation (tenant A cannot read tenant B), config isolation (each tenant has its own tools, models, prompts, and quotas), and observability isolation (logs, costs, and audit trails filterable per-tenant). We enforce these with Postgres row-level security, a per-org tool registry, AWS KMS / HashiCorp Vault per-tenant credential paths, and OpenTelemetry spans tagged with org_id.

READY WHEN YOUR TENANT MODEL IS

Build It Once. Ship It To Every Tenant.

50+ projects delivered since 2019, 96% client satisfaction, four production AI products in the wild (Paralegent AI, ProspectVox, VectorHire, VORTA). The same engineering bench builds your multi-tenant stack.

Clients in US, UAE, and Pakistan. Founded 2019.

4-week MVP
8-week production guarantees
Per-tenant SOC 2 evidence