How do you isolate tenant data when every tenant shares the same database and vector store?

We use defense in depth, not a single mechanism. At the database layer, every tenant-scoped table has a row-level security policy keyed on org_id, and the session sets org_id at the start of every transaction — a query that forgets to scope simply returns zero rows instead of leaking. At the vector store layer (pgvector, Pinecone, or Weaviate), every embedding is written with an org_id metadata field and every search applies a mandatory metadata filter, enforced in a thin wrapper that application code cannot bypass. At the agent layer, LangGraph state is namespaced by org_id so concurrent runs cannot cross-contaminate.
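As a rough illustration of the vector-store wrapper idea, here is a minimal in-memory sketch. It is not the production wrapper: `TenantScopedVectorStore`, `Embedding`, and the cosine helper are hypothetical names, and a real deployment would delegate to pgvector, Pinecone, or Weaviate rather than a Python list. The point it shows is that `org_id` is a required argument on every write and every search, so the filter cannot be forgotten by calling code.

```python
import math
from dataclasses import dataclass


@dataclass
class Embedding:
    org_id: str          # tenant metadata written with every vector
    vector: list[float]
    text: str


def _cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedVectorStore:
    """Hypothetical wrapper: every read and write requires an org_id."""

    def __init__(self):
        self._rows: list[Embedding] = []

    def upsert(self, org_id: str, vector, text: str) -> None:
        if not org_id:
            raise ValueError("org_id is required on every write")
        self._rows.append(Embedding(org_id, list(vector), text))

    def search(self, org_id: str, query, k: int = 3) -> list[str]:
        if not org_id:
            raise ValueError("org_id is required on every search")
        # Mandatory metadata filter, applied before similarity ranking.
        candidates = [r for r in self._rows if r.org_id == org_id]
        candidates.sort(key=lambda r: _cosine(r.vector, query), reverse=True)
        return [r.text for r in candidates[:k]]


store = TenantScopedVectorStore()
store.upsert("org_a", [1.0, 0.0], "A's contract")
store.upsert("org_b", [1.0, 0.0], "B's contract")
```

Searching as `org_a` can only ever rank `org_a` rows, and an unknown tenant gets an empty list rather than another tenant's nearest neighbor.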

Why a per-org tool registry instead of one global tool list?

A global tool list forces you to choose between two bad options: expose every tool to every tenant (a privilege-escalation bug waiting to happen), or hardcode if-statements per customer (which collapses by tenant 20). A per-org tool registry makes entitlements data, not code. Tenant A on the Starter plan gets web_search and a read-only Stripe tool; Tenant B on Enterprise gets those plus their own SAP connector with their own credentials. Changing a plan or onboarding a new customer is a row insert, not a deploy. The agent supervisor literally cannot call a tool the tenant does not own.
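To make "entitlements are data, not code" concrete, the sketch below models the registry as a plain dictionary. Everything here is an assumption for illustration: the tenant ids, the tool names beyond web_search, the `dispatch` function, and the `ToolNotEntitled` exception are hypothetical, and in a real system the rows would live in a database table rather than a module-level dict.

```python
# Hypothetical registry rows; in production these would be database rows.
TOOL_REGISTRY = {
    "tenant_a": {"plan": "starter", "tools": {"web_search", "stripe_read"}},
    "tenant_b": {
        "plan": "enterprise",
        "tools": {"web_search", "stripe_read", "sap_connector"},
    },
}

# Hypothetical tool implementations, keyed by tool name.
TOOLS = {
    "web_search": lambda args: f"searched: {args['q']}",
    "stripe_read": lambda args: f"stripe read: {args['object']}",
    "sap_connector": lambda args: f"sap query: {args['table']}",
}


class ToolNotEntitled(Exception):
    """Raised when a tenant asks for a tool outside its allow-list."""


def resolve_tools(org_id: str) -> dict:
    """Return only the tools this tenant is entitled to (empty if unknown)."""
    entry = TOOL_REGISTRY.get(org_id)
    if entry is None:
        return {}
    return {name: TOOLS[name] for name in entry["tools"]}


def dispatch(org_id: str, tool_name: str, args: dict):
    """Call a tool only if it appears in the tenant's resolved tool set."""
    tools = resolve_tools(org_id)
    if tool_name not in tools:
        raise ToolNotEntitled(f"{org_id} is not entitled to {tool_name}")
    return tools[tool_name](args)
```

Upgrading `tenant_a` to the SAP connector is then a change to its registry row; `dispatch` itself never mentions a specific customer.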

How do you handle per-tenant credentials for tools like Stripe, Salesforce, or a tenant’s own database?

Long-lived plaintext secrets in app config are the single most common multi-tenant breach pattern. Cognilium stores tenant credentials in AWS KMS, HashiCorp Vault, or AWS Secrets Manager under a path keyed by org_id, with IAM policies that scope the application’s decrypt permission to that path pattern. At tool-call time the application fetches and decrypts the credential, uses it for the single outbound request, and drops it from heap. Rotation is a Vault operation, not a deploy. Stolen application memory yields at most one tenant’s one credential, not the whole vault.
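The fetch, use once, drop pattern can be sketched with a context manager. This is an illustration under stated assumptions: `InMemorySecretStore` stands in for Vault or Secrets Manager, the `tenants/{org_id}/{tool}` path layout and the `org_id` character rule are hypothetical, and zeroing a `bytearray` is only a best-effort wipe in Python, since the runtime may hold other copies of the bytes.

```python
import re
from contextlib import contextmanager

_ORG_ID = re.compile(r"^[a-z0-9_]+$")  # assumed org_id format


class InMemorySecretStore:
    """Stand-in for a real secrets backend; for illustration only."""

    def __init__(self):
        self._data = {}

    def put(self, path: str, value: bytes) -> None:
        self._data[path] = value

    def get(self, path: str) -> bytes:
        return self._data[path]


def credential_path(org_id: str, tool: str) -> str:
    """Build the tenant-scoped path; reject ids that could escape the prefix."""
    if not _ORG_ID.match(org_id):
        raise ValueError(f"invalid org_id: {org_id!r}")
    return f"tenants/{org_id}/{tool}"


@contextmanager
def tenant_credential(store, org_id: str, tool: str):
    """Yield the credential for one call, then overwrite the buffer."""
    secret = bytearray(store.get(credential_path(org_id, tool)))
    try:
        yield secret
    finally:
        for i in range(len(secret)):
            secret[i] = 0  # best-effort wipe of this buffer only


store = InMemorySecretStore()
store.put(credential_path("org_a", "stripe"), b"sk_test_a")

with tenant_credential(store, "org_a", "stripe") as cred:
    used = bytes(cred).decode()  # the single outbound request would go here
```

Validating `org_id` before building the path matters as much as the wipe: an id such as `../org_b` must never resolve to another tenant's prefix.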

How do you bill tenants for AI usage when costs are unpredictable?

We instrument every model call and tool call with token counts, model id, and tool name, then push usage events to Stripe Billing’s metered subscriptions. Each tenant has a hard cost cap stored on their plan record; a Redis cost ledger tracks spend in real time and a sliding-window limiter halts execution at the cap with a structured 429. The tenant sees usage in your in-app dashboard sourced from the same ledger Stripe bills against, so customer support never has to argue about line items. For enterprise tenants we expose a webhook so their FinOps team can react to soft-cap thresholds (80%, 95%) before the hard cap hits.
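The cap logic described above can be sketched as a small ledger object. This is a single-process illustration, not the Redis ledger itself: `TenantLedger`, `HardCapExceeded`, and the choice to track spend in integer cents are assumptions. It shows the two behaviors in the answer: soft thresholds at 80% and 95% fire once each, and a charge that would pass the hard cap is refused.

```python
from dataclasses import dataclass, field

SOFT_THRESHOLDS_PCT = (80, 95)  # soft-cap thresholds from the text above


class HardCapExceeded(Exception):
    """A caller would translate this into a structured 429 response."""


@dataclass
class TenantLedger:
    cap_cents: int
    spent_cents: int = 0
    notified: set = field(default_factory=set)

    def record(self, cost_cents: int) -> list[int]:
        """Add spend and return the soft thresholds newly crossed."""
        if self.spent_cents + cost_cents > self.cap_cents:
            raise HardCapExceeded(
                f"cap {self.cap_cents} would be exceeded by {cost_cents}"
            )
        self.spent_cents += cost_cents
        crossed = [
            pct
            for pct in SOFT_THRESHOLDS_PCT
            # Integer comparison avoids floating-point edge cases at the line.
            if self.spent_cents * 100 >= self.cap_cents * pct
            and pct not in self.notified
        ]
        self.notified.update(crossed)
        return crossed


ledger = TenantLedger(cap_cents=10_000)  # a hypothetical $100 plan cap
```

Each value returned by `record` is where a webhook to the tenant's FinOps team would be sent; because crossed thresholds are remembered, a tenant hovering at 81% is notified once, not on every call.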

What is the sub-200ms tenant-routing overhead, and how do you achieve it?

When a request enters the platform, the time spent on tenant identity, tool-registry lookup, rate-limit check, and credential resolution (everything before the first model token) stays under 200 ms at p95. We achieve this by validating JWTs with a cached JWKS (no per-request network hop), caching the per-org tool registry in Redis with a 60-second TTL, using Redis Lua scripts for atomic rate-limit + cost-cap checks in one round trip, and deferring the tenant-credential fetch to the tool-call step (not the request boundary). The result: the multi-tenant machinery is invisible to the tenant’s perceived latency.
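The registry caching step is the easiest of these to picture in code. The sketch below is an in-process stand-in for the Redis cache, with a 60-second TTL and explicit invalidation; `TTLCache`, `load_registry`, and the injectable clock are hypothetical names used so the expiry behavior can be shown without waiting a minute.

```python
import time


class TTLCache:
    """Minimal TTL cache; a stand-in for the Redis-cached registry."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key) -> None:
        """Drop an entry immediately, e.g. when a tenant's plan changes."""
        self._entries.pop(key, None)


# Demonstration with a fake clock so expiry is deterministic.
now = [0.0]
loads = []


def load_registry(org_id):
    loads.append(org_id)  # stands in for the database read
    return {"web_search"}


cache = TTLCache(60.0, clock=lambda: now[0])
cache.get("org_a", load_registry)  # miss: loads from the database
cache.get("org_a", load_registry)  # hit: no load
now[0] = 61.0
cache.get("org_a", load_registry)  # expired: loads again
```

With a 60-second TTL alone, a plan downgrade could stay visible for up to a minute, which is why an explicit `invalidate` on plan change is worth pairing with it.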

How do you support enterprise tenants who require their own SSO and their own AWS account?

Three patterns, all supported. (1) Tenant-side SSO: we accept their SAML or OIDC identity provider via Auth0, Clerk, or WorkOS, and federate their groups into your org_id model. (2) Bring-your-own-key: an enterprise tenant points us at their own KMS or Vault for credential storage, so secrets never leave their cloud. (3) Dedicated isolation: for tenants that require a separate VPC or AWS account, the same codebase deploys as a single-tenant stack with a feature flag, and shared control-plane services (billing, identity) call into it over a peered network. Most customers ship 90% of tenants on the shared plane and reserve dedicated stacks for their top accounts.

How long until we have a multi-tenant AI platform in production?

A 4-week MVP with one tenant: tenant identity middleware, per-org tool registry (one tenant, three tools), LangGraph supervisor with row-level-security-backed state, per-tenant Redis rate limits, and a single audit-log table. Production multi-tenant guarantees and billing land at week 8: KMS-backed credentials, Stripe metered billing wired to the cost ledger, Datadog and Sentry with per-tenant tags, signed append-only audit log with tenant-side export, and load tests confirming sub-200ms routing overhead at p95 across 100 concurrent tenants. From week 8 onward, onboarding a new tenant is a configuration change, not an engineering project.

MULTI-TENANT SAAS AI PLATFORM

Per-Org Tool Registry, Zero-Trust Isolation, Sub-200ms Routing

The multi-tenant AI platform B2B SaaS founders build when they realize one forgotten WHERE org_id = is the difference between a feature and a breach. LangGraph + Postgres row-level security + KMS-backed per-tenant credentials. 4-week MVP.

< 200ms tenant routing p95

99.99% tenant data isolation

60% lower per-tenant infra cost

New tenant in < 5 minutes

FIVE WAYS MULTI-TENANT AI FAILS

The Problems That Bite At Tenant 20

The first ten tenants forgive a lot. The next ninety do not. These are the failure modes we have rebuilt around at Cognilium since 2019.

Tenant Data Leaks Are One Forgotten WHERE Clause Away

Single-database multi-tenant code without row-level security leaks tenant A’s data into tenant B’s response the first time an engineer forgets to scope a query.

Per-Tenant Credentials Stored In .env Files

Stripe keys, Salesforce tokens, and tenant-owned API credentials end up in environment variables, secrets managers without scope, or — worst — checked into git.

Every New Tenant Requires Engineering

Tools, models, and prompts are hardcoded per customer with if-statements. Onboarding tenant 20 means a deploy. Onboarding tenant 100 means a war room.

AI Costs Are Unbounded And Untracked Per-Tenant

One tenant runs a runaway agent loop and burns $40K in model spend before anyone notices. Usage is not attributed to org_id, so you cannot bill or cap.

Logs And Metrics Are Not Tenant-Filterable

When tenant A reports a bug, your on-call engineer greps for their org_id across plaintext logs that were never indexed by tenant. Datadog dashboards aggregate every tenant into one number.

Tenant Data Leaks Are One Forgotten WHERE Clause Away

Failure Mode

Single-database multi-tenant code without row-level security leaks tenant A’s data into tenant B’s response the first time an engineer forgets to scope a query.

Business Impact

One leak triggers SOC 2 nonconformity, breach disclosure, and contract penalties from every enterprise tenant you have.

Real Cost

Average B2B SaaS breach disclosure: $4.1M direct, plus 18-24 months of stalled enterprise sales.

Bottom line: Every one of these is a configuration problem at week one and a rewrite at month twelve.

THE COGNILIUM BUILD

Six Capabilities, Engineered To Survive Tenant 100

Each capability solves one of the failure modes above. None of them is optional once you have enterprise tenants.

Zero-Trust Tenant Isolation

Postgres row-level security policies keyed on org_id. Vector store metadata filters enforced in a wrapper the application cannot bypass. LangGraph state namespaced per tenant. A forgotten scope returns zero rows — not someone else’s data.

Per-Org Tool Registry

Each tenant has a row-level allow-list of tools, models, prompts, and data sources. The LangGraph supervisor literally cannot dispatch to a tool the tenant does not own. Onboarding a new tenant or changing a plan is a row insert, not a deploy.

Per-Tenant Credential Vault

Tenant credentials for Stripe, Salesforce, vendor APIs, and the tenant’s own database live in AWS KMS or HashiCorp Vault under an org_id-scoped path. Credentials are decrypted per-call and dropped from heap. Stolen memory yields one credential, not all.

Sub-200ms Tenant Routing

Identity middleware, registry lookup, rate-limit check, and cost-cap check stay under 200ms at p95. JWKS cached, Redis Lua for atomic limiter + ledger, prefetch deferred to the tool step. Multi-tenant machinery is invisible to perceived latency.

Usage-Based Billing Wired Through

Every model and tool call emits a usage event tagged with org_id, model id, and token count. A Redis cost ledger tracks spend in real time against hard caps. Stripe Billing metered subscriptions consume the same ledger your in-app dashboard renders.

Per-Tenant Observability

OpenTelemetry spans tagged with org_id, user_id, and agent_run_id flow into Datadog and Sentry with per-tenant dashboards and alert routes. Signed, append-only audit log exportable to each tenant for their own SOC 2 evidence pack.

REQUEST PIPELINE

How A Single Request Becomes A Safe Tenant Response

Request → tenant identity → per-org tool registry → LangGraph supervisor → rate limit + cost cap → tool execution with HSM-backed credentials → response with audit trail.

< 10ms

Tenant Identity Middleware

JWT validated against cached JWKS from Auth0, Clerk, or WorkOS. Immutable org_id, plan, and region claims signed at issue time so they cannot be spoofed downstream.

< 5ms

Per-Org Tool Registry Lookup

Redis-cached allow-list resolves which tools, models, and data sources this tenant is entitled to. 60-second TTL with plan-change invalidation. No per-request database hop.

Dispatch < 20ms

Agent Dispatch (LangGraph Supervisor)

Supervisor graph only sees the tenant’s resolved tools. State namespaced by org_id. Checkpoints written to a Postgres table with row-level security policies.

< 3ms

Per-Tenant Rate Limit + Cost Cap

Redis Lua script atomically checks sliding-window quota and remaining spend against the tenant’s plan cap. Soft caps emit 429 + retry-after. Hard caps halt execution.
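For readers who want to see the quota half of that check, here is a single-process sketch of a sliding-window limiter. It is not the Redis Lua script: `SlidingWindowLimiter` is a hypothetical class, timestamps are passed in explicitly, and the atomicity that Lua provides across many application servers is not modeled here.

```python
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log limiter, tracked separately per org_id."""

    def __init__(self, limit: int, window_seconds: float):
        self._limit = limit
        self._window = window_seconds
        self._hits = {}  # org_id -> deque of request timestamps

    def allow(self, org_id: str, now: float) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for a request at time now."""
        hits = self._hits.setdefault(org_id, deque())
        # Evict timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            # The oldest hit leaves the window at hits[0] + window.
            return False, hits[0] + self._window - now
        hits.append(now)
        return True, 0.0


limiter = SlidingWindowLimiter(limit=2, window_seconds=10.0)
```

The second element of the tuple is what a 429 response would carry as its retry-after value, and one tenant exhausting its window has no effect on another tenant's deque.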

Per tool call

Tool Execution (Per-Tenant Credentials From HSM)

Outbound credential fetched at tool-call time from AWS KMS or HashiCorp Vault under the tenant’s scoped path. Decrypted, used once, dropped from heap. No long-lived plaintext.

Async write

Response With Full Audit Trail

Signed, append-only audit log records prompt, tool calls, and response, keyed by org_id. OpenTelemetry spans land in Datadog and Sentry tagged per-tenant. Exportable as SOC 2 evidence.
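One common way to make a log both signed and tamper-evident is an HMAC hash chain, where each entry's signature covers the previous entry's signature. The sketch below assumes that technique; the page does not say which signing scheme is used, and `AuditLog`, its field names, and the in-memory list are hypothetical.

```python
import hashlib
import hmac
import json


class AuditLog:
    """Append-only log where each entry signs its body and the prior sig."""

    def __init__(self, key: bytes):
        self._key = key
        self._entries = []

    def _sign(self, body: dict) -> str:
        payload = json.dumps(body, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, org_id: str, event: dict) -> dict:
        """Add an entry chained to the previous entry's signature."""
        prev = self._entries[-1]["sig"] if self._entries else ""
        body = {"org_id": org_id, "event": event, "prev": prev}
        entry = dict(body, sig=self._sign(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every signature and check the chain links in order."""
        prev = ""
        for e in self._entries:
            body = {"org_id": e["org_id"], "event": e["event"], "prev": e["prev"]}
            if e["prev"] != prev:
                return False
            if not hmac.compare_digest(e["sig"], self._sign(body)):
                return False
            prev = e["sig"]
        return True


log = AuditLog(key=b"demo-signing-key")  # a real key would come from KMS/Vault
log.append("org_a", {"prompt": "summarize Q3", "tools": ["web_search"]})
log.append("org_a", {"response": "done"})
```

Editing or deleting any earlier entry changes the signature that later entries were chained to, so `verify` fails from that point on.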

WHO BUILDS THIS WITH US

Five Companies That Need This Stack

The shape of the problem repeats. The tech choices we make on day one are what decide whether you scale to a hundred tenants or rewrite at twenty.

Vertical AI Startups

A vertical AI startup serving 40+ professional-services firms needs each firm to bring its own document corpus, its own SSO, and its own audit trail — without forking the codebase.

Stack

Auth0 + pgvector with RLS + per-org tool registry

Professional-Services Platforms

A legal-tech, HR-tech, or fin-tech platform shipping AI copilots to enterprise tenants who demand tenant-side SAML, dedicated KMS keys, and exportable audit logs for their own SOC 2 reports.

Stack

WorkOS SSO + AWS KMS bring-your-own-key + signed audit log

B2B SaaS Founders Adding AI

A B2B SaaS scaling AI features across the existing tenant base needs per-plan tool entitlements and metered billing so the AI line item shows up on the existing Stripe invoice — not a separate bill.

Stack

Clerk + Stripe metered billing + Redis cost ledger

White-Label AI Offerings

A multi-family-office SaaS resells an AI assistant to its own customer banks. Each bank needs its branding, its own credential vault, and its own per-end-user quotas under one umbrella contract.

Stack

Custom OIDC + Vault per-tenant paths + nested org_id model

Enterprise Tenants On Shared SaaS

A Fortune-class manufacturer signs onto your platform and demands their data live in their AWS account behind their VPC, while still consuming the shared control plane for identity and billing.

Stack

Single-tenant dedicated stack + peered shared control plane

ENGINEERING OUTCOMES

The Four Numbers That Decide If You Have A Multi-Tenant Platform

We hold the build to these metrics. If we miss one, it is not done.

< 200ms

Tenant-routing overhead at p95

Before us: Unmeasured

99.99%

Tenant data isolation (RLS + wrapper-enforced metadata filters)

Before us: One forgotten WHERE clause

60%

Lower per-tenant infra cost vs single-tenant clones

Before us: 1 stack per tenant

< 5 min

Deploy a new tenant (row insert, not a release)

Before us: 2-4 engineering weeks

Want These Numbers On Your Stack?

Bring a tenant model, a target plan structure, and a date — we will tell you the shortest path.

IMPLEMENTATION TIMELINE

From Empty Repo To Production Multi-Tenant In 8 Weeks

MVP at week 4, production guarantees at week 8, scale-out from week 9. Same codebase from one tenant to one hundred.

Week 1

Architecture & Identity (Week 1)

Tenant model (org_id, plan, region) finalized. Auth0 / Clerk / WorkOS wired with signed claims. Row-level security policies drafted for every tenant-scoped table. Decision: shared plane vs dedicated stacks for top tenants.

Weeks 2-4

4-Week MVP With One Tenant

Per-org tool registry (one tenant, three tools). LangGraph supervisor with RLS-backed state. Redis sliding-window rate limiter. Single audit-log table. End-to-end test: tenant A request never touches tenant B data.

Weeks 5-8

8-Week Production With Multi-Tenant Guarantees

AWS KMS / HashiCorp Vault for per-tenant credentials. Stripe metered billing wired to Redis cost ledger. Datadog + Sentry with per-tenant tags. Signed append-only audit log with tenant-side export. Sub-200ms p95 verified under load.

Week 9+

Scale-Out (Post Week 8)

Onboarding a new tenant is a configuration change. Enterprise patterns wired in as needed: tenant-side SSO federation, bring-your-own KMS key, dedicated single-tenant stacks for top accounts on the same codebase.

ENGINEERING FAQ

Eight Questions Every Founder Should Ask Before Build Starts

These come up on every multi-tenant project. The answers below are the answers we ship to.

One running instance serves many customer organizations while each tenant believes they are the only user. Three guarantees: data isolation (tenant A cannot read tenant B), config isolation (each tenant has its own tools, models, prompts, quotas), observability isolation (logs, costs, audit trails filterable per-tenant). We enforce these with Postgres row-level security, a per-org tool registry, AWS KMS / HashiCorp Vault per-tenant credential paths, and OpenTelemetry spans tagged with org_id.

READY WHEN YOUR TENANT MODEL IS

Build It Once. Ship It To Every Tenant.

50+ projects delivered since 2019, 96% client satisfaction, four production AI products in the wild (Paralegent AI, ProspectVox, VectorHire, VORTA). The same engineering bench builds your multi-tenant stack.

Clients in US, UAE, and Pakistan. Founded 2019.

4-week MVP · 8-week production guarantees · Per-tenant SOC 2 evidence