The multi-tenant AI platform B2B SaaS founders build when they realize one forgotten WHERE org_id = is the difference between a feature and a breach. LangGraph + Postgres row-level security + KMS-backed per-tenant credentials. 4-week MVP.
The first ten tenants forgive a lot. The next ninety do not. These are the failure modes we have rebuilt around at Cognilium since 2019.
Single-database multi-tenant code without row-level security leaks tenant A’s data into tenant B’s response the first time an engineer forgets to scope a query.
Stripe keys, Salesforce tokens, and tenant-owned API credentials end up in environment variables, secrets managers without scope, or — worst — checked into git.
Tools, models, and prompts are hardcoded per customer with if-statements. Onboarding tenant 20 means a deploy. Onboarding tenant 100 means a war room.
One tenant runs a runaway agent loop and burns $40K in model spend before anyone notices. Usage is not attributed to org_id, so you cannot bill or cap.
When tenant A reports a bug, your on-call engineer greps for their org_id across plaintext logs that were never indexed by tenant. Datadog dashboards aggregate every tenant into one number.
One leak triggers SOC 2 nonconformity, breach disclosure, and contract penalties from every enterprise tenant you have.
Average B2B SaaS breach disclosure: $4.1M direct, plus 18-24 months of stalled enterprise sales.
Bottom line: Every one of these is a configuration problem at week one and a rewrite at month twelve.
Each capability solves one of the failure modes above. None of them is optional once you have enterprise tenants.
Postgres row-level security policies keyed on org_id. Vector store metadata filters enforced in a wrapper the application cannot bypass. LangGraph state namespaced per tenant. A forgotten scope returns zero rows — not someone else’s data.
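A minimal sketch of what such a policy could look like, generated from Python so it can run in a migration. The setting name `app.current_org_id`, the policy name `tenant_isolation`, and a text-typed `org_id` column are assumptions for illustration, not a fixed schema.

```python
import re

# Hypothetical session setting the application sets once per transaction,
# for example: SET LOCAL app.current_org_id = '<org id>'.
ORG_SETTING = "app.current_org_id"

# Only plain lowercase identifiers are accepted, because the table name is
# interpolated into DDL.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def rls_statements(table: str) -> list[str]:
    """Return the DDL that scopes one table to the current tenant."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # With missing_ok = true an unset setting yields NULL, the comparison
        # is never true, and the query returns zero rows.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (org_id = current_setting('{ORG_SETTING}', true))",
    ]
```

Superusers and roles with BYPASSRLS still skip these policies, so the application should connect as an ordinary role.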
Each tenant has a row-level allow-list of tools, models, prompts, and data sources. The LangGraph supervisor literally cannot dispatch to a tool the tenant does not own. Onboarding a new tenant or changing a plan is a row insert, not a deploy.
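As an illustration of the allow-list idea, here is a small in-memory sketch. `ALLOW_LIST`, `ALL_TOOLS`, `tools_for`, and `dispatch` are hypothetical names, and the set stands in for the database table; a real supervisor would receive only the output of `tools_for`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]


# Stand-in for the allow-list table: one row per (org_id, tool name).
ALLOW_LIST = {
    ("org_a", "search_docs"),
    ("org_a", "create_invoice"),
    ("org_b", "search_docs"),
}

# Every tool the platform knows about, regardless of tenant.
ALL_TOOLS = {
    "search_docs": Tool("search_docs", lambda q: f"results for {q}"),
    "create_invoice": Tool("create_invoice", lambda q: f"invoice for {q}"),
}


def tools_for(org_id: str) -> dict[str, Tool]:
    """Resolve the tools this tenant is entitled to."""
    return {
        name: tool
        for name, tool in ALL_TOOLS.items()
        if (org_id, name) in ALLOW_LIST
    }


def dispatch(org_id: str, tool_name: str, arg: str) -> str:
    """Run a tool only if it appears in the tenant's resolved set."""
    tools = tools_for(org_id)
    if tool_name not in tools:
        raise PermissionError(f"{org_id} is not entitled to {tool_name}")
    return tools[tool_name].run(arg)
```

Granting a tool in this sketch is `ALLOW_LIST.add(("org_b", "create_invoice"))`, the in-memory equivalent of a row insert.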
Tenant credentials for Stripe, Salesforce, vendor APIs, and the tenant’s own database are encrypted under AWS KMS keys or stored in HashiCorp Vault under an org_id-scoped path. Credentials are decrypted per call and dropped from heap after use. Stolen memory yields one credential, not all.
Identity middleware, registry lookup, rate-limit check, and cost-cap check stay under 200ms at p95. JWKS cached, Redis Lua for atomic limiter + ledger, prefetch deferred to the tool step. Multi-tenant machinery is invisible to perceived latency.
Every model and tool call emits a usage event tagged with org_id, model id, and token count. A Redis cost ledger tracks spend in real time against hard caps. Stripe Billing metered subscriptions consume the same ledger your in-app dashboard renders.
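The usage-event and ledger idea can be sketched in memory as follows. `UsageEvent` and `CostLedger` are hypothetical names, a dictionary stands in for Redis, and the `cost_usd` field is an added assumption (the copy above names only org_id, model id, and token count).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    org_id: str
    model_id: str
    tokens: int
    cost_usd: Decimal  # assumed to be priced upstream from model_id and tokens


class CostLedger:
    """In-memory stand-in for a per-tenant spend ledger with hard caps."""

    def __init__(self, hard_caps: dict[str, Decimal]) -> None:
        self._caps = hard_caps
        self._spend: dict[str, Decimal] = {}

    def record(self, event: UsageEvent) -> bool:
        """Add the event's cost; return False once the tenant is over its cap."""
        total = self._spend.get(event.org_id, Decimal("0")) + event.cost_usd
        self._spend[event.org_id] = total
        # A tenant with no configured cap is treated as capped at zero.
        return total <= self._caps.get(event.org_id, Decimal("0"))

    def spend(self, org_id: str) -> Decimal:
        return self._spend.get(org_id, Decimal("0"))
```

Decimal avoids floating-point drift in money totals, which matters when the same numbers feed an invoice.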
OpenTelemetry spans tagged with org_id, user_id, and agent_run_id flow into Datadog and Sentry with per-tenant dashboards and alert routes. Signed, append-only audit log exportable to each tenant for their own SOC 2 evidence pack.
Request → tenant identity → per-org tool registry → LangGraph supervisor → rate limit + cost cap → tool execution with KMS- or Vault-backed credentials → response with audit trail.
JWT validated against cached JWKS from Auth0, Clerk, or WorkOS. Immutable org_id, plan, and region claims signed at issue time so they cannot be spoofed downstream.
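A sketch of the step that follows signature verification: turning verified claims into an immutable tenant context. It assumes a JWT library has already checked the signature against the JWKS; `TenantContext` and `tenant_from_claims` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping

REQUIRED_CLAIMS = ("org_id", "plan", "region")


@dataclass(frozen=True)
class TenantContext:
    """Tenant identity carried through the request; frozen so it cannot be reassigned."""

    org_id: str
    plan: str
    region: str


def tenant_from_claims(claims: Mapping[str, Any]) -> TenantContext:
    """Build the context from claims whose signature was already verified."""
    missing = [
        name
        for name in REQUIRED_CLAIMS
        if not isinstance(claims.get(name), str) or not claims[name]
    ]
    if missing:
        raise ValueError(f"token is missing tenant claims: {missing}")
    return TenantContext(claims["org_id"], claims["plan"], claims["region"])
```

Downstream code takes a `TenantContext` argument instead of reading org_id from request bodies or headers.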
Redis-cached allow-list resolves which tools, models, and data sources this tenant is entitled to. 60-second TTL with plan-change invalidation. No per-request database hop.
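The caching behavior described here can be sketched with an in-process cache. `AllowListCache` is a hypothetical name, the dictionary stands in for Redis, and `load` is whatever function reads the allow-list from the database.

```python
import time
from typing import Callable, Iterable


class AllowListCache:
    """Per-tenant allow-list cache with a TTL and explicit invalidation."""

    def __init__(
        self,
        load: Callable[[str], Iterable[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        # org_id -> (expiry time, allowed tool names)
        self._entries: dict[str, tuple[float, frozenset[str]]] = {}

    def get(self, org_id: str) -> frozenset[str]:
        now = self._clock()
        entry = self._entries.get(org_id)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit, no database hop
        tools = frozenset(self._load(org_id))
        self._entries[org_id] = (now + self._ttl, tools)
        return tools

    def invalidate(self, org_id: str) -> None:
        """Call on plan change so the next request reloads immediately."""
        self._entries.pop(org_id, None)
```

The injectable `clock` exists only to make expiry testable without sleeping.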
Supervisor graph only sees the tenant’s resolved tools. State namespaced by org_id. Checkpoints written to a Postgres table with row-level security policies.
Redis Lua script atomically checks sliding-window quota and remaining spend against the tenant’s plan cap. Soft caps return HTTP 429 with a Retry-After header. Hard caps halt execution.
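The combined check can be sketched in plain Python to show the logic; in production the same steps would run inside one Redis Lua script so they stay atomic across workers. `TenantLimiter` and `Decision` are hypothetical names, and mapping the request quota to the soft cap and the spend cap to the hard cap is one reading of the description, not the only one.

```python
import time
from collections import defaultdict, deque
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"  # soft cap: respond 429 with Retry-After
    HALT = "halt"          # hard cap: stop the run


class TenantLimiter:
    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        spend_cap_cents: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._cap = spend_cap_cents
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)
        self._spend: dict[str, int] = defaultdict(int)

    def check(self, org_id: str, estimated_cost_cents: int) -> Decision:
        now = self._clock()
        hits = self._hits[org_id]
        # Drop request timestamps that have slid out of the window.
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if self._spend[org_id] + estimated_cost_cents > self._cap:
            return Decision.HALT
        if len(hits) >= self._max:
            return Decision.THROTTLE
        hits.append(now)
        self._spend[org_id] += estimated_cost_cents
        return Decision.ALLOW
```

Spend is tracked in integer cents so the cap comparison is exact.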
Outbound credential fetched at tool-call time from AWS KMS or HashiCorp Vault under the tenant’s scoped path. Decrypted, used once, dropped from heap. No long-lived plaintext.
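A sketch of the fetch, use once, and drop pattern. The path layout `tenants/<org_id>/<provider>` and the names `credential_path` and `tenant_credential` are assumptions; `fetch` stands in for a KMS decrypt or Vault read. Zeroing is best effort in Python, since the immutable bytes returned by `fetch` can linger until garbage collection.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


def credential_path(org_id: str, provider: str) -> str:
    """Build the tenant-scoped secret path, rejecting traversal attempts."""
    for part in (org_id, provider):
        if not part or "/" in part or ".." in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"tenants/{org_id}/{provider}"


@contextmanager
def tenant_credential(
    fetch: Callable[[str], bytes], org_id: str, provider: str
) -> Iterator[bytearray]:
    """Yield the decrypted secret for one call, then overwrite our copy."""
    secret = bytearray(fetch(credential_path(org_id, provider)))
    try:
        yield secret
    finally:
        # Best effort: zero the mutable buffer we control.
        for i in range(len(secret)):
            secret[i] = 0
```

A tool call would wrap its outbound request in `with tenant_credential(fetch, ctx_org_id, "stripe") as key:` (with `ctx_org_id` taken from the verified token) and never store `key` beyond the block.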
Signed, append-only audit log records prompt, tool calls, and response, keyed by org_id. OpenTelemetry spans land in Datadog and Sentry tagged per-tenant. Exportable as SOC 2 evidence.
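One way to make a log tamper-evident is an HMAC chain, sketched below: each record's signature covers the previous signature, so edits and deletions break verification. The function names and record layout are illustrative. HMAC is symmetric, so a tenant-verifiable export would likely need asymmetric signatures instead.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # signature placeholder before the first record


def _sign(key: bytes, prev_sig: str, entry: dict) -> str:
    # Canonical JSON so the same entry always produces the same bytes.
    payload = prev_sig + json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append(log: list[dict], key: bytes, entry: dict) -> None:
    """Append an entry whose signature chains to the previous record."""
    prev_sig = log[-1]["sig"] if log else GENESIS
    log.append({"entry": entry, "sig": _sign(key, prev_sig, entry)})


def verify(log: list[dict], key: bytes) -> bool:
    """Recompute the chain; any edited or removed record fails the check."""
    prev_sig = GENESIS
    for record in log:
        expected = _sign(key, prev_sig, record["entry"])
        if not hmac.compare_digest(expected, record["sig"]):
            return False
        prev_sig = record["sig"]
    return True
```

Truncating the newest records is not detected by the chain alone; anchoring the latest signature somewhere external would cover that case.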
The shape of the problem repeats. The tech choices we make on day one are what decide whether you scale to a hundred tenants or rewrite at twenty.
A vertical AI startup serving 40+ professional-services firms needs each firm to bring its own document corpus, its own SSO, and its own audit trail — without forking the codebase.
Auth0 + pgvector with RLS + per-org tool registry
A legal-tech, HR-tech, or fin-tech platform shipping AI copilots to enterprise tenants who demand tenant-side SAML, dedicated KMS keys, and exportable audit logs for their own SOC 2 reports.
WorkOS SSO + AWS KMS bring-your-own-key + signed audit log
A B2B SaaS scaling AI features across the existing tenant base needs per-plan tool entitlements and metered billing so the AI line item shows up on the existing Stripe invoice — not a separate bill.
Clerk + Stripe metered billing + Redis cost ledger
A multi-family-office SaaS resells an AI assistant to its own customer banks. Each bank needs its branding, its own credential vault, and its own per-end-user quotas under one umbrella contract.
Custom OIDC + Vault per-tenant paths + nested org_id model
A Fortune-class manufacturer signs onto your platform and demands their data live in their AWS account behind their VPC, while still consuming the shared control plane for identity and billing.
Single-tenant dedicated stack + peered shared control plane
We hold the build to these metrics. If we miss one, it is not done.
MVP at week 4, production guarantees at week 8, scale-out from week 9. Same codebase from one tenant to one hundred.
Tenant model (org_id, plan, region) finalized. Auth0 / Clerk / WorkOS wired with signed claims. Row-level security policies drafted for every tenant-scoped table. Decision: shared plane vs dedicated stacks for top tenants.
Per-org tool registry (one tenant, three tools). LangGraph supervisor with RLS-backed state. Redis sliding-window rate limiter. Single audit-log table. End-to-end test: tenant A request never touches tenant B data.
AWS KMS / HashiCorp Vault for per-tenant credentials. Stripe metered billing wired to Redis cost ledger. Datadog + Sentry with per-tenant tags. Signed append-only audit log with tenant-side export. Sub-200ms p95 verified under load.
Onboarding a new tenant is a configuration change. Enterprise patterns wired in as needed: tenant-side SSO federation, bring-your-own KMS key, dedicated single-tenant stacks for top accounts on the same codebase.
These come up on every multi-tenant project. The answers below are the answers we ship to.
50+ projects delivered since 2019, 96% client satisfaction, four production AI products in the wild (Paralegent AI, ProspectVox, VectorHire, VORTA). The same engineering bench builds your multi-tenant stack.
Clients in US, UAE, and Pakistan. Founded 2019.