Muhammad Mudassir
Founder & CEO, Cognilium AI
A multi-family-office SaaS had to consolidate QuickBooks financials, Google Workspace (Drive, Calendar, Gmail), and investment documents into a single AI-driven platform. Family offices juggle financial statements, legal documents (PPMs, SPAs, SAFEs), cap tables, emails, and calendars across disconnected systems. Manual data extraction from investment documents is error-prone and slow. There was no unified view of portfolio, entities, and obligations — and the multi-tenant requirements made naive "register every tool and let the LLM choose" approaches both expensive (tokens per turn) and unsafe (hallucinated calls to non-integrated tools).
A supervisor-router agent on Google ADK 1.15 dispatches to 7 specialist agents based on intent. The supervisor is instantiated per-request from a factory that reads the org-level RBAC and integration status before binding tools — so each org's supervisor only sees the tools that org has paid for and connected. A document intelligence pipeline (parser → classifier → evidence → extraction → validation → scorer → graph writer) auto-extracts structured data from investment documents using Gemini 2.5 Pro, writing the results into a Neo4j knowledge graph linking companies, investments, and documents. Zero-trust tenant isolation is enforced at the Firestore path layer (organizations/{orgId}/...) and via TenantContextMiddleware on every request.
Every request passes through TenantContextMiddleware, which reads the immutable Firebase custom claim, fetches organizations/{orgId}/permissions, and attaches the merged context to request.context. The supervisor factory takes that context and assembles a fresh Agent: which specialists to register, which tools to bind to each, which system-prompt fragments to splice in. The assembled agent is cached by {orgId, integrations_hash}: the cold path takes ~150ms (Firestore reads + tool wiring), the warm path ~5ms (in-memory map lookup). Pub/Sub invalidates the cache less than 1s after any integration change.
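A minimal sketch of the caching idea described above, using only the standard library. The names TenantContext, SupervisorCache, and build_supervisor are hypothetical stand-ins, not the production code or any Google ADK API; the real factory would assemble an ADK agent where this sketch returns a plain dict.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical merged context attached by the middleware."""
    org_id: str
    integrations: frozenset  # e.g. frozenset({"quickbooks", "gmail"})


def integrations_hash(integrations: frozenset) -> str:
    """Order-independent digest of the connected integrations."""
    joined = ",".join(sorted(integrations))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


class SupervisorCache:
    """In-memory cache keyed by (org_id, integrations_hash)."""

    def __init__(self, build):
        self._build = build   # callable: TenantContext -> supervisor
        self._entries = {}
        self.builds = 0       # counts cold-path builds, for illustration only

    def get(self, ctx: TenantContext):
        key = (ctx.org_id, integrations_hash(ctx.integrations))
        if key not in self._entries:
            # Cold path: assemble the supervisor once, then reuse it.
            self._entries[key] = self._build(ctx)
            self.builds += 1
        return self._entries[key]

    def invalidate(self, org_id: str) -> None:
        """Drop every entry for one org, e.g. when an integration event arrives."""
        for key in [k for k in self._entries if k[0] == org_id]:
            del self._entries[key]


def build_supervisor(ctx: TenantContext) -> dict:
    # Placeholder for the real agent assembly (assumption).
    return {"org": ctx.org_id, "tools": sorted(ctx.integrations)}
```

Because the integrations hash is part of the key, a changed integration set misses the cache even before the invalidation message arrives; the explicit invalidate call only clears stale entries.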
Layer one (tool registration): the factory only binds tools the org is allowed to use. The LLM literally does not know the others exist, so the system prompt is shorter and a hallucinated call cannot reach a non-connected Salesforce. Layer two (per-tool permission check inside the handler): every tool starts with assert_permission(request.context, "salesforce:read"). This is defense in depth: the factory layer can be bypassed accidentally, so the handler check runs last, immediately before the tool does any work. Together they cover both gaps.
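The two layers can be sketched as follows. This is an illustration under assumptions: TOOL_CATALOG, bind_tools, and the dict-shaped context are hypothetical, and only the assert_permission call and the "salesforce:read" string come from the description above.

```python
class PermissionDenied(Exception):
    """Raised when the caller's context lacks a required permission."""


def assert_permission(context: dict, permission: str) -> None:
    # Layer two: checked inside every tool handler, right before it runs.
    if permission not in context["permissions"]:
        raise PermissionDenied(permission)


def salesforce_search(context: dict, query: str) -> str:
    assert_permission(context, "salesforce:read")
    return f"salesforce results for {query!r}"


def quickbooks_balance(context: dict, entity: str) -> str:
    assert_permission(context, "quickbooks:read")
    return f"balance for {entity!r}"


# Hypothetical catalog: tool name -> (required integration, handler).
TOOL_CATALOG = {
    "salesforce_search": ("salesforce", salesforce_search),
    "quickbooks_balance": ("quickbooks", quickbooks_balance),
}


def bind_tools(context: dict) -> dict:
    """Layer one: expose only tools whose integration the org has connected."""
    return {
        name: handler
        for name, (integration, handler) in TOOL_CATALOG.items()
        if integration in context["integrations"]
    }
```

For an org with only QuickBooks connected, bind_tools returns a single tool, and a direct call to salesforce_search still raises PermissionDenied even if it were registered by mistake.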
Parser (PDF / DOCX / XLSX text extraction) → Classifier (document type via Gemini 2.0 Flash) → Evidence Extractor (supporting text per field) → Extractor (structured field extraction via Gemini 2.5 Pro) → Validator (cross-field consistency: dates, party names, amounts) → Scorer (confidence per field) → Graph Writer (Neo4j upsert with cross-document entity linking). The pipeline handles PPMs, SPAs, SAFEs, and cap tables. Confidence below threshold triggers a human-review queue rather than silent low-quality writes.
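The hand-off from the scorer to the graph writer might look like the sketch below. The threshold value, the ExtractedField shape, and the function name are assumptions made for illustration; the source does not state the actual threshold.

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.8  # hypothetical value; the real threshold is not given


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # per-field score produced by the scorer stage


@dataclass
class RoutingResult:
    to_graph: list = field(default_factory=list)
    to_review: list = field(default_factory=list)


def route_by_confidence(fields, threshold: float = REVIEW_THRESHOLD) -> RoutingResult:
    """Send confident fields to the graph writer and the rest to human review."""
    result = RoutingResult()
    for item in fields:
        if item.confidence >= threshold:
            result.to_graph.append(item)
        else:
            result.to_review.append(item)
    return result
```

Routing per field rather than per document means one uncertain value does not hold back the rest of a well-extracted document.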
Every collection lives under organizations/{orgId}/. There is no top-level documents collection — only organizations/{orgId}/documents. A query that omits the orgId path segment fails at dispatch. The system has no way to "accidentally" query across tenants because the path itself enforces scope. 60+ permissions follow the format "{resource}:{action}:{scope}" — examples: "documents:read:org", "documents:*:org", "billing:read:platform". Five roles bundle them; wildcards expand at check-time, not at storage.
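One way to make the orgId segment impossible to omit is to route every path through a single helper. The sketch below is an assumption about how that could be done, not the project's actual data-access code; tenant_path and its validation rule are hypothetical.

```python
import re

# Hypothetical rule: path segments are limited to safe identifier characters.
_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def tenant_path(org_id: str, *segments: str) -> str:
    """Build a Firestore-style path that always starts with organizations/{orgId}."""
    if not segments:
        raise ValueError("at least one collection segment is required")
    for part in (org_id, *segments):
        # Reject empty values and anything that could escape the tenant prefix.
        if not isinstance(part, str) or not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid path segment: {part!r}")
    return "/".join(("organizations", org_id, *segments))
```

A call such as tenant_path("org_a", "documents") yields organizations/org_a/documents, while an empty or malformed org id raises before any query is built.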
Gmail push notifications land in a Cloud Pub/Sub topic the backend subscribes to; new threads index into the per-org Vertex AI Search engine within ~1 second. Google Drive uses watch-channel webhooks with polling fallback (Drive's webhook reliability is good-but-not-perfect; the poll catches missed deliveries). QuickBooks uses scheduled syncs on Cloud Scheduler — 8 jobs cover financials, transactions, and entity reconciliation.
“The supervisor binds only the tools each org has actually paid for and integrated — the LLM doesn't even know the other tools exist. That single design choice eliminated a class of hallucinated API calls we were dreading.”
TL;DR
How Cognilium built a multi-tenant AI SaaS with 7 specialist agents on Google ADK, per-org tool registration, and zero-trust Firestore isolation.
A multi-family-office SaaS had to consolidate QuickBooks financials, Google Workspace, and a 10+ year archive of investment paperwork into a single AI-driven platform. The architectural challenge was not building the agents — it was building a multi-tenant agent platform where each org saw only the tools they had paid for and integrated, where data isolation was structurally enforced rather than remembered, and where unstructured documents (PPMs, SPAs, SAFEs, cap tables) became structured data without an army of analysts.
The naive approach (register every tool, let the LLM ignore the irrelevant ones) fails on two axes. Cost: every tool description in the system prompt costs tokens on every turn; for 7 agents with 12 tools each, that is roughly 84 tool descriptions on every request. Security: the LLM hallucinates a Salesforce call for an org without Salesforce, and the user sees a 401 and blames the AI. To avoid both, the supervisor has to be instantiated per-request with org-aware tool binding.
Every request goes through TenantContextMiddleware: reads the immutable Firebase claim, looks up organizations/{orgId}/permissions, attaches the merged context to request.context. The supervisor factory takes that context and assembles a fresh ADK Agent — which specialists to register, which tools to bind, which prompt fragments to splice in. The factory caches by {orgId, integrations_hash}; warm path is ~5ms, cold path ~150ms. Pub/Sub-driven invalidation means a new integration is visible to the next request within a second.
Investment documents are not "summarize this PDF" tasks; they have schema. A PPM has named parties, monetary amounts, jurisdictions, dates, voting rights. A SAFE has valuation cap, discount, MFN. A cap table has share classes, share counts, and ownership percentages that must sum to 100. The pipeline runs seven stages in order (parser, classifier, evidence extractor, extractor, validator, scorer, graph writer) with confidence scoring at each; low-confidence fields route to human review instead of corrupting the graph.
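The cap-table rule (ownership percentages must sum to 100) is the kind of cross-field check the validator stage can run. A minimal sketch, assuming percentages arrive as decimal strings and that a small rounding tolerance is acceptable; the function name and the tolerance value are illustrative, not taken from the system.

```python
from decimal import Decimal


def ownership_sums_to_100(percentages, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True when the ownership percentages total 100 within the tolerance."""
    # Decimal avoids binary floating-point drift when adding values like 33.33.
    total = sum((Decimal(p) for p in percentages), Decimal("0"))
    return abs(total - Decimal("100")) <= tolerance
```

A cap table that fails this check would be flagged by the validator rather than written to the graph as-is.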
The 60+ permissions ended up needing a wildcard expansion layer ("documents:*:org" expands to read+write+delete+list at check time) because flat enumeration produced unwieldy role definitions. Build the wildcard expansion in from day one — retrofitting it after roles are already assigned to users is a migration headache.
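Check-time wildcard expansion can be sketched in a few lines. The action list comes from the example above (read, write, delete, list); the function names and the assumption that only the action segment may be a wildcard are illustrative.

```python
# Actions a "*" stands for, per the "documents:*:org" example above.
ACTIONS = ("read", "write", "delete", "list")


def expand(granted: str) -> set:
    """Expand one "{resource}:{action}:{scope}" grant into concrete permissions."""
    resource, action, scope = granted.split(":")
    actions = ACTIONS if action == "*" else (action,)
    return {f"{resource}:{a}:{scope}" for a in actions}


def has_permission(granted: set, required: str) -> bool:
    """Check a required permission against stored grants, expanding at check time."""
    return any(required in expand(grant) for grant in granted)
```

Stored role definitions keep the compact wildcard form; only the check expands it, so adding a new action later means updating ACTIONS rather than migrating every role.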
Any multi-tenant AI SaaS where customers connect different integrations, data isolation is non-negotiable, and unstructured documents need to become structured data. Legal, regulated finance, enterprise knowledge management — the pattern transfers.