Astra — Enterprise SaaS Platform
Enterprise multi-tenant SaaS platform with schema-per-tenant isolation, RBAC, audit logging, billing, and webhook infrastructure — designed for thousands of organizations on a single codebase.
Domain Knowledge
What problem this project solves
Multi-tenant SaaS is one of the hardest architectural challenges in backend engineering. The core problem is data isolation: when Organization A and Organization B use the same application, their data must never mix. Astra implements schema-per-tenant isolation, where each organization gets its own PostgreSQL schema. This gives stronger guarantees than a shared-table design keyed on tenant_id, because queries resolve against the tenant's own schema and a single missing WHERE clause cannot leak data across tenants.
Architecture
How the system is structured
The platform uses a layered architecture: an API gateway handles authentication and tenant resolution, a service layer orchestrates business logic, and a data layer enforces tenant isolation via dynamic schema switching. Prisma is wrapped in a custom middleware that issues SET search_path before every query, so each request is routed to the correct tenant schema. Redis caches organization-scoped permissions, reducing authorization query latency from 40 ms to 5 ms. Background jobs run on BullMQ and handle billing events, webhook delivery, and audit log persistence asynchronously.
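The schema-switching step can be sketched as a small helper that builds the statement such a middleware would run. This is a minimal sketch, not the project's actual code: the names `assertSafeSchemaName` and `buildSearchPathStatement` are hypothetical, and falling back to the `public` schema is an assumption based on the shared reference data described under Data Model.

```typescript
// Hypothetical helper: builds the statement a tenant-routing middleware would run.
// Postgres identifiers cannot be bound as query parameters, so the schema name
// is checked against a strict allowlist pattern before being interpolated.
const SCHEMA_NAME_PATTERN = /^[a-z_][a-z0-9_]{0,62}$/;

function assertSafeSchemaName(schema: string): string {
  if (!SCHEMA_NAME_PATTERN.test(schema)) {
    throw new Error(`Unsafe schema name: ${JSON.stringify(schema)}`);
  }
  return schema;
}

// local = true  -> SET LOCAL: applies only until the current transaction ends.
// local = false -> SET: applies to the whole session (connection).
function buildSearchPathStatement(tenantSchema: string, local: boolean = true): string {
  const schema = assertSafeSchemaName(tenantSchema);
  const scope = local ? "SET LOCAL" : "SET";
  // Assumption: "public" is appended so shared reference tables still resolve.
  return `${scope} search_path TO "${schema}", public`;
}

// Usage
console.log(buildSearchPathStatement("tenant_acme"));
```

A plain SET lasts for the whole session, so on a pooled connection it can carry over to the next request that reuses that connection. SET LOCAL avoids this, but it only takes effect inside a transaction block, which is why the sketch exposes both forms.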
Data Model
Schema design and data flow
Each tenant gets its own schema (tenant_orgname) containing tables for users, projects, tasks, billing, and audit logs. Cross-tenant operations use a shared public schema for reference data. Row-level security (RLS) policies provide a defense-in-depth layer: even if a query bypasses the application layer, the database itself enforces tenant boundaries.
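The tenant_orgname convention implies some normalization of the organization name before it can be used as a schema identifier. The following is a minimal sketch under assumptions: the function name `tenantSchemaName` and the exact slug rules are hypothetical. The 63-character cap reflects PostgreSQL's default identifier length limit.

```typescript
// PostgreSQL truncates identifiers longer than 63 bytes by default.
const MAX_IDENTIFIER_LENGTH = 63;
const TENANT_PREFIX = "tenant_";

// Hypothetical normalization: "Acme Corp" -> "tenant_acme_corp".
function tenantSchemaName(orgName: string): string {
  const slug = orgName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse each non-alphanumeric run into one underscore
    .replace(/^_+|_+$/g, "");    // trim leading and trailing underscores
  if (slug.length === 0) {
    throw new Error("Organization name has no usable characters");
  }
  // The slug is pure ASCII at this point, so characters and bytes coincide.
  return (TENANT_PREFIX + slug).slice(0, MAX_IDENTIFIER_LENGTH);
}

// Usage
console.log(tenantSchemaName("Acme Corp"));
```

Different organization names can normalize to the same slug (for example "Acme Inc" and "acme-inc"), so a real onboarding flow would also need a uniqueness check, or could derive the schema name from a stable tenant ID instead.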
Key Challenges
Hardest problems encountered
The hardest challenge was Prisma's lack of native runtime schema switching. This was solved by issuing a raw SET search_path on the connection from inside a custom Prisma middleware before every query batch. Early builds flushed the entire Redis cache on any permission change; this was replaced with namespaced keys (tenant:{id}:perms:{role}) so that a mutation invalidates only the affected resource of the affected tenant. Billing was another major source of complexity: handling subscription upgrades, downgrades, failed payments, trial periods, and seat-based billing requires careful state management and idempotent webhook handlers.
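The namespaced-key scheme can be illustrated with an in-memory stand-in for Redis. This is a sketch under assumptions: the `PermissionCache` class and its method names are hypothetical, and a `Map` replaces the Redis client so the example runs on its own. Only the key format tenant:{id}:perms:{role} comes from the description above.

```typescript
// Builds the namespaced key format described above.
function permsKey(tenantId: string, role: string): string {
  return `tenant:${tenantId}:perms:${role}`;
}

// Hypothetical in-memory stand-in for the Redis-backed permission cache.
class PermissionCache {
  private store = new Map<string, string[]>();

  set(tenantId: string, role: string, permissions: string[]): void {
    this.store.set(permsKey(tenantId, role), permissions);
  }

  get(tenantId: string, role: string): string[] | undefined {
    return this.store.get(permsKey(tenantId, role));
  }

  // Drops one role of one tenant; every other entry stays cached.
  invalidateRole(tenantId: string, role: string): boolean {
    return this.store.delete(permsKey(tenantId, role));
  }

  // Drops every key under one tenant's namespace and returns how many were removed.
  invalidateTenant(tenantId: string): number {
    const prefix = `tenant:${tenantId}:`;
    let removed = 0;
    for (const key of Array.from(this.store.keys())) {
      if (key.startsWith(prefix)) {
        this.store.delete(key);
        removed += 1;
      }
    }
    return removed;
  }
}

// Usage
const cache = new PermissionCache();
cache.set("1", "admin", ["project:write"]);
cache.set("10", "admin", ["project:read"]);
cache.invalidateRole("1", "admin"); // tenant 10 is unaffected
```

The trailing colon in the prefix matters: without it, invalidating tenant 1 would also match keys that belong to tenant 10.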
Scaling Strategy
How the system grows
Horizontal scaling is safe because all state lives in Postgres and Redis, not in-process. Onboarding a new tenant means creating a new schema and running migrations against it, with no code change. Redis caches heavy reads (permissions, tenant config, user lists) with TTL-based invalidation keyed per tenant. Read replicas handle reporting queries. The webhook system retries failed deliveries with exponential backoff and moves deliveries that keep failing to a dead-letter queue.
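The retry-then-dead-letter behaviour of the webhook system can be sketched as a pure decision function. The base delay, the delay cap, and the names `RetryPolicy` and `nextRetry` are assumptions made for illustration. The limit of three attempts follows the Failure Handling section.

```typescript
interface RetryPolicy {
  baseDelayMs: number; // delay after the first failure (assumed value)
  maxDelayMs: number;  // upper bound on any single delay (assumed value)
  maxAttempts: number; // failed attempts allowed before dead-lettering
}

type RetryDecision =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" };

// Decides what to do after the Nth failed delivery attempt (N starts at 1).
function nextRetry(failedAttempts: number, policy: RetryPolicy): RetryDecision {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 1) {
    throw new Error("failedAttempts must be a positive integer");
  }
  if (failedAttempts >= policy.maxAttempts) {
    return { action: "dead-letter" };
  }
  // The delay doubles with each failure: base, 2x base, 4x base, ... up to the cap.
  const uncapped = policy.baseDelayMs * Math.pow(2, failedAttempts - 1);
  return { action: "retry", delayMs: Math.min(uncapped, policy.maxDelayMs) };
}

// Usage
const webhookPolicy: RetryPolicy = { baseDelayMs: 1000, maxDelayMs: 60000, maxAttempts: 3 };
console.log(nextRetry(1, webhookPolicy));
console.log(nextRetry(3, webhookPolicy));
```

The sketch leaves out jitter. Production retry schedules often add a random offset to each delay so that many failed deliveries do not retry at the same instant.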
Security
Defense-in-depth approach
Schema-per-tenant isolation is the primary boundary. JWTs carry tenant claims that are verified at the gateway. RLS policies at the database layer provide defense-in-depth. Audit logs capture who did what, when, and from which IP, which supports enterprise compliance requirements. Per-tenant rate limiting prevents abuse. Card data is handled through Stripe's tokenization for PCI compliance: raw card data never touches the server.
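Per-tenant rate limiting can be sketched with a fixed-window counter keyed by tenant ID. The source does not say which algorithm the platform uses, so the fixed-window choice, the `TenantRateLimiter` class, and the limit and window values are all assumptions. An in-memory `Map` stands in for shared storage such as Redis.

```typescript
interface WindowState {
  windowStart: number; // timestamp (ms) when the current window opened
  count: number;       // requests seen in the current window
}

// Hypothetical fixed-window limiter: at most `limit` requests per tenant per window.
class TenantRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    if (limit < 1 || windowMs < 1) {
      throw new Error("limit and windowMs must be positive");
    }
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. nowMs is injectable for testing.
  allow(tenantId: string, nowMs: number = Date.now()): boolean {
    const current = this.windows.get(tenantId);
    if (!current || nowMs - current.windowStart >= this.windowMs) {
      // First request from this tenant, or the previous window has expired.
      this.windows.set(tenantId, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}

// Usage
const limiter = new TenantRateLimiter(100, 60000);
console.log(limiter.allow("tenant_acme"));
```

Because counters are kept per tenant, one tenant exhausting its budget does not affect requests from any other tenant.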
Failure Handling
Resilience and recovery
The platform handles failures at multiple levels: transient database failures retry with exponential backoff, webhook delivery uses a DLQ after 3 failed attempts, payment processing uses Stripe webhooks with idempotency keys, and the queue system persists jobs to Redis so no work is lost on crash.
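Idempotent webhook handling can be sketched as a processor that remembers which event IDs it has already handled. The `IdempotentProcessor` class is hypothetical, and using the event's `id` field as the deduplication key is an assumption. An in-memory `Set` stands in for a durable store, such as a table with a unique constraint on the event ID.

```typescript
interface WebhookEvent {
  id: string;   // unique per event; used here as the deduplication key
  type: string; // e.g. "invoice.payment_failed"
}

// Hypothetical processor: runs the handler at most once per successfully handled event.
class IdempotentProcessor {
  private processed = new Set<string>();

  // Returns true if the handler ran, false if the event was a duplicate.
  handle(event: WebhookEvent, handler: (e: WebhookEvent) => void): boolean {
    if (this.processed.has(event.id)) {
      return false;
    }
    // If the handler throws, the ID is not recorded, so a redelivery can retry it.
    handler(event);
    this.processed.add(event.id);
    return true;
  }
}

// Usage
const processor = new IdempotentProcessor();
const failedPayment: WebhookEvent = { id: "evt_123", type: "invoice.payment_failed" };
processor.handle(failedPayment, (e) => console.log("handling", e.type));
processor.handle(failedPayment, (e) => console.log("handling", e.type)); // skipped as a duplicate
```

Recording the ID only after the handler succeeds gives at-least-once processing: a handler that crashes midway may run again, so its own side effects should also tolerate repetition.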
Observability
Monitoring and debugging
Logging is structured, with correlation IDs carried across request boundaries. Audit logs persist to a separate table for compliance. The key metrics tracked are P99 response time per tenant, cache hit ratio, webhook delivery success rate, and failed payment rate. Alerts fire when any tenant exceeds its rate limits or when audit log volume spikes unexpectedly.
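A structured log line with a correlation ID can be sketched as a function that emits one JSON object per event. The field names and the `formatLogLine` helper are assumptions for illustration, and the source does not specify the actual log schema.

```typescript
type LogLevel = "info" | "warn" | "error";

interface LogContext {
  correlationId: string; // the same value is reused by every service handling the request
  tenantId: string;      // allows per-tenant filtering and metrics
}

// Hypothetical formatter: one JSON object per line, easy to ship and query.
function formatLogLine(
  level: LogLevel,
  message: string,
  ctx: LogContext,
  timestamp: Date = new Date()
): string {
  return JSON.stringify({
    timestamp: timestamp.toISOString(),
    level,
    message,
    correlationId: ctx.correlationId,
    tenantId: ctx.tenantId,
  });
}

// Usage
const requestContext: LogContext = { correlationId: "req-42", tenantId: "tenant_acme" };
console.log(formatLogLine("info", "webhook delivered", requestContext));
```

Passing the same context object to every log call within a request is what lets a single correlation ID tie together entries from the gateway, the service layer, and background jobs.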
Trade-offs
Engineering decisions and alternatives
Schema-per-tenant was chosen over shared-table for stronger isolation despite increased migration complexity. Database-per-tenant was rejected as too expensive for the target market. Prisma was chosen over raw SQL for developer velocity, accepting the middleware complexity for schema switching. Redis was chosen for caching over Memcached for its data structure flexibility and persistence options.
Architecture Decisions
Key choices and what was rejected
Senior-Level Topics
Concepts this project explores