Backend

Astra — Enterprise SaaS Platform

Enterprise multi-tenant SaaS platform with schema-per-tenant isolation, RBAC, audit logging, billing, and webhook infrastructure — designed for thousands of organizations on a single codebase.

Node.js, TypeScript, PostgreSQL, Redis, Prisma, Docker, BullMQ, Stripe

Tenant scale: ∞

Auth latency (cached): 5ms

Auth latency (uncached): 40ms

Webhook delivery rate: 99.9%

Domain Knowledge

What problem this project solves

Multi-tenant SaaS is one of the hardest architectural challenges in backend engineering. The core problem is data isolation: when Organization A and Organization B use the same app, their data must never mix. Astra implements schema-per-tenant isolation, in which each organization gets its own PostgreSQL schema. This provides stronger guarantees than a shared-table design with a tenant_id column, because a single missing WHERE clause cannot leak data across tenants.

Architecture

How the system is structured

The platform uses a layered architecture: an API gateway handles authentication and tenant resolution, a service layer orchestrates business logic, and a data layer enforces tenant isolation via dynamic schema switching. Prisma is wrapped in custom middleware that issues SET search_path before every query, routing each request to the correct tenant schema. Redis caches organization-scoped permissions, reducing authorization query latency from 40ms to 5ms. Background jobs (BullMQ) handle billing events, webhook delivery, and audit log persistence asynchronously.
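As a rough illustration of the schema-switching step, the sketch below builds the SET search_path statement for a tenant and runs it before the tenant-scoped work. The names RawSqlClient, searchPathStatement, and runInTenantSchema are hypothetical and not taken from the Astra codebase; with Prisma, a call such as $executeRawUnsafe could sit behind executeRaw.

```typescript
// Hypothetical minimal client contract, abstracted so the sketch runs
// without a database. With Prisma, $executeRawUnsafe could back executeRaw.
interface RawSqlClient {
  executeRaw(sql: string): Promise<unknown>;
}

// The schema name is interpolated into SQL text, so it is validated against a
// conservative pattern first (lowercase letters, digits, underscores, max 63 chars).
const SCHEMA_NAME = /^[a-z_][a-z0-9_]{0,62}$/;

function searchPathStatement(schema: string): string {
  if (!SCHEMA_NAME.test(schema)) {
    throw new Error(`Invalid tenant schema name: ${JSON.stringify(schema)}`);
  }
  // Tenant schema first, then public for shared reference data.
  return `SET search_path TO "${schema}", public`;
}

// Switches the search path, then runs the tenant-scoped work.
// Assumes `client` stays pinned to one connection for the duration of `work`.
async function runInTenantSchema<T>(
  client: RawSqlClient,
  schema: string,
  work: () => Promise<T>,
): Promise<T> {
  await client.executeRaw(searchPathStatement(schema));
  return work();
}
```

With a pooled connection, the SET and the queries that follow must run on the same connection (for example inside one transaction). Otherwise the search path may end up applied to a different request.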

Data Model

Schema design and data flow

Each tenant gets its own schema (tenant_orgname) containing tables for users, projects, tasks, billing, and audit logs. Cross-tenant operations read reference data from a shared public schema. Row-level security (RLS) policies provide a defense-in-depth layer: even if a query bypasses the application layer, the database itself enforces tenant boundaries.
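The naming convention and the RLS layer can be sketched as a few SQL-generating helpers. Everything here is an assumption made for illustration: the helper names, the tenant_id column, and the app.tenant_id session setting do not come from the source, and the real policies may look different.

```typescript
// Conservative identifier pattern; PostgreSQL identifiers are limited to 63 bytes.
const IDENTIFIER = /^[a-z_][a-z0-9_]{0,62}$/;

function assertIdentifier(name: string): string {
  if (!IDENTIFIER.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${JSON.stringify(name)}`);
  }
  return name;
}

// Hypothetical helper: derive a tenant_orgname schema name from an organization slug.
function tenantSchemaName(orgSlug: string): string {
  const normalized = orgSlug
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse anything else into underscores
    .replace(/^_+|_+$/g, ""); // drop leading and trailing underscores
  if (normalized.length === 0) {
    throw new Error("Organization slug has no usable characters");
  }
  return `tenant_${normalized}`.slice(0, 63);
}

function createSchemaStatement(schema: string): string {
  return `CREATE SCHEMA IF NOT EXISTS "${assertIdentifier(schema)}"`;
}

// Assumes each table has a text tenant_id column and that the application
// sets a custom app.tenant_id setting on the session (both hypothetical).
function rlsStatements(schema: string, table: string): string[] {
  const qualified = `"${assertIdentifier(schema)}"."${assertIdentifier(table)}"`;
  return [
    `ALTER TABLE ${qualified} ENABLE ROW LEVEL SECURITY`,
    `CREATE POLICY tenant_isolation ON ${qualified} ` +
      `USING (tenant_id = current_setting('app.tenant_id'))`,
  ];
}
```

Two caveats apply to this sketch. Different slugs can normalize to the same schema name, so a uniqueness check is still needed. Table owners bypass RLS unless FORCE ROW LEVEL SECURITY is set on the table.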

Key Challenges

Hardest problems encountered

The hardest challenge was Prisma's lack of native runtime schema switching. This was solved with custom Prisma middleware that issues a raw SET search_path on the connection before every query batch. Early builds flushed Redis globally on any permission change; this was replaced with namespaced keys (tenant:{id}:perms:{role}) so that a mutation invalidates only the affected tenant's affected keys. Billing was another major source of complexity: handling subscription upgrades, downgrades, failed payments, trial periods, and seat-based billing requires careful state management and idempotent webhook handlers.
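The namespaced-key approach might look roughly like the read-through cache below. Only the key pattern tenant:{id}:perms:{role} comes from the text above. The KeyValueStore interface, the function names, the 300-second TTL, and the in-memory store are hypothetical stand-ins for a Redis client.

```typescript
// Minimal key-value contract; a Redis client would sit behind this in practice.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

function permsKey(tenantId: string, role: string): string {
  return `tenant:${tenantId}:perms:${role}`;
}

// Read-through: serve from cache when present, otherwise load and populate.
async function getPermissions(
  store: KeyValueStore,
  tenantId: string,
  role: string,
  loadFromDb: (tenantId: string, role: string) => Promise<string[]>,
  ttlSeconds: number = 300, // assumed TTL, not from the source
): Promise<string[]> {
  const key = permsKey(tenantId, role);
  const cached = await store.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as string[];
  }
  const fresh = await loadFromDb(tenantId, role);
  await store.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}

// A permission change removes one tenant/role key instead of flushing everything.
async function invalidatePermissions(
  store: KeyValueStore,
  tenantId: string,
  role: string,
): Promise<void> {
  await store.del(permsKey(tenantId, role));
}

// In-memory stand-in for local experiments; it ignores the TTL.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  async set(key: string, value: string, _ttlSeconds: number): Promise<void> {
    this.data.set(key, value);
  }
  async del(key: string): Promise<void> {
    this.data.delete(key);
  }
}
```

Because every key carries the tenant ID, invalidating one tenant's role leaves other tenants' cached permissions untouched.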

Scaling Strategy

How the system grows

Horizontal scaling is safe because all state lives in Postgres and Redis, not in-process. Onboarding a tenant means creating a new schema and running migrations, with no code change. Redis caches heavy reads (permissions, tenant config, user lists) under per-tenant keys with TTL-based expiry. Read replicas handle reporting queries. The webhook system retries failed deliveries with exponential backoff and moves deliveries that keep failing to a dead-letter queue.
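The retry-then-dead-letter flow for webhooks can be sketched independently of any queue library. The function names, the 1-second base delay, and the 60-second cap are assumptions. The default of three attempts follows the figure given in the Failure Handling section, and a queue such as BullMQ would normally own this loop rather than application code.

```typescript
// Delay before the next try; attempt is 1-based: base, 2x base, 4x base, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

interface Delivery {
  url: string;
  payload: string;
}

interface Outcome {
  status: "delivered" | "dead-lettered";
  attempts: number;
}

// send, deadLetter, and sleep are injected so the flow can run without a network or timers.
async function deliverWithRetry(
  delivery: Delivery,
  send: (d: Delivery) => Promise<boolean>,
  deadLetter: (d: Delivery) => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 3,
): Promise<Outcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A thrown error counts as a failed attempt.
    const ok = await send(delivery).catch(() => false);
    if (ok) {
      return { status: "delivered", attempts: attempt };
    }
    if (attempt < maxAttempts) {
      await sleep(backoffDelayMs(attempt));
    }
  }
  // Every attempt failed: park the delivery for inspection or manual replay.
  await deadLetter(delivery);
  return { status: "dead-lettered", attempts: maxAttempts };
}
```

In production the sleep would be a delayed job rather than an in-process wait, so that a worker crash between attempts does not lose the retry.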

Security

Defense-in-depth approach

Schema-per-tenant isolation is the primary boundary. JWTs carry tenant claims that are verified at the gateway. RLS policies at the database layer provide defense-in-depth. Audit logs capture who did what, when, and from which IP, meeting enterprise compliance requirements. Rate limiting per tenant prevents abuse. Card payments go through Stripe's tokenization, so raw card data never touches the server, which keeps the platform's PCI scope small.
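Per-tenant rate limiting can be illustrated with a fixed-window counter keyed by tenant ID. This is a minimal in-process sketch: the class name, the fixed-window algorithm, and the limits are assumptions, and a multi-instance deployment would keep the counters in Redis rather than in memory.

```typescript
// Fixed-window counter per tenant. The clock is injected so behaviour is testable.
class TenantRateLimiter {
  private readonly windows = new Map<string, { startedAt: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    if (limit < 1 || windowMs <= 0) {
      throw new Error("limit must be >= 1 and windowMs must be > 0");
    }
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed, false if the tenant is over its limit.
  allow(tenantId: string): boolean {
    const t = this.now();
    const current = this.windows.get(tenantId);
    if (current === undefined || t - current.startedAt >= this.windowMs) {
      // First request in a fresh window for this tenant.
      this.windows.set(tenantId, { startedAt: t, count: 1 });
      return true;
    }
    if (current.count < this.limit) {
      current.count += 1;
      return true;
    }
    return false;
  }
}
```

Keying the counter by tenant means one noisy organization exhausts only its own budget. A fixed window allows short bursts at window boundaries, which a sliding window or token bucket would smooth out.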

Failure Handling

Resilience and recovery

The platform handles failures at multiple levels: transient database failures are retried with exponential backoff, webhook deliveries move to a DLQ after 3 failed attempts, payment processing uses Stripe webhooks with idempotency keys, and the queue system persists jobs to Redis so that queued work survives a process crash.
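Idempotent handling of payment webhooks can be sketched as a claim-then-apply step keyed on the event ID. The ProcessedEventStore interface and handleOnce are hypothetical names, and the in-memory store stands in for a durable table or Redis set. Stripe events do carry id and type fields, but the rest of the shape here is simplified.

```typescript
interface WebhookEvent {
  id: string;
  type: string;
}

interface ProcessedEventStore {
  // Resolves to true only for the first caller to claim this event ID.
  claim(eventId: string): Promise<boolean>;
  // Drops the claim so that a redelivery can try again.
  release(eventId: string): Promise<void>;
}

async function handleOnce(
  event: WebhookEvent,
  store: ProcessedEventStore,
  apply: (event: WebhookEvent) => Promise<void>,
): Promise<"processed" | "duplicate"> {
  const isFirst = await store.claim(event.id);
  if (!isFirst) {
    return "duplicate"; // already handled; acknowledge without side effects
  }
  try {
    await apply(event);
    return "processed";
  } catch (err) {
    // Processing failed: release the claim so the provider's retry is not ignored.
    await store.release(event.id);
    throw err;
  }
}

// In-memory stand-in; a real store needs an atomic insert-if-absent.
class MemoryEventStore implements ProcessedEventStore {
  private readonly seen = new Set<string>();
  async claim(eventId: string): Promise<boolean> {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
  async release(eventId: string): Promise<void> {
    this.seen.delete(eventId);
  }
}
```

The claim has to be atomic in the real store (for example a unique constraint on the event ID). Otherwise two concurrent deliveries of the same event could both pass the check.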

Observability

Monitoring and debugging

Structured logs carry correlation IDs across request boundaries. Audit logs persist to a separate table for compliance. Key metrics tracked are P99 response time per tenant, cache hit ratio, webhook delivery success rate, and failed payment rate. Alerts fire when any tenant exceeds rate limits or when audit log volume spikes unexpectedly.
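A structured logger bound to a request's correlation ID might look like the sketch below. The field names, the createLogger helper, and the injected sink are assumptions for illustration, not the project's actual logging setup.

```typescript
interface RequestContext {
  correlationId: string;
  tenantId: string;
}

type Level = "info" | "warn" | "error";

// Returns a log function that stamps every entry with the request context.
// The sink and clock are injected so the output can be captured in tests.
function createLogger(
  context: RequestContext,
  sink: (line: string) => void,
  now: () => Date = () => new Date(),
) {
  return (level: Level, message: string, fields: Record<string, unknown> = {}): void => {
    sink(
      JSON.stringify({
        ...fields, // caller-supplied fields come first so they cannot override context
        timestamp: now().toISOString(),
        level,
        message,
        correlationId: context.correlationId,
        tenantId: context.tenantId,
      }),
    );
  };
}
```

Passing the same correlationId into background jobs would let a single request be traced through the queue workers that handle its webhooks and billing events.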

Trade-offs

Engineering decisions and alternatives

Schema-per-tenant was chosen over shared-table for stronger isolation despite increased migration complexity. Database-per-tenant was rejected as too expensive for the target market. Prisma was chosen over raw SQL for developer velocity, accepting the middleware complexity for schema switching. Redis was chosen for caching over Memcached for its data structure flexibility and persistence options.

Architecture Decisions

Key choices and what was rejected

| Decision | Chosen | Rejected |
| --- | --- | --- |
| Tenancy model | Schema-per-tenant | Shared database with tenant_id |
| ORM | Prisma + schema switching middleware | Raw SQL (too slow for development) |
| Cache strategy | Redis namespaced read-through | In-memory (no persistence across restarts) |
| Job queue | BullMQ with Redis | In-process execution (blocks requests) |
| Billing | Stripe + webhook idempotency | Custom billing (PCI complexity) |

Senior-Level Topics

Concepts this project explores

Row-Level Security, Multi-Tenant Indexing, Distributed Caching, Event Sourcing, Rate Limiting, Audit Compliance, Idempotency, Webhook Infrastructure