Redis Caching That Doesn't Rot Your System From the Inside
Caching is the easiest performance win and the easiest way to introduce bugs that manifest days later in production. Invalidation is the hard part. This post covers the strategies that keep Redis from becoming a liability.
TTL Is Not a Strategy
Setting a blanket 5-minute TTL on every cache key is how most systems start. It works until you have a cache hit serving stale data for 4 minutes and 59 seconds while your users see outdated pricing, broken state, or incorrect counts.
The problem isn't TTL itself — it's treating TTL as the only invalidation mechanism. TTL should be the last line of defense, not the primary invalidation strategy.
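One way to picture TTL as a backstop: writes delete the affected key explicitly, and the TTL only bounds how long a missed invalidation can survive. This is a minimal sketch, assuming an ioredis-style client is passed in as `redis`. The `price:` key layout, the 300-second backstop, and the `loadFromDb` / `saveToDb` callbacks are hypothetical.

```javascript
// Sketch only: `redis` is assumed to be an ioredis-style client
// (get, del, and set(key, value, "EX", seconds)). `loadFromDb` and
// `saveToDb` are hypothetical callbacks standing in for the data layer.
const BACKSTOP_TTL_SECONDS = 300; // safety net, not the invalidation plan

async function readPrice(redis, productId, loadFromDb) {
  const key = `price:${productId}`;
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);

  const fresh = await loadFromDb(productId);
  await redis.set(key, JSON.stringify(fresh), "EX", BACKSTOP_TTL_SECONDS);
  return fresh;
}

async function writePrice(redis, productId, price, saveToDb) {
  await saveToDb(productId, price);
  // Primary invalidation: drop the entry as soon as the source of truth changes.
  await redis.del(`price:${productId}`);
}
```

If the explicit delete is ever skipped (a crashed worker, a forgotten code path), the entry still disappears within the backstop window instead of living forever.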
Namespaced Key Design
The single most impactful change you can make to a Redis cache is switching from flat keys to namespaced, versioned keys:
// Instead of:
user:42.profile
// Use:
user:v2:42.profile
// Or for collections:
org:abc:v7:team.list
When the profile data model changes, bump the version. Old keys expire naturally via TTL while new requests populate fresh keys. This eliminates the need to flush entire databases during deployments.
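Version namespacing also makes debugging easier: you can list keys matching user:v2:* to see exactly what is cached for the current schema, whereas user:* mixes versions. On a production instance, use SCAN with a MATCH pattern rather than KEYS, because KEYS blocks the server while it walks the entire keyspace.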
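Building versioned keys by hand invites typos, so it can help to route every key through one helper. A minimal sketch follows. The `SCHEMA_VERSIONS` table and the `cacheKey` function are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: entity names and version numbers are illustrative.
const SCHEMA_VERSIONS = { user: 2, team: 7 };

function cacheKey(entity, id, field) {
  const version = SCHEMA_VERSIONS[entity];
  if (version === undefined) {
    throw new Error(`No cache schema version registered for "${entity}"`);
  }
  return `${entity}:v${version}:${id}.${field}`;
}

console.log(cacheKey("user", 42, "profile")); // user:v2:42.profile
```

Bumping a version then becomes a one-line change in `SCHEMA_VERSIONS`, and an unregistered entity fails loudly instead of silently writing an unversioned key.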
Targeted Invalidation Patterns
Write-Through with Tagged Invalidation
When a record is updated, invalidate or update the cache in the same code path, right after the write to the source of truth succeeds. Tag each cache entry with related entity types so you can invalidate broadly without guessing:
// On article update:
await redis.del([
  `article:${id}.detail`,
  `article:${id}.comments`,
  `feed:${authorId}.articles`
]);
This is predictable, debuggable, and doesn't require background jobs or delayed expiration.
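The snippet above lists the affected keys by hand. The tagging idea can be sketched with Redis sets: each tag is a set holding the cache keys that depend on it, and invalidating the tag deletes all of them. This assumes an ioredis-style client passed in as `redis`. The `tag:` prefix and both function names are hypothetical.

```javascript
// Sketch only: `redis` is assumed to be an ioredis-style client.
// The `tag:` prefix and the function names are hypothetical.
async function setWithTags(redis, key, value, ttlSeconds, tags) {
  await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
  for (const tag of tags) {
    // Each tag is a Redis set holding the cache keys that depend on it.
    await redis.sadd(`tag:${tag}`, key);
  }
}

async function invalidateTag(redis, tag) {
  const tagKey = `tag:${tag}`;
  const keys = await redis.smembers(tagKey);
  // Remove the dependent entries and the tag set itself in one DEL.
  await redis.del([...keys, tagKey]);
  return keys.length;
}
```

A call such as `invalidateTag(redis, "author:9")` would then clear every entry written with that tag. Two caveats in this sketch: the steps are not atomic, and tag sets can accumulate keys that have already expired, so a real implementation would likely wrap the writes in MULTI and give the tag sets a TTL as well.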
Cache-Aside with Stale-While-Revalidate
For high-traffic endpoints where cache misses hurt, serve stale data while asynchronously refreshing:
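// `ttl` is in milliseconds. Each cached entry is a JSON envelope { value, ttl },
// where the stored `ttl` field holds the absolute expiry timestamp.
// refreshCache (not shown) is expected to call fetchFn, store the envelope,
// and resolve to the fresh value.
async function getCached(key, fetchFn, ttl) {
  const data = await redis.get(key);
  if (data) {
    const parsed = JSON.parse(data);
    // Trigger a background refresh once less than 20% of the TTL remains
    if (parsed.ttl - Date.now() < ttl * 0.2) {
      // Fire and forget; swallow errors so a failed refresh never breaks a cache hit
      refreshCache(key, fetchFn, ttl).catch(() => {});
    }
    return parsed.value;
  }
  // Cache miss: fetch, store, and return the fresh value
  return refreshCache(key, fetchFn, ttl);
}
This pattern absorbs traffic spikes because readers keep getting the stale value while the refresh runs. As written, though, every request that arrives in the final 20% of the TTL starts its own refresh. To avoid a thundering herd, refreshCache has to deduplicate so that only one refresh per key is in flight at a time.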
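The getCached function above leaves refreshCache undefined. One possible shape for it, sketched under assumptions, keeps a single refresh in flight per key so concurrent callers share one fetch. It assumes an ioredis-style client and treats `ttl` as milliseconds. The `createRefreshCache` factory is a hypothetical name.

```javascript
// Sketch only: `redis` is assumed to be an ioredis-style client, and the
// factory name is hypothetical. `ttl` is treated as milliseconds so it
// matches the Date.now() arithmetic in getCached.
function createRefreshCache(redis) {
  const inFlight = new Map(); // key -> Promise of the pending refresh

  return function refreshCache(key, fetchFn, ttl) {
    const pending = inFlight.get(key);
    if (pending) return pending; // join the refresh that is already running

    const refresh = Promise.resolve()
      .then(() => fetchFn())
      .then(async (value) => {
        // The stored `ttl` field is the absolute expiry timestamp getCached reads.
        const envelope = { value, ttl: Date.now() + ttl };
        await redis.set(key, JSON.stringify(envelope), "PX", ttl);
        return value;
      })
      .finally(() => inFlight.delete(key)); // allow the next refresh once settled

    inFlight.set(key, refresh);
    return refresh;
  };
}
```

Wiring it up would look like `const refreshCache = createRefreshCache(redis);`. The deduplication here is per process only. With many application instances, each one can still start its own refresh, so a shared guard such as a short-lived SET NX lock would be needed to get down to one refresh cluster-wide.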
What Not to Cache
Some data should never go through a cache layer:
- Locks and semaphores: Redis SET NX for distributed locks is fine, but don't cache the lock state. You need atomic reads.
- Rate limiter counters: Use INCR with EXPIRE directly, not cached snapshots.
- WebSocket session state: In-memory staleness causes ghost connections and missed messages.
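For the rate limiter case, using INCR with EXPIRE directly might look like the fixed-window sketch below. It assumes an ioredis-style client passed in as `redis`. The `ratelimit:` key layout and the `allowRequest` name are hypothetical.

```javascript
// Sketch only: `redis` is assumed to be an ioredis-style client; the key
// layout and function name are hypothetical.
async function allowRequest(redis, clientId, limit, windowSeconds) {
  const key = `ratelimit:${clientId}`;
  const count = await redis.incr(key); // atomic on the server
  if (count === 1) {
    // First hit in this window: start the clock.
    await redis.expire(key, windowSeconds);
  }
  return count <= limit;
}
```

Every decision reads the live counter, so there is no snapshot to go stale. One known gap in this sketch: if the process dies between INCR and EXPIRE, the key is left without a TTL, which is why production versions often run both commands in a Lua script or a MULTI block.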
Monitoring Cache Health
Track these metrics to know when your cache is working versus hiding problems:
- Hit rate per key pattern. If user:* has a 95% hit rate but search:* has 20%, the search cache is wasting memory.
- Eviction rate. Sudden spikes mean your working set exceeds maxmemory. Add memory or change the eviction policy.
- Stale serve count. Track how often stale-while-revalidate serves stale data. If it's above 5% of total requests, your TTL is too short or your refresh is too slow.
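Redis reports hits, misses, and evictions server-wide (keyspace_hits, keyspace_misses, and evicted_keys in INFO stats), so a per-pattern hit rate has to be counted in the application. A minimal sketch of such a counter follows. The `CacheStats` class is hypothetical, and in practice the numbers would be exported to whatever metrics system is already in place.

```javascript
// Hypothetical in-process counter; pattern names are whatever labels
// the application chooses, such as "user" or "search".
class CacheStats {
  constructor() {
    this.counts = new Map(); // pattern -> { hits, misses }
  }

  record(pattern, hit) {
    const entry = this.counts.get(pattern) ?? { hits: 0, misses: 0 };
    if (hit) entry.hits += 1;
    else entry.misses += 1;
    this.counts.set(pattern, entry);
  }

  // Returns a fraction between 0 and 1, or null if nothing was recorded.
  hitRate(pattern) {
    const entry = this.counts.get(pattern);
    if (!entry) return null;
    return entry.hits / (entry.hits + entry.misses);
  }
}
```

Calling `stats.record("search", cached !== null)` after each lookup is enough to surface a pattern like the 20% search hit rate described above. The same structure can carry a third counter for stale serves.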
A cache that nobody monitors is just technical debt with a fast response time. Metrics, alerts, and clean invalidation logic separate caching from cargo-culting.