cache-strategy

/cache-strategy

Implement a permanent cache-first strategy: read from cache, fall back to DB on miss, write to cache, invalidate only on data mutation. No expiry timers — the cache stays valid until the underlying data changes.

Philosophy

The goal is one DB read per data item, ever (until it changes):
READ:   cache hit? → return cached value
        cache miss? → query DB → write to cache → return value

WRITE:  update DB → invalidate (or update) cache entry → done
This is the Play Framework model: no TTL, no polling, no background refresh. An entry is loaded on its first read and then stays cached until it is invalidated, so it remains correct as long as every write path invalidates the keys it affects.
Why permanent over TTL?
  • TTL caches trade stale reads for simplicity — every expiry is a guaranteed DB hit even if nothing changed
  • Permanent caches with invalidation give you zero unnecessary DB hits and zero stale reads
  • TTL is only appropriate when you can't know when data changes (third-party APIs, etc.)

Step 1: Detect cache infrastructure


Check for Redis

grep -rnE "redis|Redis|REDIS_URL" --include="*.json" --include=".env*" --include="*.yml" --include="*.yaml" --include="*.rb" --include="*.py" --include="*.js" --include="*.ts" . 2>/dev/null | grep -v node_modules | head -10

Check for Memcached

grep -rnE "memcached|Memcached|MEMCACHE" --include="*.json" --include=".env*" --include="*.yml" . 2>/dev/null | grep -v node_modules | head -5

Check for in-process cache (node-cache, lru-cache, etc.)

grep -rnE "lru-cache|node-cache|memory-cache|caffeine|guava\.cache" --include="*.json" --include="*.js" --include="*.ts" --include="*.java" . 2>/dev/null | grep -v node_modules | head -5

Report what's available. If no cache layer exists, recommend Redis as the default and offer to add the client library.

Step 2: Identify cacheable DB reads

Scan for DB query patterns and classify by cacheability:
High-value cache candidates (read often, change rarely):
  • User profiles, settings, preferences
  • Product/item details
  • Configuration tables
  • Reference data (categories, tags, enums stored in DB)
  • Aggregates that are expensive to compute (counts, sums)
Poor cache candidates (skip these):
  • Real-time data (live inventory counts, chat messages)
  • Per-session data
  • Data that changes on every write (e.g., counters that increment on every request)

Find the most-called query patterns

grep -rnE "\.find\b|\.findById|\.findOne|\.where|SELECT" \
  --include="*.rb" --include="*.py" --include="*.js" --include="*.ts" \
  . 2>/dev/null | grep -v node_modules | grep -v test | grep -v spec | head -30

For each candidate, note:
- Cache key pattern (e.g., `user:{id}`, `product:{slug}`)
- Where it's written/updated (to know where to place invalidation)
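One way to keep these notes is a small record per candidate. The sketch below is an assumption about how you might organize them, not part of any library; `CacheCandidate` and its fields are hypothetical names, and the service method names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CacheCandidate:
    """One cacheable read and the write paths that must invalidate it."""
    key_pattern: str                                  # e.g. "user:{id}"
    read_site: str                                    # where the query is issued
    write_sites: list = field(default_factory=list)   # where invalidation goes

    def key(self, **parts) -> str:
        """Fill the pattern with concrete values."""
        return self.key_pattern.format(**parts)


candidates = [
    CacheCandidate("user:{id}", "UserService.getUser",
                   ["UserService.updateUser", "UserService.deleteUser"]),
    CacheCandidate("product:{slug}", "ProductService.findBySlug",
                   ["ProductService.updateProduct"]),
]

# A candidate with no known write site cannot be invalidated reliably.
unsafe = [c.key_pattern for c in candidates if not c.write_sites]
print(candidates[0].key(id=123), unsafe)  # user:123 []
```

Any pattern that ends up in `unsafe` is a sign that the write path has not been found yet, so the read should not be cached permanently until it is.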

Step 3: Implement cache-first reads

For each cacheable read, apply this pattern:

Redis (Node.js / TypeScript)

typescript
async function getUser(id: string) {
  const cacheKey = `user:${id}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const user = await db.users.findById(id);
  await redis.set(cacheKey, JSON.stringify(user)); // no TTL — permanent until invalidated
  return user;
}

Redis (Ruby / Rails)

ruby
def find_user(id)
  Rails.cache.fetch("user:#{id}") do
    User.find(id)  # only called on cache miss
  end
end

Redis (Python)

python
import json

def get_user(user_id: int):
    cache_key = f"user:{user_id}"
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    user = db.session.get(User, user_id)
    if user is None:
        return None  # do not cache missing rows
    data = user.to_dict()
    redis_client.set(cache_key, json.dumps(data))  # no expiry
    return data  # same dict shape as the cached path

Play Framework (Scala)

scala
def getUser(id: Long): Future[User] = {
  cache.getOrElseUpdate(s"user:$id") {
    userRepository.findById(id)
  }
}
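The per-entity functions above all repeat the same three steps, so they can be factored into one helper. The following is a minimal sketch under stated assumptions: `InMemoryCache` is a hypothetical stand-in that mimics only the `get`/`set`/`delete` methods of a Redis client, and `cached_read` is an illustrative name, not an existing API.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in exposing the get/set/delete subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def cached_read(cache, key, loader):
    """Return the cached value for key, or call loader() once and cache its result."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    value = loader()
    if value is not None:                    # skip caching missing rows
        cache.set(key, json.dumps(value))    # no expiry
    return value


cache = InMemoryCache()
calls = []


def load_user():
    calls.append(1)                          # records each simulated DB query
    return {"id": 7, "name": "Ada"}


first = cached_read(cache, "user:7", load_user)
second = cached_read(cache, "user:7", load_user)
print(first == second, len(calls))  # True 1
```

Passing the loader as a callable keeps the cache logic in one place, so each new cacheable read only needs a key and a query.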

Step 4: Implement cache invalidation on writes

For every place that mutates cached data, add invalidation immediately after the DB write succeeds:

Node.js / TypeScript

typescript
async function updateUser(id: string, data: Partial<User>) {
  const user = await db.users.update(id, data);
  await redis.del(`user:${id}`); // invalidate on change
  return user;
}

Ruby / Rails

ruby
def update_user(id, attrs)
  user = User.find(id).tap { |u| u.update!(attrs) }
  Rails.cache.delete("user:#{id}")
  user
end
Important: invalidate on delete too:
typescript
async function deleteUser(id: string) {
  await db.users.delete(id);
  await redis.del(`user:${id}`);
}
For lists/collections, invalidate the collection key when any item in it changes:
typescript
await redis.del(`user:${id}`);       // the specific item
await redis.del(`users:list`);       // any cached list that included this item
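A Python counterpart of the update, delete, and list invalidation rules might look like the sketch below. The `cache` and `db` dictionaries are hypothetical in-memory stand-ins, so the example runs without Redis or a database.

```python
cache = {}   # stand-in for Redis; keys map to cached values
db = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Bob"}}  # hypothetical users table


def list_users():
    """Cache-first read of the user name list."""
    if "users:list" not in cache:
        cache["users:list"] = sorted(u["name"] for u in db.values())
    return cache["users:list"]


def get_user(user_id):
    """Cache-first read of a single user."""
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = dict(db[user_id])
    return cache[key]


def update_user(user_id, **attrs):
    db[user_id].update(attrs)             # 1. write to the DB first
    cache.pop(f"user:{user_id}", None)    # 2. drop the item entry
    cache.pop("users:list", None)         # 3. drop any list that contained it


def delete_user(user_id):
    del db[user_id]
    cache.pop(f"user:{user_id}", None)
    cache.pop("users:list", None)


list_users()
get_user(1)
update_user(1, name="Grace")
print(get_user(1)["name"], list_users())  # Grace ['Bob', 'Grace']
delete_user(2)
print(list_users())  # ['Grace']
```

Invalidating after the DB write, rather than before it, avoids a window in which a concurrent read could repopulate the cache with the old row.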

Step 5: Cache key conventions

Use a consistent key naming scheme across the codebase:
<entity>:<identifier>          → user:123, product:slug-name
<entity>:list:<filter>         → products:list:category-5
<entity>:<id>:<relation>       → user:123:orders
aggregate:<entity>:<filter>    → aggregate:orders:user-123-total
Document the convention in a comment at the top of your cache module:
typescript
// Cache key conventions:
// user:{id}                — single user record
// users:list               — paginated user list (invalidate on any user change)
// user:{id}:orders         — orders belonging to a user
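Building keys through small helper functions keeps the convention in one place instead of scattering string literals. The helper names below are hypothetical; only the key shapes come from the convention above.

```python
def entity_key(entity: str, identifier) -> str:
    """<entity>:<identifier>, e.g. user:123"""
    return f"{entity}:{identifier}"


def list_key(entity_plural: str, filter_name: str = "") -> str:
    """<entity>:list[:<filter>], e.g. products:list:category-5"""
    base = f"{entity_plural}:list"
    return f"{base}:{filter_name}" if filter_name else base


def relation_key(entity: str, identifier, relation: str) -> str:
    """<entity>:<id>:<relation>, e.g. user:123:orders"""
    return f"{entity_key(entity, identifier)}:{relation}"


def aggregate_key(entity_plural: str, filter_name: str) -> str:
    """aggregate:<entity>:<filter>, e.g. aggregate:orders:user-123-total"""
    return f"aggregate:{entity_plural}:{filter_name}"


print(entity_key("user", 123))                    # user:123
print(list_key("products", "category-5"))         # products:list:category-5
print(relation_key("user", 123, "orders"))        # user:123:orders
print(aggregate_key("orders", "user-123-total"))  # aggregate:orders:user-123-total
```

Reads and invalidation points that call the same helper cannot drift apart on key spelling.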

Step 6: Cache warming (optional)

For critical data that must never have a cold-start miss, add a warm-up step on app boot:
typescript
async function warmCache() {
  const activeUsers = await db.users.findActiveUsers();
  await Promise.all(
    activeUsers.map(u => redis.set(`user:${u.id}`, JSON.stringify(u)))
  );
}
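For larger data sets it may be gentler on the cache server to warm in batches rather than issuing every write at once. The sketch below assumes that design choice; `warm_cache` and `key_for` are hypothetical names, and a plain dict stands in for the cache. With redis-py, the `cache.update(batch)` calls would correspond to `client.mset(batch)`.

```python
import json


def warm_cache(cache, rows, key_for, batch_size=500):
    """Write rows into the cache in batches; returns the number of entries written."""
    written = 0
    batch = {}
    for row in rows:
        batch[key_for(row)] = json.dumps(row)
        if len(batch) >= batch_size:
            cache.update(batch)      # one bulk write per full batch
            written += len(batch)
            batch = {}
    if batch:                        # flush the final partial batch
        cache.update(batch)
        written += len(batch)
    return written


cache = {}
active_users = [{"id": i, "name": f"user{i}"} for i in range(1, 1201)]
count = warm_cache(cache, active_users, lambda u: f"user:{u['id']}")
print(count, json.loads(cache["user:42"])["name"])  # 1200 user42
```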

Step 7: Summary

Report what was implemented:

Cache Strategy Applied

Cache layer: Redis (permanent, no TTL)

Reads cached (X total)

  • user:{id} — UserService.getUser()
  • product:{slug} — ProductService.findBySlug()
  • ...

Invalidation points added (Y total)

  • user:{id} — UserService.updateUser(), UserService.deleteUser()
  • product:{slug} — ProductService.updateProduct()
  • ...

Estimated DB load reduction

Before: ~X DB queries/min
After: ~Y DB queries/min (first read only, then cache)

Note: for data that comes from third-party APIs or sources where you can't intercept writes, use a long TTL (hours, not minutes) as a fallback — but flag these separately so they can be revisited if an invalidation webhook becomes available.
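The TTL fallback can be kept visibly separate from the permanent entries. The sketch below is one possible shape, not a prescribed implementation: `TtlFallbackCache` and `ttl_keys` are hypothetical names, and the injectable `clock` exists only so the example can simulate time passing. With redis-py, the TTL write would correspond to `client.set(key, value, ex=ttl_seconds)`.

```python
import time


class TtlFallbackCache:
    """Cache where entries are permanent unless a TTL is given explicitly."""

    def __init__(self, clock=time.monotonic):
        self._data = {}          # key -> (value, expires_at or None)
        self._clock = clock
        self.ttl_keys = set()    # keys using the TTL fallback, kept for later review

    def set(self, key, value, ttl_seconds=None):
        expires_at = None
        if ttl_seconds is not None:
            expires_at = self._clock() + ttl_seconds
            self.ttl_keys.add(key)
        else:
            self.ttl_keys.discard(key)
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value


now = [0.0]
cache = TtlFallbackCache(clock=lambda: now[0])
cache.set("user:1", "ada")                             # permanent entry
cache.set("rates:usd", "1.08", ttl_seconds=6 * 3600)   # third-party data, 6 hour TTL
now[0] = 7 * 3600                                      # seven hours later
print(cache.get("user:1"), cache.get("rates:usd"), sorted(cache.ttl_keys))
# ada None ['rates:usd']
```

The `ttl_keys` set gives a ready list of entries to revisit if the upstream source later offers an invalidation webhook.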