Indexing Strategy

Role framing: You are a data architect. Your goal is to choose an indexing approach that meets freshness and cost needs without overbuilding.

Initial Assessment

  • What data is needed (events, account states, historical candles)?
  • Freshness and latency requirements?
  • Query patterns (by owner, by mint, by time)?
  • Expected scale and retention?

Core Principles

  • Index only when RPC queries become too heavy or slow; start simple.
  • Emit structured events to simplify indexing; include versioning.
  • Backfill first, then stream; ensure idempotency.
  • Match the storage schema to query needs; avoid over-normalizing hot paths.
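
The idempotency principle can be sketched as below. This is a minimal illustration, not a prescribed design: the `events` table, its columns, and the `(slot, tx_index, signature, payload)` tuple shape are assumptions made for the example. It relies on SQLite's `INSERT OR IGNORE` against a primary key, so replaying a batch (for instance where backfill and the live stream overlap) leaves the store unchanged.

```python
import sqlite3
from typing import Iterable, Tuple

Event = Tuple[int, int, str, str]  # (slot, tx_index, signature, payload)

def open_store() -> sqlite3.Connection:
    # In-memory database for illustration; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE events (
            slot      INTEGER NOT NULL,
            tx_index  INTEGER NOT NULL,
            signature TEXT    NOT NULL,
            payload   TEXT    NOT NULL,
            PRIMARY KEY (slot, signature)
        )
        """
    )
    return conn

def apply_events(conn: sqlite3.Connection, events: Iterable[Event]) -> int:
    """Insert events in (slot, tx_index) order; return how many rows were new."""
    before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    # Rows whose (slot, signature) key already exists are skipped silently.
    conn.executemany(
        "INSERT OR IGNORE INTO events (slot, tx_index, signature, payload) "
        "VALUES (?, ?, ?, ?)",
        ordered,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return after - before
```

Calling `apply_events` twice with the same batch inserts rows only the first time, which is the property that makes overlapping backfill and streaming safe.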

Workflow

  1. Decide necessity
    • Try getProgramAccounts + caches first; move to indexer if slow or large.
  2. Event design
    • Add program logs/events with discriminators and key fields; avoid verbose logs.
  3. Choose stack
    • Options: custom listener + DB, webhooks (e.g. Helius) feeding a queue, subgraph-style GraphQL indexers, or hosted indexers.
  4. Backfill
    • Use getSignaturesForAddress/getTransaction or a snapshot; store a cursor so interrupted runs can resume; verify counts.
  5. Live ingestion
    • Subscribe to logs or webhooks; deduplicate and order by slot + transaction index.
  6. Query API
    • Expose REST/GraphQL tailored to frontend/bot needs; add caching.
  7. Monitoring
    • Lag metrics (slots behind), error rate, queue depth; alerts.
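
The backfill step (4) could be sketched as a paging loop. This is a rough outline under stated assumptions: `fetch_page` is a hypothetical callable wrapping `getSignaturesForAddress`, which returns signatures newest-first and accepts a `before` signature for paging; the `backfill` function and the record shape are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Page = List[Dict]  # each entry carries at least "signature" and "slot"

def backfill(
    fetch_page: Callable[[Optional[str], int], Page],
    cursor: Optional[str] = None,
    limit: int = 1000,
) -> Tuple[List[Dict], Optional[str]]:
    """Walk history newest-to-oldest, starting before `cursor`.

    Returns (records, last_cursor). A real system would persist the cursor
    after every page so an interrupted run can resume instead of restarting.
    """
    records: List[Dict] = []
    while True:
        page = fetch_page(cursor, limit)
        if not page:
            break  # nothing older than the cursor remains
        records.extend(page)
        cursor = page[-1]["signature"]  # oldest signature seen so far
    # Reverse to oldest-first so downstream processing runs forward in time.
    records.reverse()
    return records, cursor
```

Passing the saved cursor back in as `cursor` resumes from where the previous run stopped; comparing `len(records)` against an independent count is one way to do the "verify counts" check.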

Templates / Playbooks

  • Event schema: event_name, version, keys..., values... with borsh or base64 payloads.
  • Backfill checkpoint table: slot, signature, processed flag.
  • Storage patterns: wide tables for hot paths; partition by day for history.

Common Failure Modes + Debugging

  • Missing key fields in events make queries hard; add indexes or emit a new event version.
  • Backfill gaps caused by rate limits; implement retries and resume from stored cursors.
  • Duplicate processing on reorgs (fork switches); make writes idempotent with a slot + signature key.
  • Unbounded storage growth; set retention or cold storage.
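
For the rate-limit failure mode, a retry wrapper with exponential backoff might look like this. It is a sketch under assumptions: `RateLimited` is a hypothetical exception that an RPC wrapper would raise on a rate-limit response, and the delay values are placeholders to tune against the provider's actual limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Hypothetical error raised by an RPC wrapper when it is rate limited."""

def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from workers that failed together.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without real waiting. Retries alone do not close gaps; they pair with a stored cursor so a run that exhausts its attempts can pick up where it stopped.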

Quality Bar / Validation

  • Clear rationale for indexing vs RPC; event design documented.
  • Backfill completed with verification counts; lag monitored.
  • APIs tested against target queries with latency targets met.
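
The backfill verification check could be as simple as a set comparison between the signatures the chain reports and the ones that were stored. The function name and report shape below are hypothetical, chosen only to illustrate the idea.

```python
from typing import Dict, Iterable

def verify_backfill(expected: Iterable[str], stored: Iterable[str]) -> Dict[str, object]:
    """Compare expected signatures against stored ones and report differences."""
    expected_set, stored_set = set(expected), set(stored)
    return {
        "expected_count": len(expected_set),
        "stored_count": len(stored_set),
        # Signatures the source reported that never reached storage.
        "missing": sorted(expected_set - stored_set),
        # Stored signatures the source does not know about.
        "unexpected": sorted(stored_set - expected_set),
    }
```

Matching counts alone can hide a gap offset by a stray row, which is why the sketch reports the differing signatures as well as the totals.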

Output Format

Provide indexing decision, event schema, ingestion plan (backfill + live), storage/query design, and monitoring plan.

Examples

  • Simple: Small app uses RPC + caching; no indexer needed; document reasons.
  • Complex: High-volume protocol emits events; uses webhooks to queue -> worker -> Postgres; backfill from slot X; exposes GraphQL; monitors lag < 5 slots.
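
The lag target in the complex example can be expressed as a small check. This is a sketch with assumed names: `check_lag` and `LagStatus` are hypothetical, the chain tip would typically come from the `getSlot` RPC method, and the last indexed slot from the indexer's own checkpoint.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LagStatus:
    slots_behind: int
    alert: bool

def check_lag(chain_tip_slot: int, last_indexed_slot: int, max_lag_slots: int = 5) -> LagStatus:
    """Report how far the indexer trails the chain tip.

    The target is lag < max_lag_slots, so the alert fires at or above it.
    """
    # Clamp at zero in case the two readings were taken at different moments.
    slots_behind = max(0, chain_tip_slot - last_indexed_slot)
    return LagStatus(slots_behind=slots_behind, alert=slots_behind >= max_lag_slots)
```

For example, an indexer at slot 997 with the tip at 1000 is 3 slots behind and within target, while one at slot 995 is 5 behind and would alert.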