service-catalog-entry

Service Catalog Entry Skill

Produce a complete service catalog entry for a microservice or internal platform service — giving any engineer at the company the context they need to understand what the service does, how to depend on it, what its reliability characteristics are, and where to go when something goes wrong. A well-written catalog entry eliminates "who owns this?" and "is this safe to use?" questions that slow down teams depending on shared services.

Required Inputs

Ask for these if not already provided:
  • Service name — the canonical identifier used in code, monitoring, and deployments
  • Team and owner — team name, tech lead name, and on-call contact
  • Architecture overview — what the service does, what calls it, and what it calls
  • SLA requirements — availability target, latency SLO, support tier, and maintenance window
  • Key APIs — the most important endpoints other teams use (method, path, brief description)
  • Data handled — what data the service stores or processes, sensitivity classification, retention

Output Format

Service Catalog: [Service Name]

[One sentence — what this service does for consumers, in plain language]
e.g. "The Payments Service processes charge, refund, and subscription billing events for all Acme products."

Identity

| Field | Value |
|---|---|
| Service name | [service-name] |
| Canonical repository | [https://github.com/[org]/[repo]] |
| Owner team | [Team name] |
| Tech lead | [Name] ([Slack: @handle]) |
| On-call rotation | [PagerDuty service link] |
| Slack channel | #[team-channel] |
| Support tier | [Tier 1 — 24/7 / Tier 2 — business hours / Tier 3 — best effort] |
| Status | [Active / Deprecated / Sunset date: YYYY-MM-DD] |
| Language / runtime | [e.g. Go 1.22 / Python 3.12 / Node 20] |
| Deployment platform | [Kubernetes / ECS / Lambda / etc.] |
| Environments | [Production: URL] |

What It Does

[Two to three paragraphs in plain language — no jargon or acronyms without explanation.]
[Paragraph 1: The business problem this service solves. What would break or be missing if this service did not exist?]
[Paragraph 2: How it works at a high level — the main processing model (e.g. request/response API, event-driven consumer, batch processor), what triggers it, and what it produces.]
[Paragraph 3: What this service is NOT responsible for — the explicit boundaries. This prevents other teams from building incorrect assumptions about scope.]

Architecture Context

System Diagram

[Upstream callers]          [This Service]             [Downstream dependencies]
                                                        
  [Web App]  ──────────→                          ──→  [Primary Database — PostgreSQL]
  [Mobile API]  ────────→  [Service Name]         ──→  [Cache — Redis]
  [Partner API] ────────→  (Port 8080/gRPC)       ──→  [Message Queue — Kafka/SQS]
                                                   ──→  [External Service / API]
                           ↓ emits events to
                        [Event Bus / SNS]
                           ↓ consumed by
                  [Downstream Service A]
                  [Downstream Service B]

Who Depends on This Service

| Caller | How they use it | Contact |
|---|---|---|
| [Service / Team A] | [e.g. "Calls POST /charges to initiate payments"] | [Slack: #team-a] |
| [Service / Team B] | [e.g. "Subscribes to payment.completed events via Kafka topic"] | [Slack: #team-b] |
| [Service / Team C] | [e.g. "Calls GET /subscriptions for billing status"] | [Slack: #team-c] |

What This Service Depends On

| Dependency | Type | Criticality | Their on-call |
|---|---|---|---|
| [PostgreSQL instance] | Database | Critical — all writes fail without it | [DBA team: #db-oncall] |
| [Redis cluster] | Cache | High — latency degrades without it | [Infra team: #infra-oncall] |
| [Kafka cluster] | Message queue | High — async events queue up without it | [Infra team: #infra-oncall] |
| [Stripe API] | External API | Critical — payment processing fails | [Vendor status: status.stripe.com] |
| [Auth Service] | Internal service | Critical — all auth fails | [Auth team: #auth-oncall] |

Service Level Agreement

Availability and Latency

| SLO | Target | Measurement window | Error budget |
|---|---|---|---|
| Availability | [99.9%] | Rolling 30 days | [43 min/month] |
| p50 latency (key endpoints) | < [50] ms | Rolling 24 hours | |
| p99 latency (key endpoints) | < [500] ms | Rolling 24 hours | |
| p99.9 latency (key endpoints) | < [2000] ms | Rolling 24 hours | |
| Error rate | < [0.1]% | Rolling 1 hour | |

SLO dashboard: [Link to monitoring dashboard]
Current error budget remaining: [Link to SLO dashboard or inline value]
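The error budget is not a separate number to negotiate; it follows from the availability target and the measurement window as (1 − target) × window. A minimal Python sketch, assuming the target is expressed as a fraction (0.999 for 99.9%); the function names are illustrative and not part of any existing tooling:

```python
from datetime import timedelta


def error_budget(availability_target: float, window: timedelta) -> timedelta:
    """Allowed unavailability for a target over a window: (1 - target) * window."""
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    return window * (1.0 - availability_target)


def budget_remaining(availability_target: float, window: timedelta,
                     downtime_so_far: timedelta) -> timedelta:
    """Budget left in the current window; a negative value means the SLO is already missed."""
    return error_budget(availability_target, window) - downtime_so_far


if __name__ == "__main__":
    budget = error_budget(0.999, timedelta(days=30))
    print(f"{budget.total_seconds() / 60:.1f} min")  # 43.2 min
    left = budget_remaining(0.999, timedelta(days=30), timedelta(minutes=10))
    print(f"{left.total_seconds() / 60:.1f} min")  # 33.2 min
```

At 99.9% over a rolling 30 days this gives 43.2 minutes, which is where the table's 43 min/month comes from.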

Support Tiers

| Priority | Scope | Response time | Resolution time |
|---|---|---|---|
| P1 — Service down | All authenticated requests failing | 15 minutes | 1 hour |
| P2 — Significant degradation | Error rate >1% or p99 >2× SLO | 30 minutes | 4 hours |
| P3 — Minor issues | Non-critical endpoints degraded | Next business day | 3 business days |
| Feature requests / bugs | Via standard ticket process | [Ticket SLA] | Per roadmap |

To raise an incident: page via [PagerDuty service link] or post in #incidents.
To raise a feature request or bug: file a ticket in [JIRA project / GitHub repo Issues].
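The priority rules are mechanical enough to encode, which helps keep alert routing consistent with what the catalog promises. A minimal sketch, assuming error rate is passed as a fraction and latency in milliseconds; the function and its inputs are hypothetical and use the example thresholds from the table (1% error rate, 2× the p99 SLO):

```python
from typing import Optional


def classify_incident(all_requests_failing: bool,
                      error_rate: float,
                      p99_ms: float,
                      p99_slo_ms: float,
                      noncritical_degraded: bool = False) -> Optional[str]:
    """Map observed symptoms to the P1/P2/P3 priorities in the support table.

    Returns None when no threshold is crossed.
    """
    if all_requests_failing:
        return "P1"  # service down
    if error_rate > 0.01 or p99_ms > 2 * p99_slo_ms:
        return "P2"  # significant degradation
    if noncritical_degraded:
        return "P3"  # minor issue on non-critical endpoints
    return None


if __name__ == "__main__":
    print(classify_incident(False, error_rate=0.02, p99_ms=400, p99_slo_ms=500))  # P2
```

Checking P1 first means a full outage is never downgraded just because the error-rate condition also happens to hold.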

Maintenance Windows

  • Planned downtime: [e.g. "Sundays 02:00–04:00 UTC — advance notice posted to #[team-channel] 48h before"]
  • Deployment window: [e.g. "Weekdays 10:00–16:00 UTC — no deploys on Fridays or the day before a public holiday"]
  • Breaking changes notice: [e.g. "Minimum 30 days' notice for breaking API changes — see versioning policy below"]
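A deployment window like the example is easy to enforce as a CI gate. A minimal sketch of such a check, assuming the example rule (Monday to Thursday, 10:00 to 16:00 UTC, never on the day before a public holiday); the function name is hypothetical, and the holiday calendar is an input because which holidays apply is company-specific:

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import AbstractSet

WINDOW_START = time(10, 0)  # 10:00 UTC, inclusive
WINDOW_END = time(16, 0)    # 16:00 UTC, exclusive


def can_deploy(now: datetime, public_holidays: AbstractSet[date] = frozenset()) -> bool:
    """Return True if `now` falls inside the example deployment window."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    utc = now.astimezone(timezone.utc)
    if utc.weekday() >= 4:  # Friday (4), Saturday (5), Sunday (6)
        return False
    if utc.date() + timedelta(days=1) in public_holidays:
        return False  # day before a public holiday
    return WINDOW_START <= utc.time() < WINDOW_END


if __name__ == "__main__":
    print(can_deploy(datetime.now(timezone.utc)))
```

Requiring a timezone-aware timestamp avoids silently evaluating the window in the CI runner's local time.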

API Contract

Authentication

All API calls require: [e.g. "Bearer token via the Authorization header. Tokens are issued by the Auth Service (/api/v1/token)"]
Authorization: Bearer [jwt-token]
Content-Type: application/json
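For consumers, the two headers translate directly into request construction. A minimal standard-library sketch, assuming a token has already been obtained; the call to /api/v1/token is not shown because this entry does not describe its request format, and the host and path in the usage line are hypothetical:

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(base_url: str, path: str, token: str,
                  method: str = "GET", body: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    req = build_request("https://payments.internal.example.com", "/api/v1/charges",
                        token="example-token", method="POST", body={"amount": 1000})
    print(req.get_method(), req.full_url)  # POST https://payments.internal.example.com/api/v1/charges
    # urllib.request.urlopen(req) would send it.
```

Keeping construction separate from sending makes the header logic testable without network access.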

Base URL

| Environment | Base URL |
|---|---|
| Production | https://[service-name].internal.[company].com |
| Staging | https://[service-name].staging.[company].com |
| Local development | http://localhost:[port] |

Key Endpoints

| Method | Path | Description | Auth required | Rate limit |
|---|---|---|---|---|
| GET | /health | Liveness and readiness check | No | None |
| GET | /api/v1/[resource] | [Description — e.g. "List resources for the authenticated user"] | Yes | [100 req/min] |
| GET | /api/v1/[resource]/:id | [Description — e.g. "Get a single resource by ID"] | Yes | [500 req/min] |
| POST | /api/v1/[resource] | [Description — e.g. "Create a new resource"] | Yes | [50 req/min] |
| PUT | /api/v1/[resource]/:id | [Description — e.g. "Update an existing resource"] | Yes | [50 req/min] |
| DELETE | /api/v1/[resource]/:id | [Description] | Yes | [20 req/min] |

Full API documentation: [OpenAPI/Swagger spec URL] | [Postman collection URL]
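Per-endpoint rate limits can be respected client-side instead of being discovered through 429 responses. A minimal token-bucket sketch; this is a swapped-in client-side technique, not a description of how the service enforces its limits (the entry does not say whether the server uses a fixed window, sliding window, or bucket), so the initial full burst it allows may still be rejected by a stricter server:

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter that keeps average throughput under `per_minute`.

    Starts full, so a burst of up to `per_minute` calls is allowed, then
    refills continuously at per_minute / 60 tokens per second.
    """

    def __init__(self, per_minute: int, clock: Callable[[], float] = time.monotonic):
        if per_minute <= 0:
            raise ValueError("per_minute must be positive")
        self.capacity = float(per_minute)
        self.tokens = float(per_minute)
        self.refill_per_sec = per_minute / 60.0
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if one is available; never blocks."""
        now = self.clock()
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    create_limit = TokenBucket(per_minute=50)  # e.g. POST /api/v1/[resource]
    if not create_limit.try_acquire():
        print("would exceed the documented limit; wait before calling")
```

The injectable clock keeps the limiter deterministic in tests. One bucket per token and endpoint pair matches limits that are scoped that way.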

Versioning Policy

  • API version is in the URL path (/api/v1/, /api/v2/)
  • Minor additions (new optional fields, new endpoints) are non-breaking — no version bump
  • Breaking changes (removed fields, changed types, authentication changes) require a new major version
  • Deprecated versions are supported for [90 days] after the successor reaches GA
  • Deprecation notices are posted to #[team-channel] and emailed to registered consumers
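Two of the breaking-change rules (removed fields, changed types) can be checked mechanically before a release. A minimal sketch that compares flat field maps; the schema shape is a hypothetical simplification of a real OpenAPI schema, and authentication changes cannot be detected this way:

```python
from typing import Dict, List, TypedDict


class FieldSpec(TypedDict):
    type: str
    required: bool


Schema = Dict[str, FieldSpec]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List changes that would require a new major version under the policy."""
    problems: List[str] = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"changed type: {name} ({spec['type']} -> {new[name]['type']})")
    for name, spec in new.items():
        # Assumption: only new *optional* fields are listed as non-breaking,
        # so a new required field is treated as breaking.
        if name not in old and spec["required"]:
            problems.append(f"new required field: {name}")
    return problems
```

An empty list means the change fits within the current major version; anything else signals that a /v2/ path is needed.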

Error Response Format

```json
{
  "error": {
    "code": "[ERROR_CODE]",
    "message": "[Human-readable description]",
    "request_id": "[UUID — include in support tickets]",
    "details": {}
  }
}
```
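Because every error carries a request_id, clients should surface it in their own logs so it can be quoted in support tickets. A minimal parser sketch; the ApiError class and the UNKNOWN fallback code are illustrative and not part of the documented contract:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ApiError:
    status: int
    code: str
    message: str
    request_id: Optional[str]
    details: Dict[str, Any] = field(default_factory=dict)


def parse_error(status: int, body: str) -> ApiError:
    """Parse the standard error envelope, degrading gracefully if it is absent."""
    try:
        err = json.loads(body)["error"]
        return ApiError(status=status,
                        code=err["code"],
                        message=err["message"],
                        request_id=err.get("request_id"),
                        details=err.get("details") or {})
    except (ValueError, KeyError, TypeError):
        # Not the documented envelope, e.g. an HTML error page from a proxy.
        return ApiError(status=status, code="UNKNOWN", message=body[:200], request_id=None)
```

The fallback matters because a load balancer or proxy in front of the service may return errors that never passed through the service's own error handler.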
Common error codes:

| HTTP status | Error code | Meaning |
|---|---|---|
| 400 | INVALID_REQUEST | Request body or parameters fail validation |
| 401 | UNAUTHENTICATED | Missing or invalid auth token |
| 403 | FORBIDDEN | Token valid but lacks permission for this resource |
| 404 | NOT_FOUND | Resource does not exist |
| 409 | CONFLICT | Duplicate resource or state conflict |
| 422 | UNPROCESSABLE_ENTITY | Request is valid but violates business rules |
| 429 | RATE_LIMITED | Too many requests — back off and retry |
| 500 | INTERNAL_ERROR | Unexpected server error — include request_id in support ticket |
| 503 | SERVICE_UNAVAILABLE | Downstream dependency unavailable — retry with backoff |
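Only 429 and 503 are marked as retryable. A minimal retry sketch using exponential backoff with full jitter, assuming the transport is wrapped in a `send` callable that returns (status, body); retrying a POST can create duplicates unless the service deduplicates, which this entry does not document, and honoring a Retry-After header would be preferable if the service sends one (the entry does not say):

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status, body)
RETRYABLE_STATUSES = {429, 503}  # RATE_LIMITED, SERVICE_UNAVAILABLE


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    base_delay: float = 0.5,
                    max_delay: float = 8.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out.

    Before retry n (1-based) it sleeps a random duration in
    [0, min(max_delay, base_delay * 2 ** (n - 1))].
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, body
        sleep(random.uniform(0.0, min(max_delay, base_delay * 2 ** (attempt - 1))))
```

Full jitter spreads retries from many clients so they do not hit a recovering dependency in lockstep. A 500 is returned immediately, because the table asks for a support ticket rather than a retry.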

Events Published (if event-driven)

| Event | Topic / queue | Schema | Published when |
|---|---|---|---|
| [resource].created | [kafka-topic / sns-arn] | [Schema URL] | [When a new resource is created] |
| [resource].updated | [kafka-topic / sns-arn] | [Schema URL] | [When a resource is modified] |
| [resource].deleted | [kafka-topic / sns-arn] | [Schema URL] | [When a resource is deleted] |
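Both Kafka consumers and SQS standard queues can deliver the same message more than once, so consumers of these events should be idempotent. A minimal sketch of type-based routing with duplicate suppression; the envelope fields `id`, `type`, and `data` are hypothetical stand-ins for whatever the linked schema defines, and the in-memory set would need to be a durable store in practice:

```python
from typing import Any, Callable, Dict, Set

Event = Dict[str, Any]  # assumed envelope: {"id": ..., "type": ..., "data": {...}}


class IdempotentDispatcher:
    """Routes events by type and skips event IDs it has already processed."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Dict[str, Any]], None]] = {}
        self.processed: Set[str] = set()  # in-memory only, for illustration

    def on(self, event_type: str, handler: Callable[[Dict[str, Any]], None]) -> None:
        self.handlers[event_type] = handler

    def dispatch(self, event: Event) -> bool:
        """Return True if a handler ran; False for duplicates and unknown types."""
        if event["id"] in self.processed:
            return False
        handler = self.handlers.get(event["type"])
        if handler is None:
            return False
        handler(event["data"])
        # Record only after the handler succeeds, so a crash leads to a retry, not a loss.
        self.processed.add(event["id"])
        return True
```

Marking the event as processed after the handler runs trades a possible re-run on crash for never silently dropping an event.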

Data Classification

| Data element | Sensitivity | Stored in | Retention | Encrypted at rest |
|---|---|---|---|---|
| [User PII — e.g. email, name] | [PII / Restricted] | [PostgreSQL users table] | [Until account deletion] | Yes |
| [Financial data — e.g. card last 4] | [PCI / Highly restricted] | [PostgreSQL payment_methods table] | [7 years per regulations] | Yes — field-level encryption |
| [Operational logs] | [Internal] | [CloudWatch / Datadog] | [90 days] | Yes (at rest, not searched) |
| [Anonymised analytics] | [Public] | [Data warehouse] | [Indefinite] | Yes |

Data residency: [e.g. "All data stored in us-east-1. EU customer data stored in eu-west-1 per GDPR requirements."]
Compliance scope: [e.g. SOC 2 Type II / PCI DSS Level 2 / HIPAA / GDPR]
Data access policy: [e.g. "Production database access requires [approval process]. Access logged and reviewed quarterly."]
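Retention periods only matter if a cleanup job enforces them. A minimal sketch of the expiry check behind such a job, using the example rows; the category keys are hypothetical, and counting "7 years" as calendar years from record creation is an assumption (the governing regulation may specify a different starting point):

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Hypothetical category keys mirroring the example rows; values are
# (unit, amount), or None for "indefinite".
RETENTION: Dict[str, Optional[Tuple[str, int]]] = {
    "operational_logs": ("days", 90),
    "financial_data": ("years", 7),
    "anonymised_analytics": None,
}


def add_years(dt: datetime, years: int) -> datetime:
    """Calendar-year addition; 29 Feb maps to 28 Feb in a non-leap target year."""
    try:
        return dt.replace(year=dt.year + years)
    except ValueError:
        return dt.replace(year=dt.year + years, day=28)


def past_retention(category: str, created_at: datetime, now: datetime,
                   account_deleted: bool = False) -> bool:
    """True if a record in `category` is due for deletion at `now`."""
    if category == "user_pii":
        return account_deleted  # retained until account deletion
    policy = RETENTION[category]
    if policy is None:
        return False  # indefinite retention
    unit, amount = policy
    if unit == "days":
        expiry = created_at + timedelta(days=amount)
    else:
        expiry = add_years(created_at, amount)
    return now >= expiry
```

Using calendar years rather than 7 × 365 days avoids drifting a few days early across leap years.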

Operational Runbooks

| Runbook | Location | Use when |
|---|---|---|
| On-call runbook | [Wiki / GitHub link] | Responding to PagerDuty alerts |
| Deployment runbook | [Wiki / GitHub link] | Deploying a new version to production |
| Database migration runbook | [Wiki / GitHub link] | Running schema migrations |
| Rollback runbook | [Wiki / GitHub link] | Rolling back a bad deploy |
| Incident response runbook | [Wiki / GitHub link] | Declaring and managing incidents |
| Disaster recovery plan | [Wiki / GitHub link] | Zone/region failure or data loss |

Monitoring dashboards:

| Dashboard | Link | Use it for |
|---|---|---|
| Service overview | [Datadog / Grafana link] | Error rate, latency, throughput |
| Infrastructure | [Link] | CPU, memory, pod health |
| Database | [Link] | Query performance, connection pool |
| SLO / error budget | [Link] | Budget burn rate, availability |
| Dependency health | [Link] | Downstream dependency status |

Known Limitations

Document limitations honestly — this section prevents other teams from building on incorrect assumptions.
| Limitation | Impact | Workaround | Planned fix |
|---|---|---|---|
| [e.g. No bulk write API — items must be created one at a time] | [Slow for large imports — N HTTP calls required] | [Use the batch import CLI tool for >100 items] | [Bulk API in Q3 — ticket: [URL]] |
| [e.g. List endpoints have a maximum page size of 100] | [Cannot retrieve more than 100 items in a single call] | [Paginate using the cursor parameter] | [No current plan to increase — by design] |
| [e.g. Rate limits are per-token, not per-service] | [A high-traffic consumer can exhaust the limit for other consumers sharing the same token] | [Request a dedicated service-account token] | [Per-service rate limits on the roadmap] |
| [e.g. List endpoints are eventually consistent after writes] | [A newly created record may not appear in list results immediately (<500 ms lag)] | [Use GET /:id to confirm creation; do not rely on list endpoints for immediate consistency] | [Read-your-writes consistency via ?consistent=true — in progress] |
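The page-size cap is usually handled with a cursor loop that keeps requesting pages until no cursor comes back. A minimal sketch, assuming a `fetch_page(cursor, limit)` callable wrapping the list endpoint; the entry only names the `cursor` parameter, so the response fields `items` and `next_cursor` are hypothetical:

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed page shape: {"items": [...], "next_cursor": "<opaque string>" or None}
Page = Dict[str, Any]
MAX_PAGE_SIZE = 100  # documented maximum


def iterate_all(fetch_page: Callable[[Optional[str], int], Page],
                page_size: int = MAX_PAGE_SIZE) -> Iterator[Any]:
    """Yield every item across pages, following the cursor until it runs out."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            return


if __name__ == "__main__":
    data = list(range(250))

    def fake_fetch(cursor: Optional[str], limit: int) -> Page:
        # Stand-in for GET /api/v1/[resource]?cursor=...; not a real client.
        start = int(cursor) if cursor else 0
        end = start + limit
        return {"items": data[start:end], "next_cursor": str(end) if end < len(data) else None}

    print(len(list(iterate_all(fake_fetch))))  # 250
```

Treating the cursor as an opaque string, rather than computing offsets client-side, keeps the client correct if the service changes its cursor encoding.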

Getting Started

To start using this service:
  1. Request access: [Link to access request form or instructions]
  2. Get your service account credentials: [Link to process]
  3. Read the API docs: [OpenAPI spec URL]
  4. Try the sandbox environment: https://[service-name].sandbox.[company].com
  5. Join the consumer Slack channel: #[service-name]-consumers

Client libraries (if available):

| Language | Package | Installation |
|---|---|---|
| [Python] | [package-name] | pip install [package-name] |
| [Go] | github.com/[org]/[package] | go get github.com/[org]/[package] |
| [TypeScript/JS] | @[org]/[package] | npm install @[org]/[package] |

Quality Checks

  • "What It Does" is written without jargon — a new engineer from another team can understand it in under 2 minutes
  • SLO targets are specific numbers agreed with stakeholders — not aspirational or copied from a template
  • All direct upstream consumers are listed in the "Who Depends on This" table — no omissions
  • API error codes are accurate and tested — not aspirational documentation
  • Known limitations are honest — nothing is glossed over to make the service look better than it is
  • All runbook links are live — not broken references or TODO placeholders
  • Data classification includes retention period and encryption status — not just sensitivity level
  • The entry has been reviewed by at least one consumer team to confirm it matches their experience of the service
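The "no TODO placeholders" check can be partly automated before review. A minimal linter sketch that flags unfilled bracketed template fields and literal TODOs; the regex is an assumption about how placeholders look in this template, it will also flag legitimate bracketed text, and it does not check whether links are live (that needs separate HTTP requests):

```python
import re
from typing import List, Tuple

# A bracketed template field like "[Team name]" (but not a Markdown link
# "[text](url)"), or a literal TODO.
PLACEHOLDER = re.compile(r"\[[^\[\]\n]*\](?!\()|\bTODO\b")


def find_placeholders(text: str) -> List[Tuple[int, str]]:
    """Return (line number, match) for every unfilled placeholder in `text`."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    entry = "Owner team: [Team name]\nRunbook: TODO"
    for lineno, token in find_placeholders(entry):
        print(f"line {lineno}: unfilled {token}")
```

Nested placeholders such as [https://github.com/[org]/[repo]] are reported through their innermost fields, which is enough to fail the check.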