service-catalog-entry
Service Catalog Entry Skill
Produce a complete service catalog entry for a microservice or internal platform service — giving any engineer at the company the context they need to understand what the service does, how to depend on it, what its reliability characteristics are, and where to go when something goes wrong. A well-written catalog entry eliminates "who owns this?" and "is this safe to use?" questions that slow down teams depending on shared services.
Required Inputs
必填输入信息
Ask for these if not already provided:
- Service name — the canonical identifier used in code, monitoring, and deployments
- Team and owner — team name, tech lead name, and on-call contact
- Architecture overview — what the service does, what calls it, and what it calls
- SLA requirements — availability target, latency SLO, support tier, and maintenance window
- Key APIs — the most important endpoints other teams use (method, path, brief description)
- Data handled — what data the service stores or processes, sensitivity classification, retention
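The intake step above amounts to "ask for whatever is missing". Here is a minimal sketch, assuming the inputs arrive as a plain dictionary. The key names are hypothetical labels for the six inputs listed above, not a fixed schema.

```python
# Hypothetical keys for the six required inputs, mapped to the labels used above.
REQUIRED_INPUTS = {
    "service_name": "Service name",
    "team_and_owner": "Team and owner",
    "architecture_overview": "Architecture overview",
    "sla_requirements": "SLA requirements",
    "key_apis": "Key APIs",
    "data_handled": "Data handled",
}


def _is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty collections as not provided."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, dict, tuple, set)):
        return len(value) == 0
    return False


def missing_inputs(provided: dict) -> list:
    """Return the labels of required inputs that still need to be asked for."""
    return [label for key, label in REQUIRED_INPUTS.items() if _is_blank(provided.get(key))]


if __name__ == "__main__":
    draft = {"service_name": "payments", "key_apis": ["POST /charges"]}
    for label in missing_inputs(draft):
        print(f"Please provide: {label}")
```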
Output Format
Service Catalog: [Service Name]
[One sentence: what this service does for consumers, in plain language] e.g. "The Payments Service processes charge, refund, and subscription billing events for all Acme products."
Identity
| Field | Value |
|---|---|
| Service name | [service-name] |
| Canonical repository | [https://github.com/[org]/[repo]] |
| Owner team | [Team name] |
| Tech lead | [Name] ([Slack: @handle]) |
| On-call rotation | [PagerDuty service link] |
| Slack channel | #[team-channel] |
| Support tier | [Tier 1 — 24/7 / Tier 2 — business hours / Tier 3 — best effort] |
| Status | [Active / Deprecated / Sunset date: YYYY-MM-DD] |
| Language / runtime | [e.g. Go 1.22 / Python 3.12 / Node 20] |
| Deployment platform | [Kubernetes / ECS / Lambda / etc.] |
| Environments | [Production: URL] |
What It Does
[Two to three paragraphs in plain language — no jargon or acronyms without explanation.]
[Paragraph 1: The business problem this service solves. What would break or be missing if this service did not exist?]
[Paragraph 2: How it works at a high level — the main processing model (e.g. request/response API, event-driven consumer, batch processor), what triggers it, and what it produces.]
[Paragraph 3: What this service is NOT responsible for — the explicit boundaries. This prevents other teams from building incorrect assumptions about scope.]
Architecture Context
System Diagram
```
[Upstream callers]         [This Service]            [Downstream dependencies]

[Web App] ─────────→                           ──→ [Primary Database — PostgreSQL]
[Mobile API] ──────→   [Service Name]          ──→ [Cache — Redis]
[Partner API] ─────→   (Port 8080/gRPC)        ──→ [Message Queue — Kafka/SQS]
                                               ──→ [External Service / API]
                              ↓ emits events to
                       [Event Bus / SNS]
                              ↓ consumed by
                       [Downstream Service A]
                       [Downstream Service B]
```
Who Depends on This Service
| Caller | How they use it | Contact |
|---|---|---|
| [Service / Team A] | [e.g. "Calls POST /charges to initiate payments"] | [Slack: #team-a] |
| [Service / Team B] | [e.g. "Subscribes to payment.completed events via Kafka topic"] | [Slack: #team-b] |
| [Service / Team C] | [e.g. "Calls GET /subscriptions for billing status"] | [Slack: #team-c] |
What This Service Depends On
| Dependency | Type | Criticality | Their on-call |
|---|---|---|---|
| [PostgreSQL instance] | Database | Critical — all writes fail without it | [DBA team: #db-oncall] |
| [Redis cluster] | Cache | High — latency degrades without it | [Infra team: #infra-oncall] |
| [Kafka cluster] | Message queue | High — async events back up and are delivered late | [Infra team: #infra-oncall] |
| [Stripe API] | External API | Critical — payment processing fails | [vendor status: status.stripe.com] |
| [Auth Service] | Internal service | Critical — all authenticated requests fail | [Auth team: #auth-oncall] |
Service Level Agreement
Availability and Latency
| SLO | Target | Measurement window | Error budget |
|---|---|---|---|
| Availability | [99.9%] | Rolling 30 days | [43 min/month] |
| p50 latency (key endpoints) | < [50] ms | Rolling 24 hours | — |
| p99 latency (key endpoints) | < [500] ms | Rolling 24 hours | — |
| p99.9 latency (key endpoints) | < [2000] ms | Rolling 24 hours | — |
| Error rate | < [0.1]% | Rolling 1 hour | — |
SLO dashboard: [Link to monitoring dashboard]
Current error budget remaining: [Link to SLO dashboard or inline value]
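The Error budget column follows directly from the availability target and the measurement window. Here is a small helper, assuming a 30-day window and downtime measured in minutes. A request-based SLO would count failed requests instead of minutes.

```python
def error_budget_minutes(availability_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target over a rolling window."""
    if not 0 < availability_target < 1:
        raise ValueError("availability_target must be a fraction, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - availability_target)


def budget_remaining(availability_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget_minutes(availability_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%}: {error_budget_minutes(target):.1f} min per 30 days")
```

For the 99.9% example in the table, this gives 30 × 24 × 60 × 0.001 = 43.2 minutes, which matches the [43 min/month] placeholder.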
Support Tiers
| Severity | Scope | Response time | Resolution time |
|---|---|---|---|
| P1 — Service down | All authenticated requests failing | 15 minutes | 1 hour |
| P2 — Significant degradation | Error rate >1% or p99 >2× SLO | 30 minutes | 4 hours |
| P3 — Minor issues | Non-critical endpoints degraded | Next business day | 3 business days |
| Feature requests / bugs | Via standard ticket process | [Ticket SLA] | Per roadmap |
To raise an incident: Page via [PagerDuty service link] or post in #incidents.
To raise a feature request or bug: File a ticket in [JIRA project / GitHub repo Issues].
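You can also write the severity criteria above as code, so that alert rules and on-call engineers classify incidents the same way. Here is a minimal sketch with some assumptions built in. It uses the example thresholds from the table: an error rate above 1%, or p99 above twice the 500 ms SLO placeholder. It also relies on a hypothetical ServiceHealth snapshot. A real classifier would read these values from your monitoring system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceHealth:
    # Hypothetical snapshot of current health, e.g. assembled from dashboard queries.
    all_authenticated_requests_failing: bool
    error_rate: float          # fraction, e.g. 0.015 means 1.5%
    p99_latency_ms: float
    non_critical_degraded: bool


def classify_severity(health: ServiceHealth, p99_slo_ms: float = 500.0) -> Optional[str]:
    """Map a health snapshot to P1/P2/P3 using the table's criteria, or None if healthy."""
    if health.all_authenticated_requests_failing:
        return "P1"
    if health.error_rate > 0.01 or health.p99_latency_ms > 2 * p99_slo_ms:
        return "P2"
    if health.non_critical_degraded:
        return "P3"
    return None
```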
Maintenance Windows
- Planned downtime: [e.g. "Sundays 02:00–04:00 UTC — advance notice posted to #[team-channel] 48h before"]
- Deployment window: [e.g. "Weekdays 10:00–16:00 UTC — no deploys on Fridays or the day before a public holiday"]
- Breaking changes notice: [e.g. "Minimum 30 days' notice for breaking API changes — see versioning policy below"]
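If CI enforces the deployment window instead of leaving it to convention, a gate like the one below could run before each production deploy. This sketch assumes the example window above: 10:00 to 16:00 UTC, Monday to Thursday. It also assumes public holidays themselves are excluded. The hard-coded holiday list is hypothetical, and a real gate would load holidays from a calendar source.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical holiday list; replace with your company's calendar.
PUBLIC_HOLIDAYS = {date(2025, 12, 25), date(2026, 1, 1)}


def deploy_allowed(now: datetime, holidays: set = PUBLIC_HOLIDAYS) -> bool:
    """True if `now` (a timezone-aware datetime) falls inside the example deploy window."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now = now.astimezone(timezone.utc)
    today = now.date()
    if now.weekday() >= 4:  # Friday (4), Saturday (5), Sunday (6)
        return False
    if today in holidays or today + timedelta(days=1) in holidays:
        return False  # no deploys on a holiday or the day before one
    return 10 <= now.hour < 16  # from 10:00 up to, but not including, 16:00 UTC


if __name__ == "__main__":
    print(deploy_allowed(datetime.now(timezone.utc)))
```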
API Contract
Authentication
All API calls require: [e.g. "Bearer token via Authorization header. Tokens are issued by the Auth Service (/api/v1/token)"]
```
Authorization: Bearer [jwt-token]
Content-Type: application/json
```
Base URL
| Environment | Base URL |
|---|---|
| Production | |
| Staging | |
| Local development | |
Key Endpoints
| Method | Path | Description | Auth required | Rate limit |
|---|---|---|---|---|
| | | Liveness and readiness check | No | None |
| | | [Description — e.g. "List resources for the authenticated user"] | Yes | [100 req/min] |
| | | [Description — e.g. "Get a single resource by ID"] | Yes | [500 req/min] |
| | | [Description — e.g. "Create a new resource"] | Yes | [50 req/min] |
| | | [Description — e.g. "Update an existing resource"] | Yes | [50 req/min] |
| | | [Description] | Yes | [20 req/min] |
Full API documentation: [OpenAPI/Swagger spec URL] | [Postman collection URL]
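Because the rate limits are published per endpoint, a consumer can pace itself instead of waiting for 429 responses. Below is a minimal client-side token bucket. It assumes the limit is counted per minute and uses the table's [50 req/min] placeholder in the usage example. The server's own algorithm (fixed window, sliding window, or token bucket) may differ. Letting the bucket start full with one minute's quota is a design choice, and a smaller capacity smooths bursts.

```python
import threading
import time
from typing import Callable, Optional


class TokenBucket:
    """Client-side pacing: bursts up to `capacity`, refilling at `rate_per_minute`."""

    def __init__(self, rate_per_minute: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        if rate_per_minute <= 0:
            raise ValueError("rate_per_minute must be positive")
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = capacity if capacity is not None else rate_per_minute
        self.tokens = self.capacity         # start with a full bucket
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self) -> None:
        """Block until a token is available."""
        while not self.try_acquire():
            time.sleep(1.0 / self.rate)  # roughly one token's refill time
```

Usage: create one bucket per endpoint and token, for example `create_limiter = TokenBucket(50)`, and call `create_limiter.acquire()` before each create request.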
Versioning Policy
- API version is in the URL path (/api/v1/, /api/v2/)
- Minor additions (new optional fields, new endpoints) are non-breaking and need no version bump
- Breaking changes (removed fields, changed types, authentication changes) require a new major version
- Deprecated versions are supported for [90 days] after the successor reaches GA
- Deprecation notices are posted to #[team-channel] and emailed to registered consumers
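The breaking versus non-breaking rule above can be checked automatically when a response schema changes. This simplified sketch assumes each schema is flattened to a mapping of field name to type name. It treats every new field as optional, and it ignores authentication changes, which need a human decision. The 90-day support period is the placeholder above.

```python
from datetime import date, timedelta


def is_breaking_change(old: dict, new: dict) -> bool:
    """Removed fields or changed types are breaking; added fields are treated as additive."""
    for field, old_type in old.items():
        if field not in new:
            return True  # field removed
        if new[field] != old_type:
            return True  # type changed
    return False  # only additions, or no change at all


def deprecation_end(successor_ga: date, support_days: int = 90) -> date:
    """Date on which support for the deprecated version ends."""
    return successor_ga + timedelta(days=support_days)
```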
Error Response Format
```json
{
  "error": {
    "code": "[ERROR_CODE]",
    "message": "[Human-readable description]",
    "request_id": "[UUID — include in support tickets]",
    "details": {}
  }
}
```
Common error codes:
| HTTP status | Error code | Meaning |
|---|---|---|
| 400 | | Request body or parameters fail validation |
| 401 | | Missing or invalid auth token |
| 403 | | Token valid but lacks permission for this resource |
| 404 | | Resource does not exist |
| 409 | | Duplicate resource or state conflict |
| 422 | | Request is valid but violates business rules |
| 429 | | Too many requests — back off and retry |
| 500 | | Unexpected server error — include request_id in support ticket |
| 503 | | Downstream dependency unavailable — retry with backoff |
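This consumer-side sketch puts the contract above into practice. It sends the Bearer token and parses the error envelope. It retries only 429 and 503, using exponential backoff with full jitter, and it treats 500 as final so the request_id can go into a support ticket. It uses only the Python standard library. The URL, token, timeout, and retry limits are hypothetical. It honours Retry-After only when the header is given in whole seconds.

```python
import json
import random
import time
import urllib.error
import urllib.request

RETRYABLE_STATUSES = {429, 503}  # per the error table: back off and retry


def parse_error(body: bytes) -> dict:
    """Extract code, message, and request_id from the standard error envelope."""
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON (JSONDecodeError and UnicodeDecodeError both subclass ValueError)
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {
        "code": error.get("code"),
        "message": error.get("message"),
        "request_id": error.get("request_id"),  # include this in support tickets
    }


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def get_json(url: str, token: str, max_attempts: int = 5, sleep=time.sleep) -> dict:
    """GET a JSON resource, retrying 429/503 and raising on any other error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    for attempt in range(max_attempts):
        request = urllib.request.Request(url, headers=headers, method="GET")
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as exc:
            err = parse_error(exc.read() if exc.fp else b"")
            if exc.code not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise RuntimeError(
                    f"{exc.code} {err['code']}: {err['message']} (request_id={err['request_id']})"
                ) from exc
            retry_after = exc.headers.get("Retry-After")
            # Fall back to jittered backoff when Retry-After is absent or an HTTP date.
            delay = float(retry_after) if retry_after and retry_after.isdigit() else backoff_delay(attempt)
            sleep(delay)
    raise AssertionError("unreachable: every attempt returns or raises")
```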
Events Published (if event-driven)
| Event | Topic / Queue | Schema | Published when |
|---|---|---|---|
| | | [Schema URL] | [When a new resource is created] |
| | | [Schema URL] | [When a resource is modified] |
| | | [Schema URL] | [When a resource is deleted] |
Data Classification
| Data element | Sensitivity | Stored in | Retention | Encrypted at rest |
|---|---|---|---|---|
| [User PII — e.g. email, name] | [PII / Restricted] | [PostgreSQL] | [Until account deletion] | Yes |
| [Financial data — e.g. card last 4] | [PCI / Highly restricted] | [PostgreSQL] | [7 years per regulations] | Yes — field-level encryption |
| [Operational logs] | [Internal] | [CloudWatch / Datadog] | [90 days] | Yes (at rest, not searched) |
| [Anonymised analytics] | [Public] | [Data warehouse] | [Indefinite] | Yes |
Data residency: [e.g. "All data stored in us-east-1. EU customer data stored in eu-west-1 per GDPR requirements."]
Compliance scope: [e.g. SOC 2 Type II / PCI DSS Level 2 / HIPAA / GDPR]
Data access policy: [e.g. "Production database access requires [approval process]. Access logged and reviewed quarterly."]
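Retention periods are easier to audit when a cleanup job reads the same table as configuration. This sketch uses a hypothetical mapping from data element to retention days, filled with the placeholder periods above. It approximates 7 years as 7 × 365 days and ignores leap days. Elements kept "until account deletion" or "indefinite" have no time-based expiry, so deleting them needs a separate trigger.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical config mirroring the Data Classification table; None means no time-based expiry.
RETENTION_DAYS = {
    "user_pii": None,              # kept until account deletion
    "financial_data": 7 * 365,     # 7 years per regulations (approximate)
    "operational_logs": 90,
    "anonymised_analytics": None,  # indefinite
}


def is_expired(data_element: str, created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if a record of this data element has outlived its retention period."""
    days = RETENTION_DAYS[data_element]  # KeyError for unclassified data is deliberate
    if days is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - created_at > timedelta(days=days)
```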
Operational Runbooks
| Runbook | Location | Use when |
|---|---|---|
| On-call runbook | [Wiki / GitHub link] | Responding to PagerDuty alerts |
| Deployment runbook | [Wiki / GitHub link] | Deploying a new version to production |
| Database migration runbook | [Wiki / GitHub link] | Running schema migrations |
| Rollback runbook | [Wiki / GitHub link] | Rolling back a bad deploy |
| Incident response runbook | [Wiki / GitHub link] | Declaring and managing incidents |
| Disaster recovery plan | [Wiki / GitHub link] | Zone/region failure or data loss |
Monitoring dashboards:
| Dashboard | Link | Use it for |
|---|---|---|
| Service overview | [Datadog / Grafana link] | Error rate, latency, throughput |
| Infrastructure | [Link] | CPU, memory, pod health |
| Database | [Link] | Query performance, connection pool |
| SLO / error budget | [Link] | Budget burn rate, availability |
| Dependency health | [Link] | Status of the services this service depends on |
Known Limitations
Document limitations honestly — this section prevents other teams from building on incorrect assumptions.
| Limitation | Impact | Workaround | Planned fix |
|---|---|---|---|
| [e.g. No bulk write API — items must be created one at a time] | [Slow for large imports — N HTTP calls required] | [Use the batch import CLI tool for >100 items] | [Bulk API in Q3 — ticket: [URL]] |
| [e.g. List endpoints have a maximum page size of 100] | [Cannot retrieve more than 100 items in a single call] | [Paginate using [pagination parameter]] | [No current plan to increase — by design] |
| [e.g. Rate limits are per-token, not per-service] | [One high-traffic consumer can exhaust the limit for other consumers sharing the same token] | [Request a dedicated service-account token] | [Per-service rate limits in roadmap] |
| [e.g. Eventual consistency on read-after-write for list endpoints] | [Record may not appear in list immediately after creation (<500ms lag)] | [Use GET /:id to confirm creation; do not rely on list for immediate consistency] | [Read-your-writes consistency available via [request option]] |
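You can wrap the pagination workaround once in a client helper, so callers never assume a single call returns everything. This sketch assumes a hypothetical `fetch_page(cursor, limit)` function. It returns one page of items and the next cursor, or None on the last page. The real parameter names are whatever the list endpoints document.

```python
from typing import Callable, Iterator, Optional, Tuple

MAX_PAGE_SIZE = 100  # documented maximum page size for list endpoints

# Hypothetical signature: fetch_page(cursor, limit) -> (items, next_cursor)
FetchPage = Callable[[Optional[str], int], Tuple[list, Optional[str]]]


def iterate_all(fetch_page: FetchPage, page_size: int = MAX_PAGE_SIZE) -> Iterator:
    """Yield every item across pages, following next_cursor until it is None."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            return
```

Because `iterate_all` is a generator, the page-size check runs when iteration starts, not when the function is called.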
Getting Started
To start using this service:
- Request access: [Link to access request form or instructions]
- Get your service account credentials: [Link to process]
- Read the API docs: [OpenAPI spec URL]
- Try the sandbox environment: https://[service-name].sandbox.[company].com
- Join the consumer Slack channel: #[service-name]-consumers
Client libraries (if available):
| Language | Package | Installation |
|---|---|---|
| [Python] | [package name] | [pip install command] |
| [Go] | [module path] | [go get command] |
| [TypeScript/JS] | [package name] | [npm install command] |
Quality Checks
- "What It Does" is written without jargon — a new engineer from another team can understand it in under 2 minutes
- SLO targets are specific numbers agreed with stakeholders — not aspirational or copied from a template
- All direct upstream consumers are listed in the "Who Depends on This" table — no omissions
- API error codes are accurate and tested — not aspirational documentation
- Known limitations are honest — nothing is glossed over to make the service look better than it is
- All runbook links are live — not broken references or TODO placeholders
- Data classification includes retention period and encryption status — not just sensitivity level
- The entry has been reviewed by at least one consumer team to confirm it matches their experience of the service
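Some of these checks can be automated before human review. This rough sketch assumes the entry is a text or Markdown file that uses this template's section names as heading lines, with or without leading # markers. It flags missing sections, TODO markers, and leftover bracketed placeholders such as [Link]. It skips Markdown links like [text](url). It cannot judge accuracy or honesty, and it does not check whether links resolve, so it supplements the consumer review rather than replacing it.

```python
import re

REQUIRED_SECTIONS = [
    "Identity",
    "What It Does",
    "Architecture Context",
    "Service Level Agreement",
    "API Contract",
    "Data Classification",
    "Operational Runbooks",
    "Known Limitations",
    "Getting Started",
]

# Innermost [bracketed] text not followed by "(" (so Markdown links are skipped).
PLACEHOLDER = re.compile(r"\[[^\[\]]*\](?!\()")


def lint_entry(markdown: str) -> list:
    """Return human-readable problems found in a catalog entry."""
    problems = []
    lines = markdown.splitlines()
    headings = {line.lstrip("#").strip() for line in lines}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"Missing section: {section}")
    for number, line in enumerate(lines, start=1):
        if "TODO" in line:
            problems.append(f"Line {number}: TODO placeholder")
        for match in PLACEHOLDER.finditer(line):
            problems.append(f"Line {number}: unfilled placeholder {match.group(0)}")
    return problems
```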