Dstl8: AI-Native Observability Skill
Dstl8 distills logs across dev, staging, and production into root cause
analysis, impact assessment, and fix recommendations. All environments
are queryable through the Dstl8 MCP server using the same tools.
Setup gate
Before running any workflow below, verify Dstl8 is set up:
- CLI installed and authenticated (`dstl8 profiles` shows an active profile)
- At least one source connected and ingesting (`dstl8 sources` lists it)
- MCP server installed and the AI client restarted (`dstl8 install status`)

If any of these are missing, read `setup.md` from this skill directory
and complete setup first. Do not attempt setup from memory.

If Dstl8 tools aren't visible even after setup is reportedly complete:
"I don't see a Dstl8 MCP server connected. Check `dstl8 install status`, restart your AI client, or re-run setup. See `setup.md`."
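The checklist above can be sketched as a small pre-flight helper. This is a minimal sketch, not part of Dstl8: the three commands are the ones named in the checklist, but treating a non-zero exit code as "check failed" is an assumption, and the helper names (`run_command`, `failed_setup_checks`) are hypothetical. A clean exit is not proof of readiness; the output still has to show an active profile and a listed source.

```python
import subprocess
from typing import Callable, List, Tuple

# Commands come from the checklist; the exit-code convention is an assumption.
SETUP_CHECKS: List[Tuple[str, List[str]]] = [
    ("CLI installed and authenticated", ["dstl8", "profiles"]),
    ("At least one source connected and ingesting", ["dstl8", "sources"]),
    ("MCP server installed", ["dstl8", "install", "status"]),
]


def run_command(argv: List[str]) -> int:
    """Run a command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(argv, capture_output=True, text=True).returncode
    except FileNotFoundError:
        return 127


def failed_setup_checks(
    runner: Callable[[List[str]], int] = run_command,
) -> List[str]:
    """Return the labels of checks whose command did not exit cleanly."""
    return [label for label, argv in SETUP_CHECKS if runner(argv) != 0]
```

If `failed_setup_checks()` returns a non-empty list, stop and follow `setup.md` rather than continuing with a workflow. The `runner` argument exists only so the logic can be exercised without the CLI installed.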
在运行以下任何工作流之前,请确认Dstl8已完成设置:
- CLI已安装并完成认证(显示活跃配置文件)
dstl8 profiles - 至少已连接一个数据源并正在采集数据(可列出该数据源)
dstl8 sources - MCP服务器已安装且AI客户端已重启()
dstl8 install status
如果缺少上述任何一项,请阅读本Skill目录下的并先完成设置。请勿凭记忆尝试设置。
setup.md如果完成设置后仍无法看到Dstl8工具:
"未检测到已连接的Dstl8 MCP服务器。请检查,重启AI客户端,或重新运行设置流程。详见dstl8 install status。"setup.md
Tool surface preference
This skill exposes Dstl8 functionality through two surfaces. Default
correctly between them; the wrong choice wastes turns and produces
worse answers.

MCP tools (`query_log_samples`, `list_incidents`, `query_patterns`,
`get_sentiment_heatmap`, `query_insights_params`, `search_nodes`, etc.)
are the right surface for investigation, queries, incident triage,
and any run-time use of the data. These are the high-leverage tools
the user installed Dstl8 to get. Default here for any question shaped
like "show me X", "what happened with Y", "why is Z broken",
"investigate W", "did my deploy fix it", "what's going on in prod".

CLI via bash (`dstl8 profiles`, `dstl8 sources`, `dstl8 install`,
`dstl8 logs fetch`, etc.) is for setup, configuration, source
management, and installation. Rare, admin-flavored actions.

If a user explicitly asks for the CLI ("run dstl8 sources" / "use the
CLI to..."), use bash. Otherwise, when both surfaces could serve the
question, prefer MCP. `dstl8 logs fetch` via bash is a fallback for
when MCP is unavailable, not a default.

When MCP isn't loaded, prefer asking the user to restart over substituting via CLI. If the user asks an investigation question and MCP tools aren't available in the session (e.g., they just signed up and Claude Code hasn't been restarted yet), tell them directly: "MCP tools aren't loaded in this session; restart Claude Code and ask again." Don't paper over it with parallel `dstl8 logs fetch` calls. That produces a degraded answer and burns turns. CLI fallback is fine for setup verification (e.g., `dstl8 logs fetch -n 5` to confirm ingestion), but not for investigation flows.
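The surface rules above reduce to a short decision function. The sketch below is illustrative only: the `Surface` enum, the `choose_surface` name, and the two task labels (`"setup"`, `"investigation"`) are hypothetical and not part of Dstl8.

```python
from enum import Enum


class Surface(Enum):
    MCP = "mcp"                  # investigation, queries, incident triage
    CLI = "cli"                  # setup, configuration, source management
    ASK_RESTART = "ask_restart"  # tell the user to restart the AI client


def choose_surface(task: str, mcp_available: bool,
                   user_asked_for_cli: bool = False) -> Surface:
    """Pick a surface for a task labelled 'setup' or 'investigation'."""
    if task not in ("setup", "investigation"):
        raise ValueError(f"unknown task label: {task!r}")
    # An explicit CLI request or an admin-flavored task goes to bash.
    if user_asked_for_cli or task == "setup":
        return Surface.CLI
    # Investigation prefers MCP whenever it is loaded.
    if mcp_available:
        return Surface.MCP
    # Investigation without MCP: ask for a restart, don't substitute the CLI.
    return Surface.ASK_RESTART
```

The last branch is the one that is easy to get wrong: an investigation question without MCP yields a restart request, not a CLI fallback.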
Starting moves
初始操作
Most workflows start with one of these:
| Start with | When |
|---|---|
| `query_insights_params` | You need to discover available environments, services, or time ranges. Good default first call. |
| `list_incidents` | "What's going on?": get active incidents |
| `get_sentiment_heatmap` | Quick health pulse across services |
| `query_log_samples` | "Why is X broken?": find specific errors |
Entry patterns
"What's going on?": Situational awareness
`query_insights_params` → `list_incidents` → `get_sentiment_heatmap`

"Why is X broken?": Targeted investigation
`query_log_samples` → `query_patterns` → `list_incidents`

"Check staging" / "Check production" / "Check <env>": Environment-specific
`query_insights_params` → `query_log_samples` → `query_patterns` → `get_anomalies`

"Did my deploy fix it?": Verification
`get_current_time` → `query_severity_data` → `query_sentiment_data` → `get_anomalies`

"I'm about to make changes": Pre-coding context (Loop 1)
`search_nodes` → `list_incidents` → `query_patterns`

Defensive patterns
- Several tools require `group_by`. `query_patterns`, `query_summary`, `query_severity_data`, `query_sentiment_data`, and `get_sentiment_heatmap` all need a `group_by` parameter (typically `service` or `environment`). They'll fail without it.
- CRITICAL: `list_incident_events` MUST include a state or time range filter. Unfiltered calls return 10-15k tokens and blow up context. NEVER call it without passing `state` (e.g. `state: "open"`) or `start`/`end` timestamps. If the filtered response is still large (>5k tokens), use a narrower time window or pipe the response through a local script to extract what you need rather than re-fetching.
- Discover, don't guess. Call `query_insights_params` when unsure about environment or service names.
- CLI time flag is `--start`, not `--since`. `dstl8 logs fetch` and `dstl8 logs tail` accept `--start <duration>` (e.g., `--start 1h`, `--start 24h`, `--start 7d`) and `--end <duration>`. Don't use `--since`, `--from`, or other common variants; they don't exist on this CLI and will error.
- Respect environment scope. When the user specifies an environment ("in brewhaus", "check staging"), filter queries to that environment. Cross-environment data is supplementary context, not the main answer. When no environment is specified, infer from git branch, repo name, or conversation context. Only ask if you can't determine it.
- Always think cross-environment. When investigating one environment, check if the same pattern exists in others. But respect the user's scope — if they ask about a specific environment, lead with that environment's data and present cross-environment findings as secondary context, not the primary answer.
- Persist findings. Once triage reaches a root cause, write the findings to the knowledge graph. This feeds future sessions.
- Verify after fixing. Proactively offer before/after comparison post-deploy.
- Check before creating. Search for existing incidents/entities before creating — Möbius may have already created them. Ask the user before creating incidents.
- Convert timestamps. Always present human-readable times, not raw Unix.
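Several of the rules above are mechanical enough to check before a call goes out. The following is a minimal sketch under stated assumptions: the tool and parameter names come from the list, but the helper names (`preflight_problems`, `human_time`) are hypothetical, the requirement that `start` and `end` be passed together is an assumption, and so is the idea that raw timestamps are Unix seconds rather than milliseconds.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Tools the list above says will fail without a group_by parameter.
GROUP_BY_TOOLS = {
    "query_patterns",
    "query_summary",
    "query_severity_data",
    "query_sentiment_data",
    "get_sentiment_heatmap",
}


def preflight_problems(tool: str, args: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons not to send this call yet."""
    problems: List[str] = []
    if tool in GROUP_BY_TOOLS and not args.get("group_by"):
        problems.append(
            f"{tool} needs group_by (typically 'service' or 'environment')"
        )
    if tool == "list_incident_events":
        has_state = args.get("state") is not None
        # Assumption: a time range means both start and end are present.
        has_window = (args.get("start") is not None
                      and args.get("end") is not None)
        if not (has_state or has_window):
            problems.append(
                "list_incident_events needs a state or start/end filter"
            )
    return problems


def human_time(unix_seconds: float) -> str:
    """Format a Unix timestamp (assumed to be in seconds) as UTC text."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")
```

An empty list from `preflight_problems` means none of these particular rules is violated; it says nothing about whether environment or service names are valid, which still calls for `query_insights_params`.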
Incident status mapping
| Code | Label |
|---|---|
| 0 | Open |
| 1 | Investigating |
| 2 | Active |
| 3 | Resolved |
| 4 | Closed |
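When incident data carries the numeric code, the table above can be encoded once and reused. A minimal sketch: the codes and labels are taken from the table, while the `IncidentStatus` and `status_label` names and the "Unknown" fallback are assumptions for illustration.

```python
from enum import IntEnum


class IncidentStatus(IntEnum):
    OPEN = 0
    INVESTIGATING = 1
    ACTIVE = 2
    RESOLVED = 3
    CLOSED = 4


def status_label(code: int) -> str:
    """Map a numeric status code to its label; flag codes outside the table."""
    try:
        return IncidentStatus(code).name.capitalize()
    except ValueError:
        return f"Unknown ({code})"
```

Surfacing an unrecognized code instead of guessing a label keeps a future status value from being silently misreported.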
Output conventions
Present investigation results as: Summary (one sentence) → Root cause
→ Impact (quantified) → Recommended fix (concrete) → Confidence level.
Default to roughly 250 words. Expand to a longer post-mortem format only
when the user explicitly asks for one ("write up a full post-mortem," "give
me the long version"). For routine investigation queries, brevity beats
thoroughness — the user is iterating, not archiving.
For post-mortems add: timeline table, action items with owner/priority.
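The five-part structure and the word budget can be sketched as a small container. This is an illustration, not a required format: the `Finding` class, its field names, and the `over_budget` helper are hypothetical, and counting whitespace-separated words is only one reasonable way to read "roughly 250 words".

```python
from dataclasses import dataclass

WORD_BUDGET = 250  # default length from the conventions above


@dataclass
class Finding:
    summary: str          # one sentence
    root_cause: str
    impact: str           # quantified
    recommended_fix: str  # concrete
    confidence: str

    def render(self) -> str:
        """Emit the sections in the order the conventions prescribe."""
        sections = [
            ("Summary", self.summary),
            ("Root cause", self.root_cause),
            ("Impact", self.impact),
            ("Recommended fix", self.recommended_fix),
            ("Confidence", self.confidence),
        ]
        return "\n".join(f"{name}: {text}" for name, text in sections)


def over_budget(text: str, budget: int = WORD_BUDGET) -> bool:
    """True when the text exceeds the word budget (whitespace-split count)."""
    return len(text.split()) > budget
```

A routine answer would call `render()` and trim if `over_budget` returns true; the post-mortem additions (timeline table, action items) stay outside this structure on purpose, since they apply only when the user asks for the long form.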
Feedback loops
Three loops drive compounding value:
- Loop 1 (Intent): Before coding — surface past incidents and patterns via knowledge graph. Only works if Loop 3 persisted findings.
- Loop 2 (Iteration): During dev — validate in dev environments and staging before promoting to production. Cross-environment comparison catches regressions.
- Loop 3 (Production intelligence): After deploy — triage → fix → verify → persist to graph for Loop 1.