- Backend override: `--backend pup`
- MCP tool: `mcp__datadog-llmo-mcp__get_llmobs_experiment_summary`
- `pup` availability check: `pup --version` (`"version"`)
- Error when neither backend is available: "Neither the Datadog MCP server nor the pup CLI is available. Connect the MCP server () or install pup."
- MCP server setup: `claude mcp add --scope user --transport http datadog-llmo-mcp 'https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs'`
- `pup` command form: `pup llm-obs <subcommand> [flags]`
- Result format: `[{"type": "text", "text": "<json>"}]`
- `pup` authentication: `pup auth login`
- Invocation ID (example): `3a9f1c2b`
- `telemetry.intent`: `skill:llm-obs-experiment-analyzer`, `[<inv_id>] — skill:llm-obs-experiment-analyzer:start`, `[<inv_id>] — :start`
- Example intent: `[3a9f1c2b] — Phase 1: get experiment summary to orient analysis`

| Inputs | Mode |
|---|---|
| 2 IDs, no question | Comparative Exploratory |
| 2 IDs + question | Comparative Q&A |
| 1 ID, no question | Single Exploratory |
| 1 ID + question | Single Q&A |
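
The mode table above can be read as a small decision function. The following is a minimal sketch, not the skill's actual implementation; the function name `select_mode` and its argument shapes are assumptions made for illustration.

```python
from typing import Optional, Sequence


def select_mode(experiment_ids: Sequence[str], question: Optional[str] = None) -> str:
    """Map the inputs to one of the four modes in the table.

    `select_mode` is a hypothetical helper; only the mode names come
    from the document.
    """
    if len(experiment_ids) not in (1, 2):
        raise ValueError("expected 1 or 2 experiment IDs")
    # Two IDs mean a comparison; one ID means a single-experiment analysis.
    scope = "Comparative" if len(experiment_ids) == 2 else "Single"
    # A non-blank question switches from Exploratory to Q&A.
    kind = "Q&A" if question and question.strip() else "Exploratory"
    return f"{scope} {kind}"
```

For example, `select_mode(["id-a", "id-b"], "Why did accuracy drop?")` selects the Comparative Q&A mode.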
`/llm-obs-experiment-analyzer <experiment_id_1> [experiment_id_2] [question text] [--output agent|file|notebook]`

| Tool | Purpose |
|---|---|
| `get_llmobs_experiment_summary` | Get total events, error count, metric stats, available dimensions |
| | Query events with filters, sorting, pagination |
| | Get full event details (input, output, expected_output, metrics) |
| `get_llmobs_experiment_metric_values` | Get metric stats overall and segmented by dimension. Use […] |
| `get_llmobs_experiment_dimension_values` | List unique values for a dimension with counts |
| `mcp__datadog-mcp-core__create_datadog_notebook` | Export report as a Datadog notebook |

- Output flag: `--output agent|file|notebook`
- Prompt tool: `AskUserQuestion`
- File report path: `evals/reports/YYYY-MM-DD-<experiment-slug>-analysis.md`
- Notebook export: `mcp__datadog-mcp-core__create_datadog_notebook` (MCP) or `pup notebooks create --title "TITLE" --file /tmp/nb_cells.json` (pup)

To enable Datadog notebook export, add the MCP server:

claude mcp add --transport http datadog-mcp https://mcp.datadoghq.com/api/unstable/mcp-server

See: https://docs.datadoghq.com/bits_ai/mcp_server/setup/
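
The file report path pattern `evals/reports/YYYY-MM-DD-<experiment-slug>-analysis.md` can be filled in as sketched below. The document does not say how the slug is derived, so the rule here (lowercase, runs of non-alphanumeric characters collapsed to hyphens) and the helper name `report_path` are assumptions.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def report_path(experiment_name: str, today: date) -> str:
    """Build a report path following the documented pattern.

    The slug rule is an assumption for illustration; the source only
    shows the placeholder <experiment-slug>.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", experiment_name.lower()).strip("-")
    if not slug:
        raise ValueError("experiment name produces an empty slug")
    filename = f"{today.isoformat()}-{slug}-analysis.md"
    return str(PurePosixPath("evals/reports") / filename)
```

Passing the date explicitly, rather than calling `date.today()` inside the helper, keeps the function deterministic and easy to test.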

- Error check: `get_llmobs_experiment_summary`, `error_count > 0`
- Error breakdown: `get_llmobs_experiment_dimension_values` on `error_type` (`asyncio.exceptions.cancelled`, `error`)
- Metric types (`metric_type`): `score`, `boolean`, `categorical`
- Prompt tool: `AskUserQuestion`

| Class | Condition | Meaning |
|---|---|---|
| Always zero | | Feature disabled or not implemented; no signal |
| Perfect | | Always passes; no diagnostic signal |
| Saturated | | Rarely fails; low diagnostic value |
| Struggling | | Meaningful failure rate; highest diagnostic value |
| Interesting | | Partial failures; moderate diagnostic value |
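
The class table lends itself to a small classifier over metric means. The sketch below is illustrative only: the table's conditions are not given in the text, so both cutoffs are required arguments that the caller must choose, and the function name `classify_metric` is hypothetical. It also assumes each metric mean is a pass rate between 0 and 1.

```python
def classify_metric(mean: float, saturated_at: float, struggling_below: float) -> str:
    """Assign a class label to a metric mean.

    Assumptions (not from the source): means lie in [0, 1], and the two
    cutoffs are supplied by the caller because the original conditions
    are unknown.
    """
    if not 0.0 <= mean <= 1.0:
        raise ValueError("mean must be within [0, 1]")
    if struggling_below > saturated_at:
        raise ValueError("struggling_below must not exceed saturated_at")
    if mean == 0.0:
        return "Always zero"  # feature disabled or not implemented
    if mean == 1.0:
        return "Perfect"  # always passes
    if mean >= saturated_at:
        return "Saturated"  # rarely fails
    if mean < struggling_below:
        return "Struggling"  # meaningful failure rate
    return "Interesting"  # partial failures
```

With illustrative cutoffs such as `saturated_at=0.95` and `struggling_below=0.5`, a mean of 0.7 falls in the Interesting class.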
Found N metrics. Full breakdown:
| Metric | Mean | Class |
|--------|------|-------|
| <label> | <mean> | ⚠️ Struggling |
| <label> | <mean> | Interesting |
| <label> | <mean> | Saturated |
| <label> | 1.000 | Perfect (no signal) |
| <label> | 0.000 | Always zero (disabled?) |

- `always_zero`: `open_answer`, `c_permanence`
- `AskUserQuestion` option format: `mean: X.XX — class`
- Classes: `always_zero`, `perfect`, `struggling`, `interesting`
- Metric tool: `get_llmobs_experiment_metric_values`

Links:

- Comparison view: `https://app.datadoghq.com/llm/experiment-comparison` with parameters `baselineExperimentId`, `experimentIds`, `tableView=all`, `project`, `compareDatasetId`, `selectedEvaluation`
- Experiment: `https://app.datadoghq.com/llm/experiments/{experiment_id}`
- Single event: `https://app.datadoghq.com/llm/experiments/{experiment_id}?selectedTab=overview&sp=[{"p":{"experimentId":"{experiment_id}","spanId":"{span_id}"},"i":"experiment-details"}]&spanId={span_id}`
- Dimension filter: `https://app.datadoghq.com/llm/experiments/{experiment_id}?selectedTab=overview&filter[{dimension}]={value}`

Report section headings: `## Summary & Recommendations`, `## Synthesis`

- File output: `<path>`
- Notebook output: `mcp__datadog-mcp-core__create_datadog_notebook`, with `name`:

| Mode | Name |
|---|---|
| Comparative Exploratory | |
| Comparative Q&A | |
| Single Exploratory | |
| Single Q&A | |

- `cells`: `#### Experiment Analysis Report`
- `time`: `{ "live_span": "1h" }`
- Confirmation: "Report exported to notebook: <url>"

---
Stay active after the report. Answer follow-up questions using the same MCP tools, referencing findings already gathered. Do not re-run analyses you've already performed unless new questions require it.
---
- Experiment link: `https://app.datadoghq.com/llm/experiments/{full_uuid}`
- Question: {original question text} (Q&A modes only; omit for Exploratory modes)
- Short ID placeholders: `{baseline_short}`, `{candidate_short}`, `{experiment_short}`
- Section heading: `#### Dataset Categories`
- Example values: `tool_accuracy`, `error_type`, `[dimension=value]`, `AgentEvaluator/DatasetAmbiguous`, `max_turns=40`
---

| MCP Tool | pup Command |
|---|---|
| | |
| | |
| | |
| | |
| | |

| MCP Tool | pup Command |
|---|---|
| `mcp__datadog-mcp-core__create_datadog_notebook` | `pup notebooks create --title "TITLE" --file /tmp/nb_cells.json` |

`[{"attributes": {"definition": {"type": "markdown", "text": "## Section\n\nContent."}}, "type": "notebook_cells"}]`
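
The cell format shown above can be generated from a Markdown report before handing the file to `pup notebooks create --file`. This is a minimal sketch under two assumptions: one cell per `## ` section is an acceptable split, and the helper names `markdown_to_cells` and `write_cells` are hypothetical. Only the cell structure itself is taken from the document.

```python
import json
import re
from typing import Dict, List


def markdown_to_cells(report: str) -> List[Dict]:
    """Split a Markdown report into notebook cells, one per level-2 heading.

    The split rule is an assumption; the cell shape mirrors the example
    in the document.
    """
    # Zero-width split before every line starting with "## " (Python 3.7+).
    sections = [s.strip() for s in re.split(r"(?m)^(?=## )", report)]
    return [
        {
            "attributes": {"definition": {"type": "markdown", "text": section}},
            "type": "notebook_cells",
        }
        for section in sections
        if section
    ]


def write_cells(report: str, path: str) -> None:
    """Write the cells as JSON to `path`, e.g. /tmp/nb_cells.json."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(markdown_to_cells(report), handle, ensure_ascii=False)
```

A `## ` line inside a fenced code block in the report would also trigger a split, so a report containing such lines would need a more careful parser.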