cekura-predefined-metrics
Cekura Predefined Metrics
Purpose
Predefined metrics are Cekura's built-in evaluators — ready to enable on any agent with no prompt writing required. They cover the most common quality dimensions across accuracy, conversation quality, customer experience, and speech quality. Use this skill to decide which predefined metrics to enable and how to configure them.
Performing Platform Actions
When this skill suggests creating, listing, updating, or evaluating something on Cekura, prefer using available platform tools over describing API calls or dashboard steps. In Claude Code with the Cekura plugin installed, these tools are auto-configured and handle authentication, parameter validation, and error handling for you. Fall back to direct API endpoints or dashboard guidance only when no tools are available in the current session.
Core Terminology
- Main agent: The client's AI voice agent being tested
- Testing agent: Cekura's simulated caller that exercises the main agent
- Predefined metric: Built-in evaluator shipped by Cekura; no prompt required, identified by a `code`
- Custom metric: User-authored metric with a custom prompt or `custom_code` (see `cekura-metric-design`)
- Simulation: Test runs using Cekura's testing agent against the main agent (Sim column in the catalog)
- Observability: Real production calls flowing through the agent (Obs column in the catalog)
- Project-level toggle: Enables a predefined metric across simulation OR observability for an entire project
- Evaluator attachment: Adds the metric to a specific test scenario; required for the metric to fire on that evaluator
Predefined vs Custom Metrics
Enable predefined metrics first. They require zero prompt engineering and cover the most common quality dimensions out of the box. Only reach for a custom metric when:
- You need to evaluate a business-specific workflow (booking flow, escalation protocol, etc.)
- You need to check agent behavior against your specific system prompt
- You need to combine multiple signals into one score
For everything else, a predefined metric will give you reliable, consistent results faster.
The Predefined Metrics Workflow
- Browse the catalog: Use the four catalog tables below to identify candidate metrics. Filter by Sim/Obs availability, cost, and required constraints.
- Pick a starting set: Begin with the Baseline below. For richer coverage, see `references/selection-by-use-case.md` for recommended sets per agent type (booking, collections, support, healthcare, voice-quality investigation).
- Toggle on at the project level: Enables the metric for simulation runs (or observability; they are independent toggles). Without this, attaching the metric to an evaluator does nothing.
- Add to individual evaluators: Attaches the metric to specific test scenarios so it fires when that scenario runs.
- Configure if required: Six metrics need or accept configuration (silence thresholds, dropoff/topic node lists, spelling categories, pronunciation phonemes). See `references/configuration-guide.md` for full payload examples.
- Validate by running: Execute a small batch and review results. If results look off, check the Common Pitfalls below before reaching for custom metrics.
For full API endpoints (list, toggle, attach, configure, re-evaluate), see `references/api-reference.md`.
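The first workflow step, filtering the catalog by Sim/Obs availability and cost, can be sketched in a few lines of Python. This is only an illustration and not part of Cekura's API: the `Metric` dataclass and `filter_catalog` helper are hypothetical names, and the four rows are copied from the catalog tables in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    credits: float  # cost per call; 0.0 means free
    sim: bool       # available for simulation runs
    obs: bool       # available for observability (production calls)


# A few rows copied from the catalog tables in this document.
CATALOG = [
    Metric("Expected Outcome", 0.0, sim=True, obs=False),
    Metric("Hallucination", 0.6, sim=True, obs=True),
    Metric("Latency (in ms)", 0.0, sim=True, obs=True),
    Metric("Dropoff Node", 0.2, sim=False, obs=True),
]


def filter_catalog(catalog, mode, max_credits):
    """Return names of metrics available in `mode` ('sim' or 'obs') within budget."""
    if mode not in ("sim", "obs"):
        raise ValueError("mode must be 'sim' or 'obs'")
    return [m.name for m in catalog
            if getattr(m, mode) and m.credits <= max_credits]


free_sim = filter_catalog(CATALOG, "sim", max_credits=0.0)
cheap_obs = filter_catalog(CATALOG, "obs", max_credits=0.2)
print(free_sim)
print(cheap_obs)
```

Extending `CATALOG` with the remaining rows would give a quick way to shortlist candidates before toggling anything on.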
Enabling Predefined Metrics
Two steps are required — missing either means the metric never fires:
- Toggle on at the project level — enables the metric for simulation runs
- Add to individual evaluators — attaches the metric to specific test scenarios
Use `GET /test_framework/v1/predefined-metrics/` to retrieve the full list of available predefined metrics and their IDs. Pass a predefined metric's `code` field when adding it to an agent's metric set.
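The two-step rule is easy to get wrong, so here is a minimal sketch of the logic. It models the project toggle and the evaluator attachment as plain sets; the function name and the metric codes are hypothetical and do not come from the Cekura API.

```python
def metric_will_fire(code, project_enabled, evaluator_metrics):
    """A metric fires only if it is toggled on for the project AND attached to the evaluator."""
    return code in project_enabled and code in evaluator_metrics


# Hypothetical metric codes, for illustration only.
project_enabled = {"latency", "expected_outcome"}
evaluator_metrics = {"latency", "csat"}

for code in ("latency", "expected_outcome", "csat"):
    print(code, metric_will_fire(code, project_enabled, evaluator_metrics))
```

Only the first code satisfies both conditions; the other two each miss one step and would never produce a result.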
Catalog: Accuracy
| Metric | Output | Cost | Sim | Obs | Notes |
|---|---|---|---|---|---|
| Expected Outcome | 0–100 score | Free | ✓ | — | Requires `expected_outcome_prompt` to be set on the evaluator. |
| Hallucination | True/False | 0.6 credits | ✓ | ✓ | Compares agent responses against the Knowledge Base to detect unsupported claims. |
| Mock Tool Call Accuracy | 0–100 score | Free | ✓ | — | Scores whether the right mock tools were called with the right inputs. Requires mock tools configured on the agent. |
| Relevancy | True/False | 0.2 credits | ✓ | ✓ | Checks if agent responses addressed the question asked. Flags off-topic or deflecting replies. |
| Response Consistency | True/False | 0.2 credits | ✓ | ✓ | Detects contradictions — when the agent repeats information incorrectly or contradicts a prior statement. |
| Tool Call Success | True/False | Free | ✓ | ✓ | Checks if any tool call result contains "Error" or "failed". Requires provider integration (assistant ID + API keys) so tool call data appears in the transcript. |
| Transcription Accuracy | 0–100 score | Free for simulations / 1 credit/min for production call logs | ✓ | — | Uses two transcription models for production call logs; compares against ground truth for simulation runs. Requires audio. Production call log evaluation is expensive, so use it selectively. |
| Voicemail Detection | True/False | 0.2 credits | ✓ | ✓ | Detects if the call reached a voicemail or automated system. Beta. |
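The Tool Call Success rule in the table is simple enough to sketch. The snippet below assumes tool call results are available as plain strings and that the match is a case-sensitive substring test on the two markers quoted in the table; both points are assumptions, and `tool_call_success` is a hypothetical helper rather than Cekura's implementation.

```python
def tool_call_success(tool_results):
    """Return False if any tool call result contains 'Error' or 'failed'."""
    markers = ("Error", "failed")
    return not any(marker in result
                   for result in tool_results
                   for marker in markers)


ok_run = ['{"status": "ok", "slot": "10:30"}']
bad_run = ['{"status": "ok"}', "Request failed: timeout"]
print(tool_call_success(ok_run), tool_call_success(bad_run))
```

Because the check is a string match, a tool that legitimately returns the word "failed" in its payload would also trip it, which is worth keeping in mind when reading results.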
Catalog: Conversation Quality
| Metric | Output | Cost | Sim | Obs | Notes |
|---|---|---|---|---|---|
| AI Interrupting User | Count | Free | ✓ | ✓ | Counts how often the agent interrupted the user. For observability, requires stereo audio with separate speaker channels. |
| Appropriate Call Termination by Main Agent | True/False | 0.2 credits | ✓ | ✓ | Checks whether the agent ended the call prematurely and whether the user's concern was resolved. |
| Appropriate Call Termination by Testing Agent | True/False | 0.2 credits | ✓ | ✓ | Checks if the user (testing agent) ended the call abruptly — a signal of poor experience or unresolved issues. |
| Detect Silence in Conversation | True/False | Free | ✓ | ✓ | Returns False if neither speaker speaks for longer than the configured silence threshold (default 10 seconds). |
| Infrastructure Issues | True/False | Free | ✓ | ✓ | Returns False when the main agent goes silent for longer than the configured silence threshold (default 10 seconds). |
| Interruption Score | 0–100 score | Free | ✓ | ✓ | Continuous score for how often the agent interrupts the user. Higher = fewer interruptions = better. |
| Latency (in ms) | ms average | Free | ✓ | ✓ | Average response latency. Also reports P25/P50/P75/P90/P95/P99 percentiles. Under 2000ms is considered good. |
| Stop Time after User Interruption (ms) | ms | Free | ✓ | ✓ | Time from user interruption until the agent stops speaking. Lower = more responsive. |
| Unnecessary Repetition Count | Count | 0.2 credits | ✓ | ✓ | Counts how many times the agent unnecessarily repeated itself. |
| Unnecessary Repetition Score | 0–100 score | Free | ✓ | ✓ | Continuous score for repetition quality. Higher = more concise = better. Prefer this over the count metric for trend tracking. |
| User Interrupting AI | Count | Free | ✓ | ✓ | Counts customer interruptions of the agent. High counts signal frustration or poor turn-taking. |
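The Latency row reports percentiles alongside the average, and a small worked example shows why that matters. The sketch below uses Python's standard `statistics` module; the sample values are made up, and the interpolation method is an assumption, since this document does not say how Cekura computes its percentiles.

```python
import statistics


def latency_summary(latencies_ms):
    """Return the mean plus selected percentiles for a list of latencies in ms."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    summary = {"mean": statistics.mean(latencies_ms)}
    for p in (25, 50, 75, 90, 95, 99):
        summary[f"p{p}"] = cuts[p - 1]
    return summary


# Nine ordinary turns and one very slow one (hypothetical data).
samples = [800, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 9000]
summary = latency_summary(samples)
for key, value in summary.items():
    print(key, value)
```

With this data the mean stays under the 2000 ms guideline while P95 and P99 land well above it, so the single slow turn is visible only in the tail percentiles.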
Catalog: Customer Experience
| Metric | Output | Cost | Sim | Obs | Notes |
|---|---|---|---|---|---|
| CSAT | 0–100 score | 0.2 credits | ✓ | ✓ | Overall customer satisfaction. Scores above 70 indicate satisfaction. Evaluates tone, cooperation, and resolution. |
| Dropoff Node | Enum | 0.2 credits | — | ✓ | Identifies the conversation stage where the call ended. Requires `dropoff_nodes` to be configured. |
| Sentiment | Enum | 0.2 credits | ✓ | ✓ | Classifies user sentiment as Happy, Angry, Neutral, or Disappointed based on tone and word choice across the call. |
| Topic of Call | Enum | 0.2 credits | — | ✓ | Categorizes what the call was about (e.g., billing, technical support). Requires `topic_nodes` to be configured. |
Catalog: Speech Quality
| Metric | Output | Cost | Sim | Obs | Notes |
|---|---|---|---|---|---|
| Average Pitch (in Hz) | Hz | Free | ✓ | ✓ | Average vocal pitch of the main agent during the call. Useful for monitoring voice consistency. |
| Gibberish Detection | True/False | 0.3 credits/min | ✓ | ✓ | Detects garbled or incoherent speech. Requires stereo audio. Beta. |
| Letterwise Pronunciation Detection | True/False | 0.2 credits | ✓ | ✓ | Checks if the agent spells things out letter-by-letter when appropriate (e.g., confirming phone numbers). Requires the word categories to check to be configured. |
| Pronunciation Check | 0–100 score | 0.2 credits | ✓ | ✓ | Custom word accuracy: compares spoken output against a list of expected phonemes. Requires a configured list of phoneme pairs. |
| Speaking Rate | True/False | 0.2 credits | ✓ | ✓ | Detects abrupt changes in the agent's speaking pace. English only. Beta. |
| Talk Ratio | 0.0–1.0 | Free | ✓ | ✓ | Ratio of agent speaking time vs user speaking time. Typical healthy range: 0.4–0.6. Requires stereo audio for observability. |
| Voice Change Detection | True/False | 0.2 credits | ✓ | ✓ | Detects if the agent's voice changes unexpectedly (different speaker, voice model issue). Beta. |
| Voice Tone + Clarity | 0–100 score | 0.2 credits | ✓ | ✓ | Audio quality score — analyzes clarity and jitter. Scores above 70 indicate quality. |
| Words Per Minute (WPM) | WPM | Free | ✓ | ✓ | Speaking speed of the main agent. Useful baseline alongside Average Pitch and Talk Ratio. |
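As a rough illustration of the Talk Ratio row, the sketch below assumes the ratio is the agent's share of total speaking time, which is one reading consistent with the 0.0–1.0 output range. The exact formula Cekura uses is not stated here, and both helper names are hypothetical.

```python
def talk_ratio(agent_seconds, user_seconds):
    """Agent speaking time as a share of total speaking time (assumed definition)."""
    total = agent_seconds + user_seconds
    if total <= 0:
        raise ValueError("no speech recorded")
    return agent_seconds / total


def in_healthy_range(ratio, low=0.4, high=0.6):
    """Check the ratio against the typical healthy range quoted in the table."""
    return low <= ratio <= high


balanced = talk_ratio(90, 90)
agent_heavy = talk_ratio(150, 50)
print(balanced, in_healthy_range(balanced))
print(agent_heavy, in_healthy_range(agent_heavy))
```

Under this reading, a value well above 0.6 suggests the agent is dominating the call, and a value well below 0.4 suggests it is saying very little.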
Configuration Reference
Some predefined metrics require or support configuration. Pass these as key-value pairs in the metric's `configuration` object.
| Metric | Config Key | Type | Default | Description |
|---|---|---|---|---|
| Detect Silence in Conversation | | int (seconds) | 10 | Silence threshold for either speaker |
| Infrastructure Issues | | int (seconds) | 10 | Silence threshold for the main agent only |
| Dropoff Node | `dropoff_nodes` | array of strings | required | Conversation stage names |
| Topic of Call | `topic_nodes` | array of strings | required | Topic categories |
| Letterwise Pronunciation | | array of strings | required | Word categories to check |
| Pronunciation Check | | array of objects | required | Phoneme pairs |
For full payload examples (including IPA tips and naming guidance) see `references/configuration-guide.md`.
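A small pre-flight check can catch a missing required configuration before a run. The sketch below covers only the two keys that this document names explicitly, `dropoff_nodes` and `topic_nodes`; the node names, the `REQUIRED_KEYS` mapping, and the `missing_config` helper are hypothetical, and the real payload shape is described in the configuration guide.

```python
# Required configuration keys named in this document.
REQUIRED_KEYS = {
    "Dropoff Node": "dropoff_nodes",
    "Topic of Call": "topic_nodes",
}


def missing_config(metric_name, configuration):
    """Return the required key if it is absent or not a non-empty list of strings, else None."""
    key = REQUIRED_KEYS.get(metric_name)
    if key is None:
        return None  # no required key tracked for this metric
    value = configuration.get(key)
    is_valid = (isinstance(value, list) and len(value) > 0
                and all(isinstance(item, str) for item in value))
    return None if is_valid else key


# Hypothetical stage names for a booking agent.
dropoff_config = {"dropoff_nodes": ["greeting", "identity_check", "booking", "closing"]}
topic_config = {}

print(missing_config("Dropoff Node", dropoff_config))
print(missing_config("Topic of Call", topic_config))
```

The first call reports nothing missing, while the second returns the key that still needs a value.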
Cost & Credits Quick Reference
| Cost | Metrics |
|---|---|
| Free (0 credits) | Expected Outcome, Tool Call Success, Mock Tool Call Accuracy, AI Interrupting User, User Interrupting AI, Stop Time after User Interruption, Latency, Detect Silence, Infrastructure Issues, Interruption Score, Unnecessary Repetition Score, Average Pitch, Talk Ratio, Words Per Minute |
| 0.2 credits/call | Relevancy, Response Consistency, Voicemail Detection, Appropriate Call Termination (both), Unnecessary Repetition Count, CSAT, Dropoff Node, Sentiment, Topic of Call, Letterwise Pronunciation, Pronunciation Check, Speaking Rate, Voice Change Detection, Voice Tone + Clarity |
| 0.6 credits/call | Hallucination |
| 0.3 credits/min | Gibberish Detection |
| Free (simulations) | Transcription Accuracy (simulation runs only) |
| 1 credit/min (production call logs) | Transcription Accuracy (production call log evaluation) |
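To see how these rates add up, here is a minimal cost estimate for one call. The rates are copied from the table above for a handful of metrics; the helper name is hypothetical, and the sketch assumes per-minute metrics are billed on the exact call length, which this document does not confirm.

```python
# Rates copied from the quick-reference table (subset only).
PER_CALL = {
    "Latency": 0.0,
    "Tool Call Success": 0.0,
    "Relevancy": 0.2,
    "CSAT": 0.2,
    "Sentiment": 0.2,
    "Hallucination": 0.6,
}
PER_MINUTE = {"Gibberish Detection": 0.3}


def credits_per_call(metrics, call_minutes):
    """Estimate credits for one call; raises KeyError for metrics not listed above."""
    total = 0.0
    for name in metrics:
        if name in PER_MINUTE:
            total += PER_MINUTE[name] * call_minutes
        else:
            total += PER_CALL[name]
    return round(total, 2)


selected = ["Latency", "CSAT", "Hallucination", "Gibberish Detection"]
estimate = credits_per_call(selected, call_minutes=4)
print(estimate)
```

For a four-minute call this works out to 0 + 0.2 + 0.6 + (0.3 × 4) = 2.0 credits, with the per-minute metric contributing more than the other three combined.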
Baseline — Always Enable
At minimum, every agent should have these four enabled for simulation (and the last three also for observability):
| Metric | Why |
|---|---|
| Expected Outcome | Without this, runs only tell you if the call completed — not if the agent actually did the right thing |
| Infrastructure Issues | Catches the agent going silent for 10+ seconds — invisible in pass/fail |
| Tool Call Success | Detects broken integrations before they impact real users |
| Latency | Baseline performance tracking; P95/P99 reveal outliers that averages hide |
For a richer baseline, also add: CSAT, Sentiment, and Unnecessary Repetition Score.
For agent-type-specific recommended sets (booking, collections, support, healthcare, voice-quality investigation), see `references/selection-by-use-case.md`.
Key Constraints
- Audio required: Transcription Accuracy and Gibberish Detection need audio data. Not available for text-only runs.
- Stereo required (observability): AI Interrupting User, User Interrupting AI, Talk Ratio, and Gibberish Detection require stereo recordings with separate speaker channels for observability calls.
- Simulation only: Transcription Accuracy, Mock Tool Call Accuracy, Expected Outcome.
- Observability only: Dropoff Node, Topic of Call.
- English only: Speaking Rate.
- Requires configuration: Dropoff Node, Topic of Call, Letterwise Pronunciation Detection, Pronunciation Check — will not produce meaningful results without the configuration keys set.
- Requires provider integration: Tool Call Success requires the agent's provider assistant ID configured on Cekura so tool call data appears in transcripts.
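The availability and audio constraints above can be encoded as a simple validation pass over a proposed metric set. This is a sketch only: the set names and the `constraint_violations` helper are hypothetical, and the membership of each set is taken directly from the list above.

```python
SIM_ONLY = {"Transcription Accuracy", "Mock Tool Call Accuracy", "Expected Outcome"}
OBS_ONLY = {"Dropoff Node", "Topic of Call"}
NEEDS_AUDIO = {"Transcription Accuracy", "Gibberish Detection"}
NEEDS_STEREO_OBS = {"AI Interrupting User", "User Interrupting AI",
                    "Talk Ratio", "Gibberish Detection"}


def constraint_violations(metrics, mode, has_audio=True, stereo=True):
    """List constraint problems for `metrics` in `mode` ('simulation' or 'observability')."""
    problems = []
    for name in metrics:
        if mode == "observability" and name in SIM_ONLY:
            problems.append(f"{name}: simulation only")
        if mode == "simulation" and name in OBS_ONLY:
            problems.append(f"{name}: observability only")
        if not has_audio and name in NEEDS_AUDIO:
            problems.append(f"{name}: requires audio")
        if (mode == "observability" and has_audio and not stereo
                and name in NEEDS_STEREO_OBS):
            problems.append(f"{name}: requires stereo audio")
    return problems


issues = constraint_violations(
    ["Expected Outcome", "Talk Ratio", "Latency"],
    mode="observability",
    stereo=False,
)
print(issues)
```

Running a check like this before enabling a set makes it clear which metrics would silently produce nothing for a given call type.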
Common Pitfalls
- Enabling metrics without completing both activation steps (project toggle AND evaluator assignment): metrics appear available but never fire
- Using Detect Silence and Infrastructure Issues interchangeably: they measure different things (both speakers vs agent only)
- Expecting Transcription Accuracy on observability calls: it's simulation-only
- Forgetting `expected_outcome_prompt` when using Expected Outcome: without it the metric has nothing to evaluate against
- Using Expected Outcome to evaluate voice characteristics (tone, pronunciation, speech quality): it only has access to the transcript; use Speech Quality metrics for audio evaluation
- Writing `expected_outcome_prompt` with terms like "user", "assistant", "bot", or "AI": always use "main agent" and "testing agent" to match Cekura's transcript labeling
- Enabling Dropoff Node or Topic of Call without configuring `dropoff_nodes` / `topic_nodes`: results will be meaningless
- Using Gibberish Detection on mono recordings: it requires stereo audio
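The wording pitfall for `expected_outcome_prompt` lends itself to a quick lint. The sketch below flags the four discouraged terms when they appear as whole words; the helper name and both sample prompts are hypothetical, and matching "AI" case-sensitively is a design choice made here to avoid flagging ordinary words.

```python
import re

DISCOURAGED = ("user", "assistant", "bot", "AI")


def discouraged_terms(prompt):
    """Return discouraged role terms that appear as whole words in the prompt."""
    found = []
    for term in DISCOURAGED:
        # "AI" is matched case-sensitively; the other terms ignore case.
        flags = 0 if term == "AI" else re.IGNORECASE
        if re.search(rf"\b{re.escape(term)}\b", prompt, flags):
            found.append(term)
    return found


risky = "The bot should confirm the user's appointment."
clean = "The main agent should confirm the testing agent's appointment."
print(discouraged_terms(risky))
print(discouraged_terms(clean))
```

The first prompt is flagged for two terms, while the second uses the "main agent" and "testing agent" labels and passes.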
Next Steps
After selecting predefined metrics, the user typically needs:
- Create or configure metrics → invoke cekura-metric-design for custom metrics to complement predefined ones
- Improve a metric that's underperforming → invoke cekura-metric-improvement for the feedback and labs cycle
- Attach metrics to test scenarios → invoke cekura-eval-design to wire up metrics in evaluators
Documentation
- Pre-defined metrics reference: https://docs.cekura.ai/documentation/key-concepts/metrics/pre-defined-metrics
- Public docs: https://docs.cekura.ai
- Concepts: https://docs.cekura.ai/documentation/key-concepts/
Additional Resources
Reference Files (loaded on demand)
- `references/configuration-guide.md`: Full payload examples for every configurable predefined metric (silence thresholds, dropoff/topic node lists, spelling categories, IPA phoneme pairs)
- `references/api-reference.md`: Public API endpoints for listing, toggling, attaching, configuring, and re-evaluating predefined metrics, with an end-to-end example flow
- `references/selection-by-use-case.md`: Recommended predefined metric sets by agent type: booking, collections, customer support, healthcare, and voice-quality investigation