sf-ai-agentforce-testing
<!-- TIER: 1 | ENTRY POINT -->
<!-- This is the starting document - read this FIRST -->
<!-- Pattern: Follows sf-testing for agentic test-fix loops -->
<!-- v2.0.0: Dual-track workflow with multi-turn API testing as primary -->
sf-ai-agentforce-testing: Agentforce Test Execution & Coverage Analysis
Expert testing engineer specializing in Agentforce agent testing via dual-track workflow: multi-turn Agent Runtime API testing (primary) and CLI Testing Center (secondary). Execute multi-turn conversations, analyze topic/action/context coverage, and automatically fix issues via sf-ai-agentscript.
Core Responsibilities
- Multi-Turn API Testing (PRIMARY): Execute multi-turn conversations via the Agent Runtime API
- CLI Test Execution (SECONDARY): Run single-utterance tests via `sf agent test run`
- Test Spec / Scenario Generation: Create YAML test specifications and multi-turn scenarios
- Coverage Analysis: Track topic, action, context preservation, and re-matching coverage
- Preview Testing: Interactive simulated and live agent testing
- Agentic Fix Loop: Automatically fix failing agents and re-test
- Cross-Skill Orchestration: Delegate fixes to sf-ai-agentscript, data to sf-data
- Observability Integration: Guide to sf-ai-agentforce-observability for STDM analysis
📚 Document Map
| Need | Document | Description |
|---|---|---|
| Agent Runtime API | agent-api-reference.md | REST endpoints for multi-turn testing |
| ECA Setup | eca-setup-guide.md | External Client App for API authentication |
| Multi-Turn Testing | multi-turn-testing-guide.md | Multi-turn test design and execution |
| Test Patterns | multi-turn-test-patterns.md | 6 multi-turn test patterns with examples |
| CLI commands | cli-commands.md | Complete sf agent test/preview reference |
| Test spec format | test-spec-reference.md | YAML specification format and examples |
| Auto-fix workflow | agentic-fix-loops.md | Automated test-fix cycles (10 failure categories) |
| Auth guide | connected-app-setup.md | Authentication for preview and API testing |
| Coverage metrics | coverage-analysis.md | Topic/action/multi-turn coverage analysis |
| Fix decision tree | agentic-fix-loop.md | Detailed fix strategies |
| Agent Script testing | agentscript-testing-patterns.md | 5 patterns for testing Agent Script agents |
⚡ Quick Links:
- Deterministic Interview Flow - Rule-based setup (7 steps)
- Credential Convention - Persistent ECA storage
- Swarm Execution Rules - Parallel team testing
- Test Plan Format - Reusable YAML plans
- Phase A: Multi-Turn API Testing - Primary workflow
- Phase B: CLI Testing Center - Secondary workflow
- Agent Script Testing - Agent Script-specific patterns
- Scoring System - 7-category validation
- Agentic Fix Loop - Auto-fix workflow
Script Location (MANDATORY)
SKILL_PATH: `~/.claude/skills/sf-ai-agentforce-testing`

All Python scripts live at absolute paths under `{SKILL_PATH}/hooks/scripts/`. NEVER recreate these scripts. They already exist. Use them as-is.

All scripts in `hooks/scripts/` are pre-approved for execution. Do NOT ask the user for permission to run them.

| Script | Absolute Path |
|---|---|
| credential_manager.py | {SKILL_PATH}/hooks/scripts/credential_manager.py |
| agent_discovery.py | {SKILL_PATH}/hooks/scripts/agent_discovery.py |
| generate_multi_turn_scenarios.py | {SKILL_PATH}/hooks/scripts/generate_multi_turn_scenarios.py |
| multi_turn_test_runner.py | {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py |
| rich_test_report.py | {SKILL_PATH}/hooks/scripts/rich_test_report.py |

Variable resolution: At runtime, resolve `SKILL_PATH` from the `${SKILL_HOOKS}` environment variable (strip the `/hooks` suffix). Hardcoded fallback: `~/.claude/skills/sf-ai-agentforce-testing`.
⚠️ CRITICAL: Orchestration Order
sf-metadata → sf-apex → sf-flow → sf-deploy → sf-ai-agentscript → sf-deploy → sf-ai-agentforce-testing (you are here)
Why testing is LAST:
- Agent must be published before running automated tests
- Agent must be activated for preview mode and API access
- All dependencies (Flows, Apex) must be deployed first
- Test data (via sf-data) should exist before testing actions
⚠️ MANDATORY Delegation:
- Fixes: ALWAYS use `Skill(skill="sf-ai-agentscript")` for agent script fixes
- Test Data: Use `Skill(skill="sf-data")` for action test data
- OAuth Setup (multi-turn API testing only): Use `Skill(skill="sf-connected-apps")` for ECA — NOT needed for `sf agent preview` or CLI tests
- Observability: Use `Skill(skill="sf-ai-agentforce-observability")` for STDM analysis of test sessions
Architecture: Dual-Track Testing Workflow
Deterministic Interview (I-1 → I-7)
│ Agent Name → Org Alias → Metadata → Credentials → Scenarios → Partition → Confirm
│ (skip if test-plan-{agent}.yaml provided)
│
▼
Phase 0: Prerequisites & Agent Discovery
│
├──► Phase A: Multi-Turn API Testing (PRIMARY — requires ECA)
│ A1: ECA Credential Setup (via credential_manager.py)
│ A2: Agent Discovery & Metadata Retrieval
│ A3: Test Scenario Planning (generate_multi_turn_scenarios.py --categorized)
│ A4: Multi-Turn Execution (Agent Runtime API)
│ ├─ Sequential: single multi_turn_test_runner.py process
│ └─ Swarm: TeamCreate → N workers (--worker-id N)
│ A5: Results & Scoring (rich Unicode output)
│
└──► Phase B: CLI Testing Center (SECONDARY)
B1: Test Spec Creation
B2: Test Execution (sf agent test run)
B3: Results Analysis
│
Phase C: Agentic Fix Loop (shared)
Phase D: Coverage Improvement (shared)
Phase E: Observability Integration (STDM analysis)

When to use which track:
| Condition | Use |
|---|---|
| Agent Testing Center NOT available | Phase A only |
| Need multi-turn conversation testing | Phase A |
| Need topic re-matching validation | Phase A |
| Need context preservation testing | Phase A |
| Agent Testing Center IS available + single-utterance tests | Phase B |
| CI/CD pipeline integration | Phase A (Python scripts) or Phase B (sf CLI) |
| Quick smoke test | Phase B |
| Quick manual validation (no ECA setup) | `sf agent preview` |
| No ECA available | Phase B or `sf agent preview` |
Phase 0: Prerequisites & Agent Discovery
Step 1: Gather User Information
Use AskUserQuestion to gather:
AskUserQuestion:
questions:
- question: "Which agent do you want to test?"
header: "Agent"
options:
- label: "Let me discover agents in the org"
description: "Query BotDefinition to find available agents"
- label: "I know the agent name"
description: "Provide agent name/API name directly"
- question: "What is your target org alias?"
header: "Org"
options:
- label: "vivint-DevInt"
description: "Development integration org"
- label: "Other"
description: "Specify a different org alias"
- question: "What type of testing do you need?"
header: "Test Type"
options:
- label: "Multi-turn API testing (Recommended)"
description: "Full conversation testing via Agent Runtime API — tests topic switching, context retention, escalation cascades"
- label: "CLI single-utterance testing"
description: "Traditional sf agent test run — requires Agent Testing Center feature"
- label: "Both"
description: "Run both multi-turn and CLI tests for comprehensive coverage"
Step 2: Agent Discovery
```bash
# Auto-discover active agents in the org
sf data query --use-tooling-api \
  --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE IsActive=true" \
  --result-format json --target-org [alias]
```
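The query above returns the standard sf CLI JSON envelope (`{"result": {"records": [...]}}`). A minimal sketch for pulling the agent list out of that output (helper name is illustrative, not one of the skill's scripts):

```python
import json

# Extract active agents from `sf data query --result-format json` output.
# Assumes the standard sf CLI envelope: {"result": {"records": [...]}}.
def list_active_agents(query_json: str) -> list:
    payload = json.loads(query_json)
    records = payload.get("result", {}).get("records", [])
    return [
        {"id": r.get("Id"), "name": r.get("DeveloperName"), "label": r.get("MasterLabel")}
        for r in records
    ]
```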
Step 3: Agent Metadata Retrieval
```bash
# Retrieve agent configuration (topics, actions, instructions)
sf project retrieve start \
  --metadata "GenAiPlannerBundle:[AgentDeveloperName]" \
  --output-dir retrieve-temp --target-org [alias]
```

Claude reads the GenAiPlannerBundle to understand:
- All topics and their `classificationDescription` values
- All actions and their configurations
- System instructions and guardrails
- Escalation paths

Step 4: Check Agent Testing Center Availability
```bash
# This determines if Phase B is available
sf agent test list --target-org [alias]

# If error: "INVALID_TYPE: Cannot use: AiEvaluationDefinition"
#   → Agent Testing Center NOT enabled → Phase A only
# If success: → Both Phase A and Phase B available
```

Step 5: Prerequisites Checklist
| Check | Command | Why |
|---|---|---|
| Agent exists | Query BotDefinition (Step 2) | Can't test non-existent agent |
| Agent published | | Must be published to test |
| Agent activated | Check activation status | Required for API access |
| Dependencies deployed | Flows and Apex in org | Actions will fail without them |
| ECA configured (Phase A only) | Token request test | Multi-turn API testing only. NOT needed for preview or CLI tests |
| Agent Testing Center (Phase B) | `sf agent test list` | Required for CLI testing |
Deterministic Multi-Turn Interview Flow
When the testing skill is invoked, follow these interview steps in order. Each step has deterministic rules with fallbacks. The goal: gather all inputs needed to execute multi-turn tests without ambiguity.
Skip the interview if the user provides a `test-plan-{agent}.yaml` file — load it directly and jump to Swarm Execution Rules.
| Step | Rule | Fallback |
|---|---|---|
| I-0: Skill Path | Resolve `SKILL_PATH` from the environment | Hardcoded path |
| I-1: Agent Name | User provided → use it. Else walk up from CWD looking for agent metadata | AskUserQuestion |
| I-2: Org Alias | User provided → use it. Else parse the project's default org config | AskUserQuestion |
| I-3: Metadata | ALWAYS run `agent_discovery.py` | Required (fail if no agent found) |
| I-4: Credentials | Skip if test type is CLI-only or Preview-only — standard org auth suffices (no ECA needed). For multi-turn API testing: Run `credential_manager.py` | AskUserQuestion for credentials (multi-turn API only) |
| I-4b: Session Variables | ALWAYS ask. Extract known context variables from agent metadata (GenAiPlannerBundle) | AskUserQuestion |
| I-5: Scenarios | Pipe discovery metadata to `generate_multi_turn_scenarios.py --categorized` | Required |
| I-6: Partition | Ask user how to split work across workers | AskUserQuestion (see below) |
| I-7: Confirm | Present test plan summary. Save as `test-plan-{agent}.yaml` | AskUserQuestion |
I-4b: Session Variables
Context variables are MANDATORY for agents that use authentication flows (e.g., a `User_Authentication` topic). Without them, the agent's authentication flow fails and the session ends on Turn 1.

Extract context variables from agent metadata:
- Run `python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local --project-dir {project}` and look for `context_variables` in the GenAiPlannerBundle output.
- Common variables: `$Context.RoutableId` (MessagingSession ID), `$Context.CaseId` (Case record ID).
AskUserQuestion:
question: "The agent requires context variables for testing. Which values should we use?"
header: "Variables"
options:
- label: "Use test record IDs (Recommended)"
description: "I'll provide real MessagingSession and Case IDs from the org for testing"
- label: "Skip variables"
description: "Run without context variables — WARNING: authentication topics will likely fail"
- label: "Auto-discover from org"
description: "Query the org for recent MessagingSession and Case records to use as test values"
multiSelect: false

⚠️ WARNING: If the agent has a `User_Authentication` topic that runs `Bot_User_Verification`, you MUST provide `$Context.RoutableId` and `$Context.CaseId`. Without them, the verification flow fails → agent escalates → `SessionEnded` on Turn 1.
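That Turn-1 failure mode can be caught before any session is opened by a simple pre-flight check on the provided variables. A minimal sketch (helper name and constant are illustrative, not part of the skill's scripts):

```python
# Pre-flight check: verify the session variables required by
# authentication topics are present and non-empty before running tests.
REQUIRED_AUTH_VARS = ("$Context.RoutableId", "$Context.CaseId")

def missing_auth_vars(provided: dict) -> list:
    """Return the names of required context variables that are absent or empty."""
    return [v for v in REQUIRED_AUTH_VARS if not provided.get(v)]
```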
I-6: Partition Strategy
DEFAULT RULE: If total generated scenarios > 4, default to "2 workers by category". If ≤ 4, default to "Sequential". ALWAYS default — only change if the user explicitly requests otherwise.
AskUserQuestion:
question: "How should test scenarios be distributed across workers?"
header: "Partition"
options:
- label: "2 workers by category (Recommended)"
description: "Group test patterns into 2 balanced buckets — best balance of parallelism and readability. DEFAULT when > 4 scenarios."
- label: "Sequential"
description: "Run all scenarios in a single process — no team needed, simpler but slower. DEFAULT when ≤ 4 scenarios."
multiSelect: false
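The default rule above is fully deterministic, so it can be sketched in a few lines (function name is illustrative):

```python
# Illustrative sketch of the I-6 default: 2 workers by category when more
# than 4 scenarios were generated, otherwise a single sequential process.
def default_partition(total_scenarios: int) -> dict:
    if total_scenarios > 4:
        return {"strategy": "2 workers by category", "worker_count": 2}
    return {"strategy": "Sequential", "worker_count": 1}
```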
I-7: Confirmation Summary Format
Present this to the user before execution:
📋 TEST PLAN SUMMARY
════════════════════════════════════════════════════════════════
Agent: {agent_name} ({agent_id})
Org: {org_alias}
Credentials: ~/.sfagent/{org_alias}/{eca_name}/credentials.env ✅
Scenarios: {total_count} across {category_count} categories
Partition: {strategy} with {worker_count} worker(s)
Variables: {var_count} session variable(s)
📂 Scenario Breakdown:
topic_routing: {n} scenarios
context_preservation: {n} scenarios
escalation_flows: {n} scenarios
guardrail_testing: {n} scenarios
action_chain: {n} scenarios
error_recovery: {n} scenarios
cross_topic_switch: {n} scenarios
💾 Saved: test-plan-{agent_name}.yaml
════════════════════════════════════════════════════════════════
Proceed? [Confirm / Edit / Cancel]
⚡ MANDATORY: Phase A4 Execution Protocol
This protocol is NON-NEGOTIABLE. After I-7 confirmation, you MUST follow EXACTLY these steps based on the partition strategy. DO NOT improvise, skip steps, or run sequentially when the plan says swarm.
Path A: Sequential Execution (worker_count == 1)
Run a single `multi_turn_test_runner.py` process. No team needed.

```bash
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios {scenario_file} \
  --agent-id {agent_id} \
  --var '$Context.RoutableId={routable_id}' \
  --var '$Context.CaseId={case_id}' \
  --output {working_dir}/results.json \
  --report-file {working_dir}/report.ansi \
  --verbose
```
Path B: Swarm Execution (worker_count == 2) — MANDATORY CHECKLIST
YOU MUST EXECUTE EVERY STEP BELOW IN ORDER. DO NOT SKIP ANY STEP.
☐ Step 1: Split scenarios into 2 partitions
Group the generated category YAML files into 2 balanced buckets by total scenario count.
Write `{working_dir}/scenarios-part1.yaml` and `{working_dir}/scenarios-part2.yaml`.
Each partition file must be valid YAML with a `scenarios:` key containing its subset.

☐ Step 2: Create team
TeamCreate(team_name="sf-test-{agent_name}")

☐ Step 3: Create 2 tasks (one per partition)
TaskCreate(subject="Run partition 1", description="Execute scenarios-part1.yaml")
TaskCreate(subject="Run partition 2", description="Execute scenarios-part2.yaml")

☐ Step 4: Spawn 2 workers IN PARALLEL (single message with 2 Task tool calls)
Use the Worker Agent Prompt Template below. CRITICAL: Both Task calls MUST be in the SAME message.
Task(subagent_type="general-purpose", team_name="sf-test-{agent_name}", name="worker-1", prompt=WORKER_PROMPT_1)
Task(subagent_type="general-purpose", team_name="sf-test-{agent_name}", name="worker-2", prompt=WORKER_PROMPT_2)

☐ Step 5: Wait for both workers to report (they SendMessage when done)
Do NOT proceed until both workers have sent their results via SendMessage.

☐ Step 6: Aggregate results

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results {working_dir}/worker-1-results.json {working_dir}/worker-2-results.json
```

☐ Step 7: Present unified report to the user

☐ Step 8: Offer fix loop if any failures detected

☐ Step 9: Shutdown workers
SendMessage(type="shutdown_request", recipient="worker-1")
SendMessage(type="shutdown_request", recipient="worker-2")

☐ Step 10: Clean up
TeamDelete
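Step 1's "balanced buckets" split is a classic greedy partition over per-category scenario counts: assign the largest categories first, always to the lighter bucket. A minimal sketch (helper is illustrative, not one of the skill's scripts):

```python
# Greedy 2-bucket partition: process categories largest-first and assign
# each to the bucket with the smaller running total. Illustrative only.
def split_into_two_buckets(category_counts: dict) -> list:
    buckets = [{"total": 0, "categories": []} for _ in range(2)]
    # Sort by count descending, then name, so the split is deterministic.
    for name, count in sorted(category_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(buckets, key=lambda b: b["total"])
        target["categories"].append(name)
        target["total"] += count
    return buckets
```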
Credential Convention (~/.sfagent/)
Persistent ECA credential storage managed by `hooks/scripts/credential_manager.py`.

Directory Structure
```
~/.sfagent/
├── .gitignore          ("*" — auto-created, prevents accidental commits)
├── {Org-Alias}/        (org alias — case-sensitive, e.g. Vivint-DevInt)
│   └── {ECA-Name}/     (ECA app name — use `discover` to find actual name)
│       └── credentials.env
└── Other-Org/
    └── My_ECA/
        └── credentials.env
```

File Format
```env
# credentials.env — managed by credential_manager.py
# 'export' prefix allows direct `source credentials.env` in shell
export SF_MY_DOMAIN=yourdomain.my.salesforce.com
export SF_CONSUMER_KEY=3MVG9...
export SF_CONSUMER_SECRET=ABC123...
```
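When tooling cannot `source` the file (e.g. pure-Python code rather than a shell), the `export KEY=value` format above parses trivially. A minimal sketch (illustrative helper; the skill's own loader is `credential_manager.py`):

```python
# Parse a credentials.env file of `export KEY=value` lines into a dict,
# skipping comments and blank lines. Illustrative only.
def parse_credentials_env(text: str) -> dict:
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export ") and "=" in line:
            key, _, value = line[len("export "):].partition("=")
            creds[key.strip()] = value.strip()
    return creds
```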
undefinedSecurity Rules
| Rule | Implementation |
|---|---|
| Directory permissions | |
| File permissions | |
| Git protection | `~/.sfagent/.gitignore` containing `*` (auto-created) |
| Secret display | NEVER show full secrets — mask all but a short prefix |
| Credential passing | Export as env vars for subprocesses, never write to temp files |
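A masking helper consistent with the secret-display rule might look like the following (the exact display format is an assumption for illustration):

```python
# Mask a secret for display: keep a short prefix, hide the rest.
# The exact mask format is an assumption, not the skill's canonical one.
def mask_secret(secret: str, prefix: int = 5) -> str:
    if len(secret) <= prefix:
        return "*" * len(secret)
    return secret[:prefix] + "*" * 4
```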
CLI Reference
```bash
# Discover orgs and ECAs
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover --org-alias Vivint-DevInt

# Load credentials (secrets masked in output)
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py load --org-alias {org} --eca-name {eca}

# Save new credentials
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py save \
  --org-alias {org} --eca-name {eca} \
  --domain yourdomain.my.salesforce.com \
  --consumer-key 3MVG9... --consumer-secret ABC123...

# Validate OAuth flow
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py validate --org-alias {org} --eca-name {eca}

# Source credentials for shell use (set -a auto-exports all vars)
set -a; source ~/.sfagent/{org}/{eca}/credentials.env; set +a
```

---

Swarm Execution Rules (Native Claude Code Teams)
When `worker_count > 1` in the test plan, use Claude Code's native team orchestration for parallel test execution. When `worker_count == 1`, run sequentially without creating a team.
RULE: Create team via TeamCreate("sf-test-{agent_name}")
RULE: Create one TaskCreate per partition (category or count split)
RULE: Spawn one Task(subagent_type="general-purpose") per worker
RULE: Each worker gets credentials as env vars in its prompt (NEVER in files)
RULE: Wait for all workers to report via SendMessage
RULE: After all workers complete, run rich_test_report.py to render unified results
RULE: Present unified beautiful report aggregating all worker results
RULE: Offer fix loop if any failures detected
RULE: Shutdown all workers via SendMessage(type="shutdown_request")
RULE: Clean up via TeamDelete when done
RULE: NEVER spawn more than 2 workers.
RULE: When categories > 2, group into 2 balanced buckets.
RULE: Queue remaining work to existing workers after they complete first batch.

Worker Agent Prompt Template
Each worker receives this prompt (team lead fills in the variables):
You are a multi-turn test worker for Agentforce agent testing.
YOUR TASK:
1. Claim your task via TaskUpdate(status="in_progress", owner=your_name)
2. Load credentials and run the test:
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {skill_path}/hooks/scripts/multi_turn_test_runner.py \
--scenarios {scenario_file} \
--agent-id {agent_id} \
--var '$Context.RoutableId={routable_id}' \
--var '$Context.CaseId={case_id}' \
--output {working_dir}/worker-{N}-results.json \
--report-file {working_dir}/worker-{N}-report.ansi \
--worker-id {N} --verbose
3. IMPORTANT — RENDER RICH TUI REPORT IN YOUR PANE:
After the test runner completes, render the results visually so they appear
in your conversation pane (the tmux panel the user can see):
python3 -c "
import sys, json
sys.path.insert(0, '{skill_path}/hooks/scripts')
from multi_turn_test_runner import format_results_rich
with open('{working_dir}/worker-{N}-results.json') as f:
results = json.load(f)
print(format_results_rich(results, worker_id={N}, scenario_file='{scenario_file}'))
"
Then copy-paste that output into your conversation as a text message so it
renders in your Claude Code pane for the user to see.
4. Analyze: which scenarios passed, which failed, and WHY
5. SendMessage to team lead with:
- Pass/fail summary (counts + percentages)
- For each failure: scenario name, turn number, what went wrong, suggested fix
- Total execution time
- Any patterns noticed (e.g., "all context_preservation tests failed — may be a systemic issue")
6. Mark your task as completed via TaskUpdate
IMPORTANT:
- If a test fails with an auth error (exit code 2), report it immediately — do NOT retry
- If a test fails with scenario failures (exit code 1), analyze and report all failures
- You CAN communicate with other workers if you discover related issues
- The --report-file flag writes a persistent ANSI report file viewable with `cat` or `bat`
您是Agentforce Agent测试的多轮测试工作进程。
您的任务:
1. 通过TaskUpdate(status="in_progress", owner=your_name)认领任务
2. 加载凭证并运行测试:
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {skill_path}/hooks/scripts/multi_turn_test_runner.py \
--scenarios {scenario_file} \
--agent-id {agent_id} \
--var '$Context.RoutableId={routable_id}' \
--var '$Context.CaseId={case_id}' \
--output {working_dir}/worker-{N}-results.json \
--report-file {working_dir}/worker-{N}-report.ansi \
--worker-id {N} --verbose
3. 重要 — 在您的面板中渲染富文本TUI报告:
测试运行器完成后,可视化渲染结果,使其显示在您的对话面板中(用户可以看到的tmux面板):
python3 -c "
import sys, json
sys.path.insert(0, '{skill_path}/hooks/scripts')
from multi_turn_test_runner import format_results_rich
with open('{working_dir}/worker-{N}-results.json') as f:
results = json.load(f)
print(format_results_rich(results, worker_id={N}, scenario_file='{scenario_file}'))
"
然后将该输出复制粘贴到您的对话中作为文本消息,使其显示在您的Claude Code面板中供用户查看。
4. 分析:哪些场景通过了,哪些失败了,以及原因
5. 向团队负责人SendMessage,包含:
- 通过/失败摘要(数量+百分比)
- 每个失败项:场景名称、轮次、问题所在、建议修复方案
- 总执行时间
- 注意到的任何模式(例如:"所有context_preservation测试都失败了 — 可能是系统性问题")
6. 通过TaskUpdate将您的任务标记为已完成
重要提示:
- 如果测试因认证错误失败(退出代码2),立即报告 — 请勿重试
- 如果测试因场景失败失败(退出代码1),分析并报告所有失败
- 如果发现相关问题,您可以与其他工作进程沟通
- --report-file标志将持久化ANSI报告文件写入磁盘,可使用`cat`或`bat`查看

Partition Strategies
分区策略
| How It Works | Best For |
|---|---|
| One worker per test pattern (topic_routing, context, etc.) | Most runs — natural isolation |
| Split N scenarios evenly across W workers | Large scenario counts |
| Single process, no team | Quick runs, debugging |
| 工作方式 | 最佳使用场景 |
|---|---|
| 每个测试模式(topic_routing、context等)分配一个工作进程 | 大多数运行 — 自然隔离 |
| 将N个场景平均分配给W个工作进程 | 场景数量较多时 |
| 单个进程,不需要团队 | 快速运行、调试 |
Team Lead Aggregation
团队负责人聚合
After all workers report, the team lead:
- Aggregates all worker result JSON files via `rich_test_report.py`:

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results /tmp/sf-test-{session}/worker-*-results.json
```

- Deduplicates any shared failure patterns across workers
- Presents the unified Rich report (colored Panels, Tables, Tree) to the user
- Calculates aggregate scoring across the 7 categories
- Offers fix loop: if failures exist, ask user whether to auto-fix via sf-ai-agentscript
- Shuts down all workers and deletes the team
所有工作进程报告后,团队负责人:
- 聚合所有工作进程的结果JSON文件,通过`rich_test_report.py`:

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results /tmp/sf-test-{session}/worker-*-results.json
```

- 去重跨工作进程的任何共享失败模式
- 向用户显示统一的Rich报告(彩色面板、表格、树形结构)
- 计算7个维度的聚合评分
- 提供修复循环:如果存在失败,询问用户是否通过sf-ai-agentscript自动修复
- 关闭所有工作进程并删除团队
Test Plan File Format
测试计划文件格式
Test plans (`test-plan-{agent}.yaml`) capture the full interview output for reuse. See `templates/test-plan-template.yaml` for the complete schema.
测试计划(`test-plan-{agent}.yaml`)捕获完整的访谈输出以便复用。完整架构请参见`templates/test-plan-template.yaml`。
Key Sections
关键部分
| Section | Purpose |
|---|---|
| Agent name, ID, org alias, timestamps |
| Path to |
| Topics, actions, type — populated by |
| List of YAML scenario files + pattern filters |
| Strategy ( |
| Context variables injected into every session |
| Timeout, retry, verbose, rich output settings |
| 部分 | 用途 |
|---|---|
| Agent名称、ID、组织别名、时间戳 |
| |
| 主题、动作、类型 — 由 |
| YAML场景文件列表 + 模式过滤器 |
| 策略( |
| 注入到每个会话的上下文变量 |
| 超时、重试、详细输出、富文本输出设置 |
Re-Running from a Saved Plan
从保存的计划重新运行
When a user provides a test plan file, skip the interview entirely:
1. Load test-plan-{agent}.yaml
2. Validate credentials: credential_manager.py validate --org-alias {org} --eca-name {eca}
3. If invalid → ask user to update credentials only (skip other interview steps)
4. Load scenario files from plan
5. Apply partition strategy from plan
6. Execute (team or sequential based on worker_count)

This enables rapid re-runs after fixing agent issues — the user just says "re-run" and the skill picks up the saved plan.
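A minimal sketch of what a saved plan might contain — the key names below are illustrative only; see `templates/test-plan-template.yaml` for the authoritative schema:

```yaml
# Hypothetical test-plan-{agent}.yaml — field names are illustrative
agent:
  name: Customer_Support_Agent
  org_alias: my-org
scenarios:
  - templates/multi-turn-comprehensive.yaml
partition:
  strategy: by_pattern
  worker_count: 3
context_variables:
  RoutableId: "0Mwbb000007MGoTCAW"
execution:
  timeout_seconds: 120
  verbose: true
```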
当用户提供测试计划文件时,完全跳过访谈:
1. 加载test-plan-{agent}.yaml
2. 验证凭证:credential_manager.py validate --org-alias {org} --eca-name {eca}
3. 如果无效 → 仅询问用户更新凭证(跳过其他访谈步骤)
4. 从计划中加载场景文件
5. 应用计划中的分区策略
6. 执行(根据worker_count选择团队或顺序执行)

这使得修复Agent问题后可以快速重新运行 — 用户只需说"重新运行",技能就会使用保存的计划。
Phase A: Multi-Turn API Testing (PRIMARY)
阶段A:多轮API测试(主流程)
⚠️ NEVER use `curl` for OAuth token validation. Domains containing `--` (e.g., `my-org--devint.sandbox.my.salesforce.com`) cause shell expansion failures with curl's argument parsing. Use `credential_manager.py validate` instead.
⚠️ 绝不要使用`curl`进行OAuth令牌验证。包含`--`的域名(例如`my-org--devint.sandbox.my.salesforce.com`)会导致curl的参数解析出现shell扩展失败。请改用`credential_manager.py validate`。
A1: ECA Credential Setup
A1:ECA凭证设置
Why ECA? Multi-turn API testing uses the Agent Runtime API (`/einstein/ai-agent/v1`), which requires OAuth Client Credentials. If you only need interactive testing, use `sf agent preview` (v2.121.7+) instead — no ECA needed, just `sf org login web`. See connected-app-setup.md.
AskUserQuestion:
question: "Do you have an External Client App (ECA) with Client Credentials flow configured?"
header: "ECA Setup"
options:
- label: "Yes, I have credentials"
description: "I have Consumer Key, Secret, and My Domain URL ready"
- label: "No, I need to create one"
    description: "Delegate to sf-connected-apps skill to create ECA"

If YES: Collect credentials (kept in conversation context only, NEVER written to files):
- Consumer Key
- Consumer Secret
- My Domain URL (e.g., `your-domain.my.salesforce.com`)
If NO: Delegate to sf-connected-apps:
Skill(skill="sf-connected-apps", args="Create External Client App with Client Credentials flow for Agent Runtime API testing. Scopes: api, chatbot_api, sfap_api, refresh_token, offline_access. Name: Agent_API_Testing")

Verify credentials work:
为什么需要ECA? 多轮API测试使用Agent Runtime API(`/einstein/ai-agent/v1`),需要OAuth客户端凭证。如果只需要交互式测试,请改用`sf agent preview`(v2.121.7+)— 不需要ECA,只需`sf org login web`。请参见connected-app-setup.md。
AskUserQuestion:
question: "您是否配置了带有客户端凭证流的外部客户端应用(ECA)?"
header: "ECA设置"
options:
- label: "是,我有凭证"
description: "我已准备好消费者密钥、密钥和我的域名URL"
- label: "否,我需要创建一个"
    description: "委托给sf-connected-apps技能创建ECA"

如果是: 收集凭证(仅保存在对话上下文中,绝不写入文件):
- 消费者密钥
- 消费者密码(Consumer Secret)
- 我的域名URL(例如`your-domain.my.salesforce.com`)
如果否: 委托给sf-connected-apps:
Skill(skill="sf-connected-apps", args="Create External Client App with Client Credentials flow for Agent Runtime API testing. Scopes: api, chatbot_api, sfap_api, refresh_token, offline_access. Name: Agent_API_Testing")

验证凭证是否有效:
```bash
# Validate OAuth credentials via credential_manager.py (handles token request internally)
# 通过credential_manager.py验证OAuth凭证(内部处理令牌请求)
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py \
  validate --org-alias {org} --eca-name {eca}
```

See [ECA Setup Guide](docs/eca-setup-guide.md) for complete instructions.
完整说明请参见[ECA设置指南](docs/eca-setup-guide.md)。

A2: Agent Discovery & Metadata Retrieval
A2:Agent发现与元数据检索
```bash
# Get agent ID for API calls
# 获取API调用使用的Agent ID
AGENT_ID=$(sf data query --use-tooling-api \
  --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE DeveloperName='[AgentName]' AND IsActive=true LIMIT 1" \
  --result-format json --target-org [alias] | jq -r '.result.records[0].Id')
```
```bash
# Retrieve full agent configuration
# 检索完整的Agent配置
sf project retrieve start \
  --metadata "GenAiPlannerBundle:[AgentName]" \
  --output-dir retrieve-temp --target-org [alias]
```

Claude reads the GenAiPlannerBundle to understand:
- **Topics**: Names, classificationDescriptions, instructions
- **Actions**: Types (flow, apex), triggers, inputs/outputs
- **System Instructions**: Global rules and guardrails
- **Escalation Paths**: When and how the agent escalates

This metadata drives automatic test scenario generation in A3.
Claude读取GenAiPlannerBundle以了解:
- **主题**:名称、classificationDescriptions、指令
- **动作**:类型(flow、apex)、触发器、输入/输出
- **系统指令**:全局规则和防护规则
- **升级路径**:何时以及如何升级
此元数据驱动A3中的自动测试场景生成。

A3: Test Scenario Planning
A3:测试场景规划
AskUserQuestion:
question: "What testing do you need?"
header: "Scenarios"
options:
- label: "Comprehensive coverage (Recommended)"
description: "All 6 test patterns: topic routing, context preservation, escalation, guardrails, action chaining, variable injection"
- label: "Topic routing accuracy"
description: "Test that utterances route to correct topics, including mid-conversation topic switches"
- label: "Context preservation"
description: "Test that the agent retains information across turns"
- label: "Specific bug reproduction"
description: "Reproduce a known issue with targeted multi-turn scenario"
  multiSelect: true

Claude uses the agent metadata from A2 to auto-generate multi-turn scenarios tailored to the specific agent:
- Generates topic switching scenarios based on actual topic names
- Creates context preservation tests using actual action inputs/outputs
- Builds escalation tests based on actual escalation configuration
- Creates guardrail tests based on system instructions
Available templates (see templates/):
| Pattern | Scenarios |
|---|---|
| Topic switching | 4 |
| Context retention | 4 |
| Escalation cascades | 4 |
| All 6 patterns | 6 |
AskUserQuestion:
question: "您需要哪种测试?"
header: "场景"
options:
- label: "全面覆盖(推荐)"
description: "所有6种测试模式:主题路由、上下文保留、升级、防护规则、动作链、变量注入"
- label: "主题路由准确性"
description: "测试语句是否路由到正确的主题,包括对话中途的主题切换"
- label: "上下文保留"
description: "测试Agent在多轮对话中保留信息的能力"
- label: "特定错误重现"
description: "使用针对性的多轮场景重现已知问题"
  multiSelect: true

Claude使用A2中的Agent元数据自动生成针对特定Agent的多轮场景:
- 根据实际主题名称生成主题切换场景
- 使用实际动作输入/输出创建上下文保留测试
- 根据实际升级配置构建升级测试
- 根据系统指令创建防护规则测试
可用模板(请参见templates/):
| 模式 | 场景数量 |
|---|---|
| 主题切换 | 4 |
| 上下文保留 | 4 |
| 升级流程 | 4 |
| 所有6种模式 | 6 |
A4: Multi-Turn Execution
A4:多轮执行
Execute conversations via Agent Runtime API using the reusable Python scripts in `hooks/scripts/`.
⚠️ Agent API is NOT supported for agents of type "Agentforce (Default)". Only custom agents created via Agentforce Builder are supported.
Option 1: Run Test Scenarios from YAML Templates (Recommended)
Use the multi-turn test runner to execute entire scenario suites:
通过Agent Runtime API使用`hooks/scripts/`中的可复用Python脚本执行对话。
⚠️ Agent API不支持"Agentforce(默认)"类型的Agent。仅支持通过Agentforce Builder创建的自定义Agent。
选项1:从YAML模板运行测试场景(推荐)
使用多轮测试运行器执行整个场景套件:
```bash
# Run comprehensive test suite against an agent
# 对Agent运行全面测试套件
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```
```bash
# Run specific scenario within a suite
# 运行套件中的特定场景
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --scenario-filter topic_switch_natural \
  --verbose
```
```bash
# With context variables and JSON output for fix loop
# 带上下文变量和JSON输出用于修复循环
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --var '$Context.EndUserLanguage=en_US' \
  --output results.json \
  --verbose
```
**Exit codes:** `0` = all passed, `1` = some failed (fix loop should process), `2` = execution error
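The fix loop can branch on these exit codes. A minimal sketch — the helper name and the returned strings are illustrative, not part of the runner:

```python
# Hypothetical helper: map the runner's documented exit codes
# (0 = all passed, 1 = scenario failures, 2 = execution error)
# to a fix-loop decision.
def next_step(exit_code: int) -> str:
    if exit_code == 0:
        return "report_success"      # all scenarios passed
    if exit_code == 1:
        return "enter_fix_loop"      # analyze failures, delegate to sf-ai-agentscript
    return "abort_and_report"        # auth/execution error — do NOT retry


print(next_step(1))  # → enter_fix_loop
```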
**Option 2: Use Environment Variables (cleaner for repeated runs)**
```bash
export SF_MY_DOMAIN="your-domain.my.salesforce.com"
export SF_CONSUMER_KEY="your_key"
export SF_CONSUMER_SECRET="your_secret"
export SF_AGENT_ID="0XxRM0000004ABC"
```

```bash
# Now run without credential flags
# 现在运行时不需要凭证标志
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```

**退出代码:** `0` = 全部通过, `1` = 部分失败(修复循环应处理), `2` = 执行错误
**选项2:使用环境变量(重复运行更简洁)**

```bash
export SF_MY_DOMAIN="your-domain.my.salesforce.com"
export SF_CONSUMER_KEY="your_key"
export SF_CONSUMER_SECRET="your_secret"
export SF_AGENT_ID="0XxRM0000004ABC"
# 现在运行时不需要凭证标志
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```
**Option 3: Python API for Ad-Hoc Testing**
For custom scenarios or debugging, use the client directly:
```python
from hooks.scripts.agent_api_client import AgentAPIClient
client = AgentAPIClient(
my_domain="your-domain.my.salesforce.com",
consumer_key="...",
consumer_secret="..."
)
**选项3:Python API用于临时测试**
对于自定义场景或调试,直接使用客户端:
```python
from hooks.scripts.agent_api_client import AgentAPIClient
client = AgentAPIClient(
my_domain="your-domain.my.salesforce.com",
consumer_key="...",
consumer_secret="..."
)

Context manager auto-ends session
上下文管理器自动结束会话
with client.session(agent_id="0XxRM000...") as session:
r1 = session.send("I need to cancel my appointment")
print(f"Turn 1: {r1.agent_text}")
r2 = session.send("Actually, reschedule instead")
print(f"Turn 2: {r2.agent_text}")
r3 = session.send("What was my original request?")
print(f"Turn 3: {r3.agent_text}")
# Check context preservation
if "cancel" in r3.agent_text.lower():
    print("✅ Context preserved")

with client.session(agent_id="0XxRM000...") as session:
r1 = session.send("我需要取消我的预约")
print(f"第1轮:{r1.agent_text}")
r2 = session.send("实际上,改为重新安排")
print(f"第2轮:{r2.agent_text}")
r3 = session.send("我最初的请求是什么?")
print(f"第3轮:{r3.agent_text}")
# 检查上下文保留
if "取消" in r3.agent_text.lower():
    print("✅ 上下文已保留")

With initial variables
带初始变量
variables = [
{"name": "$Context.AccountId", "type": "Id", "value": "001XXXXXXXXXXXX"},
{"name": "$Context.EndUserLanguage", "type": "Text", "value": "en_US"},
]
with client.session(agent_id="0Xx...", variables=variables) as session:
r1 = session.send("What orders do I have?")
**Connectivity Test:**
variables = [
{"name": "$Context.AccountId", "type": "Id", "value": "001XXXXXXXXXXXX"},
{"name": "$Context.EndUserLanguage", "type": "Text", "value": "en_US"},
]
with client.session(agent_id="0Xx...", variables=variables) as session:
r1 = session.send("我有哪些订单?")
**连通性测试:**
```bash
# Verify ECA credentials and API connectivity
# 验证ECA凭证和API连通性
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
# Reads SF_MY_DOMAIN, SF_CONSUMER_KEY, SF_CONSUMER_SECRET from env
# 从环境变量读取SF_MY_DOMAIN、SF_CONSUMER_KEY、SF_CONSUMER_SECRET
```
**Per-Turn Analysis Checklist:**
The test runner automatically evaluates each turn against expectations defined in the YAML template:
| # | Check | YAML Key | How Evaluated |
|---|-------|----------|---------------|
| 1 | Response non-empty? | `response_not_empty: true` | `messages[0].message` has content |
| 2 | Correct topic matched? | `topic_contains: "cancel"` | Heuristic: inferred from response text |
| 3 | Expected actions invoked? | `action_invoked: true` | Checks for `result` array entries |
| 4 | Response content? | `response_contains: "reschedule"` | Substring match on response |
| 5 | Context preserved? | `context_retained: true` | Heuristic: checks for prior-turn references |
| 6 | Guardrail respected? | `guardrail_triggered: true` | Regex patterns for refusal language |
| 7 | Escalation triggered? | `escalation_triggered: true` | Checks for `Escalation` message type |
| 8 | Response excludes? | `response_not_contains: "error"` | Substring exclusion check |
See [Agent API Reference](docs/agent-api-reference.md) for complete response format.
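Putting those keys together, a single scenario turn might look like the sketch below. The expectation keys come from the checklist above; the surrounding wrapper structure (`expect`, the list item shape) is illustrative, not the runner's exact schema:

```yaml
# Illustrative turn using the expectation keys from the table above
- utterance: "Actually, cancel my appointment instead"
  expect:
    response_not_empty: true
    topic_contains: "cancel"
    response_contains: "appointment"
    response_not_contains: "error"
    context_retained: true
```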
**每轮分析检查清单:**
测试运行器会根据YAML模板中定义的期望自动评估每一轮:
| # | 检查项 | YAML键 | 评估方式 |
|---|-------|----------|---------------|
| 1 | 响应非空? | `response_not_empty: true` | `messages[0].message`包含内容 |
| 2 | 匹配正确的主题? | `topic_contains: "cancel"` | 启发式:从响应文本推断 |
| 3 | 调用了预期的动作? | `action_invoked: true` | 检查`result`数组条目 |
| 4 | 响应内容? | `response_contains: "reschedule"` | 响应中的子字符串匹配 |
| 5 | 上下文已保留? | `context_retained: true` | 启发式:检查对前一轮的引用 |
| 6 | 遵守防护规则? | `guardrail_triggered: true` | 拒绝语言的正则表达式模式 |
| 7 | 触发升级? | `escalation_triggered: true` | 检查`Escalation`消息类型 |
| 8 | 响应不包含? | `response_not_contains: "error"` | 子字符串排除检查 |
完整响应格式请参见[Agent API参考](docs/agent-api-reference.md)。

A5: Results & Scoring
A5:结果与评分
Claude generates a terminal-friendly results report:
📊 MULTI-TURN TEST RESULTS
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Org: vivint-DevInt
Mode: Agent Runtime API (multi-turn)
SCENARIO RESULTS
───────────────────────────────────────────────────────────────
✅ topic_switch_natural 3/3 turns passed
✅ context_user_identity 3/3 turns passed
❌ escalation_frustration 2/3 turns passed (Turn 3: no escalation)
✅ guardrail_mid_conversation 3/3 turns passed
✅ action_chain_identify 3/3 turns passed
⚠️ variable_injection 2/3 turns passed (Turn 3: re-asked for account)
SUMMARY
───────────────────────────────────────────────────────────────
Scenarios: 6 total | 4 passed | 1 failed | 1 partial
Turns: 18 total | 16 passed | 2 failed
Topic Re-matching: 100% ✅
Context Preservation: 83% ⚠️
Escalation Accuracy: 67% ❌
FAILED TURNS
───────────────────────────────────────────────────────────────
❌ escalation_frustration → Turn 3
Input: "Nothing is working! I need a human NOW"
Expected: Escalation triggered
Actual: Agent continued troubleshooting
Category: MULTI_TURN_ESCALATION_FAILURE
Fix: Add frustration keywords to escalation triggers
⚠️ variable_injection → Turn 3
Input: "Create a new case for a billing issue"
Expected: Uses pre-set $Context.AccountId
Actual: "Which account is this for?"
Category: CONTEXT_PRESERVATION_FAILURE
Fix: Wire $Context.AccountId to CreateCase action input
SCORING
───────────────────────────────────────────────────────────────
Topic Selection Coverage 13/15
Action Invocation 14/15
Multi-Turn Topic Re-matching 15/15 ✅
Context Preservation 10/15 ⚠️
Edge Case & Guardrail Coverage 12/15
Test Spec / Scenario Quality 9/10
Agentic Fix Success --/15 (pending)
TOTAL: 73/85 (86%) + Fix Loop pending

Claude生成适合终端显示的结果报告:
📊 多轮测试结果
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
组织: vivint-DevInt
模式: Agent Runtime API(多轮)
场景结果
───────────────────────────────────────────────────────────────
✅ topic_switch_natural 3/3轮通过
✅ context_user_identity 3/3轮通过
❌ escalation_frustration 2/3轮通过(第3轮:未升级)
✅ guardrail_mid_conversation 3/3轮通过
✅ action_chain_identify 3/3轮通过
⚠️ variable_injection 2/3轮通过(第3轮:重新询问账户)
摘要
───────────────────────────────────────────────────────────────
场景: 共6个 | 通过4个 | 失败1个 | 部分通过1个
轮次: 共18轮 | 通过16轮 | 失败2轮
主题重匹配: 100% ✅
上下文保留: 83% ⚠️
升级准确性: 67% ❌
失败轮次
───────────────────────────────────────────────────────────────
❌ escalation_frustration → 第3轮
输入: "什么都不管用!我现在需要人工服务!"
预期: 触发升级
实际: Agent继续故障排除
类别: MULTI_TURN_ESCALATION_FAILURE
修复: 向升级触发器添加沮丧关键词
⚠️ variable_injection → 第3轮
输入: "为账单问题创建新案例"
预期: 使用预设的$Context.AccountId
实际: "这是哪个账户的问题?"
类别: CONTEXT_PRESERVATION_FAILURE
修复: 将$Context.AccountId连接到CreateCase动作输入
评分
───────────────────────────────────────────────────────────────
主题选择覆盖率 13/15
动作调用 14/15
多轮主题重匹配 15/15 ✅
上下文保留 10/15 ⚠️
边缘案例与防护规则覆盖率 12/15
测试用例/场景质量 9/10
Agent自动修复成功率 --/15 (待处理)
总计: 73/85 (86%) + 修复循环待处理

Phase B: CLI Testing Center (SECONDARY)
阶段B:CLI测试中心(副流程)
Availability: Requires Agent Testing Center feature enabled in org. If unavailable, use Phase A exclusively.
可用性: 需要组织中启用Agent测试中心功能。 如果不可用,请仅使用阶段A。
⚡ Agent Script Agents (AiAuthoringBundle)
⚡ Agent Script Agent(AiAuthoringBundle)
Agent Script agents (`.agent` files in `aiAuthoringBundles/`) deploy as `BotDefinition` and use the same `sf agent test` CLI commands. However, they have unique testing challenges:
Two-Level Action System:
- Level 1 (Definition): `topic.actions:` block — defines actions with `target: "apex://ClassName"`
- Level 2 (Invocation): `reasoning.actions:` block — invokes via `@actions.<name>` with variable bindings
Single-Utterance Limitation:
Multi-topic Agent Script agents with `start_agent` routing have a "1 action per reasoning cycle" budget in CLI tests. The first cycle is consumed by the transition action (`go_<topic>`). The actual business action (e.g., `get_order_status`) fires in a second cycle that single-utterance tests don't reach.
Solution — Use `conversationHistory`:
yaml
testCases:
# ROUTING TEST — captures transition action only
- utterance: "I want to check my order status"
expectedTopic: order_status
expectedActions:
- go_order_status # Transition action from start_agent
# ACTION TEST — use conversationHistory to skip routing
- utterance: "The order ID is 801ak00001g59JlAAI"
conversationHistory:
- role: "user"
message: "I want to check my order status"
- role: "agent"
topic: "order_status" # Pre-positions agent in target topic
message: "I'd be happy to help! Could you provide the Order ID?"
expectedTopic: order_status
expectedActions:
- get_order_status # Level 1 DEFINITION name (NOT invocation name)
  expectedOutcome: "Agent retrieves and displays order details"

Key Rules for Agent Script CLI Tests:
- `expectedActions` uses the Level 1 definition name (e.g., `get_order_status`), NOT the Level 2 invocation name (e.g., `check_status`)
- Agent Script topic names may differ in org — use the topic name discovery workflow
- Agents with Apex `WITH USER_MODE` require the Einstein Agent User to have object permissions — missing permissions cause silent failures (0 rows, no error)
- `subjectName` in the YAML spec maps to `config.developer_name` in the `.agent` file
⚠️ Agent Script API Testing Caveat:
Agent Script agents embed action results differently via the Agent Runtime API:
- Agent Builder agents: Return separate `ActionResult` message types with structured data
- Agent Script agents: Embed action outputs within `Inform` text messages — no separate `ActionResult` type
This means:
- `action_invoked: true` (boolean) may fail even when the action runs — use `response_contains` to verify action output instead
- `action_invoked: "action_name"` uses `plannerSurfaces` fallback parsing but is less reliable
- For robust testing, prefer `response_contains` / `response_contains_any` checks over `action_invoked`
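For Agent Script agents, then, a turn expectation might lean on text checks rather than action flags. A sketch — the key names follow the guidance above, while the wrapper structure is illustrative:

```yaml
# Illustrative: prefer text checks over action_invoked for Agent Script agents
- utterance: "Check order 801ak00001g59JlAAI"
  expect:
    response_contains_any:
      - "order status"
      - "shipped"
    response_not_contains: "error"
    # action_invoked omitted — unreliable for Agent Script (plannerSurfaces fallback)
```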
Agent Script Templates & Docs:
- Template: agentscript-test-spec.yaml — 5 test patterns (CLI)
- Template: multi-turn-agentscript-comprehensive.yaml — 6 multi-turn API scenarios
- Guide: agentscript-testing-patterns.md — detailed patterns with worked examples
Automated Test Spec Generation:
bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
--output tests/agent-spec.yaml --verbose

Agent Script Agent(`aiAuthoringBundles/`中的`.agent`文件)部署为`BotDefinition`,并使用相同的`sf agent test` CLI命令。但是,它们有独特的测试挑战:
两级动作系统:
- 第1级(定义):`topic.actions:`块 — 定义带有`target: "apex://ClassName"`的动作
- 第2级(调用):`reasoning.actions:`块 — 通过`@actions.<name>`调用并绑定变量
单轮语句限制:
带有`start_agent`路由的多主题Agent Script Agent在CLI测试中每个推理周期有"1个动作"的预算。第一个周期被过渡动作(`go_<topic>`)消耗。实际业务动作(例如`get_order_status`)在单轮测试无法到达的第二个周期触发。
解决方案 — 使用`conversationHistory`:
yaml
testCases:
# 路由测试 — 仅捕获过渡动作
- utterance: "我想查看我的订单状态"
expectedTopic: order_status
expectedActions:
- go_order_status # 来自start_agent的过渡动作
# 动作测试 — 使用conversationHistory跳过路由
- utterance: "订单ID是801ak00001g59JlAAI"
conversationHistory:
- role: "user"
message: "我想查看我的订单状态"
- role: "agent"
topic: "order_status" # 将Agent预先定位到目标主题
    message: "我很乐意为您提供帮助!能否提供订单ID?"
expectedTopic: order_status
expectedActions:
- get_order_status # 第1级定义名称(不是调用名称)
  expectedOutcome: "Agent检索并显示订单详情"

Agent Script CLI测试关键规则:
- `expectedActions`使用第1级定义名称(例如`get_order_status`),而不是第2级调用名称(例如`check_status`)
- Agent Script主题名称在组织中可能不同 — 使用主题名称发现工作流
- 带有Apex `WITH USER_MODE`的Agent需要Einstein Agent User具有对象权限 — 缺少权限会导致静默失败(0行,无错误)
- YAML规范中的`subjectName`映射到`.agent`文件中的`config.developer_name`
⚠️ Agent Script API测试注意事项:
Agent Script Agent通过Agent Runtime API嵌入动作结果的方式不同:
- Agent Builder Agent:返回带有结构化数据的独立`ActionResult`消息类型
- Agent Script Agent:在`Inform`文本消息中嵌入动作输出 — 没有独立的`ActionResult`类型
这意味着:
- `action_invoked: true`(布尔值)即使动作运行也可能失败 — 改用`response_contains`验证动作输出
- `action_invoked: "action_name"`使用`plannerSurfaces`回退解析,但可靠性较低
- 为了稳健测试,优先使用`response_contains` / `response_contains_any`检查而不是`action_invoked`
Agent Script模板与文档:
- 模板:agentscript-test-spec.yaml — 5种测试模式(CLI)
- 模板:multi-turn-agentscript-comprehensive.yaml — 6种多轮API场景
- 指南:agentscript-testing-patterns.md — 带实际示例的详细模式
自动测试规范生成:
bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml --verbose

Generates both routing tests (with transition actions) and
生成路由测试(带过渡动作)和
action tests (with conversationHistory for apex:// targets)
动作测试(带针对apex://目标的conversationHistory)
**Agent Discovery:**
**Agent发现:**

```bash
# Discover Agent Script agents alongside XML-based agents
# 与基于XML的Agent一起发现Agent Script Agent
python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local \
  --project-dir /path/to/project --agent-name MyAgent
# Returns type: "AiAuthoringBundle" for .agent files
# 对于.agent文件返回type: "AiAuthoringBundle"
```

B1: Test Spec Creation
B1:测试规范创建
⚠️ CRITICAL: YAML Schema
The CLI YAML spec uses a FLAT structure parsed by `@salesforce/agents` — NOT the fabricated `apiVersion`/`kind`/`metadata` format.
See test-spec-guide.md for the correct schema.
Required top-level fields:
- `name:` — Display name (MasterLabel). Deploy FAILS without this.
- `subjectType: AGENT`
- `subjectName:` — Agent BotDefinition DeveloperName

Test case fields (flat, NOT nested):
- `utterance:` — User message
- `expectedTopic:` — NOT `expectation.topic`
- `expectedActions:` — Flat list of strings, NOT objects with `name`/`invoked`/`outputs`
- `expectedOutcome:` — Optional natural language description
⚠️ 关键:YAML架构
CLI YAML规范使用扁平结构,由`@salesforce/agents`解析 — 不是虚构的`apiVersion`/`kind`/`metadata`格式。
正确架构请参见test-spec-guide.md。
必填顶级字段:
- `name:` — 显示名称(MasterLabel)。没有此字段部署会失败。
- `subjectType: AGENT`
- `subjectName:` — Agent BotDefinition DeveloperName

测试用例字段(扁平,非嵌套):
- `utterance:` — 用户消息
- `expectedTopic:` — 不是`expectation.topic`
- `expectedActions:` — 扁平字符串列表,不是带有`name`/`invoked`/`outputs`的对象
- `expectedOutcome:` — 可选自然语言描述
✅ Correct CLI YAML format
✅ 正确的CLI YAML格式

```yaml
name: "My Agent Tests"
subjectType: AGENT
subjectName: My_Agent
testCases:
  - utterance: "Where is my order?"
    expectedTopic: order_lookup
    expectedActions:
      - get_order_status
    expectedOutcome: "Agent should provide order status information"
```
```yaml
name: "我的Agent测试"
subjectType: AGENT
subjectName: My_Agent
testCases:
  - utterance: "我的订单在哪里?"
    expectedTopic: order_lookup
    expectedActions:
      - get_order_status
    expectedOutcome: "Agent应提供订单状态信息"
```

**Option A: Interactive Generation** (no automation)
**选项A:交互式生成**(无自动化)
```bash
# Interactive test spec generation
# 交互式测试规范生成
sf agent generate test-spec --output-file ./tests/agent-spec.yaml
# ⚠️ NOTE: No --api-name flag! Interactive-only.
# ⚠️ 注意:没有--api-name标志!仅支持交互式。
```
**Option B: Automated Generation** (Python script)
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
--output tests/agent-spec.yaml \
  --verbose
```

Create Test in Org:

```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [alias]
```

See Test Spec Reference for complete YAML format guide.
**选项B:自动生成**(Python脚本)
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
--output tests/agent-spec.yaml \
  --verbose
```

在组织中创建测试:

```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [别名]
```

完整YAML格式指南请参见测试规范参考。
B1.5: Topic Name Resolution
B1.5:主题名称解析
Topic name format in `expectedTopic` depends on the topic type:
| Topic Type | YAML Value | Resolution |
|---|---|---|
| Standard (Escalation, Off_Topic) | Short name | Framework resolves automatically |
| Promoted (p_16j... prefix) | Full runtime `developerName` with hash | Must be exact match |
Standard topics like `Escalation` can use the short name — the CLI framework resolves to the hash-suffixed runtime name.
Promoted topics (custom topics created in Setup UI) MUST use the full runtime `developerName` including hash suffix. The short `localDeveloperName` does NOT resolve.
Discovery workflow:
- Write spec with best guesses for topic names
- Deploy and run:
sf agent test run --api-name X --wait 10 --result-format json --json - Extract actual names:
jq '.result.testCases[].generatedData.topic' - Update spec with actual runtime names
- Re-deploy with `--force-overwrite` and re-run
See topic-name-resolution.md for the complete guide.
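Step 3 of the workflow above can also be scripted. A sketch that mirrors the jq path, assuming the `--json` output shape `result.testCases[].generatedData.topic`:

```python
import json

# Extract actual runtime topic names from `sf agent test run ... --json` output,
# mirroring: jq '.result.testCases[].generatedData.topic'
def actual_topics(run_json: str) -> list[str]:
    data = json.loads(run_json)
    return [tc["generatedData"]["topic"] for tc in data["result"]["testCases"]]


sample = '{"result": {"testCases": [{"generatedData": {"topic": "p_16jPl_Order_16jabc"}}]}}'
print(actual_topics(sample))  # → ['p_16jPl_Order_16jabc']
```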
`expectedTopic`中的主题名称格式取决于主题类型:
| 主题类型 | YAML值 | 解析方式 |
|---|---|---|
| 标准(Escalation、Off_Topic) | 短名称 | 框架自动解析 |
| 推广(p_16j...前缀) | 带哈希的完整运行时`developerName` | 必须完全匹配 |
标准主题如`Escalation`可以使用短名称 — CLI框架会解析为带哈希后缀的运行时名称。
推广主题(在设置UI中创建的自定义主题)必须使用包含哈希后缀的完整运行时`developerName`。短`localDeveloperName`无法解析。
发现工作流:
- 使用主题名称的最佳猜测编写规范
- 部署并运行:
sf agent test run --api-name X --wait 10 --result-format json --json - 提取实际名称:
jq '.result.testCases[].generatedData.topic' - 使用实际运行时名称更新规范
- 使用`--force-overwrite`重新部署并重新运行
完整指南请参见topic-name-resolution.md。
B1.6: Known CLI Gotchas
B1.6:已知CLI陷阱
| Gotcha | Detail |
|---|---|
| Deploy fails: "Required fields are missing: [MasterLabel]" |
| |
Empty | Means "not testing" — PASS even when actions invoked |
Missing | |
| No MessagingSession context | Flows needing |
| Always use |
contextVariables | Use |
| customEvaluations RETRY bug | ⚠️ Spring '26: Server returns RETRY → REST API 500. See Known Issues. |
| Returns score=0, empty explanation — platform bug |
| Labels FAILURE even at score=1 — use score value, ignore label |
| 陷阱 | 详情 |
|---|---|
| 部署失败:"Required fields are missing: [MasterLabel]" | 在YAML规范顶部添加 `name:` 字段 |
| 空 `expectedActions` | 表示"不测试" — 即使调用动作也会通过 |
| 无MessagingSession上下文 | 需要真实记录ID的Flow会收到主题名称作为 `recordId` — 注入 `contextVariables` |
| `--use-most-recent` 未实现 | 始终明确使用 `--job-id` |
| `contextVariables` 名称 | 使用裸变量名称 — CLI会添加 `$Context.` 前缀 |
| customEvaluations RETRY错误 | ⚠️ Spring '26: 服务器返回RETRY → REST API 500。请参见已知问题。 |
| `conciseness` 指标 | 返回score=0,空explanation — 平台错误 |
| `instruction_following` 标签 | 即使score=1也标记为FAILURE — 使用分数值,忽略标签 |
B1.7: Context Variables
B1.7:上下文变量
Context variables inject session-level data (record IDs, user info) into CLI test cases. Without them, action flows receive the topic's internal name as `recordId`. With them, they receive a real record ID.

When to use: Any test case where action flows need real record IDs (e.g., updating a MessagingSession, creating a Case).

YAML syntax:

```yaml
contextVariables:
  - name: RoutableId   # Bare name — NOT $Context.RoutableId
    value: "0Mwbb000007MGoTCAW"
  - name: CaseId
    value: "500XX0000000001"
```

Key rules:
- `name` uses the bare variable name (e.g., `RoutableId`), NOT `$Context.RoutableId` — the CLI adds the prefix
- Maps to `<contextVariable>` with `<variableName>` / `<variableValue>` in the XML metadata

Discovery — find valid IDs:

```bash
sf data query --query "SELECT Id FROM MessagingSession WHERE Status='Active' LIMIT 1" --target-org [alias]
sf data query --query "SELECT Id FROM Case ORDER BY CreatedDate DESC LIMIT 1" --target-org [alias]
```

Verified effect (IRIS testing, 2026-02-09):
- Without `RoutableId`: action receives `recordId: "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"` (the topic name)
- With `RoutableId`: action receives `recordId: "0Mwbb000007MGoTCAW"` (a real MessagingSession ID)

Note: Context variables do NOT unlock authentication-gated topics. Injecting `RoutableId` + `CaseId` does not satisfy `User_Authentication` flows.

See context-vars-test-spec.yaml for a dedicated template.
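The YAML-to-XML mapping described here can be sketched in a few lines. A stdlib-only illustration (the `<contextVariable>`, `<variableName>`, and `<variableValue>` element names are the ones this section documents; the enclosing `testCase` element name is an assumption):

```python
import xml.etree.ElementTree as ET

def context_vars_to_xml(pairs: list[dict]) -> str:
    """Render contextVariables YAML entries as the XML elements they map to."""
    root = ET.Element("testCase")
    for pair in pairs:
        cv = ET.SubElement(root, "contextVariable")
        ET.SubElement(cv, "variableName").text = pair["name"]   # bare name, no $Context. prefix
        ET.SubElement(cv, "variableValue").text = pair["value"]
    return ET.tostring(root, encoding="unicode")

xml = context_vars_to_xml([
    {"name": "RoutableId", "value": "0Mwbb000007MGoTCAW"},
    {"name": "CaseId", "value": "500XX0000000001"},
])
print(xml)
```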
上下文变量将会话级数据(记录ID、用户信息)注入到CLI测试用例中。没有这些变量,动作流会将主题的内部名称作为 `recordId`。有了这些变量,它们会收到真实的记录ID。

使用场景: 任何动作流需要真实记录ID的测试用例(例如更新MessagingSession、创建案例)。

YAML语法:

```yaml
contextVariables:
  - name: RoutableId   # 裸名称 — 不是$Context.RoutableId
    value: "0Mwbb000007MGoTCAW"
  - name: CaseId
    value: "500XX0000000001"
```

关键规则:
- `name` 使用裸变量名称(例如 `RoutableId`),而不是 `$Context.RoutableId` — CLI会添加前缀
- 映射到XML元数据中的 `<contextVariable>` 及其 `<variableName>` / `<variableValue>`

发现 — 查找有效ID:

```bash
sf data query --query "SELECT Id FROM MessagingSession WHERE Status='Active' LIMIT 1" --target-org [别名]
sf data query --query "SELECT Id FROM Case ORDER BY CreatedDate DESC LIMIT 1" --target-org [别名]
```

已验证效果(IRIS测试,2026-02-09):
- 没有 `RoutableId`:动作收到 `recordId: "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"`(主题名称)
- 有 `RoutableId`:动作收到 `recordId: "0Mwbb000007MGoTCAW"`(真实MessagingSession ID)

注意: 上下文变量不会解锁需要认证的主题。注入 `RoutableId` + `CaseId` 不满足 `User_Authentication` 流程。

专用模板请参见context-vars-test-spec.yaml。
B1.8: Metrics
B1.8:指标
Metrics add platform quality scoring to test cases. Specify as a flat list of metric names in the YAML.
YAML syntax:

```yaml
metrics:
  - coherence
  - instruction_following
  - output_latency_milliseconds
```

Available metrics (observed behavior from IRIS testing, 2026-02-09):

| Metric | Score Range | Status | Notes |
|---|---|---|---|
| `coherence` | 1-5 | ✅ Works | Scores 4-5 for clear responses. Recommended. |
| `completeness` | 1-5 | ⚠️ Misleading | Penalizes triage/routing agents for "not solving" — skip for routing agents. |
| `conciseness` | 1-5 | 🔴 Broken | Returns score=0, empty explanation. Platform bug. |
| `instruction_following` | 0-1 | ⚠️ Threshold bug | Labels "FAILURE" at score=1 when explanation says "follows perfectly." |
| `output_latency_milliseconds` | Raw ms | ✅ Works | No pass/fail — useful for performance baselining. |

Recommendation: Use `coherence` + `output_latency_milliseconds` for baseline quality. Skip `conciseness` (broken) and `completeness` (misleading for routing agents).
指标为测试用例添加平台质量评分。在YAML中指定为扁平的指标名称列表。

YAML语法:

```yaml
metrics:
  - coherence
  - instruction_following
  - output_latency_milliseconds
```

可用指标(IRIS测试观察到的行为,2026-02-09):

| 指标 | 评分范围 | 状态 | 说明 |
|---|---|---|---|
| `coherence` | 1-5 | ✅ 可用 | 清晰响应评4-5分,推荐使用 |
| `completeness` | 1-5 | ⚠️ 有误导性 | 因"未解决问题"而惩罚分诊/路由Agent — 路由Agent跳过此指标 |
| `conciseness` | 1-5 | 🔴 已损坏 | 始终返回score=0,空explanation。平台错误。 |
| `instruction_following` | 0-1 | ⚠️ 阈值错误 | 即使score=1且说明文本说Agent"完全遵循指令"也标记为"FAILURE"。 |
| `output_latency_milliseconds` | 原始毫秒 | ✅ 可用 | 无通过/失败 — 用于性能基准测试 |

推荐: 使用 `coherence` + `output_latency_milliseconds` 作为基准质量。跳过 `conciseness`(已损坏)和 `completeness`(对路由Agent有误导性)。

B1.9: Custom Evaluations (⚠️ Spring '26 Bug)
B1.9:自定义评估(⚠️ Spring '26错误)
Custom evaluations allow JSONPath-based assertions on action inputs and outputs — e.g., "verify the action received `supportPath = 'Field Support'`."

YAML syntax:

```yaml
customEvaluations:
  - label: "supportPath is Field Support"
    name: string_comparison
    parameters:
      - name: operator
        value: equals
        isReference: false
      - name: actual
        value: "$.generatedData.invokedActions[0][0].function.input.supportPath"
        isReference: true   # JSONPath resolved against generatedData
      - name: expected
        value: "Field Support"
        isReference: false
```

Evaluation types:
- `string_comparison`: `equals`, `contains`, `startswith`, `endswith`
- `numeric_comparison`: `equals`, `greater_than`, `less_than`, `greater_than_or_equal`, `less_than_or_equal`

Building JSONPath expressions:
- Run tests with `--verbose` to see `generatedData.invokedActions`
- Parse the stringified JSON (it's `"[[{...}]]"`, not a parsed array)
- Common paths: `$.generatedData.invokedActions[0][0].function.input.[field]`

⚠️ BLOCKED — Spring '26 Platform Bug: Custom evaluations with `isReference: true` cause the server to return "RETRY" status. The results API crashes with `INTERNAL_SERVER_ERROR`. This is server-side (confirmed via direct `curl`). Workaround: Use `expectedOutcome` (LLM-as-judge) or the Testing Center UI until patched.

See custom-eval-test-spec.yaml for a dedicated template.
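While the RETRY bug blocks server-side `isReference` evaluations, the same assertion can be run client-side against `--verbose` output. A sketch (the sample payload is illustrative; note that `invokedActions` arrives as a stringified `[[{...}]]` array, as described above):

```python
import json

def action_input(generated_data: dict, field: str):
    """Resolve $.generatedData.invokedActions[0][0].function.input.<field> client-side."""
    actions = json.loads(generated_data["invokedActions"])  # stringified "[[{...}]]"
    return actions[0][0]["function"]["input"][field]

# Illustrative generatedData fragment shaped like --verbose output
generated = {"invokedActions": json.dumps(
    [[{"function": {"input": {"supportPath": "Field Support"}}}]]
)}
assert action_input(generated, "supportPath") == "Field Support"
print("client-side assertion passed")
```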
自定义评估允许对动作输入和输出进行基于JSONPath的断言 — 例如"验证动作收到 `supportPath = 'Field Support'`"。

YAML语法:

```yaml
customEvaluations:
  - label: "supportPath为Field Support"
    name: string_comparison
    parameters:
      - name: operator
        value: equals
        isReference: false
      - name: actual
        value: "$.generatedData.invokedActions[0][0].function.input.supportPath"
        isReference: true   # 针对generatedData解析JSONPath
      - name: expected
        value: "Field Support"
        isReference: false
```

评估类型:
- `string_comparison`:`equals`、`contains`、`startswith`、`endswith`
- `numeric_comparison`:`equals`、`greater_than`、`less_than`、`greater_than_or_equal`、`less_than_or_equal`

构建JSONPath表达式:
- 使用 `--verbose` 运行测试以查看 `generatedData.invokedActions`
- 解析字符串化的JSON(是 `"[[{...}]]"`,不是解析后的数组)
- 常见路径:`$.generatedData.invokedActions[0][0].function.input.[field]`

⚠️ 已阻止 — Spring '26平台错误: 带有 `isReference: true` 的自定义评估导致服务器返回"RETRY"状态。结果API崩溃并显示 `INTERNAL_SERVER_ERROR`。这是服务器端问题(通过直接 `curl` 确认)。解决方法: 在修复前使用 `expectedOutcome`(LLM作为判断者)或测试中心UI。

专用模板请参见custom-eval-test-spec.yaml。
B2: Test Execution
B2:测试执行
Run automated tests
运行自动化测试

```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [alias]
```

> **No ECA required.** Preview uses standard org auth (`sf org login web`). No Connected App setup needed (v2.121.7+).

**Interactive Preview (Simulated):**
```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [alias]
```

**Interactive Preview (Live):**
```bash
sf agent preview --api-name AgentName --use-live-actions --apex-debug --target-org [alias]
```

```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [别名]
```

> **不需要ECA**。预览使用标准组织认证(`sf org login web`)。不需要连接应用设置(v2.121.7+)。

**交互式预览(模拟):**
```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [别名]
```

**交互式预览(真实):**
```bash
sf agent preview --api-name AgentName --use-live-actions --apex-debug --target-org [别名]
```

B3: Results Analysis
B3:结果分析
Parse test results JSON and display formatted summary:
📊 AGENT TEST RESULTS (CLI)
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Org: vivint-DevInt
Duration: 45.2s
Mode: Simulated
SUMMARY
───────────────────────────────────────────────────────────────
✅ Passed: 18
❌ Failed: 2
⏭️ Skipped: 0
📈 Topic Selection: 95%
🎯 Action Invocation: 90%
FAILED TESTS
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
Utterance: "What's the status of orders 12345 and 67890?"
Expected: get_order_status invoked 2 times
Actual: get_order_status invoked 1 time
Category: ACTION_INVOCATION_COUNT_MISMATCH
COVERAGE SUMMARY
───────────────────────────────────────────────────────────────
Topics Tested: 4/5 (80%) ⚠️
Actions Tested: 6/8 (75%) ⚠️
Guardrails Tested: 3/3 (100%) ✅

解析测试结果JSON并显示格式化摘要:
📊 AGENT测试结果(CLI)
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
组织: vivint-DevInt
持续时间: 45.2s
模式: 模拟
摘要
───────────────────────────────────────────────────────────────
✅ 通过: 18
❌ 失败: 2
⏭️ 跳过: 0
📈 主题选择: 95%
🎯 动作调用: 90%
失败测试
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
语句: "订单12345和67890的状态是什么?"
预期: get_order_status调用2次
实际: get_order_status调用1次
类别: ACTION_INVOCATION_COUNT_MISMATCH
覆盖率摘要
───────────────────────────────────────────────────────────────
已测试主题: 4/5 (80%) ⚠️
已测试动作: 6/8 (75%) ⚠️
已测试防护规则: 3/3 (100%) ✅

Phase C: Agentic Fix Loop
阶段C:Agent自动修复循环
When tests fail (either Phase A or Phase B), automatically fix via sf-ai-agentscript:
当测试失败时(阶段A或B),通过sf-ai-agentscript自动修复:
Failure Categories (10 total)
故障类别(共10种)
| Category | Source | Auto-Fix | Strategy |
|---|---|---|---|
| A+B | ✅ | Add keywords to topic description |
| A+B | ✅ | Improve action description |
| A+B | ✅ | Differentiate descriptions |
| A+B | ⚠️ | Delegate to sf-flow or sf-apex |
| A+B | ✅ | Add explicit guardrails |
| A+B | ✅ | Add escalation action/triggers |
| A | ✅ | Add transition phrases to target topic |
| A | ✅ | Add context retention instructions |
| A | ✅ | Add frustration detection triggers |
| A | ✅ | Fix action output variable mappings |
| 类别 | 来源 | 自动修复 | 策略 |
|---|---|---|---|
| A+B | ✅ | 向主题描述添加关键词 |
| A+B | ✅ | 改进动作描述 |
| A+B | ✅ | 区分描述 |
| A+B | ⚠️ | 委托给sf-flow或sf-apex |
| A+B | ✅ | 添加明确的防护规则 |
| A+B | ✅ | 添加升级动作/触发器 |
| A | ✅ | 向目标主题添加过渡短语 |
| A | ✅ | 添加上下文保留指令 |
| A | ✅ | 添加沮丧检测触发器 |
| A | ✅ | 修复动作输出变量映射 |
Auto-Fix Command Example
自动修复命令示例
```bash
Skill(skill="sf-ai-agentscript", args="Fix agent [AgentName] - Error: [category] - [details]")
```

```bash
Skill(skill="sf-ai-agentscript", args="修复Agent [AgentName] - 错误: [category] - [详情]")
```

Fix Loop Flow
修复循环流程
Test Failed → Analyze failure category
│
├─ Single-turn failure → Standard fix (topics, actions, guardrails)
│
└─ Multi-turn failure → Enhanced fix (context, re-matching, escalation, chaining)
│
▼
Apply fix via sf-ai-agentscript → Re-publish → Re-test
│
├─ Pass → ✅ Move to next failure
└─ Fail → Retry (max 3 attempts) → Escalate to human

See Agentic Fix Loops Guide for complete decision tree and 10 fix strategies.
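The loop above can be expressed as a small driver. A sketch with stubbed `run_tests` / `apply_fix` callables (hypothetical names; the real workflow shells out to the test runner and delegates fixes via `Skill(skill="sf-ai-agentscript", ...)`):

```python
MAX_ATTEMPTS = 3

def fix_loop(run_tests, apply_fix, max_attempts: int = MAX_ATTEMPTS) -> str:
    """Run tests, apply a fix per failing category, and re-test up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        failures = run_tests()      # list of failure-category strings
        if not failures:
            return "PASS"
        for category in failures:
            apply_fix(category)     # delegated to sf-ai-agentscript in the real workflow
    return "ESCALATE_TO_HUMAN"

# Stub: first run fails on topic matching, second run passes
results = iter([["TOPIC_MISMATCH"], []])
fixed = []
outcome = fix_loop(lambda: next(results), fixed.append)
print(outcome, fixed)  # → PASS ['TOPIC_MISMATCH']
```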
测试失败 → 分析故障类别
│
├─ 单轮失败 → 标准修复(主题、动作、防护规则)
│
└─ 多轮失败 → 增强修复(上下文、重匹配、升级、链式调用)
│
▼
通过sf-ai-agentscript应用修复 → 重新发布 → 重新测试
│
├─ 通过 → ✅ 处理下一个失败
└─ 失败 → 重试(最多3次) → 升级给人工

完整决策树和10种修复策略请参见Agent自动修复循环指南。
Two Fix Strategies
两种修复策略
| Agent Type | Fix Strategy | When to Use |
|---|---|---|
| Custom Agent (you control it) | Fix the agent via sf-ai-agentscript | Topic descriptions, action configs need adjustment |
| Managed/Standard Agent | Fix test expectations | Test expectations don't match actual behavior |
| Agent类型 | 修复策略 | 使用场景 |
|---|---|---|
| 自定义Agent(您控制它) | 通过 sf-ai-agentscript 修复Agent | 主题描述、动作配置需要调整 |
| 托管/标准Agent | 修复测试期望 | 测试期望与实际行为不匹配 |
Phase D: Coverage Improvement
阶段D:覆盖率提升
If coverage < threshold:
- Identify untested topics/actions/patterns from results
- Add test cases (YAML for CLI, scenarios for API)
- Re-run tests
- Repeat until threshold met
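Step 1 amounts to a set difference over the agent's defined topics/actions versus those seen in results. A minimal sketch (the topic names are illustrative):

```python
def coverage_gaps(defined: set[str], tested: set[str]) -> tuple[float, set[str]]:
    """Return (coverage ratio, untested items) for one dimension (topics or actions)."""
    untested = defined - tested
    ratio = len(defined & tested) / len(defined) if defined else 1.0
    return ratio, untested

topics_defined = {"Billing", "Escalation", "Order_Status", "Returns", "Off_Topic"}
topics_tested = {"Billing", "Escalation", "Order_Status", "Returns"}
ratio, missing = coverage_gaps(topics_defined, topics_tested)
print(f"Topic coverage: {ratio:.0%}, untested: {sorted(missing)}")  # → 80%, ['Off_Topic']
```

Each item in `missing` then gets a new test case (YAML for CLI, scenario for API) before the next run.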
如果覆盖率<阈值:
- 从结果中识别未测试的主题/动作/模式
- 添加测试用例(CLI使用YAML,API使用场景)
- 重新运行测试
- 重复直到达到阈值
Coverage Dimensions
覆盖率维度
| Dimension | Phase A | Phase B | Target |
|---|---|---|---|
| Topic Selection | ✅ | ✅ | 100% |
| Action Invocation | ✅ | ✅ | 100% |
| Topic Re-matching | ✅ | ❌ | 90%+ |
| Context Preservation | ✅ | ❌ | 95%+ |
| Conversation Completion | ✅ | ❌ | 85%+ |
| Guardrails | ✅ | ✅ | 100% |
| Escalation | ✅ | ✅ | 100% |
| Phrasing Diversity | ✅ | ✅ | 3+ per topic |
See Coverage Analysis for complete metrics and improvement guide.
| 维度 | 阶段A | 阶段B | 目标 |
|---|---|---|---|
| 主题选择 | ✅ | ✅ | 100% |
| 动作调用 | ✅ | ✅ | 100% |
| 主题重匹配 | ✅ | ❌ | 90%+ |
| 上下文保留 | ✅ | ❌ | 95%+ |
| 对话完成 | ✅ | ❌ | 85%+ |
| 防护规则 | ✅ | ✅ | 100% |
| 升级 | ✅ | ✅ | 100% |
| 措辞多样性 | ✅ | ✅ | 每个主题3+种 |
完整指标和改进指南请参见覆盖率分析。
Phase E: Observability Integration
阶段E:可观测性集成
After test execution, guide user to analyze agent behavior with session-level observability:
Skill(skill="sf-ai-agentforce-observability", args="Analyze STDM sessions for agent [AgentName] in org [alias] - focus on test session behavior patterns")

What observability adds to testing:
- STDM Session Analysis: Examine actual session traces from test conversations
- Latency Profiling: Identify slow actions or topic routing delays
- Error Pattern Detection: Find recurring failures across sessions
- Action Execution Traces: Detailed view of Flow/Apex execution during tests
测试执行后,引导用户使用会话级可观测性分析Agent行为:
Skill(skill="sf-ai-agentforce-observability", args="分析组织[别名]中Agent[AgentName]的STDM会话 — 关注测试会话行为模式")

可观测性为测试增添的价值:
- STDM会话分析: 检查测试对话的实际会话跟踪
- 延迟分析: 识别慢动作或主题路由延迟
- 错误模式检测: 发现跨会话的重复失败
- 动作执行跟踪: 测试期间Flow/Apex执行的详细视图
Scoring System (100 Points)
评分系统(100分)
| Category | Points | Key Rules |
|---|---|---|
| Topic Selection Coverage | 15 | All topics have test cases; various phrasings tested |
| Action Invocation | 15 | All actions tested with valid inputs/outputs |
| Multi-Turn Topic Re-matching | 15 | Topic switching accuracy across turns |
| Context Preservation | 15 | Information retention across turns |
| Edge Case & Guardrail Coverage | 15 | Negative tests; guardrails; escalation |
| Test Spec / Scenario Quality | 10 | Proper YAML; descriptions; clear expectations |
| Agentic Fix Success | 15 | Auto-fixes resolve issues within 3 attempts |
Scoring Thresholds:
⭐⭐⭐⭐⭐ 90-100 pts → Production Ready
⭐⭐⭐⭐ 80-89 pts → Good, minor improvements
⭐⭐⭐ 70-79 pts → Acceptable, needs work
⭐⭐ 60-69 pts → Below standard
⭐ <60 pts → BLOCKED - Major issues

| 类别 | 分数 | 关键规则 |
|---|---|---|
| 主题选择覆盖率 | 15 | 所有主题都有测试用例;测试多种措辞 |
| 动作调用 | 15 | 所有动作都使用有效输入/输出测试 |
| 多轮主题重匹配 | 15 | 多轮对话中主题切换的准确性 |
| 上下文保留 | 15 | 多轮对话中信息的保留 |
| 边缘案例与防护规则覆盖率 | 15 | 负面测试;防护规则;升级 |
| 测试用例/场景质量 | 10 | 正确的YAML;描述;清晰的期望 |
| Agent自动修复成功率 | 15 | 自动修复在3次尝试内解决问题 |
评分阈值:
⭐⭐⭐⭐⭐ 90-100分 → 可用于生产
⭐⭐⭐⭐ 80-89分 → 良好,需小幅改进
⭐⭐⭐ 70-79分 → 可接受,需要改进
⭐⭐ 60-69分 → 低于标准
⭐ <60分 → 已阻止 - 存在重大问题

⛔ TESTING GUARDRAILS (MANDATORY)
⛔ 测试防护规则(必填)
BEFORE running tests, verify:
| Check | Command | Why |
|---|---|---|
| Agent published | | Can't test unpublished agent |
| Agent activated | Check status | API and preview require activation |
| Flows deployed | | Actions need Flows |
| ECA configured (Phase A — multi-turn API only) | Token request test | Required for Agent Runtime API. Not needed for preview or CLI tests |
| Org auth (Phase B live) | | Live mode requires valid auth |
NEVER do these:
| Anti-Pattern | Problem | Correct Pattern |
|---|---|---|
| Test unpublished agent | Tests fail silently | Publish first |
| Skip simulated testing | Live mode hides logic bugs | Always test simulated first |
| Ignore guardrail tests | Security gaps in production | Always test harmful/off-topic inputs |
| Single phrasing per topic | Misses routing failures | Test 3+ phrasings per topic |
| Write ECA credentials to files | Security risk | Keep in shell variables only |
| Skip session cleanup | Resource leaks and rate limits | Always DELETE sessions after tests |
Use | Domains with | Use |
| Ask permission to run skill scripts | Breaks flow, unnecessary delay | All |
| Spawn more than 2 swarm workers | Context overload, screen space, diminishing returns | Max 2 workers — side-by-side monitoring |
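The session-cleanup rule can be enforced structurally with try/finally. A sketch against a stand-in client (`FakeRuntimeClient` is hypothetical; the skill's `agent_api_client.py` exposes a context manager that provides the same guarantee):

```python
class FakeRuntimeClient:
    """Stand-in for an Agent Runtime API client, tracking open sessions."""
    def __init__(self):
        self.open_sessions = []
    def create_session(self, agent_id: str) -> str:
        sid = f"session-{len(self.open_sessions) + 1}"
        self.open_sessions.append(sid)
        return sid
    def delete_session(self, session_id: str) -> None:
        self.open_sessions.remove(session_id)

def with_session(client, agent_id, body):
    """Guarantee the DELETE even if the test body raises."""
    sid = client.create_session(agent_id)
    try:
        return body(sid)
    finally:
        client.delete_session(sid)  # always clean up: avoids leaks and rate limits

client = FakeRuntimeClient()
try:
    with_session(client, "0XxRM000...", lambda sid: 1 / 0)  # test body fails mid-run
except ZeroDivisionError:
    pass
print("open sessions after failure:", client.open_sessions)  # → []
```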
运行测试前,请验证:
| 检查项 | 命令 | 原因 |
|---|---|---|
| Agent已发布 | | 无法测试未发布的Agent |
| Agent已激活 | 检查状态 | API和预览需要激活 |
| Flows已部署 | | 动作需要Flows |
| ECA已配置(阶段A — 仅多轮API测试) | 令牌请求测试 | Agent Runtime API需要。预览或CLI测试不需要 |
| 组织认证(阶段B真实模式) | | 真实模式需要有效认证 |
绝不要做这些:
| 反模式 | 问题 | 正确模式 |
|---|---|---|
| 测试未发布的Agent | 测试静默失败 | 先发布 |
| 跳过模拟测试 | 真实模式隐藏逻辑错误 | 始终先测试模拟模式 |
| 忽略防护规则测试 | 生产中存在安全漏洞 | 始终测试有害/离题输入 |
| 每个主题仅使用一种措辞 | 遗漏路由失败 | 每个主题测试3+种措辞 |
| 将ECA凭证写入文件 | 安全风险 | 仅保存在shell变量中 |
| 跳过会话清理 | 资源泄漏和速率限制 | 测试后始终DELETE会话 |
使用 | 包含 | 使用 |
| 请求运行技能脚本的权限 | 中断流程,不必要的延迟 | 所有 |
| 生成超过2个集群工作进程 | 上下文过载、屏幕空间不足、收益递减 | 最多2个工作进程 — 并排监控 |
CLI Command Reference
CLI命令参考
Test Lifecycle Commands
测试生命周期命令
| Command | Purpose | Example |
|---|---|---|
| `sf agent generate test-spec` | Create test YAML | |
| `sf agent test create` | Deploy test to org | |
| `sf agent test run` | Execute tests | |
| `sf agent test results` | Get results | |
| `sf agent test resume` | Resume async test | |
| `sf agent test list` | List test runs | |
| 命令 | 用途 | 示例 |
|---|---|---|
| `sf agent generate test-spec` | 创建测试YAML | |
| `sf agent test create` | 将测试部署到组织 | |
| `sf agent test run` | 执行测试 | |
| `sf agent test results` | 获取结果 | |
| `sf agent test resume` | 恢复异步测试 | |
| `sf agent test list` | 列出测试运行 | |
Preview Commands
预览命令
| Command | Purpose | Example |
|---|---|---|
| `sf agent preview` | Interactive testing | |
| `--use-live-actions` | Use real Flows/Apex | |
| `--output-dir` | Save transcripts | |
| `--apex-debug` | Capture debug logs | |
| 命令 | 用途 | 示例 |
|---|---|---|
| `sf agent preview` | 交互式测试 | |
| `--use-live-actions` | 使用真实Flows/Apex | |
| `--output-dir` | 保存转录 | |
| `--apex-debug` | 捕获调试日志 | |
Result Formats
结果格式
| Format | Use Case | Flag |
|---|---|---|
| `human` | Terminal display (default) | `--result-format human` |
| `json` | CI/CD parsing | `--result-format json` |
| `junit` | Test reporting | `--result-format junit` |
| `tap` | Test Anything Protocol | `--result-format tap` |
| 格式 | 使用场景 | 标志 |
|---|---|---|
| `human` | 终端显示(默认) | `--result-format human` |
| `json` | CI/CD解析 | `--result-format json` |
| `junit` | 测试报告 | `--result-format junit` |
| `tap` | Test Anything Protocol(TAP) | `--result-format tap` |
Multi-Turn Test Templates
多轮测试模板
| Template | Pattern | Scenarios | Location |
|---|---|---|---|
| Topic switching | 4 | |
| Context retention | 4 | |
| Escalation cascades | 4 | |
| All 6 patterns | 6 | |
| 模板 | 模式 | 场景数量 | 位置 |
|---|---|---|---|
| 主题切换 | 4 | |
| 上下文保留 | 4 | |
| 升级流程 | 4 | |
| 所有6种模式 | 6 | |
CLI Test Templates
CLI测试模板
| Template | Purpose | Location |
|---|---|---|
| Quick start (3-5 tests) | |
| Full coverage (20+ tests) with context vars, metrics, custom evals | |
| Context variable patterns (RoutableId, EndUserId, CaseId) | |
| Custom evaluations with JSONPath assertions (⚠️ Spring '26 bug) | |
| Auth gate, guardrail, ambiguous routing, session tests (CLI) | |
| Security/safety scenarios | |
| Human handoff scenarios | |
| Agent Script agents with conversationHistory pattern | |
| Reference format | |
| 模板 | 用途 | 位置 |
|---|---|---|
| 快速入门(3-5个测试) | |
| 全面覆盖(20+个测试),带上下文变量、指标、自定义评估 | |
| 上下文变量模式(RoutableId、EndUserId、CaseId) | |
| 带JSONPath断言的自定义评估(⚠️ Spring '26错误) | |
| 认证门、防护规则、模糊路由、会话测试(CLI) | |
| 安全/安全场景 | |
| 人工交接场景 | |
| 带conversationHistory模式的Agent Script Agent | |
| 参考格式 | |
Cross-Skill Integration
跨技能集成
Required Delegations:
| Scenario | Skill to Call | Command |
|---|---|---|
| Fix agent script | sf-ai-agentscript | |
| Agent Script agents | sf-ai-agentscript | Parse |
| Create test data | sf-data | |
| Fix failing Flow | sf-flow | |
| Setup ECA or OAuth (multi-turn API only) | sf-connected-apps | |
| Analyze debug logs | sf-debug | |
| Session observability | sf-ai-agentforce-observability | |
必填委托:
| 场景 | 要调用的技能 | 命令 |
|---|---|---|
| 修复Agent脚本 | sf-ai-agentscript | |
| Agent Script Agent | sf-ai-agentscript | 解析 |
| 创建测试数据 | sf-data | |
| 修复失败的Flow | sf-flow | |
| 设置ECA或OAuth(仅多轮API测试) | sf-connected-apps | |
| 分析调试日志 | sf-debug | |
| 会话可观测性 | sf-ai-agentforce-observability | |
Automated Testing (Python Scripts)
自动化测试(Python脚本)
| Script | Purpose | Dependencies |
|---|---|---|
| `agent_api_client.py` | Reusable Agent Runtime API v1 client (auth, sessions, messaging, variables) | stdlib only |
| `multi_turn_test_runner.py` | Multi-turn test orchestrator (reads YAML, executes, evaluates, Rich colored reports) | pyyaml, rich + agent_api_client |
| | Aggregate N worker result JSONs into one unified Rich terminal report | rich |
| `generate-test-spec.py` | Parse .agent files, generate CLI test YAML specs | stdlib only |
| `run-automated-tests.py` | Orchestrate full CLI test workflow with fix suggestions | stdlib only |
CLI Flags (multi_turn_test_runner.py):
| Flag | Default | Purpose |
|---|---|---|
| none | Write Rich terminal report to file (ANSI codes included) — viewable with |
| off | Disable Rich colored output; use plain-text format |
| auto | Override terminal width (auto-detects from $COLUMNS; fallback 80) |
| (deprecated) | No-op — Rich is now default when installed |
Multi-Turn Testing (Agent Runtime API):
| 脚本 | 用途 | 依赖项 |
|---|---|---|
| `agent_api_client.py` | 可复用的Agent Runtime API v1客户端(认证、会话、消息、变量) | 仅标准库 |
| `multi_turn_test_runner.py` | 多轮测试编排器(读取YAML、执行、评估、Rich彩色报告) | pyyaml、rich + agent_api_client |
| | 将N个工作进程的结果JSON聚合为一个统一的Rich终端报告 | rich |
| `generate-test-spec.py` | 解析.agent文件,生成CLI测试YAML规范 | 仅标准库 |
| `run-automated-tests.py` | 编排完整的CLI测试工作流并提供修复建议 | 仅标准库 |
CLI标志(multi_turn_test_runner.py):
| 标志 | 默认值 | 用途 |
|---|---|---|
| 无 | 将Rich终端报告写入文件(包含ANSI代码) — 可使用 |
| 关闭 | 禁用Rich彩色输出;使用纯文本格式 |
| 自动 | 覆盖终端宽度(从$COLUMNS自动检测;回退80) |
| (已弃用) | 无操作 — 现在安装后默认使用Rich |
多轮测试(Agent Runtime API):
Install test runner dependency
安装测试运行器依赖
pip3 install pyyaml
pip3 install pyyaml
Run multi-turn test suite against an agent
对Agent运行多轮测试套件
```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain your-domain.my.salesforce.com \
  --consumer-key YOUR_KEY \
  --consumer-secret YOUR_SECRET \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```
Or set env vars and omit credential flags
或设置环境变量并省略凭证标志
```bash
export SF_MY_DOMAIN=your-domain.my.salesforce.com
export SF_CONSUMER_KEY=YOUR_KEY
export SF_CONSUMER_SECRET=YOUR_SECRET
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --verbose
```
Connectivity test (verify ECA credentials work)
连通性测试(验证ECA凭证是否有效)
```bash
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```

**CLI Testing (Agent Testing Center):**

```bash
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```

**CLI测试(Agent测试中心):**

Generate test spec from agent file
从Agent文件生成测试规范
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output specs/Agent-tests.yaml
```
Run full automated workflow
运行完整的自动化工作流
```bash
python3 {SKILL_PATH}/hooks/scripts/run-automated-tests.py \
  --agent-name MyAgent \
  --agent-dir /path/to/project \
  --target-org dev
```

---

🔄 Automated Test-Fix Loop
🔄 自动化测试修复循环
v2.0.0 | Supports both multi-turn API failures and CLI test failures
v2.0.0 | 支持多轮API失败和CLI测试失败
Quick Start
快速开始
Run the test-fix loop (CLI tests)
运行测试修复循环(CLI测试)

```bash
{SKILL_PATH}/hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3
```
Exit codes:
退出代码:
0 = All tests passed
0 = 所有测试通过
1 = Fixes needed (Claude Code should invoke sf-ai-agentforce)
1 = 需要修复(Claude Code应调用sf-ai-agentforce)
2 = Max attempts reached, escalate to human
2 = 达到最大尝试次数,升级给人工
3 = Error (org unreachable, test not found, etc.)
3 = 错误(组织不可达,未找到测试等)
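A caller can branch on these exit codes like so. A sketch (the action strings paraphrase the orchestration described here; the real loop dispatches to the skill and to a human escalation path):

```python
def next_step(exit_code: int) -> str:
    """Map test-fix-loop.sh exit codes to the orchestration action described above."""
    return {
        0: "done: all tests passed",
        1: "invoke sf-ai-agentforce to apply fixes",
        2: "max attempts reached: escalate to human",
        3: "abort: environment error (org unreachable, test not found)",
    }.get(exit_code, "abort: unknown exit code")

for code in (0, 1, 2, 3):
    print(code, "→", next_step(code))
```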
Claude Code Integration
Claude Code集成
USER: Run automated test-fix loop for Coral_Cloud_Agent
CLAUDE CODE:
1. Phase A: Run multi-turn scenarios via Python test runner
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
--agent-id ${AGENT_ID} \
--scenarios templates/multi-turn-comprehensive.yaml \
--output results.json --verbose
2. Analyze failures from results.json (10 categories)
3. If fixable: Skill(skill="sf-ai-agentscript", args="Fix...")
4. Re-run failed scenarios with --scenario-filter
5. Phase B (if available): Run CLI tests
6. Repeat until passing or max retries (3)

用户: 为Coral_Cloud_Agent运行自动化测试修复循环
Claude Code:
1. 阶段A:通过Python测试运行器运行多轮场景
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
--agent-id ${AGENT_ID} \
--scenarios templates/multi-turn-comprehensive.yaml \
--output results.json --verbose
2. 从results.json分析失败(10种类别)
3. 如果可修复:Skill(skill="sf-ai-agentscript", args="修复...")
4. 使用--scenario-filter重新运行失败的场景
5. 阶段B(如果可用):运行CLI测试
6. 重复直到通过或达到最大重试次数(3次)

Environment Variables
环境变量
| Variable | Description | Default |
|---|---|---|
| Current attempt number | 1 |
| Timeout for test execution | 10 |
| Comma-separated test names to skip | (none) |
| Enable detailed output | false |
| 变量 | 描述 | 默认值 |
|---|---|---|
| 当前尝试次数 | 1 |
| 测试执行超时 | 10 |
| 要跳过的测试名称(逗号分隔) | 无 |
| 启用详细输出 | false |
💡 Key Insights
💡 关键见解
| Problem | Symptom | Solution |
|---|---|---|
| Deploy fails | "Required fields are missing: [MasterLabel]" | Add a top-level `name:` field to the YAML spec |
| Tests fail silently | No results returned | Agent not published - run |
| Topic not matched | Wrong topic selected | Add keywords to topic description |
| Action not invoked | Action never called | Improve action description |
| Live preview 401 | Authentication error | Re-authenticate: |
| API 401 | Token expired or wrong credentials | Re-authenticate ECA |
| API 404 on session create | Wrong Agent ID | Re-query BotDefinition for correct Id |
| Empty API response | Agent not activated | Activate and publish agent |
| Context lost between turns | Agent re-asks for known info | Add context retention instructions to topic |
| Topic doesn't switch | Agent stays on old topic | Add transition phrases to target topic |
| ⚠️ `--use-most-recent` | "Nonexistent flag" error | Use `--job-id` explicitly |
| Topic name mismatch | Expected short name, got hash-suffixed runtime name | Verify actual topic names from first test run |
| Action superset matching | Expected exact action set, extra actions still PASS | CLI uses SUPERSET logic |
| 问题 | 症状 | 解决方案 |
|---|---|---|
| 部署失败 | "Required fields are missing: [MasterLabel]" | 在YAML规范顶部添加 `name:` 字段 |
| 测试静默失败 | 无结果返回 | Agent未发布 - 运行 |
| 主题未匹配 | 选择了错误的主题 | 向主题描述添加关键词 |
| 动作未调用 | 从未调用动作 | 改进动作描述 |
| 实时预览401 | 认证错误 | 重新认证: |
| API 401 | 令牌过期或凭证错误 | 重新认证ECA |
| API创建会话404 | 错误的Agent ID | 重新查询BotDefinition获取正确的Id |
| API响应为空 | Agent未激活 | 激活并发布Agent |
| 多轮对话中上下文丢失 | Agent重新询问已知信息 | 向主题添加上下文保留指令 |
| 主题不切换 | Agent停留在旧主题 | 向目标主题添加过渡短语 |
| ⚠️ `--use-most-recent` | "Nonexistent flag"错误 | 明确使用 `--job-id` |
| 主题名称不匹配 | 预期短名称,实际为带哈希后缀的运行时名称 | 从第一次测试运行验证实际主题名称 |
| 动作超集匹配 | 预期精确动作集合,多余动作也会通过 | CLI使用超集逻辑 |
Quick Start Example
快速开始示例
Multi-Turn API Testing (Recommended)
多轮API测试(推荐)
Quick Start with Python Scripts:
使用Python脚本快速开始:

1. Get agent ID
1. 获取Agent ID
```bash
AGENT_ID=$(sf data query --use-tooling-api \
  --query "SELECT Id FROM BotDefinition WHERE DeveloperName='My_Agent' AND IsActive=true LIMIT 1" \
  --result-format json --target-org dev | jq -r '.result.records[0].Id')
```
2. Run multi-turn tests (credentials from env or flags)
2. 运行多轮测试(凭证来自环境变量或标志)
```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```
**Ad-Hoc Python Usage:**
```python
from hooks.scripts.agent_api_client import AgentAPIClient

client = AgentAPIClient()  # reads SF_MY_DOMAIN, SF_CONSUMER_KEY, SF_CONSUMER_SECRET from env
with client.session(agent_id="0XxRM000...") as session:
    r1 = session.send("I need to cancel my appointment")
    r2 = session.send("Actually, reschedule it instead")
    r3 = session.send("What was my original request about?")
# Session auto-ends when exiting context manager
```

```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```
**临时Python使用:**
```python
from hooks.scripts.agent_api_client import AgentAPIClient

client = AgentAPIClient()  # 从环境变量读取SF_MY_DOMAIN、SF_CONSUMER_KEY、SF_CONSUMER_SECRET
with client.session(agent_id="0XxRM000...") as session:
    r1 = session.send("我需要取消我的预约")
    r2 = session.send("实际上,改为重新安排")
    r3 = session.send("我最初的请求是什么?")
# 退出上下文管理器时自动结束会话
```

CLI Testing (If Agent Testing Center Available)
CLI测试(如果Agent测试中心可用)
1. Generate test spec
1. 生成测试规范
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file ./agents/MyAgent.agent \
  --output ./tests/myagent-tests.yaml
```
2. Create test in org
2. 在组织中创建测试
```bash
sf agent test create --spec ./tests/myagent-tests.yaml --api-name MyAgentTest --target-org dev
```
3. Run tests
3. 运行测试
```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org dev
```
4. View results (use --job-id, NOT --use-most-recent)
4. 查看结果(使用--job-id,不要使用--use-most-recent)
```bash
sf agent test results --job-id [JOB_ID] --verbose --result-format json --target-org dev
```

---

🐛 Known Issues & CLI Bugs
🐛 已知问题与CLI错误
Last Updated: 2026-02-11 | Tested With: sf CLI v2.118.16+
最后更新: 2026-02-11 | 测试版本: sf CLI v2.118.16+
RESOLVED: `sf agent test create` MasterLabel Error
已解决:`sf agent test create` MasterLabel错误

Status: 🟢 RESOLVED — Add a `name:` field to the YAML spec

Error:
Required fields are missing: [MasterLabel]

Root Cause: The YAML spec must include a `name:` field at the top level, which maps to `MasterLabel` in the `AiEvaluationDefinition` XML. Our templates previously omitted this field.

Fix: Add `name:` to the top of your YAML spec:

```yaml
name: "My Agent Tests"   # ← This was the missing field
subjectType: AGENT
subjectName: My_Agent
```

If you still encounter issues:
- ✅ Use the interactive `sf agent generate test-spec` wizard (interactive-only, no CLI flags)
- ✅ Create tests via Salesforce Testing Center UI
- ✅ Deploy XML metadata directly
- ✅ Use Phase A (Agent Runtime API) instead — bypasses CLI entirely
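A quick pre-deploy check for the missing field can prevent this error entirely. A stdlib-only sketch (a simple top-level line scan rather than a full YAML parse, which is enough for a flat spec header):

```python
def has_top_level_name(spec_text: str) -> bool:
    """True if the YAML spec declares a top-level name: field (maps to MasterLabel)."""
    for line in spec_text.splitlines():
        if line.startswith("name:"):   # top level means no leading indentation
            return True
    return False

good = 'name: "My Agent Tests"\nsubjectType: AGENT\nsubjectName: My_Agent\n'
bad = "subjectType: AGENT\nsubjectName: My_Agent\n"
print(has_top_level_name(good), has_top_level_name(bad))  # → True False
```

Run this over each spec before `sf agent test create` and fail fast with a clearer message than the server's.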
状态: 🟢 已解决 — 向YAML规范添加字段
name:错误:
Required fields are missing: [MasterLabel]根本原因: YAML规范必须在顶级包含字段,该字段映射到 XML中的。我们的模板之前省略了此字段。
name:AiEvaluationDefinitionMasterLabel修复: 在YAML规范顶部添加:
name:yaml
name: "我的Agent测试" # ← 这是缺失的字段
subjectType: AGENT
subjectName: My_Agent如果仍遇到问题:
- ✅ 使用交互式向导(仅交互式,无CLI标志)
sf agent generate test-spec - ✅ 通过Salesforce测试中心UI创建测试
- ✅ 直接部署XML元数据
- ✅ 改用阶段A(Agent Runtime API) — 完全绕过CLI
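A quick pre-flight check catches the missing field before the CLI rejects the spec. Minimal sketch under stated assumptions: it only checks the three top-level keys documented here (`name`, `subjectType`, `subjectName`) with a naive line scan, no YAML parser.

```python
def check_spec(spec_text: str) -> list[str]:
    """Return the top-level keys required by `sf agent test create`
    that are missing from a YAML spec.

    Only checks the keys documented in this section (name -> MasterLabel,
    subjectType, subjectName); a real spec contains more fields.
    """
    required = ("name", "subjectType", "subjectName")
    present = {
        line.split(":", 1)[0].strip()
        for line in spec_text.splitlines()
        # top-level keys only: no leading whitespace, not a comment
        if line and not line[0].isspace()
        and ":" in line and not line.startswith("#")
    }
    return [key for key in required if key not in present]
```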
### MEDIUM: Interactive Mode Not Scriptable

**Status:** 🟡 Blocks CI/CD automation

**Issue:** `sf agent generate test-spec` only works interactively.

**Workaround:** Use the Python scripts in `hooks/scripts/` or the Phase A multi-turn templates.
### MEDIUM: YAML vs XML Format Discrepancy

**Key Mappings:**

| YAML Field | XML Element / Assertion Type |
|---|---|
| `name` | `MasterLabel` |
### LOW: BotDefinition Not Always in Tooling API

**Status:** 🟡 Handled automatically

**Issue:** In some org configurations, `BotDefinition` is not queryable via the Tooling API but works via the regular Data API (`sf data query` without `--use-tooling-api`).

**Fix:** `agent_discovery.py live` now falls back automatically — if the Tooling API returns no results for `BotDefinition`, it retries with the regular API.
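The fallback in `agent_discovery.py` can be sketched as follows. This is an assumed shape, not the script's actual code: try the Tooling API first, then retry via the regular Data API when no rows come back.

```python
import json
import subprocess

QUERY = "SELECT Id, DeveloperName FROM BotDefinition"


def build_query_cmd(org: str, use_tooling: bool) -> list[str]:
    """Assemble the sf data query invocation, with or without
    the Tooling API flag."""
    cmd = ["sf", "data", "query", "--query", QUERY,
           "--target-org", org, "--json"]
    if use_tooling:
        cmd.append("--use-tooling-api")
    return cmd


def discover_bots(org: str) -> list[dict]:
    """Query BotDefinition via the Tooling API first; if it returns
    no rows, retry via the regular Data API (mirrors the automatic
    fallback described above)."""
    for use_tooling in (True, False):
        out = subprocess.run(build_query_cmd(org, use_tooling),
                             capture_output=True, text=True, check=True)
        records = json.loads(out.stdout).get("result", {}).get("records", [])
        if records:
            return records
    return []
```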
### LOW: `--use-most-recent` Not Implemented

**Status:** The flag is documented but NOT functional. Always pass `--job-id` explicitly.
### CRITICAL: Custom Evaluations RETRY Bug (Spring '26)

**Status:** 🔴 PLATFORM BUG — blocks all `string_comparison` / `numeric_comparison` evaluations that use JSONPath

**Error:** `INTERNAL_SERVER_ERROR: The specified enum type has no constant with the specified name: RETRY`

**Scope:**

- The server returns a "RETRY" status for test cases whose custom evaluations use `isReference: true`
- The results API endpoint crashes with HTTP 500 when fetching results
- Both filter expressions (`[?(@.field == 'value')]`) AND direct indexing (`[0]`) trigger the bug
- Tests WITHOUT custom evaluations on the same run complete normally

**Confirmed:** A direct `curl` to the REST endpoint returns the same 500 — NOT a CLI parsing issue

**Workaround:**

- Use the Testing Center UI (Setup → Agent Testing) — it may display results
- Skip custom evaluations until the platform is patched
- Use `expectedOutcome` (LLM-as-judge) for response validation instead

**Tracking:** Discovered 2026-02-09 on a DevInt sandbox (Spring '26). TODO: Retest after the platform patch.
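Until the platform patch lands, the safest spec avoids JSONPath custom evaluations entirely and leans on `expectedOutcome`. A hypothetical fragment under that assumption — the `testCases`/`utterance` field names follow common test-spec templates and should be checked against yours:

```yaml
testCases:
  - utterance: "What is the status of order 00123?"
    # Instead of a string_comparison / numeric_comparison custom
    # evaluation with a JSONPath (which currently triggers the RETRY
    # 500), validate the response with LLM-as-judge:
    expectedOutcome: "The agent reports the current status of order 00123."
```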
### MEDIUM: `conciseness` Metric Returns Score=0

**Status:** 🟡 Platform bug — metric evaluation appears non-functional

**Issue:** The `conciseness` metric consistently returns `score: 0` with an empty `metricExplainability` field across all test cases tested on DevInt (Spring '26).

**Workaround:** Skip `conciseness` in metrics lists until the platform is patched.

### LOW: `instruction_following` FAILURE at Score=1

**Status:** 🟡 Threshold mismatch — score and label disagree

**Issue:** The `instruction_following` metric labels results as "FAILURE" even when `score: 1` and the explanation text says the agent "follows instructions perfectly." This appears to be a pass/fail threshold configuration error on the platform side.

**Workaround:** Use the numeric `score` value (0 or 1) for evaluation. Ignore the PASS/FAILURE label.

### HIGH: `instruction_following` Crashes Testing Center UI

**Status:** 🔴 Blocks the Testing Center UI entirely — separate from the threshold bug above

**Error:** `Unable to get test suite: No enum constant einstein.gpt.shared.testingcenter.enums.AiEvaluationMetricType.INSTRUCTION_FOLLOWING_EVALUATION`

**Scope:** The Testing Center UI (Setup → Agent Testing) throws a Java exception when opening any test suite that includes the `instruction_following` metric. The CLI (`sf agent test run`) works fine — only the UI rendering is broken.

**Workaround:** Remove `- instruction_following` from the YAML metrics list and redeploy the test spec via `sf agent test create --force-overwrite`.

**Note:** This is a different bug from the threshold mismatch above. The threshold bug affects score interpretation; this bug blocks the entire UI from loading.

**Discovered:** 2026-02-11 on a DevInt sandbox (Spring '26).
## License

MIT License. See the LICENSE file.

Copyright (c) 2024-2026 Jag Valaiyapathy