sf-ai-agentforce-testing
<!-- TIER: 1 | ENTRY POINT -->
<!-- This is the starting document - read this FIRST -->
<!-- Pattern: Follows sf-testing for agentic test-fix loops -->
<!-- v2.0.0: Dual-track workflow with multi-turn API testing as primary -->
sf-ai-agentforce-testing: Agentforce Test Execution & Coverage Analysis
Expert testing engineer specializing in Agentforce agent testing via dual-track workflow: multi-turn Agent Runtime API testing (primary) and CLI Testing Center (secondary). Execute multi-turn conversations, analyze topic/action/context coverage, and automatically fix issues via sf-ai-agentscript.
Core Responsibilities
- Multi-Turn API Testing (PRIMARY): Execute multi-turn conversations via the Agent Runtime API
- CLI Test Execution (SECONDARY): Run single-utterance tests via `sf agent test run`
- Test Spec / Scenario Generation: Create YAML test specifications and multi-turn scenarios
- Coverage Analysis: Track topic, action, context preservation, and re-matching coverage
- Preview Testing: Interactive simulated and live agent testing
- Agentic Fix Loop: Automatically fix failing agents and re-test
- Cross-Skill Orchestration: Delegate fixes to sf-ai-agentscript, data to sf-data
- Observability Integration: Guide to sf-ai-agentforce-observability for STDM analysis
📚 Document Map
| Need | Document | Description |
|---|---|---|
| Agent Runtime API | agent-api-reference.md | REST endpoints for multi-turn testing |
| ECA Setup | eca-setup-guide.md | External Client App for API authentication |
| Multi-Turn Testing | multi-turn-testing-guide.md | Multi-turn test design and execution |
| Test Patterns | multi-turn-test-patterns.md | 6 multi-turn test patterns with examples |
| CLI commands | cli-commands.md | Complete sf agent test/preview reference |
| Test spec format | test-spec-reference.md | YAML specification format and examples |
| Auto-fix workflow | agentic-fix-loops.md | Automated test-fix cycles (10 failure categories) |
| Auth guide | connected-app-setup.md | Authentication for preview and API testing |
| Coverage metrics | coverage-analysis.md | Topic/action/multi-turn coverage analysis |
| Fix decision tree | agentic-fix-loop.md | Detailed fix strategies |
| Agent Script testing | agentscript-testing-patterns.md | 5 patterns for testing Agent Script agents |
⚡ Quick Links:
- Deterministic Interview Flow - Rule-based setup (7 steps)
- Credential Convention - Persistent ECA storage
- Swarm Execution Rules - Parallel team testing
- Test Plan Format - Reusable YAML plans
- Phase A: Multi-Turn API Testing - Primary workflow
- Phase B: CLI Testing Center - Secondary workflow
- Agent Script Testing - Agent Script-specific patterns
- Scoring System - 7-category validation
- Agentic Fix Loop - Auto-fix workflow
Script Location (MANDATORY)
SKILL_PATH: `~/.claude/skills/sf-ai-agentforce-testing`

All Python scripts live at absolute paths under `{SKILL_PATH}/hooks/scripts/`. NEVER recreate these scripts. They already exist. Use them as-is.

All scripts in `hooks/scripts/` are pre-approved for execution. Do NOT ask the user for permission to run them.

| Script | Absolute Path |
|---|---|
| credential_manager.py | {SKILL_PATH}/hooks/scripts/credential_manager.py |
| agent_discovery.py | {SKILL_PATH}/hooks/scripts/agent_discovery.py |
| generate_multi_turn_scenarios.py | {SKILL_PATH}/hooks/scripts/generate_multi_turn_scenarios.py |
| multi_turn_test_runner.py | {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py |
| rich_test_report.py | {SKILL_PATH}/hooks/scripts/rich_test_report.py |

Variable resolution: At runtime, resolve `SKILL_PATH` from the `${SKILL_HOOKS}` environment variable (strip the `/hooks` suffix). Hardcoded fallback: `~/.claude/skills/sf-ai-agentforce-testing`.
⚠️ CRITICAL: Orchestration Order
sf-metadata → sf-apex → sf-flow → sf-deploy → sf-ai-agentscript → sf-deploy → sf-ai-agentforce-testing (you are here)
Why testing is LAST:
- Agent must be published before running automated tests
- Agent must be activated for preview mode and API access
- All dependencies (Flows, Apex) must be deployed first
- Test data (via sf-data) should exist before testing actions
⚠️ MANDATORY Delegation:
- Fixes: ALWAYS use `Skill(skill="sf-ai-agentscript")` for agent script fixes
- Test Data: Use `Skill(skill="sf-data")` for action test data
- OAuth Setup (multi-turn API testing only): Use `Skill(skill="sf-connected-apps")` for ECA — NOT needed for `sf agent preview` or CLI tests
- Observability: Use `Skill(skill="sf-ai-agentforce-observability")` for STDM analysis of test sessions
Architecture: Dual-Track Testing Workflow
Deterministic Interview (I-1 → I-7)
│ Agent Name → Org Alias → Metadata → Credentials → Scenarios → Partition → Confirm
│ (skip if test-plan-{agent}.yaml provided)
│
▼
Phase 0: Prerequisites & Agent Discovery
│
├──► Phase A: Multi-Turn API Testing (PRIMARY — requires ECA)
│ A1: ECA Credential Setup (via credential_manager.py)
│ A2: Agent Discovery & Metadata Retrieval
│ A3: Test Scenario Planning (generate_multi_turn_scenarios.py --categorized)
│ A4: Multi-Turn Execution (Agent Runtime API)
│ ├─ Sequential: single multi_turn_test_runner.py process
│ └─ Swarm: TeamCreate → N workers (--worker-id N)
│ A5: Results & Scoring (rich Unicode output)
│
└──► Phase B: CLI Testing Center (SECONDARY)
B1: Test Spec Creation
B2: Test Execution (sf agent test run)
B3: Results Analysis
│
Phase C: Agentic Fix Loop (shared)
Phase D: Coverage Improvement (shared)
Phase E: Observability Integration (STDM analysis)

When to use which track:
| Condition | Use |
|---|---|
| Agent Testing Center NOT available | Phase A only |
| Need multi-turn conversation testing | Phase A |
| Need topic re-matching validation | Phase A |
| Need context preservation testing | Phase A |
| Agent Testing Center IS available + single-utterance tests | Phase B |
| CI/CD pipeline integration | Phase A (Python scripts) or Phase B (sf CLI) |
| Quick smoke test | Phase B |
| Quick manual validation (no ECA setup) | `sf agent preview` |
| No ECA available | Phase B or `sf agent preview` |
Phase 0: Prerequisites & Agent Discovery
Step 1: Gather User Information
Use AskUserQuestion to gather:
AskUserQuestion:
questions:
- question: "Which agent do you want to test?"
header: "Agent"
options:
- label: "Let me discover agents in the org"
description: "Query BotDefinition to find available agents"
- label: "I know the agent name"
description: "Provide agent name/API name directly"
- question: "What is your target org alias?"
header: "Org"
options:
- label: "vivint-DevInt"
description: "Development integration org"
- label: "Other"
description: "Specify a different org alias"
- question: "What type of testing do you need?"
header: "Test Type"
options:
- label: "Multi-turn API testing (Recommended)"
description: "Full conversation testing via Agent Runtime API — tests topic switching, context retention, escalation cascades"
- label: "CLI single-utterance testing"
description: "Traditional sf agent test run — requires Agent Testing Center feature"
- label: "Both"
description: "Run both multi-turn and CLI tests for comprehensive coverage"
Step 2: Agent Discovery
```bash
# Auto-discover active agents in the org
sf data query --use-tooling-api \
  --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE IsActive=true" \
  --result-format json --target-org [alias]
```
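The query above returns the standard sf CLI JSON envelope (`{"result": {"records": [...]}}`). A minimal sketch for pulling the agent list out of that output (helper name is illustrative, not one of the skill's scripts):

```python
import json

# Extract active agents from `sf data query --result-format json` output.
# Assumes the standard sf CLI envelope: {"result": {"records": [...]}}.
def list_active_agents(query_json: str) -> list:
    payload = json.loads(query_json)
    records = payload.get("result", {}).get("records", [])
    return [
        {"id": r.get("Id"), "name": r.get("DeveloperName"), "label": r.get("MasterLabel")}
        for r in records
    ]
```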
Step 3: Agent Metadata Retrieval
```bash
# Retrieve agent configuration (topics, actions, instructions)
sf project retrieve start \
  --metadata "GenAiPlannerBundle:[AgentDeveloperName]" \
  --output-dir retrieve-temp --target-org [alias]
```

Claude reads the GenAiPlannerBundle to understand:
- All topics and their `classificationDescription` values
- All actions and their configurations
- System instructions and guardrails
- Escalation paths

Step 4: Check Agent Testing Center Availability
```bash
# This determines if Phase B is available
sf agent test list --target-org [alias]

# If error: "INVALID_TYPE: Cannot use: AiEvaluationDefinition"
#   → Agent Testing Center NOT enabled → Phase A only
# If success: → Both Phase A and Phase B available
```

Step 5: Prerequisites Checklist
| Check | Command | Why |
|---|---|---|
| Agent exists | Query BotDefinition (Step 2) | Can't test non-existent agent |
| Agent published | | Must be published to test |
| Agent activated | Check activation status | Required for API access |
| Dependencies deployed | Flows and Apex in org | Actions will fail without them |
| ECA configured (Phase A only) | Token request test | Multi-turn API testing only. NOT needed for preview or CLI tests |
| Agent Testing Center (Phase B) | `sf agent test list` | Required for CLI testing |
Deterministic Multi-Turn Interview Flow
When the testing skill is invoked, follow these interview steps in order. Each step has deterministic rules with fallbacks. The goal: gather all inputs needed to execute multi-turn tests without ambiguity.
Skip the interview if the user provides a `test-plan-{agent}.yaml` file — load it directly and jump to Swarm Execution Rules.
| Step | Rule | Fallback |
|---|---|---|
| I-0: Skill Path | Resolve `SKILL_PATH` from the environment | Hardcoded path |
| I-1: Agent Name | User provided → use it. Else walk up from CWD looking for agent metadata | AskUserQuestion |
| I-2: Org Alias | User provided → use it. Else parse the project's default org config | AskUserQuestion |
| I-3: Metadata | ALWAYS run `agent_discovery.py` | Required (fail if no agent found) |
| I-4: Credentials | Skip if test type is CLI-only or Preview-only — standard org auth suffices (no ECA needed). For multi-turn API testing: Run `credential_manager.py` | AskUserQuestion for credentials (multi-turn API only) |
| I-4b: Session Variables | ALWAYS ask. Extract known context variables from agent metadata (GenAiPlannerBundle) | AskUserQuestion |
| I-5: Scenarios | Pipe discovery metadata to `generate_multi_turn_scenarios.py --categorized` | Required |
| I-6: Partition | Ask user how to split work across workers | AskUserQuestion (see below) |
| I-7: Confirm | Present test plan summary. Save as `test-plan-{agent}.yaml` | AskUserQuestion |
I-4b: Session Variables
Context variables are MANDATORY for agents that use authentication flows (e.g., a `User_Authentication` topic). Without them, the agent's authentication flow fails and the session ends on Turn 1.

Extract context variables from agent metadata:
- Run `python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local --project-dir {project}` and look for `context_variables` in the GenAiPlannerBundle output.
- Common variables: `$Context.RoutableId` (MessagingSession ID), `$Context.CaseId` (Case record ID).
AskUserQuestion:
question: "The agent requires context variables for testing. Which values should we use?"
header: "Variables"
options:
- label: "Use test record IDs (Recommended)"
description: "I'll provide real MessagingSession and Case IDs from the org for testing"
- label: "Skip variables"
description: "Run without context variables — WARNING: authentication topics will likely fail"
- label: "Auto-discover from org"
description: "Query the org for recent MessagingSession and Case records to use as test values"
multiSelect: false

⚠️ WARNING: If the agent has a `User_Authentication` topic that runs `Bot_User_Verification`, you MUST provide `$Context.RoutableId` and `$Context.CaseId`. Without them, the verification flow fails → agent escalates → `SessionEnded` on Turn 1.
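That Turn-1 failure mode can be caught before any session is opened by a simple pre-flight check on the provided variables. A minimal sketch (helper name and constant are illustrative, not part of the skill's scripts):

```python
# Pre-flight check: verify the session variables required by
# authentication topics are present and non-empty before running tests.
REQUIRED_AUTH_VARS = ("$Context.RoutableId", "$Context.CaseId")

def missing_auth_vars(provided: dict) -> list:
    """Return the names of required context variables that are absent or empty."""
    return [v for v in REQUIRED_AUTH_VARS if not provided.get(v)]
```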
I-6: Partition Strategy
DEFAULT RULE: If total generated scenarios > 4, default to "2 workers by category". If ≤ 4, default to "Sequential". ALWAYS default — only change if the user explicitly requests otherwise.
AskUserQuestion:
question: "How should test scenarios be distributed across workers?"
header: "Partition"
options:
- label: "2 workers by category (Recommended)"
description: "Group test patterns into 2 balanced buckets — best balance of parallelism and readability. DEFAULT when > 4 scenarios."
- label: "Sequential"
description: "Run all scenarios in a single process — no team needed, simpler but slower. DEFAULT when ≤ 4 scenarios."
multiSelect: false
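The default rule above is fully deterministic, so it can be sketched in a few lines (function name is illustrative):

```python
# Illustrative sketch of the I-6 default: 2 workers by category when more
# than 4 scenarios were generated, otherwise a single sequential process.
def default_partition(total_scenarios: int) -> dict:
    if total_scenarios > 4:
        return {"strategy": "2 workers by category", "worker_count": 2}
    return {"strategy": "Sequential", "worker_count": 1}
```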
I-7: Confirmation Summary Format
Present this to the user before execution:
📋 TEST PLAN SUMMARY
════════════════════════════════════════════════════════════════
Agent: {agent_name} ({agent_id})
Org: {org_alias}
Credentials: ~/.sfagent/{org_alias}/{eca_name}/credentials.env ✅
Scenarios: {total_count} across {category_count} categories
Partition: {strategy} with {worker_count} worker(s)
Variables: {var_count} session variable(s)
📂 Scenario Breakdown:
topic_routing: {n} scenarios
context_preservation: {n} scenarios
escalation_flows: {n} scenarios
guardrail_testing: {n} scenarios
action_chain: {n} scenarios
error_recovery: {n} scenarios
cross_topic_switch: {n} scenarios
💾 Saved: test-plan-{agent_name}.yaml
════════════════════════════════════════════════════════════════
Proceed? [Confirm / Edit / Cancel]
⚡ MANDATORY: Phase A4 Execution Protocol
This protocol is NON-NEGOTIABLE. After I-7 confirmation, you MUST follow EXACTLY these steps based on the partition strategy. DO NOT improvise, skip steps, or run sequentially when the plan says swarm.
Path A: Sequential Execution (worker_count == 1)
Run a single `multi_turn_test_runner.py` process. No team needed.

```bash
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios {scenario_file} \
  --agent-id {agent_id} \
  --var '$Context.RoutableId={routable_id}' \
  --var '$Context.CaseId={case_id}' \
  --output {working_dir}/results.json \
  --report-file {working_dir}/report.ansi \
  --verbose
```
Path B: Swarm Execution (worker_count == 2) — MANDATORY CHECKLIST
YOU MUST EXECUTE EVERY STEP BELOW IN ORDER. DO NOT SKIP ANY STEP.
☐ Step 1: Split scenarios into 2 partitions
Group the generated category YAML files into 2 balanced buckets by total scenario count.
Write `{working_dir}/scenarios-part1.yaml` and `{working_dir}/scenarios-part2.yaml`.
Each partition file must be valid YAML with a `scenarios:` key containing its subset.

☐ Step 2: Create team
TeamCreate(team_name="sf-test-{agent_name}")

☐ Step 3: Create 2 tasks (one per partition)
TaskCreate(subject="Run partition 1", description="Execute scenarios-part1.yaml")
TaskCreate(subject="Run partition 2", description="Execute scenarios-part2.yaml")

☐ Step 4: Spawn 2 workers IN PARALLEL (single message with 2 Task tool calls)
Use the Worker Agent Prompt Template below. CRITICAL: Both Task calls MUST be in the SAME message.
Task(subagent_type="general-purpose", team_name="sf-test-{agent_name}", name="worker-1", prompt=WORKER_PROMPT_1)
Task(subagent_type="general-purpose", team_name="sf-test-{agent_name}", name="worker-2", prompt=WORKER_PROMPT_2)

☐ Step 5: Wait for both workers to report (they SendMessage when done)
Do NOT proceed until both workers have sent their results via SendMessage.

☐ Step 6: Aggregate results

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results {working_dir}/worker-1-results.json {working_dir}/worker-2-results.json
```

☐ Step 7: Present unified report to the user

☐ Step 8: Offer fix loop if any failures detected

☐ Step 9: Shutdown workers
SendMessage(type="shutdown_request", recipient="worker-1")
SendMessage(type="shutdown_request", recipient="worker-2")

☐ Step 10: Clean up
TeamDelete
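Step 1's "balanced buckets" split is a classic greedy partition over per-category scenario counts: assign the largest categories first, always to the lighter bucket. A minimal sketch (helper is illustrative, not one of the skill's scripts):

```python
# Greedy 2-bucket partition: process categories largest-first and assign
# each to the bucket with the smaller running total. Illustrative only.
def split_into_two_buckets(category_counts: dict) -> list:
    buckets = [{"total": 0, "categories": []} for _ in range(2)]
    # Sort by count descending, then name, so the split is deterministic.
    for name, count in sorted(category_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(buckets, key=lambda b: b["total"])
        target["categories"].append(name)
        target["total"] += count
    return buckets
```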
Credential Convention (~/.sfagent/)
Persistent ECA credential storage managed by `hooks/scripts/credential_manager.py`.

Directory Structure
```
~/.sfagent/
├── .gitignore          ("*" — auto-created, prevents accidental commits)
├── {Org-Alias}/        (org alias — case-sensitive, e.g. Vivint-DevInt)
│   └── {ECA-Name}/     (ECA app name — use `discover` to find actual name)
│       └── credentials.env
└── Other-Org/
    └── My_ECA/
        └── credentials.env
```

File Format
```env
# credentials.env — managed by credential_manager.py
# 'export' prefix allows direct `source credentials.env` in shell
export SF_MY_DOMAIN=yourdomain.my.salesforce.com
export SF_CONSUMER_KEY=3MVG9...
export SF_CONSUMER_SECRET=ABC123...
```
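When tooling cannot `source` the file (e.g. pure-Python code rather than a shell), the `export KEY=value` format above parses trivially. A minimal sketch (illustrative helper; the skill's own loader is `credential_manager.py`):

```python
# Parse a credentials.env file of `export KEY=value` lines into a dict,
# skipping comments and blank lines. Illustrative only.
def parse_credentials_env(text: str) -> dict:
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export ") and "=" in line:
            key, _, value = line[len("export "):].partition("=")
            creds[key.strip()] = value.strip()
    return creds
```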
undefinedSecurity Rules
| Rule | Implementation |
|---|---|
| Directory permissions | |
| File permissions | |
| Git protection | `~/.sfagent/.gitignore` containing `*` (auto-created) |
| Secret display | NEVER show full secrets — mask all but a short prefix |
| Credential passing | Export as env vars for subprocesses, never write to temp files |
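A masking helper consistent with the secret-display rule might look like the following (the exact display format is an assumption for illustration):

```python
# Mask a secret for display: keep a short prefix, hide the rest.
# The exact mask format is an assumption, not the skill's canonical one.
def mask_secret(secret: str, prefix: int = 5) -> str:
    if len(secret) <= prefix:
        return "*" * len(secret)
    return secret[:prefix] + "*" * 4
```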
CLI Reference
```bash
# Discover orgs and ECAs
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py discover --org-alias Vivint-DevInt

# Load credentials (secrets masked in output)
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py load --org-alias {org} --eca-name {eca}

# Save new credentials
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py save \
  --org-alias {org} --eca-name {eca} \
  --domain yourdomain.my.salesforce.com \
  --consumer-key 3MVG9... --consumer-secret ABC123...

# Validate OAuth flow
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py validate --org-alias {org} --eca-name {eca}

# Source credentials for shell use (set -a auto-exports all vars)
set -a; source ~/.sfagent/{org}/{eca}/credentials.env; set +a
```

---

Swarm Execution Rules (Native Claude Code Teams)
When `worker_count > 1` in the test plan, use Claude Code's native team orchestration for parallel test execution. When `worker_count == 1`, run sequentially without creating a team.
RULE: Create team via TeamCreate("sf-test-{agent_name}")
RULE: Create one TaskCreate per partition (category or count split)
RULE: Spawn one Task(subagent_type="general-purpose") per worker
RULE: Each worker gets credentials as env vars in its prompt (NEVER in files)
RULE: Wait for all workers to report via SendMessage
RULE: After all workers complete, run rich_test_report.py to render unified results
RULE: Present unified beautiful report aggregating all worker results
RULE: Offer fix loop if any failures detected
RULE: Shutdown all workers via SendMessage(type="shutdown_request")
RULE: Clean up via TeamDelete when done
RULE: NEVER spawn more than 2 workers.
RULE: When categories > 2, group into 2 balanced buckets.
RULE: Queue remaining work to existing workers after they complete first batch.

Worker Agent Prompt Template
Each worker receives this prompt (team lead fills in the variables):
You are a multi-turn test worker for Agentforce agent testing.
YOUR TASK:
1. Claim your task via TaskUpdate(status="in_progress", owner=your_name)
2. Load credentials and run the test:
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {skill_path}/hooks/scripts/multi_turn_test_runner.py \
--scenarios {scenario_file} \
--agent-id {agent_id} \
--var '$Context.RoutableId={routable_id}' \
--var '$Context.CaseId={case_id}' \
--output {working_dir}/worker-{N}-results.json \
--report-file {working_dir}/worker-{N}-report.ansi \
--worker-id {N} --verbose
3. IMPORTANT — RENDER RICH TUI REPORT IN YOUR PANE:
After the test runner completes, render the results visually so they appear
in your conversation pane (the tmux panel the user can see):
python3 -c "
import sys, json
sys.path.insert(0, '{skill_path}/hooks/scripts')
from multi_turn_test_runner import format_results_rich
with open('{working_dir}/worker-{N}-results.json') as f:
results = json.load(f)
print(format_results_rich(results, worker_id={N}, scenario_file='{scenario_file}'))
"
Then copy-paste that output into your conversation as a text message so it
renders in your Claude Code pane for the user to see.
4. Analyze: which scenarios passed, which failed, and WHY
5. SendMessage to team lead with:
- Pass/fail summary (counts + percentages)
- For each failure: scenario name, turn number, what went wrong, suggested fix
- Total execution time
- Any patterns noticed (e.g., "all context_preservation tests failed — may be a systemic issue")
6. Mark your task as completed via TaskUpdate
IMPORTANT:
- If a test fails with an auth error (exit code 2), report it immediately — do NOT retry
- If a test fails with scenario failures (exit code 1), analyze and report all failures
- You CAN communicate with other workers if you discover related issues
- The --report-file flag writes a persistent ANSI report file viewable with `cat` or `bat`
您是Agentforce Agent测试的多轮测试工作进程。
您的任务:
1. 通过TaskUpdate(status="in_progress", owner=your_name)认领任务
2. 加载凭证并运行测试:
set -a; source ~/.sfagent/{org_alias}/{eca_name}/credentials.env; set +a
python3 {skill_path}/hooks/scripts/multi_turn_test_runner.py \
--scenarios {scenario_file} \
--agent-id {agent_id} \
--var '$Context.RoutableId={routable_id}' \
--var '$Context.CaseId={case_id}' \
--output {working_dir}/worker-{N}-results.json \
--report-file {working_dir}/worker-{N}-report.ansi \
--worker-id {N} --verbose
3. 重要 — 在您的面板中渲染富文本TUI报告:
测试运行器完成后,可视化渲染结果,使其显示在您的对话面板中(用户可以看到的tmux面板):
python3 -c "
import sys, json
sys.path.insert(0, '{skill_path}/hooks/scripts')
from multi_turn_test_runner import format_results_rich
with open('{working_dir}/worker-{N}-results.json') as f:
results = json.load(f)
print(format_results_rich(results, worker_id={N}, scenario_file='{scenario_file}'))
"
然后将该输出复制粘贴到您的对话中作为文本消息,使其显示在您的Claude Code面板中供用户查看。
4. 分析:哪些场景通过了,哪些失败了,以及原因
5. 向团队负责人SendMessage,包含:
- 通过/失败摘要(数量+百分比)
- 每个失败项:场景名称、轮次、问题所在、建议修复方案
- 总执行时间
- 注意到的任何模式(例如:"所有context_preservation测试都失败了 — 可能是系统性问题")
6. 通过TaskUpdate将您的任务标记为已完成
重要提示:
- 如果测试因认证错误失败(退出代码2),立即报告 — 请勿重试
- 如果测试因场景失败失败(退出代码1),分析并报告所有失败
- 如果发现相关问题,您可以与其他工作进程沟通
- --report-file标志将持久化ANSI报告文件写入磁盘,可使用`cat`或`bat`查看

Partition Strategies
分区策略
| How It Works | Best For |
|---|---|
| One worker per test pattern (topic_routing, context, etc.) | Most runs — natural isolation |
| Split N scenarios evenly across W workers | Large scenario counts |
| Single process, no team | Quick runs, debugging |
| 工作方式 | 最佳使用场景 |
|---|---|
| 每个测试模式(topic_routing、context等)分配一个工作进程 | 大多数运行 — 自然隔离 |
| 将N个场景平均分配给W个工作进程 | 场景数量较多时 |
| 单个进程,不需要团队 | 快速运行、调试 |
Team Lead Aggregation
团队负责人聚合
After all workers report, the team lead:
- Aggregates all worker result JSON files via `rich_test_report.py`:

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results /tmp/sf-test-{session}/worker-*-results.json
```

- Deduplicates any shared failure patterns across workers
- Presents the unified Rich report (colored Panels, Tables, Tree) to the user
- Calculates aggregate scoring across the 7 categories
- Offers fix loop: if failures exist, ask user whether to auto-fix via sf-ai-agentscript
- Shuts down all workers and deletes the team
所有工作进程报告后,团队负责人:
- 聚合所有工作进程的结果JSON文件,通过`rich_test_report.py`:

```bash
python3 {SKILL_PATH}/hooks/scripts/rich_test_report.py \
  --results /tmp/sf-test-{session}/worker-*-results.json
```

- 去重跨工作进程的任何共享失败模式
- 向用户显示统一的Rich报告(彩色面板、表格、树形结构)
- 计算7个维度的聚合评分
- 提供修复循环:如果存在失败,询问用户是否通过sf-ai-agentscript自动修复
- 关闭所有工作进程并删除团队
Test Plan File Format
测试计划文件格式
Test plans (`test-plan-{agent}.yaml`) capture the full interview output for reuse. See `templates/test-plan-template.yaml` for the complete schema.
测试计划(`test-plan-{agent}.yaml`)捕获完整的访谈输出以便复用。完整架构请参见`templates/test-plan-template.yaml`。
Key Sections
关键部分
| Section | Purpose |
|---|---|
| Agent name, ID, org alias, timestamps |
| Path to |
| Topics, actions, type — populated by |
| List of YAML scenario files + pattern filters |
| Strategy ( |
| Context variables injected into every session |
| Timeout, retry, verbose, rich output settings |
| 部分 | 用途 |
|---|---|
| Agent名称、ID、组织别名、时间戳 |
| |
| 主题、动作、类型 — 由 |
| YAML场景文件列表 + 模式过滤器 |
| 策略( |
| 注入到每个会话的上下文变量 |
| 超时、重试、详细输出、富文本输出设置 |
Re-Running from a Saved Plan
从保存的计划重新运行
When a user provides a test plan file, skip the interview entirely:
1. Load test-plan-{agent}.yaml
2. Validate credentials: credential_manager.py validate --org-alias {org} --eca-name {eca}
3. If invalid → ask user to update credentials only (skip other interview steps)
4. Load scenario files from plan
5. Apply partition strategy from plan
6. Execute (team or sequential based on worker_count)

This enables rapid re-runs after fixing agent issues — the user just says "re-run" and the skill picks up the saved plan.
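A minimal sketch of what a saved plan might contain — the key names below are illustrative only; see `templates/test-plan-template.yaml` for the authoritative schema:

```yaml
# Hypothetical test-plan-{agent}.yaml — field names are illustrative
agent:
  name: Customer_Support_Agent
  org_alias: my-org
scenarios:
  - templates/multi-turn-comprehensive.yaml
partition:
  strategy: by_pattern
  worker_count: 3
context_variables:
  RoutableId: "0Mwbb000007MGoTCAW"
execution:
  timeout_seconds: 120
  verbose: true
```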
当用户提供测试计划文件时,完全跳过访谈:
1. 加载test-plan-{agent}.yaml
2. 验证凭证:credential_manager.py validate --org-alias {org} --eca-name {eca}
3. 如果无效 → 仅询问用户更新凭证(跳过其他访谈步骤)
4. 从计划中加载场景文件
5. 应用计划中的分区策略
6. 执行(根据worker_count选择团队或顺序执行)

这使得修复Agent问题后可以快速重新运行 — 用户只需说"重新运行",技能就会使用保存的计划。
Phase A: Multi-Turn API Testing (PRIMARY)
阶段A:多轮API测试(主流程)
⚠️ NEVER use `curl` for OAuth token validation. Domains containing `--` (e.g., `my-org--devint.sandbox.my.salesforce.com`) cause shell expansion failures with curl's argument parsing. Use `credential_manager.py validate` instead.
⚠️ 绝不要使用`curl`进行OAuth令牌验证。包含`--`的域名(例如`my-org--devint.sandbox.my.salesforce.com`)会导致curl的参数解析出现shell扩展失败。请改用`credential_manager.py validate`。
A1: ECA Credential Setup
A1:ECA凭证设置
Why ECA? Multi-turn API testing uses the Agent Runtime API (`/einstein/ai-agent/v1`), which requires OAuth Client Credentials. If you only need interactive testing, use `sf agent preview` (v2.121.7+) instead — no ECA needed, just `sf org login web`. See connected-app-setup.md.
AskUserQuestion:
question: "Do you have an External Client App (ECA) with Client Credentials flow configured?"
header: "ECA Setup"
options:
- label: "Yes, I have credentials"
description: "I have Consumer Key, Secret, and My Domain URL ready"
- label: "No, I need to create one"
    description: "Delegate to sf-connected-apps skill to create ECA"

If YES: Collect credentials (kept in conversation context only, NEVER written to files):
- Consumer Key
- Consumer Secret
- My Domain URL (e.g., `your-domain.my.salesforce.com`)
If NO: Delegate to sf-connected-apps:
Skill(skill="sf-connected-apps", args="Create External Client App with Client Credentials flow for Agent Runtime API testing. Scopes: api, chatbot_api, sfap_api, refresh_token, offline_access. Name: Agent_API_Testing")

Verify credentials work:
为什么需要ECA? 多轮API测试使用Agent Runtime API(`/einstein/ai-agent/v1`),需要OAuth客户端凭证。如果只需要交互式测试,请改用`sf agent preview`(v2.121.7+)— 不需要ECA,只需`sf org login web`。请参见connected-app-setup.md。
AskUserQuestion:
question: "您是否配置了带有客户端凭证流的外部客户端应用(ECA)?"
header: "ECA设置"
options:
- label: "是,我有凭证"
description: "我已准备好消费者密钥、密钥和我的域名URL"
- label: "否,我需要创建一个"
    description: "委托给sf-connected-apps技能创建ECA"

如果是: 收集凭证(仅保存在对话上下文中,绝不写入文件):
- 消费者密钥
- 消费者密码(Consumer Secret)
- 我的域名URL(例如`your-domain.my.salesforce.com`)
如果否: 委托给sf-connected-apps:
Skill(skill="sf-connected-apps", args="Create External Client App with Client Credentials flow for Agent Runtime API testing. Scopes: api, chatbot_api, sfap_api, refresh_token, offline_access. Name: Agent_API_Testing")

验证凭证是否有效:
```bash
# Validate OAuth credentials via credential_manager.py (handles token request internally)
# 通过credential_manager.py验证OAuth凭证(内部处理令牌请求)
python3 {SKILL_PATH}/hooks/scripts/credential_manager.py \
  validate --org-alias {org} --eca-name {eca}
```

See [ECA Setup Guide](docs/eca-setup-guide.md) for complete instructions.
完整说明请参见[ECA设置指南](docs/eca-setup-guide.md)。

A2: Agent Discovery & Metadata Retrieval
A2:Agent发现与元数据检索
```bash
# Get agent ID for API calls
# 获取API调用使用的Agent ID
AGENT_ID=$(sf data query --use-tooling-api \
  --query "SELECT Id, DeveloperName, MasterLabel FROM BotDefinition WHERE DeveloperName='[AgentName]' AND IsActive=true LIMIT 1" \
  --result-format json --target-org [alias] | jq -r '.result.records[0].Id')
```
```bash
# Retrieve full agent configuration
# 检索完整的Agent配置
sf project retrieve start \
  --metadata "GenAiPlannerBundle:[AgentName]" \
  --output-dir retrieve-temp --target-org [alias]
```

Claude reads the GenAiPlannerBundle to understand:
- **Topics**: Names, classificationDescriptions, instructions
- **Actions**: Types (flow, apex), triggers, inputs/outputs
- **System Instructions**: Global rules and guardrails
- **Escalation Paths**: When and how the agent escalates

This metadata drives automatic test scenario generation in A3.
Claude读取GenAiPlannerBundle以了解:
- **主题**:名称、classificationDescriptions、指令
- **动作**:类型(flow、apex)、触发器、输入/输出
- **系统指令**:全局规则和防护规则
- **升级路径**:何时以及如何升级
此元数据驱动A3中的自动测试场景生成。

A3: Test Scenario Planning
A3:测试场景规划
AskUserQuestion:
question: "What testing do you need?"
header: "Scenarios"
options:
- label: "Comprehensive coverage (Recommended)"
description: "All 6 test patterns: topic routing, context preservation, escalation, guardrails, action chaining, variable injection"
- label: "Topic routing accuracy"
description: "Test that utterances route to correct topics, including mid-conversation topic switches"
- label: "Context preservation"
description: "Test that the agent retains information across turns"
- label: "Specific bug reproduction"
description: "Reproduce a known issue with targeted multi-turn scenario"
  multiSelect: true

Claude uses the agent metadata from A2 to auto-generate multi-turn scenarios tailored to the specific agent:
- Generates topic switching scenarios based on actual topic names
- Creates context preservation tests using actual action inputs/outputs
- Builds escalation tests based on actual escalation configuration
- Creates guardrail tests based on system instructions
Available templates (see templates/):
| Pattern | Scenarios |
|---|---|
| Topic switching | 4 |
| Context retention | 4 |
| Escalation cascades | 4 |
| All 6 patterns | 6 |
AskUserQuestion:
question: "您需要哪种测试?"
header: "场景"
options:
- label: "全面覆盖(推荐)"
description: "所有6种测试模式:主题路由、上下文保留、升级、防护规则、动作链、变量注入"
- label: "主题路由准确性"
description: "测试语句是否路由到正确的主题,包括对话中途的主题切换"
- label: "上下文保留"
description: "测试Agent在多轮对话中保留信息的能力"
- label: "特定错误重现"
description: "使用针对性的多轮场景重现已知问题"
  multiSelect: true

Claude使用A2中的Agent元数据自动生成针对特定Agent的多轮场景:
- 根据实际主题名称生成主题切换场景
- 使用实际动作输入/输出创建上下文保留测试
- 根据实际升级配置构建升级测试
- 根据系统指令创建防护规则测试
可用模板(请参见templates/):
| 模式 | 场景数量 |
|---|---|
| 主题切换 | 4 |
| 上下文保留 | 4 |
| 升级流程 | 4 |
| 所有6种模式 | 6 |
A4: Multi-Turn Execution
A4:多轮执行
Execute conversations via Agent Runtime API using the reusable Python scripts in `hooks/scripts/`.
⚠️ Agent API is NOT supported for agents of type "Agentforce (Default)". Only custom agents created via Agentforce Builder are supported.
Option 1: Run Test Scenarios from YAML Templates (Recommended)
Use the multi-turn test runner to execute entire scenario suites:
通过Agent Runtime API使用`hooks/scripts/`中的可复用Python脚本执行对话。
⚠️ Agent API不支持"Agentforce(默认)"类型的Agent。仅支持通过Agentforce Builder创建的自定义Agent。
选项1:从YAML模板运行测试场景(推荐)
使用多轮测试运行器执行整个场景套件:
```bash
# Run comprehensive test suite against an agent
# 对Agent运行全面测试套件
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```
```bash
# Run specific scenario within a suite
# 运行套件中的特定场景
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --scenario-filter topic_switch_natural \
  --verbose
```
```bash
# With context variables and JSON output for fix loop
# 带上下文变量和JSON输出用于修复循环
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --var '$Context.EndUserLanguage=en_US' \
  --output results.json \
  --verbose
```
**Exit codes:** `0` = all passed, `1` = some failed (fix loop should process), `2` = execution error
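The fix loop can branch on these exit codes. A minimal sketch — the helper name and the returned strings are illustrative, not part of the runner:

```python
# Hypothetical helper: map the runner's documented exit codes
# (0 = all passed, 1 = scenario failures, 2 = execution error)
# to a fix-loop decision.
def next_step(exit_code: int) -> str:
    if exit_code == 0:
        return "report_success"      # all scenarios passed
    if exit_code == 1:
        return "enter_fix_loop"      # analyze failures, delegate to sf-ai-agentscript
    return "abort_and_report"        # auth/execution error — do NOT retry


print(next_step(1))  # → enter_fix_loop
```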
**Option 2: Use Environment Variables (cleaner for repeated runs)**
```bash
export SF_MY_DOMAIN="your-domain.my.salesforce.com"
export SF_CONSUMER_KEY="your_key"
export SF_CONSUMER_SECRET="your_secret"
export SF_AGENT_ID="0XxRM0000004ABC"
```

```bash
# Now run without credential flags
# 现在运行时不需要凭证标志
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```

**退出代码:** `0` = 全部通过, `1` = 部分失败(修复循环应处理), `2` = 执行错误
**选项2:使用环境变量(重复运行更简洁)**

```bash
export SF_MY_DOMAIN="your-domain.my.salesforce.com"
export SF_CONSUMER_KEY="your_key"
export SF_CONSUMER_SECRET="your_secret"
export SF_AGENT_ID="0XxRM0000004ABC"
# 现在运行时不需要凭证标志
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --verbose
```
**Option 3: Python API for Ad-Hoc Testing**
For custom scenarios or debugging, use the client directly:
```python
from hooks.scripts.agent_api_client import AgentAPIClient
client = AgentAPIClient(
my_domain="your-domain.my.salesforce.com",
consumer_key="...",
consumer_secret="..."
)
**选项3:Python API用于临时测试**
对于自定义场景或调试,直接使用客户端:
```python
from hooks.scripts.agent_api_client import AgentAPIClient
client = AgentAPIClient(
my_domain="your-domain.my.salesforce.com",
consumer_key="...",
consumer_secret="..."
)

Context manager auto-ends session
上下文管理器自动结束会话
with client.session(agent_id="0XxRM000...") as session:
r1 = session.send("I need to cancel my appointment")
print(f"Turn 1: {r1.agent_text}")
r2 = session.send("Actually, reschedule instead")
print(f"Turn 2: {r2.agent_text}")
r3 = session.send("What was my original request?")
print(f"Turn 3: {r3.agent_text}")
# Check context preservation
if "cancel" in r3.agent_text.lower():
    print("✅ Context preserved")

with client.session(agent_id="0XxRM000...") as session:
r1 = session.send("我需要取消我的预约")
print(f"第1轮:{r1.agent_text}")
r2 = session.send("实际上,改为重新安排")
print(f"第2轮:{r2.agent_text}")
r3 = session.send("我最初的请求是什么?")
print(f"第3轮:{r3.agent_text}")
# 检查上下文保留
if "取消" in r3.agent_text.lower():
    print("✅ 上下文已保留")

With initial variables
带初始变量
variables = [
{"name": "$Context.AccountId", "type": "Id", "value": "001XXXXXXXXXXXX"},
{"name": "$Context.EndUserLanguage", "type": "Text", "value": "en_US"},
]
with client.session(agent_id="0Xx...", variables=variables) as session:
r1 = session.send("What orders do I have?")
**Connectivity Test:**
variables = [
{"name": "$Context.AccountId", "type": "Id", "value": "001XXXXXXXXXXXX"},
{"name": "$Context.EndUserLanguage", "type": "Text", "value": "en_US"},
]
with client.session(agent_id="0Xx...", variables=variables) as session:
r1 = session.send("我有哪些订单?")
**连通性测试:**
```bash
# Verify ECA credentials and API connectivity
# 验证ECA凭证和API连通性
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
# Reads SF_MY_DOMAIN, SF_CONSUMER_KEY, SF_CONSUMER_SECRET from env
# 从环境变量读取SF_MY_DOMAIN、SF_CONSUMER_KEY、SF_CONSUMER_SECRET
```
**Per-Turn Analysis Checklist:**
The test runner automatically evaluates each turn against expectations defined in the YAML template:
| # | Check | YAML Key | How Evaluated |
|---|-------|----------|---------------|
| 1 | Response non-empty? | `response_not_empty: true` | `messages[0].message` has content |
| 2 | Correct topic matched? | `topic_contains: "cancel"` | Heuristic: inferred from response text |
| 3 | Expected actions invoked? | `action_invoked: true` | Checks for `result` array entries |
| 4 | Response content? | `response_contains: "reschedule"` | Substring match on response |
| 5 | Context preserved? | `context_retained: true` | Heuristic: checks for prior-turn references |
| 6 | Guardrail respected? | `guardrail_triggered: true` | Regex patterns for refusal language |
| 7 | Escalation triggered? | `escalation_triggered: true` | Checks for `Escalation` message type |
| 8 | Response excludes? | `response_not_contains: "error"` | Substring exclusion check |
See [Agent API Reference](docs/agent-api-reference.md) for complete response format.
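Putting those keys together, a single scenario turn might look like the sketch below. The expectation keys come from the checklist above; the surrounding wrapper structure (`expect`, the list item shape) is illustrative, not the runner's exact schema:

```yaml
# Illustrative turn using the expectation keys from the table above
- utterance: "Actually, cancel my appointment instead"
  expect:
    response_not_empty: true
    topic_contains: "cancel"
    response_contains: "appointment"
    response_not_contains: "error"
    context_retained: true
```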
**每轮分析检查清单:**
测试运行器会根据YAML模板中定义的期望自动评估每一轮:
| # | 检查项 | YAML键 | 评估方式 |
|---|-------|----------|---------------|
| 1 | 响应非空? | `response_not_empty: true` | `messages[0].message`包含内容 |
| 2 | 匹配正确的主题? | `topic_contains: "cancel"` | 启发式:从响应文本推断 |
| 3 | 调用了预期的动作? | `action_invoked: true` | 检查`result`数组条目 |
| 4 | 响应内容? | `response_contains: "reschedule"` | 响应中的子字符串匹配 |
| 5 | 上下文已保留? | `context_retained: true` | 启发式:检查对前一轮的引用 |
| 6 | 遵守防护规则? | `guardrail_triggered: true` | 拒绝语言的正则表达式模式 |
| 7 | 触发升级? | `escalation_triggered: true` | 检查`Escalation`消息类型 |
| 8 | 响应不包含? | `response_not_contains: "error"` | 子字符串排除检查 |
完整响应格式请参见[Agent API参考](docs/agent-api-reference.md)。

A5: Results & Scoring
A5:结果与评分
Claude generates a terminal-friendly results report:
📊 MULTI-TURN TEST RESULTS
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Org: vivint-DevInt
Mode: Agent Runtime API (multi-turn)
SCENARIO RESULTS
───────────────────────────────────────────────────────────────
✅ topic_switch_natural 3/3 turns passed
✅ context_user_identity 3/3 turns passed
❌ escalation_frustration 2/3 turns passed (Turn 3: no escalation)
✅ guardrail_mid_conversation 3/3 turns passed
✅ action_chain_identify 3/3 turns passed
⚠️ variable_injection 2/3 turns passed (Turn 3: re-asked for account)
SUMMARY
───────────────────────────────────────────────────────────────
Scenarios: 6 total | 4 passed | 1 failed | 1 partial
Turns: 18 total | 16 passed | 2 failed
Topic Re-matching: 100% ✅
Context Preservation: 83% ⚠️
Escalation Accuracy: 67% ❌
FAILED TURNS
───────────────────────────────────────────────────────────────
❌ escalation_frustration → Turn 3
Input: "Nothing is working! I need a human NOW"
Expected: Escalation triggered
Actual: Agent continued troubleshooting
Category: MULTI_TURN_ESCALATION_FAILURE
Fix: Add frustration keywords to escalation triggers
⚠️ variable_injection → Turn 3
Input: "Create a new case for a billing issue"
Expected: Uses pre-set $Context.AccountId
Actual: "Which account is this for?"
Category: CONTEXT_PRESERVATION_FAILURE
Fix: Wire $Context.AccountId to CreateCase action input
SCORING
───────────────────────────────────────────────────────────────
Topic Selection Coverage 13/15
Action Invocation 14/15
Multi-Turn Topic Re-matching 15/15 ✅
Context Preservation 10/15 ⚠️
Edge Case & Guardrail Coverage 12/15
Test Spec / Scenario Quality 9/10
Agentic Fix Success --/15 (pending)
TOTAL: 73/85 (86%) + Fix Loop pending

Claude生成适合终端显示的结果报告:
📊 多轮测试结果
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
组织: vivint-DevInt
模式: Agent Runtime API(多轮)
场景结果
───────────────────────────────────────────────────────────────
✅ topic_switch_natural 3/3轮通过
✅ context_user_identity 3/3轮通过
❌ escalation_frustration 2/3轮通过(第3轮:未升级)
✅ guardrail_mid_conversation 3/3轮通过
✅ action_chain_identify 3/3轮通过
⚠️ variable_injection 2/3轮通过(第3轮:重新询问账户)
摘要
───────────────────────────────────────────────────────────────
场景: 共6个 | 通过4个 | 失败1个 | 部分通过1个
轮次: 共18轮 | 通过16轮 | 失败2轮
主题重匹配: 100% ✅
上下文保留: 83% ⚠️
升级准确性: 67% ❌
失败轮次
───────────────────────────────────────────────────────────────
❌ escalation_frustration → 第3轮
输入: "什么都不管用!我现在需要人工服务!"
预期: 触发升级
实际: Agent继续故障排除
类别: MULTI_TURN_ESCALATION_FAILURE
修复: 向升级触发器添加沮丧关键词
⚠️ variable_injection → 第3轮
输入: "为账单问题创建新案例"
预期: 使用预设的$Context.AccountId
实际: "这是哪个账户的问题?"
类别: CONTEXT_PRESERVATION_FAILURE
修复: 将$Context.AccountId连接到CreateCase动作输入
评分
───────────────────────────────────────────────────────────────
主题选择覆盖率 13/15
动作调用 14/15
多轮主题重匹配 15/15 ✅
上下文保留 10/15 ⚠️
边缘案例与防护规则覆盖率 12/15
测试用例/场景质量 9/10
Agent自动修复成功率 --/15 (待处理)
总计: 73/85 (86%) + 修复循环待处理

Phase B: CLI Testing Center (SECONDARY)
阶段B:CLI测试中心(副流程)
Availability: Requires Agent Testing Center feature enabled in org. If unavailable, use Phase A exclusively.
可用性: 需要组织中启用Agent测试中心功能。 如果不可用,请仅使用阶段A。
⚡ Agent Script Agents (AiAuthoringBundle)
⚡ Agent Script Agent(AiAuthoringBundle)
Agent Script agents (`.agent` files in `aiAuthoringBundles/`) deploy as `BotDefinition` and use the same `sf agent test` CLI commands. However, they have unique testing challenges:
Two-Level Action System:
- Level 1 (Definition): `topic.actions:` block — defines actions with `target: "apex://ClassName"`
- Level 2 (Invocation): `reasoning.actions:` block — invokes via `@actions.<name>` with variable bindings
Single-Utterance Limitation:
Multi-topic Agent Script agents with `start_agent` routing have a "1 action per reasoning cycle" budget in CLI tests. The first cycle is consumed by the transition action (`go_<topic>`). The actual business action (e.g., `get_order_status`) fires in a second cycle that single-utterance tests don't reach.
Solution — Use `conversationHistory`:
yaml
testCases:
# ROUTING TEST — captures transition action only
- utterance: "I want to check my order status"
expectedTopic: order_status
expectedActions:
- go_order_status # Transition action from start_agent
# ACTION TEST — use conversationHistory to skip routing
- utterance: "The order ID is 801ak00001g59JlAAI"
conversationHistory:
- role: "user"
message: "I want to check my order status"
- role: "agent"
topic: "order_status" # Pre-positions agent in target topic
message: "I'd be happy to help! Could you provide the Order ID?"
expectedTopic: order_status
expectedActions:
- get_order_status # Level 1 DEFINITION name (NOT invocation name)
  expectedOutcome: "Agent retrieves and displays order details"

Key Rules for Agent Script CLI Tests:
- `expectedActions` uses the Level 1 definition name (e.g., `get_order_status`), NOT the Level 2 invocation name (e.g., `check_status`)
- Agent Script topic names may differ in org — use the topic name discovery workflow
- Agents with Apex `WITH USER_MODE` require the Einstein Agent User to have object permissions — missing permissions cause silent failures (0 rows, no error)
- `subjectName` in the YAML spec maps to `config.developer_name` in the `.agent` file
⚠️ Agent Script API Testing Caveat:
Agent Script agents embed action results differently via the Agent Runtime API:
- Agent Builder agents: Return separate `ActionResult` message types with structured data
- Agent Script agents: Embed action outputs within `Inform` text messages — no separate `ActionResult` type
This means:
- `action_invoked: true` (boolean) may fail even when the action runs — use `response_contains` to verify action output instead
- `action_invoked: "action_name"` uses `plannerSurfaces` fallback parsing but is less reliable
- For robust testing, prefer `response_contains` / `response_contains_any` checks over `action_invoked`
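For Agent Script agents, then, a turn expectation might lean on text checks rather than action flags. A sketch — the key names follow the guidance above, while the wrapper structure is illustrative:

```yaml
# Illustrative: prefer text checks over action_invoked for Agent Script agents
- utterance: "Check order 801ak00001g59JlAAI"
  expect:
    response_contains_any:
      - "order status"
      - "shipped"
    response_not_contains: "error"
    # action_invoked omitted — unreliable for Agent Script (plannerSurfaces fallback)
```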
Agent Script Templates & Docs:
- Template: agentscript-test-spec.yaml — 5 test patterns (CLI)
- Template: multi-turn-agentscript-comprehensive.yaml — 6 multi-turn API scenarios
- Guide: agentscript-testing-patterns.md — detailed patterns with worked examples
Automated Test Spec Generation:
bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
--output tests/agent-spec.yaml --verbose

Agent Script Agent(`aiAuthoringBundles/`中的`.agent`文件)部署为`BotDefinition`,并使用相同的`sf agent test` CLI命令。但是,它们有独特的测试挑战:
两级动作系统:
- 第1级(定义):`topic.actions:`块 — 定义带有`target: "apex://ClassName"`的动作
- 第2级(调用):`reasoning.actions:`块 — 通过`@actions.<name>`调用并绑定变量
单轮语句限制:
带有`start_agent`路由的多主题Agent Script Agent在CLI测试中每个推理周期有"1个动作"的预算。第一个周期被过渡动作(`go_<topic>`)消耗。实际业务动作(例如`get_order_status`)在单轮测试无法到达的第二个周期触发。
解决方案 — 使用`conversationHistory`:
yaml
testCases:
# 路由测试 — 仅捕获过渡动作
- utterance: "我想查看我的订单状态"
expectedTopic: order_status
expectedActions:
- go_order_status # 来自start_agent的过渡动作
# 动作测试 — 使用conversationHistory跳过路由
- utterance: "订单ID是801ak00001g59JlAAI"
conversationHistory:
- role: "user"
message: "我想查看我的订单状态"
- role: "agent"
topic: "order_status" # 将Agent预先定位到目标主题
    message: "我很乐意为您提供帮助!能否提供订单ID?"
expectedTopic: order_status
expectedActions:
- get_order_status # 第1级定义名称(不是调用名称)
  expectedOutcome: "Agent检索并显示订单详情"

Agent Script CLI测试关键规则:
- `expectedActions`使用第1级定义名称(例如`get_order_status`),而不是第2级调用名称(例如`check_status`)
- Agent Script主题名称在组织中可能不同 — 使用主题名称发现工作流
- 带有Apex `WITH USER_MODE`的Agent需要Einstein Agent User具有对象权限 — 缺少权限会导致静默失败(0行,无错误)
- YAML规范中的`subjectName`映射到`.agent`文件中的`config.developer_name`
⚠️ Agent Script API测试注意事项:
Agent Script Agent通过Agent Runtime API嵌入动作结果的方式不同:
- Agent Builder Agent:返回带有结构化数据的独立`ActionResult`消息类型
- Agent Script Agent:在`Inform`文本消息中嵌入动作输出 — 没有独立的`ActionResult`类型
这意味着:
- `action_invoked: true`(布尔值)即使动作运行也可能失败 — 改用`response_contains`验证动作输出
- `action_invoked: "action_name"`使用`plannerSurfaces`回退解析,但可靠性较低
- 为了稳健测试,优先使用`response_contains` / `response_contains_any`检查而不是`action_invoked`
Agent Script模板与文档:
- 模板:agentscript-test-spec.yaml — 5种测试模式(CLI)
- 模板:multi-turn-agentscript-comprehensive.yaml — 6种多轮API场景
- 指南:agentscript-testing-patterns.md — 带实际示例的详细模式
自动测试规范生成:
bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
  --output tests/agent-spec.yaml --verbose

Generates both routing tests (with transition actions) and
生成路由测试(带过渡动作)和
action tests (with conversationHistory for apex:// targets)
动作测试(带针对apex://目标的conversationHistory)
**Agent Discovery:**
**Agent发现:**

```bash
# Discover Agent Script agents alongside XML-based agents
# 与基于XML的Agent一起发现Agent Script Agent
python3 {SKILL_PATH}/hooks/scripts/agent_discovery.py local \
  --project-dir /path/to/project --agent-name MyAgent
# Returns type: "AiAuthoringBundle" for .agent files
# 对于.agent文件返回type: "AiAuthoringBundle"
```

B1: Test Spec Creation
B1:测试规范创建
⚠️ CRITICAL: YAML Schema
The CLI YAML spec uses a FLAT structure parsed by `@salesforce/agents` — NOT the fabricated `apiVersion`/`kind`/`metadata` format.
See test-spec-guide.md for the correct schema.
Required top-level fields:
- `name:` — Display name (MasterLabel). Deploy FAILS without this.
- `subjectType: AGENT`
- `subjectName:` — Agent BotDefinition DeveloperName

Test case fields (flat, NOT nested):
- `utterance:` — User message
- `expectedTopic:` — NOT `expectation.topic`
- `expectedActions:` — Flat list of strings, NOT objects with `name`/`invoked`/`outputs`
- `expectedOutcome:` — Optional natural language description
⚠️ 关键:YAML架构
CLI YAML规范使用扁平结构,由`@salesforce/agents`解析 — 不是虚构的`apiVersion`/`kind`/`metadata`格式。
正确架构请参见test-spec-guide.md。
必填顶级字段:
- `name:` — 显示名称(MasterLabel)。没有此字段部署会失败。
- `subjectType: AGENT`
- `subjectName:` — Agent BotDefinition DeveloperName

测试用例字段(扁平,非嵌套):
- `utterance:` — 用户消息
- `expectedTopic:` — 不是`expectation.topic`
- `expectedActions:` — 扁平字符串列表,不是带有`name`/`invoked`/`outputs`的对象
- `expectedOutcome:` — 可选自然语言描述
✅ Correct CLI YAML format
✅ 正确的CLI YAML格式

```yaml
name: "My Agent Tests"
subjectType: AGENT
subjectName: My_Agent
testCases:
  - utterance: "Where is my order?"
    expectedTopic: order_lookup
    expectedActions:
      - get_order_status
    expectedOutcome: "Agent should provide order status information"
```
```yaml
name: "我的Agent测试"
subjectType: AGENT
subjectName: My_Agent
testCases:
  - utterance: "我的订单在哪里?"
    expectedTopic: order_lookup
    expectedActions:
      - get_order_status
    expectedOutcome: "Agent应提供订单状态信息"
```

**Option A: Interactive Generation** (no automation)
**选项A:交互式生成**(无自动化)
```bash
# Interactive test spec generation
# 交互式测试规范生成
sf agent generate test-spec --output-file ./tests/agent-spec.yaml
# ⚠️ NOTE: No --api-name flag! Interactive-only.
# ⚠️ 注意:没有--api-name标志!仅支持交互式。
```
**Option B: Automated Generation** (Python script)
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
--output tests/agent-spec.yaml \
  --verbose
```

Create Test in Org:

```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [alias]
```

See Test Spec Reference for complete YAML format guide.
**选项B:自动生成**(Python脚本)
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
--agent-file /path/to/Agent.agent \
--output tests/agent-spec.yaml \
  --verbose
```

在组织中创建测试:

```bash
sf agent test create --spec ./tests/agent-spec.yaml --api-name MyAgentTest --target-org [别名]
```

完整YAML格式指南请参见测试规范参考。
B1.5: Topic Name Resolution
B1.5:主题名称解析
Topic name format in `expectedTopic` depends on the topic type:
| Topic Type | YAML Value | Resolution |
|---|---|---|
| Standard (Escalation, Off_Topic) | Short name | Framework resolves automatically |
| Promoted (p_16j... prefix) | Full runtime `developerName` with hash | Must be exact match |
Standard topics like `Escalation` can use the short name — the CLI framework resolves to the hash-suffixed runtime name.
Promoted topics (custom topics created in Setup UI) MUST use the full runtime `developerName` including hash suffix. The short `localDeveloperName` does NOT resolve.
Discovery workflow:
- Write spec with best guesses for topic names
- Deploy and run:
sf agent test run --api-name X --wait 10 --result-format json --json - Extract actual names:
jq '.result.testCases[].generatedData.topic' - Update spec with actual runtime names
- Re-deploy with `--force-overwrite` and re-run
See topic-name-resolution.md for the complete guide.
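Step 3 of the workflow above can also be scripted. A sketch that mirrors the jq path, assuming the `--json` output shape `result.testCases[].generatedData.topic`:

```python
import json

# Extract actual runtime topic names from `sf agent test run ... --json` output,
# mirroring: jq '.result.testCases[].generatedData.topic'
def actual_topics(run_json: str) -> list[str]:
    data = json.loads(run_json)
    return [tc["generatedData"]["topic"] for tc in data["result"]["testCases"]]


sample = '{"result": {"testCases": [{"generatedData": {"topic": "p_16jPl_Order_16jabc"}}]}}'
print(actual_topics(sample))  # → ['p_16jPl_Order_16jabc']
```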
`expectedTopic`中的主题名称格式取决于主题类型:
| 主题类型 | YAML值 | 解析方式 |
|---|---|---|
| 标准(Escalation、Off_Topic) | 短名称 | 框架自动解析 |
| 推广(p_16j...前缀) | 带哈希的完整运行时`developerName` | 必须完全匹配 |
标准主题如`Escalation`可以使用短名称 — CLI框架会解析为带哈希后缀的运行时名称。
推广主题(在设置UI中创建的自定义主题)必须使用包含哈希后缀的完整运行时`developerName`。短`localDeveloperName`无法解析。
发现工作流:
- 使用主题名称的最佳猜测编写规范
- 部署并运行:
sf agent test run --api-name X --wait 10 --result-format json --json - 提取实际名称:
jq '.result.testCases[].generatedData.topic' - 使用实际运行时名称更新规范
- 使用`--force-overwrite`重新部署并重新运行
完整指南请参见topic-name-resolution.md。
B1.6: Known CLI Gotchas
B1.6:已知CLI陷阱
| Gotcha | Detail |
|---|---|
| Deploy fails: "Required fields are missing: [MasterLabel]" |
| |
Empty | Means "not testing" — PASS even when actions invoked |
Missing | |
| No MessagingSession context | Flows needing |
| Always use |
contextVariables | Use |
| customEvaluations RETRY bug | ⚠️ Spring '26: Server returns RETRY → REST API 500. See Known Issues. |
| Returns score=0, empty explanation — platform bug |
| Labels FAILURE even at score=1 — use score value, ignore label |
| 陷阱 | 详情 |
|---|---|
| 部署失败:"Required fields are missing: [MasterLabel]" | 在YAML规范顶部添加 `name:` 字段 |
| 空 `expectedActions` | 表示"不测试" — 即使调用动作也会通过 |
| 无MessagingSession上下文 | 需要真实记录ID的Flow会收到主题名称作为 `recordId` — 注入 `contextVariables` |
| `--use-most-recent` 未实现 | 始终明确使用 `--job-id` |
| `contextVariables` 名称 | 使用裸变量名称 — CLI会添加 `$Context.` 前缀 |
| customEvaluations RETRY错误 | ⚠️ Spring '26: 服务器返回RETRY → REST API 500。请参见已知问题。 |
| `conciseness` 指标 | 返回score=0,空explanation — 平台错误 |
| `instruction_following` 标签 | 即使score=1也标记为FAILURE — 使用分数值,忽略标签 |
B1.7: Context Variables
B1.7:上下文变量
Context variables inject session-level data (record IDs, user info) into CLI test cases. Without them, action flows receive the topic's internal name as `recordId`. With them, they receive a real record ID.

When to use: Any test case where action flows need real record IDs (e.g., updating a MessagingSession, creating a Case).

YAML syntax:

```yaml
contextVariables:
  - name: RoutableId   # Bare name — NOT $Context.RoutableId
    value: "0Mwbb000007MGoTCAW"
  - name: CaseId
    value: "500XX0000000001"
```

Key rules:
- `name` uses the bare variable name (e.g., `RoutableId`), NOT `$Context.RoutableId` — the CLI adds the prefix
- Maps to `<contextVariable>` with `<variableName>` / `<variableValue>` in the XML metadata

Discovery — find valid IDs:

```bash
sf data query --query "SELECT Id FROM MessagingSession WHERE Status='Active' LIMIT 1" --target-org [alias]
sf data query --query "SELECT Id FROM Case ORDER BY CreatedDate DESC LIMIT 1" --target-org [alias]
```

Verified effect (IRIS testing, 2026-02-09):
- Without `RoutableId`: action receives `recordId: "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"` (the topic name)
- With `RoutableId`: action receives `recordId: "0Mwbb000007MGoTCAW"` (a real MessagingSession ID)

Note: Context variables do NOT unlock authentication-gated topics. Injecting `RoutableId` + `CaseId` does not satisfy `User_Authentication` flows.

See context-vars-test-spec.yaml for a dedicated template.
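The YAML-to-XML mapping described here can be sketched in a few lines. A stdlib-only illustration (the `<contextVariable>`, `<variableName>`, and `<variableValue>` element names are the ones this section documents; the enclosing `testCase` element name is an assumption):

```python
import xml.etree.ElementTree as ET

def context_vars_to_xml(pairs: list[dict]) -> str:
    """Render contextVariables YAML entries as the XML elements they map to."""
    root = ET.Element("testCase")
    for pair in pairs:
        cv = ET.SubElement(root, "contextVariable")
        ET.SubElement(cv, "variableName").text = pair["name"]   # bare name, no $Context. prefix
        ET.SubElement(cv, "variableValue").text = pair["value"]
    return ET.tostring(root, encoding="unicode")

xml = context_vars_to_xml([
    {"name": "RoutableId", "value": "0Mwbb000007MGoTCAW"},
    {"name": "CaseId", "value": "500XX0000000001"},
])
print(xml)
```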
上下文变量将会话级数据(记录ID、用户信息)注入到CLI测试用例中。没有这些变量,动作流会将主题的内部名称作为 `recordId`。有了这些变量,它们会收到真实的记录ID。

使用场景: 任何动作流需要真实记录ID的测试用例(例如更新MessagingSession、创建案例)。

YAML语法:

```yaml
contextVariables:
  - name: RoutableId   # 裸名称 — 不是$Context.RoutableId
    value: "0Mwbb000007MGoTCAW"
  - name: CaseId
    value: "500XX0000000001"
```

关键规则:
- `name` 使用裸变量名称(例如 `RoutableId`),而不是 `$Context.RoutableId` — CLI会添加前缀
- 映射到XML元数据中的 `<contextVariable>` 及其 `<variableName>` / `<variableValue>`

发现 — 查找有效ID:

```bash
sf data query --query "SELECT Id FROM MessagingSession WHERE Status='Active' LIMIT 1" --target-org [别名]
sf data query --query "SELECT Id FROM Case ORDER BY CreatedDate DESC LIMIT 1" --target-org [别名]
```

已验证效果(IRIS测试,2026-02-09):
- 没有 `RoutableId`:动作收到 `recordId: "p_16jPl000000GwEX_Field_Support_Routing_16j8eeef13560aa"`(主题名称)
- 有 `RoutableId`:动作收到 `recordId: "0Mwbb000007MGoTCAW"`(真实MessagingSession ID)

注意: 上下文变量不会解锁需要认证的主题。注入 `RoutableId` + `CaseId` 不满足 `User_Authentication` 流程。

专用模板请参见context-vars-test-spec.yaml。
B1.8: Metrics
B1.8:指标
Metrics add platform quality scoring to test cases. Specify as a flat list of metric names in the YAML.
YAML syntax:

```yaml
metrics:
  - coherence
  - instruction_following
  - output_latency_milliseconds
```

Available metrics (observed behavior from IRIS testing, 2026-02-09):

| Metric | Score Range | Status | Notes |
|---|---|---|---|
| `coherence` | 1-5 | ✅ Works | Scores 4-5 for clear responses. Recommended. |
| `completeness` | 1-5 | ⚠️ Misleading | Penalizes triage/routing agents for "not solving" — skip for routing agents. |
| `conciseness` | 1-5 | 🔴 Broken | Returns score=0, empty explanation. Platform bug. |
| `instruction_following` | 0-1 | ⚠️ Threshold bug | Labels "FAILURE" at score=1 when explanation says "follows perfectly." |
| `output_latency_milliseconds` | Raw ms | ✅ Works | No pass/fail — useful for performance baselining. |

Recommendation: Use `coherence` + `output_latency_milliseconds` for baseline quality. Skip `conciseness` (broken) and `completeness` (misleading for routing agents).
指标为测试用例添加平台质量评分。在YAML中指定为扁平的指标名称列表。

YAML语法:

```yaml
metrics:
  - coherence
  - instruction_following
  - output_latency_milliseconds
```

可用指标(IRIS测试观察到的行为,2026-02-09):

| 指标 | 评分范围 | 状态 | 说明 |
|---|---|---|---|
| `coherence` | 1-5 | ✅ 可用 | 清晰响应评4-5分,推荐使用 |
| `completeness` | 1-5 | ⚠️ 有误导性 | 因"未解决问题"而惩罚分诊/路由Agent — 路由Agent跳过此指标 |
| `conciseness` | 1-5 | 🔴 已损坏 | 始终返回score=0,空explanation。平台错误。 |
| `instruction_following` | 0-1 | ⚠️ 阈值错误 | 即使score=1且说明文本说Agent"完全遵循指令"也标记为"FAILURE"。 |
| `output_latency_milliseconds` | 原始毫秒 | ✅ 可用 | 无通过/失败 — 用于性能基准测试 |

推荐: 使用 `coherence` + `output_latency_milliseconds` 作为基准质量。跳过 `conciseness`(已损坏)和 `completeness`(对路由Agent有误导性)。

B1.9: Custom Evaluations (⚠️ Spring '26 Bug)
B1.9:自定义评估(⚠️ Spring '26错误)
Custom evaluations allow JSONPath-based assertions on action inputs and outputs — e.g., "verify the action received `supportPath = 'Field Support'`."

YAML syntax:

```yaml
customEvaluations:
  - label: "supportPath is Field Support"
    name: string_comparison
    parameters:
      - name: operator
        value: equals
        isReference: false
      - name: actual
        value: "$.generatedData.invokedActions[0][0].function.input.supportPath"
        isReference: true   # JSONPath resolved against generatedData
      - name: expected
        value: "Field Support"
        isReference: false
```

Evaluation types:
- `string_comparison`: `equals`, `contains`, `startswith`, `endswith`
- `numeric_comparison`: `equals`, `greater_than`, `less_than`, `greater_than_or_equal`, `less_than_or_equal`

Building JSONPath expressions:
- Run tests with `--verbose` to see `generatedData.invokedActions`
- Parse the stringified JSON (it's `"[[{...}]]"`, not a parsed array)
- Common paths: `$.generatedData.invokedActions[0][0].function.input.[field]`

⚠️ BLOCKED — Spring '26 Platform Bug: Custom evaluations with `isReference: true` cause the server to return "RETRY" status. The results API crashes with `INTERNAL_SERVER_ERROR`. This is server-side (confirmed via direct `curl`). Workaround: Use `expectedOutcome` (LLM-as-judge) or the Testing Center UI until patched.

See custom-eval-test-spec.yaml for a dedicated template.
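While the RETRY bug blocks server-side `isReference` evaluations, the same assertion can be run client-side against `--verbose` output. A sketch (the sample payload is illustrative; note that `invokedActions` arrives as a stringified `[[{...}]]` array, as described above):

```python
import json

def action_input(generated_data: dict, field: str):
    """Resolve $.generatedData.invokedActions[0][0].function.input.<field> client-side."""
    actions = json.loads(generated_data["invokedActions"])  # stringified "[[{...}]]"
    return actions[0][0]["function"]["input"][field]

# Illustrative generatedData fragment shaped like --verbose output
generated = {"invokedActions": json.dumps(
    [[{"function": {"input": {"supportPath": "Field Support"}}}]]
)}
assert action_input(generated, "supportPath") == "Field Support"
print("client-side assertion passed")
```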
自定义评估允许对动作输入和输出进行基于JSONPath的断言 — 例如"验证动作收到 `supportPath = 'Field Support'`"。

YAML语法:

```yaml
customEvaluations:
  - label: "supportPath为Field Support"
    name: string_comparison
    parameters:
      - name: operator
        value: equals
        isReference: false
      - name: actual
        value: "$.generatedData.invokedActions[0][0].function.input.supportPath"
        isReference: true   # 针对generatedData解析JSONPath
      - name: expected
        value: "Field Support"
        isReference: false
```

评估类型:
- `string_comparison`:`equals`、`contains`、`startswith`、`endswith`
- `numeric_comparison`:`equals`、`greater_than`、`less_than`、`greater_than_or_equal`、`less_than_or_equal`

构建JSONPath表达式:
- 使用 `--verbose` 运行测试以查看 `generatedData.invokedActions`
- 解析字符串化的JSON(是 `"[[{...}]]"`,不是解析后的数组)
- 常见路径:`$.generatedData.invokedActions[0][0].function.input.[field]`

⚠️ 已阻止 — Spring '26平台错误: 带有 `isReference: true` 的自定义评估导致服务器返回"RETRY"状态。结果API崩溃并显示 `INTERNAL_SERVER_ERROR`。这是服务器端问题(通过直接 `curl` 确认)。解决方法: 在修复前使用 `expectedOutcome`(LLM作为判断者)或测试中心UI。

专用模板请参见custom-eval-test-spec.yaml。
B2: Test Execution
B2:测试执行
Run automated tests
运行自动化测试

```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [alias]
```

> **No ECA required.** Preview uses standard org auth (`sf org login web`). No Connected App setup needed (v2.121.7+).

**Interactive Preview (Simulated):**
```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [alias]
```

**Interactive Preview (Live):**
```bash
sf agent preview --api-name AgentName --use-live-actions --apex-debug --target-org [alias]
```

```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org [别名]
```

> **不需要ECA**。预览使用标准组织认证(`sf org login web`)。不需要连接应用设置(v2.121.7+)。

**交互式预览(模拟):**
```bash
sf agent preview --api-name AgentName --output-dir ./logs --target-org [别名]
```

**交互式预览(真实):**
```bash
sf agent preview --api-name AgentName --use-live-actions --apex-debug --target-org [别名]
```

B3: Results Analysis
B3:结果分析
Parse test results JSON and display formatted summary:
📊 AGENT TEST RESULTS (CLI)
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
Org: vivint-DevInt
Duration: 45.2s
Mode: Simulated
SUMMARY
───────────────────────────────────────────────────────────────
✅ Passed: 18
❌ Failed: 2
⏭️ Skipped: 0
📈 Topic Selection: 95%
🎯 Action Invocation: 90%
FAILED TESTS
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
Utterance: "What's the status of orders 12345 and 67890?"
Expected: get_order_status invoked 2 times
Actual: get_order_status invoked 1 time
Category: ACTION_INVOCATION_COUNT_MISMATCH
COVERAGE SUMMARY
───────────────────────────────────────────────────────────────
Topics Tested: 4/5 (80%) ⚠️
Actions Tested: 6/8 (75%) ⚠️
Guardrails Tested: 3/3 (100%) ✅

解析测试结果JSON并显示格式化摘要:
📊 AGENT测试结果(CLI)
════════════════════════════════════════════════════════════════
Agent: Customer_Support_Agent
组织: vivint-DevInt
持续时间: 45.2s
模式: 模拟
摘要
───────────────────────────────────────────────────────────────
✅ 通过: 18
❌ 失败: 2
⏭️ 跳过: 0
📈 主题选择: 95%
🎯 动作调用: 90%
失败测试
───────────────────────────────────────────────────────────────
❌ test_complex_order_inquiry
语句: "订单12345和67890的状态是什么?"
预期: get_order_status调用2次
实际: get_order_status调用1次
类别: ACTION_INVOCATION_COUNT_MISMATCH
覆盖率摘要
───────────────────────────────────────────────────────────────
已测试主题: 4/5 (80%) ⚠️
已测试动作: 6/8 (75%) ⚠️
已测试防护规则: 3/3 (100%) ✅

Phase C: Agentic Fix Loop
阶段C:Agent自动修复循环
When tests fail (either Phase A or Phase B), automatically fix via sf-ai-agentscript:
当测试失败时(阶段A或B),通过sf-ai-agentscript自动修复:
Failure Categories (10 total)
故障类别(共10种)
| Category | Source | Auto-Fix | Strategy |
|---|---|---|---|
| A+B | ✅ | Add keywords to topic description |
| A+B | ✅ | Improve action description |
| A+B | ✅ | Differentiate descriptions |
| A+B | ⚠️ | Delegate to sf-flow or sf-apex |
| A+B | ✅ | Add explicit guardrails |
| A+B | ✅ | Add escalation action/triggers |
| A | ✅ | Add transition phrases to target topic |
| A | ✅ | Add context retention instructions |
| A | ✅ | Add frustration detection triggers |
| A | ✅ | Fix action output variable mappings |
| 类别 | 来源 | 自动修复 | 策略 |
|---|---|---|---|
| A+B | ✅ | 向主题描述添加关键词 |
| A+B | ✅ | 改进动作描述 |
| A+B | ✅ | 区分描述 |
| A+B | ⚠️ | 委托给sf-flow或sf-apex |
| A+B | ✅ | 添加明确的防护规则 |
| A+B | ✅ | 添加升级动作/触发器 |
| A | ✅ | 向目标主题添加过渡短语 |
| A | ✅ | 添加上下文保留指令 |
| A | ✅ | 添加沮丧检测触发器 |
| A | ✅ | 修复动作输出变量映射 |
Auto-Fix Command Example
自动修复命令示例
```bash
Skill(skill="sf-ai-agentscript", args="Fix agent [AgentName] - Error: [category] - [details]")
```

```bash
Skill(skill="sf-ai-agentscript", args="修复Agent [AgentName] - 错误: [category] - [详情]")
```

Fix Loop Flow
修复循环流程
Test Failed → Analyze failure category
│
├─ Single-turn failure → Standard fix (topics, actions, guardrails)
│
└─ Multi-turn failure → Enhanced fix (context, re-matching, escalation, chaining)
│
▼
Apply fix via sf-ai-agentscript → Re-publish → Re-test
│
├─ Pass → ✅ Move to next failure
└─ Fail → Retry (max 3 attempts) → Escalate to human

See Agentic Fix Loops Guide for complete decision tree and 10 fix strategies.
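The loop above can be expressed as a small driver. A sketch with stubbed `run_tests` / `apply_fix` callables (hypothetical names; the real workflow shells out to the test runner and delegates fixes via `Skill(skill="sf-ai-agentscript", ...)`):

```python
MAX_ATTEMPTS = 3

def fix_loop(run_tests, apply_fix, max_attempts: int = MAX_ATTEMPTS) -> str:
    """Run tests, apply a fix per failing category, and re-test up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        failures = run_tests()      # list of failure-category strings
        if not failures:
            return "PASS"
        for category in failures:
            apply_fix(category)     # delegated to sf-ai-agentscript in the real workflow
    return "ESCALATE_TO_HUMAN"

# Stub: first run fails on topic matching, second run passes
results = iter([["TOPIC_MISMATCH"], []])
fixed = []
outcome = fix_loop(lambda: next(results), fixed.append)
print(outcome, fixed)  # → PASS ['TOPIC_MISMATCH']
```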
测试失败 → 分析故障类别
│
├─ 单轮失败 → 标准修复(主题、动作、防护规则)
│
└─ 多轮失败 → 增强修复(上下文、重匹配、升级、链式调用)
│
▼
通过sf-ai-agentscript应用修复 → 重新发布 → 重新测试
│
├─ 通过 → ✅ 处理下一个失败
└─ 失败 → 重试(最多3次) → 升级给人工

完整决策树和10种修复策略请参见Agent自动修复循环指南。
Two Fix Strategies
两种修复策略
| Agent Type | Fix Strategy | When to Use |
|---|---|---|
| Custom Agent (you control it) | Fix the agent via sf-ai-agentscript | Topic descriptions, action configs need adjustment |
| Managed/Standard Agent | Fix test expectations | Test expectations don't match actual behavior |
| Agent类型 | 修复策略 | 使用场景 |
|---|---|---|
| 自定义Agent(您控制它) | 通过 sf-ai-agentscript 修复Agent | 主题描述、动作配置需要调整 |
| 托管/标准Agent | 修复测试期望 | 测试期望与实际行为不匹配 |
Phase D: Coverage Improvement
阶段D:覆盖率提升
If coverage < threshold:
- Identify untested topics/actions/patterns from results
- Add test cases (YAML for CLI, scenarios for API)
- Re-run tests
- Repeat until threshold met
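Step 1 amounts to a set difference over the agent's defined topics/actions versus those seen in results. A minimal sketch (the topic names are illustrative):

```python
def coverage_gaps(defined: set[str], tested: set[str]) -> tuple[float, set[str]]:
    """Return (coverage ratio, untested items) for one dimension (topics or actions)."""
    untested = defined - tested
    ratio = len(defined & tested) / len(defined) if defined else 1.0
    return ratio, untested

topics_defined = {"Billing", "Escalation", "Order_Status", "Returns", "Off_Topic"}
topics_tested = {"Billing", "Escalation", "Order_Status", "Returns"}
ratio, missing = coverage_gaps(topics_defined, topics_tested)
print(f"Topic coverage: {ratio:.0%}, untested: {sorted(missing)}")  # → 80%, ['Off_Topic']
```

Each item in `missing` then gets a new test case (YAML for CLI, scenario for API) before the next run.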
如果覆盖率<阈值:
- 从结果中识别未测试的主题/动作/模式
- 添加测试用例(CLI使用YAML,API使用场景)
- 重新运行测试
- 重复直到达到阈值
Coverage Dimensions
覆盖率维度
| Dimension | Phase A | Phase B | Target |
|---|---|---|---|
| Topic Selection | ✅ | ✅ | 100% |
| Action Invocation | ✅ | ✅ | 100% |
| Topic Re-matching | ✅ | ❌ | 90%+ |
| Context Preservation | ✅ | ❌ | 95%+ |
| Conversation Completion | ✅ | ❌ | 85%+ |
| Guardrails | ✅ | ✅ | 100% |
| Escalation | ✅ | ✅ | 100% |
| Phrasing Diversity | ✅ | ✅ | 3+ per topic |
See Coverage Analysis for complete metrics and improvement guide.
| 维度 | 阶段A | 阶段B | 目标 |
|---|---|---|---|
| 主题选择 | ✅ | ✅ | 100% |
| 动作调用 | ✅ | ✅ | 100% |
| 主题重匹配 | ✅ | ❌ | 90%+ |
| 上下文保留 | ✅ | ❌ | 95%+ |
| 对话完成 | ✅ | ❌ | 85%+ |
| 防护规则 | ✅ | ✅ | 100% |
| 升级 | ✅ | ✅ | 100% |
| 措辞多样性 | ✅ | ✅ | 每个主题3+种 |
完整指标和改进指南请参见覆盖率分析。
Phase E: Observability Integration
阶段E:可观测性集成
After test execution, guide user to analyze agent behavior with session-level observability:
Skill(skill="sf-ai-agentforce-observability", args="Analyze STDM sessions for agent [AgentName] in org [alias] - focus on test session behavior patterns")

What observability adds to testing:
- STDM Session Analysis: Examine actual session traces from test conversations
- Latency Profiling: Identify slow actions or topic routing delays
- Error Pattern Detection: Find recurring failures across sessions
- Action Execution Traces: Detailed view of Flow/Apex execution during tests
测试执行后,引导用户使用会话级可观测性分析Agent行为:
Skill(skill="sf-ai-agentforce-observability", args="分析组织[别名]中Agent[AgentName]的STDM会话 — 关注测试会话行为模式")

可观测性为测试增添的价值:
- STDM会话分析: 检查测试对话的实际会话跟踪
- 延迟分析: 识别慢动作或主题路由延迟
- 错误模式检测: 发现跨会话的重复失败
- 动作执行跟踪: 测试期间Flow/Apex执行的详细视图
Scoring System (100 Points)
评分系统(100分)
| Category | Points | Key Rules |
|---|---|---|
| Topic Selection Coverage | 15 | All topics have test cases; various phrasings tested |
| Action Invocation | 15 | All actions tested with valid inputs/outputs |
| Multi-Turn Topic Re-matching | 15 | Topic switching accuracy across turns |
| Context Preservation | 15 | Information retention across turns |
| Edge Case & Guardrail Coverage | 15 | Negative tests; guardrails; escalation |
| Test Spec / Scenario Quality | 10 | Proper YAML; descriptions; clear expectations |
| Agentic Fix Success | 15 | Auto-fixes resolve issues within 3 attempts |
Scoring Thresholds:
⭐⭐⭐⭐⭐ 90-100 pts → Production Ready
⭐⭐⭐⭐ 80-89 pts → Good, minor improvements
⭐⭐⭐ 70-79 pts → Acceptable, needs work
⭐⭐ 60-69 pts → Below standard
⭐ <60 pts → BLOCKED - Major issues

| 类别 | 分数 | 关键规则 |
|---|---|---|
| 主题选择覆盖率 | 15 | 所有主题都有测试用例;测试多种措辞 |
| 动作调用 | 15 | 所有动作都使用有效输入/输出测试 |
| 多轮主题重匹配 | 15 | 多轮对话中主题切换的准确性 |
| 上下文保留 | 15 | 多轮对话中信息的保留 |
| 边缘案例与防护规则覆盖率 | 15 | 负面测试;防护规则;升级 |
| 测试用例/场景质量 | 10 | 正确的YAML;描述;清晰的期望 |
| Agent自动修复成功率 | 15 | 自动修复在3次尝试内解决问题 |
评分阈值:
⭐⭐⭐⭐⭐ 90-100分 → 可用于生产
⭐⭐⭐⭐ 80-89分 → 良好,需小幅改进
⭐⭐⭐ 70-79分 → 可接受,需要改进
⭐⭐ 60-69分 → 低于标准
⭐ <60分 → 已阻止 - 存在重大问题

⛔ TESTING GUARDRAILS (MANDATORY)
⛔ 测试防护规则(必填)
BEFORE running tests, verify:
| Check | Command | Why |
|---|---|---|
| Agent published | | Can't test unpublished agent |
| Agent activated | Check status | API and preview require activation |
| Flows deployed | | Actions need Flows |
| ECA configured (Phase A — multi-turn API only) | Token request test | Required for Agent Runtime API. Not needed for preview or CLI tests |
| Org auth (Phase B live) | | Live mode requires valid auth |
NEVER do these:
| Anti-Pattern | Problem | Correct Pattern |
|---|---|---|
| Test unpublished agent | Tests fail silently | Publish first |
| Skip simulated testing | Live mode hides logic bugs | Always test simulated first |
| Ignore guardrail tests | Security gaps in production | Always test harmful/off-topic inputs |
| Single phrasing per topic | Misses routing failures | Test 3+ phrasings per topic |
| Write ECA credentials to files | Security risk | Keep in shell variables only |
| Skip session cleanup | Resource leaks and rate limits | Always DELETE sessions after tests |
Use | Domains with | Use |
| Ask permission to run skill scripts | Breaks flow, unnecessary delay | All |
| Spawn more than 2 swarm workers | Context overload, screen space, diminishing returns | Max 2 workers — side-by-side monitoring |
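The session-cleanup rule can be enforced structurally with try/finally. A sketch against a stand-in client (`FakeRuntimeClient` is hypothetical; the skill's `agent_api_client.py` exposes a context manager that provides the same guarantee):

```python
class FakeRuntimeClient:
    """Stand-in for an Agent Runtime API client, tracking open sessions."""
    def __init__(self):
        self.open_sessions = []
    def create_session(self, agent_id: str) -> str:
        sid = f"session-{len(self.open_sessions) + 1}"
        self.open_sessions.append(sid)
        return sid
    def delete_session(self, session_id: str) -> None:
        self.open_sessions.remove(session_id)

def with_session(client, agent_id, body):
    """Guarantee the DELETE even if the test body raises."""
    sid = client.create_session(agent_id)
    try:
        return body(sid)
    finally:
        client.delete_session(sid)  # always clean up: avoids leaks and rate limits

client = FakeRuntimeClient()
try:
    with_session(client, "0XxRM000...", lambda sid: 1 / 0)  # test body fails mid-run
except ZeroDivisionError:
    pass
print("open sessions after failure:", client.open_sessions)  # → []
```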
运行测试前,请验证:
| 检查项 | 命令 | 原因 |
|---|---|---|
| Agent已发布 | | 无法测试未发布的Agent |
| Agent已激活 | 检查状态 | API和预览需要激活 |
| Flows已部署 | | 动作需要Flows |
| ECA已配置(阶段A — 仅多轮API测试) | 令牌请求测试 | Agent Runtime API需要。预览或CLI测试不需要 |
| 组织认证(阶段B真实模式) | | 真实模式需要有效认证 |
绝不要做这些:
| 反模式 | 问题 | 正确模式 |
|---|---|---|
| 测试未发布的Agent | 测试静默失败 | 先发布 |
| 跳过模拟测试 | 真实模式隐藏逻辑错误 | 始终先测试模拟模式 |
| 忽略防护规则测试 | 生产中存在安全漏洞 | 始终测试有害/离题输入 |
| 每个主题仅使用一种措辞 | 遗漏路由失败 | 每个主题测试3+种措辞 |
| 将ECA凭证写入文件 | 安全风险 | 仅保存在shell变量中 |
| 跳过会话清理 | 资源泄漏和速率限制 | 测试后始终DELETE会话 |
使用 | 包含 | 使用 |
| 请求运行技能脚本的权限 | 中断流程,不必要的延迟 | 所有 |
| 生成超过2个集群工作进程 | 上下文过载、屏幕空间不足、收益递减 | 最多2个工作进程 — 并排监控 |
CLI Command Reference
CLI命令参考
Test Lifecycle Commands
测试生命周期命令
| Command | Purpose | Example |
|---|---|---|
| `sf agent generate test-spec` | Create test YAML | |
| `sf agent test create` | Deploy test to org | |
| `sf agent test run` | Execute tests | |
| `sf agent test results` | Get results | |
| `sf agent test resume` | Resume async test | |
| `sf agent test list` | List test runs | |
| 命令 | 用途 | 示例 |
|---|---|---|
| `sf agent generate test-spec` | 创建测试YAML | |
| `sf agent test create` | 将测试部署到组织 | |
| `sf agent test run` | 执行测试 | |
| `sf agent test results` | 获取结果 | |
| `sf agent test resume` | 恢复异步测试 | |
| `sf agent test list` | 列出测试运行 | |
Preview Commands
预览命令
| Command | Purpose | Example |
|---|---|---|
| `sf agent preview` | Interactive testing | |
| `--use-live-actions` | Use real Flows/Apex | |
| `--output-dir` | Save transcripts | |
| `--apex-debug` | Capture debug logs | |
| 命令 | 用途 | 示例 |
|---|---|---|
| `sf agent preview` | 交互式测试 | |
| `--use-live-actions` | 使用真实Flows/Apex | |
| `--output-dir` | 保存转录 | |
| `--apex-debug` | 捕获调试日志 | |
Result Formats
结果格式
| Format | Use Case | Flag |
|---|---|---|
| `human` | Terminal display (default) | `--result-format human` |
| `json` | CI/CD parsing | `--result-format json` |
| `junit` | Test reporting | `--result-format junit` |
| `tap` | Test Anything Protocol | `--result-format tap` |
| 格式 | 使用场景 | 标志 |
|---|---|---|
| `human` | 终端显示(默认) | `--result-format human` |
| `json` | CI/CD解析 | `--result-format json` |
| `junit` | 测试报告 | `--result-format junit` |
| `tap` | Test Anything Protocol(TAP) | `--result-format tap` |
Multi-Turn Test Templates
多轮测试模板
| Template | Pattern | Scenarios | Location |
|---|---|---|---|
| Topic switching | 4 | |
| Context retention | 4 | |
| Escalation cascades | 4 | |
| All 6 patterns | 6 | |
| 模板 | 模式 | 场景数量 | 位置 |
|---|---|---|---|
| 主题切换 | 4 | |
| 上下文保留 | 4 | |
| 升级流程 | 4 | |
| 所有6种模式 | 6 | |
CLI Test Templates
CLI测试模板
| Template | Purpose | Location |
|---|---|---|
| Quick start (3-5 tests) | |
| Full coverage (20+ tests) with context vars, metrics, custom evals | |
| Context variable patterns (RoutableId, EndUserId, CaseId) | |
| Custom evaluations with JSONPath assertions (⚠️ Spring '26 bug) | |
| Auth gate, guardrail, ambiguous routing, session tests (CLI) | |
| Security/safety scenarios | |
| Human handoff scenarios | |
| Agent Script agents with conversationHistory pattern | |
| Reference format | |
| 模板 | 用途 | 位置 |
|---|---|---|
| 快速入门(3-5个测试) | |
| 全面覆盖(20+个测试),带上下文变量、指标、自定义评估 | |
| 上下文变量模式(RoutableId、EndUserId、CaseId) | |
| 带JSONPath断言的自定义评估(⚠️ Spring '26错误) | |
| 认证门、防护规则、模糊路由、会话测试(CLI) | |
| 安全/安全场景 | |
| 人工交接场景 | |
| 带conversationHistory模式的Agent Script Agent | |
| 参考格式 | |
Cross-Skill Integration
跨技能集成
Required Delegations:
| Scenario | Skill to Call | Command |
|---|---|---|
| Fix agent script | sf-ai-agentscript | |
| Agent Script agents | sf-ai-agentscript | Parse |
| Create test data | sf-data | |
| Fix failing Flow | sf-flow | |
| Setup ECA or OAuth (multi-turn API only) | sf-connected-apps | |
| Analyze debug logs | sf-debug | |
| Session observability | sf-ai-agentforce-observability | |
必填委托:
| 场景 | 要调用的技能 | 命令 |
|---|---|---|
| 修复Agent脚本 | sf-ai-agentscript | |
| Agent Script Agent | sf-ai-agentscript | 解析 |
| 创建测试数据 | sf-data | |
| 修复失败的Flow | sf-flow | |
| 设置ECA或OAuth(仅多轮API测试) | sf-connected-apps | |
| 分析调试日志 | sf-debug | |
| 会话可观测性 | sf-ai-agentforce-observability | |
Automated Testing (Python Scripts)
自动化测试(Python脚本)
| Script | Purpose | Dependencies |
|---|---|---|
| `agent_api_client.py` | Reusable Agent Runtime API v1 client (auth, sessions, messaging, variables) | stdlib only |
| `multi_turn_test_runner.py` | Multi-turn test orchestrator (reads YAML, executes, evaluates, Rich colored reports) | pyyaml, rich + agent_api_client |
| | Aggregate N worker result JSONs into one unified Rich terminal report | rich |
| `generate-test-spec.py` | Parse .agent files, generate CLI test YAML specs | stdlib only |
| `run-automated-tests.py` | Orchestrate full CLI test workflow with fix suggestions | stdlib only |
CLI Flags (multi_turn_test_runner.py):
| Flag | Default | Purpose |
|---|---|---|
| none | Write Rich terminal report to file (ANSI codes included) — viewable with |
| off | Disable Rich colored output; use plain-text format |
| auto | Override terminal width (auto-detects from $COLUMNS; fallback 80) |
| (deprecated) | No-op — Rich is now default when installed |
Multi-Turn Testing (Agent Runtime API):
| 脚本 | 用途 | 依赖项 |
|---|---|---|
| `agent_api_client.py` | 可复用的Agent Runtime API v1客户端(认证、会话、消息、变量) | 仅标准库 |
| `multi_turn_test_runner.py` | 多轮测试编排器(读取YAML、执行、评估、Rich彩色报告) | pyyaml、rich + agent_api_client |
| | 将N个工作进程的结果JSON聚合为一个统一的Rich终端报告 | rich |
| `generate-test-spec.py` | 解析.agent文件,生成CLI测试YAML规范 | 仅标准库 |
| `run-automated-tests.py` | 编排完整的CLI测试工作流并提供修复建议 | 仅标准库 |
CLI标志(multi_turn_test_runner.py):
| 标志 | 默认值 | 用途 |
|---|---|---|
| 无 | 将Rich终端报告写入文件(包含ANSI代码) — 可使用 |
| 关闭 | 禁用Rich彩色输出;使用纯文本格式 |
| 自动 | 覆盖终端宽度(从$COLUMNS自动检测;回退80) |
| (已弃用) | 无操作 — 现在安装后默认使用Rich |
多轮测试(Agent Runtime API):
Install test runner dependency
安装测试运行器依赖
pip3 install pyyaml
pip3 install pyyaml
Run multi-turn test suite against an agent
对Agent运行多轮测试套件
```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain your-domain.my.salesforce.com \
  --consumer-key YOUR_KEY \
  --consumer-secret YOUR_SECRET \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```
Or set env vars and omit credential flags
或设置环境变量并省略凭证标志
```bash
export SF_MY_DOMAIN=your-domain.my.salesforce.com
export SF_CONSUMER_KEY=YOUR_KEY
export SF_CONSUMER_SECRET=YOUR_SECRET
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --agent-id 0XxRM0000004ABC \
  --scenarios templates/multi-turn-topic-routing.yaml \
  --var '$Context.AccountId=001XXXXXXXXXXXX' \
  --verbose
```
Connectivity test (verify ECA credentials work)
连通性测试(验证ECA凭证是否有效)
```bash
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```

**CLI Testing (Agent Testing Center):**

```bash
python3 {SKILL_PATH}/hooks/scripts/agent_api_client.py
```

**CLI测试(Agent测试中心):**

Generate test spec from agent file
从Agent文件生成测试规范
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file /path/to/Agent.agent \
  --output specs/Agent-tests.yaml
```
Run full automated workflow
运行完整的自动化工作流
```bash
python3 {SKILL_PATH}/hooks/scripts/run-automated-tests.py \
  --agent-name MyAgent \
  --agent-dir /path/to/project \
  --target-org dev
```

---

🔄 Automated Test-Fix Loop
🔄 自动化测试修复循环
v2.0.0 | Supports both multi-turn API failures and CLI test failures
v2.0.0 | 支持多轮API失败和CLI测试失败
Quick Start
快速开始
Run the test-fix loop (CLI tests)
运行测试修复循环(CLI测试)

```bash
{SKILL_PATH}/hooks/scripts/test-fix-loop.sh Test_Agentforce_v1 AgentforceTesting 3
```
Exit codes:
退出代码:
0 = All tests passed
0 = 所有测试通过
1 = Fixes needed (Claude Code should invoke sf-ai-agentforce)
1 = 需要修复(Claude Code应调用sf-ai-agentforce)
2 = Max attempts reached, escalate to human
2 = 达到最大尝试次数,升级给人工
3 = Error (org unreachable, test not found, etc.)
3 = 错误(组织不可达,未找到测试等)
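A caller can branch on these exit codes like so. A sketch (the action strings paraphrase the orchestration described here; the real loop dispatches to the skill and to a human escalation path):

```python
def next_step(exit_code: int) -> str:
    """Map test-fix-loop.sh exit codes to the orchestration action described above."""
    return {
        0: "done: all tests passed",
        1: "invoke sf-ai-agentforce to apply fixes",
        2: "max attempts reached: escalate to human",
        3: "abort: environment error (org unreachable, test not found)",
    }.get(exit_code, "abort: unknown exit code")

for code in (0, 1, 2, 3):
    print(code, "→", next_step(code))
```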
Claude Code Integration
Claude Code集成
USER: Run automated test-fix loop for Coral_Cloud_Agent
CLAUDE CODE:
1. Phase A: Run multi-turn scenarios via Python test runner
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
--agent-id ${AGENT_ID} \
--scenarios templates/multi-turn-comprehensive.yaml \
--output results.json --verbose
2. Analyze failures from results.json (10 categories)
3. If fixable: Skill(skill="sf-ai-agentscript", args="Fix...")
4. Re-run failed scenarios with --scenario-filter
5. Phase B (if available): Run CLI tests
6. Repeat until passing or max retries (3)

用户: 为Coral_Cloud_Agent运行自动化测试修复循环
Claude Code:
1. 阶段A:通过Python测试运行器运行多轮场景
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
--agent-id ${AGENT_ID} \
--scenarios templates/multi-turn-comprehensive.yaml \
--output results.json --verbose
2. 从results.json分析失败(10种类别)
3. 如果可修复:Skill(skill="sf-ai-agentscript", args="修复...")
4. 使用--scenario-filter重新运行失败的场景
5. 阶段B(如果可用):运行CLI测试
6. 重复直到通过或达到最大重试次数(3次)

Environment Variables
环境变量
| Variable | Description | Default |
|---|---|---|
| Current attempt number | 1 |
| Timeout for test execution | 10 |
| Comma-separated test names to skip | (none) |
| Enable detailed output | false |
| 变量 | 描述 | 默认值 |
|---|---|---|
| 当前尝试次数 | 1 |
| 测试执行超时 | 10 |
| 要跳过的测试名称(逗号分隔) | 无 |
| 启用详细输出 | false |
💡 Key Insights
💡 关键见解
| Problem | Symptom | Solution |
|---|---|---|
| Deploy fails | "Required fields are missing: [MasterLabel]" | Add a top-level `name:` field to the YAML spec |
| Tests fail silently | No results returned | Agent not published - run |
| Topic not matched | Wrong topic selected | Add keywords to topic description |
| Action not invoked | Action never called | Improve action description |
| Live preview 401 | Authentication error | Re-authenticate: |
| API 401 | Token expired or wrong credentials | Re-authenticate ECA |
| API 404 on session create | Wrong Agent ID | Re-query BotDefinition for correct Id |
| Empty API response | Agent not activated | Activate and publish agent |
| Context lost between turns | Agent re-asks for known info | Add context retention instructions to topic |
| Topic doesn't switch | Agent stays on old topic | Add transition phrases to target topic |
| ⚠️ `--use-most-recent` | "Nonexistent flag" error | Use `--job-id` explicitly |
| Topic name mismatch | Expected short name, got hash-suffixed runtime name | Verify actual topic names from first test run |
| Action superset matching | Expected exact action set, extra actions still PASS | CLI uses SUPERSET logic |
| 问题 | 症状 | 解决方案 |
|---|---|---|
| 部署失败 | "Required fields are missing: [MasterLabel]" | 在YAML规范顶部添加 `name:` 字段 |
| 测试静默失败 | 无结果返回 | Agent未发布 - 运行 |
| 主题未匹配 | 选择了错误的主题 | 向主题描述添加关键词 |
| 动作未调用 | 从未调用动作 | 改进动作描述 |
| 实时预览401 | 认证错误 | 重新认证: |
| API 401 | 令牌过期或凭证错误 | 重新认证ECA |
| API创建会话404 | 错误的Agent ID | 重新查询BotDefinition获取正确的Id |
| API响应为空 | Agent未激活 | 激活并发布Agent |
| 多轮对话中上下文丢失 | Agent重新询问已知信息 | 向主题添加上下文保留指令 |
| 主题不切换 | Agent停留在旧主题 | 向目标主题添加过渡短语 |
| ⚠️ `--use-most-recent` | "Nonexistent flag"错误 | 明确使用 `--job-id` |
| 主题名称不匹配 | 预期短名称,实际为带哈希后缀的运行时名称 | 从第一次测试运行验证实际主题名称 |
| 动作超集匹配 | 预期精确动作集合,多余动作也会通过 | CLI使用超集逻辑 |
Quick Start Example
快速开始示例
Multi-Turn API Testing (Recommended)
多轮API测试(推荐)
Quick Start with Python Scripts:
使用Python脚本快速开始:

1. Get agent ID
1. 获取Agent ID
```bash
AGENT_ID=$(sf data query --use-tooling-api \
  --query "SELECT Id FROM BotDefinition WHERE DeveloperName='My_Agent' AND IsActive=true LIMIT 1" \
  --result-format json --target-org dev | jq -r '.result.records[0].Id')
```
2. Run multi-turn tests (credentials from env or flags)
2. 运行多轮测试(凭证来自环境变量或标志)
```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```
**Ad-Hoc Python Usage:**
```python
from hooks.scripts.agent_api_client import AgentAPIClient

client = AgentAPIClient()  # reads SF_MY_DOMAIN, SF_CONSUMER_KEY, SF_CONSUMER_SECRET from env
with client.session(agent_id="0XxRM000...") as session:
    r1 = session.send("I need to cancel my appointment")
    r2 = session.send("Actually, reschedule it instead")
    r3 = session.send("What was my original request about?")
# Session auto-ends when exiting context manager
```

```bash
python3 {SKILL_PATH}/hooks/scripts/multi_turn_test_runner.py \
  --my-domain "${SF_MY_DOMAIN}" \
  --consumer-key "${CONSUMER_KEY}" \
  --consumer-secret "${CONSUMER_SECRET}" \
  --agent-id "${AGENT_ID}" \
  --scenarios templates/multi-turn-comprehensive.yaml \
  --output results.json --verbose
```
**临时Python使用:**
```python
from hooks.scripts.agent_api_client import AgentAPIClient

client = AgentAPIClient()  # 从环境变量读取SF_MY_DOMAIN、SF_CONSUMER_KEY、SF_CONSUMER_SECRET
with client.session(agent_id="0XxRM000...") as session:
    r1 = session.send("我需要取消我的预约")
    r2 = session.send("实际上,改为重新安排")
    r3 = session.send("我最初的请求是什么?")
# 退出上下文管理器时自动结束会话
```

CLI Testing (If Agent Testing Center Available)
CLI测试(如果Agent测试中心可用)
1. Generate test spec
1. 生成测试规范
```bash
python3 {SKILL_PATH}/hooks/scripts/generate-test-spec.py \
  --agent-file ./agents/MyAgent.agent \
  --output ./tests/myagent-tests.yaml
```
2. Create test in org
2. 在组织中创建测试
```bash
sf agent test create --spec ./tests/myagent-tests.yaml --api-name MyAgentTest --target-org dev
```
3. Run tests
3. 运行测试
```bash
sf agent test run --api-name MyAgentTest --wait 10 --result-format json --target-org dev
```
4. View results (use --job-id, NOT --use-most-recent)
4. 查看结果(使用--job-id,不要使用--use-most-recent)
```bash
sf agent test results --job-id [JOB_ID] --verbose --result-format json --target-org dev
```

---

🐛 Known Issues & CLI Bugs
🐛 已知问题与CLI错误
Last Updated: 2026-02-11 | Tested With: sf CLI v2.118.16+
最后更新: 2026-02-11 | 测试版本: sf CLI v2.118.16+
RESOLVED: `sf agent test create` MasterLabel Error
已解决:`sf agent test create` MasterLabel错误

Status: 🟢 RESOLVED — Add a `name:` field to the YAML spec

Error:
Required fields are missing: [MasterLabel]

Root Cause: The YAML spec must include a `name:` field at the top level, which maps to `MasterLabel` in the `AiEvaluationDefinition` XML. Our templates previously omitted this field.

Fix: Add `name:` to the top of your YAML spec:

```yaml
name: "My Agent Tests"   # ← This was the missing field
subjectType: AGENT
subjectName: My_Agent
```

If you still encounter issues:
- ✅ Use the interactive `sf agent generate test-spec` wizard (interactive-only, no CLI flags)
- ✅ Create tests via Salesforce Testing Center UI
- ✅ Deploy XML metadata directly
- ✅ Use Phase A (Agent Runtime API) instead — bypasses CLI entirely
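A quick pre-deploy check for the missing field can prevent this error entirely. A stdlib-only sketch (a simple top-level line scan rather than a full YAML parse, which is enough for a flat spec header):

```python
def has_top_level_name(spec_text: str) -> bool:
    """True if the YAML spec declares a top-level name: field (maps to MasterLabel)."""
    for line in spec_text.splitlines():
        if line.startswith("name:"):   # top level means no leading indentation
            return True
    return False

good = 'name: "My Agent Tests"\nsubjectType: AGENT\nsubjectName: My_Agent\n'
bad = "subjectType: AGENT\nsubjectName: My_Agent\n"
print(has_top_level_name(good), has_top_level_name(bad))  # → True False
```

Run this over each spec before `sf agent test create` and fail fast with a clearer message than the server's.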
状态: 🟢 已解决 — 向YAML规范添加字段
name:错误:
Required fields are missing: [MasterLabel]根本原因: YAML规范必须在顶级包含字段,该字段映射到 XML中的。我们的模板之前省略了此字段。
name:AiEvaluationDefinitionMasterLabel修复: 在YAML规范顶部添加:
name:yaml
name: "我的Agent测试" # ← 这是缺失的字段
subjectType: AGENT
subjectName: My_Agent如果仍遇到问题:
- ✅ 使用交互式向导(仅交互式,无CLI标志)
sf agent generate test-spec - ✅ 通过Salesforce测试中心UI创建测试
- ✅ 直接部署XML元数据
- ✅ 改用阶段A(Agent Runtime API) — 完全绕过CLI
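A quick pre-flight check catches the missing field before the CLI rejects the spec. Minimal sketch under stated assumptions: it only checks the three top-level keys documented here (`name`, `subjectType`, `subjectName`) with a naive line scan, no YAML parser.

```python
def check_spec(spec_text: str) -> list[str]:
    """Return the top-level keys required by `sf agent test create`
    that are missing from a YAML spec.

    Only checks the keys documented in this section (name -> MasterLabel,
    subjectType, subjectName); a real spec contains more fields.
    """
    required = ("name", "subjectType", "subjectName")
    present = {
        line.split(":", 1)[0].strip()
        for line in spec_text.splitlines()
        # top-level keys only: no leading whitespace, not a comment
        if line and not line[0].isspace()
        and ":" in line and not line.startswith("#")
    }
    return [key for key in required if key not in present]
```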
### MEDIUM: Interactive Mode Not Scriptable

**Status:** 🟡 Blocks CI/CD automation

**Issue:** `sf agent generate test-spec` only works interactively.

**Workaround:** Use the Python scripts in `hooks/scripts/` or the Phase A multi-turn templates.
### MEDIUM: YAML vs XML Format Discrepancy

**Key Mappings:**

| YAML Field | XML Element / Assertion Type |
|---|---|
| `name` | `MasterLabel` |
### LOW: BotDefinition Not Always in Tooling API

**Status:** 🟡 Handled automatically

**Issue:** In some org configurations, `BotDefinition` is not queryable via the Tooling API but works via the regular Data API (`sf data query` without `--use-tooling-api`).

**Fix:** `agent_discovery.py live` now falls back automatically — if the Tooling API returns no results for `BotDefinition`, it retries with the regular API.
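The fallback in `agent_discovery.py` can be sketched as follows. This is an assumed shape, not the script's actual code: try the Tooling API first, then retry via the regular Data API when no rows come back.

```python
import json
import subprocess

QUERY = "SELECT Id, DeveloperName FROM BotDefinition"


def build_query_cmd(org: str, use_tooling: bool) -> list[str]:
    """Assemble the sf data query invocation, with or without
    the Tooling API flag."""
    cmd = ["sf", "data", "query", "--query", QUERY,
           "--target-org", org, "--json"]
    if use_tooling:
        cmd.append("--use-tooling-api")
    return cmd


def discover_bots(org: str) -> list[dict]:
    """Query BotDefinition via the Tooling API first; if it returns
    no rows, retry via the regular Data API (mirrors the automatic
    fallback described above)."""
    for use_tooling in (True, False):
        out = subprocess.run(build_query_cmd(org, use_tooling),
                             capture_output=True, text=True, check=True)
        records = json.loads(out.stdout).get("result", {}).get("records", [])
        if records:
            return records
    return []
```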
### LOW: `--use-most-recent` Not Implemented

**Status:** The flag is documented but NOT functional. Always pass `--job-id` explicitly.
### CRITICAL: Custom Evaluations RETRY Bug (Spring '26)

**Status:** 🔴 PLATFORM BUG — blocks all `string_comparison` / `numeric_comparison` evaluations that use JSONPath

**Error:** `INTERNAL_SERVER_ERROR: The specified enum type has no constant with the specified name: RETRY`

**Scope:**

- The server returns a "RETRY" status for test cases whose custom evaluations use `isReference: true`
- The results API endpoint crashes with HTTP 500 when fetching results
- Both filter expressions (`[?(@.field == 'value')]`) AND direct indexing (`[0]`) trigger the bug
- Tests WITHOUT custom evaluations on the same run complete normally

**Confirmed:** A direct `curl` to the REST endpoint returns the same 500 — NOT a CLI parsing issue

**Workaround:**

- Use the Testing Center UI (Setup → Agent Testing) — it may display results
- Skip custom evaluations until the platform is patched
- Use `expectedOutcome` (LLM-as-judge) for response validation instead

**Tracking:** Discovered 2026-02-09 on a DevInt sandbox (Spring '26). TODO: Retest after the platform patch.
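Until the platform patch lands, the safest spec avoids JSONPath custom evaluations entirely and leans on `expectedOutcome`. A hypothetical fragment under that assumption — the `testCases`/`utterance` field names follow common test-spec templates and should be checked against yours:

```yaml
testCases:
  - utterance: "What is the status of order 00123?"
    # Instead of a string_comparison / numeric_comparison custom
    # evaluation with a JSONPath (which currently triggers the RETRY
    # 500), validate the response with LLM-as-judge:
    expectedOutcome: "The agent reports the current status of order 00123."
```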
### MEDIUM: `conciseness` Metric Returns Score=0

**Status:** 🟡 Platform bug — metric evaluation appears non-functional

**Issue:** The `conciseness` metric consistently returns `score: 0` with an empty `metricExplainability` field across all test cases tested on DevInt (Spring '26).

**Workaround:** Skip `conciseness` in metrics lists until the platform is patched.

### LOW: `instruction_following` FAILURE at Score=1

**Status:** 🟡 Threshold mismatch — score and label disagree

**Issue:** The `instruction_following` metric labels results as "FAILURE" even when `score: 1` and the explanation text says the agent "follows instructions perfectly." This appears to be a pass/fail threshold configuration error on the platform side.

**Workaround:** Use the numeric `score` value (0 or 1) for evaluation. Ignore the PASS/FAILURE label.

### HIGH: `instruction_following` Crashes Testing Center UI

**Status:** 🔴 Blocks the Testing Center UI entirely — separate from the threshold bug above

**Error:** `Unable to get test suite: No enum constant einstein.gpt.shared.testingcenter.enums.AiEvaluationMetricType.INSTRUCTION_FOLLOWING_EVALUATION`

**Scope:** The Testing Center UI (Setup → Agent Testing) throws a Java exception when opening any test suite that includes the `instruction_following` metric. The CLI (`sf agent test run`) works fine — only the UI rendering is broken.

**Workaround:** Remove `- instruction_following` from the YAML metrics list and redeploy the test spec via `sf agent test create --force-overwrite`.

**Note:** This is a different bug from the threshold mismatch above. The threshold bug affects score interpretation; this bug blocks the entire UI from loading.

**Discovered:** 2026-02-11 on a DevInt sandbox (Spring '26).
## License

MIT License. See the LICENSE file.

Copyright (c) 2024-2026 Jag Valaiyapathy