midscene-yaml-generator

Midscene YAML Generator

Typical Workflow

User Requirement → [Generator] Generate YAML
                → [Generator] Auto dry-run validation
                → Validation failed? → [Generator] Auto-fix
                → [Runner] Execute
                → Execution failed? → [Runner] Analyze + Fix YAML → Re-execute
                → Success → Display report summary

Trigger Conditions

Use this when users describe a browser automation requirement (in natural language) and need to generate a Midscene YAML file.

Common trigger phrases (English renderings of Chinese-language requests):

"Generate a YAML to..."
"Help me write an automation script..."
"Create a Midscene test case..."
"I want to automate the XXX operation..."
"Convert this requirement to YAML..."
"Write a Midscene config file..."

English trigger phrases:

"Generate a YAML for..."
"Write an automation script to..."
"Create a test case for..."
"Automate the login flow"
"Convert this requirement to YAML"
"Write a Midscene config file for..."

Workflow

Step 1: Analyze Requirement Complexity

Determine the required mode based on the user's description:

Choose Native Mode — When the requirement only involves:

Open web pages / Launch applications
Basic interactions like clicking, hovering, inputting, scrolling, keyboard operations
AI automatic planning and execution (`ai`)
Data extraction (`aiQuery`)
Validation assertions (`aiAssert`)
Wait conditions (`aiWaitFor`)
Tool operations (`sleep`, `javascript`, `recordToReport`)
Platform-specific operations (`runAdbShell`, `runWdaRequest`, `launch`)

Choose Extended Mode — When the requirement involves any of the following:

Conditional branching ("If...then...")
Loops ("Repeat", "Iterate over", "Paginate")
Variables and dynamic data ("Define a variable", "Parameterize")
External API calls ("Call the API")
Error handling and retries ("If it fails...", "Retry")
Parallel tasks ("Do ... at the same time")
Data transformation ("Filter", "Sort", "Map")
Importing and reusing sub-flows ("Reuse", "Import")

Rule of Thumb: Start with Native mode, and switch to Extended mode once you find you need `if`, `for`, or variables.
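The sketch below shows a requirement that crosses into Extended mode. It keeps a search keyword in a variable and turns a fixed number of result pages. It uses only keys from this document's mapping tables. Three things are assumptions: placing `variables` at the top level, writing `loop` as a `flow` step, and substituting `${keyword}` inside an `ai:` instruction. The URL is a placeholder.

```yaml
# Extended mode is required here because of the variable and the repeat loop
engine: extended
features: [variables, loop]

web:
  url: "https://example.com/search"  # placeholder URL

# Assumed placement: variables declared at the top level
variables:
  keyword: "Midscene"

tasks:
  - name: "Search, then page forward three times"
    flow:
      - ai: "Enter ${keyword} in the search box and click search"
      - loop:
          type: repeat
          count: 3
          steps:
            - aiTap: "the 'Next page' button below the search results"
            - aiWaitFor: "the next page of search results has loaded"
              timeout: 10000
```

Without the variable and the loop, the same task would be a single Native `ai:` step plus an `aiAssert`.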

Step 2: Determine Target Platform

Determine the platform configuration from the user's description:

User Description	Platform	YAML Configuration
"Open web page/website/URL"	Web	`web: { url: "...", headless: false }`
"Test Android app"	Android	`android: { deviceId: "..." }` + `launch: "package name"`
"Test iOS app"	iOS	`ios: { wdaPort: 8100 }` + `launch: "bundleId"`
"Desktop automation"	Computer	`computer: { ... }`

Additional Web Platform Configuration Options:

`headless: true/false` — Run in headless mode (default `false`)
`viewportWidth` / `viewportHeight` — Viewport size (default 1280×720)
`userAgent` — Custom User-Agent
`deviceScaleFactor` — Device pixel ratio (e.g., set to 2 for Retina screens)
`waitForNetworkIdle` — Network-idle wait configuration; accepts `true` or an object such as `{ timeout: 2000, continueOnNetworkIdleError: true }`
`cookie` — Path to a cookie JSON file (restores a logged-in session without logging in again)
`bridgeMode` — Bridge mode, which reuses an already logged-in desktop browser: `false` (default) | `'newTabWithUrl'` | `'currentTab'`
`chromeArgs` — Array of custom Chrome launch arguments (e.g., `['--disable-gpu', '--proxy-server=...']`)
`serve` — Local static file directory; starts a built-in server
`acceptInsecureCerts` — Ignore HTTPS certificate errors (default `false`)
`forceSameTabNavigation` — Keep navigation in the current tab (default `true`)
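Several of these options can be combined into one `web:` block, as sketched below. The URL and cookie file path are hypothetical placeholders. `bridgeMode` and `serve` are left out to keep the example focused.

```yaml
web:
  url: "https://example.com/dashboard"  # placeholder URL
  headless: true
  viewportWidth: 1920
  viewportHeight: 1080
  deviceScaleFactor: 2
  cookie: "./cookies/session.json"      # hypothetical cookie file
  waitForNetworkIdle:
    timeout: 2000
    continueOnNetworkIdleError: true
  acceptInsecureCerts: true
  chromeArgs:
    - "--disable-gpu"

tasks:
  - name: "Open the dashboard with a restored session"
    flow:
      - aiWaitFor: "the dashboard header is visible"
        timeout: 10000
      - aiAssert: "the page shows the logged-in user's name in the header"
```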

Step 3: Natural Language → YAML Conversion

Action Selection Priority (Important)

Prefer `ai:` — Describe the entire intent in natural language and let the AI plan and execute the steps automatically. This suits most scenarios and has the highest success rate
When precise control is needed — Use specific actions such as `aiTap` and `aiInput` (e.g., to fill in specific form fields)
When data extraction is needed — You must use `aiQuery` (`ai:` cannot return structured data)
When state validation is needed — Use `aiAssert` or `aiWaitFor`

Rule of Thumb: If the user's requirement can be described in a single natural-language sentence, prefer one `ai:` step over splitting it into multiple `aiInput` / `aiTap` steps.

Golden Path - Minimal Working Example:

```yaml
web:
  url: "https://www.baidu.com"

tasks:
  - name: "Search for Midscene"
    flow:
      - ai: "Enter Midscene in the search box and click search"
      - sleep: 3000
      - aiAssert: "The page displays search results"
```

Native Mode YAML Format Specification (Important)

Native mode supports two formats for action parameters:

Flat Format (Recommended, concise): the action keyword takes a string value, and additional parameters are written as sibling keys.

```yaml
- aiInput: "Search box"
  value: "Keyword"
- aiWaitFor: "Page loaded completely"
  timeout: 10000
- aiTap: "Button description"
  deepThink: true
- aiAssert: "Page contains expected content"
  errorMessage: "Content validation failed"
```

Nested Format (Also valid, suitable for complex parameters):

```yaml
- aiInput:
    locator: "Search box"
    value: "Keyword"
- aiQuery:
    query: "Extract product list"
    name: "products"
```

Use the following mapping rule table to convert user requirements to YAML:

Native Action Mapping

Natural Language Pattern	YAML Mapping	Description
"Open/access/enter XXX website"	`web: { url: "XXX" }`	Platform configuration
"Automatically plan and execute XXX"	`ai: "XXX"`	The AI breaks the intent down into multiple steps and executes them; optional `fileChooserAccept: "path"` handles file upload dialogs
"Click/press/select XXX"	`aiTap: "XXX"`	Shorthand form
"Hover/move over XXX"	`aiHover: "XXX"`	Triggers a dropdown menu or tooltip
"Enter YYY in XXX"	`aiInput: "XXX"` + `value: "YYY"`	Flat sibling format; optional `mode: "replace" | "clear" | "typeOnly" | "append"`
"Press XXX key on keyboard"	`aiKeyboardPress: "XXX"`	Supports key combinations such as "Control+A"; `keyName` can be used as an alternative parameter
"Scroll down/up/left/right"	`aiScroll: "Target area"` + `direction: "down"`	Flat sibling format; optional `distance`, `scrollType`
"Wait for XXX to appear"	`aiWaitFor: "XXX"`	Optional `timeout` (in milliseconds)
"Check/verify/confirm XXX"	`aiAssert: "XXX"`	Optional `errorMessage`
"Get/extract/read XXX"	`aiQuery: { query: "XXX", name: "result" }`	`name` stores the result
"Pause/wait N seconds"	`sleep: N*1000`	Parameter is in milliseconds
"Execute JS code"	`javascript: "Code content"`	Executes JavaScript directly
"Take screenshot and record to report"	`recordToReport: "Title"` + `content: "Description"`	Takes a screenshot and records the description in the report
"Double-click XXX"	`aiDoubleClick: "XXX"`	Double-click; optional `deepThink: true`
"Right-click XXX"	`aiRightClick: "XXX"`	Right-click; optional `deepThink: true`
"Locate XXX element"	`aiLocate: "XXX"` + `name: "elem"`	Locates the element and stores the result in a variable (referenceable in Extended mode)
"Is XXX true?"	`aiBoolean: "XXX"` + `name: "flag"`	Returns a boolean; optional `domIncluded` / `screenshotIncluded`
"Get the number of XXX"	`aiNumber: "XXX"` + `name: "count"`	Returns a number; optional `domIncluded` / `screenshotIncluded`
"Get text of XXX"	`aiString: "XXX"` + `name: "text"`	Returns a string; optional `domIncluded` / `screenshotIncluded`
"Ask AI about XXX"	`aiAsk: "XXX"` + `name: "answer"`	Free-form question; returns a text answer
"Drag A to B"	`aiDragAndDrop: "A"` + `to: "B"`	Flat format; or nested `{ from: "A", to: "B" }`
"Clear XXX input box"	`aiClearInput: "XXX"`	Clears the input box content
"Execute ADB command"	`runAdbShell: "Command"`	Android platform only
"Execute WDA request"	`runWdaRequest: { ... }`	iOS platform only
"Launch app"	`launch: "Package name"`	Launches a mobile app

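The sketch below combines several rows of the table into one Native flow, written in the flat sibling format. The URL, page, and element descriptions are hypothetical placeholders.

```yaml
web:
  url: "https://example.com/board"  # placeholder URL

tasks:
  - name: "Move a card and check the board"
    flow:
      - aiHover: "the 'More' menu in the top navigation bar"
      - aiTap: "the 'Boards' item in the dropdown menu"
      - aiWaitFor: "the board columns are visible"
        timeout: 10000
      - aiDragAndDrop: "the card titled 'Draft report'"
        to: "the 'Done' column"
      - aiInput: "the search box above the board"
        value: "report"
        mode: "replace"
      - aiKeyboardPress: "Enter"
      - aiScroll: "the board area"
        direction: "down"
      - aiNumber: "the number of cards in the 'Done' column"
        name: "doneCount"
      - recordToReport: "Board after drag and drop"
        content: "Card moved to Done and search applied"
```
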
Extended Control Flow Mapping

Natural Language Pattern	YAML Mapping
"Define variable XXX as YYY"	`variables: { XXX: "YYY" }`
"Use environment variable XXX"	`${ENV:XXX}` or `${ENV.XXX}`
"If XXX then YYY else ZZZ"	`logic: { if: "XXX", then: [YYY], else: [ZZZ] }`
"Repeat N times"	`loop: { type: repeat, count: N, steps: [...] }`
"Execute for each XXX"	`loop: { type: for, items: "XXX", itemVar: "item", steps: [...] }` (`itemVar`, `as`, or `item` are all accepted)
"Continue doing YYY while XXX"	`loop: { type: while, condition: "XXX", maxIterations: N, steps: [...] }`
"Do A first, do B if it fails"	`try: { steps: [A] }, catch: { steps: [B] }`
"Do A and B simultaneously"	`parallel: { branches: [{steps: [A]}, {steps: [B]}], waitAll: true, merge_results: true }`
"Call XXX API"	`external_call: { type: http, method: POST, url: "XXX", response_as: "varName" }`
"Execute Shell command"	`external_call: { type: shell, command: "XXX" }`
"Import/reuse XXX flow"	`import: [{ flow: "XXX.yaml", as: name }]`
"Filter/sort/map data"	`data_transform: { source, operation, ... }`

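The sketch below puts a conditional, a capped `while` loop, and a variable into one Extended-mode file. The URL and descriptions are placeholders. Two points are assumptions: that the `if` and `condition` strings are evaluated by the AI as natural-language conditions, and that `variables` sits at the top level with `${statusFilter}` interpolated into the query.

```yaml
engine: extended
features: [variables, logic, loop]

web:
  url: "https://example.com/orders"  # placeholder URL

variables:
  statusFilter: "Pending"

tasks:
  - name: "Collect pending orders"
    flow:
      # Conditional: dismiss the banner only when it is shown
      - logic:
          if: "a cookie consent banner is visible"
          then:
            - aiTap: "the 'Accept' button on the cookie consent banner"
          else:
            - sleep: 500
      # while loop with a safety cap on iterations
      - loop:
          type: while
          condition: "a 'Load more' button is visible below the order list"
          maxIterations: 20
          steps:
            - aiTap: "the 'Load more' button below the order list"
            - aiWaitFor: "additional orders appear in the list"
              timeout: 10000
      - aiQuery:
          query: "Return the IDs of all orders whose status is ${statusFilter}, as an array of strings"
          name: "orderIds"
```
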
Step 4: Select Template Starting Point

Refer to the template files in the `templates/` directory and choose the one closest to the user's requirement as the starting point:

Native Templates:

`templates/native/web-basic.yaml` — Basic web operations
`templates/native/web-login.yaml` — Login flow
`templates/native/web-data-extract.yaml` — Data extraction
`templates/native/web-search.yaml` — Web search flow
`templates/native/web-file-upload.yaml` — File upload form
`templates/native/web-multi-tab.yaml` — Multi-tab operations
`templates/native/deep-think-locator.yaml` — Image-assisted location (deepThink/xpath)
`templates/native/android-app.yaml` — Android testing
`templates/native/ios-app.yaml` — iOS testing
`templates/native/computer-desktop.yaml` — Desktop app automation

Extended Templates:

`templates/extended/web-conditional-flow.yaml` — Conditional branching
`templates/extended/web-pagination-loop.yaml` — Pagination loop
`templates/extended/web-data-pipeline.yaml` — Data pipeline
`templates/extended/multi-step-with-retry.yaml` — Multi-step with retry
`templates/extended/api-integration-test.yaml` — API integration
`templates/extended/e2e-workflow.yaml` — Complete end-to-end workflow
`templates/extended/reusable-sub-flows.yaml` — Sub-flow reuse (import/use)
`templates/extended/responsive-test.yaml` — Multi-viewport responsive testing
`templates/extended/web-auth-flow.yaml` — OAuth/login authentication flow (using variables and environment references)

Template Selection Decision:

Requirement Feature	Recommended Template
Simple page operations (open, click, input)	`native/web-basic.yaml`
Login / Form filling	`native/web-login.yaml`
Data collection / Information extraction	`native/web-data-extract.yaml`
Search + Result validation	`native/web-search.yaml`
File upload / Attachment submission	`native/web-file-upload.yaml`
OAuth/Third-party authentication login	`extended/web-auth-flow.yaml`
Desktop app automation (non-browser)	`native/computer-desktop.yaml`
Conditional judgment needed (If logged in then...)	`extended/web-conditional-flow.yaml`
Pagination / List traversal needed	`extended/web-pagination-loop.yaml`
Data filtering / Sorting / Aggregation	`extended/web-data-pipeline.yaml`
Retry on failure needed	`extended/multi-step-with-retry.yaml`
External API call needed	`extended/api-integration-test.yaml`
Complete business flow (multi-step + variables + export)	`extended/e2e-workflow.yaml`
Sub-flow reuse / Modularization	`extended/reusable-sub-flows.yaml`
Multi-screen size responsive validation	`extended/responsive-test.yaml`
Complex element location / deepThink	`native/deep-think-locator.yaml`
Multi-tab operations	`native/web-multi-tab.yaml`

Step 5: Generate YAML

Generate the YAML content from the template and the conversion rules, paying attention to the following points:

File header: Add comments recording the source requirement and the generation time
`engine` field: Extended mode must explicitly declare `engine: extended`
`features` list: In Extended mode, declare the features used (e.g., `features: [logic, variables, loop]`); it can be omitted in Native mode
`agent` configuration (optional): `testId` identifies the test, `groupName` / `groupDescription` classify it in the report, and `cache: true` caches AI results to speed up repeated runs
`aiActContext` (optional): Gives the AI agent extra context (such as the site language for multilingual websites, or domain-specific terms); set it via `agent: { aiActContext: "Description" }`
`continueOnError` (optional): Set `continueOnError: true` on a task if subsequent tasks should still run after it fails
`output` export (optional): Export results such as those from `aiQuery` to a JSON file for use in later processes

Output Format

```yaml
# Auto-generated by Midscene YAML Generator
# Requirement Description: [Original user requirement]
# Generation Time: [timestamp]

engine: native|extended
features: [...]  # Extended mode only

# Optional: agent configuration
agent:
  testId: "test-001"
  groupName: "Automation Testing Group"
  groupDescription: "Description"
  cache: true

[platform_config]

tasks:
  - name: "[Task Name]"
    continueOnError: true  # Optional: continue on failure
    flow: [Generated steps]
    output:  # Optional: export data
      filePath: "./midscene-output/data.json"
      dataName: "variableName"
```

Step 6: Validate and Output

Output the file to the `./midscene-output/` directory
Call the validator to confirm the YAML is valid:
```bash
node scripts/midscene-run.js <file> --dry-run
```
If validation fails, analyze the cause of the error and fix it automatically
After validation passes, tell the user they can execute the file with the Runner:
```bash
node scripts/midscene-run.js <file>
```

Best Practices for Writing AI Instructions

When generating YAML, the quality of the AI instructions (the parameters of `aiTap`, `aiAssert`, etc.) directly affects the execution success rate. Follow these principles:

Description Precision

Poor: `aiTap: "Button"` — there may be multiple buttons on the page

Good: `aiTap: "Blue login button at the top right corner of the page"` — position + color + function

Better: `aiTap: "Button with text 'Login Now' in the navigation bar"` — pinned down to the exact text

Location Strategy Priority

Natural-language description (preferred): highly readable and adapts to page changes
deepThink mode: enable it when a complex page has multiple similar elements; the AI analyzes more deeply, which is more accurate but takes longer
Image-assisted location (image prompting): when a text description is insufficient, annotated screenshots can help the AI understand the target element (the official `locate.images` capability)
xpath selector (last resort): use it when natural language cannot locate the element precisely. Note: xpath only works on the Web platform; on Android/iOS, use natural-language descriptions

```yaml
# Prefer natural language
- aiTap: "Edit button in the third row of the product list"

# Enable deepThink for complex scenarios (many similar elements, or inaccurate location)
- aiTap: "Edit icon in the third row of data"
  deepThink: true

# Last resort: use xpath (Web platform only)
- aiTap: ""
  xpath: "//table/tbody/tr[3]//button[@class='edit']"
```

Image-assisted Location (locate object)

When a natural-language description is not precise enough, reference images can be provided via the `locate` object:

```yaml
# Use an image to help the AI identify the target element
- aiTap:
    locate:
      prompt: "Icon button similar to the reference image"
      images:
        - name: "target-icon"
          url: "https://example.com/icon.png"
      convertHttpImage2Base64: true

# Simplified form: provide the image directly in the images option
- aiTap: "Icon button similar to the reference image"
  images:
    - "./images/target-icon.png"
```

aiQuery Result Formatting

Specify the expected data structure explicitly in `query`:

```yaml
- aiQuery:
    query: >
      Extract all product information on the page and return it as an array.
      Each element should contain the following fields:
      - name: Product name (string)
      - price: Price (number)
      - inStock: In stock (boolean)
    name: "productList"
```
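The task-level `output` block from Step 5 can export the extracted array to JSON so that later processes can use it. The sketch below assumes `dataName` must match the `name` given to `aiQuery`. The URL is a placeholder.

```yaml
web:
  url: "https://example.com/products"  # placeholder URL

tasks:
  - name: "Extract the product list"
    flow:
      - aiWaitFor: "the product list is visible"
        timeout: 10000
      - aiQuery:
          query: >
            Extract all product information on the page and return it as an array.
            Each element should contain name (string), price (number), and inStock (boolean).
          name: "productList"
    output:
      filePath: "./midscene-output/products.json"
      dataName: "productList"
```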

Wait Strategy

Add `aiWaitFor` after key operations to make sure the page is in the expected state:

```yaml
- aiTap: "Submit button"
- aiWaitFor: "A submit success message appears, or the page redirects to the result page"
  timeout: 10000
```

Data Transformation Operation Reference

Operations supported by `data_transform` in Extended mode:

Operation	Description	Key Parameters
`filter`	Filter by condition	`condition` (JS expression, use `item` to reference current element)
`sort`	Sort	`by` (field name), `order` (asc/desc)
`map`	Map/Transform	`template` (field mapping template)
`reduce`	Aggregation calculation	`reducer` (JS expression), `initial` (initial value)
`unique` / `distinct`	Deduplicate	`by` (field for deduplication)
`slice`	Extract subset	`start`, `end`
`flatten`	Flatten nested array	`depth` (flatten depth, default 1)
`groupBy`	Group by field	`by` or `field` (field name for grouping)

Two Formats: the flat format `{source, operation, name}` suits single-step operations; the nested format `{input, operations: [], output}` supports chaining multiple operations. Both formats support all 8 operations.
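The sketch below shows both formats as Extended-mode flow steps, starting from a `productList` array such as the one extracted by `aiQuery` above. It assumes the file declares `engine: extended`. It also makes two unconfirmed assumptions: that `source` and `input` accept a `${...}` variable reference, and that each entry in `operations` names its operation with an `operation:` key, mirroring the flat format. Check the Extended engine's schema before relying on either.

```yaml
# Flat format: a single operation
- data_transform:
    source: "${productList}"
    operation: filter
    condition: "item.inStock && item.price < 100"
    name: "affordableProducts"

# Nested format: chained operations
- data_transform:
    input: "${productList}"
    operations:
      - operation: filter    # key name inside the list is assumed
        condition: "item.inStock"
      - operation: sort
        by: "price"
        order: asc
      - operation: slice
        start: 0
        end: 5
    output: "cheapestInStock"
```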

Platform-Specific Notes

Web Platform

`url` must include the full protocol (`https://`)
Use `aiWaitFor` to wait for the page to finish loading before interacting with it
Make sure input boxes are interactive before form operations

Android Platform

Configure `deviceId` (the ADB device ID, e.g., `emulator-5554`)
Use `launch: "com.example.app"` to launch the app (as an action step in `flow`)
Use `runAdbShell` to execute ADB commands
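Below is a minimal Android sketch. `com.example.app` is the placeholder package name from the notes above. The `runAdbShell` string assumes the command is passed to `adb shell`, so it is written without the `adb shell` prefix.

```yaml
android:
  deviceId: "emulator-5554"

tasks:
  - name: "Launch the app and check the home screen"
    flow:
      - runAdbShell: "input keyevent KEYCODE_WAKEUP"  # wake the screen
      - launch: "com.example.app"
      - aiWaitFor: "the app's home screen is visible"
        timeout: 15000
      - aiAssert: "the home screen shows a search bar at the top"
```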

iOS Platform

Configure `wdaPort` (WebDriverAgent port, default 8100) and `wdaHost` (default `localhost`)
Use `launch: "com.example.app"` to launch the app (as an action step in `flow`)
Use `runWdaRequest` to send WebDriverAgent requests
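The iOS equivalent is sketched below. It assumes WebDriverAgent is already running and reachable at the default host and port. The bundle ID is a placeholder. `runWdaRequest` is omitted because its parameters are not specified in this document.

```yaml
ios:
  wdaHost: "localhost"
  wdaPort: 8100

tasks:
  - name: "Launch the app and open Settings"
    flow:
      - launch: "com.example.app"  # placeholder bundle ID
      - aiWaitFor: "the app's main screen is visible"
        timeout: 15000
      - aiTap: "the 'Settings' tab in the bottom tab bar"
      - aiAssert: "the Settings screen is displayed"
```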

Computer Platform

For general desktop automation scenarios

Common Anti-patterns

Avoid the following common mistakes when generating YAML:

Using the nested object format unnecessarily — The flat format (`aiInput: "Search box"` + `value: "Keyword"`) is recommended because it is more concise and readable. The nested format (`aiInput: { locator: "Search box", value: "Keyword" }`) is valid in both modes but is usually only needed for complex parameters such as `locate` image positioning
Missing `engine: extended` in Extended mode — The engine must be declared whenever any extended feature (variables, loops, conditions, etc.) is used
Forgetting `maxIterations` in loops — `while` loops must set a safety upper limit, and the count of `for` and `repeat` loops should not exceed 10000
Using the nested object format for `aiWaitFor` — Write `aiWaitFor: "Condition"` with `timeout: 10000` as a sibling key, not `aiWaitFor: { condition: "Condition" }`
Missing `features` declaration — Extended mode should list the features used, to aid detection and optimization

Pre-output Self-check List

After generating YAML, verify the following before output:

Does every `aiInput` have a corresponding `value` parameter?
Is there an `aiWaitFor` after key operations to make sure the page state is ready?
Does Extended mode declare `engine: extended` and a `features` list?
Do loops have a safety upper limit (`maxIterations` or a reasonable `count`)?
Is sensitive information (passwords, tokens) referenced through environment variables with `${ENV:XXX}`?
Are AI instruction descriptions precise enough (including features such as position, text, and color)?
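The login flow below is meant to pass every item on the checklist. Each `aiInput` has a `value`, the submit is followed by an `aiWaitFor`, credentials come from `${ENV:...}`, and descriptions name position, text, or color. The environment variable names and the URL are hypothetical. `${ENV:...}` appears in the Extended mapping table, so this sketch declares Extended mode; listing `variables` as the matching feature is an assumption.

```yaml
engine: extended
features: [variables]

web:
  url: "https://example.com/login"  # placeholder URL

tasks:
  - name: "Log in with credentials from the environment"
    flow:
      - aiInput: "the 'Email' input field in the login form"
        value: "${ENV:TEST_USER_EMAIL}"
      - aiInput: "the 'Password' input field in the login form"
        value: "${ENV:TEST_USER_PASSWORD}"
      - aiTap: "the blue 'Sign in' button below the password field"
      - aiWaitFor: "the account dashboard is displayed"
        timeout: 10000
      - aiAssert: "the top-right corner shows the logged-in user's avatar"
```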

Notes

Parameters for AI instructions (`aiTap`, `aiAssert`, etc.) are natural-language descriptions; no CSS selectors are needed
Descriptions can be written in Chinese or English; Midscene's AI engine supports multiple languages
The result of `aiQuery` is stored under the variable named in its `name` field and can be referenced in later steps as `${name}` (Extended mode only)
Set a reasonable `timeout` (in milliseconds) for `aiWaitFor`; the default is usually 15 seconds
Always set `maxIterations` in loops as a safety upper limit to prevent infinite loops
Use `${ENV:XXX}` or `${ENV.XXX}` to reference environment variables instead of hardcoding sensitive information in the YAML
Always declare the `engine` field explicitly to avoid unexpected behavior from automatic detection
Variable references are case-sensitive: `${userName}` and `${username}` are different variables
Avoid circular imports: if A.yaml imports B.yaml and B.yaml imports A.yaml, a runtime error occurs
Always validate syntax and structure with `--dry-run` after generation (note: `--dry-run` does not check the model configuration; AI operations can only actually execute once `MIDSCENE_MODEL_API_KEY` is configured)
Remind users that they can run the generated file with the Midscene Runner skill
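The variable rules in these notes (an `aiQuery` result stored under `name`, `${ENV:XXX}` and `${ENV.XXX}` for environment variables, case-sensitive names) can be illustrated with a small resolver. This sketches the described behavior, not Midscene's implementation: `resolve` is hypothetical, it raises on unknown names instead of guessing, and restricting names to letters, digits, and underscores is an assumption.

```python
from __future__ import annotations

import os
import re

# Matches ${ENV:NAME}, ${ENV.NAME}, or a plain ${name} variable reference.
PLACEHOLDER = re.compile(
    r"\$\{(?:ENV[:.](?P<env>[A-Za-z_][A-Za-z0-9_]*)|(?P<var>[A-Za-z_][A-Za-z0-9_]*))\}"
)


def resolve(text: str, variables: dict[str, object], env: dict[str, str] | None = None) -> str:
    """Substitute ${...} references; raise KeyError for anything undefined."""
    env = dict(os.environ) if env is None else env

    def substitute(match: re.Match[str]) -> str:
        if match.group("env") is not None:
            name = match.group("env")
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        name = match.group("var")
        # Plain dict lookup is case-sensitive, so ${userName} and ${username} differ.
        if name not in variables:
            raise KeyError(f"undefined variable ${{{name}}}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, text)


# A value that an aiQuery step with `name: userName` might have stored (hypothetical).
variables = {"userName": "alice"}
print(resolve("Hello ${userName}, key=${ENV:API_KEY}", variables, env={"API_KEY": "k-123"}))
```

Failing loudly on an unknown name surfaces case mismatches such as `${username}` during validation instead of silently inserting an empty string.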
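Circular imports can be caught before execution with a depth-first search over the import graph. A sketch, assuming the generator has already collected which files each YAML file imports (how imports are declared in the YAML is not covered here). `find_import_cycle` is a hypothetical helper that returns one offending chain, such as A.yaml → B.yaml → A.yaml.

```python
from __future__ import annotations


def find_import_cycle(imports: dict[str, list[str]]) -> list[str] | None:
    """Return one import cycle as a list of files (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {name: WHITE for name in imports}
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GRAY
        path.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the slice from it is a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for name in imports:
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


print(find_import_cycle({"A.yaml": ["B.yaml"], "B.yaml": ["A.yaml"]}))
```

Files that are imported but import nothing themselves need no entry of their own; they are treated as leaves.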

Iterative Fix Process

When the generated YAML fails to execute:

The Runner fixes it directly: if the error can be resolved by editing the YAML (e.g., an imprecise locator description or an insufficient wait time), the Runner skill modifies the file and retries
Regeneration is needed: if the error stems from a fundamental design problem (e.g., the wrong mode was chosen or a key step is missing), the user can describe the failure to the Generator, which regenerates an improved YAML based on the error information
Recommended flow: generate → dry-run validation → execute → if execution fails, describe the error to the Generator for a fix → re-execute
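The recommended flow can be read as a bounded retry loop. The sketch below models only the regeneration path; the Runner's in-place fixes are not modeled. `generate`, `dry_run`, and `execute` are hypothetical callables standing in for the Generator, `--dry-run` validation, and the Runner. Each validator returns an error message or `None`, and the limit of 3 attempts is an arbitrary choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Outcome:
    success: bool
    attempts: int
    log: list[str] = field(default_factory=list)


def generate_and_run(
    requirement: str,
    generate: Callable[[str, Optional[str]], str],
    dry_run: Callable[[str], Optional[str]],
    execute: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Outcome:
    """Generate, dry-run, execute; feed each error back into the next generation."""
    feedback: Optional[str] = None
    log: list[str] = []
    for attempt in range(1, max_attempts + 1):
        yaml_text = generate(requirement, feedback)
        error, stage = dry_run(yaml_text), "dry-run"
        if error is None:
            # Only spend a real execution once the structure validates.
            error, stage = execute(yaml_text), "execute"
        if error is None:
            log.append(f"attempt {attempt}: success")
            return Outcome(True, attempt, log)
        log.append(f"attempt {attempt}: {stage} failed: {error}")
        feedback = error  # describe the failure to the Generator for the next round
    return Outcome(False, max_attempts, log)
```

Running the dry-run first keeps structural errors from consuming model calls, which matches the note that `--dry-run` works without `MIDSCENE_MODEL_API_KEY`.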

Collaboration Protocol

After generation completes, return the following structured information to the user:

Generated file path: `./midscene-output/<filename>.yaml`
Execution mode: `native` or `extended`

Recommended next command:

`node scripts/midscene-run.js <path> --dry-run`

If dry-run validation fails, automatically analyze the error, fix the YAML, and re-validate
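The structured reply above can be assembled from two fields. A minimal sketch: `GenerationReport` is a hypothetical name, the filename in the example is illustrative, and `shlex.quote` is used only so that paths containing spaces survive being pasted into a shell.

```python
import shlex
from dataclasses import dataclass

VALID_MODES = ("native", "extended")


@dataclass(frozen=True)
class GenerationReport:
    file_path: str  # e.g. "./midscene-output/<filename>.yaml"
    mode: str       # "native" or "extended"

    def __post_init__(self) -> None:
        # Reject anything other than the two documented execution modes.
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")

    def next_command(self) -> str:
        # The recommended dry-run command with <path> filled in.
        return f"node scripts/midscene-run.js {shlex.quote(self.file_path)} --dry-run"

    def summary(self) -> str:
        return "\n".join([
            f"Generated file path: {self.file_path}",
            f"Execution mode: {self.mode}",
            f"Recommended next command: {self.next_command()}",
        ])


print(GenerationReport("./midscene-output/login-flow.yaml", "native").summary())
```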
