midscene-yaml-generator

Midscene YAML Generator

Typical Workflow

User Requirement → [Generator] Generate YAML
                → [Generator] Auto dry-run validation
                → Validation failed? → [Generator] Auto-fix
                → [Runner] Execute
                → Execution failed? → [Runner] Analyze + Fix YAML → Re-execute
                → Success → Display report summary

Trigger Conditions

Use this skill when a user describes a browser automation requirement in natural language and needs a Midscene YAML file generated from it.
Common trigger phrases (translated from the Chinese originals):
  • "Generate a YAML to..."
  • "Help me write an automation script..."
  • "Create a Midscene test case..."
  • "I want to automate the XXX operation..."
  • "Convert this requirement to YAML..."
  • "Write a Midscene config file..."
English trigger phrases:
  • "Generate a YAML for..."
  • "Write an automation script to..."
  • "Create a test case for..."
  • "Automate the login flow"
  • "Convert this requirement to YAML"
  • "Write a Midscene config file for..."

Workflow

Step 1: Analyze Requirement Complexity

Determine the required mode from the user's description.
Choose Native mode when the requirement involves only:
  • Opening web pages or launching applications
  • Basic interactions such as clicking, hovering, typing, scrolling, and keyboard operations
  • AI-planned automatic execution (`ai`)
  • Data extraction (`aiQuery`)
  • Assertions (`aiAssert`)
  • Wait conditions (`aiWaitFor`)
  • Utility operations (`sleep`, `javascript`, `recordToReport`)
  • Platform-specific operations (`runAdbShell`, `runWdaRequest`, `launch`)
Choose Extended mode when the requirement involves any of the following:
  • Conditional logic ("if ... then ...")
  • Loops ("repeat", "iterate over", "go through the pages")
  • Variables and dynamic data ("define a variable", "parameterize")
  • External API calls ("call the API")
  • Error handling and retries ("if it fails ...", "retry")
  • Parallel tasks ("do ... at the same time")
  • Data transformation ("filter", "sort", "map")
  • Importing and reusing sub-flows ("reuse", "import")
Rule of thumb: start with Native mode, and switch to Extended mode once you find you need `if`, `for`, or variables.
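As an illustration of the mode decision, the sketch below takes a hypothetical requirement ("open the list page and click 'Next page' three times"). The phrase "three times" implies a loop, so Extended mode is selected. The URL, task name, and element descriptions are placeholders invented for this example; the keys (`engine`, `features`, `loop`, `type`, `count`, `steps`) follow the mapping tables in this document.

```yaml
# Hypothetical requirement: "Open the list page and click 'Next page' three times."
# "three times" implies a loop, so Extended mode is chosen.
engine: extended
features: [loop]

web:
  url: "https://example.com/list"   # placeholder URL, not from the source

tasks:
  - name: "Page through the list"
    flow:
      - loop:
          type: repeat
          count: 3
          steps:
            - aiTap: "Next page button"
            - aiWaitFor: "The next page of results has loaded"
              timeout: 10000
```

Without the repetition, the same requirement would fit Native mode with a single `ai:` or `aiTap` step and no `engine` or `features` declaration beyond the defaults.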

Step 2: Determine Target Platform

Determine the platform configuration from the user's description:

| User Description | Platform | YAML Configuration |
|---|---|---|
| "Open web page/website/URL" | Web | `web: { url: "...", headless: false }` |
| "Test Android app" | Android | `android: { deviceId: "..." }` + `launch: "package name"` |
| "Test iOS app" | iOS | `ios: { wdaPort: 8100 }` + `launch: "bundleId"` |
| "Desktop automation" | Computer | `computer: { ... }` |

Additional Web platform configuration options:
  • `headless: true/false`: whether to run in headless mode (default false)
  • `viewportWidth` / `viewportHeight`: viewport size (default 1280×720)
  • `userAgent`: custom User-Agent
  • `deviceScaleFactor`: device pixel ratio (for example, 2 for Retina screens)
  • `waitForNetworkIdle`: network idle wait configuration; accepts `true` or an object such as `{ timeout: 2000, continueOnNetworkIdleError: true }`
  • `cookie`: path to a cookie JSON file (restores a session without logging in again)
  • `bridgeMode`: Bridge mode, one of `false` (default), `'newTabWithUrl'`, or `'currentTab'`; reuses an already logged-in desktop browser
  • `chromeArgs`: array of custom Chrome launch arguments (for example `['--disable-gpu', '--proxy-server=...']`)
  • `serve`: local static file directory; starts the built-in server
  • `acceptInsecureCerts`: ignore HTTPS certificate errors (default false)
  • `forceSameTabNavigation`: keep navigation in the current tab (default true)
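A minimal sketch of a `web` block that combines several of the options listed above. The URL and the cookie file path are placeholders, and the specific values are illustrative rather than recommended defaults.

```yaml
web:
  url: "https://example.com/dashboard"   # placeholder URL
  headless: true
  viewportWidth: 1440
  viewportHeight: 900
  deviceScaleFactor: 2                   # e.g. Retina screens
  waitForNetworkIdle:
    timeout: 2000
    continueOnNetworkIdleError: true
  cookie: "./cookies.json"               # hypothetical path to a cookie JSON file
  acceptInsecureCerts: false
  chromeArgs:
    - "--disable-gpu"
```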

Step 3: Natural Language → YAML Conversion

Action Selection Priority (Important)

  1. Prefer `ai:`. Describe the entire intent in natural language and let the AI plan and execute the steps. This suits most scenarios and has the highest success rate.
  2. When precise control is needed, use specific actions such as `aiTap` and `aiInput` (for example, filling in a particular form field).
  3. When data must be extracted, use `aiQuery`; `ai:` cannot return structured data.
  4. When state must be verified, use `aiAssert` or `aiWaitFor`.
Rule of thumb: if the user's requirement can be stated in a single natural-language sentence, use one `ai:` step rather than splitting it into multiple `aiInput` + `aiTap` steps.
Golden path, a minimal working example:
yaml
web:
  url: "https://www.baidu.com"

tasks:
  - name: "Search for Midscene"
    flow:
      - ai: "Enter Midscene in the search box and click search"
      - sleep: 3000
      - aiAssert: "The page displays search results"

Native Mode YAML Format Specification (Important)

Native mode accepts action parameters in two formats.
Flat format (recommended, concise): the action keyword takes a string value, and additional parameters are written as sibling keys.
yaml
- aiInput: "Search box"
  value: "Keyword"
- aiWaitFor: "Page loaded completely"
  timeout: 10000
- aiTap: "Button description"
  deepThink: true
- aiAssert: "Page contains expected content"
  errorMessage: "Content validation failed"
Nested Format (Also valid, suitable for complex parameters):
yaml
- aiInput:
    locator: "Search box"
    value: "Keyword"
- aiQuery:
    query: "Extract product list"
    name: "products"
Use the following mapping rule table to convert user requirements to YAML:

Native Action Mapping

| Natural Language Pattern | YAML Mapping | Description |
|---|---|---|
| "Open/visit/go to XXX website" | `web: { url: "XXX" }` | Platform configuration |
| "Automatically plan and execute XXX" | `ai: "XXX"` | The AI breaks the task into multiple steps; optional `fileChooserAccept: "path"` handles file upload dialogs |
| "Click/press/select XXX" | `aiTap: "XXX"` | Shorthand form |
| "Hover over/move to XXX" | `aiHover: "XXX"` | Triggers a dropdown menu or tooltip |
| "Enter YYY in XXX" | `aiInput: "XXX"` + `value: "YYY"` | Flat sibling format; optional `mode`, one of `"replace"`, `"clear"`, `"typeOnly"`, `"append"` |
| "Press the XXX key" | `aiKeyboardPress: "XXX"` | Supports key combinations such as "Control+A"; `keyName` can be used as an alternative parameter |
| "Scroll down/up/left/right" | `aiScroll: "Target area"` + `direction: "down"` | Flat sibling format; optional `distance`, `scrollType` |
| "Wait for XXX to appear" | `aiWaitFor: "XXX"` | Optional `timeout` (milliseconds) |
| "Check/verify/confirm XXX" | `aiAssert: "XXX"` | Optional `errorMessage` |
| "Get/extract/read XXX" | `aiQuery: { query: "XXX", name: "result" }` | `name` stores the result |
| "Pause/wait N seconds" | `sleep: N*1000` | The value is in milliseconds |
| "Execute JS code" | `javascript: "Code content"` | Runs JavaScript directly |
| "Take a screenshot and record it in the report" | `recordToReport: "Title"` + `content: "Description"` | Takes a screenshot and records the description in the report |
| "Double-click XXX" | `aiDoubleClick: "XXX"` | Double-click; optional `deepThink: true` |
| "Right-click XXX" | `aiRightClick: "XXX"` | Right-click; optional `deepThink: true` |
| "Locate XXX element" | `aiLocate: "XXX"` + `name: "elem"` | Locates the element and stores the result in a variable (can be referenced in Extended mode) |
| "Is XXX true?" | `aiBoolean: "XXX"` + `name: "flag"` | Returns a boolean; optional `domIncluded` / `screenshotIncluded` |
| "Get the number of XXX" | `aiNumber: "XXX"` + `name: "count"` | Returns a number; optional `domIncluded` / `screenshotIncluded` |
| "Get the text of XXX" | `aiString: "XXX"` + `name: "text"` | Returns a string; optional `domIncluded` / `screenshotIncluded` |
| "Ask the AI about XXX" | `aiAsk: "XXX"` + `name: "answer"` | Free-form question; returns a text answer |
| "Drag A to B" | `aiDragAndDrop: "A"` + `to: "B"` | Flat format, or nested `{ from: "A", to: "B" }` |
| "Clear the XXX input box" | `aiClearInput: "XXX"` | Clears the input box content |
| "Execute ADB command" | `runAdbShell: "Command"` | Android only |
| "Execute WDA request" | `runWdaRequest: { ... }` | iOS only |
| "Launch app" | `launch: "Package name"` | Launches an app on mobile |
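To show how several rows of the mapping combine into one flow, here is a sketch for a hypothetical requirement: "search the shop for a wireless mouse, scroll the results, and collect the product names". The URL, task name, element descriptions, and the variable name `productNames` are all invented for illustration; only the action keywords and their sibling parameters come from the table above.

```yaml
web:
  url: "https://example.com/shop"   # placeholder URL

tasks:
  - name: "Search and collect product names"
    flow:
      # "Enter YYY in XXX" -> aiInput + value (flat sibling format)
      - aiInput: "Search box at the top of the page"
        value: "wireless mouse"
        mode: "replace"
      # "Press the XXX key" -> aiKeyboardPress
      - aiKeyboardPress: "Enter"
      # "Wait for XXX to appear" -> aiWaitFor + timeout in milliseconds
      - aiWaitFor: "The search result list is visible"
        timeout: 10000
      # "Scroll down" -> aiScroll + direction
      - aiScroll: "Search result list"
        direction: "down"
      # "Get/extract XXX" -> aiQuery with a name to store the result
      - aiQuery:
          query: "Names of all products in the result list, as an array of strings"
          name: "productNames"
      # "Take a screenshot and record it in the report" -> recordToReport + content
      - recordToReport: "Search results"
        content: "Result list after searching for wireless mouse"
```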

Extended Control Flow Mapping

| Natural Language Pattern | YAML Mapping |
|---|---|
| "Define variable XXX as YYY" | `variables: { XXX: "YYY" }` |
| "Use environment variable XXX" | `${ENV:XXX}` or `${ENV.XXX}` |
| "If XXX then YYY else ZZZ" | `logic: { if: "XXX", then: [YYY], else: [ZZZ] }` |
| "Repeat N times" | `loop: { type: repeat, count: N, steps: [...] }` |
| "For each XXX, do ..." | `loop: { type: for, items: "XXX", itemVar: "item", steps: [...] }` (`itemVar`, `as`, and `item` are all accepted) |
| "Keep doing YYY while XXX" | `loop: { type: while, condition: "XXX", maxIterations: N, steps: [...] }` |
| "Do A first; if it fails, do B" | `try: { steps: [A] }, catch: { steps: [B] }` |
| "Do A and B at the same time" | `parallel: { branches: [{steps: [A]}, {steps: [B]}], waitAll: true, merge_results: true }` |
| "Call XXX API" | `external_call: { type: http, method: POST, url: "XXX", response_as: "varName" }` |
| "Execute shell command" | `external_call: { type: shell, command: "XXX" }` |
| "Import/reuse XXX flow" | `import: [{ flow: "XXX.yaml", as: name }]` |
| "Filter/sort/map data" | `data_transform: { source, operation, ... }` |
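The sketch below expands three of the inline mappings above (`logic`, a `while` loop, and `try`/`catch`) into block-style YAML. It rests on several assumptions that the source does not confirm: that `variables` sits at the top level, that conditions may be written in natural language, and that `try` and `catch` are sibling keys of one step. The URL, variable, and element descriptions are placeholders.

```yaml
engine: extended
features: [variables, logic, loop]

variables:                 # assumed to be a top-level key
  keyword: "Midscene"

web:
  url: "https://example.com"   # placeholder URL

tasks:
  - name: "Conditional login, search, then paginate"
    flow:
      # "If XXX then YYY else ZZZ"
      - logic:
          if: "A login button is visible in the navigation bar"   # natural-language condition is an assumption
          then:
            - aiTap: "Login button in the navigation bar"
          else:
            - aiAssert: "The user avatar is shown in the navigation bar"
      # Variable reference (Extended mode only)
      - aiInput: "Search box"
        value: "${keyword}"
      # "Keep doing YYY while XXX", always with a safety cap
      - loop:
          type: while
          condition: "A 'Next page' button is visible and enabled"
          maxIterations: 5
          steps:
            - aiTap: "Next page button"
      # "Do A first; if it fails, do B" (sibling-key layout is an assumption)
      - try:
          steps:
            - aiTap: "Export button"
        catch:
          steps:
            - recordToReport: "Export failed"
              content: "The export button could not be clicked"
```

A dry-run (described in Step 6) is the way to check whether this layout matches what the validator accepts.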

Step 4: Select Template Starting Point

Look through the template files in the `templates/` directory and use the one closest to the user's requirement as the starting point.
Native templates:
  • `templates/native/web-basic.yaml`: basic web operations
  • `templates/native/web-login.yaml`: login flow
  • `templates/native/web-data-extract.yaml`: data extraction
  • `templates/native/web-search.yaml`: web search flow
  • `templates/native/web-file-upload.yaml`: file upload form
  • `templates/native/web-multi-tab.yaml`: multi-tab operations
  • `templates/native/deep-think-locator.yaml`: image-assisted location (deepThink/xpath)
  • `templates/native/android-app.yaml`: Android testing
  • `templates/native/ios-app.yaml`: iOS testing
  • `templates/native/computer-desktop.yaml`: desktop app automation
Extended templates:
  • `templates/extended/web-conditional-flow.yaml`: conditional branching
  • `templates/extended/web-pagination-loop.yaml`: pagination loop
  • `templates/extended/web-data-pipeline.yaml`: data pipeline
  • `templates/extended/multi-step-with-retry.yaml`: multi-step flow with retry
  • `templates/extended/api-integration-test.yaml`: API integration
  • `templates/extended/e2e-workflow.yaml`: complete end-to-end workflow
  • `templates/extended/reusable-sub-flows.yaml`: sub-flow reuse (import/use)
  • `templates/extended/responsive-test.yaml`: multi-viewport responsive testing
  • `templates/extended/web-auth-flow.yaml`: OAuth/login authentication flow (uses variables and environment references)
Template selection guide:

| Requirement Feature | Recommended Template |
|---|---|
| Simple page operations (open, click, input) | `native/web-basic.yaml` |
| Login / form filling | `native/web-login.yaml` |
| Data collection / information extraction | `native/web-data-extract.yaml` |
| Search + result validation | `native/web-search.yaml` |
| File upload / attachment submission | `native/web-file-upload.yaml` |
| OAuth / third-party authentication login | `extended/web-auth-flow.yaml` |
| Desktop app automation (non-browser) | `native/computer-desktop.yaml` |
| Conditional logic needed ("if logged in, then ...") | `extended/web-conditional-flow.yaml` |
| Pagination / list traversal needed | `extended/web-pagination-loop.yaml` |
| Data filtering / sorting / aggregation | `extended/web-data-pipeline.yaml` |
| Retry on failure needed | `extended/multi-step-with-retry.yaml` |
| External API call needed | `extended/api-integration-test.yaml` |
| Complete business flow (multi-step + variables + export) | `extended/e2e-workflow.yaml` |
| Sub-flow reuse / modularization | `extended/reusable-sub-flows.yaml` |
| Responsive validation across screen sizes | `extended/responsive-test.yaml` |
| Complex element location / deepThink | `native/deep-think-locator.yaml` |
| Multi-tab operations | `native/web-multi-tab.yaml` |

Step 5: Generate YAML

Generate the YAML from the template and the conversion rules, paying attention to the following:
  1. File header: add comments stating the requirement source and the generation time.
  2. engine field: Extended mode must declare `engine: extended` explicitly.
  3. features list: in Extended mode, declare the features used (for example `features: [logic, variables, loop]`); Native mode can omit it.
  4. agent configuration (optional): `testId` identifies the test, `groupName` / `groupDescription` classify the report, and `cache: true` caches AI results to speed up repeated runs.
  5. aiActContext (optional): gives the AI agent extra context, such as the language of a multilingual site or domain-specific terms. Set it with `agent: { aiActContext: "Description" }`.
  6. continueOnError (optional): set `continueOnError: true` when later tasks should still run after a task fails.
  7. output export (optional): export results such as those of `aiQuery` to a JSON file for use by later processes.

Output Format

yaml
# Auto-generated by Midscene YAML Generator
# Requirement Description: [Original user requirement]
# Generation Time: [timestamp]

engine: native|extended
features: [...]  # Extended mode only

# Optional: agent configuration
# agent:
#   testId: "test-001"
#   groupName: "Automation Testing Group"
#   groupDescription: "Description"
#   cache: true

[platform_config]

tasks:
  - name: "[Task Name]"
    continueOnError: true  # Optional: Continue on failure
    flow: [Generated steps]
    output:  # Optional: Export data
      filePath: "./midscene-output/data.json"
      dataName: "variableName"
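For comparison with the output template, here is one possible filled-in instance for a hypothetical Native-mode requirement ("extract the article titles from the news page"). The URL, task name, and the variable `articleTitles` are invented for the example; the timestamp is left as a placeholder because it is produced at generation time.

```yaml
# Auto-generated by Midscene YAML Generator
# Requirement Description: Extract the article titles from the news page
# Generation Time: [timestamp]

engine: native

web:
  url: "https://example.com/news"   # placeholder URL

tasks:
  - name: "Extract article titles"
    continueOnError: true
    flow:
      - aiWaitFor: "The article list is visible"
        timeout: 10000
      - aiQuery:
          query: "Titles of all articles in the list, as an array of strings"
          name: "articleTitles"
    output:
      filePath: "./midscene-output/data.json"
      dataName: "articleTitles"   # assumed to match the aiQuery name above
```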

Step 6: Validate and Output

  1. Write the file to the `./midscene-output/` directory.
  2. Run the validator to confirm the YAML is valid:
    bash
    node scripts/midscene-run.js <file> --dry-run
  3. If validation fails, analyze the cause and fix it automatically.
  4. Once validation passes, tell the user the file can be executed with the Runner:
    bash
    node scripts/midscene-run.js <file>

Best Practices for Writing AI Instructions

When generating YAML, the quality of the AI instructions (the parameters of `aiTap`, `aiAssert`, and so on) directly affects the execution success rate. Follow these principles:

Description Precision

  • Poor: `aiTap: "Button"` (the page may contain several buttons)
  • Good: `aiTap: "Blue login button at the top right corner of the page"` (position + color + function)
  • Better: `aiTap: "Button with text 'Login Now' in the navigation bar"` (pinned to the exact text)

Location Strategy Priority

  1. Natural language description (preferred): highly readable and resilient to page changes.
  2. deepThink mode: enable it when a complex page contains several similar elements. The AI performs a deeper analysis, which is more accurate but slower.
  3. Image-assisted location (image prompting): when a text description is not enough, an annotated screenshot can help the AI identify the target element (the official `locate.images` capability).
  4. xpath selector (last resort): use it when natural language cannot pinpoint the element. xpath applies to the Web platform only; Android and iOS should use natural language descriptions.
yaml
# Prefer natural language
- aiTap: "Edit button in the third row of the product list"

# Enable deepThink for complex scenarios (many similar elements, or inaccurate location)
- aiTap: "Edit icon in the third row of data"
  deepThink: true

# Last resort: use xpath (Web platform only)
- aiTap: ""
  xpath: "//table/tbody/tr[3]//button[@class='edit']"

Image-assisted Location (locate object)

When a natural language description is not precise enough, reference images can be supplied through the `locate` object:
yaml
# Use an image to help the AI identify the target element
- aiTap:
    locate:
      prompt: "Icon button similar to the reference image"
      images:
        - name: "target-icon"
          url: "https://example.com/icon.png"
      convertHttpImage2Base64: true

# Simplified form: provide the images option directly
- aiTap: "Icon button similar to the reference image"
  images:
    - "./images/target-icon.png"

aiQuery Result Formatting

Specify the expected data structure explicitly in `query`:
yaml
- aiQuery:
    query: >
      Extract all product information on the page and return it as an array.
      Each element should contain the following fields:
      - name: Product name (string)
      - price: Price (number)
      - inStock: In stock (boolean)
    name: "productList"

Wait Strategy

Add `aiWaitFor` after key operations to make sure the page is ready:
yaml
- aiTap: "Submit button"
- aiWaitFor: "Submit success prompt appears, or page redirects to result page"
  timeout: 10000

Data Transformation Operation Reference

Operations supported by `data_transform` in Extended mode:

| Operation | Description | Key Parameters |
|---|---|---|
| `filter` | Filter by condition | `condition` (JS expression; use `item` to reference the current element) |
| `sort` | Sort | `by` (field name), `order` (asc/desc) |
| `map` | Map/transform | `template` (field mapping template) |
| `reduce` | Aggregate | `reducer` (JS expression), `initial` (initial value) |
| `unique` / `distinct` | Deduplicate | `by` (field to deduplicate on) |
| `slice` | Extract a subset | `start`, `end` |
| `flatten` | Flatten nested arrays | `depth` (flatten depth, default 1) |
| `groupBy` | Group by field | `by` or `field` (field name to group on) |

Two formats: the flat format `{source, operation, name}` suits single-step operations; the nested format `{input, operations:[], output}` supports chained multi-step operations. Both formats support all 8 operations.
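The two formats might look as follows when applied to the `productList` result from the aiQuery example earlier. This is a sketch under assumptions: the source gives only the top-level key names (`source`/`operation`/`name` and `input`/`operations`/`output`), so the use of `${...}` references for the input, the `operation:` key inside each chained step, and the result names are all guesses to be confirmed with a dry-run.

```yaml
# Flat format: one operation per step
- data_transform:
    source: "${productList}"          # reference syntax assumed
    operation: filter
    condition: "item.price < 100"     # JS expression; item is the current element
    name: "cheapProducts"             # hypothetical result name

# Nested format: several operations chained in order
- data_transform:
    input: "${productList}"
    operations:
      - operation: filter             # per-step key name is an assumption
        condition: "item.inStock"
      - operation: sort
        by: "price"
        order: asc
      - operation: slice
        start: 0
        end: 10
    output: "topInStock"              # hypothetical result name
```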

Platform-Specific Notes

Web Platform

  • `url` must include the full protocol (`https://`)
  • Use `aiWaitFor` to wait until the page has finished loading before operating on it
  • Make sure input boxes are interactive before performing form operations

Android Platform

  • `deviceId` must be configured (the ADB device ID, for example `emulator-5554`)
  • Use `launch: "com.example.app"` to launch the app (as an action step in the flow)
  • `runAdbShell` can be used to execute ADB commands
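Put together, an Android file could look like the sketch below. The package name is the same placeholder used in the notes above, and the task name and screen descriptions are invented. `input keyevent KEYCODE_BACK` is a standard Android shell command that presses the Back key; it is used here only as an example of what `runAdbShell` might carry.

```yaml
android:
  deviceId: "emulator-5554"

tasks:
  - name: "Open the app and check the home screen"
    flow:
      - launch: "com.example.app"          # placeholder package name
      - aiWaitFor: "The app home screen is visible"
        timeout: 15000
      - aiTap: "Settings icon in the top right corner"
      - runAdbShell: "input keyevent KEYCODE_BACK"
      - aiAssert: "The app home screen is visible again"
```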

iOS Platform

  • `wdaPort` (WebDriverAgent port, default 8100) and `wdaHost` (default localhost) must be configured
  • Use `launch: "com.example.app"` to launch the app (as an action step in the flow)
  • `runWdaRequest` can be used to send WebDriverAgent requests
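A corresponding iOS sketch follows. The bundle ID, task name, and screen descriptions are placeholders. `runWdaRequest` is left out on purpose, because the source does not show the fields it expects.

```yaml
ios:
  wdaPort: 8100
  wdaHost: "localhost"

tasks:
  - name: "Open the app and check the home screen"
    flow:
      - launch: "com.example.app"          # placeholder bundle ID
      - aiWaitFor: "The app home screen is visible"
        timeout: 15000
      - aiAssert: "A tab bar is shown at the bottom of the screen"
```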

Computer Platform

  • For general desktop automation scenarios

Common Anti-patterns

Avoid the following common mistakes when generating YAML:
  • Using the nested object format unnecessarily: the flat format (`aiInput: "Search box"` + `value: "Keyword"`) is recommended because it is more concise and readable. The nested format (`aiInput: { locator: "Search box", value: "Keyword" }`) is valid in both modes, but is usually needed only for complex parameters such as `locate` image positioning.
  • Omitting `engine: extended` in Extended mode: the engine must be declared whenever any extended feature (variables, loops, conditions, and so on) is used.
  • Forgetting `maxIterations` in loops: `while` loops must set a safety upper limit, and the count of `for` and `repeat` loops should not exceed 10000.
  • Using the nested object format for `aiWaitFor`: write `aiWaitFor: "Condition"` + `timeout: 10000` rather than `aiWaitFor: { condition: "Condition" }`.
  • Omitting the `features` declaration: Extended mode should list the features used, which helps detection and optimization.
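Two of the anti-patterns above, shown next to their corrected forms. The step fragments are illustrative; the condition text, element descriptions, and the cap of 20 iterations are arbitrary choices, and the `while` loop requires a file that declares `engine: extended`.

```yaml
# Avoid: nested object format for aiWaitFor
# - aiWaitFor: { condition: "The order confirmation message is displayed" }

# Prefer: flat format with timeout as a sibling key
- aiWaitFor: "The order confirmation message is displayed"
  timeout: 10000

# Avoid: a while loop with no upper bound
# - loop:
#     type: while
#     condition: "A 'Load more' button is visible"
#     steps:
#       - aiTap: "Load more button"

# Prefer: the same loop with a safety cap
- loop:
    type: while
    condition: "A 'Load more' button is visible"
    maxIterations: 20
    steps:
      - aiTap: "Load more button"
```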

Pre-output Self-check List

After generating the YAML, verify the following before output:
  • Does every `aiInput` have a corresponding `value` parameter?
  • Is there an `aiWaitFor` after each key operation to make sure the page is ready?
  • Does an Extended-mode file declare `engine: extended` and a `features` list?
  • Does every loop have a safety upper limit (`maxIterations` or a reasonable `count`)?
  • Is sensitive information (passwords, tokens) referenced through environment variables with `${ENV:XXX}`?
  • Are the AI instruction descriptions precise enough (position, text, color, and similar features)?

Notes

  • Parameters of AI instructions (`aiTap`, `aiAssert`, and so on) are written in natural language; CSS selectors are not needed
  • Descriptions may be written in Chinese or English; Midscene's AI engine supports multiple languages
  • The result of `aiQuery` is stored under the `name` field and can be referenced in later steps as `${name}` (Extended mode only)
  • Set a reasonable `timeout` (in milliseconds) for `aiWaitFor`; the default is usually 15 seconds
  • Always set `maxIterations` in loops as a safety upper limit to prevent infinite loops
  • `${ENV:XXX}` or `${ENV.XXX}` references an environment variable, which avoids hardcoding sensitive information in the YAML
  • Always declare the `engine` field explicitly to avoid surprises from automatic detection
  • Variable references are case-sensitive: `${userName}` and `${username}` are different variables
  • Avoid circular imports: if A.yaml imports B.yaml and B.yaml imports A.yaml, a runtime error results
  • Always validate syntax and structure with `--dry-run` after generation (note: `--dry-run` does not check the model configuration; AI operations need `MIDSCENE_MODEL_API_KEY` to be configured before they can actually run)
  • Tell the user that the Midscene Runner skill can execute the generated file
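As a sketch of the environment-variable note, the login flow below keeps credentials out of the file. The variable names `LOGIN_USER` and `LOGIN_PASSWORD`, the URL, and the element descriptions are hypothetical. The file declares Extended mode because `${ENV:XXX}` appears in the Extended mapping table; listing `variables` under `features` for this purpose is an assumption.

```yaml
engine: extended
features: [variables]          # assumed feature name for ${...} references

web:
  url: "https://example.com/login"   # placeholder URL

tasks:
  - name: "Log in with credentials from the environment"
    flow:
      - aiInput: "Username field"
        value: "${ENV:LOGIN_USER}"        # hypothetical variable name
      - aiInput: "Password field"
        value: "${ENV:LOGIN_PASSWORD}"    # hypothetical variable name
      - aiTap: "Login button"
      - aiWaitFor: "The dashboard page is displayed"
        timeout: 15000
```

The variables would be exported in the shell before running `node scripts/midscene-run.js <file>`.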

Iterative Fix Process

When the generated YAML fails to execute:
  1. The Runner can fix it on its own: if the error can be resolved by editing the YAML (for example, an imprecise location description or too short a wait), the Runner skill modifies the file directly and retries.
  2. Regeneration is needed: if the error stems from a fundamental design problem (for example, the wrong mode or a missing key step), the user can describe the failure to the Generator, which regenerates an improved YAML based on the error information.
  3. Recommended flow: generate → dry-run validation → execute → on failure, describe the error so the Generator can fix it → re-execute

Collaboration Agreement

After generation is complete, return the following structured information to the user:
  1. Generated file path: `./midscene-output/<filename>.yaml`
  2. Execution mode: native or extended
  3. Recommended next command: `node scripts/midscene-run.js <path> --dry-run`
  4. If dry-run validation fails, automatically analyze the error, fix the YAML, and validate again