dexholdem-v2


DexHoldem Robot Skill

DexHoldem 机器人技能

This skill runs a physical two-player Texas Hold'em setup with a dexterous robot hand. The coding agent owns perception orchestration, state maintenance, poker reasoning, and recovery decisions. Python helpers do deterministic work: preflight, image capture, state-file updates, action translation, robot command dispatch, and next-move routing.
The main agent owns final state interpretation. Helpers may mutate caches, action metadata, and state files only when the main agent invokes them. Visual subagents never write state files; they only return evidence for the main agent to merge.
The workflow is state-folder based. Every decision is grounded in the current state image, parsed state markdown, local caches, and the current action sequence.
本技能通过灵巧机器人手运行实体双人德州扑克(Texas Hold'em)系统。编码Agent负责感知编排、状态维护、扑克推理和恢复决策。Python辅助工具负责确定性工作:预检、图像捕获、状态文件更新、动作转换、机器人命令调度以及下一步动作路由。
主Agent拥有最终状态的解释权。只有当主Agent调用辅助工具时,它们才能修改缓存、动作元数据和状态文件。视觉子Agent永远不会写入状态文件;它们仅返回证据供主Agent合并。
工作流基于状态文件夹构建。每一项决策都基于当前状态图像、解析后的状态markdown文件、本地缓存以及当前动作序列。

Session Start

会话启动

First, from the user's working directory, expose the helper scripts at the workspace root:
bash
ln -s .agents/skills/dexholdem-v2/scripts/*.py ./
For Claude installations, use the Claude skill path instead:
bash
ln -s .claude/skills/dexholdem-v2/scripts/*.py ./
Then run preflight from the user's working directory:
bash
python3 preflight.py
python3 preflight.py --exp-name my_run
For a hardware-free smoke check:
bash
python3 preflight.py --skip-camera --skip-remote --skip-audio
Pause after preflight. Inspect the printed result, confirm the experiment directory exists, confirm
s0/00_capture.jpg
exists when camera checks were not skipped, and report any preflight error or suspicious setup instead of continuing the workflow automatically.
Preflight creates
experiments/<exp-name>/
, points
experiments/current
to that folder, initializes
s0/
and
s_current
, copies the executable helper scripts plus
pyproject.toml
and
config.yaml
into the experiment root, and validates remote click coordinates before capturing
s0/00_capture.jpg
unless camera checks are skipped.
After preflight, work from the experiment root:
bash
cd experiments/current
python3 state.py current
Perform one visual pass for blind/dealer assignment using
visual_guidelines/BLIND_BUTTON_RECOGNITION.md
, then cache the result:
bash
python3 state.py set-blinds --dealer robot --small-blind robot --big-blind opponent --source-state s0
Blind amounts are fixed for this setup: the small blind is an initial bet of 5 chips, and the big blind is an initial bet of 10 chips. Use the cached small-blind/big-blind assignment with visible bet recognition when reasoning about preflop current bets.
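With the blinds fixed at 5 and 10, the preflop to-call arithmetic can be sketched directly. This is an illustrative helper, not part of the shipped scripts:

```python
SMALL_BLIND = 5   # fixed initial bet for the small blind
BIG_BLIND = 10    # fixed initial bet for the big blind

def preflop_to_call(robot_blind: str) -> int:
    """Chips the robot must push preflop to match the big blind before
    any raise, given the cached blind assignment ('small' or 'big')."""
    posted = SMALL_BLIND if robot_blind == "small" else BIG_BLIND
    return BIG_BLIND - posted
```

Once the opponent raises, the cached assignment alone is no longer enough; combine it with visible bet recognition as described above.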
首先,从用户的工作目录中,将辅助脚本暴露在工作区根目录:
bash
ln -s .agents/skills/dexholdem-v2/scripts/*.py ./
对于Claude部署,请使用Claude技能路径:
bash
ln -s .claude/skills/dexholdem-v2/scripts/*.py ./
然后从用户工作目录运行预检:
bash
python3 preflight.py
python3 preflight.py --exp-name my_run
如需无硬件冒烟测试:
bash
python3 preflight.py --skip-camera --skip-remote --skip-audio
预检后暂停。检查打印结果,确认实验目录已创建;若未跳过相机检查,确认
s0/00_capture.jpg
存在;如有预检错误或可疑配置,请报告,不要自动继续工作流。
预检会创建
experiments/<exp-name>/
,将
experiments/current
指向该文件夹,初始化
s0/
s_current
,将可执行辅助脚本以及
pyproject.toml
config.yaml
复制到实验根目录,并在捕获
s0/00_capture.jpg
前验证远程点击坐标(除非跳过相机检查)。
预检完成后,进入实验根目录工作:
bash
cd experiments/current
python3 state.py current
使用
visual_guidelines/BLIND_BUTTON_RECOGNITION.md
执行一次盲注/庄家分配的视觉识别,然后缓存结果:
bash
python3 state.py set-blinds --dealer robot --small-blind robot --big-blind opponent --source-state s0
本系统的盲注金额固定:小盲注初始下注5个筹码,大盲注初始下注10个筹码。在翻牌前推理当前下注时,请使用缓存的小盲注/大盲注分配结果结合可见下注识别。

State Contract

状态契约

The experiment root contains the timeline and the durable caches:
text
experiments/current/
  s0/
    00_capture.jpg
    01_parsed_state.md
    02_action.md
  s1/
  s_current -> s1
  hole_card_cache.json
  action_sequence.json
Each state folder is filled in this order:
  1. 00_capture.jpg
    - exact image used for visual parsing.
  2. 01_parsed_state.md
    - agent-authored parsed state markdown with one JSON block.
  3. 02_action.md
    - committed decision, execution result, and translated commands.
Create the next state only after
02_action.md
exists for the current state:
bash
python3 state.py begin-next --after s0
After
02_action.md
is written, create the next state and capture a fresh observation. This applies to ordinary poker actions, waits, continued
acting
or
atom_idle
sequences,
to_recover
states,
show_hand
,
win
, and
down
states that need recovery or collection. The fresh state is how the agent verifies what physically happened.
The normal exceptions are
stop
, which ends the session instead of continuing the timeline, and
request_human
, which blocks automatic state advance until a human confirms how to proceed.
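The fill-order contract reduces to a single gate. A minimal sketch (the function name is illustrative; `state.py begin-next` enforces the real check):

```python
from pathlib import Path

def may_begin_next(state_dir: str) -> bool:
    """A next state folder may be created only after the current state
    has a committed 02_action.md; a capture and parsed state alone are
    not enough."""
    return (Path(state_dir) / "02_action.md").exists()
```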
实验根目录包含时间线和持久化缓存:
text
experiments/current/
  s0/
    00_capture.jpg
    01_parsed_state.md
    02_action.md
  s1/
  s_current -> s1
  hole_card_cache.json
  action_sequence.json
每个状态文件夹按以下顺序填充:
  1. 00_capture.jpg
    - 用于视觉解析的精确图像。
  2. 01_parsed_state.md
    - 由Agent编写的解析后状态markdown文件,包含一个JSON块。
  3. 02_action.md
    - 已提交的决策、执行结果和转换后的命令。
只有当前状态的
02_action.md
存在后,才能创建下一个状态:
bash
python3 state.py begin-next --after s0
02_action.md
写入后,创建下一个状态并捕获新的观测结果。这适用于普通扑克动作、等待、持续的
acting
atom_idle
序列、
to_recover
状态、
show_hand
win
以及需要恢复或整理的
down
状态。新状态用于让Agent验证实际发生的情况。
常规例外情况是
stop
(结束会话而非继续时间线)和
request_human
(阻止状态自动推进,直到人类确认后续操作)。

Loop Stage

循环阶段

loop_stage
records the state of the robot workflow after visual parsing is complete. Visual parsing itself is not a durable stage: the agent should wait for vision model or vision-agent calls to finish, then write one final parsed state for the current folder.
  • acting
    - a robot atom action was dispatched recently or the hand is still moving. The next agent action should normally be
    wait
    , followed by a fresh capture.
  • atom_idle
    - the hand has settled after an atom action, but the full
    action_sequence.json
    still has pending steps. Continue or verify that sequence; do not start a new poker action.
  • idle
    - the full action sequence is complete, the hand is near rest pose, and the agent may make the next poker decision.
  • show_hand
    - the opponent has shown hole cards or showdown has been reached; reveal the robot hole cards as needed and resolve the outcome.
  • win
    - the robot has won because the opponent folded or the known showdown cards give the robot the stronger hand. Pull back the recognized bet chips.
  • lose
    - the robot has lost because it folded or the known showdown cards give the opponent the stronger hand. Do not pull chips back.
  • to_recover
    - the previous atom action appears to have failed harmlessly or had no effect after the hand settled, and the table layout is still safe enough to retry or repair using the cached action sequence. Examples: a hole card was not picked up and remains near its original position, or a chip push did not move the intended chip and did not disturb cards/chip layout.
  • down
    - execution failed, was interrupted, is blocked, or is unsafe to continue blindly.
A completed parsed state should use one of these values.
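A guard like the following can keep a parsed state from carrying an undocumented stage value (a sketch; the authoritative check, if any, lives in the helper scripts):

```python
LOOP_STAGES = {
    "acting", "atom_idle", "idle", "show_hand",
    "win", "lose", "to_recover", "down",
}

def validate_loop_stage(stage: str) -> str:
    """Reject any loop_stage outside the documented set before it is
    written into 01_parsed_state.md."""
    if stage not in LOOP_STAGES:
        raise ValueError(f"unknown loop_stage: {stage!r}")
    return stage
```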
loop_stage
记录视觉解析完成后机器人工作流的状态。视觉解析本身并非持久化阶段:Agent应等待视觉模型或视觉Agent的调用完成,然后为当前文件夹写入最终的解析状态。
  • acting
    - 最近已调度机器人原子动作,或手部仍在移动。Agent的下一个动作通常应为
    wait
    ,随后进行新的捕获。
  • atom_idle
    - 原子动作完成后手部已稳定,但
    action_sequence.json
    仍有未完成步骤。继续或验证该序列;不要启动新的扑克动作。
  • idle
    - 完整动作序列已完成,手部接近静止姿态,Agent可做出下一个扑克决策。
  • show_hand
    - 对手已亮出底牌或进入摊牌阶段;根据需要亮出机器人底牌并判定结果。
  • win
    - 机器人获胜(对手弃牌或已知摊牌卡牌使机器人手牌更强)。收回已识别的下注筹码。
  • lose
    - 机器人失败(已弃牌或已知摊牌卡牌使对手手牌更强)。不要收回筹码。
  • to_recover
    - 上一个原子动作看似无危害地失败,或手部稳定后未产生效果,且桌面布局仍足够安全,可使用缓存的动作序列重试或修复。例如:底牌未被拿起,仍留在原位置附近;或筹码推动未移动目标筹码,且未干扰卡牌/筹码布局。
  • down
    - 执行失败、中断、受阻,或盲目继续存在安全风险。
已完成的解析状态应使用上述值之一。

Caches

缓存

hole_card_cache.json
is authoritative for hole cards because viewed cards are returned face-down and cannot be read again from the table image. It also stores the blind/dealer assignment recognized at session start.
action_sequence.json
is authoritative for multi-step embodied progress. It contains the original translator output under
plan
plus mutable step status. Use the cached
plan
when retrying, verifying, or diagnosing the same action sequence; do not recompute the plan from a later table state.
Step status is deliberately physical:
  • pending
    means the atom has not been dispatched.
  • dispatched
    means
    executor.py
    sent the robot policy, but the next capture has not yet verified the physical result.
  • completed
    means the atom was visually verified in an
    atom_idle
    state.
executor.py
dispatches at most one robot atom command per state. It marks the step
dispatched
; the main agent marks it
completed
only after visual verification.
Useful cache helpers:
bash
python3 state.py cache-card --slot left --card Ah --source-state s3 --confidence 0.9
python3 action_translator.py --action '{"action":"view_card","position":"left"}' --as-sequence-cache
python3 state.py start-action --sequence-json '<translator sequence-cache JSON>'
python3 state.py dispatch-step --step pick_card
python3 state.py complete-step --step read_card
python3 state.py prepare-retry --step push_chip_10_1 --reason to_recover
python3 state.py next-hand
python3 state.py next-hand --refresh-blinds
python3 state.py set-loop-stage --stage to_recover
python3 state.py set-loop-stage --stage show_hand
python3 state.py set-loop-stage --stage win
python3 state.py set-loop-stage --stage lose
python3 state.py set-loop-stage --stage atom_idle
python3 state.py set-loop-stage --stage acting
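The step lifecycle can be modeled as a tiny transition table. Whether `prepare-retry` literally resets a step to `pending` is an assumption here; the sketch only illustrates the documented pending, dispatched, completed flow:

```python
# pending -> dispatched: executor.py sent the atom command
# dispatched -> completed: the main agent verified it visually
# dispatched -> pending: a prepare-retry style reset (assumed)
LEGAL_TRANSITIONS = {
    ("pending", "dispatched"),
    ("dispatched", "completed"),
    ("dispatched", "pending"),
}

def advance_step(status: str, new_status: str) -> str:
    if (status, new_status) not in LEGAL_TRANSITIONS:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```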
hole_card_cache.json
是底牌的权威来源,因为已查看的卡牌会被翻面放回,无法从桌面图像再次读取。它还存储会话启动时识别的盲注/庄家分配结果。
action_sequence.json
是多步骤实体化进度的权威来源。它包含
plan
下的原始转换器输出以及可变的步骤状态。重试、验证或诊断同一动作序列时,请使用缓存的
plan
;不要从后续桌面状态重新计算计划。
步骤状态明确对应物理状态:
  • pending
    - 原子动作尚未调度。
  • dispatched
    -
    executor.py
    已发送机器人策略,但尚未通过下一次捕获验证物理结果。
  • completed
    - 原子动作已在
    atom_idle
    状态下通过视觉验证。
executor.py
每个状态最多调度一个机器人原子命令。它将步骤标记为
dispatched
;主Agent仅在视觉验证后将其标记为
completed
实用缓存辅助命令:
bash
python3 state.py cache-card --slot left --card Ah --source-state s3 --confidence 0.9
python3 action_translator.py --action '{"action":"view_card","position":"left"}' --as-sequence-cache
python3 state.py start-action --sequence-json '<translator sequence-cache JSON>'
python3 state.py dispatch-step --step pick_card
python3 state.py complete-step --step read_card
python3 state.py prepare-retry --step push_chip_10_1 --reason to_recover
python3 state.py next-hand
python3 state.py next-hand --refresh-blinds
python3 state.py set-loop-stage --stage to_recover
python3 state.py set-loop-stage --stage show_hand
python3 state.py set-loop-stage --stage win
python3 state.py set-loop-stage --stage lose
python3 state.py set-loop-stage --stage atom_idle
python3 state.py set-loop-stage --stage acting

Router Reference

路由参考

After the current state has a capture and parsed state, the local router gives the initial gate:
bash
python3 router.py
The router returns
route
,
reason
,
agent_required
,
judged_results
, and optional commands. It does not parse images, decide poker strategy, or declare unsafe physical recovery by itself; those remain main-agent responsibilities.
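A sketch of consuming the router output (field names follow the contract above; the dispatch logic is illustrative):

```python
import json

def handle_router_output(raw: str) -> str:
    """Turn router.py JSON into the agent's next move: defer to agent
    judgment, run the first suggested command, or note an empty route."""
    result = json.loads(raw)
    if result.get("agent_required"):
        return f"agent judgment needed: {result['reason']}"
    commands = result.get("commands") or []
    if commands:
        return f"run: {commands[0]}"
    return f"route {result['route']}: nothing to run"
```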
当前状态拥有捕获图像和解析状态后,本地路由器会给出初始指引:
bash
python3 router.py
路由器返回
route
reason
agent_required
judged_results
以及可选命令。它不会解析图像、决定扑克策略或声明不安全的物理恢复;这些仍属于主Agent的职责。

Visual Parsing

视觉解析

Use the files in
visual_guidelines/
as needed to write a truthful
loop_stage
,
robot
, and table fields. Multiple visual checks may be used for the same captured state when they add useful information. The visual model may answer in plain language; the coding agent converts those answers into
01_parsed_state.md
.
When visual information is needed, the main agent MUST delegate image reading to visual subagents. Assign each subagent one guideline or one visual question, such as scene stability, robot behavior, turn button, community cards, bets, chip inventory, held card reading, or showdown outcome. Give each subagent the current image, relevant recent images, cache summaries, action-sequence context, and the appropriate visual guideline as its prompt. Subagents are read-only evidence providers: they must inspect images and context, return findings, evidence, uncertainty, and suggested parsed fields, but must not edit state files. The main agent merges the subagent outputs, resolves conflicts conservatively, and writes the single authoritative
s_current/01_parsed_state.md
.
Guideline purposes:
  • SCENE_STABILITY.md
    - action completion, waiting decisions, and movement checks, usually paired with recent images.
  • ROBOT_BEHAVIOR.md
    - dexterous-hand pose, motion, held objects, physical safety, atom progress, and recovery context. A robot-behavior subagent should receive at least the current image and the previous captured image so it can judge motion, progress, and whether the hand has actually settled.
  • TABLE_GEOMETRY.md
    - robot/opponent orientation, betting zones, inventory zones, and camera/table layout.
  • BLIND_BUTTON_RECOGNITION.md
    - dealer, small blind, and big blind buttons.
  • HELD_CARD_RECOGNITION.md
    - readable hole card held by the robot hand.
  • TURN_DETECTION.md
    - physical white turn button and
    is_my_turn
    .
  • COMMUNITY_CARDS.md
    - shared board cards.
  • SHOWDOWN_OUTCOME.md
    - showdown state, revealed cards, fold/win/lose outcome.
  • CHIP_RECOGNITION.md
    - remaining chip inventories.
  • BET_RECOGNITION.md
    - current bet chips in each betting area.
It is acceptable to refresh
is_my_turn
, board, chips, bets, and robot state on every captured image if that helps keep the parsed state current. The router will decide which fields matter for the current
loop_stage
.
Keep parsed state compact:
json
{
  "loop_stage": "idle",
  "robot": "dexterous hand is near its initial pose and not holding a card or chips",
  "table": {
    "scene_stable": true,
    "uncertain_fields": [],
    "is_my_turn": true,
    "community_cards": [],
    "my_chips": {"5": 4, "10": 3, "50": 3, "100": 3},
    "opponent_chips": {"5": 4, "10": 4, "50": 3, "100": 3},
    "my_current_bet": {"5": 0, "10": 0, "50": 0, "100": 0},
    "opponent_bet": {"5": 0, "10": 0, "50": 0, "100": 0}
  }
}
Derived concepts such as poker street, total call amount, and turn confidence can be inferred later from the stored cards, chip counts, and turn button state; they do not belong in
01_parsed_state.md
.
The router uses stage-specific required fields. An
idle
state needs the full table block shown above. Non-idle states must still include a
table
object, but it may be sparse when fields were not visually parsed and are irrelevant to the current gate. Include
uncertain_fields
when an omitted or unclear value matters to the next action.
For showdown, use
loop_stage
as the main compact signal. Add only small table notes that help routing or verification, such as visible opponent hole cards; do not store bulky hand-ranking explanations.
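A completeness check for the idle case can be sketched as below. The authoritative per-stage field list belongs to router.py; this mirrors only the documented idle block:

```python
IDLE_TABLE_FIELDS = {
    "scene_stable", "uncertain_fields", "is_my_turn", "community_cards",
    "my_chips", "opponent_chips", "my_current_bet", "opponent_bet",
}

def missing_idle_fields(parsed: dict) -> set:
    """Table fields an idle parsed state must still supply. Non-idle
    states may be sparse, so they report nothing missing here."""
    if parsed.get("loop_stage") != "idle":
        return set()
    return IDLE_TABLE_FIELDS - set(parsed.get("table", {}))
```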
根据需要使用
visual_guidelines/
中的文件,如实填写
loop_stage
robot
和table字段。同一捕获状态可进行多次视觉检查,以补充有用信息。视觉模型可使用自然语言作答;编码Agent需将这些答案转换为
01_parsed_state.md
当需要视觉信息时,主Agent必须将图像读取任务委托给视觉子Agent。为每个子Agent分配一个指南或一个视觉问题,例如场景稳定性、机器人行为、回合按钮、公共牌、下注、筹码库存、手牌读取或摊牌结果。为每个子Agent提供当前图像、相关近期图像、缓存摘要、动作序列上下文以及相应的视觉指南作为提示。子Agent是只读证据提供者:它们必须检查图像和上下文,返回发现、证据、不确定性和建议的解析字段,但不得编辑状态文件。主Agent合并子Agent的输出,保守地解决冲突,并写入唯一权威的
s_current/01_parsed_state.md
指南用途:
  • SCENE_STABILITY.md
    - 动作完成、等待决策和移动检查,通常与近期图像配合使用。
  • ROBOT_BEHAVIOR.md
    - 灵巧手姿态、运动、握持物体、物理安全、原子动作进度和恢复上下文。机器人行为子Agent应至少接收当前图像和上一次捕获的图像,以便判断运动、进度以及手部是否真的稳定。
  • TABLE_GEOMETRY.md
    - 机器人/对手朝向、下注区域、库存区域以及相机/桌面布局。
  • BLIND_BUTTON_RECOGNITION.md
    - 庄家、小盲注和大盲注按钮。
  • HELD_CARD_RECOGNITION.md
    - 机器人手持的可读底牌。
  • TURN_DETECTION.md
    - 实体白色回合按钮和
    is_my_turn
  • COMMUNITY_CARDS.md
    - 公共牌。
  • SHOWDOWN_OUTCOME.md
    - 摊牌状态、亮出的卡牌、弃牌/获胜/失败结果。
  • CHIP_RECOGNITION.md
    - 剩余筹码库存。
  • BET_RECOGNITION.md
    - 每个下注区域的当前下注筹码。
如果有助于保持解析状态的时效性,可在每次捕获图像时刷新
is_my_turn
、公共牌、筹码、下注和机器人状态。路由器会决定当前
loop_stage
需要哪些字段。
保持解析状态简洁:
json
{
  "loop_stage": "idle",
  "robot": "灵巧手接近初始姿态,未握持卡牌或筹码",
  "table": {
    "scene_stable": true,
    "uncertain_fields": [],
    "is_my_turn": true,
    "community_cards": [],
    "my_chips": {"5": 4, "10": 3, "50": 3, "100": 3},
    "opponent_chips": {"5": 4, "10": 4, "50": 3, "100": 3},
    "my_current_bet": {"5": 0, "10": 0, "50": 0, "100": 0},
    "opponent_bet": {"5": 0, "10": 0, "50": 0, "100": 0}
  }
}
扑克街、总跟注金额和回合置信度等派生概念可稍后从存储的卡牌、筹码数量和回合按钮状态推断得出;它们不属于
01_parsed_state.md
的内容。
路由器使用特定阶段的必填字段。
idle
状态需要上述完整的table块。非idle状态仍需包含
table
对象,但当字段未经过视觉解析且与当前指引无关时,可仅保留精简内容。当某个缺失或不明确的值对下一个动作有影响时,请包含
uncertain_fields
对于摊牌,使用
loop_stage
作为主要的简洁信号。仅添加有助于路由或验证的小型桌面说明,例如可见的对手底牌;不要存储冗长的手牌排名解释。

Poker Reasoning

扑克推理

When the router returns
choose_poker_action
, the main agent MUST delegate the Texas Hold'em reasoning to a reasoning subagent. Give the subagent the current parsed table, hole-card cache, blind/dealer assignment, action history if available, the supported action space, and the blind amounts: small blind = 5, big blind = 10.
The reasoning subagent should infer the current betting situation from
my_current_bet
,
opponent_bet
,
my_chips
,
opponent_chips
, community cards, hole cards, turn state, and blind assignment. It should return a concise rationale plus one recommended supported action JSON, such as
check
,
fold
,
call
,
raise
, or
all_in
. The main agent validates that recommendation against the current parsed state, supported action schema, and physical chip constraints, then commits and executes the final action through
executor.py
.
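Because the parsed bets are denomination-to-count maps, "current bet" means the chip value total. A sketch of this part of the inference (the helper name is illustrative):

```python
def to_call(my_current_bet: dict, opponent_bet: dict) -> int:
    """Chip value the robot must still push to call, computed from the
    parsed bet dicts keyed by denomination string."""
    total = lambda bets: sum(int(d) * n for d, n in bets.items())
    return total(opponent_bet) - total(my_current_bet)
```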
当路由器返回
choose_poker_action
时,主Agent必须将德州扑克推理任务委托给推理子Agent。为子Agent提供当前解析后的桌面状态、底牌缓存、盲注/庄家分配结果(如有可用)、动作历史、支持的动作空间以及盲注金额:小盲注=5,大盲注=10。
推理子Agent应从
my_current_bet
opponent_bet
my_chips
opponent_chips
、公共牌、底牌、回合状态和盲注分配结果中推断当前下注情况。它应返回简洁的推理过程以及一个推荐的支持动作JSON,例如
check
fold
call
raise
all_in
。主Agent验证该建议是否符合当前解析状态、支持的动作模式和物理筹码限制,然后通过
executor.py
提交并执行最终动作。

Actions

动作

Supported action JSON:
json
{"action": "wait", "reason": "scene_unstable", "sleep_seconds": 30}
{"action": "view_card", "position": "left"}
{"action": "show_card", "position": "left"}
{"action": "put_down_card", "position": "left", "face_up": false}
{"action": "check"}
{"action": "fold"}
{"action": "call"}
{"action": "raise", "amount": 80}
{"action": "all_in"}
{"action": "collect_winnings"}
{"action": "collect_winnings", "chip_counts": {"5": 2, "10": 1, "50": 0, "100": 1}}
{"action": "request_human", "reason": "dexterous hand is holding an unreadable card"}
{"action": "stop", "reason": "session ended"}
Run actions through
executor.py
; use
--dry-run
to write the action and action-sequence cache without sending robot commands.
For betting actions, the executor reads
my_chips
,
my_current_bet
, and
opponent_bet
from the current
01_parsed_state.md
table.
call
pushes
sum(opponent_bet) - sum(my_current_bet)
.
raise.amount
is the target total bet after the raise, so the physical chips pushed are
amount - sum(my_current_bet)
.
For
call
and
raise
, chip selection must be exact. If available
my_chips
cannot form the required amount exactly, the translator fails before robot dispatch. Do not silently overpay with a larger chip; choose a different poker action, repair chip recognition, or request human help.
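Exact chip selection is bounded change-making. The real translator's algorithm may differ; this sketch shows the required behavior: find exact change from the inventory or fail, never overpay:

```python
from typing import Optional

def exact_chips(amount: int, my_chips: dict) -> Optional[dict]:
    """Return chip counts summing exactly to amount, or None when the
    inventory cannot form it. Larger denominations are tried first,
    with backtracking so bounded counts are handled correctly."""
    denoms = sorted((int(d) for d in my_chips), reverse=True)

    def search(i: int, remaining: int, picked: dict) -> Optional[dict]:
        if remaining == 0:
            return dict(picked)
        if i == len(denoms):
            return None
        d = denoms[i]
        for use in range(min(my_chips[str(d)], remaining // d), -1, -1):
            if use:
                picked[str(d)] = use
            found = search(i + 1, remaining - use * d, picked)
            if found is not None:
                return found
            picked.pop(str(d), None)
        return None

    return search(0, amount, {})
```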
Chip actions are translated into one atom step per moved chip, such as
push_chip_10_1
and
push_chip_5_1
, followed by
verify_idle
.
collect_winnings
pulls chips back after a confirmed
win
. By default it pulls
opponent_bet
and
my_current_bet
as separate source zones from the parsed table, then records those zones in the action sequence. Use
chip_counts
only when visual parsing has a clearer explicit count for the chips that should be pulled back and zone information is not reliable.
支持的动作JSON:
json
{"action": "wait", "reason": "scene_unstable", "sleep_seconds": 30}
{"action": "view_card", "position": "left"}
{"action": "show_card", "position": "left"}
{"action": "put_down_card", "position": "left", "face_up": false}
{"action": "check"}
{"action": "fold"}
{"action": "call"}
{"action": "raise", "amount": 80}
{"action": "all_in"}
{"action": "collect_winnings"}
{"action": "collect_winnings", "chip_counts": {"5": 2, "10": 1, "50": 0, "100": 1}}
{"action": "request_human", "reason": "dexterous hand is holding an unreadable card"}
{"action": "stop", "reason": "session ended"}
通过
executor.py
运行动作;使用
--dry-run
可写入动作和动作序列缓存,而不发送机器人命令。
对于下注动作,执行器从当前
01_parsed_state.md
的table中读取
my_chips
my_current_bet
opponent_bet
call
会推动
sum(opponent_bet) - sum(my_current_bet)
的筹码。
raise.amount
是加注后的目标总下注额,因此实际推动的筹码为
amount - sum(my_current_bet)
对于
call
raise
,筹码选择必须精确。如果可用的
my_chips
无法精确凑出所需金额,转换器会在机器人调度前失败。不要默默使用更大的筹码超额支付;请选择其他扑克动作、修复筹码识别或请求人类帮助。
筹码动作会转换为每个移动筹码一个原子步骤,例如
push_chip_10_1
push_chip_5_1
,随后是
verify_idle
collect_winnings
在确认
win
后收回筹码。默认情况下,它会从解析后的桌面中分别收回
opponent_bet
my_current_bet
对应的区域筹码,然后在动作序列中记录这些区域。仅当视觉解析对应收回的筹码有更明确的计数,且区域信息不可靠时,才使用
chip_counts

Recovery

恢复

Use
to_recover
when a recent robot atom failed harmlessly after the hand settled and the current table layout is still safe to retry:
  • during
    view_card
    , the target card was not picked up and remains face-down near its original position,
  • during chip movement, the intended chip did not move or did not follow the hand, and the card/chip layout remains countable and undisturbed,
  • after an atom attempt, no intended physical progress happened but no non-target object moved.
Use
down
when direct continuation is unsafe or unclear:
  • a card was dropped during viewing,
  • a returned card covers chips or hides game state,
  • chip movement displaced cards, buttons, or unrelated chips,
  • chip movement destroyed the table layout,
  • the dexterous hand appears stuck,
  • command progress is unknown,
  • repeated captures remain unstable.
Request human help when a person must fix or confirm the table:
bash
python3 executor.py --action '{"action":"request_human","reason":"Dexterous hand is holding an unreadable card","resume_options":["mark_card","confirm_card_returned","abort_hand"]}'
request_human
is a blocking action. After it writes
02_action.md
, the router returns
human_pause
and does not automatically create the next state. Only after a human confirms the table is fixed should the agent run the router's
commands_after_human
to create and capture the next state.
Retry only when the cached sequence plan and recent images show that repeating the current step is physically safe. In normal routing, that means the parsed state should be
to_recover
; otherwise keep the state
down
and request human help or wait for clearer evidence. For retryable atom failures, use
state.py prepare-retry --step <current_step>
followed by
executor.py --continue-current
; the router emits these commands when the current step has a cached atom command. Safety counters in
action_sequence.json
cap repeated waits and recoveries; when a cap is reached, the router escalates to
request_human
instead of continuing automatically. If a human inspects the table and explicitly approves continuing, run
state.py reset-safety --scope consecutive
before creating the next captured state. Use
--scope all
only when the human intentionally clears total wait or total recovery caps for the session.
After a hand ends, either stop the session or reset local caches before the next hand. Use
state.py next-hand
to clear hole cards and reset
action_sequence.json
while preserving blind/dealer cache. Use
state.py next-hand --refresh-blinds
when the dealer/small-blind button may have moved and blind recognition must run again during the next preflight-like visual pass.
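The cap escalation is a simple gate. Counter and cap names here are illustrative; the real counters live in action_sequence.json:

```python
def gate_recovery(consecutive_recoveries: int, recovery_cap: int) -> str:
    """Escalate to a human once the consecutive-recovery cap is hit;
    otherwise a retry route is still permitted. A reset-safety call
    after human approval is what clears the counter."""
    if consecutive_recoveries >= recovery_cap:
        return "request_human"
    return "recover_retryable"
```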
当最近的机器人原子动作在手部稳定后无危害地失败,且当前桌面布局仍可安全重试时,使用
to_recover
  • view_card
    期间,目标卡牌未被拿起,仍翻面留在原位置附近;
  • 在筹码移动期间,目标筹码未移动或未跟随手部,且卡牌/筹码布局仍可计数且未被干扰;
  • 原子动作尝试后,未产生预期的物理进展,但未移动非目标物体。
当直接继续存在安全风险或情况不明时,使用
down
  • 查看卡牌期间掉落卡牌;
  • 放回的卡牌覆盖筹码或隐藏游戏状态;
  • 筹码移动移位了卡牌、按钮或无关筹码;
  • 筹码移动破坏了桌面布局;
  • 灵巧手似乎卡住;
  • 命令进度未知;
  • 多次捕获的图像仍不稳定。
当需要人类修复或确认桌面情况时,请求帮助:
bash
python3 executor.py --action '{"action":"request_human","reason":"Dexterous hand is holding an unreadable card","resume_options":["mark_card","confirm_card_returned","abort_hand"]}'
request_human
是阻塞动作。写入
02_action.md
后,路由器返回
human_pause
,不会自动创建下一个状态。只有在人类确认桌面已修复后,Agent才应运行路由器的
commands_after_human
来创建并捕获下一个状态。
仅当缓存的序列计划和近期图像显示重复当前步骤在物理上安全时,才进行重试。在常规路由中,这意味着解析状态应标记为
to_recover
;否则保持
down
状态并请求人类帮助或等待更明确的证据。对于可重试的原子动作失败,使用
state.py prepare-retry --step <current_step>
,然后运行
executor.py --continue-current
;当当前步骤有缓存的原子命令时,路由器会发出这些命令。
action_sequence.json
中的安全计数器限制了重复等待和恢复的次数;当达到上限时,路由器会升级为
request_human
而非自动继续。如果人类检查桌面并明确批准继续,请在创建下一个捕获状态前运行
state.py reset-safety --scope consecutive
。仅当人类有意清除会话的总等待或总恢复上限时,才使用
--scope all
手牌结束后,要么停止会话,要么在下手牌开始前重置本地缓存。使用
state.py next-hand
清除底牌并重置
action_sequence.json
,同时保留盲注/庄家缓存。当庄家/小盲注按钮可能已移动,且下手牌的预检式视觉识别需重新运行盲注识别时,使用
state.py next-hand --refresh-blinds

Core Workflow

核心工作流

After preflight, repeat this loop from the experiment root until the action is
stop
:
  1. Capture or reuse the current state's image. If
    s_current/00_capture.jpg
    is missing, run
    python3 capture.py --output s_current/00_capture.jpg
    .
  2. Select only the visual guidelines needed for this state, then use visual agents or vision models to parse the current image. Provide recent state images,
    action_sequence.json
    , and
    hole_card_cache.json
    when they help the visual agent judge motion, robot behavior, held cards, chips, bets, showdown, or recovery state.
  3. The main coding agent summarizes the visual outputs into
    s_current/01_parsed_state.md
    . This file is the authoritative parsed state for the router. It must include the compact JSON block with
    loop_stage
    ,
    robot
    , and
    table
    .
  4. Run
    python3 router.py
    . Treat its JSON as the initial gating result for the current state.
  5. Follow the gated route:
    • If the router returns a command and
      agent_required: false
      , run the command.
    • If it asks for visual parsing, repair the parsed state and rerun the router.
    • If it asks to verify a dispatched step, inspect the current image and cached sequence. If the intended atom succeeded, run the provided
      state.py complete-step ...
      command and rerun the router. If it failed harmlessly, mark
      to_recover
      ; if unsafe, mark
      down
      or request human help.
    • If it asks for held-card reading, use visual parsing to read the held card, update
      hole_card_cache.json
      , and continue the cached action sequence.
    • If it returns
      continue_cached_command
      , run
      executor.py --continue-current
      ; this sends the next pending robot atom from
      action_sequence.json
      .
    • If it returns
      recover_retryable
      with commands, run them in order to reset and retry the exact cached atom. If it requires the agent, inspect the cached sequence and recent images before retrying or requesting help.
    • If it returns
      recover_down
      , inspect recent states and choose wait or
      request_human
      ; only retry after the state is safely classified as
      to_recover
      .
    • If it returns
      human_pause
      , wait for human confirmation before running the supplied
      commands_after_human
      .
    • If it returns
      show_hand
      , reveal robot cards as needed with
      show_card
      actions, then use
      SHOWDOWN_OUTCOME.md
      to decide
      win
      ,
      lose
      , or keep resolving showdown ambiguity.
    • If it returns
      collect_winnings
      , execute the suggested
      collect_winnings
      action with
      executor.py
      .
    • If it returns
      hand_lost
      , do not move chips toward the robot; decide whether to wait for reset, request human help, run
      state.py next-hand
      , or stop.
    • If it returns
      choose_poker_action
      , delegate Texas Hold'em reasoning to a reasoning subagent with the parsed table state, hole-card cache, blind/dealer assignment, action history, supported action space, and blind amounts. Validate the subagent's recommended action, use
      action_translator.py
      if you need to inspect the new action sequence, and execute the final action with
      executor.py
      .
  6. Use
    action_translator.py
    when you need to inspect or create the action sequence for a new poker or embodied action. The executor also calls the translator internally before dispatch.
  7. Use
    executor.py
    every time you want to send robot commands or commit an executable action. Do not send robot policy commands directly through
    remote_exec.py
    during normal operation. Examples:
bash
python3 executor.py --action '{"action":"wait","reason":"not_my_turn","sleep_seconds":3}'
python3 executor.py --action '{"action":"view_card","position":"left"}'
python3 executor.py --action '{"action":"show_card","position":"left"}'
python3 executor.py --action '{"action":"put_down_card","position":"left","face_up":false}'
python3 executor.py --continue-current
python3 executor.py --action '{"action":"call"}'
python3 executor.py --action '{"action":"collect_winnings"}'
python3 executor.py --action '{"action":"request_human","reason":"card was dropped"}'
After
executor.py
writes
02_action.md
, create the next state and capture the next observation unless the route is
human_pause
or the action is
stop
:
bash
python3 state.py current
python3 state.py begin-next --after sN
python3 capture.py --output s_current/00_capture.jpg
Then start the loop again from visual parsing. The next image verifies what actually happened after the last wait, retry, robot action, or human-help request.
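The loop above can be sketched as control flow; the five callables are hypothetical stand-ins for the capture, visual-parsing, router, step-5 dispatch, and state-advance invocations described in steps 1-7:

```python
def run_session(capture, parse, route, act, advance):
    """Skeleton of the core loop: capture, parse, route, act, and
    advance, stopping on 'stop' and holding the timeline on
    human_pause. All five callables are placeholders."""
    while True:
        capture()                 # capture.py when 00_capture.jpg is missing
        parse()                   # visual subagents -> 01_parsed_state.md
        decision = route()        # router.py JSON gate
        action = act(decision)    # follow the gated route (step 5)
        if action == "stop":
            return
        if decision.get("route") != "human_pause":
            advance()             # state.py begin-next + fresh capture
```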
预检完成后,从实验根目录重复以下循环,直到动作变为
stop
  1. 捕获或复用当前状态的图像。如果
    s_current/00_capture.jpg
    缺失,运行
    python3 capture.py --output s_current/00_capture.jpg
  2. 仅选择当前状态所需的视觉指南,然后使用视觉Agent或视觉模型解析当前图像。当有助于视觉Agent判断运动、机器人行为、握持卡牌、筹码、下注、摊牌或恢复状态时,提供近期状态图像、
    action_sequence.json
    hole_card_cache.json
  3. 主编码Agent将视觉输出汇总为
    s_current/01_parsed_state.md
    。该文件是路由器的权威解析状态,必须包含带有
    loop_stage
    robot
    table
    的精简JSON块。
  4. 运行
    python3 router.py
    。将其返回的JSON作为当前状态的初始指引结果。
  5. 遵循指引路径:
    • 如果路由器返回命令且
      agent_required: false
      ,运行该命令。
    • 如果要求进行视觉解析,修复解析状态并重新运行路由器。
    • 如果要求验证已调度的步骤,检查当前图像和缓存序列。如果预期的原子动作成功,运行提供的
      state.py complete-step ...
      命令并重新运行路由器。如果无危害地失败,标记为
      to_recover
      ;如果存在安全风险,标记为
      down
      或请求人类帮助。
    • 如果要求读取手牌,使用视觉解析读取握持的卡牌,更新
      hole_card_cache.json
      ,并继续缓存的动作序列。
    • 如果返回
      continue_cached_command
      ,运行
      executor.py --continue-current
      ;这会从
      action_sequence.json
      发送下一个待处理的机器人原子动作。
    • 如果返回带有命令的
      recover_retryable
      ,按顺序运行这些命令以重置并重试完全相同的缓存原子动作。如果需要Agent参与,请在重试或请求帮助前检查缓存序列和近期图像。
    • 如果返回
      recover_down
      ,检查近期状态并选择等待或
      request_human
      ;仅当状态被安全分类为
      to_recover
      后才可重试。
    • 如果返回
      human_pause
      ,等待人类确认后再运行提供的
      commands_after_human
    • 如果返回
      show_hand
      ,使用
      show_card
      动作根据需要亮出机器人卡牌,然后使用
      SHOWDOWN_OUTCOME.md
      判定
      win
      lose
      或继续解决摊牌歧义。
    • 如果返回
      collect_winnings
      ,通过
      executor.py
      执行建议的
      collect_winnings
      动作。
    • 如果返回
      hand_lost
      ,不要将筹码移向机器人;决定是等待重置、请求人类帮助、运行
      state.py next-hand
      还是停止。
    • 如果返回
      choose_poker_action
      ,将德州扑克推理任务委托给推理子Agent,提供解析后的桌面状态、底牌缓存、盲注/庄家分配结果、动作历史、支持的动作空间和盲注金额。验证子Agent的推荐动作,如需检查新动作序列请使用
      action_translator.py
      ,并通过
      executor.py
      执行最终动作。
  6. 如需检查或创建新扑克动作或实体动作的动作序列,请使用
    action_translator.py
    。执行器在调度前也会内部调用转换器。
  7. 每次发送机器人命令或提交可执行动作时,请使用
    executor.py
    。正常操作期间不要通过
    remote_exec.py
    直接发送机器人策略命令。示例:
bash
python3 executor.py --action '{"action":"wait","reason":"not_my_turn","sleep_seconds":3}'
python3 executor.py --action '{"action":"view_card","position":"left"}'
python3 executor.py --action '{"action":"show_card","position":"left"}'
python3 executor.py --action '{"action":"put_down_card","position":"left","face_up":false}'
python3 executor.py --continue-current
python3 executor.py --action '{"action":"call"}'
python3 executor.py --action '{"action":"collect_winnings"}'
python3 executor.py --action '{"action":"request_human","reason":"card was dropped"}'
executor.py
写入
02_action.md
后,创建下一个状态并捕获下一个观测结果,除非路径为
human_pause
或动作是
stop
bash
python3 state.py current
python3 state.py begin-next --after sN
python3 capture.py --output s_current/00_capture.jpg
然后从视觉解析再次开始循环。下一张图像用于验证上次等待、重试、机器人动作或人类帮助请求后实际发生的情况。