codex-readiness-integration-test
LLM Codex Readiness Integration Test
This skill runs a multi-stage integration test to validate agentic execution quality. It always runs in execute mode (no read-only mode).
Outputs
Each run writes to `.codex-readiness-integration-test/<timestamp>/` and updates `.codex-readiness-integration-test/latest.json`.
New outputs per run:
- `agentic_summary.json` and `logs/agentic.log` (agentic loop execution)
- `llm_results.json` (automatic LLM evaluation)
- `summary.txt` (human-readable summary)
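As a quick sanity check after a run, the per-run outputs listed above can be verified with a short script. This is a minimal sketch, assuming the files sit directly under the timestamped run directory. The helper name `missing_outputs` is hypothetical and is not part of the skill.

```python
from pathlib import Path

# File names come from the output list above. Their exact location inside
# the run directory is an assumption of this sketch.
EXPECTED_OUTPUTS = (
    "agentic_summary.json",
    "logs/agentic.log",
    "llm_results.json",
    "summary.txt",
)


def missing_outputs(run_dir: Path) -> list[str]:
    """Return the expected output files that are absent from run_dir."""
    return [name for name in EXPECTED_OUTPUTS if not (run_dir / name).is_file()]
```

An empty list means every expected file was found. Anything else names the artifacts to look for before trusting the run.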
Pre-conditions (Required)
- Authenticate with the Codex CLI using the repo-local HOME before running the test. Run these in your own terminal (not via the integration test):
  `HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login`
  `HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login status`
- The integration test creates `{repo_root}/.codex-home` and `{repo_root}/.codex-home/.cache/codex` as its first step.
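When scripting around the test, the same repo-local environment can be built in Python. The sketch below mirrors the two variables from the login commands above. The function name `codex_env` is hypothetical, and nothing here claims to match how the skill's own scripts construct their environment.

```python
import os
from pathlib import Path


def codex_env(repo_root: Path) -> dict[str, str]:
    """Copy the current environment and point HOME and XDG_CACHE_HOME
    at the repo-local .codex-home directory."""
    home = repo_root / ".codex-home"
    env = dict(os.environ)
    env["HOME"] = str(home)
    env["XDG_CACHE_HOME"] = str(home / ".cache")
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run(["codex", "login", "status"], ...)` to check authentication against the repo-local HOME.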
Workflow
- Ask the user how to source the task.
  - Offer two explicit options: (a) user provides a custom task/prompt, or (b) auto-generate a task.
  - Do not run the entry point until the user chooses one option.
- Generate or load `{out_dir}/prompt.pending.json`.
  - Use the integration test's expected prompt path, not `prompt.json` at the repo root.
  - With the default out dir, this path is `.codex-readiness-integration-test/prompt.pending.json`.
  - If `--seed-task` is provided, it is used as the starting task.
  - If not provided, generate a task with `skills/codex-readiness-integration-test/references/generate_prompt.md` and save the JSON to `{out_dir}/prompt.pending.json`.
  - The user must approve the prompt before execution (no auto-approve mode). Make sure to output a summary of the prompt when asking the user to approve.
- Execute the agentic loop via Codex CLI (uses `AGENTS.md` and `change_prompt`).
- Run build/test commands from the prompt plan via `skills/codex-readiness-integration-test/scripts/run_plan.py`.
- Collect evidence (`evidence.json`), deterministic checks, and run automatic LLM evals via Codex CLI.
- Score and write the report + summary output.
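The prompt-path and approval steps of the workflow above can be sketched as follows. The helper names are hypothetical, the prompt is treated as an arbitrary JSON object because its schema is not described here, and the accepted answers (`y`/`yes`) are an assumption of this sketch.

```python
import json
from pathlib import Path


def pending_prompt_path(out_dir: Path) -> Path:
    """Path the integration test expects, as opposed to prompt.json at the repo root."""
    return out_dir / "prompt.pending.json"


def summarize_prompt(prompt: dict, width: int = 80) -> str:
    """One line per top-level field, truncated so the user can review it quickly."""
    lines = []
    for key, value in prompt.items():
        text = value if isinstance(value, str) else json.dumps(value)
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{key}: {text}")
    return "\n".join(lines)


def is_approved(answer: str) -> bool:
    """Only an explicit yes counts, since there is no auto-approve mode."""
    return answer.strip().lower() in {"y", "yes"}
```

A caller would load the JSON from `pending_prompt_path(out_dir)`, print `summarize_prompt(...)`, and proceed only if `is_approved(input(...))` returns `True`.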
Configuration
Optional fields in `{out_dir}/prompt.pending.json`:
- `agentic_loop`: configure Codex CLI invocation for the agentic loop.
- `llm_eval`: configure Codex CLI invocation for automatic evals.
If these fields are omitted, defaults are used.
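The fallback-to-defaults behaviour can be illustrated with a small merge helper. This is only a sketch: the inner `command` key and the default values below are hypothetical placeholders, since the real schema of `agentic_loop` and `llm_eval` is not documented in this section.

```python
# Hypothetical defaults. The real default values and inner keys are not
# documented in this section.
DEFAULTS = {
    "agentic_loop": {"command": ["codex"]},
    "llm_eval": {"command": ["codex"]},
}


def with_defaults(prompt: dict) -> dict:
    """Return a copy of prompt in which omitted optional sections,
    or omitted keys inside them, fall back to DEFAULTS."""
    merged = dict(prompt)
    for section, default in DEFAULTS.items():
        merged[section] = {**default, **prompt.get(section, {})}
    return merged
```

Values supplied in the prompt file win over the defaults, and the input dictionary is left unmodified.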
Requirements
- The LLM evaluator must fail if evidence mentions the phrase `Context compaction enabled`.
- Use qualitative context-usage evaluation (no strict thresholds).
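The phrase rule above lends itself to a deterministic pre-check alongside the LLM evaluation. The sketch below assumes an exact, case-sensitive substring match over the evidence text. The function name is hypothetical, and the skill itself may apply the rule differently.

```python
FORBIDDEN_PHRASE = "Context compaction enabled"


def evidence_mentions_compaction(evidence_text: str) -> bool:
    """True if the evidence contains the phrase that must fail the evaluation.

    Assumes an exact, case-sensitive match.
    """
    return FORBIDDEN_PHRASE in evidence_text
```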
What this test covers well
- Runs Codex CLI against the real repo root, producing real filesystem edits and git diffs.
- Executes the approved change prompt and then runs the build/test plan in-repo.
- Captures evidence, deterministic checks, and LLM eval artifacts for review.
What this test does not represent
- The agentic loop may use non-default flags (e.g., bypass approvals/sandbox), so interactive guardrails differ.
- Uses a dedicated HOME (`.codex-home`), which can change auth/config/cache vs normal CLI use.
- Auto-generated prompts and one-shot execution do not simulate interactive guidance.
- MCP servers/tools are not exercised unless explicitly configured.
Notes
- The prompts in `skills/codex-readiness-integration-test/references/` expect strict JSON.
- Use `skills/codex-readiness-integration-test/references/json_fix.md` to repair invalid JSON output.
- This skill calls the `codex` CLI. Ensure it is installed and available on PATH, or override the command in `{out_dir}/prompt.pending.json`.
- If the agentic loop detects sandbox-blocked tool access, it now writes `requires_escalation: true` to `{run_dir}/agentic_summary.json` and exits with code `3`. Re-run the integration test with escalated permissions in that case.
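A wrapper script could detect the escalation case from the last note roughly as shown below. This sketch assumes `requires_escalation` is a top-level boolean in `agentic_summary.json`. The function name is hypothetical.

```python
import json

ESCALATION_EXIT_CODE = 3  # exit code named in the note above


def needs_escalation(exit_code: int, summary_text: str) -> bool:
    """True when the run exited with code 3 and the summary sets
    requires_escalation to true (assumed to be a top-level key)."""
    if exit_code != ESCALATION_EXIT_CODE:
        return False
    try:
        summary = json.loads(summary_text)
    except json.JSONDecodeError:
        # An unparsable summary is treated as "no escalation signal".
        return False
    return isinstance(summary, dict) and summary.get("requires_escalation") is True
```

Requiring both signals keeps an unrelated failure that happens to exit with code 3 from being mistaken for a sandbox block.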