codex-readiness-integration-test

LLM Codex Readiness Integration Test

This skill runs a multi-stage integration test to validate agentic execution quality. It always runs in execute mode (no read-only mode).

Outputs

Each run writes to .codex-readiness-integration-test/<timestamp>/ and updates .codex-readiness-integration-test/latest.json.
New outputs per run:
  • agentic_summary.json and logs/agentic.log (agentic loop execution)
  • llm_results.json (automatic LLM evaluation)
  • summary.txt (human-readable summary)
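
As a rough illustration of the layout above, the following sketch reports which of the per-run artifacts are missing from a given run directory. The helper name missing_outputs is hypothetical; only the file names come from the list above.

```python
from pathlib import Path

# File names taken from the list of per-run outputs,
# relative to .codex-readiness-integration-test/<timestamp>/.
EXPECTED_OUTPUTS = (
    "agentic_summary.json",
    "logs/agentic.log",
    "llm_results.json",
    "summary.txt",
)


def missing_outputs(run_dir):
    """Return the expected per-run artifacts that are absent from run_dir."""
    root = Path(run_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```

An empty list would suggest the run produced every artifact named in this section; it says nothing about their contents.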

Pre-conditions (Required)

  • Authenticate with the Codex CLI using the repo-local HOME before running the test. Run these in your own terminal (not via the integration test):
    HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login
    HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login status
  • The integration test creates {repo_root}/.codex-home and {repo_root}/.codex-home/.cache/codex as its first step.
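
For scripts that need the same repo-local environment, a minimal sketch might look like the following. The function name codex_env is hypothetical; the directory names and the two environment variables are the ones listed above.

```python
import os
from pathlib import Path


def codex_env(repo_root, base_env=None):
    """Build an environment pointing HOME and XDG_CACHE_HOME at the repo-local dirs."""
    home = Path(repo_root) / ".codex-home"
    cache = home / ".cache"
    # Mirror the directories the integration test creates as its first step.
    (cache / "codex").mkdir(parents=True, exist_ok=True)
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = str(home)
    env["XDG_CACHE_HOME"] = str(cache)
    return env
```

The returned mapping could then be passed as env= to subprocess.run when invoking codex login status, which should behave like the shell commands above.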

Workflow

  1. Ask the user how to source the task.
    • Offer two explicit options: (a) user provides a custom task/prompt, or (b) auto-generate a task.
    • Do not run the entry point until the user chooses one option.
  2. Generate or load {out_dir}/prompt.pending.json.
    • Use the integration test's expected prompt path, not prompt.json at the repo root.
    • With the default out dir, this path is .codex-readiness-integration-test/prompt.pending.json.
    • If --seed-task is provided, it is used as the starting task.
    • If not provided, generate a task with skills/codex-readiness-integration-test/references/generate_prompt.md and save the JSON to {out_dir}/prompt.pending.json.
    • The user must approve the prompt before execution (no auto-approve mode). Make sure to output a summary of the prompt when asking the user to approve.
  3. Execute the agentic loop via Codex CLI (uses AGENTS.md and change_prompt).
  4. Run build/test commands from the prompt plan via skills/codex-readiness-integration-test/scripts/run_plan.py.
  5. Collect evidence (evidence.json) and deterministic checks, then run automatic LLM evals via Codex CLI.
  6. Score the run and write the report and summary output.
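
Step 2 of the workflow can be sketched as two small helpers: one that resolves the pending prompt path and one that decides where the starting task comes from. Both function names are hypothetical, and the sketch does not reproduce the entry point's actual logic; the paths and the --seed-task behaviour are taken from the steps above.

```python
from pathlib import Path

DEFAULT_OUT_DIR = ".codex-readiness-integration-test"
GENERATE_PROMPT = "skills/codex-readiness-integration-test/references/generate_prompt.md"


def pending_prompt_path(out_dir=DEFAULT_OUT_DIR):
    """Return the path the integration test expects (not prompt.json at the repo root)."""
    return Path(out_dir) / "prompt.pending.json"


def task_source(seed_task=None):
    """Use the seed task when one is given; otherwise fall back to generation."""
    if seed_task:
        return ("seed", seed_task)
    return ("generate", GENERATE_PROMPT)
```

Either way, the resulting prompt still needs explicit user approval before the agentic loop runs.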

Configuration

Optional fields in {out_dir}/prompt.pending.json:
  • agentic_loop: configure Codex CLI invocation for the agentic loop.
  • llm_eval: configure Codex CLI invocation for automatic evals.
If these fields are omitted, defaults are used.
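
The fallback to defaults can be pictured as a shallow merge, sketched below. The sub-key command and the default values are assumptions made for illustration; this section does not document the actual schema of agentic_loop or llm_eval.

```python
import json

# Hypothetical defaults: the real sub-keys and values are not documented here.
DEFAULTS = {
    "agentic_loop": {"command": "codex"},
    "llm_eval": {"command": "codex"},
}


def load_invocation_config(prompt_text):
    """Overlay the optional agentic_loop / llm_eval fields on the defaults."""
    prompt = json.loads(prompt_text)
    return {
        key: {**default, **prompt.get(key, {})}
        for key, default in DEFAULTS.items()
    }
```

With this shape, a prompt file that omits both fields yields the defaults unchanged, and a file that sets only one field leaves the other at its default.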

Requirements

  • The LLM evaluator must fail if evidence mentions the phrase Context compaction enabled.
  • Use qualitative context-usage evaluation (no strict thresholds).
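
The first requirement is placed on the LLM evaluator, but the same condition is easy to check deterministically as a cross-check. The sketch below is one possible way to do that, not the evaluator's own implementation; it assumes a case-sensitive match and that the evidence is either a string or JSON-serializable data.

```python
import json

FORBIDDEN_PHRASE = "Context compaction enabled"


def mentions_compaction(evidence):
    """Return True if the forbidden phrase appears anywhere in the evidence."""
    if isinstance(evidence, str):
        text = evidence
    else:
        # Serialize nested evidence so the phrase is found at any depth.
        text = json.dumps(evidence, ensure_ascii=False)
    return FORBIDDEN_PHRASE in text
```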

What this test covers well

  • Runs Codex CLI against the real repo root, producing real filesystem edits and git diffs.
  • Executes the approved change prompt and then runs the build/test plan in-repo.
  • Captures evidence, deterministic checks, and LLM eval artifacts for review.

What this test does not represent

  • The agentic loop may use non-default flags (e.g., bypass approvals/sandbox), so interactive guardrails differ.
  • Uses a dedicated HOME (.codex-home), which can change auth/config/cache vs normal CLI use.
  • Auto-generated prompts and one-shot execution do not simulate interactive guidance.
  • MCP servers/tools are not exercised unless explicitly configured.

Notes

  • The prompts in skills/codex-readiness-integration-test/references/ expect strict JSON.
  • Use skills/codex-readiness-integration-test/references/json_fix.md to repair invalid JSON output.
  • This skill calls the codex CLI. Ensure it is installed and available on PATH, or override the command in {out_dir}/prompt.pending.json.
  • If the agentic loop detects sandbox-blocked tool access, it now writes requires_escalation: true to {run_dir}/agentic_summary.json and exits with code 3. Re-run the integration test with escalated permissions in that case.
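
A caller wrapping the integration test could detect the escalation case roughly as follows. The helper name needs_escalation is hypothetical, and the sketch assumes requires_escalation is a top-level key of agentic_summary.json, which the note above implies but does not state outright.

```python
import json
from pathlib import Path

ESCALATION_EXIT_CODE = 3


def needs_escalation(run_dir, exit_code):
    """Return True when the run reported sandbox-blocked tool access."""
    if exit_code != ESCALATION_EXIT_CODE:
        return False
    summary_path = Path(run_dir) / "agentic_summary.json"
    if not summary_path.is_file():
        return False
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    # Assumed to be a top-level boolean in the summary file.
    return summary.get("requires_escalation") is True
```

Requiring both the exit code and the flag keeps an unrelated failure that happens to exit with code 3 from being treated as an escalation request.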