codex-readiness-integration-test

LLM Codex Readiness Integration Test

This skill runs a multi-stage integration test to validate agentic execution quality. It always runs in execute mode (no read-only mode).

Outputs

Each run writes to `.codex-readiness-integration-test/<timestamp>/` and updates `.codex-readiness-integration-test/latest.json`.

New outputs per run:
- `agentic_summary.json` and `logs/agentic.log` (agentic loop execution)
- `llm_results.json` (automatic LLM evaluation)
- `summary.txt` (human-readable summary)
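
A quick way to confirm that a run produced the artifacts listed above is to check the run directory for them. The sketch below assumes the four files sit at the listed relative paths directly under the timestamped run directory; the helper name `missing_artifacts` is illustrative and not part of the skill.

```python
from pathlib import Path

# Relative paths taken from the list above. Their location directly under
# the run directory is an assumption.
EXPECTED_ARTIFACTS = (
    "agentic_summary.json",
    "logs/agentic.log",
    "llm_results.json",
    "summary.txt",
)


def missing_artifacts(run_dir):
    """Return the expected artifacts that are absent from run_dir."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]
```

An empty list means every expected file is present; anything else names the artifacts to investigate before trusting the run.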

Pre-conditions (Required)

Authenticate with the Codex CLI using the repo-local HOME before running the test. Run these in your own terminal (not via the integration test):
- `HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login`
- `HOME=$PWD/.codex-home XDG_CACHE_HOME=$PWD/.codex-home/.cache codex login status`

The integration test creates `{repo_root}/.codex-home` and `{repo_root}/.codex-home/.cache/codex` as its first step.
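
To make the repo-local environment concrete, here is a minimal Python sketch that builds the same `HOME` and `XDG_CACHE_HOME` values and runs `codex login status` with them. The function names are hypothetical, and treating a zero exit code as "authenticated" is an assumption rather than documented CLI behavior.

```python
import os
import subprocess
from pathlib import Path


def codex_env(repo_root):
    """Copy the current environment, pointing HOME and XDG_CACHE_HOME at the repo-local dirs."""
    home = Path(repo_root) / ".codex-home"
    env = dict(os.environ)
    env["HOME"] = str(home)
    env["XDG_CACHE_HOME"] = str(home / ".cache")
    return env


def login_status_ok(repo_root):
    """Run `codex login status` with the repo-local HOME.

    Assumption: exit code 0 means an authenticated session exists.
    Raises FileNotFoundError if `codex` is not on PATH.
    """
    result = subprocess.run(
        ["codex", "login", "status"],
        env=codex_env(repo_root),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

The login itself is interactive and should still be done by hand in your own terminal; a helper like this is only useful as a pre-flight check.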

Workflow

1. Ask the user how to source the task.
   - Offer two explicit options: (a) user provides a custom task/prompt, or (b) auto-generate a task.
   - Do not run the entry point until the user chooses one option.
2. Generate or load `{out_dir}/prompt.pending.json`.
   - Use the integration test's expected prompt path, not `prompt.json` at the repo root.
   - With the default out dir, this path is `.codex-readiness-integration-test/prompt.pending.json`.
   - If `--seed-task` is provided, it is used as the starting task.
   - If not provided, generate a task with `skills/codex-readiness-integration-test/references/generate_prompt.md` and save the JSON to `{out_dir}/prompt.pending.json`.
   - The user must approve the prompt before execution (no auto-approve mode). Make sure to output a summary of the prompt when asking the user to approve.
3. Execute the agentic loop via Codex CLI (uses `AGENTS.md` and `change_prompt`).
4. Run build/test commands from the prompt plan via `skills/codex-readiness-integration-test/scripts/run_plan.py`.
5. Collect evidence (`evidence.json`), deterministic checks, and run automatic LLM evals via Codex CLI.
6. Score and write the report + summary output.
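
The prompt-path and approval rules above can be sketched as two small helpers. This is an illustration only: the helper names are hypothetical, and because the prompt schema is not specified here, the summary simply lists top-level keys with a truncated preview of each value.

```python
import json
from pathlib import Path

# Default out dir as implied by the default prompt path given above.
DEFAULT_OUT_DIR = ".codex-readiness-integration-test"


def pending_prompt_path(out_dir=DEFAULT_OUT_DIR):
    """Location of the pending prompt inside the out dir (not the repo root)."""
    return Path(out_dir) / "prompt.pending.json"


def approval_summary(prompt):
    """Render one line per top-level key so the user can review before approving."""
    lines = []
    for key, value in prompt.items():
        preview = json.dumps(value, ensure_ascii=False)
        if len(preview) > 80:
            # Keep previews short; 77 characters plus an ellipsis marker.
            preview = preview[:77] + "..."
        lines.append(f"{key}: {preview}")
    return "\n".join(lines)
```

Printing `approval_summary(...)` before asking for confirmation keeps the approval step explicit, which matters because there is no auto-approve mode.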

Configuration

Optional fields in `{out_dir}/prompt.pending.json`:
- `agentic_loop`: configure Codex CLI invocation for the agentic loop.
- `llm_eval`: configure Codex CLI invocation for automatic evals.

If these fields are omitted, defaults are used.
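
One plausible way to apply "defaults unless overridden" is to overlay each optional section on a default mapping. The sketch below is an assumption throughout: the `command` key and the default values are hypothetical placeholders, since the actual keys inside `agentic_loop` and `llm_eval` are not documented here.

```python
# Hypothetical defaults: the real default values and the key names inside
# each section are not documented in this skill description.
DEFAULTS = {
    "agentic_loop": {"command": "codex"},
    "llm_eval": {"command": "codex"},
}


def resolve_invocation(prompt, section):
    """Overlay an optional section from the prompt on top of its defaults."""
    if section not in DEFAULTS:
        raise KeyError(f"unknown section: {section}")
    # A missing or null section falls back to the defaults untouched.
    overrides = prompt.get(section) or {}
    return {**DEFAULTS[section], **overrides}
```

With this shape, a prompt file that omits both sections resolves to the defaults, and a file that sets only one key changes only that key.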

Requirements

- The LLM evaluator must fail if evidence mentions the phrase `Context compaction enabled`.
- Use qualitative context-usage evaluation (no strict thresholds).
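
The phrase rule is simple enough to mirror with a deterministic pre-check. The sketch below is not the LLM evaluator itself; it is a hypothetical helper that scans evidence for the phrase, and it assumes exact, case-sensitive matching.

```python
import json

FORBIDDEN_PHRASE = "Context compaction enabled"


def mentions_compaction(evidence):
    """Return True if the phrase appears anywhere in the evidence.

    Accepts raw text or parsed JSON (dict/list). Parsed JSON is serialized
    first so that nested string values are searched as well.
    """
    if isinstance(evidence, str):
        text = evidence
    else:
        text = json.dumps(evidence, ensure_ascii=False)
    return FORBIDDEN_PHRASE in text
```

A check like this can flag an obvious failure before any LLM call is made, while the qualitative context-usage judgment stays with the evaluator.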

What this test covers well

- Runs Codex CLI against the real repo root, producing real filesystem edits and git diffs.
- Executes the approved change prompt and then runs the build/test plan in-repo.
- Captures evidence, deterministic checks, and LLM eval artifacts for review.

What this test does not represent

- The agentic loop may use non-default flags (e.g., bypass approvals/sandbox), so interactive guardrails differ.
- Uses a dedicated HOME (`.codex-home`), which can change auth/config/cache vs normal CLI use.
- Auto-generated prompts and one-shot execution do not simulate interactive guidance.
- MCP servers/tools are not exercised unless explicitly configured.

Notes

- The prompts in `skills/codex-readiness-integration-test/references/` expect strict JSON.
- Use `skills/codex-readiness-integration-test/references/json_fix.md` to repair invalid JSON output.
- This skill calls the `codex` CLI. Ensure it is installed and available on PATH, or override the command in `{out_dir}/prompt.pending.json`.
- If the agentic loop detects sandbox-blocked tool access, it now writes `requires_escalation: true` to `{run_dir}/agentic_summary.json` and exits with code `3`. Re-run the integration test with escalated permissions in that case.
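
A caller that wraps the integration test can detect the escalation signal from either the exit code or the summary file. The following is a minimal sketch under the assumption that `requires_escalation` is a top-level boolean in `agentic_summary.json`; the function name is hypothetical.

```python
import json
from pathlib import Path

ESCALATION_EXIT_CODE = 3


def needs_escalation(run_dir, exit_code):
    """True when the run signalled sandbox-blocked tool access.

    Checks both signals: exit code 3, or `requires_escalation: true`
    in agentic_summary.json (assumed to be a top-level key).
    """
    if exit_code == ESCALATION_EXIT_CODE:
        return True
    summary_path = Path(run_dir) / "agentic_summary.json"
    if not summary_path.is_file():
        return False
    try:
        summary = json.loads(summary_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # An unreadable summary is treated as "no escalation signal".
        return False
    return isinstance(summary, dict) and summary.get("requires_escalation") is True
```

When this returns `True`, the appropriate response is to re-run with escalated permissions rather than to retry unchanged.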
