harness-step3-session-management


# Harness Step 3: Establish Cross-Session State Management

## Objectives

Create three files so that an agent can resume its working state within 30 seconds at the start of any new session:

- `init.sh`: environment initialization script that verifies the project starts normally
- `tasks.json`: the current task list, the agent's source of work instructions
- `progress.md`: a human-readable progress summary recording the key information from each session

Core principle: state is carried by files, not by the agent's memory. git log is the primary record; these three files are supplementary.
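The resume flow above can be sketched as a small helper (hypothetical; not part of the harness itself) that assembles the three files plus the recent git log into one context string an agent reads at session start:

```python
import json
import subprocess
from pathlib import Path

def load_session_state(repo: str = ".") -> str:
    """Assemble resume context from files, not from agent memory."""
    root = Path(repo)
    parts = []

    # git log is the primary record
    try:
        log = subprocess.run(
            ["git", "log", "--oneline", "-10"],
            cwd=root, capture_output=True, text=True,
        )
        parts.append("## Recent commits\n" + log.stdout)
    except FileNotFoundError:
        pass  # git not installed; fall back to the files alone

    # tasks.json: the agent's source of work instructions
    tasks_file = root / "tasks.json"
    if tasks_file.exists():
        tasks = json.loads(tasks_file.read_text())
        parts.append("## Current focus\n" + tasks.get("current_focus", ""))

    # progress.md: human-readable summary, newest entry first
    progress = root / "progress.md"
    if progress.exists():
        head = "\n".join(progress.read_text().splitlines()[:20])
        parts.append("## Progress\n" + head)

    return "\n\n".join(parts)
```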

## Implementation Steps

### Step 1: Scan Project Startup Methods

Before writing `init.sh`, first confirm how the project starts and is tested:

```bash
# Read the scripts in package.json (Node.js projects)
cat package.json 2>/dev/null | grep -A 20 '"scripts"'

# Or read the Makefile (multi-language projects)
cat Makefile 2>/dev/null | head -40

# Or read pyproject.toml (Python projects); -F matches the brackets literally
cat pyproject.toml 2>/dev/null | grep -F -A 20 '[tool.poetry.scripts]'

# Confirm the startup commands in the existing AGENTS.md; -E enables alternation
grep -E -A 5 '启动命令|start|dev|run' AGENTS.md 2>/dev/null
```

Collect:

- The development server startup command
- The test command
- Type checking / lint commands (if any)
- Any initialization steps that must run first (such as database migrations)
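The scan above can also be automated. A minimal sketch (hypothetical helper, not part of the harness spec) that picks the likely stack by checking which manifest exists, in the same priority order:

```python
from pathlib import Path

# Manifest files checked in priority order, mirroring the scan above
MANIFESTS = [
    ("package.json", "Node.js"),
    ("Makefile", "multi-language"),
    ("pyproject.toml", "Python"),
]

def detect_stack(repo: str = ".") -> str:
    """Return the first matching stack name, or 'unknown'."""
    root = Path(repo)
    for fname, stack in MANIFESTS:
        if (root / fname).exists():
            return stack
    return "unknown"
```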

---

### Step 2: Create `init.sh`

The role of `init.sh`: run it at the start of each session to quickly verify the environment is healthy; if it is not, fix it immediately before continuing.

```bash
#!/bin/bash
# init.sh — run at the start of each session
# Verifies that the development environment is in a working state

set -e  # stop if any step fails
echo "=== Checking environment ==="

# 1. Confirm we are in the correct directory
echo "Working directory: $(pwd)"

# 2. Install dependencies (if node_modules does not exist)
# [Choose based on the tech stack; the Node.js case is shown]
if [ ! -d "node_modules" ]; then
  echo "Installing dependencies..."
  npm install
fi

# 3. Smoke test: verify the project starts normally
# [Write this for the actual project; the goal is the fastest possible check that the basics work]
# Example: run the quickest test
npm run test -- --testPathPattern=smoke 2>/dev/null || echo "Warning: smoke test failed, fix it before continuing"

echo "=== Environment check complete, ready to work ==="
echo "Tip: run 'git log --oneline -10' to view recent work history"
```

**Writing requirements**:
- Fill in the actual startup commands found during the scan; do not leave example comments in place
- The smoke test must be fast (< 30 seconds); its purpose is to surface environment problems quickly, not to run the full test suite
- If the project has a database, add a step that checks the database connection
- After writing the script, actually run it once to confirm it exits cleanly: `bash init.sh`
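The under-30-seconds requirement can be enforced from the outside. A minimal sketch (assuming `init.sh` sits in the repo root) that runs the script with a hard time budget:

```python
import subprocess

def run_init(path: str = "init.sh", timeout: int = 30) -> bool:
    """Run init.sh; fail on a nonzero exit or on blowing the time budget."""
    try:
        result = subprocess.run(["bash", path], timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"init.sh exceeded {timeout}s; trim the smoke test")
        return False
    return result.returncode == 0
```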

---

### Step 3: Create `tasks.json`

Structure design:

```json
{
  "project": "[project name]",
  "last_updated": "[today's date, format YYYY-MM-DD]",
  "current_focus": "[the single most important thing right now, one sentence]",
  "tasks": [
    {
      "id": "[module abbreviation]-[serial number]",
      "title": "[task title]",
      "description": "[what to do specifically, 1-3 sentences]",
      "status": "pending | in_progress | done | blocked",
      "priority": "high | medium | low",
      "blocked_by": "[blocking reason, only when status is blocked]",
      "verify": "[how to verify this task is complete]",
      "requires_eval": false
    }
  ]
}
```

Field reference (every field must be filled in when a task is added; none may be omitted):

| Field | Required | Explanation |
| --- | --- | --- |
| `id` | Required | Module abbreviation + serial number, e.g. `auth-01`, `ui-03`; short and readable |
| `title` | Required | Task title, one sentence |
| `description` | Required | What to do specifically, 1-3 sentences |
| `status` | Required | Initial value is `pending`; updated by the agent while working |
| `priority` | Required | `high` / `medium` / `low` |
| `blocked_by` | Only when blocked | The blocking reason |
| `verify` | Required | How to verify completion; must be executable steps (commands or operations) |
| `requires_eval` | Required | Whether independent Evaluator review is needed; defaults to `false`, see the judgment criteria |

`requires_eval` judgment criteria (must be checked against every new task; never fill in `false` without thinking):

Set to `true` if any one of the following holds:

- The task is a new feature (not just a bug fix or a configuration change)
- It touches security, permissions, or data-validation logic
- It is expected to modify more than 3 files
- The task description mentions "refactoring" or "architecture adjustment"

Set to `false` only if all of the following hold:

- Pure bug fix with a clearly bounded change
- Documentation updates or comment additions
- Configuration adjustments or environment-variable changes
- Unit-test additions

How to determine the initial task list. Extract from these sources, in priority order:

1. Plan files in `docs/exec-plans/active/` (if any)
2. High-priority items in `docs/exec-plans/tech-debt-tracker.md`
3. TODOs or the roadmap mentioned in the README
4. Ask the user: "What are the 3-5 tasks you most want to push forward right now?"

Writing requirements:

- The `verify` field must contain executable steps; never filler like "confirm the feature works"
- Task granularity: a task should be completable in 1-2 hours; split anything larger
- Initial status: every task starts as `pending` and is updated by the agent while working

Ask the user (if the tasks cannot be inferred from existing documents):

I have scanned the project and am ready to create the task list. Please tell me: what are the 3-5 tasks you most want to push forward right now? One sentence per task is enough.
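The field rules above are easy to enforce mechanically. A sketch of a validator (field names and enums taken from the schema above; the CLI wiring is left out):

```python
import json

REQUIRED = ["id", "title", "description", "status", "priority", "verify", "requires_eval"]
STATUSES = {"pending", "in_progress", "done", "blocked"}
PRIORITIES = {"high", "medium", "low"}

def validate_tasks(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    data = json.loads(raw)
    for task in data.get("tasks", []):
        tid = task.get("id", "<missing id>")
        for field in REQUIRED:
            if field not in task:
                problems.append(f"{tid}: missing field '{field}'")
        if task.get("status") not in STATUSES:
            problems.append(f"{tid}: invalid status {task.get('status')!r}")
        if task.get("priority") not in PRIORITIES:
            problems.append(f"{tid}: invalid priority {task.get('priority')!r}")
        # blocked tasks must state why they are blocked
        if task.get("status") == "blocked" and not task.get("blocked_by"):
            problems.append(f"{tid}: blocked task needs 'blocked_by'")
    return problems
```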

### Step 4: Create `progress.md`

Initial content:

```markdown
# Project Progress Record

After completing a task in a session, append a record at the top. Do not delete history.
Format: ## [Date] [Task Name]

## [Today's Date] Initialize Harness

- Completed harness-step1: established the docs/ skeleton
- Completed harness-step2: filled in the knowledge base content
- Completed harness-step3: established state management
- Initial task count in tasks.json: [N]
- Next session starts here: read tasks.json and pick a task with priority=high and status=pending
```
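"Append at the top, never delete history" can be done safely with a small helper. A sketch, assuming the `## [Date] [Task Name]` entry format above:

```python
from datetime import date
from pathlib import Path

def prepend_entry(path: str, task_name: str, bullets: list[str]) -> None:
    """Insert a new ## entry above older ones without touching history."""
    lines = Path(path).read_text().splitlines()
    entry = [f"## {date.today().isoformat()} {task_name}", ""]
    entry += [f"- {b}" for b in bullets]
    entry.append("")
    # Insert above the first existing entry so the file header stays intact
    idx = next((i for i, line in enumerate(lines) if line.startswith("## ")), len(lines))
    lines[idx:idx] = entry
    Path(path).write_text("\n".join(lines) + "\n")
```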

---

### Step 4b: Update `AGENTS.md` to Include the Task Management Rules

Find the "after completing each task" section that step2 wrote into `AGENTS.md` and replace it with the following:

```markdown
When adding a task, you must:

1. Fill in every field in tasks.json; none may be omitted
2. Judge `requires_eval` against the criteria below; never default to false without thinking:
   - New feature / touches security or permissions / changes more than 3 files / refactoring → true
   - Pure bug fix / documentation update / configuration adjustment → false

After completing a task, you must execute these steps in order:

1. Run the verification steps described in the task's `verify` field in tasks.json
2. If the task's `requires_eval` is `true`: fill in `sprint_output.md` and wait for Evaluator approval before marking it `done`.
   If the task's `requires_eval` is `false`: mark it `done` as soon as verification passes
3. git commit, format: `type(scope): what was done, plus any leftovers (if applicable)`
4. Append this session's record at the top of `progress.md`

Prohibited: skipping the verify step and deciding on your own that the task is complete.
Prohibited: setting `requires_eval` to false without applying the criteria.
```
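The true/false criteria above can also be expressed as a first-pass heuristic. The keyword list below is illustrative, not part of the harness spec, and the agent still makes the final call:

```python
# Illustrative signals; any single trigger flips the suggestion to True
TRUE_SIGNALS = ("new feature", "refactor", "architecture",
                "security", "permission", "validation")

def suggest_requires_eval(description: str, files_touched: int = 0) -> bool:
    """First-pass suggestion for requires_eval; the agent makes the final call."""
    if files_touched > 3:
        return True
    text = description.lower()
    return any(signal in text for signal in TRUE_SIGNALS)
```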

---

### Step 5: Verify Everything Works Together

After the three files are created, simulate a complete session startup to verify they connect properly:

```bash
# Simulate the sequence an agent runs at the start of a new session
echo "=== Simulating new session startup ==="

# 1. Run init.sh
bash init.sh

# 2. Check the git log
git log --oneline -10

# 3. Read progress.md (confirm the file exists and is readable)
head -20 progress.md

# 4. Read tasks.json (confirm the format is valid)
python3 -m json.tool tasks.json > /dev/null && echo "tasks.json is valid" || echo "tasks.json is malformed"
```

Only when every step passes is this step complete.

---

## Quality Check

- Does `init.sh` actually run without errors?
- Is `tasks.json` valid JSON? Does every task have `verify` and `requires_eval` fields?
- Was each task's `requires_eval` set against the judgment criteria rather than defaulted to false?
- Does `progress.md` contain the initial record?
- Does `AGENTS.md` contain both rules: one for adding tasks and one for completing them?

## Inform the User After Completion

Output a summary:

**Created files**

- `init.sh`: [describe what it checks]
- `tasks.json`: [N] tasks, of which [N] require Evaluator review
- `progress.md`: initialized

**How to use it**

You can now hand the project to Claude Code. Each time it starts, it automatically reads these three files plus the git log and resumes its working state. You no longer need to explain "where we left off last time."

**What you need to do**

- Check that the task list in `tasks.json` matches your expectations; add or remove tasks by hand as needed
- Confirm that the `requires_eval` judgments are reasonable
- If any step in `init.sh` fails, tell me and I will fix it

**Next steps**

- The Harness foundation is complete (step1 + step2 + step3)
- You can start developing with Claude Code in earnest
- If the agent repeatedly violates code conventions, run `harness-step4-linter` to turn the rules into mechanical constraints
- If the agent's self-assessment proves untrustworthy, run `harness-step5-evaluator` to introduce independent review