harness-step3-session-management


# Harness Step 3: Establish Cross-Session State Management

## Objectives

Create three files so that an agent can resume its working state within 30 seconds at the start of any new session:

- `init.sh`: environment initialization script that verifies the project starts normally
- `tasks.json`: the current task list, the agent's source of work instructions
- `progress.md`: a human-readable progress summary recording the key information from each session

Core principle: state is carried by files, not by the agent's memory. git log is the primary record; these three files are supplementary.
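The resume flow above can be sketched as a small helper (hypothetical; not part of the harness itself) that assembles the three files plus the recent git log into one context string an agent reads at session start:

```python
import json
import subprocess
from pathlib import Path

def load_session_state(repo: str = ".") -> str:
    """Assemble resume context from files, not from agent memory."""
    root = Path(repo)
    parts = []

    # git log is the primary record
    try:
        log = subprocess.run(
            ["git", "log", "--oneline", "-10"],
            cwd=root, capture_output=True, text=True,
        )
        parts.append("## Recent commits\n" + log.stdout)
    except FileNotFoundError:
        pass  # git not installed; fall back to the files alone

    # tasks.json: the agent's source of work instructions
    tasks_file = root / "tasks.json"
    if tasks_file.exists():
        tasks = json.loads(tasks_file.read_text())
        parts.append("## Current focus\n" + tasks.get("current_focus", ""))

    # progress.md: human-readable summary, newest entry first
    progress = root / "progress.md"
    if progress.exists():
        head = "\n".join(progress.read_text().splitlines()[:20])
        parts.append("## Progress\n" + head)

    return "\n\n".join(parts)
```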

## Implementation Steps

### Step 1: Scan Project Startup Methods

Before writing `init.sh`, first confirm how the project starts and is tested:

```bash
# Read the scripts in package.json (Node.js projects)
cat package.json 2>/dev/null | grep -A 20 '"scripts"'

# Or read the Makefile (multi-language projects)
cat Makefile 2>/dev/null | head -40

# Or read pyproject.toml (Python projects); -F matches the brackets literally
cat pyproject.toml 2>/dev/null | grep -F -A 20 '[tool.poetry.scripts]'

# Confirm the startup commands in the existing AGENTS.md; -E enables alternation
grep -E -A 5 '启动命令|start|dev|run' AGENTS.md 2>/dev/null
```

Collect:

- The development server startup command
- The test command
- Type checking / lint commands (if any)
- Any initialization steps that must run first (such as database migrations)
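The scan above can also be automated. A minimal sketch (hypothetical helper, not part of the harness spec) that picks the likely stack by checking which manifest exists, in the same priority order:

```python
from pathlib import Path

# Manifest files checked in priority order, mirroring the scan above
MANIFESTS = [
    ("package.json", "Node.js"),
    ("Makefile", "multi-language"),
    ("pyproject.toml", "Python"),
]

def detect_stack(repo: str = ".") -> str:
    """Return the first matching stack name, or 'unknown'."""
    root = Path(repo)
    for fname, stack in MANIFESTS:
        if (root / fname).exists():
            return stack
    return "unknown"
```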

---

### Step 2: Create `init.sh`

The role of `init.sh`: run it at the start of each session to quickly verify the environment is healthy; if it is not, fix it immediately before continuing.

```bash
#!/bin/bash
# init.sh — run at the start of each session
# Verifies that the development environment is in a working state

set -e  # stop if any step fails
echo "=== Checking environment ==="

# 1. Confirm we are in the correct directory
echo "Working directory: $(pwd)"

# 2. Install dependencies (if node_modules does not exist)
# [Choose based on the tech stack; the Node.js case is shown]
if [ ! -d "node_modules" ]; then
  echo "Installing dependencies..."
  npm install
fi

# 3. Smoke test: verify the project starts normally
# [Write this for the actual project; the goal is the fastest possible check that the basics work]
# Example: run the quickest test
npm run test -- --testPathPattern=smoke 2>/dev/null || echo "Warning: smoke test failed, fix it before continuing"

echo "=== Environment check complete, ready to work ==="
echo "Tip: run 'git log --oneline -10' to view recent work history"
```

**Writing requirements**:
- Fill in the actual startup commands found during the scan; do not leave example comments in place
- The smoke test must be fast (< 30 seconds); its purpose is to surface environment problems quickly, not to run the full test suite
- If the project has a database, add a step that checks the database connection
- After writing the script, actually run it once to confirm it exits cleanly: `bash init.sh`
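The under-30-seconds requirement can be enforced from the outside. A minimal sketch (assuming `init.sh` sits in the repo root) that runs the script with a hard time budget:

```python
import subprocess

def run_init(path: str = "init.sh", timeout: int = 30) -> bool:
    """Run init.sh; fail on a nonzero exit or on blowing the time budget."""
    try:
        result = subprocess.run(["bash", path], timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"init.sh exceeded {timeout}s; trim the smoke test")
        return False
    return result.returncode == 0
```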

---

### Step 3: Create `tasks.json`

Structure design:

```json
{
  "project": "[project name]",
  "last_updated": "[today's date, format YYYY-MM-DD]",
  "current_focus": "[the single most important thing right now, one sentence]",
  "tasks": [
    {
      "id": "[module abbreviation]-[serial number]",
      "title": "[task title]",
      "description": "[what to do specifically, 1-3 sentences]",
      "status": "pending | in_progress | done | blocked",
      "priority": "high | medium | low",
      "blocked_by": "[blocking reason, only when status is blocked]",
      "verify": "[how to verify this task is complete]",
      "requires_eval": false
    }
  ]
}
```

Field reference (every field must be filled in when a task is added; none may be omitted):

| Field | Required | Explanation |
| --- | --- | --- |
| `id` | Required | Module abbreviation + serial number, e.g. `auth-01`, `ui-03`; short and readable |
| `title` | Required | Task title, one sentence |
| `description` | Required | What to do specifically, 1-3 sentences |
| `status` | Required | Initial value is `pending`; updated by the agent while working |
| `priority` | Required | `high` / `medium` / `low` |
| `blocked_by` | Only when blocked | The blocking reason |
| `verify` | Required | How to verify completion; must be executable steps (commands or operations) |
| `requires_eval` | Required | Whether independent Evaluator review is needed; defaults to `false`, see the judgment criteria |

`requires_eval` judgment criteria (must be checked against every new task; never fill in `false` without thinking):

Set to `true` if any one of the following holds:

- The task is a new feature (not just a bug fix or a configuration change)
- It touches security, permissions, or data-validation logic
- It is expected to modify more than 3 files
- The task description mentions "refactoring" or "architecture adjustment"

Set to `false` only if all of the following hold:

- Pure bug fix with a clearly bounded change
- Documentation updates or comment additions
- Configuration adjustments or environment-variable changes
- Unit-test additions

How to determine the initial task list. Extract from these sources, in priority order:

1. Plan files in `docs/exec-plans/active/` (if any)
2. High-priority items in `docs/exec-plans/tech-debt-tracker.md`
3. TODOs or the roadmap mentioned in the README
4. Ask the user: "What are the 3-5 tasks you most want to push forward right now?"

Writing requirements:

- The `verify` field must contain executable steps; never filler like "confirm the feature works"
- Task granularity: a task should be completable in 1-2 hours; split anything larger
- Initial status: every task starts as `pending` and is updated by the agent while working

Ask the user (if the tasks cannot be inferred from existing documents):

I have scanned the project and am ready to create the task list. Please tell me: what are the 3-5 tasks you most want to push forward right now? One sentence per task is enough.
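The field rules above are easy to enforce mechanically. A sketch of a validator (field names and enums taken from the schema above; the CLI wiring is left out):

```python
import json

REQUIRED = ["id", "title", "description", "status", "priority", "verify", "requires_eval"]
STATUSES = {"pending", "in_progress", "done", "blocked"}
PRIORITIES = {"high", "medium", "low"}

def validate_tasks(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    data = json.loads(raw)
    for task in data.get("tasks", []):
        tid = task.get("id", "<missing id>")
        for field in REQUIRED:
            if field not in task:
                problems.append(f"{tid}: missing field '{field}'")
        if task.get("status") not in STATUSES:
            problems.append(f"{tid}: invalid status {task.get('status')!r}")
        if task.get("priority") not in PRIORITIES:
            problems.append(f"{tid}: invalid priority {task.get('priority')!r}")
        # blocked tasks must state why they are blocked
        if task.get("status") == "blocked" and not task.get("blocked_by"):
            problems.append(f"{tid}: blocked task needs 'blocked_by'")
    return problems
```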

### Step 4: Create `progress.md`

Initial content:

```markdown
# Project Progress Record

After completing a task in a session, append a record at the top. Do not delete history.
Format: ## [Date] [Task Name]

## [Today's Date] Initialize Harness

- Completed harness-step1: established the docs/ skeleton
- Completed harness-step2: filled in the knowledge base content
- Completed harness-step3: established state management
- Initial task count in tasks.json: [N]
- Next session starts here: read tasks.json and pick a task with priority=high and status=pending
```
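"Append at the top, never delete history" can be done safely with a small helper. A sketch, assuming the `## [Date] [Task Name]` entry format above:

```python
from datetime import date
from pathlib import Path

def prepend_entry(path: str, task_name: str, bullets: list[str]) -> None:
    """Insert a new ## entry above older ones without touching history."""
    lines = Path(path).read_text().splitlines()
    entry = [f"## {date.today().isoformat()} {task_name}", ""]
    entry += [f"- {b}" for b in bullets]
    entry.append("")
    # Insert above the first existing entry so the file header stays intact
    idx = next((i for i, line in enumerate(lines) if line.startswith("## ")), len(lines))
    lines[idx:idx] = entry
    Path(path).write_text("\n".join(lines) + "\n")
```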

---

### Step 4b: Update `AGENTS.md` to Include the Task Management Rules

Find the "after completing each task" section that step2 wrote into `AGENTS.md` and replace it with the following:

```markdown
When adding a task, you must:

1. Fill in every field in tasks.json; none may be omitted
2. Judge `requires_eval` against the criteria below; never default to false without thinking:
   - New feature / touches security or permissions / changes more than 3 files / refactoring → true
   - Pure bug fix / documentation update / configuration adjustment → false

After completing a task, you must execute these steps in order:

1. Run the verification steps described in the task's `verify` field in tasks.json
2. If the task's `requires_eval` is `true`: fill in `sprint_output.md` and wait for Evaluator approval before marking it `done`.
   If the task's `requires_eval` is `false`: mark it `done` as soon as verification passes
3. git commit, format: `type(scope): what was done, plus any leftovers (if applicable)`
4. Append this session's record at the top of `progress.md`

Prohibited: skipping the verify step and deciding on your own that the task is complete.
Prohibited: setting `requires_eval` to false without applying the criteria.
```
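The true/false criteria above can also be expressed as a first-pass heuristic. The keyword list below is illustrative, not part of the harness spec, and the agent still makes the final call:

```python
# Illustrative signals; any single trigger flips the suggestion to True
TRUE_SIGNALS = ("new feature", "refactor", "architecture",
                "security", "permission", "validation")

def suggest_requires_eval(description: str, files_touched: int = 0) -> bool:
    """First-pass suggestion for requires_eval; the agent makes the final call."""
    if files_touched > 3:
        return True
    text = description.lower()
    return any(signal in text for signal in TRUE_SIGNALS)
```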

---

### Step 5: Verify Everything Works Together

After the three files are created, simulate a complete session startup to verify they connect properly:

```bash
# Simulate the sequence an agent runs at the start of a new session
echo "=== Simulating new session startup ==="

# 1. Run init.sh
bash init.sh

# 2. Check the git log
git log --oneline -10

# 3. Read progress.md (confirm the file exists and is readable)
head -20 progress.md

# 4. Read tasks.json (confirm the format is valid)
python3 -m json.tool tasks.json > /dev/null && echo "tasks.json is valid" || echo "tasks.json is malformed"
```

Only when every step passes is this step complete.

---

## Quality Check

- Does `init.sh` actually run without errors?
- Is `tasks.json` valid JSON? Does every task have `verify` and `requires_eval` fields?
- Was each task's `requires_eval` set against the judgment criteria rather than defaulted to false?
- Does `progress.md` contain the initial record?
- Does `AGENTS.md` contain both rules: one for adding tasks and one for completing them?

## Inform the User After Completion

Output a summary:

**Created files**

- `init.sh`: [describe what it checks]
- `tasks.json`: [N] tasks, of which [N] require Evaluator review
- `progress.md`: initialized

**How to use it**

You can now hand the project to Claude Code. Each time it starts, it automatically reads these three files plus the git log and resumes its working state. You no longer need to explain "where we left off last time."

**What you need to do**

- Check that the task list in `tasks.json` matches your expectations; add or remove tasks by hand as needed
- Confirm that the `requires_eval` judgments are reasonable
- If any step in `init.sh` fails, tell me and I will fix it

**Next steps**

- The Harness foundation is complete (step1 + step2 + step3)
- You can start developing with Claude Code in earnest
- If the agent repeatedly violates code conventions, run `harness-step4-linter` to turn the rules into mechanical constraints
- If the agent's self-assessment proves untrustworthy, run `harness-step5-evaluator` to introduce independent review