
Evolving AI Agents with A-Evolve

Overview

A-Evolve is universal infrastructure for evolving any AI agent across any domain using any evolution algorithm with zero manual engineering. It represents all evolvable agent state as files (prompts, skills, memory, tools), runs iterative solve-observe-evolve cycles against benchmarks, and uses LLM-driven mutation to improve agent performance automatically.
Benchmark results (Claude Opus 4.6):
  • MCP-Atlas: 79.4% (#1)
  • SWE-bench Verified: 76.8% (~#5)
  • Terminal-Bench 2.0: 76.5% (~#7)
  • SkillsBench: 34.9% (#2)

When to Use A-Evolve

Use A-Evolve when:
  • Optimizing agent prompts, skills, or memory against a measurable benchmark
  • Building self-improving agents with automated gating and rollback
  • Evolving domain-specific tool usage and procedures through LLM-driven mutation
  • Running iterative solve-observe-evolve loops to maximize agent performance
  • Needing reproducible, git-versioned evolution history for every change
Key differentiator: Other frameworks build agents; A-Evolve optimizes them. It sits on top of any agent framework and makes it better through automated evolution.
Do NOT use A-Evolve for:
  • Building multi-agent orchestration from scratch (use CrewAI, LangGraph)
  • One-shot agent tasks with no iteration needed (use LangChain, LlamaIndex)
  • RAG pipeline optimization (use LlamaIndex, Chroma)
  • Prompt-only optimization without skill/memory evolution (use DSPy)

Quick Start

Installation

```bash
pip install a-evolve                    # Core
pip install a-evolve[anthropic]         # With Claude support
pip install a-evolve[all]               # All providers
```

Three-Line Evolution

```python
import agent_evolve as ae

evolver = ae.Evolver(agent="swe", benchmark="swe-verified")
results = evolver.run(cycles=10)
print(f"Final score: {results.final_score}")
```
This copies the built-in SWE seed workspace, runs 10 evolution cycles against SWE-bench Verified, and returns the optimized agent.

Core Concepts

The Agent Workspace

All evolvable state lives as files in a workspace directory:
```
my-agent/
├── manifest.yaml          # Metadata + entrypoint
├── prompts/
│   ├── system.md          # Main system prompt (evolved)
│   └── fragments/         # Modular prompt pieces
├── skills/
│   └── skill-name/
│       └── SKILL.md       # Reusable procedure with frontmatter
├── memory/
│   ├── episodic.jsonl     # Lessons from failures
│   └── semantic.jsonl     # General knowledge
├── tools/
│   ├── registry.yaml      # Tool manifest
│   └── tool_name.py       # Tool implementations
└── evolution/             # Managed by engine (metrics, history)
```
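If you are assembling a seed workspace by hand, the skeleton can be scaffolded with a few lines of stdlib Python. This is an illustrative sketch only — the paths mirror the tree above, but the `manifest.yaml` keys shown here (`name`, `entrypoint`) are assumptions, not a documented schema:

```python
import tempfile
from pathlib import Path

def scaffold_workspace(root: str) -> Path:
    """Create the minimal workspace skeleton shown above (illustrative)."""
    base = Path(root)
    for d in ["prompts/fragments", "skills", "memory", "tools", "evolution"]:
        (base / d).mkdir(parents=True, exist_ok=True)
    # Placeholder files; real content comes from your agent.
    (base / "manifest.yaml").write_text("name: my-agent\nentrypoint: agent:MyAgent\n")
    (base / "prompts" / "system.md").write_text("# System prompt\n")
    (base / "memory" / "episodic.jsonl").touch()
    (base / "memory" / "semantic.jsonl").touch()
    (base / "tools" / "registry.yaml").write_text("tools: []\n")
    return base

ws = scaffold_workspace(tempfile.mkdtemp() + "/my-agent")
```

Remember to `git init` the result before running evolution (see Workflow 1).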

The Evolution Loop

Each cycle follows five phases:
  1. Solve — Agent processes a batch of tasks from the benchmark
  2. Observe — Benchmark evaluates trajectories, producing (task, trajectory, feedback) triples
  3. Evolve — Evolution engine mutates workspace files based on observations
  4. Gate — Validate mutations (git snapshot before/after for rollback)
  5. Reload — Agent reinitializes from evolved filesystem state
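The five phases above can be sketched as plain Python. Everything here is toy stand-in code (a dict for the workspace, an arithmetic "benchmark") meant only to show the control flow; the real interfaces appear in the next section:

```python
class ToyAgent:
    """Toy agent whose output improves as evolved rules accumulate."""
    def __init__(self):
        self.rules = []
    def solve(self, task):
        return task + len(self.rules)
    def reload_from_fs(self, ws):
        self.rules = ws["rules"]

def run_cycle(agent, tasks, workspace):
    # 1. Solve: the agent processes a batch of tasks
    outputs = [agent.solve(t) for t in tasks]
    # 2. Observe: (task, trajectory, feedback) triples; target here is task + 2
    observations = [(t, out, 1.0 if out == t + 2 else 0.0)
                    for t, out in zip(tasks, outputs)]
    score = sum(fb for _, _, fb in observations) / len(observations)
    # 3. Evolve + 4. Gate: snapshot first so a bad mutation could be rolled back
    snapshot = dict(workspace)  # rollback itself is elided in this sketch
    if score < 1.0:
        workspace["rules"] = workspace["rules"] + ["learned rule"]
    # 5. Reload: the agent reinitializes from the evolved state
    agent.reload_from_fs(workspace)
    return score

agent, ws = ToyAgent(), {"rules": []}
scores = [run_cycle(agent, [1, 2, 3], ws) for _ in range(3)]
# scores climb as mutations accumulate: [0.0, 0.0, 1.0]
```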

Three Pluggable Interfaces


1. Agent — implements solve()

```python
class MyAgent(ae.BaseAgent):
    def solve(self, task: ae.Task) -> ae.Trajectory:
        # Domain-specific solving logic
        return ae.Trajectory(task_id=task.id, output=result, steps=steps)
```

2. Benchmark — implements get_tasks() and evaluate()

```python
class MyBenchmark(ae.BenchmarkAdapter):
    def get_tasks(self, split="train", limit=None) -> list[ae.Task]:
        return [ae.Task(id="1", input="...")]

    def evaluate(self, task: ae.Task, trajectory: ae.Trajectory) -> ae.Feedback:
        return ae.Feedback(success=True, score=0.95, detail="Passed")
```

3. Engine — implements step()

```python
class MyEngine(ae.EvolutionEngine):
    def step(self, workspace, observations, history, trial):
        # Mutate workspace based on observations
        return ae.StepResult(mutated=True, summary="Updated prompts")
```

Workflow 1: Evolve an Existing Agent

Use when: You have a working agent and want to optimize it against a benchmark.
Critical Requirements:
  • Agent implements `BaseAgent.solve()` returning `Trajectory`
  • Benchmark implements `BenchmarkAdapter` with `get_tasks()` and `evaluate()`
  • Seed workspace has `manifest.yaml` with entrypoint and evolvable layers
  • System prompt exists at `prompts/system.md`
  • Workspace is a git repo (run `git init && git add -A && git commit -m "init"`)
Steps

```python
import agent_evolve as ae
```

Configure evolution parameters

```python
config = ae.EvolveConfig(
    batch_size=10,              # Tasks per solve round
    max_cycles=20,              # Maximum evolution iterations
    evolve_prompts=True,        # Mutate system prompt
    evolve_skills=True,         # Discover and refine skills
    evolve_memory=True,         # Build episodic memory
    evolver_model="us.anthropic.claude-opus-4-6-v1",
)
```

Point to your agent workspace and benchmark

```python
evolver = ae.Evolver(
    agent="./my-agent-workspace",
    benchmark="swe-verified",   # Or custom BenchmarkAdapter instance
    config=config,
)
```

Run evolution

```python
results = evolver.run(cycles=10)
```

Inspect results

```python
print(f"Cycles completed: {results.cycles_completed}")
print(f"Final score: {results.final_score}")
print(f"Converged: {results.converged}")
for cycle_num, score in enumerate(results.score_history):
    print(f"  Cycle {cycle_num + 1}: {score:.3f}")
```

Post-Evolution

The workspace is now optimized. Inspect what changed:
```bash
cd my-agent-workspace
git log --oneline              # See evo-1, evo-2, ... tags
git diff evo-1 evo-10          # Compare first and last evolution
cat prompts/system.md          # Read evolved prompt
ls skills/                     # See discovered skills
```

Workflow 2: Add a Custom Benchmark

Use when: You want to evolve agents on your own domain-specific tasks.
Critical Requirements:
  • Define task format (inputs, expected outputs)
  • Implement scoring logic (0.0–1.0 scale)
  • Prepare task dataset (train + holdout split)

Steps

```python
import agent_evolve as ae

class CodeReviewBenchmark(ae.BenchmarkAdapter):
    """Evaluate agents on code review quality."""

    def get_tasks(self, split="train", limit=None):
        tasks = load_review_dataset(split)
        if limit:
            tasks = tasks[:limit]
        return [
            ae.Task(id=t["id"], input=t["diff"], metadata={"expected": t["comments"]})
            for t in tasks
        ]

    def evaluate(self, task, trajectory):
        expected = task.metadata["expected"]
        actual = trajectory.output
        precision, recall = compute_review_metrics(expected, actual)
        f1 = 2 * precision * recall / (precision + recall + 1e-9)
        return ae.Feedback(
            success=f1 > 0.7,
            score=f1,
            detail=f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}",
        )
```

Use with any agent

```python
evolver = ae.Evolver(agent="./my-agent", benchmark=CodeReviewBenchmark())
results = evolver.run(cycles=5)
```

Workflow 3: Create a Custom Evolution Engine

Use when: The default LLM-driven mutation doesn't suit your domain.

Steps

```python
import agent_evolve as ae

class RuleBasedEngine(ae.EvolutionEngine):
    def step(self, workspace, observations, history, trial):
        failures = [o for o in observations if not o.feedback.success]
        if not failures:
            return ae.StepResult(mutated=False, summary="No failures to address")

        # Analyze failure patterns
        error_types = categorize_errors(failures)
        prompt = workspace.read_prompt()

        # Append learned rules to prompt
        new_rules = generate_rules(error_types)
        workspace.write_prompt(prompt + "\n" + new_rules)

        return ae.StepResult(
            mutated=True,
            summary=f"Added {len(new_rules)} rules from {len(failures)} failures",
        )

evolver = ae.Evolver(
    agent="./my-agent",
    benchmark="my-benchmark",
    engine=RuleBasedEngine(),
)
```

Built-in Components

Seed Agents

| Agent | Domain | Model | Key Feature |
|-------|--------|-------|-------------|
| `swe` | SWE-bench | Claude Opus 4.6 | Verify-fix loop, skill proposals |
| `terminal` | Terminal-Bench | Claude Sonnet 4 | Concurrent timeout, env discovery |
| `mcp` | MCP-Atlas | Claude Opus 4.6 | MCP server integration |

Benchmarks

| Name | Domain | Metric |
|------|--------|--------|
| `swe-verified` | Code patching | Pass rate |
| `mcp-atlas` | Tool calling | Accuracy |
| `terminal2` | Shell tasks | Pass rate |
| `skill-bench` | Multi-step procedures | Accuracy |
| `arc-agi-3` | Interactive games | RHAE score |

Evolution Algorithms

| Algorithm | Strategy | Best For |
|-----------|----------|----------|
| A-Evolve/SkillForge | LLM-driven workspace mutation | General-purpose |
| Guided Synthesis | Memory-first, curated skills | Skill discovery |
| Adaptive Evolution | Reward tracking, filtered observations | Fine-grained control |
| Adaptive Skill | Skill-centric refinement | Skill-heavy domains |

Configuration Reference

```python
ae.EvolveConfig(
    batch_size=10,              # Tasks per solve round
    max_cycles=20,              # Max evolution iterations
    holdout_ratio=0.2,          # Test set split for gating
    evolve_prompts=True,        # Mutate system prompts
    evolve_skills=True,         # Discover/refine skills
    evolve_memory=True,         # Build episodic memory
    evolve_tools=False,         # Mutate tool implementations
    trajectory_only=False,      # Hide scores from evolver
    evolver_model="us.anthropic.claude-opus-4-6-v1",
    evolver_max_tokens=16384,
    egl_threshold=0.05,         # Convergence epsilon
    egl_window=3,               # Cycles for plateau detection
)
```
Convergence: Evolution stops early when score improvement is less than `egl_threshold` over the last `egl_window` cycles.
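The plateau rule is simple enough to state in code. This is a sketch of the presumed logic, not A-Evolve's actual implementation — details such as how the window is anchored may differ:

```python
def converged(score_history, egl_threshold=0.05, egl_window=3):
    """Stop when improvement over the last `egl_window` cycles is below `egl_threshold`."""
    if len(score_history) <= egl_window:
        return False  # not enough cycles to judge a plateau
    improvement = score_history[-1] - score_history[-1 - egl_window]
    return improvement < egl_threshold

print(converged([0.50, 0.62, 0.71]))                  # too few cycles → False
print(converged([0.50, 0.70, 0.71, 0.71, 0.72]))     # +0.02 over the last 3 → True
```

With the defaults, a run that gains less than 5 points of score over three consecutive cycles stops early; widen `egl_window` or lower `egl_threshold` if that is too aggressive for your domain.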

Skill Format

Skills are reusable procedures discovered and refined during evolution:
```markdown
---
name: verify-edge-cases
description: "TRIGGER when: checking boundary conditions. DO NOT TRIGGER: for happy-path tests."
---

## Pattern

Test all falsy-but-valid values: 0, False, "", [], {}

## Process

1. List all input boundaries
2. Run each against the implementation
3. Check both output AND side effects
```

Skills accumulate in the workspace `skills/` directory. The evolver curates them: ACCEPT new skills, MERGE overlapping ones, SKIP redundant proposals. Target: 5–10 broad skills, not 30 narrow ones.
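To keep an eye on that 5-10 target, you can audit the skill library with a small script. This sketch assumes only the `skills/*/SKILL.md`-with-frontmatter layout shown above; the simple line scan here is illustrative, not a full YAML frontmatter parser:

```python
from pathlib import Path

def list_skills(workspace: str) -> list[str]:
    """Collect skill names from skills/*/SKILL.md frontmatter (illustrative)."""
    names = []
    for skill_md in sorted(Path(workspace).glob("skills/*/SKILL.md")):
        for line in skill_md.read_text().splitlines():
            if line.startswith("name:"):
                names.append(line.split(":", 1)[1].strip())
                break
    return names

# Usage: warn when the library grows past the curation target
# skills = list_skills("./my-agent")
# if len(skills) > 10:
#     print("consider merging overlapping skills")
```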

Common Issues

Evolution score plateaus early

Cause: Batch size too small, or the evolver doesn't see enough failure diversity.
Fix: Increase `batch_size` (try 15–20) and ensure benchmark tasks cover diverse failure modes. Set `trajectory_only=False` so the evolver sees scores.

Agent workspace grows too large

Cause: Skill library bloat from accepting every proposal.
Fix: The default SkillForge engine curates skills automatically. If using a custom engine, implement merging logic to consolidate overlapping skills.

Git conflicts during evolution

Cause: Multiple evolution runs on the same workspace.
Fix: Each `evolver.run()` should operate on its own workspace copy. Use `Evolver(agent="seed-name")` to auto-copy the seed each time.

LLM provider errors during evolution

Cause: Rate limits or authentication issues with the evolver model.
Fix: Check the `evolver_model` config. For Bedrock, ensure AWS credentials are configured. For Anthropic, set `ANTHROPIC_API_KEY`.

Custom agent not picking up evolved state

Cause: Agent doesn't implement `reload_from_fs()`.
Fix: Override `reload_from_fs()` in your `BaseAgent` subclass to re-read prompts, skills, and memory from the workspace after each evolution cycle.
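A minimal override might look like the following. The sketch stands alone — a plain class rather than the real `ae.BaseAgent`, so the constructor and attribute names are assumptions — but the file layout follows the workspace structure from earlier:

```python
import json
from pathlib import Path

class FileBackedAgent:
    """Illustrative stand-in for a BaseAgent subclass that reloads evolved state."""

    def __init__(self, workspace: str):
        self.workspace = Path(workspace)
        self.reload_from_fs()

    def reload_from_fs(self):
        # Re-read prompt, skills, and memory after each evolution cycle
        self.system_prompt = (self.workspace / "prompts" / "system.md").read_text()
        self.skills = {p.parent.name: p.read_text()
                       for p in self.workspace.glob("skills/*/SKILL.md")}
        episodic = self.workspace / "memory" / "episodic.jsonl"
        self.memory = ([json.loads(line) for line in episodic.read_text().splitlines()]
                       if episodic.exists() else [])
```

The key point is that nothing may be cached across cycles: anything the engine can mutate must be re-read here, or the agent keeps solving with the pre-evolution state.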

Usage Instructions for Agents

When this skill is loaded:
  1. Read this entire file before implementing any evolution workflow
  2. Start with the Quick Start — get a minimal evolution running before customizing
  3. Use built-in seeds when possible — `"swe"`, `"terminal"`, and `"mcp"` have battle-tested configurations
  4. Always initialize git in custom workspaces before running evolution
  5. Check convergence settings — the default `egl_threshold=0.05` with `egl_window=3` may be too aggressive for your domain
  6. Inspect evolved state after each run — read `prompts/system.md` and `skills/` to understand what the evolver learned
Pro Tips:
  • Set `trajectory_only=False` (the default) so the evolver sees scores — this accelerates learning
  • Start with `batch_size=10` and adjust based on task diversity
  • Use `holdout_ratio=0.2` to prevent overfitting to training tasks
  • After evolution, `git diff evo-1 evo-N` shows the cumulative effect of all mutations
  • If the evolver isn't finding skills, enrich `feedback.detail` strings with specific failure reasons
Warning Signs:
  • Score oscillating between cycles → benchmark evaluation may be non-deterministic
  • Skills directory growing past 15 skills → engine isn't merging/curating properly
  • Prompt growing past 10K chars → evolution is appending without refactoring
  • `converged=True` after 2-3 cycles → increase `egl_window` and decrease `egl_threshold`

References

  • Architecture deep dive: See references/architecture.md
  • API reference: See references/api.md
  • Step-by-step tutorials: See references/tutorials.md
  • Real-world examples: See references/examples.md
  • GitHub issues & solutions: See references/issues.md
  • Design patterns: See references/design-patterns.md
  • Release history: See references/releases.md