
Evolving AI Agents with A-Evolve

Overview

A-Evolve is universal infrastructure for evolving any AI agent across any domain using any evolution algorithm with zero manual engineering. It represents all evolvable agent state as files (prompts, skills, memory, tools), runs iterative solve-observe-evolve cycles against benchmarks, and uses LLM-driven mutation to improve agent performance automatically.
Benchmark results (Claude Opus 4.6):
  • MCP-Atlas: 79.4% (#1)
  • SWE-bench Verified: 76.8% (~#5)
  • Terminal-Bench 2.0: 76.5% (~#7)
  • SkillsBench: 34.9% (#2)

When to Use A-Evolve

Use A-Evolve when:
  • Optimizing agent prompts, skills, or memory against a measurable benchmark
  • Building self-improving agents with automated gating and rollback
  • Evolving domain-specific tool usage and procedures through LLM-driven mutation
  • Running iterative solve-observe-evolve loops to maximize agent performance
  • Needing reproducible, git-versioned evolution history for every change
Key differentiator: Other frameworks build agents; A-Evolve optimizes them. It sits on top of any agent framework and makes it better through automated evolution.
Do NOT use A-Evolve for:
  • Building multi-agent orchestration from scratch (use CrewAI, LangGraph)
  • One-shot agent tasks with no iteration needed (use LangChain, LlamaIndex)
  • RAG pipeline optimization (use LlamaIndex, Chroma)
  • Prompt-only optimization without skill/memory evolution (use DSPy)

Quick Start

Installation

```bash
pip install a-evolve                    # Core
pip install a-evolve[anthropic]         # With Claude support
pip install a-evolve[all]               # All providers
```

Three-Line Evolution

```python
import agent_evolve as ae

evolver = ae.Evolver(agent="swe", benchmark="swe-verified")
results = evolver.run(cycles=10)
print(f"Final score: {results.final_score}")
```
This copies the built-in SWE seed workspace, runs 10 evolution cycles against SWE-bench Verified, and returns the optimized agent.

Core Concepts

The Agent Workspace

All evolvable state lives as files in a workspace directory:
```
my-agent/
├── manifest.yaml          # Metadata + entrypoint
├── prompts/
│   ├── system.md          # Main system prompt (evolved)
│   └── fragments/         # Modular prompt pieces
├── skills/
│   └── skill-name/
│       └── SKILL.md       # Reusable procedure with frontmatter
├── memory/
│   ├── episodic.jsonl     # Lessons from failures
│   └── semantic.jsonl     # General knowledge
├── tools/
│   ├── registry.yaml      # Tool manifest
│   └── tool_name.py       # Tool implementations
└── evolution/             # Managed by engine (metrics, history)
```
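If you are assembling a seed workspace by hand, the skeleton can be scaffolded with a few lines of stdlib Python. This is an illustrative sketch only — the paths mirror the tree above, but the `manifest.yaml` keys shown here (`name`, `entrypoint`) are assumptions, not a documented schema:

```python
import tempfile
from pathlib import Path

def scaffold_workspace(root: str) -> Path:
    """Create the minimal workspace skeleton shown above (illustrative)."""
    base = Path(root)
    for d in ["prompts/fragments", "skills", "memory", "tools", "evolution"]:
        (base / d).mkdir(parents=True, exist_ok=True)
    # Placeholder files; real content comes from your agent.
    (base / "manifest.yaml").write_text("name: my-agent\nentrypoint: agent:MyAgent\n")
    (base / "prompts" / "system.md").write_text("# System prompt\n")
    (base / "memory" / "episodic.jsonl").touch()
    (base / "memory" / "semantic.jsonl").touch()
    (base / "tools" / "registry.yaml").write_text("tools: []\n")
    return base

ws = scaffold_workspace(tempfile.mkdtemp() + "/my-agent")
```

Remember to `git init` the result before running evolution (see Workflow 1).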

The Evolution Loop

Each cycle follows five phases:
  1. Solve — Agent processes a batch of tasks from the benchmark
  2. Observe — Benchmark evaluates trajectories, producing (task, trajectory, feedback) triples
  3. Evolve — Evolution engine mutates workspace files based on observations
  4. Gate — Validate mutations (git snapshot before/after for rollback)
  5. Reload — Agent reinitializes from evolved filesystem state
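The five phases above can be sketched as plain Python. Everything here is toy stand-in code (a dict for the workspace, an arithmetic "benchmark") meant only to show the control flow; the real interfaces appear in the next section:

```python
class ToyAgent:
    """Toy agent whose output improves as evolved rules accumulate."""
    def __init__(self):
        self.rules = []
    def solve(self, task):
        return task + len(self.rules)
    def reload_from_fs(self, ws):
        self.rules = ws["rules"]

def run_cycle(agent, tasks, workspace):
    # 1. Solve: the agent processes a batch of tasks
    outputs = [agent.solve(t) for t in tasks]
    # 2. Observe: (task, trajectory, feedback) triples; target here is task + 2
    observations = [(t, out, 1.0 if out == t + 2 else 0.0)
                    for t, out in zip(tasks, outputs)]
    score = sum(fb for _, _, fb in observations) / len(observations)
    # 3. Evolve + 4. Gate: snapshot first so a bad mutation could be rolled back
    snapshot = dict(workspace)  # rollback itself is elided in this sketch
    if score < 1.0:
        workspace["rules"] = workspace["rules"] + ["learned rule"]
    # 5. Reload: the agent reinitializes from the evolved state
    agent.reload_from_fs(workspace)
    return score

agent, ws = ToyAgent(), {"rules": []}
scores = [run_cycle(agent, [1, 2, 3], ws) for _ in range(3)]
# scores climb as mutations accumulate: [0.0, 0.0, 1.0]
```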

Three Pluggable Interfaces


1. Agent — implements solve()

```python
class MyAgent(ae.BaseAgent):
    def solve(self, task: ae.Task) -> ae.Trajectory:
        # Domain-specific solving logic
        return ae.Trajectory(task_id=task.id, output=result, steps=steps)
```

2. Benchmark — implements get_tasks() and evaluate()

```python
class MyBenchmark(ae.BenchmarkAdapter):
    def get_tasks(self, split="train", limit=None) -> list[ae.Task]:
        return [ae.Task(id="1", input="...")]

    def evaluate(self, task: ae.Task, trajectory: ae.Trajectory) -> ae.Feedback:
        return ae.Feedback(success=True, score=0.95, detail="Passed")
```

3. Engine — implements step()

```python
class MyEngine(ae.EvolutionEngine):
    def step(self, workspace, observations, history, trial):
        # Mutate workspace based on observations
        return ae.StepResult(mutated=True, summary="Updated prompts")
```

Workflow 1: Evolve an Existing Agent

Use when: You have a working agent and want to optimize it against a benchmark.
Critical Requirements:
  • Agent implements `BaseAgent.solve()` returning `Trajectory`
  • Benchmark implements `BenchmarkAdapter` with `get_tasks()` and `evaluate()`
  • Seed workspace has `manifest.yaml` with entrypoint and evolvable layers
  • System prompt exists at `prompts/system.md`
  • Workspace is a git repo (run `git init && git add -A && git commit -m "init"`)
Steps

```python
import agent_evolve as ae
```

Configure evolution parameters

```python
config = ae.EvolveConfig(
    batch_size=10,              # Tasks per solve round
    max_cycles=20,              # Maximum evolution iterations
    evolve_prompts=True,        # Mutate system prompt
    evolve_skills=True,         # Discover and refine skills
    evolve_memory=True,         # Build episodic memory
    evolver_model="us.anthropic.claude-opus-4-6-v1",
)
```

Point to your agent workspace and benchmark

```python
evolver = ae.Evolver(
    agent="./my-agent-workspace",
    benchmark="swe-verified",   # Or custom BenchmarkAdapter instance
    config=config,
)
```

Run evolution

```python
results = evolver.run(cycles=10)
```

Inspect results

```python
print(f"Cycles completed: {results.cycles_completed}")
print(f"Final score: {results.final_score}")
print(f"Converged: {results.converged}")
for cycle_num, score in enumerate(results.score_history):
    print(f"  Cycle {cycle_num + 1}: {score:.3f}")
```

Post-Evolution

The workspace is now optimized. Inspect what changed:
```bash
cd my-agent-workspace
git log --oneline              # See evo-1, evo-2, ... tags
git diff evo-1 evo-10          # Compare first and last evolution
cat prompts/system.md          # Read evolved prompt
ls skills/                     # See discovered skills
```

Workflow 2: Add a Custom Benchmark

Use when: You want to evolve agents on your own domain-specific tasks.
Critical Requirements:
  • Define task format (inputs, expected outputs)
  • Implement scoring logic (0.0–1.0 scale)
  • Prepare task dataset (train + holdout split)

Steps

```python
import agent_evolve as ae

class CodeReviewBenchmark(ae.BenchmarkAdapter):
    """Evaluate agents on code review quality."""

    def get_tasks(self, split="train", limit=None):
        tasks = load_review_dataset(split)
        if limit:
            tasks = tasks[:limit]
        return [
            ae.Task(id=t["id"], input=t["diff"], metadata={"expected": t["comments"]})
            for t in tasks
        ]

    def evaluate(self, task, trajectory):
        expected = task.metadata["expected"]
        actual = trajectory.output
        precision, recall = compute_review_metrics(expected, actual)
        f1 = 2 * precision * recall / (precision + recall + 1e-9)
        return ae.Feedback(
            success=f1 > 0.7,
            score=f1,
            detail=f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}",
        )
```

Use with any agent

```python
evolver = ae.Evolver(agent="./my-agent", benchmark=CodeReviewBenchmark())
results = evolver.run(cycles=5)
```

Workflow 3: Create a Custom Evolution Engine

Use when: The default LLM-driven mutation doesn't suit your domain.

Steps

```python
import agent_evolve as ae

class RuleBasedEngine(ae.EvolutionEngine):
    def step(self, workspace, observations, history, trial):
        failures = [o for o in observations if not o.feedback.success]
        if not failures:
            return ae.StepResult(mutated=False, summary="No failures to address")

        # Analyze failure patterns
        error_types = categorize_errors(failures)
        prompt = workspace.read_prompt()

        # Append learned rules to prompt
        new_rules = generate_rules(error_types)
        workspace.write_prompt(prompt + "\n" + new_rules)

        return ae.StepResult(
            mutated=True,
            summary=f"Added {len(new_rules)} rules from {len(failures)} failures",
        )

evolver = ae.Evolver(
    agent="./my-agent",
    benchmark="my-benchmark",
    engine=RuleBasedEngine(),
)
```

Built-in Components

Seed Agents

| Agent | Domain | Model | Key Feature |
|-------|--------|-------|-------------|
| `swe` | SWE-bench | Claude Opus 4.6 | Verify-fix loop, skill proposals |
| `terminal` | Terminal-Bench | Claude Sonnet 4 | Concurrent timeout, env discovery |
| `mcp` | MCP-Atlas | Claude Opus 4.6 | MCP server integration |

Benchmarks

| Name | Domain | Metric |
|------|--------|--------|
| `swe-verified` | Code patching | Pass rate |
| `mcp-atlas` | Tool calling | Accuracy |
| `terminal2` | Shell tasks | Pass rate |
| `skill-bench` | Multi-step procedures | Accuracy |
| `arc-agi-3` | Interactive games | RHAE score |

Evolution Algorithms

| Algorithm | Strategy | Best For |
|-----------|----------|----------|
| A-Evolve/SkillForge | LLM-driven workspace mutation | General-purpose |
| Guided Synthesis | Memory-first, curated skills | Skill discovery |
| Adaptive Evolution | Reward tracking, filtered observations | Fine-grained control |
| Adaptive Skill | Skill-centric refinement | Skill-heavy domains |

Configuration Reference

```python
ae.EvolveConfig(
    batch_size=10,              # Tasks per solve round
    max_cycles=20,              # Max evolution iterations
    holdout_ratio=0.2,          # Test set split for gating
    evolve_prompts=True,        # Mutate system prompts
    evolve_skills=True,         # Discover/refine skills
    evolve_memory=True,         # Build episodic memory
    evolve_tools=False,         # Mutate tool implementations
    trajectory_only=False,      # Hide scores from evolver
    evolver_model="us.anthropic.claude-opus-4-6-v1",
    evolver_max_tokens=16384,
    egl_threshold=0.05,         # Convergence epsilon
    egl_window=3,               # Cycles for plateau detection
)
```
Convergence: Evolution stops early when score improvement is less than `egl_threshold` over the last `egl_window` cycles.
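The plateau rule is simple enough to state in code. This is a sketch of the presumed logic, not A-Evolve's actual implementation — details such as how the window is anchored may differ:

```python
def converged(score_history, egl_threshold=0.05, egl_window=3):
    """Stop when improvement over the last `egl_window` cycles is below `egl_threshold`."""
    if len(score_history) <= egl_window:
        return False  # not enough cycles to judge a plateau
    improvement = score_history[-1] - score_history[-1 - egl_window]
    return improvement < egl_threshold

print(converged([0.50, 0.62, 0.71]))                  # too few cycles → False
print(converged([0.50, 0.70, 0.71, 0.71, 0.72]))     # +0.02 over the last 3 → True
```

With the defaults, a run that gains less than 5 points of score over three consecutive cycles stops early; widen `egl_window` or lower `egl_threshold` if that is too aggressive for your domain.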

Skill Format

Skills are reusable procedures discovered and refined during evolution:
```markdown
---
name: verify-edge-cases
description: "TRIGGER when: checking boundary conditions. DO NOT TRIGGER: for happy-path tests."
---

## Pattern

Test all falsy-but-valid values: 0, False, "", [], {}

## Process

1. List all input boundaries
2. Run each against the implementation
3. Check both output AND side effects
```

Skills accumulate in the workspace `skills/` directory. The evolver curates them: ACCEPT new skills, MERGE overlapping ones, SKIP redundant proposals. Target: 5–10 broad skills, not 30 narrow ones.
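To keep an eye on that 5-10 target, you can audit the skill library with a small script. This sketch assumes only the `skills/*/SKILL.md`-with-frontmatter layout shown above; the simple line scan here is illustrative, not a full YAML frontmatter parser:

```python
from pathlib import Path

def list_skills(workspace: str) -> list[str]:
    """Collect skill names from skills/*/SKILL.md frontmatter (illustrative)."""
    names = []
    for skill_md in sorted(Path(workspace).glob("skills/*/SKILL.md")):
        for line in skill_md.read_text().splitlines():
            if line.startswith("name:"):
                names.append(line.split(":", 1)[1].strip())
                break
    return names

# Usage: warn when the library grows past the curation target
# skills = list_skills("./my-agent")
# if len(skills) > 10:
#     print("consider merging overlapping skills")
```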

Common Issues

Evolution score plateaus early

Cause: Batch size too small, or the evolver doesn't see enough failure diversity.
Fix: Increase `batch_size` (try 15–20) and ensure benchmark tasks cover diverse failure modes. Set `trajectory_only=False` so the evolver sees scores.

Agent workspace grows too large

Cause: Skill library bloat from accepting every proposal.
Fix: The default SkillForge engine curates skills automatically. If using a custom engine, implement merging logic to consolidate overlapping skills.

Git conflicts during evolution

Cause: Multiple evolution runs on the same workspace.
Fix: Each `evolver.run()` should operate on its own workspace copy. Use `Evolver(agent="seed-name")` to auto-copy the seed each time.

LLM provider errors during evolution

Cause: Rate limits or authentication issues with the evolver model.
Fix: Check the `evolver_model` config. For Bedrock, ensure AWS credentials are configured. For Anthropic, set `ANTHROPIC_API_KEY`.

Custom agent not picking up evolved state

Cause: Agent doesn't implement `reload_from_fs()`.
Fix: Override `reload_from_fs()` in your `BaseAgent` subclass to re-read prompts, skills, and memory from the workspace after each evolution cycle.
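A minimal override might look like the following. The sketch stands alone — a plain class rather than the real `ae.BaseAgent`, so the constructor and attribute names are assumptions — but the file layout follows the workspace structure from earlier:

```python
import json
from pathlib import Path

class FileBackedAgent:
    """Illustrative stand-in for a BaseAgent subclass that reloads evolved state."""

    def __init__(self, workspace: str):
        self.workspace = Path(workspace)
        self.reload_from_fs()

    def reload_from_fs(self):
        # Re-read prompt, skills, and memory after each evolution cycle
        self.system_prompt = (self.workspace / "prompts" / "system.md").read_text()
        self.skills = {p.parent.name: p.read_text()
                       for p in self.workspace.glob("skills/*/SKILL.md")}
        episodic = self.workspace / "memory" / "episodic.jsonl"
        self.memory = ([json.loads(line) for line in episodic.read_text().splitlines()]
                       if episodic.exists() else [])
```

The key point is that nothing may be cached across cycles: anything the engine can mutate must be re-read here, or the agent keeps solving with the pre-evolution state.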

Usage Instructions for Agents

When this skill is loaded:
  1. Read this entire file before implementing any evolution workflow
  2. Start with the Quick Start — get a minimal evolution running before customizing
  3. Use built-in seeds when possible — `"swe"`, `"terminal"`, and `"mcp"` have battle-tested configurations
  4. Always initialize git in custom workspaces before running evolution
  5. Check convergence settings — the default `egl_threshold=0.05` with `egl_window=3` may be too aggressive for your domain
  6. Inspect evolved state after each run — read `prompts/system.md` and `skills/` to understand what the evolver learned
Pro Tips:
  • Set `trajectory_only=False` (the default) so the evolver sees scores — this accelerates learning
  • Start with `batch_size=10` and adjust based on task diversity
  • Use `holdout_ratio=0.2` to prevent overfitting to training tasks
  • After evolution, `git diff evo-1 evo-N` shows the cumulative effect of all mutations
  • If the evolver isn't finding skills, enrich `feedback.detail` strings with specific failure reasons
Warning Signs:
  • Score oscillating between cycles → benchmark evaluation may be non-deterministic
  • Skills directory growing past 15 skills → engine isn't merging/curating properly
  • Prompt growing past 10K chars → evolution is appending without refactoring
  • `converged=True` after 2-3 cycles → increase `egl_window` and decrease `egl_threshold`

References

  • Architecture deep dive: See references/architecture.md
  • API reference: See references/api.md
  • Step-by-step tutorials: See references/tutorials.md
  • Real-world examples: See references/examples.md
  • GitHub issues & solutions: See references/issues.md
  • Design patterns: See references/design-patterns.md
  • Release history: See references/releases.md