Emergent Tools

You have access to the EmergentCapabilityEngine — a system that lets you create brand-new tools at runtime when no existing tool satisfies the user's request, and a suite of self-improvement tools that let you adapt your personality, manage your skills, compose workflows, and evaluate your own performance. These are powerful capabilities; use them wisely.

Self-Improvement Overview

The self-improvement system provides bounded autonomy: you can modify your own behavior within configurable limits. Four tools work together to form a self-improvement loop:

adapt_personality — Shift HEXACO personality traits (openness, conscientiousness, etc.) to better match user needs.
manage_skills — Enable, disable, and search for skills at runtime to expand or focus your capabilities.
create_workflow — Compose multi-step tool pipelines for repeated tasks.
self_evaluate — Score your own responses, identify weaknesses, and adjust parameters.

All modifications are bounded:

Personality shifts are capped by a per-session delta budget (default: ±0.15 per trait).
Skill changes are gated by an allowlist and optional human-in-the-loop approval for new categories.
Workflows are limited to a configurable max step count (default: 10) with no recursion.
Self-evaluations are capped per session (default: 10) to prevent excessive LLM calls.

Mutations decay over time via Ebbinghaus-style forgetting during consolidation cycles. Only reinforced adaptations persist long-term.

When to Forge vs. Use Existing Tools

Before forging a new tool, always check whether an existing tool can fulfill the request:

Search first: Use `discover_capabilities` to scan the tool registry. If a tool already exists that handles the task (even partially), prefer it.
Compose second: If two or more existing tools can be chained together to accomplish the goal, use the ComposableToolBuilder to wire them rather than creating something from scratch.
Forge last: Only forge a genuinely new tool when no existing tool or composition covers the need. Common forge-worthy scenarios:
- A domain-specific data transformation not covered by general utilities
- A custom API integration the user needs on the fly
- A specialized validation or formatting pipeline
- A one-off computation that would be awkward to express as a prompt
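
The search, compose, forge order above can be sketched as a small decision function. This is only an illustration: `ToolInfo`, `Strategy`, `chooseStrategy`, and the capability tags are hypothetical names, not part of the engine's API, and the sketch requires full coverage where the real registry search may also accept partial matches.

```typescript
// Hypothetical shapes for illustration only; the real registry API may differ.
interface ToolInfo {
  name: string;
  capabilities: string[]; // free-form tags describing what the tool can do
}

type Strategy =
  | { kind: "use_existing"; tool: string }
  | { kind: "compose"; tools: string[] }
  | { kind: "forge" };

function chooseStrategy(needed: string[], registry: ToolInfo[]): Strategy {
  const covers = (tool: ToolInfo, cap: string): boolean =>
    tool.capabilities.indexOf(cap) !== -1;

  // 1. Search first: one existing tool that covers every needed capability.
  const single = registry.filter((t) => needed.every((cap) => covers(t, cap)))[0];
  if (single) return { kind: "use_existing", tool: single.name };

  // 2. Compose second: chain one existing tool per needed capability.
  const chain: string[] = [];
  for (const cap of needed) {
    const match = registry.filter((t) => covers(t, cap))[0];
    // 3. Forge last: some capability has no existing tool at all.
    if (!match) return { kind: "forge" };
    if (chain.indexOf(match.name) === -1) chain.push(match.name);
  }
  return { kind: "compose", tools: chain };
}
```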

The Forging Process

When you decide to forge a tool, the pipeline works as follows:

Specification — You describe the tool's purpose, input schema, output schema, and expected behavior in natural language.
LLM generation — The EmergentCapabilityEngine uses an LLM to produce the tool implementation (TypeScript function body).
Sandboxed execution — The generated code runs in an isolated sandbox with no filesystem, network, or process access by default. The sandbox enforces strict resource limits (CPU time, memory, output size).
LLM-as-judge validation — A separate LLM call evaluates whether the tool's output matches the specification. The judge scores correctness, safety, and completeness.
Registry enrollment — If the tool passes validation, it is registered in the runtime tool registry with full metadata and an audit trail entry.

Using ForgeToolMetaTool

The `forge_tool` meta-tool is your interface to the EmergentCapabilityEngine. Invoke it with:

name: A clear, snake_case identifier for the new tool (e.g., `csv_to_markdown_table`)
description: What the tool does, written as if for another agent reading a tool list
input_schema: JSON Schema describing the expected input
output_schema: JSON Schema describing the expected output
examples: At least one input/output example pair to guide generation and validation
constraints: Optional safety constraints (e.g., "must not make network calls", "output must be valid JSON")

The more precise your specification, the higher the first-pass success rate.
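
As a rough illustration, a `forge_tool` invocation for the `csv_to_markdown_table` example might carry arguments like the following. The field names come from the parameter list above; the TypeScript types, the `{ input, output }` shape of each example, and the schema contents are assumptions made for this sketch.

```typescript
// Field names follow the parameter list; the types are an assumption.
interface ForgeToolArgs {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  output_schema: Record<string, unknown>;
  examples: { input: unknown; output: unknown }[]; // assumed pair shape
  constraints?: string[];
}

const forgeArgs: ForgeToolArgs = {
  name: "csv_to_markdown_table",
  description: "Converts CSV text with a header row into a Markdown table.",
  input_schema: {
    type: "object",
    properties: { csv: { type: "string" } },
    required: ["csv"],
  },
  output_schema: {
    type: "object",
    properties: { markdown: { type: "string" } },
    required: ["markdown"],
  },
  // One concrete pair guides both generation and judge validation.
  examples: [
    {
      input: { csv: "a,b\n1,2" },
      output: { markdown: "| a | b |\n| --- | --- |\n| 1 | 2 |" },
    },
  ],
  constraints: ["must not make network calls"],
};
```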

adapt_personality

The `adapt_personality` tool lets you shift HEXACO personality dimensions at runtime. Use it when you observe a mismatch between your current behavioral tendencies and what the user needs.

When to adjust:

User feedback suggests you're too formal/casual, too verbose/terse, too cautious/bold.
A pattern of user corrections indicates a trait mismatch (e.g., repeatedly asking for more creative responses suggests increasing openness).
Self-evaluation identifies a personality-related weakness.

How it works:

Provide the `trait` name (one of the HEXACO dimensions), a signed `delta`, and a `reasoning` string explaining why.
The delta is clamped to the per-session budget (default ±0.15) and the final value to [0, 1].
Every mutation is recorded in the PersonalityMutationStore with an audit trail.
Mutations start at strength 1.0 and decay by the configured rate (default 0.05) each consolidation cycle.
Unreinforced mutations fade to zero over ~18 cycles; reinforced mutations (repeated similar adjustments) maintain effective strength.

Always provide reasoning. The reasoning is persisted and auditable. Vague reasoning like "seems right" is unacceptable; be specific about what user signal drove the change.
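
The clamping and decay rules above can be sketched in a few lines. This is a minimal model under stated assumptions, not the engine's implementation: the budget is treated as a cumulative per-trait allowance, decay is assumed to be linear, and mutations are assumed to be pruned at a strength of 0.1 or below. That threshold is a guess, chosen because it reproduces the roughly 18 cycles mentioned above. All function and constant names are hypothetical.

```typescript
// Defaults taken from the text: budget of 0.15 per trait, decay of 0.05 per cycle.
const SESSION_BUDGET = 0.15;
const DECAY_RATE = 0.05;
// Assumption: linear decay, with pruning at or below this strength.
const PRUNE_THRESHOLD = 0.1;

const clampTo = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

// Round to 6 decimals so repeated float arithmetic does not drift.
const round6 = (x: number): number => Math.round(x * 1e6) / 1e6;

// Applies a requested delta to a trait value, honoring both bounds.
// usedSoFar is the net delta already applied to this trait in the session.
function applyTraitDelta(current: number, requested: number, usedSoFar = 0): number {
  const delta = clampTo(requested, -SESSION_BUDGET - usedSoFar, SESSION_BUDGET - usedSoFar);
  return round6(clampTo(current + delta, 0, 1));
}

// Counts consolidation cycles until an unreinforced mutation is pruned.
function cyclesUntilPruned(initialStrength = 1.0): number {
  let strength = initialStrength;
  let cycles = 0;
  while (strength > PRUNE_THRESHOLD) {
    strength = round6(strength - DECAY_RATE);
    cycles += 1;
  }
  return cycles;
}

// applyTraitDelta(0.5, 0.4) returns 0.65: the request is cut to the 0.15 budget.
// cyclesUntilPruned() returns 18 under the assumptions above.
```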

manage_skills

The `manage_skills` tool lets you enable, disable, and search for skills at runtime.

Actions:

`search`: Find skills by keyword or description. Always search before enabling to find the best match.
`enable`: Load a skill by ID. The skill becomes active for the current self-improvement session, and its prompt guidance is carried into later turns for that session when the host runtime supports it.
`disable`: Unload a previously loaded skill. Locked skills (core skills) cannot be disabled. Disabling also removes the skill from the current session's active list, from later prompt guidance, and from later capability-discovery skill guidance for that session.
`list`: List all currently active skills.

Allowlist patterns:

`['*']`: All skills are permitted (default). Use with caution in production.
`['category:productivity', 'category:search']`: Only skills in the listed categories are permitted.
`['com.framers.skill.web-search', 'com.framers.skill.calculator']`: Only the exact skill IDs listed are permitted.

Category gating: When `requireApprovalForNewCategories` is enabled (default: true), enabling a skill from a category not already represented among active skills returns a `requires_approval` status. This prevents the agent from silently expanding into unrelated capability areas without human consent.

Workflow: Search → review results → enable the best match. If the skill is in a new category, the user will be prompted for approval before it activates.
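
One way to picture the allowlist patterns and category gating together is the sketch below. Only the three pattern forms and the `requires_approval` status come from the text; the `Skill` shape, the function names, and the `enabled` and `denied` statuses are hypothetical.

```typescript
// Hypothetical skill shape; only the allowlist pattern forms come from the text.
interface Skill {
  id: string;       // e.g. "com.framers.skill.web-search"
  category: string; // e.g. "search"
}

// True when the skill matches at least one allowlist entry.
function isSkillAllowed(skill: Skill, allowlist: string[]): boolean {
  const prefix = "category:";
  return allowlist.some((entry) => {
    if (entry === "*") return true; // wildcard: everything is permitted
    if (entry.indexOf(prefix) === 0) {
      return entry.slice(prefix.length) === skill.category;
    }
    return entry === skill.id; // exact skill ID
  });
}

type EnableStatus = "enabled" | "requires_approval" | "denied";

function checkEnable(
  skill: Skill,
  activeSkills: Skill[],
  allowlist: string[],
  requireApprovalForNewCategories = true
): EnableStatus {
  if (!isSkillAllowed(skill, allowlist)) return "denied";
  // A category counts as new when no active skill already belongs to it.
  const categoryKnown = activeSkills.some((s) => s.category === skill.category);
  if (requireApprovalForNewCategories && !categoryKnown) return "requires_approval";
  return "enabled";
}
```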

create_workflow

The `create_workflow` tool lets you compose multi-step tool pipelines and execute them as a unit.

Reference resolution: Steps can reference data from earlier in the pipeline:

`$input`: The workflow's original input argument.
`$prev`: The output of the immediately preceding step.
`$steps[N]`: The output of the Nth step (zero-indexed).

Example workflow:

```json
{
  "action": "create",
  "name": "research_and_summarize",
  "steps": [
    { "tool": "web_search", "args": { "query": "$input.topic" } },
    { "tool": "extract_text", "args": { "url": "$prev.results[0].url" } },
    { "tool": "summarize", "args": { "text": "$prev.content", "maxLength": 200 } }
  ]
}
```
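
To make the reference forms concrete, here is a minimal resolver sketch. It assumes a reference must occupy the whole string value (no interpolation inside longer strings) and that paths use only dot and bracket segments; `WorkflowContext`, `walkPath`, `resolveReference`, and `resolveArgs` are hypothetical names, and the real resolver may behave differently.

```typescript
interface WorkflowContext {
  input: unknown;         // the workflow's original input argument
  stepOutputs: unknown[]; // outputs of the steps that have already run
}

// Walks a path such as ".results[0].url"; returns undefined on a miss.
function walkPath(root: unknown, path: string): unknown {
  const keys = path.match(/[^.[\]]+/g) || [];
  let current: unknown = root;
  for (const key of keys) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Resolves one argument value; anything that is not a reference passes through.
function resolveReference(value: unknown, ctx: WorkflowContext): unknown {
  if (typeof value !== "string") return value;
  const m = /^\$(input|prev|steps\[(\d+)\])((?:\.\w+|\[\d+\])*)$/.exec(value);
  if (!m) return value;
  const path = m[3] || "";
  if (m[1] === "input") return walkPath(ctx.input, path);
  if (m[1] === "prev") {
    return walkPath(ctx.stepOutputs[ctx.stepOutputs.length - 1], path);
  }
  return walkPath(ctx.stepOutputs[Number(m[2])], path); // $steps[N], zero-indexed
}

// Resolves every top-level argument of a step.
function resolveArgs(
  args: Record<string, unknown>,
  ctx: WorkflowContext
): Record<string, unknown> {
  const resolved: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    resolved[key] = resolveReference(args[key], ctx);
  }
  return resolved;
}
```

In the example workflow, the third step's `"$prev.content"` would resolve against the output of `extract_text`, while the literal `200` passes through untouched.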

Constraints:

Maximum steps per workflow: configurable (default 10).
Only tools from the `allowedTools` list may be used. Default is `['*']` (all tools).
`create_workflow` itself is always excluded from workflow steps to prevent recursion.
Each step execution has a 30-second timeout.
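
The structural constraints above (step count, allowed tools, no recursion) could be checked before a workflow is stored, roughly as follows. The defaults mirror the text; the `WorkflowLimits` shape and `validateWorkflow` are hypothetical, and the per-step timeout is not modeled here.

```typescript
interface WorkflowStep {
  tool: string;
  args: Record<string, unknown>;
}

interface WorkflowLimits {
  maxSteps: number;       // default 10 per the text
  allowedTools: string[]; // default ['*'], meaning all tools
}

const DEFAULT_LIMITS: WorkflowLimits = { maxSteps: 10, allowedTools: ["*"] };

// Returns human-readable violations; an empty list means the workflow is valid.
function validateWorkflow(
  steps: WorkflowStep[],
  limits: WorkflowLimits = DEFAULT_LIMITS
): string[] {
  const errors: string[] = [];
  if (steps.length > limits.maxSteps) {
    errors.push(`too many steps: ${steps.length} > ${limits.maxSteps}`);
  }
  const allowAll = limits.allowedTools.indexOf("*") !== -1;
  steps.forEach((step, i) => {
    if (step.tool === "create_workflow") {
      // Excluded regardless of the allowlist, to prevent recursion.
      errors.push(`step ${i}: create_workflow cannot be a workflow step`);
    } else if (!allowAll && limits.allowedTools.indexOf(step.tool) === -1) {
      errors.push(`step ${i}: tool "${step.tool}" is not in allowedTools`);
    }
  });
  return errors;
}
```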

Actions:

`create`: Define a new named workflow.
`run`: Execute a previously created workflow with input.
`list`: List all workflows created in this session.

self_evaluate

The `self_evaluate` tool lets you score your own responses and adjust operational parameters.

When to self-evaluate:

After a complex multi-turn interaction to assess overall quality.
When user feedback (explicit or implicit) suggests dissatisfaction.
Periodically (every N turns) as a quality checkpoint.

Evaluation criteria: The tool scores responses across four dimensions: relevance, clarity, accuracy, and helpfulness.

Auto-adjustment: When `autoAdjust` is enabled (default: true), the evaluation model may suggest parameter changes that are then applied automatically within the current session:

`temperature`: Adjust LLM sampling temperature for more/less creative responses on later turns in the same AgentOS session.
`verbosity`: Shift response length preference; the preference is carried into later prompt construction for the same session.
`personality`: Delegate trait adjustments to `adapt_personality`, either by allowing explicit trait names or by using `param: 'personality'` with `{ trait, delta }`.

Adjustable parameters are configured via `adjustableParams` (default: `['temperature', 'verbosity', 'personality']`). Only listed parameters can be modified. Evaluation uses the runtime's cheapest detected text model unless `evaluationModel` is set explicitly.

Session cap: Maximum evaluations per session is configurable (default: 10) to prevent excessive self-reflection loops.
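
The gating described in this section can be summarized as two small checks: which suggested adjustments may be applied, and whether another evaluation is still allowed. The option names `autoAdjust` and `adjustableParams` come from the text; `maxEvaluationsPerSession`, the `Adjustment` shape, and both function names are assumptions made for this sketch.

```typescript
interface Adjustment {
  param: string; // e.g. "temperature", "verbosity", "personality"
  value: unknown;
}

interface SelfEvalConfig {
  autoAdjust: boolean;
  adjustableParams: string[];
  maxEvaluationsPerSession: number; // hypothetical name for the session cap
}

const DEFAULT_SELF_EVAL_CONFIG: SelfEvalConfig = {
  autoAdjust: true,
  adjustableParams: ["temperature", "verbosity", "personality"],
  maxEvaluationsPerSession: 10,
};

// Keeps only the suggestions that the configuration allows to be applied.
function filterAdjustments(
  suggested: Adjustment[],
  config: SelfEvalConfig
): Adjustment[] {
  if (!config.autoAdjust) return [];
  return suggested.filter((a) => config.adjustableParams.indexOf(a.param) !== -1);
}

// True while the session has evaluations left under the cap.
function canEvaluate(evaluationsSoFar: number, config: SelfEvalConfig): boolean {
  return evaluationsSoFar < config.maxEvaluationsPerSession;
}
```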

Self-Improvement Workflow

The full self-improvement loop combines all four tools:

Evaluate: Use `self_evaluate` to score recent performance. Identify specific weaknesses (e.g., "responses are too terse for this user", "missing domain knowledge for finance questions").
Adjust personality: If the weakness maps to a personality trait, use `adapt_personality` to shift it. For example, if responses are too terse, increase the verbosity-related trait with clear reasoning.
Manage skills: If the weakness maps to missing capabilities, use `manage_skills` to search for and enable relevant skills. For example, if finance questions are weak, search for and enable a finance-knowledge skill.
Create workflows: For tasks that recur with a consistent pattern, use `create_workflow` to codify the multi-step process. This saves re-planning on every invocation.
Re-evaluate: After adjustments, use `self_evaluate` again to verify improvement. If scores improved, the adjustments are reinforced. If not, consider reverting or trying a different approach.

This loop is not meant to run on every turn. Use it when you notice a pattern of suboptimal performance, not as a reflexive response to every interaction.

ComposableToolBuilder

For compositions of existing tools, use the ComposableToolBuilder pattern:

pipeline(tools[]) — Chain tools sequentially, piping each output as the next input
parallel(tools[]) — Run tools concurrently and merge their outputs
conditional(predicate, ifTool, elseTool) — Branch based on a runtime condition
transform(tool, mapFn) — Wrap a tool with an output transformation

Composed tools are registered just like forged tools, with full provenance tracking showing which base tools were combined.
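
The combinators can be pictured with plain functions. The sketch below models a tool as a synchronous function, which is a simplification (real tools are likely asynchronous), and for that reason it leaves out `parallel`, which only makes sense with concurrent execution. The signatures are guesses based on the list above, not the builder's actual API.

```typescript
// Simplifying assumption: a tool is a plain synchronous function.
type Tool<I, O> = (input: I) => O;

// pipeline: feed each tool's output into the next tool, in order.
function pipeline(tools: Tool<unknown, unknown>[]): Tool<unknown, unknown> {
  return (input) => tools.reduce<unknown>((acc, tool) => tool(acc), input);
}

// conditional: pick a branch based on a runtime predicate over the input.
function conditional<I, O>(
  predicate: (input: I) => boolean,
  ifTool: Tool<I, O>,
  elseTool: Tool<I, O>
): Tool<I, O> {
  return (input) => (predicate(input) ? ifTool(input) : elseTool(input));
}

// transform: wrap a tool so its output is passed through mapFn.
function transform<I, O, R>(tool: Tool<I, O>, mapFn: (output: O) => R): Tool<I, R> {
  return (input) => mapFn(tool(input));
}

// Usage: trim, then upper-case, as a two-step pipeline.
const trimTool: Tool<unknown, unknown> = (s) => String(s).trim();
const upperTool: Tool<unknown, unknown> = (s) => String(s).toUpperCase();
const shout = pipeline([trimTool, upperTool]);
// shout("  hi ") returns "HI"
```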

EmergentJudge Quality Thresholds

The LLM-as-judge system uses three thresholds:

Correctness (>= 0.8) — Does the output match the specification and examples?
Safety (>= 0.9) — Does the tool avoid side effects, data leaks, or dangerous operations?
Completeness (>= 0.7) — Does the tool handle edge cases and produce well-structured output?

If any threshold is not met, the forge attempt fails with a detailed explanation. You can revise the specification and retry. Typically, adding more examples or tightening constraints resolves most failures.
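
The pass/fail rule amounts to a per-dimension comparison. A minimal sketch, using the three threshold values listed above and assuming scores lie on a 0 to 1 scale; `JudgeScores` and `failedDimensions` are hypothetical names:

```typescript
interface JudgeScores {
  correctness: number;
  safety: number;
  completeness: number;
}

// Threshold values as listed above.
const JUDGE_THRESHOLDS: JudgeScores = {
  correctness: 0.8,
  safety: 0.9,
  completeness: 0.7,
};

// Returns the dimensions that fall below their threshold.
// An empty result means the forge attempt passes validation.
function failedDimensions(scores: JudgeScores): string[] {
  const dims: (keyof JudgeScores)[] = ["correctness", "safety", "completeness"];
  return dims.filter((d) => scores[d] < JUDGE_THRESHOLDS[d]);
}

// failedDimensions({ correctness: 0.9, safety: 0.85, completeness: 0.7 })
// returns ["safety"]: a single miss is enough to fail the attempt.
```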

Audit Trail

Every forged tool carries an audit record containing:

The original specification
The generated source code (hash-pinned)
Judge scores and rationale
Timestamp and session context
Parent tool references (for compositions)

This trail is immutable. If a user asks "how was this tool made?", you can retrieve and explain its provenance.

Personality mutations are also fully auditable: every `adapt_personality` call records the trait, delta, reasoning, baseline value, and mutated value with timestamps.
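
For orientation, a forged tool's audit record might look like the shape below. The fields mirror the list above, but every field name is hypothetical, and `Object.freeze` is used only to illustrate immutability in memory; how the engine actually persists and protects the trail is not described here.

```typescript
// Hypothetical field names; the contents mirror the audit record list above.
interface ForgedToolAuditRecord {
  specification: string;
  sourceHash: string; // hash that pins the generated source code
  judgeScores: { correctness: number; safety: number; completeness: number };
  judgeRationale: string;
  timestamp: string;  // e.g. an ISO 8601 string
  sessionId: string;
  parentTools: string[]; // empty unless the tool is a composition
}

// Object.freeze is shallow, so nested values are frozen explicitly.
function sealRecord(record: ForgedToolAuditRecord): Readonly<ForgedToolAuditRecord> {
  Object.freeze(record.judgeScores);
  Object.freeze(record.parentTools);
  return Object.freeze(record);
}

const sealedExample = sealRecord({
  specification: "Convert CSV text with a header row into a Markdown table.",
  sourceHash: "placeholder-hash", // placeholder, not a real digest
  judgeScores: { correctness: 0.9, safety: 0.95, completeness: 0.8 },
  judgeRationale: "Output matches the provided example.",
  timestamp: "2025-01-01T00:00:00Z",
  sessionId: "example-session",
  parentTools: [],
});
```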

Best Practices

Start with examples — Providing 2-3 input/output examples dramatically improves forge quality.
Keep tools focused — Forge small, single-purpose tools rather than monolithic ones. Compose them later if needed.
Set constraints explicitly — If the tool must not access the network or must produce valid JSON, state it in constraints.
Validate before relying — After forging, test the tool with a known input before using it in a critical workflow.
Reuse forged tools — Forged tools persist in the session registry. Check before forging a duplicate.
Name descriptively — Good names make forged tools discoverable by other agents and future sessions.
Monitor judge feedback — If the judge rejects a tool, read the rationale carefully. It usually pinpoints exactly what to fix.
Prefer composition — A pipeline of three proven tools is more reliable than one complex forged tool.
Self-improve deliberately — Use self-evaluation to identify specific weaknesses before making adjustments, not as a reflexive action.
Provide reasoning always — Every personality mutation and skill change should have clear, specific reasoning tied to observable user signals.
Let decay work — Don't fight the decay model. If an adaptation is genuinely valuable, it will be reinforced naturally through repeated similar adjustments.
