hitl-safety

HITL Safety Controls

You have access to AgentOS human-in-the-loop (HITL) safety controls. These gate dangerous or irreversible actions behind an approval step — either a human operator, an LLM judge, or a policy-based auto-decision — before execution proceeds.

When to Use HITL

Request approval before any action that is:

Destructive — deleting files, dropping database tables, revoking credentials
Irreversible — sending emails, publishing posts, executing financial transactions
Expensive — spawning large compute jobs, calling premium APIs with high token cost
Sensitive — accessing PII, modifying security settings, changing permissions
External — calling third-party APIs that have side effects (webhooks, payments)

If the agent's security tier is paranoid, every tool invocation goes through HITL. At strict, destructive and external actions require approval. At balanced and below, HITL is opt-in per tool or workflow.
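As a rough illustration of the tier rule in the paragraph above, the sketch below maps a security tier and an action category to a yes/no "needs approval" answer. The `Tier` and `ActionCategory` types and the `requiresApproval` function are hypothetical names for this example, not AgentOS APIs, and the `optedIn` flag stands in for whatever per-tool or per-workflow opt-in the real configuration uses.

```typescript
// Hypothetical types for illustration; AgentOS may model these differently.
type Tier = 'dangerous' | 'permissive' | 'balanced' | 'strict' | 'paranoid';
type ActionCategory =
  | 'destructive'
  | 'irreversible'
  | 'expensive'
  | 'sensitive'
  | 'external'
  | 'read-only';

function requiresApproval(
  tier: Tier,
  category: ActionCategory,
  optedIn = false, // assumed stand-in for a per-tool or per-workflow opt-in
): boolean {
  switch (tier) {
    case 'paranoid':
      return true; // every tool invocation goes through HITL
    case 'strict':
      // destructive and external actions always require approval
      return category === 'destructive' || category === 'external' || optedIn;
    default:
      return optedIn; // balanced and below: HITL is opt-in
  }
}
```

Under these assumptions, `requiresApproval('strict', 'external')` is `true`, while `requiresApproval('balanced', 'destructive')` is `true` only when the tool has opted in.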

The Six HITL Handlers

Import handlers from the top-level namespace:

typescript

import { hitl } from '@framers/agentos';

hitl.autoApprove()

Always approves. Use only in development, testing, or when the security tier is permissive/dangerous and you trust all tool inputs.

hitl.autoReject(reason?)

Always denies with an optional reason string. Useful for locking down specific tools entirely.

hitl.cli()

Prompts the human operator in the terminal for a yes/no decision. This is the default handler when running `wunderland chat` interactively.

hitl.webhook(url)

POSTs the approval request to an external URL and waits for a JSON response of the form `{ approved: boolean, reason?: string }`. Use for custom dashboards or external approval systems.
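On the receiving side of such a webhook, the response body is untrusted input, so it helps to validate it before acting on it. The sketch below checks a raw body against the `{ approved: boolean, reason?: string }` shape described above. The `parseApprovalResponse` helper is a hypothetical name, and treating any malformed response as a rejection is an assumption made for this example rather than documented AgentOS behavior.

```typescript
// Shape described in the text above.
interface ApprovalResponse {
  approved: boolean;
  reason?: string;
}

// Hypothetical helper: validates an untrusted JSON body and fails closed.
function parseApprovalResponse(body: string): ApprovalResponse {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return { approved: false, reason: 'Malformed JSON in approval response' };
  }
  if (typeof data !== 'object' || data === null) {
    return { approved: false, reason: 'Approval response is not an object' };
  }
  const { approved, reason } = data as Record<string, unknown>;
  if (typeof approved !== 'boolean') {
    return { approved: false, reason: 'Missing boolean "approved" field' };
  }
  // Keep the reason only when it is actually a string.
  return typeof reason === 'string' ? { approved, reason } : { approved };
}
```

Failing closed means a misbehaving approval service cannot accidentally green-light an action by returning something like `{ "approved": "yes" }`.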

hitl.slack({ channel, token })

Sends an approval request to a Slack channel and waits for a reaction or thread reply. Note that in v1 the handler defaults to auto-approving after the notification is sent.

hitl.llmJudge({ model?, provider?, criteria?, confidenceThreshold?, fallback?, apiKey? })

Routes the approval decision through an LLM. The judge evaluates the pending action against the provided `criteria` string and returns approve/reject with a confidence score. When the confidence is below `confidenceThreshold` (default 0.7), the judge defers to `fallback` (default: auto-reject).

Usage in agency():

typescript

agency({
  hitl: {
    handler: hitl.llmJudge({
      model: 'gpt-4o-mini',
      criteria: 'Is this action safe and relevant to the user request?',
      confidenceThreshold: 0.7,
    }),
  },
});

Usage in CLI:

bash

wunderland chat --llm-judge

Usage in agent.config.json:

json

{
  "hitl": {
    "mode": "llm-judge"
  }
}
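The threshold-and-fallback behavior described for `hitl.llmJudge` can be pictured with the sketch below. The `JudgeVerdict`, `JudgeDecision`, and `JudgeFallback` types and the `resolveVerdict` function are hypothetical, since the real judge result type is not shown in this document. Only the 0.7 default and the auto-reject default come from the text.

```typescript
// Hypothetical shapes; the real llmJudge result type is not shown in this document.
interface JudgeVerdict {
  approved: boolean;
  confidence: number; // expected to be in the range 0..1
}
interface JudgeDecision {
  approved: boolean;
  reason: string;
}
type JudgeFallback = (verdict: JudgeVerdict) => JudgeDecision;

// Default fallback per the text: reject when the judge is unsure.
const autoRejectFallback: JudgeFallback = (v) => ({
  approved: false,
  reason: `Judge confidence ${v.confidence} is below the threshold`,
});

function resolveVerdict(
  verdict: JudgeVerdict,
  confidenceThreshold = 0.7,
  fallback: JudgeFallback = autoRejectFallback,
): JudgeDecision {
  if (verdict.confidence < confidenceThreshold) {
    return fallback(verdict); // low confidence: defer to the fallback
  }
  return {
    approved: verdict.approved,
    reason: `Judge decided with confidence ${verdict.confidence}`,
  };
}
```

Passing a different `fallback`, for example one that forwards the request to a human reviewer, changes what happens to low-confidence verdicts without touching the threshold check.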

Guardrail Overrides

When `guardrailOverride` is `true` (the default), guardrails run after HITL approval and can veto actions that passed the approval gate. This provides defense in depth: even if a human or LLM judge approves an action, built-in safety checks still apply.

Built-in post-approval guardrail checks:

code-safety: detects destructive shell patterns (`rm -rf /`, `DROP TABLE`, `FORMAT C:`)
pii-redaction: detects SSNs, credit card numbers, and other PII in tool arguments

Even auto-approved actions (via `hitl.autoApprove()`) are checked when `guardrailOverride` is enabled.
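To make the idea of a post-approval check concrete, here is a minimal sketch of a pattern-based scan in the spirit of code-safety. The three regular expressions only cover the examples named above and are illustrative. The actual rule set, and the `checkCodeSafety` and `GuardrailResult` names, are assumptions for this example.

```typescript
// Illustrative patterns only; the real code-safety rule set is not listed here.
const DESTRUCTIVE_PATTERNS: { name: string; pattern: RegExp }[] = [
  // Matches a recursive delete of the filesystem root, not of a subdirectory.
  { name: 'rm -rf /', pattern: /\brm\s+-rf\s+\/(\s|$)/ },
  { name: 'DROP TABLE', pattern: /\bDROP\s+TABLE\b/i },
  { name: 'FORMAT C:', pattern: /\bFORMAT\s+C:/i },
];

interface GuardrailResult {
  allowed: boolean;
  matched: string[]; // names of the patterns that fired
}

function checkCodeSafety(command: string): GuardrailResult {
  const matched = DESTRUCTIVE_PATTERNS
    .filter(({ pattern }) => pattern.test(command))
    .map(({ name }) => name);
  return { allowed: matched.length === 0, matched };
}
```

A check like this runs regardless of who approved the action, which is why an auto-approved `DROP TABLE users;` would still be vetoed.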

Disable guardrail overrides:

typescript

// In API
agency({ hitl: { guardrailOverride: false } });

bash

# In CLI
wunderland chat --no-guardrail-override

In agent.config.json:

json

{ "hitl": { "guardrailOverride": false } }

humanNode in Graph Orchestration

When building agent graphs with AgentOS orchestration, use `humanNode()` to insert approval gates:

typescript

import { humanNode } from '@framers/agentos/orchestration';

humanNode({
  prompt: 'Deploy to production?',
  timeout: 300000,           // 5 minutes
  onTimeout: 'reject',       // what happens when timeout expires
});

humanNode Options

Option	Type	Description
`prompt`	`string`	The question shown to the approver
`autoAccept`	`boolean`	Skip the human and always approve
`autoReject`	`boolean`	Always deny (with optional `reason`)
`judge`	`{ model, criteria, confidenceThreshold }`	Delegate the decision to an LLM judge
`onTimeout`	`'accept' | 'reject' | 'error'`	Behavior when `timeout` expires
`timeout`	`number`	Milliseconds before `onTimeout` fires

LLM judge in a graph node:

typescript

humanNode({
  prompt: 'Deploy to production?',
  judge: {
    model: 'gpt-4o-mini',
    criteria: 'Is this deployment safe given the current test results?',
    confidenceThreshold: 0.8,
  },
  onTimeout: 'reject',
  timeout: 300000,
});
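The interaction between `timeout` and `onTimeout` in the options table can be sketched as a race between the pending approval and a timer. The `withTimeout` helper and the `NodeDecision` type below are hypothetical and only illustrate the three documented `onTimeout` values. How `humanNode()` implements this internally is not covered by this document.

```typescript
type OnTimeout = 'accept' | 'reject' | 'error';

// Hypothetical decision shape for this sketch.
interface NodeDecision {
  approved: boolean;
  reason?: string;
}

// Hypothetical helper: races a pending approval against a timer.
function withTimeout(
  pending: Promise<NodeDecision>,
  timeoutMs: number,
  onTimeout: OnTimeout,
): Promise<NodeDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<NodeDecision>((resolve, reject) => {
    timer = setTimeout(() => {
      if (onTimeout === 'error') {
        reject(new Error(`Approval timed out after ${timeoutMs} ms`));
      } else {
        resolve({ approved: onTimeout === 'accept', reason: 'Timed out' });
      }
    }, timeoutMs);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([pending, timedOut]).finally(() => clearTimeout(timer));
}
```

With `onTimeout: 'reject'`, an approver who never answers produces a denial after the timeout instead of leaving the graph blocked indefinitely.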

The Approval Flow

The full execution path for any HITL-gated action:

Tool invocation requested: the agent wants to call a tool
HITL decision: the configured handler (human, LLM judge, auto) evaluates the request
Guardrail check: if `guardrailOverride` is true, post-approval guardrails scan the action
Execute or deny: the tool runs only if both HITL and guardrails approve

If either the HITL handler or a guardrail rejects, the agent receives a denial message with a reason and can adjust its approach.
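The four steps above can be sketched as a single gating function. Everything here (`ToolCall`, `GateDecision`, `HitlHandler`, `Guardrail`, `gatedExecute`) is a hypothetical name chosen to mirror the flow. It is a sketch under those assumptions, not the AgentOS implementation.

```typescript
// Hypothetical types; they mirror the four steps, not the AgentOS internals.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}
interface GateDecision {
  approved: boolean;
  reason?: string;
}
type HitlHandler = (call: ToolCall) => Promise<GateDecision>;
type Guardrail = (call: ToolCall) => GateDecision;

interface GateOptions {
  handler: HitlHandler;
  guardrails: Guardrail[];
  guardrailOverride?: boolean; // defaults to true, as in the text
}

type GateResult<T> = { ok: true; result: T } | { ok: false; reason: string };

async function gatedExecute<T>(
  call: ToolCall, // step 1: the requested tool invocation
  execute: (call: ToolCall) => Promise<T>,
  { handler, guardrails, guardrailOverride = true }: GateOptions,
): Promise<GateResult<T>> {
  // Step 2: HITL decision
  const decision = await handler(call);
  if (!decision.approved) {
    return { ok: false, reason: decision.reason ?? 'Rejected by HITL handler' };
  }
  // Step 3: post-approval guardrails can still veto
  if (guardrailOverride) {
    for (const guardrail of guardrails) {
      const verdict = guardrail(call);
      if (!verdict.approved) {
        return { ok: false, reason: verdict.reason ?? 'Vetoed by guardrail' };
      }
    }
  }
  // Step 4: execute only after both gates pass
  return { ok: true, result: await execute(call) };
}
```

Returning a reason on every denial is what lets the agent adjust its approach instead of retrying the same call blindly.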

Choosing the Right Handler

Scenario	Recommended Handler
Development / testing	`hitl.autoApprove()`
Interactive CLI session	`hitl.cli()`
Production with human oversight	`hitl.webhook(url)` or `hitl.slack(...)`
High-volume autonomous agent	`hitl.llmJudge(...)`
Locked-down tool	`hitl.autoReject('Tool disabled')`

Security Tier Interaction

Dangerous / Permissive — HITL is opt-in; most tools auto-approve
Balanced — HITL gates destructive tools (file delete, shell execute with dangerous patterns)
Strict — HITL gates all external and write tools; only read-only tools skip approval
Paranoid — every tool invocation goes through HITL, no exceptions

Set the security tier in `agent.config.json`:

json

{
  "security": {
    "tier": "balanced"
  }
}

Or programmatically:

typescript

import { SecurityTiers } from '@framers/agentos/safety/runtime';
agency({ security: { tier: SecurityTiers.BALANCED } });

Best Practices

Default to guardrailOverride: true — defense-in-depth catches what humans miss
Use LLM judge for high-volume flows — humans cannot review hundreds of requests per minute
Set meaningful criteria — vague criteria like "is this ok?" produce unreliable judge decisions
Always set onTimeout — hanging approval gates block the entire agent pipeline
Combine with PII redaction — ensure tool arguments are scanned for leaked secrets before execution
Log all decisions — HITL decisions are audit-logged; review them periodically for pattern analysis
Escalate on low confidence — configure the LLM judge fallback to escalate to a human when confidence is low rather than auto-rejecting
