hitl-safety
HITL Safety Controls
You have access to AgentOS human-in-the-loop (HITL) safety controls. These gate dangerous or irreversible actions behind an approval step — either a human operator, an LLM judge, or a policy-based auto-decision — before execution proceeds.
When to Use HITL
Request approval before any action that is:
- Destructive — deleting files, dropping database tables, revoking credentials
- Irreversible — sending emails, publishing posts, executing financial transactions
- Expensive — spawning large compute jobs, calling premium APIs with high token cost
- Sensitive — accessing PII, modifying security settings, changing permissions
- External — calling third-party APIs that have side effects (webhooks, payments)
If the agent's security tier is paranoid, every tool invocation goes through HITL. At strict, destructive and external actions require approval. At balanced and below, HITL is opt-in per tool or workflow.
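The tier rules in the paragraph above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not how AgentOS implements tiers: the `Tier` and `ActionKind` types and the `requiresApproval` function are hypothetical names, and the `optedIn` flag stands in for per-tool or per-workflow configuration.

```typescript
// Hypothetical types for illustration; not part of the AgentOS API.
type Tier = 'dangerous' | 'permissive' | 'balanced' | 'strict' | 'paranoid';
type ActionKind =
  | 'destructive'
  | 'irreversible'
  | 'expensive'
  | 'sensitive'
  | 'external'
  | 'read-only';

// Returns true when the action should go through a HITL handler.
function requiresApproval(tier: Tier, kind: ActionKind, optedIn = false): boolean {
  // paranoid: every tool invocation is gated.
  if (tier === 'paranoid') return true;
  // strict: destructive and external actions are gated; anything else is opt-in.
  if (tier === 'strict') return kind === 'destructive' || kind === 'external' || optedIn;
  // balanced and below: HITL is opt-in per tool or workflow.
  return optedIn;
}
```

Under these assumptions, `requiresApproval('strict', 'external')` is gated, while `requiresApproval('balanced', 'destructive')` is gated only when the tool or workflow has opted in.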
The Six HITL Handlers
Import handlers from the top-level namespace:
```typescript
import { hitl } from '@framers/agentos';
```
hitl.autoApprove()
Always approves. Use only in development, testing, or when the security tier is permissive/dangerous and you trust all tool inputs.
hitl.autoReject(reason?)
Always denies with an optional reason string. Useful for locking down specific tools entirely.
hitl.cli()
Prompts the human operator in the terminal for a yes/no decision. This is the default handler when running `wunderland chat` interactively.
hitl.webhook(url)
POSTs the approval request to an external URL and waits for a JSON response with `{ approved: boolean, reason?: string }`. Use for custom dashboards or external approval systems.
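An approval endpoint returns untrusted JSON, so it can help to validate the body against the response shape described above before acting on it. The sketch below is illustrative only: `isApprovalResponse` and `parseApproval` are hypothetical helpers, not AgentOS functions, and treating malformed responses as denials is an assumed policy.

```typescript
// Shape of the JSON body the webhook is expected to return, per the text above.
interface ApprovalResponse {
  approved: boolean;
  reason?: string;
}

// Hypothetical helper: narrows an untrusted parsed body to ApprovalResponse.
function isApprovalResponse(body: unknown): body is ApprovalResponse {
  if (typeof body !== 'object' || body === null) return false;
  const candidate = body as Record<string, unknown>;
  if (typeof candidate['approved'] !== 'boolean') return false;
  return candidate['reason'] === undefined || typeof candidate['reason'] === 'string';
}

// Assumed policy: anything malformed counts as a denial, so a broken
// endpoint cannot approve an action by accident.
function parseApproval(rawJson: string): ApprovalResponse {
  try {
    const body: unknown = JSON.parse(rawJson);
    if (isApprovalResponse(body)) return body;
  } catch {
    // Invalid JSON falls through to the denial below.
  }
  return { approved: false, reason: 'Malformed approval response' };
}
```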
hitl.slack({ channel, token })
Sends an approval request to a Slack channel and waits for a reaction or thread reply. In v1, defaults to auto-approve after notification.
hitl.llmJudge({ model?, provider?, criteria?, confidenceThreshold?, fallback?, apiKey? })
Routes the approval decision through an LLM. The judge evaluates the pending action against the provided criteria string and returns approve/reject with a confidence score. When the confidence is below `confidenceThreshold` (default 0.7), the judge falls back to `fallback` (default: auto-reject).
Usage in agency():
```typescript
agency({
  hitl: {
    handler: hitl.llmJudge({
      model: 'gpt-4o-mini',
      criteria: 'Is this action safe and relevant to the user request?',
      confidenceThreshold: 0.7,
    }),
  },
});
```
Usage in CLI:
```bash
wunderland chat --llm-judge
```
Usage in agent.config.json:
```json
{
  "hitl": {
    "mode": "llm-judge"
  }
}
```
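The threshold-and-fallback behavior described for the LLM judge can be sketched as a pure function. This is an assumption-laden illustration rather than the library's code: `JudgeVerdict`, `Decision`, and `resolveVerdict` are hypothetical names, and the real judge output may have a different shape.

```typescript
// Hypothetical shapes for illustration; the real judge output may differ.
type Decision = 'approve' | 'reject';

interface JudgeVerdict {
  decision: Decision;
  confidence: number; // assumed to be in the range 0..1
}

function resolveVerdict(
  verdict: JudgeVerdict,
  confidenceThreshold = 0.7,
  fallback: () => Decision = () => 'reject', // default mirrors auto-reject
): Decision {
  // Below the threshold, the judge's own answer is set aside in favor of the fallback.
  if (verdict.confidence < confidenceThreshold) return fallback();
  return verdict.decision;
}
```

Passing a different `fallback` is how this sketch would model escalation: for example, a function that forwards the request to a human approver instead of rejecting outright.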
Guardrail Overrides
When `guardrailOverride` is `true` (the default), guardrails run after HITL approval and can veto actions that passed the approval gate. This provides defense-in-depth: even if a human or LLM judge approves an action, built-in safety checks still apply.
Built-in post-approval guardrail checks:
- code-safety: detects destructive shell patterns (`rm -rf /`, `DROP TABLE`, `FORMAT C:`)
- pii-redaction: detects SSNs, credit card numbers, and other PII in tool arguments
Even auto-approved actions (via `hitl.autoApprove()`) are checked when `guardrailOverride` is enabled.
Disable guardrail overrides:
```typescript
// In API
agency({ hitl: { guardrailOverride: false } });
```
```bash
# In CLI
wunderland chat --no-guardrail-override
```
In agent.config.json:
```json
{ "hitl": { "guardrailOverride": false } }
```
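To make the post-approval checks concrete, here is a rough sketch of pattern-based scanning for the two built-in checks named above. The regular expressions, the `checkGuardrails` function, and the `GuardrailResult` type are all assumptions made for illustration; the built-in guardrails are presumably more thorough than this.

```typescript
// Hypothetical patterns for illustration only.
const CODE_SAFETY_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\/(?:\s|$)/, // rm -rf / (root only, not rm -rf ./dir)
  /\bDROP\s+TABLE\b/i,
  /\bFORMAT\s+C:/i,
];

const PII_PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/, // US SSN layout, e.g. 123-45-6789
  /\b(?:\d[ -]?){13,16}\b/, // 13 to 16 digits, optionally separated
];

interface GuardrailResult {
  allowed: boolean;
  violation?: string; // name of the check that vetoed the action
}

// Scans serialized tool arguments and reports the first check that fails.
function checkGuardrails(toolArgs: string): GuardrailResult {
  if (CODE_SAFETY_PATTERNS.some((p) => p.test(toolArgs))) {
    return { allowed: false, violation: 'code-safety' };
  }
  if (PII_PATTERNS.some((p) => p.test(toolArgs))) {
    return { allowed: false, violation: 'pii-redaction' };
  }
  return { allowed: true };
}
```

Simple patterns like these produce false positives (any 13-digit number looks like a card number here), which is one reason to treat this as a sketch of the idea rather than a drop-in check.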
humanNode in Graph Orchestration
When building agent graphs with AgentOS orchestration, use `humanNode()` to insert approval gates:
```typescript
import { humanNode } from '@framers/agentos/orchestration';

humanNode({
  prompt: 'Deploy to production?',
  timeout: 300000, // 5 minutes
  onTimeout: 'reject', // what happens when timeout expires
});
```
humanNode Options
| Option | Type | Description |
|---|---|---|
| `prompt` | | The question shown to the approver |
| | | Skip human, always approve |
| | | Always deny (with optional reason) |
| `judge` | | Delegate decision to an LLM judge |
| `onTimeout` | | Behavior when timeout expires |
| `timeout` | | Milliseconds before `onTimeout` fires |
LLM judge in a graph node:
```typescript
humanNode({
  prompt: 'Deploy to production?',
  judge: {
    model: 'gpt-4o-mini',
    criteria: 'Is this deployment safe given the current test results?',
    confidenceThreshold: 0.8,
  },
  onTimeout: 'reject',
  timeout: 300000,
});
```
The Approval Flow
The full execution path for any HITL-gated action:
- Tool invocation requested: the agent wants to call a tool
- HITL decision: the configured handler (human, LLM judge, auto) evaluates the request
- Guardrail check: if `guardrailOverride` is true, post-approval guardrails scan the action
- Execute or deny: the tool runs only if both HITL and guardrails approve
If either step rejects, the agent receives a denial message with a reason and can adjust its approach.
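The four steps above can be sketched as a single gating function. Everything here is hypothetical scaffolding meant to mirror the described order of operations, not AgentOS internals: `GateVerdict`, `GateCheck`, and `runGatedTool` are invented names, and the handler and guardrail are passed in as plain functions.

```typescript
// Hypothetical types mirroring the steps above.
interface GateVerdict {
  approved: boolean;
  reason?: string;
}
type GateCheck = (toolName: string, args: string) => GateVerdict;

type GateOutcome<T> = { ran: true; result: T } | { ran: false; reason: string };

function runGatedTool<T>(
  toolName: string,
  args: string,
  hitlHandler: GateCheck,
  guardrail: GateCheck,
  guardrailOverride: boolean,
  execute: () => T,
): GateOutcome<T> {
  // Step 2: the configured handler decides first.
  const hitlVerdict = hitlHandler(toolName, args);
  if (!hitlVerdict.approved) {
    return { ran: false, reason: hitlVerdict.reason ?? 'Denied by HITL handler' };
  }
  // Step 3: guardrails can still veto an approved action.
  if (guardrailOverride) {
    const guardVerdict = guardrail(toolName, args);
    if (!guardVerdict.approved) {
      return { ran: false, reason: guardVerdict.reason ?? 'Vetoed by guardrail' };
    }
  }
  // Step 4: the tool runs only after both gates pass.
  return { ran: true, result: execute() };
}
```

Returning the denial reason instead of throwing matches the behavior described above, where the agent receives the reason and can adjust its approach.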
Choosing the Right Handler
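| Scenario | Recommended Handler |
|---|---|
| Development / testing | `hitl.autoApprove()` |
| Interactive CLI session | `hitl.cli()` |
| Production with human oversight | `hitl.webhook(url)` or `hitl.slack({ channel, token })` |
| High-volume autonomous agent | `hitl.llmJudge()` |
| Locked-down tool | `hitl.autoReject(reason?)` |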
Security Tier Interaction
- Dangerous / Permissive — HITL is opt-in; most tools auto-approve
- Balanced — HITL gates destructive tools (file delete, shell execute with dangerous patterns)
- Strict — HITL gates all external and write tools; only read-only tools skip approval
- Paranoid — every tool invocation goes through HITL, no exceptions
Set the security tier in `agent.config.json`:
```json
{
  "security": {
    "tier": "balanced"
  }
}
```
Or programmatically:
```typescript
import { SecurityTiers } from '@framers/agentos/safety/runtime';

agency({ security: { tier: SecurityTiers.BALANCED } });
```
Best Practices
- Default to guardrailOverride: true — defense-in-depth catches what humans miss
- Use LLM judge for high-volume flows — humans cannot review hundreds of requests per minute
- Set meaningful criteria — vague criteria like "is this ok?" produce unreliable judge decisions
- Always set onTimeout — hanging approval gates block the entire agent pipeline
- Combine with PII redaction — ensure tool arguments are scanned for leaked secrets before execution
- Log all decisions — HITL decisions are audit-logged; review them periodically for pattern analysis
- Escalate on low confidence — configure the LLM judge fallback to escalate to a human when confidence is low rather than auto-rejecting