modeio-guardrail
Run safety checks for instructions and skill repos
Use this skill to gate risky operations behind a real-time safety assessment, or to scan third-party skill repos before installation.
Tool routing
- For executable instructions, use the `scripts/safety.py` backend-powered flow.
- For requests like "scan this skill repo" or "is this repo dangerous", run the Skill Safety Assessment contract at `prompts/static_repo_scan.md`.
- Skill Safety Assessment is static analysis only. Never execute code, install dependencies, or run hooks in the target repository.
- For Skill Safety Assessment, run the deterministic script evaluation (`evaluate`) first, then pass its highlights into the prompt contract.
Dependencies
- `requests` is required for `scripts/safety.py` because it makes backend API calls.
- `scripts/skill_safety_assessment.py` does not require `requests` for basic local repository evaluation.
- For repo-local setup from the repo root:

```bash
python scripts/bootstrap_env.py
python scripts/doctor_env.py
```
Instruction safety execution policy
- Always run `scripts/safety.py` with `--json` for structured output.
- Run the check before executing the instruction, not after.
- Each instruction must trigger a fresh backend call. Do not reuse cached or historical results.
- For any state-changing instruction (`delete`, `overwrite`, `permission change`, `deploy`, `schema change`), always pass both `--context` and `--target`.
- `scripts/safety.py` accepts `--context` and `--target` as optional flags, so this requirement is enforced by policy, not by automatic CLI blocking.
- Use the Context Contract below exactly. Do not send a free-form `--context` value such as `"production"` alone.
- If policy-required context or target is missing, treat the instruction as unverified and ask for the missing fields before execution.
- If an instruction contains multiple operations, check the riskiest one.
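The policy above can be sketched as a small argv builder that refuses to produce a state-changing call unless both `--context` and `--target` are present. This is a minimal illustration, not part of the skill: the `operation` label, the `STATE_CHANGING` set, and the name `build_safety_argv` are hypothetical; only the `-i`, `-c`, `-t`, and `--json` flags come from the documented CLI.

```python
import json
import shlex

# Hypothetical labels for the state-changing operations named in the policy.
STATE_CHANGING = {"delete", "overwrite", "permission change", "deploy", "schema change"}


def build_safety_argv(instruction, operation, context=None, target=None):
    """Build argv for one fresh scripts/safety.py call.

    `operation` is a caller-supplied label for the riskiest operation in the
    instruction; scripts/safety.py itself does not take this argument.
    """
    if not instruction.strip():
        raise ValueError("instruction must not be empty or whitespace-only")
    if operation in STATE_CHANGING and (context is None or not target):
        # The CLI treats -c/-t as optional, so the policy is enforced here.
        raise ValueError("state-changing instruction: ask the user for the missing context/target")
    argv = ["python", "scripts/safety.py", "-i", instruction]
    if context is not None:
        argv += ["-c", json.dumps(context, separators=(",", ":"))]
    if target:
        argv += ["-t", target]
    argv.append("--json")  # always request the structured envelope
    return argv


if __name__ == "__main__":
    cmd = build_safety_argv("List all running containers", operation="read-only")
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` (without `shell=True`) avoids shell-quoting problems with the JSON context string.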
Context contract (policy-required for state-changing instructions)
Pass `--context` as a JSON string with this exact shape:

```json
{
  "environment": "local-dev|ci|staging|production|unknown",
  "operation_intent": "read-only|cleanup|maintenance|migration|permission-change|destructive|unknown",
  "scope": "single-resource|bounded-batch|broad|unknown",
  "data_sensitivity": "public|internal|sensitive|regulated|unknown",
  "rollback": "easy|partial|none|unknown",
  "change_control": "ticket:<id>|approved-manual|none|unknown"
}
```

Rules:

- Include all six keys. If a value is unknown, set it to `unknown` instead of omitting the key.
- `--target` must be a concrete resource identifier (absolute file path, table name, service name, or URL). Avoid generic targets such as `"database"`.
- For a file deletion request that should usually be allowed, use `environment=local-dev|ci`, `operation_intent=cleanup`, `scope=single-resource`, `data_sensitivity=public|internal`, and `rollback=easy`.
- If those conditions are not met, expect stricter output (`approved=false` or a higher `risk_level`) and require explicit user confirmation.
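A minimal sketch of building the `--context` string, assuming the contract above is the complete list of allowed values and treating any non-empty, space-free `<id>` after `ticket:` as valid (the contract does not define the ID format). `context_json` and `expect_lenient` are hypothetical helper names; `expect_lenient` only predicts the usually-allowed deletion profile, and the backend still makes the actual decision.

```python
import json
import re

# Allowed values copied from the Context contract above.
ALLOWED = {
    "environment": {"local-dev", "ci", "staging", "production", "unknown"},
    "operation_intent": {"read-only", "cleanup", "maintenance", "migration",
                         "permission-change", "destructive", "unknown"},
    "scope": {"single-resource", "bounded-batch", "broad", "unknown"},
    "data_sensitivity": {"public", "internal", "sensitive", "regulated", "unknown"},
    "rollback": {"easy", "partial", "none", "unknown"},
}
# Assumption: any non-empty, space-free ticket ID is acceptable.
CHANGE_CONTROL = re.compile(r"ticket:\S+|approved-manual|none|unknown")
KEYS = [*ALLOWED, "change_control"]


def context_json(**fields):
    """Return a compact --context string with all six keys; missing keys become "unknown"."""
    extra = set(fields) - set(KEYS)
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    ctx = {key: fields.get(key, "unknown") for key in KEYS}
    for key, allowed in ALLOWED.items():
        if ctx[key] not in allowed:
            raise ValueError(f"{key}={ctx[key]!r} is not one of {sorted(allowed)}")
    if not CHANGE_CONTROL.fullmatch(ctx["change_control"]):
        raise ValueError(f"malformed change_control: {ctx['change_control']!r}")
    return json.dumps(ctx, separators=(",", ":"))


def expect_lenient(ctx):
    """True if a file deletion matches the profile the rules say is usually allowed."""
    return (ctx["environment"] in {"local-dev", "ci"}
            and ctx["operation_intent"] == "cleanup"
            and ctx["scope"] == "single-resource"
            and ctx["data_sensitivity"] in {"public", "internal"}
            and ctx["rollback"] == "easy")
```

Filling absent keys with `unknown`, rather than dropping them, follows the first rule; anything outside the enumerated values fails loudly instead of being sent as free-form context.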
Action policy
This table applies to `scripts/safety.py` responses. Use the result to gate execution. Never silently ignore a safety check result.

| | Agent action |
|---|---|---|
| | Proceed. No user prompt needed. |
| | Proceed. Mention the risk and recommendation to the user. |
| | Warn user with |
| | Block execution. Show |
| | Block execution. Show full assessment. Require user to explicitly acknowledge the risk before proceeding. |
Additional signals:
- `is_destructive: true` combined with `is_reversible: false`: always surface the recommendation to the user, regardless of approval status.
- If the safety check itself fails (network error, API error), warn the user that safety could not be verified. Do not silently proceed with unverified instructions.
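One way to apply these signals, sketched under loud assumptions: `gate` and its action labels are hypothetical, and because the per-`risk_level` rows of the table above are not reproduced here, the sketch conservatively treats every non-approved result as needing explicit user confirmation.

```python
def gate(envelope):
    """Return (action, messages) for one parsed scripts/safety.py --json envelope.

    Simplified sketch: it applies only the prose rules, not the full table.
    """
    if envelope.get("success") is not True:
        # A failed check is never treated as a pass.
        return "ask_user", ["Safety could not be verified. Proceed only if the user says so."]
    data = envelope.get("data") or {}
    messages = []
    if data.get("is_destructive") is True and data.get("is_reversible") is False:
        # Surface the recommendation regardless of approval status.
        messages.append(f"Recommendation: {data.get('recommendation')}")
    if data.get("approved") is not True:  # null or missing approved counts as false
        messages.extend(data.get("concerns") or [])
        return "require_confirmation", messages
    return "proceed", messages
```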
Scripts
scripts/safety.py
- `-i, --input`: required; the instruction text to evaluate (whitespace-only input is rejected)
- `-c, --context`: policy-required for state-changing instructions (the CLI accepts it as optional); a JSON string following the Context Contract above
- `-t, --target`: policy-required for state-changing instructions (the CLI accepts it as optional); a concrete operation target (file path, table name, service name, URL)
- `--json`: output the unified JSON envelope for machine consumption
- Endpoint: `https://safety-cf.modeio.ai/api/cf/safety` (override via `SAFETY_API_URL`)
- Retries: automatic retry on HTTP 502/503/504 and connection/timeout errors (up to 2 retries with exponential backoff)
- Request timeout: 60 seconds per attempt

```bash
python scripts/safety.py -i "Delete /tmp/cache/build-123.log" \
  -c '{"environment":"local-dev","operation_intent":"cleanup","scope":"single-resource","data_sensitivity":"internal","rollback":"easy","change_control":"none"}' \
  -t "/tmp/cache/build-123.log" --json

python scripts/safety.py -i "DROP TABLE users" \
  -c '{"environment":"production","operation_intent":"destructive","scope":"broad","data_sensitivity":"regulated","rollback":"none","change_control":"ticket:DB-9021"}' \
  -t "postgres://prod/maindb.users" --json

python scripts/safety.py -i "chmod 777 /etc/passwd" \
  -c '{"environment":"production","operation_intent":"permission-change","scope":"single-resource","data_sensitivity":"regulated","rollback":"partial","change_control":"ticket:SEC-118"}' \
  -t "/etc/passwd" --json

python scripts/safety.py -i "List all running containers and display their resource usage" --json
```
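The documented retry behaviour can be illustrated with the sketch below. It is not the code inside `scripts/safety.py`: the injected `send` function, the `RetryableError` exception, and the 1-second backoff base are assumptions made so that the stated policy (2 retries, exponential backoff, 60-second per-attempt timeout, `SAFETY_API_URL` override) can be shown and tested without network access.

```python
import os
import time

DEFAULT_ENDPOINT = "https://safety-cf.modeio.ai/api/cf/safety"
MAX_RETRIES = 2         # retries after the first attempt
TIMEOUT_SECONDS = 60    # per attempt


class RetryableError(Exception):
    """Raised by `send` for HTTP 502/503/504, connection errors, or timeouts (assumed contract)."""


def endpoint():
    """SAFETY_API_URL, when set, overrides the default endpoint."""
    return os.environ.get("SAFETY_API_URL", DEFAULT_ENDPOINT)


def call_with_retries(send, base_delay=1.0, sleep=time.sleep):
    """Call send(url, timeout), retrying retryable failures with delays base_delay * 2**n."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return send(endpoint(), TIMEOUT_SECONDS)
        except RetryableError:
            if attempt == MAX_RETRIES:
                raise  # retries exhausted: surface the failure, never assume safety
            sleep(base_delay * 2 ** attempt)
```

Injecting `sleep` keeps the backoff observable in tests; in real use the default `time.sleep` applies.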
scripts/skill_safety_assessment.py
- `evaluate`: authoritative v2 layered evaluator with deterministic evidence IDs, integrity fingerprinting, and risk scoring
  - Native first-layer gate: a GitHub metadata/README/issue-search precheck runs by default and hard-rejects on high-risk attack-demo/malware signals before the local file scan.
- `scan`: compatibility alias to `evaluate` for existing automation
- `prompt`: renders the prompt payload with script highlights and structured scan JSON
- `validate`: validates model output against scan evidence IDs (`evidence_refs`), required highlights, and score/decision consistency checks
- `adjudicate`: context-aware LLM adjudication bridge (prompt generation + merging decisions back into the deterministic score/decision)
Context profile (optional, no user identity required):
```json
{
  "environment": "local-dev|ci|staging|production|unknown",
  "execution_mode": "read-only|build-test|install|deploy|mutating|unknown",
  "risk_tolerance": "strict|balanced|permissive",
  "data_sensitivity": "public|internal|sensitive|regulated|unknown"
}
```
```bash
# 1) Deterministic layered evaluation (v2)
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile '{"environment":"ci","execution_mode":"build-test","risk_tolerance":"balanced","data_sensitivity":"internal"}' --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --github-osint-timeout 8 --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile-file ./context_profile.json --output /tmp/skill_scan.json --json

# (compat) legacy alias still supported
python scripts/skill_safety_assessment.py scan --target-repo /path/to/repo --json > /tmp/skill_scan.json

# 2) Build prompt payload with highlights + full findings (recommended for strict evidence_refs linking)
python scripts/skill_safety_assessment.py prompt --target-repo /path/to/repo --scan-file /tmp/skill_scan.json --include-full-findings

# 3) Validate model output for evidence linkage + integrity
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --json

# --rescan-on-validate requires --target-repo
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --target-repo /path/to/repo --rescan-on-validate --json

# 4) Optional adjudication bridge (LLM interprets context, engine keeps deterministic control)
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json --assessment-file /tmp/adjudication.json --json
```
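For automation, steps 1 to 3 can be wrapped as below. This is a hypothetical driver (`pipeline` and `run_step` are illustrative names) that reuses only flags shown in the commands above. The model, not the script, writes the assessment file between `prompt` and `validate`, so the steps are run one at a time rather than chained blindly.

```python
import subprocess
import sys

SCRIPT = "scripts/skill_safety_assessment.py"


def pipeline(target_repo, scan_file="/tmp/skill_scan.json",
             assessment_file="/tmp/assessment.md"):
    """Return the evaluate, prompt, and validate commands (steps 1-3) in order."""
    return [
        # 1) deterministic evaluation, written to scan_file
        [sys.executable, SCRIPT, "evaluate", "--target-repo", target_repo,
         "--output", scan_file, "--json"],
        # 2) prompt payload for the model; the model then writes assessment_file
        [sys.executable, SCRIPT, "prompt", "--target-repo", target_repo,
         "--scan-file", scan_file, "--include-full-findings"],
        # 3) validate the model's assessment against the scan evidence
        [sys.executable, SCRIPT, "validate", "--scan-file", scan_file,
         "--assessment-file", assessment_file, "--json"],
    ]


def run_step(cmd):
    """Run one step and return stdout; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

A typical flow would run `run_step(cmds[0])`, hand `run_step(cmds[1])` to the model, save its answer to the assessment file, and only then run `run_step(cmds[2])`.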
Output contract
Success response (`--json`)

```json
{
  "success": true,
  "tool": "modeio-guardrail",
  "mode": "api",
  "data": {
    "approved": false,
    "risk_level": "critical",
    "risk_types": ["data loss"],
    "concerns": ["Irreversible destructive operation targeting all user data"],
    "recommendation": "Create a backup before deletion. Use staged rollback plan.",
    "is_destructive": true,
    "is_reversible": false
  }
}
```

Response fields in `data`:

| Field | Type | Values | Meaning |
|---|---|---|---|
| `approved` | boolean | `true`, `false` | Whether execution is recommended |
| `risk_level` | string | e.g. `critical` | Severity of identified risks |
| `risk_types` | array of strings | open-ended | Risk categories (e.g., `data loss`) |
| `concerns` | array of strings | open-ended | Specific risk points in natural language |
| `recommendation` | string | open-ended | Suggested safer alternative or mitigation |
| `is_destructive` | boolean | `true`, `false` | Whether the action involves destruction (deletion, overwrite, system modification) |
| `is_reversible` | boolean | `true`, `false` | Whether the action can be rolled back |

Any field may be `null` if the backend could not determine it. Treat a `null` `approved` as `false`.
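A minimal parser for the success envelope, assuming Python 3.8+ and the field list above. `SafetyResult` and `parse_success` are hypothetical names; the only normalization applied is the documented one (a `null` `approved` becomes `false`), and anything that is not a success envelope is rejected as unverified.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SafetyResult:
    approved: bool                         # a null "approved" is normalized to False
    risk_level: Optional[str]
    risk_types: List[str] = field(default_factory=list)
    concerns: List[str] = field(default_factory=list)
    recommendation: Optional[str] = None
    is_destructive: Optional[bool] = None  # None: the backend could not determine it
    is_reversible: Optional[bool] = None


def parse_success(raw):
    """Parse a scripts/safety.py --json success envelope into a SafetyResult."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict) or envelope.get("success") is not True:
        raise ValueError("not a success envelope; treat the instruction as unverified")
    data = envelope.get("data")
    if not isinstance(data, dict):
        raise ValueError("missing data object; treat the instruction as unverified")
    return SafetyResult(
        approved=data.get("approved") is True,
        risk_level=data.get("risk_level"),
        risk_types=data.get("risk_types") or [],
        concerns=data.get("concerns") or [],
        recommendation=data.get("recommendation"),
        is_destructive=data.get("is_destructive"),
        is_reversible=data.get("is_reversible"),
    )
```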
Failure envelope (`--json`)

```json
{
  "success": false,
  "tool": "modeio-guardrail",
  "mode": "api",
  "error": {
    "type": "network_error",
    "message": "safety request failed: ConnectionError"
  }
}
```

Error types: `validation_error` (empty input), `dependency_error` (missing local package such as `requests`), `network_error` (HTTP/connection failure), `api_error` (backend returned an error payload). The exit code is non-zero on any failure.
Failure policy
Safety verification failures must never be silently ignored.
- Network/API error: Tell the user the safety check could not be completed. Present the original instruction and ask whether to proceed without verification.
- Validation error (empty input): Fix the input and retry before executing anything.
- Unexpected response (null or missing fields): Treat as unverified. Warn the user.
- Never assume an instruction is safe because the check failed to run.
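The failure rules above can be sketched as a classifier over the process exit code and the JSON envelope. `classify` and `check_instruction` are hypothetical helpers. `dependency_error` is not named in the policy, so the sketch falls back to treating it (and any unknown error type) as unverified, which matches the rule that a failed check never implies safety. A `verified` status only means the check ran; gating still follows the Action policy.

```python
import json
import subprocess


def classify(returncode, envelope):
    """Map an exit code and a parsed --json envelope to (status, note)."""
    if returncode == 0 and envelope.get("success") is True:
        data = envelope.get("data")
        # This sketch only checks "approved" for null/missing values.
        if not isinstance(data, dict) or data.get("approved") is None:
            return "unverified", "unexpected response: null or missing fields"
        return "verified", None
    error = envelope.get("error") or {}
    kind, message = error.get("type"), error.get("message", "")
    if kind in ("network_error", "api_error"):
        # Tell the user, show the original instruction, ask whether to proceed unverified.
        return "ask_user", f"safety check could not be completed: {message}"
    if kind == "validation_error":
        return "fix_and_retry", message  # e.g. empty input: fix it, then re-run the check
    return "unverified", message or "safety check failed"


def check_instruction(argv):
    """Run a scripts/safety.py command (argv built by the caller) and classify the result."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    try:
        envelope = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return "unverified", "no parseable JSON on stdout"
    if not isinstance(envelope, dict):
        return "unverified", "unexpected JSON shape"
    return classify(proc.returncode, envelope)
```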
Skill Safety Assessment policy (static prompt contract)
- Use `prompts/static_repo_scan.md` as the strict contract.
- Run `scripts/skill_safety_assessment.py evaluate` first (or the `scan` compatibility alias) and pass its highlights into the prompt input.
- When model output must include strict `evidence_refs`, render the prompt input with `--include-full-findings` so that scan evidence IDs and snippets are available in `SCRIPT_SCAN_JSON`.
- Every finding must include `path:line` evidence, an exact snippet quote, and `evidence_refs` linked to scan evidence IDs.
- Always include every required highlight evidence ID from the scan output in the final findings.
- Keep the decision and score consistent with the severity of the referenced evidence and with coverage constraints.
- Use `adjudicate` when context interpretation is required (docs/examples/tests vs. runtime/install paths).
- Return one of: `reject`, `caution`, or `approve`.
- If coverage is partial or evidence is insufficient, return `caution` with an explicit coverage note.
- Include a prioritized remediation plan so users can fix issues and re-scan quickly.
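Several of these rules can be checked mechanically before running `validate`. The sketch below assumes a hypothetical finding shape (`path_line`, `snippet`, `evidence_refs` keys) and plain lists of evidence IDs; the real scan JSON and the `validate` subcommand may structure these differently and also check score/decision consistency, which this sketch does not attempt.

```python
DECISIONS = {"reject", "caution", "approve"}


def check_assessment(findings, decision, scan_evidence_ids, required_highlight_ids):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if decision not in DECISIONS:
        problems.append(f"decision {decision!r} is not one of {sorted(DECISIONS)}")
    known = set(scan_evidence_ids)
    cited = set()
    for i, finding in enumerate(findings):
        refs = set(finding.get("evidence_refs") or [])
        if not finding.get("path_line") or not finding.get("snippet"):
            problems.append(f"finding {i}: missing path:line evidence or snippet quote")
        if not refs:
            problems.append(f"finding {i}: no evidence_refs")
        unknown = refs - known
        if unknown:
            problems.append(f"finding {i}: unknown evidence IDs {sorted(unknown)}")
        cited |= refs
    missing = set(required_highlight_ids) - cited
    if missing:
        problems.append(f"required highlights not cited: {sorted(missing)}")
    return problems
```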
When not to use
- For PII redaction or anonymization: use `modeio-redact` instead.
- For tasks with no executable instruction or repository target to evaluate (pure discussion, documentation, questions).
- For operations that are clearly read-only (listing files, reading configs, `git status`).
Resources
- `scripts/safety.py`: CLI entry point for instruction safety checks
- `scripts/skill_safety_assessment.py`: CLI entry point for skill repo assessment (evaluate/scan/prompt/validate/adjudicate)
- `prompts/static_repo_scan.md`: Skill Safety Assessment prompt contract
- `ARCHITECTURE.md`: package boundaries and compatibility notes
- `SAFETY_API_URL` env var: optional endpoint override (default: `https://safety-cf.modeio.ai/api/cf/safety`)