semgrep
Semgrep Static Analysis
Fast, pattern-based static analysis for security scanning and custom rule creation.
When to Use Semgrep
Ideal scenarios:
- Quick security scans (minutes, not hours)
- Pattern-based bug and vulnerability detection
- Enforcing coding standards and best practices
- Finding known vulnerability patterns (OWASP, CWE)
- Creating custom detection rules for your codebase
- Data flow analysis with taint mode
Installation
```bash
# pip (recommended)
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
```
Part 1: Running Scans
Quick Scan
```bash
semgrep --config auto .  # Auto-detect rules
```
Using Rulesets
```bash
semgrep --config p/<RULESET> .                              # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple rulesets
```

| Ruleset | Description |
|---|---|
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |
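Because `--config` can be repeated, a multi-ruleset scan command can be assembled programmatically. The sketch below is a hypothetical helper (`build_scan_command` is not part of Semgrep); it only builds the argument list and does not run the scan.

```python
import shlex


def build_scan_command(rulesets, target="."):
    """Return an argv list with one --config flag per ruleset."""
    if not rulesets:
        raise ValueError("at least one ruleset is required")
    argv = ["semgrep"]
    for name in rulesets:
        argv += ["--config", name]
    argv.append(target)
    return argv


cmd = build_scan_command(["p/security-audit", "p/trailofbits"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues.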
Output Formats
```bash
semgrep --config p/security-audit --sarif -o results.sarif .  # SARIF
semgrep --config p/security-audit --json -o results.json .    # JSON
```
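The JSON report is convenient for post-processing. The following is a minimal sketch that counts findings per severity, assuming the report has a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`. The sample data is made up and stands in for a real `results.json`; `summarize` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize(report_text):
    """Count findings per severity and list (path, line, rule) tuples."""
    report = json.loads(report_text)
    counts = Counter()
    findings = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
        findings.append((result["path"], result["start"]["line"], result["check_id"]))
    return counts, sorted(findings)


# Made-up sample standing in for the contents of results.json
SAMPLE = json.dumps({
    "results": [
        {"check_id": "command-injection", "path": "app.py",
         "start": {"line": 12}, "extra": {"severity": "ERROR"}},
        {"check_id": "hardcoded-password", "path": "app.py",
         "start": {"line": 3}, "extra": {"severity": "ERROR"}},
    ],
    "errors": [],
})

counts, findings = summarize(SAMPLE)
print(dict(counts))
for path, line, rule in findings:
    print(f"{path}:{line} {rule}")
```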
Scan Specific Paths
```bash
semgrep --config p/python app.py                 # Single file
semgrep --config p/javascript src/               # Directory
semgrep --config auto --include='**/test/**' .   # Only paths matching the glob
```
Configuration
.semgrepignore
```
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
```
Suppress False Positives
```python
password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
```
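A bare `# nosemgrep` silences every rule on its line, so it can be worth auditing suppressions periodically. The sketch below is a purely text-based scan (it would also match the marker inside string literals); `find_suppressions` is a hypothetical helper, not a Semgrep feature.

```python
import re

# "# nosemgrep", optionally followed by ": rule-id" or ": id-1, id-2"
NOSEMGREP = re.compile(
    r"#\s*nosemgrep(?::\s*(?P<ids>[\w.\-]+(?:\s*,\s*[\w.\-]+)*))?"
)


def find_suppressions(source):
    """Return [(line_number, rule_ids)]; rule_ids is [] for a bare nosemgrep."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match is None:
            continue
        ids = match.group("ids")
        rule_ids = [part.strip() for part in ids.split(",")] if ids else []
        found.append((lineno, rule_ids))
    return found


SAMPLE = '''\
password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
'''

for lineno, rule_ids in find_suppressions(SAMPLE):
    label = ", ".join(rule_ids) if rule_ids else "ALL RULES (bare suppression)"
    print(f"line {lineno}: {label}")
```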
Part 2: Creating Custom Rules
When to Create Custom Rules
- Detecting project-specific vulnerability patterns
- Enforcing internal coding standards
- Building security checks for custom frameworks
- Creating taint-mode rules for data flow analysis
Approach Selection
| Approach | Use When |
|---|---|
| Taint mode | Data flows from untrusted source to dangerous sink (injection vulnerabilities) |
| Pattern matching | Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values) |
Prioritize taint mode for injection vulnerabilities. Pattern matching alone can't distinguish between `eval(user_input)` (vulnerable) and `eval("safe_literal")` (safe).
Quick Start: Pattern Matching
```yaml
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
```
Quick Start: Taint Mode
```yaml
rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
```
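To make the source, sink, and sanitizer roles concrete, here is a toy taint tracker built on Python's `ast` module. It is only an illustration of the idea and not how Semgrep is implemented: it handles call-style sources only (not `request.form[...]`), walks module-level statements in order, and ignores branches, loops, and function boundaries. All names in it (`SOURCES`, `find_flows`, and so on) are hypothetical.

```python
import ast

SOURCES = {"request.args.get"}   # calls whose result is untrusted
SINKS = {"os.system"}            # calls that must not receive tainted data
SANITIZERS = {"shlex.quote"}     # calls whose result is treated as clean


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_tainted(expr, tainted):
    """True if expr may carry source data, given the tainted variable names."""
    if isinstance(expr, ast.Call):
        name = dotted_name(expr.func)
        if name in SANITIZERS:
            return False
        if name in SOURCES:
            return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    return any(is_tainted(child, tainted) for child in ast.iter_child_nodes(expr))


def find_flows(source):
    """Return line numbers of sink calls that receive a tainted argument."""
    tainted, hits = set(), []
    for stmt in ast.parse(source).body:
        # Check sinks first, using the taint state before this statement.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and dotted_name(node.func) in SINKS:
                if any(is_tainted(arg, tainted) for arg in node.args):
                    hits.append(node.lineno)
        # Then update the taint state for simple "name = value" assignments.
        if isinstance(stmt, ast.Assign):
            value_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if value_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return hits


SAMPLE = '''\
cmd = request.args.get("cmd")
os.system("ls " + cmd)
safe = shlex.quote(cmd)
os.system("ls " + safe)
os.system("ls -la")
'''

print(find_flows(SAMPLE))  # [2]
```

Only line 2 is reported: line 4 passes through the sanitizer and line 5 uses a literal, which is the distinction a purely syntactic `os.system(...)` pattern cannot make.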
Pattern Syntax Quick Reference
| Syntax | Description | Example |
|---|---|---|
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |

| Operator | Description |
|---|---|
| `pattern` | Match exact pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any matches (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside context |
| `pattern-not-inside` | Match only outside context |
| `metavariable-regex` | Regex on captured value |
Testing Rules
Test-first is mandatory. Create test files with annotations:
```python
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
```
Run tests:
```bash
semgrep --test --config rule.yaml test-file
```
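The annotations mark the line directly below them. The sketch below shows one simplified way to read them and compare against reported findings; it is an approximation for illustration, not the logic of `semgrep --test` itself, and the `reported` set is made-up data. `expected_lines` and `compare` are hypothetical helpers.

```python
import re

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.\-]+)")


def expected_lines(test_source):
    """Map each annotation kind to {(rule_id, line)}; line is the one after the comment."""
    expected = {"ruleid": set(), "ok": set()}
    for lineno, line in enumerate(test_source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            expected[kind].add((rule_id, lineno + 1))
    return expected


def compare(expected, reported):
    """Return (missed, incorrect): unmatched ruleid lines and findings on ok lines."""
    missed = expected["ruleid"] - reported
    incorrect = expected["ok"] & reported
    return sorted(missed), sorted(incorrect)


TEST_FILE = '''\
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
'''

expected = expected_lines(TEST_FILE)
# Made-up result: the rule fired on the safe line and missed the vulnerable one.
reported = {("my-rule-id", 9)}
missed, incorrect = compare(expected, reported)
print("missed:", missed)
print("incorrect:", incorrect)
```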
Command Reference
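| Task | Command |
|---|---|
| Run tests | `semgrep --test --config rule.yaml test-file` |
| Validate YAML | `semgrep --validate --config rule.yaml` |
| Dump AST | `semgrep --dump-ast -l <lang> <file>` |
| Debug taint flow | `semgrep --dataflow-traces -f rule.yaml <file>` |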
Rule Creation Workflow
- Analyze the problem - Understand the bug pattern, determine taint vs pattern approach
- Create test cases first - Write `ruleid:` and `ok:` annotations before the rule
- Analyze AST - Run `semgrep --dump-ast` to understand code structure
- Write the rule - Start simple, iterate
- Test until 100% pass - No "missed lines" or "incorrect lines"
- Optimize patterns - Remove redundancies only after tests pass
Output structure:
```
<rule-id>/
├── <rule-id>.yaml   # Semgrep rule
└── <rule-id>.<ext>  # Test file
```
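The output structure above can be scaffolded with a few lines of Python. This is a minimal sketch: `scaffold_rule` is a hypothetical helper, the rule ID and extension are example values, and the files are created empty for you to fill in.

```python
from pathlib import Path
import tempfile


def scaffold_rule(base_dir, rule_id, ext):
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext> as empty files."""
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    for path in (rule_file, test_file):
        path.touch(exist_ok=True)  # Leaves existing content untouched
    return rule_file, test_file


# Demonstrate inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    rule_file, test_file = scaffold_rule(tmp, "command-injection", "py")
    print(rule_file.relative_to(tmp))
    print(test_file.relative_to(tmp))
```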
Detailed References
Official Semgrep Documentation:
- Rule Syntax - Complete YAML structure, operators, and options
- Rule Schema - Full JSON schema specification
Local References:
- Workflow Guide - Complete step-by-step rule creation process
- Quick Reference - Pattern operators and taint components
Anti-Patterns to Avoid
**Too broad:**
```yaml
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)
```
**Missing safe cases:**
```python
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))
```
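The "missing safe cases" anti-pattern is easy to check mechanically. The sketch below counts `ruleid:` and `ok:` annotations per rule ID and lists rules that have no safe case; `rules_missing_safe_cases` is a hypothetical helper for illustration, not a Semgrep command.

```python
import re
from collections import defaultdict

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.\-]+)")


def annotation_counts(test_source):
    """Return {rule_id: {"ruleid": n, "ok": m}} for every annotated rule."""
    counts = defaultdict(lambda: {"ruleid": 0, "ok": 0})
    for kind, rule_id in ANNOTATION.findall(test_source):
        counts[rule_id][kind] += 1
    return dict(counts)


def rules_missing_safe_cases(test_source):
    """List rule IDs that are tested only against vulnerable code."""
    counts = annotation_counts(test_source)
    return sorted(rule for rule, c in counts.items() if c["ok"] == 0)


BAD = '''\
# ruleid: my-rule
dangerous(user_input)
'''

GOOD = BAD + '''\
# ok: my-rule
dangerous(sanitize(user_input))
'''

print(rules_missing_safe_cases(BAD))   # ['my-rule']
print(rules_missing_safe_cases(GOOD))  # []
```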
Rationalizations to Reject
| Shortcut | Why It's Wrong |
|---|---|
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; can't track complex cross-function data flow |
| "The pattern looks complete" | Untested rules have hidden false positives/negatives |
| "It matches the vulnerable case" | Matching vulnerabilities is half the job; verify safe cases don't match |
| "Taint mode is overkill" | For injection vulnerabilities, taint mode gives better precision |
| "One test case is enough" | Include edge cases: different coding styles, sanitized inputs, safe alternatives |
CI/CD Integration
GitHub Actions
```yaml
name: Semgrep

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'  # Monthly, on the first day of the month

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history so the baseline commit is available

      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
```
Resources
Rule Writing:
- Rule Syntax: https://semgrep.dev/docs/writing-rules/rule-syntax
- Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
- Rule Schema: https://github.com/semgrep/semgrep-interfaces/blob/main/rule_schema_v1.yaml
General:
- Registry: https://semgrep.dev/explore
- Playground: https://semgrep.dev/playground
- Docs: https://semgrep.dev/docs/
- Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules