semgrep

Semgrep Static Analysis

Fast, pattern-based static analysis for security scanning and custom rule creation.

When to Use Semgrep

Ideal scenarios:

Quick security scans (minutes, not hours)
Pattern-based bug and vulnerability detection
Enforcing coding standards and best practices
Finding known vulnerability patterns (OWASP, CWE)
Creating custom detection rules for your codebase
Data flow analysis with taint mode

Installation

bash

# pip (recommended)
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src

---

Part 1: Running Scans

Quick Scan

bash

semgrep --config auto .                    # Auto-detect rules

Using Rulesets

bash

semgrep --config p/<RULESET> .             # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple

Ruleset	Description
`p/default`	General security and code quality
`p/security-audit`	Comprehensive security rules
`p/owasp-top-ten`	OWASP Top 10 vulnerabilities
`p/cwe-top-25`	CWE Top 25 vulnerabilities
`p/trailofbits`	Trail of Bits security rules
`p/python`	Python-specific
`p/javascript`	JavaScript-specific
`p/golang`	Go-specific
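
When scans are launched from a script, the repeated `--config` flags can be assembled programmatically. The following is a minimal sketch, not part of Semgrep itself: `build_scan_command` is a hypothetical helper name, and it uses only the `--config`, `--json`, and `-o` flags that appear in this guide.

```python
import shlex


def build_scan_command(rulesets, target=".", json_output=None):
    """Build a semgrep argv list with one --config flag per ruleset.

    Hypothetical helper for illustration; it only assembles arguments
    and does not run Semgrep.
    """
    argv = ["semgrep"]
    for name in rulesets:
        argv += ["--config", name]
    if json_output is not None:
        # Write machine-readable results to a file instead of the terminal.
        argv += ["--json", "-o", json_output]
    argv.append(target)
    return argv


cmd = build_scan_command(["p/security-audit", "p/trailofbits"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` on a machine where Semgrep is installed; building a list rather than a shell string avoids quoting problems in paths.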

Output Formats

bash

semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF
semgrep --config p/security-audit --json -o results.json .     # JSON
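
The JSON output is convenient for post-processing. The sketch below summarizes findings by severity. It assumes the report has a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`; that matches the JSON layout as I understand it, but check the field names against your own `results.json`. The inline `SAMPLE` report is made up for illustration.

```python
import json
from collections import Counter

# Made-up sample standing in for the contents of results.json.
SAMPLE = json.dumps({
    "results": [
        {"check_id": "command-injection", "path": "app.py",
         "start": {"line": 12},
         "extra": {"severity": "ERROR",
                   "message": "User input flows to command execution"}},
        {"check_id": "hardcoded-password", "path": "settings.py",
         "start": {"line": 3},
         "extra": {"severity": "WARNING",
                   "message": "Hardcoded password detected"}},
    ],
    "errors": [],
})


def summarize(report_text):
    """Return (severity counts, 'path:line rule-id' lines) for a report."""
    report = json.loads(report_text)
    counts = Counter()
    locations = []
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
        locations.append(
            f'{finding["path"]}:{finding["start"]["line"]} {finding["check_id"]}'
        )
    return counts, locations


counts, locations = summarize(SAMPLE)
for line in locations:
    print(line)
print(dict(counts))
```

For a real scan, replace `SAMPLE` with the text read from the file written by `--json -o results.json`.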

Scan Specific Paths

bash

semgrep --config p/python app.py           # Single file
semgrep --config p/javascript src/         # Directory
semgrep --config auto --include='**/test/**' .  # Only paths matching the glob (here, test directories)

Configuration

.semgrepignore

tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/

Suppress False Positives

python

password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep

Part 2: Creating Custom Rules

When to Create Custom Rules

Detecting project-specific vulnerability patterns
Enforcing internal coding standards
Building security checks for custom frameworks
Creating taint-mode rules for data flow analysis

Approach Selection

Approach	Use When
Taint mode	Data flows from untrusted source to dangerous sink (injection vulnerabilities)
Pattern matching	Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values)

Prioritize taint mode for injection vulnerabilities. Pattern matching alone can't distinguish between `eval(user_input)` (vulnerable) and `eval("safe_literal")` (safe).
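
To make the limitation concrete, here is a small illustration that uses Python's standard `ast` module rather than Semgrep. It is an analogy, not a description of Semgrep's internals: a purely syntactic search for `eval` calls reports both the risky and the harmless call. It can tell whether the argument is a literal, but it has no way of knowing where a variable's value came from, which is the question taint tracking answers. `find_eval_calls` and `SAMPLE` are names invented for this sketch.

```python
import ast

# Source text to inspect; it is parsed, never executed.
SAMPLE = '''
user_input = input()
eval(user_input)
eval("1 + 1")
'''


def find_eval_calls(source):
    """Return sorted (line, argument_is_literal) pairs for each eval() call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            literal = bool(node.args) and isinstance(node.args[0], ast.Constant)
            hits.append((node.lineno, literal))
    return sorted(hits)


# Both calls match syntactically; only the literal flag differs.
print(find_eval_calls(SAMPLE))
```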

Quick Start: Pattern Matching

yaml

rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"

Quick Start: Taint Mode

yaml

rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
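
A target file for this rule might look like the sketch below. It is written so that it runs without Flask: `request` is a hypothetical stand-in object, and the function names are invented. The `ruleid:` and `ok:` comments mark where I would expect the `command-injection` rule above to report and to stay silent; confirm this by running the rule's tests rather than taking the annotations on trust.

```python
import os
import shlex
from types import SimpleNamespace

# Hypothetical stand-in for a Flask request so the file runs without Flask.
request = SimpleNamespace(args={"host": "example.com; id"}, form={})


def ping_vulnerable():
    # Source: attacker-controlled query parameter.
    host = request.args.get("host")
    # ruleid: command-injection
    os.system("echo " + host)


def ping_sanitized():
    host = request.args.get("host")
    # Sanitizer: shlex.quote() turns the value into a single shell word.
    # ok: command-injection
    os.system("echo " + shlex.quote(host))


# Show what the sanitizer does to the hostile value (nothing is executed here).
print(shlex.quote(request.args.get("host")))
```

The functions are defined but never called, so the file can be scanned or imported without running any shell command.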

Pattern Syntax Quick Reference

Syntax	Description	Example
`...`	Match anything	`func(...)`
`$VAR`	Capture metavariable	`$FUNC($INPUT)`
`<... ...>`	Deep expression match	`<... user_input ...>`

Operator	Description
`pattern`	Match exact pattern
`patterns`	All must match (AND)
`pattern-either`	Any matches (OR)
`pattern-not`	Exclude matches
`pattern-inside`	Match only inside context
`pattern-not-inside`	Match only outside context
`metavariable-regex`	Regex on captured value

Testing Rules

Test-first is mandatory. Create test files with annotations:

python

# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))

Run tests:

bash

semgrep --test --config rule.yaml test-file

Command Reference

Task	Command
Run tests	`semgrep --test --config rule.yaml test-file`
Validate YAML	`semgrep --validate --config rule.yaml`
Dump AST	`semgrep --dump-ast -l <lang> <file>`
Debug taint flow	`semgrep --dataflow-traces -f rule.yaml file`

Rule Creation Workflow

Analyze the problem - Understand the bug pattern, determine taint vs pattern approach
Create test cases first - Write `ruleid:` and `ok:` annotations before the rule
Analyze AST - Run `semgrep --dump-ast` to understand code structure
Write the rule - Start simple, iterate
Test until 100% pass - No "missed lines" or "incorrect lines"
Optimize patterns - Remove redundancies only after tests pass

Output structure:

<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file
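
The layout above can be scaffolded with a few lines of Python. This is a sketch under stated assumptions: `scaffold_rule` and both skeleton templates are invented for illustration, the test skeleton assumes a language with `#` comments, and the `TODO` placeholders must be replaced with real code and a real pattern before `semgrep --test` can pass.

```python
import tempfile
from pathlib import Path

# Hypothetical starting templates; edit them to suit your project.
RULE_SKELETON = """\
rules:
  - id: {rule_id}
    languages: [python]
    message: TODO describe the finding
    severity: ERROR
    pattern: TODO
"""

TEST_SKELETON = """\
# ruleid: {rule_id}
# TODO: replace this line with a vulnerable example

# ok: {rule_id}
# TODO: replace this line with a safe example
"""


def scaffold_rule(root, rule_id, ext="py"):
    """Create <root>/<rule_id>/ with a rule file and a test file.

    Existing files are left untouched so the helper is safe to re-run.
    """
    rule_dir = Path(root) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    if not test_file.exists():
        # Test-first: the annotated test file is written before the rule.
        test_file.write_text(TEST_SKELETON.format(rule_id=rule_id))
    if not rule_file.exists():
        rule_file.write_text(RULE_SKELETON.format(rule_id=rule_id))
    return rule_file, test_file


with tempfile.TemporaryDirectory() as tmp:
    rule_file, test_file = scaffold_rule(tmp, "my-rule-id")
    print(rule_file.name, test_file.name)
```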

Detailed References

Official Semgrep Documentation:

Rule Syntax - Complete YAML structure, operators, and options
Rule Schema - Full JSON schema specification

Local References:

Workflow Guide - Complete step-by-step rule creation process
Quick Reference - Pattern operators and taint components

Anti-Patterns to Avoid

Too broad:

yaml

# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)

Missing safe cases:

python

# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

Rationalizations to Reject

Shortcut	Why It's Wrong
"Semgrep found nothing, code is clean"	Semgrep is pattern-based; can't track complex cross-function data flow
"The pattern looks complete"	Untested rules have hidden false positives/negatives
"It matches the vulnerable case"	Matching vulnerabilities is half the job; verify safe cases don't match
"Taint mode is overkill"	For injection vulnerabilities, taint mode gives better precision
"One test case is enough"	Include edge cases: different coding styles, sanitized inputs, safe alternatives

CI/CD Integration

GitHub Actions

yaml

name: Semgrep

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits

Resources

Rule Writing:

Rule Syntax: https://semgrep.dev/docs/writing-rules/rule-syntax
Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
Rule Schema: https://github.com/semgrep/semgrep-interfaces/blob/main/rule_schema_v1.yaml

General:

Registry: https://semgrep.dev/explore
Playground: https://semgrep.dev/playground
Docs: https://semgrep.dev/docs/
Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules
