semgrep

Semgrep Static Analysis


Fast, pattern-based static analysis for security scanning and custom rule creation.

When to Use Semgrep


Ideal scenarios:
  • Quick security scans (minutes, not hours)
  • Pattern-based bug and vulnerability detection
  • Enforcing coding standards and best practices
  • Finding known vulnerability patterns (OWASP, CWE)
  • Creating custom detection rules for your codebase
  • Data flow analysis with taint mode

Installation


bash
# pip (recommended)
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src

---

Part 1: Running Scans


Quick Scan


bash
semgrep --config auto .                    # Auto-detect rules

Using Rulesets


bash
semgrep --config p/<RULESET> .             # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple

| Ruleset | Description |
| --- | --- |
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |
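
When scans are launched from a script, the repeated `--config` flags can be assembled programmatically. The sketch below only builds the argument list and does not run Semgrep. The helper name `build_scan_command` is hypothetical, and it assumes every ruleset lives under the `p/` registry prefix shown above.

```python
import shlex


def build_scan_command(rulesets, target="."):
    """Return the argv list for one scan that combines several registry rulesets."""
    if not rulesets:
        raise ValueError("at least one ruleset is required")
    argv = ["semgrep"]
    for name in rulesets:
        # One --config flag per ruleset, mirroring the CLI examples above.
        argv += ["--config", f"p/{name}"]
    argv.append(target)
    return argv


cmd = build_scan_command(["security-audit", "trailofbits"])
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` as-is, which avoids shell quoting issues when the target path contains spaces.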

Output Formats


bash
semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF
semgrep --config p/security-audit --json -o results.json .     # JSON
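
A JSON report is convenient to post-process. The following is a minimal sketch that counts findings per severity. It assumes the report has a top-level `results` list whose entries carry the severity under `extra.severity`; the sample data is hand-written for illustration and is not real scan output, and `summarize_findings` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_findings(report):
    """Count findings per severity in a parsed Semgrep JSON report."""
    counts = Counter()
    for finding in report.get("results", []):
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
    return counts


# Hypothetical sample that mimics the assumed shape of results.json.
sample = json.loads("""
{"results": [
  {"check_id": "demo.rule-a", "path": "app.py", "start": {"line": 3},
   "extra": {"severity": "ERROR", "message": "example finding"}},
  {"check_id": "demo.rule-b", "path": "util.py", "start": {"line": 9},
   "extra": {"severity": "WARNING", "message": "example finding"}},
  {"check_id": "demo.rule-a", "path": "cli.py", "start": {"line": 1},
   "extra": {"severity": "ERROR", "message": "example finding"}}
], "errors": []}
""")

summary = summarize_findings(sample)
for severity, count in sorted(summary.items()):
    print(f"{severity}: {count}")
```

For a real report, replace the sample with `json.load(open("results.json"))`.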

Scan Specific Paths


bash
semgrep --config p/python app.py           # Single file
semgrep --config p/javascript src/         # Directory
semgrep --config auto --include='**/test/**' .  # Only scan paths matching the glob

Configuration


.semgrepignore


tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/

Suppress False Positives


python
password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
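
Suppressions tend to accumulate, so it can help to list them periodically. The sketch below scans Python source text for `nosemgrep` comments and reports which rule IDs each one names. It is a rough regex-based audit, not how Semgrep itself parses these comments, and it only handles `#`-style comments; `list_suppressions` is a hypothetical helper.

```python
import re

# "nosemgrep", optionally followed by ": rule-id[, rule-id...]".
NOSEMGREP = re.compile(
    r"#\s*nosemgrep(?::\s*(?P<ids>[\w./-]+(?:\s*,\s*[\w./-]+)*))?"
)


def list_suppressions(source):
    """Return (line_number, rule_ids) pairs; an empty list means every rule."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            ids = match.group("ids")
            found.append((number, re.split(r"\s*,\s*", ids) if ids else []))
    return found


example = '''password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
value = compute()
'''
print(list_suppressions(example))
```

Bare `# nosemgrep` comments (empty ID list) are the ones most worth reviewing, because they silence every rule on that line.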


Part 2: Creating Custom Rules


When to Create Custom Rules


  • Detecting project-specific vulnerability patterns
  • Enforcing internal coding standards
  • Building security checks for custom frameworks
  • Creating taint-mode rules for data flow analysis

Approach Selection


| Approach | Use When |
| --- | --- |
| Taint mode | Data flows from an untrusted source to a dangerous sink (injection vulnerabilities) |
| Pattern matching | Syntactic patterns without data flow requirements (deprecated APIs, hardcoded values) |

Prioritize taint mode for injection vulnerabilities. Pattern matching alone can't distinguish between `eval(user_input)` (vulnerable) and `eval("safe_literal")` (safe).
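
To make the distinction concrete, here is a toy syntactic matcher built on Python's standard `ast` module. It is not Semgrep's engine, only an illustration of why a purely syntactic `eval(...)` match reports both calls. The literal-argument check is a crude stand-in for the source-to-sink tracking that taint mode provides, and `find_eval_calls` is a hypothetical helper.

```python
import ast


def find_eval_calls(source):
    """Return (line, argument_is_literal) for every eval(...) call in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            # A constant first argument cannot carry user-controlled data.
            literal = bool(node.args) and isinstance(node.args[0], ast.Constant)
            hits.append((node.lineno, literal))
    return sorted(hits)


# The code is only parsed, never executed.
code = '''user_input = input()
eval(user_input)
eval("safe_literal")
'''
print(find_eval_calls(code))
```

Both calls are found by the syntactic match; only the extra information about where the argument comes from separates the vulnerable call from the safe one.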

Quick Start: Pattern Matching


yaml
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"

Quick Start: Taint Mode


yaml
rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
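
The rule lists `shlex.quote(...)` as a sanitizer because quoting turns shell metacharacters into inert text. A minimal sketch of that effect, using a hypothetical helper `build_listing_command` and a made-up malicious value:

```python
import shlex


def build_listing_command(user_supplied_path):
    """Quote an untrusted value before interpolating it into a shell command."""
    return "ls -l " + shlex.quote(user_supplied_path)


# The ';' would start a second command if the value were not quoted.
print(build_listing_command("reports; rm -rf /"))
```

The quoted value reaches the shell as a single argument, which is why a flow that passes through `shlex.quote` is treated as sanitized.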

Pattern Syntax Quick Reference


| Syntax | Description | Example |
| --- | --- | --- |
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |

| Operator | Description |
| --- | --- |
| `pattern` | Match exact pattern |
| `patterns` | All must match (AND) |
| `pattern-either` | Any matches (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside context |
| `pattern-not-inside` | Match only outside context |
| `metavariable-regex` | Regex on captured value |

Testing Rules


Test-first is mandatory. Create test files with annotations; each annotation goes on the line directly above the code it describes:
python
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))

Run tests:
bash
semgrep --test --config rule.yaml test-file
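
To see what a test file asserts before running it, the annotations can be listed with a few lines of Python. This sketch assumes the convention that a `ruleid:` or `ok:` comment describes the line immediately below it. It is an illustrative reader for `#`-style comments, not the logic `semgrep --test` uses, and `expected_results` is a hypothetical helper.

```python
import re

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def expected_results(test_source):
    """Return (target_line, kind, rule_id) for each annotation in the file."""
    expectations = []
    for number, line in enumerate(test_source.splitlines(), start=1):
        match = ANNOTATION.search(line)
        if match:
            kind, rule_id = match.groups()
            # The annotation applies to the next line.
            expectations.append((number + 1, kind, rule_id))
    return expectations


test_file = '''def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
'''
for line, kind, rule_id in expected_results(test_file):
    print(line, kind, rule_id)
```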

Command Reference


| Task | Command |
| --- | --- |
| Run tests | `semgrep --test --config rule.yaml test-file` |
| Validate YAML | `semgrep --validate --config rule.yaml` |
| Dump AST | `semgrep --dump-ast -l <lang> <file>` |
| Debug taint flow | `semgrep --dataflow-traces -f rule.yaml file` |

Rule Creation Workflow


  1. Analyze the problem - Understand the bug pattern and decide between taint mode and pattern matching
  2. Create test cases first - Write `ruleid:` and `ok:` annotations before the rule
  3. Analyze AST - Run `semgrep --dump-ast` to understand code structure
  4. Write the rule - Start simple, iterate
  5. Test until 100% pass - No "missed lines" or "incorrect lines"
  6. Optimize patterns - Remove redundancies only after tests pass

Output structure:
<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file
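
The output structure above can be scaffolded with a short script. This is a sketch under stated assumptions: the helper `scaffold_rule`, the placeholder rule body, and the TODO comments are all hypothetical starting points and need to be filled in before the rule validates or the tests run.

```python
from pathlib import Path
import tempfile

# Placeholder rule; the message and pattern must be replaced by hand.
RULE_TEMPLATE = """rules:
  - id: {rule_id}
    languages: [{language}]
    message: TODO describe the finding
    severity: ERROR
    pattern: TODO
"""


def scaffold_rule(base_dir, rule_id, language="python", ext="py"):
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext>."""
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    rule_file.write_text(RULE_TEMPLATE.format(rule_id=rule_id, language=language))
    test_file.write_text(
        "# TODO: add a vulnerable case with a ruleid annotation above it\n"
        "# TODO: add a safe case with an ok annotation above it\n"
    )
    return rule_file, test_file


with tempfile.TemporaryDirectory() as tmp:
    rule_file, test_file = scaffold_rule(tmp, "command-injection")
    print(rule_file.name, test_file.name)
```

Writing the test file in the same step keeps the test-first habit from step 2 hard to skip.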

Detailed References


Official Semgrep Documentation:
Local References:
  • Workflow Guide - Complete step-by-step rule creation process
  • Quick Reference - Pattern operators and taint components

Anti-Patterns to Avoid


Too broad:
yaml
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)

Missing safe cases:
python
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))
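
The "missing safe cases" anti-pattern is easy to check mechanically. The sketch below reports rule IDs whose test file lacks either a `ruleid:` or an `ok:` annotation. It is a hypothetical lint helper (`rules_missing_cases`) for `#`-style comments, not a Semgrep feature.

```python
import re
from collections import defaultdict

ANNOTATION = re.compile(r"#\s*(ruleid|ok):\s*([\w.-]+)")


def rules_missing_cases(test_source):
    """Return {rule_id: missing_kinds} for rules lacking a ruleid or ok case."""
    seen = defaultdict(set)
    for kind, rule_id in ANNOTATION.findall(test_source):
        seen[rule_id].add(kind)
    required = {"ruleid", "ok"}
    return {rule_id: sorted(required - kinds)
            for rule_id, kinds in seen.items()
            if kinds != required}


bad = "# ruleid: my-rule\ndangerous(user_input)\n"
good = bad + "\n# ok: my-rule\ndangerous(sanitize(user_input))\n"
print(rules_missing_cases(bad))
print(rules_missing_cases(good))
```

An empty dictionary means every rule in the file has at least one vulnerable and one safe case.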

Rationalizations to Reject


| Shortcut | Why It's Wrong |
| --- | --- |
| "Semgrep found nothing, code is clean" | Semgrep is pattern-based; can't track complex cross-function data flow |
| "The pattern looks complete" | Untested rules have hidden false positives/negatives |
| "It matches the vulnerable case" | Matching vulnerabilities is half the job; verify safe cases don't match |
| "Taint mode is overkill" | For injection vulnerabilities, taint mode gives better precision |
| "One test case is enough" | Include edge cases: different coding styles, sanitized inputs, safe alternatives |


CI/CD Integration


GitHub Actions


yaml
name: Semgrep

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits


Resources
