hipaa-guardian

HIPAA Guardian

A comprehensive PHI/PII detection and HIPAA compliance skill for AI agents, with a strong focus on developer code security patterns. Detects all 18 HIPAA Safe Harbor identifiers in data files and source code, provides risk scoring, maps findings to HIPAA regulations, and generates audit reports with remediation guidance.

Capabilities

PHI/PII Detection - Scan data files for the 18 HIPAA Safe Harbor identifiers
Code Scanning - Detect PHI in source code, comments, test fixtures, configs
Auth Gate Detection - Find API endpoints exposing PHI without authentication
Log Safety Audit - Detect PHI leaking into log statements
Classification - Classify findings as PHI, PII, or sensitive_nonPHI
Risk Scoring - Score findings 0-100 based on sensitivity and exposure
HIPAA Mapping - Map each finding to specific HIPAA rules
Audit Reports - Generate findings.json, audit reports, and playbooks
Remediation - Provide step-by-step remediation with code examples
Control Checks - Validate security controls are in place

Usage

/hipaa-guardian [command] [path] [options]

Commands

`scan <path>` - Scan files or directories for PHI/PII
`scan-code <path>` - Scan source code for PHI leakage
`scan-auth <path>` - Check API endpoints for missing authentication before PHI access
`scan-logs <path>` - Detect PHI patterns in logging statements
`scan-response <path>` - Check API responses for unmasked PHI exposure
`audit <path>` - Generate full HIPAA compliance audit report
`controls <path>` - Check security controls in a project
`report` - Generate report from existing findings

Options

`--format <type>` - Output format: json, markdown, csv (default: markdown)
`--output <file>` - Write results to file
`--severity <level>` - Minimum severity: low, medium, high, critical
`--include <patterns>` - File patterns to include
`--exclude <patterns>` - File patterns to exclude
`--synthetic` - Treat all data as synthetic (default for safety)

Workflow

When invoked, follow this workflow:

Step 1: Determine Scan Scope

Ask the user to specify:

Target path (file, directory, or glob pattern)
Scan type (data files, source code, or both)
Whether data is synthetic/test data or potentially real PHI

Step 2: File Discovery

Use Glob to find relevant files:

```
# For data files
Glob: **/*.{json,csv,txt,log,xml,hl7,fhir}

# For source code
Glob: **/*.{py,js,ts,tsx,java,cs,go,rb,sql,sh}

# For config files
Glob: **/*.{env,yaml,yml,json,xml,ini,conf}
```

Step 3: PHI Detection

For each file, scan for the 18 HIPAA identifiers using patterns from `references/detection-patterns.md`:

Names - Patient, provider, relative names
Geographic - Addresses, cities, ZIP codes
Dates - DOB, admission, discharge, death dates
Phone Numbers - All formats
Fax Numbers - All formats
Email Addresses - All formats
SSN - Social Security Numbers
MRN - Medical Record Numbers
Health Plan IDs - Insurance identifiers
Account Numbers - Financial accounts
License Numbers - Driver's license, professional
Vehicle IDs - VIN, license plates
Device IDs - Serial numbers, UDI
URLs - Web addresses
IP Addresses - Network identifiers
Biometric - Fingerprints, retinal, voice
Photos - Full-face images
Other Unique IDs - Any other identifying numbers
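
As a rough illustration of pattern-based scanning for a few of the identifiers above, here is a minimal Python sketch. The three regular expressions and the `scan_text` helper are assumptions made for this example; they are not the patterns from `references/detection-patterns.md`, which are not reproduced here. The function reports only where a match occurred and of what type, never the matched value.

```python
import re

# Illustrative patterns only (assumed for this sketch); the skill's real
# patterns live in references/detection-patterns.md.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_text(text):
    """Return (line_number, identifier_type) pairs, never the matched value."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for identifier_type, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, identifier_type))
    return hits


# Synthetic sample data, not real PHI.
sample = "name,contact\nJane Doe,jane@example.com\nssn: 123-45-6789\n"
print(scan_text(sample))
```

Simple patterns like these produce false positives (for example, version strings that look like IP addresses), which is one reason each finding carries a confidence value.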

Step 4: Classification

Classify each finding:

PHI - Health information linkable to an individual
PII - Personally identifiable but not health-related
sensitive_nonPHI - Sensitive but not individually identifiable

Step 5: Risk Scoring

Calculate the risk score (0-100) using the methodology from `references/risk-scoring.md`:

Risk Score = (Sensitivity × 0.35) + (Exposure × 0.25) +
             (Volume × 0.20) + (Identifiability × 0.20)
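
The weighted sum above can be sketched in Python as follows. This assumes each of the four factors is itself scored on a 0-100 scale, which the formula implies but does not state; how each factor is derived is defined in `references/risk-scoring.md` and is not shown here. The `risk_score` function name is hypothetical.

```python
# Weights taken from the formula above.
WEIGHTS = {
    "sensitivity": 0.35,
    "exposure": 0.25,
    "volume": 0.20,
    "identifiability": 0.20,
}


def risk_score(sensitivity, exposure, volume, identifiability):
    """Weighted sum of four factors, each assumed to be on a 0-100 scale."""
    factors = {
        "sensitivity": sensitivity,
        "exposure": exposure,
        "volume": volume,
        "identifiability": identifiability,
    }
    for name, value in factors.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Round to an integer so the result matches the 0-100 integer score.
    return round(sum(WEIGHTS[name] * value for name, value in factors.items()))


# 35 + 20 + 10 + 18 = 83
print(risk_score(sensitivity=100, exposure=80, volume=50, identifiability=90))
```

Because the weights sum to 1.0, the result stays within 0-100 whenever the inputs do.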

Step 6: HIPAA Mapping

Map findings to HIPAA rules from references:

`references/privacy-rule.md` - 45 CFR 164.500-534
`references/security-rule.md` - 45 CFR 164.302-318
`references/breach-rule.md` - 45 CFR 164.400-414

Step 7: Generate Output

Create structured output following the `examples/sample-finding.json` format:

```json
{
  "id": "F-YYYYMMDD-NNNN",
  "timestamp": "ISO-8601",
  "file": "path/to/file",
  "line": 123,
  "field": "field.path",
  "value_hash": "sha256:...",
  "classification": "PHI|PII|sensitive_nonPHI",
  "identifier_type": "ssn|mrn|dob|...",
  "confidence": 0.95,
  "risk_score": 85,
  "hipaa_rules": [...],
  "remediation": [...],
  "status": "open"
}
```
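
One way to assemble a record of this shape without keeping the raw value is sketched below. The `build_finding` helper and its parameters are hypothetical; only the field names and the `sha256:` prefix come from the template above. Note that a plain SHA-256 of a low-entropy value such as an SSN can be reversed by enumeration, so a keyed hash may be worth considering; the template itself only specifies `sha256:`.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_finding(sequence, file, line, field, raw_value,
                  classification, identifier_type, confidence, risk_score):
    """Build a finding dict; the raw value is hashed and then discarded."""
    now = datetime.now(timezone.utc)
    digest = hashlib.sha256(raw_value.encode("utf-8")).hexdigest()
    return {
        "id": f"F-{now:%Y%m%d}-{sequence:04d}",
        "timestamp": now.isoformat(),
        "file": file,
        "line": line,
        "field": field,
        "value_hash": f"sha256:{digest}",
        "classification": classification,
        "identifier_type": identifier_type,
        "confidence": confidence,
        "risk_score": risk_score,
        "hipaa_rules": [],   # filled in by the HIPAA mapping step
        "remediation": [],   # filled in by the remediation step
        "status": "open",
    }


# Synthetic example value, not real PHI.
finding = build_finding(1, "data/patients.csv", 12, "patient.ssn",
                        "123-45-6789", "PHI", "ssn", 0.95, 85)
print(json.dumps(finding, indent=2))
```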

Code Scanning

When scanning source code, look for:

1. Hardcoded PHI in Source

String literals containing SSN, MRN, names, dates
Variable assignments with sensitive values
Database seed/fixture data

2. PHI in Comments

Example data in code comments
TODO comments with patient info
Documentation strings with real data

3. Test Data Leakage

Test fixtures with real PHI
Mock data files with actual patient info
Integration test data

4. Configuration Files

`.env` files with PHI
Connection strings with embedded credentials
API responses cached with PHI

5. SQL Files

INSERT statements with PHI
Sample queries with real patient data
Database dumps

See `references/code-scanning.md` for detailed patterns.

Security Control Checks

Verify these controls are in place:

Access Controls

Role-based access control (RBAC) implemented
Minimum necessary access principle applied
Access logging enabled

Encryption

Data encrypted at rest (AES-256)
Data encrypted in transit (TLS 1.2+)
Encryption keys properly managed

Audit Controls

Code Security

`.gitignore` excludes sensitive files
Pre-commit hooks scan for PHI
Secrets management in place
Data masking in logs
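
For the "data masking in logs" control, one possible approach in Python is a `logging.Filter` that redacts identifier-shaped substrings before a record is emitted. This is a minimal sketch covering only SSN-shaped values; the `RedactSSNFilter` class and its pattern are assumptions for illustration and are not taken from `references/logging-safety.md`.

```python
import logging
import re

# Assumed pattern for this sketch: matches SSN-shaped strings only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactSSNFilter(logging.Filter):
    """Replace SSN-shaped substrings in log messages before they are emitted."""

    def filter(self, record):
        # Merge msg and args first so values passed as arguments are covered.
        message = record.getMessage()
        record.msg = SSN_PATTERN.sub("[REDACTED-SSN]", message)
        record.args = None
        return True  # keep the record, now redacted


logger = logging.getLogger("phi_safe_demo")
handler = logging.StreamHandler()
handler.addFilter(RedactSSNFilter())
logger.addHandler(handler)

# Synthetic value; the handler writes "patient ssn=[REDACTED-SSN]".
logger.warning("patient ssn=%s", "123-45-6789")
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers to that handler are also redacted.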

Output Formats

findings.json

Structured array of all findings with full metadata.

audit_report.md

Human-readable report with:

Executive summary
Findings by severity
HIPAA compliance status
Risk assessment
Recommendations

playbook.md

Step-by-step remediation guide:

Prioritized actions
Code examples
Verification steps

Security Guardrails

Default Synthetic Mode - Assumes data is synthetic unless confirmed otherwise
No PHI Storage - Never stores detected PHI values, only hashes
Redaction - All example outputs redact actual values
Warning Prompts - Warns before processing potentially real PHI
Audit Trail - Logs all scans (without PHI values)

References

`references/hipaa-identifiers.md` - All 18 HIPAA Safe Harbor identifiers
`references/detection-patterns.md` - Regex patterns for PHI detection
`references/code-scanning.md` - Code scanning patterns and rules
`references/healthcare-formats.md` - FHIR, HL7, CDA detection patterns
`references/privacy-rule.md` - HIPAA Privacy Rule (45 CFR 164.500-534)
`references/security-rule.md` - HIPAA Security Rule (45 CFR 164.302-318)
`references/breach-rule.md` - Breach Notification Rule (45 CFR 164.400-414)
`references/risk-scoring.md` - Risk scoring methodology
`references/auth-patterns.md` - Authentication gate patterns for PHI endpoints
`references/logging-safety.md` - PHI-safe logging patterns and filters
`references/api-security.md` - API response masking and field-level auth

CI/CD Integration

Pre-Commit Hook Installation

```bash
# Install the pre-commit hook
cp scripts/pre-commit-hook.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```

Or, using the pre-commit framework, add to `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: local
    hooks:
      - id: hipaa-guardian
        name: HIPAA Guardian PHI Scan
        entry: python scripts/detect-phi.py
        language: python
        types: [file]
        pass_filenames: true
```

Environment Variables

```bash
# Configure pre-commit behavior
export HIPAA_BLOCK_ON_CRITICAL=true   # Block commits with critical findings
export HIPAA_BLOCK_ON_HIGH=true       # Block commits with high severity findings
export HIPAA_SCAN_DATA=true           # Scan data files
export HIPAA_SCAN_CODE=true           # Scan source code
export HIPAA_VERBOSE=false            # Enable verbose output
```
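
To make the blocking flags concrete, here is a hedged Python sketch of how a hook might turn them into a pass/fail decision. How `scripts/pre-commit-hook.sh` actually interprets these variables is not shown in this document; the `env_flag` and `should_block_commit` helpers, and the choice to treat an unset variable as `false`, are assumptions for illustration.

```python
import os


def env_flag(env, name, default=False):
    """Read a true/false variable from a mapping; unset falls back to default."""
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"


def should_block_commit(severities, env=os.environ):
    """Decide whether to block, given the set of severities found in a scan."""
    block_critical = env_flag(env, "HIPAA_BLOCK_ON_CRITICAL")
    block_high = env_flag(env, "HIPAA_BLOCK_ON_HIGH")
    return (block_critical and "critical" in severities) or (
        block_high and "high" in severities
    )


# Example: only critical findings block the commit.
config = {"HIPAA_BLOCK_ON_CRITICAL": "true", "HIPAA_BLOCK_ON_HIGH": "false"}
print(should_block_commit({"high", "low"}, config))
print(should_block_commit({"critical"}, config))
```

Passing the environment as a mapping keeps the decision logic testable without modifying the real process environment.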

GitHub Actions Integration

```yaml
# .github/workflows/hipaa-scan.yml
name: HIPAA PHI Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Run PHI Scan
        run: |
          python scripts/detect-phi.py . --format markdown --output phi-report.md
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: phi-scan-report
          path: phi-report.md
```

Healthcare Data Format Support

Supported Formats

| Format | Extensions | Detection |
| --- | --- | --- |
| FHIR R4 | `.fhir.json`, `.fhir.xml` | Resource type, identifiers |
| HL7 v2.x | `.hl7`, `.hl7v2` | MSH, PID, DG1 segments |
| CDA/C-CDA | `.cda`, `.ccda`, `.ccd` | ClinicalDocument, patientRole |
| X12 EDI | `.x12`, `.edi`, `.837` | Transaction set headers |

High-Risk FHIR Resources

`Patient` - Demographics, identifiers, contacts
`Condition` - Diagnoses, health conditions
`Observation` - Lab results, vitals
`MedicationRequest` - Prescriptions
`DiagnosticReport` - Test results
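
A small Python sketch of how the resource types listed above could be flagged in parsed FHIR JSON follows. The `high_risk_resources` helper is hypothetical; it relies only on the standard `resourceType` field and on `Bundle.entry[].resource` nesting, and does not inspect the contents of each resource.

```python
HIGH_RISK_RESOURCE_TYPES = {
    "Patient",
    "Condition",
    "Observation",
    "MedicationRequest",
    "DiagnosticReport",
}


def high_risk_resources(resource):
    """Yield (resourceType, id) for high-risk resources, walking into Bundles."""
    if resource.get("resourceType") == "Bundle":
        for entry in resource.get("entry", []):
            yield from high_risk_resources(entry.get("resource", {}))
    elif resource.get("resourceType") in HIGH_RISK_RESOURCE_TYPES:
        yield resource["resourceType"], resource.get("id")


# Synthetic bundle for illustration.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Organization", "id": "o1"}},
        {"resource": {"resourceType": "Observation", "id": "obs1"}},
    ],
}
print(list(high_risk_resources(bundle)))
```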

HL7 v2 PHI Segments

`PID` - Patient Identification (SSN in PID-19)
`DG1` - Diagnosis Information
`OBX` - Observation/Result Values
`IN1` - Insurance Information
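
The segment list above can be checked against a pipe-delimited HL7 v2 message with a few lines of Python. This is a minimal sketch that assumes the default `|` field separator and does not handle escape sequences or repetitions; the `phi_segments` and `has_ssn_field` helpers are hypothetical names.

```python
PHI_SEGMENTS = {"PID", "DG1", "OBX", "IN1"}


def _segments(message):
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    return [s for s in message.replace("\n", "\r").split("\r") if s]


def phi_segments(message):
    """Return the IDs of PHI-bearing segments present, in message order."""
    return [s.split("|", 1)[0] for s in _segments(message)
            if s.split("|", 1)[0] in PHI_SEGMENTS]


def has_ssn_field(message):
    """True if any PID segment carries a non-empty PID-19 value."""
    for segment in _segments(message):
        fields = segment.split("|")
        # For PID, fields[n] is PID-n because fields[0] is the segment ID.
        if fields[0] == "PID" and len(fields) > 19 and fields[19].strip():
            return True
    return False


# Build a synthetic PID segment so the field positions are explicit.
pid_fields = ["PID"] + [""] * 19
pid_fields[3] = "MRN001"        # PID-3: patient identifier (synthetic)
pid_fields[19] = "000-00-0000"  # PID-19: SSN (obviously fake)
message = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|20240101||ORU^R01|1|P|2.5",
    "|".join(pid_fields),
    "OBX|1|NM|GLU||95",
])
print(phi_segments(message), has_ssn_field(message))
```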

Examples

`examples/sample-finding.json` - Example finding output format
`examples/sample-audit-report.md` - Example audit report
`examples/synthetic-phi-data.json` - Test data for validation

Scripts

`scripts/detect-phi.py` - PHI/PII detection in data files (supports FHIR, HL7, CDA formats)
`scripts/scan-code.py` - Code scanning for PHI leakage
`scripts/scan-auth.py` - Authentication gate detection for PHI endpoints
`scripts/scan-logs.py` - PHI detection in logging statements
`scripts/scan-response.py` - API response PHI exposure detection
`scripts/generate-report.py` - Report generation script
`scripts/validate-controls.sh` - Control validation script
`scripts/pre-commit-hook.sh` - Git pre-commit hook for CI/CD integration
