hipaa-guardian

HIPAA Guardian

A comprehensive PHI/PII detection and HIPAA compliance skill for AI agents, with a strong focus on developer code security patterns. Detects all 18 HIPAA Safe Harbor identifiers in data files and source code, provides risk scoring, maps findings to HIPAA regulations, and generates audit reports with remediation guidance.

Capabilities

  1. PHI/PII Detection - Scan data files for the 18 HIPAA Safe Harbor identifiers
  2. Code Scanning - Detect PHI in source code, comments, test fixtures, configs
  3. Auth Gate Detection - Find API endpoints exposing PHI without authentication
  4. Log Safety Audit - Detect PHI leaking into log statements
  5. Classification - Classify findings as PHI, PII, or sensitive_nonPHI
  6. Risk Scoring - Score findings 0-100 based on sensitivity and exposure
  7. HIPAA Mapping - Map each finding to specific HIPAA rules
  8. Audit Reports - Generate findings.json, audit reports, and playbooks
  9. Remediation - Provide step-by-step remediation with code examples
  10. Control Checks - Validate security controls are in place

Usage

/hipaa-guardian [command] [path] [options]

Commands

  • scan <path> - Scan files or directories for PHI/PII
  • scan-code <path> - Scan source code for PHI leakage
  • scan-auth <path> - Check API endpoints for missing authentication before PHI access
  • scan-logs <path> - Detect PHI patterns in logging statements
  • scan-response <path> - Check API responses for unmasked PHI exposure
  • audit <path> - Generate full HIPAA compliance audit report
  • controls <path> - Check security controls in a project
  • report - Generate report from existing findings

Options

  • --format <type> - Output format: json, markdown, csv (default: markdown)
  • --output <file> - Write results to file
  • --severity <level> - Minimum severity: low, medium, high, critical
  • --include <patterns> - File patterns to include
  • --exclude <patterns> - File patterns to exclude
  • --synthetic - Treat all data as synthetic (default for safety)

Workflow

When invoked, follow this workflow:

Step 1: Determine Scan Scope

Ask the user to specify:
  • Target path (file, directory, or glob pattern)
  • Scan type (data files, source code, or both)
  • Whether data is synthetic/test data or potentially real PHI

Step 2: File Discovery

Use Glob to find relevant files:

# For data files
Glob: **/*.{json,csv,txt,log,xml,hl7,fhir}

# For source code
Glob: **/*.{py,js,ts,tsx,java,cs,go,rb,sql,sh}

# For config files
Glob: **/*.{env,yaml,yml,json,xml,ini,conf}
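
Outside an agent's Glob tool, the same discovery step can be approximated in plain Python. The sketch below is illustrative only: the names `categorize` and `discover` are hypothetical, and the extension sets are copied from the Glob patterns above. Python's own `glob` does not expand `{a,b}` brace lists, so the sketch compares extensions instead.

```python
from pathlib import Path

# Extension groups copied from the Glob patterns in this step.
DATA_EXTS = {"json", "csv", "txt", "log", "xml", "hl7", "fhir"}
CODE_EXTS = {"py", "js", "ts", "tsx", "java", "cs", "go", "rb", "sql", "sh"}
CONFIG_EXTS = {"env", "yaml", "yml", "json", "xml", "ini", "conf"}


def categorize(path: Path) -> set:
    """Return every category ("data", "code", "config") the file matches."""
    ext = path.suffix.lstrip(".")
    # A dotfile such as ".env" has an empty suffix, so use the name itself.
    if not ext and path.name.startswith("."):
        ext = path.name[1:]
    categories = set()
    if ext in DATA_EXTS:
        categories.add("data")
    if ext in CODE_EXTS:
        categories.add("code")
    if ext in CONFIG_EXTS:
        categories.add("config")
    return categories


def discover(root: str) -> dict:
    """Walk root recursively and group files by scan category."""
    found = {"data": [], "code": [], "config": []}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            for category in categorize(path):
                found[category].append(path)
    return found
```

Because `json` and `xml` appear in both the data and config patterns, a file such as `patients.json` lands in both groups; a real scanner would likely deduplicate before scanning.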

Step 3: PHI Detection

For each file, scan for the 18 HIPAA identifiers using patterns from references/detection-patterns.md:
  1. Names - Patient, provider, relative names
  2. Geographic - Addresses, cities, ZIP codes
  3. Dates - DOB, admission, discharge, death dates
  4. Phone Numbers - All formats
  5. Fax Numbers - All formats
  6. Email Addresses - All formats
  7. SSN - Social Security Numbers
  8. MRN - Medical Record Numbers
  9. Health Plan IDs - Insurance identifiers
  10. Account Numbers - Financial accounts
  11. License Numbers - Driver's license, professional
  12. Vehicle IDs - VIN, license plates
  13. Device IDs - Serial numbers, UDI
  14. URLs - Web addresses
  15. IP Addresses - Network identifiers
  16. Biometric - Fingerprints, retinal, voice
  17. Photos - Full-face images
  18. Other Unique IDs - Any other identifying numbers
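
As a rough illustration of pattern-based detection, the sketch below covers four of the identifier types with regular expressions. The patterns and the `scan_text` helper are assumptions made for this example, not the contents of references/detection-patterns.md, and they are deliberately simple (US-style formats only, no validation of ranges).

```python
import re

# Illustrative patterns only; assumed, not taken from
# references/detection-patterns.md.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_text(text):
    """Yield (identifier_type, line_number, matched_text) for each hit."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                yield kind, line_no, match.group()


if __name__ == "__main__":
    # Synthetic sample data, not real PHI.
    sample = "Patient SSN: 000-12-3456\nContact: jane.doe@example.com"
    for hit in scan_text(sample):
        print(hit)
```

Regex matches like these are only candidates; names, photos, and biometric data in particular need context that a pattern alone cannot provide.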

Step 4: Classification

Classify each finding:
  • PHI - Health information linkable to individual
  • PII - Personally identifiable but not health-related
  • sensitive_nonPHI - Sensitive but not individually identifiable
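
The three definitions above reduce to two yes/no questions about a finding. A minimal sketch, assuming the caller has already decided whether the value identifies an individual and whether it is health-related (the `classify` function and its parameters are hypothetical):

```python
def classify(identifies_individual: bool, health_related: bool) -> str:
    """Map two yes/no judgments onto the three classification labels."""
    if identifies_individual and health_related:
        return "PHI"
    if identifies_individual:
        return "PII"
    # Flagged as sensitive, but not linkable to a specific person.
    return "sensitive_nonPHI"
```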

Step 5: Risk Scoring

Calculate risk score (0-100) using methodology from references/risk-scoring.md:
Risk Score = (Sensitivity × 0.35) + (Exposure × 0.25) +
             (Volume × 0.20) + (Identifiability × 0.20)
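
The weighted sum can be written directly in code. This sketch assumes each factor is already expressed on a 0-100 scale, which is what makes the result land in 0-100; how references/risk-scoring.md derives the individual factor values is not shown here, and the `risk_score` name is hypothetical.

```python
# Weights taken from the formula above; they sum to 1.0.
WEIGHTS = {
    "sensitivity": 0.35,
    "exposure": 0.25,
    "volume": 0.20,
    "identifiability": 0.20,
}


def risk_score(sensitivity, exposure, volume, identifiability):
    """Combine four 0-100 factor scores into one 0-100 risk score."""
    factors = {
        "sensitivity": sensitivity,
        "exposure": exposure,
        "volume": volume,
        "identifiability": identifiability,
    }
    for name, value in factors.items():
        # Assumption: every factor is pre-scaled to the range 0-100.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return round(sum(WEIGHTS[name] * value for name, value in factors.items()))
```

For example, factors of 80, 60, 40, and 100 give 28 + 15 + 8 + 20 = 71.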

Step 6: HIPAA Mapping

Map findings to HIPAA rules from references:
  • references/privacy-rule.md - 45 CFR 164.500-534
  • references/security-rule.md - 45 CFR 164.302-318
  • references/breach-rule.md - 45 CFR 164.400-414

Step 7: Generate Output

Create structured output following the format in examples/sample-finding.json:
{
  "id": "F-YYYYMMDD-NNNN",
  "timestamp": "ISO-8601",
  "file": "path/to/file",
  "line": 123,
  "field": "field.path",
  "value_hash": "sha256:...",
  "classification": "PHI|PII|sensitive_nonPHI",
  "identifier_type": "ssn|mrn|dob|...",
  "confidence": 0.95,
  "risk_score": 85,
  "hipaa_rules": [...],
  "remediation": [...],
  "status": "open"
}
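
A finding in this shape can be assembled without ever writing the raw value to disk. The sketch below is one possible way to do it; `build_finding` and its parameters are hypothetical, and the `hipaa_rules` and `remediation` lists are left empty because their contents come from the mapping and remediation steps.

```python
import hashlib
from datetime import datetime, timezone


def build_finding(seq, file, line, field, raw_value, classification,
                  identifier_type, confidence, risk_score, now=None):
    """Build one finding dict; only a hash of the raw value is kept."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(raw_value.encode("utf-8")).hexdigest()
    return {
        "id": f"F-{now:%Y%m%d}-{seq:04d}",
        "timestamp": now.isoformat(),
        "file": file,
        "line": line,
        "field": field,
        "value_hash": f"sha256:{digest}",  # the raw value is never stored
        "classification": classification,
        "identifier_type": identifier_type,
        "confidence": confidence,
        "risk_score": risk_score,
        "hipaa_rules": [],
        "remediation": [],
        "status": "open",
    }
```

One design caveat: a plain SHA-256 of a short, structured value such as an SSN can be brute-forced, so a keyed hash (for example HMAC with a secret key) may be a safer choice where the hashes themselves are shared.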

Code Scanning

When scanning source code, look for:

1. Hardcoded PHI in Source

  • String literals containing SSN, MRN, names, dates
  • Variable assignments with sensitive values
  • Database seed/fixture data

2. PHI in Comments

  • Example data in code comments
  • TODO comments with patient info
  • Documentation strings with real data

3. Test Data Leakage

  • Test fixtures with real PHI
  • Mock data files with actual patient info
  • Integration test data

4. Configuration Files

  • .env files with PHI
  • Connection strings with embedded credentials
  • API responses cached with PHI

5. SQL Files

  • INSERT statements with PHI
  • Sample queries with real patient data
  • Database dumps

See references/code-scanning.md for detailed patterns.
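
For Python sources, comments and string literals can be isolated with the standard `tokenize` module, so identifiers and numeric expressions in ordinary code are not flagged. The sketch below checks only for SSN-shaped text; the pattern and the `scan_python_source` helper are assumptions for illustration, not the rules in references/code-scanning.md.

```python
import io
import re
import tokenize

# Assumed SSN-like pattern for illustration.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_python_source(source):
    """Return (line_number, token_kind) for SSN-like text found in
    comments and string literals of Python source code."""
    hits = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type in (tokenize.COMMENT, tokenize.STRING):
            if SSN_RE.search(tok.string):
                hits.append((tok.start[0], tokenize.tok_name[tok.type]))
    return hits
```

Other languages would need their own lexer or a comment-aware regex; this approach only applies to files that parse as Python.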

Security Control Checks

Verify these controls are in place:

Access Controls

  • Role-based access control (RBAC) implemented
  • Minimum necessary access principle applied
  • Access logging enabled

Encryption

  • Data encrypted at rest (AES-256)
  • Data encrypted in transit (TLS 1.2+)
  • Encryption keys properly managed

Audit Controls

  • Audit logging implemented
  • Log integrity protected
  • Retention policies defined

Code Security

  • .gitignore excludes sensitive files
  • Pre-commit hooks scan for PHI
  • Secrets management in place
  • Data masking in logs
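
As one example of what "data masking in logs" can look like in a Python project, the sketch below uses a standard `logging.Filter` to redact SSN-shaped text before a record reaches any handler. The class name and the pattern are assumptions for illustration; a real filter would cover more identifier types.

```python
import logging
import re

# Assumed SSN-like pattern for illustration.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactSSNFilter(logging.Filter):
    """Mask SSN-like values in log messages before they are emitted."""

    def filter(self, record):
        # Render the message with its arguments first, then mask it,
        # so values passed as %-style args are covered too.
        record.msg = SSN_RE.sub("[REDACTED-SSN]", record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    logger = logging.getLogger("phi_safe_demo")
    logger.addFilter(RedactSSNFilter())
    logger.info("Lookup for SSN %s", "000-12-3456")  # synthetic value
```

A filter attached to a logger only sees records logged through that logger; attaching it to the handler instead also covers records that propagate up from child loggers.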

Output Formats

findings.json

Structured array of all findings with full metadata.

audit_report.md

Human-readable report with:
  • Executive summary
  • Findings by severity
  • HIPAA compliance status
  • Risk assessment
  • Recommendations

playbook.md

Step-by-step remediation guide:
  • Prioritized actions
  • Code examples
  • Verification steps

Security Guardrails

  1. Default Synthetic Mode - Assumes data is synthetic unless confirmed otherwise
  2. No PHI Storage - Never stores detected PHI values, only hashes
  3. Redaction - All example outputs redact actual values
  4. Warning Prompts - Warns before processing potentially real PHI
  5. Audit Trail - Logs all scans (without PHI values)

References

  • references/hipaa-identifiers.md - All 18 HIPAA Safe Harbor identifiers
  • references/detection-patterns.md - Regex patterns for PHI detection
  • references/code-scanning.md - Code scanning patterns and rules
  • references/healthcare-formats.md - FHIR, HL7, CDA detection patterns
  • references/privacy-rule.md - HIPAA Privacy Rule (45 CFR 164.500-534)
  • references/security-rule.md - HIPAA Security Rule (45 CFR 164.302-318)
  • references/breach-rule.md - Breach Notification Rule (45 CFR 164.400-414)
  • references/risk-scoring.md - Risk scoring methodology
  • references/auth-patterns.md - Authentication gate patterns for PHI endpoints
  • references/logging-safety.md - PHI-safe logging patterns and filters
  • references/api-security.md - API response masking and field-level auth

CI/CD Integration

Pre-Commit Hook Installation

# Install the pre-commit hook
cp scripts/pre-commit-hook.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

# Or using pre-commit framework
# Add to .pre-commit-config.yaml:
repos:
  - repo: local
    hooks:
      - id: hipaa-guardian
        name: HIPAA Guardian PHI Scan
        entry: python scripts/detect-phi.py
        language: python
        types: [file]
        pass_filenames: true

Environment Variables

# Configure pre-commit behavior
export HIPAA_BLOCK_ON_CRITICAL=true   # Block commits with critical findings
export HIPAA_BLOCK_ON_HIGH=true       # Block commits with high severity findings
export HIPAA_SCAN_DATA=true           # Scan data files
export HIPAA_SCAN_CODE=true           # Scan source code
export HIPAA_VERBOSE=false            # Enable verbose output
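
How the hook interprets these variables is not shown in this document. The sketch below is one plausible reading, written as an assumption: the helper names, the accepted truthy spellings ("1", "true", "yes"), and the default of blocking when a variable is unset are all hypothetical, not the behavior of scripts/pre-commit-hook.sh.

```python
import os


def env_flag(name, default, environ=None):
    """Read a boolean flag from the environment (assumed spellings)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, str(default))
    return value.strip().lower() in {"1", "true", "yes"}


def should_block(severities, environ=None):
    """Return True if any finding severity should block the commit."""
    blocked = set()
    if env_flag("HIPAA_BLOCK_ON_CRITICAL", True, environ):
        blocked.add("critical")
    if env_flag("HIPAA_BLOCK_ON_HIGH", True, environ):
        blocked.add("high")
    return any(severity in blocked for severity in severities)
```

Passing a plain dict as `environ` keeps the decision logic testable without touching the real process environment.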

GitHub Actions Integration

# .github/workflows/hipaa-scan.yml
name: HIPAA PHI Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Run PHI Scan
        run: |
          python scripts/detect-phi.py . --format markdown --output phi-report.md
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: phi-scan-report
          path: phi-report.md

Healthcare Data Format Support

Supported Formats

| Format    | Extensions             | Detection                     |
|-----------|------------------------|-------------------------------|
| FHIR R4   | .fhir.json, .fhir.xml  | Resource type, identifiers    |
| HL7 v2.x  | .hl7, .hl7v2           | MSH, PID, DG1 segments        |
| CDA/C-CDA | .cda, .ccda, .ccd      | ClinicalDocument, patientRole |
| X12 EDI   | .x12, .edi, .837       | Transaction set headers       |

High-Risk FHIR Resources

  • Patient - Demographics, identifiers, contacts
  • Condition - Diagnoses, health conditions
  • Observation - Lab results, vitals
  • MedicationRequest - Prescriptions
  • DiagnosticReport - Test results
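
A first-pass check for these resource types only needs the `resourceType` field of each resource. The sketch below handles a single FHIR JSON resource or a Bundle (whose members sit under `entry[].resource`); the `high_risk_resources` helper is hypothetical and does no validation beyond that.

```python
import json

# Resource types listed above as high-risk.
HIGH_RISK_RESOURCES = {
    "Patient",
    "Condition",
    "Observation",
    "MedicationRequest",
    "DiagnosticReport",
}


def high_risk_resources(fhir_json):
    """Return the sorted high-risk resource types present in a FHIR
    JSON document (a single resource or a Bundle)."""
    doc = json.loads(fhir_json)
    if doc.get("resourceType") == "Bundle":
        resources = [entry.get("resource", {}) for entry in doc.get("entry", [])]
    else:
        resources = [doc]
    found = {resource.get("resourceType") for resource in resources}
    return sorted(found & HIGH_RISK_RESOURCES)
```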

HL7 v2 PHI Segments

  • PID - Patient Identification (SSN in PID-19)
  • DG1 - Diagnosis Information
  • OBX - Observation/Result Values
  • IN1 - Insurance Information
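
To show where PID-19 sits in practice, the sketch below splits an HL7 v2 message into segments and reads the nineteenth field of the first PID segment. It assumes the default `|` field separator (a message declares its actual separator in MSH-1) and the `pid_ssn` helper is hypothetical.

```python
def pid_ssn(hl7_message):
    """Return PID-19 (SSN) from the first PID segment, or None.

    Assumes the default "|" field separator.
    """
    # HL7 v2 segments end with a carriage return; tolerate newlines too.
    for segment in hl7_message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            # For non-MSH segments, fields[n] corresponds to field n.
            if len(fields) > 19 and fields[19]:
                return fields[19]
            return None
    return None
```

A scanner would hash or mask the returned value immediately instead of passing it around, in line with the guardrail of never storing detected PHI.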

Examples

  • examples/sample-finding.json - Example finding output format
  • examples/sample-audit-report.md - Example audit report
  • examples/synthetic-phi-data.json - Test data for validation

Scripts

  • scripts/detect-phi.py - PHI/PII detection in data files (supports FHIR, HL7, CDA formats)
  • scripts/scan-code.py - Code scanning for PHI leakage
  • scripts/scan-auth.py - Authentication gate detection for PHI endpoints
  • scripts/scan-logs.py - PHI detection in logging statements
  • scripts/scan-response.py - API response PHI exposure detection
  • scripts/generate-report.py - Report generation script
  • scripts/validate-controls.sh - Control validation script
  • scripts/pre-commit-hook.sh - Git pre-commit hook for CI/CD integration