hipaa-guardian
HIPAA Guardian
A comprehensive PHI/PII detection and HIPAA compliance skill for AI agents, with a strong focus on developer code security patterns. Detects all 18 HIPAA Safe Harbor identifiers in data files and source code, provides risk scoring, maps findings to HIPAA regulations, and generates audit reports with remediation guidance.
Capabilities
- PHI/PII Detection - Scan data files for the 18 HIPAA Safe Harbor identifiers
- Code Scanning - Detect PHI in source code, comments, test fixtures, configs
- Auth Gate Detection - Find API endpoints exposing PHI without authentication
- Log Safety Audit - Detect PHI leaking into log statements
- Classification - Classify findings as PHI, PII, or sensitive_nonPHI
- Risk Scoring - Score findings 0-100 based on sensitivity and exposure
- HIPAA Mapping - Map each finding to specific HIPAA rules
- Audit Reports - Generate findings.json, audit reports, and playbooks
- Remediation - Provide step-by-step remediation with code examples
- Control Checks - Validate security controls are in place
Usage
/hipaa-guardian [command] [path] [options]
Commands
- `scan <path>` - Scan files or directories for PHI/PII
- `scan-code <path>` - Scan source code for PHI leakage
- `scan-auth <path>` - Check API endpoints for missing authentication before PHI access
- `scan-logs <path>` - Detect PHI patterns in logging statements
- `scan-response <path>` - Check API responses for unmasked PHI exposure
- `audit <path>` - Generate full HIPAA compliance audit report
- `controls <path>` - Check security controls in a project
- `report` - Generate report from existing findings
Options
- `--format <type>` - Output format: json, markdown, csv (default: markdown)
- `--output <file>` - Write results to file
- `--severity <level>` - Minimum severity: low, medium, high, critical
- `--include <patterns>` - File patterns to include
- `--exclude <patterns>` - File patterns to exclude
- `--synthetic` - Treat all data as synthetic (default for safety)
Workflow
When invoked, follow this workflow:
Step 1: Determine Scan Scope
Ask the user to specify:
- Target path (file, directory, or glob pattern)
- Scan type (data files, source code, or both)
- Whether data is synthetic/test data or potentially real PHI
Step 2: File Discovery
Use Glob to find relevant files:
```
# For data files
Glob: **/*.{json,csv,txt,log,xml,hl7,fhir}

# For source code
Glob: **/*.{py,js,ts,tsx,java,cs,go,rb,sql,sh}

# For config files
Glob: **/*.{env,yaml,yml,json,xml,ini,conf}
```
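Outside an agent that provides a Glob tool, the same discovery step can be approximated with the standard library. The sketch below is only an illustration: `discover` and the three extension sets are hypothetical names, and the sets simply restate the extensions from the glob patterns above. Python's `pathlib` has no brace expansion, so the suffix is compared against a set instead.

```python
from pathlib import Path

# Extension groups restated from the glob patterns above (names are assumptions).
DATA_EXTS = {"json", "csv", "txt", "log", "xml", "hl7", "fhir"}
CODE_EXTS = {"py", "js", "ts", "tsx", "java", "cs", "go", "rb", "sql", "sh"}
CONFIG_EXTS = {"env", "yaml", "yml", "json", "xml", "ini", "conf"}


def _extension(path):
    """Return the lowercase extension, treating a bare dotfile like '.env' as 'env'."""
    if path.suffix:
        return path.suffix[1:].lower()
    if path.name.startswith("."):
        return path.name[1:].lower()
    return ""


def discover(root, extensions):
    """Return the sorted files under root whose extension is in the given set."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and _extension(p) in extensions
    )
```

For example, `discover(".", CODE_EXTS)` lists the source files that a code scan would visit.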
Step 3: PHI Detection
For each file, scan for the 18 HIPAA identifiers using patterns from `references/detection-patterns.md`:
- Names - Patient, provider, relative names
- Geographic - Addresses, cities, ZIP codes
- Dates - DOB, admission, discharge, death dates
- Phone Numbers - All formats
- Fax Numbers - All formats
- Email Addresses - All formats
- SSN - Social Security Numbers
- MRN - Medical Record Numbers
- Health Plan IDs - Insurance identifiers
- Account Numbers - Financial accounts
- License Numbers - Driver's license, professional
- Vehicle IDs - VIN, license plates
- Device IDs - Serial numbers, UDI
- URLs - Web addresses
- IP Addresses - Network identifiers
- Biometric - Fingerprints, retinal, voice
- Photos - Full-face images
- Other Unique IDs - Any other identifying numbers
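As a rough illustration of pattern-based detection, the sketch below scans text line by line with a handful of regular expressions. The patterns are simplified assumptions written for this example; they are not the contents of `references/detection-patterns.md` and cover only four of the 18 identifier types.

```python
import re

# Illustrative patterns only (assumptions, deliberately simplified).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_text(text):
    """Yield (line_number, identifier_type, matched_text) for every pattern hit."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                yield lineno, kind, match.group()
```

A real scanner would likely hash or redact the matched text right away rather than pass it around, in keeping with the guardrails described later.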
Step 4: Classification
Classify each finding:
- PHI - Health information linkable to an individual
- PII - Personally identifiable but not health-related
- sensitive_nonPHI - Sensitive but not individually identifiable
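The three classes can be read as a small decision table. The sketch below encodes that reading; the function name and its two boolean inputs are assumptions, and deciding whether a value identifies an individual or relates to health is the hard part that this example leaves to the caller.

```python
def classify(identifies_individual: bool, health_related: bool) -> str:
    """Map two yes/no judgements onto the three classes listed above."""
    if identifies_individual and health_related:
        return "PHI"
    if identifies_individual:
        return "PII"
    # Sensitive, but not tied to an identifiable person.
    return "sensitive_nonPHI"
```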
Step 5: Risk Scoring
Calculate risk score (0-100) using the methodology from `references/risk-scoring.md`:
```
Risk Score = (Sensitivity × 0.35) + (Exposure × 0.25) +
             (Volume × 0.20) + (Identifiability × 0.20)
```
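The weighted sum translates directly into code. This sketch assumes each factor is already expressed on a 0-100 scale, which the formula implies but does not state; the function and constant names are illustrative.

```python
# Weights taken from the formula above.
WEIGHTS = {
    "sensitivity": 0.35,
    "exposure": 0.25,
    "volume": 0.20,
    "identifiability": 0.20,
}


def risk_score(sensitivity, exposure, volume, identifiability):
    """Return the weighted risk score, rounded to an integer in 0-100."""
    factors = {
        "sensitivity": sensitivity,
        "exposure": exposure,
        "volume": volume,
        "identifiability": identifiability,
    }
    for name, value in factors.items():
        # Assumption: every factor is scored on a 0-100 scale.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return round(sum(WEIGHTS[name] * value for name, value in factors.items()))
```

For instance, `risk_score(100, 80, 60, 90)` evaluates to 35 + 20 + 12 + 18 = 85.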
Step 6: HIPAA Mapping
Map findings to HIPAA rules from references:
- `references/privacy-rule.md` - 45 CFR 164.500-534
- `references/security-rule.md` - 45 CFR 164.302-318
- `references/breach-rule.md` - 45 CFR 164.400-414
Step 7: Generate Output
Create structured output following the `examples/sample-finding.json` format:
```json
{
  "id": "F-YYYYMMDD-NNNN",
  "timestamp": "ISO-8601",
  "file": "path/to/file",
  "line": 123,
  "field": "field.path",
  "value_hash": "sha256:...",
  "classification": "PHI|PII|sensitive_nonPHI",
  "identifier_type": "ssn|mrn|dob|...",
  "confidence": 0.95,
  "risk_score": 85,
  "hipaa_rules": [...],
  "remediation": [...],
  "status": "open"
}
```
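A finding in this shape might be assembled as follows. This is a sketch under assumptions: `make_finding` and its parameters are hypothetical, and the sequence number is supplied by the caller. The detected value is hashed with SHA-256 and never stored in the record.

```python
import hashlib
from datetime import datetime, timezone


def make_finding(seq, file, line, field, value, classification,
                 identifier_type, confidence, risk_score, now=None):
    """Build a finding dict; only a hash of the detected value is kept."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return {
        "id": f"F-{now:%Y%m%d}-{seq:04d}",
        "timestamp": now.isoformat(),
        "file": file,
        "line": line,
        "field": field,
        "value_hash": f"sha256:{digest}",
        "classification": classification,
        "identifier_type": identifier_type,
        "confidence": confidence,
        "risk_score": risk_score,
        "hipaa_rules": [],   # filled in by the HIPAA mapping step
        "remediation": [],   # filled in when remediation guidance is generated
        "status": "open",
    }
```

An unsalted hash of a short, structured value such as an SSN can be recovered by brute force, so a keyed hash (for example HMAC with a secret key) may be a safer choice in practice.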
Code Scanning
When scanning source code, look for:
1. Hardcoded PHI in Source
- String literals containing SSN, MRN, names, dates
- Variable assignments with sensitive values
- Database seed/fixture data
2. PHI in Comments
- Example data in code comments
- TODO comments with patient info
- Documentation strings with real data
3. Test Data Leakage
- Test fixtures with real PHI
- Mock data files with actual patient info
- Integration test data
4. Configuration Files
- `.env` files with PHI
- Connection strings with embedded credentials
- API responses cached with PHI
5. SQL Files
- INSERT statements with PHI
- Sample queries with real patient data
- Database dumps
See `references/code-scanning.md` for detailed patterns.
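To make the categories above concrete, here is a deliberately naive sketch that flags SSN-shaped values in comments and string literals. The regular expressions and the `scan_source` name are assumptions for illustration, not the rules in `references/code-scanning.md`. It works line by line, so a comment marker inside a string (or the `//` in a URL) will be misread as a comment.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Single- or double-quoted literal on one line; escaped quotes are not handled.
STRING_LITERAL = re.compile(r"""(['"])(.*?)\1""")
# Line comments in Python/shell (#), C-like languages (//), and SQL (--).
COMMENT = re.compile(r"(#|//|--).*$")


def scan_source(source):
    """Return (line_number, location) pairs where an SSN-like value appears."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        comment = COMMENT.search(line)
        if comment and SSN.search(comment.group()):
            findings.append((lineno, "comment"))
            continue
        if any(SSN.search(m.group(2)) for m in STRING_LITERAL.finditer(line)):
            findings.append((lineno, "string_literal"))
    return findings
```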
Security Control Checks
Verify these controls are in place:
Access Controls
- Role-based access control (RBAC) implemented
- Minimum necessary access principle applied
- Access logging enabled
Encryption
- Data encrypted at rest (AES-256)
- Data encrypted in transit (TLS 1.2+)
- Encryption keys properly managed
Audit Controls
- Audit logging implemented
- Log integrity protected
- Retention policies defined
Code Security
- `.gitignore` excludes sensitive files
- Pre-commit hooks scan for PHI
- Secrets management in place
- Data masking in logs
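Some of the code security checks can be verified mechanically from the file system. The sketch below covers only the `.gitignore` and pre-commit items, and it is not the logic of `scripts/validate-controls.sh`: the function name and result keys are assumptions, and treating a literal `.env` line as "sensitive files excluded" is a simplification.

```python
from pathlib import Path


def check_controls(project_root):
    """Return a dict of simple pass/fail control checks for a project directory."""
    root = Path(project_root)

    # Collect non-comment, non-blank .gitignore entries.
    ignored = set()
    gitignore = root / ".gitignore"
    if gitignore.is_file():
        for raw in gitignore.read_text().splitlines():
            entry = raw.strip()
            if entry and not entry.startswith("#"):
                ignored.add(entry)

    return {
        # Assumption: an exact ".env" entry stands in for "sensitive files excluded".
        "gitignore_excludes_env": ".env" in ignored,
        "pre_commit_hook_installed": (root / ".git" / "hooks" / "pre-commit").is_file(),
        "pre_commit_config_present": (root / ".pre-commit-config.yaml").is_file(),
    }
```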
Output Formats
findings.json
Structured array of all findings with full metadata.
audit_report.md
Human-readable report with:
- Executive summary
- Findings by severity
- HIPAA compliance status
- Risk assessment
- Recommendations
playbook.md
Step-by-step remediation guide:
- Prioritized actions
- Code examples
- Verification steps
Security Guardrails
- Default Synthetic Mode - Assumes data is synthetic unless confirmed otherwise
- No PHI Storage - Never stores detected PHI values, only hashes
- Redaction - All example outputs redact actual values
- Warning Prompts - Warns before processing potentially real PHI
- Audit Trail - Logs all scans (without PHI values)
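One way to keep PHI values out of an audit trail is to redact them before a log record is emitted. The sketch below uses the standard `logging.Filter` hook; the class name and the single SSN pattern are assumptions, and a real filter would need patterns for the other identifier types as well.

```python
import logging
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactingFilter(logging.Filter):
    """Replace SSN-like values in log messages before they reach any handler."""

    def filter(self, record):
        # Render the message with its arguments first, then redact the result.
        record.msg = SSN.sub("[REDACTED-SSN]", record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching it with `logger.addFilter(RedactingFilter())` applies the redaction to every record that logger handles.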
References
- `references/hipaa-identifiers.md` - All 18 HIPAA Safe Harbor identifiers
- `references/detection-patterns.md` - Regex patterns for PHI detection
- `references/code-scanning.md` - Code scanning patterns and rules
- `references/healthcare-formats.md` - FHIR, HL7, CDA detection patterns
- `references/privacy-rule.md` - HIPAA Privacy Rule (45 CFR 164.500-534)
- `references/security-rule.md` - HIPAA Security Rule (45 CFR 164.302-318)
- `references/breach-rule.md` - Breach Notification Rule (45 CFR 164.400-414)
- `references/risk-scoring.md` - Risk scoring methodology
- `references/auth-patterns.md` - Authentication gate patterns for PHI endpoints
- `references/logging-safety.md` - PHI-safe logging patterns and filters
- `references/api-security.md` - API response masking and field-level auth
CI/CD Integration
Pre-Commit Hook Installation
```bash
# Install the pre-commit hook
cp scripts/pre-commit-hook.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```
Or, using the pre-commit framework, add to `.pre-commit-config.yaml`:
```yaml
repos:
  - repo: local
    hooks:
      - id: hipaa-guardian
        name: HIPAA Guardian PHI Scan
        entry: python scripts/detect-phi.py
        language: python
        types: [file]
        pass_filenames: true
```
Environment Variables
```bash
# Configure pre-commit behavior
export HIPAA_BLOCK_ON_CRITICAL=true   # Block commits with critical findings
export HIPAA_BLOCK_ON_HIGH=true       # Block commits with high severity findings
export HIPAA_SCAN_DATA=true           # Scan data files
export HIPAA_SCAN_CODE=true           # Scan source code
export HIPAA_VERBOSE=false            # Enable verbose output
```
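How the hook interprets these variables is not shown here, so the following is only a guess at the gating logic. The helper names are hypothetical, and the assumption that both blocking flags default to `true` when unset is mine, not something the source states.

```python
def flag(env, name, default):
    """Read a boolean-like variable from a mapping such as os.environ."""
    return str(env.get(name, default)).strip().lower() == "true"


def should_block(severities, env):
    """Decide whether a commit should be blocked, given the severities found."""
    # Assumption: both flags behave as "true" when the variable is not set.
    block_critical = flag(env, "HIPAA_BLOCK_ON_CRITICAL", "true")
    block_high = flag(env, "HIPAA_BLOCK_ON_HIGH", "true")
    return bool(
        (block_critical and "critical" in severities)
        or (block_high and "high" in severities)
    )
```

A hook script could call `should_block(found, os.environ)` and exit non-zero when it returns `True`.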
GitHub Actions Integration
```yaml
# .github/workflows/hipaa-scan.yml
name: HIPAA PHI Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Run PHI Scan
        run: |
          python scripts/detect-phi.py . --format markdown --output phi-report.md
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: phi-scan-report
          path: phi-report.md
```
Healthcare Data Format Support
Supported Formats
| Format | Extensions | Detection |
|---|---|---|
| FHIR R4 | | Resource type, identifiers |
| HL7 v2.x | | MSH, PID, DG1 segments |
| CDA/C-CDA | | ClinicalDocument, patientRole |
| X12 EDI | | Transaction set headers |
High-Risk FHIR Resources
- `Patient` - Demographics, identifiers, contacts
- `Condition` - Diagnoses, health conditions
- `Observation` - Lab results, vitals
- `MedicationRequest` - Prescriptions
- `DiagnosticReport` - Test results
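A first-pass check for these resources only needs the `resourceType` field, either on a single resource or on each entry of a Bundle. The sketch below assumes the input is FHIR JSON; the function name is hypothetical.

```python
import json

HIGH_RISK = {"Patient", "Condition", "Observation",
             "MedicationRequest", "DiagnosticReport"}


def high_risk_resources(doc):
    """Return the high-risk resourceType values found in a resource or Bundle."""
    resource = json.loads(doc) if isinstance(doc, str) else doc
    found = []
    if resource.get("resourceType") == "Bundle":
        # A Bundle carries its resources under entry[].resource.
        for entry in resource.get("entry", []):
            kind = entry.get("resource", {}).get("resourceType")
            if kind in HIGH_RISK:
                found.append(kind)
    elif resource.get("resourceType") in HIGH_RISK:
        found.append(resource["resourceType"])
    return found
```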
HL7 v2 PHI Segments
- `PID` - Patient Identification (SSN in PID-19)
- `DG1` - Diagnosis Information
- `OBX` - Observation/Result Values
- `IN1` - Insurance Information
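Because HL7 v2 is pipe-delimited, pulling PID-19 out of a message takes only a few lines. This sketch assumes the default `|` field separator and carriage-return segment terminators (newlines are tolerated); the function name is hypothetical.

```python
def pid_ssn(message):
    """Return the PID-19 value from an HL7 v2 message, or None if absent."""
    for segment in message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        # For non-MSH segments, index 0 is the segment ID and index n is field n.
        if fields[0] == "PID" and len(fields) > 19 and fields[19]:
            return fields[19]
    return None
```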
Examples
- `examples/sample-finding.json` - Example finding output format
- `examples/sample-audit-report.md` - Example audit report
- `examples/synthetic-phi-data.json` - Test data for validation
Scripts
- `scripts/detect-phi.py` - PHI/PII detection in data files (supports FHIR, HL7, CDA formats)
- `scripts/scan-code.py` - Code scanning for PHI leakage
- `scripts/scan-auth.py` - Authentication gate detection for PHI endpoints
- `scripts/scan-logs.py` - PHI detection in logging statements
- `scripts/scan-response.py` - API response PHI exposure detection
- `scripts/generate-report.py` - Report generation script
- `scripts/validate-controls.sh` - Control validation script
- `scripts/pre-commit-hook.sh` - Git pre-commit hook for CI/CD integration