ethics-safety-impact

Ethics, Safety & Impact Assessment



Purpose


Ethics, Safety & Impact Assessment provides a structured framework for identifying potential harms, benefits, and differential impacts before launching features, implementing policies, or making decisions that affect people. This skill guides you through stakeholder identification, harm/benefit analysis, fairness evaluation, risk mitigation design, and ongoing monitoring to ensure responsible and equitable outcomes.

When to Use


Use this skill when:
  • Product launches: New features, algorithm changes, UI redesigns that affect user experience or outcomes
  • Policy decisions: Terms of service updates, content moderation rules, data usage policies, pricing changes
  • Data & AI systems: Training models, deploying algorithms, using sensitive data, automated decision-making
  • Platform changes: Recommendation systems, search ranking, feed algorithms, matching/routing logic
  • Access & inclusion: Features affecting accessibility, vulnerable populations, underrepresented groups, global markets
  • Safety-critical systems: Health, finance, transportation, security applications where errors have serious consequences
  • High-stakes decisions: Hiring, lending, admissions, criminal justice, insurance where outcomes significantly affect lives
  • Content & communication: Moderation policies, fact-checking systems, content ranking, amplification rules
Trigger phrases: "ethical review", "impact assessment", "who might be harmed", "differential impact", "vulnerable populations", "bias audit", "fairness check", "safety analysis", "responsible AI", "unintended consequences"

What Is It?


Ethics, Safety & Impact Assessment is a proactive evaluation framework that systematically examines:
  • Who is affected (stakeholder mapping, vulnerable groups)
  • What could go wrong (harm scenarios, failure modes)
  • Why it matters (severity, likelihood, distribution of impacts)
  • How to mitigate (design changes, safeguards, monitoring)
  • When to escalate (triggers, thresholds, review processes)
Core ethical principles:
  • Fairness: Equal treatment, non-discrimination, equitable outcomes across groups
  • Autonomy: User choice, informed consent, control over data and experience
  • Beneficence: Maximize benefits, design for positive impact
  • Non-maleficence: Minimize harms, "do no harm" as baseline
  • Transparency: Explain decisions, disclose limitations, build trust
  • Accountability: Clear ownership, redress mechanisms, audit trails
  • Privacy: Data protection, confidentiality, purpose limitation
  • Justice: Equitable distribution of benefits and burdens, address historical inequities
Quick example:
Scenario: Launching credit scoring algorithm for loan approvals
Ethical impact assessment:
  1. Stakeholders affected: Loan applicants (diverse demographics), lenders, society (economic mobility)
  2. Potential harms:
    • Disparate impact: Algorithm trained on historical data may perpetuate bias against protected groups (race, gender, age)
    • Opacity: Applicants denied loans without explanation, cannot contest decision
    • Feedback loops: Denying loans to disadvantaged groups → lack of credit history → continued denials
    • Economic harm: Incorrect denials prevent wealth building, perpetuate poverty
  3. Vulnerable groups: Racial minorities historically discriminated against in lending, immigrants with thin credit files, young adults, people in poverty
  4. Mitigations:
    • Fairness audit: Test for disparate impact across protected classes, equalized odds
    • Explainability: Provide reason codes (top 3 factors), allow appeals
    • Alternative data: Include rent, utility payments to expand access
    • Human review: Flag edge cases for manual review, override capability
    • Regular monitoring: Track approval rates by demographic, quarterly bias audits
  5. Monitoring & escalation:
    • Metrics: Approval rate parity (within 10% across groups), false positive/negative rates, appeal overturn rate
    • Triggers: If disparate impact >20%, escalate to ethics committee
    • Review: Quarterly fairness audits, annual independent assessment
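The monitoring step in this example can be sketched as a simple parity check. This is an illustrative sketch, not a production implementation: the function name, group labels, and counts are hypothetical, and the 10%/20% thresholds follow the example above.

```python
# Hypothetical sketch of the monitoring metrics above: compare approval
# rates across demographic groups and flag when thresholds are crossed.
# Group names, counts, and thresholds are illustrative.

def check_approval_parity(approvals_by_group, parity_threshold=0.10,
                          escalation_threshold=0.20):
    """approvals_by_group: {group: (approved, total)} -> (status, gap)."""
    rates = {g: approved / total for g, (approved, total) in approvals_by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    if gap > escalation_threshold:
        return "escalate_to_ethics_committee", gap  # disparate impact >20%
    if gap > parity_threshold:
        return "investigate", gap  # outside the 10% parity band
    return "ok", gap

status, gap = check_approval_parity({
    "group_a": (420, 1000),   # 42% approval
    "group_b": (310, 1000),   # 31% approval
})
print(status, round(gap, 2))  # 11-point gap -> investigate
```

A real pipeline would pull these counts from an approvals log and also track false positive/negative rates and appeal overturn rates per group, as listed above.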

Workflow


Copy this checklist and track your progress:
Ethics & Safety Assessment Progress:
- [ ] Step 1: Map stakeholders and identify vulnerable groups
- [ ] Step 2: Analyze potential harms and benefits
- [ ] Step 3: Assess fairness and differential impacts
- [ ] Step 4: Evaluate severity and likelihood
- [ ] Step 5: Design mitigations and safeguards
- [ ] Step 6: Define monitoring and escalation protocols
Step 1: Map stakeholders and identify vulnerable groups
Identify all affected parties (direct users, indirect users, broader society). Prioritize the vulnerable populations most at risk. See resources/template.md for the stakeholder analysis framework.
Step 2: Analyze potential harms and benefits
Brainstorm what could go wrong (harms) and what value is created (benefits) for each stakeholder group. See resources/template.md for structured analysis.
Step 3: Assess fairness and differential impacts
Evaluate whether outcomes, treatment, or access differ across groups. Check for disparate impact. See resources/methodology.md for fairness criteria and measurement.
Step 4: Evaluate severity and likelihood
Score each harm on severity (1-5) and likelihood (1-5), prioritize high-risk combinations. See resources/template.md for prioritization framework.
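The severity x likelihood scoring in Step 4 can be sketched as follows. The harm entries and the cutoff of 12 are hypothetical examples; the 1-5 scales match the step above.

```python
# Illustrative sketch of Step 4: score each harm on severity (1-5) and
# likelihood (1-5), then rank by combined risk. Harm names are examples.

harms = [
    {"harm": "disparate impact on protected groups", "severity": 5, "likelihood": 3},
    {"harm": "opaque denials without explanation",   "severity": 3, "likelihood": 4},
    {"harm": "data breach of applicant records",     "severity": 5, "likelihood": 2},
]

for h in harms:
    h["risk"] = h["severity"] * h["likelihood"]  # combined score on a 1-25 scale

# Highest-risk harms first; e.g. mitigate anything >= 12 before launch
# (an example cutoff, not a prescribed standard).
for h in sorted(harms, key=lambda h: h["risk"], reverse=True):
    print(f'{h["risk"]:>2}  {h["harm"]}')
```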
Step 5: Design mitigations and safeguards
For high-priority harms, propose design changes, policy safeguards, oversight mechanisms. See resources/methodology.md for intervention types.
Step 6: Define monitoring and escalation protocols
Set metrics, thresholds, review cadence, escalation triggers. Validate using resources/evaluators/rubric_ethics_safety_impact.json. Minimum standard: Average score ≥ 3.5.

Common Patterns


Pattern 1: Algorithm Fairness Audit
  • Stakeholders: Users receiving algorithmic decisions (hiring, lending, content ranking), protected groups
  • Harms: Disparate impact (bias against protected classes), feedback loops amplifying inequality, opacity preventing accountability
  • Assessment: Test for demographic parity, equalized odds, calibration across groups; analyze training data for historical bias
  • Mitigations: Debiasing techniques, fairness constraints, explainability, human review for edge cases, regular audits
  • Monitoring: Disparate impact ratio, false positive/negative rates by group, user appeals and overturn rates
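One common heuristic for the disparate impact ratio mentioned above is the "four-fifths rule": each group's selection rate divided by the most-favored group's rate should stay above 0.8. A minimal sketch, with hypothetical group names and rates:

```python
# Disparate impact ratio check (four-fifths rule heuristic).
# selection_rates maps each group to its selection (e.g. approval) rate.

def disparate_impact_ratios(selection_rates):
    """selection_rates: {group: rate}. Returns {group: ratio vs. best group}."""
    best = max(selection_rates.values())
    return {g: rate / best for g, rate in selection_rates.items()}

ratios = disparate_impact_ratios({"group_a": 0.50, "group_b": 0.35})
flagged = [g for g, r in ratios.items() if r < 0.8]
print(flagged)  # group_b: 0.35 / 0.50 = 0.70 < 0.8, so it is flagged
```

The 0.8 threshold is a screening heuristic, not proof of bias; flagged groups warrant the deeper audit steps described in this pattern.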
Pattern 2: Data Privacy & Consent
  • Stakeholders: Data subjects (users whose data is collected), vulnerable groups (children, marginalized communities)
  • Harms: Privacy violations, surveillance, data breaches, lack of informed consent, secondary use without permission, re-identification risk
  • Assessment: Map data flows (collection → storage → use → sharing), identify sensitive attributes (PII, health, location), consent adequacy
  • Mitigations: Data minimization (collect only necessary), anonymization/differential privacy, granular consent, user data controls (export, delete), encryption
  • Monitoring: Breach incidents, data access logs, consent withdrawal rates, user data requests (GDPR, CCPA)
Pattern 3: Content Moderation & Free Expression
  • Stakeholders: Content creators, viewers, vulnerable groups (targets of harassment), society (information integrity)
  • Harms: Over-moderation (silencing legitimate speech, especially marginalized voices), under-moderation (allowing harm, harassment, misinformation), inconsistent enforcement
  • Assessment: Analyze moderation error rates (false positives/negatives), differential enforcement across groups, cultural context sensitivity
  • Mitigations: Clear policies with examples, appeals process, human review, diverse moderators, cultural context training, transparency reports
  • Monitoring: Moderation volume and error rates by category, appeal overturn rates, disparate enforcement across languages/regions
Pattern 4: Accessibility & Inclusive Design
  • Stakeholders: Users with disabilities (visual, auditory, motor, cognitive), elderly, low-literacy, low-bandwidth users
  • Harms: Exclusion (cannot use product), degraded experience, safety risks (cannot access critical features), digital divide
  • Assessment: WCAG compliance audit, assistive technology testing, user research with diverse abilities, cross-cultural usability
  • Mitigations: Accessible design (WCAG AA/AAA), alt text, keyboard navigation, screen reader support, low-bandwidth mode, multi-language, plain language
  • Monitoring: Accessibility test coverage, user feedback from disability communities, task completion rates across abilities
Pattern 5: Safety-Critical Systems
  • Stakeholders: End users (patients, drivers, operators), vulnerable groups (children, elderly, compromised health), public safety
  • Harms: Physical harm (injury, death), psychological harm (trauma), property damage, cascade failures affecting many
  • Assessment: Failure mode analysis (FMEA), fault tree analysis, worst-case scenarios, edge cases that break assumptions
  • Mitigations: Redundancy, fail-safes, human oversight, rigorous testing (stress, chaos, adversarial), incident response plans, staged rollouts
  • Monitoring: Error rates, near-miss incidents, safety metrics (accidents, adverse events), user-reported issues, compliance audits

Guardrails


Critical requirements:
  1. Identify vulnerable groups explicitly: Not all stakeholders are equally at risk. Prioritize: children, elderly, people with disabilities, marginalized/discriminated groups, low-income, low-literacy, geographically isolated, politically targeted. If none identified, you're probably missing them.
  2. Consider second-order and long-term effects: First-order obvious harms are just the start. Look for: feedback loops (harm → disadvantage → more harm), normalization (practice becomes standard), precedent (enables worse future behavior), accumulation (small harms compound over time). Ask "what happens next?"
  3. Assess differential impact, not just average: Feature may help average user but harm specific groups. Metrics: disparate impact (outcome differences across groups >20% = red flag), intersectionality (combinations of identities may face unique harms), distributive justice (who gets benefits vs. burdens?).
  4. Design mitigations before launch, not after harm: Reactive fixes are too late for those already harmed. Proactive: Build safeguards into design, test with diverse users, staged rollout with monitoring, kill switches, pre-commit to audits. "Move fast and break things" is unethical for systems affecting people's lives.
  5. Provide transparency and recourse: People affected have right to know and contest. Minimum: Explain decisions (what factors, why outcome), Appeal mechanism (human review, overturn if wrong), Redress (compensate harm), Audit trails (investigate complaints). Opacity is often a sign of hidden bias or risk.
  6. Monitor outcomes, not just intentions: Good intentions don't prevent harm. Measure actual impacts: outcome disparities by group, user-reported harms, error rates and their distribution, unintended consequences. Set thresholds that trigger review/shutdown.
  7. Establish clear accountability and escalation: Assign ownership. Define: Who reviews ethics risks before launch? Who monitors post-launch? What triggers escalation? Who can halt harmful features? Document decisions and rationale for later review.
  8. Respect autonomy and consent: Users deserve: Informed choice (understand what they're agreeing to, in plain language), Meaningful alternatives (consent not coerced), Control (opt out, delete data, configure settings), Purpose limitation (data used only for stated purpose). Children and vulnerable groups need extra protections.
Common pitfalls:
  • Assuming "we treat everyone the same" = fairness: Equal treatment of unequal groups perpetuates inequality. Fairness often requires differential treatment.
  • Optimization without constraints: Maximizing engagement/revenue unconstrained leads to amplifying outrage, addiction, polarization. Set ethical boundaries.
  • Moving fast and apologizing later: For safety/ethics, prevention > apology. Harms to vulnerable groups are not acceptable experiments.
  • Privacy theater: Requiring consent without explaining risks, or making consent mandatory for service, is not meaningful consent.
  • Sampling bias in testing: Testing only on employees (young, educated, English-speaking) misses how diverse users experience harm.
  • Ethics washing: Performative statements without material changes. Impact assessments must change decisions, not just document them.

Quick Reference


Key resources:
  • resources/template.md: Stakeholder mapping, harm/benefit analysis, risk matrix, mitigation planning, monitoring framework
  • resources/methodology.md: Fairness metrics, privacy analysis, safety assessment, bias detection, participatory design
  • resources/evaluators/rubric_ethics_safety_impact.json: Quality criteria for stakeholder analysis, harm identification, mitigation design, monitoring
Stakeholder Priorities:
High-risk groups to always consider:
  • Children (<18, especially <13)
  • People with disabilities (visual, auditory, motor, cognitive)
  • Racial/ethnic minorities, especially historically discriminated groups
  • Low-income, unhoused, financially precarious
  • LGBTQ+, especially in hostile jurisdictions
  • Elderly (>65), especially those with limited digital skills
  • Non-English speakers, low-literacy
  • Political dissidents, activists, journalists in repressive contexts
  • Refugees, immigrants, undocumented
  • People with mental illness or cognitive impairment
Harm Categories:
  • Physical: Injury, death, health deterioration
  • Psychological: Trauma, stress, anxiety, depression, addiction
  • Economic: Lost income, debt, poverty, exclusion from opportunity
  • Social: Discrimination, harassment, ostracism, loss of relationships
  • Autonomy: Coercion, manipulation, loss of control, dignity violation
  • Privacy: Surveillance, exposure, data breach, re-identification
  • Reputational: Stigma, defamation, loss of standing
  • Epistemic: Misinformation, loss of knowledge access, filter bubbles
  • Political: Disenfranchisement, censorship, targeted repression
Fairness Definitions (choose appropriate for context):
  • Demographic parity: Outcome rates equal across groups (e.g., 40% approval rate for all)
  • Equalized odds: False positive and false negative rates equal across groups
  • Equal opportunity: True positive rate equal across groups (equal access to benefit)
  • Calibration: Predicted probabilities match observed frequencies for all groups
  • Individual fairness: Similar individuals treated similarly (Lipschitz condition)
  • Counterfactual fairness: Outcome same if sensitive attribute (race, gender) were different
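Three of the definitions above can be computed directly from per-group confusion counts. A minimal sketch with illustrative numbers (the counts and group names are hypothetical):

```python
# Compute per-group rates from confusion counts (tp, fp, tn, fn):
# - selection_rate: compared across groups for demographic parity
# - tpr: compared across groups for equal opportunity
# - fpr: equalized odds requires matching both tpr and fpr

def rates(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "selection_rate": (tp + fp) / total,
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

group_a = rates(tp=80, fp=20, tn=70, fn=30)
group_b = rates(tp=60, fp=10, tn=90, fn=40)

for metric in ("selection_rate", "tpr", "fpr"):
    gap = abs(group_a[metric] - group_b[metric])
    print(f"{metric}: gap = {gap:.2f}")
```

These criteria generally cannot all be satisfied at once (except in degenerate cases), which is why the list says to choose the definition appropriate for the context.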
Mitigation Strategies:
  • Prevent: Design change eliminates harm (e.g., don't collect sensitive data)
  • Reduce: Decrease likelihood or severity (e.g., rate limiting, friction for risky actions)
  • Detect: Monitor and alert when harm occurs (e.g., bias dashboard, anomaly detection)
  • Respond: Process to address harm when found (e.g., appeals, human review, compensation)
  • Safeguard: Redundancy, fail-safes, circuit breakers for critical failures
  • Transparency: Explain, educate, build understanding and trust
  • Empower: Give users control, choice, ability to opt out or customize
Monitoring Metrics:
  • Outcome disparities: Measure by protected class (approval rates, error rates, treatment quality)
  • Error distribution: False positives/negatives, who bears burden?
  • User complaints: Volume, categories, resolution rates, disparities
  • Engagement/retention: Differences across groups (are some excluded?)
  • Safety incidents: Volume, severity, affected populations
  • Consent/opt-outs: How many decline? Demographics of decliners?
Escalation Triggers:
  • Disparate impact >20% without justification
  • Safety incidents causing serious harm (injury, death)
  • Vulnerable group disproportionately affected (>2× harm rate)
  • User complaints spike (>2× baseline)
  • Press/regulator attention
  • Internal ethics concerns raised
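Several of these triggers can be turned into an automated check. A hedged sketch: the metric names and sample values are hypothetical, and a real pipeline would pull them from monitoring dashboards rather than literals.

```python
# Evaluate escalation triggers against current monitoring metrics.
# Thresholds follow the list above (>20% disparate impact, >2x harm
# rate for vulnerable groups, >2x baseline complaints).

TRIGGERS = [
    ("disparate_impact",
     lambda m: m["disparate_impact"] > 0.20),
    ("vulnerable_harm_rate",
     lambda m: m["vulnerable_harm_rate"] > 2 * m["baseline_harm_rate"]),
    ("complaint_spike",
     lambda m: m["complaints"] > 2 * m["baseline_complaints"]),
]

def escalation_needed(metrics):
    """Return the names of all triggers that fire on these metrics."""
    return [name for name, fires in TRIGGERS if fires(metrics)]

fired = escalation_needed({
    "disparate_impact": 0.25,
    "vulnerable_harm_rate": 0.05, "baseline_harm_rate": 0.04,
    "complaints": 120, "baseline_complaints": 100,
})
print(fired)  # only disparate_impact exceeds its threshold here
```

Press/regulator attention and internal ethics concerns are qualitative triggers and still need a human reporting path alongside any automated check.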
When to escalate beyond this skill:
  • Legal compliance required (GDPR, ADA, Civil Rights Act, industry regulations)
  • Life-or-death safety-critical system (medical, transportation)
  • Children or vulnerable populations primary users
  • High controversy or political salience
  • Novel ethical terrain (new technology, no precedent)
Consult: Legal counsel, ethics board, domain experts, affected communities, regulators
Inputs required:
  • Feature or decision (what is being proposed? what changes?)
  • Affected groups (who is impacted? direct and indirect?)
  • Context (what problem does this solve? why now?)
Outputs produced:
  • ethics-safety-impact.md: Stakeholder analysis, harm/benefit assessment, fairness evaluation, risk prioritization, mitigation plan, monitoring framework, escalation protocol