emergency-release-workflow
Emergency Release Workflow Skill
Summary
Fast-track workflow for critical production issues requiring immediate deployment. Covers urgency assessment, expedited PR process, deployment verification, and post-incident analysis.
When to Use
- Critical production bugs affecting users
- Security vulnerabilities (CVEs)
- Urgent business requirements
- Data integrity issues
- Service outages
- Payment processing failures
Urgency Assessment
Priority Levels
| Level | Type | Response Time | Deployment | Example |
|---|---|---|---|---|
| P0 | Security vulnerability | < 2 hours | Immediate to production | Auth bypass, data leak, active exploit |
| P1 | Production down | < 4 hours | Same day | App crash, complete feature failure, payment down |
| P2 | Major bug | < 24 hours | Next business day | Critical feature broken, significant user impact |
| P3 | Business critical | < 48 hours | Scheduled release | Marketing campaign blocker, partner deadline |
P0 Criteria (Immediate Action)
- Authentication/authorization bypass
- Data breach or exposure
- Remote code execution vulnerability
- Production service completely unavailable
- Data corruption affecting multiple users
- Payment processing completely broken
P1 Criteria (Same Day)
- Critical feature completely broken
- Error affecting majority of users
- Revenue-impacting bug
- Database connectivity issues
- Third-party integration failure (critical service)
P2 Criteria (Next Business Day)
- Major feature partially broken
- Affects specific user segment
- Workaround available but not ideal
- Performance degradation (not complete failure)
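The criteria above can be sketched as a small classifier. This is only an illustration: the `IncidentSignals` shape and the `classifyUrgency` name are assumptions, not part of the workflow, and the flags condense the bullet lists rather than cover every case (database connectivity and third-party failures, for example, would fall under the critical-feature flag here).

```typescript
type UrgencyLevel = "P0" | "P1" | "P2" | "P3";

// Hypothetical summary of an incident, one flag per group of criteria.
interface IncidentSignals {
  securityVulnerability: boolean;    // auth bypass, data exposure, RCE
  productionUnavailable: boolean;    // service completely unavailable
  dataCorruption: boolean;           // corruption affecting multiple users
  paymentsBroken: boolean;           // payment processing completely broken
  criticalFeatureBroken: boolean;    // critical feature completely broken
  majorityOfUsersAffected: boolean;
  revenueImpacting: boolean;
  partialFailureOrDegraded: boolean; // partial breakage or degraded performance
}

// Check the most severe criteria first and fall through to lower levels.
function classifyUrgency(s: IncidentSignals): UrgencyLevel {
  if (s.securityVulnerability || s.productionUnavailable || s.dataCorruption || s.paymentsBroken) {
    return "P0";
  }
  if (s.criticalFeatureBroken || s.majorityOfUsersAffected || s.revenueImpacting) {
    return "P1";
  }
  if (s.partialFailureOrDegraded) {
    return "P2";
  }
  return "P3";
}

// Response-time targets in hours, taken from the priority table.
const responseHours: Record<UrgencyLevel, number> = { P0: 2, P1: 4, P2: 24, P3: 48 };
```

Anything that matches no flag falls through to P3, which mirrors the table's "scheduled release" row.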
Emergency Release Process
1. Create Hotfix Branch
```bash
# Branch from current production (main)
git checkout main
git pull origin main

# Create hotfix branch
git checkout -b hotfix/ENG-XXX-brief-description

# Example:
# git checkout -b hotfix/ENG-1234-fix-auth-bypass
```
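To keep branch names consistent with the `hotfix/ENG-XXX-brief-description` pattern, a small helper could derive the name from the ticket and a short description. This is a minimal sketch; `hotfixBranchName` is a hypothetical helper, not an existing tool.

```typescript
// Builds "hotfix/<ticket>-<slug>" from a ticket ID and a free-text description.
function hotfixBranchName(ticket: string, description: string): string {
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into single dashes
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
  return `hotfix/${ticket}-${slug}`;
}

// hotfixBranchName("ENG-1234", "Fix auth bypass") returns "hotfix/ENG-1234-fix-auth-bypass"
```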
2. Implement Minimal Fix
⚠️ CRITICAL: Minimal change only
DO:
✅ Fix the immediate issue
✅ Add regression test
✅ Document root cause in comments
DON'T:
❌ Refactor surrounding code
❌ Fix unrelated issues
❌ Add new features
❌ Update dependencies (unless that's the fix)
3. Test Thoroughly
```bash
# Run full test suite
pnpm test

# Type check
pnpm tsc --noEmit

# Build verification
pnpm build

# Manual testing checklist:
# - [ ] Reproduce original issue
# - [ ] Verify fix resolves issue
# - [ ] Test happy path
# - [ ] Test edge cases
# - [ ] Verify no new issues introduced
```
4. Create PR with Labels
```bash
git add .
git commit -m "fix: [brief description of fix]

Fixes critical issue where [description].
Root cause: [explanation].

Ticket: ENG-XXX
Priority: P0"
git push origin hotfix/ENG-XXX-brief-description
```
Use clear labels in PR title:
- [RELEASE] - Direct to production
- [HOTFIX] - Critical fix, expedited review
- [P0] or [P1] - Priority indicator
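The labels can be assembled mechanically. The sketch below shows one possible convention; the ordering of the labels and the `hotfixPrTitle` helper are assumptions for illustration, since the workflow only lists the labels themselves.

```typescript
interface HotfixPrInfo {
  ticket: string;              // e.g. "ENG-1234"
  summary: string;             // brief description of the fix
  priority: "P0" | "P1";
  directToProduction: boolean; // true selects [RELEASE], false selects [HOTFIX]
}

// Builds a title such as "[RELEASE] [P0] ENG-1234: Reject expired tokens".
function hotfixPrTitle(pr: HotfixPrInfo): string {
  const label = pr.directToProduction ? "[RELEASE]" : "[HOTFIX]";
  return `${label} [${pr.priority}] ${pr.ticket}: ${pr.summary}`;
}
```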
PR Template for Hotfixes
Hotfix PR Template
🚨 [RELEASE] ENG-XXX: Brief description of fix
Urgency
- P0 - Security vulnerability
- P1 - Production down
- P2 - Major bug
- P3 - Business critical
Impact
Users affected: [All users / Premium tier / Specific region / etc.]
Severity: [Choose one]
- Service completely unavailable
- Critical feature broken
- Security vulnerability
- Data integrity issue
- Degraded performance
User impact:
Describe how this affects end users.
Root Cause
[Brief explanation of what caused the issue]
How it happened:
- [Step 1]
- [Step 2]
- [Result: issue manifested]
Why it wasn't caught:
- Missing test coverage
- Race condition in production
- External service behavior changed
- Recent deployment introduced regression
- Other: [explain]
The Fix
[What this PR changes to resolve the issue]
Changes made:
- Modified `file.ts` to [specific change]
- Added validation for [specific case]
- Fixed logic in [specific function]
Why this fixes it:
[Explanation of how the change resolves the root cause]
Testing
- ✅ Reproduced issue locally
- ✅ Verified fix resolves issue
- ✅ Regression test added
- ✅ No other functionality affected
- ✅ Tested edge cases
- ✅ Deployed to staging and verified
Regression Test
```typescript
// Test added to prevent recurrence.
// generateExpiredToken() is a project-specific test helper.
// When run under Node, fetch needs an absolute URL, so prefix the path
// with your test server's base URL.
describe('ENG-XXX: Auth bypass fix', () => {
  it('should reject expired tokens', async () => {
    const expiredToken = generateExpiredToken();
    const response = await fetch('/api/protected', {
      headers: { Authorization: `Bearer ${expiredToken}` }
    });
    expect(response.status).toBe(401);
  });
});
```
Rollback Plan
If this causes issues:
```bash
# Option 1: Revert commit
# (add -m 1 if <commit-hash> is a merge commit)
git revert <commit-hash>
git push origin main

# Option 2: Deploy previous version
vercel rollback # or your platform's rollback command

# Option 3: Feature flag
# Set FEATURE_FIX_XXX=false in environment
```
**Monitoring:**
- [ ] Error rate in Sentry
- [ ] API response times in monitoring dashboard
- [ ] User reports in support channels
Deploy Checklist
- PR approved by at least one reviewer (waive for P0 if necessary)
- All CI checks pass
- Deployed to staging and verified
- Monitoring alerts configured
- On-call engineer notified
- Ready for production deployment
Post-Deploy Verification
Immediately after deploy:
- Verify fix in production (test endpoint directly)
- Check error tracking (Sentry, etc.)
- Monitor for new errors
- Confirm user reports stop coming in
Metrics to watch:
- Error rate (should drop)
- API latency (should remain stable)
- User activity (should normalize)
Follow-Up
- Update Linear ticket with resolution
- Schedule post-incident review (if P0/P1)
- Create tickets for proper fix (if this was a band-aid)
- Update runbook/documentation
---
Deployment Steps
Pre-Deployment
```bash
# 1. Merge PR to main
# (After approval or P0 emergency waiver)

# 2. Pull latest
git checkout main
git pull origin main

# 3. Verify commit
git log -1
# Confirm this is your hotfix commit

# 4. Tag release (if using semantic versioning)
git tag -a v2.3.5 -m "Hotfix: Fix auth bypass vulnerability"
git push origin v2.3.5
```
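Under semantic versioning a hotfix normally bumps only the patch number. As a minimal sketch, assuming tags follow the `vMAJOR.MINOR.PATCH` form used above, a hypothetical `nextHotfixTag` helper could compute the next tag from the latest one:

```typescript
// Returns the next patch-level tag, e.g. "v2.3.4" -> "v2.3.5".
function nextHotfixTag(currentTag: string): string {
  const match = /^v(\d+)\.(\d+)\.(\d+)$/.exec(currentTag);
  if (!match) {
    throw new Error(`Not a vMAJOR.MINOR.PATCH tag: ${currentTag}`);
  }
  const [, major, minor, patch] = match;
  return `v${major}.${minor}.${Number(patch) + 1}`;
}
```

Tags with pre-release or build suffixes are rejected rather than guessed at, which seems the safer default during an incident.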
Deployment (Platform-Specific)
Vercel
```bash
# Trigger production deployment
vercel --prod

# Or use Vercel dashboard:
# Deployments → Select commit → Deploy to Production

# Monitor deployment (pass the deployment URL printed by the deploy command;
# older CLI versions need --follow to keep streaming)
vercel logs <deployment-url>
```
Netlify
```bash
# Deploy via CLI
netlify deploy --prod

# Or trigger from dashboard:
# Deploys → Select commit → Publish deploy
```
Railway
```bash
# Push to main triggers deployment automatically
# Monitor in dashboard: railway.app/project/logs
```
AWS/GCP/Azure
```bash
# Follow platform-specific deployment process
# Example for AWS Elastic Beanstalk:
eb deploy production --staged

# Monitor (streaming requires CloudWatch log streaming to be enabled):
eb logs --stream
```
Post-Deployment Verification
1. Smoke Test
```bash
# Test the specific fix (-i prints the response status line and headers)
curl -i -X POST https://api.production.com/auth/login \
  -H "Content-Type: application/json" \
  -d '{"token": "expired_token"}'

# Expected: 401 Unauthorized
```
2. Monitor Error Tracking
✅ Check Sentry/Rollbar/etc.:
- Error rate should drop
- No new errors introduced
⏱️ Monitor for 15-30 minutes after deployment
3. Verify Metrics
Check monitoring dashboard:
- API response times (should be normal)
- Error rates (should drop)
- Database performance (should be stable)
- Third-party service health
4. Check User Reports
Monitor support channels:
- Support tickets
- In-app chat
- Social media
- Status page comments
Communication
Internal Communication
Slack/Teams Message Template
🚨 **Production Hotfix Deployed**
**Issue**: [Brief description]
**Ticket**: ENG-XXX
**Priority**: P0
**Status**: ✅ Resolved
**Timeline:**
- Issue discovered: 14:23 UTC
- Fix deployed: 15:47 UTC
- Duration: 1h 24m
**Impact**:
[Who was affected and how]
**Root Cause**:
[Brief explanation]
**Fix**:
[What was changed]
**Verification**:
✅ Error rate dropped from 450/min to 0/min
✅ All systems operating normally
**PR**: https://github.com/org/repo/pull/XXX
**Follow-up**:
- [ ] Post-incident review scheduled for [date]
- [ ] Documentation updated
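The duration line in the template follows directly from the two timestamps. A minimal sketch, assuming both times are `HH:MM` strings in UTC and the incident lasted less than 24 hours (`incidentDuration` is a hypothetical helper):

```typescript
// Formats the time between discovery and deployment as "<hours>h <minutes>m".
function incidentDuration(discoveredUtc: string, deployedUtc: string): string {
  const toMinutes = (hhmm: string): number => {
    const [hours, minutes] = hhmm.split(":").map(Number);
    return hours * 60 + minutes;
  };
  // Adding a day before the modulo handles incidents that cross midnight UTC.
  const elapsed = (toMinutes(deployedUtc) - toMinutes(discoveredUtc) + 1440) % 1440;
  return `${Math.floor(elapsed / 60)}h ${elapsed % 60}m`;
}

// incidentDuration("14:23", "15:47") returns "1h 24m", matching the template above.
```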
External Communication (if needed)
Status Page Update
🟢 Resolved - [Issue Title]
We've resolved an issue that was affecting [feature/service].
**What happened:**
Between 14:23 and 15:47 UTC, users experienced [specific issue].
**Current status:**
The issue has been fully resolved. All systems are operating normally.
**Next steps:**
We're conducting a thorough review to prevent similar issues in the future.
We apologize for any inconvenience.
Email to Affected Users (for serious issues)
Subject: Update on [Service] Issue - Resolved
Hi [User],
We're writing to update you on an issue that affected [feature/service] earlier today.
**What happened:**
Between [time] and [time], you may have experienced [specific issue].
**Resolution:**
Our team quickly identified and resolved the root cause. The service is now operating normally.
**What we're doing:**
We take these issues seriously and are:
- Conducting a full review of the incident
- Implementing additional safeguards
- Improving our monitoring
We apologize for any inconvenience this may have caused.
If you have any questions or concerns, please reach out to support@company.com.
Thank you for your patience.
The [Company] Team
Post-Incident
Immediate Actions (Within 24 hours)
- Update Linear ticket with full resolution details
- Add incident to incident log/spreadsheet
- Document timeline of events
- Identify metrics that should have alerted earlier
- Create follow-up tickets for proper fix (if hotfix was temporary)
Post-Incident Review (PIR) - For P0/P1
Schedule within 72 hours
PIR Template
Post-Incident Review: [ENG-XXX]
Date: YYYY-MM-DD
Severity: P0/P1
Duration: Xh Xm
Summary
Brief description of the incident.
Timeline (UTC)
- 14:23 - Issue first detected
- 14:25 - On-call engineer alerted
- 14:30 - Root cause identified
- 14:45 - Fix PR opened
- 15:20 - PR approved and merged
- 15:47 - Fix deployed to production
- 16:00 - Verified resolved
Impact
- Users affected: ~5,000 users
- Duration: 1h 24m
- User experience: Unable to log in
- Revenue impact: Estimated $X in lost transactions
- Reputation impact: 23 support tickets, 5 social media mentions
Root Cause
Detailed technical explanation of what caused the issue.
[Include code snippets, sequence diagrams if helpful]
Resolution
What was changed to fix the issue.
What Went Well
- ✅ Fast detection (2 minutes after deploy)
- ✅ Clear reproduction steps identified quickly
- ✅ Team collaborated effectively
- ✅ Fix deployed in under 90 minutes
What Went Wrong
- ❌ Missing test coverage for expired token edge case
- ❌ Staging didn't catch the issue (different token expiry settings)
- ❌ No automatic rollback triggered
- ❌ Monitoring alert threshold too high
Action Items
- ENG-XXX-1: Add test for expired token validation (@engineer, by YYYY-MM-DD)
- ENG-XXX-2: Align staging token expiry with production (@devops, by YYYY-MM-DD)
- ENG-XXX-3: Implement automatic rollback on error spike (@platform, by YYYY-MM-DD)
- ENG-XXX-4: Lower monitoring alert threshold (@observability, by YYYY-MM-DD)
- ENG-XXX-5: Add runbook for similar issues (@oncall, by YYYY-MM-DD)
Prevention
How we'll prevent this from happening again:
- Testing: Add test coverage for edge cases
- Monitoring: Improve alerting thresholds
- Process: Update deployment checklist
- Documentation: Create runbook for on-call
Lessons Learned
Key takeaways for the team.
---
Backport Strategy
When a fix needs to go to multiple branches/environments:
Multiple Environment Deployment
```bash
# 1. Fix is already on main (production) via the merged hotfix PR
git checkout main
git pull origin main
git log -1 # note the <hotfix-commit-hash>

# 2. Backport to release candidate (on its own branch, so it can go through a PR)
git checkout -b backport/rc/hotfix-ENG-XXX origin/release-candidate
git cherry-pick -x <hotfix-commit-hash>
git push origin backport/rc/hotfix-ENG-XXX

# 3. Backport to develop
git checkout -b backport/dev/hotfix-ENG-XXX origin/develop
git cherry-pick -x <hotfix-commit-hash>
git push origin backport/dev/hotfix-ENG-XXX

# Create PRs for each backport:
gh pr create --base release-candidate --head backport/rc/hotfix-ENG-XXX
gh pr create --base develop --head backport/dev/hotfix-ENG-XXX
```
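When several branches need the same fix, it may help to generate the command list instead of typing it per branch. The sketch below assumes each backport gets its own `backport/<short-name>/hotfix-<ticket>` branch so the `gh pr create` heads exist; `backportCommands` and the `BackportTarget` shape are hypothetical, and the function only builds strings, it does not run git.

```typescript
interface BackportTarget {
  base: string;      // branch receiving the fix, e.g. "release-candidate"
  shortName: string; // short name used in the backport branch, e.g. "rc"
}

// Returns the shell commands for backporting one commit to one target branch.
function backportCommands(ticket: string, commitHash: string, target: BackportTarget): string[] {
  const branch = `backport/${target.shortName}/hotfix-${ticket}`;
  return [
    "git fetch origin",
    `git checkout -b ${branch} origin/${target.base}`,
    `git cherry-pick -x ${commitHash}`, // -x records the original commit hash in the message
    `git push origin ${branch}`,
    `gh pr create --base ${target.base} --head ${branch}`,
  ];
}
```

For example, `backportCommands("ENG-1234", "abc1234", { base: "develop", shortName: "dev" })` ends with `gh pr create --base develop --head backport/dev/hotfix-ENG-1234`.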
Handling Merge Conflicts
```bash
# If cherry-pick fails due to conflicts
git cherry-pick <hotfix-commit-hash>
# CONFLICT in file.ts

# Resolve conflicts manually
# Then:
git add file.ts
git cherry-pick --continue
```
Alternative: Patch File
```bash
# Create patch from hotfix
git format-patch -1 <hotfix-commit-hash>
# Creates: 0001-fix-auth-bypass.patch

# Apply to other branch
git checkout release-candidate
git apply 0001-fix-auth-bypass.patch
# git apply only changes the working tree; commit afterwards,
# or use git am to keep the original commit message and author
```
---
Summary
Emergency Release Quick Reference
Decision Tree
Is production broken?
├─ Yes → Severity level?
│ ├─ P0 (security/down) → Deploy immediately, inform after
│ ├─ P1 (critical bug) → Fast-track PR, deploy same day
│ └─ P2 (major bug) → Standard expedited process
└─ No → Use normal deployment process
Time Targets
- P0: Issue → Deploy in < 2 hours
- P1: Issue → Deploy in < 4 hours
- P2: Issue → Deploy in < 24 hours
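These targets are easy to check after the fact, for instance when filling in a post-incident review. A minimal sketch, assuming the issue and deploy times are available as `Date` values (`metTimeTarget` is a hypothetical helper):

```typescript
// Issue-to-deploy targets in hours, copied from the list above.
const deployTargetHours = { P0: 2, P1: 4, P2: 24 } as const;

// True when the fix was deployed within the target window for its priority.
function metTimeTarget(
  priority: keyof typeof deployTargetHours,
  issueAt: Date,
  deployedAt: Date,
): boolean {
  const elapsedHours = (deployedAt.getTime() - issueAt.getTime()) / 3_600_000;
  return elapsedHours < deployTargetHours[priority];
}
```

An incident detected at 14:23 UTC and fixed at 15:47 UTC took 1.4 hours, so it meets the P0 target.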
Key Principles
- Minimal Change: Fix only the immediate issue
- Add Regression Test: Prevent recurrence
- Fast Feedback: Deploy to staging first (except P0)
- Clear Communication: Keep stakeholders informed
- Learn & Improve: Conduct PIR for P0/P1
Checklist
- Assess urgency correctly
- Create hotfix branch from production
- Implement minimal fix with regression test
- Use appropriate PR labels
- Test thoroughly (staging for P1/P2)
- Get approval (or waive for P0)
- Deploy with monitoring
- Verify fix in production
- Communicate status
- Schedule PIR (P0/P1)
- Create follow-up tickets
Use this skill when production issues require immediate attention and fast-track deployment outside normal release processes.
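The approval and staging rules in the checklist, including the P0 waivers, can be sketched as a simple gate. The `ReleaseState` shape and `deployBlockers` name are assumptions for illustration; a real gate would read these values from CI and the PR rather than take them as booleans.

```typescript
interface ReleaseState {
  priority: "P0" | "P1" | "P2" | "P3";
  approved: boolean;          // PR approved by at least one reviewer
  ciPassed: boolean;          // all CI checks pass
  verifiedOnStaging: boolean; // deployed to staging and verified
  onCallNotified: boolean;    // on-call engineer notified
}

// Returns the reasons a release is not ready; an empty list means "go".
function deployBlockers(state: ReleaseState): string[] {
  const blockers: string[] = [];
  const isP0 = state.priority === "P0";
  if (!state.ciPassed) blockers.push("CI checks failing");
  // Approval and staging verification may be waived for P0 only.
  if (!state.approved && !isP0) blockers.push("PR approval missing");
  if (!state.verifiedOnStaging && !isP0) blockers.push("not verified on staging");
  if (!state.onCallNotified) blockers.push("on-call engineer not notified");
  return blockers;
}
```

Returning the list of blockers instead of a single boolean makes it clear what still has to happen before the deploy.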