enterprise-agent-ops
Enterprise Agent Ops
Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.
Operational Domains
runtime lifecycle (start, pause, stop, restart)
observability (logs, metrics, traces)
safety controls (scopes, permissions, kill switches)
change management (rollout, rollback, audit)
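As an illustration of the runtime lifecycle and kill-switch domains, here is a minimal sketch of a lifecycle state machine. The state names, the transition table, and the `AgentLifecycle` class are hypothetical and not part of the skill itself; a real deployment would more likely delegate these transitions to its process manager or orchestrator.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Hypothetical policy: (current state, command) -> next state.
TRANSITIONS = {
    (State.STOPPED, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "start"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.PAUSED, "stop"): State.STOPPED,
    (State.RUNNING, "restart"): State.RUNNING,
}


class AgentLifecycle:
    """Tracks lifecycle state and refuses commands the policy does not allow."""

    def __init__(self):
        self.state = State.STOPPED
        self.kill_switch = False

    def apply(self, command):
        # Once the kill switch is engaged, nothing except "stop" is accepted.
        if self.kill_switch and command != "stop":
            raise PermissionError("kill switch engaged; only 'stop' is allowed")
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {command!r} while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state

    def engage_kill_switch(self):
        # Force the agent to STOPPED regardless of its current state.
        self.kill_switch = True
        self.state = State.STOPPED
```

Keeping the allowed transitions in one table makes the policy easy to audit and to change without touching the control flow.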
Baseline Controls
immutable deployment artifacts
least-privilege credentials
environment-level secret injection
hard timeout and retry budgets
audit log for high-risk actions
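The timeout and retry budget control could be sketched as follows. This is one possible shape under stated assumptions: `run_with_budget`, `BudgetExceeded`, and the default limits are hypothetical names and values. The deadline is only checked between attempts, so it does not interrupt a task that is already running; a truly hard timeout needs preemption, for example by running the task in a subprocess that can be killed.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the retry budget or the deadline is used up."""


def run_with_budget(task, max_retries=3, deadline_s=30.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call task() until it succeeds, within a retry budget and a deadline.

    Returns (result, retries_used). The limits are illustrative defaults.
    """
    start = clock()
    attempts = 0
    last_error = None
    while attempts <= max_retries:
        # Deadline check happens before each attempt, not during it.
        if clock() - start >= deadline_s:
            raise BudgetExceeded("deadline reached") from last_error
        attempts += 1
        try:
            return task(), attempts - 1
        except Exception as exc:
            last_error = exc
            if attempts <= max_retries:
                # Capped exponential backoff before the next attempt.
                sleep(min(0.1 * 2 ** (attempts - 1), 1.0))
    raise BudgetExceeded(
        f"retry budget exhausted after {attempts} attempts"
    ) from last_error
```

The `clock` and `sleep` parameters exist so the function can be exercised in tests without real waiting.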
Metrics to Track
success rate
mean retries per task
time to recovery
cost per successful task
failure class distribution
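A rough sketch of how most of these metrics might be computed from per-task records follows. The `TaskRecord` fields and the `summarize` function are hypothetical. It assumes that the cost of failed tasks is charged against successful ones, which is one of several reasonable definitions of cost per successful task. Time to recovery is left out because it depends on incident timestamps rather than on task records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    # Hypothetical record shape; adapt to whatever the agent actually logs.
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None


def summarize(records):
    """Aggregate task records into the tracked metrics."""
    total = len(records)
    if total == 0:
        raise ValueError("no task records to summarize")
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    failures = Counter(
        r.failure_class or "unknown" for r in records if not r.succeeded
    )
    return {
        "success_rate": successes / total,
        "mean_retries_per_task": sum(r.retries for r in records) / total,
        # Assumption: spend on failed tasks counts toward this figure.
        "cost_per_successful_task": total_cost / successes if successes else None,
        "failure_class_distribution": dict(failures),
    }
```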
Incident Pattern
When the failure rate spikes:
freeze new rollout
capture representative traces
isolate failing route
patch with smallest safe change
run regression + security checks
resume gradually
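The freeze and gradual-resume steps above can be sketched as a small gate that decides how much traffic a new rollout may receive. The `RolloutGate` class, the 20% spike threshold, and the resume steps are all illustrative assumptions. Trace capture, route isolation, and the patch itself happen outside this sketch.

```python
class RolloutGate:
    """Freezes a rollout on a failure spike and ramps traffic back in steps."""

    def __init__(self, spike_threshold=0.2,
                 resume_steps=(0.05, 0.25, 0.5, 1.0)):
        self.spike_threshold = spike_threshold
        self.resume_steps = resume_steps
        self.frozen = False
        # Start at full traffic (the last step).
        self._step = len(resume_steps) - 1

    @property
    def traffic_fraction(self):
        return 0.0 if self.frozen else self.resume_steps[self._step]

    def observe(self, failure_rate):
        """Feed one failure-rate sample; returns the allowed traffic fraction."""
        if failure_rate >= self.spike_threshold:
            # Step 1 of the pattern: freeze the rollout.
            self.frozen = True
        elif not self.frozen and self._step < len(self.resume_steps) - 1:
            # A healthy sample while ramping: move to the next step.
            self._step += 1
        return self.traffic_fraction

    def resume(self, checks_passed):
        """Unfreeze only after regression and security checks have passed."""
        if not checks_passed:
            raise RuntimeError("refusing to resume: checks have not passed")
        self.frozen = False
        self._step = 0
        return self.traffic_fraction
```

A frozen gate stays frozen on healthy samples until `resume` is called explicitly, so recovery is always a deliberate operator action.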
Deployment Integrations
This skill pairs with:
PM2 workflows
systemd services
container orchestrators
CI/CD gates
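For the CI/CD gate case, one possible sketch is a check that turns metric thresholds into an exit code the pipeline can act on. The function names, metric keys, and thresholds here are hypothetical, and how the metrics reach the script (file, API, artifact) is left to the pipeline.

```python
import sys


def gate_violations(metrics, min_success_rate=0.95, max_mean_retries=1.5):
    """Return a list of human-readable threshold violations (empty if none)."""
    violations = []
    if metrics["success_rate"] < min_success_rate:
        violations.append(
            f"success_rate {metrics['success_rate']:.2f} < {min_success_rate}"
        )
    if metrics["mean_retries_per_task"] > max_mean_retries:
        violations.append(
            f"mean_retries_per_task {metrics['mean_retries_per_task']:.2f}"
            f" > {max_mean_retries}"
        )
    return violations


def main(metrics):
    """Print violations to stderr and return a process exit code."""
    violations = gate_violations(metrics)
    for v in violations:
        print(f"GATE FAIL: {v}", file=sys.stderr)
    return 1 if violations else 0
```

A pipeline step would call `sys.exit(main(metrics))`, so a non-zero exit code blocks the rollout.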