hosted-agents
Hosted Agent Infrastructure
Hosted agents run in remote sandboxed environments rather than on local machines. When designed well, they provide unlimited concurrency, consistent execution environments, and multiplayer collaboration. The critical insight is that session speed should be limited only by model provider time-to-first-token, with all infrastructure setup completed before the user starts their session.
When to Activate
Activate this skill when:
- Building background coding agents that run independently of user devices
- Designing sandboxed execution environments for agent workloads
- Implementing multiplayer agent sessions with shared state
- Creating multi-client agent interfaces (Slack, Web, Chrome extensions)
- Scaling agent infrastructure beyond local machine constraints
- Building systems where agents spawn sub-agents for parallel work
Core Concepts
Move agent execution to remote sandboxed environments to eliminate the fundamental limits of local execution: resource contention, environment inconsistency, and single-user constraints. Remote sandboxes unlock unlimited concurrency, reproducible environments, and collaborative workflows because each session gets its own isolated compute with a known-good environment image.
Design the architecture in three layers because each layer scales independently. Build sandbox infrastructure for isolated execution, an API layer for state management and client coordination, and client interfaces for user interaction across platforms. Keep these layers cleanly separated so sandbox changes do not ripple into clients.
Detailed Topics
Sandbox Infrastructure
The Core Challenge
Eliminate sandbox spin-up latency because users perceive anything over a few seconds as broken. Development environments require cloning repositories, installing dependencies, and running build steps -- do all of this before the user ever submits a prompt.
Image Registry Pattern
Pre-build environment images on a regular cadence (every 30 minutes works well) because this makes synchronization with the latest code a fast delta rather than a full clone. Include in each image:
- Cloned repository at a known commit
- All runtime dependencies installed
- Initial setup and build commands completed
- Cached files from running app and test suite once
When starting a session, spin up a sandbox from the most recent image. The repository is at most 30 minutes out of date, making the remaining git sync fast.
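The image selection step can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Image` record and an in-memory list standing in for a real image registry; only the 30-minute cadence comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

REBUILD_CADENCE = timedelta(minutes=30)  # cadence suggested in the text


@dataclass(frozen=True)
class Image:
    image_id: str       # hypothetical registry identifier
    commit: str         # commit the repository was cloned at
    built_at: datetime


def pick_latest_image(images: list[Image], now: datetime) -> tuple[Image, bool]:
    """Return the newest image and whether it is older than the rebuild cadence."""
    if not images:
        raise LookupError("no pre-built images available")
    latest = max(images, key=lambda img: img.built_at)
    is_stale = now - latest.built_at > REBUILD_CADENCE
    return latest, is_stale


now = datetime(2026, 1, 12, 12, 0, tzinfo=timezone.utc)
images = [
    Image("img-a", "abc123", now - timedelta(minutes=50)),
    Image("img-b", "def456", now - timedelta(minutes=20)),
]
image, stale = pick_latest_image(images, now)
print(image.image_id, stale)  # img-b False
```

A stale result is a useful alerting signal: it suggests the build pipeline has missed at least one cycle.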
Snapshot and Restore
Take filesystem snapshots at key points to enable instant restoration for follow-up prompts without re-running setup:
- After initial image build (base snapshot)
- When agent finishes making changes (session snapshot)
- Before sandbox exit for potential follow-up
Git Configuration for Background Agents
Configure git identity explicitly in every sandbox because background agents are not tied to a specific user during image builds:
- Generate GitHub app installation tokens for repository access during clone
- Set git config user.name and user.email when committing and pushing changes
- Use the prompting user's identity for commits, not the app identity
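One way to apply the identity step is to build the `git config` commands from the prompting user's details at session start. The helper name and the example identity below are hypothetical; the sketch only constructs the argument lists and leaves execution to the sandbox.

```python
def git_identity_commands(user_name: str, user_email: str) -> list[list[str]]:
    """Build the git commands that set the commit identity for this sandbox."""
    if not user_name or not user_email:
        raise ValueError("the prompting user's name and email are both required")
    return [
        ["git", "config", "user.name", user_name],
        ["git", "config", "user.email", user_email],
    ]


commands = git_identity_commands("Ada Lovelace", "ada@example.com")
for argv in commands:
    print(" ".join(argv))
    # In a real sandbox: subprocess.run(argv, cwd=repo_dir, check=True)
```

Failing fast on an empty name or email surfaces a missing identity at session start rather than at commit time.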
Warm Pool Strategy
Maintain a pool of pre-warmed sandboxes for high-volume repositories because cold starts are the primary source of user frustration:
- Keep sandboxes ready before users start sessions
- Expire and recreate pool entries as new image builds complete
- Start warming a sandbox as soon as a user begins typing (predictive warm-up)
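A warm pool along these lines can be sketched in a few lines. The `WarmPool` and `Sandbox` names are assumptions for illustration, and `_boot` is a placeholder for whatever call the sandbox provider exposes.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Sandbox:
    sandbox_id: int
    image_id: str


class WarmPool:
    def __init__(self, target_size: int, image_id: str) -> None:
        self.target_size = target_size
        self.image_id = image_id
        self._ids = count(1)
        self._ready: deque = deque()
        self._refill()

    def _boot(self) -> Sandbox:
        # Placeholder for the provider call that boots a sandbox from an image.
        return Sandbox(next(self._ids), self.image_id)

    def _refill(self) -> None:
        while len(self._ready) < self.target_size:
            self._ready.append(self._boot())

    def acquire(self) -> Sandbox:
        """Hand out a warm sandbox (cold boot if empty), then top the pool up."""
        sandbox = self._ready.popleft() if self._ready else self._boot()
        self._refill()
        return sandbox

    def on_new_image(self, image_id: str) -> None:
        """Expire entries built from the old image and recreate them."""
        self.image_id = image_id
        self._ready.clear()
        self._refill()


pool = WarmPool(target_size=2, image_id="img-a")
first = pool.acquire()
pool.on_new_image("img-b")
second = pool.acquire()
print(first.image_id, second.image_id)  # img-a img-b
```

In production the refill would run in the background and `target_size` would follow traffic patterns; the synchronous version here only shows the lifecycle.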
Agent Framework Selection
Server-First Architecture
Structure the agent framework as a server first, with TUI and desktop apps as thin clients, because this prevents duplicating agent logic across surfaces:
- Multiple custom clients share one agent backend
- Consistent behavior across all interaction surfaces
- Plugin systems extend functionality without client changes
- Event-driven architectures deliver real-time updates to any connected client
Code as Source of Truth
Select frameworks where the agent can read its own source code to understand behavior. Prioritize this because having code as source of truth prevents the agent from hallucinating about its own capabilities -- an underrated failure mode in AI development.
Plugin System Requirements
Require a plugin system that supports runtime interception because this enables safety controls and observability without modifying core agent logic:
- Listen to tool execution events (e.g., tool.execute.before)
- Block or modify tool calls conditionally
- Inject context or state at runtime
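Runtime interception of this kind might look like the sketch below. Only the event name tool.execute.before comes from the text; `PluginBus`, the hook signature, and the two example hooks are hypothetical and are not the API of any particular framework.

```python
from typing import Callable, Optional

ToolCall = dict  # e.g. {"tool": "bash", "args": {"command": "ls"}}
# A hook returns a (possibly modified) call, or None to block it.
Hook = Callable[[ToolCall], Optional[ToolCall]]


class PluginBus:
    def __init__(self) -> None:
        self._hooks: dict[str, list[Hook]] = {}

    def on(self, event: str, hook: Hook) -> None:
        self._hooks.setdefault(event, []).append(hook)

    def run_before(self, call: ToolCall) -> Optional[ToolCall]:
        """Pass the call through every 'tool.execute.before' hook in order."""
        current: Optional[ToolCall] = call
        for hook in self._hooks.get("tool.execute.before", []):
            current = hook(current)
            if current is None:
                return None  # a plugin blocked the call
        return current


def block_force_push(call: ToolCall) -> Optional[ToolCall]:
    command = call["args"].get("command", "")
    blocked = call["tool"] == "bash" and "push --force" in command
    return None if blocked else call


def inject_session(call: ToolCall) -> Optional[ToolCall]:
    return {**call, "context": {"session_id": "s-1"}}


bus = PluginBus()
bus.on("tool.execute.before", block_force_push)
bus.on("tool.execute.before", inject_session)

allowed = bus.run_before({"tool": "bash", "args": {"command": "ls"}})
denied = bus.run_before({"tool": "bash", "args": {"command": "git push --force"}})
print(allowed["context"], denied)  # {'session_id': 's-1'} None
```

Because hooks run in registration order, safety checks should be registered before hooks that enrich the call.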
Speed Optimizations
Predictive Warm-Up
Start warming the sandbox as soon as a user begins typing their prompt, not when they submit it, because the typing interval (5-30 seconds) is enough to complete most setup:
- Clone latest changes in parallel with user typing
- Run initial setup before user hits enter
- With fast spin-up, the sandbox can be ready before the user finishes typing
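The keystroke trigger can be sketched with asyncio. `PredictiveWarmer` and its method names are assumptions, and the short sleep stands in for the real git sync and setup work.

```python
import asyncio
from typing import Optional


class PredictiveWarmer:
    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None
        self.warmups_started = 0

    async def _warm(self) -> str:
        self.warmups_started += 1
        await asyncio.sleep(0.01)  # stands in for git sync and setup work
        return "sandbox-ready"

    def on_keystroke(self) -> None:
        """Start warming on the first keystroke; later keystrokes are no-ops."""
        if self._task is None:
            self._task = asyncio.create_task(self._warm())

    async def on_submit(self) -> str:
        """Wait for the warm-up, starting it if the prompt arrived with no typing."""
        self.on_keystroke()
        return await self._task


async def demo() -> tuple:
    warmer = PredictiveWarmer()
    for _ in range(5):           # five keystrokes while the user types
        warmer.on_keystroke()
    await asyncio.sleep(0.02)    # user keeps typing; warm-up finishes meanwhile
    return await warmer.on_submit(), warmer.warmups_started


result = asyncio.run(demo())
print(result)  # ('sandbox-ready', 1)
```

The single-task guard matters: without it, every keystroke would start another warm-up.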
Parallel File Reading
Allow the agent to start reading files immediately even if sync from latest base branch is not complete, because in large repositories incoming prompts rarely touch recently-changed files:
- Agent can research immediately without waiting for git sync
- Block file edits (not reads) until synchronization completes
- This separation is safe because data that is up to 30 minutes stale rarely matters for research
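A read/write gate along these lines is one way to enforce the split. `SyncGate` is a hypothetical name, and the in-memory dict stands in for the sandbox filesystem.

```python
import threading
from typing import Optional


class SyncGate:
    """Reads pass through immediately; writes wait for git sync to finish."""

    def __init__(self) -> None:
        self._synced = threading.Event()

    def mark_synced(self) -> None:
        self._synced.set()

    def read_file(self, files: dict, path: str) -> str:
        return files[path]  # may be slightly stale until sync completes

    def write_file(self, files: dict, path: str, text: str,
                   timeout: Optional[float] = None) -> bool:
        """Block until sync completes; return False if the timeout expires."""
        if not self._synced.wait(timeout):
            return False
        files[path] = text
        return True


files = {"README.md": "old"}
gate = SyncGate()
early_read = gate.read_file(files, "README.md")
early_write = gate.write_file(files, "README.md", "new", timeout=0.01)
gate.mark_synced()
late_write = gate.write_file(files, "README.md", "new")
print(early_read, early_write, late_write)  # old False True
```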
Maximize Build-Time Work
Move everything possible to the image build step because build-time duration is invisible to users:
- Full dependency installation
- Database schema setup
- Initial app and test suite runs (populates caches)
Self-Spawning Agents
Agent-Spawned Sessions
Build tools that allow agents to spawn new sessions because frontier models are capable of decomposing work and coordinating sub-tasks:
- Research tasks across different repositories
- Parallel subtask execution for large changes
- Multiple smaller PRs from one major task
Expose three primitives: start a new session with specified parameters, read status of any session (check-in capability), and continue main work while sub-sessions run in parallel.
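The three primitives can be sketched with a thread pool. `SessionManager`, its method names, and the `_run` placeholder are assumptions; a real implementation would start each sub-session in its own sandbox.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


class SessionManager:
    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._sessions: dict[str, Future] = {}

    def start_session(self, repo: str, prompt: str) -> str:
        """Primitive 1: start a sub-session and return immediately with its ID."""
        session_id = uuid.uuid4().hex[:8]
        self._sessions[session_id] = self._pool.submit(self._run, repo, prompt)
        return session_id

    def read_status(self, session_id: str) -> str:
        """Primitive 2: non-blocking check-in on any session."""
        future = self._sessions[session_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "done"

    def result(self, session_id: str) -> str:
        return self._sessions[session_id].result()

    @staticmethod
    def _run(repo: str, prompt: str) -> str:
        # Placeholder for a real agent run inside its own sandbox.
        return f"[{repo}] finished: {prompt}"


manager = SessionManager()
ids = [manager.start_session(repo, "audit logging calls") for repo in ("api", "web")]
# Primitive 3: the parent keeps working here while sub-sessions run.
summaries = [manager.result(sid) for sid in ids]
statuses = [manager.read_status(sid) for sid in ids]
print(summaries, statuses)
```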
Prompt Engineering for Self-Spawning
Engineer prompts that guide when agents should spawn sub-sessions rather than doing work inline:
- Research tasks that require cross-repository exploration
- Breaking monolithic changes into smaller PRs
- Parallel exploration of different approaches
API Layer
Per-Session State Isolation
Isolate state per session (SQLite per session works well) because cross-session interference is a subtle and hard-to-debug failure mode:
- Dedicated database per session
- No session can impact another's performance
- Architecture handles hundreds of concurrent sessions
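Per-session isolation with SQLite can be as simple as one database file per session. The table layout and the `open_session_db` helper are assumptions for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_session_db(root: Path, session_id: str) -> sqlite3.Connection:
    """Open (or create) the database file that belongs to one session only."""
    conn = sqlite3.connect(root / f"{session_id}.sqlite3")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "kind TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a = open_session_db(root, "session-a")
    b = open_session_db(root, "session-b")
    with a:  # commits on success, rolls back on error
        a.execute(
            "INSERT INTO events (kind, body) VALUES (?, ?)",
            ("prompt", "fix the login bug"),
        )
    count_a = a.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    count_b = b.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    a.close()
    b.close()

print(count_a, count_b)  # 1 0
```

Because each session owns its file, a slow query or lock in one session cannot stall another.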
Real-Time Streaming
Stream all agent work in real-time because high-frequency feedback is critical for user trust:
- Token streaming from model providers
- Tool execution status updates
- File change notifications
Use WebSocket connections with hibernation APIs to reduce compute costs during idle periods while maintaining open connections.
Synchronization Across Clients
Build a single state system that synchronizes across all clients (chat interfaces, Slack bots, Chrome extensions, web interfaces, VS Code instances) because users switch surfaces frequently and expect continuity. All changes sync to the session state, enabling seamless client switching.
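One common way to get this continuity is an append-only event log with sequence numbers, so a client that connects late can replay what it missed. The sketch below assumes that design; `SessionState` and the event fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    events: list = field(default_factory=list)

    def append(self, kind: str, payload: str, source: str) -> int:
        """Record a change from any client and return its sequence number."""
        seq = len(self.events) + 1
        self.events.append(
            {"seq": seq, "kind": kind, "payload": payload, "source": source}
        )
        return seq

    def since(self, last_seen: int) -> list:
        """Everything a client missed after the given sequence number."""
        return self.events[last_seen:]


state = SessionState()
state.append("prompt", "add dark mode", source="slack")
state.append("tool_status", "running tests", source="sandbox")
state.append("file_change", "src/theme.css", source="sandbox")

web_cursor = 1  # the web client last saw event 1 before the user switched to it
missed = state.since(web_cursor)
print([e["kind"] for e in missed])  # ['tool_status', 'file_change']
```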
Multiplayer Support
Why Multiplayer Matters
Design for multiplayer from day one because it is nearly free to add with proper synchronization architecture, and it unlocks high-value workflows:
- Teaching non-engineers to use AI effectively
- Live QA sessions with multiple team members
- Real-time PR review with immediate changes
- Collaborative debugging sessions
Implementation Requirements
Build the data model so sessions are not tied to single authors because multiplayer fails silently if authorship is hardcoded:
- Pass authorship info to each prompt
- Attribute code changes to the prompting user
- Share session links for instant collaboration
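A data model that keeps authorship on the prompt rather than on the session might look like this. All class and field names are assumptions, as are the example users.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Author:
    name: str
    email: str


@dataclass
class Prompt:
    text: str
    author: Author  # authorship travels with each prompt


@dataclass
class Session:
    session_id: str
    prompts: list = field(default_factory=list)  # note: no single owner field

    def add_prompt(self, text: str, author: Author) -> Prompt:
        prompt = Prompt(text, author)
        self.prompts.append(prompt)
        return prompt

    def commit_author(self) -> str:
        """Attribute the next commit to whoever sent the latest prompt."""
        author = self.prompts[-1].author
        return f"{author.name} <{author.email}>"


session = Session("s-1")
session.add_prompt("add a retry to the upload call", Author("Ada", "ada@example.com"))
first_author = session.commit_author()
session.add_prompt("also log the failure", Author("Grace", "grace@example.com"))
second_author = session.commit_author()
print(first_author, "|", second_author)
# Ada <ada@example.com> | Grace <grace@example.com>
```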
Authentication and Authorization
User-Based Commits
Use GitHub authentication to open PRs on behalf of the user (not the app). This preserves the audit trail, and because GitHub does not let a pull request author approve their own PR, it prevents users from approving their own AI-generated changes:
- Obtain user tokens for PR creation
- PRs appear as authored by the human, not the bot
Sandbox-to-API Flow
Follow this sequence because it keeps sandbox permissions minimal while letting the API handle sensitive operations:
1. Sandbox pushes changes (updating git user config)
2. Sandbox sends event to API with branch name and session ID
3. API uses user's GitHub token to create PR
4. GitHub webhooks notify API of PR events
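The API side of this flow can be sketched as a handler that receives only the branch name and session ID, then looks up the user's token itself. The handler name, the event shape, and `fake_create_pr` (a stand-in for the real GitHub call) are all hypothetical.

```python
from typing import Callable


def handle_branch_pushed(
    event: dict,
    session_tokens: dict,
    create_pr: Callable[[str, str], str],
) -> str:
    """API-side handler: the sandbox sends only branch and session ID."""
    token = session_tokens[event["session_id"]]  # token never enters the sandbox
    return create_pr(token, event["branch"])


calls = []


def fake_create_pr(token: str, branch: str) -> str:
    # Stand-in for the real GitHub API call made with the user's token.
    calls.append((token, branch))
    return f"pr-for-{branch}"


# The sandbox has already pushed the branch; this is the event it sends.
event = {"session_id": "s-1", "branch": "agent/fix-login"}
pr = handle_branch_pushed(event, {"s-1": "user-token-placeholder"}, fake_create_pr)
print(pr, "token" in event)  # pr-for-agent/fix-login False
```

Keeping the token lookup on the API side means a compromised sandbox cannot open PRs as the user.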
Client Implementations
Slack Integration
Prioritize Slack as the first distribution channel for internal adoption because it creates a virality loop as team members see others using it:
- No syntax required, natural chat interface
- Build a classifier (fast model with repo descriptions) to determine which repository to work in
- Include hints for common repositories; allow "unknown" for ambiguous cases
Web Interface
Build a web interface with these features because it serves as the primary power-user surface:
- Real-time streaming of agent work on desktop and mobile
- Hosted VS Code instance running inside sandbox
- Streamed desktop view for visual verification
- Before/after screenshots for PRs
- Statistics page: sessions resulting in merged PRs (primary metric), usage over time, live "humans prompting" count
Chrome Extension
Build a Chrome extension for non-engineering users because DOM and React internals extraction gives higher precision than raw screenshots at lower token cost:
- Sidebar chat interface with screenshot tool
- Extract DOM/React internals instead of raw images
- Distribute via managed device policy (bypasses Chrome Web Store)
Practical Guidance
Follow-Up Message Handling
Choose between queueing and inserting follow-up messages sent during execution. Prefer queueing because it is simpler to manage and lets users send thoughts on next steps while the agent works. Build a mechanism to stop the agent mid-execution when needed, because without it users feel trapped.
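Queueing plus a stop control can be sketched as a loop that takes one queued message per step and checks a stop flag between steps. `AgentLoop` and its methods are hypothetical; a real agent would do model and tool work where this sketch only records the message.

```python
from collections import deque


class AgentLoop:
    def __init__(self) -> None:
        self.pending: deque = deque()
        self.stop_requested = False
        self.handled: list = []

    def send(self, message: str) -> None:
        """Follow-ups are queued, never injected into the running step."""
        self.pending.append(message)

    def stop(self) -> None:
        self.stop_requested = True

    def run(self, max_steps: int = 10) -> None:
        for _ in range(max_steps):
            if self.stop_requested or not self.pending:
                return
            self.handled.append(self.pending.popleft())  # one message per step


loop = AgentLoop()
loop.send("fix the failing test")
loop.send("then update the changelog")
loop.run(max_steps=1)   # agent works on the first message
loop.stop()             # user hits stop before the follow-up is picked up
loop.run()
print(loop.handled, list(loop.pending))
# ['fix the failing test'] ['then update the changelog']
```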
Metrics That Matter
Track these metrics because they indicate real value rather than vanity usage:
- Sessions resulting in merged PRs (primary success metric)
- Time from session start to first model response
- PR approval rate and revision count
- Agent-written code percentage across repositories
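The first two metrics can be computed from per-session records as sketched below. The record fields and the sample values are invented for illustration and are not measurements.

```python
from statistics import median


def merged_pr_rate(sessions: list) -> float:
    """Share of sessions that ended in a merged PR (the primary metric)."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["pr_merged"]) / len(sessions)


def median_first_response(sessions: list) -> float:
    """Median seconds from session start to first model response."""
    return median(s["first_response_s"] for s in sessions)


# Illustrative records only.
sessions = [
    {"id": "s-1", "pr_merged": True, "first_response_s": 1.0},
    {"id": "s-2", "pr_merged": False, "first_response_s": 1.5},
    {"id": "s-3", "pr_merged": True, "first_response_s": 2.0},
    {"id": "s-4", "pr_merged": False, "first_response_s": 3.0},
]
rate = merged_pr_rate(sessions)
latency = median_first_response(sessions)
print(rate, latency)  # 0.5 1.75
```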
Adoption Strategy
Drive internal adoption through visibility rather than mandates because forced usage breeds resentment:
- Work in public spaces (Slack channels) for visibility
- Let the product create virality loops
- Do not force usage over existing tools
- Build to people's needs, not hypothetical requirements
Guidelines
- Pre-build environment images on regular cadence (30 minutes is a good default)
- Start warming sandboxes when users begin typing, not when they submit
- Allow file reads before git sync completes; block only writes
- Structure agent framework as server-first with clients as thin wrappers
- Isolate state per session to prevent cross-session interference
- Attribute commits to the user who prompted, not the app
- Track merged PRs as primary success metric
- Build for multiplayer from the start; it is nearly free with proper sync architecture
Gotchas
- Cold start latency: First sandbox spin-up takes 30-60s and users perceive this as broken. Use warm pools and predictive warm-up on keystroke to eliminate perceived wait time.
- Image staleness: Infrequent image rebuilds mean agents run with outdated dependencies or code. Set a 30-minute rebuild cadence and monitor image age; alert if builds fail silently.
- Sandbox cost runaway: Long-running agents without timeout or budget caps accumulate unexpected costs. Set hard timeout limits (default 4 hours) and per-session cost ceilings.
- Auth token expiration mid-session: Long tasks fail when GitHub tokens expire partway through. Implement token refresh logic and check token validity before sensitive operations like PR creation.
- Git config in sandboxes: Missing user.name or user.email causes commit failures in background agents. Always set git identity explicitly during sandbox configuration, never assume it carries over from the image.
- State loss on sandbox recycle: Agents lose completed work if the sandbox is recycled or times out before results are extracted. Always snapshot before termination and extract artifacts (branches, PRs, files) before letting the sandbox die.
- Oversubscribing warm pools: Maintaining too many warm sandboxes wastes money during low-traffic periods. Scale pool size based on traffic patterns and time-of-day; use autoscaling rather than fixed pool sizes.
- Missing output extraction: Agents complete work inside the sandbox but results never get pulled out to the user. Build explicit extraction steps (push branch, create PR, return file contents) into the session teardown flow.
Integration
This skill builds on multi-agent-patterns for agent coordination and tool-design for agent-tool interfaces. It connects to:
- multi-agent-patterns - Self-spawning agents follow supervisor patterns
- tool-design - Building tools for agent spawning and status checking
- context-optimization - Managing context across distributed sessions
- filesystem-context - Using filesystem for session state and artifacts
References
Internal reference:
- Infrastructure Patterns - Read when: implementing sandbox lifecycle, image builds, or warm pool logic for the first time
Related skills in this collection:
- multi-agent-patterns - Read when: designing self-spawning or supervisor coordination patterns
- tool-design - Read when: building tools for agent session management or status checking
- context-optimization - Read when: context windows fill up across distributed agent sessions
External resources:
- Ramp - Read when: evaluating whether to build vs. buy background agent infrastructure
- Modal Sandboxes - Read when: choosing a cloud sandbox provider or comparing isolation models
- Cloudflare Durable Objects - Read when: designing per-session state management with WebSocket hibernation
- OpenCode - Read when: selecting a server-first agent framework or studying plugin architectures
Skill Metadata
Created: 2026-01-12
Last Updated: 2026-03-17
Author: Agent Skills for Context Engineering Contributors
Version: 1.1.0