AgentDeploy Deploy

Use this skill when the user wants an application deployed onto AgentDeploy, or when an existing AgentDeploy deployment needs to be updated or debugged.

What this skill covers

  • infer the right split between `SharedInfra` and `Service`
  • choose the correct workload type and minimum infrastructure
  • validate and dry-run before changing live state
  • deploy with `agentdeploy` through the Platform API when available, then poll structured status
  • debug policy, auth, infrastructure, and rollout failures

Read references/service-contract.md when writing or editing `SharedInfra` or `Service`. Read references/operations.md when running the CLI or handling failures.

Use the templates in assets/ as the starting point:
  • assets/shared-infra.yaml
  • assets/service-web.yaml
  • assets/service-api.yaml
  • assets/service-worker.yaml
  • assets/service-cron.yaml
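
As a shape reference, a minimal two-record split might look like the sketch below. Only the dotted field paths named in this skill (`metadata.application`, `spec.infrastructure.databases.<name>`, `valueFrom.infrastructureRef`) are known; the surrounding YAML layout, the `kind` values, and every name are illustrative. Always start from the assets/ templates and confirm the real schema in references/service-contract.md.

```yaml
# Illustrative shape only; start from assets/shared-infra.yaml and a service
# template, and treat references/service-contract.md as the schema of record.
kind: SharedInfra
metadata:
  name: expense            # hypothetical application name
spec:
  infrastructure:
    databases:
      main: {}             # declared under spec.infrastructure.databases.<name>
---
kind: Service
metadata:
  name: expense-api
  application: expense     # metadata.application ties the service to the SharedInfra
spec:
  env:                     # env wiring layout is assumed, not schema-accurate
    - name: DATABASE_URL
      valueFrom:
        infrastructureRef:
          name: main
```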

Prerequisite: CLI availability

Before using this skill, make sure `agentdeploy` is installed and on `PATH`.
Current supported user install path:

```bash
command -v brew
```

If `brew` is missing, stop and ask the user to install Homebrew themselves from [brew.sh](https://brew.sh/).


Continue only after `brew` is available on `PATH`.


```bash
command -v gh
```

If `gh` is missing:

```bash
brew install gh
gh auth login
gh auth setup-git
gh auth status
brew tap elementx-ai/tap https://github.com/elementx-ai/homebrew-tap
brew install --HEAD elementx-ai/tap/agentdeploy
```

Or, if it is already installed:

```bash
brew upgrade --fetch-HEAD elementx-ai/tap/agentdeploy
```

This is the current private macOS install path. If `brew` is missing, direct the user to [brew.sh](https://brew.sh/) and wait for them to finish that install themselves before continuing. If `gh`, GitHub auth, or `agentdeploy` is still unavailable after that, stop and report the install blocker before attempting deploy commands.

Before debugging any feature mismatch between docs and the installed CLI, run:

```bash
agentdeploy version
```

The current CLI already carries the prototype Platform API URL and Entra scope by default. On the first API-backed command it will run its own Entra device-code login flow and cache the session token until it expires. You only need to set API flags or environment variables when overriding the installation defaults.

Treat `AGENTDEPLOY_CONFIG_REPO_REMOTE` as an explicit fallback only for intentional direct GitOps mode without the Platform API.


Workflow

工作流程

  1. Confirm the CLI is available, then inspect the app before writing contracts.
    • Run `command -v agentdeploy` before using the workflow.
    • Look for whether it serves HTTP, runs background work, or is purely scheduled.
    • Check whether it already expects `PORT`, `DATABASE_URL`, `REDIS_URL`, or migration commands.
    • Prefer reusing an existing immutable image digest. Use a `build:` block only when the user wants AgentDeploy to build from source.
  2. Choose the application shape.
    • If the app has exactly one service, no shared state, no `valueFrom.infrastructureRef`, and no `valueFrom.serviceRef`, a single `Service` is enough.
    • In that standalone shape, AgentDeploy can bootstrap `Namespace`, `ResourceQuota`, and `LimitRange` directly from the `Service` record.
    • If multiple services need to share a namespace, PostgreSQL, Redis, object storage, or service-to-service wiring, create a `SharedInfra` record first and then point each service at it through `metadata.application`.
    • Treat `SharedInfra` as the only place that owns built-in DB, Redis, or object-storage resources. Do not put `spec.infrastructure` on a `Service`; that model is gone.
  3. Choose the workload shape for each service.
    • `web`: browser-facing app with HTTP ingress.
    • `api`: browser-consumed backend with HTTP ingress.
    • `worker`: no ingress, background process.
    • `cron`: scheduled job, no ingress.
  4. Create or update the contracts.
    • Start from assets/shared-infra.yaml when the app needs shared DB/Redis or multiple services.
    • Start each service from the closest service template in assets/.
    • Keep the app name DNS-safe.
    • If the repo does not already declare deploy metadata, derive a unique app name and subdomain from the repo or directory name so the deployment does not collide with an existing app.
    • If `owner` is missing from repo context, prefer a real maintainer email from git config, docs, or project context. Only invent a synthetic owner for an explicit smoke test.
    • If `team` is missing from repo context, prefer an obvious team name from the repo, parent directory, or surrounding project docs. If none exists, choose a clearly temporary team name for a smoke test and call out the assumption.
    • Default to `dataClassification: internal` unless the user explicitly says the data is more sensitive.
    • Keep `metadata.application` explicit on services when more than one service shares the same app.
    • For a tiny standalone app, letting `metadata.application` default to `metadata.name` is fine.
    • Default to `visibility: internal` unless the user explicitly needs public reachability.
    • Default to `authorization.mode: group-based` unless the user explicitly wants `org-wide` and policy allows it.
    • Prefer dedicated Entra security groups for app access. Use broader team-wide groups only when the whole team should be able to use the app.
    • For smoke tests or lab installs without real group IDs, `org-wide` is acceptable only for `internal` apps and only when the installation policy allows it. Do not use it for sensitive data.
    • If the user explicitly wants a fully public app with no shared auth at all, warn them first that the app will be reachable anonymously on the public internet and will not receive any shared identity headers.
    • For that mode, use `spec.dataClassification: public`, `spec.access.auth: none`, and `spec.access.authorization.mode: none`.
    • In the current policy set, unauthenticated app access is allowed only for `dataClassification: public`. Call out that the app will not receive `X-Auth-Request-*` identity headers in that mode.
    • If the app needs PostgreSQL, declare it on `SharedInfra` under `spec.infrastructure.databases.<name>` and wire one of the DB env variants from `valueFrom.infrastructureRef`:
      • `DATABASE_URL` or `DATABASE_URL_SYNC` for libpq / sync clients
      • `DATABASE_URL_ASYNC` for common async Python stacks
    • If the app needs Redis, declare it on `SharedInfra` under `spec.infrastructure.redis.<name>` and wire `REDIS_URL` or `REDIS_URL_TLS` from `valueFrom.infrastructureRef`.
    • If the app needs shared upload or document handoff storage across services, declare it on `SharedInfra` under `spec.infrastructure.objects.<name>` and wire it from `valueFrom.infrastructureRef` with `kind: objectStorage`.
    • For split API/worker document flows, prefer object storage plus object keys over local-path handoff. Keep local disk only for scratch via `runtime.filesystem.writablePaths`.
    • If one service needs to call another service in the same application, wire that URL with `valueFrom.serviceRef` instead of hardcoding domains or patching manifests.
    • In the current prototype, Redis is only supported for `dataClassification: internal`.
    • Make sure the process actually listens on `runtime.port`. If the app expects a `PORT` env var, set it explicitly.
    • `runtime.command` and `runtime.args` are supported. Use them when the workload needs an explicit startup command instead of baking a wrapper image only for process launch.
    • If the app needs writable ephemeral directories, use `runtime.filesystem.writablePaths` instead of asking users to patch manifests by hand.
    • Only set `runtime.filesystem.readOnlyRootFilesystem: false` when the app genuinely cannot work with explicit writable mounts. Treat that as a security tradeoff and call it out.
  5. Validate before deploying.
    • Prefer the API-owned path when the installation provides it. The current CLI already defaults to the prototype API URL and Entra scope, so only set flags or environment variables when you need to override those defaults.
    • Validate `SharedInfra` first when present, then validate each dependent `Service`.
    • Run `agentdeploy validate --file <contract>.yaml`.
    • If you intentionally want offline local-engine validation instead of the hosted API path, pass `--api-url=`.
    • Inspect `effective_service` or `effective_infra`, plus `manifest_files` and `warnings`, in the response. They are the fastest way to catch dropped or mismatched fields before a deploy.
    • For a standalone service-only app, `manifest_files` should include `namespace.yaml`, `resourcequota.yaml`, and `limitrange.yaml`. If those files are absent, the app is no longer on the standalone bootstrap path.
    • Treat `QUOTA_*` errors as pre-flight failures against the current rendered namespace limits, not as generic rollout failures.
    • Treat capacity-related warnings as best-effort scheduler signals. They do not block deploy by themselves, but they mean the cluster may be too full to place the new pods.
    • Fix errors by following the exact `field`, `allowed_values`, and `suggested_value` in the JSON response.
    • Then run `agentdeploy deploy --file <contract>.yaml --dry-run`.
    • In the current prototype, `deploy --dry-run` can still return `status: accepted` and an `operation_id`. Treat it as preview-only. Review `preview_only`, `effective_service` or `effective_infra`, `manifest_files`, and `warnings` rather than assuming a live deployment started.
  6. Deploy for real.
    • Deploy `SharedInfra` first when present, then deploy each dependent `Service`.
    • Run `agentdeploy deploy --file <contract>.yaml`.
    • Capture `operation_id`, the reported record name, and the initial phase.
    • Do not assume `git_commit` or revision are returned immediately. Live deploys are queued and executed asynchronously.
    • If local CLI mode returns `DEPLOY_NO_LIVE_TARGET`, stop. Use the Platform API path or configure a real GitOps remote. Only use `AGENTDEPLOY_ALLOW_LOCAL_GITOPS=true` when the user explicitly wants local-only GitOps testing.
    • If the deploy returns `DEPLOY_MISSING_SHARED_INFRA`, the app is not a true standalone service. Either deploy `SharedInfra` first or remove the extra shared-state / service-ref coupling.
    • If the deploy is rejected with `DEPLOY_OPERATION_ALREADY_IN_PROGRESS`, check whether the active operation is still desirable. If it is stuck or obsolete, run `agentdeploy cancel <record>` and then submit the replacement deploy.
    • Poll with `agentdeploy status <record>` until the phase is `live` or an error is returned.
  7. Verify the result.
    • Start with the aggregate application view for multi-service apps:
      • `agentdeploy applications`
      • `agentdeploy app-status <team> <application>`
      • `agentdeploy app-explain <team> <application>`
    • Use the aggregate view to confirm `SharedInfra` plus all dependent services are converging together before drilling into a single record.
    • Use the URLs returned by `status` or `explain` for services. Do not hardcode domains because each installation can differ.
    • For a complete state dump, run `agentdeploy explain <record>`.
    • For runtime debugging without `kubectl`, use:
      • `agentdeploy describe <record>` for pod names, restart counts, waiting or termination reasons, image digests, requests, limits, and service or endpoint visibility
      • `agentdeploy events <record>` for missing secrets, quota failures, probe failures, and scheduling errors
      • `agentdeploy logs <record> [--follow] [--previous] [--pod <name>] [--container <name>] [--tail N]` for live or previous container logs
    • In `explain`, inspect the live `infrastructureRef` and `serviceRef` sections when secret wiring or same-namespace traffic looks wrong.
    • Compare `requested_revision` against `observed_revision`. If they differ, the control plane has accepted a newer revision than ArgoCD has actually reconciled in the cluster.
    • Treat a stale-reconciliation warning as a real GitOps signal. The platform now requests a targeted Argo refresh automatically, and you can also run `agentdeploy refresh <app>` if the warning persists.
    • If the app depends on PostgreSQL, confirm `SharedInfra` is healthy, the service injects the DB variant it actually uses, and the app-level readiness check matches the app's real dependencies.
    • If the app depends on Redis, confirm `SharedInfra` is healthy, the service injects `REDIS_URL` or `REDIS_URL_TLS`, and the app-level readiness check actually exercises Redis.
    • If the app depends on shared object storage, confirm `SharedInfra` is healthy and the service injects the object-store keys it actually uses. Prefer `OBJECT_STORE_*` for portable app wiring and fall back to `AZURE_STORAGE_*` only when the runtime still needs provider-specific compatibility.
    • Remember that `list`, `status`, and `explain` are usually filtered by team-scoped control-plane RBAC, not by app owner alone.
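
The env wiring that steps 4 through 7 keep coming back to can be sketched on a worker like this. The dotted paths (`valueFrom.infrastructureRef`, `valueFrom.serviceRef`, `runtime.filesystem.writablePaths`) are quoted from this skill; every name, the env-list layout, and the overall record shape are assumptions, so verify against references/service-contract.md.

```yaml
# Hedged wiring sketch, not an authoritative contract.
kind: Service
metadata:
  name: expense-worker
  application: expense       # same application as the owning SharedInfra record
spec:
  env:
    - name: REDIS_URL_TLS    # injected from the SharedInfra Redis claim
      valueFrom:
        infrastructureRef:
          name: cache
    - name: EXPENSE_API_URL  # hypothetical variable; serviceRef resolves to an in-namespace URL
      valueFrom:
        serviceRef:
          name: expense-api
  runtime:
    filesystem:
      writablePaths:
        - /tmp/scratch       # emptyDir mount; image contents at this path are hidden
```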

Default decisions

默认决策

  • Prefer immutable image digests over tags.
  • Prefer the smallest viable CPU and memory values; only raise them when the app clearly needs more.
  • Prefer PostgreSQL only when the app actually needs persistent relational storage.
  • Prefer Redis only when the app actually needs cache or ephemeral key-value state.
  • Prefer one service per app unless there is a clear need for a multi-service application with shared namespace and shared infra.
  • Prefer the standalone `Service` bootstrap only for a true one-service app. The moment the app needs shared state or a sibling service, switch to explicit `SharedInfra`.
  • Prefer internal ingress and group-based authorization for enterprise apps.
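
Taken together, these defaults suggest a standalone `Service` roughly like the sketch below. Only the leaf keys (`dataClassification`, `visibility`, `authorization.mode`, `allowedGroups`, `runtime.port`) and the digest rule come from this document; the nesting, the `image` key, and every concrete value are placeholders. Start from assets/service-web.yaml rather than this block.

```yaml
# Default-leaning standalone Service sketch; field layout is assumed.
kind: Service
metadata:
  name: notes                # standalone: metadata.application defaults to this
spec:
  image: registry.example.com/notes@sha256:aaaaaaaa...   # placeholder digest; never a mutable tag
  dataClassification: internal
  visibility: internal
  access:
    authorization:
      mode: group-based
      allowedGroups:
        - 00000000-0000-0000-0000-000000000000   # stable Entra group object ID, not a name
  runtime:
    port: 8080               # the process must actually listen here
```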

High-value gotchas

高价值注意事项

  • Mutable image tags are rejected. Use `repo@sha256:...` or let AgentDeploy build the image.
  • `allowedGroups` must contain stable group IDs, not human-readable names.
  • Only the `internal` data classification can use `visibility: public`.
  • `confidential` and `restricted` cannot use `org-wide`.
  • `api` means browser-consumed HTTP API in this platform, not general service-to-service auth.
  • Public apps use shared auth by default, but an app can explicitly opt out with `spec.dataClassification: public`, `spec.access.auth: none`, and `spec.access.authorization.mode: none`.
  • For `group-based` access, `allowedGroups` should be the stable Entra object IDs of the groups that should be able to pass the shared auth proxy.
  • If more than one user set should be allowed, list all of their group object IDs in `allowedGroups` and keep the scope intentional. Prefer app-specific access groups over broad org-wide groups.
  • The app receives identity through `X-Auth-Request-*` headers, not raw Entra bearer tokens, by default.
  • The concrete headers are `X-Auth-Request-Email`, `X-Auth-Request-Groups`, `X-Auth-Request-Preferred-Username`, and `X-Auth-Request-User`.
  • Shared ingress auth no longer forwards bearer `Authorization` headers into apps by default. If an app expects raw OAuth access tokens, call that out as a platform mismatch instead of assuming they will be present.
  • `auth: none` means no shared ingress auth and no injected identity headers. In the current policy set, that is allowed only for `dataClassification: public`.
  • If PostgreSQL is declared, it must live on `SharedInfra`, not `Service`. Wire the correct DB env variant with `valueFrom.infrastructureRef`. `DATABASE_URL` / `DATABASE_URL_SYNC` are libpq-style, while `DATABASE_URL_ASYNC` is meant for common async Python stacks.
  • If Redis is declared, it must live on `SharedInfra`, not `Service`. Wire `REDIS_URL` or `REDIS_URL_TLS` with `valueFrom.infrastructureRef`. The platform now includes `ssl_cert_reqs=required`, so most `redis-py` and Celery clients should not need app-side query rewriting.
  • If one service needs another service's base URL, use `valueFrom.serviceRef`. That resolves to a stable in-namespace URL like `http://expense-api/api` and avoids hardcoding installation domains.
  • A single standalone `Service` can bootstrap its own namespace policy, but that only works when there are no `infrastructureRef` or `serviceRef` bindings and no other services in the same application.
  • `DEPLOY_MISSING_SHARED_INFRA` means the app has outgrown the standalone path and now needs an explicit `SharedInfra` owner.
  • In the current prototype, Redis is an `internal`-only infrastructure option and is exposed over TLS on port `6380`.
  • `runtime.filesystem.writablePaths` creates `emptyDir` mounts at those paths. Existing image contents at those paths will be hidden at runtime.
  • `runtime.filesystem.readOnlyRootFilesystem` defaults to `true`. Turning it off is a real security relaxation and should be deliberate.
  • If the app does not listen on `runtime.port`, the deployment will roll out but ingress health and readiness will still fail.
  • `validate`, `deploy --dry-run`, and `explain` expose the effective normalized contract. Use that output to verify that infrastructure ownership, env wiring, and runtime overrides survived normalization.
  • In the intended product mode, deployers should use the Platform API path. They should not need direct Git push access or direct Kubernetes read access for normal lifecycle commands.
  • Team visibility is usually team-scoped, not owner-scoped. A caller typically sees all apps for teams they can view.
  • A second real deploy for the same app may be rejected while another non-terminal operation is queued or running.
  • `agentdeploy cancel <app>` is the current escape hatch for a stuck or obsolete live operation. It cancels the active operation record so a replacement deploy can be accepted.
  • `requested_revision` is the last revision the control plane accepted. `observed_revision` is the revision ArgoCD currently reports from the cluster. Treat them as different signals.
  • `describe`, `events`, and `logs` depend on a recent hosted `platform-api` build when you are using `AGENTDEPLOY_API_URL`. If they return `HTTP_NOT_FOUND`, the CLI is newer than the live control plane.
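
For the explicit public opt-out described above, the three access settings combine as in the sketch below. The `spec.*` paths are quoted from this document; the rest of the record, including the app name, is illustrative.

```yaml
# Fully public, no shared auth: warn the user before using this shape.
kind: Service
metadata:
  name: status-page          # hypothetical fully public app
spec:
  dataClassification: public # the only classification allowed with auth: none
  access:
    auth: none               # no shared ingress auth
    authorization:
      mode: none             # no X-Auth-Request-* identity headers injected
```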

Debug loop

调试流程

  1. Read `agentdeploy status <record>` first.
  2. If the phase is not obviously actionable, read `agentdeploy explain <record>`.
  3. For rollout or runtime failures, read `agentdeploy describe <record>`.
  4. If the cause still is not obvious, read `agentdeploy events <record>`.
  5. Use `agentdeploy logs <record> --previous` for crash loops and `agentdeploy logs <record> --follow` for live request or worker debugging.
  6. Use the error code prefix to choose the next action:
    • `SCHEMA_*`: fix the contract.
    • `POLICY_*`: change the requested shape or access mode.
    • `AUTH_*`: fix group IDs or auth assumptions.
    • `INFRA_*`: inspect database or Redis claim and secret readiness.
    • `DEPLOY_*`: inspect the workload rollout and health checks.
    • `QUOTA_*`: lower requests or ask for a higher app tier.

Feedback loop

反馈流程

  • If a real deployment exposes a high-value platform bug, contract gap, or reliability issue, raise that feedback rather than treating it as one-off local friction.
  • If you have access to elementx-ai/agentdeploy, open or update a GitHub issue with:
    • the affected app, record type, and workload type
    • the relevant `SharedInfra` or `Service` shape
    • operation ID, requested revision, and observed revision when available
    • the exact failure mode, impact, and the smallest useful fix
  • Prefer issues for meaningful fixes or improvements. Do not create noise for already-documented prototype limitations unless the observed behavior is worse than documented.

Output expectations

输出要求

When doing deployment work with this skill:
  • keep the contracts small and explicit
  • explain which workload type, application shape, and data classification you chose
  • surface the exact CLI commands you ran
  • quote the operation ID first, then the revision or Git commit once `status` or `explain` reports it
  • prefer actionable remediation over generic advice