Railway

One skill for the full Railway operator loop: status → debug → fix → deploy → verify. It wraps the `railway` CLI in a non-interactive, JSON-first style that an agent can drive without prompts, and it leans on `RAILWAY_TOKEN` from the environment instead of an interactive `railway login`.

This skill is repo-agnostic. It assumes the project is hosted on Railway (railway.com) and that a `RAILWAY_TOKEN` is exported in the environment. It makes no assumptions about the stack (Node, Python, Go, Docker, Nixpacks/Railpack; Railway's builder figures it out).

When this skill triggers

Phrases that should route here:
  • Deploy / build
    • "deploy this to railway"
    • "push to railway", "ship to railway", "railway up"
    • "build is failing on railway", "why did my build fail"
  • Logs / debugging
    • "show me the railway logs", "tail the logs", "railway logs --since 1h"
    • "why is my service crashing on railway"
    • "show me the 500s on railway", "show http logs", "show slow requests"
    • "find the request id abc123 in railway logs"
  • Ops
    • "redeploy on railway", "restart the api service", "roll back the last deploy"
    • "scale my railway service", "remove the latest deployment"
  • State / discovery
    • "list my railway projects", "what services are in this project", "list deployments"
    • "what's the status of my railway project", "is my service healthy"
  • Variables
    • "set a railway env var FOO=bar", "list railway variables", "delete a railway var"
  • Run / connect
    • "run this script with railway production env", "open a shell with railway env"
    • "ssh into my railway service", "connect to my railway postgres / redis / mongo"
  • Metrics
    • "what's the cpu/memory on railway", "is my service hitting limits"
    • "p95 latency on railway", "request rate on /api"
Skip when:
  • The host is not Railway (Fly, Render, Vercel, AWS, …). This skill knows the `railway` CLI; it does not generalise.
  • The fix is a code change with no operational lever — let the normal dev-process skills handle the code; come back here once it's time to deploy or read logs.

Prerequisites

  1. CLI on PATH. `which railway` should resolve. If not, install with `npm install -g @railway/cli` (or use the official installer at https://docs.railway.com/guides/cli). Minimum version: 4.x (this skill assumes the modern subcommand layout: `service list`, `deployment list`, `logs --filter`, and `--json` on most commands).
  2. Auth via env var. `echo "${RAILWAY_TOKEN:0:8}…"` should print a non-empty prefix. The CLI reads `RAILWAY_TOKEN` directly; do not run `railway login` in agent sessions. Two token shapes exist:
    • Account / personal token (created at https://railway.com/account/tokens): works across every workspace, project, and environment the user has access to. Required for `railway list`, `railway link --workspace`, and any cross-project view.
    • Project token (created in a project's Settings → Tokens, scoped to one project+environment): works for that single project/env. `railway list` and other workspace-level commands return `Unauthorized` with this kind; `railway status`, `railway logs`, `railway up`, and `railway variable …` all work. The first call that returns `Unauthorized` / `Invalid RAILWAY_TOKEN` is the signal to ask the user which shape they configured and whether they need to widen scope.
  3. No interactive prompts. Always pass explicit `--project` / `--service` / `--environment` flags (and `--json`, `-y`, `--ci` where they exist) instead of relying on linked state. Linked state is a `.railway/` directory and survives across CLI calls, but in fresh agent sessions there is no link yet; set the scope every call until the user explicitly asks to link.
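The first two prerequisites can be checked up front, before any other call. The following is a minimal sketch for a POSIX shell; the function name `railway_preflight` and its exit codes are hypothetical and not part of the Railway CLI.

```shell
# Hypothetical helper (not part of the Railway CLI): check that the CLI is on
# PATH and that a token is exported. It never prints the token itself.
railway_preflight() {
  if ! command -v railway >/dev/null 2>&1; then
    echo "railway CLI not found on PATH" >&2
    return 1
  fi
  if [ -z "${RAILWAY_TOKEN:-}" ]; then
    echo "RAILWAY_TOKEN is not set; do not fall back to 'railway login'" >&2
    return 2
  fi
  echo "ok"
}
```

A caller might run `railway_preflight || exit` at the top of a session. The sketch only confirms that a token is present; whether it is an account or project token still shows up on the first scoped call.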

Operating procedure

The skill is a small state machine. Pick the smallest entry point that answers the user's question; don't run discovery they didn't ask for.
       ┌──────────┐
       │ Discover │  list projects / services / deployments / env / vars
       └────┬─────┘
       ┌──────────┐        ┌──────────┐
       │  Debug   │◄──────►│ Metrics  │   logs (build/deploy/http) + cpu/mem/p95
       └────┬─────┘        └──────────┘
       ┌──────────┐
       │   Fix    │  variables set / code edit / config change
       └────┬─────┘
       ┌──────────┐        ┌──────────┐
       │  Deploy  │───────►│  Verify  │   up / redeploy / restart / down / roll back
       └──────────┘        └──────────┘

Step 1 — discover

Always start by capturing the IDs you need, so subsequent calls are explicit. Prefer JSON output for parsing.
bash
# Workspace-wide view (requires an account token).
railway list --json

# A single project's structure (a project token works here too if you pass --project).
railway status --json --project "$PROJECT_ID"

# Services and environments inside that project.
railway service list --json --project "$PROJECT_ID" --environment production
railway environment list --json --project "$PROJECT_ID"

# Deployments for a specific service (most recent first).
railway deployment list \
  --project "$PROJECT_ID" \
  --service api \
  --environment production \
  --limit 20 --json

Capture from the JSON: project id, environment id/name (typically `production`, `staging`, plus PR preview envs), service ids/names, the latest deployment id and its `status` (`SUCCESS`, `FAILED`, `BUILDING`, `DEPLOYING`, `CRASHED`, `REMOVED`). The deployment status is what tells you whether the symptom is a build problem, a startup problem, or a steady-state runtime problem — pick the right log stream accordingly.
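One way to encode that choice is a small lookup from deployment status to the log flag to use. This is a sketch, not CLI behaviour: the function name `log_flag_for_status` is hypothetical, and the mapping for `BUILDING` and `DEPLOYING` is an assumption. `REMOVED` is deliberately left unmapped and treated as an error here.

```shell
# Hypothetical helper: print the extra flag for `railway logs` that matches a
# deployment status. An empty result means the default deploy/runtime stream.
log_flag_for_status() {
  case "$1" in
    FAILED|BUILDING)            # assumed: the build is where to look first
      echo "--build" ;;
    CRASHED|DEPLOYING|SUCCESS)  # started (or starting): default stream
      echo "" ;;
    *)
      echo "unrecognised status: $1" >&2
      return 1 ;;
  esac
}
```

For example, `flag=$(log_flag_for_status "$STATUS")` yields `--build` for a `FAILED` deployment and an empty string for a `CRASHED` one.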

Step 2 — debug (logs first)

Railway exposes three log streams. Pick the one that matches the failure mode; mixing them makes the tail unreadable.

| Stream | Flag | Use when |
| --- | --- | --- |
| Deploy / runtime | `railway logs` (default) | The app is up but misbehaving, or it crashed after starting. |
| Build | `railway logs --build [DEPLOYMENT_ID]` | Deployment is `FAILED` and never reached runtime. |
| HTTP | `railway logs --http` | The symptom is a status code, a latency spike, or a specific request. |

Default to historical, non-streaming queries in agent sessions, because streaming hangs the shell. Any of `--lines`, `--since`, or `--until` disables streaming.
bash
# Last 200 deploy log lines, JSON for parsing.
railway logs \
  --project "$PROJECT_ID" --service api --environment production \
  --lines 200 --json

# Build logs for the failed deployment specifically.
railway logs --build "$DEPLOYMENT_ID" \
  --project "$PROJECT_ID" --service api --environment production \
  --lines 500

# Errors only in the last hour.
railway logs --since 1h --filter "@level:error" --lines 100 \
  --project "$PROJECT_ID" --service api --environment production

# All 5xx HTTP responses in the last 30 minutes.
railway logs --http --since 30m --status ">=500" --lines 200 \
  --project "$PROJECT_ID" --service api --environment production --json

# Slow GETs on a specific path.
railway logs --http --method GET --path /api/users \
  --filter "@totalDuration:>=1000" --lines 50 --json \
  --project "$PROJECT_ID" --service api --environment production

# Trace one request end-to-end.
railway logs --http --request-id "$REQUEST_ID" --lines 50 --json \
  --project "$PROJECT_ID" --service api --environment production

Filter syntax (Railway query language, also accepted in the dashboard):

- **Text**: bare words → substring match; `"two words"` → phrase.
- **Level** (deploy/build only): `@level:error`, `@level:warn`, `@level:info`.
- **HTTP fields**: `@httpStatus`, `@method`, `@path`, `@host`, `@requestId`, `@clientUa`, `@srcIp`, `@edgeRegion`, `@upstreamAddress`, `@upstreamProto`, `@downstreamProto`, `@responseDetails`, `@deploymentId`, `@deploymentInstanceId`, `@totalDuration`, `@responseTime`, `@upstreamRqDuration`, `@txBytes`, `@rxBytes`, `@upstreamErrors`.
- **Operators**: `> >= < <= ..` (range, e.g. `@httpStatus:200..299`); `AND`, `OR`, `-` (negation), parentheses.

If the user asks for "logs from the latest deployment even if it failed", add `--latest` — otherwise `railway logs` walks back to the most recent **successful** deployment, which is usually not what you want when debugging a regression.
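Because a streaming call hangs an agent's shell, the non-streaming rule from this step can be enforced with a guard in front of `railway logs`. This is a minimal sketch: the wrapper name `railway_logs_bounded` and the exit code 64 are hypothetical choices, and accepting the `--flag=value` spelling is an assumption about how arguments may be written.

```shell
# Hypothetical wrapper (not part of the Railway CLI): refuse to call
# `railway logs` unless one of the bounds that disables streaming is present.
railway_logs_bounded() {
  bounded=0
  for arg in "$@"; do
    case "$arg" in
      --lines|--lines=*|--since|--since=*|--until|--until=*) bounded=1 ;;
    esac
  done
  if [ "$bounded" -ne 1 ]; then
    echo "refusing to stream: pass --lines, --since, or --until" >&2
    return 64
  fi
  # All arguments are forwarded unchanged, including scope flags and filters.
  railway logs "$@"
}
```

The guard only inspects the arguments; it does not validate filter syntax or scope, so explicit `--project`, `--service`, and `--environment` flags are still the caller's job.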

Step 3 — metrics (sanity check resource state)

Pair logs with metrics when the symptom could be a resource ceiling (OOM kills, CPU throttling, egress bursts, volume full).
bash
# Compact summary for the linked service, last hour.
railway metrics --json \
  --project "$PROJECT_ID" --service api --environment production

# Specific dimensions.
railway metrics --cpu --memory --since 6h --json \
  --project "$PROJECT_ID" --service api --environment production

# HTTP percentiles + RPS for a path.
railway metrics --http --path /api/users --json --since 1h \
  --project "$PROJECT_ID" --service api --environment production

# Table across every service in the project.
railway metrics --all --json \
  --project "$PROJECT_ID" --environment production

Read these together with the deploy log: a memory line climbing into the service's limit followed by a sudden gap is an OOM; sustained CPU at 100% with growing p95 is a throttle. Don't editorialise beyond what the numbers show.

Step 4 — variables (read-only first, write only on confirmation)

Variables are usually where misconfiguration hides. Listing variables prints secret values: treat the output as confidential, never echo raw values back into the chat, and use `--json` so you can summarise (key names + value lengths) instead of pasting plaintext secrets.
bash
# Read: JSON includes raw values; redact before surfacing.
railway variable list --json \
  --project "$PROJECT_ID" --service api --environment production

# Write: explicit confirmation required before running.
railway variable set "FEATURE_FLAG=on" \
  --project "$PROJECT_ID" --service api --environment production

# Setting a variable triggers a redeploy by default; add --skip-deploys
# (top-level, before the subcommand) to set without redeploying.

# Delete.
railway variable delete FEATURE_FLAG \
  --project "$PROJECT_ID" --service api --environment production

Default to listing first ("here are the keys configured on production; which one do you want to change?") and only run `set` / `delete` after the user picks a target. For new secrets, prefer reading from stdin so the plaintext never enters the agent's argv buffer (visible in `ps`): pipe the value into `railway variable set --stdin KEY` (a top-level option on the legacy `variable` form; the modern flow is `railway variable set "KEY=$(< file)"` from a local file the user controls).
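The "key names + value lengths" summary can be produced without ever echoing a value. The sketch below assumes the variables arrive on stdin as `KEY=VALUE` lines, one per line. How that form is obtained from the CLI is left open, and multi-line values would need different handling. The name `summarise_vars` is hypothetical.

```shell
# Hypothetical helper: read KEY=VALUE lines on stdin and print each key with
# the length of its value, never the value itself.
summarise_vars() {
  awk '
    {
      eq = index($0, "=")          # position of the first "="
      if (eq == 0) next            # skip lines that are not KEY=VALUE
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)     # the value may itself contain "="
      printf "%s\t%d chars\n", key, length(val)
    }'
}
```

Splitting on the first `=` matters because connection strings and base64 values often contain further `=` characters.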

Step 5 — fix and deploy

Three deploy verbs, in increasing order of intent:

| Verb | Effect | Use when |
| --- | --- | --- |
| `railway restart` | Restart the latest deployment without rebuilding. | Process is wedged but the build artefact is fine. |
| `railway redeploy` | Re-run the latest deployment (or `--from-source` to pull the newest commit / image). | A transient failure or you want to redeploy the same artefact. Use `--from-source` to pick up new commits without uploading. |
| `railway up` | Upload the current working directory and deploy it. | The fix is a code change in this repo. |

Non-interactive defaults:
bash
# Restart (no rebuild). -y skips the confirmation dialog.
railway restart -y --json \
  --project "$PROJECT_ID" --service api --environment production

# Redeploy the latest deployment.
railway redeploy -y --json \
  --project "$PROJECT_ID" --service api --environment production

# Redeploy and pull the newest commit / image from the configured source.
railway redeploy -y --from-source --json \
  --project "$PROJECT_ID" --service api --environment production

# Upload and deploy this directory. --ci streams build logs only, then exits;
# perfect for agent sessions (no interactive log attach).
railway up --ci \
  --project "$PROJECT_ID" --service api --environment production \
  --message "fix: bump httpx to 0.27 to pick up TLS bug fix"

# Remove the most recent deployment (rollback to whatever was before it).
railway down -y \
  --project "$PROJECT_ID" --service api --environment production

`railway up --ci` is the agent-friendly form: it implies `CI=true`, streams build logs to stdout, and exits with non-zero on build failure. Without `--ci` the CLI tries to attach a live log pager; in an automation context that hangs.

After deploy, **always verify** by sampling the new deployment's logs and a tiny metrics window — don't just trust the exit code. The Railway build can succeed and the runtime can still crashloop on startup.

bash
# Quick verification loop.
railway deployment list --json --limit 3 \
  --project "$PROJECT_ID" --service api --environment production
railway logs --lines 50 --since 2m \
  --project "$PROJECT_ID" --service api --environment production
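The deploy-then-verify habit can be wrapped so that a successful build is never the last thing reported. This is a minimal sketch that uses only the commands shown in this step; the function name `deploy_and_verify` and its positional arguments (project, service, environment) are hypothetical.

```shell
# Hypothetical wrapper: deploy with --ci, then sample deployment state and
# recent logs instead of trusting the build's exit code alone.
deploy_and_verify() {
  project=$1 service=$2 environment=$3
  if ! railway up --ci \
      --project "$project" --service "$service" --environment "$environment"; then
    echo "build failed; read the build logs before retrying" >&2
    return 1
  fi
  # A successful build does not prove the runtime is healthy: sample both.
  railway deployment list --json --limit 3 \
    --project "$project" --service "$service" --environment "$environment"
  railway logs --lines 50 --since 2m \
    --project "$project" --service "$service" --environment "$environment"
}
```

The sketch prints the raw output and leaves interpretation (deployment status, crash loops in the log sample) to the caller. It also still needs the user's confirmation before it runs, since it changes live state.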

Step 6 — run, shell, ssh, db connect

For development workflows that need production env vars locally, or a shell on the live container:
bash
# Run a one-shot command with the linked service's variables injected.
railway run --service api --environment production -- node scripts/migrate.js

# Open a subshell with the same env (interactive; only run when the user is at the terminal).
railway shell --service api --environment production --silent

# SSH into the running container of a service. -i picks an identity file if Railway
# can't find a usable key in ~/.ssh.
railway ssh \
  --project "$PROJECT_ID" --service api --environment production

# One-shot remote command (non-interactive).
railway ssh \
  --project "$PROJECT_ID" --service api --environment production \
  -- ls /app

# Open a database shell against a Railway-managed DB service.
railway connect postgres \
  --project "$PROJECT_ID" --environment production

`railway run env` and `railway run printenv` will print every secret variable for that service — treat the output as you would `railway variable list --json` and never paste it back.

Common failure shapes

Unauthorized. Please check that your RAILWAY_TOKEN is valid

Either no token, an expired one, or a project-scoped token being used against a workspace-level command (`railway list`, `railway link --workspace`). Ask the user which token shape they configured; if they need workspace-level commands, they need an account token from https://railway.com/account/tokens.

Build `FAILED`, deploy log empty

The failure is in `--build` logs, not the default deploy stream:
bash
railway logs --build "$DEPLOYMENT_ID" --lines 500 \
  --project "$PROJECT_ID" --service api --environment production
If the deployment id is unknown, `railway deployment list --json --limit 5` lists the most recent deployments; take the id of the latest one whose status is `FAILED`.

`CRASHED` deployment, deploy logs end with the start command

App is dying during startup. Read the tail of `railway logs --lines 200` for the actual exception. Common shapes:
  • Missing env var (something like `KeyError: 'DATABASE_URL'` or `panic: required environment variable …`) → `railway variable list --json` to confirm, then `railway variable set …`.
  • Port binding wrong: Railway sets `$PORT`; the service must bind to `0.0.0.0:$PORT`, not a hardcoded port.
  • DB connection refused: check the linked DB service is in the same environment and the private network DNS (e.g. `postgres.railway.internal`) is what the app expects.

Build succeeds, runtime 502 / Bad Gateway from the edge

The app didn't bind to `$PORT` in time (default healthcheck window). Either the app is slow to start (raise `healthcheckTimeout` in `railway.json` / `railway.toml`, or fix the slow startup), or it's binding to `127.0.0.1` instead of `0.0.0.0`. Cross-check with `railway logs --http --status 502 --lines 50` to confirm the edge is the source.

Sudden OOM (`SIGKILL` / `out of memory`)

Pair the deploy log with `railway metrics --memory --since 30m --json`. If memory climbs into the service limit and the gap aligns with the kill, raise the service's memory cap (dashboard or `railway.json` `resources.memory`). Don't silently raise it without telling the user; call out that you saw the ceiling hit.

`railway up` hangs in an agent session

You forgot `--ci`. The default mode attaches a live pager that doesn't exit. Kill it, then re-run with `--ci`.

Variable changes "didn't take effect"

`railway variable set` triggers a redeploy by default, but if `--skip-deploys` was passed, the variable is staged and the running deployment still has the old value. Either redeploy explicitly (`railway redeploy -y`) or rerun the set without `--skip-deploys`.

Conventions

  • JSON-first. Add `--json` to every command that supports it, and parse with `jq` rather than scraping human-readable output. Layouts change; the JSON keys are stable.
  • Explicit scope every call. Pass `--project`, `--service`, `--environment` on every command in an agent session. Don't rely on `.railway/` linked state: it's invisible to the user and confusing when it drifts.
  • Non-streaming logs by default. Always combine with `--lines`, `--since`, or `--until`. Streaming is for humans at a terminal, not agents.
  • Never paste secrets. `railway variable list`, `railway run env`, and `railway shell` all surface plaintext secrets. Summarise (key names, value lengths) instead. If the user explicitly asks for a value, paste it in a code block and remind them it's a secret.
  • Confirm before destructive ops. `railway down`, `railway restart`, `railway redeploy`, `railway variable delete`, `railway environment delete`, `railway volume delete`, `railway delete` (the project!) all change live state. Repeat the scope back to the user ("restart `api` in `production` of project ?") and wait for explicit confirmation, even if `-y` is technically available.
  • Verify after deploy. Don't end on a `railway up --ci` success line. Pull the latest deployment's status and a 50-line log sample so the user sees the actual runtime state, not just the build outcome.
  • One failure mode per investigation. Build vs. crashloop vs. 5xx vs. OOM are distinct shapes with distinct log streams. Don't blend their tails in one report.
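The "explicit scope every call" convention is easy to forget on the tenth command, so one option is a thin wrapper that appends the scope flags every time. This is a sketch under stated assumptions: the name `rw`, the variables `SERVICE` and `ENVIRONMENT`, and the exit code 64 are hypothetical, and the caller is expected to export them alongside `PROJECT_ID`.

```shell
# Hypothetical helper: run a railway subcommand with the scope flags appended.
# PROJECT_ID, SERVICE, and ENVIRONMENT are assumed to be set by the caller.
rw() {
  if [ -z "${PROJECT_ID:-}" ] || [ -z "${SERVICE:-}" ] || [ -z "${ENVIRONMENT:-}" ]; then
    echo "set PROJECT_ID, SERVICE, and ENVIRONMENT first" >&2
    return 64
  fi
  railway "$@" \
    --project "$PROJECT_ID" --service "$SERVICE" --environment "$ENVIRONMENT"
}
```

With the three variables set, `rw logs --lines 50 --json` expands to a fully scoped call. Because the flags are appended at the end, the sketch does not suit commands that pass a trailing `-- command` (such as the one-shot `railway ssh` form) or workspace-level commands that take no service.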

Onsager-bundled scripts (optional, repo-specific)

The `scripts/` directory ships three shell wrappers tuned for the `onsager-ai/onsager` monorepo. They are pinned to that repo's deployment shape (service name `onsager`, env var `ONSAGER_RAILWAY_TOKEN`, production URL `https://onsager-production.up.railway.app`, `justfile` targets). Repos other than Onsager can ignore these or fork them; the generic operating procedure above covers every project.

When in the Onsager repo:

| Task | Command |
| --- | --- |
| Pre-deploy check | `sh scripts/preflight.sh` |
| Diagnose failure | `sh scripts/debug.sh [service]` |
| Verify live deploy | `sh scripts/smoke.sh [url]` |

  • `preflight.sh`: runs before any deploy or while triaging a build failure. Checks that lockfiles (`Cargo.lock`, `pnpm-lock.yaml`) are tracked in git, Dockerfile COPY sources resolve, Railway vars don't leak `localhost`, and `DATABASE_URL` points at the Railway Postgres plugin. Exits non-zero on any failure; skips Railway variable checks if `ONSAGER_RAILWAY_TOKEN` is not set.
  • `debug.sh [service]`: one-shot diagnostics for a failed or stuck deploy: service status, build logs (40 lines), deploy/runtime logs (40), error-only logs (20), HTTP 4xx/5xx (10), env vars. Default service `onsager`. Requires `ONSAGER_RAILWAY_TOKEN`.
  • `smoke.sh [base_url]`: post-deploy verification: API checks via `curl` (`/api/health`, `/api/auth/me`, `/api/nodes`, `/api/sessions`) and optional UI checks via `agent-browser` (`/`, `/sessions`, `/nodes`, `/settings`). Default URL `https://onsager-production.up.railway.app`. UI checks skip gracefully if `agent-browser` is not on PATH.

These scripts demonstrate the wrapping pattern; another repo adopting this skill should fork the directory and re-shape the script bodies for its own deployment.

Related skills

  • The repo's spec-driven dev-process skill: when the fix is a code change, not just an ops lever; that's where the spec / branch / PR loop lives. This skill picks up at the deploy step.
  • `plan-dag`: when the operator question is really "what's still blocking the deploy?" rather than "deploy this thing".
  • Railway's own AI surfaces: `railway agent -p "<question>"` runs an interactive assistant inside the CLI, and `railway mcp install` wires Railway's MCP server into Claude Code / Cursor / Codex. Useful as a fallback when this skill's scripted flow isn't enough, but they're not a substitute for the explicit JSON-first loop above; they're for exploratory questions, not for reproducible automation.