# Browser to API

Replay-driven API discovery. Consume a `browser-trace` capture, pair its CDP request/response events, templatize observed URLs, infer JSON schemas from samples, and emit an OpenAPI 3.1 document plus a human-readable coverage report.

This skill does not capture traffic. It is purely offline post-processing on top of `browser-trace`'s `cdp/network/*.jsonl` buckets. The two skills compose:

```
browser-trace    →  .o11y/<run>/cdp/network/{requests,responses}.jsonl
browser-to-api   →  .o11y/<run>/api-spec/index.html + openapi.yaml + client.mjs
```
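The pairing step described above can be sketched as follows. This is an illustration only, not the skill's actual code; the single assumption is that both JSONL buckets carry a CDP `requestId` on each event.

```javascript
// Pair request and response events from the two JSONL buckets by requestId.
// Requests with no observed response are dropped.
function pairEvents(requests, responses) {
  const byId = new Map(responses.map(r => [r.requestId, r]));
  return requests
    .map(req => ({ requestId: req.requestId, req, res: byId.get(req.requestId) ?? null }))
    .filter(p => p.res !== null);
}
```

Everything downstream (templating, schema inference, emission) operates on these pairs.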

## When to use

- The user wants an OpenAPI document for a third-party or undocumented website API.
- The user has a `browser-trace` run and wants endpoints + schemas extracted from it.
- The user is building a client/SDK against a site that doesn't publish a spec.
- The user wants a coverage report showing which flows would broaden the spec.

If the user wants to capture traffic, send them to `browser-trace` first.
## Two-step workflow

### 1. Capture with `browser-trace` (and optionally bodies via `browse network on`)


```bash
# Local example (see browser-trace SKILL.md for Browserbase variant)
browse env local
browse open about:blank
TARGET="$(browse status --json | jq -r .wsUrl)"
node ../browser-trace/scripts/start-capture.mjs "$TARGET" my-site
browse network on   # capture request/response bodies
browse open https://example.com
```

```bash
# ...drive whatever flows you want covered...

# Snapshot the bodies dir BEFORE turning capture off (the temp dir is shared
# per-session, so subsequent `browse network on` runs would mix your bodies
# with whatever a future capture writes if you skip this step).
cp -r "$(browse network path | jq -r .path)" .o11y/my-site/cdp/network/bodies/
browse network off
node ../browser-trace/scripts/stop-capture.mjs my-site
node ../browser-trace/scripts/bisect-cdp.mjs my-site
```

`browse network on` is **optional but strongly recommended** — without it, the spec has no response-body schemas (the CDP firehose used by `browse cdp` does not embed bodies). With it, both request bodies (already captured by CDP) *and* response bodies are joined into the trace by CDP `requestId`.
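The `requestId` join can be sketched like this. A minimal illustration, not the skill's implementation: the only detail taken from this doc is that `browse network` writes the CDP `requestId` into each `request.json` as `id`; the body field names are hypothetical.

```javascript
// Attach captured bodies to paired CDP events by requestId.
// `bodies` items mimic browse network's request.json (`id` = CDP requestId);
// the `requestBody`/`responseBody` field names are illustrative assumptions.
function joinBodies(pairs, bodies) {
  const byId = new Map(bodies.map(b => [b.id, b]));
  return pairs.map(p => ({
    ...p,
    requestBody: byId.get(p.requestId)?.requestBody ?? p.postData ?? null,
    responseBody: byId.get(p.requestId)?.responseBody ?? null,
  }));
}
```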

### 2. Generate the spec

```bash
node scripts/discover.mjs --run .o11y/my-site
```

```
→ .o11y/my-site/api-spec/index.html   ← open this
  .o11y/my-site/api-spec/client.mjs
  .o11y/my-site/api-spec/openapi.yaml
  .o11y/my-site/api-spec/openapi.json
  .o11y/my-site/api-spec/report.md
  .o11y/my-site/api-spec/confidence.json
  .o11y/my-site/api-spec/samples/*.json
  .o11y/my-site/api-spec/intermediate/*.jsonl
```


`discover.mjs` auto-detects `<run>/cdp/network/bodies/`. To use a body capture from elsewhere (e.g. didn't snapshot, want the live `browse network` dir), pass `--bodies <path>` explicitly.


### 3. Open the HTML report

After `discover.mjs` finishes, always open the generated HTML report:

```bash
open .o11y/my-site/api-spec/index.html
```

The report is a self-contained HTML file (no server needed) that shows each discovered operation as an expandable card with variables, client usage, request/response examples, and a generated `client.mjs` snippet at the bottom. This is the primary deliverable — always open it for the user.

## CLI flags

| Flag | Required | Meaning |
|------|----------|---------|
| `--run <path>` | yes | Path to a `browser-trace` run directory |
| `--out <path>` | no | Output dir; default `<run>/api-spec/` |
| `--bodies <path>` | no | `browse network` capture dir to join into the trace (auto-detected from `<run>/cdp/network/bodies/` when present) |
| `--include <regex>` | no | Only include URLs matching regex (repeatable) |
| `--exclude <regex>` | no | Exclude URLs matching regex (repeatable; in addition to defaults) |
| `--origins <list>` | no | Comma-separated origin allow-list (e.g. `api.example.com,example.com`) |
| `--format <yaml\|json\|both>` | no | Output format. Default `both` |
| `--title <string>` | no | OpenAPI `info.title`. Default derived from primary origin |
| `--redact <list>` | no | Extra header names / JSON keys to redact (comma-separated) |
| `--min-samples <n>` | no | Minimum samples per endpoint to include. Default `1` |
| `--stage <name>` | no | Run only one stage: `load`, `filter`, `normalize`, `infer`, `emit` |

## Output layout

```
<run>/api-spec/
├── index.html                visual report — open this (self-contained, no server)
├── client.mjs                zero-dep fetch client with typed functions per operation
├── openapi.yaml              machine-readable spec
├── openapi.json              mirror
├── report.md                 markdown summary + curl examples
├── confidence.json           per-endpoint confidence + normalization flags
├── samples/                  redacted request/response examples
│   └── <method>__<path-hash>.json
└── intermediate/             pipeline byproducts (paired/filtered/endpoints jsonl)
```

## What you get from `browse cdp` and `browse network`

Two complementary capture sources:

| Source | Provides | Limitation |
|--------|----------|------------|
| `browse cdp` (used by `browser-trace`) | request method/URL/headers/`postData`, response status/headers/mimeType, full event timing | Does not embed response bodies. Bodies must be pulled with `Network.getResponseBody`, which the firehose doesn't do. |
| `browse network on` (separate command) | request bodies AND response bodies on disk, keyed by CDP `requestId` | Capture dir is shared per `browse` session; snapshot before another `browse network on` overwrites it. |

`discover.mjs` will pull bodies from a `browse network` dir if you pass `--bodies <path>` (or stash them under `<run>/cdp/network/bodies/`, which is auto-detected). The matching is by `requestId` — `browse network` writes that into each `request.json` as `id`, and we join directly.

What changes when bodies are present:

- ✅ Path templating, query-param schemas, status codes, content-types — same either way.
- ✅ Request-body schemas — `postData` from CDP is enough; the bodies dir is a nice-to-have for non-`postData` cases.
- Response-body schemas — fully inferred from real samples. Without bodies you get `{ description, content: <mimeType> }` skeletons.

The report flags every endpoint that has no response-body sample.
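The sample-based inference described above can be sketched as follows. This is a coarse illustration of the idea, not the skill's actual algorithm: one type per node, and object keys absent from some samples are left out of `required`.

```javascript
// Coarse JSON-schema inference from an array of samples.
function inferSchema(samples) {
  const kinds = new Set(samples.map(s =>
    s === null ? "null" : Array.isArray(s) ? "array" : typeof s));
  if (kinds.size !== 1 || !kinds.has("object"))
    return { type: kinds.size === 1 ? [...kinds][0] : [...kinds].sort() };
  // Group observed values per key, then recurse.
  const byKey = new Map();
  for (const s of samples)
    for (const [k, v] of Object.entries(s)) {
      if (!byKey.has(k)) byKey.set(k, []);
      byKey.get(k).push(v);
    }
  const properties = {}, required = [];
  for (const [k, vals] of byKey) {
    properties[k] = inferSchema(vals);
    if (vals.length === samples.length) required.push(k); // present in every sample
  }
  return { type: "object", properties, required };
}
```

Note the inductive caveat from the Limitations section applies: a key seen in every sample still lands in `required`, even if the server treats it as optional.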

## Automatic noise filtering

The normalize stage automatically classifies and drops infrastructure noise:

- Tracking / analytics — paths containing `/track`, `/pixel`, `/beacon`, `/impression`, `/pageview`, `/dag/v*`
- Bot defense — Akamai (`/akam/`), fingerprint payloads (`sensor_data`), obfuscated multi-segment paths
- Session plumbing — `/session`, `/authenticate/start`, cookie consent, A/B experiment endpoints
- HTML page renders — `GET` requests returning `text/html` (the rendered page, not the API)

This typically drops 60-80% of captured traffic. The `--include` flag can rescue a false positive.
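In the spirit of those filters, a tiny classifier might look like this. The patterns are only the ones named above, not the skill's full rule set.

```javascript
// Classify a captured exchange as infrastructure noise (illustrative subset).
const NOISE_PATTERNS = [
  /\/(track|pixel|beacon|impression|pageview)(\/|$)/, // tracking / analytics
  /\/dag\/v\d+/,                                      // analytics dispatch
  /\/akam\//,                                         // Akamai bot defense
];
function isNoise(method, path, mimeType) {
  if (method === "GET" && mimeType === "text/html") return true; // page render
  return NOISE_PATTERNS.some(re => re.test(path));
}
```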

## GraphQL / multiplexed endpoint decomposition

When a single endpoint (like `/dapi/fe/gql`) is called with different `operationName` values, the skill automatically splits it into separate logical operations. Each gets its own:

- OpenAPI path entry (e.g. `/dapi/fe/gql [Autocomplete]`)
- Request/response schema inferred from only that operation's samples
- Curl example and variables table in the report

Detection works on body fields (`operationName`, `method`, `action`) and query params (`opname`, `op`). This covers GraphQL (APQ and inline), JSON-RPC, and similar dispatch patterns.
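The detection step can be sketched as follows, checking only the fields and params listed above; the precedence order here is an assumption.

```javascript
// Extract a dispatch key from a multiplexed call, or null if none is present.
function operationKey(url, body) {
  for (const f of ["operationName", "method", "action"])
    if (body && typeof body[f] === "string") return body[f];
  const params = new URL(url).searchParams;
  for (const p of ["opname", "op"])
    if (params.get(p)) return params.get(p);
  return null;
}
```

Grouping paired events by `operationKey` before schema inference is what keeps each logical operation's samples separate.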

## Limitations

- Coverage is bounded by the captured flow. Endpoints not exercised in the trace will not appear. The skill cannot prove completeness.
- Schemas are inductive, not contractual. A field might be optional on the server even if every sample contained it.
- Auth is observed, not specified. The skill records auth-shaped headers in an `x-observed-auth` extension but won't claim a security scheme.
- Path templating is heuristic. Numeric / UUID / hex / slug patterns are detected per segment. Ambiguous URLs are flagged in `confidence.json`.
- Redaction is best-effort. Default redactions cover common credentials, but app-specific secrets may slip through; use `--redact` for known custom headers/keys.
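The per-segment templating heuristic can be sketched like this. Numeric, UUID, and long-hex segments become placeholders; slug detection and the skill's real placeholder names are omitted, and the names used here are hypothetical.

```javascript
// Replace identifier-looking path segments with template placeholders.
const SEGMENT_RULES = [
  [/^\d+$/, "{id}"],                                                        // numeric
  [/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, "{uuid}"],
  [/^[0-9a-f]{16,}$/i, "{hash}"],                                           // long hex
];
function templatize(path) {
  return path
    .split("/")
    .map(seg => (SEGMENT_RULES.find(([re]) => re.test(seg)) ?? [null, seg])[1])
    .join("/");
}
```

A segment matching none of the rules is kept literal, which is exactly where the ambiguous-URL flagging in `confidence.json` comes in.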

## Best practices

1. **Drive the flows you want documented.** The richer the browser-trace, the richer the spec.
2. **Use `--origins` for noisy sites.** A marketing page hits dozens of analytics hosts; restrict to the API origin you care about.
3. **Inspect `report.md` first.** It has curl-ready examples and response samples for every discovered operation.
4. **Bump `--min-samples` to 2+** when you want only confidently-shaped endpoints in the final doc — drop the long tail.
5. **Pair with `browse network on`** when response-body schemas matter. The CDP firehose alone has request bodies but not response bodies.

For pipeline internals and the file format reference, see REFERENCE.md.