Browser to API
Replay-driven API discovery. Consume a `browser-trace` capture, pair its CDP request / response events, templatize observed URLs, infer JSON schemas from samples, and emit an OpenAPI 3.1 document plus a human-readable coverage report.

This skill does not capture traffic. It is purely offline post-processing on top of `browser-trace`'s `cdp/network/*.jsonl` buckets. The two skills compose:

    browser-trace  → .o11y/<run>/cdp/network/{requests,responses}.jsonl
    browser-to-api → .o11y/<run>/api-spec/index.html + openapi.yaml + client.mjs

When to use
- The user wants an OpenAPI document for a third-party or undocumented website API.
- The user has a `browser-trace` run and wants endpoints + schemas extracted from it.
- The user is building a client/SDK against a site that doesn't publish a spec.
- The user wants a coverage report showing which flows would broaden the spec.

If the user wants to capture traffic, send them to `browser-trace` first.

Two-step workflow
1. Capture with `browser-trace` (and optionally bodies via `browse network on`)

    # Local example (see browser-trace SKILL.md for Browserbase variant)
    browse env local
    browse open about:blank
    TARGET="$(browse status --json | jq -r .wsUrl)"
    node ../browser-trace/scripts/start-capture.mjs "$TARGET" my-site
    browse network on                       # capture request/response bodies
    browse open https://example.com
    # ...drive whatever flows you want covered...

    # Snapshot the bodies dir BEFORE turning capture off (the temp dir is shared
    # per-session, so subsequent `browse network on` runs would mix your bodies
    # with whatever a future capture writes if you skip this step).
    cp -r "$(browse network path | jq -r .path)" .o11y/my-site/cdp/network/bodies/
    browse network off
    node ../browser-trace/scripts/stop-capture.mjs my-site
    node ../browser-trace/scripts/bisect-cdp.mjs my-site

`browse network on` is **optional but strongly recommended**: without it, the spec has no response-body schemas (the CDP firehose used by `browse cdp` does not embed bodies). With it, both request bodies (already captured by CDP) *and* response bodies are joined into the trace by CDP `requestId`.
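
The join by `requestId` can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in `discover.mjs`: the function name `pairByRequestId` and the flattened record shapes (`{ requestId, method, url, postData }` for requests, `{ requestId, status, mimeType }` for responses, `{ id, body }` for captured bodies) are hypothetical and chosen only for the example.

```javascript
// Sketch only: record shapes and names are assumptions, not discover.mjs internals.
// requests:  [{ requestId, method, url, postData? }]
// responses: [{ requestId, status, mimeType }]
// bodies:    [{ id, body }], where id is assumed to equal the CDP requestId
function pairByRequestId(requests, responses, bodies = []) {
  const responseById = new Map(responses.map((r) => [r.requestId, r]));
  const bodyById = new Map(bodies.map((b) => [b.id, b.body]));

  const paired = [];
  for (const req of requests) {
    const res = responseById.get(req.requestId);
    if (!res) continue; // no response seen in the capture; skip this request
    paired.push({
      requestId: req.requestId,
      method: req.method,
      url: req.url,
      requestBody: req.postData ?? null,
      status: res.status,
      mimeType: res.mimeType,
      // null means "no body sample", so no response schema can be inferred
      responseBody: bodyById.has(req.requestId) ? bodyById.get(req.requestId) : null,
    });
  }
  return paired;
}

const pairs = pairByRequestId(
  [
    { requestId: "1", method: "GET", url: "https://example.com/api/users/42" },
    { requestId: "2", method: "POST", url: "https://example.com/api/users", postData: '{"name":"a"}' },
    { requestId: "3", method: "GET", url: "https://example.com/api/never-finished" },
  ],
  [
    { requestId: "1", status: 200, mimeType: "application/json" },
    { requestId: "2", status: 201, mimeType: "application/json" },
  ],
  [{ id: "1", body: '{"id":42}' }]
);
console.log(pairs);
```

In this sketch the second pair keeps its request body but has `responseBody: null`, which mirrors the case where `browse network on` was not enabled for that request.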
2. Generate the spec
    node scripts/discover.mjs --run .o11y/my-site
    # → .o11y/my-site/api-spec/index.html    ← open this
    #   .o11y/my-site/api-spec/client.mjs
    #   .o11y/my-site/api-spec/openapi.yaml
    #   .o11y/my-site/api-spec/openapi.json
    #   .o11y/my-site/api-spec/report.md
    #   .o11y/my-site/api-spec/confidence.json
    #   .o11y/my-site/api-spec/samples/*.json
    #   .o11y/my-site/api-spec/intermediate/*.jsonl

`discover.mjs` auto-detects `<run>/cdp/network/bodies/`. To use a body capture from elsewhere (for example, you didn't snapshot and want the live `browse network` dir), pass `--bodies <path>` explicitly.

3. Open the HTML report
After `discover.mjs` finishes, always open the generated HTML report:

    open .o11y/my-site/api-spec/index.html

The report is a self-contained HTML file (no server needed) that shows each discovered operation as an expandable card with variables, client usage, request/response examples, and a generated `client.mjs` snippet at the bottom. This is the primary deliverable: always open it for the user.

CLI flags
| Flag | Required | Meaning |
|---|---|---|
| `--run` | yes | Path to a … |
| … | no | Output dir; default … |
| `--bodies <path>` | no | … to join into the trace |
| `--include` | no | Only include URLs matching regex (repeatable) |
| … | no | Exclude URLs matching regex (repeatable; in addition to defaults) |
| `--origins` | no | Comma-separated origin allow-list (e.g. …) |
| … | no | Output format. Default … |
| … | no | OpenAPI … |
| `--redact` | no | Extra header names / JSON keys to redact (comma-separated) |
| `--min-samples` | no | Minimum samples per endpoint to include. Default … |
| … | no | Run only one stage: … |
Output layout
    <run>/api-spec/
    ├── index.html           visual report, open this (self-contained, no server)
    ├── client.mjs           zero-dep fetch client with typed functions per operation
    ├── openapi.yaml         machine-readable spec
    ├── openapi.json         mirror
    ├── report.md            markdown summary + curl examples
    ├── confidence.json      per-endpoint confidence + normalization flags
    ├── samples/             redacted request/response examples
    │   └── <method>__<path-hash>.json
    └── intermediate/        pipeline byproducts (paired/filtered/endpoints jsonl)

What you get from `browse cdp` and `browse network`

Two complementary capture sources:
| Source | Provides | Limitation |
|---|---|---|
| `browse cdp` | request method/URL/headers/… | Does not embed response bodies. Bodies must be pulled with … |
| `browse network` | request bodies AND response bodies on disk, keyed by CDP `requestId` | Capture dir is shared per … |

`discover.mjs` pulls bodies from a `browse network` directory if you pass `--bodies <path>` (or place them in `<run>/cdp/network/bodies/`, which is auto-detected). Matching is by `requestId`: `browse network` writes it into the `id` field of each `request.json`, so the join is direct.

What changes when bodies are present:

- ✅ Path templating, query-param schemas, status codes, content-types: same either way.
- ✅ Request-body schemas: `postData` from CDP is enough; the bodies dir is a nice-to-have for non-`postData` cases.
- ✅ Response-body schemas: fully inferred from real samples. Without bodies you get `{ description, content: <mimeType> }` skeletons.

The report flags every endpoint that has no response-body sample.
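
To make "inferred from real samples" concrete, here is a minimal sketch of inductive schema inference. It is not the skill's actual inferrer: `typeOf` and `inferSchema` are hypothetical names, and the rules (a key is `required` only if every sampled object had it; mixed types become a type array, which JSON Schema 2020-12 allows) are simplifying assumptions.

```javascript
// Sketch only: a deliberately small inductive schema inferrer, not discover.mjs.
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (Number.isInteger(value)) return "integer";
  return typeof value; // "string" | "number" | "boolean" | "object"
}

// samples: a non-empty array of parsed JSON values observed at the same position
function inferSchema(samples) {
  const seen = new Set(samples.map(typeOf));
  if (seen.has("number")) seen.delete("integer"); // "number" already covers integers
  const types = [...seen].sort();
  const schema = { type: types.length === 1 ? types[0] : types };

  const objects = samples.filter((s) => typeOf(s) === "object");
  if (objects.length > 0) {
    const keys = [...new Set(objects.flatMap((o) => Object.keys(o)))].sort();
    schema.properties = {};
    for (const key of keys) {
      const values = objects.filter((o) => key in o).map((o) => o[key]);
      schema.properties[key] = inferSchema(values);
    }
    // "required" only reflects the samples; the server may still treat a key as optional.
    schema.required = keys.filter((key) => objects.every((o) => key in o));
  }

  const arrays = samples.filter(Array.isArray);
  const items = arrays.flat(); // one level: the elements of every sampled array
  if (items.length > 0) schema.items = inferSchema(items);

  return schema;
}

const schema = inferSchema([
  { id: 1, name: "a", tags: ["x"] },
  { id: 2, name: null },
]);
console.log(JSON.stringify(schema, null, 2));
```

Because `tags` appears in only one of the two samples, the sketch leaves it out of `required`; more samples per endpoint narrow this kind of guess, but they never turn it into a contract.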
Automatic noise filtering
The normalize stage automatically classifies and drops infrastructure noise:

- Tracking / analytics: paths containing `/track`, `/pixel`, `/beacon`, `/impression`, `/pageview`, `/dag/v*`
- Bot defense: Akamai (`/akam/`), fingerprint payloads (`sensor_data`), obfuscated multi-segment paths
- Session plumbing: `/session`, `/authenticate/start`, cookie consent, A/B experiment endpoints
- HTML page renders: `GET` requests returning `text/html` (the rendered page, not the API)

This typically drops 60-80% of captured traffic. The `--include` flag can rescue a false positive.
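
A rough sketch of this kind of classification follows. It is not the skill's rule set: `classifyNoise` and `dropNoise` are hypothetical names, the entry shape `{ method, url, mimeType, requestBody }` is assumed, and cookie-consent, A/B-experiment and obfuscated-path detection are left out because the text does not say how they are recognized. The `includePatterns` argument models only the "rescue" role of `--include`.

```javascript
// Sketch only: rules mirror the categories listed above, not the real normalize stage.
function classifyNoise(entry) {
  const path = new URL(entry.url).pathname;
  const body = entry.requestBody ?? "";
  if (/\/(track|pixel|beacon|impression|pageview)|\/dag\/v/.test(path)) return "tracking";
  if (path.includes("/akam/") || body.includes("sensor_data")) return "bot-defense";
  if (path.includes("/session") || path.includes("/authenticate/start")) return "session";
  if (entry.method === "GET" && (entry.mimeType ?? "").startsWith("text/html")) return "html-render";
  return null; // not classified as noise
}

// A URL matching any include pattern is kept even if a noise rule would drop it.
function dropNoise(entries, includePatterns = []) {
  return entries.filter(
    (e) => includePatterns.some((re) => re.test(e.url)) || classifyNoise(e) === null
  );
}

const kept = dropNoise(
  [
    { method: "GET", url: "https://example.com/", mimeType: "text/html" },
    { method: "POST", url: "https://example.com/pixel/1", mimeType: "image/gif" },
    { method: "GET", url: "https://example.com/api/session/cart", mimeType: "application/json" },
    { method: "GET", url: "https://example.com/api/items", mimeType: "application/json" },
  ],
  [/\/api\/session\/cart/]
);
console.log(kept.map((e) => e.url));
```

Here `/api/session/cart` would be dropped as session plumbing, and the include pattern keeps it, which is the false-positive case the flag is meant for.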
GraphQL / multiplexed endpoint decomposition
When a single endpoint (like `/dapi/fe/gql`) is called with different `operationName` values, the skill automatically splits it into separate logical operations. Each gets its own:

- OpenAPI path entry (e.g. `/dapi/fe/gql [Autocomplete]`)
- Request/response schema inferred from only that operation's samples
- Curl example and variables table in the report

Detection works on body fields (`operationName`, `method`, `action`) and query params (`opname`, `op`). This covers GraphQL (APQ and inline), JSON-RPC, and similar dispatch patterns.
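
The splitting step can be sketched as grouping by path plus a dispatch key. The field and parameter names below come from the text; everything else is an assumption for illustration: the function names `operationKey` and `splitByOperation`, the entry shape `{ url, requestBody }`, and the precedence (body fields before query params).

```javascript
// Sketch only: field names come from the text; function names and precedence are assumptions.
const BODY_FIELDS = ["operationName", "method", "action"];
const QUERY_PARAMS = ["opname", "op"];

function operationKey(entry) {
  let body = null;
  try {
    body = entry.requestBody ? JSON.parse(entry.requestBody) : null;
  } catch {
    body = null; // non-JSON body: fall through to the query params
  }
  if (body && typeof body === "object" && !Array.isArray(body)) {
    for (const field of BODY_FIELDS) {
      if (typeof body[field] === "string" && body[field] !== "") return body[field];
    }
  }
  const query = new URL(entry.url).searchParams;
  for (const name of QUERY_PARAMS) {
    const value = query.get(name);
    if (value) return value;
  }
  return null; // plain endpoint, nothing to split on
}

function splitByOperation(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const path = new URL(entry.url).pathname;
    const op = operationKey(entry);
    const key = op ? `${path} [${op}]` : path;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }
  return groups;
}

const groups = splitByOperation([
  { url: "https://example.com/dapi/fe/gql", requestBody: '{"operationName":"Autocomplete","variables":{"term":"a"}}' },
  { url: "https://example.com/dapi/fe/gql", requestBody: '{"operationName":"Autocomplete","variables":{"term":"ab"}}' },
  { url: "https://example.com/dapi/fe/gql?opname=Cart" },
]);
console.log([...groups.keys()]);
```

Each group's samples can then be fed to schema inference separately, so one operation's fields do not leak into another operation's schema.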
Limitations
- Coverage is bounded by the captured flow. Endpoints not exercised in the trace will not appear. The skill cannot prove completeness.
- Schemas are inductive, not contractual. A field might be optional on the server even if every sample contained it.
- Auth is observed, not specified. The skill records auth-shaped headers in an `x-observed-auth` extension but won't claim a security scheme.
- Path templating is heuristic. Numeric / UUID / hex / slug patterns are detected per segment. Ambiguous URLs are flagged in `confidence.json`.
- Redaction is best-effort. Default redactions cover common credentials, but app-specific secrets may slip through; use `--redact` for known custom headers/keys.
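
To show why path templating is heuristic, here is a minimal per-segment sketch. The regexes, the placeholder names (`{id}`, `{uuid}`, `{hex}`) and the function name `templatizePath` are assumptions for illustration, not the skill's actual rules; slug detection is omitted because a slug is hard to tell apart from a fixed path word without more samples.

```javascript
// Sketch only: per-segment patterns and placeholder names are assumptions.
const SEGMENT_RULES = [
  { name: "id", re: /^\d+$/ },
  { name: "uuid", re: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i },
  { name: "hex", re: /^[0-9a-f]{16,}$/i },
];

function templatizePath(pathname) {
  const used = new Map(); // how many times each placeholder name has been emitted
  return pathname
    .split("/")
    .map((segment) => {
      const rule = SEGMENT_RULES.find((r) => r.re.test(segment));
      if (!rule) return segment; // literal segment, keep as-is
      const n = (used.get(rule.name) ?? 0) + 1;
      used.set(rule.name, n);
      return n === 1 ? `{${rule.name}}` : `{${rule.name}${n}}`;
    })
    .join("/");
}

const templated = [
  templatizePath("/api/users/42/orders/7"),
  templatizePath("/v1/items/3f2504e0-4f89-11d3-9a0c-0305e82c3301"),
  templatizePath("/assets/9f86d081884c7d65"),
];
console.log(templated);
```

A segment such as `2024` would be templated as `{id}` by this sketch even if it is really a fixed year in the route, which is the kind of ambiguity worth checking against `confidence.json`.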
Best practices
- Drive the flows you want documented. The richer the `browser-trace` capture, the richer the spec.
- Use `--origins` for noisy sites. A marketing page hits dozens of analytics hosts; restrict to the API origin you care about.
- Inspect `report.md` first. It has curl-ready examples and response samples for every discovered operation.
- Bump `--min-samples` to 2+ when you want only confidently-shaped endpoints in the final doc and are happy to drop the long tail.
- Pair with `browse network on` when response-body schemas matter. The CDP firehose alone has request bodies but not response bodies.

For pipeline internals and the file format reference, see REFERENCE.md.