# Browser to API

Replay-driven API discovery. Consume a `browser-trace` capture, pair its CDP request/response events, templatize observed URLs, infer JSON schemas from samples, and emit an OpenAPI 3.1 document plus a human-readable coverage report.

This skill does not capture traffic. It is purely offline post-processing on top of `browser-trace`'s `cdp/network/*.jsonl` buckets. The two skills compose:

```
browser-trace    →  .o11y/<run>/cdp/network/{requests,responses}.jsonl
browser-to-api   →  .o11y/<run>/api-spec/index.html + openapi.yaml + client.mjs
```
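The pairing step described above can be sketched as follows. This is an illustration only, not the skill's actual code; the single assumption is that both JSONL buckets carry a CDP `requestId` on each event.

```javascript
// Pair request and response events from the two JSONL buckets by requestId.
// Requests with no observed response are dropped.
function pairEvents(requests, responses) {
  const byId = new Map(responses.map(r => [r.requestId, r]));
  return requests
    .map(req => ({ requestId: req.requestId, req, res: byId.get(req.requestId) ?? null }))
    .filter(p => p.res !== null);
}
```

Everything downstream (templating, schema inference, emission) operates on these pairs.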

## When to use

- The user wants an OpenAPI document for a third-party or undocumented website API.
- The user has a `browser-trace` run and wants endpoints + schemas extracted from it.
- The user is building a client/SDK against a site that doesn't publish a spec.
- The user wants a coverage report showing which flows would broaden the spec.

If the user wants to capture traffic, send them to `browser-trace` first.
## Two-step workflow

### 1. Capture with `browser-trace` (and optionally bodies via `browse network on`)


```bash
# Local example (see browser-trace SKILL.md for Browserbase variant)
browse env local
browse open about:blank
TARGET="$(browse status --json | jq -r .wsUrl)"
node ../browser-trace/scripts/start-capture.mjs "$TARGET" my-site
browse network on   # capture request/response bodies
browse open https://example.com
```

```bash
# ...drive whatever flows you want covered...

# Snapshot the bodies dir BEFORE turning capture off (the temp dir is shared
# per-session, so subsequent `browse network on` runs would mix your bodies
# with whatever a future capture writes if you skip this step).
cp -r "$(browse network path | jq -r .path)" .o11y/my-site/cdp/network/bodies/
browse network off
node ../browser-trace/scripts/stop-capture.mjs my-site
node ../browser-trace/scripts/bisect-cdp.mjs my-site
```

`browse network on` is **optional but strongly recommended** — without it, the spec has no response-body schemas (the CDP firehose used by `browse cdp` does not embed bodies). With it, both request bodies (already captured by CDP) *and* response bodies are joined into the trace by CDP `requestId`.
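The `requestId` join can be sketched like this. A minimal illustration, not the skill's implementation: the only detail taken from this doc is that `browse network` writes the CDP `requestId` into each `request.json` as `id`; the body field names are hypothetical.

```javascript
// Attach captured bodies to paired CDP events by requestId.
// `bodies` items mimic browse network's request.json (`id` = CDP requestId);
// the `requestBody`/`responseBody` field names are illustrative assumptions.
function joinBodies(pairs, bodies) {
  const byId = new Map(bodies.map(b => [b.id, b]));
  return pairs.map(p => ({
    ...p,
    requestBody: byId.get(p.requestId)?.requestBody ?? p.postData ?? null,
    responseBody: byId.get(p.requestId)?.responseBody ?? null,
  }));
}
```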

### 2. Generate the spec

```bash
node scripts/discover.mjs --run .o11y/my-site
```

```
→ .o11y/my-site/api-spec/index.html   ← open this
  .o11y/my-site/api-spec/client.mjs
  .o11y/my-site/api-spec/openapi.yaml
  .o11y/my-site/api-spec/openapi.json
  .o11y/my-site/api-spec/report.md
  .o11y/my-site/api-spec/confidence.json
  .o11y/my-site/api-spec/samples/*.json
  .o11y/my-site/api-spec/intermediate/*.jsonl
```


`discover.mjs` auto-detects `<run>/cdp/network/bodies/`. To use a body capture from elsewhere (e.g. didn't snapshot, want the live `browse network` dir), pass `--bodies <path>` explicitly.


### 3. Open the HTML report

After `discover.mjs` finishes, always open the generated HTML report:

```bash
open .o11y/my-site/api-spec/index.html
```

The report is a self-contained HTML file (no server needed) that shows each discovered operation as an expandable card with variables, client usage, request/response examples, and a generated `client.mjs` snippet at the bottom. This is the primary deliverable — always open it for the user.

## CLI flags

| Flag | Required | Meaning |
|------|----------|---------|
| `--run <path>` | yes | Path to a `browser-trace` run directory |
| `--out <path>` | no | Output dir; default `<run>/api-spec/` |
| `--bodies <path>` | no | `browse network` capture dir to join into the trace (auto-detected from `<run>/cdp/network/bodies/` when present) |
| `--include <regex>` | no | Only include URLs matching regex (repeatable) |
| `--exclude <regex>` | no | Exclude URLs matching regex (repeatable; in addition to defaults) |
| `--origins <list>` | no | Comma-separated origin allow-list (e.g. `api.example.com,example.com`) |
| `--format <yaml\|json\|both>` | no | Output format. Default `both` |
| `--title <string>` | no | OpenAPI `info.title`. Default derived from primary origin |
| `--redact <list>` | no | Extra header names / JSON keys to redact (comma-separated) |
| `--min-samples <n>` | no | Minimum samples per endpoint to include. Default `1` |
| `--stage <name>` | no | Run only one stage: `load`, `filter`, `normalize`, `infer`, `emit` |

## Output layout

```
<run>/api-spec/
├── index.html                visual report — open this (self-contained, no server)
├── client.mjs                zero-dep fetch client with typed functions per operation
├── openapi.yaml              machine-readable spec
├── openapi.json              mirror
├── report.md                 markdown summary + curl examples
├── confidence.json           per-endpoint confidence + normalization flags
├── samples/                  redacted request/response examples
│   └── <method>__<path-hash>.json
└── intermediate/             pipeline byproducts (paired/filtered/endpoints jsonl)
```

## What you get from `browse cdp` and `browse network`

Two complementary capture sources:

| Source | Provides | Limitation |
|--------|----------|------------|
| `browse cdp` (used by `browser-trace`) | request method/URL/headers/`postData`, response status/headers/mimeType, full event timing | Does not embed response bodies. Bodies must be pulled with `Network.getResponseBody`, which the firehose doesn't do. |
| `browse network on` (separate command) | request bodies AND response bodies on disk, keyed by CDP `requestId` | Capture dir is shared per `browse` session; snapshot before another `browse network on` overwrites it. |

`discover.mjs` will pull bodies from a `browse network` dir if you pass `--bodies <path>` (or stash them under `<run>/cdp/network/bodies/`, which is auto-detected). The matching is by `requestId` — `browse network` writes that into each `request.json` as `id`, and we join directly.

What changes when bodies are present:

- ✅ Path templating, query-param schemas, status codes, content-types — same either way.
- ✅ Request-body schemas — `postData` from CDP is enough; the bodies dir is a nice-to-have for non-`postData` cases.
- Response-body schemas — fully inferred from real samples. Without bodies you get `{ description, content: <mimeType> }` skeletons.

The report flags every endpoint that has no response-body sample.
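The sample-based inference described above can be sketched as follows. This is a coarse illustration of the idea, not the skill's actual algorithm: one type per node, and object keys absent from some samples are left out of `required`.

```javascript
// Coarse JSON-schema inference from an array of samples.
function inferSchema(samples) {
  const kinds = new Set(samples.map(s =>
    s === null ? "null" : Array.isArray(s) ? "array" : typeof s));
  if (kinds.size !== 1 || !kinds.has("object"))
    return { type: kinds.size === 1 ? [...kinds][0] : [...kinds].sort() };
  // Group observed values per key, then recurse.
  const byKey = new Map();
  for (const s of samples)
    for (const [k, v] of Object.entries(s)) {
      if (!byKey.has(k)) byKey.set(k, []);
      byKey.get(k).push(v);
    }
  const properties = {}, required = [];
  for (const [k, vals] of byKey) {
    properties[k] = inferSchema(vals);
    if (vals.length === samples.length) required.push(k); // present in every sample
  }
  return { type: "object", properties, required };
}
```

Note the inductive caveat from the Limitations section applies: a key seen in every sample still lands in `required`, even if the server treats it as optional.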

## Automatic noise filtering

The normalize stage automatically classifies and drops infrastructure noise:

- Tracking / analytics — paths containing `/track`, `/pixel`, `/beacon`, `/impression`, `/pageview`, `/dag/v*`
- Bot defense — Akamai (`/akam/`), fingerprint payloads (`sensor_data`), obfuscated multi-segment paths
- Session plumbing — `/session`, `/authenticate/start`, cookie consent, A/B experiment endpoints
- HTML page renders — `GET` requests returning `text/html` (the rendered page, not the API)

This typically drops 60-80% of captured traffic. The `--include` flag can rescue a false positive.
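In the spirit of those filters, a tiny classifier might look like this. The patterns are only the ones named above, not the skill's full rule set.

```javascript
// Classify a captured exchange as infrastructure noise (illustrative subset).
const NOISE_PATTERNS = [
  /\/(track|pixel|beacon|impression|pageview)(\/|$)/, // tracking / analytics
  /\/dag\/v\d+/,                                      // analytics dispatch
  /\/akam\//,                                         // Akamai bot defense
];
function isNoise(method, path, mimeType) {
  if (method === "GET" && mimeType === "text/html") return true; // page render
  return NOISE_PATTERNS.some(re => re.test(path));
}
```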

## GraphQL / multiplexed endpoint decomposition

When a single endpoint (like `/dapi/fe/gql`) is called with different `operationName` values, the skill automatically splits it into separate logical operations. Each gets its own:

- OpenAPI path entry (e.g. `/dapi/fe/gql [Autocomplete]`)
- Request/response schema inferred from only that operation's samples
- Curl example and variables table in the report

Detection works on body fields (`operationName`, `method`, `action`) and query params (`opname`, `op`). This covers GraphQL (APQ and inline), JSON-RPC, and similar dispatch patterns.
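The detection step can be sketched as follows, checking only the fields and params listed above; the precedence order here is an assumption.

```javascript
// Extract a dispatch key from a multiplexed call, or null if none is present.
function operationKey(url, body) {
  for (const f of ["operationName", "method", "action"])
    if (body && typeof body[f] === "string") return body[f];
  const params = new URL(url).searchParams;
  for (const p of ["opname", "op"])
    if (params.get(p)) return params.get(p);
  return null;
}
```

Grouping paired events by `operationKey` before schema inference is what keeps each logical operation's samples separate.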

## Limitations

- Coverage is bounded by the captured flow. Endpoints not exercised in the trace will not appear. The skill cannot prove completeness.
- Schemas are inductive, not contractual. A field might be optional on the server even if every sample contained it.
- Auth is observed, not specified. The skill records auth-shaped headers in an `x-observed-auth` extension but won't claim a security scheme.
- Path templating is heuristic. Numeric / UUID / hex / slug patterns are detected per segment. Ambiguous URLs are flagged in `confidence.json`.
- Redaction is best-effort. Default redactions cover common credentials, but app-specific secrets may slip through; use `--redact` for known custom headers/keys.
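The per-segment templating heuristic can be sketched like this. Numeric, UUID, and long-hex segments become placeholders; slug detection and the skill's real placeholder names are omitted, and the names used here are hypothetical.

```javascript
// Replace identifier-looking path segments with template placeholders.
const SEGMENT_RULES = [
  [/^\d+$/, "{id}"],                                                        // numeric
  [/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i, "{uuid}"],
  [/^[0-9a-f]{16,}$/i, "{hash}"],                                           // long hex
];
function templatize(path) {
  return path
    .split("/")
    .map(seg => (SEGMENT_RULES.find(([re]) => re.test(seg)) ?? [null, seg])[1])
    .join("/");
}
```

A segment matching none of the rules is kept literal, which is exactly where the ambiguous-URL flagging in `confidence.json` comes in.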

## Best practices

1. **Drive the flows you want documented.** The richer the browser-trace, the richer the spec.
2. **Use `--origins` for noisy sites.** A marketing page hits dozens of analytics hosts; restrict to the API origin you care about.
3. **Inspect `report.md` first.** It has curl-ready examples and response samples for every discovered operation.
4. **Bump `--min-samples` to 2+** when you want only confidently-shaped endpoints in the final doc — drop the long tail.
5. **Pair with `browse network on`** when response-body schemas matter. The CDP firehose alone has request bodies but not response bodies.

For pipeline internals and the file format reference, see REFERENCE.md.