hermai-contribute

Hermai — Contribute schemas to the registry

Contributing a schema means reverse-engineering a site once so every future agent can call it without scraping.

For calling already-registered sites, use the hermai skill.

Before you run any command

Hermai is the interaction layer for agents, not just a read directory. A good schema covers what a user does on the site — browse, search, view, add to cart, log in, post — not just what's on the homepage.

Before running any discovery command, write down the interactions a user performs on this site. For a shop that's typically:

Browse catalog / category listings
Search by keyword
View product detail
Add to cart / view cart / update cart
Log in / register / view account
Checkout / place order
Write review / view reviews

For news: list articles, view article, search articles, subscribe. For social: profile, posts, comments, follow, like. Different site type, different interactions.

A schema with only `product_detail` is 10% done. See references/coverage-checklist.md before declaring a schema complete.
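
As a rough illustration of the coverage idea above, here is a minimal Python sketch that compares the endpoints in a schema against the interactions you wrote down. The interaction names and the top-level `endpoints` list are assumptions made for this example; the authoritative checklist is references/coverage-checklist.md.

```python
# Hypothetical interaction names for a shop; adapt them to the list you wrote down.
EXPECTED_SHOP_INTERACTIONS = [
    "category_listing", "search", "product_detail",
    "cart_add", "cart_view", "login", "checkout", "review_list",
]


def coverage(schema, expected):
    """Return (fraction covered, missing names) for a schema dict.

    Assumes endpoints sit in a top-level "endpoints" list and each has a "name".
    """
    present = {ep.get("name") for ep in schema.get("endpoints", [])}
    missing = [name for name in expected if name not in present]
    covered = 1 - len(missing) / len(expected)
    return covered, missing


# A schema that only knows how to read one product page.
schema = {"endpoints": [{"name": "product_detail"}]}
ratio, missing = coverage(schema, EXPECTED_SHOP_INTERACTIONS)
print(f"covered {ratio:.1%}; still missing: {', '.join(missing)}")
```

Running a check like this before pushing makes the gaps explicit instead of relying on memory.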

Quick start

```bash
# 1. Probe the site to see what platform it uses
hermai probe --stealth https://example.com

# 2. Extract data from each interaction URL (product, category, search, etc.)
hermai probe --body --stealth https://example.com/path | hermai extract

# 3. Use intercept to capture XHR calls for dynamic pages (search, cart, filters)
#    (the URL is quoted so the shell does not treat "?" as a glob character)
hermai intercept "https://example.com/search?q=test"

# 4. Build a schema JSON with ALL interactions covered, then push
hermai registry push schema.json
```

CLI discovery commands

These are deterministic tools — no API key needed. Chain them like an agent would.

| Command | Purpose |
| --- | --- |
| `hermai probe <url>` | TLS-fingerprinted fetch, anti-bot detection, strategy discovery |
| `hermai probe --body <url>` | Raw HTML to stdout (pipe to extract) |
| `hermai extract [file]` | Extract all embedded data patterns from HTML |
| `hermai extract --pattern <name>` | Extract one specific pattern |
| `hermai wellknown <domain>` | Probe 15 standard paths (robots, sitemap, RSS, GraphQL, oEmbed, WP API) |
| `hermai introspect <url>` | GraphQL schema discovery via introspection query |
| `hermai detect <url>` | Identify anti-bot systems + platform/CMS |
| `hermai intercept <url>` | Launch browser, capture XHR/API calls, output replay specs |
| `hermai replay <req.json>` | Replay a captured request with TLS fingerprinting |
| `hermai discover <url>` | Full engine discovery (needs LLM key) |

Discovery pipeline

Work one interaction at a time. Don't stop after the homepage — that's the trap that produces 2-endpoint schemas.

**Phase 1: Classify the site.**

```bash
# What platform is it? (WordPress? Shopline? custom?)
hermai detect --stealth https://example.com

# What standard paths exist?
hermai wellknown example.com

# If GraphQL is found, introspect it (gets reads AND writes)
hermai introspect https://example.com/graphql
```

Use `script_hosts` and `preconnect_hosts` from detect to identify the platform. Known platforms (Shopify, Shopline, WordPress, Cyberbiz, WACA, EasyStore, etc.) have documented APIs — research them before hand-discovering. See [references/platforms.md](references/platforms.md) for a running list.
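
To make the host-matching step concrete, here is a small Python sketch. It assumes the detect output has been loaded into a dict whose `script_hosts` and `preconnect_hosts` values are lists of hostnames; that shape, the `PLATFORM_HINTS` table, and the `guess_platform` helper are all hypothetical, and the real list of platforms lives in references/platforms.md.

```python
from typing import Optional

# Illustrative hint table only: host fragment -> platform name.
# Extend it from references/platforms.md rather than trusting this sample.
PLATFORM_HINTS = {
    "cdn.shopify.com": "Shopify",
}


def guess_platform(detect_result: dict) -> Optional[str]:
    """Return the first platform whose hint appears in any detected host."""
    hosts = list(detect_result.get("script_hosts", []))
    hosts += list(detect_result.get("preconnect_hosts", []))
    for host in hosts:
        for fragment, platform in PLATFORM_HINTS.items():
            if fragment in host:
                return platform
    return None  # unknown platform: fall back to hand discovery


print(guess_platform({"script_hosts": ["cdn.shopify.com"], "preconnect_hosts": []}))
```

A `None` result only means the sample table has no match; it says nothing about whether the site runs on a known platform.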

**Phase 2: Cover each interaction.**

For each user interaction you listed above, discover its endpoint:

```bash
# Static listing pages (category, article index)
hermai probe --body --stealth https://example.com/categories/toys \
  | hermai extract

# Detail pages (product, article, profile)
hermai probe --body --stealth https://example.com/products/item-123 \
  | hermai extract

# Dynamic pages (search, filters, cart): intercept the real XHR
hermai intercept "https://example.com/search?q=gundam"

# Verify a captured XHR works standalone
hermai replay request.json --stealth
```

Intercept is the tool for anything that updates dynamically — search-as-you-type, filter sidebars, cart updates, infinite scroll. The HTML alone won't reveal the API endpoint; you need to watch the network.

**Phase 3: Write interactions the site supports.**

Read-only schemas are 50% of the value. Capture writes too — login, add-to-cart, submit review. This is where `intercept` shines: perform the action in the browser and capture the POST request. See [references/actions.md](references/actions.md) for how to document write operations.

The `extract` command recognizes 13 embedded patterns: `ytInitialData`, `ytInitialPlayerResponse`, `__NEXT_DATA__`, `__UNIVERSAL_DATA_FOR_REHYDRATION__`, `SIGI_STATE`, `__APOLLO_STATE__`, `__PRELOADED_STATE__`, `__remixContext`, `__NUXT__`, `__NUXT_DATA__`, `__FRONTITY_CONNECT_STATE__`, `__MODERN_ROUTER_DATA__`, `__INITIAL_STATE__`.
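
For intuition about what an embedded pattern is, the following Python sketch pulls one of the listed payloads, `__NEXT_DATA__`, out of raw HTML. It relies on the Next.js convention of a `<script id="__NEXT_DATA__">` tag holding JSON. It is not how `hermai extract` is implemented, and the `extract_next_data` helper is a name invented for this example.

```python
import json
import re
from typing import Optional

# Matches the JSON body of <script id="__NEXT_DATA__" ...>...</script>.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ payload, or None if the page has none."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"title": "Toy"}}}'
    "</script></body></html>"
)
data = extract_next_data(html)
print(data["props"]["pageProps"]["title"])
```

The other patterns follow the same idea with different markers, which is why a single pass over the HTML can report all of them.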

Writing descriptions (public vs. private split)

Read this every time before drafting a schema — this is the #1 pattern contributors get wrong, and it has a security angle.

The catalog publishes a public card (what the schema offers) and a full package (how to execute, paywalled behind an API key + intent). The split exists so the extraction recipe — parse paths, selectors, JSON script tag IDs, endpoint internals — stays behind the auth gate. If someone can read the recipe on the public card, they can re-implement the site themselves without ever calling our API, which defeats the point.

Top-level `description`: public, user-voice

One or two sentences describing what information the caller can get, in the voice of a user deciding whether to use the schema.

Good:

"Search public repositories, get repository details, and list of users' public repos, etc." — github.com

"Read public Threads profiles and posts. Pulls a profile's display name, bio, follower and thread counts, plus every post in a thread with their text, images, timestamps, and like counts." — threads.com

Bad:

~~"A single GET to /@{user}/post/{id} returns the full thread chain inside <script type="application/json" data-sjs> blocks..."~~

~~"Use `hermai probe --stealth` then pipe to `hermai extract` to get the embedded `__NEXT_DATA__` payload..."~~

Sanity check: "would this sentence still make sense to someone who has never used the CLI?" If not, rewrite.

Per-endpoint fields: `purpose` vs. `description`

Every endpoint carries two separate fields:

`purpose`: public, shown on the catalog card. One sentence, user-voice, names the data this endpoint returns. No URL paths, no jq, no CLI commands.
`description`: private, only in the full package (requires API key + intent). The full technical how-to: URL template notes, parse paths, selectors, regex, JSON script tag IDs, field names, pagination semantics, edge cases. Write it for an agent that just pulled the schema and needs to execute.

Example:

```json
{
  "name": "post_detail",
  "method": "GET",
  "url_template": "https://www.threads.com/@{username}/post/{post_id}",
  "purpose": "Get a post's full thread chain and every user reply, with text, images, timestamps, and like counts.",
  "description": "Select every <script type=\"application/json\" data-sjs>…</script> in the HTML, JSON.parse each body, and recursively walk every object looking for the key `thread_items`. Each `thread_items[n].post` is a full post object with: `code` (shortcode = URL slug), `caption.text` (body markdown), `user.username`, `user.pk`, `like_count`, `taken_at`, `carousel_media[]`, `canonical_url`, `reply_facepile_users`. Filter by `user.username == {username}` to drop the 3–5 unrelated 'also on Threads' rail posts."
}
```
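
To show how an agent might act on a private `description` like the one in this example, here is a hedged Python sketch of the recipe it describes: select the `data-sjs` script bodies, parse each as JSON, walk recursively for `thread_items`, and filter by username. The helper names `find_key` and `posts_by` and the sample HTML are invented for illustration, and the regex assumes the attribute order shown in the example.

```python
import json
import re

# Bodies of <script type="application/json" ... data-sjs>...</script> blocks.
SJS_RE = re.compile(
    r'<script type="application/json"[^>]*\bdata-sjs\b[^>]*>(.*?)</script>',
    re.DOTALL,
)


def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def posts_by(html: str, username: str) -> list:
    """Collect post objects authored by `username` from all thread_items lists."""
    posts = []
    for body in SJS_RE.findall(html):
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        for items in find_key(data, "thread_items"):
            for item in items:
                post = item.get("post", {})
                if post.get("user", {}).get("username") == username:
                    posts.append(post)
    return posts


sample = (
    '<script type="application/json" data-sjs>'
    '{"a": [{"thread_items": ['
    '{"post": {"code": "X1", "user": {"username": "alice"}}},'
    '{"post": {"code": "X2", "user": {"username": "bob"}}}]}]}'
    "</script>"
)
codes = [post["code"] for post in posts_by(sample, "alice")]
print(codes)
```

Every detail in this sketch (tag attributes, key names, the filter) is exactly the kind of material that belongs in `description` and never in `purpose`.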

Rule of thumb: if a line mentions a specific JSON key, script tag id, regex, or jq path, it goes in `description`. If it says what you can learn or do, it goes in `purpose`.
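
The rule of thumb can be approximated mechanically. The Python sketch below flags public text (`purpose` or the top-level `description`) that looks like it leaks recipe details. The marker list and the `leaks_recipe` helper are assumptions made for this example, not registry validation rules, so treat a clean result as a hint rather than a guarantee.

```python
import re

# Heuristic signs that a public sentence contains extraction-recipe details.
RECIPE_MARKERS = [
    re.compile(r"https?://"),                      # full URLs
    re.compile(r"(^|\s)/[\w@{]"),                  # URL paths such as /@{user}/post/{id}
    re.compile(r"`"),                              # inline code: keys, selectors
    re.compile(r"<script|\bjq\b|\bhermai\s+\w+"),  # tags, jq, CLI commands
    re.compile(r"__\w+__"),                        # names like __NEXT_DATA__
]


def leaks_recipe(text: str) -> bool:
    """Return True if any recipe marker appears in the public text."""
    return any(pattern.search(text) for pattern in RECIPE_MARKERS)


good = ("Get a post's full thread chain and every user reply, "
        "with text, images, timestamps, and like counts.")
bad = "A single GET to /@{user}/post/{id} returns the full thread chain."
print(leaks_recipe(good), leaks_recipe(bad))
```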

Authentication

Pushing requires an API key:

```bash
hermai registry login
```

Keys come from https://hermai.ai/dashboard (GitHub sign-in).

Push

```bash
# CLI
hermai registry push schema.json

# API
curl -X POST https://api.hermai.ai/v1/schemas \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d @schema.json
```

Pushing identical content twice is idempotent (content-hashed). If the push is rejected, see [references/troubleshooting.md](references/troubleshooting.md) for error codes.
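
One way to picture content-hashed idempotency is the Python sketch below: serialize the schema canonically, hash it, and treat equal hashes as the same push. The registry's actual canonicalization and hash function are not documented here, so the SHA-256 over sorted-key JSON and the `content_hash` helper are assumptions for illustration only.

```python
import hashlib
import json


def content_hash(schema: dict) -> str:
    """Hash a schema so that key order and whitespace do not change the result."""
    canonical = json.dumps(
        schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"name": "post_detail", "method": "GET"}
second = {"method": "GET", "name": "post_detail"}  # same content, different order
changed = {"name": "post_detail", "method": "POST"}

print(content_hash(first) == content_hash(second))
print(content_hash(first) == content_hash(changed))
```

Under a scheme like this, re-running a push after a network error is safe, while any real change to the schema produces a new hash.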

Staying up to date

On every call to `api.hermai.ai` (and on every `hermai registry push` or `pull` that reaches it), send two headers so Hermai can notify the user when a newer release ships:

```
X-Hermai-Skill-Name: hermai-contribute
X-Hermai-Skill-Version: 1.0.0
```

Read the version from this file's frontmatter; don't hardcode it. The `name` stays `hermai-contribute`.
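
A minimal Python sketch of building those two headers without hardcoding the version follows. It assumes the skill file opens with a `---` delimited frontmatter block containing a `version:` line; that layout and the `skill_headers` helper are assumptions for this example.

```python
import re


def skill_headers(skill_file_text: str) -> dict:
    """Build the two update-check headers from a skill file's frontmatter."""
    block = re.search(r"^---\n(.*?)\n---", skill_file_text, re.DOTALL)
    if block is None:
        raise ValueError("no frontmatter block found")
    version = re.search(
        r"^version:\s*[\"']?([^\"'\n]+)", block.group(1), re.MULTILINE
    )
    if version is None:
        raise ValueError("frontmatter has no version field")
    return {
        "X-Hermai-Skill-Name": "hermai-contribute",  # the name is fixed
        "X-Hermai-Skill-Version": version.group(1).strip(),
    }


text = "---\nname: hermai-contribute\nversion: 1.0.0\n---\nBody of the skill file\n"
headers = skill_headers(text)
print(headers)
```

Merge the returned dict into the headers of whatever HTTP client makes the request, alongside `Authorization`.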

If a newer release exists, the response JSON will carry a `meta.skill_update` object:

```json
{
  "meta": {
    "skill_update": {
      "current": "1.0.0",
      "latest": "1.2.0",
      "update_cmd": "npx skills update hermai-contribute",
      "changelog_url": "https://github.com/hermai-ai/hermai-skills/releases"
    }
  }
}
```

When you see this field, tell the user in one short sentence before carrying on with the task, e.g. "Heads up: your hermai-contribute skill (1.0.0) is behind 1.2.0. Run `npx skills update hermai-contribute` to upgrade." If `meta.skill_update` is absent, the user is current, so no nudge is needed.
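
The handling described above fits in a few lines of Python. This sketch takes an already-parsed response body and returns the one-sentence nudge, or `None` when `meta.skill_update` is absent; the `update_nudge` helper name is hypothetical.

```python
from typing import Optional


def update_nudge(response: dict) -> Optional[str]:
    """Return a short upgrade message if the response carries meta.skill_update."""
    update = (response.get("meta") or {}).get("skill_update")
    if not update:
        return None  # user is current: stay silent
    return (
        f"Heads up: your hermai-contribute skill ({update['current']}) "
        f"is behind {update['latest']}. Run `{update['update_cmd']}` to upgrade."
    )


response = {
    "meta": {
        "skill_update": {
            "current": "1.0.0",
            "latest": "1.2.0",
            "update_cmd": "npx skills update hermai-contribute",
            "changelog_url": "https://github.com/hermai-ai/hermai-skills/releases",
        }
    }
}
message = update_nudge(response)
print(message)
```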

References

references/coverage-checklist.md — read first. Interaction checklist by site type (commerce, news, social, booking). Use to decide when a schema is complete.
references/platforms.md — known platforms (Shopify, Shopline, WordPress, WACA, etc.) with their URL patterns and API conventions. Look up the platform before hand-discovering.
references/actions.md — discovering and documenting write operations (login, add-to-cart, submit review). CSRF, credentials, idempotency.
references/schema-format.md — v0.1 JSON spec, required fields, categories, description rules
references/sessions.md — session block for anti-bot sites, allowed/forbidden fields, name patterns
references/troubleshooting.md — validation error codes and fixes
references/install.md — CLI binary downloads per platform
