hermai-contribute
Hermai: Contribute schemas to the registry
Contributing a schema means reverse-engineering a site once so every future agent can call it without scraping.
For calling already-registered sites, use the hermai skill.
Before you run any command
Hermai is the interaction layer for agents, not just a read directory. A good schema covers what a user does on the site — browse, search, view, add to cart, log in, post — not just what's on the homepage.
Before running any discovery command, write down the interactions a user performs on this site. For a shop that's typically:
- Browse catalog / category listings
- Search by keyword
- View product detail
- Add to cart / view cart / update cart
- Log in / register / view account
- Checkout / place order
- Write review / view reviews
For news: list articles, view article, search articles, subscribe. For social: profile, posts, comments, follow, like. Different site type, different interactions.
A schema with only `product_detail` is 10% done. See references/coverage-checklist.md before declaring a schema complete.
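The interaction list can double as a coverage check once a draft schema exists. The sketch below is only an illustration: it assumes the schema keeps its endpoints in a top-level `endpoints` list and that each endpoint has a `name`. The checklist names are hypothetical and should be replaced by the interactions you wrote down.

```python
# Hypothetical checklist for a shop; adapt it to the interactions you listed.
SHOP_INTERACTIONS = {
    "category_listing", "search", "product_detail",
    "add_to_cart", "view_cart", "login", "checkout", "reviews",
}


def missing_interactions(schema: dict, expected: set) -> list:
    """Return the expected interaction names that no endpoint covers.

    Assumes a top-level "endpoints" list whose items carry a "name";
    both are assumptions of this sketch, not the documented schema format.
    """
    covered = {ep.get("name") for ep in schema.get("endpoints", [])}
    return sorted(expected - covered)


# A draft that only covers the product page leaves most of the site uncovered.
draft = {"endpoints": [{"name": "product_detail"}]}
gaps = missing_interactions(draft, SHOP_INTERACTIONS)
print(f"{len(gaps)} of {len(SHOP_INTERACTIONS)} interactions uncovered: {gaps}")
```

An empty result means every listed interaction has an endpoint. It says nothing about whether those endpoints actually work.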
Quick start
```bash
# 1. Probe the site to see what platform it uses
hermai probe --stealth https://example.com

# 2. Extract data from each interaction URL (product, category, search, etc.)
hermai probe --body --stealth https://example.com/path | hermai extract

# 3. Use intercept to capture XHR calls for dynamic pages (search, cart, filters)
hermai intercept "https://example.com/search?q=test"

# 4. Build a schema JSON with ALL interactions covered, then push
hermai registry push schema.json
```
CLI discovery commands
These are deterministic tools — no API key needed. Chain them like an agent would.
| Command | Purpose |
|---|---|
| `hermai probe` | TLS-fingerprinted fetch, anti-bot detection, strategy discovery |
| `hermai probe --body` | Raw HTML to stdout (pipe to extract) |
| `hermai extract` | Extract all embedded data patterns from HTML |
| | Extract one specific pattern |
| `hermai wellknown` | Probe 15 standard paths (robots, sitemap, RSS, GraphQL, oEmbed, WP API) |
| `hermai introspect` | GraphQL schema discovery via introspection query |
| `hermai detect` | Identify anti-bot systems + platform/CMS |
| `hermai intercept` | Launch browser, capture XHR/API calls, output replay specs |
| `hermai replay` | Replay a captured request with TLS fingerprinting |
| | Full engine discovery (needs LLM key) |
Discovery pipeline
Work one interaction at a time. Don't stop after the homepage — that's the trap that produces 2-endpoint schemas.
**Phase 1: Classify the site.**
```bash
# What platform is it? (WordPress? Shopline? custom?)
hermai detect --stealth https://example.com

# What standard paths exist?
hermai wellknown example.com

# If GraphQL is found, introspect it (gets reads AND writes)
hermai introspect https://example.com/graphql
```
Use `script_hosts` and `preconnect_hosts` from detect to identify the platform. Known platforms (Shopify, Shopline, WordPress, Cyberbiz, WACA, EasyStore, etc.) have documented APIs — research them before hand-discovering. See [references/platforms.md](references/platforms.md) for a running list.
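As a rough illustration of that lookup, the sketch below matches hosts against a small hint table. It assumes the `detect` output is a JSON object whose `script_hosts` and `preconnect_hosts` fields are lists of hostnames; the exact output shape is not documented here. The two hints are illustrative examples only, and references/platforms.md remains the real list.

```python
# Illustrative hints only; extend from references/platforms.md.
PLATFORM_HINTS = {
    "cdn.shopify.com": "Shopify",
    "s.w.org": "WordPress",
}


def guess_platforms(detect_output: dict) -> set:
    """Map known asset hosts to platform names.

    Assumes "script_hosts" and "preconnect_hosts" are lists of hostnames
    (an assumption about the detect output, not a documented contract).
    """
    hosts = list(detect_output.get("script_hosts", []))
    hosts += list(detect_output.get("preconnect_hosts", []))
    found = set()
    for host in hosts:
        for known_host, platform in PLATFORM_HINTS.items():
            # Match the host itself or any subdomain of it.
            if host == known_host or host.endswith("." + known_host):
                found.add(platform)
    return found


sample = {
    "script_hosts": ["cdn.shopify.com", "www.googletagmanager.com"],
    "preconnect_hosts": ["fonts.gstatic.com"],
}
print(guess_platforms(sample))
```

An empty set suggests a custom or unlisted platform, which is the case where hand-discovery is worth the effort.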
**Phase 2: Cover each interaction.**
For each user interaction you listed above, discover its endpoint:
```bash
# Static listing pages (category, article index)
hermai probe --body --stealth https://example.com/categories/toys \
  | hermai extract

# Detail pages (product, article, profile)
hermai probe --body --stealth https://example.com/products/item-123 \
  | hermai extract

# Dynamic pages (search, filters, cart): intercept the real XHR
hermai intercept "https://example.com/search?q=gundam"

# Verify a captured XHR works standalone
hermai replay request.json --stealth
```
Intercept is the tool for anything that updates dynamically — search-as-you-type, filter sidebars, cart updates, infinite scroll. The HTML alone won't reveal the API endpoint; you need to watch the network.
**Phase 3: Write interactions the site supports.**
Read-only schemas are 50% of the value. Capture writes too — login, add-to-cart, submit review. This is where `intercept` shines: perform the action in the browser and capture the POST request. See [references/actions.md](references/actions.md) for how to document write operations.
The `extract` command recognizes 13 embedded patterns: `ytInitialData`, `ytInitialPlayerResponse`, `__NEXT_DATA__`, `__UNIVERSAL_DATA_FOR_REHYDRATION__`, `SIGI_STATE`, `__APOLLO_STATE__`, `__PRELOADED_STATE__`, `__remixContext`, `__NUXT__`, `__NUXT_DATA__`, `__FRONTITY_CONNECT_STATE__`, `__MODERN_ROUTER_DATA__`, `__INITIAL_STATE__`.
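To show what an "embedded pattern" looks like in practice, here is a minimal sketch for one of them, `__NEXT_DATA__`, which Next.js pages ship as a JSON `<script>` tag with that id. This is not how `hermai extract` is implemented. The regex and the function name `extract_next_data` are assumptions made for illustration, and a real extractor would use an HTML parser.

```python
import json
import re

# Matches <script id="__NEXT_DATA__" ...>{json}</script>; a regex is enough
# for a sketch, but it is more fragile than a proper HTML parser.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str):
    """Return the parsed __NEXT_DATA__ payload, or None if the page has none."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


page = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"title": "Gundam"}}}'
    "</script></body></html>"
)
data = extract_next_data(page)
print(data["props"]["pageProps"]["title"])
```

The parse path used here (`props.pageProps.title`) is the kind of detail that belongs in an endpoint's private field, not on the public card.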
Writing descriptions (public vs. private split)
Read this every time before drafting a schema — this is the #1 pattern contributors get wrong, and it has a security angle.
The catalog publishes a public card (what the schema offers) and a full package (how to execute, paywalled behind an API key + intent). The split exists so the extraction recipe — parse paths, selectors, JSON script tag IDs, endpoint internals — stays behind the auth gate. If someone can read the recipe on the public card, they can re-implement the site themselves without ever calling our API, which defeats the point.
Top-level `description`: public, user-voice
One or two sentences describing what information the caller can get, in the voice of a user deciding whether to use the schema.
Good:
"Search public repositories, get repository details, and list of users' public repos, etc." — github.com
"Read public Threads profiles and posts. Pulls a profile's display name, bio, follower and thread counts, plus every post in a thread with their text, images, timestamps, and like counts." — threads.com
Bad:
"A single GET to /@{user}/post/{id} returns the full thread chain inside <script type="application/json" data-sjs> blocks..."
"Use `hermai probe --stealth` then pipe to `hermai extract` to get the embedded `__NEXT_DATA__` payload..."
Sanity check: "would this sentence still make sense to someone who has never used the CLI?" If not, rewrite.
Per-endpoint fields: `purpose` vs. `description`
Every endpoint carries two separate fields:
- `purpose`: public, shown on the catalog card. One sentence, user-voice, names the data this endpoint returns. No URL paths, no jq, no CLI commands.
- `description`: private, only in the full package (requires API key + intent). The full technical how-to: URL template notes, parse paths, selectors, regex, JSON script tag IDs, field names, pagination semantics, edge cases. Write it for an agent that just pulled the schema and needs to execute.
Example:
```json
{
  "name": "post_detail",
  "method": "GET",
  "url_template": "https://www.threads.com/@{username}/post/{post_id}",
  "purpose": "Get a post's full thread chain and every user reply, with text, images, timestamps, and like counts.",
  "description": "Select every <script type=\"application/json\" data-sjs>…</script> in the HTML, JSON.parse each body, and recursively walk every object looking for the key `thread_items`. Each `thread_items[n].post` is a full post object with: `code` (shortcode = URL slug), `caption.text` (body markdown), `user.username`, `user.pk`, `like_count`, `taken_at`, `carousel_media[]`, `canonical_url`, `reply_facepile_users`. Filter by `user.username == {username}` to drop the 3–5 unrelated 'also on Threads' rail posts."
}
```
Rule of thumb: if a line mentions a specific JSON key, script tag id, regex, or jq path, it goes in `description`. If it says what you can learn or do, it goes in `purpose`.
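The rule of thumb can be turned into a rough pre-push lint. The sketch below is a heuristic only and is not part of the Hermai CLI or registry validation. The pattern set and the name `purpose_leaks` are assumptions, and it will miss leaks that don't look like URLs, commands, or markup.

```python
import re

# Heuristic markers of "recipe" detail that belongs in the private field.
LEAK_PATTERNS = {
    "url_or_path": re.compile(r"https?://|(?<!\w)/[\w@{]"),
    "cli_command": re.compile(r"\bhermai\s+\w+|\bjq\b|\bcurl\b"),
    "markup_or_key": re.compile(r"<script|`|__\w+__"),
}


def purpose_leaks(purpose: str) -> list:
    """Return the names of the leak patterns found in a public purpose string."""
    return [name for name, pattern in LEAK_PATTERNS.items() if pattern.search(purpose)]


good = ("Get a post's full thread chain and every user reply, "
        "with text, images, timestamps, and like counts.")
bad = ('A single GET to /@{user}/post/{id} returns the full thread chain '
       'inside <script type="application/json" data-sjs> blocks')

print(purpose_leaks(good))
print(purpose_leaks(bad))
```

A non-empty result is a prompt to move that sentence into the private field. The sanity check about readers who have never used the CLI still needs a human or agent judgment.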
Authentication
Pushing requires an API key:
```bash
hermai registry login
```
Keys come from https://hermai.ai/dashboard (GitHub sign-in).
Push
```bash
# CLI
hermai registry push schema.json

# API
curl -X POST https://api.hermai.ai/v1/schemas \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d @schema.json
```
Pushing identical content twice is idempotent (content-hashed). If the push is rejected, see [references/troubleshooting.md](references/troubleshooting.md) for error codes.
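Content hashing is what makes a repeated push a no-op: the same content always maps to the same digest. The registry's actual hash function and canonicalization are not documented here, so the sketch below only illustrates the idea. SHA-256 over key-sorted compact JSON is an assumption, and `content_hash` is a hypothetical helper.

```python
import hashlib
import json


def content_hash(schema: dict) -> str:
    """Digest of a canonical JSON form, so key order and whitespace don't matter.

    SHA-256 and this canonicalization are assumptions of the sketch,
    not the registry's documented scheme.
    """
    canonical = json.dumps(
        schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"name": "post_detail", "method": "GET"}
reordered = {"method": "GET", "name": "post_detail"}
edited = {"name": "post_detail", "method": "POST"}

print(content_hash(first) == content_hash(reordered))
print(content_hash(first) == content_hash(edited))
```

A local digest like this can also be stored next to `schema.json` to skip a push when nothing has changed.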
Staying up to date
On every call to `api.hermai.ai` (and on every `hermai registry push` or `pull` that reaches it), send two headers so Hermai can notify the user when a newer release ships:
```
X-Hermai-Skill-Name: hermai-contribute
X-Hermai-Skill-Version: 1.0.0
```
Read the version from this file's frontmatter; don't hardcode it. The `name` stays `hermai-contribute`.
If a newer release exists, the response JSON will carry a `meta.skill_update` object:
```json
{
  "meta": {
    "skill_update": {
      "current": "1.0.0",
      "latest": "1.2.0",
      "update_cmd": "npx skills update hermai-contribute",
      "changelog_url": "https://github.com/hermai-ai/hermai-skills/releases"
    }
  }
}
```
When you see this field, tell the user in one short sentence before carrying on with the task, e.g. "Heads up: your hermai-contribute skill (1.0.0) is behind 1.2.0. Run `npx skills update hermai-contribute` to upgrade." If `meta.skill_update` is absent, the user is current and no nudge is needed.
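A minimal sketch of both halves of this flow is shown below: building the two headers and turning a response into the one-sentence nudge. The helper names `skill_headers` and `update_nudge` are hypothetical, and the response is assumed to be already parsed into a dict. How the version is read from the frontmatter is left out.

```python
from typing import Optional


def skill_headers(version: str) -> dict:
    """Headers to attach to every api.hermai.ai call; pass the frontmatter version."""
    return {
        "X-Hermai-Skill-Name": "hermai-contribute",
        "X-Hermai-Skill-Version": version,
    }


def update_nudge(response_json: dict) -> Optional[str]:
    """Return the one-sentence upgrade hint, or None when the skill is current."""
    update = (response_json.get("meta") or {}).get("skill_update")
    if not update:
        return None
    return (
        f"Heads up: your hermai-contribute skill ({update['current']}) "
        f"is behind {update['latest']}. Run `{update['update_cmd']}` to upgrade."
    )


response = {
    "meta": {
        "skill_update": {
            "current": "1.0.0",
            "latest": "1.2.0",
            "update_cmd": "npx skills update hermai-contribute",
            "changelog_url": "https://github.com/hermai-ai/hermai-skills/releases",
        }
    }
}
print(update_nudge(response))
```

Returning `None` for a missing or empty `meta` keeps the caller's logic to a single check before carrying on with the task.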
References
- references/coverage-checklist.md — read first. Interaction checklist by site type (commerce, news, social, booking). Use to decide when a schema is complete.
- references/platforms.md — known platforms (Shopify, Shopline, WordPress, WACA, etc.) with their URL patterns and API conventions. Look up the platform before hand-discovering.
- references/actions.md — discovering and documenting write operations (login, add-to-cart, submit review). CSRF, credentials, idempotency.
- references/schema-format.md — v0.1 JSON spec, required fields, categories, description rules
- references/sessions.md — session block for anti-bot sites, allowed/forbidden fields, name patterns
- references/troubleshooting.md — validation error codes and fixes
- references/install.md — CLI binary downloads per platform