web-search Skill

When to Use

ALWAYS invoke this skill for any task involving:
  • Web search
  • Research or deep investigation
  • Scraping or data extraction from websites
  • Finding latest/current information
  • Comparing options from web sources
  • Any "find on the web" or "research X" request
Never perform web research using only transient memory or single-channel tools.

Core Principles (MUST Follow)

  1. Browser-First Anti-Bot Strategy: Prioritize human-like browser scraping using agent-browser (or playwright-cli via browser-switch) for rendered pages to avoid blocks.
  2. Parallel Retrieval: Launch multiple subagents/workers in parallel (Task tool when available), each with a distinct role:
    • Discovery: Prefer searxng-search / searxng-extract when installed; otherwise native web search, web fetch, Tavily, exa, or other available discovery tools.
    • Scraping: agent-browser (human-like navigation + snapshots); human-search cascade as fallback.
    • Synthesis: deep-research patterns toward 90_synthesis.md.
    • One topic per research subagent: When the user asks about multiple distinct subjects, spawn separate subagents so each worker owns one topic only — each with its own docs/research/YYYYMMDD/{research_topic_slug}/ folder. Do not assign unrelated questions to the same research subagent (avoids cross-contaminated context, checkpoints, and citations).
  3. Immediate Persistence: After every single source (search result, URL extracted, page rendered), append a structured checkpoint to the numbered artifact for that phase (e.g. 10_discovery.md, 20_sources.md). Do not proceed to the next source until the checkpoint is written. Required fields, example block, and persistence edge cases → references/checkpoint-template.md.
  4. Dated Artifact Convention: Use docs/research/YYYYMMDD/{research_topic_slug}/ — for example docs/research/20260430/ai_coding_agents/. The research topic is carried by {research_topic_slug}, not baked into repeated strings inside filenames. Under that folder reuse the same phase filenames every time so layouts stay predictable. Typical files:
    • 00_plan.md
    • 10_discovery.md
    • 20_sources.md (checkpoints for fetched pages: browser, extract, etc.)
    • 90_synthesis.md
    • YYYYMMDD: Windows (PowerShell) → Get-Date -Format "yyyyMMdd"; Linux/macOS (shell) → date +%Y%m%d; generic (any OS) → Python datetime.date.today().strftime("%Y%m%d").
  5. Synthesis from Disk Only: The final answer and 90_synthesis.md must derive exclusively from persisted artifacts, never solely from conversation memory.
  6. Citations and Traceability: Follow deep-research patterns; citations must tie back to checkpoint entries and on-disk files.
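Principles 3 and 4 can be sketched in Python. The folder layout and phase filenames follow the convention above; the checkpoint fields shown are placeholders only — the authoritative field list lives in references/checkpoint-template.md:

```python
import datetime
from pathlib import Path

def research_dir(topic_slug: str) -> Path:
    """Build docs/research/YYYYMMDD/{research_topic_slug}/ for one topic."""
    day = datetime.date.today().strftime("%Y%m%d")  # generic, any OS
    root = Path("docs/research") / day / topic_slug
    root.mkdir(parents=True, exist_ok=True)
    return root

def append_checkpoint(phase_file: Path, entry: dict) -> None:
    """Append one structured checkpoint right after a source is handled."""
    lines = [f"- {key}: {value}" for key, value in entry.items()]
    with phase_file.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n\n")

# Record one fetched source before moving on to the next.
root = research_dir("ai_coding_agents")
append_checkpoint(root / "20_sources.md", {
    "url": "https://example.com/post",  # illustrative fields only; see
    "method": "agent-browser",          # references/checkpoint-template.md
    "status": "fetched",
})
```

A real worker would call append_checkpoint once per source, before fetching the next one, so a crash never loses more than the in-flight source.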

Workflow checklist

Operational order for a research run. Tool install and orchestration roles → See also — referenced skills; checkpoint layout and fields → references/checkpoint-template.md:
  • Receive research query
  • Partition distinct topics → one {research_topic_slug} folder each under docs/research/YYYYMMDD/
  • Launch parallel subagents (Task tool): one subagent per topic for research workers; within each topic, parallelize by role (discovery / scraping / synthesis) as needed
  • Instruct each parallel subagent to use the referenced web-scraping and research skills and to save dated research files and checkpoints after each finding to avoid losing context. Each subagent should follow this skill’s rules.
  • After each source fetch: immediately append checkpoint
  • On write failure: retry once then use fallback file
  • After sufficient sources: run synthesis using deep-research patterns
  • Produce final response with citations linking back to persisted files
  • Confirm every expected artifact exists on disk and treat the run as incomplete until both synthesis and the final answer are grounded in those files (not conversation memory alone)
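The "retry once then use fallback file" step can be sketched as follows; the ".fallback.md" naming is an assumption, since the checklist mandates a fallback file but not what to call it:

```python
from pathlib import Path

def persist_checkpoint(path: Path, text: str) -> Path:
    """Append checkpoint text; on failure retry once, then use a fallback file.

    Returns the path actually written, so citations can point at the real file.
    """
    # Try the intended file twice (initial attempt + one retry), then fall back.
    for target in (path, path, path.with_suffix(".fallback.md")):
        try:
            with target.open("a", encoding="utf-8") as f:
                f.write(text)
            return target
        except OSError:
            continue
    raise OSError(f"could not persist checkpoint near {path}")
```

Returning the written path matters for traceability: if the fallback file was used, the synthesis step must cite that file, not the one that failed.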

See also — referenced skills (install + orchestration roles)

Install missing skills by copying each repo’s skill folder into your agent’s skills directory. web-search orchestrates these tools rather than replacing them:
  • human-search: Primary intelligent cascade. Tiered fallback (native → Python scraper → browser CLI → crawl4ai); use as the default retrieval engine for robustness.
  • agent-browser: Primary anti-bot scraping. Workflow: open → snapshot -i → extract using refs → re-snapshot after changes. Use named sessions for parallel work. Strongest human-like resilience.
  • searxng-search + searxng-extract: Fast, free, unlimited local discovery and extraction (no API keys). Ideal for gathering initial URL candidates.
  • deep-research: Final synthesis phase — structured reporting, citation management, progressive disclosure, professional formatting for 90_synthesis.md.
  • browser-switch: Picks agent-browser, playwright-cli, or other browser backends based on context.
  • playwright-cli: Backup browser automation when agent-browser is unavailable or when specific Playwright features are needed.
Tip: After installing missing skills, tell your subagents the skill paths to use; otherwise they might not discover the skills until the agent restarts.
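The install step — copying a repo’s skill folder into the agent’s skills directory — can be sketched as below; both paths are placeholders to adjust for your agent’s actual layout:

```python
import shutil
from pathlib import Path

def install_skill(repo_skill_dir: str, agent_skills_dir: str) -> Path:
    """Copy one skill folder (e.g. a cloned repo's web-search/ directory)
    into the agent's skills directory, overwriting any stale copy."""
    src = Path(repo_skill_dir)
    dest = Path(agent_skills_dir) / src.name
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

For example, install_skill("repos/web-search/web-search", "~/.agent/skills") would place the folder where the agent scans for skills; the exact skills directory depends on your agent.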