web-search Skill
When to Use
ALWAYS invoke this skill for any task involving:
- Web search
- Research or deep investigation
- Scraping or data extraction from websites
- Finding latest/current information
- Comparing options from web sources
- Any "find on the web" or "research X" request
Never perform web research using only transient memory or single-channel tools.
Core Principles (MUST Follow)
- Browser-First Anti-Bot Strategy: Prioritize human-like browser scraping using `agent-browser` (or `playwright-cli` via `browser-switch`) for rendered pages to avoid blocks.
- Parallel Retrieval: Launch multiple subagents/workers in parallel (Task tool when available), each with a distinct role:
  - Discovery: Prefer `searxng-search` / `searxng-extract` when installed; otherwise native web search, web fetch, Tavily, exa, or other available discovery tools.
  - Scraping: `agent-browser` (human-like navigation + snapshots); `human-search` cascade as fallback.
  - Synthesis: `deep-research` patterns toward `90_synthesis.md`.
  - One topic per research subagent: When the user asks about multiple distinct subjects, spawn separate subagents so each worker owns one topic only, each with its own `docs/research/YYYYMMDD/{research_topic_slug}/` folder. Do not assign unrelated questions to the same research subagent (avoids cross-contaminated context, checkpoints, and citations).
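The one-topic-per-subagent rule can be sketched as a shell loop that gives each topic its own folder and its own background worker. `research_topic` and the slugs below are hypothetical stand-ins for a real discovery subagent:

```shell
research_topic() {                     # stand-in for a real discovery worker
  echo "discovery notes for $1"
}

day="$(date +%Y%m%d)"
for slug in ai_coding_agents browser_automation; do  # hypothetical topic slugs
  dir="docs/research/${day}/${slug}"
  mkdir -p "$dir"                                    # one folder per topic
  research_topic "$slug" > "$dir/10_discovery.md" &  # one worker per topic
done
wait   # let all parallel workers finish
```

Because each worker writes only inside its own topic folder, context, checkpoints, and citations cannot cross-contaminate.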
- Immediate Persistence: After every single source (search result, URL extracted, page rendered), append a structured checkpoint to the numbered artifact for that phase (e.g., `10_discovery.md`, `20_sources.md`). Do not proceed to the next source until the checkpoint is written. Required fields, example block, and persistence edge cases → `references/checkpoint-template.md`.
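A minimal checkpoint append might look like the following; the field names are assumptions, since the authoritative schema lives in `references/checkpoint-template.md`:

```shell
dir="docs/research/20260430/ai_coding_agents"   # example dated folder from this doc
mkdir -p "$dir"

# Append one structured checkpoint immediately after fetching a source
cat >> "$dir/10_discovery.md" <<'EOF'
## checkpoint
- url: https://example.com/post
- fetched_at: 2026-04-30T12:00:00Z
- tool: searxng-search
- status: ok
EOF
```

Appending (rather than rewriting) keeps earlier checkpoints intact even if a later write is interrupted.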
- Dated Artifact Convention: Use `docs/research/YYYYMMDD/{research_topic_slug}/` — for example `docs/research/20260430/ai_coding_agents/`. The research topic is carried by `{research_topic_slug}`, not baked into repeated strings inside filenames. Under that folder reuse the same phase filenames every time so layouts stay predictable. Typical files:
  - `00_plan.md`
  - `10_discovery.md`
  - `20_sources.md` (checkpoints for fetched pages: browser, extract, etc.)
  - `90_synthesis.md`
  - `YYYYMMDD`: Windows (PowerShell) → `Get-Date -Format "yyyyMMdd"`; Linux/macOS (shell) → `date +%Y%m%d`; generic (any OS) → Python `datetime.date.today().strftime("%Y%m%d")`.
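Putting the convention together, a run could lay out today's folder like this (the topic slug is a hypothetical example):

```shell
slug="ai_coding_agents"                       # hypothetical topic slug
dir="docs/research/$(date +%Y%m%d)/${slug}"   # dated, topic-scoped folder

mkdir -p "$dir"
# Same phase filenames every run, so layouts stay predictable
touch "$dir/00_plan.md" "$dir/10_discovery.md" \
      "$dir/20_sources.md" "$dir/90_synthesis.md"
```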
- Synthesis from Disk Only: The final answer and `90_synthesis.md` must derive exclusively from persisted artifacts, never solely from conversation memory.
- Citations and Traceability: Follow `deep-research` patterns; citations must tie back to checkpoint entries and on-disk files.
Workflow checklist
Operational order for a research run. Tool install and orchestration roles → "See also — referenced skills"; checkpoint layout and fields → `references/checkpoint-template.md`:
- Receive research query
- Partition distinct topics → one folder each under `docs/research/YYYYMMDD/{research_topic_slug}/`
- Launch parallel subagents (Task tool): one subagent per topic for research workers; within each topic, parallelize by role (discovery / scraping / synthesis) as needed
- Instruct each parallel subagent to use the referenced web-scraping and research skills and to save dated research files and checkpoints after each finding to avoid losing context. Each subagent should follow this skill’s rules.
- After each source fetch: immediately append checkpoint
- On write failure: retry once then use fallback file
- After sufficient sources: run synthesis using deep-research patterns
- Produce final response with citations linking back to persisted files
- Confirm every expected artifact exists on disk and treat the run as incomplete until both synthesis and the final answer are grounded in those files (not conversation memory alone)
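The persistence steps in the checklist (append after each source fetch, retry once on write failure, then use a fallback file) can be sketched as a small shell helper. The path and entry format here are assumptions:

```shell
append_checkpoint() {
  file="$1"; entry="$2"
  mkdir -p "$(dirname "$file")"
  if ! printf '%s\n' "$entry" >> "$file"; then
    # retry once, then divert to a fallback file so the finding is never lost
    printf '%s\n' "$entry" >> "$file" ||
      printf '%s\n' "$entry" >> "${file%.md}_fallback.md"
  fi
}

append_checkpoint "docs/research/20260430/ai_coding_agents/20_sources.md" \
  "- url: https://example.com/a (fetched via agent-browser)"
```

Calling this after every source fetch makes the final synthesis reproducible from disk alone.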
See also — referenced skills (install + orchestration roles)
Install missing skills by copying each repo's skill folder into your agent's skills directory. `web-search` orchestrates these tools rather than replacing them:
- human-search: Primary intelligent cascade. Tiered fallback (native → Python scraper → browser CLI → crawl4ai); use as the default retrieval engine for robustness.
- agent-browser: Primary anti-bot scraping. Workflow: `open` → `snapshot -i` → extract using refs → re-snapshot after changes. Use named sessions for parallel work. Strongest human-like resilience.
- searxng-search + searxng-extract: Fast, free, unlimited local discovery and extraction (no API keys). Ideal for gathering initial URL candidates.
- deep-research: Final synthesis phase — structured reporting, citation management, progressive disclosure, professional formatting for `90_synthesis.md`.
- browser-switch: Picks agent-browser, playwright-cli, or other browser backends based on context.
- playwright-cli: Backup browser automation when agent-browser is unavailable or when specific Playwright features are needed.
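Installation can be as simple as a copy; the destination below is an assumption, so adjust it to wherever your agent actually loads skills from:

```shell
# Pretend we cloned a repo whose skill folder is ./human-search
mkdir -p human-search

dest="${SKILLS_DIR:-skills}"   # your agent's skills directory (path is an assumption)
mkdir -p "$dest"
cp -R human-search "$dest/"    # copy the whole skill folder, not just files inside it
```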
Tip: After installing missing skills, tell your subagents the skill paths to use; otherwise they might not discover the skills until the agent restarts.