web-download
Web Download
Overview
Conduct web research for each node in node-list.txt, then collect and save verifiable, traceable reference materials. Multiple sub-agents work in parallel, each responsible for collecting materials for one or more nodes.
Workflow
1. Configure Parameters (Mandatory Before Starting)
Use the AskUserQuestion tool to ask the user for configuration parameters so that the API call frequency stays reasonable:
Question: How many sub-agents should run in parallel?
Options:
- 1: Most conservative, suitable for resource-constrained scenarios
- 2: Default recommendation, balances efficiency and stability
- 3: Moderate, suitable for scenarios with many nodes
- Agent decides: adjusts automatically to the number of nodes (max 3)
Question: What is the maximum number of Web Search attempts per node?
Options:
- 1: Quickly collect basic information
- 2: Default recommendation, balances coverage and efficiency
- 3: In-depth collection, suitable for important nodes
Question: What is the maximum number of Web Fetch reads per search?
Options:
- 1: Read only the most relevant result
- 2: Read the top 2 relevant results
- 3: Default recommendation, fully covers the search results
Question: What is the maximum number of webpages/documents to save per search?
Options:
- 1: Save only the most relevant material
- 2: Save the top 2 relevant materials
- 3: Default recommendation, ensures material diversity
Default Configuration (to avoid excessive API calls):
- Number of sub-agents: Max 2
- Search attempts per node: Max 2
- Web Fetch attempts per search: Max 3
- Save attempts per search: Max 3
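The four limits above can be held in one small object so that every later step reads the same values. The following is a minimal sketch, not part of the skill itself: the class name `ResearchConfig` and its field names are hypothetical, and the 1 to 3 range check simply mirrors the options offered in the questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchConfig:
    """Hypothetical container for the four limits; defaults follow the default configuration."""
    sub_agents: int = 2
    searches_per_node: int = 2
    fetches_per_search: int = 3
    saves_per_search: int = 3

    def __post_init__(self):
        # Every question offers the choices 1, 2, or 3, so reject anything else.
        for name in ("sub_agents", "searches_per_node",
                     "fetches_per_search", "saves_per_search"):
            value = getattr(self, name)
            if not 1 <= value <= 3:
                raise ValueError(f"{name} must be between 1 and 3, got {value}")

    def max_fetches_per_node(self) -> int:
        """Upper bound on Web Fetch calls spent on a single node."""
        return self.searches_per_node * self.fetches_per_search


config = ResearchConfig()
print(config.max_fetches_per_node())
```

With the defaults, a node costs at most 2 searches and 6 fetches, which makes the API budget easy to estimate before launching any sub-agent.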
2. Read Node List
Read the list of nodes to process from node-list.txt.
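Reading the list can be sketched as follows. This assumes node-list.txt holds one node name per line, which the document implies but does not state; the function names are hypothetical. Line numbers are kept because the file naming rules later tie each serial number to a line number in node-list.txt.

```python
from pathlib import Path


def parse_nodes(text):
    """Return (line_number, node_name) pairs, skipping blank lines."""
    nodes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        name = line.strip()
        if name:
            nodes.append((line_number, name))
    return nodes


def read_nodes(path="node-list.txt"):
    """Read the node list from disk (assumed to be UTF-8, one node per line)."""
    return parse_nodes(Path(path).read_text(encoding="utf-8"))
```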
3. Parallel Research Strategy
Launch sub-agents according to the user configuration (execute in parallel using the Task tool):
- Strictly limit the number of sub-agents to the user-specified value
- Each sub-agent typically handles 1-2 nodes, and more when the node list is longer than the sub-agent limit allows
- Distribute the node list evenly among sub-agents
Example Allocation (6 nodes, 2 sub-agents):
Sub-agent 1: Node 1, Node 2, Node 3
Sub-agent 2: Node 4, Node 5, Node 6
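The even distribution in the example allocation can be sketched as a contiguous split. The function name `allocate_nodes` is hypothetical, and giving the earlier sub-agents the extra node when the list does not divide evenly is an assumption, since the document only shows the evenly divisible case.

```python
def allocate_nodes(nodes, max_agents):
    """Split nodes into contiguous groups, one per sub-agent, sizes differing by at most 1."""
    if max_agents < 1:
        raise ValueError("max_agents must be at least 1")
    # Never launch more sub-agents than there are nodes.
    agent_count = min(max_agents, len(nodes))
    if agent_count == 0:
        return []
    base, extra = divmod(len(nodes), agent_count)
    groups, start = [], 0
    for index in range(agent_count):
        size = base + (1 if index < extra else 0)
        groups.append(nodes[start:start + size])
        start += size
    return groups


nodes = [f"Node {i}" for i in range(1, 7)]
for number, group in enumerate(allocate_nodes(nodes, 2), start=1):
    print(f"Sub-agent {number}: {', '.join(group)}")
```

Keeping each group contiguous preserves the order of node-list.txt, so the node indexes used in file names stay easy to trace back.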
4. Deep Retrieval Method (Strict Restrictions)
Search Strategy (strictly limited by the configured search attempts):
- Run at most the user-configured number of Web Search attempts per node
- Prefer different keyword combinations to obtain diverse results
- Include both Chinese and English searches (within the attempt limit)
Search Keyword Construction (choose within the attempt limit):
1st search: "{node name}"
2nd search: "{node name} 原理 教程" (Chinese for "principle tutorial") or "{node name} guide"
Web Fetch Restrictions:
- Run at most the user-configured number of Web Fetch attempts per search
- Prioritize official documents and authoritative sources
- Skip duplicate or low-quality URLs
Save Restrictions:
- Save at most the user-configured number of webpages per search
- Prioritize materials that are complete and rich in content
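The keyword construction and the fetch limit above can be sketched as two small helpers. Both function names are hypothetical, and using the "guide" variant as a third query is an assumption, since the document only lists it as an alternative for the second search. Duplicate detection here only ignores a trailing slash, which is a deliberately crude stand-in for real URL normalization.

```python
def build_queries(node_name, max_searches):
    """Return at most max_searches queries, mixing Chinese and English keywords."""
    candidates = [
        node_name,
        f"{node_name} 原理 教程",  # Chinese keywords: "principle tutorial"
        f"{node_name} guide",
    ]
    return candidates[:max_searches]


def pick_urls(urls, max_fetches):
    """Keep the first max_fetches URLs in ranked order, skipping duplicates."""
    seen, picked = set(), []
    for url in urls:
        if len(picked) >= max_fetches:
            break
        key = url.rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        picked.append(url)
    return picked


print(build_queries("React Hooks", 2))
print(pick_urls(["https://a.example/x", "https://a.example/x/", "https://b.example"], 3))
```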
5. Material Collection and Saving
Target Material Types:
- Technical documents and official guides
- Academic papers and research reports
- Technical blogs and tutorials
- Practical cases and code examples
Saving Rules:
- Create a materials/ directory to store all materials
- Use the web_reader tool to obtain complete webpage content
- Save each material as an independent file, naming format: {node index}_{source identifier}.{ext}
- Supported file formats:
  - .md: Markdown content
  - .txt: Plain text content
  - .json: Structured data
6. Output Format
Create a download.txt file:
Node 1 content: {node1_material1.md: source URL1}, {node1_material2.md: source URL2}
Node 2 content: {node2_material1.md: source URL1}, {node2_material2.md: source URL2}
...
File Naming Specification:
- Use the format {serial number}_{brief description}.{extension}
- The serial number corresponds to the line number in node-list.txt
- The brief description reflects the material's topic
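The naming format and the download.txt line layout can be sketched together. The helper names are hypothetical, and the lowercase-and-underscore rule for the brief description is an assumption inferred from file names such as 1_hooks_intro.md elsewhere in this document.

```python
import re

SUPPORTED_EXTENSIONS = {"md", "txt", "json"}


def material_filename(serial, description, extension="md"):
    """Build {serial number}_{brief description}.{extension}."""
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension}")
    # Assumed convention: lowercase ASCII words joined by underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{serial}_{slug}.{extension}"


def format_download_line(node_name, materials):
    """Render one download.txt line from (filename, source_url) pairs."""
    entries = ", ".join(f"{{{filename}: {url}}}" for filename, url in materials)
    return f"{node_name}: {entries}"


name = material_filename(1, "Hooks Intro")
print(format_download_line("Introduction to React Hooks",
                           [(name, "https://react.dev/learn")]))
```

Writing each source URL next to its file name is what keeps the saved materials traceable, as the overview requires.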
Scripts
scripts/parallel_fetch.py
Parallel download tool that speeds up content retrieval from multiple URLs.
Features:
- Concurrent downloading of multiple webpages
- Automatic retries for failed requests
- Progress display and error reporting
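The actual contents of scripts/parallel_fetch.py are not shown in this document. The sketch below only illustrates how the three listed features could fit together using the Python standard library; every function name, parameter, and default here is an assumption. The `fetch` argument is injectable so the retry logic can be exercised without network access.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_url(url, timeout=10):
    """Download one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, fetch=fetch_url, retries=2, delay=0.5):
    """Call fetch(url), retrying up to `retries` extra times on any error."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as error:
            last_error = error
            if attempt < retries:
                time.sleep(delay)
    raise last_error


def parallel_fetch(urls, fetch=fetch_url, max_workers=2, retries=2, delay=0.5):
    """Fetch URLs concurrently; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetch_with_retry, url, fetch, retries, delay): url
            for url in urls
        }
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as error:
                errors[url] = str(error)
            print(f"[{done}/{len(futures)}] {url}")  # progress display
    return results, errors
```

Keeping `max_workers` small matches the document's concern about API call frequency; raising it speeds up collection at the cost of more simultaneous requests.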
scripts/validate_sources.py
Verifies material integrity and accessibility.
Features:
- Check the integrity of downloaded materials
- Verify URL accessibility
- Generate material quality reports
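The contents of scripts/validate_sources.py are likewise not shown here. As a rough sketch under stated assumptions, integrity is reduced to "the file exists and is not empty" and accessibility to "a HEAD request succeeds"; the function names and the report layout are hypothetical, and some servers reject HEAD requests, so a real check may need a GET fallback.

```python
import urllib.request
from pathlib import Path


def check_materials(directory="materials", min_bytes=1):
    """Group file names in the directory into 'ok' and 'empty' by size."""
    report = {"ok": [], "empty": []}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        key = "ok" if path.stat().st_size >= min_bytes else "empty"
        report[key].append(path.name)
    return report


def url_is_reachable(url, timeout=10):
    """Return True if a HEAD request to the URL completes without an error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers malformed URLs.
        return False
```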
Examples
Example: Node Research (Default Configuration)
User Configuration: 2 sub-agents, 2 searches per node, 3 fetches per search, save 3 materials per search
Input (node-list.txt):
Introduction to React Hooks
Docker Containerization Technology
Search Strategy (strictly restricted):
Node 1: Introduction to React Hooks
- Search 1: "React Hooks 入门教程" (Chinese for "React Hooks introduction tutorial")
  - Fetch: Official documents, technical blogs (max 3 attempts)
  - Save: Up to 3 most relevant materials
- Search 2: "React Hooks best practices"
  - Fetch: Articles related to best practices (max 3 attempts)
  - Save: Up to 3 most relevant materials
Output (download.txt):
Introduction to React Hooks: {1_hooks_intro.md: https://react.dev/learn}, {1_hooks_guide.md: https://www.runoob.com/reactjs/react-hooks.html}, {1_hooks_best_practices.md: https://blog.logrocket.com/guide-to-react-hooks/}
Docker Containerization Technology: {2_docker_intro.md: https://docs.docker.com/get-started/}, {2_docker_tutorial.md: https://yeasy.gitbook.io/docker_practice/}
Example: Quick Collection (Low-Configuration Mode)
User Configuration: 1 sub-agent, 1 search per node, 1 fetch per search, save 1 material
Applicable Scenarios: Quick verification, resource-constrained environments, process testing
Features:
- Minimizes API calls
- Completes collection quickly
- Basic but sufficient materials
Materials Directory Structure
materials/
├── 1_hooks_intro.md
├── 1_hooks_guide.md
├── 1_hooks_best_practices.md
├── 2_docker_intro.md
├── 2_docker_tutorial.md
├── 3_microservices_patterns.md
└── 3_microservices_guide.md
Troubleshooting
| Problem | Solution |
|---|---|
| No materials found for a node | Try different keywords to widen the search scope |
| Unable to retrieve webpage content | Use the web_reader tool to get the complete content |
| Poor material quality | Prioritize official documents and authoritative sources |
| Parallel request failure | Reduce concurrency and add a retry mechanism |
| Duplicate materials | Deduplicate and merge similar content |
Quality Standards
For each node, collect:
- At least 2-3 high-quality sources
- Materials that cover different perspectives (theory + practice)
- Sources in this priority order: Official documents > Authoritative tutorials > Technical blogs > Personal notes
- Recent materials: prefer those from the last 1-2 years in fast-moving technical fields
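The priority order and the recency preference can be expressed as a sort key. This is only an illustrative sketch: the `kind` and `year` fields and the four labels are hypothetical, since the document does not define how a source is classified.

```python
# Lower number means higher priority, following the order given above.
SOURCE_PRIORITY = {"official": 0, "tutorial": 1, "blog": 2, "notes": 3}


def rank_sources(sources):
    """Sort by source type first, then newest first within the same type."""
    unknown = len(SOURCE_PRIORITY)  # unrecognized kinds sort last
    return sorted(
        sources,
        key=lambda source: (SOURCE_PRIORITY.get(source["kind"], unknown), -source["year"]),
    )


candidates = [
    {"name": "personal notes", "kind": "notes", "year": 2024},
    {"name": "older blog post", "kind": "blog", "year": 2021},
    {"name": "newer blog post", "kind": "blog", "year": 2024},
    {"name": "official docs", "kind": "official", "year": 2023},
]
for source in rank_sources(candidates):
    print(source["name"])
```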