parallel-findall
FindAll: Entity Discovery
Find: $ARGUMENTS
Requires `parallel-cli` ≥ 0.3.0 (the `findall` command was added in 0.3.0). If `parallel-cli findall` errors with `no such command` or similar, tell the user to run `parallel-cli update` (or `pipx upgrade parallel-web-tools` if installed via pipx), then retry.
When to use this skill
Use FindAll when the user wants a structured list of entities matching a description, not webpages or a narrative answer.
| User asks for… | Use |
|---|---|
| "Find all X that…" / "List every Y…" | parallel-findall (this skill) |
| Webpage results / quick answers / current info | parallel-web-search |
| Narrative report / analysis / "research X" | parallel-deep-research |
| Add fields to a list you already have | parallel-data-enrichment |
If the user already has a list and just wants to add fields, this is the wrong skill — use parallel-data-enrichment.
Step 1: Start the run
```bash
parallel-cli findall run "$ARGUMENTS" --no-wait --json
```

Defaults: generator `core`, match limit `10`. Stick with `core` unless the user has a reason to escalate:

- `-g pro`: most thorough generator (slower, costlier). Use when the user asks for "comprehensive" coverage or matches are sparse on `core`
- `-g base`: fastest, but markedly lower quality. Often returns query-echo entities (e.g., directory pages, the literal query string), entries with no URL, or category placeholders. Only use if the user explicitly asks for a quick scan and accepts noise; otherwise prefer `core`
- `-n 50`: return up to 50 matched entities (5–1000 allowed)

If the user wants to exclude known entities (e.g., "find competitors but not Google or OpenAI"):
```bash
parallel-cli findall run "$ARGUMENTS" --no-wait --json \
  --exclude '[{"name":"Google","url":"google.com"},{"name":"OpenAI","url":"openai.com"}]'
```

Tip: preview the schema first if the objective is ambiguous. `parallel-cli findall ingest "$ARGUMENTS" --json` shows the entity type and match conditions the API inferred, so you can refine wording before paying for a run.

Parse the JSON output to extract the `findall_id` and any monitoring URL. Tell the user:

- A FindAll run has been started
- Approximate cadence (minutes for `core`, longer for `pro`)
- They can keep working while it runs
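
The parsing step above can be sketched in shell. This is a minimal sketch, not the CLI's documented output contract: it assumes the `--json` output contains a string field named `findall_id`, and the helper name `extract_findall_id` and the sample ID are hypothetical. A real JSON parser such as `jq` is the more robust choice when one is available.

```bash
# Hypothetical helper: pull the findall_id string out of the JSON text
# passed as the first argument.
# Assumption: the output contains "findall_id": "<value>" as a string field.
extract_findall_id() {
  local id
  id=$(printf '%s\n' "$1" \
    | sed -n 's/.*"findall_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1)
  # Return non-zero when no ID was found so callers can stop early.
  [ -n "$id" ] && printf '%s\n' "$id"
}
```

Possible usage, capturing the run output first: `RUN_JSON=$(parallel-cli findall run "$ARGUMENTS" --no-wait --json)` followed by `FINDALL_ID=$(extract_findall_id "$RUN_JSON")`.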
Step 2: Poll for results
Choose a descriptive filename (e.g., `series-a-ai-2026`, `charlotte-roofers`). Use lowercase with hyphens, no spaces.

```bash
parallel-cli findall poll "$FINDALL_ID" -o "/tmp/$FILENAME.json" --timeout 540
```

Important:

- Use `--timeout 540` (9 minutes) to stay within tool execution limits
- Do NOT pass `--json` for large result sets: it will flood context. `-o` saves the full results to disk
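
The filename rule above (lowercase, hyphens, no spaces) can be applied mechanically. The following is an illustrative sketch only; the function name `slugify` is hypothetical and not part of `parallel-cli`.

```bash
# Hypothetical helper: turn a free-text label into a lowercase,
# hyphen-separated filename stem.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}
```

For example, `FILENAME=$(slugify "Charlotte Roofers")` would set `FILENAME` to `charlotte-roofers`, which then slots into `/tmp/$FILENAME.json`.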
If the poll times out
Re-run the same `parallel-cli findall poll` command to continue waiting. The run continues server-side regardless.
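
Re-running after a timeout can be wrapped in a small retry loop. This sketch assumes, without confirmation from the CLI documentation, that `parallel-cli findall poll` exits non-zero when it times out; the helper name `retry_poll` and the attempt cap are hypothetical.

```bash
# Hypothetical helper: run a command up to N times, stopping at the
# first success.
# Assumption: the wrapped command exits non-zero on timeout.
retry_poll() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "poll attempt $attempt of $max_attempts did not finish" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

Possible usage: `retry_poll 3 parallel-cli findall poll "$FINDALL_ID" -o "/tmp/$FILENAME.json" --timeout 540`. Each attempt stays within the 9-minute limit, and a final non-zero status signals that the run is still in progress rather than lost.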
Response format
Before presenting matches, filter the results for obvious noise:

- Drop entries with empty/missing `url`
- Drop entries whose `name` echoes the user's query (e.g., literal "YC W25 batch companies in developer tools"). Those are search-result placeholders, not real entities
- Drop entries whose `url` is a third-party directory or profile page rather than the entity's own domain. Concretely: drop URLs on `linkedin.com`, `ycombinator.com/companies/...`, `crunchbase.com`, `pitchbook.com`, generic news/blog posts about the entity, etc. The URL should be something the entity itself owns (its product site, docs, or marketing site)

If filtering removes a meaningful share of matches, mention this to the user and suggest re-running with `-g pro` or a higher `-n`.

Sanity-check `-g base` results. The base generator can hallucinate categorical attributes (e.g., return a YC S22 company as a YC W25 match). The filter rules above only catch URL/name shape, not factual correctness. If the user's query has a falsifiable attribute (a specific batch, year, geography, etc.), spot-check the kept entries against the source URL and flag any that don't fit. Recommend re-running with `-g core` (or higher) if either multiple kept entries fail the spot-check or noise filtering dropped a meaningful share of the matched set (say, ≥40%). Both indicate `base` isn't producing reliable results for this query.

Present the remaining (real) entities as a markdown table or list. Lead with the count, then list each entity with its name, URL, and a one-line description if available. Cite each entity with its source URL.
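
The URL-shape rules and the ≥40% threshold above can be sketched as two shell checks. This is an illustration under stated assumptions, not part of `parallel-cli`: the function names are hypothetical, the domain list covers only the four sites named above, and it does not attempt the query-echo or news/blog checks, which need judgment.

```bash
# Hypothetical helper: succeed (exit 0) when a URL looks like noise,
# i.e. it is empty or sits on one of the third-party sites named above.
is_noise_url() {
  local url="${1:-}"
  [ -n "$url" ] || return 0          # empty or missing URL counts as noise
  local rest="${url#*://}"           # strip the scheme if present
  rest="${rest#www.}"                # ignore a leading "www."
  local host="${rest%%/*}"
  case "$host" in
    linkedin.com|*.linkedin.com|crunchbase.com|*.crunchbase.com|pitchbook.com|*.pitchbook.com)
      return 0 ;;
  esac
  case "$rest" in
    ycombinator.com/companies/*) return 0 ;;
  esac
  return 1
}

# Hypothetical helper: succeed when dropped entries make up at least
# 40% of the matched set (arguments: total matched, number dropped).
noise_share_high() {
  local total="$1" dropped="$2"
  [ "$total" -gt 0 ] && [ $((dropped * 100)) -ge $((total * 40)) ]
}
```

With 10 matches and 4 dropped, `noise_share_high 10 4` succeeds, which is the point at which re-running with a stronger generator would be worth suggesting.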
Tell the user:
- How many entities were matched (and how many were filtered as noise, if any)
- The full results path (`/tmp/$FILENAME.json`)
- That they can:
  - Add fields to these results, e.g.:
    ```bash
    parallel-cli findall enrich "$FINDALL_ID" '{"properties":{"ceo":{"type":"string"},"employee_count":{"type":"number"}}}'
    ```
    The schema is a JSON Schema-style object with `properties` mapping field names → `{type, description?}`.
  - Get more matches:
    ```bash
    parallel-cli findall extend "$FINDALL_ID" 50
    ```
Setup
Requires `parallel-cli` (installed and authenticated). If `parallel-cli --version` fails, or if a later command fails with an authentication error, tell the user to see https://docs.parallel.ai/integrations/cli and stop.
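
The setup check can be run up front before starting a FindAll run. A minimal sketch, assuming only that `--version` exits zero on a working install; the function name `check_cli` is hypothetical, and it does not verify authentication or the ≥ 0.3.0 requirement.

```bash
# Hypothetical helper: confirm a CLI is on PATH and answers --version.
# Defaults to parallel-cli; the name is a parameter so it can be reused.
check_cli() {
  local cli="${1:-parallel-cli}"
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "$cli not found; see https://docs.parallel.ai/integrations/cli" >&2
    return 1
  fi
  if ! "$cli" --version >/dev/null 2>&1; then
    echo "$cli --version failed; see https://docs.parallel.ai/integrations/cli" >&2
    return 1
  fi
}
```

Calling `check_cli || exit 1` at the top of a script stops early with a pointer to the setup docs instead of failing partway through a run.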