parallel-findall

FindAll: Entity Discovery

Find: $ARGUMENTS

Requires `parallel-cli` ≥ 0.3.0 (the `findall` command was added in 0.3.0). If `parallel-cli findall` errors with `no such command` or similar, tell the user to run `parallel-cli update` (or `pipx upgrade parallel-web-tools` if installed via pipx), then retry.

When to use this skill

Use FindAll when the user wants a structured list of entities matching a description, not webpages or a narrative answer.

| User asks for… | Use |
| --- | --- |
| "Find all X that…" / "List every Y…" | `parallel-findall` (this skill) |
| Webpage results / quick answers / current info | `parallel-web-search` |
| Narrative report / analysis / "research X" | `parallel-deep-research` |
| Add fields to a list you already have | `parallel-data-enrichment` |

If the user already has a list and just wants to add fields, this is the wrong skill — use parallel-data-enrichment.

Step 1: Start the run

```bash
parallel-cli findall run "$ARGUMENTS" --no-wait --json
```

Defaults: generator `core`, match limit 10. Stick with `core` unless the user has a reason to escalate:

- `-g pro`: most thorough generator (slower, costlier). Use when the user asks for "comprehensive" coverage or matches are sparse on `core`.
- `-g base`: fastest, but markedly lower quality. Often returns query-echo entities (e.g., directory pages, the literal query string), entries with no URL, or category placeholders. Only use if the user explicitly asks for a quick scan and accepts noise; otherwise prefer `core`.
- `-n 50`: return up to 50 matched entities (5–1000 allowed).

If the user wants to exclude known entities (e.g., "find competitors but not Google or OpenAI"):

```bash
parallel-cli findall run "$ARGUMENTS" --no-wait --json \
    --exclude '[{"name":"Google","url":"google.com"},{"name":"OpenAI","url":"openai.com"}]'
```

Tip: preview the schema first if the objective is ambiguous. `parallel-cli findall ingest "$ARGUMENTS" --json` shows the entity type and match conditions the API inferred, so you can refine the wording before paying for a run.

Parse the JSON output to extract the `findall_id` and any monitoring URL. Tell the user:

- A FindAll run has been started
- The approximate duration (minutes for `core`, longer for `pro`)
- They can keep working while it runs
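The `findall_id` extraction described above could be sketched as follows. This is a minimal illustration, not the CLI's documented output: the sample payload, its `status` field, and the ID value are assumptions, and only the `findall_id` key name comes from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical sample payload. Only the "findall_id" key is named in this guide;
# the value and the other field are made up for illustration.
sample_output='{"findall_id": "findall_abc123", "status": "queued"}'

# extract_findall_id: read JSON on stdin and print the string value of "findall_id".
# Uses sed so the sketch has no dependencies; it assumes the key and value sit on one line.
extract_findall_id() {
  sed -n 's/.*"findall_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

FINDALL_ID="$(printf '%s\n' "$sample_output" | extract_findall_id)"
echo "$FINDALL_ID"
```

In practice the same function would read the output of the `findall run` command instead of the sample string. A real JSON parser such as `jq` is more robust if it is available.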

Step 2: Poll for results

Choose a descriptive filename (e.g., `series-a-ai-2026`, `charlotte-roofers`). Use lowercase with hyphens, no spaces.

```bash
parallel-cli findall poll "$FINDALL_ID" -o "/tmp/$FILENAME.json" --timeout 540
```

Important:

- Use `--timeout 540` (9 minutes) to stay within tool execution limits.
- Do NOT pass `--json` for large result sets; it will flood context. `-o` saves the full results to disk.
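One way to apply the filename convention above (lowercase, hyphens, no spaces) is a small helper like the one below. The `slugify` function is a hypothetical illustration, not part of `parallel-cli`.

```shell
#!/usr/bin/env bash
# slugify TEXT: lowercase TEXT, collapse every run of non-alphanumeric
# characters into a single hyphen, and trim leading/trailing hyphens.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

FILENAME="$(slugify 'Series A AI 2026')"
echo "/tmp/$FILENAME.json"
```

The resulting `$FILENAME` can then be used in the `-o "/tmp/$FILENAME.json"` argument shown above.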

If the poll times out

Re-run the same `parallel-cli findall poll` command to continue waiting. The run continues server-side regardless.
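The re-run advice above can be wrapped in a small retry loop. This is only a sketch: it assumes the poll command exits non-zero when it times out, which this guide does not state, and `poll_until_done` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# poll_until_done MAX_ATTEMPTS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or MAX_ATTEMPTS runs have been made.
# Assumption: a timed-out poll exits non-zero (not confirmed by this guide).
poll_until_done() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts did not finish" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage (not executed here):
# poll_until_done 3 parallel-cli findall poll "$FINDALL_ID" -o "/tmp/$FILENAME.json" --timeout 540
```

Capping the number of attempts keeps a stuck run from looping forever; since the run continues server-side, the same command can be retried again later.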

Response format

Before presenting matches, filter the results for obvious noise:

- Drop entries with empty/missing `url`.
- Drop entries whose `name` echoes the user's query (e.g., literal "YC W25 batch companies in developer tools"). Those are search-result placeholders, not real entities.
- Drop entries whose `url` is a third-party directory or profile page rather than the entity's own domain. Concretely: drop URLs on `linkedin.com`, `ycombinator.com/companies/...`, `crunchbase.com`, `pitchbook.com`, generic news/blog posts about the entity, etc. The URL should be something the entity itself owns (its product site, docs, or marketing site).
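The URL-based rules above can be sketched as shell predicates. The helper names `is_directory_url` and `keep_entry` are hypothetical, and the sketch covers only the empty-URL rule and the four directory sites named above; query-echo names and news/blog URLs still need a judgment call.

```shell
#!/usr/bin/env bash
# is_directory_url URL: exit 0 when URL is on one of the directory sites
# named in the rules above, exit 1 otherwise.
is_directory_url() {
  local url="$1" rest host path
  rest="${url#*://}"      # drop the scheme, if any
  host="${rest%%/*}"      # everything before the first slash
  host="${host#www.}"     # ignore a leading "www."
  case "$rest" in
    */*) path="/${rest#*/}" ;;
    *)   path="/" ;;
  esac
  case "$host" in
    linkedin.com|*.linkedin.com|crunchbase.com|*.crunchbase.com|pitchbook.com|*.pitchbook.com)
      return 0 ;;
    ycombinator.com|*.ycombinator.com)
      # Only the company directory pages count as noise.
      case "$path" in
        /companies/*) return 0 ;;
      esac ;;
  esac
  return 1
}

# keep_entry NAME URL: exit 0 when the entry has a URL that is not a directory page.
keep_entry() {
  [ -n "$2" ] && ! is_directory_url "$2"
}
```

For example, `keep_entry "Acme" "https://acme.dev"` succeeds, while `keep_entry "Acme" "https://www.linkedin.com/company/acme"` fails.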

If filtering removes a meaningful share of matches, mention this to the user and suggest re-running with `-g pro` or a higher `-n`.

Sanity-check `-g base` results. The base generator can hallucinate categorical attributes (e.g., return a YC S22 company as a YC W25 match). The filter rules above only catch URL/name shape, not factual correctness. If the user's query has a falsifiable attribute (a specific batch, year, geography, etc.), spot-check the kept entries against the source URL and flag any that don't fit. Recommend re-running with `-g core` (or higher) if either multiple kept entries fail the spot-check or noise filtering dropped a meaningful share of the matched set (say, ≥40%). Both indicate `base` isn't producing reliable results for this query.
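The re-run rule above can be written as a small check. This sketch reads "multiple" as two or more failed spot-checks and uses the 40% figure given above as the noise threshold; `should_escalate` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# should_escalate MATCHED DROPPED FAILED_SPOT_CHECKS
# Exit 0 when noise filtering dropped at least 40% of the matched set,
# or when two or more kept entries failed the spot-check.
should_escalate() {
  local matched="$1" dropped="$2" failed="$3"
  # With no matches there is no share to compute.
  [ "$matched" -gt 0 ] || return 1
  # Integer arithmetic: dropped / matched >= 40 / 100.
  if [ $((dropped * 100)) -ge $((matched * 40)) ] || [ "$failed" -ge 2 ]; then
    return 0
  fi
  return 1
}

if should_escalate 10 4 0; then
  echo "recommend re-running with -g core or higher"
fi
```

Comparing `dropped * 100` with `matched * 40` avoids floating-point division, which plain shell arithmetic does not support.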

Present the remaining (real) entities as a markdown table or list. Lead with the count, then list each entity with its name, URL, and a one-line description if available. Cite each entity with its source URL.

Tell the user:

- How many entities were matched (and how many were filtered as noise, if any)
- The full results path (`/tmp/$FILENAME.json`)
- That they can:
  - Add fields to these results, e.g.:

    ```bash
    parallel-cli findall enrich $FINDALL_ID '{"properties":{"ceo":{"type":"string"},"employee_count":{"type":"number"}}}'
    ```

    The schema is a JSON Schema-style object with `properties` mapping field names → `{type, description?}`.
  - Get more matches:

    ```bash
    parallel-cli findall extend $FINDALL_ID 50
    ```
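For longer field lists, the enrich schema shown above can be assembled from `field:type` pairs rather than typed by hand. `build_enrich_schema` is a hypothetical helper; it assumes field names and types contain no quotes, backslashes, or colons, and it omits the optional `description`.

```shell
#!/usr/bin/env bash
# build_enrich_schema FIELD:TYPE [FIELD:TYPE...]
# Prints {"properties":{"FIELD":{"type":"TYPE"},...}} on one line.
build_enrich_schema() {
  local out='{"properties":{' sep='' pair name type
  for pair in "$@"; do
    name="${pair%%:*}"   # text before the first colon
    type="${pair#*:}"    # text after the first colon
    out="${out}${sep}\"${name}\":{\"type\":\"${type}\"}"
    sep=','
  done
  printf '%s\n' "${out}}}"
}

SCHEMA="$(build_enrich_schema ceo:string employee_count:number)"
echo "$SCHEMA"
```

The printed string matches the schema in the enrich example above and could be passed as its last argument.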

Setup

Requires `parallel-cli` (installed and authenticated). If `parallel-cli --version` fails, or if a later command fails with an authentication error, tell the user to see https://docs.parallel.ai/integrations/cli and stop.
