Sifta People Search

Sifta is a candidate search tool for vertical recruitment sourcing in the AI industry. It is NOT a general web search, company intelligence, sales lead, outreach, ATS, or KOL collaboration tool.

The goal of using Sifta is to turn recruitment personas into a compact, interpretable list of candidates supported by public evidence. The workflow must remain convergent: search public candidate sources, summarize why each candidate matches, label uncertainties, and never fabricate private information.

Target Personas

Currently, only AI industry candidate personas are covered:

AI Engineers & Developers: AI agents, LLMs, large video models, speech models, AI infra, and application-layer development.
Embodied Intelligence Talents: Engineers or roles related to robotics, autonomous driving, perception, control, simulation, VLA, embodied models.
Solopreneurs: Independent developers, one-person companies, solo builders with self-built products or verifiable public works.
Founders: founders, co-founders, and individuals running their own products or AI startups.
Product Managers: AI product managers, ByteDance product managers, and PMs working on Qwen or other large-model teams.
GTM/GMT Marketing: global expansion marketing, growth, AI product marketing, developer marketing, and community growth.
Research-focused Talents: academically oriented people with publication evidence on arXiv or Google Scholar who can be converted into recruitment candidates.

If the user requests groups outside these personas, first explain that this skill is only for AI industry recruitment sourcing, and ask if they want to adjust their request to fit one of the above personas.

Environment & Authentication

Sifta currently runs in CLI mode.

Before the first Sifta CLI call in each session, run `sifta-cli status`. If `sifta-cli` is missing, install it first with `npm install -g @sifta/cli@latest`. If the CLI is not authenticated, or if you are unsure of the command syntax, refer to references/cli-reference.md. The CLI reaches the server using either the local configuration created by `sifta-cli auth` or the `SIFTA_API_KEY` environment variable.

Do not silently open a browser, and do not ask for server-side provider keys.
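The once-per-session check can be sketched in Python as below. This is a minimal illustration, not part of the Sifta CLI: `preflight_plan` is a hypothetical helper that only uses the two commands named above, and it deliberately does not interpret the output of `sifta-cli status`, whose format is not documented here.

```python
import shutil
from typing import Callable, List, Optional

STATUS_CMD = ["sifta-cli", "status"]
INSTALL_CMD = ["npm", "install", "-g", "@sifta/cli@latest"]


def preflight_plan(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[List[str]]:
    """Return the commands to run before the first Sifta call in a session.

    Hypothetical helper: it only decides what to run. The caller executes the
    commands and, if status shows no authentication, follows
    references/cli-reference.md instead of opening a browser.
    """
    plan: List[List[str]] = []
    if which("sifta-cli") is None:
        # The CLI is not on PATH, so install it before anything else.
        plan.append(INSTALL_CMD)
    # Check status once per session before the first real call.
    plan.append(STATUS_CMD)
    return plan
```

With the CLI missing, the plan is the install command followed by `sifta-cli status`; the caller can run each entry with `subprocess.run(cmd, check=False)`.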

Command Selection

Choose the minimal command that achieves the goal:

| Intent | Preferred Command |
|---|---|
| Candidate search | `sifta-cli find-people --query "<query>" --checkpoint "<original user goal>" --target-count 10` |
| Title, skill, location, or company can be extracted explicitly | `sifta-cli find-people --query "<query>" --checkpoint "<original user goal>" --filter '{...}'` |
| Enrich a known profile or handle | `sifta-cli enrich-people --people '[...]'` |
| CLI/API schema has changed | Run `sifta-cli tools` to view the schema, then use the current explicit command |

Parse JSON from stdout by default. Do not use `--pretty` when an agent parses the output; it is intended only for human reading.
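The command shapes in the table above can be assembled as an argument list rather than a shell string, which sidesteps quoting problems with the JSON passed to `--filter` and `--sources`. The Python sketch below uses only flags that appear in this skill; the helper name is hypothetical, and the filter keys assume the `titles`/`skills`/`locations`/`companies` fields described under Query Plan Boundaries.

```python
import json
from typing import Dict, List, Optional


def build_find_people_cmd(
    query: str,
    checkpoint: str,
    target_count: Optional[int] = 10,
    filters: Optional[Dict[str, List[str]]] = None,
    sources: Optional[List[str]] = None,
    mode: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble argv for `sifta-cli find-people`.

    Only flags that appear in this skill are used; combining them freely is
    an assumption, so check `sifta-cli tools` if a call is rejected.
    """
    cmd = ["sifta-cli", "find-people", "--query", query, "--checkpoint", checkpoint]
    if target_count is not None:
        cmd += ["--target-count", str(target_count)]
    # Send only filter fields that actually hold values.
    cleaned = {key: values for key, values in (filters or {}).items() if values}
    if cleaned:
        cmd += ["--filter", json.dumps(cleaned, ensure_ascii=False)]
    if sources:
        cmd += ["--sources", json.dumps(sources)]
    if mode:
        cmd += ["--mode", mode]
    # --pretty is never added: stdout must stay machine-parseable JSON.
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`, keeping stdout for JSON and stderr for status text.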

Source Strategy

Select sources based on the required recruitment evidence:

Default sources are GitHub and LinkedIn.
For AI engineers, R&D staff, developers, and talent focused on engineering implementation, explicitly use `--sources '["github"]'`. Chinese-language requests using terms such as "R&D role", "development engineer", or "model/infra/application-layer engineer" are also treated as code-focused. Do not add LinkedIn to verify professional background unless the user explicitly asks to search LinkedIn.
For embodied intelligence talent, independent developers, and founders whose code, products, or public works are emphasized, prioritize `--sources '["github"]'`; if lab, company, or team background matters more, use LinkedIn instead.
For personas closely tied to company or team background, such as product managers and GTM/GMT marketing, prioritize `--sources '["linkedin"]'`.
Once the user explicitly specifies a source, every retry, failure recovery, and alternative command must keep the same `--sources`; do not fall back to the default sources, since omitting `--sources` would bring GitHub back into the search.
LinkedIn searches are executed by the server through Exa People Search, and the underlying request must use `category: "people"`. The query should be a natural-language semantic query built from role, location, company, and domain terms; do not append web-search qualifiers such as `LinkedIn profile only`.
Use `--mode research` only when paper evidence helps identify candidates. arXiv and Google Scholar are auxiliary evidence, not the primary source of final candidates.
Twitter/X and Xiaohongshu are optional public signal sources. Use them only when the user provides a known handle, requests public content signals, or the Sifta API explicitly exposes these sources; do not treat KOL collaboration as the main path of this skill.

If the source requested by the user is not supported, or the API result shows it was not executed, state this clearly.
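The source rules above reduce to a small decision function. In this Python sketch the persona keys and the `background_emphasized` flag are hypothetical labels for the cases described, and the source names are the lowercase strings from the `--sources` examples. Returning the user's list first is what keeps retries from drifting back to the defaults.

```python
from typing import List, Optional

# Hypothetical persona keys for the seven personas covered by this skill.
CODE_FIRST = {"ai_engineer"}
CODE_OR_BACKGROUND = {"embodied", "solopreneur", "founder"}
BACKGROUND_FIRST = {"ai_pm", "gtm"}


def choose_sources(
    persona: str,
    user_sources: Optional[List[str]] = None,
    background_emphasized: bool = False,
) -> Optional[List[str]]:
    """Return the value for --sources, or None to keep the defaults.

    An explicit user choice always wins and should be reused unchanged for
    every retry and fallback command.
    """
    if user_sources:
        return list(user_sources)
    if persona in CODE_FIRST:
        return ["github"]
    if persona in CODE_OR_BACKGROUND:
        # Lab, company, or team background outweighs public code here.
        return ["linkedin"] if background_emphasized else ["github"]
    if persona in BACKGROUND_FIRST:
        return ["linkedin"]
    # Research-focused talent and anything else: default sources
    # (GitHub and LinkedIn); add --mode research separately when papers matter.
    return None
```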

Query Plan Boundaries

The skill (the agent) is responsible for turning the user's original request into a search plan:

Always put the user's original input for this round into `--checkpoint` verbatim; `--query` contains only compact search terms for the connector.
Do not put retellings, translations, summaries, or filtered search terms in `--checkpoint`; it must preserve exactly what the user said.
In multi-round conversations, use the user's original message that triggered this search as `--checkpoint`; if earlier context must be retained, fold it into `--query` or `filter` instead of overwriting the original input.
Do not include source or explanatory phrases such as `GitHub developers in ...` or `clear evidence from GitHub` in GitHub queries.
When the position can be clearly identified, write it into `filter.titles`.
When skills or topics can be clearly identified, write them into `filter.skills`.
When the location can be clearly identified, write it into `filter.locations`.
When company preferences can be clearly identified, write them into `filter.companies`.
Personas such as founders, solopreneurs, GTM/GMT, and research-focused talent often lack standard titles; keep evidence requirements such as products, growth, papers, self-built projects, or team background in `query` rather than forcing them into filters.
Excluding companies is not supported as a filter; if the user asks to exclude a company, keep that condition in `query` as a soft constraint and verify it manually when explaining results.

Do not force uncertain inferences into filters. When in doubt, keep the term in `query` or ask for clarification first.
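A sketch of how the boundaries above might be applied, assuming some earlier step has already sorted the user's wording into confident fields, uncertain terms, and excluded companies (the `Extracted` type is hypothetical; how that extraction happens is out of scope). `checkpoint` keeps the raw message, only confident fields reach the filter, and everything else stays in the query.

```python
from dataclasses import dataclass, field
from typing import Dict, List

FILTER_KEYS = ("titles", "skills", "locations", "companies")


@dataclass
class Extracted:
    """Hypothetical result of reading the user's message."""
    terms: List[str]
    confident: Dict[str, List[str]] = field(default_factory=dict)
    uncertain: List[str] = field(default_factory=list)
    excluded_companies: List[str] = field(default_factory=list)


def build_query_plan(user_message: str, ex: Extracted) -> dict:
    # Uncertain inferences stay in the free-text query, never in filters.
    query_terms = list(ex.terms) + list(ex.uncertain)
    # Company exclusion is not a supported filter: keep it as a soft constraint
    # (the "excluding" wording is an assumption) and verify results manually.
    query_terms += [f"excluding {company}" for company in ex.excluded_companies]
    filters = {k: v for k, v in ex.confident.items() if k in FILTER_KEYS and v}
    return {
        "checkpoint": user_message,  # verbatim: no retelling, translation, or summary
        "query": " ".join(query_terms),
        "filter": filters,
    }
```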

Ambiguity Handling

When a request could point to different goals and a wrong search would waste time or API quota, ask a short clarifying question first.

Common ambiguities:

Company, product, or project names have multiple meanings.
The user mixes recruitment candidates, creators, customers, companies, or sales leads together.
The user describes a group that does not belong to the current 7 AI industry personas.
Location, seniority, or necessary evidence is missing, and it will significantly affect the search direction.
The user says "people who know X", but it is unclear whether the proof should be public code, professional experience, papers, or social content.

If the context is already clear enough, you can directly state the assumption and proceed, for example: "I will interpret this as recruitment candidates, prioritizing people with public AI infra evidence on GitHub/LinkedIn."

Workflow

Restate the candidate goal in one sentence.
First map it to one of the 7 AI industry personas, then select the source and mode based on the target evidence.
Run the minimal available CLI command; do not pass `--pretty`.
Parse JSON from stdout; treat stderr as status or debugging information.
Output a compact candidate list, including profile links, matching reasons, evidence, and risk notes.
Distinguish evidence from inference, and label outdated, missing, or weak evidence.
If results are weak, explain why and suggest a narrower follow-up query.

For complex scenarios, no results, or weak result recovery, refer to references/workflow-patterns.md.
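A sketch of the parse step in Python. Only `executedSources` is a field name taken from this skill (it appears in the output template); `candidates` and `warnings` are assumptions to check against real output. The helper also flags requested sources that were not confirmed as executed, so the answer can state this explicitly.

```python
import json
from typing import Any, Dict, List


def parse_sifta_output(stdout: str, stderr: str, requested_sources: List[str]) -> Dict[str, Any]:
    """Hypothetical parser for a find-people run.

    Field names other than executedSources are assumptions; confirm them
    against real output or `sifta-cli tools`.
    """
    data = json.loads(stdout) if stdout.strip() else {}
    executed = data.get("executedSources", [])
    return {
        "candidates": data.get("candidates", []),
        # API warnings must be relayed to the user.
        "warnings": data.get("warnings", []),
        "executed_sources": executed,
        # Requested sources with no confirmation of execution: say so explicitly.
        "not_confirmed": [s for s in requested_sources if s not in executed],
        # stderr is status/debug text only and is never parsed as results.
        "status": [line for line in stderr.splitlines() if line.strip()],
    }
```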

Output Rules

When reporting results to the user:

The final answer itself must be Markdown text, not a plain text field block or JSON.
Do not try to detect the current runtime environment; whether in the CLI, OpenClaw, Feishu, or another chat tool, default to Markdown output.
Include the candidate's name, source, profile URL, headline/location (if available), matching reasons, and key evidence.
Present list-type candidate results as Markdown tables by default rather than as a series of long paragraphs.

Label which target persona each candidate is closer to: `AI Engineer`, `Embodied Intelligence`, `Solopreneur`, `Founder`, `AI PM`, `GTM/GMT`, or `Research-focused Talent`.

Group by confidence when necessary: Strong Matches, Possible Matches, Weak Matches.
Relay any warnings returned by the API.
Do not fabricate email, phone, salary, willingness to relocate, employment status, or private contact information.
Do not assert that cross-channel profiles belong to the same person unless Sifta returns a same-person hint or there is clear public evidence; write "possible match" when uncertain.
Keep the candidate list compact unless the user requests raw JSON or all results.

Output in compact format:

Goal: <original candidate goal> Sources: <executedSources>

| # | Candidate | Persona / Direction | Source | Overview | Matching Reason | Risk |
|---|---|---|---|---|---|---|
| 1 | <name> | <persona> | [GitHub](url) | <headline/location> | <evidence-backed reasons> | <missing or weak evidence> |

Format requirements:

One candidate per row.
Use `[GitHub](url)`, `[LinkedIn](url)`, or `[Profile](url)` format in the Source column; do not output bare URLs.
Keep cell content to short sentences; aim to keep "Matching Reason" within 30-50 words.
Do not insert long explanations into the table; if really needed, add "Supplementary Notes" after the table.
Do not wrap the final result in a code block.
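One way to produce the compact table is sketched below. The candidate keys (`name`, `persona`, `source`, `url`, `overview`, `reason`, `risk`, `confidence`) are hypothetical and should be mapped from whatever the API actually returns. The renderer links each source instead of printing a bare URL, groups rows by confidence tier, and escapes `|` so cell text cannot break the table.

```python
from typing import Dict, List

TIERS = ("Strong Matches", "Possible Matches", "Weak Matches")
LINK_LABELS = {"github": "GitHub", "linkedin": "LinkedIn"}
HEADER = "| # | Candidate | Persona / Direction | Source | Overview | Matching Reason | Risk |"
RULE = "|---|---|---|---|---|---|---|"


def _cell(text: str) -> str:
    # Escape pipes and flatten newlines so each candidate stays on one row.
    return (text or "").replace("|", "\\|").replace("\n", " ").strip()


def render_results(goal: str, executed_sources: List[str], candidates: List[Dict[str, str]]) -> str:
    lines = [f"Goal: {_cell(goal)} Sources: {', '.join(executed_sources)}", ""]
    number = 0
    for tier in TIERS:
        # Candidates without a confidence label default to "Possible Matches".
        rows = [c for c in candidates if c.get("confidence", "Possible Matches") == tier]
        if not rows:
            continue
        lines += [tier, "", HEADER, RULE]
        for c in rows:
            number += 1
            label = LINK_LABELS.get(c.get("source", "").lower(), "Profile")
            cells = [str(number), _cell(c["name"]), _cell(c.get("persona", "")),
                     f"[{label}]({c['url']})", _cell(c.get("overview", "")),
                     _cell(c.get("reason", "")), _cell(c.get("risk", ""))]
            lines.append("| " + " | ".join(cells) + " |")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

The returned string is plain Markdown, so it can be sent as the final answer without wrapping it in a code block.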

Do not use field-block formats such as `Candidate:`, `Persona:`, `Source:`, `Overview:`, `Matching Reason:`, `Risk:`.

Note: In any chat channel (including Feishu), the final output must prioritize Markdown tables; do not switch to field-block formats even if the channel renders Markdown imperfectly.

Failure Recovery

If a command fails because its parameters have changed:

Run `sifta-cli tools`.
Find the relevant tool: `find_people` or `enrich_people`.
Rebuild the parameters from the returned schema.
Retry with an explicit CLI command such as `find-people` or `enrich-people`.
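The recovery steps above can be partly automated. This sketch assumes, without confirmation from this skill, that `sifta-cli tools` prints a JSON list of objects with `name` and a JSON-Schema-style `inputSchema` (`properties`, `required`); adjust the lookup to the real output. It keeps only arguments the current schema still accepts, reports dropped and missing ones, and maps a tool name to its CLI subcommand by swapping underscores for hyphens, matching the `find_people`/`find-people` pairs named above. Because kept arguments are copied as-is, a user-specified `sources` value survives the retry.

```python
import json
from typing import Any, Dict, List, Tuple


def reconstruct_args(
    tools_json: str, tool_name: str, args: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str], List[str]]:
    """Hypothetical: adapt previously used arguments to the current schema.

    Returns (kept_args, dropped_keys, missing_required).
    """
    tools = {tool["name"]: tool for tool in json.loads(tools_json)}
    schema = tools[tool_name].get("inputSchema", {})
    props = schema.get("properties", {})
    kept = {key: value for key, value in args.items() if key in props}
    dropped = sorted(key for key in args if key not in props)
    missing = sorted(key for key in schema.get("required", []) if key not in kept)
    return kept, dropped, missing


def cli_subcommand(tool_name: str) -> str:
    # find_people -> find-people, enrich_people -> enrich-people
    return tool_name.replace("_", "-")
```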

If the search returns no candidates, do not claim that such candidates do not exist. Instead, state "No candidates were returned in this search" and propose specific adjustments: relax title requirements, remove location constraints, switch sources, add company or domain clues, or use `--mode research` when paper evidence is relevant.
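A sketch of how the suggested adjustments could be generated from the previous plan. The plan keys (`query`, `checkpoint`, `filter`, `sources`, `mode`) are hypothetical; adding company or domain clues needs new input from the user, so it is left out. Source switching is skipped when the user fixed the sources, in line with the rule under Source Strategy.

```python
import copy
from typing import Any, Dict, List, Tuple

SWAP = {("github",): ["linkedin"], ("linkedin",): ["github"]}


def relaxation_options(
    plan: Dict[str, Any], user_fixed_sources: bool, paper_evidence_relevant: bool
) -> List[Tuple[str, Dict[str, Any]]]:
    """Hypothetical: propose follow-up plans after a search returned nothing.

    Every option is a modified deep copy of `plan`, so the checkpoint is
    carried over unchanged and the original plan is never mutated.
    """
    options: List[Tuple[str, Dict[str, Any]]] = []
    for key, label in (("titles", "Relax title requirements"),
                       ("locations", "Remove location constraints")):
        if (plan.get("filter") or {}).get(key):
            option = copy.deepcopy(plan)
            del option["filter"][key]
            options.append((label, option))
    alternative = SWAP.get(tuple(plan.get("sources") or ()))
    if alternative and not user_fixed_sources:
        # Never switch sources the user chose explicitly.
        option = copy.deepcopy(plan)
        option["sources"] = alternative
        options.append((f"Switch sources to {alternative[0]}", option))
    if paper_evidence_relevant and plan.get("mode") != "research":
        option = copy.deepcopy(plan)
        option["mode"] = "research"
        options.append(("Use --mode research", option))
    return options
```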

Detailed References

CLI Commands & JSON Parameters: references/cli-reference.md
Scenario-based Workflows: references/workflow-patterns.md
