Sifta People Search
Sifta is a candidate search tool for vertical recruitment sourcing in the AI industry. It is NOT a general web search, company intelligence, sales lead, outreach, ATS, or KOL collaboration tool.
The goal of using Sifta is to convert recruitment personas into a compact, interpretable list of candidates supported by public evidence. The workflow must remain convergent: search public candidate sources, summarize matching reasons, label uncertainties, and never fabricate private information.
Target Personas
Currently, only AI industry candidate personas are covered:
- AI Engineers & Developers: AI Agent, LLM, video large models, speech models, AI infra, application layer development.
- Embodied Intelligence Talents: Engineers or roles related to robotics, autonomous driving, perception, control, simulation, VLA, embodied models.
- Solopreneurs: Independent developers, one-person companies, solo builders with self-built products or verifiable public works.
- Founders: founder, co-founder, individuals running their own products or AI startups.
- Product Managers: AI product managers, ByteDance product managers, PMs related to Qwen or large model teams.
- GTM/GMT Marketing: Global expansion marketing, growth, AI product marketing, developer marketing, community growth.
- Research-focused Talents: Those focused on academia, paper publications, with evidence from arXiv/Google Scholar, who can be converted into recruitment candidates.
If the user requests groups outside these personas, first explain that this skill is only for AI industry recruitment sourcing, and ask if they want to adjust their request to fit one of the above personas.
Environment & Authentication
Sifta currently uses CLI mode.
Before calling the Sifta CLI for the first time in each session, run . If is missing, install it first with npm install -g @sifta/cli@latest. If not authenticated, or if command syntax is uncertain, refer to references/cli-reference.md. The CLI can call the server via local configuration from or the .
Do not silently open a browser, nor request server provider keys.
Command Selection
Choose the minimal command that achieves the goal:
| Intent | Preferred Command |
|---|---|
| Candidate Search | sifta-cli find-people --query "<query>" --checkpoint "<original user goal>" --target-count 10 |
| Can explicitly extract title/skill/location/company | sifta-cli find-people --query "<query>" --checkpoint "<original user goal>" --filter '{...}' |
| Enrich known profile or handle | sifta-cli enrich-people --people '[...]' |
| CLI/API schema changes | Run to view the schema, then use the current explicit command |
Default to parsing JSON stdout. Do not use for agent parsing; it is only suitable for human viewing.
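As a minimal sketch, an agent could assemble the argument list for the candidate search command from the table above as follows. It uses only the flags shown in that table (`--query`, `--checkpoint`, `--target-count`, `--filter`). The helper name `build_find_people_args` and the example filter key are hypothetical, and the actual parameter schema should be confirmed in references/cli-reference.md.

```python
import json
from typing import List, Optional


def build_find_people_args(
    query: str,
    checkpoint: str,
    target_count: Optional[int] = None,
    filters: Optional[dict] = None,
) -> List[str]:
    """Build an argv list for `sifta-cli find-people` (hypothetical helper)."""
    if not query.strip() or not checkpoint.strip():
        raise ValueError("query and checkpoint must both be non-empty")
    args = ["sifta-cli", "find-people", "--query", query, "--checkpoint", checkpoint]
    if target_count is not None:
        args += ["--target-count", str(target_count)]
    if filters:
        # The filter is passed as a single JSON string; its keys are assumptions here.
        args += ["--filter", json.dumps(filters, ensure_ascii=False)]
    return args


args = build_find_people_args(
    query="LLM inference engineer Berlin",
    checkpoint="Find me LLM inference engineers based in Berlin",
    target_count=10,
)
print(args)
```

Building a list rather than a shell string avoids quoting problems when the user's original text contains quotes or non-ASCII characters.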
Source Strategy
Select sources based on the required recruitment evidence:
- Default sources are GitHub and LinkedIn.
- For AI engineers, R&D staff, developers, and talents focused on engineering implementation, explicitly use ; Chinese terms like "R&D role", "development engineer", "model/infra/application layer engineer" are also treated as code-focused candidates. Do not include LinkedIn for professional background verification unless the user explicitly specifies using LinkedIn for search.
- For embodied intelligence talents, independent developers, and founders who emphasize code, products, or public works, prioritize using ; if laboratory, company experience, or team background is more emphasized, use LinkedIn instead.
- For personas strongly related to company/team backgrounds like product managers, GTM/GMT marketing, prioritize using .
- After the user explicitly specifies a source, all retries, failure recoveries, and alternative commands must retain the same ; do not revert to default sources, otherwise GitHub will be included.
- LinkedIn is executed by the server via Exa People Search; the underlying request must use ; the query should be a natural language semantic query composed of role, location, company, and domain terms, rather than appending web search qualifiers like .
- Use only when paper evidence helps identify candidates. arXiv and Google Scholar are auxiliary evidence, not the primary source of final candidates.
- Twitter/X and Xiaohongshu are optional public signal sources. Use them only when the user provides a known handle, requests public content signals, or the Sifta API explicitly exposes these sources; do not treat KOL collaboration as the main path of this skill.
If the source requested by the user is not supported or the API result shows it was not executed, clearly state this.
Query Plan Boundaries
The Skill / agent is responsible for converting the user's original query into a search plan:
- Always put the user's original input for this round into as-is; only contains compact search terms for the connector.
- Do not write retellings, translations, summaries, or filtered search terms in ; it must be able to restore what the user actually said.
- In multi-round conversations, use the user's original text that triggered this search for ; if context needs to be retained, incorporate necessary context into or , do not overwrite the original input.
- Do not include source/explanation terms like or "clear evidence from GitHub" in GitHub queries.
- When the position can be clearly identified, write it into .
- When skills or topics can be clearly identified, write them into .
- When location can be clearly identified, write it into .
- When company preferences can be clearly identified, write them into .
- Personas like founders, solopreneurs, GTM/GMT, and research-focused talents often do not have standard titles; retain evidence requirements such as products, growth, papers, self-built projects, team backgrounds in , do not force them into filters.
- Excluding company conditions is not supported; if the user proposes excluding a company, keep it in as a soft constraint, and manually verify it when explaining results.
Do not force uncertain inferences into filters. When uncertain, keep it in or ask for clarification first.
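The boundaries above can be sketched as a small plan object: the user's original text is stored untouched, only clearly identified values become filters, and everything else is kept as a soft constraint for manual verification. All names here (`QueryPlan`, `build_plan`, the keys in `FILTERABLE`, `soft_constraints`) are hypothetical illustrations, not the CLI's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical filter keys; the real names come from the CLI schema.
FILTERABLE = ("title", "skills", "location", "company")


@dataclass
class QueryPlan:
    checkpoint: str  # the user's original input, verbatim
    query: str       # compact search terms for the connector
    filters: dict = field(default_factory=dict)           # clearly identified values only
    soft_constraints: list = field(default_factory=list)  # checked manually in results


def build_plan(original_input, search_terms, certain=None, uncertain=None,
               excluded_companies=()):
    plan = QueryPlan(checkpoint=original_input, query=" ".join(search_terms))
    for key, value in (certain or {}).items():
        if key in FILTERABLE and value:
            plan.filters[key] = value
        else:
            plan.soft_constraints.append(f"{key}: {value}")
    for key, value in (uncertain or {}).items():
        # Uncertain inferences never become filters.
        plan.soft_constraints.append(f"{key}: {value} (unconfirmed)")
    for company in excluded_companies:
        # Exclusion is not a supported filter, so it stays a soft constraint.
        plan.soft_constraints.append(f"exclude company: {company}")
    return plan


plan = build_plan(
    original_input="Looking for robotics perception engineers in Shanghai, not from Acme",
    search_terms=["robotics", "perception", "engineer"],
    certain={"location": "Shanghai"},
    uncertain={"title": "senior"},
    excluded_companies=["Acme"],
)
print(plan)
```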
Ambiguity Handling
When a request may point to different goals, and an incorrect search would waste time or API quota, ask a short clarifying question first.
Common ambiguities:
- Company, product, or project names have multiple meanings.
- The user mixes recruitment candidates, creators, customers, companies, or sales leads together.
- The user describes a group that does not belong to the current 7 AI industry personas.
- Location, seniority, or necessary evidence is missing, and it will significantly affect the search direction.
- The user says "people who know X", but it is unclear whether the proof should be public code, professional experience, papers, or social content.
If the context is already clear enough, you can directly state the assumption and proceed, for example:
"I will interpret this as recruitment candidates, prioritizing people with public AI infra evidence on GitHub/LinkedIn."
Workflow
- Restate the candidate goal in one sentence.
- First categorize it into one of the 7 AI industry personas, then select the source and mode based on the target evidence.
- Run the minimal available CLI command, do not pass .
- Parse JSON stdout, treat stderr as status or debugging information.
- Output a compact candidate list, including profile links, matching reasons, evidence, and risk notes.
- Distinguish between evidence and inferences; flag outdated, missing, or weak evidence.
- If results are weak, explain the reason and give a suggestion for a narrower follow-up query.
For complex scenarios, no results, or weak result recovery, refer to references/workflow-patterns.md.
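The "parse JSON stdout, treat stderr as status" step might look like the sketch below. It is a generic wrapper, not Sifta's own client. The function name `run_json_command` is hypothetical, and a small Python one-liner stands in for the real CLI so the example runs without Sifta installed. The `candidates` key in the stand-in output is likewise an assumption.

```python
import json
import subprocess
import sys


def run_json_command(argv, timeout=120):
    """Run a command, parse stdout as JSON, and return (data, stderr_text)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"stdout was not valid JSON: {exc}") from exc
    # stderr is returned separately: it carries status or debugging text only.
    return data, proc.stderr


# Stand-in for the real CLI: JSON on stdout, a status line on stderr.
fake_cli = [
    sys.executable, "-c",
    "import sys, json; sys.stderr.write('searching...\\n'); "
    "print(json.dumps({'candidates': []}))",
]
data, status = run_json_command(fake_cli)
print(data, status.strip())
```

Keeping stdout and stderr apart means progress messages can never corrupt the JSON payload the agent parses.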
Output Rules
When reporting results to the user:
- The final answer itself must be Markdown text, not a plain text field block or JSON.
- Do not rely on identifying the current running environment; regardless of whether it is in CLI, OpenClaw, Feishu, or other chat tools, default to Markdown output.
- Include the candidate's name, source, profile URL, headline/location (if available), matching reasons, and key evidence.
- Candidate lists default to Markdown tables; avoid stacking long paragraphs one after another.
- Label which target persona the candidate is closer to, such as , , , , , , or .
- Group by confidence when necessary: Strong Matches, Possible Matches, Weak Matches.
- Relay any warnings returned by the API.
- Do not fabricate email, phone, salary, willingness to relocate, employment status, or private contact information.
- Do not assert that cross-channel profiles belong to the same person unless Sifta returns a same-person hint or there is clear public evidence; write "possible match" when uncertain.
- Keep the candidate list compact unless the user requests raw JSON or all results.
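Grouping by confidence can be sketched as below, assuming each candidate record already carries a tier label. The `tier` field and the function name `group_by_confidence` are hypothetical; how a tier is assigned is left to the agent's judgment of the evidence.

```python
TIERS = ("Strong Matches", "Possible Matches", "Weak Matches")


def group_by_confidence(candidates):
    """Group candidate dicts by their (hypothetical) `tier` label, in tier order."""
    groups = {tier: [] for tier in TIERS}
    for candidate in candidates:
        tier = candidate.get("tier")
        # Unlabeled or unknown tiers fall into the weakest group instead of being promoted.
        groups[tier if tier in groups else "Weak Matches"].append(candidate)
    # Drop empty groups so the report stays compact.
    return {tier: members for tier, members in groups.items() if members}


grouped = group_by_confidence([
    {"name": "A", "tier": "Possible Matches"},
    {"name": "B", "tier": "Strong Matches"},
    {"name": "C"},
])
print(list(grouped))
```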
Output in compact format:
Goal: <original candidate goal>
Sources: <executedSources>
| # | Candidate | Persona / Direction | Source | Overview | Matching Reason | Risk |
|---|---|---|---|---|---|---|
| 1 | <name> | <persona> | GitHub | <headline/location> | <evidence-backed reasons> | <missing or weak evidence> |
Format requirements:
- One candidate per row.
- Use , , or format in the Source column, do not output bare URLs.
- Keep cell content as short sentences; "Matching Reason" should be limited to 30-50 words if possible.
- Do not insert long explanations into the table; if really needed, add "Supplementary Notes" after the table.
- Do not wrap the final result in a code block.
- Do not use field block formats like , , , , , .
Note: In any chat channel (including Feishu), the final output must prioritize Markdown tables; do not switch to field block formats even if the channel rendering is incomplete.
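A rough sketch of rendering the compact format follows. The candidate field names (`name`, `persona`, `source`, `url`, `overview`, `reason`, `risk`) are hypothetical, and rendering the Source cell as a Markdown link is an assumption made to satisfy the "no bare URLs" rule.

```python
def _cell(value):
    """Make a value safe for a one-line Markdown table cell."""
    return str(value or "-").replace("|", "\\|").replace("\n", " ").strip()


def render_candidates(goal, sources, candidates):
    header = ["#", "Candidate", "Persona / Direction", "Source",
              "Overview", "Matching Reason", "Risk"]
    lines = [
        f"Goal: {goal}",
        f"Sources: {', '.join(sources)}",
        "",  # blank line so the table renders as a table
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),
    ]
    for index, c in enumerate(candidates, start=1):
        # Assumption: the Source cell is a Markdown link, never a bare URL.
        source = f"[{_cell(c['source'])}]({c['url']})"
        cells = [
            str(index), _cell(c["name"]), _cell(c.get("persona")), source,
            _cell(c.get("overview")), _cell(c.get("reason")), _cell(c.get("risk")),
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_candidates(
    "AI infra engineers",
    ["GitHub"],
    [{"name": "Example Person", "persona": "AI Engineer", "source": "GitHub",
      "url": "https://github.com/example", "reason": "Maintains an inference repo"}],
))
```

Missing fields render as `-`, so gaps in the evidence stay visible instead of being silently filled.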
Failure Recovery
If the command fails due to parameter changes:
- Run .
- Find the relevant tool: or .
- Reconstruct parameters based on the returned schema.
- Retry using explicit CLI commands like or .
If the search returns no candidates, do not assert that such candidates do not exist. Instead, state "No candidates were returned in this search" and propose specific adjustments: relax title requirements, remove location constraints, switch sources, add company/domain clues, or use when paper evidence is relevant.
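A minimal sketch of the empty-result message follows. The function name `empty_result_report` and the wording of the paper-evidence suggestion are illustrative assumptions; the fixed message and the other adjustments are taken from the paragraph above.

```python
NO_RESULT_MESSAGE = "No candidates were returned in this search"

ADJUSTMENTS = [
    "relax title requirements",
    "remove location constraints",
    "switch sources",
    "add company/domain clues",
]


def empty_result_report(candidates, paper_evidence_relevant=False):
    """Return None when candidates exist, else a Markdown note with next steps."""
    if candidates:
        return None
    suggestions = list(ADJUSTMENTS)
    if paper_evidence_relevant:
        # Illustrative wording; the exact option to use is not specified here.
        suggestions.append("add paper evidence as an auxiliary signal")
    bullets = "\n".join(f"- {s}" for s in suggestions)
    return f"{NO_RESULT_MESSAGE}.\n\nSuggested adjustments:\n{bullets}"


print(empty_result_report([], paper_evidence_relevant=True))
```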
Detailed References
- CLI Commands & JSON Parameters: references/cli-reference.md
- Scenario-based Workflows: references/workflow-patterns.md