sifta-search

Sifta People Search

Sifta is a candidate search tool for vertical recruiting and sourcing in the AI industry. It is not a tool for general web search, company intelligence, sales leads, outreach, ATS, or KOL collaboration.
The goal of using Sifta is to turn a hiring profile into a compact, explainable list of candidates backed by public evidence. Keep the workflow focused: search public candidate sources, summarize why each candidate matches, flag uncertainty, and never fabricate private information.

Target Personas

Only AI-industry candidate personas are currently covered:
  • AI engineers and developers: AI agents, LLMs, video foundation models, speech models, AI infra, application-layer development.
  • Embodied intelligence talent: engineers or roles related to robotics, autonomous driving, perception, control, simulation, VLA, or embodied models.
  • Solopreneurs: independent developers, one-person companies, and solo builders with self-built products or verifiable public work.
  • Founders: founders, co-founders, and people running their own product or AI startup.
  • Product managers: AI product managers, ByteDance product managers, and PMs associated with Qwen or other large-model teams.
  • GTM/GMT marketing: overseas-market marketing, growth, AI product marketing, developer marketing, community growth.
  • Research-focused talent: people focused on academic work and publications, with arXiv or Google Scholar evidence, who could become recruiting candidates.
If the user asks for people outside these personas, first explain that this skill only covers AI-industry recruiting and sourcing, then ask whether they want to recast the request as one of the personas above.

Environment & Authentication

Sifta currently runs in CLI mode.
Before the first Sifta CLI call in each session, run `sifta-cli status`. If `sifta-cli` is missing, install it first with `npm install -g @sifta/cli@latest`. If you are not authenticated, or unsure of the command syntax, read references/cli-reference.md. The CLI can reach the server using either the local configuration from `sifta-cli auth` or `SIFTA_API_KEY`.
Do not silently open a browser, and do not ask for server-side vendor keys.
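
The session-start check described above can be sketched as a small helper. This is only an illustration: `preflight_commands` is a hypothetical name, and the sketch decides which commands to run but does not interpret the output of `sifta-cli status`, whose format is not described here.

```python
import shutil
from typing import List, Optional

# Both commands are quoted from this document.
INSTALL_CMD = ["npm", "install", "-g", "@sifta/cli@latest"]
STATUS_CMD = ["sifta-cli", "status"]


def preflight_commands(cli_path: Optional[str]) -> List[List[str]]:
    """Commands to run before the first Sifta CLI call in a session.

    cli_path is the result of shutil.which("sifta-cli"); None means the
    binary was not found on PATH, so the install command is queued first.
    """
    commands: List[List[str]] = []
    if cli_path is None:
        commands.append(INSTALL_CMD)
    commands.append(STATUS_CMD)
    return commands


def session_preflight() -> List[List[str]]:
    """Look up the CLI on PATH and return the commands to run, in order."""
    return preflight_commands(shutil.which("sifta-cli"))
```

Returning the commands instead of running them keeps the decision visible to the agent, which can then report an install step to the user rather than performing it silently.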

Command Selection

Choose the smallest command that achieves the goal:

| Intent | Preferred command |
|---|---|
| Candidate search | `sifta-cli find-people --query "<query>" --checkpoint "<original user goal>" --target-count 10` |
| Title, skill, location, or company can be clearly extracted | `sifta-cli find-people --query "<query>" --checkpoint "<original user goal>" --filter '{...}'` |
| Enrich a known profile or handle | `sifta-cli enrich-people --people '[...]'` |
| CLI/API schema has changed | Run `sifta-cli tools` to view the schema, then switch to the current explicit command |

Parse JSON from stdout by default. Do not use `--pretty` for agent parsing; it is only meant for human reading.
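
As a rough sketch of how an agent might assemble the `find-people` call without `--pretty`, the helper below builds the argument list from the flags named in this section. `build_find_people_args` is a hypothetical helper, and two things are assumptions: that `--target-count` and `--filter` may be combined in one call, and that `--filter` accepts any JSON object. Check references/cli-reference.md or `sifta-cli tools` for the real schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_find_people_args(
    query: str,
    checkpoint: str,
    target_count: Optional[int] = None,
    filters: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Assemble argv for `sifta-cli find-people`; --pretty is never added."""
    args = [
        "sifta-cli", "find-people",
        "--query", query,
        "--checkpoint", checkpoint,
    ]
    if target_count is not None:
        args += ["--target-count", str(target_count)]
    if filters:
        # Assumption: --filter takes a JSON object serialized as one argument.
        args += ["--filter", json.dumps(filters, ensure_ascii=False)]
    return args
```

Passing the result as a list to a process runner avoids shell quoting problems with the JSON argument.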

Source Strategy

Choose sources according to the recruiting evidence you need:
  • The default sources are GitHub and LinkedIn.
  • For AI engineers, R&D roles, developers, and other hands-on engineering talent, always pass `--sources '["github"]'` explicitly. Chinese phrasings such as "研发岗" (R&D role), "开发工程师" (development engineer), and "模型/infra/应用层工程师" (model/infra/application-layer engineer) are also treated as code-focused candidates. Do not add LinkedIn for career-background verification unless the user explicitly asks for a LinkedIn search.
  • For embodied intelligence talent, independent developers, and founders: prefer `--sources '["github"]'` when the request emphasizes code, products, or public work; use LinkedIn when it emphasizes lab, company experience, or team background.
  • For product managers, GTM/GMT marketing, and other personas tied closely to company or team background, prefer `--sources '["linkedin"]'`.
  • Once the user explicitly specifies sources, every retry, failure recovery, and alternative command must keep the same `--sources`; do not fall back to the default sources, or GitHub results will be mixed in.
  • LinkedIn searches are executed server-side through Exa People Search, and the underlying request must use `category: "people"`. Write the query as a natural-language semantic query built from role, location, company, and domain terms, rather than appending web-search qualifiers such as `LinkedIn profile only`.
  • Use `--mode research` only when paper evidence helps identify candidates. arXiv and Google Scholar are supporting evidence, not the primary source of final candidates.
  • Twitter/X and Xiaohongshu are optional public-signal sources. Use them only when the user provides a known handle, asks for public content signals, or the Sifta API explicitly exposes these sources; do not treat KOL collaboration as a main path of this skill.
If a source the user requested is not supported, or the API result shows it was not executed, say so explicitly.
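
The source rules above can be condensed into a small lookup, sketched here under stated assumptions: the persona keys (`ai_engineer`, `embodied`, and so on) are hypothetical labels invented for this example, and returning `None` means "omit `--sources` and let the CLI use its defaults".

```python
import json
from typing import List, Optional

# Hypothetical persona keys; the document names the personas only in prose.
CODE_FIRST = {"ai_engineer"}
EVIDENCE_DEPENDENT = {"embodied", "solopreneur", "founder"}
BACKGROUND_FIRST = {"product_manager", "gtm"}


def choose_sources(
    persona: str,
    user_sources: Optional[List[str]] = None,
    emphasizes_code: bool = True,
) -> Optional[List[str]]:
    """Return the value for --sources, or None to leave the flag off."""
    if user_sources:
        # An explicit user choice wins and must be reused on every retry.
        return list(user_sources)
    if persona in CODE_FIRST:
        return ["github"]
    if persona in EVIDENCE_DEPENDENT:
        return ["github"] if emphasizes_code else ["linkedin"]
    if persona in BACKGROUND_FIRST:
        return ["linkedin"]
    return None


def sources_flag(sources: Optional[List[str]]) -> List[str]:
    """Render the flag as argv items, e.g. ['--sources', '["github"]']."""
    return ["--sources", json.dumps(sources)] if sources else []
```

Keeping the chosen list in one variable and passing it through `sources_flag` on every attempt is one simple way to make sure a retry does not drift back to the defaults.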

Query Plan Boundaries

The skill / agent is responsible for turning the user's original query into a search plan:
  • Always pass the user's original input for the current turn verbatim in `--checkpoint`; `--query` holds only compact, connector-facing search terms.
  • Do not put paraphrases, translations, summaries, or filtered search terms in `--checkpoint`; it must preserve exactly what the user said.
  • In multi-turn conversations, `--checkpoint` carries the user message that triggered this search. If context must be kept, merge the necessary context into `--query` or `filter`; do not overwrite the original input.
  • GitHub queries should not contain source or explanation phrases such as `GitHub developers in ...` or `clear evidence from GitHub`.
  • When a job title can be clearly identified, put it in `filter.titles`.
  • When a skill or topic can be clearly identified, put it in `filter.skills`.
  • When a location can be clearly identified, put it in `filter.locations`.
  • When a company preference can be clearly identified, put it in `filter.companies`.
  • Founder, solopreneur, GTM/GMT, and research-focused personas often lack a standard title; keep evidence requirements such as product, growth, papers, self-built projects, and team background in `query` rather than forcing them into the filter.
  • Company exclusion is not supported. If the user asks to exclude a company, keep it in `query` as a soft constraint and verify it manually when explaining the results.
Do not force uncertain inferences into the filter. When unsure, keep them in `query` or ask a clarifying question first.
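
A minimal sketch of these boundaries follows, assuming a hypothetical `SearchPlan` container and `make_plan` helper. Only the four filter keys named above are kept, empty or unrecognized keys are dropped, and a company exclusion is appended to the query text. The wording "excluding ..." is an assumption; the document only says to keep the constraint in `query`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

# The four filter keys named in this section.
FILTER_KEYS = ("titles", "skills", "locations", "companies")


@dataclass
class SearchPlan:
    checkpoint: str  # the user's words for this turn, verbatim
    query: str       # compact, connector-facing search terms
    filter: Dict[str, List[str]] = field(default_factory=dict)


def make_plan(
    user_text: str,
    query: str,
    confident: Dict[str, List[str]],
    excluded_companies: Sequence[str] = (),
) -> SearchPlan:
    """Build a plan that keeps only clearly identified values in the filter."""
    plan_filter = {k: list(confident[k]) for k in FILTER_KEYS if confident.get(k)}
    if excluded_companies:
        # Exclusion is unsupported as a filter, so it stays a soft constraint.
        query = f"{query} excluding {', '.join(excluded_companies)}"
    return SearchPlan(checkpoint=user_text, query=query, filter=plan_filter)
```

Because `checkpoint` is set directly from `user_text` and never rewritten, the original wording survives even when the query is reshaped.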

Ambiguity Handling

When a request could point to different targets, and a wrong search would waste time or API quota, ask one short question first.
Common ambiguities:
  • A company, product, or project name has more than one meaning.
  • The user mixes recruiting candidates with creators, customers, companies, or sales leads.
  • The people the user describes do not fall under the current 7 AI-industry personas.
  • Location, seniority, or required evidence is missing, and the gap would clearly change the search direction.
  • The user says "people who know X", but it is unclear whether the proof should be public code, career history, papers, or social content.
If the context is already clear enough, state the assumption and proceed, for example: "I will treat this as a recruiting search and prioritize people with public AI infra evidence on GitHub/LinkedIn."

Workflow

  1. Restate the candidate goal in one sentence.
  2. Classify it into one of the 7 AI-industry personas, then choose sources and mode based on the evidence needed.
  3. Run the smallest workable CLI command, without `--pretty`.
  4. Parse JSON from stdout; treat stderr as status or debugging information.
  5. Output a compact candidate list with profile links, match reasons, evidence, and risk notes.
  6. Separate evidence from inference, and flag outdated, missing, or weak evidence.
  7. If the results are weak, explain why and suggest one narrower follow-up query.
For complex scenarios, empty results, or weak-result recovery, see references/workflow-patterns.md.
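
Steps 3 and 4 might look like the sketch below. It assumes only that stdout is a single JSON document; the shape of that document is not specified here, so the helper returns it untouched. `parse_cli_output` and `run_cli` are hypothetical names.

```python
import json
import subprocess
from typing import Any, List, Tuple


def parse_cli_output(stdout: str, stderr: str) -> Tuple[Any, List[str]]:
    """Parse JSON from stdout and keep non-empty stderr lines as status notes.

    Raises json.JSONDecodeError if stdout is not valid JSON, for example
    when the command failed before producing a result.
    """
    data = json.loads(stdout)
    notes = [line for line in stderr.splitlines() if line.strip()]
    return data, notes


def run_cli(args: List[str]) -> Tuple[Any, List[str]]:
    """Run a Sifta CLI command given as an argv list (no --pretty)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return parse_cli_output(proc.stdout, proc.stderr)
```

Keeping stderr out of the parsed result means progress messages cannot corrupt the JSON, while still being available for debugging.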

Output Rules

When reporting results to the user:
  • The final answer itself must be Markdown text, not a plain-text field block and not JSON.
  • Do not depend on detecting the runtime environment; whether in the CLI, OpenClaw, Feishu, or another chat tool, default to Markdown output.
  • Include the candidate's name, source, profile URL, headline/location (if available), match reasons, and key evidence.
  • Use a Markdown table by default for list-style candidate results; avoid stacking long paragraphs per candidate.
  • Label which target persona each candidate is closest to, for example `AI Engineer`, `Embodied Intelligence`, `Solopreneur`, `Founder`, `AI PM`, `GTM/GMT`, or `Research-focused Talent`.
  • Group by confidence when needed: strong match, possible match, weak match.
  • Pass on any warnings returned by the API.
  • Do not fabricate email addresses, phone numbers, salary, willingness to relocate, employment status, or private contact details.
  • Do not claim that profiles on different channels belong to the same person unless Sifta returns a same-person hint or there is clear public evidence; when unsure, write "possible match".
  • Keep the candidate list compact unless the user asks for raw JSON or all results.
Use this compact format:
Goal: <original candidate goal> Sources: <executedSources>

| # | Candidate | Persona / Direction | Source | Overview | Matching Reason | Risk |
|---|---|---|---|---|---|---|
| 1 | <name> | <persona> | [GitHub](url) | <headline/location> | <evidence-backed reasons> | <missing or weak evidence> |

Format requirements:
  • One candidate per row.
  • In the Source column, use the form `[GitHub](url)`, `[LinkedIn](url)`, or `[Profile](url)`; do not output bare URLs.
  • Keep cell content to short phrases; "Matching Reason" should ideally stay within 30-50 characters.
  • Do not pack long explanations into the table; if one is really needed, add a "Supplementary Notes" section after the table.
  • Do not wrap the final result in a code block.
  • Do not use a per-candidate field-block format such as `Candidate:`, `Persona:`, `Source:`, `Overview:`, `Matching Reason:`, `Risk:`.
Note: In any chat channel (including Feishu), the final output must still be a Markdown table first; do not switch to the field-block format even if the channel renders the table incompletely.
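
One way to produce a table in this shape is sketched below. The candidate dictionary keys (`name`, `persona`, `source_label`, `url`, `overview`, `reason`, `risk`) are assumptions made for the example, not the Sifta response schema; map the real fields onto them after inspecting the JSON.

```python
from typing import Dict, List

HEADER = ["#", "Candidate", "Persona / Direction", "Source",
          "Overview", "Matching Reason", "Risk"]


def cell(text: str) -> str:
    """Keep a value on one table row: escape pipes and flatten newlines."""
    return str(text).replace("|", "\\|").replace("\n", " ").strip()


def render_table(candidates: List[Dict[str, str]]) -> str:
    """Render candidates as a Markdown table, one candidate per row."""
    lines = ["| " + " | ".join(HEADER) + " |", "|" + "---|" * len(HEADER)]
    for index, cand in enumerate(candidates, start=1):
        # Link form such as [GitHub](url); bare URLs are never emitted.
        source = f"[{cand['source_label']}]({cand['url']})"
        row = [
            str(index), cand["name"], cand["persona"], source,
            cand.get("overview", ""), cand["reason"], cand.get("risk", ""),
        ]
        lines.append("| " + " | ".join(cell(value) for value in row) + " |")
    return "\n".join(lines)
```

The returned string is meant to be sent as-is, without a surrounding code fence, so that chat clients render it as a table.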

Failure Recovery

If a command fails because its parameters have changed:
  1. Run `sifta-cli tools`.
  2. Find the relevant tool: `find_people` or `enrich_people`.
  3. Rebuild the parameters from the returned schema.
  4. Retry with the explicit CLI commands `find-people` or `enrich-people`.
If a search returns no candidates, do not claim that no such candidates exist. Say "this search returned no candidates" and propose specific adjustments: relax the title, drop the location, switch sources, add company or domain clues, or use `--mode research` when paper evidence is relevant.
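
The empty-result guidance can be turned into a checklist generator, sketched here as one possible reading. `empty_result_suggestions` is a hypothetical helper; which suggestions apply, and in what order, is a judgment call rather than something this document prescribes.

```python
from typing import Dict, List, Optional

NO_RESULT_NOTE = "This search returned no candidates."


def empty_result_suggestions(
    filters: Dict[str, List[str]],
    mode: Optional[str] = None,
    paper_evidence_relevant: bool = False,
) -> List[str]:
    """List concrete adjustments to propose after an empty search."""
    suggestions: List[str] = []
    if filters.get("titles"):
        suggestions.append("relax the title filter")
    if filters.get("locations"):
        suggestions.append("drop the location filter")
    # Always offered as a proposal; explicit user sources still need consent.
    suggestions.append("switch sources")
    if not filters.get("companies"):
        suggestions.append("add company or domain clues")
    if paper_evidence_relevant and mode != "research":
        suggestions.append("retry with --mode research")
    return suggestions
```

Pairing `NO_RESULT_NOTE` with the returned list keeps the report factual: it describes what this search returned, not whether such candidates exist.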

Detailed References

  • CLI commands and JSON parameters: references/cli-reference.md
  • Scenario-based workflows: references/workflow-patterns.md