xhs-search-workflow

XHS Search Workflow

Setup

Run once on a new machine:
bash
skills/xhs-search-workflow/scripts/setup_env.sh
This creates skills/xhs-search-workflow/.venv and installs the Python dependencies.

Cookie Input

Pass the cookie in one of two ways:
  • --cookie "..."
  • --env-file /path/to/.env, where the file contains COOKIES="..."
Add --no-env-proxy when the host's proxy environment variables break network access.
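As an illustration of the two input styles, the sketch below shows one way a script might resolve the cookie: an explicit value wins, otherwise the COOKIES entry is read from the env file. The helper names (read_env_value, load_cookie_string, cookie_to_dict) are hypothetical and are not taken from the skill's scripts, and the parsing rules (plain KEY=VALUE lines, optional surrounding quotes) are assumptions.

```python
from pathlib import Path
from typing import Dict, Optional


def read_env_value(env_path: str, key: str = "COOKIES") -> Optional[str]:
    """Return the value of `key` from a simple KEY=VALUE env file, or None."""
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            value = value.strip()
            # Strip one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            return value
    return None


def load_cookie_string(cookie: Optional[str], env_file: Optional[str]) -> str:
    """Prefer an explicit cookie; fall back to COOKIES in the env file (assumed precedence)."""
    if cookie:
        return cookie
    if env_file:
        value = read_env_value(env_file)
        if value:
            return value
    raise ValueError("no cookie given: pass --cookie or --env-file with COOKIES=...")


def cookie_to_dict(cookie: str) -> Dict[str, str]:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}, ignoring empty segments."""
    pairs = (part.strip().partition("=") for part in cookie.split(";"))
    return {name: value for name, _, value in pairs if name}
```

Calling load_cookie_string(None, "/path/to/.env") would return the raw cookie string, which cookie_to_dict can then turn into a mapping for an HTTP client.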

Main Scripts

  • scripts/search_notes.py: note search (supports advanced filters)
  • scripts/fetch_note_texts.py: extracts note text and image URLs, with optional image download
  • scripts/xhs_full_cli.py: unified entry point for the user, comment, message, homefeed, creator, and no-water (no-watermark) APIs
  • scripts/export_notes.py: exports note data to Excel and/or media files

Typical Commands

1) Search notes

bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/search_notes.py "汇丰银行" \
  --num 10 --sort 0 --note-type 0 --no-env-proxy --json

2) Extract note text + image URLs

bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/fetch_note_texts.py \
  --url-file note_urls.txt --no-env-proxy \
  --timeout 30 --retries 2 --min-interval 4 --max-interval 7 \
  --out note_content.json
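To make the throttling flags concrete, here is a minimal sketch of a serial fetch loop with a randomized pause and bounded retries. It is not the code in fetch_note_texts.py: the function name fetch_serially and the fetch callable are hypothetical, and it assumes that --retries counts extra attempts after the first and that --min-interval/--max-interval are seconds. The per-request timeout is left to the fetch callable.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def fetch_serially(
    urls: Iterable[str],
    fetch: Callable[[str], dict],
    retries: int = 2,
    min_interval: float = 4.0,
    max_interval: float = 7.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Optional[dict]]:
    """Fetch URLs one at a time, pausing a random interval between requests."""
    results: List[Optional[dict]] = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized gap between notes so requests are not evenly spaced.
            sleep(random.uniform(min_interval, max_interval))
        result: Optional[dict] = None
        for attempt in range(retries + 1):  # first try plus `retries` retries
            try:
                result = fetch(url)
                break
            except Exception:
                if attempt < retries:
                    # Wait before retrying the same URL.
                    sleep(random.uniform(min_interval, max_interval))
        results.append(result)  # None marks a URL that failed every attempt
    return results
```

Passing sleep as a parameter keeps the loop testable without real delays; in normal use the default time.sleep applies.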

3) Download note images while extracting

bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/fetch_note_texts.py \
  --url-file note_urls.txt --no-env-proxy \
  --download-images --image-dir xhs_images \
  --timeout 30 --retries 2 --min-interval 4 --max-interval 7 \
  --out note_content.json

4) Full API CLI examples

Search users:
bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/xhs_full_cli.py \
  --env-file .env --no-env-proxy search-users --query "汇丰银行" --num 10
Get note comments:
bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/xhs_full_cli.py \
  --env-file .env --no-env-proxy note-comments \
  --url "https://www.xiaohongshu.com/explore/<note_id>?xsec_token=<token>"
Get creator posted notes:
bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/xhs_full_cli.py \
  --env-file .env --no-env-proxy creator-posted
No-watermark URL conversion:
bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/xhs_full_cli.py \
  --no-env-proxy no-water-img --img-url "https://..."

5) Export Excel/media

From query:
bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/export_notes.py \
  --query "汇丰银行" --num 10 --save all \
  --excel xhs_notes.xlsx --media-dir xhs_media --no-env-proxy
From URL file:
bash
skills/xhs-search-workflow/.venv/bin/python \
  skills/xhs-search-workflow/scripts/export_notes.py \
  --url-file note_urls.txt --save excel --excel xhs_notes.xlsx --no-env-proxy

xhs_full_cli.py Subcommands

  • user-info --user-id <id>
  • user-self-info
  • user-self-info2
  • user-posts --user-url <url>
  • user-likes --user-url <url>
  • user-collects --user-url <url>
  • note-info --url <url>
  • note-comments --url <url>
  • search-keyword --word <kw>
  • search-users --query <kw> --num <n>
  • messages-unread
  • messages-mentions
  • messages-likes
  • messages-connections
  • homefeed-channels
  • homefeed-recommend --category <name> --num <n>
  • creator-posted
  • no-water-video --note-id <id>
  • no-water-img --img-url <url>

Offline/Portable Design

  • The skill bundles the signing JS in assets/js/.
  • The skill bundles an offline copy of crypto-js at assets/js/vendor/crypto-js.js.
  • The skill does not import apis/ or xhs_utils/ from the original repository.
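A quick way to confirm that a copied skill directory is still self-contained is to check that the bundled files are present. The sketch below is an assumption-laden illustration, not the check performed by the skill itself: missing_assets is a hypothetical helper, and REQUIRED_ASSETS lists only the one file path this document names explicitly.

```python
from pathlib import Path
from typing import Iterable, List

# Only this path is named in the document; extend the tuple as needed.
REQUIRED_ASSETS = ("assets/js/vendor/crypto-js.js",)


def missing_assets(skill_dir: str, required: Iterable[str] = REQUIRED_ASSETS) -> List[str]:
    """Return the relative paths from `required` that are not files under skill_dir."""
    root = Path(skill_dir)
    return [rel for rel in required if not (root / rel).is_file()]
```

An empty list from missing_assets("skills/xhs-search-workflow") would mean every listed asset was found.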

Validation

Run after edits:
bash
skills/xhs-search-workflow/.venv/bin/python \
  "$CODEX_HOME/skills/.system/skill-creator/scripts/quick_validate.py" \
  skills/xhs-search-workflow
Basic smoke tests:
bash
skills/xhs-search-workflow/.venv/bin/python skills/xhs-search-workflow/scripts/xhs_full_cli.py --help
skills/xhs-search-workflow/.venv/bin/python skills/xhs-search-workflow/scripts/export_notes.py --help
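The two smoke tests can also be driven from Python, which makes it easy to apply the UTF-8 environment variables recommended for Windows in the execution notes. This is a sketch under stated assumptions: smoke_commands, utf8_env, and run_smoke_tests are hypothetical names, and the interpreter path follows the POSIX virtual-environment layout used throughout this document (a Windows venv places the interpreter elsewhere).

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Mapping, Optional

# The two scripts exercised by the smoke tests in this document.
SCRIPTS = ("xhs_full_cli.py", "export_notes.py")


def smoke_commands(skill_dir: str) -> List[List[str]]:
    """Build one `python <script> --help` command per script."""
    root = Path(skill_dir)
    python = root / ".venv" / "bin" / "python"
    return [[str(python), str(root / "scripts" / name), "--help"] for name in SCRIPTS]


def utf8_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and force UTF-8 I/O for the child interpreter."""
    env = dict(os.environ if base is None else base)
    env["PYTHONUTF8"] = "1"
    env["PYTHONIOENCODING"] = "utf-8"
    return env


def run_smoke_tests(skill_dir: str) -> bool:
    """Return True only if every --help invocation exits with status 0."""
    for cmd in smoke_commands(skill_dir):
        proc = subprocess.run(cmd, env=utf8_env(), capture_output=True, text=True)
        if proc.returncode != 0:
            return False
    return True
```

run_smoke_tests raises FileNotFoundError if the virtual environment has not been created yet, which is itself a hint to run the setup script first.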

Execution Notes

  • Prefer skills/xhs-search-workflow/.venv/bin/python over the system python.
  • If the environment has changed, rerun scripts/setup_env.sh before debugging.
  • Keep assets/js/vendor/crypto-js.js with the skill for cross-machine offline use.
  • Scripts force UTF-8 stdout/stderr; on Windows, also set PYTHONUTF8=1 and PYTHONIOENCODING=utf-8.
  • scripts/xhs_client.py auto-checks the JS assets and syncs assets/js/static/xhs_xray_pack{1,2}.js for runtime compatibility.
  • For xhs_full_cli.py, place global flags before the subcommand:
    • Correct: xhs_full_cli.py --env-file .env --no-env-proxy <subcommand> ...
    • Wrong: xhs_full_cli.py <subcommand> ... --env-file .env
  • messages-mentions, messages-likes, and messages-connections can return very large JSON; prefer writing it to a file with --out.
  • fetch_note_texts.py defaults to serial throttling and retries to reduce hangs and risk-control issues.
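The ordering rule for global flags is the usual behaviour of a parser that defines shared options on the top-level parser and everything else on subparsers. The sketch below reproduces it with Python's argparse. It assumes, without confirmation from the source, that xhs_full_cli.py is structured this way; build_parser and try_parse are hypothetical names, and only flags shown in this document are modelled.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Top-level parser owns the global flags; subparsers own the rest."""
    parser = argparse.ArgumentParser(prog="xhs_full_cli.py")
    parser.add_argument("--env-file")
    parser.add_argument("--no-env-proxy", action="store_true")
    sub = parser.add_subparsers(dest="command", required=True)
    users = sub.add_parser("search-users")
    users.add_argument("--query", required=True)
    users.add_argument("--num", type=int, default=10)
    return parser


def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage error and exits on invalid input.
        return None


# Global flags before the subcommand: accepted.
good = try_parse(["--env-file", ".env", "--no-env-proxy", "search-users", "--query", "x"])
# Global flag after the subcommand: the subparser does not know --env-file.
bad = try_parse(["search-users", "--query", "x", "--env-file", ".env"])
```

In this layout the subparser only recognises its own options, so a trailing --env-file is reported as an unrecognized argument rather than being applied globally.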

Troubleshooting

See references/troubleshooting.md.