xiaohongshu-media-collector

XHS Media Collector

Follow the shared release-shell rules in:
  • postplus-shared release-shell rules
Legacy alias: xhs-media-collector.
Use this skill when the user wants to:
  • download Xiaohongshu cover images from validated datasets
  • turn normalized XHS research output into a local media manifest
  • prepare local image assets for XHS card composition or downstream review
Read these references before implementation:
  • skills/20-research/xhs-media-collector/references/validated-surfaces.md
  • skills/20-research/xhs-media-collector/references/manifest-schema.md

Default posture

Only collect asset URLs that are already exposed by a validated upstream dataset.
Supported by default:
  • coverUrl from normalized XHS post datasets
Not supported by default:
  • direct Xiaohongshu video downloader output
Do not pretend video collection works when the validated downloader returns 404 Not found data.
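
The default posture above can be sketched as a small selection step. This is a minimal illustration, not the skill's actual implementation: the function name, the `id` field, and the returned entry shape are assumptions, and only the `coverUrl` field name comes from the text above.

```javascript
// Sketch only: `selectCoverAssets`, `post.id`, and the entry shape are
// hypothetical. Only the `coverUrl` field name is taken from the document.
function selectCoverAssets(posts, assetType = "cover") {
  if (assetType === "video") {
    // Video is not a validated surface, so refuse instead of guessing.
    throw new Error("video assets are not supported by default");
  }
  const assets = [];
  for (const post of posts) {
    const url = post.coverUrl;
    // Keep only remote http(s) URLs that the dataset already exposes.
    if (typeof url === "string" && /^https?:\/\//i.test(url)) {
      assets.push({ postId: post.id, type: "cover", url });
    }
  }
  if (assets.length === 0) {
    throw new Error("input dataset contains no downloadable remote image URLs");
  }
  return assets;
}
```

Posts without a `coverUrl`, or with a non-remote value, are skipped rather than repaired, so the manifest never contains a URL the upstream dataset did not provide.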

What this skill is for

  • building a local manifest from normalized XHS datasets
  • downloading cover or image assets from remote URLs
  • verifying that downloaded files exist and are non-empty
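
The last item, checking that downloaded files exist and are non-empty, might look like the sketch below. The `localPath` field and the injected `statFn` parameter are assumptions made for illustration; in a real script `statFn` would typically be `statSync` from `node:fs`.

```javascript
// Sketch only: `findBadFiles`, `entry.localPath`, and `statFn` are
// hypothetical names. `statFn` is expected to behave like fs.statSync:
// return an object with isFile() and size, or throw if the path is missing.
function findBadFiles(entries, statFn) {
  const problems = [];
  for (const entry of entries) {
    try {
      const info = statFn(entry.localPath);
      if (!info.isFile() || info.size === 0) {
        problems.push({ localPath: entry.localPath, reason: "empty or not a regular file" });
      }
    } catch (error) {
      // Any stat failure is treated as a missing or unreadable file.
      problems.push({ localPath: entry.localPath, reason: "missing or unreadable" });
    }
  }
  return problems;
}
```

An empty result means every listed file passed. Passing the stat function in keeps the check easy to exercise without touching the disk.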

What this skill is not for

  • discovering note URLs
  • extracting post metadata
  • downloading videos from note URLs by default

Failure posture

  • fail if the input dataset contains no downloadable remote image URLs
  • fail if the requested asset type is video
  • fail if a download returns a non-2xx response
  • keep the manifest as the single source of truth for downloaded assets
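
As a rough illustration of the non-2xx rule, a download step could fail loudly as shown here. This is a sketch under assumptions: `downloadAsset` and its result shape are hypothetical, and `fetchFn` defaults to the global `fetch` available in Node.js 18 and later.

```javascript
// Sketch only: `downloadAsset` and the returned shape are hypothetical.
// `fetchFn` is injectable so the failure path can be checked offline.
async function downloadAsset(url, fetchFn = fetch) {
  const response = await fetchFn(url);
  if (!response.ok) {
    // response.ok is true only for 2xx status codes.
    throw new Error(`download failed with HTTP ${response.status}: ${url}`);
  }
  const bytes = Buffer.from(await response.arrayBuffer());
  if (bytes.length === 0) {
    throw new Error(`download returned an empty body: ${url}`);
  }
  return { url, bytes };
}
```

Throwing instead of returning a partial result means the caller has to record the failure before it writes anything to the manifest.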

Release-Shell Execution Contract

  • keep media manifests, download reports, and intermediate verification files under <work-folder>/.postplus/xiaohongshu-media-collector/
  • keep only final downloaded user-facing assets outside .postplus/
  • start with a bounded first pass, usually 3-10 cover images, before broader pulls
  • fail fast if a download fails or the requested asset surface is not validated, instead of pretending collection succeeded
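
One way to apply the contract above is to compute the internal paths and the first-pass limit in one place. The sketch below is illustrative only: the function name and the file names `manifest.json` and `download-report.json` are assumptions, and capping the first pass at 10 is one reading of "usually 3-10". Only the directory name comes from the contract.

```javascript
// Sketch only: `planFirstPass` and both file names are hypothetical.
// The directory name and the upper bound of 10 follow the contract above.
const SKILL_DIR_NAME = "xiaohongshu-media-collector";
const FIRST_PASS_MAX = 10;

function planFirstPass(workFolder, requestedLimit = 3) {
  if (!Number.isInteger(requestedLimit) || requestedLimit < 1) {
    throw new Error("limit must be a positive integer");
  }
  // Drop trailing slashes so the joined paths contain no double separator.
  const root = workFolder.replace(/\/+$/, "");
  const internalDir = `${root}/.postplus/${SKILL_DIR_NAME}`;
  return {
    limit: Math.min(requestedLimit, FIRST_PASS_MAX),
    internalDir,
    manifestPath: `${internalDir}/manifest.json`,
    reportPath: `${internalDir}/download-report.json`,
  };
}
```

A broader pull would then be a second, explicit call after the first pass has been verified, not a larger default.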

Main scripts

  • scripts/build_xhs_media_manifest.mjs
  • scripts/download_xhs_media_assets.mjs
  • scripts/verify_xhs_media_manifest.mjs

Minimal workflow

```bash
node ${CLAUDE_SKILL_DIR}/scripts/build_xhs_media_manifest.mjs \
  --input <work-folder>/.postplus/xhs-normalized.json \
  --limit 3 \
  --output <work-folder>/.postplus/xhs-media-manifest.json

node ${CLAUDE_SKILL_DIR}/scripts/download_xhs_media_assets.mjs \
  --manifest <work-folder>/.postplus/xhs-media-manifest.json \
  --output-dir <work-folder>/.postplus/xhs-media-assets \
  --output <work-folder>/.postplus/xhs-media-download-report.json

node ${CLAUDE_SKILL_DIR}/scripts/verify_xhs_media_manifest.mjs \
  --manifest <work-folder>/.postplus/xhs-media-download-report.json
```

Good output

Return:
  • manifest path
  • downloaded file count
  • failed asset count
  • stable local file paths for downstream use
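
The four items above could be assembled from a download report roughly as follows. The report entry fields (`status`, `localPath`) and the function name are assumptions for illustration, not taken from manifest-schema.md.

```javascript
// Sketch only: `summarizeReport` and the entry fields `status` and
// `localPath` are hypothetical; consult manifest-schema.md for real names.
function summarizeReport(manifestPath, entries) {
  const downloaded = entries.filter((entry) => entry.status === "downloaded");
  return {
    manifestPath,
    downloadedCount: downloaded.length,
    // Anything not marked as downloaded counts as a failure.
    failedCount: entries.length - downloaded.length,
    localPaths: downloaded.map((entry) => entry.localPath),
  };
}
```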