xiaohongshu-media-collector
XHS Media Collector
Follow shared release-shell rules in:
- postplus-shared release-shell rules
Legacy alias: xhs-media-collector.
Use this skill when the user wants to:
- download Xiaohongshu cover images from validated datasets
- turn normalized XHS research output into a local media manifest
- prepare local image assets for XHS card composition or downstream review
Read these references before implementation:
- skills/20-research/xhs-media-collector/references/validated-surfaces.md
- skills/20-research/xhs-media-collector/references/manifest-schema.md
Default posture
Only collect asset URLs that are already exposed by a validated upstream dataset.
Supported by default:
- coverUrl from normalized XHS post datasets
Not supported by default:
- direct Xiaohongshu video downloader output
Do not pretend video collection works when the validated downloader returns 404 Not found data.
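As a minimal sketch of this default posture, the filter below keeps only posts whose coverUrl is a remote http(s) URL and turns them into manifest entries. Only coverUrl comes from the text above; the function name and the other field names (postId, assetType, remoteUrl) are assumptions for illustration, and the real schema is whatever references/manifest-schema.md defines.

```javascript
// Hypothetical sketch: the real manifest schema lives in manifest-schema.md.
// Only `coverUrl` is taken from the skill text; every other name is assumed.
function buildCoverManifest(posts, limit = 3) {
  const entries = [];
  for (const post of posts) {
    if (entries.length >= limit) break;
    const url = typeof post.coverUrl === "string" ? post.coverUrl.trim() : "";
    // Keep only remote http(s) URLs already exposed by the upstream dataset.
    if (!/^https?:\/\//i.test(url)) continue;
    entries.push({ postId: post.id ?? null, assetType: "cover", remoteUrl: url });
  }
  return entries;
}

const samplePosts = [
  { id: "a1", coverUrl: "https://example.com/a1.jpg" },
  { id: "a2", coverUrl: "" }, // no cover exposed: skipped
  { id: "a3", videoUrl: "https://example.com/a3.mp4" }, // video only: not collected
];
console.log(buildCoverManifest(samplePosts));
```

Video URLs are ignored rather than collected, which mirrors the rule that video is not a supported surface by default.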
What this skill is for
- building a local manifest from normalized XHS datasets
- downloading cover or image assets from remote URLs
- verifying that downloaded files exist and are non-empty
What this skill is not for
- discovering note URLs
- extracting post metadata
- downloading videos from note URLs by default
Failure posture
- fail if the input dataset contains no downloadable remote image URLs
- fail if the requested asset type is video
- fail if a download returns a non-2xx response
- keep the manifest as the single source of truth for downloaded assets
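The first three failure rules can be sketched as two small guard functions that throw instead of continuing. This is an illustrative sketch only: the function names, the remoteUrl field, and the error messages are assumptions, not the behavior of the shipped scripts.

```javascript
// Hypothetical guards for the failure posture; all names are illustrative.
function assertCollectable(entries, assetType) {
  if (assetType === "video") {
    throw new Error("asset type 'video' is not supported by default");
  }
  const hasRemoteImage = entries.some((e) => /^https?:\/\//i.test(e.remoteUrl ?? ""));
  if (!hasRemoteImage) {
    throw new Error("input dataset contains no downloadable remote image URLs");
  }
}

function assertDownloadOk(status, url) {
  // Any status outside 200-299 is treated as a hard failure.
  if (status < 200 || status > 299) {
    throw new Error(`download failed with HTTP ${status}: ${url}`);
  }
}
```

Throwing early keeps a partial run from being mistaken for a successful collection.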
Release-Shell Execution Contract
- keep media manifests, download reports, and intermediate verification files under <work-folder>/.postplus/xiaohongshu-media-collector/
- keep only final downloaded user-facing assets outside .postplus/
- start with a bounded first pass, usually 3-10 cover images before broader pulls
- fail fast if a download fails or the requested asset surface is not validated instead of pretending collection succeeded
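One way to keep these rules in a single place is a pair of helpers for the internal directory and the first-pass bound. This is a sketch under stated assumptions: the helper names are hypothetical, paths are assumed to use POSIX-style separators, and the default of 3 with a cap of 10 is just one reading of the suggested 3-10 range.

```javascript
// Hypothetical helpers; only the directory layout comes from the contract above.
const INTERNAL_DIR = ".postplus/xiaohongshu-media-collector";

function internalPath(workFolder, fileName) {
  // Strip trailing slashes so the joined path has no doubled separators.
  const base = workFolder.replace(/\/+$/, "");
  return `${base}/${INTERNAL_DIR}/${fileName}`;
}

function firstPassLimit(requested = 3, cap = 10) {
  // Bound the first pass: at least 1 asset, at most `cap`.
  return Math.max(1, Math.min(requested, cap));
}

console.log(internalPath("/tmp/work/", "manifest.json"));
console.log(firstPassLimit(50));
```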
Main scripts
- scripts/build_xhs_media_manifest.mjs
- scripts/download_xhs_media_assets.mjs
- scripts/verify_xhs_media_manifest.mjs
Minimal workflow
```bash
# 1. Build a bounded manifest from the normalized dataset
node "${CLAUDE_SKILL_DIR}/scripts/build_xhs_media_manifest.mjs" \
  --input <work-folder>/.postplus/xhs-normalized.json \
  --limit 3 \
  --output <work-folder>/.postplus/xhs-media-manifest.json

# 2. Download the assets listed in the manifest and write a report
node "${CLAUDE_SKILL_DIR}/scripts/download_xhs_media_assets.mjs" \
  --manifest <work-folder>/.postplus/xhs-media-manifest.json \
  --output-dir <work-folder>/.postplus/xhs-media-assets \
  --output <work-folder>/.postplus/xhs-media-download-report.json

# 3. Verify the download report
node "${CLAUDE_SKILL_DIR}/scripts/verify_xhs_media_manifest.mjs" \
  --manifest <work-folder>/.postplus/xhs-media-download-report.json
```
Good output
Return:
- manifest path
- downloaded file count
- failed asset count
- stable local file paths for downstream use
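The four return values could be derived from a download report along these lines. The report shape used here (manifestPath, assets, status, localPath) is an assumption made for the sketch; the actual report format is whatever scripts/download_xhs_media_assets.mjs writes.

```javascript
// Hypothetical report shape: { manifestPath, assets: [{ status, localPath }] }.
function summarizeReport(report) {
  const downloaded = report.assets.filter((a) => a.status === "downloaded");
  return {
    manifestPath: report.manifestPath,
    downloadedCount: downloaded.length,
    failedCount: report.assets.length - downloaded.length,
    localPaths: downloaded.map((a) => a.localPath),
  };
}

const sampleReport = {
  manifestPath: "work/.postplus/xhs-media-manifest.json",
  assets: [
    { status: "downloaded", localPath: "work/.postplus/xhs-media-assets/a1.jpg" },
    { status: "failed", localPath: null },
  ],
};
console.log(summarizeReport(sampleReport));
```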