feedgrab
feedgrab: Universal Content Grabber
Give it a URL, get back structured Markdown. Supports 8+ platforms with deep extraction.
Trigger
Activate when the user provides a URL and wants its content fetched, extracted, or read:
- `/feedgrab <URL>`
- "Grab this article"
- "Read this tweet/post"
- "抓取这个链接" ("Grab this link")
- Any URL from a supported platform
Prerequisites Check
Before fetching, verify that feedgrab is installed:
```bash
which feedgrab 2>/dev/null || command -v feedgrab 2>/dev/null
```
If it is NOT installed, tell the user:
```
feedgrab is not installed. Run `/feedgrab-setup` or manually:
pip install feedgrab[all]
feedgrab setup
```
Then stop. Do not proceed without feedgrab.
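
The check above can be wrapped in a small helper so the pipeline stops early. This is a minimal sketch: `require_feedgrab` is a hypothetical name, not something feedgrab ships. It relies only on the POSIX `command -v` builtin used in the original check, and the optional argument exists only to make the sketch testable.

```bash
# Hypothetical helper (not part of feedgrab): succeed if the command is on
# PATH, otherwise print the install hint to stderr and return non-zero.
require_feedgrab() {
  if command -v "${1:-feedgrab}" >/dev/null 2>&1; then
    return 0
  fi
  echo "feedgrab is not installed. Run /feedgrab-setup or manually:" >&2
  echo "  pip install feedgrab[all]" >&2
  echo "  feedgrab setup" >&2
  return 1
}

# Usage: stop before fetching when the CLI is missing.
# require_feedgrab || exit 1
```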
Supported Platforms
| Platform | URL Pattern | Method |
|---|---|---|
| X/Twitter | | GraphQL → FxTwitter → Syndication → oEmbed → Jina → Playwright |
| WeChat (微信公众号) | | Playwright JS evaluate → Jina |
| Xiaohongshu (小红书) | | API (xhshow) → Jina → Playwright |
| YouTube | | API metadata + yt-dlp subtitles |
| GitHub | | REST API (Chinese README priority) |
| Feishu/Lark (飞书) | | Open API → Playwright → Jina |
| Bilibili (B站) | | API |
| Telegram | | Telethon |
| RSS | RSS/Atom feed URLs | feedparser |
| Any web page | Any other URL | Jina Reader fallback |
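
The arrows in the Method column read as fallback chains: each fetcher is tried in order until one succeeds. The sketch below illustrates that pattern only. `try_in_order` and the two stand-in fetchers are hypothetical names, and nothing here is taken from feedgrab's actual routing code.

```bash
# Hypothetical illustration of a fallback chain; not feedgrab's implementation.
# Calls each named function with the URL until one exits 0, then prints its output.
try_in_order() {
  url="$1"; shift
  for fetcher in "$@"; do
    if out=$("$fetcher" "$url" 2>/dev/null); then
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "all fetchers failed for $url" >&2
  return 1
}

# Stand-in fetchers for the demo: the first always fails, the second succeeds.
fetch_primary()  { return 1; }
fetch_fallback() { echo "fetched $1 via fallback"; }

# Usage:
# try_in_order "https://example.com/post" fetch_primary fetch_fallback
```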
Pipeline
Step 1: Fetch Content
```bash
feedgrab "$ARGUMENTS"
```
The CLI auto-detects the platform and routes to the appropriate fetcher.
Step 2: Locate Output File
feedgrab saves output to `OUTPUT_DIR` (default: `./output/`). Check the CLI output for the saved file path, typically:
- `output/X/author_date:title.md`
- `output/mpweixin/author_date:title.md`
- `output/XHS/author_date:title.md`
- `output/YouTube/author_date:title.md`
- `output/GitHub/author_date:title.md`
- `output/Feishu/author_date:title.md`
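
The path printed by the CLI is the authoritative source. If it was missed, one possible fallback is to pick the most recently modified `.md` file under the output directory. This is a sketch under assumptions: `latest_md` is a hypothetical helper, it assumes file names contain no newlines, and with a very large number of files `find` may split them across several `ls` calls, so the ordering is only reliable for modest directories.

```bash
# Hypothetical helper: print the most recently modified .md file under a
# directory (defaults to ./output). Assumes file names contain no newlines.
latest_md() {
  find "${1:-./output}" -type f -name '*.md' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Usage, after running feedgrab:
# file=$(latest_md ./output)
# [ -n "$file" ] && echo "Saved to: $file"
```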
Step 3: Read and Present
Read the output `.md` file and present the content to the user. The file includes:
- YAML front matter (title, source, author, published, likes, tags, etc.)
- Full article/tweet/post content in Markdown
- Images (as remote URLs, or as local paths if media download is enabled)
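
To show the metadata separately from the body, the front matter can be pulled out with standard tools. The helpers below are a sketch, assuming the conventional layout in which the file starts with a `---` line and the front matter ends at the next `---` line. `front_matter` and `front_matter_field` are hypothetical names, and only simple single-line `key: value` entries are handled.

```bash
# Hypothetical helper: print the YAML front matter of a Markdown file,
# i.e. the lines between a leading '---' line and the next '---' line.
front_matter() {
  awk 'NR == 1 && $0 == "---" { in_fm = 1; next }
       in_fm && $0 == "---"   { exit }
       in_fm                  { print }' "$1"
}

# Hypothetical helper: print the value of one top-level key.
# Only handles simple single-line `key: value` entries.
front_matter_field() {
  front_matter "$2" | sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Usage:
# front_matter_field title "output/X/author_date:title.md"
```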
Clipboard Mode
If the user says "grab from clipboard" or the URL contains `&` (which breaks PowerShell):
```bash
feedgrab clip
```
This reads the URL from the system clipboard.
Error Handling
| Error | Solution |
|---|---|
| Run |
| Cookie expired / 401 / 403 | |
| Jina timeout (30s) | feedgrab auto-retries with Playwright |
| Rate limit (429) | feedgrab auto-rotates cookies if configured |
| |
Tips
- For Twitter deep extraction (views, bookmarks, threads): configure cookies via `feedgrab login twitter`
- For WeChat articles: no login needed for single articles
- For Xiaohongshu: `pip install xhshow` for API mode (faster, no browser needed)
- For GitHub: set `GITHUB_TOKEN` for higher rate limits (5000/hr vs 60/hr)
- For Feishu: set `FEISHU_APP_ID` + `FEISHU_APP_SECRET` for Open API access
- Run `feedgrab doctor` to diagnose issues
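
Before a batch of fetches, it can help to confirm which of the optional credentials named in the tips are present. The check below is a sketch that complements `feedgrab doctor` and does not replace it. `report_optional_env` is a hypothetical name, and because it uses `printenv` it only sees exported variables.

```bash
# Hypothetical pre-flight check: report which optional credentials from the
# tips above are present in the environment. Values are never printed.
report_optional_env() {
  for name in GITHUB_TOKEN FEISHU_APP_ID FEISHU_APP_SECRET; do
    if [ -n "$(printenv "$name")" ]; then
      echo "$name: set"
    else
      echo "$name: not set"
    fi
  done
}

# Usage:
# report_optional_env
```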