feedgrab — Universal Content Grabber

Give it a URL, get back structured Markdown. Supports 8+ platforms with deep extraction.

Trigger

Activate when the user provides a URL and wants its content fetched, extracted, or read:

`/feedgrab <URL>`
"Grab this article"
"Read this tweet/post"
"抓取这个链接" ("Grab this link")
Any URL from a supported platform

Prerequisites Check

Before fetching, verify that feedgrab is installed:

```bash
which feedgrab 2>/dev/null || command -v feedgrab 2>/dev/null
```

If it is NOT installed, tell the user:

```
feedgrab is not installed. Run `/feedgrab-setup` or manually:
  pip install "feedgrab[all]"
  feedgrab setup
```

Then stop — do not proceed without feedgrab.
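The check above can be wrapped in a function that a script calls before fetching. This is a minimal sketch; `require_feedgrab` is a hypothetical helper name, not part of feedgrab. It succeeds silently when feedgrab is on `PATH` and otherwise prints the install message to stderr and returns 1.

```bash
# Hypothetical helper: succeed if feedgrab is on PATH; otherwise print the
# install instructions from this section to stderr and return 1.
require_feedgrab() {
  if command -v feedgrab >/dev/null 2>&1; then
    return 0
  fi
  printf '%s\n' \
    'feedgrab is not installed. Run `/feedgrab-setup` or manually:' \
    '  pip install "feedgrab[all]"' \
    '  feedgrab setup' >&2
  return 1
}
```

Usage: put `require_feedgrab || exit 1` before the fetch command so the script stops instead of failing later with `command not found`.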

Supported Platforms

Platform	URL Pattern	Method
X/Twitter	`x.com/*/status/*`, `twitter.com/*`	GraphQL → FxTwitter → Syndication → oEmbed → Jina → Playwright
WeChat (微信公众号)	`mp.weixin.qq.com/*`	Playwright JS evaluate → Jina
Xiaohongshu (小红书)	`xiaohongshu.com/explore/*`, `xhslink.com/*`	API (xhshow) → Jina → Playwright
YouTube	`youtube.com/watch?v=*`, `youtu.be/*`	API metadata + yt-dlp subtitles
GitHub	`github.com/*/*`	REST API (prefers a Chinese README when one exists)
Feishu/Lark (飞书)	`feishu.cn/docx/*`, `feishu.cn/wiki/*`	Open API → Playwright → Jina
Bilibili (B站)	`bilibili.com/video/*`, `b23.tv/*`	API
Telegram	`t.me/*`	Telethon
RSS	RSS/Atom feed URLs	feedparser
Any web page	Any other URL	Jina Reader fallback
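To illustrate how the URL patterns in the table divide incoming URLs, here is a hedged sketch. `detect_platform` is a hypothetical helper and is not feedgrab's actual detection code. It matches only the patterns listed above, additionally accepts a Feishu tenant subdomain such as `example.feishu.cn` (an assumption), and skips RSS because a feed cannot be recognized from its URL pattern alone.

```bash
# Hypothetical sketch of pattern-based routing, built from the table above.
# NOT feedgrab's real detector. Prints a platform label for one URL.
detect_platform() {
  local rest
  # Drop the scheme and a leading "www." so patterns start at the host.
  rest="${1#*://}"
  rest="${rest#www.}"
  case "$rest" in
    x.com/*/status/*|twitter.com/*)            echo "X/Twitter" ;;
    mp.weixin.qq.com/*)                        echo "WeChat" ;;
    xiaohongshu.com/explore/*|xhslink.com/*)   echo "Xiaohongshu" ;;
    # \? makes the question mark literal instead of a one-character wildcard.
    youtube.com/watch\?v=*|youtu.be/*)         echo "YouTube" ;;
    github.com/*/*)                            echo "GitHub" ;;
    # Assumption: Feishu docs may also live on a tenant subdomain.
    feishu.cn/docx/*|*.feishu.cn/docx/*|feishu.cn/wiki/*|*.feishu.cn/wiki/*)
                                               echo "Feishu" ;;
    bilibili.com/video/*|b23.tv/*)             echo "Bilibili" ;;
    t.me/*)                                    echo "Telegram" ;;
    *)                                         echo "Web page (Jina Reader fallback)" ;;
  esac
}
```

For example, `detect_platform "https://x.com/jack/status/20"` prints `X/Twitter`. Because `case` tries patterns in order, the catch-all must stay last, mirroring the "Any other URL" row.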

Pipeline

Step 1: Fetch Content

```bash
feedgrab "$ARGUMENTS"
```

The CLI auto-detects the platform and routes to the appropriate fetcher.
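If you want to keep the CLI's console output so the saved path can be read back later, one option is to capture it in a log file. This is a sketch only: `fetch_with_log` is a hypothetical helper, and the log location is chosen by the caller.

```bash
# Hypothetical helper: run feedgrab on one URL, save stdout and stderr to a
# log file, print the log, and return feedgrab's own exit status.
fetch_with_log() {
  local url log rc
  url="$1"
  log="$2"
  rc=0
  # Quoting "$url" passes characters such as & and ? through unchanged;
  # "|| rc=$?" keeps the status without tripping `set -e`.
  feedgrab "$url" >"$log" 2>&1 || rc=$?
  cat "$log"
  return "$rc"
}
```

Usage: `fetch_with_log "$ARGUMENTS" /tmp/feedgrab.log`. The trade-off of writing to the file first is that output is not streamed live; in exchange, the exit status is feedgrab's rather than that of a pipe.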

Step 2: Locate Output File

feedgrab saves output to `OUTPUT_DIR` (default: `./output/`). Check the CLI output for the saved file path, which typically looks like:

```
output/X/author_date：title.md
output/mpweixin/author_date：title.md
output/XHS/author_date：title.md
output/YouTube/author_date：title.md
output/GitHub/author_date：title.md
output/Feishu/author_date：title.md
```
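As an alternative to reading the path from the CLI output, a script can find the file by timestamp: create a marker file just before running feedgrab, then list Markdown files under the output directory that are newer than the marker. This sketch assumes nothing else writes to the output directory at the same time; `new_markdown_files` is a hypothetical helper. Keep paths quoted, since file names may contain spaces and the full-width colon `：`.

```bash
# Hypothetical helper: list .md files under an output directory whose
# modification time is newer than a marker file.
new_markdown_files() {
  local out_dir marker
  out_dir="$1"
  marker="$2"
  find "$out_dir" -type f -name '*.md' -newer "$marker"
}

# Example usage (assumes OUTPUT_DIR, or the ./output/ default):
#   marker="$(mktemp)"
#   sleep 1   # guard against filesystems with 1-second timestamp resolution
#   feedgrab "$ARGUMENTS"
#   new_markdown_files "${OUTPUT_DIR:-./output}" "$marker"
```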

Step 3: Read and Present

Read the output `.md` file and present its content to the user. The file includes:

YAML front matter (title, source, author, published, likes, tags, etc.)
The full article/tweet/post content in Markdown
Images (remote URLs by default, or local paths if media download is enabled)
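To show the metadata before the body, the front matter can be pulled out with standard tools. A minimal sketch using awk and sed, assuming the file starts with a `---` line and that the fields you want are simple single-line `key: value` pairs; `front_matter` and `front_matter_value` are hypothetical helper names.

```bash
# Hypothetical helper: print the lines between the opening and closing "---"
# of a Markdown file's YAML front matter, without the delimiters.
front_matter() {
  awk 'NR == 1 && $0 == "---" { inside = 1; next }
       inside && $0 == "---" { exit }
       inside { print }' "$1"
}

# Print the value of one top-level key, e.g. "title".
# Naive: handles only single-line "key: value" entries, not nested YAML,
# and does not strip quotes around the value.
front_matter_value() {
  front_matter "$1" | sed -n "s/^$2:[[:space:]]*//p" | head -n 1
}
```

Usage: `front_matter_value "output/X/author_date：title.md" title`. For anything beyond flat fields, a real YAML parser is the safer choice.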

Clipboard Mode

If the user says "grab from clipboard", or the URL contains `&` (which breaks PowerShell parsing), run:

```bash
feedgrab clip
```

This reads the URL from the system clipboard.
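When the clipboard holds more than a bare link (for example, a copied message with a URL inside it), a script may need to extract the URL first. This is a hedged sketch and not how `feedgrab clip` works internally; `first_url` is a hypothetical helper. Because the URL stays inside a quoted variable, characters such as `&` are never re-parsed by the shell.

```bash
# Hypothetical helper (not part of feedgrab): print the first http(s) URL
# found in a piece of text. The URL ends at whitespace, <, >, or a quote.
first_url() {
  printf '%s\n' "$1" | grep -oE 'https?://[^[:space:]<>"]+' | head -n 1
}
```

Usage: `feedgrab "$(first_url "$text")"`, where `text` holds the pasted content.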

Error Handling

Error	Solution
`feedgrab: command not found`	Run `/feedgrab-setup`
Cookie expired / 401 / 403	Run `feedgrab login <platform>` to refresh cookies
Jina timeout (30s)	feedgrab automatically retries with Playwright
Rate limit (429)	feedgrab automatically rotates cookies (if configured)
`OUTPUT_DIR` not set	Run `feedgrab setup` to configure it
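A script that wraps feedgrab could map error text to the remedies in the table above. This is a sketch under a loud assumption: the match patterns are guesses, not feedgrab's actual error messages, and `suggest_fix` is a hypothetical helper. Jina timeouts have no entry because feedgrab retries them itself; anything unrecognized falls back to `feedgrab doctor`.

```bash
# Hypothetical helper: map one line of error output to a remedy from the
# error table. Patterns are guesses, not feedgrab's real messages.
suggest_fix() {
  case "$1" in
    *"command not found"*)
      echo "Run /feedgrab-setup" ;;
    *401*|*403*|*[Cc]ookie*expired*)
      echo "Run: feedgrab login <platform>" ;;
    *429*)
      echo "Rate limited: feedgrab rotates cookies if configured; otherwise wait and retry" ;;
    *OUTPUT_DIR*)
      echo "Run: feedgrab setup" ;;
    *)
      echo "Run: feedgrab doctor" ;;
  esac
}
```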

Tips

For Twitter deep extraction (views, bookmarks, threads): configure cookies with `feedgrab login twitter`
For WeChat articles: no login is needed for single articles
For Xiaohongshu: run `pip install xhshow` to enable API mode (faster, no browser needed)
For GitHub: set `GITHUB_TOKEN` for higher rate limits (5,000 vs. 60 requests per hour)
For Feishu: set `FEISHU_APP_ID` and `FEISHU_APP_SECRET` for Open API access
Run `feedgrab doctor` to diagnose issues
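Before running feedgrab, a quick check can show which of the optional credentials from the tips above are present. This sketch assumes feedgrab reads them as environment variables (it may instead read a config file written by `feedgrab setup`), so treat `feedgrab doctor` as the authoritative diagnostic. `report_optional_config` is a hypothetical helper, and it deliberately never prints the secret values.

```bash
# Hypothetical check: report whether each optional credential is set in the
# current environment. Assumes env-var configuration; values are not printed.
report_optional_config() {
  local name value
  for name in GITHUB_TOKEN FEISHU_APP_ID FEISHU_APP_SECRET; do
    # Indirect lookup that also works in POSIX sh; names are fixed literals.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name: set"
    else
      echo "$name: not set"
    fi
  done
}
```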
