feedgrab

feedgrab — Universal Content Grabber

Give it a URL, get back structured Markdown. Supports 8+ platforms with deep extraction.

Trigger

Activate when the user provides a URL and wants its content fetched, extracted, or read:
  • /feedgrab <URL>
  • "Grab this article"
  • "Read this tweet/post"
  • "抓取这个链接" ("Grab this link")
  • Any URL from supported platforms

Prerequisites Check

Before fetching, verify feedgrab is installed:

```bash
which feedgrab 2>/dev/null || command -v feedgrab 2>/dev/null
```

If NOT installed, tell the user:

```
feedgrab is not installed. Run `/feedgrab-setup` or manually:
  pip install feedgrab[all]
  feedgrab setup
```

Then stop; do not proceed without feedgrab.
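
The check-then-stop logic above can be wrapped in a small function. This is a minimal sketch: `require_feedgrab` is a hypothetical helper name, not something feedgrab ships, and it relies only on the standard `command -v` lookup.

```bash
# Sketch only: require_feedgrab is a hypothetical helper, not part of feedgrab.
require_feedgrab() {
  # Optional argument: command name to look for (defaults to feedgrab).
  rf_cmd="${1:-feedgrab}"
  if command -v "$rf_cmd" >/dev/null 2>&1; then
    return 0
  fi
  # Repeat the guidance for the user on stderr and signal failure.
  printf '%s is not installed. Run /feedgrab-setup or manually:\n' "$rf_cmd" >&2
  printf '  pip install feedgrab[all]\n  feedgrab setup\n' >&2
  return 1
}
```

Calling `require_feedgrab || exit 1` before the fetch step stops the run when the command is missing.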

Supported Platforms

| Platform | URL Pattern | Method |
|---|---|---|
| X/Twitter | `x.com/*/status/*`, `twitter.com/*` | GraphQL → FxTwitter → Syndication → oEmbed → Jina → Playwright |
| WeChat (微信公众号) | `mp.weixin.qq.com/*` | Playwright JS evaluate → Jina |
| Xiaohongshu (小红书) | `xiaohongshu.com/explore/*`, `xhslink.com/*` | API (xhshow) → Jina → Playwright |
| YouTube | `youtube.com/watch?v=*`, `youtu.be/*` | API metadata + yt-dlp subtitles |
| GitHub | `github.com/*/*` | REST API (Chinese README priority) |
| Feishu/Lark (飞书) | `feishu.cn/docx/*`, `feishu.cn/wiki/*` | Open API → Playwright → Jina |
| Bilibili (B站) | `bilibili.com/video/*`, `b23.tv/*` | API |
| Telegram | `t.me/*` | Telethon |
| RSS | RSS/Atom feed URLs | feedparser |
| Any web page | Any other URL | Jina Reader fallback |
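
To make the URL patterns in the table above concrete, here is a rough sketch of how they could be matched with a shell `case` statement. `detect_platform` is a hypothetical function written for illustration; feedgrab performs its own detection internally and may differ. RSS feeds cannot be recognized from the URL shape alone, so in this sketch they fall through to the default branch, and subdomains other than `www.` are not handled.

```bash
# Hypothetical sketch: map a URL to a platform label from the table above.
# This is not feedgrab's own router.
detect_platform() {
  # Strip the scheme and an optional "www." so the globs line up with the table.
  dp_url="${1#*://}"
  dp_url="${dp_url#www.}"
  case "$dp_url" in
    x.com/*/status/*|twitter.com/*)          echo "X/Twitter" ;;
    mp.weixin.qq.com/*)                      echo "WeChat" ;;
    xiaohongshu.com/explore/*|xhslink.com/*) echo "Xiaohongshu" ;;
    youtube.com/watch\?v=*|youtu.be/*)       echo "YouTube" ;;
    github.com/*/*)                          echo "GitHub" ;;
    feishu.cn/docx/*|feishu.cn/wiki/*)       echo "Feishu/Lark" ;;
    bilibili.com/video/*|b23.tv/*)           echo "Bilibili" ;;
    t.me/*)                                  echo "Telegram" ;;
    *)                                       echo "Any web page" ;;
  esac
}
```

For example, `detect_platform "https://github.com/owner/repo"` prints `GitHub`, while an unrecognized URL prints `Any web page`, mirroring the Jina Reader fallback row.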

Pipeline

Step 1: Fetch Content

```bash
feedgrab "$ARGUMENTS"
```

The CLI auto-detects the platform and routes to the appropriate fetcher.

Step 2: Locate Output File

feedgrab saves output to `OUTPUT_DIR` (default: `./output/`). Check the CLI output for the saved file path, typically:
  • output/X/author_date:title.md
  • output/mpweixin/author_date:title.md
  • output/XHS/author_date:title.md
  • output/YouTube/author_date:title.md
  • output/GitHub/author_date:title.md
  • output/Feishu/author_date:title.md
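
If the saved path is not visible in the CLI output, one fallback is to look for the newest Markdown file under the output directory. The sketch below assumes the layout listed above (one subfolder per platform, `.md` files inside); `latest_markdown` is a hypothetical helper, and the path printed by feedgrab itself should be preferred when available.

```bash
# Hypothetical helper: print the most recently modified .md file under a directory.
# Defaults to $OUTPUT_DIR, or ./output when OUTPUT_DIR is unset.
latest_markdown() {
  lm_dir="${1:-${OUTPUT_DIR:-./output}}"
  # List every .md file below the directory, newest first, and keep the first one.
  find "$lm_dir" -type f -name '*.md' -exec ls -t {} + 2>/dev/null | head -n 1
}
```

Running `latest_markdown` right after a fetch prints a single path, or nothing when no `.md` file exists yet.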

Step 3: Read and Present

Read the output `.md` file and present the content to the user. The file includes:
  • YAML front matter (title, source, author, published, likes, tags, etc.)
  • Full article/tweet/post content in Markdown
  • Images (as remote URLs, or as local paths if media download is enabled)
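
When only the metadata is needed, the front matter can be separated from the body before presenting it. This is a minimal sketch that assumes the usual convention of a YAML block opened on the first line by `---` and closed by a second `---`; `front_matter` is a hypothetical helper, and the exact fields depend on the platform.

```bash
# Hypothetical helper: print the YAML front matter of a Markdown file,
# assuming it is delimited by two lines that contain only "---".
front_matter() {
  awk '
    NR == 1 && !/^---[[:space:]]*$/ { exit }   # no front matter at the top of the file
    /^---[[:space:]]*$/ { fences++; next }     # count delimiter lines without printing them
    fences == 1 { print }                      # lines inside the front matter
    fences >= 2 { exit }                       # stop once the closing delimiter has passed
  ' "$1"
}
```

`front_matter path/to/file.md` prints lines such as `title: ...` and `author: ...` without the delimiters, and prints nothing for a file that has no front matter.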

Clipboard Mode

If the user says "grab from clipboard", or the URL contains `&` (which breaks unquoted arguments in PowerShell):

```bash
feedgrab clip
```

This reads the URL from the system clipboard.

Error Handling

| Error | Solution |
|---|---|
| `feedgrab: command not found` | Run `/feedgrab-setup` |
| Cookie expired / 401 / 403 | Run `feedgrab login <platform>` to refresh |
| Jina timeout (30s) | feedgrab auto-retries with Playwright |
| Rate limit (429) | feedgrab auto-rotates cookies if configured |
| `OUTPUT_DIR` not set | Run `feedgrab setup` to configure |

Tips

  • For Twitter deep extraction (views, bookmarks, threads): configure cookies via `feedgrab login twitter`
  • For WeChat articles: no login needed for single articles
  • For Xiaohongshu: `pip install xhshow` for API mode (faster, no browser needed)
  • For GitHub: set `GITHUB_TOKEN` for higher rate limits (5000/hr vs 60/hr)
  • For Feishu: set `FEISHU_APP_ID` + `FEISHU_APP_SECRET` for Open API access
  • Run `feedgrab doctor` to diagnose issues
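
The variables named in the tips above could be set in the shell session before running feedgrab. This fragment assumes feedgrab reads them from the process environment, which the tips imply but do not state; the values are placeholders, and `feedgrab setup` may offer another way to store them.

```bash
# Placeholder values; replace them with your own credentials.
export GITHUB_TOKEN="your-github-token"            # raises the GitHub API rate limit
export FEISHU_APP_ID="your-feishu-app-id"          # Feishu Open API access
export FEISHU_APP_SECRET="your-feishu-app-secret"  # Feishu Open API access
```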