Content extraction for Chinese news sites. Supports WeChat Official Accounts, Toutiao, NetEase News, Sohu News, and Tencent News. Activated when users need to extract Chinese news content, crawl official account articles, scrape news, or obtain news in JSON/Markdown format.
npx skill4agent add nanmicoder/newscrawler china-news-crawler

| Platform | ID | URL Example |
|---|---|---|
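| WeChat Official Account | wechat | https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ |
| Toutiao | toutiao | https://www.toutiao.com/article/7434425099895210546/ |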
| NetEase News | netease | |
| Sohu News | sohu | |
| Tencent News | tencent | |
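The auto-detection step that maps a URL to one of the platform IDs above can be sketched as a host-suffix lookup. This is only an illustration, not the Skill's actual `detector.py`: the function name `detect_platform` and the table `PLATFORM_HOSTS` are hypothetical, and only the WeChat and Toutiao hosts come from the examples in this document; the NetEase, Sohu, and Tencent host suffixes are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from platform ID to host suffixes.
# Only the wechat and toutiao hosts appear in this document's examples;
# the remaining suffixes are assumptions made for illustration.
PLATFORM_HOSTS = {
    "wechat": ("mp.weixin.qq.com",),
    "toutiao": ("toutiao.com",),
    "netease": ("163.com",),                   # assumed
    "sohu": ("sohu.com",),                     # assumed
    "tencent": ("news.qq.com", "new.qq.com"),  # assumed
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform ID whose host suffix matches the URL, or None."""
    host = (urlparse(url).hostname or "").lower()
    for platform_id, suffixes in PLATFORM_HOSTS.items():
        for suffix in suffixes:
            # Match the host itself or any subdomain of it.
            if host == suffix or host.endswith("." + suffix):
                return platform_id
    return None
```

Matching on the full suffix rather than a substring keeps `mp.weixin.qq.com` from being mistaken for a Tencent News host, since both end in `qq.com`.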
# Extract news, auto-detect platform, output JSON + Markdown
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL"
# Specify output directory
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL" --output ./output
# Output only JSON
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL" --format json
# Output only Markdown
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL" --format markdown
# List supported platforms
uv run .claude/skills/china-news-crawler/scripts/extract_news.py --list-platforms

Files are saved to ./output as {news_id}.json and {news_id}.md.

{
"title": "Article Title",
"news_url": "Original URL",
"news_id": "Article ID",
"meta_info": {
"author_name": "Author/Source",
"author_url": "",
"publish_time": "2024-01-01 12:00"
},
"contents": [
{"type": "text", "content": "Paragraph text", "desc": ""},
{"type": "image", "content": "https://...", "desc": ""},
{"type": "video", "content": "https://...", "desc": ""}
],
"texts": ["Paragraph 1", "Paragraph 2"],
"images": ["Image URL 1", "Image URL 2"],
"videos": []
}

# Article Title
## Article Information
**Author**: xxx
**Publish Time**: 2024-01-01 12:00
**Original Link**: [Link](URL)
---
## Article Content
Paragraph content...

---
## Media Resources
### Images (N)
1. URL1
2. URL2

uv run .claude/skills/china-news-crawler/scripts/extract_news.py \
  "https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ"

[INFO] Platform detected: wechat (WeChat Official Account)
[INFO] Extracting content...
[INFO] Title: Article Title
[INFO] Author: Official Account Name
[INFO] Text paragraphs: 15
[INFO] Images: 3
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.json
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.md

uv run .claude/skills/china-news-crawler/scripts/extract_news.py \
  "https://www.toutiao.com/article/7434425099895210546/"

| Error Type | Description | Solution |
|---|---|---|
| | URL does not match any supported platform | Check if the URL is correct |
| | Non-Chinese site | This Skill only supports Chinese news sites |
| | Network error or page structure change | Retry or check URL validity |
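For the "retry" advice in the last row, a caller could wrap a fetch in a small retry helper with exponential backoff. The sketch below is a generic pattern, not a description of how the Skill itself retries (this document does not say); `with_retries` and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on network-style errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            # urllib.error.URLError, ConnectionError and TimeoutError are all
            # OSError subclasses, so this covers typical network failures.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A page structure change will not be fixed by retrying, so a persistent failure after the last attempt is better surfaced to the user than retried indefinitely.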
china-news-crawler/
├── SKILL.md # [Required] Skill definition file
├── references/
│ └── platform-patterns.md # Platform URL pattern description
└── scripts/
    ├── extract_news.py # CLI entry script
    ├── models.py # Data models
    ├── detector.py # Platform detection
    ├── formatter.py # Markdown formatting
    └── crawlers/ # Crawler modules
        ├── __init__.py
        ├── base.py # BaseNewsCrawler base class
        ├── fetchers.py # HTTP fetching strategies
        ├── wechat.py # WeChat Official Accounts
        ├── toutiao.py # Toutiao
        ├── netease.py # NetEase News
        ├── sohu.py # Sohu News
        └── tencent.py # Tencent News
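To show how the JSON output relates to the Markdown output, here is a minimal sketch that renders a parsed JSON article in the Markdown layout shown earlier. It is an illustration under assumptions, not the contents of `formatter.py`: the function name `render_markdown` is hypothetical, and only the field names (`title`, `news_url`, `meta_info`, `texts`, `images`) are taken from the JSON example in this document.

```python
from typing import Any, Dict, List


def render_markdown(article: Dict[str, Any]) -> str:
    """Render an article dict (shaped like the JSON output) as Markdown."""
    meta = article.get("meta_info", {})
    lines: List[str] = [
        f"# {article['title']}",
        "",
        "## Article Information",
        "",
        f"**Author**: {meta.get('author_name', '')}",
        f"**Publish Time**: {meta.get('publish_time', '')}",
        f"**Original Link**: [Link]({article['news_url']})",
        "",
        "---",
        "",
        "## Article Content",
        "",
    ]
    # One paragraph per entry in "texts", separated by blank lines.
    for text in article.get("texts", []):
        lines += [text, ""]
    images = article.get("images", [])
    if images:
        # The media section is only emitted when there is something to list.
        lines += ["---", "", "## Media Resources", "", f"### Images ({len(images)})", ""]
        lines += [f"{i}. {url}" for i, url in enumerate(images, start=1)]
    return "\n".join(lines) + "\n"
```

A saved file can be passed through it with `render_markdown(json.load(open(path, encoding="utf-8")))` after importing `json`.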