
News Aggregation (Multi-Source, 3-Day Window)


Collect latest news from multiple sites and aggregators, merge similar stories into short topics, and list all main source links under each topic.

When to use


  • You want one concise briefing from many outlets.
  • You need deduplicated coverage (same story from multiple sites).
  • You want source transparency (all original links shown).
  • You want a default time window of the last 3 days unless specified otherwise.
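The output contract above (one short topic, every contributing source link listed beneath it) can be sketched as a small formatter. This is a minimal sketch; the `source`/`link` field names are assumptions, not a fixed schema:

```python
def render_topic(summary: str, items: list[dict]) -> str:
    """One short topic line followed by every contributing source link."""
    lines = [f"- {summary}"]
    for item in items:
        lines.append(f"  - {item['source']}: {item['link']}")
    return "\n".join(lines)
```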

Required tools / APIs


  • No API keys required for basic RSS workflow.
  • Python 3.10+
Install:
```bash
pip install feedparser python-dateutil
```
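Feed items usually carry RFC 2822 `pubDate` strings, so the default 3-day window can be enforced with the standard library alone. A minimal sketch (unparseable dates are excluded rather than guessed):

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def within_window(pub_date: str, days: int = 3) -> bool:
    """Return True if an RFC 2822 pubDate falls inside the last `days` days."""
    try:
        ts = parsedate_to_datetime(pub_date)
    except (TypeError, ValueError):
        return False  # unparseable or missing dates are dropped, not guessed
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    return ts >= datetime.now(timezone.utc) - timedelta(days=days)
```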

Sources (news sites + aggregators)


Use a mixed source list for better coverage.

News sites (RSS)


  • Reuters World:
    https://feeds.reuters.com/Reuters/worldNews
  • AP Top News:
    https://feeds.apnews.com/apnews/topnews
  • BBC World:
    http://feeds.bbci.co.uk/news/world/rss.xml
  • Al Jazeera:
    https://www.aljazeera.com/xml/rss/all.xml
  • The Guardian World:
    https://www.theguardian.com/world/rss
  • NPR News:
    https://feeds.npr.org/1001/rss.xml

Aggregators (RSS/API)


  • Google News (topic feed):
    https://news.google.com/rss/search?q=world
  • Bing News (RSS query):
    https://www.bing.com/news/search?q=world&format=RSS
  • Hacker News (tech):
    https://hnrss.org/frontpage
  • Reddit News (community signal):
    https://www.reddit.com/r/news/.rss
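Since the Google News feed above is a search endpoint, custom topic feeds only need a URL-encoded query. A minimal helper sketch:

```python
from urllib.parse import quote_plus

def google_news_feed(query: str) -> str:
    """Build a Google News RSS search URL for an arbitrary query."""
    return f"https://news.google.com/rss/search?q={quote_plus(query)}"
```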

Skills


Node.js quick fetch + grouping starter


```javascript
// npm install rss-parser
const Parser = require('rss-parser');
const parser = new Parser();

const SOURCES = {
  Reuters: 'https://feeds.reuters.com/Reuters/worldNews',
  AP: 'https://feeds.apnews.com/apnews/topnews',
  BBC: 'http://feeds.bbci.co.uk/news/world/rss.xml',
  'Google News': 'https://news.google.com/rss/search?q=world'
};

async function fetchRecent(days = 3) {
  const cutoff = Date.now() - days * 24 * 60 * 60 * 1000;
  const all = [];

  for (const [source, url] of Object.entries(SOURCES)) {
    let feed;
    try {
      feed = await parser.parseURL(url);
    } catch {
      continue; // skip unavailable feeds instead of aborting the whole run
    }
    for (const item of feed.items || []) {
      const ts = new Date(item.pubDate || item.isoDate || 0).getTime();
      if (!ts || ts < cutoff) continue;
      all.push({ source, title: item.title || '', link: item.link || '', ts });
    }
  }

  return all.sort((a, b) => b.ts - a.ts);
}

// Next step: add title-similarity clustering (same idea as the Python section above)
```
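The clustering step can be sketched in Python with token-set (Jaccard) similarity over titles; the default of 0.35 mirrors the threshold values used in Troubleshooting below. A greedy single-pass sketch, not a production clusterer:

```python
import re

def tokens(title: str) -> set[str]:
    """Lowercased alphanumeric tokens of a headline."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of title tokens, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def group_titles(items: list[dict], threshold: float = 0.35) -> list[list[dict]]:
    """Greedy clustering: each item joins the first group whose lead title is similar enough."""
    groups: list[list[dict]] = []
    for item in items:
        for group in groups:
            if similarity(item["title"], group[0]["title"]) >= threshold:
                group.append(item)
                break
        else:
            groups.append([item])
    return groups
```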

Agent prompt


```text
Use the News Aggregation skill.

Requirements:
1) Pull news from multiple predefined sources (news sites + aggregators).
2) Default to only the last 3 days unless the user asks for another time range.
3) Group similar headlines into one short topic.
4) Under each topic, list all main source links (not just one source).
5) If 3+ sources cover the same event, output one topic with all those links.
6) Keep summaries short and factual; avoid adding unsupported claims.
```

Best practices


  • Keep source diversity (wire + publisher + aggregator) to reduce bias.
  • Rank grouped topics by number of independent sources.
  • Include publication timestamps when possible.
  • Keep the grouping threshold conservative to avoid merging unrelated stories.
  • Allow custom source lists and time windows when user requests.
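The "rank by number of independent sources" practice can be sketched in one function; the `source` field name is an assumption matching the fetch starter above:

```python
def rank_topics(groups: list[list[dict]]) -> list[list[dict]]:
    """Sort topic groups by count of distinct sources, most-covered first."""
    return sorted(groups, key=lambda g: len({item["source"] for item in g}), reverse=True)
```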

Troubleshooting


  • Empty results: some feeds may be unavailable; retry and rotate sources.
  • Over-grouping (unrelated stories merged into one topic): raise the similarity threshold (e.g., 0.35 -> 0.45).
  • Under-grouping (the same story split across topics): lower the threshold (e.g., 0.35 -> 0.28).
  • Rate limiting: fetch feeds sequentially with small delays.
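The availability and rate-limiting advice above can be combined into one polite sequential loop. A sketch; `fetch` is a stand-in for whatever feed parser you use, injected so dead feeds and retries stay testable:

```python
import time

def fetch_all(sources: dict, fetch, delay: float = 1.0, retries: int = 2) -> dict:
    """Fetch each feed sequentially, pausing between feeds and skipping dead ones."""
    results = {}
    for name, url in sources.items():
        for attempt in range(retries + 1):
            try:
                results[name] = fetch(url)
                break
            except Exception:
                if attempt == retries:
                    results[name] = []  # feed unavailable: move on, don't abort the run
        time.sleep(delay)  # small delay between feeds to avoid rate limiting
    return results
```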

See also


  • Web Search API (Free)
  • Web Scraping (Chrome + DuckDuckGo)