llm-public-opinion-analytics-assistant

LLM Public Opinion Analytics Assistant

Skill by ara.so — Data Skills collection.
A comprehensive Chinese public opinion monitoring platform that aggregates real-time trending data from 26 lists across 15 mainstream platforms (Weibo, Bilibili, Douyin, Zhihu, etc.) and provides LLM-powered analysis including sentiment classification, topic clustering, and automated alerts via email, WeChat Work, and Telegram.

What This Project Does

  • Multi-Platform Crawling: Scrapes trending lists from 15 Chinese platforms with Scrapy-based distributed crawlers
  • Deep Content Extraction: Retrieves full article/video content from detail pages using Selenium
  • LLM Analysis: Performs sentiment analysis, topic clustering, and trend summarization using large language models (supports Huawei Pangu, OpenAI-compatible APIs)
  • Conversational Interface: Natural language queries for trending topics, theme searches, and analytical reports
  • Alert System: Multi-channel push notifications (Enterprise WeChat, Telegram, email) with scheduled analysis reports
  • Shortcut Controls: Keyboard shortcuts to start/stop crawlers and trigger analysis tasks

Installation

Prerequisites

1. Browser Driver Setup
Download and configure browser drivers for content extraction:
```bash
# For Chrome
# Match your Chrome version (check chrome://version)

# For Edge
# Match your Edge version (check edge://version)

# Place driver in PATH or project directory
# Linux/macOS example:
sudo mv chromedriver /usr/local/bin/
sudo chmod +x /usr/local/bin/chromedriver

# Verify installation
chromedriver --version
```

2. MySQL Database
```bash
# Install MySQL 8.0+
sudo apt-get install mysql-server  # Ubuntu/Debian
# OR
brew install mysql  # macOS

# Create database (prompts for the root password)
mysql -u root -p -e "CREATE DATABASE hotsearch CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
```

3. Python Environment
```bash
# Clone repository
git clone https://github.com/hmmnxkl/LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant.git
cd LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
Database Initialization

Initialize the database schema using the reference script:
```python
# init.py - Database setup reference
import pymysql

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='hotsearch',
    charset='utf8mb4',
)

try:
    with connection.cursor() as cursor:
        # Create trending lists table.
        # `rank` is a reserved word in MySQL 8.0, so it is backtick-quoted.
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS trending_topics (
                id INT AUTO_INCREMENT PRIMARY KEY,
                platform VARCHAR(50) NOT NULL,
                title VARCHAR(500) NOT NULL,
                url VARCHAR(1000),
                `rank` INT,
                hot_value VARCHAR(100),
                detail_content TEXT,
                crawl_time DATETIME,
                INDEX idx_platform (platform),
                INDEX idx_crawl_time (crawl_time)
            )
        """)

        # Create analysis results table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS analysis_results (
                id INT AUTO_INCREMENT PRIMARY KEY,
                query_text VARCHAR(500),
                analysis_type VARCHAR(50),
                result_content TEXT,
                sentiment_score FLOAT,
                created_at DATETIME,
                INDEX idx_query (query_text),
                INDEX idx_type (analysis_type)
            )
        """)

    connection.commit()
finally:
    connection.close()
```
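Once the tables exist, crawled rows can be written with a parameterized statement. The following is a minimal sketch, assuming a hypothetical `topic_params` helper and an item dict keyed like the `trending_topics` columns; the project's own Scrapy pipeline may do this differently. `rank` is backtick-quoted because it is a reserved word in MySQL 8.0.

```python
from datetime import datetime

# %s placeholders are filled in by the driver, which escapes the values
INSERT_SQL = (
    "INSERT INTO trending_topics "
    "(platform, title, url, `rank`, hot_value, crawl_time) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def topic_params(item, crawl_time=None):
    """Order an item dict to match the placeholders in INSERT_SQL (hypothetical helper)."""
    return (
        item['platform'],
        item['title'],
        item.get('url'),
        item.get('rank'),
        item.get('hot_value'),
        crawl_time or datetime.now(),
    )


params = topic_params(
    {'platform': 'weibo', 'title': 'example', 'rank': 1},
    crawl_time=datetime(2026, 4, 7, 9, 0),
)
# With an open pymysql connection: cursor.execute(INSERT_SQL, params)
```

Passing values as parameters rather than formatting them into the SQL string keeps titles containing quotes from breaking the statement.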

Configuration

Environment Variables

Create a `.env` file in the project root:
```bash
# Database Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_secure_password
MYSQL_DATABASE=hotsearch

# LLM API Configuration (OpenAI-compatible format)
OPENAI_API_KEY=your_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4

# OR use Huawei Pangu Model (local deployment)
# PANGU_MODEL_PATH=/path/to/openpangu-embedded-7b-model
# USE_LOCAL_MODEL=true

# Push Notification Channels
# Enterprise WeChat Robot

# Telegram Bot
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_CHAT_ID=your_chat_id

# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your_email@gmail.com
SMTP_PASSWORD=your_app_password
ALERT_EMAIL_TO=recipient@example.com

# Browser Driver Configuration
CHROME_DRIVER_PATH=/usr/local/bin/chromedriver
EDGE_DRIVER_PATH=/usr/local/bin/msedgedriver
```
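A `.env` file of this kind is a flat list of `KEY=value` pairs. As a minimal sketch of how it could be read with only the standard library (the project may well rely on a package such as python-dotenv instead; `parse_env` is a hypothetical helper, not part of the repository):

```python
def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        # Drop surrounding whitespace and optional quotes around the value
        config[key.strip()] = value.strip().strip('"\'')
    return config


sample = """
# Database Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
# USE_LOCAL_MODEL=true
"""
settings = parse_env(sample)
port = int(settings.get('MYSQL_PORT', '3306'))  # values stay strings until converted
```

Commented-out lines such as the Pangu settings are ignored, so switching backends is a matter of uncommenting them. This sketch does not handle trailing inline comments on a value line.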

Crawler Settings

Configure `hotsearchcrawler/settings.py`:
```python
# MySQL pipeline settings
MYSQL_HOST = 'localhost'
MYSQL_DATABASE = 'hotsearch'
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'your_password'

# Crawler behavior
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1      # Politeness delay (seconds) between requests
ROBOTSTXT_OBEY = False  # Many Chinese platforms don't have robots.txt

# Optional: Platform-specific cookies for authenticated content
COOKIES = {
    'weibo': 'SUB=your_cookie_value',
    'bilibili': 'SESSDATA=your_session_data',
}

# User agent (a single fixed value; rotation needs a downloader middleware)
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
```
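The `USER_AGENT` setting is one fixed string. If per-request rotation is wanted, a Scrapy downloader middleware can set the header itself. A minimal sketch, assuming a hypothetical `RandomUserAgentMiddleware` class and `USER_AGENTS` pool that are not part of the repository:

```python
import random

# Hypothetical pool; replace with the user agents you actually want to send.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
]


class RandomUserAgentMiddleware:
    """Downloader middleware that picks a user agent for each request."""

    def __init__(self, user_agents=USER_AGENTS, rng=random):
        self.user_agents = list(user_agents)
        self.rng = rng

    def process_request(self, request, spider):
        request.headers['User-Agent'] = self.rng.choice(self.user_agents)
        return None  # None tells Scrapy to keep processing the request
```

To enable it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in `settings.py`; the module path (for example `hotsearchcrawler.middlewares`) is an assumption about the project layout.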

Starting the System

Launch Analysis System

```bash
# Start the main application
python app.py

# Access web interface at http://localhost:5000
```

Start Crawlers

Via Web Interface:
  • Use keyboard shortcut (configured in UI) to start/stop crawlers
Via Command Line:
```bash
# Start all platform crawlers
python run_spiders.py

# Test specific platform
cd hotsearchcrawler
scrapy crawl weibo_hot     # Weibo trending
scrapy crawl bilibili_hot  # Bilibili trending
scrapy crawl douyin_hot    # Douyin trending
scrapy crawl zhihu_hot     # Zhihu trending
```

Key Usage Patterns

Conversational Query Interface

```python
# Example API call to analysis endpoint
import requests

response = requests.post(
    'http://localhost:5000/api/query',
    json={
        # "What are the recent trending topics about artificial intelligence?"
        'query': '最近关于人工智能的热点有哪些?',
        'analysis_type': 'theme_search',
    },
)
response.raise_for_status()  # Fail early on HTTP errors

result = response.json()
print(result['topics'])    # List of AI-related trending topics
print(result['analysis'])  # LLM-generated summary
```

Sentiment Analysis

```python
import requests

# Analyze sentiment for a specific topic
response = requests.post(
    'http://localhost:5000/api/analyze',
    json={
        'query': 'GPT-6模型发布',  # "GPT-6 model release"
        'analysis_type': 'sentiment',
    },
)
sentiment = response.json()

# Returns: {
#     'sentiment': 'positive',
#     'score': 0.85,
#     'reasoning': '大多数评论表达了期待和兴奋...'
#     (i.e. "Most comments express anticipation and excitement...")
# }
```
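The sample response suggests three fields. A minimal sketch of defensive parsing on the client side, assuming the endpoint returns exactly those keys, that the label is one of positive/neutral/negative, and that `score` lies in [0, 1]; none of this is confirmed by the source, and `parse_sentiment` is a hypothetical helper:

```python
def parse_sentiment(payload):
    """Validate a sentiment response dict and return (label, score, reasoning)."""
    label = payload.get('sentiment')
    if label not in {'positive', 'neutral', 'negative'}:
        raise ValueError(f'unexpected sentiment label: {label!r}')
    score = float(payload.get('score', 0.0))
    if not 0.0 <= score <= 1.0:
        raise ValueError(f'score out of range: {score}')
    return label, score, payload.get('reasoning', '')


label, score, reasoning = parse_sentiment(
    {'sentiment': 'positive', 'score': 0.85, 'reasoning': '...'}
)
```

Checking the shape before storing or pushing the result guards against malformed LLM output reaching downstream alerts.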

Topic Clustering

```python
import requests

# Cluster related topics
response = requests.post(
    'http://localhost:5000/api/analyze',
    json={
        'query': '科技创新',  # "technological innovation"
        'analysis_type': 'clustering',
        'time_range': '7d',  # Last 7 days
    },
)
clusters = response.json()['clusters']

# Returns grouped topics:
# {
#     'AI模型发展': [topic1, topic2, ...],  # AI model development
#     '硬件生态': [topic3, topic4, ...],    # hardware ecosystem
#     '商业应用': [topic5, topic6, ...]     # commercial applications
# }
```

Scheduled Alert Tasks

```python
# test_push_task.py - Configure automated reports
from hotsearch_analysis_agent.push_service import PushService

push_service = PushService()

# Create scheduled alert
push_service.create_task({
    'name': 'AI技术日报',  # "AI Technology Daily"
    'query': '人工智能 OR 大模型 OR AI',
    'schedule': 'daily',  # daily, weekly, hourly
    'time': '09:00',
    'channels': ['wechat', 'telegram', 'email'],
    'analysis_types': ['sentiment', 'clustering', 'summary'],
})

# Test immediate push
analysis_result = '...'  # Placeholder: report text from an earlier analysis call
push_service.send_report(
    title='人工智能与前沿科技热点分析',  # "AI and frontier technology trend analysis"
    content=analysis_result,
    channels=['telegram'],
)
```
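A daily schedule with an `HH:MM` time boils down to computing the next firing time. The sketch below illustrates that arithmetic with the standard library; `next_daily_run` is a hypothetical helper, and how `PushService` actually schedules tasks is not shown in the source.

```python
from datetime import datetime, timedelta


def next_daily_run(time_str, now):
    """Return the next datetime at which a daily 'HH:MM' task should fire."""
    hour, minute = map(int, time_str.split(':'))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate


before = next_daily_run('09:00', datetime(2026, 4, 7, 8, 30))  # same day
after = next_daily_run('09:00', datetime(2026, 4, 7, 9, 30))   # next day
```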

Direct Crawler Usage

```python
# Custom crawler for specific platform
from scrapy.crawler import CrawlerProcess

from hotsearchcrawler.spiders.weibo_spider import WeiboHotSpider

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0...',
    'MYSQL_HOST': 'localhost',
    'MYSQL_DATABASE': 'hotsearch',
})

process.crawl(WeiboHotSpider)
process.start()  # Blocks until the crawl finishes
```

LLM Integration (Pangu Model)

```python
# Using Huawei Pangu model locally
from hotsearch_analysis_agent.llm_engine import PanguAnalyzer

analyzer = PanguAnalyzer(
    model_path='/path/to/openpangu-embedded-7b-model'
)

# Analyze topic
related_articles = ['...']  # Placeholder: article texts gathered for the topic
result = analyzer.analyze_topic(
    topic='DeepSeek V4采用华为算力',  # "DeepSeek V4 adopts Huawei compute"
    context=related_articles,
    task='sentiment_and_summary',
)

print(result['sentiment'])   # positive/neutral/negative
print(result['summary'])     # Chinese summary
print(result['key_points'])  # Extracted insights
```

Common Workflows

Daily Monitoring Setup

```python
# 1. Start crawlers (runs continuously)
# Run in background: nohup python run_spiders.py &

# 2. Configure daily report
from hotsearch_analysis_agent.scheduler import AnalysisScheduler

scheduler = AnalysisScheduler()
scheduler.add_daily_task(
    query='科技 OR 互联网 OR AI',  # "technology OR internet OR AI"
    time='08:00',
    recipients=['wechat_group', 'email'],
    report_format='detailed',  # detailed or summary
)

# 3. Start scheduler
scheduler.run()
```

Real-Time Alert for Keywords

```python
# Monitor specific keywords with immediate alerts
from hotsearch_analysis_agent.realtime_monitor import RealtimeMonitor


def custom_alert_handler(*args, **kwargs):
    # Placeholder handler; the callback signature is not documented here
    print('keyword alert:', args, kwargs)


monitor = RealtimeMonitor()
monitor.add_keyword_alert(
    keywords=['华为', '盘古', 'DeepSeek'],  # Huawei, Pangu, DeepSeek
    threshold_rank=10,  # Alert if keyword appears in top 10
    channels=['telegram'],
    callback=custom_alert_handler,
)

monitor.start()
```
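The alert rule described here (a keyword appearing within the top N ranks) can be sketched as a plain filter. This is an illustration under assumptions, not the `RealtimeMonitor` implementation: `find_alerts` is a hypothetical helper and each topic is assumed to be a dict with `title` and integer `rank` keys.

```python
def find_alerts(topics, keywords, threshold_rank):
    """Return topics ranked within the threshold whose title contains a keyword."""
    hits = []
    for topic in topics:
        if topic['rank'] > threshold_rank:
            continue  # ranked too low to alert on
        matched = [kw for kw in keywords if kw.lower() in topic['title'].lower()]
        if matched:
            hits.append({**topic, 'matched': matched})
    return hits


topics = [
    {'title': 'DeepSeek V4采用华为算力', 'rank': 3},
    {'title': '华为新品发布', 'rank': 25},
    {'title': '周末天气', 'rank': 1},
]
alerts = find_alerts(topics, ['华为', '盘古', 'DeepSeek'], threshold_rank=10)
```

Only the first topic qualifies: the second matches a keyword but ranks below the threshold, and the third ranks high but matches nothing.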

Export Analysis Report

```python
# Generate and export comprehensive report
from hotsearch_analysis_agent.report_generator import ReportGenerator

generator = ReportGenerator()
report = generator.generate(
    query='人工智能',  # "artificial intelligence"
    date_range=('2026-04-01', '2026-04-07'),
    include_sentiment=True,
    include_clusters=True,
    format='markdown',  # markdown, html, or json
)

# Save to file
with open('ai_report_20260407.md', 'w', encoding='utf-8') as f:
    f.write(report)

# Or push directly
generator.push_report(report, channels=['email', 'wechat'])
```

Platform Coverage

Supported platforms (26 trending lists from 15 platforms):
  • Social Media: Weibo, Douyin, Kuaishou
  • Video: Bilibili, Xigua Video
  • News: Toutiao, NetEase, Sina
  • Q&A/Forums: Zhihu, Tieba
  • Tech: 36Kr, iFeng Tech, IT Home
  • Finance: Caijing, Eastmoney
  • Others: Baidu Hot Search, Sogou
Each platform may have multiple lists, such as 综合榜 (overall), 热搜榜 (hot search), and 视频榜 (video).

Troubleshooting

Crawler Fails to Extract Content

```python
# Check browser driver compatibility
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service)
try:
    driver.get('https://www.bilibili.com')
    print(driver.page_source[:500])  # Should show HTML
finally:
    driver.quit()  # Always release the browser process

# If this fails, update the driver to match the browser version
```

Database Connection Issues

```python
# Test MySQL connection
import pymysql

try:
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password='your_password',
        database='hotsearch',
    )
    print("Connected successfully")
    conn.close()
except pymysql.Error as e:
    print(f"Error: {e}")
    # Common fixes:
    # 1. Check MySQL service: sudo systemctl status mysql
    # 2. Verify credentials in .env
    # 3. Grant permissions: GRANT ALL ON hotsearch.* TO 'root'@'localhost';
```
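When the database is still starting up (for example right after a reboot), a short retry loop around the connection attempt can help. A minimal sketch, assuming a hypothetical `connect_with_retry` helper that is not part of the repository; the connection callable is passed in so the helper itself has no database dependency:

```python
import time


def connect_with_retry(connect, attempts=3, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # narrow this to pymysql.Error in real use
            last_error = exc
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff: 1x, 2x, ...
    raise last_error
```

Usage would look like `conn = connect_with_retry(lambda: pymysql.connect(host='localhost', user='root', password='your_password', database='hotsearch'))`. Retrying does not help with wrong credentials, so the checklist of common fixes still applies.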

LLM API Timeout

```python
# Increase timeout for large context
# (uses the legacy openai<1.0 interface, as in the original snippet)
import os

import openai

openai.api_key = os.getenv('OPENAI_API_KEY')

long_text = '...'  # Placeholder: the long prompt to analyze

response = openai.ChatCompletion.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': long_text}],
    request_timeout=120,  # HTTP timeout in seconds
)
```
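A longer timeout only goes so far; very large inputs can also be split and analyzed piece by piece. The sketch below shows one simple way to do that by character count, preferring paragraph boundaries. `split_text` is a hypothetical helper, and the character limit is an arbitrary stand-in for a real token budget.

```python
def split_text(text, max_chars=4000):
    """Split text into pieces of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ''
    for paragraph in text.split('\n\n'):
        if not paragraph:
            continue
        # Hard-wrap any single paragraph that is itself too long
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f'{current}\n\n{paragraph}' if current else paragraph
        if len(candidate) > max_chars:
            chunks.append(current)  # current is full; start a new chunk
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


pieces = split_text('aaaa\n\nbbbb\n\ncccccccccc', max_chars=10)
```

Each piece can then be sent as its own request and the partial results merged afterwards.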

Push Notification Not Sending

```bash
# Test Enterprise WeChat webhook
curl -X POST 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype":"text","text":{"content":"测试消息"}}'

# Test Telegram bot
```
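No command accompanies the Telegram check here. As a minimal sketch of an equivalent test using the Telegram Bot API `sendMessage` method and the Python standard library (`build_telegram_request` is a hypothetical helper; the token and chat ID are placeholders for the `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` values):

```python
import json
import urllib.request


def build_telegram_request(bot_token, chat_id, text):
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f'https://api.telegram.org/bot{bot_token}/sendMessage'
    body = json.dumps({'chat_id': chat_id, 'text': text}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


request = build_telegram_request('YOUR_BOT_TOKEN', 'YOUR_CHAT_ID', '测试消息')
# To actually send it (requires network access and a real token):
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.load(resp))
```

A JSON reply with `"ok": true` indicates the bot token and chat ID are valid; an error reply usually points at one of the two.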

Memory Issues with Large Datasets

```python
# Use batch processing for large queries
from hotsearch_analysis_agent.db_utils import query_topics_batch

# `analyzer` and `save_results` are assumed to be defined elsewhere,
# e.g. the PanguAnalyzer instance from the LLM integration example.
for batch in query_topics_batch(
    query='人工智能',  # "artificial intelligence"
    batch_size=1000,
    date_range=('2026-01-01', '2026-04-07'),
):
    analysis = analyzer.process_batch(batch)
    save_results(analysis)
```
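The idea behind batch processing is to hold only one slice of rows in memory at a time. A minimal sketch of that pattern as a generator; `iter_batches` is a hypothetical helper for illustration, and how `query_topics_batch` is implemented is not shown in the source.

```python
def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


sizes = [len(b) for b in iter_batches(range(2500), batch_size=1000)]
```

Because the generator consumes its input lazily, it also works with a streaming database cursor, so the full result set never has to be loaded at once.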

Advanced Configuration

Custom Platform Spider

```python
# hotsearchcrawler/spiders/custom_spider.py
import scrapy

from hotsearchcrawler.items import TrendingItem


class CustomPlatformSpider(scrapy.Spider):
    name = 'custom_platform'
    start_urls = ['https://custom-platform.com/trending']

    def parse(self, response):
        # One TrendingItem per entry in the trending list
        for item in response.css('.trending-item'):
            yield TrendingItem(
                platform='CustomPlatform',
                title=item.css('.title::text').get(),
                url=item.css('a::attr(href)').get(),
                rank=item.css('.rank::text').get(),
                hot_value=item.css('.hot::text').get(),
            )
```
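Scrapy's `.get()` returns a string (or `None` when nothing matches), while the `rank` column in the schema is an integer. A minimal sketch of a normalization step that could run in the spider or in a pipeline; `to_int` is a hypothetical helper, not part of the repository:

```python
import re


def to_int(text):
    """Extract the first integer from scraped text, or None if there is none."""
    if text is None:
        return None
    match = re.search(r'\d+', text.replace(',', ''))
    return int(match.group()) if match else None


examples = [to_int(' 3 '), to_int('1,234'), to_int('hot'), to_int(None)]
```

In the spider above this would be used as `rank=to_int(item.css('.rank::text').get())`.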

Multi-Language Support Extension

```python
# Add translation layer for non-Chinese queries
from hotsearch_analysis_agent.translator import QueryTranslator

translator = QueryTranslator()
en_query = "artificial intelligence trends"
zh_query = translator.translate(en_query, target='zh')
# Returns: "人工智能趋势"

# `analyzer` is assumed to be an analyzer instance created earlier
results = analyzer.query(zh_query)
translated_results = translator.translate_results(results, target='en')
```

This skill enables AI agents to help developers deploy and use a Chinese public opinion monitoring system with LLM-powered analytics and multi-channel alerting.