llm-public-opinion-analytics-assistant
# LLM Public Opinion Analytics Assistant
Skill by ara.so — Data Skills collection.
A comprehensive Chinese public opinion monitoring platform that aggregates real-time trending data from 26 lists across 15 mainstream platforms (Weibo, Bilibili, Douyin, Zhihu, etc.) and provides LLM-powered analysis including sentiment classification, topic clustering, and automated alerts via email, WeChat Work, and Telegram.
## What This Project Does
- Multi-Platform Crawling: Scrapes trending lists from 15 Chinese platforms with Scrapy-based distributed crawlers
- Deep Content Extraction: Retrieves full article/video content from detail pages using Selenium
- LLM Analysis: Performs sentiment analysis, topic clustering, and trend summarization using large language models (supports Huawei Pangu, OpenAI-compatible APIs)
- Conversational Interface: Natural language queries for trending topics, theme searches, and analytical reports
- Alert System: Multi-channel push notifications (Enterprise WeChat, Telegram, email) with scheduled analysis reports
- Shortcut Controls: Keyboard shortcuts to start/stop crawlers and trigger analysis tasks
## Installation

### Prerequisites
**1. Browser Driver Setup**

Download and configure browser drivers for content extraction:

```bash
# For Chrome
# Match your Chrome version (check chrome://version)

# For Edge: use msedgedriver matching your Edge version (check edge://version)

# Place driver in PATH or project directory
# Linux/macOS example:
sudo mv chromedriver /usr/local/bin/
sudo chmod +x /usr/local/bin/chromedriver

# Verify installation
chromedriver --version
```
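Most driver start-up failures come from a driver whose major version differs from the installed browser. A minimal sketch for checking this, assuming both binaries print a four-part version number with `--version` (binary names such as `google-chrome` vary by OS and are assumptions):

```python
import re
import shutil
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from output such as
    'ChromeDriver 124.0.6367.91 (...)' or 'Google Chrome 124.0.6367.91'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output: str, driver_output: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(browser_output) == major_version(driver_output)


def read_version(command: str) -> str:
    """Run '<command> --version' and return its stdout (raises if not on PATH)."""
    path = shutil.which(command)
    if path is None:
        raise FileNotFoundError(f"{command} not found on PATH")
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `versions_match(read_version("google-chrome"), read_version("chromedriver"))` returns `False` once Chrome has auto-updated past the installed driver.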
**2. MySQL Database**

```bash
# Install MySQL 8.0+
sudo apt-get install mysql-server  # Ubuntu/Debian
# OR
brew install mysql                 # macOS

# Create database
mysql -u root -p -e "CREATE DATABASE hotsearch CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
```
**3. Python Environment**

```bash
# Clone repository
git clone https://github.com/hmmnxkl/LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant.git
cd LLM-Based-Intelligent-Public-Opinion-Analytics-Assistant

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
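After `pip install`, you can confirm that the environment is complete by checking that the packages imported by this document's examples can be found. This is a sketch: the module list is an assumption drawn from those examples, and `requirements.txt` remains the authoritative list.

```python
import importlib.util

# Top-level import names used by the examples in this document (assumed subset
# of requirements.txt).
EXPECTED_MODULES = ["scrapy", "selenium", "pymysql", "requests"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Inside the activated virtual environment, `print(missing_modules(EXPECTED_MODULES))` should print `[]`.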
### Database Initialization

Initialize the database schema using the reference script:

```python
# init.py - Database setup reference
import pymysql

connection = pymysql.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='hotsearch',
    charset='utf8mb4'
)

try:
    with connection.cursor() as cursor:
        # Create trending lists table
        # (`rank` is backquoted because RANK is a reserved word in MySQL 8.0+)
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS trending_topics (
                id INT AUTO_INCREMENT PRIMARY KEY,
                platform VARCHAR(50) NOT NULL,
                title VARCHAR(500) NOT NULL,
                url VARCHAR(1000),
                `rank` INT,
                hot_value VARCHAR(100),
                detail_content TEXT,
                crawl_time DATETIME,
                INDEX idx_platform (platform),
                INDEX idx_crawl_time (crawl_time)
            )
        """)
        # Create analysis results table
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS analysis_results (
                id INT AUTO_INCREMENT PRIMARY KEY,
                query_text VARCHAR(500),
                analysis_type VARCHAR(50),
                result_content TEXT,
                sentiment_score FLOAT,
                created_at DATETIME,
                INDEX idx_query (query_text),
                INDEX idx_type (analysis_type)
            )
        """)
    connection.commit()
finally:
    connection.close()  # Release the connection even if a statement fails
```
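The script above hard-codes the password. The following minimal sketch builds the same `pymysql.connect()` arguments from the environment instead. It assumes that the `MYSQL_*` variables from the `.env` file described under Configuration are exported into the process environment. `mysql_config_from_env` is a hypothetical helper, not part of the project.

```python
import os


def mysql_config_from_env(env=None):
    """Build pymysql.connect() keyword arguments from MYSQL_* variables
    instead of hard-coding credentials in the script."""
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        "port": int(env.get("MYSQL_PORT", "3306")),  # pymysql expects an int port
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", ""),
        "database": env.get("MYSQL_DATABASE", "hotsearch"),
        "charset": "utf8mb4",  # matches the database's utf8mb4 character set
    }

# Usage (requires pymysql and a running server):
#     import pymysql
#     connection = pymysql.connect(**mysql_config_from_env())
```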
## Configuration

### Environment Variables

Create a `.env` file in the project root:

```bash
# Database Configuration
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_USER=root
MYSQL_PASSWORD=your_secure_password
MYSQL_DATABASE=hotsearch

# LLM API Configuration (OpenAI-compatible format)
OPENAI_API_KEY=your_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4

# OR use Huawei Pangu Model (local deployment)
PANGU_MODEL_PATH=/path/to/openpangu-embedded-7b-model
USE_LOCAL_MODEL=true

# Push Notification Channels
# Enterprise WeChat Robot
WECHAT_ROBOT_WEBHOOK=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY

# Telegram Bot
TELEGRAM_BOT_TOKEN=your_bot_token
TELEGRAM_CHAT_ID=your_chat_id

# Email (SMTP)
SMTP_SERVER=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your_email@gmail.com
SMTP_PASSWORD=your_app_password
ALERT_EMAIL_TO=recipient@example.com

# Browser Driver Configuration
CHROME_DRIVER_PATH=/usr/local/bin/chromedriver
EDGE_DRIVER_PATH=/usr/local/bin/msedgedriver
```
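How the application loads `.env` is not shown here; python-dotenv is a common choice. To make the file format concrete, here is a minimal standard-library parser sketch. It is a simplified stand-in, not the project's loader. It splits each line on the first `=`, so URLs that contain query strings are preserved intact:

```python
def parse_dotenv(text):
    """Parse KEY=VALUE lines from .env content into a dict.

    Blank lines and lines starting with '#' are skipped. Each line is split on
    the first '=', so values may themselves contain '=' (as in the WeChat
    webhook URL). Matching surrounding single or double quotes are stripped.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values
```

Calling `os.environ.update(parse_dotenv(...))` on the file's contents makes the values visible to `os.getenv`. This sketch does not handle `export` prefixes or inline comments after a value.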
### Crawler Settings

Configure `hotsearchcrawler/settings.py`:

```python
# MySQL pipeline settings
MYSQL_HOST = 'localhost'
MYSQL_DATABASE = 'hotsearch'
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'your_password'

# Crawler behavior
CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 1  # Politeness delay between requests (seconds)
ROBOTSTXT_OBEY = False  # Skips robots.txt checks; set True to honor each site's rules

# Optional: Platform-specific cookies for authenticated content
COOKIES = {
    'weibo': 'SUB=your_cookie_value',
    'bilibili': 'SESSDATA=your_session_data'
}

# User agent (a single static string; rotating it requires a downloader middleware)
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
```
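A single `USER_AGENT` value sends the same header on every request. The following sketch shows a downloader middleware that rotates it. The `USER_AGENT_LIST` setting name and the `hotsearchcrawler/middlewares.py` location are hypothetical. The middleware uses priority 400 so that it runs before Scrapy's built-in `UserAgentMiddleware` (priority 500). That built-in middleware only fills the header when it is missing.

```python
import random


class RotateUserAgentMiddleware:
    """Downloader middleware that picks a random User-Agent for each request.

    Hypothetical: reads a USER_AGENT_LIST setting that is not in the project's
    settings.py. Register with a priority below 500, e.g. in settings.py:
        USER_AGENT_LIST = ['Mozilla/5.0 ...', 'Mozilla/5.0 ...']
        DOWNLOADER_MIDDLEWARES = {
            'hotsearchcrawler.middlewares.RotateUserAgentMiddleware': 400,
        }
    """

    def __init__(self, user_agents, rng=None):
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        self.user_agents = list(user_agents)
        self.rng = rng or random.Random()

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware from crawler settings
        return cls(crawler.settings.getlist("USER_AGENT_LIST"))

    def process_request(self, request, spider):
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # continue normal processing of this request
```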
## Starting the System

### Launch Analysis System

```bash
# Start the main application
python app.py

# Access web interface at http://localhost:5000
```
### Start Crawlers

**Via Web Interface:**
- Use the keyboard shortcut (configured in the UI) to start/stop crawlers

**Via Command Line:**

```bash
# Start all platform crawlers
python run_spiders.py

# Test a specific platform
cd hotsearchcrawler
scrapy crawl weibo_hot     # Weibo trending
scrapy crawl bilibili_hot  # Bilibili trending
scrapy crawl douyin_hot    # Douyin trending
scrapy crawl zhihu_hot     # Zhihu trending
```
## Key Usage Patterns

### Conversational Query Interface

```python
# Example API call to analysis endpoint
import requests

response = requests.post('http://localhost:5000/api/query', json={
    'query': '最近关于人工智能的热点有哪些?',  # "What are the recent AI-related trending topics?"
    'analysis_type': 'theme_search'
})
result = response.json()
print(result['topics'])    # List of AI-related trending topics
print(result['analysis'])  # LLM-generated summary
```
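If `requests` is not available, you can make the same call with the standard library. This sketch assumes the endpoint accepts a UTF-8 JSON body, as in the example above. `ensure_ascii=False` keeps the Chinese query readable in logs, and the explicit timeout prevents a slow LLM response from blocking indefinitely. `build_query_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "http://localhost:5000"  # default address of the web interface


def build_query_request(query, analysis_type="theme_search", base=API_BASE):
    """Build a POST request for /api/query with a UTF-8 JSON body."""
    body = json.dumps({"query": query, "analysis_type": analysis_type},
                      ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/query",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and decode the JSON reply; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `result = send(build_query_request('最近关于人工智能的热点有哪些?'))`.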
### Sentiment Analysis

```python
import requests

# Analyze sentiment for a specific topic
response = requests.post('http://localhost:5000/api/analyze', json={
    'query': 'GPT-6模型发布',  # "GPT-6 model release"
    'analysis_type': 'sentiment'
})
sentiment = response.json()
# Returns: {
#     'sentiment': 'positive',
#     'score': 0.85,
#     'reasoning': '大多数评论表达了期待和兴奋...'  # "Most comments express anticipation and excitement..."
# }
```
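You can roll up per-topic results like the one above into a single dashboard figure. This sketch assumes each result is a dict with `sentiment` and `score` keys, as shown. The response does not specify whether `score` is a polarity or a confidence value. The mean is therefore only meaningful if all scores use the same definition.

```python
from collections import Counter


def summarize_sentiments(results):
    """Aggregate sentiment responses ({'sentiment': label, 'score': float, ...})
    into label counts, the dominant label and the mean score."""
    labels = Counter(r["sentiment"] for r in results)
    scores = [r["score"] for r in results if r.get("score") is not None]
    mean = sum(scores) / len(scores) if scores else None
    dominant = labels.most_common(1)[0][0] if labels else None
    return {"counts": dict(labels), "mean_score": mean, "dominant": dominant}
```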
### Topic Clustering

```python
import requests

# Cluster related topics
response = requests.post('http://localhost:5000/api/analyze', json={
    'query': '科技创新',  # "technological innovation"
    'analysis_type': 'clustering',
    'time_range': '7d'  # Last 7 days
})
clusters = response.json()['clusters']
# Returns grouped topics:
# {
#     'AI模型发展': [topic1, topic2, ...],  # "AI model development"
#     '硬件生态': [topic3, topic4, ...],    # "hardware ecosystem"
#     '商业应用': [topic5, topic6, ...]     # "commercial applications"
# }
```
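To show the largest themes first, you can rank the `clusters` mapping by size. This small sketch assumes the `{name: [topics]}` shape shown above:

```python
def rank_clusters(clusters, top_n=None):
    """Sort a {cluster_name: [topics, ...]} mapping by cluster size, largest first.

    Ties keep the API's original order because sorted() is stable,
    even with reverse=True.
    """
    ranked = sorted(((name, len(topics)) for name, topics in clusters.items()),
                    key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```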
### Scheduled Alert Tasks

```python
# test_push_task.py - Configure automated reports
from hotsearch_analysis_agent.push_service import PushService

push_service = PushService()

# Create scheduled alert
push_service.create_task({
    'name': 'AI技术日报',                # "AI Tech Daily"
    'query': '人工智能 OR 大模型 OR AI',  # "artificial intelligence OR large models OR AI"
    'schedule': 'daily',                 # daily, weekly, hourly
    'time': '09:00',
    'channels': ['wechat', 'telegram', 'email'],
    'analysis_types': ['sentiment', 'clustering', 'summary']
})

# Test immediate push
# analysis_result: output of an earlier analysis call (not defined in this snippet)
push_service.send_report(
    title='人工智能与前沿科技热点分析',  # "AI and Frontier Tech Trending Analysis"
    content=analysis_result,
    channels=['telegram']
)
```
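This document does not specify how `PushService` interprets `'schedule': 'daily'` together with `'time': '09:00'`. The sketch below shows the usual semantics: the task runs at the next occurrence of that wall-clock time. It uses naive local datetimes and does not handle time zones or daylight saving time. Use it to estimate when the first report will arrive.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hhmm):
    """Return the next datetime at which a daily task scheduled for hhmm ('09:00') fires.

    If today's slot has already passed (or is exactly now), the next run is tomorrow.
    """
    hour, minute = map(int, hhmm.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```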
### Direct Crawler Usage

```python
# Custom crawler for specific platform
# Run from the Scrapy project directory (the one containing scrapy.cfg)
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from hotsearchcrawler.spiders.weibo_spider import WeiboHotSpider

# Start from settings.py so item pipelines (e.g. the MySQL pipeline) stay enabled,
# then override individual values
settings = get_project_settings()
settings.update({
    'USER_AGENT': 'Mozilla/5.0...',
    'MYSQL_HOST': 'localhost',
    'MYSQL_DATABASE': 'hotsearch'
})

process = CrawlerProcess(settings)
process.crawl(WeiboHotSpider)
process.start()  # Blocks until the crawl finishes
```
### LLM Integration (Pangu Model)

```python
# Using Huawei Pangu model locally
from hotsearch_analysis_agent.llm_engine import PanguAnalyzer

analyzer = PanguAnalyzer(
    model_path='/path/to/openpangu-embedded-7b-model'
)

# Analyze topic
# related_articles: articles related to the topic (not defined in this snippet)
result = analyzer.analyze_topic(
    topic='DeepSeek V4采用华为算力',  # "DeepSeek V4 adopts Huawei computing power"
    context=related_articles,
    task='sentiment_and_summary'
)
print(result['sentiment'])   # positive/neutral/negative
print(result['summary'])     # Chinese summary
print(result['key_points'])  # Extracted insights
```
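This document does not show how `PanguAnalyzer` turns raw model text into the `sentiment`, `summary` and `key_points` fields. If you call a local model yourself, a common approach is to request JSON output and parse it defensively. Models often wrap the JSON in prose or code fences. The following is a sketch; `extract_json_object` is a hypothetical helper:

```python
import json


def extract_json_object(text):
    """Return the first JSON object embedded in an LLM reply.

    Scans for each '{' and lets json.JSONDecoder.raw_decode try to parse a
    complete object starting there; surrounding prose and fences are ignored.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not a valid object at this position; keep scanning
        return obj
    raise ValueError("no JSON object found in model output")
```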
## Common Workflows

### Daily Monitoring Setup

```python
# 1. Start crawlers (runs continuously)
#    Run in background: nohup python run_spiders.py &

# 2. Configure daily report
from hotsearch_analysis_agent.scheduler import AnalysisScheduler

scheduler = AnalysisScheduler()
scheduler.add_daily_task(
    query='科技 OR 互联网 OR AI',  # "technology OR internet OR AI"
    time='08:00',
    recipients=['wechat_group', 'email'],
    report_format='detailed'  # detailed or summary
)

# 3. Start scheduler
scheduler.run()
```
### Real-Time Alert for Keywords

```python
# Monitor specific keywords with immediate alerts
from hotsearch_analysis_agent.realtime_monitor import RealtimeMonitor

# custom_alert_handler: your own callback function (not defined in this snippet)
monitor = RealtimeMonitor()
monitor.add_keyword_alert(
    keywords=['华为', '盘古', 'DeepSeek'],  # "Huawei", "Pangu", "DeepSeek"
    threshold_rank=10,  # Alert if keyword appears in top 10
    channels=['telegram'],
    callback=custom_alert_handler
)
monitor.start()
```
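This document does not specify the matching rules that `RealtimeMonitor` uses. The sketch below shows one plausible interpretation of `threshold_rank`: a case-insensitive substring match on the title, with rank counted per list and lower numbers meaning hotter. The topic dicts mirror the `trending_topics` columns. `match_keyword_alerts` is a hypothetical helper.

```python
def match_keyword_alerts(topics, keywords, threshold_rank=10):
    """Return (keyword, topic) pairs for topics whose title contains a keyword
    and whose rank is within the top `threshold_rank` of its list.

    `topics` is an iterable of dicts with 'platform', 'title' and 'rank' keys.
    """
    hits = []
    for topic in topics:
        rank = topic.get("rank")
        if rank is None or rank > threshold_rank:
            continue  # unranked or outside the alert window
        title = topic["title"].lower()
        for keyword in keywords:
            if keyword.lower() in title:
                hits.append((keyword, topic))
    return hits
```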
### Export Analysis Report

```python
# Generate and export comprehensive report
from hotsearch_analysis_agent.report_generator import ReportGenerator

generator = ReportGenerator()
report = generator.generate(
    query='人工智能',  # "artificial intelligence"
    date_range=('2026-04-01', '2026-04-07'),
    include_sentiment=True,
    include_clusters=True,
    format='markdown'  # markdown, html, or json
)

# Save to file
with open('ai_report_20260407.md', 'w', encoding='utf-8') as f:
    f.write(report)

# Or push directly
generator.push_report(report, channels=['email', 'wechat'])
```
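This document does not describe the generated report's internal layout. For a custom export, the following sketch renders sentiment counts and clusters to Markdown. The input shapes are assumptions modeled on the sentiment and clustering responses shown earlier, and `render_markdown_report` is a hypothetical helper.

```python
def render_markdown_report(title, sentiment_counts, clusters):
    """Render a small Markdown report from {label: count} sentiment counts and a
    {cluster_name: [topic titles]} mapping."""
    lines = [f"# {title}", "", "## Sentiment", ""]
    total = sum(sentiment_counts.values())
    for label, count in sorted(sentiment_counts.items(), key=lambda kv: -kv[1]):
        share = count / total if total else 0.0
        lines.append(f"- {label}: {count} ({share:.0%})")
    lines += ["", "## Topic clusters", ""]
    for name, topics in clusters.items():
        lines.append(f"### {name} ({len(topics)})")
        lines.extend(f"- {topic}" for topic in topics)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```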
## Platform Coverage

Supported platforms (26 trending lists from 15 platforms):
- Social Media: Weibo, Douyin, Kuaishou
- Video: Bilibili, Xigua Video
- News: Toutiao, NetEase, Sina
- Q&A/Forums: Zhihu, Tieba
- Tech: 36Kr, iFeng Tech, IT Home
- Finance: Caijing, Eastmoney
- Others: Baidu Hot Search, Sogou

Each platform may have multiple lists, such as an overall ranking (综合榜), a hot-search list (热搜榜) or a video ranking (视频榜).
## Troubleshooting

### Crawler Fails to Extract Content

```python
# Check browser driver compatibility
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(service=service)
try:
    driver.get('https://www.bilibili.com')
    print(driver.page_source[:500])  # Should show HTML
finally:
    driver.quit()  # Close the browser even if the page load fails

# If this fails, update the driver to match the browser version
```
### Database Connection Issues

```python
# Test MySQL connection
import pymysql

try:
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password='your_password',
        database='hotsearch'
    )
    print("Connected successfully")
    conn.close()
except pymysql.Error as e:
    print(f"Error: {e}")
    # Common fixes:
    # 1. Check MySQL service: sudo systemctl status mysql
    # 2. Verify credentials in .env
    # 3. Grant permissions: GRANT ALL ON hotsearch.* TO 'root'@'localhost';
```
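If MySQL is still starting, for example after a reboot or container start, the first connection attempt can fail even with correct credentials. The following is a generic retry-with-exponential-backoff sketch; `connect_with_retry` is a hypothetical helper:

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    Re-raises the last error once all attempts are used. `sleep` is injectable
    so the backoff schedule can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... with the defaults
```

With pymysql this would be called as `connect_with_retry(lambda: pymysql.connect(**config), retry_on=(pymysql.err.OperationalError,))`, where `config` holds the connection arguments.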
### LLM API Timeout

```python
# Set an explicit per-request timeout for large contexts
# (openai>=1.0 SDK; the legacy openai.ChatCompletion API was removed in 1.0)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv('OPENAI_API_KEY'),
    base_url=os.getenv('OPENAI_API_BASE'),  # OpenAI-compatible endpoint from .env
    timeout=120.0,  # Seconds per request; raise if long prompts still time out
)

# long_text: the large prompt being analyzed (not defined in this snippet)
response = client.chat.completions.create(
    model=os.getenv('MODEL_NAME', 'gpt-4'),
    messages=[{'role': 'user', 'content': long_text}],
)
```
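Raising the timeout only helps up to a point, because very long prompts can also exceed the model's context window. A common alternative is to split `long_text` and analyze the chunks separately (map-reduce style). The following character-based sketch treats characters as a rough proxy for tokens. The 4000-character default is an arbitrary assumption, not a model limit.

```python
def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks.

    Paragraphs longer than max_chars are hard-split.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        while len(paragraph) > max_chars:  # hard-split oversized paragraphs
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph fits into the current chunk
        else:
            chunks.append(current)
            current = paragraph  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks
```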
### Push Notification Not Sending

```bash
# Test Enterprise WeChat webhook
curl -X POST 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"msgtype":"text","text":{"content":"测试消息"}}'

# Test Telegram bot (Bot API sendMessage; assumes the .env values are exported in this shell)
curl -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
  -d chat_id="${TELEGRAM_CHAT_ID}" \
  -d text="test message"
```
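Long LLM reports can cause a push to be rejected even when the curl test succeeds. The following sketch trims content to the robot's limit before sending. It assumes the 2048-byte UTF-8 text limit given in the WeChat Work group-robot documentation; check the current documentation, because limits can change. The helper names are hypothetical.

```python
WECHAT_TEXT_MAX_BYTES = 2048  # WeChat Work group robot text limit (UTF-8 bytes)


def truncate_utf8(text, max_bytes):
    """Trim text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def wechat_text_payload(content):
    """Build the same JSON body as the curl test above, truncated to the byte limit."""
    return {"msgtype": "text",
            "text": {"content": truncate_utf8(content, WECHAT_TEXT_MAX_BYTES)}}
```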
### Memory Issues with Large Datasets

```python
# Use batch processing for large queries
from hotsearch_analysis_agent.db_utils import query_topics_batch

# analyzer: an analyzer instance such as the PanguAnalyzer above;
# save_results: your own persistence function (neither is defined in this snippet)
for batch in query_topics_batch(
    query='人工智能',  # "artificial intelligence"
    batch_size=1000,
    date_range=('2026-01-01', '2026-04-07')
):
    analysis = analyzer.process_batch(batch)
    save_results(analysis)
```
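`query_topics_batch` is the project's own helper, and its internals are not shown. The underlying idea is to stream rows in fixed-size chunks so that only one batch is in memory at a time. The following sketch shows that idea using the standard library. With pymysql, you can pair it with a server-side `pymysql.cursors.SSCursor`, for example `batched(cursor, 1000)`, so the full result set is never loaded on the client.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input.

    (Python 3.12+ ships itertools.batched, which yields tuples instead.)
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch
```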
## Advanced Configuration

### Custom Platform Spider

```python
# hotsearchcrawler/spiders/custom_spider.py
import scrapy
from hotsearchcrawler.items import TrendingItem


class CustomPlatformSpider(scrapy.Spider):
    name = 'custom_platform'
    start_urls = ['https://custom-platform.com/trending']

    def parse(self, response):
        for item in response.css('.trending-item'):
            href = item.css('a::attr(href)').get()
            yield TrendingItem(
                platform='CustomPlatform',
                title=item.css('.title::text').get(),
                url=response.urljoin(href) if href else None,  # resolve relative links
                rank=item.css('.rank::text').get(),
                hot_value=item.css('.hot::text').get()
            )
```
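The spider stores `rank` and `hot_value` as raw display text, and the table declares `hot_value VARCHAR(100)`. As a result, values such as `123.4万` sort and compare as strings. The following sketch shows a normalizer that could run in an item pipeline before insertion. The helper is hypothetical. The 万 (10^4) and 亿 (10^8) suffixes are standard Chinese magnitude units.

```python
import re
from decimal import Decimal

# Chinese magnitude suffixes commonly shown on trending lists.
_UNITS = {"万": Decimal(10_000), "亿": Decimal(100_000_000)}
_NUMBER = re.compile(r"(\d+(?:\.\d+)?)\s*(万|亿)?")


def parse_hot_value(text):
    """Convert display strings such as '123.4万', '5.2亿' or '1,234' to an int.

    Returns None when no number is present (e.g. a '热' badge).
    Decimal avoids float rounding, so '123.4万' becomes exactly 1234000.
    """
    if not text:
        return None
    match = _NUMBER.search(text.replace(",", ""))
    if match is None:
        return None
    value = Decimal(match.group(1)) * _UNITS.get(match.group(2), Decimal(1))
    return int(value)
```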
### Multi-Language Support Extension

```python
# Add translation layer for non-Chinese queries
from hotsearch_analysis_agent.translator import QueryTranslator

translator = QueryTranslator()

en_query = "artificial intelligence trends"
zh_query = translator.translate(en_query, target='zh')
# Returns: "人工智能趋势"

# analyzer: an analyzer instance created earlier (not defined in this snippet)
results = analyzer.query(zh_query)
translated_results = translator.translate_results(results, target='en')
```

This skill enables AI agents to help developers deploy and use a comprehensive Chinese public opinion monitoring system with LLM-powered analytics and multi-channel alerting.