ai-nlp-analytics
AI NLP Analytics
Use When
- Text analytics using LLM APIs — sentiment analysis, customer feedback classification, document entity extraction, multi-language support (English/Luganda/Swahili), feedback aggregation, and NLP feature implementation for PHP/Android/iOS. Sources...
- The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.
Do Not Use When
- The task is unrelated to ai-nlp-analytics or would be better handled by a more specific companion skill.
- The request only needs a trivial answer and none of this skill's constraints or references materially help.
Required Inputs
- Gather relevant project context, constraints, and the concrete problem to solve.
- Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.
Workflow
- Read SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.
- Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
- Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.
Quality Standards
- Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
- Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
- Prefer deterministic, reviewable steps over vague advice or tool-specific magic.
Anti-Patterns
- Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
- Loading every reference file by default instead of using progressive disclosure.
Outputs
- A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
- Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
- References used, companion skills, or follow-up actions when they materially improve execution.
Evidence Produced
| Category | Artifact | Format | Example |
|---|---|---|---|
| Correctness | NLP analytics evaluation | Markdown doc covering sentiment, classification, and entity-extraction accuracy on a fixed eval set | |
References
- Use the links and companion skills already referenced in this file when deeper context is needed.
What NLP Analytics Does
Natural Language Processing (NLP) analytics transforms unstructured text — feedback, comments, messages, documents, forms — into structured, actionable insights. Using LLM APIs, you can perform sophisticated NLP without training custom models.
Use cases for SaaS products:
- Analyse parent/patient/customer feedback automatically.
- Classify support tickets or complaints by type and urgency.
- Extract key entities from uploaded documents (invoices, receipts, forms).
- Summarise free-text notes into structured records.
- Detect sentiment in survey responses across thousands of users.
Feature 1: Sentiment Analysis
Classify the emotional tone of text as Positive, Neutral, or Negative. Apply to: feedback forms, app reviews, survey responses, support messages.
Prompt Template
You are a sentiment analysis engine for a business management system.
Classify the sentiment of each piece of text.
Input: array of { id, text, source, language }
Output — strict JSON array:
[
{
"id": <string>,
"sentiment": "positive|neutral|negative",
"intensity": "strong|moderate|mild",
"key_phrase": "<the phrase that most drives the sentiment, max 8 words>",
"language_detected": "<ISO 639-1 code>"
}
]
Rules:
- Detect language automatically; do not require English input.
- Do not infer sentiment from punctuation alone — read meaning.
- If text is too short to judge (< 3 words), return sentiment: "neutral", intensity: "mild".
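Because the prompt demands a strict JSON array, validate the reply before trusting it. A minimal sketch (Python here for illustration only; the field names and allowed values mirror the template above, and the model reply is stubbed):

```python
import json

ALLOWED_SENTIMENT = {"positive", "neutral", "negative"}
ALLOWED_INTENSITY = {"strong", "moderate", "mild"}

def validate_sentiment_output(raw: str) -> list[dict]:
    """Parse the model's reply and reject anything outside the schema."""
    items = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    for item in items:
        if item["sentiment"] not in ALLOWED_SENTIMENT:
            raise ValueError(f"bad sentiment: {item['sentiment']}")
        if item["intensity"] not in ALLOWED_INTENSITY:
            raise ValueError(f"bad intensity: {item['intensity']}")
        if len(item["key_phrase"].split()) > 8:
            raise ValueError("key_phrase exceeds 8 words")
    return items

# Stubbed model reply for illustration
reply = '[{"id": "f1", "sentiment": "positive", "intensity": "strong", '
reply += '"key_phrase": "teachers are very supportive", "language_detected": "en"}]'
results = validate_sentiment_output(reply)
```

Anything that fails this check should be logged and retried, never stored.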
Aggregation Query (PHP/Laravel)
```php
// Aggregate sentiment results by tenant for the dashboard
$summary = DB::table('nlp_results')
    ->where('tenant_id', $tenantId)
    ->where('period', $period)
    ->selectRaw('
        sentiment,
        COUNT(*) as count,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as pct
    ')
    ->groupBy('sentiment')
    ->get();

// Store individual results
NLPResult::create([
    'tenant_id' => $tenantId,
    'source_type' => 'parent_feedback',
    'source_id' => $feedbackId,
    'sentiment' => $result['sentiment'],
    'intensity' => $result['intensity'],
    'key_phrase' => $result['key_phrase'],
    'period' => now()->format('Y-m'),
]);
```
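The window-function `pct` column reduces to simple share-of-total arithmetic. A language-neutral sketch (Python, with illustrative counts):

```python
def sentiment_breakdown(counts: dict[str, int]) -> dict[str, float]:
    """Per-sentiment share of all rows, rounded to 1 decimal place."""
    total = sum(counts.values())
    return {s: round(c * 100.0 / total, 1) for s, c in counts.items()}

breakdown = sentiment_breakdown({"positive": 148, "neutral": 36, "negative": 16})
# breakdown == {"positive": 74.0, "neutral": 18.0, "negative": 8.0}
```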
Dashboard Display
Feedback Sentiment — This Term
Positive ████████████░░░░ 74% (148 responses)
Neutral ███░░░░░░░░░░░░░ 18% (36 responses)
Negative ██░░░░░░░░░░░░░░ 8% (16 responses)
Top Negative Themes:
- "Fees too high" (6 mentions)
- "Poor communication from teachers" (4 mentions)
- "Long waiting times at the clinic" (3 mentions)
Feature 2: Text Classification
Categorise incoming text into predefined business categories. Apply to: support tickets, expense descriptions, complaint types, document types.
Prompt Template
You are a text classification engine.
Classify each item into exactly one category from the provided list.
Categories: [<list from caller>]
Input: array of { id, text }
Output — strict JSON array:
[
{
"id": <string>,
"category": "<one of the provided categories>",
"confidence": "high|medium|low",
"secondary_category": "<second best category or null>"
}
]
If the text does not fit any category, use the category: "uncategorised".
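Model labels should still be checked against the caller's list before storage. A hedged sketch of that guard (Python for illustration; the sample categories are from the school example in this skill):

```python
def normalise_category(predicted: str, allowed: list[str]) -> str:
    """Coerce a model-predicted label onto the caller's list; fall back to 'uncategorised'."""
    by_lower = {c.lower(): c for c in allowed}
    return by_lower.get(predicted.strip().lower(), "uncategorised")

CATEGORIES = ["Fee query", "Grade query", "Attendance query", "Technical issue"]
cat_ok = normalise_category("fee query ", CATEGORIES)        # → "Fee query"
cat_bad = normalise_category("Transport issue", CATEGORIES)  # → "uncategorised"
```

Case-insensitive matching is a deliberate choice here: models often vary capitalisation even when given an exact list.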
Domain Category Examples
Support tickets (school):
["Fee query", "Grade query", "Attendance query", "Technical issue",
"Complaint — teacher", "Complaint — facilities", "Admission enquiry", "Other"]
Expense classification (ERP):
["Travel", "Accommodation", "Meals", "Office supplies", "IT equipment",
"Professional services", "Utilities", "Marketing", "Miscellaneous"]
Healthcare complaints:
["Wait time", "Staff conduct", "Treatment quality", "Billing",
"Facility cleanliness", "Medication", "Communication", "Other"]
Bulk Classification Cost
Processing 500 support tickets per month:
- Input: ~200 tokens per ticket × 500 = 100,000 tokens
- Output: ~30 tokens per ticket × 500 = 15,000 tokens
- With Haiku: (100K × $0.80 + 15K × $4.00) / 1M = $0.08 + $0.06 = $0.14/month
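The arithmetic generalises to any volume and model. A small estimator sketch (Python; the per-million-token prices are the Haiku figures quoted above and will drift over time):

```python
def monthly_cost(items: int, in_tokens: int, out_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate batch-NLP spend from average token counts and per-million-token prices."""
    total_in = items * in_tokens
    total_out = items * out_tokens
    return (total_in * in_price_per_m + total_out * out_price_per_m) / 1_000_000

cost = monthly_cost(items=500, in_tokens=200, out_tokens=30,
                    in_price_per_m=0.80, out_price_per_m=4.00)
# cost ≈ 0.14 (USD/month)
```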
Feature 3: Named Entity Extraction
Pull structured data from free-form documents. Apply to: uploaded invoices, receipts, ID documents, lab reports, application forms.
Prompt Template — Invoice Extraction
You are a document intelligence engine.
Extract structured data from the provided invoice or receipt text.
Output — strict JSON:
{
"vendor_name": "<string or null>",
"vendor_tin": "<string or null>",
"invoice_number": "<string or null>",
"invoice_date": "<YYYY-MM-DD or null>",
"due_date": "<YYYY-MM-DD or null>",
"currency": "<ISO 4217 code>",
"subtotal": <float or null>,
"tax_amount": <float or null>,
"total_amount": <float or null>,
"line_items": [
{ "description": "<string>", "quantity": <float>, "unit_price": <float>, "amount": <float> }
],
"extraction_confidence": "high|medium|low",
"flags": ["<any field that could not be reliably extracted>"]
}
If a field is not present in the document, return null.
Do not invent or guess values — only extract what is explicitly stated.
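Extracted amounts deserve an arithmetic sanity check before the record is trusted. A sketch (Python for illustration; field names match the JSON schema above, the tolerance value is an assumption):

```python
def check_invoice_maths(doc: dict, tolerance: float = 0.01) -> list[str]:
    """Return flags for any extracted amounts that do not add up."""
    flags = []
    items_total = sum(i["amount"] for i in doc.get("line_items", []))
    if doc.get("subtotal") is not None and abs(items_total - doc["subtotal"]) > tolerance:
        flags.append("line_items_vs_subtotal")
    if None not in (doc.get("subtotal"), doc.get("tax_amount"), doc.get("total_amount")):
        if abs(doc["subtotal"] + doc["tax_amount"] - doc["total_amount"]) > tolerance:
            flags.append("subtotal_plus_tax_vs_total")
    return flags

doc = {"subtotal": 100.0, "tax_amount": 18.0, "total_amount": 118.0,
       "line_items": [{"description": "Desks", "quantity": 2.0,
                       "unit_price": 50.0, "amount": 100.0}]}
flags = check_invoice_maths(doc)  # → [] (everything adds up)
```

Any non-empty flag list should route the document to manual review rather than straight into the ledger.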
Photo-to-Text Pipeline (Android/iOS)
```kotlin
// Android — OCR via ML Kit, then send text to AI Service
val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
recognizer.process(InputImage.fromBitmap(bitmap, 0))
    .addOnSuccessListener { visionText ->
        val extractedText = visionText.text
        viewModel.extractInvoiceData(extractedText) // calls AI Service
    }
```
Feature 4: Feedback Aggregation and Theme Detection
Identify recurring themes across large volumes of free-text feedback. Useful for end-of-term parent surveys, patient satisfaction, customer reviews.
Prompt Template
You are a qualitative research analyst.
Read the following responses and identify the top themes expressed.
Responses: [<array of text responses>]
Output — strict JSON:
{
"total_responses_analysed": <int>,
"themes": [
{
"theme": "<short label, max 5 words>",
"description": "<one sentence explaining the theme>",
"frequency": "<approximate number of responses mentioning this>",
"sentiment": "positive|negative|mixed",
"representative_quotes": ["<verbatim quote 1>", "<verbatim quote 2>"]
}
],
"overall_summary": "<2–3 sentence executive summary>",
"top_recommended_action": "<one sentence — most impactful thing to address>"
}
Identify 3–7 distinct themes. Do not overlap themes.
Batch size guidance: Process 30–50 responses per API call. For 500 responses, run 10–17 calls nightly.
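The call counts follow directly from the chunk size: 500 responses split 50 at a time is 10 calls, 30 at a time is 17. A batching sketch (Python for illustration):

```python
def batch_responses(responses: list[str], batch_size: int = 50) -> list[list[str]]:
    """Split free-text responses into fixed-size batches, one per API call."""
    return [responses[i:i + batch_size] for i in range(0, len(responses), batch_size)]

responses = [f"response {n}" for n in range(500)]
calls_at_50 = len(batch_responses(responses, 50))  # → 10
calls_at_30 = len(batch_responses(responses, 30))  # → 17
```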
Feature 5: Multi-Language Support
East African clients write in English, Luganda, Swahili, and mixed code-switching. LLMs handle this natively — no translation step needed.
In every NLP prompt, add:
Language handling:
- Accept input in any language including Luganda, Swahili, and East African English varieties.
- Output must always be in [target_language — default English].
- Do not transliterate names or places.
Detected language handling (PHP):
```php
$languageDetected = $nlpResult['language_detected']; // 'lg' = Luganda, 'sw' = Swahili

// Store for analytics — track which languages clients use
NLPResult::create([
    'language' => $languageDetected,
    // ...
]);

// Show language breakdown on admin dashboard
// "Feedback received: 62% English | 24% Luganda | 14% Swahili"
```
NLP Analytics Storage Schema
```sql
CREATE TABLE nlp_results (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    tenant_id BIGINT UNSIGNED NOT NULL,
    source_type VARCHAR(64) NOT NULL,  -- 'feedback', 'ticket', 'invoice', 'survey'
    source_id BIGINT UNSIGNED NOT NULL,
    nlp_task VARCHAR(32) NOT NULL,     -- 'sentiment', 'classification', 'extraction', 'themes'
    result_json JSON NOT NULL,
    sentiment ENUM('positive','neutral','negative') NULL,
    category VARCHAR(128) NULL,
    confidence ENUM('high','medium','low') NULL,
    language CHAR(5) NULL,
    period CHAR(7) NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_tenant_period (tenant_id, period),
    INDEX idx_source (source_type, source_id),
    INDEX idx_sentiment (tenant_id, sentiment, period)
);
```
Anti-Patterns
- Never run NLP on personal health data without DPPA-compliant scrubbing first.
- Never show verbatim quotes in theme reports without confirming the user has permission to see that feedback (RBAC check).
- Never classify into too many categories (> 10) — accuracy degrades.
- Never skip the validation step: parse the JSON output before storing it.
- Never run entity extraction on an image without OCR first — the model needs text input, not an image file, unless using a vision-capable model.
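The parse-before-store rule is easy to trip on because models sometimes wrap JSON in a markdown code fence. A defensive parse sketch (Python for illustration; fence-stripping handles a common failure mode, not something the prompts above guarantee):

```python
import json

FENCE = "`" * 3  # triple backtick, assembled so it is never confused with markdown

def parse_strict_json(raw: str):
    """Parse a model reply, tolerating an accidental code fence around the JSON."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1]    # drop the opening fence and language tag
        text = text.rsplit(FENCE, 1)[0]  # drop the closing fence
    return json.loads(text)  # raises on malformed output — store nothing in that case

parsed = parse_strict_json(FENCE + "json\n" + '{"sentiment": "negative"}' + "\n" + FENCE)
# parsed == {"sentiment": "negative"}
```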
See also:
- ai-feature-spec — Prompt design standards and output validation
- ai-security — PII scrubbing before NLP on personal data
- ai-predictive-analytics — Structured data prediction (classification, regression)
- ai-analytics-dashboards — Displaying sentiment and theme analytics
- ai-cost-modeling — Token cost for batch NLP processing