ai-nlp-analytics

AI NLP Analytics

<!-- dual-compat-start -->
<!-- dual-compat-start -->

Use When

  • Text analytics using LLM APIs — sentiment analysis, customer feedback classification, document entity extraction, multi-language support (English/Luganda/Swahili), feedback aggregation, and NLP feature implementation for PHP/Android/iOS. Sources...
  • The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.

Do Not Use When

  • The task is unrelated to ai-nlp-analytics or would be better handled by a more specific companion skill.
  • The request only needs a trivial answer and none of this skill's constraints or references materially help.

Required Inputs

  • Gather relevant project context, constraints, and the concrete problem to solve.
  • Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.

Workflow

  • Read this SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.
  • Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
  • Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.

Quality Standards

  • Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
  • Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
  • Prefer deterministic, reviewable steps over vague advice or tool-specific magic.

Anti-Patterns

  • Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
  • Loading every reference file by default instead of using progressive disclosure.

Outputs

  • A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
  • Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
  • References used, companion skills, or follow-up actions when they materially improve execution.

Evidence Produced

| Category | Artifact | Format | Example |
| --- | --- | --- | --- |
| Correctness | NLP analytics evaluation | Markdown doc covering sentiment, classification, and entity-extraction accuracy on a fixed eval set | docs/ai/nlp-eval-2026-04-16.md |

References

  • Use the links and companion skills already referenced in this file when deeper context is needed.
<!-- dual-compat-end -->
<!-- dual-compat-end -->

What NLP Analytics Does

Natural Language Processing (NLP) analytics transforms unstructured text — feedback, comments, messages, documents, forms — into structured, actionable insights. Using LLM APIs, you can perform sophisticated NLP without training custom models.
Use cases for SaaS products:
  • Analyse parent/patient/customer feedback automatically.
  • Classify support tickets or complaints by type and urgency.
  • Extract key entities from uploaded documents (invoices, receipts, forms).
  • Summarise free-text notes into structured records.
  • Detect sentiment in survey responses across thousands of users.

Feature 1: Sentiment Analysis

Classify the emotional tone of text as Positive, Neutral, or Negative. Apply to: feedback forms, app reviews, survey responses, support messages.

Prompt Template

You are a sentiment analysis engine for a business management system.
Classify the sentiment of each piece of text.

Input: array of { id, text, source, language }
Output — strict JSON array:
[
  {
    "id": <string>,
    "sentiment": "positive|neutral|negative",
    "intensity": "strong|moderate|mild",
    "key_phrase": "<the phrase that most drives the sentiment, max 8 words>",
    "language_detected": "<ISO 639-1 code>"
  }
]

Rules:
- Detect language automatically; do not require English input.
- Do not infer sentiment from punctuation alone — read meaning.
- If text is too short to judge (< 3 words), return sentiment: "neutral", intensity: "mild".
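
Before anything is stored, the model's reply to the prompt above should be parsed and checked against the declared schema. The following is a minimal Python sketch (3.9+), assuming the reply arrives as a raw string. The function name `parse_sentiment_reply` and the choice to collect errors rather than raise are illustrative assumptions, not part of this skill.

```python
import json

# Allowed values, copied from the output schema in the prompt above.
SENTIMENTS = ("positive", "neutral", "negative")
INTENSITIES = ("strong", "moderate", "mild")
REQUIRED_KEYS = {"id", "sentiment", "intensity", "key_phrase", "language_detected"}


def parse_sentiment_reply(raw: str) -> tuple[list[dict], list[str]]:
    """Return (valid_items, errors) for a raw model reply (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"reply is not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return [], ["reply must be a JSON array"]

    valid, errors = [], []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            errors.append(f"item {index}: not an object")
            continue
        missing = REQUIRED_KEYS - set(item)
        if missing:
            errors.append(f"item {index}: missing {sorted(missing)}")
        elif item["sentiment"] not in SENTIMENTS:
            errors.append(f"item {index}: bad sentiment {item['sentiment']!r}")
        elif item["intensity"] not in INTENSITIES:
            errors.append(f"item {index}: bad intensity {item['intensity']!r}")
        else:
            valid.append(item)
    return valid, errors
```

Items that fail validation can be retried or logged, and only the `valid` list is passed on to storage.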

Aggregation Query (PHP/Laravel)

php
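use Illuminate\Support\Facades\DB;

// Aggregate sentiment results by tenant for the dashboard.
// SUM(COUNT(*)) OVER () is a window function and needs MySQL 8.0+ or MariaDB 10.2+.
$summary = DB::table('nlp_results')
    ->where('tenant_id', $tenantId)
    ->where('nlp_task', 'sentiment')  // nlp_results also holds classification, extraction, and theme rows
    ->where('period', $period)
    ->selectRaw('
        sentiment,
        COUNT(*) as count,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as pct
    ')
    ->groupBy('sentiment')
    ->get();

// Store individual results. The nlp_results schema has no intensity or key_phrase
// columns, so those fields are kept inside result_json; nlp_task and result_json
// are NOT NULL and must be set on every row.
NLPResult::create([
    'tenant_id'   => $tenantId,
    'source_type' => 'parent_feedback',
    'source_id'   => $feedbackId,
    'nlp_task'    => 'sentiment',
    'result_json' => json_encode($result),  // drop json_encode() if the model casts result_json to array
    'sentiment'   => $result['sentiment'],
    'period'      => now()->format('Y-m'),
]);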

Dashboard Display

Feedback Sentiment — This Term
Positive  ████████████░░░░  74%  (148 responses)
Neutral   ███░░░░░░░░░░░░░  18%  (36 responses)
Negative  ██░░░░░░░░░░░░░░   8%  (16 responses)

Top Negative Themes:
- "Fees too high" (6 mentions)
- "Poor communication from teachers" (4 mentions)
- "Long waiting times at the clinic" (3 mentions)
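
The bar rows in the mock-up above can be derived from raw counts. This is a small Python sketch, assuming a 16-cell bar and rounding filled cells up so that small non-zero shares stay visible (which is how the 8% row ends up with two cells). The function name `render_sentiment_rows` and the exact spacing are illustrative assumptions.

```python
import math


def render_sentiment_rows(counts: dict[str, int], width: int = 16) -> list[str]:
    """Format one text row per sentiment label (hypothetical helper)."""
    total = sum(counts.values())
    rows = []
    for label, count in counts.items():
        share = count / total if total else 0.0
        # Round up so a small but non-zero share still shows at least one cell.
        filled = min(width, math.ceil(share * width))
        bar = "█" * filled + "░" * (width - filled)
        pct = round(share * 100)
        rows.append(f"{label:<9} {bar}  {pct:>2}%  ({count} responses)")
    return rows


for row in render_sentiment_rows({"Positive": 148, "Neutral": 36, "Negative": 16}):
    print(row)
```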

Feature 2: Text Classification

Categorise incoming text into predefined business categories. Apply to: support tickets, expense descriptions, complaint types, document types.

Prompt Template

You are a text classification engine.
Classify each item into exactly one category from the provided list.

Categories: [<list from caller>]

Input: array of { id, text }
Output — strict JSON array:
[
  {
    "id": <string>,
    "category": "<one of the provided categories>",
    "confidence": "high|medium|low",
    "secondary_category": "<second best category or null>"
  }
]

If the text does not fit any category, use the category: "uncategorised".
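
The caller supplies the category list and should also check that the model stayed inside it. Below is a minimal Python sketch of both steps, assuming the ten-category ceiling from the anti-patterns in this skill. The helper names `build_category_line` and `normalise_category`, and the decision to downgrade confidence to "low" on a fallback, are illustrative assumptions.

```python
MAX_CATEGORIES = 10          # more than 10 categories degrades accuracy
FALLBACK = "uncategorised"   # fallback label named in the prompt above


def build_category_line(categories: list[str]) -> str:
    """Render the 'Categories: [...]' line of the prompt (hypothetical helper)."""
    if not categories:
        raise ValueError("at least one category is required")
    if len(categories) > MAX_CATEGORIES:
        raise ValueError(f"{len(categories)} categories given, limit is {MAX_CATEGORIES}")
    quoted = ", ".join(f'"{name}"' for name in categories)
    return f"Categories: [{quoted}]"


def normalise_category(item: dict, categories: list[str]) -> dict:
    """Coerce labels outside the provided list to the fallback (hypothetical helper)."""
    allowed = [*categories, FALLBACK]
    cleaned = dict(item)
    if cleaned.get("category") not in allowed:
        cleaned["category"] = FALLBACK
        cleaned["confidence"] = "low"
    if cleaned.get("secondary_category") not in allowed:
        cleaned["secondary_category"] = None
    return cleaned
```

Normalising on the application side means a hallucinated label never reaches the `category` column, whatever the model returns.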

Domain Category Examples

Support tickets (school):
["Fee query", "Grade query", "Attendance query", "Technical issue",
 "Complaint — teacher", "Complaint — facilities", "Admission enquiry", "Other"]
Expense classification (ERP):
["Travel", "Accommodation", "Meals", "Office supplies", "IT equipment",
 "Professional services", "Utilities", "Marketing", "Miscellaneous"]
Healthcare complaints:
["Wait time", "Staff conduct", "Treatment quality", "Billing",
 "Facility cleanliness", "Medication", "Communication", "Other"]

Bulk Classification Cost

Processing 500 support tickets per month:
  • Input: ~200 tokens per ticket × 500 = 100,000 tokens
  • Output: ~30 tokens per ticket × 500 = 15,000 tokens
  • With Haiku: (100K × $0.80 + 15K × $4.00) / 1M = $0.08 + $0.06 = $0.14/month
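
The arithmetic above generalises to other volumes and price points. Here is a short Python sketch, assuming prices are quoted in USD per million tokens as in the figures above; the function name `monthly_cost_usd` is hypothetical, and current prices should be checked before relying on the result.

```python
def monthly_cost_usd(
    items: int,
    input_tokens_per_item: int,
    output_tokens_per_item: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate monthly spend from per-item token counts and per-million-token prices."""
    input_tokens = items * input_tokens_per_item
    output_tokens = items * output_tokens_per_item
    return (
        input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    ) / 1_000_000


# 500 tickets, ~200 input and ~30 output tokens each, at $0.80 / $4.00 per million tokens
cost = monthly_cost_usd(500, 200, 30, 0.80, 4.00)
print(f"${cost:.2f}/month")  # prints "$0.14/month"
```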

Feature 3: Named Entity Extraction

Pull structured data from free-form documents. Apply to: uploaded invoices, receipts, ID documents, lab reports, application forms.

Prompt Template — Invoice Extraction

You are a document intelligence engine.
Extract structured data from the provided invoice or receipt text.

Output — strict JSON:
{
  "vendor_name": "<string or null>",
  "vendor_tin": "<string or null>",
  "invoice_number": "<string or null>",
  "invoice_date": "<YYYY-MM-DD or null>",
  "due_date": "<YYYY-MM-DD or null>",
  "currency": "<ISO 4217 code>",
  "subtotal": <float or null>,
  "tax_amount": <float or null>,
  "total_amount": <float or null>,
  "line_items": [
    { "description": "<string>", "quantity": <float>, "unit_price": <float>, "amount": <float> }
  ],
  "extraction_confidence": "high|medium|low",
  "flags": ["<any field that could not be reliably extracted>"]
}

If a field is not present in the document, return null.
Do not invent or guess values — only extract what is explicitly stated.
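
Extracted amounts can be cross-checked arithmetically before they are trusted. The Python sketch below assumes that each line amount equals quantity × unit price, that line amounts sum to the subtotal, and that subtotal plus tax equals the total. Those rules will not hold for every invoice (discounts, tax-inclusive pricing), and the function name `check_invoice_totals` and the 0.01 tolerance are illustrative assumptions.

```python
def check_invoice_totals(invoice: dict, tolerance: float = 0.01) -> list[str]:
    """Return the field names whose values do not reconcile (hypothetical helper)."""
    flags = []
    line_items = invoice.get("line_items") or []

    # Each line: quantity x unit_price should match the stated amount.
    for position, line in enumerate(line_items):
        expected = line["quantity"] * line["unit_price"]
        if abs(expected - line["amount"]) > tolerance:
            flags.append(f"line_items[{position}].amount")

    subtotal = invoice.get("subtotal")
    tax = invoice.get("tax_amount")
    total = invoice.get("total_amount")

    # Line amounts should add up to the subtotal when both are present.
    if subtotal is not None and line_items:
        if abs(sum(line["amount"] for line in line_items) - subtotal) > tolerance:
            flags.append("subtotal")

    # Subtotal plus tax should equal the total when all three are present.
    if subtotal is not None and tax is not None and total is not None:
        if abs(subtotal + tax - total) > tolerance:
            flags.append("total_amount")
    return flags
```

Any returned names can be merged into the `flags` array from the model so that a reviewer sees both kinds of doubt in one place.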

Photo-to-Text Pipeline (Android/iOS)

kotlin
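// Android: OCR via ML Kit, then send the recognised text to the AI Service
import android.util.Log
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
recognizer.process(InputImage.fromBitmap(bitmap, 0))  // 0 = image rotation in degrees
    .addOnSuccessListener { visionText ->
        val extractedText = visionText.text
        viewModel.extractInvoiceData(extractedText)  // calls AI Service
    }
    .addOnFailureListener { e ->
        Log.e("InvoiceOcr", "Text recognition failed", e)  // do not fail silently
    }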

Feature 4: Feedback Aggregation and Theme Detection

Identify recurring themes across large volumes of free-text feedback. Useful for end-of-term parent surveys, patient satisfaction, customer reviews.

Prompt Template

You are a qualitative research analyst.
Read the following responses and identify the top themes expressed.

Responses: [<array of text responses>]

Output — strict JSON:
{
  "total_responses_analysed": <int>,
  "themes": [
    {
      "theme": "<short label, max 5 words>",
      "description": "<one sentence explaining the theme>",
      "frequency": "<approximate number of responses mentioning this>",
      "sentiment": "positive|negative|mixed",
      "representative_quotes": ["<verbatim quote 1>", "<verbatim quote 2>"]
    }
  ],
  "overall_summary": "<2–3 sentence executive summary>",
  "top_recommended_action": "<one sentence — most impactful thing to address>"
}

Identify 3–7 distinct themes. Do not overlap themes.

Batch size guidance: Process 30–50 responses per API call. For 500 responses, that means 10–17 calls, which can run as a nightly job.
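
The batch-size guidance can be sketched as a simple chunking step in Python. The function name `batch_responses` is hypothetical, and the API call itself is left out because it depends on the provider.

```python
def batch_responses(responses: list[str], batch_size: int = 50) -> list[list[str]]:
    """Split responses into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        responses[start:start + batch_size]
        for start in range(0, len(responses), batch_size)
    ]


responses = [f"response {n}" for n in range(500)]
print(len(batch_responses(responses, 50)))  # 10 calls at 50 per batch
print(len(batch_responses(responses, 30)))  # 17 calls at 30 per batch
```

Each batch yields its own theme list, so the per-batch results still need to be merged before they are reported.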

Feature 5: Multi-Language Support

East African clients write in English, Luganda, Swahili, and code-switched mixtures of these. LLMs can process such text directly, so no separate translation step is needed.

In every NLP prompt, add:

Language handling:
- Accept input in any language including Luganda, Swahili, and East African English varieties.
- Output must always be in [target_language, default English].
- Do not transliterate names or places.

Detected language handling (PHP):
php
$languageDetected = $nlpResult['language_detected']; // 'lg' = Luganda, 'sw' = Swahili

// Store for analytics — track which languages clients use
NLPResult::create([
    'language' => $languageDetected,
    // ...
]);

// Show language breakdown on admin dashboard
// "Feedback received: 62% English | 24% Luganda | 14% Swahili"
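
The dashboard line in the last comment can be computed from the stored language codes. This Python sketch assumes the codes are ISO 639-1 values as returned in `language_detected`. The `LANGUAGE_NAMES` mapping and the function name `language_breakdown` are illustrative, and rounded percentages will not always sum to exactly 100.

```python
from collections import Counter

# Display names for the codes this skill expects; unknown codes are shown as-is.
LANGUAGE_NAMES = {"en": "English", "lg": "Luganda", "sw": "Swahili"}


def language_breakdown(codes: list[str]) -> str:
    """Build the 'Feedback received: ...' summary line (hypothetical helper)."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return "Feedback received: none"
    parts = [
        f"{round(count * 100 / total)}% {LANGUAGE_NAMES.get(code, code)}"
        for code, count in counts.most_common()  # largest share first
    ]
    return "Feedback received: " + " | ".join(parts)


print(language_breakdown(["en"] * 62 + ["lg"] * 24 + ["sw"] * 14))
```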

NLP Analytics Storage Schema

sql
CREATE TABLE nlp_results (
    id              BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    tenant_id       BIGINT UNSIGNED NOT NULL,
    source_type     VARCHAR(64) NOT NULL,  -- 'feedback', 'ticket', 'invoice', 'survey'
    source_id       BIGINT UNSIGNED NOT NULL,
    nlp_task        VARCHAR(32) NOT NULL,  -- 'sentiment', 'classification', 'extraction', 'themes'
    result_json     JSON NOT NULL,
    sentiment       ENUM('positive','neutral','negative') NULL,
    category        VARCHAR(128) NULL,
    confidence      ENUM('high','medium','low') NULL,
    language        CHAR(5) NULL,
    period          CHAR(7) NOT NULL,
    created_at      DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_tenant_period  (tenant_id, period),
    INDEX idx_source         (source_type, source_id),
    INDEX idx_sentiment      (tenant_id, sentiment, period)
);

Anti-Patterns

  • Never run NLP on personal health data without DPPA-compliant scrubbing first.
  • Never show verbatim quotes in theme reports without confirming the user has permission to see that feedback (RBAC check).
  • Never classify into too many categories (> 10) — accuracy degrades.
  • Never skip the validation step: parse the JSON output before storing it.
  • Never run entity extraction on an image without OCR first — the model needs text input, not an image file, unless using a vision-capable model.

See also:
  • ai-feature-spec: Prompt design standards and output validation
  • ai-security: PII scrubbing before NLP on personal data
  • ai-predictive-analytics: Structured data prediction (classification, regression)
  • ai-analytics-dashboards: Displaying sentiment and theme analytics
  • ai-cost-modeling: Token cost for batch NLP processing