critique
STEPS
Step 1: Preparation
Invoke /impeccable, which contains design principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding. If no design context exists yet, you MUST run /impeccable teach first. Additionally gather: what the interface is trying to accomplish.
Step 2: Gather Assessments
Launch two independent assessments. Neither may see the other's output, to avoid bias.
You SHOULD delegate each assessment to a separate sub-agent for independence. Use your environment's agent spawning mechanism (e.g., Claude Code's Agent tool, or Codex's subagent spawning). Sub-agents should return their findings as structured text. Do NOT output findings to the user yet.
If sub-agents are not available in the current environment, complete each assessment sequentially, writing findings to internal notes before proceeding.
Tab isolation: When browser automation is available, each assessment MUST create its own new tab. Never reuse an existing tab, even if one is already open at the correct URL. This prevents the two assessments from interfering with each other's page state.
Assessment A: LLM Design Review
Read the relevant source files (HTML, CSS, JS/TS) and, if browser automation is available, visually inspect the live page. Create a new tab for this; do not reuse existing tabs. After navigation, label the tab by setting the document title:

```javascript
document.title = '[LLM] ' + document.title;
```

Think like a design director. Evaluate:
AI Slop Detection (CRITICAL): Does this look like every other AI-generated interface? Review against ALL DON'T guidelines in the impeccable skill. Check for AI color palette, gradient text, dark glows, glassmorphism, hero metric layouts, identical card grids, generic fonts, and all other tells. The test: If someone said "AI made this," would you believe them immediately?
Holistic Design Review: visual hierarchy (eye flow, primary action clarity), information architecture (structure, grouping, cognitive load), emotional resonance (does it match brand and audience?), discoverability (are interactive elements obvious?), composition (balance, whitespace, rhythm), typography (hierarchy, readability, font choices), color (purposeful use, cohesion, accessibility), states & edge cases (empty, loading, error, success), microcopy (clarity, tone, helpfulness).
Cognitive Load (consult cognitive-load):
- Run the 8-item cognitive load checklist. Report failure count: 0-1 = low (good), 2-3 = moderate, 4+ = critical.
- Count visible options at each decision point. If >4, flag it.
- Check for progressive disclosure: is complexity revealed only when needed?
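The banding above is mechanical, so it can be sketched directly (a minimal sketch; the 8 checklist items themselves live in the cognitive-load reference):

```javascript
// Map the 8-item cognitive-load checklist failure count to the bands
// defined above: 0-1 low (good), 2-3 moderate, 4+ critical.
function cognitiveLoadBand(failures) {
  if (failures <= 1) return 'low';
  if (failures <= 3) return 'moderate';
  return 'critical';
}

console.log(cognitiveLoadBand(1)); // "low"
console.log(cognitiveLoadBand(4)); // "critical"
```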
Emotional Journey:
- What emotion does this interface evoke? Is that intentional?
- Peak-end rule: Is the most intense moment positive? Does the experience end well?
- Emotional valleys: Check for anxiety spikes at high-stakes moments (payment, delete, commit). Are there design interventions (progress indicators, reassurance copy, undo options)?
Nielsen's Heuristics (consult heuristics-scoring):
Score each of the 10 heuristics 0-4. This scoring will be presented in the report.
Return structured findings covering: AI slop verdict, heuristic scores, cognitive load assessment, what's working (2-3 items), priority issues (3-5 with what/why/fix), minor observations, and provocative questions.
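As a sketch, the structured findings might be shaped like this (every field name and sample value is an illustrative assumption; the skill does not mandate an exact schema):

```javascript
// Illustrative shape for Assessment A's return value, covering the
// categories listed above. All values here are placeholders.
const findings = {
  aiSlopVerdict: 'borderline: generic card grid, but custom type scale',
  heuristicScores: [3, 2, 3, 4, 2, 3, 2, 3, 2, 1], // heuristics 1-10, 0-4 each
  cognitiveLoad: { checklistFailures: 2, band: 'moderate' },
  working: ['Clear primary CTA', 'Consistent spacing rhythm'],
  priorityIssues: [
    {
      what: 'No undo on destructive delete',
      why: 'anxiety spike at a high-stakes moment',
      fix: 'add an undo toast',
    },
  ],
  minorObservations: [],
  questions: ['Does this need to feel this complex?'],
};

console.log(findings.heuristicScores.length); // 10
```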
Assessment B: Automated Detection
Run the bundled deterministic detector, which flags 25 specific patterns (AI slop tells + general design quality).
CLI scan:

```bash
npx impeccable --json [--fast] [target]
```

- Pass HTML/JSX/TSX/Vue/Svelte files or directories as `[target]` (anything with markup). Do not pass CSS-only files.
- For URLs, skip the CLI scan (it requires Puppeteer). Use browser visualization instead.
- For large directories (200+ scannable files), use `--fast` (regex-only, skips jsdom).
- For 500+ files, narrow scope or ask the user.
- Exit code 0 = clean, 2 = findings.
Browser visualization (when browser automation tools are available AND the target is a viewable page):
The overlay is a visual aid for the user. It highlights issues directly in their browser. Do NOT scroll through the page to screenshot overlays. Instead, read the console output to get the results programmatically.
- Start the live detection server:

```bash
npx impeccable live &
```

Note the port printed to stdout (auto-assigned). Use `--port=PORT` to fix it.
- Create a new tab and navigate to the page (use dev server URL for local files, or direct URL). Do not reuse existing tabs.
- Label the tab via `javascript_tool` so the user can distinguish it:

```javascript
document.title = '[Human] ' + document.title;
```

- Scroll to top: ensure the page is scrolled to the very top before injection.
- Inject via `javascript_tool` (replace PORT with the port from step 1):

```javascript
const s = document.createElement('script');
s.src = 'http://localhost:PORT/detect.js';
document.head.appendChild(s);
```

- Wait 2-3 seconds for the detector to render overlays.
- Read results from console using `read_console_messages` with pattern `[impeccable]`. The detector logs all findings with the `[impeccable]` prefix. Do NOT scroll through the page to take screenshots of the overlays.
- Cleanup: stop the live server when done:

```bash
npx impeccable live stop
```
For multi-view targets, inject on 3-5 representative pages. If injection fails, continue with CLI results only.
Return: CLI findings (JSON), browser console findings (if applicable), and any false positives noted.
Step 3: Generate Combined Critique Report
Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.
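The weaving logic amounts to a three-way split, which can be sketched as follows (the issue keys are illustrative; real findings come from Step 2):

```javascript
// Partition findings into: both sources agree, detector-only
// (issues the LLM review missed), and LLM-only.
const llmFindings = ['gradient-text', 'weak-hierarchy'];
const detectorFindings = ['gradient-text', 'glassmorphism'];

const agree = detectorFindings.filter(f => llmFindings.includes(f));
const detectorOnly = detectorFindings.filter(f => !llmFindings.includes(f));
const llmOnly = llmFindings.filter(f => !detectorFindings.includes(f));

console.log({ agree, detectorOnly, llmOnly });
```

Detector-only items deserve a second look before reporting: some will be issues the review genuinely missed, others false positives to flag.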
Structure your feedback as a design director would:
Design Health Score
Consult heuristics-scoring
Present Nielsen's 10 heuristics scores as a table:
| # | Heuristic | Score | Key Issue |
|---|---|---|---|
| 1 | Visibility of System Status | ? | [specific finding or "n/a" if solid] |
| 2 | Match System / Real World | ? | |
| 3 | User Control and Freedom | ? | |
| 4 | Consistency and Standards | ? | |
| 5 | Error Prevention | ? | |
| 6 | Recognition Rather Than Recall | ? | |
| 7 | Flexibility and Efficiency | ? | |
| 8 | Aesthetic and Minimalist Design | ? | |
| 9 | Error Recovery | ? | |
| 10 | Help and Documentation | ? | |
| | Total | ??/40 | [Rating band] |
Be honest with scores. A 4 means genuinely excellent. Most real interfaces score 20-32.
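A minimal sketch of the total computation (the sample scores are placeholders, not a real assessment):

```javascript
// Ten per-heuristic scores, 0-4 each, summed into the ??/40 total.
const scores = [3, 2, 3, 4, 2, 3, 2, 3, 2, 1];
const total = scores.reduce((sum, s) => sum + s, 0);

console.log(`${total}/40`); // "25/40", inside the typical 20-32 range
```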
Anti-Patterns Verdict
Start here. Does this look AI-generated?
LLM assessment: Your own evaluation of AI slop tells. Cover overall aesthetic feel, layout sameness, generic composition, missed opportunities for personality.
Deterministic scan: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.
Visual overlays (if browser was used): Tell the user that overlays are now visible in the [Human] tab in their browser, highlighting the detected issues. Summarize what the console output reported.
Overall Impression
A brief gut reaction: what works, what doesn't, and the single biggest opportunity.
What's Working
Highlight 2-3 things done well. Be specific about why they work.
Priority Issues
The 3-5 most impactful design problems, ordered by importance.
For each issue, tag with P0-P3 severity (consult heuristics-scoring for severity definitions):
- [P?] What: Name the problem clearly
- Why it matters: How this hurts users or undermines goals
- Fix: What to do about it (be concrete)
- Suggested command: Which command could address this (from: /animate, /quieter, /shape, /optimize, /adapt, /clarify, /layout, /distill, /delight, /audit, /harden, /polish, /bolder, /typeset, /critique, /colorize, /overdrive)
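Ordering by severity can be sketched like this (the severity tags and issue text are illustrative):

```javascript
// Sort issues by P0-P3 tag, most severe (P0) first. Lexicographic
// comparison works because every tag has the "P<digit>" form.
const issues = [
  { severity: 'P2', what: 'Gradient text in the hero' },
  { severity: 'P0', what: 'Destructive delete has no undo' },
  { severity: 'P1', what: 'Primary action buried below the fold' },
];
issues.sort((a, b) => a.severity.localeCompare(b.severity));

console.log(issues.map(i => `[${i.severity}] ${i.what}`).join('\n'));
```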
Persona Red Flags
Consult personas
Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If `.github/copilot-instructions.md` contains a `## Design Context` section from `impeccable teach`, also generate 1-2 project-specific personas from the audience/brand info.
For each selected persona, walk through the primary user action and list specific red flags found:
Alex (Power User): No keyboard shortcuts detected. Form requires 8 clicks for primary action. Forced modal onboarding. High abandonment risk.
Jordan (First-Timer): Icon-only nav in sidebar. Technical jargon in error messages ("404 Not Found"). No visible help. Will abandon at step 2.
Be specific. Name the exact elements and interactions that fail each persona. Don't write generic persona descriptions; write what broke for them.
Minor Observations
Quick notes on smaller issues worth addressing.
Questions to Consider
Provocative questions that might unlock better solutions:
- "What if the primary action were more prominent?"
- "Does this need to feel this complex?"
- "What would a confident version of this look like?"
Remember:
- Be direct. Vague feedback wastes everyone's time.
- Be specific. "The submit button," not "some elements."
- Say what's wrong AND why it matters to users.
- Give concrete suggestions, not just "consider exploring..."
- Prioritize ruthlessly. If everything is important, nothing is.
- Don't soften criticism. Developers need honest feedback to ship great design.
Step 4: Ask the User
After presenting findings, ask targeted questions based on what was actually found. Ask the user directly to clarify what you cannot infer. These answers will shape the action plan.
Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):
- Priority direction: Based on the issues found, ask which category matters most to the user right now. For example: "I found problems with visual hierarchy, color usage, and information overload. Which area should we tackle first?" Offer the top 2-3 issue categories as options.
- Design intent: If the critique found a tonal mismatch, ask whether it was intentional. For example: "The interface feels clinical and corporate. Is that the intended tone, or should it feel warmer/bolder/more playful?" Offer 2-3 tonal directions as options based on what would fix the issues found.
- Scope: Ask how much the user wants to take on. For example: "I found N issues. Want to address everything, or focus on the top 3?" Offer scope options like "Top 3 only", "All issues", "Critical issues only".
- Constraints (optional; only ask if relevant): If the findings touch many areas, ask if anything is off-limits. For example: "Should any sections stay as-is?" This prevents the plan from touching things the user considers done.
Rules for questions:
- Every question must reference specific findings from the report. Never ask generic "who is your audience?" questions.
- Keep it to 2-4 questions maximum. Respect the user's time.
- Offer concrete options, not open-ended prompts.
- If findings are straightforward (e.g., only 1-2 clear issues), skip questions and go directly to Step 5.
Step 5: Recommended Actions
After receiving the user's answers, present a prioritized action summary reflecting the user's priorities and scope from Step 4.
Action Summary
List recommended commands in priority order, based on the user's answers:
- `/command-name`: Brief description of what to fix (specific context from critique findings)
- `/command-name`: Brief description (specific context)
- ...
Rules for recommendations:
- Only recommend commands from: /animate, /quieter, /shape, /optimize, /adapt, /clarify, /layout, /distill, /delight, /audit, /harden, /polish, /bolder, /typeset, /critique, /colorize, /overdrive
- Order by the user's stated priorities first, then by impact
- Each item's description should carry enough context that the command knows what to focus on
- Map each Priority Issue to the appropriate command
- Skip commands that would address zero issues
- If the user chose a limited scope, only include items within that scope
- If the user marked areas as off-limits, exclude commands that would touch those areas
- End with `/polish` as the final step if any fixes were recommended
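The rules above can be sketched as a small mapping pass (the issue-to-command mapping shown is an illustrative assumption; choose commands per the actual findings):

```javascript
// Map priority issues to commands, dedupe, drop issues with no matching
// command, and append /polish when any fix was recommended.
const issueToCommand = {
  'weak-hierarchy': '/layout',
  'noisy-color': '/quieter',
  'generic-type': '/typeset',
};
const priorityIssues = ['weak-hierarchy', 'generic-type'];

const plan = [...new Set(priorityIssues.map(i => issueToCommand[i]).filter(Boolean))];
if (plan.length > 0) plan.push('/polish');

console.log(plan.join(' -> ')); // "/layout -> /typeset -> /polish"
```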
After presenting the summary, tell the user:
You can ask me to run these one at a time, all at once, or in any order you prefer. Re-run `/critique` after fixes to see your score improve.