i-critique


STEPS


Step 1: Preparation


Invoke /impeccable, which contains design principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding. If no design context exists yet, you MUST run /impeccable teach first. Additionally gather: what the interface is trying to accomplish.

Step 2: Gather Assessments


Launch two independent assessments. Neither may see the other's output, to avoid bias.
You SHOULD delegate each assessment to a separate sub-agent for independence. Use your environment's agent spawning mechanism (e.g., Claude Code's `Agent` tool, or Codex's subagent spawning). Sub-agents should return their findings as structured text. Do NOT output findings to the user yet.
If sub-agents are not available in the current environment, complete each assessment sequentially, writing findings to internal notes before proceeding.
Tab isolation: When browser automation is available, each assessment MUST create its own new tab. Never reuse an existing tab, even if one is already open at the correct URL. This prevents the two assessments from interfering with each other's page state.

Assessment A: LLM Design Review


Read the relevant source files (HTML, CSS, JS/TS) and, if browser automation is available, visually inspect the live page. Create a new tab for this; do not reuse existing tabs. After navigation, label the tab by setting the document title:
```javascript
document.title = '[LLM] ' + document.title;
```
Think like a design director. Evaluate:
AI Slop Detection (CRITICAL): Does this look like every other AI-generated interface? Review against ALL DON'T guidelines in the impeccable skill. Check for AI color palette, gradient text, dark glows, glassmorphism, hero metric layouts, identical card grids, generic fonts, and all other tells. The test: If someone said "AI made this," would you believe them immediately?
Holistic Design Review:
  • Visual hierarchy (eye flow, primary action clarity)
  • Information architecture (structure, grouping, cognitive load)
  • Emotional resonance (does it match brand and audience?)
  • Discoverability (are interactive elements obvious?)
  • Composition (balance, whitespace, rhythm)
  • Typography (hierarchy, readability, font choices)
  • Color (purposeful use, cohesion, accessibility)
  • States & edge cases (empty, loading, error, success)
  • Microcopy (clarity, tone, helpfulness)
Cognitive Load (consult cognitive-load):
  • Run the 8-item cognitive load checklist. Report failure count: 0-1 = low (good), 2-3 = moderate, 4+ = critical.
  • Count visible options at each decision point. If >4, flag it.
  • Check for progressive disclosure: is complexity revealed only when needed?
Emotional Journey:
  • What emotion does this interface evoke? Is that intentional?
  • Peak-end rule: Is the most intense moment positive? Does the experience end well?
  • Emotional valleys: Check for anxiety spikes at high-stakes moments (payment, delete, commit). Are there design interventions (progress indicators, reassurance copy, undo options)?
Nielsen's Heuristics (consult heuristics-scoring): Score each of the 10 heuristics 0-4. This scoring will be presented in the report.
Return structured findings covering: AI slop verdict, heuristic scores, cognitive load assessment, what's working (2-3 items), priority issues (3-5 with what/why/fix), minor observations, and provocative questions.
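The cognitive load banding above can be sketched as a small helper (illustrative only; the 8-item checklist itself lives in the cognitive-load reference):

```javascript
// Map the cognitive load checklist failure count (out of 8) to the bands
// defined above: 0-1 = low, 2-3 = moderate, 4+ = critical.
function cognitiveLoadLevel(failures) {
  if (failures <= 1) return 'low';
  if (failures <= 3) return 'moderate';
  return 'critical';
}
```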

Assessment B: Automated Detection


Run the bundled deterministic detector, which flags 25 specific patterns (AI slop tells + general design quality).
CLI scan:
```bash
npx impeccable --json [--fast] [target]
```
  • Pass HTML/JSX/TSX/Vue/Svelte files or directories as `[target]` (anything with markup). Do not pass CSS-only files.
  • For URLs, skip the CLI scan (it requires Puppeteer). Use browser visualization instead.
  • For large directories (200+ scannable files), use `--fast` (regex-only, skips jsdom).
  • For 500+ files, narrow the scope or ask the user.
  • Exit code 0 = clean, 2 = findings.
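When the scan exits with code 2, the `--json` output can be grouped for the report. A minimal sketch, assuming each finding is an object with `pattern` and `file` fields (the actual schema may differ):

```javascript
// Group detector findings by pattern id so the report can show counts and
// file locations. The { pattern, file } shape is an assumption about the
// --json output, not the documented schema.
function groupFindings(findings) {
  const byPattern = {};
  for (const f of findings) {
    (byPattern[f.pattern] ||= []).push(f.file);
  }
  return byPattern;
}
```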
Browser visualization (when browser automation tools are available AND the target is a viewable page):
The overlay is a visual aid for the user. It highlights issues directly in their browser. Do NOT scroll through the page to screenshot overlays. Instead, read the console output to get the results programmatically.
  1. Start the live detection server:
     ```bash
     npx impeccable live &
     ```
     Note the port printed to stdout (auto-assigned). Use `--port=PORT` to pin it.
  2. Create a new tab and navigate to the page (use a dev server URL for local files, or the direct URL). Do not reuse existing tabs.
  3. Label the tab via `javascript_tool` so the user can distinguish it:
     ```javascript
     document.title = '[Human] ' + document.title;
     ```
  4. Scroll to the very top of the page before injection.
  5. Inject via `javascript_tool` (replace PORT with the port from step 1):
     ```javascript
     const s = document.createElement('script'); s.src = 'http://localhost:PORT/detect.js'; document.head.appendChild(s);
     ```
  6. Wait 2-3 seconds for the detector to render overlays.
  7. Read results from the console using `read_console_messages` with the pattern `impeccable`. The detector logs all findings with the `[impeccable]` prefix. Do NOT scroll through the page to take screenshots of the overlays.
  8. Cleanup: stop the live server when done:
     ```bash
     npx impeccable live stop
     ```
For multi-view targets, inject on 3-5 representative pages. If injection fails, continue with CLI results only.
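Step 7's console read can be post-processed with a small filter; this is a sketch, assuming `read_console_messages` yields plain text lines:

```javascript
// Keep only detector output: every finding is logged with the [impeccable]
// prefix, so filter on it and strip everything up to and including the prefix.
function impeccableFindings(consoleLines) {
  const PREFIX = '[impeccable]';
  return consoleLines
    .filter((line) => line.includes(PREFIX))
    .map((line) => line.slice(line.indexOf(PREFIX) + PREFIX.length).trim());
}
```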
Return: CLI findings (JSON), browser console findings (if applicable), and any false positives noted.

Step 3: Generate Combined Critique Report


Synthesize both assessments into a single report. Do NOT simply concatenate. Weave the findings together, noting where the LLM review and detector agree, where the detector caught issues the LLM missed, and where detector findings are false positives.
Structure your feedback as a design director would:

Design Health Score


Consult heuristics-scoring.
Present Nielsen's 10 heuristics scores as a table:
| # | Heuristic | Score | Key Issue |
|---|-----------|-------|-----------|
| 1 | Visibility of System Status | ? | [specific finding, or "n/a" if solid] |
| 2 | Match System / Real World | ? | |
| 3 | User Control and Freedom | ? | |
| 4 | Consistency and Standards | ? | |
| 5 | Error Prevention | ? | |
| 6 | Recognition Rather Than Recall | ? | |
| 7 | Flexibility and Efficiency | ? | |
| 8 | Aesthetic and Minimalist Design | ? | |
| 9 | Error Recovery | ? | |
| 10 | Help and Documentation | ? | |
|  | Total | ??/40 | [Rating band] |
Be honest with scores. A 4 means genuinely excellent. Most real interfaces score 20-32.
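The total row is mechanical; a sketch (the rating band labels come from heuristics-scoring and are deliberately not hard-coded here):

```javascript
// Sum the ten 0-4 heuristic scores into the ??/40 total for the table.
function heuristicsTotal(scores) {
  if (scores.length !== 10) throw new Error('expected exactly 10 heuristic scores');
  if (scores.some((s) => s < 0 || s > 4)) throw new Error('each score must be 0-4');
  return scores.reduce((sum, s) => sum + s, 0);
}
```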

Anti-Patterns Verdict


Start here. Does this look AI-generated?
LLM assessment: Your own evaluation of AI slop tells. Cover overall aesthetic feel, layout sameness, generic composition, missed opportunities for personality.
Deterministic scan: Summarize what the automated detector found, with counts and file locations. Note any additional issues the detector caught that you missed, and flag any false positives.
Visual overlays (if browser was used): Tell the user that overlays are now visible in the [Human] tab in their browser, highlighting the detected issues. Summarize what the console output reported.

Overall Impression


A brief gut reaction: what works, what doesn't, and the single biggest opportunity.

What's Working


Highlight 2-3 things done well. Be specific about why they work.

Priority Issues


The 3-5 most impactful design problems, ordered by importance.
For each issue, tag with P0-P3 severity (consult heuristics-scoring for severity definitions):
  • [P?] What: Name the problem clearly
  • Why it matters: How this hurts users or undermines goals
  • Fix: What to do about it (be concrete)
  • Suggested command: Which command could address this (from: /animate, /quieter, /shape, /optimize, /adapt, /clarify, /distill, /delight, /onboard, /normalize, /audit, /harden, /polish, /extract, /bolder, /arrange, /typeset, /critique, /colorize, /overdrive)

Persona Red Flags


Consult personas.
Auto-select 2-3 personas most relevant to this interface type (use the selection table in the reference). If `.github/copilot-instructions.md` contains a `## Design Context` section from `impeccable teach`, also generate 1-2 project-specific personas from the audience/brand info.
For each selected persona, walk through the primary user action and list specific red flags found:
Alex (Power User): No keyboard shortcuts detected. Form requires 8 clicks for primary action. Forced modal onboarding. High abandonment risk.
Jordan (First-Timer): Icon-only nav in sidebar. Technical jargon in error messages ("404 Not Found"). No visible help. Will abandon at step 2.
Be specific. Name the exact elements and interactions that fail each persona. Don't write generic persona descriptions; write what broke for them.

Minor Observations


Quick notes on smaller issues worth addressing.

Questions to Consider


Provocative questions that might unlock better solutions:
  • "What if the primary action were more prominent?"
  • "Does this need to feel this complex?"
  • "What would a confident version of this look like?"
Remember:
  • Be direct. Vague feedback wastes everyone's time.
  • Be specific. "The submit button," not "some elements."
  • Say what's wrong AND why it matters to users.
  • Give concrete suggestions, not just "consider exploring..."
  • Prioritize ruthlessly. If everything is important, nothing is.
  • Don't soften criticism. Developers need honest feedback to ship great design.

Step 4: Ask the User


After presenting the findings, ask targeted questions based on what was actually found; ask the user directly to clarify anything you cannot infer. These answers will shape the action plan.
Ask questions along these lines (adapt to the specific findings; do NOT ask generic questions):
  1. Priority direction: Based on the issues found, ask which category matters most to the user right now. For example: "I found problems with visual hierarchy, color usage, and information overload. Which area should we tackle first?" Offer the top 2-3 issue categories as options.
  2. Design intent: If the critique found a tonal mismatch, ask whether it was intentional. For example: "The interface feels clinical and corporate. Is that the intended tone, or should it feel warmer/bolder/more playful?" Offer 2-3 tonal directions as options based on what would fix the issues found.
  3. Scope: Ask how much the user wants to take on. For example: "I found N issues. Want to address everything, or focus on the top 3?" Offer scope options like "Top 3 only", "All issues", "Critical issues only".
  4. Constraints (optional; only ask if relevant): If the findings touch many areas, ask if anything is off-limits. For example: "Should any sections stay as-is?" This prevents the plan from touching things the user considers done.
Rules for questions:
  • Every question must reference specific findings from the report. Never ask generic "who is your audience?" questions.
  • Keep it to 2-4 questions maximum. Respect the user's time.
  • Offer concrete options, not open-ended prompts.
  • If findings are straightforward (e.g., only 1-2 clear issues), skip questions and go directly to Step 5.

Step 5: Recommended Actions


After receiving the user's answers, present a prioritized action summary reflecting the user's priorities and scope from Step 4.

Action Summary


List recommended commands in priority order, based on the user's answers:
  1. `/command-name`: Brief description of what to fix (specific context from critique findings)
  2. `/command-name`: Brief description (specific context)
  ...
Rules for recommendations:
  • Only recommend commands from: /animate, /quieter, /shape, /optimize, /adapt, /clarify, /distill, /delight, /onboard, /normalize, /audit, /harden, /polish, /extract, /bolder, /arrange, /typeset, /critique, /colorize, /overdrive
  • Order by the user's stated priorities first, then by impact
  • Each item's description should carry enough context that the command knows what to focus on
  • Map each Priority Issue to the appropriate command
  • Skip commands that would address zero issues
  • If the user chose a limited scope, only include items within that scope
  • If the user marked areas as off-limits, exclude commands that would touch those areas
  • End with `/polish` as the final step if any fixes were recommended
After presenting the summary, tell the user:
You can ask me to run these one at a time, all at once, or in any order you prefer. Re-run `/critique` after fixes to see your score improve.