Step 1: Pre-Playtest Preparation
Define the test objectives. Write 3-5 specific questions this playtest will answer. Each question should be:
- Observable (you can determine the answer by watching, not just asking)
- Actionable (the answer directly informs a design decision)
- Scoped (answerable within a single play session)
Good test questions:
- "Do players discover the dodge-roll mechanic organically within the first two encounters?"
- "At what point in the progression curve do players stop voluntarily exploring and start rushing to objectives?"
- "Does the resource scarcity in Act 2 create tension or frustration?"
Prepare the observation sheet. For each test question, define:
- What specific player behaviors indicate success (positive signals)
- What specific player behaviors indicate failure (negative signals)
- Where in the game session to watch most closely (critical observation windows)
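One way to keep the observation sheet consistent across facilitators is to store each test question alongside its signals as a small record. The sketch below is illustrative only: the class name `SheetQuestion`, its fields, and the sample signals are assumptions made for this example, not part of any standard tool.

```python
from dataclasses import dataclass, field


@dataclass
class SheetQuestion:
    """One entry on the observation sheet (hypothetical structure)."""
    question: str
    positive_signals: list[str] = field(default_factory=list)
    negative_signals: list[str] = field(default_factory=list)
    observation_windows: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Ready for the playtest only when all three parts are filled in.
        return bool(
            self.positive_signals
            and self.negative_signals
            and self.observation_windows
        )


# Sample entry; the signals are invented for illustration.
dodge_roll = SheetQuestion(
    question="Do players discover the dodge-roll mechanic organically "
             "within the first two encounters?",
    positive_signals=["Dodge-rolls unprompted during encounter 1 or 2"],
    negative_signals=["Finishes encounter 2 without ever dodge-rolling"],
    observation_windows=["Encounter 1", "Encounter 2"],
)
```

Running `is_complete()` over every entry before the first session is a cheap check that no test question went into the room without defined signals.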
Create the per-player tracking form:
Player ID: ___
Session Date: ___
Session Duration: ___
Test Build Version: ___
Timestamped Observations:
[MM:SS] [Observation] [Category: Action/Hesitation/Confusion/Emotion/Verbal]
Post-Session Survey Responses:
Q1: ___
Q2: ___
Q3: ___
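If observations are typed rather than handwritten, the timestamped line format above can be parsed into structured records for later analysis. This is a minimal sketch that assumes each note is entered exactly as `[MM:SS] [Observation] [Category: ...]`; the names `Observation` and `parse_observation` are hypothetical.

```python
import re
from dataclasses import dataclass

CATEGORIES = {"Action", "Hesitation", "Confusion", "Emotion", "Verbal"}
LINE_PATTERN = re.compile(r"^\[(\d+):([0-5]\d)\] \[(.+)\] \[Category: (\w+)\]$")


@dataclass(frozen=True)
class Observation:
    seconds: int   # offset from session start
    note: str
    category: str


def parse_observation(line: str) -> Observation:
    """Parse one '[MM:SS] [Observation] [Category: X]' line."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Line does not follow the form layout: {line!r}")
    minutes, secs, note, category = match.groups()
    if category not in CATEGORIES:
        raise ValueError(f"Unknown category: {category!r}")
    return Observation(int(minutes) * 60 + int(secs), note, category)


# Sample line invented for illustration.
example = parse_observation(
    "[04:12] [Re-read the crafting tooltip twice] [Category: Confusion]"
)
```

Storing the timestamp as seconds from session start makes it easy to line notes up against recordings and other players' sessions.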
Set up recording infrastructure:
- Screen capture with audio (mandatory -- you will miss things in real-time that the recording catches)
- Face camera if available (facial expressions reveal engagement, confusion, and frustration that players rarely verbalize)
- Input logging if your engine supports it (heatmaps of where players click, where they die, where they spend time)
- Ensure recordings are timestamped and synchronized so you can cross-reference player expression with game events
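Once recordings share a timeline, cross-referencing can be as simple as looking up which game events fall within a few seconds of a noted expression. The sketch below assumes the face camera and the game log are stamped from the same clock; the function name `events_near`, the 3-second window, and the sample events are all assumptions for illustration.

```python
from datetime import datetime


def events_near(
    moment: datetime,
    game_events: list[tuple[datetime, str]],
    window_seconds: float = 3.0,
) -> list[str]:
    """Return names of game events logged within window_seconds of a moment."""
    return [
        name
        for timestamp, name in game_events
        if abs((timestamp - moment).total_seconds()) <= window_seconds
    ]


# Invented sample data: three logged events and one sigh seen on the face camera.
game_events = [
    (datetime(2024, 1, 15, 14, 3, 10), "boss_spawned"),
    (datetime(2024, 1, 15, 14, 3, 12), "player_died"),
    (datetime(2024, 1, 15, 14, 5, 40), "checkpoint_reached"),
]
sigh = datetime(2024, 1, 15, 14, 3, 13)
print(events_near(sigh, game_events))  # ['boss_spawned', 'player_died']
```

If the two recordings run on different clocks, measure the offset during the dry run and apply it before any lookup like this.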
Prepare the test environment:
- Use a consistent hardware setup across all testers (different frame rates and input devices contaminate results)
- Remove development overlays, debug menus, and console access
- Disable any developer shortcuts or god-mode toggles
- Have a clean save state ready so every tester starts from the same point
- Test the recording setup with a dry run before the first tester arrives
Brief your facilitators (if you have helpers):
- Their only job is to observe and record. Not to help. Not to explain. Not to react.
- If a tester asks "What do I do?" the correct response is: "What do you think you should do?"
- If a tester is completely stuck for more than 90 seconds on a non-critical path, the facilitator may offer a single neutral hint ("Have you tried interacting with the glowing object?"). Log every hint given as a critical finding.
- Facilitators should not sit directly next to the player. Peripheral awareness of being watched changes behavior. Sit behind and to the side.
Step 2: During the Playtest -- Silent Observation Protocol
This is where discipline matters most. You are a scientist. Your personal feelings about the game are irrelevant during this phase.
Real-time observation categories:
Actions -- What is the player doing?
- Record moment-to-moment decisions. Not just "player fought the boss" but "player circled the boss for 15 seconds before attacking, suggesting they were looking for a weak point or building courage."
- Track navigation patterns. Do players go where you intended? Where do they go instead? Unintended exploration paths reveal what the environment is actually communicating versus what you think it communicates.
- Note input patterns. Button mashing (panic or boredom), deliberate presses (strategic engagement), repeated failed inputs (control confusion).
Hesitations -- Where does the player pause?
- A pause before a door means the player is anticipating what is behind it (good -- you created tension).
- A pause at a menu means the player does not understand the options (bad -- your UI is unclear).
- A pause in combat means the player is either strategizing (good) or overwhelmed (bad). Their facial expression and subsequent action disambiguate.
Confusions -- Where does the player misunderstand?
- Track "expectation mismatches" -- moments where the player clearly expected one outcome and got another. These are the highest-value findings in any playtest.
- Note instances where the player uses a mechanic incorrectly but thinks they are using it correctly. This reveals that your feedback systems are not communicating state clearly.
- Watch for players reading the same tooltip or sign multiple times -- it means the information was unclear or they do not trust their own understanding.
Emotions -- What is the player feeling?
- Delight indicators: leaning forward, widening eyes, spontaneous laughter, "cool" or "whoa" vocalizations, showing the screen to someone nearby
- Frustration indicators: sighing, leaning back, crossing arms, clicking more aggressively, muttering, eye-rolling
- Engagement indicators: losing track of time, ignoring phone notifications, asking "can I keep playing?" at the end
- Disengagement indicators: checking phone, looking around the room, playing with reduced attention, asking "how much longer?"
- Flow state indicators: quiet focus, rhythmic input patterns, surprise when told time is up, difficulty recalling specific moments (they were "in it"). Hades playtests reportedly showed players going 30+ minutes without checking the clock -- treat that kind of lost time as the gold standard for flow state confirmation.
Map emotional responses to specific game moments. This creates an emotional heatmap of the play session -- where are the peaks and valleys? Compare this to your intended emotional arc from the design document.
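A rough emotional heatmap can be produced by bucketing the logged emotion notes by time and summing a simple score per bucket. This is a sketch under stated assumptions: the +1/-1 valence scores and the one-minute bucket size are arbitrary choices for illustration, and `emotional_heatmap` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed scoring: positive indicators count +1, negative indicators count -1.
VALENCE = {
    "delight": 1,
    "engagement": 1,
    "flow": 1,
    "frustration": -1,
    "disengagement": -1,
}


def emotional_heatmap(observations, bucket_seconds=60):
    """Sum valence per time bucket.

    observations: iterable of (seconds_from_start, emotion) pairs.
    Returns {bucket_index: net_valence}, ordered by time.
    """
    buckets = defaultdict(int)
    for seconds, emotion in observations:
        buckets[seconds // bucket_seconds] += VALENCE[emotion]
    return dict(sorted(buckets.items()))


# Invented sample session.
session = [
    (35, "delight"),
    (50, "engagement"),
    (130, "frustration"),
    (170, "frustration"),
    (200, "delight"),
]
heatmap = emotional_heatmap(session)
print(heatmap)  # {0: 2, 2: -2, 3: 1}
```

Here minute 0 is a peak and minute 2 is a valley; those bucket indices are what you would compare against the intended emotional arc.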
Verbal observations (think-aloud protocol, if used):
- Record the player's real-time narration without filtering or correcting.
- Flag moments where what the player says contradicts what they are doing -- these are gold. "This is easy" followed by dying three times reveals a gap between perceived and actual skill.
Step 3: Post-Session Debrief
Keep it short. 5-7 minutes maximum. The player's attention is most valuable while the experience is fresh, but fatigue sets in quickly after a play session.
Core debrief questions (ask in this order):
- "What was the game about?" -- tests whether the core fantasy and theme communicated clearly
- "What were you trying to do most of the time?" -- reveals whether the player understood the primary objective and core loop
- "Was there a moment that stood out as particularly good?" -- identifies delight peaks from the player's perspective (cross-reference with your observations)
- "Was there a moment that felt confusing or frustrating?" -- identifies friction from the player's perspective
- "If you could change one thing, what would it be?" -- reveals the player's top-of-mind pain point
Optional deep-dive questions (only if time permits and the answer informs a test question):
- "Did you feel like you understood what your options were at any given time?" -- tests decision clarity
- "Did the difficulty feel about right, too easy, or too hard?" -- subjective difficulty assessment (triangulate with behavioral data)
- "Was there anything you wanted to do that the game didn't let you?" -- reveals affordance gaps
Do NOT ask:
- "Did you like it?" -- useless. Social pressure ensures a positive answer.
- "Would you buy it?" -- irrelevant at this stage and puts the player in an evaluative mindset that suppresses honest feedback.
- Leading questions: "Did you notice how the lighting changed in the cave?" -- you are feeding them the observation you want.
Step 4: Post-Playtest Analysis
Wait at least 2 hours after the last session before analyzing. Immediate analysis is contaminated by recency bias -- the last tester's experience dominates your thinking.
Cross-player pattern identification:
- Compile observations into a matrix: rows are game moments/features, columns are players
- Highlight moments where 3+ players exhibited the same behavior -- these are systemic findings, not individual quirks
- Identify divergence points: moments where players split into distinct behavior groups (this reveals a design fork that may need to be resolved or embraced)
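The matrix and both checks above can be sketched in a few functions. This is an illustrative sketch, not a prescribed tool: the function names, the assumption that behaviors are recorded as matching short labels, and the minimum group size of 2 for a divergence point are all choices made for this example.

```python
from collections import defaultdict


def build_matrix(observations):
    """observations: iterable of (player_id, moment, behavior).
    Returns {moment: {player_id: behavior}}: rows are moments, columns are players."""
    matrix = defaultdict(dict)
    for player_id, moment, behavior in observations:
        matrix[moment][player_id] = behavior
    return dict(matrix)


def group_by_behavior(row):
    """Group one row's players by the behavior they exhibited."""
    groups = defaultdict(list)
    for player_id, behavior in row.items():
        groups[behavior].append(player_id)
    return {behavior: sorted(players) for behavior, players in groups.items()}


def systemic_findings(matrix, threshold=3):
    """Behaviors shared by at least `threshold` players at the same moment."""
    return [
        (moment, behavior, players)
        for moment, row in matrix.items()
        for behavior, players in group_by_behavior(row).items()
        if len(players) >= threshold
    ]


def divergence_points(matrix, min_group=2):
    """Moments where players split into two or more groups of min_group or more."""
    points = []
    for moment, row in matrix.items():
        large = [p for p in group_by_behavior(row).values() if len(p) >= min_group]
        if len(large) >= 2:
            points.append(moment)
    return points


# Invented sample notes from five players.
notes = [
    ("P1", "room 3 door", "skipped lever"),
    ("P2", "room 3 door", "skipped lever"),
    ("P3", "room 3 door", "pulled lever"),
    ("P4", "room 3 door", "skipped lever"),
    ("P5", "room 3 door", "pulled lever"),
    ("P1", "first shop", "bought potion"),
    ("P2", "first shop", "left without buying"),
]
matrix = build_matrix(notes)
print(systemic_findings(matrix))  # [('room 3 door', 'skipped lever', ['P1', 'P2', 'P4'])]
print(divergence_points(matrix))  # ['room 3 door']
```

This only works if facilitators agree on behavior labels in advance; free-text notes need to be normalized into shared labels before they go into the matrix.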
Finding classification:
Categorize every finding by severity:
| Severity | Definition | Action Required |
|---|---|---|
| Critical | Breaks the core experience. Player cannot progress, or the intended emotion is inverted (frustration instead of triumph). | Must fix before next playtest. |
| Major | Degrades the experience significantly. Player can proceed but the quality of the experience is noticeably diminished. | Should fix in current milestone. |
| Minor | Could be better. Player notices but is not significantly impacted. | Fix when convenient, or batch into a polish pass. |
| Observation | Interesting behavioral note that does not indicate a problem but may inform future design decisions. | Log for reference. No action required. |
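If findings are tracked in a script or spreadsheet export, the severity scale can be encoded so that findings sort by urgency. A minimal sketch follows; the `Severity` and `Finding` names and the sample findings are assumptions for illustration, while the action text is copied from the table.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value means more urgent, so an ascending sort puts Critical first.
    CRITICAL = 0
    MAJOR = 1
    MINOR = 2
    OBSERVATION = 3


ACTION_REQUIRED = {
    Severity.CRITICAL: "Must fix before next playtest.",
    Severity.MAJOR: "Should fix in current milestone.",
    Severity.MINOR: "Fix when convenient, or batch into a polish pass.",
    Severity.OBSERVATION: "Log for reference. No action required.",
}


@dataclass
class Finding:
    summary: str
    severity: Severity

    def needs_recommendation(self) -> bool:
        # Only Critical and Major findings require a written recommendation.
        return self.severity <= Severity.MAJOR


# Invented sample findings.
findings = [
    Finding("Players re-read the crafting tooltip", Severity.MINOR),
    Finding("Players cannot find the exit in room 3", Severity.CRITICAL),
    Finding("Act 2 scarcity reads as frustration", Severity.MAJOR),
]
triaged = sorted(findings, key=lambda f: f.severity)
for finding in triaged:
    print(finding.severity.name, "-", ACTION_REQUIRED[finding.severity])
```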
Recommendation generation:
For each Critical and Major finding, generate a specific, actionable recommendation:
- What to change (be concrete -- "reduce enemy count in room 3 from 5 to 3" not "make it easier")
- Why it will help (connect the recommendation to the observed behavior)
- Expected impact (what should change in the next playtest if this fix works)
- Potential side effects (will this fix create new problems elsewhere?)
Longitudinal comparison:
If prior playtest data exists, compare results across iterations:
- Which findings from the previous playtest were addressed, and did the fixes work?
- Which problems persisted despite attempted fixes (these may be structural, not surface-level)?
- Is the overall trajectory improving? Are you fixing more than you are breaking?
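When findings carry stable labels from one playtest to the next, the three questions above reduce to set operations. The sketch below assumes each finding is identified by a consistent string label; `compare_iterations`, `is_improving`, and the sample labels are hypothetical.

```python
def compare_iterations(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Split findings into resolved, persisted, and newly introduced."""
    return {
        # Absent this time. Confirm against observations that the fix worked,
        # rather than the moment simply not being reached.
        "resolved": previous - current,
        # Seen in both playtests; possibly structural.
        "persisted": previous & current,
        # Not seen before; possibly caused by a fix.
        "new": current - previous,
    }


def is_improving(comparison: dict[str, set[str]]) -> bool:
    """True when more findings were resolved than newly introduced."""
    return len(comparison["resolved"]) > len(comparison["new"])


# Invented sample labels.
previous = {"dodge-roll undiscovered", "act 2 ammo starvation", "map menu confusion"}
current = {"act 2 ammo starvation", "boss telegraph unreadable"}

comparison = compare_iterations(previous, current)
print(sorted(comparison["resolved"]))  # ['dodge-roll undiscovered', 'map menu confusion']
print(is_improving(comparison))        # True
```

Counting findings is a crude proxy for trajectory: one new Critical finding can outweigh several resolved Minor ones, so weigh the counts by severity before drawing conclusions.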
Step 5: Report Generation and Distribution
Compile the analysis into the standardized Playtest Report format (see Output Format below). Distribute to the full team with a 2-sentence executive summary at the top -- the lead designer and producer need the headline without reading 10 pages.