realistic-ugc-video
Realistic UGC Video Production
Create long-form AI videos that look and sound authentically human. This skill orchestrates a multi-step workflow using Nano Banana for realistic base images and Kling AI for video generation, with specific techniques to avoid the "AI look".
Why Videos Look AI (And How to Fix It)
| Problem | Solution |
|---|---|
| Too perfect/clean skin | Add imperfections: micro-pores, natural oils, fine lines |
| Studio lighting | Use available/natural light, mixed color temps |
| Character too still | Add micro-movements, head tilts, natural sway |
| Inconsistent pacing | Use 55-60 syllables per clip |
| Robotic voice | Process through Adobe Podcast or Resemble AI |
| Obvious jump cuts | Cover with B-roll or animations |
| Weird AI hands | Crop hands out of frame or keep completely static |
Known Limitation: Hands
AI video models (Kling, Veo, etc.) struggle with realistic hand movement. Fingers morph, gestures look unnatural, and hands are often the biggest tell.
Best practice: Keep hands OUT of frame or completely static.
Options:
- Head/shoulders framing - Crop base image to exclude hands entirely
- Arms crossed - Static pose, no finger movement needed
- Hands below frame - Desk edge cuts off at wrists
- Cover with B-roll - Cut away during any hand weirdness in post
Complete Workflow
Phase 1: Collect Requirements
Before starting, gather from the user:
- Character description - Age, ethnicity, features, clothing
- Setting/background - Office, home, studio, outdoor
- Script - The full text the character will speak
- Tone - Conversational, urgent, professional, friendly
- Video length target - This determines how many clips are needed
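The requirements above can be captured in a small record before any generation starts. The following is a minimal sketch, not part of the skill itself: the `VideoBrief` class and its field names are hypothetical, and the clip estimate assumes the 10-second clip length used later in this workflow. The final clip count also depends on how the script chunks, so treat the number as a first estimate.

```python
import math
from dataclasses import dataclass

CLIP_SECONDS = 10  # assumption: each generation yields one 10-second clip


@dataclass
class VideoBrief:
    """Hypothetical container for the Phase 1 requirements."""
    character: str
    setting: str
    script: str
    tone: str
    target_seconds: int

    def clips_needed(self) -> int:
        # Round up so the target length is always covered.
        return max(1, math.ceil(self.target_seconds / CLIP_SECONDS))

    def missing_fields(self) -> list[str]:
        # Names of text fields the user has not filled in yet.
        names = ("character", "setting", "script", "tone")
        return [n for n in names if not getattr(self, n).strip()]


brief = VideoBrief(
    character="Woman in her 30s, grey hoodie",
    setting="Home office",
    script="",
    tone="Conversational",
    target_seconds=45,
)
print(brief.clips_needed())    # estimated clip count
print(brief.missing_fields())  # fields still to collect
```

Checking `missing_fields()` before Phase 2 avoids generating a base image for a brief that still lacks a script or tone.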
Phase 2: Generate Base Image (Nano Banana)
Use the Nano Banana skill to generate the character image. Critical: Apply the imperfection techniques from CHARACTER-PROMPTING.md.
Key elements for realistic UGC:
- iPhone capture aesthetic (26mm equivalent lens)
- Available/mixed lighting (NOT studio)
- Visible skin texture (pores, oils, fine lines)
- Minor imperfections (stubble, dark circles)
- Computational depth artifacts
- ISO noise (500-900 range)
Command:
```bash
~/.claude/skills/nano-banana/scripts/generate.sh "[enhanced prompt]" --aspect 9:16 --size 2K
```
Then optionally upscale through Enhancor AI for additional texture.
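If the command is driven from a script rather than typed by hand, the prompt enhancement and argument list can be assembled programmatically. This is a rough sketch under assumptions: `enhance_prompt` and `build_command` are hypothetical helpers, and the wording of each realism phrase is illustrative. Only the script path and the `--aspect` and `--size` flags are taken from the command above.

```python
import os

# Realism cues from the "Key elements" list; exact phrasing is an assumption.
REALISM_ELEMENTS = [
    "iPhone capture aesthetic, 26mm equivalent lens",
    "available mixed lighting, not studio",
    "visible skin texture: pores, natural oils, fine lines",
    "minor imperfections such as stubble or dark circles",
    "computational depth artifacts",
    "ISO noise in the 500-900 range",
]

SCRIPT = "~/.claude/skills/nano-banana/scripts/generate.sh"


def enhance_prompt(base: str) -> str:
    """Append the realism cues to a plain character description."""
    return base.rstrip(". ") + ". " + ", ".join(REALISM_ELEMENTS) + "."


def build_command(base_prompt: str) -> list[str]:
    """Return the argument list for the generate script (does not run it)."""
    return [
        os.path.expanduser(SCRIPT),
        enhance_prompt(base_prompt),
        "--aspect", "9:16",
        "--size", "2K",
    ]


cmd = build_command("A tired barista at home.")
print(cmd[1])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building an argument list instead of a shell string sidesteps quoting problems when the prompt contains quotes or commas.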
Phase 3: Chunk the Script (Critical for Pacing)
This is the most important step for natural pacing. See SCRIPT-CHUNKING.md.
The 55-60 Syllable Rule:
- Count syllables, not words
- Each video generation = 55-60 syllables
- Never cut mid-sentence
- Add filler sentences to reach target if needed
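The rule above can be approximated in code. The sketch below is an assumption-laden illustration, not the method from SCRIPT-CHUNKING.md: `count_syllables` is a rough vowel-group heuristic for English that will miscount some words, and `chunk_script` is a hypothetical greedy packer that groups whole sentences and flags any chunk that falls outside the target range so it can be padded with filler or rewritten by hand.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough estimate: count vowel groups, drop a silent final 'e'."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    n = len(VOWEL_GROUP.findall(w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_syllables(text: str) -> int:
    return sum(count_syllables(tok) for tok in text.split())


def chunk_script(script: str, lo: int = 55, hi: int = 60):
    """Pack whole sentences into chunks of at most `hi` syllables.

    Returns (text, syllables, in_range) tuples. Sentences are never split,
    so a single long sentence can still exceed `hi`.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and text_syllables(candidate) > hi:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    results = []
    for chunk in chunks:
        n = text_syllables(chunk)
        results.append((chunk, n, lo <= n <= hi))
    return results


for text, n, ok in chunk_script("Cats nap. Dogs run. Birds sing.", lo=3, hi=4):
    print(n, "ok" if ok else "needs filler or rewrite", "|", text)
```

Because the heuristic is approximate, counts near the 55 or 60 boundary are worth verifying by reading the chunk aloud.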
Example chunking:
Chunk 1 (58 syllables):
"Hey everyone, I wanted to share something that completely changed how I think about productivity. It's not another app or system."
Chunk 2 (56 syllables):
"It's actually about understanding your own energy patterns throughout the day. Once I figured this out, everything clicked into place."
Phase 4: Generate Video Clips (Kling AI)
For each script chunk, generate a 10-second video clip. Use the Kling AI skill with movement prompts from MOVEMENT-PROMPTING.md.
Key elements:
- Hand "Home Base" Protocol
- Timestamped movement clusters
- Natural blinks, head tilts, micro-sways
- Consistent base image reference
Spawn background agent for each clip:
Task tool:
- subagent_type: "general-purpose"
- run_in_background: true
- prompt: [Include image URL, script chunk, movement prompt, output path]
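When there are many chunks, the per-clip task parameters can be prepared in one pass. This is a sketch only: `build_clip_tasks` is a hypothetical helper, the `clip_01.mp4` naming scheme and the layout of the prompt text are assumptions, and the dictionaries simply mirror the three Task tool fields listed above. How they are handed to the Task tool is left to the agent.

```python
def build_clip_tasks(image_url: str, chunks: list[str],
                     movement_prompt: str, out_dir: str) -> list[dict]:
    """Build one background-task spec per script chunk."""
    tasks = []
    for i, chunk in enumerate(chunks, start=1):
        output_path = f"{out_dir}/clip_{i:02d}.mp4"  # assumed naming scheme
        prompt = "\n".join([
            f"Base image URL: {image_url}",
            f"Script chunk: {chunk}",
            f"Movement prompt: {movement_prompt}",
            f"Output path: {output_path}",
        ])
        tasks.append({
            "subagent_type": "general-purpose",
            "run_in_background": True,
            "prompt": prompt,
        })
    return tasks


tasks = build_clip_tasks(
    "https://example.com/base.png",  # placeholder URL
    ["First chunk text.", "Second chunk text."],
    "Hands anchored, natural blinks",
    "output",
)
print(len(tasks), "tasks prepared")
```

Every task carries the same image URL, which keeps the base image reference consistent across clips.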
Phase 5: Post-Production
See POST-PRODUCTION.md for detailed guidance.
- Assemble clips in CapCut or similar editor
- Fix audio with Adobe Podcast (minimum) or Resemble AI (for voice swap)
- Cover jump cuts with B-roll or animations
- Remove filler sentences if they feel awkward
- Export in final resolution
Quick Start: Single Clip Test
Before generating the full video, test the workflow with one clip:
- Generate base image with full imperfections
- Create 10-second test video with first script chunk
- Evaluate pacing and movement
- Adjust prompts if needed
- Proceed with full production
Example Full Prompt for Base Image
A vertical 9:16 UGC-style video frame captured on an iPhone 11 resting on a tripod.
Medium-wide portrait at true eye-level with slightly forward-leaning posture.
[CHARACTER]: A [age] [gender] with [ethnicity] complexion, [eye color] eyes beneath
[eyebrow description], [nose description]. [Jawline/facial hair]. [Hair description].
Expression is [emotion]—[specific expression details].
[CLOTHING]: [Fitted/casual garment], [collar detail].
[SKIN TEXTURE]: Visible pores across T-zone, faint smile lines, natural oils catching
light on forehead and nose. [Age-appropriate details]. No filter, no foundation.
[FOREGROUND]: Hands rest naturally on [surface], fingers relaxed, visible veins and
knuckle texture. Nearby: [everyday objects like water bottle, phone, notebook].
[CAMERA]: Native iPhone 11 lens (26mm equivalent), slightly wide perspective, mild
barrel softness at edges. Only tiny pockets of neural blur around hair edges.
[LIGHTING]: Available light mix—cool overcast daylight from window left, warm tungsten
from desk lamp right. Soft asymmetric shadows, natural falloff. ISO noise 500-900.
[BACKGROUND]: [Realistic home/office elements]—bookshelf, [furniture], clearly visible
not heavily blurred.
[REALITY DETAILS]: Gentle 35mm film grain, light fingerprint smudge on lens, tiny dust
haze in air. No cinematic bloom, no studio finish.
Styling: raw UGC realism, available indoor light, mixed color temperature, minimal
depth blur, visible ISO noise, emphasis on authenticity.
Example Movement Prompt for Video
Hand "Home Base" Protocol: Hands default to Active Idle. Fingers shift, thumbs rub,
wrists rotate slightly while anchored. Gestures only for key emphasis.
[0.0s-0.5s] Pre-roll: Sharp inhale, eyes lock to lens, head still
[0.5s-3.0s] Hands in Active Idle (fingers interlocked), head tilts slightly right,
brows furrow in seriousness
[3.0s-6.0s] Hands break clasp for quick open-palm rotation then return, head drifts
forward, natural blink
[6.0s-8.0s] Hands return to Active Idle (loose clasp, thumbs tapping), head nods
encouragingly, cheeks lift in natural smile
[8.0s-10.0s] Hands anchored (wrist shifts), chin lifts in quick final nod, natural blink
[Script]: "[CHUNK TEXT HERE]"
[Tone]: [Urgent/Conversational/Professional/etc.]
[Pacing]: Rapid fire delivery, high energy, viral UGC style, confident, 2x speed
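The timestamped layout of the movement prompt lends itself to a small builder that also checks the timeline. The following is a minimal sketch under assumptions: `build_movement_prompt` is a hypothetical helper, and the rule that segments must run back to back from 0.0s to the clip length is inferred from the example above rather than stated in MOVEMENT-PROMPTING.md.

```python
def build_movement_prompt(segments, script, tone, pacing, clip_seconds=10.0):
    """Render (start, end, description) segments into a movement prompt.

    Raises ValueError if the segments leave a gap, overlap, or do not
    end exactly at clip_seconds.
    """
    expected = 0.0
    for start, end, _ in segments:
        if abs(start - expected) > 1e-9 or end <= start:
            raise ValueError(f"segment {start}-{end} breaks the timeline")
        expected = end
    if abs(expected - clip_seconds) > 1e-9:
        raise ValueError(f"timeline ends at {expected}s, not {clip_seconds}s")

    lines = [f"[{s:.1f}s-{e:.1f}s] {d}" for s, e, d in segments]
    lines.append(f'[Script]: "{script}"')
    lines.append(f"[Tone]: {tone}")
    lines.append(f"[Pacing]: {pacing}")
    return "\n".join(lines)


print(build_movement_prompt(
    [
        (0.0, 0.5, "Pre-roll: sharp inhale, eyes lock to lens, head still"),
        (0.5, 10.0, "Hands in Active Idle, head tilts slightly right, natural blink"),
    ],
    script="[CHUNK TEXT HERE]",
    tone="Conversational",
    pacing="High energy, confident",
))
```

Validating the timeline up front catches copy-paste errors, such as a missing segment, before a generation is spent on them.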
Reference Files
- CHARACTER-PROMPTING.md - Imperfection techniques for realistic characters
- SCRIPT-CHUNKING.md - The syllable method for consistent pacing
- MOVEMENT-PROMPTING.md - Natural movement choreography
- POST-PRODUCTION.md - Audio fixing and editing tips
Alternative: InfiniteTalk
For simpler long-form videos, consider InfiniteTalk (infinitetalk.ai).
Pros: Single generation for longer videos
Cons: Less control over pacing (no timer/duration control), charged by output length
Use the syllable method above when precise pacing control is needed.
Checklist Before Generating
- Character prompt includes skin imperfections
- Lighting is available/natural, NOT studio
- Script chunked into 55-60 syllable segments
- Each chunk consists of complete sentences
- Movement prompt includes hand base, head movements, blinks
- Post-production plan for audio and jump cuts
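Parts of this checklist can be automated with simple text checks. The sketch below is a keyword heuristic, not a real validator: `preflight` and both term lists are hypothetical, syllable counts are assumed to be supplied by the caller, and the lighting and post-production items still need a human review.

```python
IMPERFECTION_TERMS = ("pores", "fine lines", "oils", "stubble", "dark circles")
MOVEMENT_TERMS = ("hand", "head", "blink")


def preflight(character_prompt, movement_prompt, chunks, lo=55, hi=60):
    """Return a list of problems; an empty list means these checks passed.

    chunks is a list of (text, syllable_count) pairs.
    """
    problems = []
    cp = character_prompt.lower()
    if not any(term in cp for term in IMPERFECTION_TERMS):
        problems.append("character prompt has no skin imperfection terms")

    mp = movement_prompt.lower()
    for term in MOVEMENT_TERMS:
        if term not in mp:
            problems.append(f"movement prompt never mentions '{term}'")

    for i, (text, syllables) in enumerate(chunks, start=1):
        if not lo <= syllables <= hi:
            problems.append(f"chunk {i} has {syllables} syllables, outside {lo}-{hi}")
        if not text.rstrip().endswith((".", "!", "?")):
            problems.append(f"chunk {i} does not end on a complete sentence")
    return problems


for problem in preflight("smooth skin", "head nods", [("cut off mid", 40)]):
    print("FIX:", problem)
```

A passing result only means the prompts contain the expected wording. It does not confirm that the generated image or clip actually looks natural.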