world-class-carousel
World-Class Instagram Carousel Generator
Generate Instagram carousels that are genuinely world-class: content people save, share, and come back to. Not engagement bait. Not AI slop. Actual value, delivered through precise visual design and narrative structure.
This skill is fully generalized. It contains FORM (structure, principles, patterns), not MATTER (specific topics). The user provides the matter (topic); the skill provides the form (archetypes, design system, music matrix, quality gates). Together they produce the carousel. Nothing is hardcoded.
BEFORE YOU START: Read KNOWN_ISSUES.md
Before generating ANY carousel, read `/home/node/.claude/skills/world-class-carousel/KNOWN_ISSUES.md`. It contains compressed rules from all previous sessions: data format gotchas, sizing rules, visual strategy decisions, and quality gates. Ignoring it means repeating solved mistakes.
EXECUTION PIPELINE
When the user requests a carousel, execute these 6 phases in order (Phase 6 runs post-delivery):
PHASE 1: RESEARCH & STRUCTURING
- Analyze the topic -- What is the core insight? What specific value can this deliver?
- Identify the audience -- What does the target audience NOT already know? What's their current understanding?
- Auto-detect content vertical and theme -- Use the Content Vertical Detection table below
- Select the archetype -- Which of the 7 carousel archetypes (see below) fits best? Use the Archetype Selection Guide below. Auto-select unless the user specifies.
- Design the narrative arc -- Map each archetype role to a renderer slide type using the Role-to-SlideType Mapping below. Ensure each slide creates a curiosity gap that the next slide resolves.
- Run the Bullshit Test on the outline -- Does every slide pass? (See QUALITY GATE below)
Content Vertical Detection (Topic -> Theme)
Analyze the topic and auto-select the renderer theme:
| Content Vertical | Keywords/Signals | Renderer Theme | Background Style |
|---|---|---|---|
| Tech / AI / Coding | AI, code, developer, API, tools, stack, programming, SaaS, data | | |
| Business / Strategy | growth, revenue, startup, founder, marketing, sales, strategy, scale | | |
| Education / How-To | learn, tutorial, guide, roadmap, beginner, master, course, how to | | |
| Creative / Design | design, UX, brand, visual, aesthetic, portfolio, creative | | |
| Mindset / Philosophy | mindset, habits, productivity, stoic, growth, mental, philosophy | | |
If the user specifies a brand config with a theme, always use that instead.
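The keyword table above can be read as a simple scoring rule. The following is a minimal sketch, assuming whole-word matching and that ties go to the earlier table row; `detect_vertical` and `VERTICAL_SIGNALS` are hypothetical names, and the function returns only the vertical because the theme values are not shown in the table.

```python
import re

# Keyword signals copied from the Content Vertical Detection table.
VERTICAL_SIGNALS = {
    "Tech / AI / Coding": ["AI", "code", "developer", "API", "tools", "stack",
                           "programming", "SaaS", "data"],
    "Business / Strategy": ["growth", "revenue", "startup", "founder", "marketing",
                            "sales", "strategy", "scale"],
    "Education / How-To": ["learn", "tutorial", "guide", "roadmap", "beginner",
                           "master", "course", "how to"],
    "Creative / Design": ["design", "UX", "brand", "visual", "aesthetic",
                          "portfolio", "creative"],
    "Mindset / Philosophy": ["mindset", "habits", "productivity", "stoic", "growth",
                             "mental", "philosophy"],
}


def detect_vertical(topic):
    """Return the vertical with the most whole-word keyword hits, or None."""
    text = topic.lower()
    best, best_hits = None, 0
    for vertical, keywords in VERTICAL_SIGNALS.items():
        hits = sum(
            1 for kw in keywords
            if re.search(r"\b" + re.escape(kw.lower()) + r"\b", text)
        )
        if hits > best_hits:  # strict '>' keeps the earlier table row on ties
            best, best_hits = vertical, hits
    return best
```

A user-supplied brand config would bypass this function entirely, as the rule above states.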
Content Category Selection (10 Categories, Aristotelian)
Each category has unique visual DNA derived from psychology axioms (Cialdini, cognitive load theory, dual coding, serial position effect). Select based on topic:
| If the topic is about... | Category | Arc Shape | Hook Style | Primary Cialdini |
|---|---|---|---|---|
| Explaining a research paper | `paper_decoder` | Revelatory | Face + paper panel | Authority |
| Comparing AI tools/models | `tool_showdown` | Divergent | Multi-screenshot face-off | Social Proof |
| Today's AI development | `breaking_news` | Convergent | News-editorial face | Scarcity |
| Step-by-step AI tool how-to | `tool_tutorial` | Linear | Phone-in-hand / device mockup | Reciprocity |
| Controversial opinion | `hot_take` | Confrontational | Bold abstract typography | Authority |
| Copy-paste prompts/templates | `prompt_playbook` | Divergent | Phone screenshot mockup | Reciprocity |
| Complete sector overview | `industry_map` | Divergent | Multi-person face-off | Authority |
| Build [X] with AI project | `build_this` | Linear+Reveal | Multi-device result showcase | Social Proof |
| Funding/business news | `founders_money` | Convergent | Founder portrait + data | Scarcity |
| Future predictions/timeline | `future_scenario` | Revelatory | Abstract cinematic AI imagery | Scarcity |
Universal Psychology Rules (apply to ALL categories):
- Max 4 information chunks per slide (Cognitive Load Theory, Sweller)
- Pattern interrupt every 2-3 slides (diagram, comparison, color shift, or layout change)
- Density wave: H-M-H-M-H-H-M (never 3 high-density slides consecutively)
- Synthesis slide = THE save trigger (Serial Position Effect: last items remembered best)
- Dual-code the hardest concept (Paivio: visual + text = 6.5x retention)
- CTA matches save trigger: utility categories → "Save this", social categories → "Share/Comment"
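Two of the rules above are mechanical enough to check in code. This is a minimal sketch, assuming density is written as a string of `H` and `M` letters (dashes allowed) and that chunk counts are supplied per slide; both function names are hypothetical.

```python
MAX_CHUNKS_PER_SLIDE = 4  # Cognitive Load Theory limit from the rules above
MAX_HIGH_DENSITY_RUN = 2  # "never 3 high-density slides consecutively"


def check_density_wave(wave):
    """Return True if a wave such as 'H-M-H-M-H-H-M' never has 3 'H' in a row."""
    run = 0
    for level in wave.upper().replace("-", ""):
        run = run + 1 if level == "H" else 0
        if run > MAX_HIGH_DENSITY_RUN:
            return False
    return True


def overloaded_slides(chunk_counts):
    """Return 1-based slide numbers whose chunk count exceeds the limit."""
    return [i for i, n in enumerate(chunk_counts, start=1)
            if n > MAX_CHUNKS_PER_SLIDE]
```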
Category-to-Slide-Sequence Quick Reference:
- `paper_decoder` (9 slides): hook → body → diagram → body → body → diagram → body → synthesis → cta
- `tool_showdown` (8 slides): hook → body → comparison → body → body → comparison → synthesis → cta
- `breaking_news` (8 slides): hook → body → body → body → diagram → body → synthesis → cta
- `tool_tutorial` (8 slides): hook → body → tool → tool → tool → body → synthesis → cta
- `hot_take` (7 slides): hook → body → body → body → body → synthesis → cta (text-driven, no diagrams)
- `prompt_playbook` (9 slides): hook → body → body → body → comparison → body → body → synthesis → cta
- `industry_map` (9 slides): hook → diagram → body → body → body → comparison → diagram → synthesis → cta
- `build_this` (8 slides): hook → body → tool → tool → tool → body → synthesis → cta
- `founders_money` (7 slides): hook → body → body → body → diagram → synthesis → cta
- `future_scenario` (8 slides): hook → body → body → diagram → body → body → synthesis → cta
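The quick reference lends itself to a lookup table with a consistency check. The sketch below assumes each category name belongs to the sequence listed with it in the quick reference; `QUICK_REFERENCE` and `parse_sequences` are hypothetical names, not part of the renderer.

```python
# Declared slide counts and sequences, transcribed from the quick reference.
QUICK_REFERENCE = {
    "paper_decoder": (9, "hook body diagram body body diagram body synthesis cta"),
    "tool_showdown": (8, "hook body comparison body body comparison synthesis cta"),
    "breaking_news": (8, "hook body body body diagram body synthesis cta"),
    "tool_tutorial": (8, "hook body tool tool tool body synthesis cta"),
    "hot_take": (7, "hook body body body body synthesis cta"),
    "prompt_playbook": (9, "hook body body body comparison body body synthesis cta"),
    "industry_map": (9, "hook diagram body body body comparison diagram synthesis cta"),
    "build_this": (8, "hook body tool tool tool body synthesis cta"),
    "founders_money": (7, "hook body body body diagram synthesis cta"),
    "future_scenario": (8, "hook body body diagram body body synthesis cta"),
}


def parse_sequences(reference):
    """Split each sequence into a list and verify its declared length and framing."""
    sequences = {}
    for category, (declared, sequence) in reference.items():
        slides = sequence.split()
        if len(slides) != declared:
            raise ValueError(f"{category}: declared {declared}, found {len(slides)}")
        if slides[0] != "hook" or slides[-2:] != ["synthesis", "cta"]:
            raise ValueError(f"{category}: must open with hook and close with synthesis, cta")
        sequences[category] = slides
    return sequences
```

Running the check once when the table is edited catches a slide count that has drifted from its sequence.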
Role-to-SlideType Mapping
Map each archetype role to a renderer slide type when building the carousel spec:
| Archetype Role | Renderer Slide Type | Notes |
|---|---|---|
| | Use |
| | Use |
| | Use |
| | Use |
| | Use |
| | Use |
| | Use title highlight to emphasize the key outcome |
| | Use |
| | Use |
| | Use bullets for listed points |
PHASE 1.5: VISUAL STRATEGY DECISION (Before Writing Content)
Before writing any content, decide the visual strategy for this carousel. You have access to multiple tools -- choose the right ones for the topic.
Available Visual Tools Inventory
| Tool | What It Does | When to Use | How to Invoke |
|---|---|---|---|
| AI Cinematic Images | HD photorealistic/artistic images (Gemini 3 Pro) | Hook/CTA backgrounds, emotional priming, conceptual anchoring | |
| AI Flowcharts/Diagrams | Production-quality flowcharts with text labels, arrows, boxes | Process flows, pipelines, decision trees -- REPLACES TikZ for better visuals | |
| AI Architecture Diagrams | Blueprint-style system diagrams with components and connections | Microservices, tech stacks, system design | |
| AI Infographics/Charts | Bar charts, data visualizations with accurate labels and proportions | Market data, statistics, comparisons | |
| AI Abstract Backgrounds | Neural networks, geometric patterns, cosmic visuals | Slide backgrounds via | |
| TikZ Diagrams | Vector flowcharts in LaTeX (basic but reliable) | Simple 3-5 node flows where AI image gen is overkill | Use |
| Gradient Backgrounds | TikZ-rendered gradient fills with geometric accents | Default for all text-only slides | Set |
CRITICAL: Slide-Type Visual Rule (Experimentally Verified)
This rule was established through controlled A/B experiments (7 strategies, same content, scored 1-10). It overrides gut instinct:
| Slide Type | Visual Strategy | WHY (Experimental Evidence) |
|---|---|---|
| Hook | | Scroll-stopping power. First slide = 80% of engagement. Score: 8.0/10 |
| Body | TEXT-ONLY. No images. | Images on body slides destroy 40% of content space. Text-only scored 8.3/10 vs 5.7/10 with images |
| Diagram | AI-generated diagram as | Gemini 3 Pro generates production-quality flowcharts with readable labels, arrows, and boxes. Far more visually striking than basic TikZ. Use |
| Synthesis | Text-only | Save-worthy reference material. Images would reduce information density. |
| CTA | | Emotional close with visual punch. |
DO NOT put AI images on body slides. This was the single biggest quality mistake found in testing.
DO NOT use browser screenshots on any slides. They always look terrible embedded in carousel slides.
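The text-only rule for body and synthesis slides can be enforced before rendering. This is a minimal sketch, assuming each slide is a dict of the form `{"type": ..., "data": {...}}` (a hypothetical wrapper shape) and that `ai_bg` is the field carrying an AI background image, as in the renderer data files.

```python
# Slide types that must stay text-only, per the rule table above.
TEXT_ONLY_TYPES = {"body", "synthesis"}


def find_visual_rule_violations(slides):
    """Return 1-based positions of text-only slides that carry an AI background."""
    return [
        position
        for position, slide in enumerate(slides, start=1)
        if slide["type"] in TEXT_ONLY_TYPES and slide.get("data", {}).get("ai_bg")
    ]
```

An empty result means no body or synthesis slide has an image attached.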
Visual Strategy Decision Matrix (Topic-Level)
For each topic, determine the primary visual mode, background style, and which slide-level visuals to use:
| Topic Type | Background Style | Hook Visual | Body Visuals | Diagram Strategy | Example |
|---|---|---|---|---|---|
| Philosophy / Mindset | | AI image: symbolic figure | None (text carries weight) | AI-generated concept map | Stoic principles: marble bust + storm |
| Tool Review / SaaS | | AI image: abstract tech glow | None (text-only bullets describe tools) | AI-generated comparison chart | "6 AI Tools": text descriptions + AI chart |
| News / Current Events | | AI image: dramatic scene | None (text with citations) | AI-generated timeline or power map | "AI War 2025": cinematic + AI power map |
| Technical Tutorial | | AI image: conceptual diagram | None (step-by-step text) | AI-generated architecture/flowchart | "Deploy with Docker": AI architecture diagram |
| Business / Strategy | | AI image: bold abstract | None (text with real data citations) | AI-generated bar chart or funnel | "Growth Hacking": AI infographic |
| Comparison / Versus | | AI image: abstract contrast | | AI-generated side-by-side chart | "React vs Vue": comparison columns + AI chart |
| Creative / Design | | AI image: artistic/gallery quality | None (text-only) | AI-generated process flow | "UX Trends 2025": artistic + AI flow |
| Framework / Mental Model | | AI image: system metaphor | None (text explains components) | AI-generated flowchart (preferred over TikZ) | "OODA Loop": AI flowchart as |
| Data / Research | | AI image: data visualization concept | None (text with specific numbers) | AI-generated bar chart / infographic | "AI Market 2025": AI bar chart |
AI Image Generation Best Practices
Model & Routing:
- Use the `generate-image` skill (uses `AI_GATEWAY_API_KEY`). Nano-banana-pro requires `GEMINI_API_KEY` (often unset) but uses the same underlying model.
- Model: `google/gemini-3-pro-image-preview` (primary). Fallback: `google/gemini-3.1-flash-image-preview`.
- Output: ~1408x768 landscape. Overlay compensates for portrait stretch on slides.
Gemini 3 Pro Proven Capabilities (Experimentally Verified):
| Capability | Quality | Best Use in Carousels | Prompt Strategy |
|---|---|---|---|
| Cinematic portraits | Excellent | Hook/CTA backgrounds | 50+ words: materials, lighting, composition, colors, atmosphere |
| Multi-image composition | Excellent (avg 9.6/10) | Hook slides with real faces + screenshots | Aristotelian axioms below. Send base64 to |
| Screenshot → device mockup | Excellent | Tool showcase, product launch slides | "floating laptop/phone mockup, dark studio, reflective surface" |
| Person + screenshot editorial | Excellent | News hooks with evidence | "person as SUBJECT, screenshot as floating holographic EVIDENCE panel" |
| Multi-screenshot dashboard | Excellent | Comparison/versus slides | "floating panels at varied depths, color-coded edge glows, grid floor" |
| Flowcharts | Excellent | Diagram slides as | Describe boxes, arrows, labels, and connections structurally |
| Abstract backgrounds | Excellent | Any slide background | Materials, colors, atmosphere, "no text no words" |
The 7 Aristotelian Axioms for Multi-Image Composition (Experimentally Proven)
These irreducible premises govern ALL multi-image prompts. Every prompt must satisfy all 7:
A1: VISUAL HIERARCHY -- Eye processes: faces > contrast edges > text > color fields. Composition must respect this order.
A2: INPUT TYPE DETERMINES ROLE -- Each input has exactly one role:
- Photo of person → SUBJECT (preserve face, never modify)
- Screenshot/UI → EVIDENCE (float as holographic panel, stylize frame, preserve content)
- Logo/brand → ANCHOR (small, consistent corner placement)
- Abstract/texture → ATMOSPHERE (background only)
A3: UNIFIED LIGHT SOURCE -- All elements share one dominant light direction. Mixed lighting = instant "fake" detection.
A4: DEPTH CREATES DRAMA -- Foreground sharp (subject), midground recessed (screenshots), background soft (atmosphere). 3 layers minimum.
A5: NEGATIVE SPACE IS FUNCTIONAL -- Bottom 30-35% dark for text overlay. Not waste -- it's where the headline goes.
A6: COLOR TEMPERATURE = STORY -- Cool blue/teal = innovation. Warm red = urgency. Split red/blue = competition. Mono + accent = editorial.
A7: NO-TEXT SEAL -- Always end with "absolutely no text, no words, no letters, no watermarks" (outside screenshots).
Proven Scenario Prompt Templates (avg 9.6/10 across 10 tests)
Person + News Screenshot (9.5/10): "Image 1 is [person] -- preserve face, place in left 60%, dramatic side lighting. Image 2 is screenshot -- float as glowing translucent panel, tilted 8 degrees, recessed behind subject, cyan edge glow. Dark moody background, cinematic depth of field. Bottom 30% dark. No text outside screenshot."
Tool Screenshot Showcase (9/10): "Place screenshot on sleek floating laptop mockup angled 15 degrees. Dark gradient background, ambient teal glow from screen. Glossy reflective surface below. Premium Apple product launch aesthetic. No text outside screenshot."
Multi-Screenshot Dashboard (9.5/10): "Arrange as glowing panels floating in dark space, varied depths and angles (5-15 degrees). Largest centered. Color-coded edge glows. Grid floor, particle effects. Digital command center aesthetic. No text outside screenshots."
Person + Screenshots + Logo (10/10): "Person as dominant subject center-left, face preserved. Screenshots as holographic panels around them. Logo small in upper corner with glow. Volumetric light rays, 3-layer depth. No text outside screenshots/logo."
Face-Off + Data (10/10): "Person A on LEFT in profile facing right, red lighting. Person B on RIGHT facing left, blue lighting. Dashboard between them as floating holographic display. Smoke and sparks in the gap. Competitive energy. No text outside screenshot."
Phone in Hand (10/10): "Screenshot on smartphone held in hand from lower-right. Dark background, soft bokeh lights. Screen bright and crisp. Lifestyle photography style. No text outside screenshot."
5-Image Mega (10/10): "2 people (main foreground, secondary recessed) + 2 screenshots (holographic panels, color-coded glows) + logo (corner). Volumetric light, split lighting, multiple depth layers. No text outside screenshots/logo."
Prompt Rules:
- HYPER-DETAILED (50+ words): materials, lighting, composition, colors, atmosphere. Generic = bad.
- Always end with "absolutely no text, no words, no letters, no watermarks" -- AI models add unwanted text otherwise.
- Declare each input's role explicitly (per A2): "Image 1 is a portrait... Image 2 is a screenshot..."
- Specify depth map (per A4): "subject sharp foreground, screenshots floating midground, atmospheric background"
- Lock light direction (per A3): "single dominant light from upper-left, rim light on subject"
- For editorial portraits: add "ABSOLUTELY NO TEXT NO LOGOS NO MAGAZINE ELEMENTS" or Gemini creates TIME covers.
- Overlay opacity sweet spot: 0.60-0.68 for hooks, 0.55-0.65 for diagrams, 0.65-0.70 for CTA.
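The prompt rules can be applied mechanically when assembling a prompt. The following is a minimal sketch: it declares each input's role, appends scene, light, and depth text, and closes with the no-text seal. All function names are hypothetical, and the opacity ranges are copied from the last rule above.

```python
NO_TEXT_SEAL = "absolutely no text, no words, no letters, no watermarks"

# Overlay opacity ranges from the prompt rules (inclusive bounds).
OVERLAY_RANGES = {"hook": (0.60, 0.68), "diagram": (0.55, 0.65), "cta": (0.65, 0.70)}


def build_prompt(input_roles, scene, light, depth):
    """Join role declarations, scene, light and depth, then append the no-text seal."""
    roles = " ".join(
        f"Image {i} is {role}." for i, role in enumerate(input_roles, start=1)
    )
    parts = [roles, scene, light, depth, NO_TEXT_SEAL.capitalize() + "."]
    return " ".join(p.strip() for p in parts if p.strip())


def is_detailed(prompt, min_words=50):
    """Apply the 50+ word rule by counting whitespace-separated words."""
    return len(prompt.split()) >= min_words


def overlay_in_range(slide_type, opacity):
    """Check an overlay opacity against the sweet spot for its slide type."""
    low, high = OVERLAY_RANGES[slide_type]
    return low <= opacity <= high
```

A prompt that fails `is_detailed` is a signal to add materials, lighting, and composition detail rather than to pad it with filler.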
Background Style Selection
Set `bg_style` in the carousel spec or per-slide `data` to control the look:
| `bg_style` | Visual Result | Best For |
|---|---|---|
| | Top-to-bottom gradient with subtle accent glow | All themes. Clean, modern, professional |
| | AI-generated paper/fabric texture | AVOID -- produces grey rock look |
| | Multi-stop gradient with geometric accent shapes | Creative, premium, high-contrast |
| | Flat theme background color | Clean/education themes, data-heavy content |
| (AI background) | Full-bleed AI image with overlay | Dramatic hooks, artistic carousels |
Set it at the spec level for all slides: `"bg_style": "gradient"` in the spec JSON.
Or per-slide for variation: `"data": {"bg_style": "gradient_mesh", ...}` on specific slides.
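The two levels of `bg_style` imply a lookup order. This is a minimal sketch, assuming a per-slide value overrides the spec-level one and that `gradient` is the fallback; the precedence is an assumption, not confirmed renderer behavior, and `resolve_bg_style` is a hypothetical helper.

```python
def resolve_bg_style(spec, slide, default="gradient"):
    """Pick the bg_style for one slide: per-slide value, then spec-level, then default."""
    per_slide = slide.get("data", {}).get("bg_style")
    return per_slide or spec.get("bg_style") or default
```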
Screenshot Capture Protocol -- DEPRECATED
DO NOT use browser screenshots on carousel slides. They consistently look terrible -- low resolution, poorly framed, and badly integrated with the slide design. This was tested extensively and abandoned.
Instead: Use AI-generated images via Gemini 3 Pro for any visual needs:
- Tool/product visuals: Generate an AI illustration or abstract representation
- Data/charts: Generate AI bar charts or infographics (Gemini 3 Pro handles these well)
- Architecture/flows: Generate AI flowcharts or architecture diagrams
- People: Use text descriptions instead of photos
PHASE 2: CONTENT CREATION
- Write the hook (Slide 1) -- Apply the Hook Taxonomy. This slide determines everything.
- Write each slide -- One idea per slide. No exceptions. Apply the Bullshit Test to each.
- Map to renderer data format -- For each slide, create the JSON data object matching the slide type's required fields (see Data fields by slide type in RENDERING SCRIPTS).
- Execute the visual strategy decided in Phase 1.5:
  - Generate AI images per the 2-3 Rule (hook + 1-2 emotional peaks). State each image's telos.
  - Do not capture browser screenshots (deprecated; see Screenshot Capture Protocol). Use AI-generated images for any real tools/products/news referenced.
  - Set `bg_style` per the Background Style Selection table.
  - Use the `diagram` slide type for any process/flow that benefits from a visual.
- Select Instagram music -- Apply the Music Decision Matrix (see MUSIC SELECTION).
- Write the caption -- Front-load value in first 2 lines. Include CTA and hashtags.
- Build the carousel spec JSON -- Assemble all slides into a single spec file for the orchestrator.
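When assembling the spec, slide numbering is easy to get wrong by hand. The sketch below fills `slide_num` and `total_slides` from list position. The top-level `slides` key and the `{"type": ..., "data": {...}}` slide shape are assumptions for illustration; the orchestrator's real schema may differ.

```python
import copy


def number_slides(slides):
    """Return a copy of the slides with slide_num and total_slides filled in."""
    numbered = copy.deepcopy(slides)
    total = len(numbered)
    for position, slide in enumerate(numbered, start=1):
        data = slide.setdefault("data", {})
        data["slide_num"] = position
        data["total_slides"] = total
    return numbered


def build_spec(slides, bg_style="gradient"):
    """Assemble a spec dict; 'slides' is a hypothetical key name."""
    return {"bg_style": bg_style, "slides": number_slides(slides)}
```

Deriving both fields from the list keeps every slide's `total_slides` consistent with the actual slide count.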
PHASE 3: VISUAL PRODUCTION (LaTeX Pipeline)
Use the LaTeX-based rendering pipeline for publication-grade output. This produces slides that match or exceed the quality of accounts with 1M+ followers (Chase AI, Analytics Vidhya, etc.).
The pipeline: LaTeX (TikZ) -> PDF (pdflatex) -> PNG (pdftoppm at 300 DPI) -> resize to 1080x1350
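The first two stages of the pipeline map onto two command-line calls. This is a minimal sketch that only builds the argument lists, assuming `pdflatex` and `pdftoppm` are on the PATH; it does not reproduce what `render_latex_slide.py` does internally, and the final resize to 1080x1350 is left to whatever image tool is in use.

```python
from pathlib import Path

TARGET_SIZE = (1080, 1350)  # final slide size in pixels (4:5 portrait)
RENDER_DPI = 300            # rasterization resolution named in the pipeline


def pipeline_commands(tex_file, out_dir):
    """Build argv lists for LaTeX -> PDF and PDF -> PNG; nothing is executed here."""
    tex = Path(tex_file)
    out = Path(out_dir)
    pdf = out / (tex.stem + ".pdf")
    png_prefix = out / tex.stem  # pdftoppm appends ".png" when -singlefile is set
    return [
        ["pdflatex", "-interaction=nonstopmode", f"-output-directory={out}", str(tex)],
        ["pdftoppm", "-png", "-singlefile", "-r", str(RENDER_DPI), str(pdf), str(png_prefix)],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order.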
Why LaTeX (not Pillow/HTML)
- Knuth-Plass optimal line breaking -- no ugly word wraps
- Professional font kerning and ligatures -- Palatino with microtype
- Native vector diagrams -- TikZ flow charts (fallback for simple diagrams)
- AI image integration -- full-bleed Gemini 3 Pro images for hook/CTA/diagram backgrounds
- Gradient backgrounds -- clean TikZ-rendered gradients for text-only slides
- Publication-grade output -- the same engine that typesets academic papers and books
Step 3a: Generate AI Images (Hook, CTA, Diagrams)
Generate AI images for hook background, CTA background, and optionally diagram backgrounds:
```bash
# Hook background (cinematic, hyper-detailed 50+ word prompt)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Dramatic cinematic split-screen composition: left side dark blue crystalline monolith with electric energy, right side warm golden organic neural network, clash of opposing forces, volumetric lighting, no text no words no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_bg.png

# CTA background (emotional close)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Mesmerizing cosmic portal with swirling deep indigo and purple energy, golden light rays, ethereal atmosphere, no text no words" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/cta_bg.png

# Diagram as AI image (optional -- replaces TikZ for better visuals)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Professional flowchart: Data Collection box connects to Processing box connects to Output box, clean white background, blue and grey, sharp vector style, readable labels" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/diagram_bg.png
```
Step 3b: Render Slides with 7 Slide Types
The LaTeX renderer (`render_latex_slide.py`) supports 7 slide types:
| Type | Description | Best For |
|---|---|---|
| `hook` | Large title + highlighted phrase + subtitle | Cover / first slide |
| `body` | Title + highlighted text + body + bullets | Content-heavy slides, curated list items |
| `comparison` | Multi-column comparison table | Side-by-side analysis |
| `diagram` | Title + TikZ flow diagram (vertical/horizontal) | Architecture, workflows |
| `synthesis` | Styled numbered points with badges | Save-worthy summary |
| `cta` | Centered title + text + handle button | Call to action |
4 Color Themes: `warm` (parchment/terracotta), `clean` (white/blue), `dark` (indigo/purple), `earth` (sage/gold)
Slide 1 (Hook) -- Title with AI background:
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type hook \
  --data tmp/carousel/hook_data.json \
  --output tmp/carousel/slide_01.png \
  --theme dark --brand tmp/carousel/brand.json
```
Where `hook_data.json` contains:
```json
{
  "title": "6 AI Tools That Will",
  "title_highlight": "Replace Your Stack",
  "subtitle": "The tools 10x engineers are switching to.",
  "callout": "Save this!",
  "slide_num": 1,
  "total_slides": 9,
  "ai_bg": "tmp/carousel/hook_bg.png",
  "overlay_opacity": 0.63
}
```
Body slides -- Content-heavy with bullets (gradient bg, NO texture):
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type body \
  --data tmp/carousel/body_data.json \
  --output tmp/carousel/slide_02.png \
  --theme dark --brand tmp/carousel/brand.json
```
Where `body_data.json` contains:
```json
{
  "title": "Why Most Developers",
  "title_highlight": "Get This Wrong",
  "body": "The biggest mistake is...",
  "bullets": ["Point 1", "Point 2"],
  "slide_num": 2,
  "total_slides": 9,
  "bg_style": "gradient"
}
```
NOTE: Always pass data as a JSON file path, never inline JSON. Always include `"bg_style": "gradient"` for text-only slides. Always pass `--brand`.
Comparison slide -- Multi-column:
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type comparison \
  --data tmp/carousel/comparison_data.json \
  --output tmp/carousel/slide_04.png \
  --theme warm --brand tmp/carousel/brand.json
```
Where `comparison_data.json` contains:
```json
{
  "title": "Claude vs GPT",
  "subtitle": "How they compare",
  "columns": [
    {"name": "Claude", "items": [{"label": "Best for", "value": "Complex refactors"}]},
    {"name": "GPT-4", "items": [{"label": "Best for", "value": "Quick prototyping"}]}
  ],
  "slide_num": 4,
  "total_slides": 9,
  "bg_style": "gradient"
}
```
Diagram slide -- AI-generated diagram background (preferred) or TikZ fallback:
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type diagram \
  --data tmp/carousel/diagram_data.json \
  --output tmp/carousel/slide_07.png \
  --theme dark --brand tmp/carousel/brand.json
```

Where `diagram_data.json` contains:

```json
{"title": "The Architecture", "description": "How the tools connect.", "diagram_nodes": [{"label": "Code", "desc": "Write"}, {"label": "Deploy", "desc": "Ship"}, {"label": "Monitor", "desc": "Track"}], "diagram_type": "vertical", "slide_num": 7, "total_slides": 9, "ai_bg": "tmp/carousel/diagram_bg.png", "overlay_opacity": 0.60, "bg_style": "gradient"}
```

Synthesis slide -- Save-worthy numbered summary:
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type synthesis \
  --data tmp/carousel/synthesis_data.json \
  --output tmp/carousel/slide_08.png \
  --theme dark --brand tmp/carousel/brand.json
```

Where `synthesis_data.json` contains:

```json
{"title": "Your Stack", "points": ["Tool 1 for X", "Tool 2 for Y", "Tool 3 for Z"], "slide_num": 8, "total_slides": 9, "bg_style": "gradient"}
```

CTA slide -- with AI background for emotional close:
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type cta \
  --data tmp/carousel/cta_data.json \
  --output tmp/carousel/slide_09.png \
  --theme dark --brand tmp/carousel/brand.json
```

Where `cta_data.json` contains:

```json
{"title": "Want the full breakdown?", "cta_text": "Follow for daily tips.", "handle": "@yourbrand", "slide_num": 9, "total_slides": 9, "show_nav": false, "ai_bg": "tmp/carousel/cta_bg.png", "overlay_opacity": 0.67}
```
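The rule of always passing slide data as a JSON file path can be wrapped in a small helper. The sketch below is an illustration only: `build_render_command` is a hypothetical name, and the flags are the ones shown in the commands above. It writes the data to disk, applies the `"bg_style": "gradient"` default to slides without `ai_bg`, and returns the argument list without running anything.

```python
import json
from pathlib import Path

# Path taken from the commands above; adjust if the skill lives elsewhere.
RENDER_SCRIPT = "~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py"


def build_render_command(slide_type, data, data_path, output_path, theme, brand_path):
    """Write slide data to a JSON file and return the renderer argv (hypothetical helper)."""
    payload = dict(data)
    # Text-only slides (no full-bleed AI background) get the gradient background.
    if "ai_bg" not in payload:
        payload.setdefault("bg_style", "gradient")
    data_path = Path(data_path)
    data_path.parent.mkdir(parents=True, exist_ok=True)
    data_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return [
        "python3", str(Path(RENDER_SCRIPT).expanduser()),
        "--type", slide_type,
        "--data", str(data_path),
        "--output", str(output_path),
        "--theme", theme,
        "--brand", str(brand_path),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the script and brand file exist.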
Step 3d: Full Carousel Generation (Orchestrator)
Generate a complete carousel from a single JSON spec:
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
  --spec carousel_spec.json \
  --output-dir outputs/carousel/ \
  --brand tmp/carousel/brand.json
```

The spec JSON format:

```json
{
  "topic": "6 AI Tools That Will Replace Your Stack",
  "brand": "AI Builder",
  "theme": "dark",
  "bg_style": "gradient",
  "slides": [
    {"type": "hook", "data": {"title": "...", "title_highlight": "...", "ai_bg": "tmp/hook_bg.png", "overlay_opacity": 0.63}},
    {"type": "body", "data": {"title": "...", "bullets": ["..."], "bg_style": "gradient"}},
    {"type": "diagram", "data": {"title": "...", "diagram_nodes": [{"label": "...", "desc": "..."}], "diagram_type": "vertical", "ai_bg": "tmp/diagram_bg.png", "overlay_opacity": 0.60}},
    {"type": "synthesis", "data": {"title": "...", "points": ["..."], "bg_style": "gradient"}},
    {"type": "cta", "data": {"title": "...", "handle": "@brand", "ai_bg": "tmp/cta_bg.png", "overlay_opacity": 0.67}}
  ]
}
```

Spec-level `bg_style` applies to all slides. Per-slide `data.bg_style` overrides it. Options: `"gradient"`, `"gradient_mesh"`, `"solid"`. If omitted, defaults to `"gradient"`. Never use `"texture"`.

The orchestrator auto-injects the brand name and slide numbering, renders all slides, and creates a preview grid.
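The `bg_style` precedence and the injected slide numbering can be pictured with a short sketch. This is not the code of `generate_carousel.py`; `expand_spec` is a hypothetical name, and the sketch only mirrors the rules stated above: per-slide `data.bg_style` beats the spec-level value, the default is `"gradient"`, and `"texture"` is rejected.

```python
ALLOWED_BG_STYLES = {"gradient", "gradient_mesh", "solid"}


def expand_spec(spec):
    """Return per-slide dicts with bg_style resolved and numbering filled in."""
    spec_style = spec.get("bg_style", "gradient")  # documented default
    slides = spec["slides"]
    expanded = []
    for index, slide in enumerate(slides, start=1):
        data = dict(slide["data"])
        # Per-slide data.bg_style overrides the spec-level value.
        style = data.get("bg_style", spec_style)
        if style not in ALLOWED_BG_STYLES:
            raise ValueError(f"slide {index}: unsupported bg_style {style!r}")
        data["bg_style"] = style
        data["slide_num"] = index
        data["total_slides"] = len(slides)
        expanded.append({"type": slide["type"], "data": data})
    return expanded
```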
Brand Configuration System
The design system is fully generalized through brand configs -- JSON files that define visual identity per channel or brand. Pass `--brand brand.json` to any render command.

Brand config JSON format:

```json
{
  "name": "TechStack AI",           // Brand name shown in header
  "logo": "path/to/logo.png",       // Optional: logo image replaces text in header
  "theme": "dark",                  // Base theme: warm, clean, dark, earth
  "accent_override": "6366F1",      // Optional: override accent hex (no #)
  "font_serif": "newpxtext",        // LaTeX serif font package (default: Palatino)
  "header_style": "bold",           // Header text: italic, bold, or plain
  "nav_style": "circle",            // Navigation arrow: circle, arrow, none
  "divider_style": "line",          // Dividers: line, ornament (diamond), dots, none
  "corner_radius": "6pt"            // Rounded corner radius for labels/badges
}
```

3 sample brand configs (in `tmp/brands/`):

| Brand | Theme | Accent | Header | Divider | Character |
|---|---|---|---|---|---|
| TechStack AI | dark | Indigo | Bold | Line | Modern dev/AI content |
| Growth Academy | earth | Amber | Italic | Ornament | Business coaching |
| Code Academy | clean | Blue (default) | Bold | Dots | Educational tutorials |
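Before passing a config to `--brand`, it can help to check it against the option lists in the format above. The sketch below is hypothetical (`validate_brand_config` is not part of the skill) and only checks fields that are present. Note that Python's `json` module rejects `//` comments, so the sketch assumes the file on disk omits the annotations shown in the sample.

```python
import json
import re

# Allowed values copied from the comments in the sample config above.
ALLOWED = {
    "theme": {"warm", "clean", "dark", "earth"},
    "header_style": {"italic", "bold", "plain"},
    "nav_style": {"circle", "arrow", "none"},
    "divider_style": {"line", "ornament", "dots", "none"},
}
HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")


def validate_brand_config(text):
    """Parse brand-config JSON text and return a list of problems (empty if none found)."""
    config = json.loads(text)  # raises ValueError if // comments are left in
    problems = []
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} is not one of {sorted(allowed)}")
    accent = config.get("accent_override")
    if accent is not None and not HEX_COLOR.fullmatch(accent):
        problems.append("accent_override: expected 6 hex digits without '#'")
    return problems
```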
Usage with brand config:

```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type hook \
  --data hook_data.json \
  --output slide.png \
  --theme dark \
  --brand brands/techstartup.json
```

AI Image Integration (Aristotelian Framework): Slides support two AI image zones:
- `ai_image`: Accent illustration placed in a card (hook bottom, body bottom)
- `ai_bg`: Full-bleed background with semi-transparent overlay for text readability
- When no AI image is provided, decorative geometric accents fill empty space automatically
AI Image Integration Principles (First-Principles Framework)
Images in carousels must serve a purpose (telos). Before generating any AI image, name its function in one sentence. If you cannot, do not generate it.
The Three Teloi (Purposes) of Carousel Images:
| Telos | When to Use | Image Form | Example |
|---|---|---|---|
| Emotional Priming | Create a feeling before text is read | Atmospheric, evocative, human/natural | Marble bust for philosophy, neon cityscape for tech |
| Conceptual Anchoring | Give abstract ideas a visual handle | Symbolic, metaphorical, illustrative | Storm figure for "amor fati", network diagram for systems |
| Authority Signaling | Establish credibility through proof | Documentary, screenshots, concrete | Product screenshot, data chart, real photo |
The 2-3 Rule (Golden Mean): In an 8-10 slide carousel, use AI images on exactly 2-3 slides. Always the hook (slide 1) and CTA (last slide). Optionally the diagram slide with an AI-generated diagram as `ai_bg`. Never on body slides -- visual fatigue destroys reading rhythm and costs 40% content space.

Image Placement Decision Matrix:
| Slide Type | AI Image? | Zone | Reasoning |
|---|---|---|---|
| `hook` | Always | `ai_bg` | Scroll-stop power: atmospheric image + typography > typography alone (Axiom 1, 3) |
| `body` | Never | -- | Text carries the weight; images destroy 40% content space for minimal gain |
| `diagram` | Preferred | `ai_bg` | Gemini 3 Pro generates production-quality flowcharts with readable labels, arrows, and boxes -- far more visually striking than basic TikZ. TikZ remains as fallback for simple flows. |
| `synthesis` | Never | -- | Numbered points ARE the content; keep text-only with gradient bg |
| `cta` | Always | `ai_bg` | Emotional close: atmospheric image creates a feeling of resolution |
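The placement rules lend themselves to a mechanical check. The following is a minimal sketch, assuming slides are represented as in the spec format (`type` plus `data`) and that an AI image is signalled by an `ai_bg` or `ai_image` key; `check_image_placement` is a hypothetical helper, not part of the skill.

```python
REQUIRED_AI = {"hook", "cta"}          # hook and CTA always carry an AI image
FORBIDDEN_AI = {"body", "synthesis"}   # body and synthesis slides stay text-only


def check_image_placement(slides):
    """Return a list of rule violations for a list of {"type", "data"} slides."""
    problems = []
    ai_count = 0
    for number, slide in enumerate(slides, start=1):
        data = slide.get("data", {})
        has_ai = "ai_bg" in data or "ai_image" in data
        ai_count += has_ai
        if slide["type"] in REQUIRED_AI and not has_ai:
            problems.append(f"slide {number} ({slide['type']}): AI image expected")
        if slide["type"] in FORBIDDEN_AI and has_ai:
            problems.append(f"slide {number} ({slide['type']}): AI image not allowed")
    if not 2 <= ai_count <= 3:
        problems.append(f"{ai_count} slides use AI images; the 2-3 Rule expects 2 or 3")
    return problems
```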
Prompt Engineering for Consistency: All AI images in a single carousel MUST share a consistent style prefix. Build the prefix from the content vertical:
| Content Vertical | Style Prefix for AI Image Prompts |
|---|---|
| Mindset/Philosophy | "warm earthy tones, parchment cream, watercolor or classical art style, muted terracotta accents, editorial quality" |
| Tech/AI | "dark indigo and purple tones, subtle geometric patterns, clean digital art, neon accents, futuristic" |
| Business/Strategy | "warm amber and gold tones, bold professional graphics, rich depth, confident and energetic" |
| Education | "clean white and blue tones, flat illustration style, precise and clear, minimal and modern" |
| Creative/Design | "dark charcoal with bold accent colors, artistic and expressive, gallery quality, intentional composition" |
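One way to keep the prefix consistent is to build every prompt through a single function. This is a sketch only: `build_image_prompt` is a hypothetical helper, the prefixes are copied from the table above, and the "no text" suffix follows the key rule given in the AI Visual Generation section.

```python
STYLE_PREFIXES = {
    "Mindset/Philosophy": "warm earthy tones, parchment cream, watercolor or classical art style, muted terracotta accents, editorial quality",
    "Tech/AI": "dark indigo and purple tones, subtle geometric patterns, clean digital art, neon accents, futuristic",
    "Business/Strategy": "warm amber and gold tones, bold professional graphics, rich depth, confident and energetic",
    "Education": "clean white and blue tones, flat illustration style, precise and clear, minimal and modern",
    "Creative/Design": "dark charcoal with bold accent colors, artistic and expressive, gallery quality, intentional composition",
}
NO_TEXT_SUFFIX = "no text, no words, no letters"


def build_image_prompt(vertical, subject, is_labeled_diagram=False):
    """Join the vertical's style prefix, the subject, and (usually) the no-text suffix."""
    parts = [STYLE_PREFIXES[vertical], subject]
    # Diagrams with labels are the one case where text in the image is wanted.
    if not is_labeled_diagram:
        parts.append(NO_TEXT_SUFFIX)
    return ", ".join(parts)
```

Calling it for the hook and the CTA with the same `vertical` gives both prompts an identical opening, which is the point of the shared prefix.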
Text Readability is Inviolable: If using `ai_bg` (full-bleed), overlay opacity must ensure WCAG AA contrast (4.5:1). Minimum `overlay_opacity: 0.55`. Proven ranges: hook 0.60-0.68, diagram 0.55-0.65, CTA 0.65-0.70.

What NOT to generate: Generic stock-photo-style images (people in offices, handshakes, generic landscapes). If the image could illustrate any topic, it fails the Telos Test.
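The 0.55 overlay-opacity floor stated above can be sanity-checked with the WCAG 2.x contrast formula. The sketch below assumes white text, a black overlay, alpha blending in sRGB space, and a worst-case pure-white pixel in the background image. These are assumptions for illustration, and `overlay_contrast` is a hypothetical helper, not part of the renderer.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 channels."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors (always >= 1)."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def overlay_contrast(background_rgb, overlay_opacity,
                     text_rgb=(255, 255, 255), overlay_rgb=(0, 0, 0)):
    """Contrast of text against a background pixel seen through the overlay."""
    blended = tuple(
        o * overlay_opacity + b * (1.0 - overlay_opacity)
        for b, o in zip(background_rgb, overlay_rgb)
    )
    return contrast_ratio(text_rgb, blended)
```

Under these assumptions, `overlay_contrast((255, 255, 255), 0.55)` lands a little above 4.5, while an opacity of 0.50 falls below it.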
AI Visual Generation (via generate-image skill)
Generate AI images for hook backgrounds, CTA backgrounds, and diagram visuals using the `generate-image` skill (requires `AI_GATEWAY_API_KEY`):

```bash
# Hook background -- cinematic, atmospheric, scroll-stopping
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Dramatic cinematic split-screen composition, glowing neon circuits on dark background, volumetric lighting, deep indigo and electric purple tones, no text, no words, no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_bg.png

# CTA background -- emotional close
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Abstract convergence of light streams on dark background, warm golden highlights, sense of resolution and completeness, cinematic atmosphere, no text, no words" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/cta_bg.png

# Diagram as AI image (preferred over TikZ for complex flows)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Professional flowchart: Data Collection box connects to Processing box connects to Output box, clean white boxes on dark blue background, arrows between nodes, minimal corporate design" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/diagram_bg.png
```

**Key rules**: Always add "no text, no words, no letters" unless the image IS a diagram with labels. Use hyper-detailed prompts (50+ words) for best results.

Viral Hook Compositing Pipeline (PIL)
For viral-style hook slides matching accounts like @evolving.ai and @therundownai, use a two-step pipeline:
Step 1: Generate cinematic base image with Gemini 3 Pro (topic-specific, dramatic composition):
```bash
# Multi-person composition (best for news/war/rivalry topics)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Cinematic photomontage: three powerful figures in dramatic formation, center figure is a humanoid AI robot with glowing eyes, flanking figures are business leaders in dark suits, red and blue dramatic lighting, dark moody background, editorial magazine composition, hyper-detailed" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png

# Single portrait (best for profile/biography/interview topics)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Editorial portrait: distinguished elder with glasses, warm ambient lighting, slightly blurred conference background, shallow depth of field, photojournalistic style, natural expression, cinematic color grading" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png

# Face-off composition (best for comparison/versus topics)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Dramatic face-off: two opposing figures in profile facing each other, one in cool blue lighting one in warm orange, city skyline between them, energy effects and particles, dark cinematic atmosphere, epic confrontation" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png
```

**Step 2a: News-editorial style** (matches @therundownai -- single person, big headline):
```bash
# The dollar sign is escaped so the shell does not expand $1 inside the double quotes.
python3 scripts/compose_news_hook.py \
  --base tmp/carousel/hook_base.png \
  --output tmp/carousel/slide_01_hook.png \
  --headline "OpenAI just hit \$13B ARR making it the fastest-growing software company in history" \
  --category "AI NEWS" \
  --brand "@DailyAINews"
```

The `compose_news_hook.py` script (editorial style):
- Subtle bottom gradient (ease-in, configurable start/strength)
- Small category label above headline
- MASSIVE bold headline (Inter Black, auto-sized 42-72px to fill bottom 35%)
- Optional brand mark top-left
- Clean, minimal -- no slide counter, no CTA, no subhead
- Best for: single-person portrait + news headline
Step 2b: Multi-person viral style (compose_hook.py -- multi-person, full overlay):
```bash
python3 scripts/compose_hook.py \
  --base tmp/carousel/hook_base.png \
  --output tmp/carousel/slide_01_hook.png \
  --headline "THE AI WAR JUST ESCALATED" \
  --subhead "3 moves that changed everything this week" \
  --brand "YOUR BRAND" \
  --category "AI NEWS"
```

The `compose_hook.py` script (viral style):
- Bottom gradient overlay (0-220 alpha, ease-in curve) for text readability
- Light top gradient for brand area
- Category label (upper-left, e.g., "AI NEWS")
- Brand watermark (centered)
- Word-wrapped bold headline (bottom area, all-caps)
- Optional subheadline
- "SWIPE FOR MORE" CTA with decorative line
- Slide counter (top-right, "1/8")
- Best for: multi-person compositions, face-off style
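The "0-220 alpha, ease-in curve" gradient in the list above can be pictured as a per-row alpha ramp. The sketch below is not the code of `compose_hook.py`; it assumes a quadratic ease-in and a gradient that starts halfway down the slide, both of which are illustrative choices, and `ease_in_alpha_ramp` is a hypothetical name.

```python
def ease_in_alpha_ramp(height, start_fraction=0.5, max_alpha=220):
    """Return one alpha value per pixel row: 0 above the gradient, max_alpha at the bottom."""
    start_row = int(height * start_fraction)
    span = max(height - 1 - start_row, 1)
    ramp = []
    for row in range(height):
        # t is 0.0 at the gradient start and 1.0 on the bottom row.
        t = max(row - start_row, 0) / span
        # Quadratic ease-in: the overlay darkens slowly at first, then quickly.
        ramp.append(round(max_alpha * t * t))
    return ramp
```

Each value can then be used as the alpha of a black row in an RGBA overlay that is composited over the base image.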
Prompt Strategy by Topic Type:
| Topic Type | Base Image Style | Score |
|---|---|---|
| News/current events | Multi-person photomontage + robot | 8.5/10 |
| Comparison/versus | Face-off composition with opposing energy | 8.5/10 |
| Profile/biography | Single editorial portrait | 8/10 |
| Tools/abstract | Silhouette with holographic/tech backdrop | 7.5/10 |
For educational/tutorial/framework topics, AI-generated compositions work excellently (8-8.5/10).
Real-Face Hook Pipeline (for news/current events topics)
When the topic involves specific real people (Sam Altman, Elon Musk, Jensen Huang, etc.), use web-sourced Creative Commons photos instead of AI generation:
BEST Approach: Base64 multi-image via AI Gateway (10/10)
Send local photos as base64 data URIs to `/api/v1/images/generations`. This bypasses URL accessibility issues (Wikimedia blocked, etc.) and supports ALL local images including 3+ people.

```python
import base64, json, os
from pathlib import Path
from urllib import request

API_KEY = os.environ["AI_GATEWAY_API_KEY"]
BASE = "https://ai-gateway.happycapy.ai/api/v1"  # NOT /openai/v1 !

# Load photos as base64 data URIs
images_b64 = []
for photo in ["elon_musk.jpg", "jensen_huang.jpg", "sam_altman.jpg"]:
    data = base64.b64encode(Path(photo).read_bytes()).decode()
    images_b64.append(f"data:image/jpeg;base64,{data}")

payload = {
    "model": "google/gemini-3-pro-image-preview",
    "prompt": "Create a dramatic face-off style composition with these three tech leaders. "
              "Confrontational layout, intense red vs blue split lighting, dark background "
              "with smoke/particle effects. Faces must remain photorealistic and recognizable.",
    "images": images_b64,
    "response_format": "url",
    "n": 1
}

req = request.Request(
    f"{BASE}/images/generations",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "Origin": "https://trickle.so"
    },
    method="POST"
)

with request.urlopen(req, timeout=180) as resp:
    result = json.loads(resp.read())
    img_url = result["data"][0]["url"]
    # Download and save...
```

CRITICAL: Use `/api/v1/images/generations` (NOT `/api/v1/openai/v1/images/generations`). The OpenAI-prefixed endpoint rejects the `images` parameter.
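The `# Download and save...` step is left open in the snippet above. One possible completion is sketched here, assuming the response has the `data[0].url` shape used in that snippet and that the URL can be fetched without extra headers (both are assumptions; `save_first_image` is a hypothetical helper).

```python
from pathlib import Path
from urllib import request


def save_first_image(result, output_path, timeout=60):
    """Download result["data"][0]["url"] and write the bytes to output_path."""
    img_url = result["data"][0]["url"]
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with request.urlopen(img_url, timeout=timeout) as resp:
        output_path.write_bytes(resp.read())
    return output_path
```

For the hook pipeline this would be called as `save_first_image(result, "tmp/carousel/hook_base.png")`.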
**Alternative: transform_image.py with Flickr URLs (9.5/10)**
When photos are available at Flickr URLs (directly accessible by Vertex AI):
```bash
python3 ~/.claude/skills/generate-image/scripts/transform_image.py \
  "Create a dramatic cinematic photomontage combining these tech leaders. \
  Dark dramatic background with blue and red lighting. Keep faces EXACTLY as they appear." \
  "https://live.staticflickr.com/7832/33377877458_d1a3774615_b.jpg" \
  "https://live.staticflickr.com/5767/30796823531_85932ecaa0_b.jpg" \
  --model "google/gemini-3-pro-image-preview" \
  --output tmp/carousel/hook_base.png
```

Photo sourcing rules:
- Use Creative Commons (CC BY 2.0+) photos from Flickr, Wikimedia Commons
- Flickr URLs accessible by Vertex AI; Wikimedia URLs often blocked
- Use `urllib.request` with browser User-Agent for Wikimedia downloads to local files
- For local-only files (Wikimedia downloads), use the base64 approach above
- Include CC attribution in carousel caption
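For the Wikimedia download rule, a minimal `urllib.request` sketch might look as follows. The User-Agent string is a placeholder, and `build_browser_request` and `fetch_to_file` are hypothetical helper names.

```python
from pathlib import Path
from urllib import request

# Placeholder browser-style User-Agent; substitute a real, current one.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_browser_request(url):
    """Return a Request that identifies itself with a browser User-Agent."""
    return request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_to_file(url, dest, timeout=60):
    """Download url to dest, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with request.urlopen(build_browser_request(url), timeout=timeout) as resp:
        dest.write_bytes(resp.read())
    return dest
```

The saved local file can then go through the base64 approach, since the gateway never needs to reach the Wikimedia URL itself.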
Fallback: PIL rembg composite (7/10)
```bash
pip install rembg  # One-time setup
# Remove backgrounds, composite onto AI background, apply compose_hook.py overlay
```

**nano-banana-pro status:** The native google-genai SDK requires GEMINI_API_KEY (not set). The AI Gateway has no Gemini-native endpoint, so routing the SDK through the gateway fails (404). The base64 approach above achieves the same multi-image composition capability via the AI Gateway's image generation endpoint.
PHASE 4: MUSIC SELECTION
Select from Instagram's available music library. Do NOT generate music. Apply the Music Decision Matrix to recommend 2-3 specific tracks the user can search for on Instagram.
PHASE 5: QUALITY REVIEW & EXPORT
Run the final checklist (see APPENDIX) against every slide. Re-render any slide that fails. Output:
- 7-10 slide PNG images at 1080x1350
- Caption text with hashtags
- Music recommendation (Instagram track names + artists)
- Posting notes (best time, engagement strategy)
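The 1080x1350 requirement is easy to spot-check before export. The sketch below reads the width and height from a PNG's IHDR chunk using only the standard library. `png_size` and `is_valid_slide` are illustrative names, not functions from this skill's scripts (assemble_carousel.py performs the real validation).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (1080, 1350)  # canvas size required by this skill


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # The IHDR payload starts at byte 16: 4-byte width, 4-byte height, big-endian.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_valid_slide(data: bytes) -> bool:
    """True when the PNG matches the 1080x1350 canvas."""
    return png_size(data) == EXPECTED_SIZE
```

A typical call would be `is_valid_slide(Path("slide_01.png").read_bytes())` for each exported slide; the file name here is only an example.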
THE 6 FOUNDATIONAL AXIOMS
Every decision in this skill traces back to these irreducible premises:
AXIOM 1: Attention is Finite and Contested
A human scrolling Instagram makes a stay-or-leave decision in ~1.3 seconds. The first slide is a survival test. Visual pattern interrupts trigger involuntary attention. Cognitive curiosity gaps (Zeigarnik effect) create forward momentum. The cost of starting to swipe is high; the cost of continuing is near-zero.
AXIOM 2: Value is the Only Sustainable Currency
Content that does not leave the viewer materially better off is noise. Save rate is the purest signal of value. Share rate = social currency. "Useful" is domain-specific.
AXIOM 3: Visual Cognition Precedes Textual Cognition
The brain processes visual information 60,000x faster than text. Color communicates emotion before words. Spatial hierarchy dictates reading order. Consistency creates cognitive fluency. One dominant visual per slide.
AXIOM 4: Narrative Arc is Hardwired
Content structured as narrative is retained 22x better than lists. Each slide must resolve the previous curiosity gap AND create the next one. The arc must reach genuine resolution.
AXIOM 5: The Medium Constrains and Enables
1080x1350 canvas on a 6-inch screen in half-attention. Minimum readable font = 24px. Bottom ~15% occluded by UI. Portrait (4:5) occupies maximum screen real estate.
AXIOM 6: Audio Creates Emotional Context
Music activates the limbic system independently. Instagram's algorithm rewards music usage with 15-30% more reach. Genre signals tribal identity. Trending audio boosts discovery if it genuinely fits.
THE 7 CAROUSEL ARCHETYPES
Auto-select the best archetype based on the topic. Each archetype has a specific slide structure, value test, and music profile.
1. TUTORIAL (How-To)
Slide 1: Problem statement (hook)
Slide 2: Tool/method introduction
Slide 3: Step 1 (with visual)
Slide 4: Step 2
Slide 5: Step 3
Slide 6: Step 4 (if needed)
Slide 7: Result / proof it works
Slide 8: Common mistakes to avoid
Slide 9: Quick-reference summary (save-worthy)
Slide 10: CTA
Value Test: Can the reader DO the thing after reading?
Music Profile: Lo-fi/chillhop, 70-85 BPM, instrumental
2. FRAMEWORK (Mental Model)
Slide 1: Common problem everyone faces (hook)
Slide 2: Why existing approaches fail
Slide 3: The framework name + overview
Slide 4: Component 1 explained
Slide 5: Component 2 explained
Slide 6: Component 3 explained
Slide 7: How the components connect (diagram)
Slide 8: Practical application example
Slide 9: The complete framework visual (save-worthy)
Slide 10: CTA
Value Test: Does the reader now have a reusable thinking tool?
Music Profile: Minimal electronic, 90-110 BPM, instrumental
3. MYTH-BUSTER (Contrarian Insight)
Slide 1: "Everyone thinks X" (hook)
Slide 2: "Here's what's actually happening"
Slide 3: Evidence 1
Slide 4: Evidence 2
Slide 5: Evidence 3
Slide 6: The real framework / truth
Slide 7: Implications
Slide 8: What to do instead
Slide 9: The mental model shift (save-worthy)
Slide 10: CTA
Value Test: Has the reader's mental model shifted?
Music Profile: Trip-hop/downtempo, 85-100 BPM, instrumental
4. CASE STUDY (Proof-Based)
Slide 1: The result / shocking metric (hook)
Slide 2: The context / starting point
Slide 3: What was done (overview)
Slide 4: Step 1 of the process
Slide 5: Step 2
Slide 6: Step 3
Slide 7: The data / proof
Slide 8: Key insight
Slide 9: How you can replicate it (save-worthy)
Slide 10: CTA
Value Test: Is the specific mechanism replicable?
Music Profile: Upbeat electronic, 110-120 BPM, light vocals OK
5. CURATED LIST (Resource Compilation)
Slide 1: "X Tools/Resources for Y" (hook)
Slide 2: Item 1 + why it's valuable
Slide 3: Item 2 + why
Slide 4: Item 3 + why
Slide 5: Item 4 + why
Slide 6: Item 5 + why
Slide 7: Item 6 + why (if needed)
Slide 8: Item 7 + why (if needed)
Slide 9: Comparison / selection guide (save-worthy)
Slide 10: CTA
Value Test: Can the reader immediately use at least 3 of these?
Music Profile: Chill beats/lo-fi, 75-90 BPM, instrumental
6. DEEP DIVE (Technical Explanation)
Slide 1: The concept + why it matters (hook)
Slide 2: What most people get wrong
Slide 3: How it actually works (simplified)
Slide 4: Visual diagram / mechanism
Slide 5: Practical example 1
Slide 6: Practical example 2
Slide 7: Common mistakes
Slide 8: Pro tips
Slide 9: The complete mental model (save-worthy)
Slide 10: CTA
Value Test: Does the reader understand the mechanism, not just the surface?
Music Profile: Ambient/atmospheric, 60-80 BPM, instrumental only
7. TRANSFORMATION (Before/After)
Slide 1: The "after" result (hook)
Slide 2: The "before" state / the pain
Slide 3: The discovery / turning point
Slide 4: The change in approach
Slide 5: Step 1 of the new way
Slide 6: Step 2
Slide 7: Step 3
Slide 8: The complete "after" state with proof
Slide 9: How to start your transformation (save-worthy)
Slide 10: CTA
Value Test: Can the reader see themselves in the transformation?
Music Profile: Progressive/building, 80-120 BPM arc, light vocals OK
THE 6 HOOK PATTERNS
The first slide determines everything. Select the best hook pattern for the topic:
1. The Curiosity Gap
"Claude Code has a memory problem. Here's how to fix it for free."
States a problem the audience recognizes + promises a solution. Optionally removes an objection ("for free", "in 5 minutes").
2. The Contrarian Statement
"Stop using RAG. There's a better way."
Contradicts a common belief. Creates cognitive dissonance that demands resolution.
3. The Specific Result
"This setup saved me 4 hours per week of prompt debugging."
Concrete numbers bypass the vague-promise filter. Specificity = credibility.
4. The Analogy Bridge
"Your AI agent's memory works like a messy desk. Here's how to organize it."
Maps unfamiliar onto familiar. Creates instant comprehension.
5. The "You're Doing It Wrong"
"90% of developers use Claude Code wrong. Are you one of them?"
Identity-based challenge. Use sparingly -- dangerous if overused.
6. The Stack / Combination
"Obsidian + Claude Code = unlimited AI memory"
Two known things combined unexpectedly. The "+" implies synergy.
THE BULLSHIT TEST (Mandatory Quality Gate)
Every single slide must pass ALL 3 conditions before rendering. No exceptions.
Condition 1: SPECIFICITY
Does this contain a concrete, actionable insight that could NOT be guessed by someone with zero domain knowledge?
- FAIL: "Use the right tools for the job"
- PASS: "Obsidian's graph view lets Claude Code traverse 10x more documents by following wiki-links between markdown files"
Condition 2: NOVELTY
Does this present a connection, framework, or technique the viewer has likely NOT encountered before?
- FAIL: "AI is changing the world"
- PASS: "By creating bidirectional links between your docs, you turn Claude Code's context window into a navigation system instead of a storage container"
Condition 3: DENSITY
Could the same information be compressed further without loss of meaning? If yes, it is padded and needs to be tightened.
- FAIL: "There are many benefits to using this approach, including several key advantages that make it worthwhile"
- PASS: "3 benefits: 10x doc navigation, auto-linked memory, zero-config setup"
If a slide fails any condition, rewrite it before rendering.
VISUAL DESIGN SYSTEM
Typography Hierarchy
| Element | Size | Weight | Font Type |
|---|---|---|---|
| Slide Title | 64-80px | Bold/Black (700-900) | Strong serif OR geometric sans |
| Subtitle / Hook | 32-40px | SemiBold (600) | Same family as title |
| Body Text | 24-28px | Regular (400) | Clean sans-serif |
| Bullet Points | 22-26px | Regular (400) | Same as body |
| Labels / Citations | 16-20px | Light (300) | Same as body |
| Slide Indicator | 14-16px | Light (300) | Sans-serif |
Rules:
- Maximum 2 fonts per carousel
- Title font and body font must pair well
- NEVER go below 24px for any text the reader must understand
- Consistent across ALL slides
Color Palettes by Content Vertical
Tech / AI / Coding:
- Background: `#0D1117` (deep dark) or `#1A1A2E` (midnight blue)
- Primary text: `#E6EDF3` (near-white) or `#F0F6FC`
- Accent: `#7C3AED` (electric purple) or `#3B82F6` (bright blue)
- Secondary: `#6B7280` (muted gray)
Business / Strategy:
- Background: Linear gradient `#F97316` to `#EAB308` (warm amber) or `#FFF7ED` (cream)
- Primary text: `#1C1917` (near-black)
- Accent: `#DC2626` (confident red) or `#F59E0B` (gold)
- Secondary: `#78716C` (warm gray)
Education / How-To:
- Background: `#FFFFFF` (clean white) or `#F8FAFC` (cool off-white)
- Primary text: `#0F172A` (dark slate)
- Accent: `#2563EB` (trust blue) or `#0EA5E9` (sky blue)
- Secondary: `#64748B` (slate gray)
Design / Creative:
- Background: `#18181B` (charcoal) or `#FAFAFA` (near-white)
- Primary text: Inverse of background
- Accent: ONE bold color (`#EC4899` magenta, `#10B981` emerald, or `#F59E0B` amber)
- Secondary: `#71717A` (zinc)
Mindset / Growth:
- Background: `#F5F0EB` (warm neutral) or `#1B3A2D` (forest dark)
- Primary text: `#2D2416` (earth brown) or `#E8E0D5` (warm light)
- Accent: `#16A34A` (forest green) or `#B45309` (amber earth)
- Secondary: `#8B7355` (warm mid-tone)
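One way to sanity-check that a palette's primary text stays readable on its background is a contrast-ratio calculation. The sketch below uses the WCAG 2.x relative-luminance formula, which is a swapped-in check and not something this skill prescribes; the function names are illustrative.

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 integers."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color (0.0 = black, 1.0 = white)."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = hex_to_rgb(hex_color)
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

For example, `contrast_ratio("#E6EDF3", "#0D1117")` evaluates the Tech / AI / Coding text on its dark background; any pass threshold (such as 4.5 or 7) is a choice left to the designer.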
Layout Rules
- Canvas: 1080 x 1350 px (4:5 portrait) -- ALWAYS
- Margins: 60px minimum on all sides
- Safe Zone: Center 80% (top/bottom 10% may be occluded by Instagram UI)
- One idea per slide: If a slide has two ideas, split it into two slides
- Visual anchor: Every slide needs ONE dominant visual element
- Breathing room: Content should never feel cramped -- generous whitespace signals quality
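The canvas, margin, and safe-zone rules combine into one content rectangle. A minimal sketch of that arithmetic, assuming the 10% occlusion applies only to the top and bottom edges (as the parenthetical in the safe-zone rule suggests) and that horizontal limits come from the 60px margin alone; the names are illustrative.

```python
from fractions import Fraction

CANVAS_W, CANVAS_H = 1080, 1350  # 4:5 portrait canvas
MARGIN = 60                      # minimum margin on every side, in px
OCCLUDED_PERCENT = 10            # top and bottom strips Instagram UI may cover


def content_box() -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the area that is safe for content."""
    # Vertically, the stricter of the margin and the occluded strip wins.
    vertical_guard = max(MARGIN, CANVAS_H * OCCLUDED_PERCENT // 100)
    return (MARGIN, vertical_guard, CANVAS_W - MARGIN, CANVAS_H - vertical_guard)


def is_four_by_five(width: int, height: int) -> bool:
    """True when width:height reduces exactly to 4:5."""
    return Fraction(width, height) == Fraction(4, 5)
```

Under these assumptions the content box is (60, 135, 1020, 1215), a 960x1080 area in which titles, body text, and the visual anchor should sit.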
MUSIC SELECTION (Instagram Library)
Do NOT generate music. Recommend specific tracks available on Instagram's music library.
Music Decision Matrix
| Content Type | Search Keywords on Instagram | BPM Range | Vocals | Example Tracks to Search |
|---|---|---|---|---|
| Tech / AI | "lo-fi", "chill beats", "trip-hop" | 70-90 | No | DJ Shadow - Six Days, Nujabes - Aruarian Dance, Tycho - A Walk, Bonobo - Kerala |
| Business | "indie electronic", "future bass" | 100-120 | Minimal | ODESZA - A Moment Apart, Rufus Du Sol - Innerbloom, Bicep - Glue |
| Tutorial | "study beats", "chillhop", "acoustic" | 75-95 | No | Idealism - Lovely Day, Jinsang - Solitude, Tomppabeats - Monday Loop |
| Motivational | "epic", "cinematic", "uplifting" | 110-130 | Optional | M83 - Midnight City, Hans Zimmer - Time, Illenium - Good Things Fall Apart |
| Creative | "minimal techno", "ambient", "art" | 90-115 | No | Four Tet - Two Thousand and Seventeen, Jon Hopkins - Emerald Rush, Kiasmos - Blurred |
| Myth-Buster | "dark ambient", "post-rock", "mysterious" | 80-100 | No | Massive Attack - Teardrop, Radiohead - Everything In Its Right Place, Portishead - Wandering Star |
| Case Study | "upbeat", "indie pop", "electronic" | 110-125 | Light | Washed Out - Feel It All Around, Toro y Moi - So Many Details, M83 - Wait |
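The matrix can be treated as a lookup table. The sketch below encodes the search keywords, BPM ranges, and vocal guidance from the rows above, and applies this section's rule that text-heavy carousels are always instrumental. `MusicProfile` and `music_profile` are illustrative names, and the lowercase keys are an assumed normalization of the Content Type column.

```python
from typing import NamedTuple


class MusicProfile(NamedTuple):
    keywords: tuple   # Instagram search keywords
    bpm_range: tuple  # (min_bpm, max_bpm)
    vocals: str       # vocal guidance from the matrix


MUSIC_MATRIX = {
    "tech / ai": MusicProfile(("lo-fi", "chill beats", "trip-hop"), (70, 90), "No"),
    "business": MusicProfile(("indie electronic", "future bass"), (100, 120), "Minimal"),
    "tutorial": MusicProfile(("study beats", "chillhop", "acoustic"), (75, 95), "No"),
    "motivational": MusicProfile(("epic", "cinematic", "uplifting"), (110, 130), "Optional"),
    "creative": MusicProfile(("minimal techno", "ambient", "art"), (90, 115), "No"),
    "myth-buster": MusicProfile(("dark ambient", "post-rock", "mysterious"), (80, 100), "No"),
    "case study": MusicProfile(("upbeat", "indie pop", "electronic"), (110, 125), "Light"),
}


def music_profile(content_type: str, text_heavy: bool = False) -> MusicProfile:
    """Look up a matrix row; force instrumental-only for text-heavy carousels."""
    profile = MUSIC_MATRIX[content_type.strip().lower()]
    if text_heavy:
        profile = profile._replace(vocals="No")
    return profile
```

The returned keywords are search terms only; the specific tracks still need to be confirmed as available in Instagram's library at posting time.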
Music Selection Rules
- Text-heavy carousels: ALWAYS instrumental only (vocals compete with reading)
- Visual-heavy carousels: Vocals acceptable (separate processing channels)
- Trending audio: Use ONLY if it genuinely fits the content type. Mismatched trending sounds damage authenticity
- Trending audio lifecycle: Discovery (Day 0-3, max boost) -> Growth (Day 3-14, good) -> Peak (Day 14-30, OK) -> Saturation (Day 30+, skip)
- Output format: Provide 2-3 track recommendations with artist name, track name, and why it fits
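The trending-audio lifecycle above maps the age of a sound to a stage. The day ranges in the rule share their boundary days (Day 3 appears in both Discovery and Growth), so this sketch assumes each upper bound is exclusive; the function names are illustrative.

```python
def trending_stage(days_since_emergence: int) -> str:
    """Map the age of a trending sound, in days, to its lifecycle stage."""
    if days_since_emergence < 0:
        raise ValueError("days_since_emergence must be non-negative")
    if days_since_emergence < 3:
        return "discovery"   # max boost
    if days_since_emergence < 14:
        return "growth"      # good
    if days_since_emergence < 30:
        return "peak"        # OK
    return "saturation"      # skip


def worth_using(days_since_emergence: int, fits_content: bool) -> bool:
    """A trending sound is worth using only if it fits and is not saturated."""
    return fits_content and trending_stage(days_since_emergence) != "saturation"
```

How the age of a sound is measured is left open here; the rules above do not say where that number comes from.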
CAPTION TEMPLATE
[Hook line -- front-load value, must be compelling in first 2 lines before "...more"]
[2-3 sentences expanding the core value proposition]
[Key points:]
- Point 1 (specific, not vague)
- Point 2
- Point 3
[Specific CTA -- NOT "What do you think?" but rather a specific question or action]
[5-15 hashtags with distribution:]
[2-3 broad (100K-1M posts)] [3-5 niche (10K-100K)] [2-3 community (1K-10K)] [1-2 branded]
INSTAGRAM ALGORITHM OPTIMIZATION
- Save Rate is the #1 signal. Design every carousel to be save-worthy. Include a synthesis/mental-model slide.
- 10-slide carousels outperform shorter ones by ~30% in save rate
- Dwell time: More slides = more time on post = algorithm reward
- Music adds ~15-30% reach boost
- Re-engagement: Instagram re-shows carousels to users who did not swipe all the way through
- First hour: Posts saved within the first hour get exponential distribution
- Hashtags: Put in caption (not first comment). 5-15 total.
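The hashtag guidance (5-15 total, split across broad, niche, community, and branded tiers in the caption template) can be checked mechanically. This is a rough sketch that assumes the caller supplies an approximate post count per hashtag, with `None` marking a branded tag; where those counts come from is outside this skill, and the names are illustrative.

```python
from collections import Counter
from typing import Optional

# (min, max) number of hashtags wanted per tier, from the caption template.
TARGET_MIX = {"broad": (2, 3), "niche": (3, 5), "community": (2, 3), "branded": (1, 2)}


def classify(post_count: Optional[int]) -> str:
    """Assign a hashtag to a tier by its post count (None means branded)."""
    if post_count is None:
        return "branded"
    if 100_000 <= post_count <= 1_000_000:
        return "broad"
    if 10_000 <= post_count < 100_000:
        return "niche"
    if 1_000 <= post_count < 10_000:
        return "community"
    return "out_of_range"


def check_hashtags(tags: dict) -> list:
    """Return a list of problems; an empty list means the mix matches the template."""
    problems = []
    if not 5 <= len(tags) <= 15:
        problems.append(f"total {len(tags)} is outside 5-15")
    counts = Counter(classify(count) for count in tags.values())
    for tier, (low, high) in TARGET_MIX.items():
        if not low <= counts[tier] <= high:
            problems.append(f"{tier}: {counts[tier]} (want {low}-{high})")
    if counts["out_of_range"]:
        problems.append(f"{counts['out_of_range']} tag(s) fit no tier")
    return problems
```

The tier boundaries overlap in the template (100K is both the top of niche and the bottom of broad); the sketch resolves that by assigning the boundary value to the larger tier.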
RENDERING SCRIPTS
render_latex_slide.py (PRIMARY RENDERER)
Publication-grade LaTeX slide renderer. Produces 1080x1350 PNG slides using pdflatex + pdftoppm.
6 slide types: `hook`, `body`, `comparison`, `diagram`, `synthesis`, `cta`
4 themes: `warm`, `clean`, `dark`, `earth`
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type body \
  --data body_data.json \
  --output slide.png \
  --theme dark \
  --brand brand_config.json
```
Data fields by slide type:
- hook: `title`, `title_highlight`, `subtitle`, `callout`, `ai_bg`, `overlay_opacity`, `logos[]`
- body: `title`, `title_highlight`, `body`, `bullets[]`, `bg_style`
- comparison: `title`, `subtitle`, `columns[{name, items[{label, value}]}]`, `bg_style`
- diagram: `title`, `description`, `diagram_nodes[{label, desc}]`, `diagram_type` (vertical/horizontal), `ai_bg`, `overlay_opacity`
- synthesis: `title`, `points[]`, `bg_style`
- cta: `title`, `cta_text`, `handle`, `stats[]`, `ai_bg`, `overlay_opacity`
- All types: `slide_num`, `total_slides`, `show_nav`, `ai_bg` (full-bleed background), `overlay_opacity`, `bg_style`
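A small wrapper can write slide data to a temporary JSON file and build the renderer command. The sketch below uses only the flags and field names shown above; the value types (for example, `bullets` as a list of strings), the helper names, and treating `--brand` as optional are assumptions, and the command is only constructed, not executed.

```python
import json
import tempfile
from pathlib import Path

RENDERER = Path.home() / ".claude/skills/world-class-carousel/scripts/render_latex_slide.py"


def write_slide_data(data: dict) -> Path:
    """Write slide data to a temp JSON file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(data, handle, ensure_ascii=False)
    return Path(handle.name)


def render_command(slide_type, data_path, output, theme="dark", brand=None) -> list:
    """Build the argv list for render_latex_slide.py without running it."""
    cmd = [
        "python3", str(RENDERER),
        "--type", slide_type,
        "--data", str(data_path),
        "--output", output,
        "--theme", theme,
    ]
    if brand:
        cmd += ["--brand", brand]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing the data through a file rather than inline shell text avoids quoting problems with JSON in bash.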
generate_carousel.py (ORCHESTRATOR)
End-to-end carousel generation from a JSON spec. Handles slide numbering, rendering, and preview grid assembly.
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
  --spec carousel_spec.json \
  --output-dir outputs/carousel/ \
  --brand brand_config.json
```
AI Image Generation (via generate-image skill)
Use the `generate-image` skill for all AI images (hook bg, CTA bg, diagram bg). See "AI Visual Generation" section above for examples.
```bash
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Your detailed prompt here, 50+ words, no text no words no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/image.png
```
assemble_carousel.py (ASSEMBLY)
Validates 1080x1350, optimizes PNGs, creates preview grid, generates metadata JSON.
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/assemble_carousel.py \
  --input-dir tmp/carousel/ --output-dir outputs/carousel/ --optimize
```
render_slide.py (LEGACY - Pillow-based)
Pillow-based renderer with 6 layout modes. Superseded by `render_latex_slide.py` for production use. Still available for quick prototyping without LaTeX dependencies.
10 WORLD-CLASS DIFFERENTIATORS
Apply these to elevate from "good" to "world-class":
- Intellectual Density: One INSIGHT per slide, not just one idea (insight = non-obvious connection between two known things)
- Visual Craftsmanship: Every pixel intentional. Margins mathematical. Colors from a system.
- Hook Specificity: "I tested 1,247 prompts across 6 models" not "5 Tips for Better Prompts"
- Narrative Completeness: Each slide creates a question the next answers. Final slide ties back to hook.
- Proof Over Claims: Screenshots, before/after comparisons, specific metrics -- not "this is great"
- Typography as Design: The way words are sized, spaced, and placed tells the story VISUALLY
- Strategic Restraint: Know what to leave OUT. Negative space is a design choice.
- Music-Content Resonance: BPM matches reading pace. Genre signals the tribe.
- Save-Worthy Synthesis: Last content slide is a mental model / framework diagram worth saving
- Authentic Voice: Written as one expert talking to a colleague. Never "content creator voice."
FINAL CHECKLIST
Before delivering any carousel, verify ALL of these:
- First slide passes the 1.3-second scroll-stop test
- Every slide passes the Bullshit Test (Specific, Novel, Dense)
- One idea per slide, no exceptions
- Typography readable at mobile size (24px+ body text)
- Color palette consistent across all slides
- Narrative arc complete (tension -> resolution)
- Each slide creates curiosity for the next
- Last content slide has a save-worthy synthesis (mental model, framework, diagram)
- CTA slide is clear and specific
- Music recommendation matches content type and audience
- Canvas is 1080x1350 px (4:5 aspect ratio)
- All content within safe zone (not occluded by UI)
- Caption front-loads value in first 2 lines
- 5-15 hashtags with proper distribution (broad + niche + community + branded)
- Alt-text provided for accessibility
- Every rendered slide visually inspected (no half-empty slides)
- Synthesis title < 4 words
- Synthesis points are flat strings, not dicts
- JSON data passed via temp files, not inline in bash
- Hook/CTA slides use `ai_bg` for visual topics; body slides stay text-only
- AI images prompted with "no text" to prevent unwanted labels
- Overlay opacity 0.60-0.68 for hooks, 0.65-0.70 for CTA
- All KNOWN_ISSUES.md rules checked before delivery
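Several of the data-level checklist items can be checked before rendering. The sketch below covers only the mechanical ones (synthesis title length, flat-string points, overlay opacity ranges, text-only body slides). The `check_slide` helper is hypothetical and not part of the skill's scripts; the field names follow the renderer's data fields.

```python
# Allowed overlay_opacity range per slide type, from the checklist.
OPACITY_RANGES = {"hook": (0.60, 0.68), "cta": (0.65, 0.70)}


def check_slide(slide_type: str, data: dict) -> list:
    """Return checklist violations for one slide's data; empty list means OK."""
    problems = []
    if slide_type == "synthesis":
        if len(str(data.get("title", "")).split()) >= 4:
            problems.append("synthesis title must be under 4 words")
        if not all(isinstance(point, str) for point in data.get("points", [])):
            problems.append("synthesis points must be flat strings, not dicts")
    opacity = data.get("overlay_opacity")
    if slide_type in OPACITY_RANGES and opacity is not None:
        low, high = OPACITY_RANGES[slide_type]
        if not low <= opacity <= high:
            problems.append(f"{slide_type} overlay_opacity should be {low:.2f}-{high:.2f}")
    if slide_type == "body" and data.get("ai_bg"):
        problems.append("body slides stay text-only (no ai_bg)")
    return problems
```

The judgment-based items (scroll-stop test, Bullshit Test, narrative arc) still need a human or model review of each rendered slide.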
PHASE 6: LEARNING PROTOCOL (Post-Delivery)
After every carousel delivery, update the skill's knowledge base. This system prevents repeating mistakes while staying compact.
Two-Tier Memory Architecture
Tier 1: `KNOWN_ISSUES.md` (in this skill directory)
- MAX 60 lines. Contains ONLY compressed, actionable rules.
- Format: one-line rules grouped by category. No narratives, no session history.
- When adding a new rule: check if it supersedes an existing rule. If yes, REPLACE the old rule. Never append duplicates.
- Read this file at the START of every carousel session to avoid known pitfalls.
Tier 2: `session-archives/` directory (in this skill directory)
- Verbose session logs go here as timestamped files: `session-archives/YYYY-MM-DD-topic.md`
- Include: full experiment data, scoring matrices, debug traces, before/after comparisons.
- These files are NEVER loaded into context unless explicitly requested by the user.
- They exist as raw data for future deep-dives, not as operational knowledge.
After Every Session
- Check KNOWN_ISSUES.md -- Does this session reveal a new rule? Add it (max 1 line). Does it supersede an old rule? Replace it.
- Archive verbose data -- If the session involved experiments, debugging, or research, write a session archive file.
- Compress, don't accumulate -- The goal is a fixed-size knowledge base that gets BETTER over time, not BIGGER.
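The replace-don't-append discipline for Tier 1 can be sketched as a small update function. It assumes rules are held as a list of single-line strings and that every line counts toward the 60-line cap (headings included); `upsert_rule` is an illustrative name, not an existing script.

```python
from typing import Optional

MAX_LINES = 60  # Tier 1 cap for KNOWN_ISSUES.md


def upsert_rule(lines: list, new_rule: str, supersedes: Optional[str] = None) -> list:
    """Return updated Tier 1 lines: replace a superseded rule, never duplicate."""
    new_rule = new_rule.strip()
    if "\n" in new_rule:
        raise ValueError("a Tier 1 rule must fit on one line; archive it in Tier 2")
    if new_rule in lines:
        return list(lines)  # already known, nothing to add
    if supersedes is not None and supersedes in lines:
        updated = [new_rule if line == supersedes else line for line in lines]
    else:
        updated = list(lines) + [new_rule]
    if len(updated) > MAX_LINES:
        raise ValueError("KNOWN_ISSUES.md would exceed 60 lines; compress or merge rules")
    return updated
```

Deciding whether a new rule actually supersedes an old one remains a judgment call; the function only applies that decision consistently.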
The Compression Principle
Every piece of learning must be compressed to its irreducible form before entering Tier 1:
- BAD: "In session on March 10, we discovered that passing synthesis points as dicts causes an AttributeError because the renderer at line 870 does escape_latex(pt) directly on each point" (28 words)
- GOOD: "Synthesis `points[]` must be FLAT STRINGS, not dicts. Renderer does `escape_latex(pt)` directly." (12 words)
If you can't compress it to one line, it belongs in Tier 2 (session archive), not Tier 1.