world-class-carousel

World-Class Instagram Carousel Generator

Generate Instagram carousels that are genuinely world-class: content people save, share, and come back to. Not engagement bait. Not AI slop. Actual value, delivered through precise visual design and narrative structure.
This skill is fully generalized. It contains FORM (structure, principles, patterns), not MATTER (specific topics). The user provides the matter (topic); the skill provides the form (archetypes, design system, music matrix, quality gates). Together they produce the carousel. Nothing is hardcoded.

BEFORE YOU START: Read KNOWN_ISSUES.md

Before generating ANY carousel, read /home/node/.claude/skills/world-class-carousel/KNOWN_ISSUES.md. It contains compressed rules from all previous sessions -- data format gotchas, sizing rules, visual strategy decisions, and quality gates. Ignoring it means repeating solved mistakes.

EXECUTION PIPELINE

When the user requests a carousel, execute these 6 phases in order (Phase 6 runs post-delivery):

PHASE 1: RESEARCH & STRUCTURING

  1. Analyze the topic -- What is the core insight? What specific value can this deliver?
  2. Identify the audience -- What does the target audience NOT already know? What's their current understanding?
  3. Auto-detect content vertical and theme -- Use the Content Vertical Detection table below
  4. Select the archetype -- Which of the 7 carousel archetypes (see below) fits best? Use the Archetype Selection Guide below. Auto-select unless the user specifies.
  5. Design the narrative arc -- Map each archetype role to a renderer slide type using the Role-to-SlideType Mapping below. Ensure each slide creates a curiosity gap that the next slide resolves.
  6. Run the Bullshit Test on the outline -- Does every slide pass? (See QUALITY GATE below)

Content Vertical Detection (Topic -> Theme)

Analyze the topic and auto-select the renderer theme:
Content Vertical | Keywords/Signals | Renderer Theme | Background Style
Tech / AI / Coding | AI, code, developer, API, tools, stack, programming, SaaS, data | dark | gradient (default)
Business / Strategy | growth, revenue, startup, founder, marketing, sales, strategy, scale | earth | gradient
Education / How-To | learn, tutorial, guide, roadmap, beginner, master, course, how to | clean | gradient
Creative / Design | design, UX, brand, visual, aesthetic, portfolio, creative | dark | gradient_mesh
Mindset / Philosophy | mindset, habits, productivity, stoic, growth, mental, philosophy | warm | gradient
If the user specifies a brand config with a theme, always use that instead.
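The detection table can be read as a keyword lookup. The following is a minimal sketch under stated assumptions, not the skill's actual detector: the function name `detect_theme`, the whole-word matching, the "most keyword hits wins" scoring, and the fallback to the first row are all hypothetical choices made for illustration. Only the keywords, themes, and background styles come from the table.

```python
import re

# Rows transcribed from the Content Vertical Detection table.
VERTICALS = [
    ("Tech / AI / Coding",
     ["ai", "code", "developer", "api", "tools", "stack", "programming", "saas", "data"],
     "dark", "gradient"),
    ("Business / Strategy",
     ["growth", "revenue", "startup", "founder", "marketing", "sales", "strategy", "scale"],
     "earth", "gradient"),
    ("Education / How-To",
     ["learn", "tutorial", "guide", "roadmap", "beginner", "master", "course", "how to"],
     "clean", "gradient"),
    ("Creative / Design",
     ["design", "ux", "brand", "visual", "aesthetic", "portfolio", "creative"],
     "dark", "gradient_mesh"),
    ("Mindset / Philosophy",
     ["mindset", "habits", "productivity", "stoic", "growth", "mental", "philosophy"],
     "warm", "gradient"),
]


def detect_theme(topic, brand_theme=None):
    """Pick a vertical by counting whole-word keyword hits in the topic.

    Assumption: ties, and topics with zero hits, resolve to the earliest
    row in the table. A brand theme, when given, always wins over the
    detected theme.
    """
    text = topic.lower()

    def hits(row):
        return sum(
            1 for kw in row[1]
            if re.search(r"\b" + re.escape(kw) + r"\b", text)
        )

    name, _, theme, bg_style = max(VERTICALS, key=hits)
    return {"vertical": name, "theme": brand_theme or theme, "bg_style": bg_style}
```

Note that "growth" appears in two rows of the table, so a scoring rule of some kind is needed; counting hits is only one way to break that overlap.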

Content Category Selection (10 Categories, Aristotelian)

Each category has unique visual DNA derived from psychology axioms (Cialdini, cognitive load theory, dual coding, serial position effect). Select based on topic:
If the topic is about... | Category | Arc Shape | Hook Style | Primary Cialdini
Explaining a research paper | paper_decoder | Revelatory | Face + paper panel | Authority
Comparing AI tools/models | tool_showdown | Divergent | Multi-screenshot face-off | Social Proof
Today's AI development | breaking_news | Convergent | News-editorial face | Scarcity
Step-by-step AI tool how-to | tool_tutorial | Linear | Phone-in-hand / device mockup | Reciprocity
Controversial opinion | hot_take | Confrontational | Bold abstract typography | Authority
Copy-paste prompts/templates | prompt_playbook | Divergent | Phone screenshot mockup | Reciprocity
Complete sector overview | industry_map | Divergent | Multi-person face-off | Authority
Build [X] with AI project | build_this | Linear+Reveal | Multi-device result showcase | Social Proof
Funding/business news | founders_money | Convergent | Founder portrait + data | Scarcity
Future predictions/timeline | future_scenario | Revelatory | Abstract cinematic AI imagery | Scarcity
Universal Psychology Rules (apply to ALL categories):
  • Max 4 information chunks per slide (Cognitive Load Theory, Sweller)
  • Pattern interrupt every 2-3 slides (diagram, comparison, color shift, or layout change)
  • Density wave: H-M-H-M-H-H-M (never 3 high-density slides consecutively)
  • Synthesis slide = THE save trigger (Serial Position Effect: last items remembered best)
  • Dual-code the hardest concept (Paivio: visual + text = 6.5x retention)
  • CTA matches save trigger: utility categories → "Save this", social categories → "Share/Comment"
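Two of the rules above (max 4 chunks per slide, never 3 high-density slides in a row) are mechanical enough to check in code. This is a sketch only: the `density` and `chunks` fields are hypothetical annotations for an outline and are not part of the renderer's data format.

```python
MAX_CHUNKS_PER_SLIDE = 4  # Cognitive Load Theory limit stated above


def check_density(slides):
    """Return a list of rule violations for an outline.

    Each slide is a dict with hypothetical keys:
      'density': 'H' (high) or 'M' (medium)
      'chunks':  number of information chunks on the slide
    """
    problems = []
    high_run = 0  # length of the current run of high-density slides
    for number, slide in enumerate(slides, start=1):
        if slide["chunks"] > MAX_CHUNKS_PER_SLIDE:
            problems.append(
                f"slide {number}: {slide['chunks']} chunks exceeds {MAX_CHUNKS_PER_SLIDE}"
            )
        high_run = high_run + 1 if slide["density"] == "H" else 0
        if high_run >= 3:
            problems.append(f"slide {number}: 3 or more high-density slides in a row")
    return problems
```

The reference wave H-M-H-M-H-H-M passes this check, since its longest high-density run is two slides.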
Category-to-Slide-Sequence Quick Reference:
  • paper_decoder (9 slides): hook → body → diagram → body → body → diagram → body → synthesis → cta
  • tool_showdown (8 slides): hook → body → comparison → body → body → comparison → synthesis → cta
  • breaking_news (8 slides): hook → body → body → body → diagram → body → synthesis → cta
  • tool_tutorial (8 slides): hook → body → tool → tool → tool → body → synthesis → cta
  • hot_take (7 slides): hook → body → body → body → body → synthesis → cta (text-driven, no diagrams)
  • prompt_playbook (9 slides): hook → body → body → body → comparison → body → body → synthesis → cta
  • industry_map (9 slides): hook → diagram → body → body → body → comparison → diagram → synthesis → cta
  • build_this (8 slides): hook → body → tool → tool → tool → body → synthesis → cta
  • founders_money (7 slides): hook → body → body → body → diagram → synthesis → cta
  • future_scenario (8 slides): hook → body → body → diagram → body → body → synthesis → cta
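The quick reference can be kept as data and checked against its own stated slide counts. In this sketch the tool_tutorial and build_this entries are read as three consecutive tool slides and industry_map as comparison followed by diagram, which is the reading the stated counts imply. The names `SEQUENCES`, `SLIDE_COUNTS`, and `sequence_for` are hypothetical.

```python
# Stated slide counts from the quick reference.
SLIDE_COUNTS = {
    "paper_decoder": 9, "tool_showdown": 8, "breaking_news": 8,
    "tool_tutorial": 8, "hot_take": 7, "prompt_playbook": 9,
    "industry_map": 9, "build_this": 8, "founders_money": 7,
    "future_scenario": 8,
}

# Slide-type sequences, written as space-separated strings for readability.
SEQUENCES = {
    "paper_decoder": "hook body diagram body body diagram body synthesis cta",
    "tool_showdown": "hook body comparison body body comparison synthesis cta",
    "breaking_news": "hook body body body diagram body synthesis cta",
    "tool_tutorial": "hook body tool tool tool body synthesis cta",
    "hot_take": "hook body body body body synthesis cta",
    "prompt_playbook": "hook body body body comparison body body synthesis cta",
    "industry_map": "hook diagram body body body comparison diagram synthesis cta",
    "build_this": "hook body tool tool tool body synthesis cta",
    "founders_money": "hook body body body diagram synthesis cta",
    "future_scenario": "hook body body diagram body body synthesis cta",
}


def sequence_for(category):
    """Return the slide-type list for a category, validating its shape."""
    sequence = SEQUENCES[category].split()
    if len(sequence) != SLIDE_COUNTS[category]:
        raise ValueError(f"{category}: expected {SLIDE_COUNTS[category]} slides")
    # Every sequence opens with the hook and closes with synthesis, then cta.
    if sequence[0] != "hook" or sequence[-2:] != ["synthesis", "cta"]:
        raise ValueError(f"{category}: must start with hook and end with synthesis, cta")
    return sequence
```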

Role-to-SlideType Mapping

Map each archetype role to a renderer slide type when building the carousel spec:
Archetype Role | Renderer Slide Type | Notes
hook | hook | Use title + title_highlight for split title effect
intro, context, reveal, before | body | Use title_highlight for the key phrase
step, component, layer, shift, evidence, action | body | Use bullets for key points
item | body | Use title_highlight for the item name, bullets for details
diagram, connection | diagram | Use diagram_nodes with vertical or horizontal layout
contrast, reframe | comparison | Use columns with opposing views
result, after, lesson, implication | body | Use title_highlight to emphasize the key outcome
synthesis | synthesis | Use points[] for numbered key takeaways
cta | cta | Use handle, cta_text, optional stats[]
bonus, pitfalls, prediction | body | Use bullets for listed points
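The mapping above is a plain lookup from role to slide type. A small sketch follows; the names `ROLE_TO_SLIDE_TYPE` and `slide_type_for` are hypothetical, while the role and slide-type strings are taken from the table.

```python
# Role groups from the Role-to-SlideType Mapping, keyed by slide type.
_GROUPS = {
    "hook": ["hook"],
    "body": [
        "intro", "context", "reveal", "before",
        "step", "component", "layer", "shift", "evidence", "action",
        "item",
        "result", "after", "lesson", "implication",
        "bonus", "pitfalls", "prediction",
    ],
    "diagram": ["diagram", "connection"],
    "comparison": ["contrast", "reframe"],
    "synthesis": ["synthesis"],
    "cta": ["cta"],
}

# Flatten into role -> slide type for direct lookup.
ROLE_TO_SLIDE_TYPE = {
    role: slide_type for slide_type, roles in _GROUPS.items() for role in roles
}


def slide_type_for(role):
    """Map an archetype role to a renderer slide type, failing loudly on typos."""
    try:
        return ROLE_TO_SLIDE_TYPE[role]
    except KeyError:
        raise ValueError(f"unknown archetype role: {role!r}") from None
```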

PHASE 1.5: VISUAL STRATEGY DECISION (Before Writing Content)

Before writing any content, decide the visual strategy for this carousel. You have access to multiple tools -- choose the right ones for the topic.

Available Visual Tools Inventory

Tool | What It Does | When to Use | How to Invoke
AI Cinematic Images | HD photorealistic/artistic images (Gemini 3 Pro) | Hook/CTA backgrounds, emotional priming, conceptual anchoring | generate-image skill with hyper-detailed prompt (50+ words)
AI Flowcharts/Diagrams | Production-quality flowcharts with text labels, arrows, boxes | Process flows, pipelines, decision trees -- REPLACES TikZ for better visuals | generate-image skill with structural prompt describing boxes + connections
AI Architecture Diagrams | Blueprint-style system diagrams with components and connections | Microservices, tech stacks, system design | generate-image skill with component/connection prompt
AI Infographics/Charts | Bar charts, data visualizations with accurate labels and proportions | Market data, statistics, comparisons | generate-image skill with data + style description
AI Abstract Backgrounds | Neural networks, geometric patterns, cosmic visuals | Slide backgrounds via ai_bg | generate-image skill with atmosphere/material prompt
TikZ Diagrams | Vector flowcharts in LaTeX (basic but reliable) | Simple 3-5 node flows where AI image gen is overkill | Use diagram slide type with diagram_nodes
Gradient Backgrounds | TikZ-rendered gradient fills with geometric accents | Default for all text-only slides | Set bg_style: "gradient" in slide data

CRITICAL: Slide-Type Visual Rule (Experimentally Verified)

This rule was established through controlled A/B experiments (7 strategies, same content, scored 1-10). It overrides gut instinct:
Slide Type | Visual Strategy | WHY (Experimental Evidence)
Hook | ai_bg full-bleed + 0.60-0.68 overlay | Scroll-stopping power. First slide = 80% of engagement. Score: 8.0/10
Body | TEXT-ONLY. No images. | Images on body slides destroy 40% of content space. Text-only scored 8.3/10 vs 5.7/10 with images
Diagram | AI-generated diagram as ai_bg (preferred) OR TikZ fallback | Gemini 3 Pro generates production-quality flowcharts with readable labels, arrows, and boxes. Far more visually striking than basic TikZ. Use ai_bg + 0.55-0.65 overlay so text remains readable over the diagram.
Synthesis | Text-only | Save-worthy reference material. Images would reduce information density.
CTA | ai_bg full-bleed + 0.65-0.70 overlay | Emotional close with visual punch.
DO NOT put AI images on body slides. This was the single biggest quality mistake found in testing. DO NOT use browser screenshots on any slides. They always look terrible embedded in carousel slides.
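The slide-type rule lends itself to a pre-render check. The sketch below assumes slide data uses the `ai_bg` and `overlay_opacity` keys shown in the rendering examples elsewhere in this document; the function name `check_visual_rule` and the decision to treat a missing opacity as a violation are hypothetical.

```python
# Overlay ranges per slide type, from the Slide-Type Visual Rule table.
OVERLAY_RANGES = {
    "hook": (0.60, 0.68),
    "diagram": (0.55, 0.65),
    "cta": (0.65, 0.70),
}

# Slide types that must stay text-only.
TEXT_ONLY_TYPES = {"body", "synthesis"}


def check_visual_rule(slide_type, data):
    """Return a list of violations of the slide-type visual rule."""
    problems = []
    has_ai_bg = bool(data.get("ai_bg"))

    if slide_type in TEXT_ONLY_TYPES and has_ai_bg:
        problems.append(f"{slide_type} slides are text-only; remove ai_bg")

    if slide_type in OVERLAY_RANGES and has_ai_bg:
        low, high = OVERLAY_RANGES[slide_type]
        opacity = data.get("overlay_opacity")
        # Assumption: an ai_bg without an explicit opacity is flagged too.
        if opacity is None or not (low <= opacity <= high):
            problems.append(
                f"{slide_type} overlay_opacity should be within {low:.2f}-{high:.2f}"
            )
    return problems
```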

Visual Strategy Decision Matrix (Topic-Level)

For each topic, determine the primary visual mode, background style, and which slide-level visuals to use:
Topic Type | Background Style | Hook Visual | Body Visuals | Diagram Strategy | Example
Philosophy / Mindset | gradient | AI image: symbolic figure | None (text carries weight) | AI-generated concept map | Stoic principles: marble bust + storm
Tool Review / SaaS | gradient or gradient_mesh | AI image: abstract tech glow | None (text-only bullets describe tools) | AI-generated comparison chart | "6 AI Tools": text descriptions + AI chart
News / Current Events | gradient | AI image: dramatic scene | None (text with citations) | AI-generated timeline or power map | "AI War 2025": cinematic + AI power map
Technical Tutorial | gradient (clean) | AI image: conceptual diagram | None (step-by-step text) | AI-generated architecture/flowchart | "Deploy with Docker": AI architecture diagram
Business / Strategy | gradient | AI image: bold abstract | None (text with real data citations) | AI-generated bar chart or funnel | "Growth Hacking": AI infographic
Comparison / Versus | gradient_mesh | AI image: abstract contrast | comparison slide type columns | AI-generated side-by-side chart | "React vs Vue": comparison columns + AI chart
Creative / Design | gradient_mesh (dark) | AI image: artistic/gallery quality | None (text-only) | AI-generated process flow | "UX Trends 2025": artistic + AI flow
Framework / Mental Model | gradient | AI image: system metaphor | None (text explains components) | AI-generated flowchart (preferred over TikZ) | "OODA Loop": AI flowchart as ai_bg
Data / Research | gradient | AI image: data visualization concept | None (text with specific numbers) | AI-generated bar chart / infographic | "AI Market 2025": AI bar chart

AI Image Generation Best Practices

Model & Routing:
  • Use the generate-image skill (uses AI_GATEWAY_API_KEY). Nano-banana-pro requires GEMINI_API_KEY (often unset) but uses the same underlying model.
  • Model: google/gemini-3-pro-image-preview (primary). Fallback: google/gemini-3.1-flash-image-preview.
  • Output: ~1408x768 landscape. Overlay compensates for portrait stretch on slides.
Gemini 3 Pro Proven Capabilities (Experimentally Verified):
Capability | Quality | Best Use in Carousels | Prompt Strategy
Cinematic portraits | Excellent | Hook/CTA backgrounds | 50+ words: materials, lighting, composition, colors, atmosphere
Multi-image composition | Excellent (avg 9.6/10) | Hook slides with real faces + screenshots | Aristotelian axioms below. Send base64 to /api/v1/images/generations
Screenshot → device mockup | Excellent | Tool showcase, product launch slides | "floating laptop/phone mockup, dark studio, reflective surface"
Person + screenshot editorial | Excellent | News hooks with evidence | "person as SUBJECT, screenshot as floating holographic EVIDENCE panel"
Multi-screenshot dashboard | Excellent | Comparison/versus slides | "floating panels at varied depths, color-coded edge glows, grid floor"
Flowcharts | Excellent | Diagram slides as ai_bg | Describe boxes, arrows, labels, and connections structurally
Abstract backgrounds | Excellent | Any slide background | Materials, colors, atmosphere, "no text no words"

The 7 Aristotelian Axioms for Multi-Image Composition (Experimentally Proven)

These irreducible premises govern ALL multi-image prompts. Every prompt must satisfy all 7:
A1: VISUAL HIERARCHY -- Eye processes: faces > contrast edges > text > color fields. Composition must respect this order.
A2: INPUT TYPE DETERMINES ROLE -- Each input has exactly one role:
  • Photo of person → SUBJECT (preserve face, never modify)
  • Screenshot/UI → EVIDENCE (float as holographic panel, stylize frame, preserve content)
  • Logo/brand → ANCHOR (small, consistent corner placement)
  • Abstract/texture → ATMOSPHERE (background only)
A3: UNIFIED LIGHT SOURCE -- All elements share one dominant light direction. Mixed lighting = instant "fake" detection.
A4: DEPTH CREATES DRAMA -- Foreground sharp (subject), midground recessed (screenshots), background soft (atmosphere). 3 layers minimum.
A5: NEGATIVE SPACE IS FUNCTIONAL -- Bottom 30-35% dark for text overlay. Not waste -- it's where the headline goes.
A6: COLOR TEMPERATURE = STORY -- Cool blue/teal = innovation. Warm red = urgency. Split red/blue = competition. Mono + accent = editorial.
A7: NO-TEXT SEAL -- Always end with "absolutely no text, no words, no letters, no watermarks" (outside screenshots).

Proven Scenario Prompt Templates (avg 9.6/10 across 10 tests)

Person + News Screenshot (9.5/10): "Image 1 is [person] -- preserve face, place in left 60%, dramatic side lighting. Image 2 is screenshot -- float as glowing translucent panel, tilted 8 degrees, recessed behind subject, cyan edge glow. Dark moody background, cinematic depth of field. Bottom 30% dark. No text outside screenshot."
Tool Screenshot Showcase (9/10): "Place screenshot on sleek floating laptop mockup angled 15 degrees. Dark gradient background, ambient teal glow from screen. Glossy reflective surface below. Premium Apple product launch aesthetic. No text outside screenshot."
Multi-Screenshot Dashboard (9.5/10): "Arrange as glowing panels floating in dark space, varied depths and angles (5-15 degrees). Largest centered. Color-coded edge glows. Grid floor, particle effects. Digital command center aesthetic. No text outside screenshots."
Person + Screenshots + Logo (10/10): "Person as dominant subject center-left, face preserved. Screenshots as holographic panels around them. Logo small in upper corner with glow. Volumetric light rays, 3-layer depth. No text outside screenshots/logo."
Face-Off + Data (10/10): "Person A on LEFT in profile facing right, red lighting. Person B on RIGHT facing left, blue lighting. Dashboard between them as floating holographic display. Smoke and sparks in the gap. Competitive energy. No text outside screenshot."
Phone in Hand (10/10): "Screenshot on smartphone held in hand from lower-right. Dark background, soft bokeh lights. Screen bright and crisp. Lifestyle photography style. No text outside screenshot."
5-Image Mega (10/10): "2 people (main foreground, secondary recessed) + 2 screenshots (holographic panels, color-coded glows) + logo (corner). Volumetric light, split lighting, multiple depth layers. No text outside screenshots/logo."
Prompt Rules:
  • HYPER-DETAILED (50+ words): materials, lighting, composition, colors, atmosphere. Generic = bad.
  • Always end with "absolutely no text, no words, no letters, no watermarks" -- AI models add unwanted text otherwise.
  • Declare each input's role explicitly (per A2): "Image 1 is a portrait... Image 2 is a screenshot..."
  • Specify depth map (per A4): "subject sharp foreground, screenshots floating midground, atmospheric background"
  • Lock light direction (per A3): "single dominant light from upper-left, rim light on subject"
  • For editorial portraits: add "ABSOLUTELY NO TEXT NO LOGOS NO MAGAZINE ELEMENTS" or Gemini creates TIME covers.
  • Overlay opacity sweet spot: 0.60-0.68 for hooks, 0.55-0.65 for diagrams, 0.65-0.70 for CTA.
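Several of the prompt rules are easy to enforce before a prompt is sent. The sketch below covers the 50+ word minimum, explicit input roles (A2), and the no-text seal (A7). The helper `build_prompt` and its parameters are hypothetical; it only assembles a string and does not call the generate-image skill.

```python
NO_TEXT_SEAL = "absolutely no text, no words, no letters, no watermarks"
MIN_WORDS = 50  # "HYPER-DETAILED (50+ words)" rule


def build_prompt(description, input_roles=()):
    """Assemble an image prompt that satisfies the mechanical prompt rules.

    description: the scene description (materials, lighting, composition, ...)
    input_roles: optional role statements, one per input image, e.g.
                 ("a portrait", "a screenshot") -> "Image 1 is a portrait. ..."
    """
    word_count = len(description.split())
    if word_count < MIN_WORDS:
        raise ValueError(
            f"prompt has {word_count} words; the rules ask for {MIN_WORDS}+"
        )

    # Declare each input's role explicitly (axiom A2).
    role_lines = [
        f"Image {number} is {role}." for number, role in enumerate(input_roles, start=1)
    ]
    prompt = " ".join(role_lines + [description.strip()])

    # Append the no-text seal (axiom A7) unless the prompt already ends with it.
    trimmed = prompt.rstrip(" .")
    if not trimmed.lower().endswith(NO_TEXT_SEAL):
        prompt = trimmed + ". " + NO_TEXT_SEAL.capitalize() + "."
    return prompt
```

Depth and light-direction phrasing (A3, A4) are left to the description text, since they depend on the scene.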

Background Style Selection

Set bg_style in the carousel spec or per-slide data to control the look:
bg_style Value | Visual Result | Best For
"gradient" (default) | Top-to-bottom gradient with subtle accent glow | All themes. Clean, modern, professional
"texture" | AI-generated paper/fabric texture | AVOID -- produces grey rock look
"gradient_mesh" | Multi-stop gradient with geometric accent shapes | Creative, premium, high-contrast
"solid" | Flat theme background color | Clean/education themes, data-heavy content
(AI background) | Full-bleed AI image with overlay | Dramatic hooks, artistic carousels
Set it at the spec level for all slides: "bg_style": "gradient" in the spec JSON. Or per-slide for variation: "data": {"bg_style": "gradient_mesh", ...} on specific slides.
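The precedence described here (per-slide value over spec-level value over the default) can be sketched as follows. The function name `effective_bg_style` is hypothetical, and rejecting "texture" outright is an assumption based on the AVOID note in the table, not documented renderer behaviour.

```python
ALLOWED_BG_STYLES = {"gradient", "gradient_mesh", "solid"}
DEFAULT_BG_STYLE = "gradient"


def effective_bg_style(spec, slide):
    """Resolve the background style for one slide.

    Precedence: slide["data"]["bg_style"], then spec["bg_style"],
    then the documented default ("gradient").
    """
    style = (
        slide.get("data", {}).get("bg_style")
        or spec.get("bg_style")
        or DEFAULT_BG_STYLE
    )
    if style == "texture":
        # Assumption: treat the AVOID entry as an error rather than a warning.
        raise ValueError('bg_style "texture" is marked AVOID')
    if style not in ALLOWED_BG_STYLES:
        raise ValueError(f"unknown bg_style: {style!r}")
    return style
```

For example, with a spec-level "gradient" and one slide overriding to "gradient_mesh", only that slide resolves to the mesh style.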

Screenshot Capture Protocol -- DEPRECATED

DO NOT use browser screenshots on carousel slides. They consistently look terrible -- low resolution, poorly framed, and badly integrated with the slide design. This was tested extensively and abandoned.
Instead: Use AI-generated images via Gemini 3 Pro for any visual needs:
  • Tool/product visuals: Generate an AI illustration or abstract representation
  • Data/charts: Generate AI bar charts or infographics (Gemini 3 Pro handles these well)
  • Architecture/flows: Generate AI flowcharts or architecture diagrams
  • People: Use text descriptions instead of photos

PHASE 2: CONTENT CREATION

  1. Write the hook (Slide 1) -- Apply the Hook Taxonomy. This slide determines everything.
  2. Write each slide -- One idea per slide. No exceptions. Apply the Bullshit Test to each.
  3. Map to renderer data format -- For each slide, create the JSON data object matching the slide type's required fields (see Data fields by slide type in RENDERING SCRIPTS).
  4. Execute the visual strategy decided in Phase 1.5:
    • Generate AI images per the 2-3 Rule (hook + 1-2 emotional peaks). State each image's telos.
    • Do not capture browser screenshots (see Screenshot Capture Protocol -- DEPRECATED). Use AI-generated visuals for any real tools/products/news referenced.
    • Set bg_style per the Background Style Selection table.
    • Use the diagram slide type for any process/flow that benefits from a visual.
  5. Select Instagram music -- Apply the Music Decision Matrix (see MUSIC SELECTION).
  6. Write the caption -- Front-load value in first 2 lines. Include CTA and hashtags.
  7. Build the carousel spec JSON -- Assemble all slides into a single spec file for the orchestrator.

PHASE 3: VISUAL PRODUCTION (LaTeX Pipeline)

Use the LaTeX-based rendering pipeline for publication-grade output. This produces slides that match or exceed the quality of accounts with 1M+ followers (Chase AI, Analytics Vidhya, etc.).
The pipeline: LaTeX (TikZ) -> PDF (pdflatex) -> PNG (pdftoppm at 300 DPI) -> resize to 1080x1350
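The first two stages of this pipeline can be sketched as command lines. This is an illustration only and not the renderer's actual implementation: the function name `pipeline_commands` and the file layout are hypothetical, and the final resize to 1080x1350 is not shown. The code only builds the argument lists; running them requires pdflatex and pdftoppm to be installed.

```python
from pathlib import Path


def pipeline_commands(tex_path, out_dir):
    """Build the pdflatex and pdftoppm invocations for one slide.

    Returns a list of argv lists, suitable for subprocess.run(cmd, check=True).
    """
    tex = Path(tex_path)
    out = Path(out_dir)
    pdf = out / (tex.stem + ".pdf")
    png_prefix = out / tex.stem  # pdftoppm appends ".png" with -singlefile

    return [
        # LaTeX (TikZ) -> PDF
        ["pdflatex", "-interaction=nonstopmode", f"-output-directory={out}", str(tex)],
        # PDF -> PNG at 300 DPI, single page, no page-number suffix
        ["pdftoppm", "-png", "-r", "300", "-singlefile", str(pdf), str(png_prefix)],
    ]
```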

Why LaTeX (not Pillow/HTML)

  • Knuth-Plass optimal line breaking -- no ugly word wraps
  • Professional font kerning and ligatures -- Palatino with microtype
  • Native vector diagrams -- TikZ flow charts (fallback for simple diagrams)
  • AI image integration -- full-bleed Gemini 3 Pro images for hook/CTA/diagram backgrounds
  • Gradient backgrounds -- clean TikZ-rendered gradients for text-only slides
  • Publication-grade output -- the same engine that typesets academic papers and books

Step 3a: Generate AI Images (Hook, CTA, Diagrams)

Generate AI images for hook background, CTA background, and optionally diagram backgrounds:
bash
# Hook background (cinematic, hyper-detailed 50+ word prompt)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Dramatic cinematic split-screen composition: left side dark blue crystalline monolith with electric energy, right side warm golden organic neural network, clash of opposing forces, volumetric lighting, no text no words no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_bg.png

# CTA background (emotional close)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Mesmerizing cosmic portal with swirling deep indigo and purple energy, golden light rays, ethereal atmosphere, no text no words" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/cta_bg.png

# Diagram as AI image (optional -- replaces TikZ for better visuals)
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Professional flowchart: Data Collection box connects to Processing box connects to Output box, clean white background, blue and grey, sharp vector style, readable labels" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/diagram_bg.png

Step 3b: Render Slides with 7 Slide Types

The LaTeX renderer (render_latex_slide.py) supports 7 slide types. Six are described in the table below; the seventh is not described in this section.
Type | Description | Best For
hook | Large title + highlighted phrase + subtitle | Cover / first slide
body | Title + highlighted text + body + bullets | Content-heavy slides, curated list items
comparison | Multi-column comparison table | Side-by-side analysis
diagram | Title + TikZ flow diagram (vertical/horizontal) | Architecture, workflows
synthesis | Styled numbered points with badges | Save-worthy summary
cta | Centered title + text + handle button | Call to action
4 Color Themes: warm (parchment/terracotta), clean (white/blue), dark (indigo/purple), earth (sage/gold)
Slide 1 (Hook) -- Title with AI background:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type hook \
  --data tmp/carousel/hook_data.json \
  --output tmp/carousel/slide_01.png \
  --theme dark --brand tmp/carousel/brand.json
Where hook_data.json contains:
{"title": "6 AI Tools That Will", "title_highlight": "Replace Your Stack", "subtitle": "The tools 10x engineers are switching to.", "callout": "Save this!", "slide_num": 1, "total_slides": 8, "ai_bg": "tmp/carousel/hook_bg.png", "overlay_opacity": 0.63}
Body slides -- Content-heavy with bullets (gradient bg, NO texture):
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type body \
  --data tmp/carousel/body_data.json \
  --output tmp/carousel/slide_02.png \
  --theme dark --brand tmp/carousel/brand.json
Where body_data.json contains:
{"title": "Why Most Developers", "title_highlight": "Get This Wrong", "body": "The biggest mistake is...", "bullets": ["Point 1", "Point 2"], "slide_num": 2, "total_slides": 8, "bg_style": "gradient"}
NOTE: Always pass data as a JSON file path, never inline JSON. Always include "bg_style": "gradient" for text-only slides. Always pass --brand.
Comparison slide -- Multi-column:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type comparison \
  --data tmp/carousel/comparison_data.json \
  --output tmp/carousel/slide_04.png \
  --theme warm --brand tmp/carousel/brand.json
Where comparison_data.json contains:
{"title": "Claude vs GPT", "subtitle": "How they compare", "columns": [{"name": "Claude", "items": [{"label": "Best for", "value": "Complex refactors"}]}, {"name": "GPT-4", "items": [{"label": "Best for", "value": "Quick prototyping"}]}], "slide_num": 4, "total_slides": 9, "bg_style": "gradient"}
Diagram slide -- AI-generated diagram background (preferred) or TikZ fallback:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type diagram \
  --data tmp/carousel/diagram_data.json \
  --output tmp/carousel/slide_07.png \
  --theme dark --brand tmp/carousel/brand.json
Where
diagram_data.json
contains:
{"title": "The Architecture", "description": "How the tools connect.", "diagram_nodes": [{"label": "Code", "desc": "Write"}, {"label": "Deploy", "desc": "Ship"}, {"label": "Monitor", "desc": "Track"}], "diagram_type": "vertical", "slide_num": 7, "total_slides": 9, "ai_bg": "tmp/carousel/diagram_bg.png", "overlay_opacity": 0.60, "bg_style": "gradient"}
Synthesis slide -- Save-worthy numbered summary:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type synthesis \
  --data tmp/carousel/synthesis_data.json \
  --output tmp/carousel/slide_08.png \
  --theme dark --brand tmp/carousel/brand.json
Where
synthesis_data.json
contains:
{"title": "Your Stack", "points": ["Tool 1 for X", "Tool 2 for Y", "Tool 3 for Z"], "slide_num": 8, "total_slides": 9, "bg_style": "gradient"}
CTA slide -- with AI background for emotional close:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type cta \
  --data tmp/carousel/cta_data.json \
  --output tmp/carousel/slide_09.png \
  --theme dark --brand tmp/carousel/brand.json
Where
cta_data.json
contains:
{"title": "Want the full breakdown?", "cta_text": "Follow for daily tips.", "handle": "@yourbrand", "slide_num": 9, "total_slides": 9, "show_nav": false, "ai_bg": "tmp/carousel/cta_bg.png", "overlay_opacity": 0.67}
LaTeX渲染器(
render_latex_slide.py
)支持7种幻灯片类型:
| 类型 | 描述 | 最佳场景 |
|---|---|---|
| `hook` | 大标题+高亮短语+副标题 | 封面/第一张幻灯片 |
| `body` | 标题+高亮文本+主体+列表 | 内容密集型幻灯片、精选列表项 |
| `comparison` | 多栏对比表 | 并列分析 |
| `diagram` | 标题+TikZ流程图(垂直/水平) | 架构、工作流 |
| `synthesis` | 带徽章的编号要点 | 值得保存的总结 |
| `cta` | 居中标题+文本+账号按钮 | 行动号召 |
4种颜色主题
warm
(羊皮纸/赤陶色)、
clean
(白/蓝)、
dark
(靛蓝/紫色)、
earth
(鼠尾草绿/金色)
第1张(钩子)——带AI背景的标题:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type hook \
  --data tmp/carousel/hook_data.json \
  --output tmp/carousel/slide_01.png \
  --theme dark --brand tmp/carousel/brand.json
其中
hook_data.json
包含:
{"title": "6 AI Tools That Will", "title_highlight": "Replace Your Stack", "subtitle": "The tools 10x engineers are switching to.", "callout": "Save this!", "slide_num": 1, "total_slides": 8, "ai_bg": "tmp/carousel/hook_bg.png", "overlay_opacity": 0.63}
主体幻灯片——带列表的内容密集型幻灯片(渐变背景,无纹理):
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type body \
  --data tmp/carousel/body_data.json \
  --output tmp/carousel/slide_02.png \
  --theme dark --brand tmp/carousel/brand.json
其中
body_data.json
包含:
{"title": "Why Most Developers", "title_highlight": "Get This Wrong", "body": "The biggest mistake is...", "bullets": ["Point 1", "Point 2"], "slide_num": 2, "total_slides": 8, "bg_style": "gradient"}
注意:始终通过JSON文件路径传递数据,而非内联JSON。纯文本幻灯片始终包含
"bg_style": "gradient"
。始终传递
--brand
对比幻灯片——多栏:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type comparison \
  --data tmp/carousel/comparison_data.json \
  --output tmp/carousel/slide_04.png \
  --theme warm --brand tmp/carousel/brand.json
其中
comparison_data.json
包含:
{"title": "Claude vs GPT", "subtitle": "How they compare", "columns": [{"name": "Claude", "items": [{"label": "Best for", "value": "Complex refactors"}]}, {"name": "GPT-4", "items": [{"label": "Best for", "value": "Quick prototyping"}]}], "slide_num": 4, "total_slides": 9, "bg_style": "gradient"}
图表幻灯片——AI生成图表背景(优先)或TikZ备选:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type diagram \
  --data tmp/carousel/diagram_data.json \
  --output tmp/carousel/slide_07.png \
  --theme dark --brand tmp/carousel/brand.json
其中
diagram_data.json
包含:
{"title": "The Architecture", "description": "How the tools connect.", "diagram_nodes": [{"label": "Code", "desc": "Write"}, {"label": "Deploy", "desc": "Ship"}, {"label": "Monitor", "desc": "Track"}], "diagram_type": "vertical", "slide_num": 7, "total_slides": 9, "ai_bg": "tmp/carousel/diagram_bg.png", "overlay_opacity": 0.60, "bg_style": "gradient"}
总结幻灯片——值得保存的编号总结:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type synthesis \
  --data tmp/carousel/synthesis_data.json \
  --output tmp/carousel/slide_08.png \
  --theme dark --brand tmp/carousel/brand.json
其中
synthesis_data.json
包含:
{"title": "Your Stack", "points": ["Tool 1 for X", "Tool 2 for Y", "Tool 3 for Z"], "slide_num": 8, "total_slides": 9, "bg_style": "gradient"}
CTA幻灯片——带AI背景的情绪收尾:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type cta \
  --data tmp/carousel/cta_data.json \
  --output tmp/carousel/slide_09.png \
  --theme dark --brand tmp/carousel/brand.json
其中
cta_data.json
包含:
{"title": "Want the full breakdown?", "cta_text": "Follow for daily tips.", "handle": "@yourbrand", "slide_num": 9, "total_slides": 9, "show_nav": false, "ai_bg": "tmp/carousel/cta_bg.png", "overlay_opacity": 0.67}

Step 3d: Full Carousel Generation (Orchestrator)

步骤3d:完整轮播图生成(编排器)

Generate a complete carousel from a single JSON spec:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
  --spec carousel_spec.json \
  --output-dir outputs/carousel/ \
  --brand tmp/carousel/brand.json
The spec JSON format:
json
{
  "topic": "6 AI Tools That Will Replace Your Stack",
  "brand": "AI Builder",
  "theme": "dark",
  "bg_style": "gradient",
  "slides": [
    {"type": "hook", "data": {"title": "...", "title_highlight": "...", "ai_bg": "tmp/hook_bg.png", "overlay_opacity": 0.63}},
    {"type": "body", "data": {"title": "...", "bullets": ["..."], "bg_style": "gradient"}},
    {"type": "diagram", "data": {"title": "...", "diagram_nodes": [{"label": "...", "desc": "..."}], "diagram_type": "vertical", "ai_bg": "tmp/diagram_bg.png", "overlay_opacity": 0.60}},
    {"type": "synthesis", "data": {"title": "...", "points": ["..."], "bg_style": "gradient"}},
    {"type": "cta", "data": {"title": "...", "handle": "@brand", "ai_bg": "tmp/cta_bg.png", "overlay_opacity": 0.67}}
  ]
}
Spec-level
bg_style
applies to all slides. Per-slide
data.bg_style
overrides it. Options:
"gradient"
,
"gradient_mesh"
,
"solid"
. If omitted, defaults to
"gradient"
. Never use
"texture"
.
The orchestrator auto-injects brand name, slide numbering, renders all slides, and creates a preview grid.
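The background precedence described above (per-slide `data.bg_style`, then spec-level `bg_style`, then the `"gradient"` default) can be sketched as follows. This is an illustration of the stated rule only; `resolve_bg_style` is a hypothetical helper and is not taken from `generate_carousel.py`.

```python
ALLOWED_BG_STYLES = {"gradient", "gradient_mesh", "solid"}


def resolve_bg_style(spec, slide):
    """Return the effective bg_style for one slide of a carousel spec."""
    style = (
        slide.get("data", {}).get("bg_style")  # per-slide override wins
        or spec.get("bg_style")                # then the spec-level value
        or "gradient"                          # then the documented default
    )
    if style not in ALLOWED_BG_STYLES:
        # "texture" (or any unknown value) is rejected outright.
        raise ValueError(f"unsupported bg_style: {style!r}")
    return style


spec = {
    "bg_style": "solid",
    "slides": [
        {"type": "body", "data": {"title": "A"}},
        {"type": "synthesis", "data": {"title": "B", "bg_style": "gradient_mesh"}},
    ],
}
resolved = [resolve_bg_style(spec, slide) for slide in spec["slides"]]
```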
从单个JSON规格生成完整轮播图:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
  --spec carousel_spec.json \
  --output-dir outputs/carousel/ \
  --brand tmp/carousel/brand.json
规格JSON格式:
json
{
  "topic": "6 AI Tools That Will Replace Your Stack",
  "brand": "AI Builder",
  "theme": "dark",
  "bg_style": "gradient",
  "slides": [
    {"type": "hook", "data": {"title": "...", "title_highlight": "...", "ai_bg": "tmp/hook_bg.png", "overlay_opacity": 0.63}},
    {"type": "body", "data": {"title": "...", "bullets": ["..."], "bg_style": "gradient"}},
    {"type": "diagram", "data": {"title": "...", "diagram_nodes": [{"label": "...", "desc": "..."}], "diagram_type": "vertical", "ai_bg": "tmp/diagram_bg.png", "overlay_opacity": 0.60}},
    {"type": "synthesis", "data": {"title": "...", "points": ["..."], "bg_style": "gradient"}},
    {"type": "cta", "data": {"title": "...", "handle": "@brand", "ai_bg": "tmp/cta_bg.png", "overlay_opacity": 0.67}}
  ]
}
规格级
bg_style
应用于所有幻灯片。单张幻灯片的
data.bg_style
会覆盖它。选项:
"gradient"
、
"gradient_mesh"
、
"solid"
。若省略,默认为
"gradient"
。绝不使用
"texture"
。
编排器会自动注入品牌名称、幻灯片编号、渲染所有幻灯片并创建预览网格。

Brand Configuration System

品牌配置系统

The design system is fully generalized through brand configs -- JSON files that define visual identity per channel or brand. Pass
--brand brand.json
to any render command.
Brand config JSON format:
json
{
  "name": "TechStack AI",       // Brand name shown in header
  "logo": "path/to/logo.png",   // Optional: logo image replaces text in header
  "theme": "dark",              // Base theme: warm, clean, dark, earth
  "accent_override": "6366F1",  // Optional: override accent hex (no #)
  "font_serif": "newpxtext",    // LaTeX serif font package (default: Palatino)
  "header_style": "bold",       // Header text: italic, bold, or plain
  "nav_style": "circle",        // Navigation arrow: circle, arrow, none
  "divider_style": "line",      // Dividers: line, ornament (diamond), dots, none
  "corner_radius": "6pt"        // Rounded corner radius for labels/badges
}
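The `//` comments in the listing above are explanatory; a strict JSON parser such as Python's `json` module rejects them, so a real `brand.json` is safest written without comments. The sketch below builds the same config as a dictionary, checks it against the option values listed in those comments, and serializes it. `validate_brand` is a hypothetical helper, and whether the renderer performs similar validation is not known.

```python
import json
import re

# Allowed values as listed in the comments of the brand config above.
BRAND_CHOICES = {
    "theme": {"warm", "clean", "dark", "earth"},
    "header_style": {"italic", "bold", "plain"},
    "nav_style": {"circle", "arrow", "none"},
    "divider_style": {"line", "ornament", "dots", "none"},
}


def validate_brand(config):
    """Return a list of human-readable problems (empty if the config looks fine)."""
    problems = []
    for key, allowed in BRAND_CHOICES.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} not in {sorted(allowed)}")
    accent = config.get("accent_override")
    if accent is not None and not re.fullmatch(r"[0-9A-Fa-f]{6}", accent):
        problems.append("accent_override: expected 6 hex digits without '#'")
    return problems


brand = {
    "name": "TechStack AI",
    "theme": "dark",
    "accent_override": "6366F1",
    "font_serif": "newpxtext",
    "header_style": "bold",
    "nav_style": "circle",
    "divider_style": "line",
    "corner_radius": "6pt",
}
brand_json = json.dumps(brand, indent=2)  # comment-free, parseable JSON
```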
3 sample brand configs (in
tmp/brands/
):
| Brand | Theme | Accent | Header | Divider | Character |
|---|---|---|---|---|---|
| TechStack AI | dark | Indigo `6366F1` | Bold | Line | Modern dev/AI content |
| Growth Academy | earth | Amber `B45309` | Italic | Ornament | Business coaching |
| Code Academy | clean | Blue (default) | Bold | Dots | Educational tutorials |
Usage with brand config:
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type hook \
  --data hook_data.json \
  --output slide.png \
  --theme dark \
  --brand brands/techstartup.json
AI Image Integration (Aristotelian Framework): Slides support two AI image zones:
  • ai_image
    : Accent illustration placed in a card (hook bottom, body bottom)
  • ai_bg
    : Full-bleed background with semi-transparent overlay for text readability
  • When no AI image is provided, decorative geometric accents fill empty space automatically
通过品牌配置JSON文件实现设计系统完全通用——定义各渠道或品牌的视觉标识。将
--brand brand.json
传递给任何渲染命令。
品牌配置JSON格式
json
{
  "name": "TechStack AI",       // 头部显示的品牌名称
  "logo": "path/to/logo.png",   // 可选:Logo图像替代头部文本
  "theme": "dark",              // 基础主题:warm、clean、dark、earth
  "accent_override": "6366F1",  // 可选:覆盖强调色十六进制值(无#)
  "font_serif": "newpxtext",    // LaTeX衬线字体包(默认:Palatino)
  "header_style": "bold",       // 头部文本:italic、bold或plain
  "nav_style": "circle",        // 导航箭头:circle、arrow、none
  "divider_style": "line",      // 分隔线:line、ornament(菱形)、dots、none
  "corner_radius": "6pt"        // 标签/徽章圆角半径
}
3个示例品牌配置(位于
tmp/brands/
):
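| 品牌 | 主题 | 强调色 | 头部 | 分隔线 | 风格 |
|---|---|---|---|---|---|
| TechStack AI | dark | 靛蓝 `6366F1` | Bold | Line | 现代开发者/AI内容 |
| Growth Academy | earth | 琥珀 `B45309` | Italic | Ornament | 商业教练 |
| Code Academy | clean | 蓝色(默认) | Bold | Dots | 教育教程 |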
品牌配置使用示例
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type hook \
  --data hook_data.json \
  --output slide.png \
  --theme dark \
  --brand brands/techstartup.json
AI图像集成(亚里士多德框架):幻灯片支持两个AI图像区域:
  • ai_image
    :卡片中的强调插图(钩子底部、主体底部)
  • ai_bg
    :全屏背景,带半透明叠加层以保证文本可读性
  • 若未提供AI图像,装饰性几何元素会自动填充空白区域

AI Image Integration Principles (First-Principles Framework)

AI图像集成原则(第一性原理框架)

Images in carousels must serve a purpose (telos). Before generating any AI image, name its function in one sentence. If you cannot, do not generate it.
The Three Teloi (Purposes) of Carousel Images:
| Telos | When to Use | Image Form | Example |
|---|---|---|---|
| Emotional Priming | Create a feeling before text is read | Atmospheric, evocative, human/natural | Marble bust for philosophy, neon cityscape for tech |
| Conceptual Anchoring | Give abstract ideas a visual handle | Symbolic, metaphorical, illustrative | Storm figure for "amor fati", network diagram for systems |
| Authority Signaling | Establish credibility through proof | Documentary, screenshots, concrete | Product screenshot, data chart, real photo |
The 2-3 Rule (Golden Mean): In an 8-10 slide carousel, use AI images on exactly 2-3 slides. Always the hook (slide 1) and CTA (last slide). Optionally the diagram slide with an AI-generated diagram as
ai_bg
. Never on body slides -- visual fatigue destroys reading rhythm and costs 40% content space.
Image Placement Decision Matrix:
| Slide Type | AI Image? | Zone | Reasoning |
|---|---|---|---|
| `hook` | Always | `ai_bg` (full-bleed + 0.60-0.68 overlay) | Scroll-stop power: atmospheric image + typography > typography alone (Axiom 1, 3) |
| `body` | Never | -- | Text carries the weight; images destroy 40% content space for minimal gain |
| `diagram` | Preferred | `ai_bg` (full-bleed + 0.55-0.65 overlay) | Gemini 3 Pro generates production-quality flowcharts with readable labels, arrows, and boxes -- far more visually striking than basic TikZ. TikZ remains as fallback for simple flows. |
| `synthesis` | Never | -- | Numbered points ARE the content; keep text-only with gradient bg |
| `cta` | Always | `ai_bg` (full-bleed + 0.65-0.70 overlay) | Emotional close: atmospheric image creates a feeling of resolution |
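The 2-3 Rule and the placement matrix above lend themselves to an automated pre-render check. The following is a minimal sketch under the assumption that slides use the `{"type": ..., "data": {...}}` shape of the orchestrator spec; `check_image_plan` is a hypothetical helper, not an existing script in the skill.

```python
# Overlay ranges per slide type, copied from the placement matrix.
OVERLAY_RANGES = {"hook": (0.60, 0.68), "diagram": (0.55, 0.65), "cta": (0.65, 0.70)}
NEVER_IMAGE = {"body", "synthesis"}
ALWAYS_IMAGE = {"hook", "cta"}


def check_image_plan(slides):
    """Return a list of rule violations for the AI image plan of a carousel."""
    problems = []
    image_count = 0
    for number, slide in enumerate(slides, start=1):
        slide_type = slide["type"]
        data = slide.get("data", {})
        has_bg = "ai_bg" in data
        image_count += has_bg

        if slide_type in NEVER_IMAGE and has_bg:
            problems.append(f"slide {number}: {slide_type} slides must stay text-only")
        if slide_type in ALWAYS_IMAGE and not has_bg:
            problems.append(f"slide {number}: {slide_type} slide needs an ai_bg")
        if has_bg and slide_type in OVERLAY_RANGES:
            low, high = OVERLAY_RANGES[slide_type]
            opacity = data.get("overlay_opacity")
            if opacity is None or not low <= opacity <= high:
                problems.append(
                    f"slide {number}: overlay_opacity {opacity} outside {low}-{high}"
                )
    if not 2 <= image_count <= 3:
        problems.append(f"{image_count} AI images used; the rule allows 2-3")
    return problems


plan = [
    {"type": "hook", "data": {"ai_bg": "tmp/carousel/hook_bg.png", "overlay_opacity": 0.63}},
    {"type": "body", "data": {"bg_style": "gradient"}},
    {"type": "diagram", "data": {"ai_bg": "tmp/carousel/diagram_bg.png", "overlay_opacity": 0.60}},
    {"type": "synthesis", "data": {"bg_style": "gradient"}},
    {"type": "cta", "data": {"ai_bg": "tmp/carousel/cta_bg.png", "overlay_opacity": 0.67}},
]
violations = check_image_plan(plan)
```

The demo plan is shortened to five slides for brevity; the counting rule itself is stated for 8-10 slide carousels.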
Prompt Engineering for Consistency: All AI images in a single carousel MUST share a consistent style prefix. Build the prefix from the content vertical:
| Content Vertical | Style Prefix for AI Image Prompts |
|---|---|
| Mindset/Philosophy | "warm earthy tones, parchment cream, watercolor or classical art style, muted terracotta accents, editorial quality" |
| Tech/AI | "dark indigo and purple tones, subtle geometric patterns, clean digital art, neon accents, futuristic" |
| Business/Strategy | "warm amber and gold tones, bold professional graphics, rich depth, confident and energetic" |
| Education | "clean white and blue tones, flat illustration style, precise and clear, minimal and modern" |
| Creative/Design | "dark charcoal with bold accent colors, artistic and expressive, gallery quality, intentional composition" |
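One way to keep the style prefix consistent is to build every prompt for a carousel through a single function. The sketch below is an assumption-laden illustration: `build_prompt` and the dictionary keys are hypothetical names, only two of the prefixes are reproduced, and the "no text" suffix follows the skill's prompt rule for images that are not labeled diagrams.

```python
# Two prefixes copied from the style table; the keys are hypothetical.
STYLE_PREFIXES = {
    "tech_ai": "dark indigo and purple tones, subtle geometric patterns, "
               "clean digital art, neon accents, futuristic",
    "education": "clean white and blue tones, flat illustration style, "
                 "precise and clear, minimal and modern",
}
NO_TEXT_SUFFIX = "no text, no words, no letters"


def build_prompt(vertical, subject, is_diagram=False):
    """Combine the shared style prefix, the slide-specific subject, and the suffix."""
    parts = [STYLE_PREFIXES[vertical], subject]
    if not is_diagram:
        # Labeled diagrams need readable text, so they skip the suffix.
        parts.append(NO_TEXT_SUFFIX)
    return ", ".join(parts)


hook_prompt = build_prompt("tech_ai", "glowing neon circuits, volumetric lighting")
cta_prompt = build_prompt("tech_ai", "abstract convergence of light streams")
diagram_prompt = build_prompt(
    "tech_ai", "flowchart: Code box connects to Deploy box", is_diagram=True
)
```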
Text Readability is Inviolable: If using
ai_bg
(full-bleed), overlay opacity must ensure WCAG AA contrast (4.5:1). Minimum
overlay_opacity: 0.55
. Proven ranges: hook 0.60-0.68, diagram 0.55-0.65, CTA 0.65-0.70.
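The 4.5:1 threshold can be checked numerically with the WCAG relative-luminance formula. The sketch below assumes white text over a black overlay and evaluates the worst case, a pure white background pixel; the actual overlay and text colors depend on the theme, so treat those values as placeholders.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def under_overlay(pixel, opacity, overlay=(0, 0, 0)):
    """Alpha-composite the overlay color over a background pixel."""
    return tuple(o * opacity + p * (1 - opacity) for p, o in zip(pixel, overlay))


def worst_case_contrast(opacity, text=(255, 255, 255)):
    """Contrast of the text against the brightest possible background pixel."""
    return contrast_ratio(text, under_overlay((255, 255, 255), opacity))


meets_aa_at_minimum = worst_case_contrast(0.55) >= 4.5
```

Under these assumptions an opacity of 0.55 clears 4.5:1 even over pure white, while 0.50 does not, which is consistent with the stated minimum.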
What NOT to generate: Generic stock-photo-style images (people in offices, handshakes, generic landscapes). If the image could illustrate any topic, it fails the Telos Test.
轮播图中的图像必须有明确目的。生成任何AI图像之前,用一句话说明其功能。若无法说明,则不生成。
轮播图图像的三大目的
| 目的 | 使用场景 | 图像形式 | 示例 |
|---|---|---|---|
| 情绪铺垫 | 在阅读文本前营造氛围 | 氛围感、唤起情绪、人文/自然 | 哲学内容用大理石雕像,科技内容用霓虹城市景观 |
| 概念锚定 | 为抽象概念提供视觉载体 | 象征性、隐喻性、说明性 | 「amor fati」用风暴人物,系统用网络图 |
| 权威信号 | 通过证据建立可信度 | 纪实、截图、具象 | 产品截图、数据图表、真实照片 |
2-3规则(黄金法则):在8-10张幻灯片的轮播图中,仅在2-3张幻灯片中使用AI图像。始终包含钩子(第1张)和CTA(最后1张)。可选在图表幻灯片中使用AI生成图表作为
ai_bg
。绝不在主体幻灯片中使用——视觉疲劳会破坏阅读节奏,并占用40%的内容空间。
图像放置决策矩阵
| 幻灯片类型 | 是否使用AI图像 | 区域 | 原因 |
|---|---|---|---|
| `hook` | 始终 | `ai_bg`(全屏+0.60-0.68叠加层) | 停滑能力:氛围图像+排版>仅排版(公理1、3) |
| `body` | 绝不 | -- | 文本承载核心;图像占用40%内容空间,收益极小 |
| `diagram` | 优先 | `ai_bg`(全屏+0.55-0.65叠加层) | Gemini 3 Pro生成的生产级流程图带清晰标签、箭头和框,视觉效果远优于基础TikZ。TikZ作为简单流程备选。 |
| `synthesis` | 绝不 | -- | 编号要点即为核心内容;保持纯文本+渐变背景 |
| `cta` | 始终 | `ai_bg`(全屏+0.65-0.70叠加层) | 情绪收尾:氛围图像营造结束感 |
一致性提示词工程:单个轮播图中的所有AI图像必须共享一致的风格前缀。根据内容垂直领域构建前缀:
| 内容垂直领域 | AI图像提示词风格前缀 |
|---|---|
| 心态/哲学 | "暖色调,羊皮纸奶油色,水彩或古典艺术风格,柔和赤陶色强调,编辑级质量" |
| 科技/AI | "深靛蓝和紫色调,微妙几何图案,简洁数字艺术,霓虹强调,未来感" |
| 商业/战略 | "暖琥珀和金色调,醒目专业图形,丰富深度,自信有活力" |
| 教育 | "简洁白蓝调,扁平化插图风格,精准清晰,极简现代" |
| 创意/设计 | "深炭黑+醒目强调色,艺术感表现力,画廊级质量,刻意构图" |
文本可读性不可侵犯:若使用
ai_bg
(全屏),叠加层透明度必须确保WCAG AA对比度(4.5:1)。最小
overlay_opacity: 0.55
。已验证范围:钩子0.60-0.68,图表0.55-0.65,CTA0.65-0.70。
禁止生成:通用库存照片风格图像(办公室人物、握手、通用风景)。若图像可用于任何主题,则未通过「目的测试」。

AI Visual Generation (via generate-image skill)

AI视觉生成(通过generate-image工具)

Generate AI images for hook backgrounds, CTA backgrounds, and diagram visuals using the
generate-image
skill (requires
AI_GATEWAY_API_KEY
):
使用
generate-image
工具为钩子背景、CTA背景及图表视觉生成AI图像(需
AI_GATEWAY_API_KEY
):

Hook background -- cinematic, atmospheric, scroll-stopping

钩子背景——电影级、氛围感、停滑

python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Dramatic cinematic split-screen composition, glowing neon circuits on dark background, \
  volumetric lighting, deep indigo and electric purple tones, no text, no words, no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_bg.png

CTA background -- emotional close

CTA背景——情绪收尾

python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Abstract convergence of light streams on dark background, warm golden highlights, \
  sense of resolution and completeness, cinematic atmosphere, no text, no words" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/cta_bg.png

Diagram as AI image (preferred over TikZ for complex flows)

图表AI图像(优先于TikZ,适用于复杂流程)

python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Professional flowchart: Data Collection box connects to Processing box connects to Output box, \
  clean white boxes on dark blue background, arrows between nodes, minimal corporate design" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/diagram_bg.png

**Key rules**: Always add "no text, no words, no letters" unless the image IS a diagram with labels. Use hyper-detailed prompts (50+ words) for best results.

Viral Hook Compositing Pipeline (PIL)

病毒式钩子合成流程(PIL)

For viral-style hook slides matching accounts like @evolving.ai and @therundownai, use a two-step pipeline:
Step 1: Generate cinematic base image with Gemini 3 Pro (topic-specific, dramatic composition):
为生成匹配@evolving.ai和@therundownai等账号的病毒式钩子幻灯片,使用两步流程:
步骤1:用Gemini 3 Pro生成电影级基础图像(主题相关,戏剧性构图):

Multi-person composition (best for news/war/rivalry topics)

多人合成(最适用于新闻/战争/竞争主题)

python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Cinematic photomontage: three powerful figures in dramatic formation, \
  center figure is a humanoid AI robot with glowing eyes, flanking figures \
  are business leaders in dark suits, red and blue dramatic lighting, \
  dark moody background, editorial magazine composition, hyper-detailed" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png

Single portrait (best for profile/biography/interview topics)

单人肖像(最适用于个人简介/传记/访谈主题)

python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Editorial portrait: distinguished elder with glasses, warm ambient lighting, \
  slightly blurred conference background, shallow depth of field, \
  photojournalistic style, natural expression, cinematic color grading" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png

Face-off composition (best for comparison/versus topics)

对决构图(最适用于对比/对决主题)

python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Dramatic face-off: two opposing figures in profile facing each other, \
  one in cool blue lighting one in warm orange, city skyline between them, \
  energy effects and particles, dark cinematic atmosphere, epic confrontation" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/hook_base.png

**Step 2a: News-editorial style** (matches @therundownai -- single person, big headline):
```bash
python3 scripts/compose_news_hook.py \
  --base tmp/carousel/hook_base.png \
  --output tmp/carousel/slide_01_hook.png \
  --headline 'OpenAI just hit $13B ARR making it the fastest-growing software company in history' \
  --category "AI NEWS" \
  --brand "@DailyAINews"
```
The
compose_news_hook.py
script (editorial style):
  • Subtle bottom gradient (ease-in, configurable start/strength)
  • Small category label above headline
  • MASSIVE bold headline (Inter Black, auto-sized 42-72px to fill bottom 35%)
  • Optional brand mark top-left
  • Clean, minimal -- no slide counter, no CTA, no subhead
  • Best for: single-person portrait + news headline
Step 2b: Multi-person viral style (compose_hook.py -- multi-person, full overlay):
bash
python3 scripts/compose_hook.py \
  --base tmp/carousel/hook_base.png \
  --output tmp/carousel/slide_01_hook.png \
  --headline "THE AI WAR JUST ESCALATED" \
  --subhead "3 moves that changed everything this week" \
  --brand "YOUR BRAND" \
  --category "AI NEWS"
The
compose_hook.py
script (viral style):
  • Bottom gradient overlay (0-220 alpha, ease-in curve) for text readability
  • Light top gradient for brand area
  • Category label (upper-left, e.g., "AI NEWS")
  • Brand watermark (centered)
  • Word-wrapped bold headline (bottom area, all-caps)
  • Optional subheadline
  • "SWIPE FOR MORE" CTA with decorative line
  • Slide counter (top-right, "1/8")
  • Best for: multi-person compositions, face-off style
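The "0-220 alpha, ease-in curve" overlay described above can be illustrated without any imaging library. The script's source is not shown here, so the quadratic exponent and the 45% start position below are assumptions, and `gradient_alpha` is a hypothetical name.

```python
def gradient_alpha(y, height, start=0.45, max_alpha=220, power=2.0):
    """Alpha of the dark bottom overlay at pixel row y (0 = top of the image).

    Rows above `start * height` are untouched; below that the alpha rises
    slowly at first and steeply near the bottom edge (ease-in).
    """
    top = start * height
    if y <= top:
        return 0
    t = min((y - top) / (height - 1 - top), 1.0)  # 0.0 at start, 1.0 at last row
    return round(max_alpha * t ** power)


# One alpha value per row of a 1080x1350 slide.
alphas = [gradient_alpha(y, 1350) for y in range(1350)]
```

With Pillow, such a column of values could be turned into an `L`-mode mask and composited over the base image; that step is omitted to keep the sketch dependency-free.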
Prompt Strategy by Topic Type:
| Topic Type | Base Image Style | Score |
|---|---|---|
| News/current events | Multi-person photomontage + robot | 8.5/10 |
| Comparison/versus | Face-off composition with opposing energy | 8.5/10 |
| Profile/biography | Single editorial portrait | 8/10 |
| Tools/abstract | Silhouette with holographic/tech backdrop | 7.5/10 |
For educational/tutorial/framework topics, AI-generated compositions work excellently (8-8.5/10).

**步骤2a:新闻编辑风格**(匹配@therundownai,单人+大标题):
```bash
python3 scripts/compose_news_hook.py \
  --base tmp/carousel/hook_base.png \
  --output tmp/carousel/slide_01_hook.png \
  --headline 'OpenAI just hit $13B ARR making it the fastest-growing software company in history' \
  --category "AI NEWS" \
  --brand "@DailyAINews"
```
compose_news_hook.py
脚本(编辑风格):
  • 微妙底部渐变(缓入,可配置起始/强度)
  • 标题上方的小类别标签
  • 超大粗体标题(Inter Black,自动调整42-72px以填充底部35%)
  • 可选左上角品牌标识
  • 简洁极简——无幻灯片计数器、无CTA、无副标题
  • 最佳场景:单人肖像+新闻标题
步骤2b:多人病毒式风格(compose_hook.py——多人+全叠加层):
bash
python3 scripts/compose_hook.py \
  --base tmp/carousel/hook_base.png \
  --output tmp/carousel/slide_01_hook.png \
  --headline "THE AI WAR JUST ESCALATED" \
  --subhead "3 moves that changed everything this week" \
  --brand "YOUR BRAND" \
  --category "AI NEWS"
compose_hook.py
脚本(病毒式风格):
  • 底部渐变叠加层(0-220透明度,缓入曲线)以保证文本可读性
  • 顶部浅渐变用于品牌区域
  • 类别标签(左上角,如"AI NEWS")
  • 品牌水印(居中)
  • 自动换行粗体标题(底部区域,全大写)
  • 可选副标题
  • "SWIPE FOR MORE" CTA,带装饰线
  • 幻灯片计数器(右上角,"1/8")
  • 最佳场景:多人合成、对决风格
按主题类型划分的提示词策略
| 主题类型 | 基础图像风格 | 评分 |
|---|---|---|
| 新闻/时事 | 多人合成+机器人 | 8.5/10 |
| 对比/对决 | 带对立能量的对决构图 | 8.5/10 |
| 个人简介/传记 | 单人编辑肖像 | 8/10 |
| 工具/抽象 | 剪影+全息/科技背景 | 7.5/10 |
教育/教程/框架主题:AI生成合成效果极佳(8-8.5/10)。

Real-Face Hook Pipeline (for news/current events topics)

真实人物钩子流程(适用于新闻/时事主题)

When the topic involves specific real people (Sam Altman, Elon Musk, Jensen Huang, etc.), use web-sourced Creative Commons photos instead of AI generation:
BEST Approach: Base64 multi-image via AI Gateway (10/10)
Send local photos as base64 data URIs to
/api/v1/images/generations
. This bypasses URL accessibility issues (Wikimedia blocked, etc.) and supports ALL local images including 3+ people.
python
import base64, json, os
from pathlib import Path
from urllib import request

API_KEY = os.environ["AI_GATEWAY_API_KEY"]
BASE = "https://ai-gateway.happycapy.ai/api/v1"  # NOT /openai/v1 !
当主题涉及特定真实人物(Sam Altman、Elon Musk、Jensen Huang等)时,使用网络来源的知识共享(Creative Commons)照片替代AI生成:
最佳方案:通过AI Gateway的Base64多图像合成(10/10)
将本地照片转换为Base64数据URI,发送至
/api/v1/images/generations
。此方法绕过URL可访问性问题(如维基媒体被屏蔽),支持3人以上合成。
python
import base64, json, os
from pathlib import Path
from urllib import request

API_KEY = os.environ["AI_GATEWAY_API_KEY"]
BASE = "https://ai-gateway.happycapy.ai/api/v1"  # 不是/openai/v1!

Load photos as base64 data URIs

将照片加载为Base64数据URI

images_b64 = []
for photo in ["elon_musk.jpg", "jensen_huang.jpg", "sam_altman.jpg"]:
    data = base64.b64encode(Path(photo).read_bytes()).decode()
    images_b64.append(f"data:image/jpeg;base64,{data}")

payload = {
    "model": "google/gemini-3-pro-image-preview",
    "prompt": (
        "Create a dramatic face-off style composition with these three tech leaders. "
        "Confrontational layout, intense red vs blue split lighting, dark background "
        "with smoke/particle effects. Faces must remain photorealistic and recognizable."
    ),
    "images": images_b64,
    "response_format": "url",
    "n": 1,
}

req = request.Request(
    f"{BASE}/images/generations",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "Origin": "https://trickle.so",
    },
    method="POST",
)
with request.urlopen(req, timeout=180) as resp:
    result = json.loads(resp.read())
    img_url = result["data"][0]["url"]
    # Download and save...

CRITICAL: Use `/api/v1/images/generations` (NOT `/api/v1/openai/v1/images/generations`). The OpenAI-prefixed endpoint rejects the `images` parameter.
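The snippet above hardcodes `image/jpeg` in the data URI. If the local photos may also be PNG or WebP files, the MIME type can be derived from the file name instead. This is a sketch only: `to_data_uri` is a hypothetical helper, and whether the gateway accepts types other than JPEG is not confirmed here.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def to_data_uri(path):
    """Encode a local image file as a base64 data URI with a matching MIME type."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image file: {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Demo with a throwaway file; real photos would come from the sourcing step.
sample = Path(tempfile.mkdtemp()) / "sample.png"
sample.write_bytes(b"\x89PNG\r\n\x1a\n")
sample_uri = to_data_uri(sample)
```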

**Alternative: transform_image.py with Flickr URLs (9.5/10)**

When photos are available at Flickr URLs (directly accessible by Vertex AI):

```bash
python3 ~/.claude/skills/generate-image/scripts/transform_image.py \
  "Create a dramatic cinematic photomontage combining these tech leaders. \
  Dark dramatic background with blue and red lighting. Keep faces EXACTLY as they appear." \
  "https://live.staticflickr.com/7832/33377877458_d1a3774615_b.jpg" \
  "https://live.staticflickr.com/5767/30796823531_85932ecaa0_b.jpg" \
  --model "google/gemini-3-pro-image-preview" \
  --output tmp/carousel/hook_base.png
```
Photo sourcing rules:
  • Use Creative Commons (CC BY 2.0+) photos from Flickr, Wikimedia Commons
  • Flickr URLs accessible by Vertex AI; Wikimedia URLs often blocked
  • Use
    urllib.request
    with browser User-Agent for Wikimedia downloads to local files
  • For local-only files (Wikimedia downloads), use the base64 approach above
  • Include CC attribution in carousel caption
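The Wikimedia download rule above can be sketched with `urllib.request` as follows. The User-Agent string and the example URL are placeholders chosen for illustration, and `download_photo` is a hypothetical helper; no request is sent until it is called.

```python
from pathlib import Path
from urllib import request

# Placeholder browser-style User-Agent; any current browser string would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_photo_request(url):
    """Create a request that identifies itself with a browser User-Agent."""
    return request.Request(url, headers={"User-Agent": BROWSER_UA})


def download_photo(url, destination, timeout=60):
    """Download a photo to a local file so it can be sent as a base64 data URI."""
    destination = Path(destination)
    with request.urlopen(build_photo_request(url), timeout=timeout) as response:
        destination.write_bytes(response.read())
    return destination


# Hypothetical URL, used only to show the request headers.
example_request = build_photo_request("https://upload.wikimedia.org/example.jpg")
```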
Fallback: PIL rembg composite (7/10)
bash
pip install rembg  # One-time setup

关键:使用`/api/v1/images/generations`(**不是**`/api/v1/openai/v1/images/generations`)。带OpenAI前缀的端点会拒绝`images`参数。

**替代方案:使用Flickr URL的transform_image.py(9.5/10)**

当照片可通过Flickr URL访问(Vertex AI可直接访问)时:

```bash
python3 ~/.claude/skills/generate-image/scripts/transform_image.py \
  "Create a dramatic cinematic photomontage combining these tech leaders. \
  Dark dramatic background with blue and red lighting. Keep faces EXACTLY as they appear." \
  "https://live.staticflickr.com/7832/33377877458_d1a3774615_b.jpg" \
  "https://live.staticflickr.com/5767/30796823531_85932ecaa0_b.jpg" \
  --model "google/gemini-3-pro-image-preview" \
  --output tmp/carousel/hook_base.png
```
照片来源规则
  • 使用Flickr、维基媒体共享的知识共享(CC BY 2.0+)照片
  • Flickr URL可被Vertex AI访问;维基媒体URL常被屏蔽
  • 下载维基媒体照片到本地时,使用
    urllib.request
    搭配浏览器用户代理
  • 本地文件(如维基媒体下载)使用上述Base64方法
  • 在轮播图配文中包含CC署名
备选:PIL rembg合成(7/10)
bash
pip install rembg  # 一次性安装

Remove backgrounds, composite onto AI background, apply compose_hook.py overlay

移除背景,合成到AI背景,应用compose_hook.py叠加层


**nano-banana-pro status:** The native google-genai SDK requires GEMINI_API_KEY (not set). The AI Gateway has no Gemini-native endpoint, so routing the SDK through the gateway fails (404). The base64 approach above achieves the same multi-image composition capability via the AI Gateway's image generation endpoint.

**nano-banana-pro状态**:原生google-genai SDK需`GEMINI_API_KEY`(未设置)。AI Gateway无Gemini原生端点,因此通过网关路由SDK会失败(404)。上述Base64方法通过AI Gateway的图像生成端点实现相同的多图像合成能力。

PHASE 4: MUSIC SELECTION

阶段4:音乐选择

Select from Instagram's available music library. Do NOT generate music. Apply the Music Decision Matrix to recommend 2-3 specific tracks the user can search for on Instagram.
从Instagram可用音乐库中选择。不生成音乐。应用「音乐决策矩阵」,推荐2-3首用户可在Instagram搜索到的具体曲目。

PHASE 5: QUALITY REVIEW & EXPORT

阶段5:质量审核与导出

Run the final checklist (see APPENDIX) against every slide. Re-render any slide that fails. Output:
  • 7-10 slide PNG images at 1080x1350
  • Caption text with hashtags
  • Music recommendation (Instagram track names + artists)
  • Posting notes (best time, engagement strategy)
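The export requirements above (7-10 PNG files at 1080x1350) can be verified from the PNG header alone, without an imaging library. A minimal sketch, with `png_size` and `check_export` as hypothetical helper names:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (1080, 1350)


def png_size(header):
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width and height as big-endian uint32.
    return struct.unpack(">II", header[16:24])


def check_export(paths):
    """Return a list of problems with the exported slide files."""
    problems = []
    if not 7 <= len(paths) <= 10:
        problems.append(f"expected 7-10 slides, got {len(paths)}")
    for path in paths:
        with open(path, "rb") as handle:
            size = png_size(handle.read(24))
        if size != EXPECTED_SIZE:
            problems.append(f"{path}: {size[0]}x{size[1]}, expected 1080x1350")
    return problems


# A synthetic header for a 1080x1350 image, for demonstration only.
fake_header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1080, 1350)
```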

对每张幻灯片执行最终检查清单(见附录)。重新渲染未通过的幻灯片。输出:
  • 7-10张1080x1350 PNG图像
  • 带话题标签的配文文本
  • 音乐推荐(Instagram曲目名称+艺术家)
  • 发布提示(最佳时间、互动策略)

THE 6 FOUNDATIONAL AXIOMS

6条基础公理

Every decision in this skill traces back to these irreducible premises:
本工具的所有决策均源自这些不可简化的前提:

AXIOM 1: Attention is Finite and Contested

公理1:注意力有限且竞争激烈

A human scrolling Instagram makes a stay-or-leave decision in ~1.3 seconds. The first slide is a survival test. Visual pattern interrupts trigger involuntary attention. Cognitive curiosity gaps (Zeigarnik effect) create forward momentum. The cost of starting to swipe is high; the cost of continuing is near-zero.
用户滚动Instagram时,约1.3秒内决定停留或离开。第一张幻灯片是生存测试。视觉模式中断触发非自愿注意力。认知好奇心缺口(蔡格尼克效应)创造前进动力。开始滑动的成本高,持续滑动的成本几乎为零。

AXIOM 2: Value is the Only Sustainable Currency

公理2:价值是唯一可持续的货币

Content that does not leave the viewer materially better off is noise. Save rate is the purest signal of value. Share rate = social currency. "Useful" is domain-specific.
无法让用户获得实质性提升的内容就是噪音。保存率是价值的最纯粹信号。分享率=社交货币。「有用」是领域特定的。

AXIOM 3: Visual Cognition Precedes Textual Cognition

公理3:视觉认知优先于文本认知

The brain processes visual information 60,000x faster than text. Color communicates emotion before words. Spatial hierarchy dictates reading order. Consistency creates cognitive fluency. One dominant visual per slide.
大脑处理视觉信息的速度比文本快60,000倍。颜色在文字之前传递情绪。空间层次决定阅读顺序。一致性创造认知流畅性。每张幻灯片一个主导视觉。

AXIOM 4: Narrative Arc is Hardwired

公理4:叙事弧是与生俱来的

Content structured as narrative is retained 22x better than lists. Each slide must resolve the previous curiosity gap AND create the next one. The arc must reach genuine resolution.
结构化叙事的内容留存率比列表高22倍。每张幻灯片必须填补上一个好奇心缺口,并制造下一个。叙事弧必须达到真正的收尾。

AXIOM 5: The Medium Constrains and Enables

公理5:媒介限制与赋能并存

1080x1350 canvas on a 6-inch screen in half-attention. Minimum readable font = 24px. Bottom ~15% occluded by UI. Portrait (4:5) occupies maximum screen real estate.
6英寸屏幕上的1080x1350画布,半注意力状态。最小可读字体=24px。底部约15%被UI遮挡。竖屏(4:5)占用最大屏幕空间。
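The canvas constraints in Axiom 5 translate into a simple layout check. The sketch below treats the "~15%" occlusion as exactly 0.15, which is an approximation; `safe_area` and `fits` are hypothetical helpers.

```python
CANVAS_WIDTH, CANVAS_HEIGHT = 1080, 1350   # 4:5 portrait canvas
UI_OCCLUSION = 0.15                        # bottom share hidden by Instagram UI (approx.)
MIN_FONT_PX = 24                           # minimum readable font size


def safe_area(width=CANVAS_WIDTH, height=CANVAS_HEIGHT, occluded=UI_OCCLUSION):
    """Return (left, top, right, bottom) of the region not covered by UI."""
    return (0, 0, width, int(height * (1 - occluded)))


def fits(text_bottom_y, font_px):
    """True if a text block ends inside the safe area and is large enough to read."""
    bottom = safe_area()[3]
    return text_bottom_y <= bottom and font_px >= MIN_FONT_PX


area = safe_area()
```

Under this assumption, key text should end at or above row 1147, leaving roughly the bottom 200 pixels for non-essential decoration.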

AXIOM 6: Audio Creates Emotional Context

公理6:音频创造情绪语境

Music activates the limbic system independently. Instagram's algorithm rewards music usage with 15-30% more reach. Genre signals tribal identity. Trending audio boosts discovery if it genuinely fits.

音乐独立激活边缘系统。Instagram算法对使用音乐的内容给予15-30%的曝光提升。流派传递群体身份。趋势音频若契合内容,可提升发现率。

THE 7 CAROUSEL ARCHETYPES

7种轮播图原型

Auto-select the best archetype based on the topic. Each archetype has a specific slide structure, value test, and music profile.
根据主题自动选择最佳原型。每个原型有特定的幻灯片结构、价值测试和音乐配置。

1. TUTORIAL (How-To)

1. 教程(操作指南)

Slide 1: Problem statement (hook)
Slide 2: Tool/method introduction
Slide 3: Step 1 (with visual)
Slide 4: Step 2
Slide 5: Step 3
Slide 6: Step 4 (if needed)
Slide 7: Result / proof it works
Slide 8: Common mistakes to avoid
Slide 9: Quick-reference summary (save-worthy)
Slide 10: CTA
Value Test: Can the reader DO the thing after reading?
Music Profile: Lo-fi/chillhop, 70-85 BPM, instrumental
幻灯片1:问题陈述(钩子)
幻灯片2:工具/方法介绍
幻灯片3:步骤1(带视觉)
幻灯片4:步骤2
幻灯片5:步骤3
幻灯片6:步骤4(若需要)
幻灯片7:结果/效果证明
幻灯片8:常见错误规避
幻灯片9:快速参考总结(值得保存)
幻灯片10:CTA
价值测试:读者阅读后能否完成操作? 音乐配置:Lo-fi/chillhop,70-85 BPM,纯器乐

2. FRAMEWORK (Mental Model)

2. 框架(思维模型)

Slide 1: Common problem everyone faces (hook)
Slide 2: Why existing approaches fail
Slide 3: The framework name + overview
Slide 4: Component 1 explained
Slide 5: Component 2 explained
Slide 6: Component 3 explained
Slide 7: How the components connect (diagram)
Slide 8: Practical application example
Slide 9: The complete framework visual (save-worthy)
Slide 10: CTA
Value Test: Does the reader now have a reusable thinking tool?
Music Profile: Minimal electronic, 90-110 BPM, instrumental
幻灯片1:所有人都面临的常见问题(钩子)
幻灯片2:现有方法为何失效
幻灯片3:框架名称+概述
幻灯片4:组件1解读
幻灯片5:组件2解读
幻灯片6:组件3解读
幻灯片7:组件连接方式(图表)
幻灯片8:实际应用示例
幻灯片9:完整框架视觉(值得保存)
幻灯片10:CTA
价值测试:读者是否获得可复用的思维工具? 音乐配置:极简电子,90-110 BPM,纯器乐

3. MYTH-BUSTER (Contrarian Insight)

3. 谣言粉碎机(逆向洞察)

Slide 1: "Everyone thinks X" (hook)
Slide 2: "Here's what's actually happening"
Slide 3: Evidence 1
Slide 4: Evidence 2
Slide 5: Evidence 3
Slide 6: The real framework / truth
Slide 7: Implications
Slide 8: What to do instead
Slide 9: The mental model shift (save-worthy)
Slide 10: CTA
Value Test: Has the reader's mental model shifted?
Music Profile: Trip-hop/downtempo, 85-100 BPM, instrumental
幻灯片1:「所有人都认为X」(钩子)
幻灯片2:「实际情况是这样」
幻灯片3:证据1
幻灯片4:证据2
幻灯片5:证据3
幻灯片6:真实框架/真相
幻灯片7:影响
幻灯片8:替代方案
幻灯片9:思维模型转变(值得保存)
幻灯片10:CTA
价值测试:读者的思维模型是否转变? 音乐配置:Trip-hop/downtempo,85-100 BPM,纯器乐

4. CASE STUDY (Proof-Based)

4. 案例研究(基于证据)

Slide 1: The result / shocking metric (hook)
Slide 2: The context / starting point
Slide 3: What was done (overview)
Slide 4: Step 1 of the process
Slide 5: Step 2
Slide 6: Step 3
Slide 7: The data / proof
Slide 8: Key insight
Slide 9: How you can replicate it (save-worthy)
Slide 10: CTA
Value Test: Is the specific mechanism replicable?
Music Profile: Upbeat electronic, 110-120 BPM, light vocals OK
幻灯片1:结果/惊人数据(钩子)
幻灯片2:背景/起点
幻灯片3:实施内容(概述)
幻灯片4:流程步骤1
幻灯片5:步骤2
幻灯片6:步骤3
幻灯片7:数据/效果证明
幻灯片8:核心洞察
幻灯片9:复制方法(值得保存)
幻灯片10:CTA
价值测试:具体机制是否可复制? 音乐配置: upbeat电子,110-120 BPM,可带轻量人声

5. CURATED LIST (Resource Compilation)

5. 精选列表(资源汇总)

Slide 1: "X Tools/Resources for Y" (hook)
Slide 2: Item 1 + why it's valuable
Slide 3: Item 2 + why
Slide 4: Item 3 + why
Slide 5: Item 4 + why
Slide 6: Item 5 + why
Slide 7: Item 6 + why (if needed)
Slide 8: Item 7 + why (if needed)
Slide 9: Comparison / selection guide (save-worthy)
Slide 10: CTA
Value Test: Can the reader immediately use at least 3 of these?
Music Profile: Chill beats/lo-fi, 75-90 BPM, instrumental
幻灯片1:「Y所需的X工具/资源」(钩子)
幻灯片2:项目1+价值说明
幻灯片3:项目2+价值说明
幻灯片4:项目3+价值说明
幻灯片5:项目4+价值说明
幻灯片6:项目5+价值说明
幻灯片7:项目6+价值说明(若需要)
幻灯片8:项目7+价值说明(若需要)
幻灯片9:对比/选择指南(值得保存)
幻灯片10:CTA
价值测试:读者能否立即使用至少3项资源? 音乐配置:Chill beats/lo-fi,75-90 BPM,纯器乐

6. DEEP DIVE (Technical Explanation)

6. 深度解析(技术说明)

Slide 1: The concept + why it matters (hook)
Slide 2: What most people get wrong
Slide 3: How it actually works (simplified)
Slide 4: Visual diagram / mechanism
Slide 5: Practical example 1
Slide 6: Practical example 2
Slide 7: Common mistakes
Slide 8: Pro tips
Slide 9: The complete mental model (save-worthy)
Slide 10: CTA
Value Test: Does the reader understand the mechanism, not just the surface?
Music Profile: Ambient/atmospheric, 60-80 BPM, instrumental only
幻灯片1:概念+重要性(钩子)
幻灯片2:多数人理解错误的点
幻灯片3:实际工作原理(简化版)
幻灯片4:视觉图表/机制
幻灯片5:实际示例1
幻灯片6:实际示例2
幻灯片7:常见错误
幻灯片8:专业技巧
幻灯片9:完整思维模型(值得保存)
幻灯片10:CTA
价值测试:读者是否理解机制,而非仅表面内容? 音乐配置:Ambient/atmospheric,60-80 BPM,仅纯器乐

7. TRANSFORMATION (Before/After)

7. 转型(前后对比)

Slide 1: The "after" result (hook)
Slide 2: The "before" state / the pain
Slide 3: The discovery / turning point
Slide 4: The change in approach
Slide 5: Step 1 of the new way
Slide 6: Step 2
Slide 7: Step 3
Slide 8: The complete "after" state with proof
Slide 9: How to start your transformation (save-worthy)
Slide 10: CTA
Value Test: Can the reader see themselves in the transformation?
Music Profile: Progressive/building, 80-120 BPM arc, light vocals OK

幻灯片1:「之后」的结果(钩子)
幻灯片2:「之前」的状态/痛点
幻灯片3:发现/转折点
幻灯片4:方法转变
幻灯片5:新方法步骤1
幻灯片6:步骤2
幻灯片7:步骤3
幻灯片8:完整「之后」状态+证明
幻灯片9:如何开启转型(值得保存)
幻灯片10:CTA
价值测试:读者能否在转型中看到自己的影子? 音乐配置:Progressive/building,80-120 BPM弧,可带轻量人声

THE 6 HOOK PATTERNS

6种钩子模式

The first slide determines everything. Select the best hook pattern for the topic:
第一张幻灯片决定一切。根据主题选择最佳钩子模式:

1. The Curiosity Gap

1. 好奇心缺口

"Claude Code has a memory problem. Here's how to fix it for free."
States a problem the audience recognizes + promises a solution. Optionally removes an objection ("for free", "in 5 minutes").
"Claude Code存在内存问题。这是免费解决方法。"
陈述受众认可的问题+承诺解决方案。可移除异议(如"免费"、"5分钟内")。

2. The Contrarian Statement

2. 逆向声明

"Stop using RAG. There's a better way."
Contradicts a common belief. Creates cognitive dissonance that demands resolution.
"停止使用RAG。有更好的方法。"
与普遍认知相悖。制造认知失调,迫使读者寻求答案。

3. The Specific Result

3. 具体结果

"This setup saved me 4 hours per week of prompt debugging."
Concrete numbers bypass the vague-promise filter. Specificity = credibility.
"此设置每周为我节省4小时的提示词调试时间。"
具体数字绕过模糊承诺过滤器。具体性=可信度。

4. The Analogy Bridge

4. 类比桥梁

"Your AI agent's memory works like a messy desk. Here's how to organize it."
Maps unfamiliar onto familiar. Creates instant comprehension.
"你的AI Agent内存就像杂乱的书桌。这是整理方法。"
将陌生概念映射到熟悉事物。瞬间理解。

5. The "You're Doing It Wrong"

5. 「你做错了」

"90% of developers use Claude Code wrong. Are you one of them?"
Identity-based challenge. Use sparingly -- dangerous if overused.
"90%的开发者错误使用Claude Code。你是其中之一吗?"
基于身份的挑战。谨慎使用——过度使用会引发反感。

6. The Stack / Combination

6. 组合/叠加

"Obsidian + Claude Code = unlimited AI memory"
Two known things combined unexpectedly. The "+" implies synergy.

"Obsidian + Claude Code = 无限AI内存"
将两个已知事物意外结合。「+」意味着协同效应。

THE BULLSHIT TEST (Mandatory Quality Gate)

废话检测(强制质量关卡)

Every single slide must pass ALL 3 conditions before rendering. No exceptions.
每张幻灯片在渲染前必须通过全部3项条件。无例外。

Condition 1: SPECIFICITY

条件1:具体性

Does this contain a concrete, actionable insight that could NOT be guessed by someone with zero domain knowledge?
  • FAIL: "Use the right tools for the job"
  • PASS: "Obsidian's graph view lets Claude Code traverse 10x more documents by following wiki-links between markdown files"
内容是否包含具体、可操作的洞察?领域外人士无法猜测到的内容?
  • 不通过:"使用适合的工具"
  • 通过:"Obsidian的图谱视图让Claude Code通过Markdown文件间的维基链接,遍历10倍多的文档"

Condition 2: NOVELTY

条件2:新颖性

Does this present a connection, framework, or technique the viewer has likely NOT encountered before?
  • FAIL: "AI is changing the world"
  • PASS: "By creating bidirectional links between your docs, you turn Claude Code's context window into a navigation system instead of a storage container"
内容是否呈现读者可能从未接触过的关联、框架或技巧?
  • 不通过:"AI正在改变世界"
  • 通过:"通过在文档间创建双向链接,你将Claude Code的上下文窗口从存储容器转变为导航系统"

Condition 3: DENSITY

条件3:密度

Could the same information be compressed further without loss of meaning? If yes, it is padded and needs to be tightened.
  • FAIL: "There are many benefits to using this approach, including several key advantages that make it worthwhile"
  • PASS: "3 benefits: 10x doc navigation, auto-linked memory, zero-config setup"
If a slide fails any condition, rewrite it before rendering.

相同信息能否进一步压缩而不丢失含义?若可以,则内容冗余,需要精简。
  • 不通过:"使用此方法有诸多好处,包括数个关键优势,值得一试"
  • 通过:"3个优势:10倍文档导航、自动链接内存、零配置设置"
若幻灯片未通过任何条件,渲染前重写。

VISUAL DESIGN SYSTEM

视觉设计系统

Typography Hierarchy

排版层次

| Element | Size | Weight | Font Type |
|---|---|---|---|
| Slide Title | 64-80px | Bold/Black (700-900) | Strong serif OR geometric sans |
| Subtitle / Hook | 32-40px | SemiBold (600) | Same family as title |
| Body Text | 24-28px | Regular (400) | Clean sans-serif |
| Bullet Points | 22-26px | Regular (400) | Same as body |
| Labels / Citations | 16-20px | Light (300) | Same as body |
| Slide Indicator | 14-16px | Light (300) | Sans-serif |
Rules:
  • Maximum 2 fonts per carousel
  • Title font and body font must pair well
  • NEVER go below 24px for any text the reader must understand
  • Consistent across ALL slides
元素尺寸字重字体类型
幻灯片标题64-80px粗体/特粗体(700-900)醒目衬线或几何无衬线
副标题/钩子32-40px半粗体(600)与标题同系列
主体文本24-28px常规(400)简洁无衬线
列表项22-26px常规(400)与主体同系列
标签/引用16-20px轻量(300)与主体同系列
幻灯片指示器14-16px轻量(300)无衬线
规则
  • 每个轮播图最多2种字体
  • 标题字体与主体字体必须搭配协调
  • 读者必须理解的文本,字号绝不低于24px
  • 所有幻灯片保持一致

Color Palettes by Content Vertical

按内容垂直领域划分的调色板

Tech / AI / Coding:
  • Background: #0D1117 (deep dark) or #1A1A2E (midnight blue)
  • Primary text: #E6EDF3 (near-white) or #F0F6FC
  • Accent: #7C3AED (electric purple) or #3B82F6 (bright blue)
  • Secondary: #6B7280 (muted gray)
Business / Strategy:
  • Background: Linear gradient #F97316 to #EAB308 (warm amber) or #FFF7ED (cream)
  • Primary text: #1C1917 (near-black)
  • Accent: #DC2626 (confident red) or #F59E0B (gold)
  • Secondary: #78716C (warm gray)
Education / How-To:
  • Background: #FFFFFF (clean white) or #F8FAFC (cool off-white)
  • Primary text: #0F172A (dark slate)
  • Accent: #2563EB (trust blue) or #0EA5E9 (sky blue)
  • Secondary: #64748B (slate gray)
Design / Creative:
  • Background: #18181B (charcoal) or #FAFAFA (near-white)
  • Primary text: Inverse of background
  • Accent: ONE bold color (#EC4899 magenta, #10B981 emerald, or #F59E0B amber)
  • Secondary: #71717A (zinc)
Mindset / Growth:
  • Background: #F5F0EB (warm neutral) or #1B3A2D (forest dark)
  • Primary text: #2D2416 (earth brown) or #E8E0D5 (warm light)
  • Accent: #16A34A (forest green) or #B45309 (amber earth)
  • Secondary: #8B7355 (warm mid-tone)
科技/AI/编程
  • 背景:
    #0D1117
    (深黑)或
    #1A1A2E
    (午夜蓝)
  • 主文本:
    #E6EDF3
    (近白)或
    #F0F6FC
  • 强调色:
    #7C3AED
    (电光紫)或
    #3B82F6
    (亮蓝)
  • 次要色:
    #6B7280
    (灰)
商业/战略
  • 背景:线性渐变
    #F97316
    #EAB308
    (暖琥珀)或
    #FFF7ED
    (奶油色)
  • 主文本:
    #1C1917
    (近黑)
  • 强调色:
    #DC2626
    (醒目红)或
    #F59E0B
    (金色)
  • 次要色:
    #78716C
    (暖灰)
教育/操作指南
  • 背景:
    #FFFFFF
    (纯白)或
    #F8FAFC
    (冷白)
  • 主文本:
    #0F172A
    (深灰)
  • 强调色:
    #2563EB
    (信任蓝)或
    #0EA5E9
    (天蓝色)
  • 次要色:
    #64748B
    (灰)
设计/创意
  • 背景:
    #18181B
    (炭黑)或
    #FAFAFA
    (近白)
  • 主文本:与背景反色
  • 强调色:一种醒目颜色(
    #EC4899
    洋红、
    #10B981
    祖母绿或
    #F59E0B
    琥珀)
  • 次要色:
    #71717A
    (灰)
心态/成长
  • 背景:
    #F5F0EB
    (暖中性)或
    #1B3A2D
    (深绿)
  • 主文本:
    #2D2416
    (土棕)或
    #E8E0D5
    (暖白)
  • 强调色:
    #16A34A
    (森林绿)或
    #B45309
    (琥珀棕)
  • 次要色:
    #8B7355
    (暖灰)

Layout Rules

布局规则

  1. Canvas: 1080 x 1350 px (4:5 portrait) -- ALWAYS
  2. Margins: 60px minimum on all sides
  3. Safe Zone: Center 80% (top/bottom 10% may be occluded by Instagram UI)
  4. One idea per slide: If a slide has two ideas, split it into two slides
  5. Visual anchor: Every slide needs ONE dominant visual element
  6. Breathing room: Content should never feel cramped -- generous whitespace signals quality
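
The layout rules above reduce to simple arithmetic. The following is a minimal sketch, assuming the safe zone is the middle 80% of the canvas height and that each side takes the stricter of the margin and the occlusion band. The name `content_box` is illustrative and not part of the skill's scripts.

```python
from typing import NamedTuple

CANVAS_W, CANVAS_H = 1080, 1350   # 4:5 portrait canvas from the layout rules
MARGIN = 60                       # minimum margin on all sides, in px
SAFE_PERCENT = 80                 # center 80% of the height is the safe zone


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def content_box(width: int = CANVAS_W, height: int = CANVAS_H) -> Box:
    """Return the rectangle that respects both the margins and the safe zone."""
    # Top and bottom bands that Instagram UI may cover (10% each for 80% safe).
    band = height * (100 - SAFE_PERCENT) // 200
    inset_y = max(MARGIN, band)   # the stricter of margin and occlusion band
    return Box(MARGIN, inset_y, width - MARGIN, height - inset_y)


box = content_box()
print(box, "usable:", box.right - box.left, "x", box.bottom - box.top)
```

Under these assumptions the occlusion band (135px) is larger than the 60px margin, so the vertical inset is set by the safe zone and the horizontal inset by the margin.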

  1. 画布:1080 x 1350 px(4:5竖屏)——始终
  2. 边距:所有侧边最小60px
  3. 安全区域:中间80%(顶部/底部10%可能被Instagram UI遮挡)
  4. 每张一个观点:若一张幻灯片有两个观点,拆分为两张
  5. 视觉锚点:每张幻灯片需要一个主导视觉元素
  6. 呼吸空间:内容绝不能拥挤——充足留白彰显质量

MUSIC SELECTION (Instagram Library)

音乐选择(Instagram库)

Do NOT generate music. Recommend specific tracks available on Instagram's music library.
不生成音乐。推荐Instagram音乐库中可搜索到的具体曲目。

Music Decision Matrix

音乐决策矩阵

| Content Type | Search Keywords on Instagram | BPM Range | Vocals | Example Tracks to Search |
|---|---|---|---|---|
| Tech / AI | "lo-fi", "chill beats", "trip-hop" | 70-90 | No | DJ Shadow - Six Days, Nujabes - Aruarian Dance, Tycho - A Walk, Bonobo - Kerala |
| Business | "indie electronic", "future bass" | 100-120 | Minimal | ODESZA - A Moment Apart, Rufus Du Sol - Innerbloom, Bicep - Glue |
| Tutorial | "study beats", "chillhop", "acoustic" | 75-95 | No | Idealism - Lovely Day, Jinsang - Solitude, Tomppabeats - Monday Loop |
| Motivational | "epic", "cinematic", "uplifting" | 110-130 | Optional | M83 - Midnight City, Hans Zimmer - Time, Illenium - Good Things Fall Apart |
| Creative | "minimal techno", "ambient", "art" | 90-115 | No | Four Tet - Two Thousand and Seventeen, Jon Hopkins - Emerald Rush, Kiasmos - Blurred |
| Myth-Buster | "dark ambient", "post-rock", "mysterious" | 80-100 | No | Massive Attack - Teardrop, Radiohead - Everything In Its Right Place, Portishead - Wandering Star |
| Case Study | "upbeat", "indie pop", "electronic" | 110-125 | Light | Washed Out - Feel It All Around, Toro y Moi - So Many Details, M83 - Wait |
内容类型Instagram搜索关键词BPM范围人声搜索示例曲目
科技/AI"lo-fi"、"chill beats"、"trip-hop"70-90DJ Shadow - Six Days、Nujabes - Aruarian Dance、Tycho - A Walk、Bonobo - Kerala
商业"indie electronic"、"future bass"100-120少量ODESZA - A Moment Apart、Rufus Du Sol - Innerbloom、Bicep - Glue
教程"study beats"、"chillhop"、"acoustic"75-95Idealism - Lovely Day、Jinsang - Solitude、Tomppabeats - Monday Loop
励志"epic"、"cinematic"、"uplifting"110-130可选M83 - Midnight City、Hans Zimmer - Time、Illenium - Good Things Fall Apart
创意"minimal techno"、"ambient"、"art"90-115Four Tet - Two Thousand and Seventeen、Jon Hopkins - Emerald Rush、Kiasmos - Blurred
谣言粉碎机"dark ambient"、"post-rock"、"mysterious"80-100Massive Attack - Teardrop、Radiohead - Everything In Its Right Place、Portishead - Wandering Star
案例研究"upbeat"、"indie pop"、"electronic"110-125轻量Washed Out - Feel It All Around、Toro y Moi - So Many Details、M83 - Wait

Music Selection Rules

音乐选择规则

  1. Text-heavy carousels: ALWAYS instrumental only (vocals compete with reading)
  2. Visual-heavy carousels: Vocals acceptable (separate processing channels)
  3. Trending audio: Use ONLY if it genuinely fits the content type. Mismatched trending sounds damage authenticity
  4. Trending audio lifecycle: Discovery (Day 0-3, max boost) -> Growth (Day 3-14, good) -> Peak (Day 14-30, OK) -> Saturation (Day 30+, skip)
  5. Output format: Provide 2-3 track recommendations with artist name, track name, and why it fits
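
Rules 3 and 4 above can be expressed as a small decision helper. This is a sketch only: the stage boundaries in the rule overlap (day 3, day 14, day 30), so it assumes half-open ranges where the boundary day belongs to the later stage. Both function names are hypothetical.

```python
def trending_audio_stage(days_since_trend_start: int) -> str:
    """Map a sound's age in days to the lifecycle stage from rule 4."""
    if days_since_trend_start < 0:
        raise ValueError("age in days cannot be negative")
    # Boundaries are treated as half-open: day 3 counts as Growth, and so on.
    if days_since_trend_start < 3:
        return "discovery"   # max boost
    if days_since_trend_start < 14:
        return "growth"      # good
    if days_since_trend_start < 30:
        return "peak"        # OK
    return "saturation"      # skip


def should_use_trending_audio(days_since_trend_start: int, fits_content: bool) -> bool:
    """Rules 3 and 4 together: the sound must fit and must not be saturated."""
    return fits_content and trending_audio_stage(days_since_trend_start) != "saturation"
```

A sound that does not fit the content is rejected at any age, which mirrors the priority the rules give to authenticity over reach.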

  1. 文本密集型轮播图:始终使用纯器乐(人声会干扰阅读)
  2. 视觉密集型轮播图:可使用带人声曲目(独立处理通道)
  3. 趋势音频:仅当真正契合内容类型时使用。不匹配的趋势音频会损害真实性
  4. 趋势音频生命周期:发现期(0-3天,最大曝光提升)→增长期(3-14天,良好)→峰值期(14-30天,尚可)→饱和期(30天以上,跳过)
  5. 输出格式:提供2-3首推荐曲目,包含艺术家名称、曲目名称及契合原因

CAPTION TEMPLATE

配文模板

[Hook line -- front-load value, must be compelling in first 2 lines before "...more"]

[2-3 sentences expanding the core value proposition]

[Key points:]
- Point 1 (specific, not vague)
- Point 2
- Point 3

[Specific CTA -- NOT "What do you think?" but rather a specific question or action]

[5-15 hashtags with distribution:]
[2-3 broad (100K-1M posts)] [3-5 niche (10K-100K)] [2-3 community (1K-10K)] [1-2 branded]
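
The hashtag distribution in the template can be checked mechanically before posting. This is a minimal sketch, assuming the caller has already sorted hashtags into the four tiers (the skill does not say how post counts are looked up). `check_hashtag_mix` is an illustrative name, not an existing script.

```python
# Tier bounds taken from the caption template: (min, max) hashtags per tier.
TIER_BOUNDS = {
    "broad": (2, 3),       # 100K-1M posts
    "niche": (3, 5),       # 10K-100K posts
    "community": (2, 3),   # 1K-10K posts
    "branded": (1, 2),
}
TOTAL_BOUNDS = (5, 15)


def check_hashtag_mix(tags_by_tier: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the mix matches the template."""
    problems = []
    for tier, (low, high) in TIER_BOUNDS.items():
        count = len(tags_by_tier.get(tier, []))
        if not low <= count <= high:
            problems.append(f"{tier}: {count} tags, expected {low}-{high}")
    total = sum(len(tags) for tags in tags_by_tier.values())
    if not TOTAL_BOUNDS[0] <= total <= TOTAL_BOUNDS[1]:
        problems.append(f"total: {total} tags, expected {TOTAL_BOUNDS[0]}-{TOTAL_BOUNDS[1]}")
    return problems
```

Note that the per-tier ranges already imply 8 to 13 hashtags in total, so the 5-15 bound only matters when a tier is skipped or overfilled.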

[钩子句——前2行突出价值,「查看更多」前必须引人注目]

[2-3句话扩展核心价值主张]

[核心要点:]
- 要点1(具体,不模糊)
- 要点2
- 要点3

[具体CTA——不是「你怎么看?」,而是具体问题或行动]

[5-15个话题标签,分布如下:]
[2-3个宽泛标签(10万-100万帖子)] [3-5个细分标签(1万-10万帖子)] [2-3个社区标签(1千-1万帖子)] [1-2个品牌标签]

INSTAGRAM ALGORITHM OPTIMIZATION

Instagram算法优化

  • Save Rate is the #1 signal. Design every carousel to be save-worthy. Include a synthesis/mental-model slide.
  • 10-slide carousels outperform shorter ones by ~30% in save rate
  • Dwell time: More slides = more time on post = algorithm reward
  • Music adds ~15-30% reach boost
  • Re-engagement: Instagram re-shows carousels to users who did not swipe all the way through
  • First hour: Posts saved within the first hour get exponential distribution
  • Hashtags: Put in caption (not first comment). 5-15 total.

  • 保存率是头号信号。设计每张轮播图以值得保存为目标。包含总结/思维模型幻灯片。
  • 10张幻灯片的轮播图保存率比短轮播图高约30%
  • 停留时间:幻灯片越多=帖子停留时间越长=算法奖励
  • 音乐提升约15-30%的曝光
  • 再互动:Instagram会向未滑完轮播图的用户重新展示内容
  • 首小时:首小时内被保存的帖子获得指数级分发
  • 话题标签:放在配文中(不是第一条评论)。总计5-15个。

RENDERING SCRIPTS

渲染脚本

render_latex_slide.py (PRIMARY RENDERER)

render_latex_slide.py(主渲染器)

Publication-grade LaTeX slide renderer. Produces 1080x1350 PNG slides using pdflatex + pdftoppm.
6 slide types: hook, body, comparison, diagram, synthesis, cta
4 themes: warm, clean, dark, earth
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type body \
  --data body_data.json \
  --output slide.png \
  --theme dark \
  --brand brand_config.json
```
Data fields by slide type:
  • hook: title, title_highlight, subtitle, callout, ai_bg, overlay_opacity, logos[]
  • body: title, title_highlight, body, bullets[], bg_style
  • comparison: title, subtitle, columns[{name, items[{label, value}]}], bg_style
  • diagram: title, description, diagram_nodes[{label, desc}], diagram_type (vertical/horizontal), ai_bg, overlay_opacity
  • synthesis: title, points[], bg_style
  • cta: title, cta_text, handle, stats[], ai_bg, overlay_opacity
  • All types: slide_num, total_slides, show_nav, ai_bg (full-bleed background), overlay_opacity, bg_style
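
As an illustration of how a data file for the `body` type might be prepared, here is a sketch that writes the payload to a temporary JSON file and builds the renderer command. The field names come from the list above; the values, the value types (for example `show_nav` as a boolean), and the helper name `build_render_command` are assumptions. The `--brand` flag is left out for brevity.

```python
import json
import tempfile
from pathlib import Path

RENDERER = Path.home() / ".claude/skills/world-class-carousel/scripts/render_latex_slide.py"


def build_render_command(slide_type: str, data: dict, output: str, theme: str) -> list[str]:
    """Write `data` to a temp JSON file and return the renderer argv for it."""
    # Writing the payload to a file avoids shell-quoting problems with inline JSON.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(data, handle, ensure_ascii=False, indent=2)
    return [
        "python3", str(RENDERER),
        "--type", slide_type,
        "--data", handle.name,
        "--output", output,
        "--theme", theme,
    ]


# Illustrative payload; field names are from the list above, values are made up.
body_data = {
    "title": "Link your notes",
    "title_highlight": "Link",
    "body": "Wiki-links turn a folder of markdown files into a graph.",
    "bullets": ["One note per idea", "Link on first mention"],
    "slide_num": 3,
    "total_slides": 10,
    "show_nav": True,
}

command = build_render_command("body", body_data, "slide_03.png", "dark")
print(" ".join(command))  # pass `command` to subprocess.run(...) to actually render
```

Returning an argument list rather than a shell string means no quoting is needed when the command is eventually executed.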
出版级LaTeX幻灯片渲染器。使用pdflatex+pdftoppm生成1080x1350 PNG幻灯片。
6种幻灯片类型
hook
body
comparison
diagram
synthesis
cta
4种主题
warm
clean
dark
earth
bash
python3 ~/.claude/skills/world-class-carousel/scripts/render_latex_slide.py \
  --type body \
  --data body_data.json \
  --output slide.png \
  --theme dark \
  --brand brand_config.json
按幻灯片类型划分的数据字段
  • hook
    title
    title_highlight
    subtitle
    callout
    ai_bg
    overlay_opacity
    logos[]
  • body
    title
    title_highlight
    body
    bullets[]
    bg_style
  • comparison
    title
    subtitle
    columns[{name, items[{label, value}]}]
    bg_style
  • diagram
    title
    description
    diagram_nodes[{label, desc}]
    diagram_type
    (vertical/horizontal)、
    ai_bg
    overlay_opacity
  • synthesis
    title
    points[]
    bg_style
  • cta
    title
    cta_text
    handle
    stats[]
    ai_bg
    overlay_opacity
  • 所有类型
    slide_num
    total_slides
    show_nav
    ai_bg
    (全屏背景)、
    overlay_opacity
    bg_style

generate_carousel.py (ORCHESTRATOR)

generate_carousel.py(编排器)

End-to-end carousel generation from a JSON spec. Handles slide numbering, rendering, and preview grid assembly.
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
  --spec carousel_spec.json \
  --output-dir outputs/carousel/ \
  --brand brand_config.json
```
从JSON规格端到端生成轮播图。处理幻灯片编号、渲染及预览网格组装。
bash
python3 ~/.claude/skills/world-class-carousel/scripts/generate_carousel.py \
  --spec carousel_spec.json \
  --output-dir outputs/carousel/ \
  --brand brand_config.json

AI Image Generation (via generate-image skill)

AI图像生成(通过generate-image工具)

Use the generate-image skill for all AI images (hook bg, CTA bg, diagram bg). See "AI Visual Generation" section above for examples.
```bash
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Your detailed prompt here, 50+ words, no text no words no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/image.png
```
所有AI图像(钩子背景、CTA背景、图表背景)使用
generate-image
工具生成。见上文「AI视觉生成」部分示例。
bash
python3 ~/.claude/skills/generate-image/scripts/generate_image.py \
  "Your detailed prompt here, 50+ words, no text no words no letters" \
  --model "google/gemini-3-pro-image-preview" --output tmp/carousel/image.png

assemble_carousel.py (ASSEMBLY)

assemble_carousel.py(组装)

Validates 1080x1350, optimizes PNGs, creates preview grid, generates metadata JSON.
```bash
python3 ~/.claude/skills/world-class-carousel/scripts/assemble_carousel.py \
  --input-dir tmp/carousel/ --output-dir outputs/carousel/ --optimize
```
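
The 1080x1350 validation can also be reproduced independently of the script. The sketch below is not how `assemble_carousel.py` is implemented (its source is not shown here); it is a standard-library check that reads the width and height from the PNG header, with hypothetical function names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (1080, 1350)


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR payload starts at byte 16: big-endian uint32 width, then height.
    return struct.unpack(">II", data[16:24])


def is_carousel_sized(data: bytes) -> bool:
    """True when the PNG header declares the 1080x1350 carousel canvas."""
    return png_size(data) == EXPECTED_SIZE
```

Only the first 24 bytes of each file are needed, so a caller could read just that prefix of every slide in the output directory.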
验证1080x1350尺寸、优化PNG、创建预览网格、生成元数据JSON。
bash
python3 ~/.claude/skills/world-class-carousel/scripts/assemble_carousel.py \
  --input-dir tmp/carousel/ --output-dir outputs/carousel/ --optimize

render_slide.py (LEGACY - Pillow-based)

render_slide.py(旧版——基于Pillow)

Pillow-based renderer with 6 layout modes. Superseded by render_latex_slide.py for production use. Still available for quick prototyping without LaTeX dependencies.

基于Pillow的渲染器,含6种布局模式。
render_latex_slide.py
已取代它,用于生产环境。仍可用于无LaTeX依赖的快速原型。

10 WORLD-CLASS DIFFERENTIATORS

10个世界级差异化因素

Apply these to elevate from "good" to "world-class":
  1. Intellectual Density: One INSIGHT per slide, not just one idea (insight = non-obvious connection between two known things)
  2. Visual Craftsmanship: Every pixel intentional. Margins mathematical. Colors from a system.
  3. Hook Specificity: "I tested 1,247 prompts across 6 models" not "5 Tips for Better Prompts"
  4. Narrative Completeness: Each slide creates a question the next answers. Final slide ties back to hook.
  5. Proof Over Claims: Screenshots, before/after comparisons, specific metrics -- not "this is great"
  6. Typography as Design: The way words are sized, spaced, and placed tells the story VISUALLY
  7. Strategic Restraint: Know what to leave OUT. Negative space is a design choice.
  8. Music-Content Resonance: BPM matches reading pace. Genre signals the tribe.
  9. Save-Worthy Synthesis: Last content slide is a mental model / framework diagram worth saving
  10. Authentic Voice: Written as one expert talking to a colleague. Never "content creator voice."

应用这些因素,从「优秀」提升至「世界级」:
  1. 知识密度:每张一个洞察,而非一个观点(洞察=两个已知事物间的非显而易见关联)
  2. 视觉工艺:每个像素都有意图。边距精确。颜色源自系统。
  3. 钩子具体性:"我在6个模型上测试了1247条提示词"而非"5个提示词优化技巧"
  4. 叙事完整性:每张幻灯片制造下一张要回答的问题。最后一张幻灯片呼应钩子。
  5. 证据优先于主张:截图、前后对比、具体数据——而非"这很棒"
  6. 排版即设计:文字的尺寸、间距和放置方式视觉化讲述故事
  7. 战略性克制:知道该省略什么。留白是设计选择。
  8. 音乐-内容共鸣:BPM匹配阅读节奏。流派传递群体身份。
  9. 值得保存的总结:最后一张内容幻灯片是值得保存的思维模型/框架图表
  10. 真实语气:以专家对同事的语气撰写。绝不是「内容创作者语气」。

FINAL CHECKLIST

最终检查清单

Before delivering any carousel, verify ALL of these:
  • First slide passes the 1.3-second scroll-stop test
  • Every slide passes the Bullshit Test (Specific, Novel, Dense)
  • One idea per slide, no exceptions
  • Typography readable at mobile size (24px+ body text)
  • Color palette consistent across all slides
  • Narrative arc complete (tension -> resolution)
  • Each slide creates curiosity for the next
  • Last content slide has a save-worthy synthesis (mental model, framework, diagram)
  • CTA slide is clear and specific
  • Music recommendation matches content type and audience
  • Canvas is 1080x1350 px (4:5 aspect ratio)
  • All content within safe zone (not occluded by UI)
  • Caption front-loads value in first 2 lines
  • 5-15 hashtags with proper distribution (broad + niche + community + branded)
  • Alt-text provided for accessibility
  • Every rendered slide visually inspected (no half-empty slides)
  • Synthesis title < 4 words
  • Synthesis points are flat strings, not dicts
  • JSON data passed via temp files, not inline in bash
  • Hook/CTA slides use ai_bg for visual topics; body slides stay text-only
  • AI images prompted with "no text" to prevent unwanted labels
  • Overlay opacity 0.60-0.68 for hooks, 0.65-0.70 for CTA
  • All KNOWN_ISSUES.md rules checked before delivery
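
Several checklist items concern the slide data itself and can be checked before rendering. The sketch below covers only those items, assuming each slide's data is a plain dict using the field names listed under the renderer. `check_slide_data` is a hypothetical helper, not one of the skill's scripts.

```python
# Overlay opacity ranges from the checklist, keyed by slide type.
OPACITY_RANGES = {"hook": (0.60, 0.68), "cta": (0.65, 0.70)}


def check_slide_data(slide_type: str, data: dict) -> list[str]:
    """Return checklist violations that can be detected from the slide data alone."""
    problems = []
    if slide_type == "synthesis":
        if len(data.get("title", "").split()) >= 4:
            problems.append("synthesis title must be under 4 words")
        if not all(isinstance(point, str) for point in data.get("points", [])):
            problems.append("synthesis points must be flat strings, not dicts")
    if slide_type in OPACITY_RANGES and "overlay_opacity" in data:
        low, high = OPACITY_RANGES[slide_type]
        if not low <= data["overlay_opacity"] <= high:
            problems.append(f"{slide_type} overlay_opacity should be {low:.2f}-{high:.2f}")
    if slide_type == "body" and data.get("ai_bg"):
        problems.append("body slides should stay text-only (no ai_bg)")
    return problems
```

Items such as the scroll-stop test or the visual inspection of rendered slides still need a human or an image-level check.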

交付任何轮播图前,验证所有以下项:
  • 第一张幻灯片通过1.3秒停滑测试
  • 每张幻灯片通过废话检测(具体、新颖、密集)
  • 每张一个观点,无例外
  • 排版在手机尺寸下可读(主体文本24px+)
  • 调色板所有幻灯片一致
  • 叙事弧完整(张力→收尾)
  • 每张幻灯片为下一张制造好奇心
  • 最后一张内容幻灯片是值得保存的总结(思维模型、框架、图表)
  • CTA幻灯片清晰具体
  • 音乐推荐匹配内容类型与受众
  • 宽高比为1080x1350(4:5)
  • 所有内容位于安全区域(未被UI遮挡)
  • 配文前2行突出价值
  • 5-15个话题标签,分布合理(宽泛+细分+社区+品牌)
  • 提供无障碍替代文本(Alt-text)
  • 所有渲染幻灯片经视觉检查(无半空白幻灯片)
  • 总结标题<4词
  • 总结要点为纯字符串,非字典
  • JSON数据通过临时文件传递,而非bash内联
  • 钩子/CTA幻灯片对视觉主题使用
    ai_bg
    ;主体幻灯片保持纯文本
  • AI图像提示词包含「无文本」,避免多余标签
  • 叠加层透明度:钩子0.60-0.68,CTA0.65-0.70
  • 交付前检查所有
    KNOWN_ISSUES.md
    规则

PHASE 6: LEARNING PROTOCOL (Post-Delivery)

阶段6:学习协议(交付后)

After every carousel delivery, update the skill's knowledge base. This system prevents repeating mistakes while staying compact.
每次轮播图交付后,更新工具知识库。此系统避免重复错误,同时保持精简。

Two-Tier Memory Architecture

双层内存架构

Tier 1: KNOWN_ISSUES.md (in this skill directory)
  • MAX 60 lines. Contains ONLY compressed, actionable rules.
  • Format: one-line rules grouped by category. No narratives, no session history.
  • When adding a new rule: check if it supersedes an existing rule. If yes, REPLACE the old rule. Never append duplicates.
  • Read this file at the START of every carousel session to avoid known pitfalls.
Tier 2: session-archives/ directory (in this skill directory)
  • Verbose session logs go here as timestamped files: session-archives/YYYY-MM-DD-topic.md
  • Include: full experiment data, scoring matrices, debug traces, before/after comparisons.
  • These files are NEVER loaded into context unless explicitly requested by the user.
  • They exist as raw data for future deep-dives, not as operational knowledge.
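
The two tiers can be kept honest with a couple of small helpers. This is a sketch under stated assumptions: every line of KNOWN_ISSUES.md counts toward the 60-line cap (including blanks and headings), and the topic part of the archive filename is slugged to lowercase with hyphens. Both function names are illustrative.

```python
import re
from datetime import date

MAX_TIER1_LINES = 60


def tier1_over_budget(known_issues_text: str) -> bool:
    """True when KNOWN_ISSUES.md content exceeds the 60-line cap."""
    return len(known_issues_text.splitlines()) > MAX_TIER1_LINES


def archive_path(topic: str, day: date) -> str:
    """Build a Tier 2 path in the session-archives/YYYY-MM-DD-topic.md form."""
    # Slugging the topic (lowercase, hyphens) is an assumption of this sketch.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"session-archives/{day.isoformat()}-{slug}.md"
```

When `tier1_over_budget` returns true, the replace-don't-append rule applies: an existing rule has to be merged or retired before a new one goes in.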
第一层:
KNOWN_ISSUES.md
(位于此工具目录)
  • 最多60行。仅包含压缩、可操作的规则。
  • 格式:按类别分组的单行规则。无叙事,无会话历史。
  • 添加新规则时:检查是否取代现有规则。若是,替换旧规则。绝不添加重复项。
  • 每次轮播图会话开始时阅读此文件,避免已知陷阱。
第二层:
session-archives/
目录
(位于此工具目录)
  • 详细会话日志以时间戳文件存储:
    session-archives/YYYY-MM-DD-topic.md
  • 包含:完整实验数据、评分矩阵、调试跟踪、前后对比。
  • 除非用户明确请求,否则绝不加载到上下文。
  • 作为原始数据用于未来深度分析,而非操作知识。

After Every Session

每次会话后

  1. Check KNOWN_ISSUES.md -- Does this session reveal a new rule? Add it (max 1 line). Does it supersede an old rule? Replace it.
  2. Archive verbose data -- If the session involved experiments, debugging, or research, write a session archive file.
  3. Compress, don't accumulate -- The goal is a fixed-size knowledge base that gets BETTER over time, not BIGGER.
  1. 检查
    KNOWN_ISSUES.md
    ——本次会话是否揭示新规则?添加(最多1行)。是否取代旧规则?替换。
  2. 归档详细数据——若会话涉及实验、调试或研究,撰写会话归档文件。
  3. 压缩,而非积累——目标是固定大小的知识库,随时间优化,而非变大。

The Compression Principle

压缩原则

Every piece of learning must be compressed to its irreducible form before entering Tier 1:
  • BAD: "In session on March 10, we discovered that passing synthesis points as dicts causes an AttributeError because the renderer at line 870 does escape_latex(pt) directly on each point" (28 words)
  • GOOD: "Synthesis points[] must be FLAT STRINGS, not dicts. Renderer does escape_latex(pt) directly." (12 words)
If you can't compress it to one line, it belongs in Tier 2 (session archive), not Tier 1.
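
To see why the compressed rule in the example holds, consider a stand-in for the renderer's helper. The real `escape_latex` is not shown in this document, so the version below is an assumption: it only relies on the helper calling string methods on each point, which is enough to reproduce the AttributeError for dicts.

```python
def escape_latex(text):
    """Stand-in for the renderer's helper: escapes a few LaTeX special characters."""
    for char in ("&", "%", "#", "_"):
        text = text.replace(char, "\\" + char)
    return text


print(escape_latex("Save rate > 5% & rising"))  # flat string: works

try:
    escape_latex({"text": "Save rate > 5%"})     # dict: has no .replace method
except AttributeError as error:
    print("AttributeError:", error)
```

The one-line rule carries everything needed to avoid the failure; the date, the line number, and the debugging story belong in the session archive.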
所有学习内容必须压缩至不可简化形式,才能进入第一层:
  • 糟糕:"在3月10日的会话中,我们发现传递总结要点为字典时,会引发AttributeError,因为渲染器第870行直接对每个要点执行escape_latex(pt)"(38词)
  • 优秀:"总结
    points[]
    必须为纯字符串,非字典。渲染器直接执行
    escape_latex(pt)
    。"(12词)
若无法压缩为一行,则属于第二层(会话归档),而非第一层。