persona-builder

Influencer Identity

Take a person (their socials, their camera roll, the way they talk, who they want to be online) and run them through a glow-up with real talk. The output is a `persona.md` + portable folder kit + a roadmap to close the gap between where they are now and the persona they want to project.
Sister skill to `build-a-brand`: same uncompromising standards on voice, aesthetic, and specificity, but the brand is a human, not a product. The output is consumed by other Pika skills (ugc-ads, podcast, founder-product-video, app-sizzle, app-store-screens) so the influencer can produce on-brand content anywhere.

Cost transparency gate

Before any paid MCP call, call `mcp__plugin_pika_pika__identity_balance({verbose: true})` once. Surface the current balance, recent burn rate, and remaining runway, then gate the run with this exact message:
Estimated cost: about 600-1,200 credits (~$6-$12) for scrape/transcription if needed, GPT-image-2 mood-board and voice-page atmosphere tiles, MCP HTML image renders, JPG conversion, final html_to_pdf render, and post-render analyze_media PDF QA. This exceeds $5, so Reply `proceed` to continue or `cancel` to stop.
Do not call any paid MCP tool until the user replies `proceed`. If the user replies `cancel`, stop without generating. This is the only yes/no gate; after `proceed`, run the workflow until the next explicit user approval gate for identity/mood-board/PDF content.
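The gate above can be pictured as a small piece of state that is checked before every paid call. The following is a minimal sketch, not the skill's actual implementation: `CostGate`, `fetch_balance`, and the reply normalization (trimming and lowercasing) are assumptions made for illustration, and `fetch_balance` stands in for a wrapper around the `mcp__plugin_pika_pika__identity_balance` tool call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CostGate:
    """Tracks the single up-front yes/no cost gate for one invocation."""

    # Hypothetical wrapper around the identity_balance tool call.
    fetch_balance: Callable[[], dict]
    balance: Optional[dict] = None
    approved: bool = False
    cancelled: bool = False

    def open(self) -> dict:
        """Fetch the balance at most once; later calls reuse the cached result."""
        if self.balance is None:
            self.balance = self.fetch_balance()
        return self.balance

    def record_reply(self, reply: str) -> None:
        """Only the literal words 'proceed' and 'cancel' change the gate state."""
        word = reply.strip().lower()  # assumption: tolerate case and whitespace
        if word == "proceed":
            self.approved = True
        elif word == "cancel":
            self.cancelled = True

    def may_call_paid_tool(self) -> bool:
        """Paid tools stay locked until the balance was shown and the user approved."""
        return self.balance is not None and self.approved and not self.cancelled
```

A caller would run `gate.open()` once, show the estimate message, pass the user's reply to `gate.record_reply(...)`, and check `gate.may_call_paid_tool()` before each scrape, render, or analysis call.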

Tone of the conversation

This is not a packaging exercise. The user came here to look more marketable. That means: be honest, be strategic, be specific. The skill should feel like a friend who books brand deals telling them what to actually fix — not a horoscope and not a Canva mood board factory.
  • Lead with what the inputs prove, not what the user wished they'd hear. If their photos are crappy, say so and tell them how to fix it.
  • Initiate the strategic conversation, don't wait to be asked. The user doesn't always know what's lucrative or what content mix gets paid — bring receipts.
  • Name the gap explicitly: "to look like @[reference], you'd need to add X / drop Y / shoot in Z light." Don't soft-pedal.
  • No empty validation. If the user's bio is generic, the answer isn't "love it!", it's a rewrite.

Full Workflow

Stage 0 — Intake (empty-args menu)

If invoked with no input (no handle, no URLs, no photos, and no relevant prior context), print this menu verbatim as your full response and stop. Do not call any tool. Wait for the user's next message.
Glow-up time. This is going to whip you into a more marketable shape online — honest read on what's working, what's not, where the money actually is for someone with your inputs, and what to fix to get there. You'll leave with a designed Influencer Persona doc + a roadmap to actually become that persona.
Light-touch from you — drop files or paste links:
  • IG / TikTok / YouTube handle — I'll pull your bio, recent posts, and captions. This is the strongest signal and the most important input.
  • 16+ individual full-res photos that feel like "you" — drag-drop the actual files. If you gave active socials, scraped post imagery can cover some of this; if you have no socials, the full PDF needs enough real photos for the mood board and the voice-mode pages.
  • Any URLs that show your taste — Pinterest, Spotify, Depop, Letterboxd, etc. Drop as many or as few as you want.
  • Describe yourself + how you want to come off on social — a few sentences in your own words. What you're actually like, and what you want people to see when they land on your profile.
  • No socials yet? Say "start from scratch" and I'll ask you everything. Before the mood board / PDF stages, I'll still ask for 16+ real photos so the visuals are grounded in you.
Handles + a few photos is the strongest combo. Skip what doesn't apply. Nothing here requires screenshotting apps, drafting long paragraphs, or recording video.
If the user already dropped one of the above, skip the menu and proceed straight to Step 1.

Step 1 — Read the input

Before supported social scrape/transcription or any other paid MCP call in this step, ensure the Cost transparency gate above has been completed. Run that gate only if it has not already run in this invocation; do not call `mcp__plugin_pika_pika__identity_balance` again, and do not ask for a second `proceed` after the user has already approved the upfront cost gate.
Open with a brief agenda so the user knows what's coming. Set the tone — this is a glow-up, not a packaging job:
quick framing: this isn't a packaging exercise. i'm going to be honest with you about what's working in your current presence, what isn't, where the lucrative paths are for someone with your inputs, and exactly what to fix to get there. by the end you'll have a designed influencer persona doc + a real roadmap.

here's how this works — 5 steps:
1. **Read the input** — i scrape your socials (if you gave handles), look at your photos, ask you a few questions, play back what i'm hearing
2. **Strategic direction + lock the identity** — the real-talk step. i lay out the lucrative paths your inputs actually support (with stats), name the gaps between your current feed and where you want to go, critique your current photography honestly (lighting, framing, gear) and tell you what to buy. then we lock the persona. you approve before we move on.
3. **Mood board** — i curate from your photos (color-graded if your influencer aesthetic differs from your existing grid), then GPT fills aesthetic gaps. you approve.
4. **Influencer Persona PDF** — i draft your voice bank (bios, caption modes with visual examples, hook openers, DM voice, do/don't) and lay it into a multi-page branded PDF alongside your persona overview, content categories, mood board, and a Key Next Steps roadmap. you approve it from the PDF preview.
5. **Package the kit** — persona.md + folder kit (mood board, voice bank, PDF) saved to ./tmp/. only after you say "ship it".

let's start. [questions follow]
Read inputs first:
  • If any supported social handles or supported social URLs were dropped → call `mcp__plugin_pika_pika__scrape_social` on those supported socials IN PARALLEL before asking the canonical questions, so data is back by the time the user answers. Use `rehost: true` for Instagram/TikTok feed and video actions so downstream curated tiles have durable `pika_cdn_url` / CDN media instead of ephemeral signed media. Do not send unsupported taste URLs (Spotify, Depop, Letterboxd, personal sites, mood-board links, etc.) to `mcp__plugin_pika_pika__scrape_social`; keep them as taste signals and ask what to borrow from them if the intent is not obvious. Standard call pattern per supported platform:
    • instagram: `instagram.profile` + `instagram.user-posts` (limit 30, `rehost: true`) + `instagram.user-reels` (limit 30, `rehost: true`)
    • tiktok: `tiktok.profile` + `tiktok.profile-videos` (limit 30, `rehost: true`)
    • youtube: `youtube.channel` + `youtube.channel-videos` (limit 30)
    • twitter: `twitter.profile` + `twitter.user-tweets` (limit 30)
    • pinterest: `pinterest.profile` (limit 50)
    Pull recent posts, captions, bios, visual style. Note dominant subjects, lighting, voice patterns, caption length, hook style. Captions, bios, metrics, and visual-style summaries are text/strategy signal that informs Step 2; they are not dropped raw into visual artifacts. Clean public-feed media from the `rehost: true` scrape can become curated image source material in Step 3 / Step 4 after the source-priority and no-text gates below. Scraped captions are the PRIMARY voice signal for the voice bank in Step 4: direct user captions beat any inference from chat answers. Chat-voice + intake answers are the fallback only when scrape fails (private account, no posts yet, brand-new handle).
  • If camera roll / photos → look at them. Note settings, lighting, what's in frame, what's NOT (no people? all flatlay? always outdoors?).
  • If selfie video → transcribe via `mcp__plugin_pika_pika__transcribe_audio`, note vocabulary, energy, pacing, fillers, what they're enthusiastic about.
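The per-platform call pattern above can be written down as a lookup table, which makes it easy to see which actions get `rehost: true` and which inputs stay taste-only. This is a sketch under assumptions: the action names, limits, and rehost flags come from the list above, while `ScrapeAction`, `SCRAPE_PLAN`, and `plan_scrapes` are hypothetical names, and the real argument shape of `mcp__plugin_pika_pika__scrape_social` is not shown in this document.

```python
from typing import List, NamedTuple, Optional, Tuple


class ScrapeAction(NamedTuple):
    """One scrape_social action; field names are illustrative, not the tool's schema."""

    action: str
    limit: Optional[int] = None
    rehost: bool = False


# Action names, limits, and rehost flags as listed in the call pattern above.
SCRAPE_PLAN = {
    "instagram": [
        ScrapeAction("instagram.profile"),
        ScrapeAction("instagram.user-posts", limit=30, rehost=True),
        ScrapeAction("instagram.user-reels", limit=30, rehost=True),
    ],
    "tiktok": [
        ScrapeAction("tiktok.profile"),
        ScrapeAction("tiktok.profile-videos", limit=30, rehost=True),
    ],
    "youtube": [
        ScrapeAction("youtube.channel"),
        ScrapeAction("youtube.channel-videos", limit=30),
    ],
    "twitter": [
        ScrapeAction("twitter.profile"),
        ScrapeAction("twitter.user-tweets", limit=30),
    ],
    "pinterest": [ScrapeAction("pinterest.profile", limit=50)],
}


def plan_scrapes(platforms: List[str]) -> Tuple[List[ScrapeAction], List[str]]:
    """Split requested platforms into scrape actions and taste-only signals."""
    actions: List[ScrapeAction] = []
    taste_only: List[str] = []
    for platform in platforms:
        key = platform.lower()
        if key in SCRAPE_PLAN:
            actions.extend(SCRAPE_PLAN[key])
        else:
            # Unsupported sources (Spotify, Depop, ...) are never sent to the scraper.
            taste_only.append(platform)
    return actions, taste_only
```

For example, `plan_scrapes(["Instagram", "spotify"])` yields the three Instagram actions plus `["spotify"]` as a taste-only signal to ask the user about.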
Then ask these 4 FIXED canonical questions in a single message. Same every time — do NOT improvise or substitute. Consistency matters; if the user restarts and gets different questions, they lose trust in the process.
  1. Describe yourself + how you want to come off on social — a few sentences in your own words. What you're actually like, and what you want people to see when they land on your profile. (This question covers the real-self / projection / authenticity dial all at once.)
    • Plus: name 1–3 creators or characters you want to feel like. For each, tell me which dimension you want to borrow — their voice, their content type, or their visual aesthetic. You don't have to like all three things about them; specify what's in and what's out. ("I want her voice but not her content" is a totally normal answer.)
  2. Where do you live + what's your day-to-day pace? (Location + speed signal — affects the authenticity register and the kinds of settings the mood board renders.)
  3. Ideal friday night? (Single vibe question — surfaces personality quickly. "TV and bed early" vs "out till 2am" reads instantly.)
  4. What are you hoping to get out of this? (The WHY — followers, a business, dating, community, just expression, something else. This drives every strategic choice downstream.)
That's it. 4 questions. No variations. No "pick 3 of these and skip the rest." If the user already answered some of these in their initial drop (e.g. their description-of-self answer in the intake menu covers Q1), acknowledge what you have and only ask the unanswered ones.
If the user explicitly says "start from scratch," ask these 4 questions even without socials. If the user dropped zero photos AND zero URLs AND zero description without saying start-from-scratch, fall back to asking the menu's intake items first before these 4 questions — you need at least SOMETHING to read. No-social users are supported; no-photo visual/PDF output is not. If a start-from-scratch user answers the questions but still provides no real photos, complete the strategic text work and pause before Step 3 to request 16+ individual photos: 6 mood-board curated tiles plus 10 separate PDF voice-page curated tiles.
Keep it to one message. Wait for answers.
After answers, play back what you heard. Lead with the surprising or specific things, not summary boilerplate:
  • The person: 2–3 sentence read on who they actually are
  • The voice signal: how they talk in the inputs you read (specific phrases, energy, what they avoid)
  • The visual world: what their feed, camera-roll photos, taste URLs, and/or reference creators suggest, AND what they're reaching for if different (this is the seed for the Step 2 visual aesthetic spec)
  • The authenticity dial: real / amplified / different — and along which axis
  • References anchoring the work: each creator they named + which dimension to borrow (voice / content type / visual aesthetic)
End with: "anything to add, change, or call out before I lock the identity?" Wait for response.

Step 2 — Strategic Direction & Identity Lock (HARD GATE — approval required)

This step has four parts. The first three are the glow-up real-talk — the work the user came here for. The fourth is the identity lock that ends the gate. Skipping 2a–2c and going straight to identity.md is a fail state; the user explicitly wants strategic guidance, gap analysis, and critique, not just a packaging summary.
Tone for this step: specific, honest, and useful. Cite the inputs you read in Step 1 by name ("your trench-coat post from Nov 8" beats "your outfit content"). Bring numbers wherever you can. Don't soft-pedal — if their bio is generic, say so. If their lighting is bad, say so. The user is paying you to tell them what their friend who books brand deals would tell them.

2a — Strategic direction (lucrative paths + content implications)

Initiate the strategic conversation; don't wait to be asked. Before locking the persona, lay out the 2–3 most lucrative directions their inputs actually support, with rough industry stats and the concrete content implications for each. The user often doesn't know what gets paid, what CPMs look like in different niches, or what posting load each path requires. Bring receipts.
For each direction:
  • Niche label (specific, not "lifestyle" — "Boston-local lifestyle creator", "Texas barre-class founder content", "millennial dad humor for finance")
  • Why this direction fits the inputs — point to the specific posts / photos / things they said that support it
  • Rough monetization profile — typical CPM range, common brand-deal floor at different follower tiers (e.g. "fashion micro: $100–250/post at 5K; lifestyle micro: $150–300/post; finance: $400–800/post"), affiliate/LTK reasonableness, sponsored-post realism. Every numeric monetization claim must either cite a source verified during this run or be explicitly labeled broad heuristic estimate; never present unsourced CPMs, follower thresholds, or brand-deal floors as precise facts.
  • Content load this path requires — posting cadence (e.g. "4–5 outfits/week + 1 location reel"), production tier expected (phone-shot OK vs. mirrorless needed), recurring formats audiences expect in this niche (e.g. "OOTD posts + 'GRWM for X' reels + monthly local-roundup carousel")
  • Who's already winning here at your size: name 2–3 creators at 5K–30K who run this exact play, with one sentence each on what makes them work. Only name real creators, handles, follower tiers, or current metrics if verified during this run via `mcp__plugin_pika_pika__scrape_social` or another live supported source. If you cannot verify, use clearly labeled archetypes instead (e.g. "local style micro-creator archetype") and omit handles/counts.
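The sourcing rule for numeric claims in the list above (cite a source verified during this run, or label the number a broad heuristic estimate) can be enforced at the point where a claim is rendered. A minimal sketch, assuming a hypothetical `MonetizationClaim` type that is not part of the skill itself:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonetizationClaim:
    """A numeric monetization statement plus its provenance (hypothetical type)."""

    text: str                     # e.g. "fashion micro: $100-250/post at 5K"
    source: Optional[str] = None  # set only when verified during this run

    def render(self) -> str:
        """Never emit a bare number: cite the source or label it an estimate."""
        if self.source:
            return f"{self.text} (source: {self.source})"
        return f"{self.text} (broad heuristic estimate)"
```

Routing every figure through `render()` means an unsourced CPM or brand-deal floor cannot reach the user without its label.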
End the section with a direct question: "which of these directions are you most drawn to — or do you want to combine?" Don't let the user pick "lifestyle" generically; force the choice to be specific.
If the user already named a direction in Step 1 ("I want to be a fashion creator"), still run this section to either validate or widen the lens — they often haven't considered the adjacent path that pays better with their existing inputs.

2b — Gap analysis (current profile → target persona)

Once the direction is named (or if the user already locked it in Step 1), explicitly compare what they're currently posting vs. what the target persona requires. This is the section that tells them what's missing.
For each gap, name:
  • The missing content category — "you have OOTD nailed but zero hosting / kitchen content; the persona you named needs that bucket"
  • The format gap — "you have static posts but no reels; the niche pays reels at 3–5× the CPM of static"
  • The cadence gap — "you're posting ~1×/week; this niche expects 4–5×/week to stay in the algorithm"
  • The voice gap — if their actual captions don't match the voice they say they want, name the specific phrase patterns they need to drop and the ones they need to add
  • The aesthetic gap — if their current visual register diverges from where they want to land (e.g. flat phone-flash shots vs. cinematic golden-hour register), say so explicitly
  • The follower-tier gap — name where they are now (e.g. 2K), what tier the paid work starts (typically 5K+ for local brand deals, 10K+ for national), and what realistic 90-day growth looks like for someone executing the plan
Frame each gap as a shippable action, not a critique. "You need 1 reel/week tied to a Boston seasonal moment" lands better than "you don't post enough reels."

2c — Content critique (production quality + gear)

Be helpfully critical. If the user's current photos are crappy, say so — and tell them how to fix it. The default failure mode is to be too polite here; correct that.
Review their existing photo / video content for:
  • Lighting — name specific photos that are working ("the kitchen shot at 4pm — that's the window light to chase") and specific ones that aren't ("the trench post is shot into overhead-noon light, which is why your face is hard-shadowed; reshoot at 3–6pm window light")
  • Framing & composition — phone held too low or too high, subject too centered, headroom issues, busy backgrounds, where to actually stand
  • Color & post-processing — are they using a preset / preset pack? do their colors match the locked visual aesthetic? if not, name specific tools
  • Consistency — does the grid feel like one person's eye? where does it break?
  • Selfie technique specifically — phone position, mirror cleanliness, what to wear vs. avoid for OOTD selfies, how to hide face if face-shy
Then suggest concrete gear/tools to purchase, scaled to the user's likely budget. Don't recommend a $1500 mirrorless to someone who has 500 followers — start with what gets them 80% of the lift for under $200:
  • Starter tier (under $100): phone tripod ($15), clip-on reflector or A4 white foam board ($25), one Lightroom mobile preset pack matched to their aesthetic ($15–40), basic ring light or LED panel if their home is dim ($30)
  • Intermediate tier ($100–500): small softbox kit, gorillapod for outdoor reels, wide-angle phone clip-lens for mirror selfies in small spaces, a dedicated mic for reel voiceovers (~$80)
  • Advanced tier ($500+): only recommend if the user is already at 10K+ and asks — entry-level mirrorless (Sony a6400 / Fujifilm X-S20) + a 35mm-equivalent prime, color-graded preset subscription, an editor
Name brands and price points so the user can actually shop. "Buy a clip-on reflector" alone is too vague; "Lume Cube panel mini or the Pictar reflector clip, both around $25 on Amazon" gives them something to click.

2d — Lock the identity.md

NOW (after 2a–2c have been delivered and the user has weighed in), draft the persona summary. This is the user's first proper artifact — it has to feel like someone gets them and like the strategic real-talk above shaped where it landed, not like generic brand copy.
`identity.md` structure (deliver inline in the chat as a markdown block, save to disk later):

[@handle or first name] — Identity

The real you

[2–3 paragraphs in plain language. Specific. References their actual stuff — the comfort show they named, the way they describe their friends, the thing they said about their high school self. Sounds like someone who paid attention, not a horoscope.]

Your influencer identity

[2–3 paragraphs about the online projection. Calls out the authenticity dial explicitly: "You're not pretending to be someone else — you're you with the volume on [dimension] turned up." OR "This is a character. Here's where the character starts and the real you ends." Be specific about what's amplified, what's filtered, and why.]

Visual aesthetic

This is the spec Step 3's mood board has to render. Be detailed enough that someone else could build the board from this section without seeing the user.
  • Palette: [specific colors / named tones — "warm cream, cognac brown, dusty rose, deep navy" beats "neutrals"]
  • Lighting: [warm window / overcast / golden hour / harsh flash / candlelight — which lighting dominates]
  • Settings: [where shots take place — specific. "Brooklyn brownstones, marble-top coffee shops, kitchen counter at 10pm" beats "city lifestyle"]
  • Photography style: [phone-camera POV, mirror selfies (faceless or face), 3/4 candids, overhead flatlay, etc. — which framings dominate]
  • Subjects in frame: [hands holding things, full body, faces, food, books, the dog, the bar, etc.]
  • Forbidden visual elements: [what would never appear — e.g. "no full-glam beauty shots", "no landscape paintings", "no studio-clean white backgrounds"]
  • Relationship to existing grid: [if their existing photos already match this aesthetic, say so. If the aesthetic is a slight evolution from their grid, name exactly what shifts, e.g. "their grid is currently warm and slightly washed; the influencer aesthetic pushes deeper shadows + cooler highlights for a more cinematic read." Step 3 uses this to decide whether to apply light color-grading to the user's photos for cohesion.]

Voice principles

Sounds like: [3 example sentences in their voice: short, specific, real]
Never sounds like: [3 example sentences they would never write, the cringe versions]
Voice adjectives (3): [adj], [adj], [adj]
Forbidden words/phrases: [3–6 actual phrases they'd want to avoid]

Reference creators (with dimension + what to borrow)

  • [@handle or name] — borrow their [voice / content type / visual aesthetic]. Specifically: [the actual thing — pacing, lighting, type of self-disclosure, etc.]
  • ...

**Anti-generic check before delivering:**
- Could this identity be re-pasted onto a different person without changing anything? If yes, push for specificity.
- Does the **Visual aesthetic** section name specific colors, lighting, and settings — or is it adjective soup? Step 3 has to render this; "warm and minimal" is not enough.
- Does each **Reference creator** entry specify the dimension (voice / content type / visual aesthetic) AND the specific thing to borrow? Vague references like "feel like @emmachamberlain" without naming WHAT are a failure state.

**Deliver the identity in chat, then explicitly ask:** "does this feel like you? anything to push harder, soften, or rewrite before I move to the mood board?"

🛑 **Wait for explicit approval.** "yes" / "this is me" / "ship it" / "perfect, do the board". Not "ok", not silence, not "make it more X" (that's feedback — incorporate and re-ask). Approval gates are hard stops.
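
The approval rule above separates explicit approval from acknowledgement, silence, and feedback. One deliberately strict way to sketch it is an allowlist check. The phrase list is taken from the examples above; `APPROVAL_PHRASES`, `is_explicit_approval`, and the normalization are assumptions, and a real agent would likely accept other clearly affirmative wordings too.

```python
# Example approvals quoted in the gate description; anything else keeps the gate closed.
APPROVAL_PHRASES = {"yes", "this is me", "ship it", "perfect, do the board"}


def is_explicit_approval(reply: str) -> bool:
    """Return True only for an explicit approval phrase.

    "ok", an empty reply (silence), and feedback such as "make it more X"
    all return False, so the caller incorporates any feedback and re-asks.
    """
    normalized = reply.strip().lower().rstrip("!.")  # assumption: ignore case and trailing punctuation
    return normalized in APPROVAL_PHRASES
```

Erring toward false negatives fits a hard stop: a missed approval costs one extra question, while a false approval moves past the gate without consent.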

Step 3 — Mood Board (HARD GATE — approval required)

Build the mood board AFTER identity is locked. The board is built to match the `Visual aesthetic` section in the approved Step 2 identity.md, not to discover the aesthetic.
Re-read that section before sourcing or generating any tile. Every tile (curated + generated) must serve that locked spec.
Source priority — mix is required, fully-generated is a fail state:
  1. Individual full-resolution photos from the user (their camera roll, photos they love, full-res files): the strongest curated source. These go in `mood-board/curated/`.
    1a. Handle-only public-feed fallback for video-heavy / mostly Reels accounts. If the user gave a public handle but no separate photo files, scraped feed imagery can provide curated tiles. For image posts, use the highest-quality `image_versions2.candidates[0].url`. For video posts where `media_url` is a `.mp4`, do not use the ephemeral IG `visual_media_url` / signed cover JPG as the curated tile source; do not alter any signed-cover query param because that causes `URL signature mismatch`, and mid-clip covers often include baked-in caption or title overlays. Use only the durable rehosted `.mp4` returned by the Step 1 `rehost: true` scrape (the `pika_cdn_url` / `cdn.pika.art` media; treat it as `is_durable: true`) and call `mcp__plugin_pika_pika__extract_frame(video_url: <durable_rehosted_mp4>, time_s: 0)`. Treat each extracted frame as a candidate tile, not an approved tile: immediately call `mcp__plugin_pika_pika__analyze_media` on the frame URL with an OCR / visible text / no-text prompt before placement. This sourcing-time check must happen before the frame becomes a mood-board tile, PDF content-category tile, or voice-page curated tile. If frame 0 contains text, a title card, or a burned-in caption, drop the reel or try a later keyframe (for example `time_s: 1.5` or `time_s: 3`) and run the same `mcp__plugin_pika_pika__analyze_media` no-text check before using that later keyframe. If no durable rehosted `.mp4` is present, retry the scrape with `rehost: true`; do not fall back to a signed cover URL. Final PDF QA is only a backstop after this sourcing gate, never the first text-detection step.
  2. NEVER crop from an IG grid screenshot. Tile widths in grid screenshots aren't pixel-aligned to clean fractions; auto-crop bleeds; result is bad. Confirmed-fail approach. If a user offers a grid screenshot, ask them for individual photos instead.
  3. A mood board with zero curated tiles has failed. If neither user photos nor clean public-feed curated frames are available, unblock that first — never substitute generated stand-ins. If the user intended to provide photos and they didn't reach disk, ask them to re-upload. For start-from-scratch / no-social users, pause here and request 16+ individual photos before building visual artifacts: 6 mood-board curated tiles plus 10 separate PDF voice-page curated tiles.
  4. Read the `Relationship to existing grid` field from Step 2. If the locked aesthetic is identical to the user's existing grid, place curated tiles as-is. If Step 2 names a slight evolution (e.g. "their grid is warm + washed, the influencer aesthetic pushes deeper shadows + cooler highlights"), apply light CSS color treatment in the HTML composite before rendering — `filter: contrast(...) saturate(...) brightness(...)`, a subtle overlay, or a mild blend-mode treatment. Never re-pass the user's photos through gpt-image-2 — image-model re-rendering drifts face/pet/object identity and breaks the "this is actually her" anchor. If the divergence is too large for CSS treatment alone, the answer is more generated atmosphere tiles in the new aesthetic + fewer untreated curated tiles, NOT regenerating her photos.
  5. Identify aesthetic coverage gaps — what part of the locked Step 2 aesthetic isn't represented yet (a specific setting, framing, lighting condition)?
  6. Fill gaps with `mcp__plugin_pika_pika__generate_image` (`provider="gpt-image-2"`) for social-content atmosphere only (load-bearing phrase — pushes gpt-image-2 away from vision-board / wallpaper energy; see Load-bearing phrases section), prompted in the Step 2 aesthetic (its palette, lighting, settings, photography style). See the hard rules below. Generated tiles go in `mood-board/generated/`.
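The reel-frame gate in item 1a can be sketched as a small loop. This is a minimal illustration, not the skill's implementation: `extract_frame` and `has_visible_text` are hypothetical callables standing in for the `mcp__plugin_pika_pika__extract_frame` and `mcp__plugin_pika_pika__analyze_media` calls, and the default keyframe times simply reuse the examples given above.

```python
from typing import Callable, Optional, Sequence


def pick_clean_frame(
    video_url: str,
    extract_frame: Callable[[str, float], str],
    has_visible_text: Callable[[str], bool],
    times_s: Sequence[float] = (0, 1.5, 3),
) -> Optional[str]:
    """Return the first candidate frame that passes the no-text check.

    Both callables are assumptions: wire them to the real MCP tools.
    Returns None when every tried keyframe shows text (drop the reel).
    """
    for t in times_s:
        frame_url = extract_frame(video_url, t)  # candidate tile, not approved yet
        if not has_visible_text(frame_url):      # OCR / visible-text check per frame
            return frame_url
    return None
```

Every keyframe goes through the same check before it can become a tile, so a later frame never bypasses the gate that frame 0 failed.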
🛑 HARD RULE: Generated tiles render SOCIAL CONTENT energy, not atmospheric vision-board energy.
A mood board for an influencer identity has to look like a creator's actual feed, not a vision board.
  • YES: phone-camera framing, candid 3/4 angles, mirror selfies (faceless if generated), in-hand POV shots (coffee on a table, book in lap, drink at a bar), tablescape overheads, dressing-room outfit flatlays, golden-hour walking-from-behind shots, brunch food close-ups, candle-lit dinner tables, vintage shop browsing, looking-up POV with sky behind, behind-the-scenes hands-doing-the-thing.
  • NO: landscape paintings, atmospheric "vibes" tiles, sunset harbors with no human signal, moss macros, pure scenery, anything that reads as desktop wallpaper or art print. If a tile looks like a Tumblr aesthetic frame from 2014 — fail. The mood board must read as a curated 2024–2026 social feed.
Prompts for generated tiles should always specify a framing (phone-camera POV, mirror selfie no face, 3/4 candid, overhead tablescape, etc.) AND a moment (the specific second of life captured), not just a "vibe." See `references/aesthetic-prompts.md`, "Social trend framings", for the prompt patterns.
🛑 HARD RULE: Generated tiles cover ATMOSPHERE only — never the user's real subjects.
  • The user's face, body, hair → only their actual photos
  • The user's dog, cat, pet, horse → only their actual photos
  • Their partner, friends, family, kids → only their actual photos
  • Their actual car, actual home, actual recurring objects → only their actual photos
Generated tiles render: rooms, weather, light through windows, hardwood floors, snow on cobblestones, the type of object they collect (vintage leather goods, brass watches), the kind of place they go (golden-hour skylines, woodland trails). NEVER a fake version of someone or something they actually have. Reason: face/animal fidelity drifts, and a fake version of someone's real dog reads as uncanny and insulting. The user will tell you "that's not my dog."
Composite — fill the template, don't author CSS. The board is built from a fixed template, `references/templates/mood-board.template.html`, rendered by `mcp__plugin_pika_pika__html_to_png`. Never ask gpt-image-2 to render a "mood board grid" — it hallucinates the tiles. You only supply data: exactly 12 `.tile` entries (each an image URL + a per-image `background-position` crop focus), the brand name, the subtitle vibe words, the font pair, and the palette tokens. The template defines the grid math, the white header band, and the dimensions in one place, so the cream-sliver / off-grid failures can't recur. Set `{{COLS}}` to 6 for portrait tiles (default; most camera-roll input) or 4 for landscape-heavy input. Use exactly 12 tiles so the title-banded board and the PDF full-bleed no-header variant fill the canvas with no empty cells.
Curated:generated ratio = ~50/50. Roughly 6 curated + 6 generated for a 12-tile board. A board that's 8+ curated tiles + 2-3 generated reads as "her camera roll dumped on a grid" — not designed. The generated tiles do the work of expanding the aesthetic into branded territory she hasn't shot yet (the brunch tablescape, the wool-coat outfit selfie, the cocktail POV). A 50/50 mix reads as a designed mood board with real-life anchoring.
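A quick pre-render check of the tile mix might look like the sketch below. The `Tile` record and the `max_skew` tolerance (how far from 6 curated still counts as "roughly 50/50") are illustrative assumptions; the text above only fixes 12 tiles, at least some curated, and an approximate 6 + 6 split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    url: str
    source: str  # "curated" or "generated" (hypothetical labels)


def check_tile_mix(tiles: list[Tile], total: int = 12, max_skew: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the mix looks acceptable."""
    curated = sum(t.source == "curated" for t in tiles)
    generated = sum(t.source == "generated" for t in tiles)
    problems = []
    if len(tiles) != total:
        problems.append(f"expected {total} tiles, got {len(tiles)}")
    if curated == 0:
        problems.append("no curated tiles (fail state)")
    if abs(curated - total // 2) > max_skew:  # assumed tolerance around 50/50
        problems.append(f"mix is {curated} curated / {generated} generated, not ~50/50")
    return problems
```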
Font system — locked here, propagates to Step 4 PDF.
Pick a typography pair (display + body) that matches the Visual aesthetic adjectives locked in Step 2. Don't default to Didot or Helvetica reflexively — the wrong font for the brand makes the mood board read off and undercuts the PDF. The test is whether the font's personality matches the aesthetic adjectives. Heuristic starting points:
| Aesthetic register | Display (titles) | Body (subtitles, captions) |
| --- | --- | --- |
| quiet-luxury, classy, refined | Didot, Bodoni Moda | Inter, Helvetica Neue |
| cinematic, cozy-coastal, editorial | Cormorant Garamond, Playfair Display | Inter, Söhne |
| minimal, modern, clean | Inter Display, Helvetica Neue | Inter, Helvetica Neue |
| playful, retro, Y2K, fun | Recoleta, Bagnard, custom display | Inter, IBM Plex Sans |
| warm-feminine, romantic, soft | Italiana, Cormorant Infant | Inter, Söhne |
| editorial, magazine-style | Playfair Display, GT Sectra | Inter, Helvetica Neue |
| earthy, handmade, organic | Caudex, Recoleta Soft | Inter, IBM Plex Sans |
Save the chosen pair as `fontDisplay` + `fontBody` in the persona.md Visual aesthetic section so Step 4 PDF picks them up. When fonts aren't system-installed (most of the above except Helvetica), source them from Google Fonts by filling `{{FONT_IMPORT}}` with a full `<link rel="stylesheet" href="...">` tag.
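One way to produce the `{{FONT_IMPORT}}` value is to build the tag from the chosen family names. The sketch below assumes the Google Fonts CSS2 endpoint with default weights; `google_fonts_link` is a hypothetical helper, and not every family in the table is necessarily hosted on Google Fonts, so check availability before relying on it.

```python
from html import escape
from urllib.parse import quote_plus


def google_fonts_link(*families: str) -> str:
    """Build a <link> tag requesting the given families (default weights only)."""
    if not families:
        raise ValueError("at least one font family is required")
    query = "&".join(f"family={quote_plus(name)}" for name in families)
    href = f"https://fonts.googleapis.com/css2?{query}&display=swap"
    # escape() turns "&" into "&amp;" so the URL is valid inside an HTML attribute
    return f'<link rel="stylesheet" href="{escape(href, quote=True)}">'
```

For example, `google_fonts_link("Playfair Display", "Inter")` yields a single tag covering both the display and body families.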
Render the board. Fill the template tokens:
  • `{{TILES}}` — one `<div class="tile" style="background-image:url('IMAGE_URL');background-position:POS"></div>` per tile. `POS` is the per-image crop focus: `50% 20%` for an upper-body OOTD, `50% 35%` for a top-down kitchen shot, `50% 50%` for centered.
  • `{{BRAND_NAME}}`, `{{SUBTITLE}}` (tracked vibe words, e.g. `BOSTON · WOOL COATS · COCKTAIL BARS`).
  • `{{FONT_IMPORT}}` (a Google Fonts `<link>` when the pair isn't system-installed), `{{FONT_DISPLAY}}`, `{{FONT_BODY}}`, `{{BG}}` (lightest palette tone), `{{FG}}`, `{{MUTED}}`.
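Filling the tokens listed above is plain string substitution. The following is a minimal sketch under that assumption: `render_tiles` and `fill_template` are hypothetical helpers, and the real template may contain tokens beyond the ones named here.

```python
from html import escape

TILE_HTML = (
    '<div class="tile" style="background-image:url(\'{url}\');'
    'background-position:{pos}"></div>'
)


def render_tiles(tiles: list[tuple[str, str]]) -> str:
    """Turn (image_url, crop_focus) pairs into the {{TILES}} markup."""
    if len(tiles) != 12:
        raise ValueError(f"the board needs exactly 12 tiles, got {len(tiles)}")
    return "\n".join(
        TILE_HTML.format(url=escape(url, quote=True), pos=escape(pos, quote=True))
        for url, pos in tiles
    )


def fill_template(template: str, tokens: dict[str, str]) -> str:
    """Replace each {{NAME}} token and refuse to return a half-filled page."""
    for name, value in tokens.items():
        template = template.replace("{{" + name + "}}", value)
    if "{{" in template:
        raise ValueError("template still contains an unfilled token")
    return template
```

Failing loudly on a leftover `{{...}}` keeps a missing palette or font token from reaching the renderer.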
Then call `mcp__plugin_pika_pika__html_to_png` with `format: "png"` and `raster_options.viewport_px = { width: 1920, height: 1224 }`. If it returns `{task_id, status}`, poll `mcp__plugin_pika_pika__task_status({task_id})` until terminal and read the image URL. For the PDF full-bleed page (Step 4 page 3), render the SAME tiles through `references/templates/mood-board-no-header.template.html` at `{ width: 1920, height: 1080 }` — that variant has zero gutters so the PDF page shows no cream slivers.
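The "poll until terminal" step can be sketched as a bounded loop. Everything specific here is an assumption: `get_status` is a hypothetical wrapper around `mcp__plugin_pika_pika__task_status`, and the terminal status names, interval, and timeout are placeholders to adjust to what the tool actually returns.

```python
import time
from typing import Callable, Mapping

# Assumed terminal status names; confirm against the real task_status payload.
TERMINAL_STATUSES = frozenset({"succeeded", "failed"})


def wait_for_task(
    task_id: str,
    get_status: Callable[[str], Mapping[str, object]],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, object]:
    """Poll until the task reports a terminal status, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status(task_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```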
Mandatory checks before delivering:
  1. Curated tiles required. A fully-generated mood board is a fail state. Valid curated tiles are user-supplied images or clean public-feed curated frames from a supported social handle. If you can't get either onto the board, the gate is "unblock the real curated source," not "ship without."
  2. No regenerated real subjects. Re-read the hard rule above. The user's face, pet, partner, friends — never generated. Their atmosphere — fine.
  3. No baked-in text on any tile. gpt-image-2 prompts must include the no-text guardrail — append `"NO text anywhere in image"` (load-bearing — gpt-image-2 otherwise bakes faux brand labels onto bottles, scarves, books; see Load-bearing phrases section). For handle-only reel-derived curated tiles, use frame 0 from the durable rehosted `.mp4`; if a reel frame still shows burned-in caption/title overlays, drop it rather than shipping text on the board.
  4. Anti-stocky test. If the board reads as stock photography (uniform polish, uniform palette, studio-perfect, no real-life imperfection) — fail. Trendy 2024–2026 aesthetic energy = real-feeling snapshots > campaign-clean glossiness. The curated user tiles anchor the "real" register; generated tiles should match that energy, not slick up.
  5. Palette variation within the identity. A cohesive palette is fine; a single-register board ("just brown" / "just sage" / "just beige") reads dead. Even within a warm-cognac identity, you can include warm cream, dusty rose, deep navy, sage — whatever co-exists naturally in their world.
  6. Cast diversity on any generated tiles featuring people. Name an ethnicity per prompt (load-bearing procedural rule — gpt-image-2 defaults to lighter skin tones otherwise; see Load-bearing phrases section). (Note: with the hard rule above, generated tiles featuring people should be RARE — strangers in scenes only, never the user.) See `references/aesthetic-prompts.md` for the cast-diversity prompt pattern.
  7. Zoom-read the composite. Open the PNG. Verify no overlapping tiles, no broken aspect ratios, no misaligned content. Tile borders clean.
  8. Anti-template check. Should feel like THIS person, not a one-word aesthetic preset (NOT "clean girl beige", NOT "dark academia", NOT "cottagecore" unless the user named that aesthetic). If you can label the board with one TikTok aesthetic word — push for specificity.
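The prompt-side rules (a framing, a moment, and the no-text guardrail from check 3) can be enforced when the prompt string is assembled. This is a sketch only: `build_tile_prompt` and its comma-joined layout are assumptions, not the pattern from `references/aesthetic-prompts.md`.

```python
NO_TEXT_GUARDRAIL = "NO text anywhere in image"


def build_tile_prompt(framing: str, moment: str, aesthetic: str = "") -> str:
    """Join framing + moment (+ optional aesthetic notes) and append the guardrail once."""
    if not framing.strip() or not moment.strip():
        raise ValueError("a generated-tile prompt needs both a framing and a moment")
    parts = [framing.strip(), moment.strip(), aesthetic.strip()]
    prompt = ", ".join(p for p in parts if p)
    if NO_TEXT_GUARDRAIL.lower() not in prompt.lower():
        prompt = f"{prompt}. {NO_TEXT_GUARDRAIL}"
    return prompt
```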
Deliver in chat: "here's your mood board — [link to PNG]. the curated tiles are pulled from [sources]; I filled gaps in [named gaps] with generated tiles. does this feel like your visual world?"
🛑 Wait for explicit approval before moving on.

Step 4 — Influencer Persona PDF (HARD GATE — approval required)

The deliverable is a multi-page branded PDF titled `[Name]'s Influencer Persona`, not an inline markdown block. Never label this artifact a "media kit" — the user came here to define their persona, not to pitch partners yet; calling it a media kit is premature and undercuts the glow-up framing. The PDF lays voice-bank content into a designed document alongside the persona overview, content categories, approved mood board, and the Key Next Steps roadmap from Step 2's strategic conversation. The user approves the voice bank by approving the PDF.
What the PDF is for: the personal artifact that captures who the user is online — what they post, how they sound, what their feed looks like, and the concrete next steps to actually become that persona. It has to be well-designed — strong typography, clear hierarchy, generous white space, on-aesthetic. A sloppy PDF undercuts the entire kit.

4a — Draft the voice bank content

Generate the voice-bank text first (the same content the old skill produced inline). Save it as `./tmp/[handle]-influencer-kit/voice-bank.md` for downstream skill consumption. Content sections:
  • Bios (3 variants): IG main (short tagline), IG alt (longer, more specific), TikTok / casual.
  • Caption modes (5): Hook, Story, Casual selfie, Vulnerable, Promo. Each mode = a CONTEXT label ("use when…") + 2 sample captions showing range (e.g., one short + one slightly longer).
  • Hook openers (5): opening lines for Story / Reel / TikTok intros.
  • DM + comment voice: reply-to-compliment, decline-a-brand-pitch, reply-to-hater (or "doesn't engage"), default emoji palette.
  • Do & Don't (5 each): person-specific, not generic ("use specific place names" beats "post consistently").
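The section counts above lend themselves to a structural check before the text is laid into the PDF. The dataclasses below are a hypothetical in-memory shape for `voice-bank.md`, not a format the skill defines; the DM + comment voice section is left out of the sketch for brevity.

```python
from dataclasses import dataclass

CAPTION_MODES = ("Hook", "Story", "Casual selfie", "Vulnerable", "Promo")


@dataclass
class CaptionMode:
    name: str
    use_when: str          # the "use when..." context label
    captions: list[str]    # 2 samples showing range


@dataclass
class VoiceBank:
    bios: list[str]
    modes: list[CaptionMode]
    hook_openers: list[str]
    dos: list[str]
    donts: list[str]


def validate_voice_bank(bank: VoiceBank) -> list[str]:
    """Return structural problems; an empty list means the counts match the spec."""
    problems = []
    if len(bank.bios) != 3:
        problems.append("need 3 bio variants")
    if tuple(m.name for m in bank.modes) != CAPTION_MODES:
        problems.append("need the 5 caption modes in order")
    for mode in bank.modes:
        if not mode.use_when.strip() or len(mode.captions) != 2:
            problems.append(f"{mode.name}: needs a use-when label and 2 captions")
    if len(bank.hook_openers) != 5:
        problems.append("need 5 hook openers")
    if len(bank.dos) != 5 or len(bank.donts) != 5:
        problems.append("need 5 dos and 5 don'ts")
    return problems
```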
Quality bars on the copy itself:
  • Seed every caption from the scraped captions in Step 1 — sentence rhythm, vocabulary, emoji habits, recurring phrases, hook style. Chat-voice fallback only if scrape failed.
  • Every caption passes the anti-generic test (could it belong to someone else? rewrite if yes).
  • Every caption passes the AI-creator-voice ban list (no "apparently i [did thing] now", no em-dash parentheticals for wit, no "[x] + [y]" lists with plus connectors, no "fake errand", no listicle minimal poetry, no "I'm just a girl who…").
  • Forbidden words from the identity never appear in any sample caption.
  • The 2-captions-per-mode range shows genuine variation.
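Part of the ban list is mechanical enough to lint. The patterns below are rough approximations written for illustration (the regexes are assumptions, and "listicle minimal poetry" and the anti-generic test still need a human or model read); `forbidden_words` stands for the forbidden-word list from the identity.

```python
import re

# Approximate patterns for the mechanically checkable ban-list entries.
BANNED_PATTERNS = {
    "apparently-i-now": re.compile(r"\bapparently i\b.*\bnow\b", re.IGNORECASE),
    "plus-connector list": re.compile(r"\w\s\+\s\w"),
    "fake errand": re.compile(r"\bfake errand\b", re.IGNORECASE),
    "just a girl who": re.compile(r"\bi['\u2019]m just a girl who\b", re.IGNORECASE),
    "em-dash parenthetical": re.compile("\u2014[^\u2014]+\u2014"),
}


def caption_violations(caption: str, forbidden_words: tuple[str, ...] = ()) -> list[str]:
    """Return the names of ban-list rules and forbidden words the caption trips."""
    hits = [name for name, pattern in BANNED_PATTERNS.items() if pattern.search(caption)]
    for word in forbidden_words:
        if re.search(rf"\b{re.escape(word)}\b", caption, re.IGNORECASE):
            hits.append(f"forbidden word: {word}")
    return hits
```

A clean result only means the caption avoided the listed tics; it says nothing about whether the caption sounds like this particular person.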

4b — Lay it into the PDF

PDF spec:
  • Page size: 16:9 landscape, 1920×1080 per page.
  • Page count: 12 pages maximum, each focused on one idea. Do not expand the kit past 12 pages: the required `mcp__plugin_pika_pika__analyze_media` PDF QA path uses `all_pages: true`, and that sync path supports at most 12 PDF pages.
  • Typography: the `fontDisplay` + `fontBody` pair locked in Step 3. NO third font, no mixing of register.
  • Palette: the Visual aesthetic palette from Step 2. Page backgrounds default to the lightest tone; accents from the rest of the palette.
Page sequence (target):
  1. Cover — name in display font (very large, 120–180pt), tagline subtitle in tracked body-font caps, one hero image (full-bleed or large-cropped), small "Influencer Persona · [year]" label (never "Media Kit")
  2. About / Identity — 1–2 paragraphs synthesized from `identity.md` (real-you + influencer-you, authenticity dial named). Side imagery: 1–3 curated photos + a small stats grid (Based / Niche / Status / Voice)
  3. Mood Board — FULL BLEED — the approved mood board embedded edge-to-edge, no header band, no padding. Pre-render it by filling `references/templates/mood-board-no-header.template.html`; the template owns the no-header grid math: 6 cols × 320px wide and 2 rows × 540px high = 1920×1080 exactly, no gaps. Tiles must touch directly with no cream between them; otherwise the PDF page shows 2–4px cream slivers on the edges and between rows. The title-banded mood board (with gutters + header) is the standalone disk artifact at `mood-board.png`; the edge-to-edge variant is `mood-board-no-header.png` and is only used in the PDF.
  4. Content Categories — 4 cards laid out in a 2×2 grid, each card = image + category name + one-line description + 3–4 example post types. If a scrape exists, categories are derived from the user's actual scraped feed: tally the user's 30 scraped post captions, group by topic/format, pick the four highest-volume buckets. If there is no scrape/no social yet, label the page Proposed Content Categories and derive the four cards from the user's answers, taste URLs, reference creators, and real photos; do not claim they are existing feed frequencies. Examples: Outfits & Styling / Boston Lifestyle / Home & Recipes / Brand Partnerships for a NE-coastal lifestyle creator; Workouts / Recipes / Workouts-to-Recipes / Brand Collabs for a fitness creator; Tour & Travel / Studio / Real Life / Brand Collabs for a musician. Use the user's own specificity (e.g. "Nutcracker, holiday markets, Snowport" not "city event content"). This is more useful than a generic color-swatch page. Do NOT use a flat color-swatch page or a textured-material-swatch page — both variants read as boring; the texture variant specifically reads as "all cloth" and doesn't tell anyone what the creator actually MAKES.
Content Categories layout lives in `references/templates/pdf-content-categories.template.html` (300×400 image, `grid-template-columns: 300px 1fr` card — baked in). Fill `{{CONTENT_CATEGORIES_TITLE}}` as `Content Categories` when a scrape exists and `Proposed Content Categories` when no scrape/no social exists. Fill 4 cards' image/name/desc/post-types + a per-card `{{CARD_N_POS}}` crop focus (`50% 20%` upper-body, `50% 35%` top-down kitchen, `50% 50%` centered). Don't author the card CSS.
5–9. Voice modes — one page per caption mode (Hook, Story, Casual selfie, Vulnerable, Promo). Layout: text left half (mode name + "use when" context + 2 sample captions), 2×2 grid of 4 tiles right half.
Voice-page layout lives in `references/templates/pdf-voice-mode.template.html` — one template reused for all 5 modes. The padded grid that prevents the off-page bleed (80px equal padding, 800/160/800 columns, 2×2 grid) is baked in. Fill `{{MODE_NAME}}` / `{{USE_WHEN}}` / 2 captions / 4 tile URLs (+ optional per-tile crop). Don't re-derive the grid CSS.
  10. Hook openers + DM voice — numbered openers on one side, DM examples on the other
  11. Do & Don't — two-column side-by-side
  12. Key Next Steps — the roadmap page (replaces the older References + Contact page; never re-add a contact card). 4–6 cards in a 2-column grid, each card = number + section label (e.g. "01 · Lighting & Framing") + one-line headline + 2–3 sentences of concrete action. Content comes directly from Step 2's strategic conversation — the lucrative direction chosen in 2a, the gaps named in 2b, and the gear recommendations from 2c. Each card must be a shippable action with a specific number, brand, or behavior — not "improve your content." Example cards (from the gamar___b sample kit): "Stop shooting into harsh overhead light — buy a phone tripod ($15) + clip-on reflector ($25)", "Currently 80% OOTD; target 40/25/20/15 across the 4 content categories", "5K in 90 days: better lighting + 4–5×/week cadence + 1 reel/week tied to a seasonal moment".
  • Next Steps layout lives in `references/templates/pdf-next-steps.template.html` — the explicit `.next-grid { height: 720px }` that prevents BOTH the 13th-empty-page and the 300px dead-zone is baked in (Chromium collapses `1fr` rows without a definite height; >740px overflows into an extra page). Fill 4–6 cards (number / label / headline / 2–3 sentences); delete the trailing card blocks for fewer than 6. Don't re-derive this CSS.
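The tally described for the Content Categories page (group the scraped posts, keep the four highest-volume buckets) reduces to a frequency count once each post has a bucket label. In this sketch the labels are assumed to have been assigned already, by hand or by a model; `top_categories` is a hypothetical helper.

```python
from collections import Counter


def top_categories(post_labels: list[str], k: int = 4) -> list[tuple[str, int]]:
    """Return the k most frequent bucket labels with their post counts."""
    return Counter(post_labels).most_common(k)


def shares(ranked: list[tuple[str, int]], total_posts: int) -> dict[str, float]:
    """Express each kept bucket as a fraction of all scraped posts."""
    return {label: count / total_posts for label, count in ranked}
```

The shares give a baseline for the "currently X%; target Y%" style of card on the Key Next Steps page, but only when a scrape exists; proposed categories have no frequencies to report.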
Voice-mode page imagery — the demonstration:
  • Exactly 4 tiles per voice-mode page, arranged in the 2×2 grid above. With an 800×920px grid and 16px gap, each tile cell is 392×452px at page scale. Fill the cell with `width: 100%; height: 100%; background-size: cover;` and tune `background-position` per tile; do not force a separate 3:4 crop that would overflow the grid.
  • Split: 2 curated + 2 freshly-generated tiles per page. Curated = scraped feed images at higher quality than what's in the mood board (different posts). For handle-only video-heavy / mostly Reels accounts, treat clean `mcp__plugin_pika_pika__extract_frame(video_url: <durable_rehosted_mp4>, time_s: 0)` outputs from posts whose `media_url` is a `.mp4` as candidate curated feed images; do not use altered `visual_media_url` signed covers. Before any reel-derived frame is placed on a voice page or content-category card, call `mcp__plugin_pika_pika__analyze_media` on the candidate frame with an OCR / visible text / no-text prompt. Reject frames with baked-in caption/title overlays; if you try a later keyframe, run the same `mcp__plugin_pika_pika__analyze_media` no-text check on that later keyframe before placement. If there is no feed/no social yet, use different user-provided camera-roll photos instead. If there are no curated user photos or clean public-feed curated frames, do not build the PDF; pause and request real photos.
  • Voice-page tiles MUST be different from the mood board. The mood board is the visual world; voice pages demonstrate the voice with fresh imagery. Reusing mood-board tiles flattens the read.
  • Be smart per mode — match imagery to mode intent:
    • Hook → bold scroll-stopper OOTD / cinematic outfit walks
    • Story → narrative moments (kitchen, family content, walking scenes)
    • Casual Selfie → actually selfies from the user's real photos, or face-free generic selfie-register generated tiles if the user did not provide selfie photos — mirror selfie with phone fully obscuring face, cropped outfit/body detail, back-of-head 3/4, hand-obscured face. No visible generated faces. NOT abstract atmosphere.
    • Vulnerable → quiet-moment imagery — unmade bed, candle, journal, wine-and-candle
    • Promo → product-on-table, brand-deal-style flat lays, the user's own brand collab posts
  • Curated photos (the user's real face / pet / partner) are placed as-is — never AI-modified. Light CSS color treatment from Step 3 is OK; full re-rendering through gpt-image-2 is not.
  • For generated selfie tiles, never generate the user's face — render faceless mirror-selfie POVs (`"hand obscuring face"`, `"phone obscuring face"`, `"back-of-head 3/4"`) (load-bearing phrasings — without them gpt-image-2 invents the user's face, breaking the "this is actually her" anchor; see Load-bearing phrases section) so the imagery reads as a selfie register without faking identity.
  • gpt-image-2 content-policy fallback: if a generated selfie-register prompt is rejected because the words "mirror selfie", "selfie", or "phone obscuring face" trip the policy filter, retry once with reworded, policy-safe framing. Drop the word "selfie"; write a hand-held phone body-framing shot, cropped outfit/body detail, or face hidden by a hand-held phone. Do not retry the same prompt or repeat rejected wording. The fallback still requires no visible generated faces and must not invent the user's face.
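The pixel figures quoted in this step can be re-derived with a few lines of arithmetic, which is a handy sanity check before touching any template. `cell_size` is a hypothetical helper; the inputs are the numbers stated above (1920×1080 pages, 80px padding, 800/160/800 columns, 16px gap).

```python
def cell_size(grid_w: int, grid_h: int, cols: int, rows: int, gap: int = 0) -> tuple[float, float]:
    """Width and height of one cell in an evenly divided grid with uniform gaps."""
    width = (grid_w - gap * (cols - 1)) / cols
    height = (grid_h - gap * (rows - 1)) / rows
    return width, height


PAGE_W, PAGE_H, PADDING = 1920, 1080, 80

# Full-bleed mood board: 6 x 2 tiles, no gutters.
board_cell = cell_size(PAGE_W, PAGE_H, cols=6, rows=2)

# Voice page: the padded area splits into 800 / 160 / 800 columns.
inner_w = PAGE_W - 2 * PADDING   # 1760 = 800 + 160 + 800
inner_h = PAGE_H - 2 * PADDING   # 920, the tile grid height
voice_cell = cell_size(800, inner_h, cols=2, rows=2, gap=16)
```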
PDF规范:
  • 页面尺寸: 16:9横版,每页1920×1080像素。
  • 页数: 最多12页,每页聚焦一个主题。不得超过12页:所需的
    mcp__plugin_pika_pika__analyze_media
    PDF质检路径使用
    all_pages: true
    ,该同步路径最多支持12页PDF。
  • 排版: 步骤3中锁定的
    fontDisplay
    +
    fontBody
    字体组合。不得使用第三种字体,不得混合风格。
  • 配色: 步骤2中视觉美学的配色方案。页面背景默认使用最浅色调;强调色使用其他配色。
页面顺序(目标):
  1. 封面 — 大字号标题字体显示姓名(120–180pt)、全大写正文字体显示标语副标题、一张主图(全屏或大幅裁剪)、小字号标签“Influencer Persona · [年份]”(绝不要用“Media Kit”)
  2. 关于/身份定位 — 1–2段从
    identity.md
    提炼的内容(真实的你 + 网红身份,明确说明真实性平衡)。侧边配图:1–3张精选照片 + 小型统计网格(基于/细分领域/状态/语气)
  3. 情绪板——全屏 — 已批准的情绪板全屏嵌入,无标题栏、无内边距。通过填充
    references/templates/mood-board-no-header.template.html
    预渲染;模板负责无标题网格计算:6列×320宽 × 2行×540高 = 1920×1080,无间隙。素材直接拼接,无边缘空白,否则PDF页面边缘及行之间会出现2–4px空白。带标题栏的情绪板(带边距+标题)是独立磁盘成果
    mood-board.png
    ;全屏变体是
    mood-board-no-header.png
    ,仅用于PDF。
  4. 内容分类 — 2×2网格布局的4张卡片,每张卡片 = 图片 + 分类名称 + 一行描述 + 3–4个示例内容类型。若有爬取数据,分类基于用户实际爬取的动态:统计用户30条爬取文案,按主题/格式分组,选择4个最高频类别。若无爬取数据/无社交账号,则页面标题为建议内容分类,基于用户回答、风格URL、参考创作者及真实照片推导4个分类;不得声称是现有动态频率。示例:新英格兰海岸生活博主的分类为穿搭风格/波士顿生活/居家食谱/品牌合作;健身博主为锻炼/食谱/练后食谱/品牌合作;音乐人为巡演旅行/工作室/真实生活/品牌合作。使用用户的具体细节(例如“胡桃夹子、假日集市、Snowport”而非“城市活动内容”)。这比通用配色页更有用。不得使用纯色配色页或材质纹理页——两者都显得无聊;纹理页尤其会让人觉得“全是布料”,无法体现创作者的实际内容。
内容分类布局位于
references/templates/pdf-content-categories.template.html
(300×400图片,
grid-template-columns: 300px 1fr
卡片——已内置)。若有爬取数据,
{{CONTENT_CATEGORIES_TITLE}}
填充为
Content Categories
;若无爬取数据/无社交账号,填充为
Proposed Content Categories
。填充4张卡片的图片/名称/描述/内容类型 + 每张卡片的
{{CARD_N_POS}}
裁剪焦点(
50% 20%
上半身、
50% 35%
俯视厨房、
50% 50%
居中)。不要自定义卡片CSS。
5–9. 语气模板 — 每个文案模板对应一页(钩子、故事、休闲自拍、脆弱感、推广)。布局:左半部分为文本(模板名称 + “适用场景” + 2个示例文案),右半部分为2×2网格的4个素材。
语气页布局位于
references/templates/pdf-voice-mode.template.html
——一个模板复用所有5种模板。内置防止内容溢出的内边距网格(四边80px等内边距,800/160/800列,2×2网格)。填充
{{MODE_NAME}}
/
{{USE_WHEN}}
/ 2个文案 / 4个素材URL(+可选的单素材裁剪)。不要重新推导网格CSS。
  1. 开场钩子 + 私信语气 — 一侧为编号钩子,另一侧为私信示例
  2. 注意事项 — 两栏并排布局
  3. 关键下一步行动路线图页面(替代旧版参考+联系页面;绝不要重新添加联系卡片)。2列网格布局的4–6张卡片,每张卡片 = 编号 + 板块标签(例如“01 · 光线与构图”) + 一行标题 + 2–3句具体行动。内容直接来自步骤2的策略对话——2a中选择的盈利方向、2b中指出的差距、2c中的设备建议。每张卡片必须是可执行的具体行动,包含明确的数字、品牌或行为——而非“提升你的内容”。示例卡片(来自gamar___b示例工具包):“停止在顶光下拍摄——购买手机三脚架(15美元)+ 夹式反光板(25美元)”、“当前80%为穿搭内容;目标40/25/20/15分配到4个内容分类”、“90天内达到5K粉丝:优化光线 + 每周4–5次发布 + 每周1条季节性Reel”。
  • 下一步行动布局位于
    references/templates/pdf-next-steps.template.html
    ——内置明确的
    .next-grid { height: 720px }
    ,防止出现第13空白页和300px空白区域(Chromium会在无明确高度时折叠
    1fr
    行;超过740px会溢出到额外页面)。填充4–6张卡片(编号/标签/标题/2–3句);若少于6张则删除多余卡片块。不要重新推导该CSS。
语气页配图——演示要求:
  • 每页语气页恰好4个素材,排列为上述2×2网格。800×920px网格 + 16px间隙,每个素材单元格在页面缩放后为392×452px。使用
    width: 100%; height: 100%; background-size: cover;
    填充单元格,并根据素材调整
    background-position
    ;不要强制使用3:4裁剪,否则会溢出网格。
  • 比例:每页2个精选素材 + 2个新生成素材。精选素材 = 爬取动态中质量更高的图片(与情绪板不同的动态)。对于仅提供账号的视频类/以Reel为主的账号,将从
    .mp4
    media_url
    动态中提取的清晰
    mcp__plugin_pika_pika__extract_frame(video_url: <durable_rehosted_mp4>, time_s: 0)
    输出视为候选精选素材;不要使用修改后的
    visual_media_url
    签名封面。在将任何Reel衍生帧用于语气页或内容分类卡片前,调用
    mcp__plugin_pika_pika__analyze_media
    对候选帧执行OCR/可见文字/无文字检查。若帧包含内嵌文案/标题叠加层,则放弃该帧;若尝试后续关键帧,需对后续关键帧执行相同的
    mcp__plugin_pika_pika__analyze_media
    无文字检查。若无动态/无社交账号,则使用用户提供的其他相册照片。若无精选用户照片或清晰的公开动态帧,则不制作PDF;暂停并请求真实照片。
  • 语气页素材必须与情绪板素材不同。情绪板是视觉风格展示;语气页需用新素材演示语气。重复使用情绪板素材会削弱内容层次感。
  • 根据模板匹配素材——素材需符合模板意图:
    • 钩子 → 吸睛穿搭/电影感穿搭行走
    • 故事 → 叙事场景(厨房、家庭内容、行走场景)
    • 休闲自拍 → 真实自拍(用户真实照片),若用户未提供自拍照片,则生成无脸自拍风格素材(手机完全遮挡脸的镜面自拍、裁剪穿搭/身体细节、3/4背影、手遮挡脸)。不得出现生成的人脸。不得使用抽象氛围素材。
    • 脆弱感 → 安静场景素材(未整理的床、蜡烛、日记、酒与蜡烛)
    • 推广 → 桌上产品、品牌合作风格平铺、用户自身的品牌合作动态
  • 精选照片(用户真实脸/宠物/伴侣)直接使用——绝不进行AI修改。步骤3的轻度CSS调色是允许的;但不得通过gpt-image-2重新渲染。
  • 生成自拍素材时,绝不生成用户的脸——渲染无脸镜面自拍视角(
    "hand obscuring face"
    "phone obscuring face"
    "back-of-head 3/4"
    (核心表述——否则gpt-image-2会虚构用户的脸,破坏“这就是真实的她”的锚点;详见核心表述部分),使素材呈现自拍风格但不伪造身份。
  • gpt-image-2内容政策 fallback: 若生成自拍风格的提示词因“mirror selfie”、“selfie”或“phone obscuring face”等词汇触发政策过滤而被拒绝,重新尝试一次,使用符合政策的措辞。删除“selfie”一词;描述手持手机拍摄身体的画面、裁剪穿搭/身体细节,或手遮挡脸的画面。不得重复使用被拒绝的提示词。Fallback仍需确保无生成人脸,且不得虚构用户的脸。
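The grid numbers quoted in this section (6×2 full-bleed mood board, 2×2 voice-page tiles with a 16px gap, 80 / 800 / 160 / 800 / 80 page columns) can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only; `cell_size` is a hypothetical helper, not part of any template.

```python
def cell_size(grid_w, grid_h, cols, rows, gap=0):
    """Return (width, height) of one cell in a cols x rows grid with uniform gaps."""
    usable_w = grid_w - gap * (cols - 1)
    usable_h = grid_h - gap * (rows - 1)
    if usable_w % cols or usable_h % rows:
        raise ValueError("grid does not divide into whole-pixel cells")
    return usable_w // cols, usable_h // rows


# Full-bleed mood board: 6 columns x 2 rows, no gaps, on a 1920x1080 page.
mood_cell = cell_size(1920, 1080, cols=6, rows=2)

# Voice-page tile grid: 800x920 area, 2x2 layout, 16px gap.
voice_cell = cell_size(800, 920, cols=2, rows=2, gap=16)

# Voice-page width: 80px padding + 800 / 160 / 800 columns + 80px padding.
page_width = 80 + 800 + 160 + 800 + 80

print(mood_cell, voice_cell, page_width)
```

Whole-pixel cells matter here because fractional cell sizes are one plausible source of the 2–4px slivers the section warns about.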

4c — Design principles (non-negotiable)

4c — 设计原则(不可协商)

  • Typography hierarchy. Display font for page titles + brand name; body font for subtitles, captions, body. Consistent type scale across pages (e.g., 120 / 64 / 32 / 18 / 13 pt). No third type family.
  • White space is a feature. Don't fill page edges to edges. Generous margins (80–120px outer).
  • One focal element per page. Each page has one thing to look at — title + visual, or caption + photo, not 8 things competing.
  • Palette discipline. Use the locked Visual aesthetic palette throughout. Don't introduce new colors.
  • No baked-in fake data. Don't invent follower counts, engagement rates, or brand partnerships that didn't happen. If a stats section is included, mark inferred items.
  • 排版层级。标题字体用于页面标题+品牌名称;正文字体用于副标题、文案、正文。全页面使用一致的字体缩放比例(例如120 / 64 / 32 / 18 / 13 pt)。不得使用第三种字体。
  • 留白是特色。不要填满页面边缘。使用充足的外边距(80–120px)。
  • 每页一个焦点。每页只有一个核心视觉元素——标题+视觉,或文案+照片,而非8个竞争元素。
  • 配色纪律。全程使用锁定的视觉美学配色方案。不得引入新颜色。
  • 不得伪造数据。不得虚构粉丝量、互动率或未发生的品牌合作。若包含统计板块,需标注推断内容。

4d — Build approach

4d — 制作方法

  • Assemble from the page templates — don't author HTML/CSS.
    shared_head
    =
    references/templates/pdf-shared-head.template.html
    (carries the palette/font vars + ALL page CSS; fill its
    {{FONT_IMPORT}}
    /
    {{FONT_DISPLAY}}
    /
    {{FONT_BODY}}
    /
    {{BG}}
    /
    {{FG}}
    /
    {{ACCENT}}
    /
    {{MUTED}}
    tokens).
    body_pages[] = the filled per-page templates in sequence: pdf-cover → pdf-about → pdf-moodboard → pdf-content-categories → pdf-voice-mode ×5 (one per mode) → pdf-hooks-dm → pdf-do-dont → pdf-next-steps. Each body fragment is markup-only; the worker injects shared_head into every page and renders them in parallel, then merges.
  • Use one
    @page
    rule per slide (1920×1080 landscape, no margins on @page).
  • Embed fonts by filling
    {{FONT_IMPORT}}
    with one Google Fonts
    <link rel="stylesheet" href="...">
    tag; that token is inserted before the shared
    <style>
    block.
  • Reference images as HTTPS URLs — for local images, call
    mcp__plugin_pika_pika__upload_asset
    first to get a CDN URL, then reference that URL in the HTML.
  • 🛑 LOAD-BEARING: pass
    pdf_options.paper_size
    explicitly to
    mcp__plugin_pika_pika__html_to_pdf
    .
    The @page { size: 1920px 1080px } CSS rule alone is not enough: mcp__plugin_pika_pika__html_to_pdf defaults to US Letter (792×612 points, 1.294 aspect ratio) if pdf_options.paper_size is omitted, and the 1920×1080-designed HTML gets scaled/letterboxed into the letter page with giant white bars top and bottom. Always pass:
    json
    "pdf_options": {
      "paper_size": {"width": 1920, "height": 1080, "unit": "px"},
      "margins": {"top": 0, "right": 0, "bottom": 0, "left": 0, "unit": "px"}
    }
    mcp__plugin_pika_pika__html_to_pdf
    defaults to async mode. If the response is
    {task_id, status}
    , poll
    mcp__plugin_pika_pika__task_status({task_id})
    in a tight loop until
    completed
    ,
    failed
    , or
    cancelled
    ; on
    completed
    , read the PDF URL from the task result before proceeding. Keep the completed
    mcp__plugin_pika_pika__html_to_pdf
    /
    mcp__plugin_pika_pika__task_status
    result in notes; its server-side structural metadata (
    page_count
    , and
    pages[]
    when present) is the only portable page-count evidence for the timeout fallback. Verify after each render by calling
    mcp__plugin_pika_pika__analyze_media
    on the rendered PDF as
    application/pdf
    with
    all_pages: true
    . Ask it to inspect all pages for 16:9 landscape framing, no US Letter letterboxing, no giant white bars, no blank trailing pages, and no page that breaks the visual hierarchy. If
    all_pages: true
    times out on an image-dense PDF, use the Step 4e per-page QA + server-result page-count /
    pdf_options.paper_size
    fallback before deciding whether the render can proceed. If it reports letterboxing or white bars, rerender with the explicit
    pdf_options.paper_size
    above.
  • Save the resulting PDF as
    ./tmp/[handle]-influencer-kit/influencer-persona.pdf
    (never
    media-kit.pdf
    ).
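The async polling described above can be sketched as a small loop. This is a minimal illustration, not the tool's client: `get_status` is a hypothetical callable standing in for `mcp__plugin_pika_pika__task_status`, the result is assumed to be a dict with a `status` key, and the interval and timeout defaults are assumptions added so the loop cannot spin forever.

```python
import time

# Terminal states named in the text above.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_terminal(get_status, task_id, interval_s=1.0, timeout_s=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_status(task_id) until it reports a terminal state.

    get_status is a hypothetical stand-in for the task_status tool and is
    assumed to return a dict containing a "status" key.
    """
    deadline = clock() + timeout_s
    while True:
        result = get_status(task_id)
        if result.get("status") in TERMINAL_STATES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not terminal after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake status source. "queued" and "running" are made-up
# non-terminal values used only for illustration.
_fake_states = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "page_count": 12},
])
demo_result = poll_until_terminal(lambda _task_id: next(_fake_states),
                                  "demo-task", sleep=lambda _s: None)
print(demo_result["status"])
```

Keeping the completed result object (here `demo_result`) around matches the note above: its `page_count` is what the timeout fallback relies on.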
🛑 PDF size cap: html_to_pdf rejects outputs over 50 MB. gpt-image-2 PNG outputs are 1–3 MB each; with ~15–20 images per kit, raw PNG embeds can push past the cap on the first render. Before final PDF render, reduce large generated images through MCP, not local image libraries: wrap each large image URL in a one-image HTML/CSS frame and call
mcp__plugin_pika_pika__html_to_png
with
format: "jpg"
,
raster_options.viewport_px
capped so the longest side is ≤1400, and
raster_options.jpeg_quality: 85
. If the response is
{task_id, status}
, poll
mcp__plugin_pika_pika__task_status({task_id})
until terminal and use the completed JPG URL in the PDF HTML.
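The "longest side ≤1400" cap is a plain aspect-preserving downscale. A minimal sketch follows, assuming the source pixel dimensions are known. `capped_viewport` is a hypothetical helper, and how the resulting pair is passed into `raster_options.viewport_px` depends on the tool's schema, which is not shown here.

```python
def capped_viewport(width, height, max_side=1400):
    """Scale (width, height) down so the longest side is <= max_side.

    Keeps the aspect ratio and never upscales.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    new_w = max(1, round(width * max_side / longest))
    new_h = max(1, round(height * max_side / longest))
    return new_w, new_h


print(capped_viewport(1024, 1536))  # portrait tile, height is the longest side
print(capped_viewport(1200, 800))   # already under the cap, returned unchanged
```

Rounding can shift the aspect ratio by a fraction of a pixel; with `background-size: cover` in the frame that is normally invisible.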
Canonical generated-tile URL pipeline (MCP-only):
  1. Generate each atmosphere tile with
    mcp__plugin_pika_pika__generate_image
    (provider="gpt-image-2").
  2. For every generated PNG that will be embedded in the PDF, transcode/downscale through
    mcp__plugin_pika_pika__html_to_png
    by placing the source URL as a full-bleed background image in a small HTML frame and rendering
    format: "jpg"
    at quality 85.
  3. Build an explicit
    asset_url_map
    in scratch notes: raw generated URL → final JPG URL. Insert the final JPG URLs into the
    body_pages
    fragments.
  4. Grep the final HTML/source notes for raw
    gpt-image-
    PNG URLs before render; the mood board full-bleed PNG may remain, but voice-page generated tiles should use the final JPG URLs.
  5. Call
    mcp__plugin_pika_pika__html_to_pdf
    with
    body_pages
    ,
    shared_head
    , and the explicit
    pdf_options.paper_size
    above.
  6. A 12-page kit at this spec should land far below 50 MB. PDFs in the 40–55 MB range mean the MCP JPG conversion step was skipped or the HTML still references raw generated PNGs. If AI tiles still look "stocky" (stock-photo-like) after this, the composition itself is the problem; go to Step 4d.5 for the re-generation prompt formula.
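Steps 3 and 4 of the pipeline (apply the `asset_url_map`, then look for leftover raw PNG URLs) can be sketched as two string operations. This is an illustration under assumptions: the example URLs are invented, raw generated URLs are assumed to contain `gpt-image-` and end in `.png` as the grep step implies, and both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def apply_asset_url_map(html, asset_url_map):
    """Swap every raw generated URL for its final JPG URL."""
    # Replace longer URLs first so a raw URL that prefixes another is safe.
    for raw in sorted(asset_url_map, key=len, reverse=True):
        html = html.replace(raw, asset_url_map[raw])
    return html


def find_raw_generated_pngs(html, allowed=()):
    """Return leftover raw generated PNG URLs, ignoring explicitly allowed ones."""
    urls = re.findall(r"https://[^\s\"')]+", html)
    return [
        url for url in urls
        if "gpt-image-" in url
        and urlsplit(url).path.lower().endswith(".png")
        and url not in allowed
    ]


# Invented URLs for the demo only.
RAW_TILE = "https://cdn.example.com/gpt-image-2/hook_1.png"
FINAL_TILE = "https://cdn.example.com/jpg/v_hook_1.jpg"
BOARD = "https://cdn.example.com/gpt-image-2/board.png"

page = ('<div style="background-image:url(' + RAW_TILE + ')"></div>'
        '<img src="' + BOARD + '">')

leftover_before = find_raw_generated_pngs(page, allowed=(BOARD,))
fixed_page = apply_asset_url_map(page, {RAW_TILE: FINAL_TILE})
leftover_after = find_raw_generated_pngs(fixed_page, allowed=(BOARD,))
print(leftover_before, leftover_after)
```

The `allowed` tuple models the one exception in step 4: the full-bleed mood board PNG may stay as a PNG.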
🛑 Cloudflare R2 /
mcp__plugin_pika_pika__upload_asset
PUT gotchas:
  • Presigned URLs expire in 300s. A sequential
    curl PUT
    chain of 10+ files where each takes 8–12s can exceed the window for the last few URLs. Always run PUTs in parallel using
    (curl ... &)
    subshells +
    wait
    .
  • Subshell PUTs can drop the
    Content-Type
    header silently
    — the file uploads but R2 stores it with
    text/html
    MIME, which then makes html_to_pdf reject the image with "MIME type 'text/html' is not in the allowlist." Mitigation: after every batch PUT, verify with
    curl -s -I <url> | grep -i content-type
    and re-PUT any wrong types.
  • Re-mint + re-PUT immediately if you see 403/ExpiredRequest. Don't try to extend a stale URL.
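The expiry risk in the first bullet is a timing budget, and a rough model makes it easier to reason about. The sketch below assumes all URLs are minted together, that a URL only needs to be valid when its PUT starts, and that every PUT takes the same time; none of these is confirmed by the source, and both function names are hypothetical.

```python
def sequential_expired(n_files, seconds_per_put, ttl_s=300, minted_ago_s=0):
    """Return 1-based file numbers whose PUT would start at or after expiry.

    Models a strictly sequential chain; minted_ago_s is how long the
    presigned URLs have already existed when the first PUT begins.
    """
    expired = []
    for index in range(n_files):
        start_s = minted_ago_s + index * seconds_per_put
        if start_s >= ttl_s:
            expired.append(index + 1)
    return expired


def parallel_expired(n_files, ttl_s=300, minted_ago_s=0):
    """In a fully parallel batch every PUT starts at minted_ago_s."""
    return list(range(1, n_files + 1)) if minted_ago_s >= ttl_s else []


print(sequential_expired(30, 12))                    # large batch, fresh URLs
print(sequential_expired(10, 12, minted_ago_s=200))  # small batch, stale URLs
print(parallel_expired(10, minted_ago_s=200))
```

Under this model the sequential chain only overruns when the batch is large or the URLs were minted well before the first PUT; running in parallel removes the dependence on batch size entirely, which is the point of the recommendation.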
  • 使用页面模板组装——不要自定义HTML/CSS
    shared_head
    =
    references/templates/pdf-shared-head.template.html
    (包含配色/字体变量 + 所有页面CSS;填充其
    {{FONT_IMPORT}}
    /
    {{FONT_DISPLAY}}
    /
    {{FONT_BODY}}
    /
    {{BG}}
    /
    {{FG}}
    /
    {{ACCENT}}
    /
    {{MUTED}}
    标记)。
    body_pages[]
    = 按顺序填充的每页模板:
    pdf-cover
    pdf-about
    pdf-moodboard
    pdf-content-categories
    pdf-voice-mode
    ×5(每个模板一页) →
    pdf-hooks-dm
    pdf-do-dont
    pdf-next-steps
    。每个body片段仅包含标记;工作进程会将
    shared_head
    注入每页并并行渲染,然后合并。
  • 每页使用一个
    @page
    规则(1920×1080横版,
    @page
    无内边距)。
  • 通过填充
    {{FONT_IMPORT}}
    为Google Fonts的
    <link rel="stylesheet" href="...">
    标签嵌入字体;该标记插入到共享
    <style>
    块之前。
  • 图片引用使用HTTPS URL——对于本地图片,先调用
    mcp__plugin_pika_pika__upload_asset
    获取CDN URL,再在HTML中引用该URL。
  • 🛑 核心要求: 向
    mcp__plugin_pika_pika__html_to_pdf
    显式传递
    pdf_options.paper_size
    。仅使用
    @page { size: 1920px 1080px }
    CSS规则不够——若省略
    pdf_options.paper_size
    mcp__plugin_pika_pika__html_to_pdf
    默认使用US Letter尺寸(792×612点,1.294比例),1920×1080设计的HTML会被缩放/添加黑边适配Letter页面,顶部和底部出现巨大空白。必须始终传递:
    json
    "pdf_options": {
      "paper_size": {"width": 1920, "height": 1080, "unit": "px"},
      "margins": {"top": 0, "right": 0, "bottom": 0, "left": 0, "unit": "px"}
    }
    mcp__plugin_pika_pika__html_to_pdf
    默认使用异步模式。若返回
    {task_id, status}
    ,则循环轮询
    mcp__plugin_pika_pika__task_status({task_id})
    直至任务
    completed
    failed
    cancelled
    ;任务完成后,从结果中读取PDF URL再继续。保留完成的
    mcp__plugin_pika_pika__html_to_pdf
    /
    mcp__plugin_pika_pika__task_status
    结果;其服务器端结构元数据(
    page_count
    及存在时的
    pages[]
    )是超时fallback的唯一可靠页数证据。每次渲染后,调用
    mcp__plugin_pika_pika__analyze_media
    对渲染后的PDF(
    application/pdf
    )执行
    all_pages: true
    检查。要求检查所有页面是否为16:9横版、无US Letter黑边、无巨大空白、无空白尾页、无破坏视觉层级的页面。若图片密集的PDF导致
    all_pages: true
    超时,则在决定是否继续渲染前,使用步骤4e的单页QA + 服务器结果页数 /
    pdf_options.paper_size
    fallback。若报告存在黑边或空白,则使用上述显式
    pdf_options.paper_size
    重新渲染。
  • 将最终PDF保存为
    ./tmp/[账号名]-influencer-kit/influencer-persona.pdf
    (绝不要用
    media-kit.pdf
    )。
🛑 PDF大小限制: html_to_pdf拒绝超过50 MB的输出。gpt-image-2的PNG输出每张1–3 MB;工具包包含约15–20张图片,原始PNG嵌入会导致首次渲染超过限制。最终PDF渲染前,通过MCP压缩大尺寸生成图片,而非本地图片库: 将每个大尺寸图片URL放入单图片HTML/CSS框架,调用
mcp__plugin_pika_pika__html_to_png
,设置
format: "jpg"
raster_options.viewport_px
最长边≤1400、
raster_options.jpeg_quality: 85
。若返回
{task_id, status}
,则轮询
mcp__plugin_pika_pika__task_status({task_id})
直至任务完成,然后在PDF HTML中使用最终JPG URL。
标准生成素材URL流程(仅使用MCP):
  1. 使用
    mcp__plugin_pika_pika__generate_image
    (provider="gpt-image-2")生成每个氛围素材。
  2. 对于每个将嵌入PDF的生成PNG,通过
    mcp__plugin_pika_pika__html_to_png
    转码/压缩:将源URL作为全屏背景放入小型HTML框架,渲染为
    format: "jpg"
    ,质量85。
  3. 在临时笔记中创建明确的
    asset_url_map
    : 原始生成URL → 最终JPG URL。将最终JPG URL插入
    body_pages
    片段。
  4. 渲染前在最终HTML/源笔记中搜索原始
    gpt-image-
    PNG URL;情绪板全屏PNG可保留,但语气页生成素材需使用最终JPG URL。
  5. 调用
    mcp__plugin_pika_pika__html_to_pdf
    ,传入
    body_pages
    shared_head
    及上述显式
    pdf_options.paper_size
  6. 符合该规范的12页工具包应远低于50 MB。若PDF大小在40–55 MB之间,说明跳过了MCP PNG→JPG转换步骤,或HTML仍引用原始生成PNG。若AI素材仍显得“库存感”,说明构图存在问题——参考步骤4d.5的重新生成提示词公式。
🛑 Cloudflare R2 /
mcp__plugin_pika_pika__upload_asset
PUT注意事项:
  • 预签名URL有效期为300秒。若顺序执行10+个
    curl PUT
    ,每个耗时8–12秒,最后几个URL可能过期。始终并行执行PUT,使用
    (curl ... &)
    子shell +
    wait
  • 子shell PUT可能会静默丢失
    Content-Type
    ——文件上传成功,但R2将其存储为
    text/html
    MIME类型,导致html_to_pdf拒绝该图片,提示“MIME type 'text/html' is not in the allowlist”。解决方法:每次批量PUT后,使用
    curl -s -I <url> | grep content-type
    验证,重新PUT类型错误的文件。
  • 若出现403/ExpiredRequest,立即重新生成并PUT。不要尝试延长过期URL的有效期。

4d.5 — Beating the AI tell (the composition is the problem)

4d.5 — 避免AI痕迹(构图是核心问题)

gpt-image-2 outputs have two recognizable AI tells:
  1. Photography style — too-perfect lighting, pristine surfaces, no real-life imperfection. Light CSS treatment can help match the surrounding board.
  2. Composition — perfectly arranged objects, centered subjects, magazine-product-shoot framing, no real-world clutter. NOT fixable after generation. It has to be solved at the prompt level.
The AI-composition trap (especially in Promo / brand-deal tiles):
  • Three skincare bottles arranged on marble with peonies and rosemary = stock-photo flat lay. AI tell, period.
  • Perfume bottle on a folded silk scarf with falling petals = product hero shot. AI tell, period.
  • Cinnamon dough perfectly rolled with butter + brown sugar in matching bowls = food-blog stock. AI tell.
The fix — prompt for real lived-in context, not styled arrangement:
  • Skincare: "Phone-camera mirror selfie detail of a generic person holding ONE bottle in a generic bathroom, phone fully obscuring face, toothbrush cup + towel cluttered behind. NOT a styled flat lay."
  • Perfume: "Close-up of a generic wrist mid-spritz with the bottle in the other hand visible in soft focus. Background: generic cluttered vanity — hairbrushes, makeup, iced coffee. NOT a product hero shot."
  • Cinnamon dough: "Generic hands rolling dough on a marble counter, dusty-rose linen towel half-falling off the side, butter and sugar in small mismatched bowls, off-center framing. NOT overhead-stock-flatlay-perfect."
The prompt formula that beats the AI tell:
[hand-held / mid-action verb] + [real environment with slight clutter] + [faceless person partially in frame] + [imperfect / off-center framing] +
"NOT a [styled flat lay / product hero shot / stock food photo]"
The trailing
"NOT a styled flat lay / product hero shot / stock food photo"
is the load-bearing anchor of this whole pattern — without it gpt-image-2 reverts to centered product composition that later CSS treatment cannot rescue. See Load-bearing phrases section for the full collection.
HTML/CSS treatment (after generation, before composite/PDF): Use the HTML render layer to match generated tiles to the user's real-photo register:
  • Slight
    filter
    changes only: mild contrast, saturation, brightness, or warmth.
  • Optional overlay: very low-opacity noise texture or vignette via CSS pseudo-elements.
  • Casual crop:
    object-position
    per tile so the subject is not uniformly centered.
  • Never alter real user subjects through an image model. Real user photos stay source-faithful.
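The prompt formula above is mechanical enough to assemble in code, which also makes it hard to forget the trailing "NOT a ..." anchor. A minimal sketch: `build_tile_prompt` and its parameter names are hypothetical, and the example values are adapted from the cinnamon-dough prompt in this section.

```python
def build_tile_prompt(action, environment, person, framing, not_a):
    """Join the four formula parts and append the load-bearing negative anchor."""
    if not not_a.strip():
        raise ValueError("the trailing 'NOT a ...' anchor is required")
    parts = [action, environment, person, framing]
    body = ", ".join(part.strip().rstrip(".,") for part in parts if part.strip())
    return f"{body}. NOT a {not_a.strip()}."


dough_prompt = build_tile_prompt(
    action="Generic hands rolling dough on a marble counter",
    environment="linen towel half-falling off the side, "
                "butter and sugar in small mismatched bowls",
    person="faceless person partially in frame",
    framing="off-center framing",
    not_a="stock food photo",
)
print(dough_prompt)
```

Raising on an empty `not_a` is a deliberate choice in this sketch: the section treats that clause as the part the pattern cannot work without.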
gpt-image-2的输出有两个明显的AI痕迹:
  1. 摄影风格——光线过于完美、表面过于洁净、无真实生活瑕疵。轻度CSS处理可帮助匹配周围素材风格。
  2. 构图——物体排列过于完美、主体居中、杂志产品拍摄构图、无真实世界杂乱。生成后无法修复。必须在提示词层面解决。
AI构图陷阱(尤其在推广/品牌合作素材中):
  • 三个护肤品瓶放在大理石上,搭配牡丹和迷迭香 = 库存照片平铺。典型AI痕迹。
  • 香水瓶放在折叠丝巾上,搭配飘落的花瓣 = 产品主图。典型AI痕迹。
  • 完美卷起的肉桂面团,搭配黄油+红糖的配套碗 = 美食博客库存图。典型AI痕迹。
解决方法——提示真实生活场景,而非刻意布置:
  • 护肤品:“手机视角镜面自拍,普通人手持一瓶护肤品,手机完全遮挡脸,背景是杂乱的牙刷杯+毛巾。NOT a styled flat lay。”
  • 香水:“特写普通人手腕喷香水的瞬间,另一只手的香水瓶处于虚焦状态。背景:杂乱的梳妆台——梳子、化妆品、冰咖啡。NOT a product hero shot。”
  • 肉桂面团:“普通人在大理石台面上揉面团,灰粉色亚麻毛巾半挂在台边,黄油和红糖放在不配套的小碗里,构图偏离中心。NOT overhead-stock-flatlay-perfect。”
避免AI痕迹的提示词公式:
[手持/动态动词] + [带轻微杂乱的真实环境] + [无脸人物部分入镜] + [不完美/偏离中心构图] +
"NOT a [styled flat lay / product hero shot / stock food photo]"
末尾的
"NOT a styled flat lay / product hero shot / stock food photo"
核心锚点——没有它,gpt-image-2会回到居中产品构图,后续CSS处理无法修复。详见核心表述部分的完整列表。
HTML/CSS处理(生成后,合成/PDF前): 使用HTML渲染层匹配生成素材与用户真实照片风格:
  • 仅进行轻微
    filter
    调整:轻度对比度、饱和度、亮度或色温调整。
  • 可选叠加层:通过CSS伪元素添加极低透明度的噪点纹理或暗角。
  • 随意裁剪:为每个素材设置
    object-position
    ,使主体不统一居中。
  • 绝不通过图像模型修改用户真实主体。用户真实照片保持原始状态。

4e — Mandatory checks before delivering

4e — 交付前的强制检查

  1. Render-read every page through MCP. Call
    mcp__plugin_pika_pika__analyze_media
    on the final PDF URL as
    application/pdf
    with
    all_pages: true
    and ask for a page-by-page QA pass: confirm every page renders cleanly in 16:9 landscape, no blank pages, no US Letter white-bar letterboxing, no confusing/off-aesthetic pages, no broken hierarchy, and no visible image/crop failure. If any issue is reported, fix the HTML/assets and rerender before delivering.
    • Fallback only for image-dense PDF timeout / rasterization budget failures:
      all_pages: true
      is still the mandatory first QA call. If that all-pages call times out or fails only because the multi-page PDF rasterization budget was exhausted, do not block the run solely on that aggregate failure. Verify structure from MCP/server evidence first: the completed
      mcp__plugin_pika_pika__html_to_pdf
      or
      mcp__plugin_pika_pika__task_status
      result must report
      page_count: 12
      (and
      pages[]
      length 12 when present), the render request must have exactly 12
      body_pages
      , and the submitted
      pdf_options.paper_size
      must be
      {width: 1920, height: 1080, unit: "px"}
      with zero margins. Treat that server-render result + request as the MediaBox/page-size contract; do not run local PDF inspection or local PDF tools/libraries. If the server result lacks
      page_count
      , the fallback is not proven: rerender with smaller JPG assets or stop and report
      pdf_structural_metadata_unavailable
      . Once structure is proven, run per-page MCP QA with
      mcp__plugin_pika_pika__analyze_media(media=<pdf_url>, application/pdf, page: <n>)
      for page 1 through page 12, asking the same visual QA question for each page. Any per-page visual defect still blocks delivery and requires rerender.
  2. Anti-AI-voice on all captions is re-checked before they go into the PDF.
  3. Real photo integrity — the user's face / pet / partner only as curated originals (or lightly graded), never AI-rendered.
  4. Typography consistency — same display + body pair on every page. No surprise font swaps.
  5. No baked-in text on any tile (carried over from Step 3). Check generated, curated, public-feed, and reel-derived tiles; if a frame has burned-in caption/title overlays or any other visible text, replace it and rerender before delivery. For reel-derived tiles, verify the build notes include the sourcing-time
    mcp__plugin_pika_pika__analyze_media
    OCR/no-text result for each candidate frame before placement; final PDF QA is a backstop after sourcing, not the first text gate.
  6. Palette stays inside the spec — no rogue colors.
  7. PDF size under 50 MB — if html_to_pdf returns a "File too large" error, the MCP PNG→JPG conversion step (4d) was skipped.
  8. Voice-page imagery is mode-appropriate — Selfie page actually shows selfies; Vulnerable page actually shows quiet-moment imagery. Not "abstract atmosphere for everything."
  9. Voice-page tiles are not mood-board tiles. Fresh imagery only.
Deliver in chat: path to PDF + 1-line description. Ask: "approve this to ship the kit, or anything to swap before we lock it in?"
🛑 Wait for explicit approval before Step 5.
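The structural half of the timeout fallback in check 1 is a fixed list of comparisons, sketched below. The field names (`page_count`, `pages`, `body_pages`, `pdf_options.paper_size`, `margins`) come from the text; treating the server result and the render request as plain dicts, and the function name `check_structure`, are assumptions for illustration.

```python
EXPECTED_PAGES = 12
EXPECTED_PAPER = {"width": 1920, "height": 1080, "unit": "px"}
SIDES = ("top", "right", "bottom", "left")


def check_structure(server_result, request):
    """Return (ok, reason) for the server-evidence structure check."""
    page_count = server_result.get("page_count")
    if page_count is None:
        # Without page_count the fallback is not proven.
        return False, "pdf_structural_metadata_unavailable"
    if page_count != EXPECTED_PAGES:
        return False, f"page_count={page_count}"
    pages = server_result.get("pages")
    if pages is not None and len(pages) != EXPECTED_PAGES:
        return False, f"pages[] length={len(pages)}"
    if len(request.get("body_pages", [])) != EXPECTED_PAGES:
        return False, "body_pages length mismatch"
    options = request.get("pdf_options", {})
    if options.get("paper_size") != EXPECTED_PAPER:
        return False, "paper_size mismatch"
    margins = options.get("margins", {})
    if any(margins.get(side) != 0 for side in SIDES):
        return False, "non-zero margins"
    return True, "structure proven"


demo_request = {
    "body_pages": ["<section></section>"] * 12,
    "pdf_options": {
        "paper_size": {"width": 1920, "height": 1080, "unit": "px"},
        "margins": {"top": 0, "right": 0, "bottom": 0, "left": 0, "unit": "px"},
    },
}
demo_ok = check_structure({"page_count": 12}, demo_request)
demo_missing = check_structure({"status": "completed"}, demo_request)
print(demo_ok, demo_missing)
```

A passing result only unlocks the per-page visual QA; it does not replace it.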
  1. 通过MCP逐页查看PDF。调用
    mcp__plugin_pika_pika__analyze_media
    对最终PDF URL(
    application/pdf
    )执行
    all_pages: true
    检查,要求逐页QA:确认每页为16:9横版、无空白页、无US Letter黑边、无不符美学风格的页面、无层级混乱、无图片/裁剪失败。若发现任何问题,修复HTML/素材后重新渲染再交付。
    • 仅当图片密集PDF超时/光栅化预算不足时使用fallback:
      all_pages: true
      仍是强制的首次QA调用。若全页调用超时或仅因多页PDF光栅化预算耗尽而失败,不要仅因该聚合失败而终止流程。先通过MCP/服务器证据验证结构:完成的
      mcp__plugin_pika_pika__html_to_pdf
      mcp__plugin_pika_pika__task_status
      结果必须报告
      page_count: 12
      (存在时
      pages[]
      长度为12)、渲染请求必须包含恰好12个
      body_pages
      、提交的
      pdf_options.paper_size
      必须为
      {width: 1920, height: 1080, unit: "px"}
      且内边距为0。将该服务器渲染结果+请求视为MediaBox/页面尺寸约定;不要使用本地PDF检查工具或库。若服务器结果缺少
      page_count
      ,则fallback不成立:使用更小的JPG素材重新渲染,或终止并报告
      pdf_structural_metadata_unavailable
      。结构验证通过后,对第1至12页分别调用
      mcp__plugin_pika_pika__analyze_media(media=<pdf_url>, application/pdf, page: <n>)
      ,每页执行相同的视觉QA检查。任何单页视觉缺陷仍会阻止交付,需重新渲染。
  2. 所有文案需重新检查是否存在AI语气,再放入PDF。
  3. 真实照片完整性——用户的脸/宠物/伴侣仅使用精选原始照片(或轻度调色),绝不使用AI渲染版本。
  4. 排版一致性——每页使用相同的标题+正文字体组合。不得出现意外字体切换。
  5. 所有素材不得包含内嵌文字(延续步骤3规则)。检查生成、精选、公开动态、Reel衍生素材;若帧包含内嵌文案/标题叠加层或其他可见文字,替换素材并重新渲染后再交付。对于Reel衍生素材,验证制作笔记包含素材来源时对每个候选帧执行的
    mcp__plugin_pika_pika__analyze_media
    OCR/无文字检查结果;最终PDF质检是素材来源环节的后备,而非第一个文字检查环节。
  6. 配色符合规范——无违规颜色。
  7. PDF大小低于50 MB——若html_to_pdf返回“File too large”错误,说明跳过了步骤4d的MCP PNG→JPG转换步骤。
  8. 语气页素材符合模板意图——自拍页实际展示自拍;脆弱感页实际展示安静场景素材。不得“所有页面都用抽象氛围素材”。
  9. 语气页素材与情绪板素材不同。仅使用新素材。
在聊天中交付: PDF路径 + 一行描述。询问:“是否批准该PDF以打包工具包,或在锁定前需要调整什么内容?”
🛑 等待明确批准后进入步骤5。

Step 5 — Package the Kit (HARD GATE — only after explicit ship-it)

步骤5 — 打包工具包(硬性确认环节——仅在用户明确回复ship-it后执行)

🛑 STOP. This is a hard gate, not a soft suggestion.
Same gate logic as
build-a-brand
Step 4. Do not begin packaging until the user says ship-it / approved / go / lock it in. "I love it" alone is ambiguous — confirm.
Final ask format when delivering Step 4: end with "ready to lock this in and ship the kit, or want to adjust anything first?" Then stop and wait. Do not stage the kit directory. Do not move files. The identity is not approved until the user says so.
Kit structure:
./tmp/[handle]-influencer-kit/
├── influencer-persona.pdf      # HEADLINE artifact — the designed Influencer Persona PDF (never named "media-kit.pdf")
├── mood-board.png              # approved composite board (~1920×1224 with title header)
├── mood-board-no-header.png    # full-bleed variant used in the PDF (1920×1080, no title band)
├── persona.md                  # comprehensive identity spec (contract for downstream skills) — formerly brand.md, do NOT recreate as brand.md
├── identity.md                 # "who you are" vs "who your influencer is" + strategic direction + gap analysis + content critique — long-form
├── voice-bank.md               # bios, captions (with context labels + 2 per mode), hooks, DMs, do/don't — what the PDF was built from
├── next-steps.md               # Key Next Steps roadmap from Step 2a strategic direction, Step 2b gap analysis, and Step 2c content critique / gear recs; mirrored as a standalone file for downstream content calendar skills
├── images/                     # 🛑 LOAD-BEARING: every image that appears in the PDF, packaged together
│   │                           #   the user receives a self-contained reference of all the visual assets without
│   │                           #   having to dig through mood-board/ and build/ subfolders. shipped at the same
│   │                           #   level as the PDF so it's obvious. populate with copies (not symlinks) of:
│   ├── tile_01.jpg … tile_06.jpg     # 6 mood-board curated tiles used in the composite
│   ├── voice_01.jpg … voice_10.jpg   # 10 voice-page curated tiles used in voice grids + categories page
│   ├── v_hook_1.jpg … v_promo_2.jpg  # generated voice-page tiles after MCP JPG conversion
│   ├── mood-board.png          # title-banded version (matches root-level file)
│   └── mood-board-no-header.png # full-bleed version embedded in PDF page 3 (matches root-level file)
├── mood-board/                 # raw source assets (kept for re-generation, not for end-user browsing)
│   ├── curated/                # individual full-res user photos
│   ├── voice-curated/          # 10 tiles pulled from feed for the voice pages (2 per mode, distinct from mood-board tiles)
│   └── generated/              # raw atmosphere tile URLs generated for mood board + voice pages
├── build/                      # build-time sources — keep for re-renders
│   ├── mood-board.html         # title-banded mood-board HTML rendered by html_to_png
│   ├── mood-board-no-header.html # full-bleed mood-board HTML rendered by html_to_png
│   ├── asset-url-map.md        # raw generated URL → MCP JPG URL map used by the final PDF
│   └── influencer-persona.html # source notes for shared_head + body_pages (never named "media-kit.html")
└── README.md                   # how to feed this kit to other skills
images/
is mandatory.
When Step 5 packages the kit, copy every image actually used in the PDF into
./images/
as the user-facing asset directory. The
mood-board/
and
build/
folders stay for re-generation/debugging — they contain raw sources, render HTML, and URL maps that are confusing to browse. The user's mental model is "I want a folder of the photos that ended up in my kit";
images/
is that folder. Use file copies, not symlinks (zip and Google Drive both handle copies cleanly; symlinks break across both).
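Populating `images/` with real copies can be sketched with the standard library. This is a hedged example, not the skill's packaging code: `package_images` is a hypothetical helper, the file names are placeholders from the tree above, and the demo runs in a temporary directory rather than `./tmp/`.

```python
import shutil
import tempfile
from pathlib import Path


def package_images(sources, kit_dir):
    """Copy each source image into <kit_dir>/images/ as a regular file."""
    images_dir = Path(kit_dir) / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, sources):
        dest = images_dir / src.name
        if dest.exists():
            raise FileExistsError(f"duplicate image name: {src.name}")
        # shutil.copy2 follows symlinks by default, so dest holds real bytes.
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    names = ["tile_01.jpg", "voice_01.jpg", "mood-board.png"]
    for name in names:
        (root / "src" / name).write_bytes(b"placeholder")
    kit = root / "handle-influencer-kit"
    copied = package_images([root / "src" / name for name in names], kit)
    demo_names = sorted(path.name for path in copied)
    demo_all_regular = all(p.is_file() and not p.is_symlink() for p in copied)

print(demo_names, demo_all_regular)
```

Failing on a duplicate name is one possible policy; it surfaces collisions between curated and generated tiles instead of silently overwriting one of them.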
Sample posts are NOT shipped. Earlier versions of this skill included a directory of image+caption pairs. That was cut — the voice bank's caption-mode examples already demonstrate voice-in-context, and adding image+caption pairs added work without shipping a clearly-better artifact. Downstream skills that need user likeness should use the user's curated photos as their own runtime references; this skill does not generate fake versions of the user.
persona.md
structure
— see
references/persona-md-template.md
. Must include:
  • Quick reference block (handle, niche, voice in 3 words, aesthetic in 3 words, authenticity dial, typography pair)
  • The full identity (real you + influencer you, with the dial explicitly stated)
  • Strategic direction — the path chosen in Step 2a (niche label, why-it-fits, monetization profile, content load, who's already winning)
  • Gap analysis — current state → target persona, what's missing (content categories, formats, cadence, voice, aesthetic, follower-tier)
  • Content critique + gear recs — the production-quality call-outs from Step 2c, including the specific gear/tools recommended
  • Visual aesthetic (palette, lighting, settings, photography style, subjects, forbidden, relationship to existing grid, typography: fontDisplay + fontBody)
  • Voice (sounds like / never sounds like / adjectives / forbidden)
  • Bio variants
  • Caption modes
  • Hook openers
  • DM + comment voice
  • Do & Don't
  • Reference creators (with dimension + what to borrow)
  • Key Next Steps — same 4–6 cards that appear on the final PDF page; this section is the canonical source and the PDF page reads from it
  • How to use this with other skills — section telling other Pika skills (
    ugc-ads
    ,
    podcast
    ,
    founder-product-video
    ,
    app-sizzle
    ,
    app-store-screens
    ) how to read this kit. e.g. "ugc-ads: pull voice from voice-bank.md hook captions; pull likeness reference from one of the curated photos in mood-board/curated/; adjectives from the Visual aesthetic section drive setting/lighting; use the locked typography pair for any on-screen text."
README.md
— 1-page guide:
  • What's in the kit
  • How to paste
    persona.md
    into other Pika skill prompts
  • Which file feeds which skill (table)
  • Note: the kit doesn't include a content calendar; it's identity + a glow-up roadmap, not a posting schedule.
Save location:
./tmp/[handle]-influencer-kit/
. Per the global rule, all generated deliverables go to
./tmp/
, never
~/Desktop
. Tell the user the path so they can navigate to it.
Final delivery message:
  • Path to the kit folder
  • Inline preview of the
    persona.md
    quick reference block
  • 1-line on how to use it next: "paste
    persona.md
    at the top of any Pika skill prompt to get on-brand output."
🛑 停止操作。这是硬性确认环节,而非建议。
build-a-brand
步骤4的确认逻辑相同。在用户回复ship-it/approved/go/lock it in前,不得开始打包。仅回复“I love it”是模糊的——需确认。
交付步骤4时的最终询问格式: 结尾添加“是否准备锁定内容并打包工具包,还是需要先调整内容?”然后停止操作并等待。不要提前创建工具包目录,不要移动文件。用户明确批准后,身份内容才正式确定。
工具包结构:
./tmp/[账号名]-influencer-kit/
├── influencer-persona.pdf      # 核心成果——定制的网红Persona PDF(绝不要命名为"media-kit.pdf")
├── mood-board.png              # 已批准的合成情绪板(约1920×1224,带标题栏)
├── mood-board-no-header.png    # PDF中使用的全屏变体(1920×1080,无标题栏)
├── persona.md                  # 完整身份规范(下游技能的约定)——原brand.md,请勿重新命名为brand.md
├── identity.md                 # "真实的你" vs "网红身份" + 策略方向 + 差距分析 + 内容评价——长文
├── voice-bank.md               # 个人简介、文案(带场景标签+每个模板2个示例)、钩子、私信、注意事项——PDF的基础内容
├── next-steps.md               # 关键下一步行动路线图,来自步骤2a策略方向、步骤2b差距分析、步骤2c内容评价/设备建议;作为独立文件供下游内容日历技能使用
├── images/                     # 🛑 核心要求: PDF中使用的所有图片,统一打包
│   │                           #   用户无需浏览mood-board/和build/子文件夹,即可获取所有视觉素材。与PDF同级,方便用户查找。复制(而非链接)以下内容:
│   ├── tile_01.jpg … tile_06.jpg     # 合成情绪板使用的6张精选素材
│   ├── voice_01.jpg … voice_10.jpg   # 语气页网格+内容分类页使用的10张精选素材
│   ├── v_hook_1.jpg … v_promo_2.jpg  # MCP转换为JPG后的语气页生成素材
│   ├── mood-board.png          # 带标题栏版本(与根目录文件一致)
│   └── mood-board-no-header.png # PDF第3页嵌入的全屏版本(与根目录文件一致)
├── mood-board/                 # 原始素材(用于重新生成,不供用户浏览)
│   ├── curated/                # 用户提供的高清照片
│   ├── voice-curated/          # 从动态中提取的10张语气页素材(每个模板2张,与情绪板素材不同)
│   └── generated/              # 为情绪板+语气页生成的原始氛围素材URL
├── build/                      # 制作过程素材——用于重新渲染
│   ├── mood-board.html         # 通过html_to_png渲染的带标题栏情绪板HTML
│   ├── mood-board-no-header.html # 通过html_to_png渲染的全屏情绪板HTML
│   ├── asset-url-map.md        # 最终PDF使用的原始生成URL → MCP JPG URL映射
│   └── influencer-persona.html # shared_head + body_pages的源笔记(绝不要命名为"media-kit.html")
└── README.md                   # 如何将该工具包用于其他技能的说明
images/
是必须的
。步骤5打包工具包时,将PDF中实际使用的所有图片复制到
./images/
,作为用户可见的素材目录。
mood-board/
build/
文件夹用于重新生成/调试——包含原始素材、渲染HTML和URL映射,对用户来说难以浏览。用户的需求是“获取工具包中所有照片的文件夹”;
images/
就是该文件夹。使用文件复制,而非链接(zip和Google Drive都能很好地处理复制;链接在两者中都会失效)。
不交付示例动态。旧版技能包含图片+文案对的目录。现已移除——风格库的文案模板示例已体现语境中的语气,添加图片+文案对只会增加工作量,不会提升成果质量。需要用户形象的下游技能应使用用户的精选照片作为运行时参考;本技能不生成用户的仿制品。
persona.md
结构
——详见
references/persona-md-template.md
。必须包含:
  • 快速参考块(账号名、细分领域、3个语气关键词、3个美学关键词、真实性平衡、字体组合)
  • 完整身份(真实的你 + 网红身份,明确说明平衡)
  • 策略方向——步骤2a中选择的方向(细分领域标签、适配原因、变现概况、内容要求、成功案例)
  • 差距分析——当前状态→目标Persona,缺失内容(内容类别、形式、频率、语气、美学、粉丝量级)
  • 内容评价 + 设备建议——步骤2c中的制作质量评价,包括具体设备/工具建议
  • 视觉美学(配色、光线、场景、摄影风格、主体、禁用内容、与现有动态的关系、排版:fontDisplay + fontBody)
  • 语气(示例语气/禁用语气/形容词/禁用词汇)
  • 个人简介变体
  • 文案模板
  • 开场钩子
  • 私信+评论语气
  • 注意事项
  • 参考创作者(维度 + 借鉴点)
  • 关键下一步行动——与PDF最后一页相同的4–6张卡片;本部分是规范来源,PDF页面引用本部分内容
  • 如何与其他技能配合使用——说明其他Pika技能(
    ugc-ads
    podcast
    founder-product-video
    app-sizzle
    app-store-screens
    )如何读取该工具包。例如“ugc-ads: 从voice-bank.md的钩子文案中提取语气;从mood-board/curated/的精选照片中提取形象参考;视觉美学部分的形容词指导场景/光线;使用锁定的字体组合制作屏幕文字。”
README.md
——1页指南:
  • 工具包包含内容
  • 如何将
    persona.md
    粘贴到其他Pika技能提示词中
  • 哪个文件供哪个技能使用(表格)
  • 注意:工具包不包含内容日历;它是身份定位+形象升级路线图,而非发布计划。
保存位置:
./tmp/[账号名]-influencer-kit/
。根据全局规则,所有生成的交付成果保存至
./tmp/
,而非
~/Desktop
。告知用户路径,方便其访问。
最终交付消息:
  • 工具包文件夹路径
  • persona.md
    快速参考块的inline预览
  • 一行后续使用说明:“将
    persona.md
    粘贴到任何Pika技能提示词顶部,即可获得符合品牌调性的输出。”

Key Principles

核心原则

  • The person is the brief. Read what's in front of you (socials, photos, the way they talk in chat) before asking anything. Don't ask things you can already answer from inputs.
  • One specific person, not a niche label. Don't write "lifestyle creator" — describe the actual human, with their actual comfort show and their actual high-school self.
  • Authentic > polished. When the user provided real photos, real photos always beat AI-generated. gpt-image-2 only fills atmosphere-coverage gaps in the mood board; it never regenerates the user's real subjects.
  • Anti-generic at every output. Visual aesthetic test, caption test, do/don't test: could this belong to anyone else? If yes, rewrite.
  • Authenticity dial is named explicitly. persona.md must say what's amplified vs. true-self. Hiding the dial makes the identity feel dishonest.
  • Render-and-READ every text artifact. Open the file. Read it as a stranger would. If the caption could appear on a random influencer page, rewrite.
  • Approval gates are hard stops. Three gates (identity → mood board → voice) come before ship. Creative direction ≠ approval.
  • Output to
    ./tmp/
    , never
    ~/Desktop
    (per global rule).
  • Always
    provider="gpt-image-2"
    for any image generation (per global rule).
  • 用户是核心。先分析用户提供的内容(社交账号、照片、聊天语气),再提问。不要询问已能从输入内容中获取答案的问题。
  • 聚焦具体个人,而非细分领域标签。不要写“生活方式博主”——描述真实的人,包括其真实的舒适剧集和高中时期的细节。
  • 真实 > 完美。当用户提供真实照片时,真实照片始终优于AI生成素材。GPT-image-2仅用于填补情绪板中的氛围缺口——绝不重新生成用户的真实主体。
  • 所有输出避免通用化。视觉美学测试、文案测试、注意事项测试:是否可被其他人使用?若是则改写。
  • 明确说明真实性平衡。persona.md必须说明哪些部分被放大、哪些是真实自我。隐藏平衡会让身份显得不诚实。
  • 渲染并阅读所有文本成果。打开文件,以陌生人的视角阅读。若文案可出现在任意网红页面,则改写。
  • 批准环节是硬性停止点。三个环节(身份→情绪板→语气→打包)。创意方向≠批准。
  • 输出至
    ./tmp/
    ,而非
    ~/Desktop
    (遵循全局规则)。
  • 始终使用
    provider="gpt-image-2"
    进行图片生成(遵循全局规则)。

Quality Standards — Non-Negotiable

质量标准——不可协商

The Anti-Generic Test

反通用测试

Before delivering anything: could this identity be repasted onto a different creator? If yes, it's not done.
Strong creator brand = specific human + specific point of view + specific visual aesthetic Step 3 can actually render. Weak creator brand = vibes + adjectives + "authentic + aspirational + community-driven."
交付任何内容前:该身份是否可直接复制到另一位创作者身上?若是,则未完成。
优秀的创作者品牌 = 具体的人 + 明确的观点 + 步骤3可实际渲染的具体视觉美学。 糟糕的创作者品牌 = 氛围感 + 形容词 + “真实+有抱负+社群驱动”。

Copy Standards

文案标准

What good creator copy sounds like:
  • It has an actual opinion: "audiobooks count as reading. fight me."
  • It names a real moment: "the third time I cried at the desk job"
  • It trusts the reader: no over-explaining, no "as someone who has always..."
  • It contradicts itself sometimes (real people do)
What bad creator copy sounds like (never write this):
  • "Living my truth ✨"
  • "On a journey to..."
  • "Sharing my passion for..."
  • "As a [identity], I believe..."
  • Anything that could be a Canva quote-card template
优秀的创作者文案:
  • 有明确观点:“有声书也算阅读。来辩。”
  • 提及真实瞬间:“我在办公桌前第三次哭的时候”
  • 信任读者:不过度解释,不使用“作为一直…的人”
  • 有时会自相矛盾(真实的人会这样)
糟糕的创作者文案(绝不要写):
  • “活出真实的自己 ✨”
  • “踏上…的旅程”
  • “分享我对…的热情”
  • “作为[身份],我相信…”
  • 任何可作为Canva引语模板的内容

Aesthetic Standards

美学标准

Mood boards must feel like THIS person. A board labeled "clean girl beige" or "dark academia" without the user explicitly naming that aesthetic is a failure. The board should be hard to label with a one-word TikTok aesthetic — it should feel like a specific human's visual world.
情绪板必须体现该用户的独特性。若用户未明确指定,情绪板被标注为“clean girl米色”或“暗黑学术风”是失败的。情绪板应难以用一个TikTok美学词标签——它应体现具体个人的视觉世界。

Voice Standards

语气标准

  • Every sample caption must pass a read-aloud test. Read it. If it doesn't sound like a person, rewrite.
  • Forbidden words from the identity must never appear in any sample.
  • Vulnerability is fine; performed vulnerability is not. If the vulnerable-mode caption reads as a Notes-app screenshot draft for engagement — rewrite.
🛑 HARD RULE: Never write in AI-creator-voice. This is the most common failure mode. AI voice mimics "casual creator wit" by reaching for the same construction patterns every time. Banned constructions in any sample copy you write for the user:
  • "apparently i [did thing] / moved / became [thing] now" — fake self-discovery
  • em-dash parentheticals for wit — "i moved to boston — apparently — to wear coats"
  • "[thing] + [adjacent thing]" lists with plus signs as connectors — "one drink + a fake errand"
  • forced quirky-specific details that are actually generic — "fake errand", "fake date", "the third time i [thing]", "the [universe/algorithm] knew"
  • "I'm just a girl who..." / "I'm not [extreme] but [normal]" structures
  • "real ones know" / "we don't talk about [thing]" / "iykyk" / "no thoughts just [thing]"
  • lower-stakes self-deprecation that's actually self-promotion — "apparently I'm a wool-coat person now"
  • listicle-style minimal poetry — "two drinks. one walk. no thoughts."
  • observational tone with a wink at the end that wasn't earned
The test: read the sample caption aloud. If you imagine ANY creator could post it without changing a word and it would still fit them — it's AI voice, not THIS user's voice. Rewrite.
The fix: the user's scraped captions ARE their voice. Pull phrases, sentence rhythm, and energy DIRECTLY from the posts you read in Step 1. If scrape failed or the account was caption-light, fall back to how they answered the 4 canonical questions and how they talked in chat. The "sounds like" section of persona.md should use direct quotes from the user wherever possible — first from their captions, then from their chat — not invented "creator-style" approximations. If their actual words are flat-declarative, the brand voice is flat-declarative; don't add "wit" they didn't ask for. If they explicitly ask for "more fun / less dry," add wit through more specific real-life observations, never through the banned constructions above.
  • 每个示例文案必须通过朗读测试。朗读文案。若听起来不像真人,则改写。
  • 身份中禁用的词汇不得出现在任何示例中。
  • 脆弱感是可以的;刻意的脆弱感不行。若脆弱感模板的文案看起来像是为了互动而写的笔记草稿——改写。
🛑 硬性规则:绝不要使用AI创作者语气。这是最常见的失败模式。AI语气通过重复相同的结构模式模仿“休闲创作者幽默”。用户示例文案中禁用以下结构:
  • "apparently i [did thing] / moved / became [thing] now" — 虚假的自我发现
  • 用破折号括号体现幽默 — "i moved to boston — apparently — to wear coats"
  • 用加号连接"[thing] + [adjacent thing]"列表 — "one drink + a fake errand"
  • 刻意添加的古怪细节实则通用 — "fake errand"、"fake date"、"the third time i [thing]"、"the [universe/algorithm] knew"
  • "I'm just a girl who..." / "I'm not [extreme] but [normal]" 结构
  • "real ones know" / "we don't talk about [thing]" / "iykyk" / "no thoughts just [thing]"
  • 低风险自我贬低实则自我推销 — "apparently I'm a wool-coat person now"
  • 清单式极简诗歌 — "two drinks. one walk. no thoughts."
  • 无依据的观察语气+结尾 wink
测试方法: 朗读示例文案。若你认为任何创作者都能发布该文案且无需修改——这是AI语气,而非该用户的语气。改写。
解决方法: 用户爬取的文案就是其真实语气。直接从步骤1分析的动态中提取短语、句子节奏和活力。若爬取失败或账号文案较少,则 fallback到用户回答4个固定问题的语气及聊天语气。persona.md的“示例语气”部分应尽可能使用用户的直接引用——优先从文案中提取,其次从聊天中提取,而非虚构“创作者风格”的近似表达。若用户的实际语气是平铺直叙的,则品牌语气也是平铺直叙的;不要添加用户未要求的“幽默”。若用户明确要求“更有趣/不枯燥”,则通过更具体的真实生活观察添加幽默,而非使用上述禁用结构。
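A rough first-pass filter for the banned constructions can run before the read-aloud test. The sketch below is a hypothetical helper, not part of the skill: the pattern names, the regexes, and `find_banned` are all assumptions, and they cover only some of the constructions listed above. A clean result does not mean the caption passes; the read-aloud test still decides.

```python
import re

# Hypothetical, partial encoding of the ban list above. These regexes are
# illustrative assumptions and will miss (or over-flag) some real captions.
BANNED_PATTERNS = {
    "fake self-discovery": re.compile(r"\bapparently\b", re.IGNORECASE),
    "plus-sign connector": re.compile(r"\w\s\+\s\w"),
    # \u2014 is the em-dash; this flags a phrase wrapped in a pair of them.
    "em-dash parenthetical": re.compile(r"\u2014[^\u2014]+\u2014"),
    "just-a-girl structure": re.compile(r"\bi'?m just a girl who\b", re.IGNORECASE),
    "in-group shorthand": re.compile(
        r"\b(real ones know|iykyk|no thoughts just)\b", re.IGNORECASE
    ),
    # Three or more fragments of one to three words, each ending in a period.
    "listicle minimal poetry": re.compile(r"^(?:\s*\w+(?:\s\w+){0,2}\.){3,}\s*$"),
}


def find_banned(caption: str) -> list[str]:
    """Return the names of every banned pattern found in the caption."""
    return [name for name, pattern in BANNED_PATTERNS.items() if pattern.search(caption)]


if __name__ == "__main__":
    for sample in (
        "audiobooks count as reading. fight me.",
        "one drink + a fake errand",
        "two drinks. one walk. no thoughts.",
    ):
        print(sample, "->", find_banned(sample))
```

Pattern checks of this kind only catch surface constructions. A caption built from the user's own scraped phrases remains the actual fix.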

Engine choice: gpt-image-2

引擎选择: gpt-image-2

Default to
gpt-image-2
at
quality: "medium"
for all generations:
  • Best instruction-following for cast-diversity prompts.
  • Strongest no-text guardrail adherence (critical for mood board tiles).
  • Native 9:16 portrait for the default 6×2 mood-board tile layout; 4:3 landscape if using the 4×3 alternate.
Avoid
nano-banana-pro
(bakes magazine-style text into shots). Use
seedream
only if the user explicitly asks for 2K/4K print-tier shots.
默认使用
gpt-image-2
quality: "medium"
进行所有生成:
  • 在多样性提示词方面的指令遵循最佳。
  • 最严格遵守无文字约束(对情绪板素材至关重要)。
  • 默认9:16竖版,适配默认6×2情绪板素材布局;若使用4×3变体则为4:3横版。
避免使用
nano-banana-pro
(会在素材中添加杂志风格文字)。仅当用户明确要求2K/4K印刷级素材时才使用
seedream
。
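The pairing of aspect ratio and layout follows from dividing the board canvas evenly. The sketch below assumes the 1920×1080 board size used elsewhere in this file and a grid with no gutters; `tile_size` is a hypothetical helper, not a function from the skill.

```python
from fractions import Fraction


def tile_size(cols: int, rows: int, canvas_w: int = 1920, canvas_h: int = 1080) -> tuple[int, int]:
    """Tile width and height when a cols x rows grid fills the canvas with no gutters."""
    if canvas_w % cols or canvas_h % rows:
        raise ValueError("grid does not divide the canvas into whole pixels")
    return canvas_w // cols, canvas_h // rows


if __name__ == "__main__":
    for cols, rows in ((6, 2), (4, 3)):
        w, h = tile_size(cols, rows)
        print(f"{cols}x{rows}: {w}x{h} px, aspect {Fraction(w, h)}")
```

Under these assumptions the 4×3 cell is exactly 4:3. The 6×2 cell (16:27) is slightly wider than 9:16, so a 9:16 generation scaled to cover the cell would lose a thin strip at the top and bottom.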

Runtime expectations

运行时间预期

Tell the user the rough total up front.
| Stage | Time | Notes |
| --- | --- | --- |
| Stage 0 → Step 1 (intake + Q&A + social scrape) | 5–15 min | User-paced; one batch of questions; scrape runs in parallel |
| Step 2 (identity summary) | 2–4 min | All text; user reads + approves |
| Step 3 (mood board: font pick + curate + GPT fill + composite) | 6–10 min | gpt-image-2 generations + HTML/CSS composite rendered through `mcp__plugin_pika_pika__html_to_png` |
| Step 4 (Influencer Persona PDF: voice draft + 10 new generated tiles + 10 curated tiles + HTML layout + MCP JPG conversion + html_to_pdf render) | 12–20 min | Heaviest step. ~10 gpt-image-2 generations in parallel, curated image selection, MCP image renders, HTML + PDF render, JPG conversion pass to stay under 50 MB cap. |
| Step 5 (package kit) | 2–4 min | File assembly + README |

Total: ~30–60 min wall-clock excluding user response time.
提前告知用户大致总耗时。
| 阶段 | 时间 | 说明 |
| --- | --- | --- |
| 阶段0 → 步骤1(信息收集+问答+社交爬取) | 5–15分钟 | 用户主导节奏;一批问题;爬取并行执行 |
| 步骤2(身份总结) | 2–4分钟 | 全文本;用户阅读+批准 |
| 步骤3(情绪板:字体选择+精选+GPT填充+合成) | 6–10分钟 | gpt-image-2生成 + 通过 `mcp__plugin_pika_pika__html_to_png` 渲染HTML/CSS合成 |
| 步骤4(网红Persona PDF:语气撰写+10个新生成素材+10个精选素材+HTML布局+MCP JPG转换+html_to_pdf渲染) | 12–20分钟 | 最耗时的步骤。约10个gpt-image-2并行生成、精选素材选择、MCP图片渲染、HTML+PDF渲染、JPG转换以控制在50 MB以内。 |
| 步骤5(打包工具包) | 2–4分钟 | 文件组装 + README撰写 |

总耗时:约30–60分钟(不含用户回复时间)。
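The stated total can be sanity-checked by summing the per-stage ranges from the runtime table. The dictionary and `total_range` below are illustrative names, not part of the skill; the minute values are copied from the table.

```python
# Per-stage minute ranges copied from the runtime table above.
STAGE_MINUTES = {
    "Stage 0 -> Step 1": (5, 15),
    "Step 2": (2, 4),
    "Step 3": (6, 10),
    "Step 4": (12, 20),
    "Step 5": (2, 4),
}


def total_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every stage range."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(STAGE_MINUTES)
    print(f"{low}-{high} min")
```

The sum comes to 27 to 53 minutes, so the ~30–60 min figure reads as a rounded estimate with a little headroom.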

Load-bearing phrases

核心表述

Verbatim anchors that go into gpt-image-2 prompts (or procedural rules that hold the recipe together). Do not paraphrase or strip these when simplifying nearby prose — they're empirical behavior dependencies, not writing style. Every entry here is referenced inline elsewhere in this file via
(load-bearing — …; see Load-bearing phrases section)
.
需逐字放入gpt-image-2提示词的锚点(或维持流程稳定的规则)。简化附近文本时不得改写或删除这些表述——它们是经验证的行为依赖,而非写作风格。本部分的每个条目都在文件其他部分通过
(核心表述——…;详见核心表述部分)
引用。

Composition anchors (the AI-tell fix)

构图锚点(AI痕迹修复)

  • "NOT a styled flat lay / product hero shot / stock food photo"
    — append to every Promo-mode tile prompt and any prompt featuring objects on a surface. Without it, gpt-image-2 reverts to centered product composition that later CSS treatment cannot rescue. Referenced in Step 4d.5 prompt formula.
  • "generic bathroom"
    /
    "generic cluttered vanity"
    /
    "slightly cluttered nightstand"
    — environmental context anchors that move the model from studio-set composition to lived-in-context composition without inventing the user's real home or body. Pair with the negative anchor above. Referenced in Step 4d.5 worked examples.
  • "NOT a styled flat lay / product hero shot / stock food photo"
    — 追加到每个推广模板素材提示词及任何包含物体的提示词中。没有它,gpt-image-2会回到居中产品构图,后续CSS处理无法修复。详见步骤4d.5提示词公式。
  • "generic bathroom"
    /
    "generic cluttered vanity"
    /
    "slightly cluttered nightstand"
    — 环境上下文锚点,引导模型从工作室构图转向真实生活场景构图,同时不虚构用户的真实住所或身体。与上述负面锚点配合使用。详见步骤4d.5示例。

Identity-preservation anchors (selfie / face content)

身份保护锚点(自拍/人脸内容)

  • "faceless"
    ,
    "hand obscuring face"
    ,
    "phone obscuring face"
    ,
    "back-of-head 3/4"
    — load-bearing for ALL selfie-mode generations. Without one of these, gpt-image-2 invents the user's face, breaking the "this is actually her" anchor (hard rule in Step 3). Referenced in Step 4b voice imagery + Step 3 hard rule. If gpt-image-2 rejects the original wording on content-policy grounds, retry once after dropping the word "selfie" and rephrasing as a hand-held phone body-framing shot with the face hidden by a hand-held phone; do not repeat the rejected prompt.
  • "faceless"
    ,
    "hand obscuring face"
    ,
    "phone obscuring face"
    ,
    "back-of-head 3/4"
    — 所有自拍风格生成的核心表述。没有其中一个,gpt-image-2会虚构用户的脸,破坏“这就是真实的她”的锚点(步骤3硬性规则)。详见步骤4b语气配图 + 步骤3硬性规则。若gpt-image-2因内容政策拒绝原始表述,重新尝试一次,删除“selfie”一词,改写为手持手机拍摄身体的画面或手遮挡脸的画面;不得重复使用被拒绝的提示词。

Image-content guardrails

图片内容约束

  • "NO text anywhere in image"
    — every generated tile, no exceptions. Without it, gpt-image-2 bakes faux brand labels onto bottles, scarves, books, journal pages. Referenced in Step 3 mandatory check #3 + Step 4d.5 worked examples.
  • "social-content atmosphere only"
    — mood-board generation framing. Pushes gpt-image-2 away from vision-board / wallpaper / art-print energy toward creator-feed energy. Referenced in Step 3 source priority #6.
  • "NO text anywhere in image"
    — 所有生成素材,无一例外。没有它,gpt-image-2会在瓶子、围巾、书籍、日记本上添加虚假品牌标签。详见步骤3强制检查#3 + 步骤4d.5示例。
  • "social-content atmosphere only"
    — 情绪板生成的定位表述。引导gpt-image-2远离愿景板/壁纸/艺术印刷品风格,转向创作者动态风格。详见步骤3素材优先级#6。

Procedural anchors

流程锚点

  • Name a specific ethnicity per face-bearing prompt — gpt-image-2 defaults to lighter skin tones otherwise. Vary across the set. Referenced in Step 3 mandatory check #6.
  • Render generated PNGs through
    mcp__plugin_pika_pika__html_to_png
    before PDF embedding
    — the MCP JPG conversion keeps the final PDF under the 50 MB cap and lets CSS filters/overlays match the generated tile to nearby real-photo register. This is size + presentation hygiene; it cannot rescue bad generated composition.
  • 每个带有人物的提示词指定具体种族——否则gpt-image-2默认生成浅肤色人物。在素材集中变化种族。详见步骤3强制检查#6。
  • PDF嵌入前通过
    mcp__plugin_pika_pika__html_to_png
    渲染生成PNG
    ——MCP JPG转换可将最终PDF控制在50 MB以内,并通过CSS滤镜/叠加层匹配生成素材与附近真实照片风格。这是尺寸+呈现规范;无法修复糟糕的生成构图。

Voice anchors (caption copy)

语气锚点(文案)

  • AI-creator-voice ban list — "apparently i [did thing] now", em-dash parentheticals for wit, "[x] + [y]" lists with plus connectors, "fake errand", "I'm just a girl who…", "real ones know", "iykyk", "no thoughts just [thing]", listicle minimal poetry. Every sample caption is checked against this. The list lives in full under Quality Standards § Voice Standards § HARD RULE: Never write in AI-creator-voice. Treat that list as load-bearing — adding a new caption-mode example without re-checking it against this list is the #1 way the voice bank silently degrades.
  • AI创作者语气禁用列表——"apparently i [did thing] now"、用破折号括号体现幽默、用加号连接"[x] + [y]"列表、"fake errand"、"I'm just a girl who…"、"real ones know"、"iykyk"、"no thoughts just [thing]"、清单式极简诗歌。每个示例文案都需检查该列表。完整列表位于质量标准§语气标准§硬性规则:绝不要使用AI创作者语气。该列表是核心——添加新文案模板示例时若未检查该列表,风格库会悄然退化。

Maintenance

维护

When editing the skill: if you touch a section that references one of these phrases inline, leave the phrase exactly as quoted. If you want to change one, change it in this section first, propagate the change to all inline references, and document why in the commit message.
编辑技能时:若修改引用本部分表述的章节,需保持表述完全一致。若需修改表述,先修改本部分,再同步到所有引用位置,并在提交信息中说明原因。
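One way to guard against a load-bearing phrase being paraphrased away is a verbatim check on each prompt before generation. The sketch below is a hypothetical lint, not part of the skill: `missing_anchors` and its flags are assumed names, and only the quoted phrases are taken from the sections above.

```python
# Verbatim anchors quoted in the Load-bearing phrases section.
NO_TEXT = "NO text anywhere in image"
FLAT_LAY_NEGATIVE = "NOT a styled flat lay / product hero shot / stock food photo"
FACE_HIDING = ("faceless", "hand obscuring face", "phone obscuring face", "back-of-head 3/4")


def missing_anchors(prompt: str, *, selfie_mode: bool = False,
                    objects_on_surface: bool = False) -> list[str]:
    """List the anchors a prompt lacks; matching is exact and case-sensitive."""
    missing = []
    if NO_TEXT not in prompt:
        missing.append(NO_TEXT)
    if objects_on_surface and FLAT_LAY_NEGATIVE not in prompt:
        missing.append(FLAT_LAY_NEGATIVE)
    if selfie_mode and not any(anchor in prompt for anchor in FACE_HIDING):
        missing.append("one of: " + ", ".join(FACE_HIDING))
    return missing


if __name__ == "__main__":
    prompt = (
        "hand holding a serum bottle over a generic cluttered vanity. "
        + FLAT_LAY_NEGATIVE + ". " + NO_TEXT
    )
    print(missing_anchors(prompt, objects_on_surface=True))
```

Exact matching is deliberate here, since the section treats these strings as verbatim dependencies. The policy-safe retry wording for rejected selfie prompts may legitimately lack the face-hiding anchors, so a real check would need an exception for that case.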

Failure modes

失败模式

| Symptom | Cause | Fix |
| --- | --- | --- |
| Identity reads as generic / could belong to anyone | Skipped the specificity push in Step 2; defaulted to adjective soup | Rewrite with a real opinion, a real moment, the user's actual comfort show. Re-run anti-generic test. |
| Mood board feels like a one-word aesthetic preset | GPT-image-2 prompts referenced an aesthetic label ("clean girl", "dark academia") | Strip aesthetic labels from prompts. Prompt for the specific visual qualities (warm window light, papers on desk, half-drunk drink, etc.), not the meta-label. |
| Mood board shipped fully-generated, no curated tiles | User-supplied images didn't land on disk (chat-pasted screenshots arrive without a file path); skill compromised and substituted GPT-rendered stand-ins instead of unblocking the file path | Unblock the file path first. Ask the user to save the file to `./tmp/[handle]/grid.png` (or similar) or drag-drop the actual file from Finder, not a screenshot paste. Do NOT substitute generated stand-ins for the user's real world. |
| Generated a fake version of the user's real subject (their dog, their face, their car) | Treated the user's real subjects as "fill these with GPT" gaps | Hard rule: NEVER generate the user's face, pet, partner, friends, family, or recurring specific objects. Those come from their photos only. GPT renders atmosphere: rooms, light, weather, settings, types of objects. The user will tell you "that's not my dog", and they're right. |
| Mood board reads as stocky / iStockphoto density | Too many tiles (24+), uniform polish, uniform color register, studio-clean | Exactly 12 tiles at 1920×1080. Varied palette within the identity. Real-feeling snapshots > campaign-clean. The curated user tiles set the "real" register; generated tiles match that energy, never slick up. |
| Mood board palette collapses to one color (e.g. "just brown") | Every generated prompt named the same dominant color register | Cohesive palette is fine; mono-register is dead. Even a warm-cognac identity can include cream, sage, dusty rose, deep navy where those co-exist in the user's world. Vary lighting and setting, not just subjects. |
| Sample captions sound like creator-template filler | Wrote in "creator voice" instead of THIS creator's voice | Re-read the user's actual scraped captions and the personality Q&A. Pull phrases / energy / sentence rhythm from those. Rewrite from there. |
| `mcp__plugin_pika_pika__scrape_social` returns "not found" | Handle typo or wrong platform (gave you a TikTok handle thinking it was IG) | Confirm the handle with the user, retry with the correction. |
| Taste URL sent to `mcp__plugin_pika_pika__scrape_social` and fails | Spotify / Depop / Letterboxd / personal sites are taste signals, not supported `mcp__plugin_pika_pika__scrape_social` platforms | Do not call `mcp__plugin_pika_pika__scrape_social` for unsupported taste URLs. Record what the URL suggests; if unclear, ask what dimension to borrow from it. |
| `mcp__plugin_pika_pika__scrape_social` returns "private account" | Account is private | Ask the user to set it public temporarily OR paste 5–10 recent captions in chat. If they cannot, mark the voice bank as inferred from chat answers and keep caption examples conservative. |
| `mcp__plugin_pika_pika__scrape_social` throttled | Platform rate-limited the worker | Wait 60s and retry once. If still throttled, fall back to manual caption paste. |
| `mcp__plugin_pika_pika__scrape_social` returns posts but no caption text | Photos-only / reels-only account with no caption bodies | Use bio + chat-voice as voice signal. Note in `persona.md` that captions weren't available so other skills know the voice was inferred. |
| `URL signature mismatch` when trying to use a reel cover | Used or modified the ephemeral IG `visual_media_url` / signed cover URL instead of a durable video asset; signed cover query params must stay untouched and are not reliable curated-tile sources | For handle-only Reels accounts, ignore the signed cover for curated tiles. Use the durable rehosted `.mp4` `media_url` (for example `cdn.pika.art`) and call `mcp__plugin_pika_pika__extract_frame(video_url: <durable_rehosted_mp4>, time_s: 0)` to create the tile. |
| Reel-derived curated tile has baked-in caption/title overlays | Treated an extracted reel frame as already approved, skipped the sourcing-time `mcp__plugin_pika_pika__analyze_media` OCR/no-text check, or sampled a mid-clip reel frame after the creator's text appeared | Extract frame 0 / `time_s: 0` from the durable rehosted `.mp4`, then call `mcp__plugin_pika_pika__analyze_media` on that candidate frame before placement. If frame 0 contains visible text, a title card, or a burned-in caption, drop the reel or try a later keyframe and run the same `mcp__plugin_pika_pika__analyze_media` no-text check before using it. Mood-board and voice-page tiles must have no text on tiles; final PDF QA is only a backstop. |
| User says "make it more X" without approving | Treated creative direction as approval | Incorporate the direction, regenerate the relevant artifact, ask for approval again. Don't skip the gate. |
| Mood board has visible misaligned / overlapping tiles | HTML/CSS grid dimensions or `object-position` values were off | Re-check tile dimensions and grid spacing. Re-render with `mcp__plugin_pika_pika__html_to_png`. Don't ship a misaligned board. |
| Generated cast defaults to white / light-skinned | gpt-image-2 default behavior when ethnicity isn't named in the prompt | Name a specific ethnicity per face-bearing prompt, varied across the set. See `references/aesthetic-prompts.md` for the pattern. |
| Authenticity dial isn't called out in persona.md | Treated "real / amplified / different" as a private decision instead of public spec | persona.md must say it out loud. Other skills downstream read this file; they need to know what's amplified. |
| Skill skipped the strategic conversation and went straight to identity lock | Treated Step 2 as just "draft identity.md" instead of the four-part real-talk step | Step 2 has four parts (2a strategic direction, 2b gap analysis, 2c content critique + gear, 2d identity lock). Skipping 2a–2c is the most common failure mode: the user came for the real talk, not for packaging. |
| Influencer Persona PDF labeled "Media Kit" anywhere | Defaulted to the older naming when the user said remove it | The cover eyebrow reads "Influencer Persona · [year]", the PDF file is `influencer-persona.pdf`, the HTML is `influencer-persona.html`. "Media Kit" appears nowhere in the kit. If found, rename. |
| Voice page 2×2 grid bleeds off the right or bottom edge of the page | Used `position: absolute` with hand-tuned offsets, or asymmetric padding (e.g. `100px 120px`) | Equal 80px padding on all four sides; flex/grid layout with explicit 800×920px grid container and `width:100% / height:100%` on each tile. See Step 4 page-sequence load-bearing layout rule. |
| Key Next Steps page has 300+ px of dead space below the cards | Grid rows auto-sized to content; flex container had spare height that didn't propagate to rows | Set `.next-grid { height: 720px; grid-template-rows: 1fr 1fr 1fr; }` so the 3 rows distribute evenly across a definite height. See Step 4 page-sequence load-bearing rule for the final page. |
| Kit shipped a contact card / "let's work together" page | Inherited the old refs-and-contact final page | The final page is Key Next Steps, not a contact card. The user defines their persona, not their brand-deal contact form, in this skill. |
| Content categories card photos crop the subject's chin off or center the wrong thing | Used uniform `background-position: center` for all 4 cards | Each card declares its own per-card `background-position` (e.g. `50% 20%` for upper-body, `50% 35%` for top-down kitchen, `50% 50%` for centered subjects). Image container is 300×400, not 240×320. |
| PDF has giant white bars at the top and bottom of every page | Forgot to pass `pdf_options.paper_size` to `mcp__plugin_pika_pika__html_to_pdf`. The @page CSS rule is not enough; the tool defaults to US Letter (792×612, 1.29 aspect) and the 1920×1080 HTML letterboxes into it | Always pass `pdf_options: { paper_size: {width: 1920, height: 1080, unit: "px"}, margins: {top:0, right:0, bottom:0, left:0, unit:"px"} }`. After render, call `mcp__plugin_pika_pika__analyze_media` on the PDF as `application/pdf` with `all_pages: true` and ask it to flag white-bar letterboxing or any non-16:9 page. |
| `mcp__plugin_pika_pika__analyze_media(all_pages: true)` times out on an image-dense final PDF | Multi-page PDF rasterization budget exhausted before Gemini sees every page | Use the Step 4e fallback only with portable evidence: `mcp__plugin_pika_pika__html_to_pdf` / `mcp__plugin_pika_pika__task_status` result `page_count: 12` (and `pages[]` length 12 when present), original `body_pages.length === 12`, and explicit 1920×1080 `pdf_options.paper_size`. Then run `mcp__plugin_pika_pika__analyze_media` with `page: <n>` for page 1 through page 12. If `page_count` is missing, rerender smaller or stop with `pdf_structural_metadata_unavailable`; do not use local PDF tools/libraries. |
| Voice-page AI tiles still look stocky in the rendered PDF | The prompt composition is stocky OR the final `body_pages` still reference raw generated PNGs instead of MCP-converted JPG URLs | Re-generate with lived-in-context prompts. Then update `asset_url_map`, replace the raw generated URLs in `body_pages`, and grep the source notes for stale raw generated URLs before rerender. |
| Final kit ships without an `images/` folder | Treated the existing `mood-board/` + `build/` subfolders as sufficient | At Step 5 packaging, create `./images/` and copy every image actually used in the PDF (curated tiles, MCP-converted generated tiles, mood-board variants). The user expects a single browse-able folder of "every photo in my kit"; raw asset folders aren't a substitute. |
| Kit shipped before user explicitly approved | Misread "I love this" or "make it more X" as ship-it | Approval is explicit only. Re-ask if ambiguous. The gate exists because the kit locks every decision. |
| `mcp__plugin_pika_pika__html_to_pdf` returns "File too large" (>50 MB) | Embedded gpt-image-2 PNG outputs at full resolution. 10–20 PNGs at 1–3 MB each blow past the cap. | Mandatory: convert large PNG image inputs to JPG q85 through `mcp__plugin_pika_pika__html_to_png` before the final render. See Step 4d for the workflow. Result should land far below 50 MB for a 12-page kit. |
| Texture-swatch / color-swatch palette page reads as "all cloth" or "all boring" | Tried to make the Visual Aesthetic page interesting with material textures (cashmere, peony, silk, leather, velvet, eucalyptus) but textures end up reading as the same fabric register; they don't tell a brand partner what the creator actually MAKES. | Replace the Visual Aesthetic page with a Content Categories page (page 4). 4 cards in 2×2 grid, each = image + category name + 1-line description + 3–4 example post types. Categories derived from the user's actual feed. This is what brand partners want to see. |
| Voice-page imagery doesn't match mode (e.g. atmospheric scenery on a Selfie page) | Treated voice pages as "any aesthetic image" instead of mode-demonstrative imagery | When the mode is "Casual Selfie," show actual selfies (mirror, 3/4 back-of-head, phone-obscured face). When "Vulnerable," show quiet-moment imagery (unmade bed, candle, journal). Match the mode intent; that's the whole point of demonstrating voice-in-context. |
| Voice pages reuse mood-board tiles | Defaulted to mood-board imagery to save generation cost | Voice pages need FRESH imagery: different curated posts from scrape (use other `image_versions2.candidates[0].url` indices) + freshly-generated tiles. The mood board is the visual world; voice pages prove the voice in NEW contexts. |
| Mood board on PDF page 3 reads with empty cream around it | Used `mood-board.png` (with title band) and `background-size: contain` | Pre-render a `mood-board-no-header.png` variant (1920×1080, just the tile grid, no title) and embed full-bleed with `background-size: cover`. Keep the title-banded version as the standalone disk artifact. |
| Thin cream slivers visible on the left/right edges (or between rows) of the full-bleed mood board page | No-header HTML kept the standard outer padding/gutters | The no-header variant must use zero outer padding, zero gutters, and tile dimensions that exactly fill 1920×1080. The standalone disk mood-board.png keeps its gutters/header. |
| AI tiles still feel AI after CSS treatment | The AI tell is the composition, not just the photography style, and presentation treatment can't redo composition. Common: skincare flat lay on marble with peonies, perfume on silk scarf, perfectly arranged cinnamon dough. | Re-generate with prompts that put the item in real lived-in context: handheld + actual messy environment + faceless person partially in frame + "NOT a styled flat lay / product hero shot." See Step 4d.5 for the prompt formula and Promo-mode examples. |
| gpt-image-2 rejects a generated selfie-register prompt on content policy | The prompt used risky terms such as "mirror selfie", "selfie", or "phone obscuring face" | Retry once with policy-safe wording: drop the word "selfie", describe a hand-held phone body-framing shot or face hidden by a hand-held phone, and keep no visible generated faces. Do not repeat the rejected prompt. |
| html_to_pdf rejects image asset with "MIME type 'text/html' is not in the allowlist" | A parallel subshell `curl PUT` after `mcp__plugin_pika_pika__upload_asset` dropped the `Content-Type: image/jpeg` header silently; R2 stored the JPG with `text/html` MIME | After every batch PUT, verify with `curl -s -I <url> \| grep content-type` and re-PUT any with wrong types. Or use sequential PUTs (but watch the 5-min URL TTL; see next row). |
| Cloudflare R2 returns 403 ExpiredRequest on PUT | Presigned URL TTL is 300s. A sequential `curl PUT` chain of 10+ files at 8–12s each exceeds the window for the last few URLs. | Run PUTs in parallel using `(curl ... &)` subshells + `wait`. Mint + PUT within one tight cycle. Re-mint if you hit 403; don't try to extend a stale URL. |
| 症状 | 原因 | 修复方法 |
| --- | --- | --- |
| 身份定位通用化 / 可复制到其他人 | 步骤2中未推进细化;默认使用形容词堆砌 | 加入真实观点、真实瞬间、用户的真实舒适剧集。重新执行反通用测试。 |
| 情绪板像单一美学预设 | gpt-image-2提示词引用了美学标签("clean girl"、"dark academia") | 从提示词中删除美学标签。提示具体视觉特征(暖窗边光、桌上文件、半杯饮品等),而非元标签。 |
| 情绪板全为生成素材,无精选素材 | 用户提供的图片未成功保存到磁盘(聊天粘贴的截图无文件路径);技能妥协使用GPT渲染替代素材,而非解决文件路径问题 | 先解决文件路径问题。请用户将文件保存到 `./tmp/[账号名]/grid.png`(或类似路径),或从Finder拖拽上传实际文件,而非粘贴截图。绝不使用生成素材替代用户的真实素材。 |
| 生成用户真实主体的仿制品(狗、脸、车) | 将用户真实主体视为“用GPT填补的缺口” | 硬性规则:绝不生成用户的脸、宠物、伴侣、朋友、家人或常用特定物品。这些仅来自用户的照片。GPT仅渲染氛围:房间、光线、天气、场景、物品类型。用户会说“那不是我的狗”,他们是对的。 |
| 情绪板像库存照片 / iStockphoto风格 | 素材过多(24+)、统一抛光感、统一配色、工作室完美质感 | 1920×1080尺寸恰好12个素材。身份内配色多样化。真实快照 > 广告级完美质感。用户的精选素材锚定“真实”风格;生成素材匹配该风格,而非过度美化。 |
| 情绪板配色单一(例如“全棕色”) | 每个生成提示词都指定相同的主导配色 | 配色统一是可以的;单一风格是呆板的。即使是温暖白兰地棕的身份,也可融入奶油色、鼠尾草绿、灰粉色、深藏青色,只要这些颜色自然存在于用户的生活中。变化光线和场景,而非仅主体。 |
| 示例文案像创作者模板填充内容 | 撰写“创作者语气”而非该用户的语气 | 重新阅读用户的实际爬取文案和个性问答。从中提取短语/活力/句子节奏。基于此改写。 |
| `mcp__plugin_pika_pika__scrape_social` 返回"not found" | 账号拼写错误或平台错误(将TikTok账号当作IG账号提供) | 与用户确认账号,修正后重试。 |
| 风格URL传入 `mcp__plugin_pika_pika__scrape_social` 失败 | Spotify / Depop / Letterboxd / 个人网站是风格信号,而非 `mcp__plugin_pika_pika__scrape_social` 支持的平台 | 不要对不支持的风格URL调用 `mcp__plugin_pika_pika__scrape_social`。记录URL体现的风格;若不明确,询问用户可借鉴的维度。 |
| `mcp__plugin_pika_pika__scrape_social` 返回"private account" | 账号私密 | 请用户暂时设置为公开,或在聊天中粘贴5–10条近期文案。若无法实现,则标记风格库为基于聊天推断,并保守撰写文案示例。 |
| `mcp__plugin_pika_pika__scrape_social` 被限流 | 平台对工作进程限流 | 等待60秒后重试一次。若仍被限流,则fallback到手动粘贴文案。 |
| `mcp__plugin_pika_pika__scrape_social` 返回动态但无文案 | 仅图片/Reel的账号,无文案内容 | 使用个人简介+聊天语气作为语气信号。在 `persona.md` 中注明文案不可用,以便其他技能知晓语气是推断的。 |
| 使用Reel封面时出现 `URL signature mismatch` | 使用或修改了临时IG `visual_media_url` /签名封面URL,而非持久化视频资产;签名封面查询参数必须保持不变,且不是可靠的精选素材来源 | 对于仅提供账号的Reel账号,忽略签名封面作为精选素材。使用持久化重托管 `.mp4` `media_url`(例如 `cdn.pika.art`),调用 `mcp__plugin_pika_pika__extract_frame(video_url: <durable_rehosted_mp4>, time_s: 0)` 创建素材。 |
| Reel衍生精选素材包含内嵌文案/标题叠加层 | 将提取的Reel帧视为已批准,跳过素材来源时的 `mcp__plugin_pika_pika__analyze_media` OCR/无文字检查,或采样了创作者添加文字后的Reel中间帧 | 从持久化重托管 `.mp4` 提取第0帧/ `time_s: 0`,然后在使用前调用 `mcp__plugin_pika_pika__analyze_media` 对候选帧进行检查。若第0帧包含可见文字、标题卡或内嵌文案,则放弃该Reel或尝试后续关键帧,并在使用前对后续关键帧执行相同的无文字检查。情绪板和语气页素材不得包含文字;最终PDF质检只是后备。 |
| 用户说“让它更X”但未批准 | 将创意方向视为批准 | 融入用户方向,重新生成相关成果,再次请求批准。不要跳过确认环节。 |
| 情绪板素材错位/重叠 | HTML/CSS网格尺寸或 `object-position` 值错误 | 重新检查素材尺寸和网格间距。通过 `mcp__plugin_pika_pika__html_to_png` 重新渲染。不要交付错位的情绪板。 |
| 生成人物默认白色/浅肤色 | gpt-image-2默认行为,提示词未指定种族 | 每个带有人物的提示词指定具体种族,在素材集中变化种族。详见 `references/aesthetic-prompts.md` 的模板。 |
| persona.md未明确说明真实性平衡 | 将“真实/放大/差异化”视为私人决策,而非公开规范 | persona.md必须明确说明。下游其他技能会读取该文件,它们需要知道哪些部分被放大。 |
| 技能跳过策略对话直接进入身份锁定 | 将步骤2仅视为“撰写identity.md”,而非包含四部分的核心建议环节 | 步骤2包含四部分(2a策略方向、2b差距分析、2c内容评价+设备、2d身份锁定)。跳过2a–2c是最常见的失败模式:用户前来是为了核心建议,而非包装。 |
| 网红Persona PDF任何地方标注为"Media Kit" | 默认使用旧命名,而用户要求移除 | 封面小字为“Influencer Persona · [年份]”,PDF文件名为 `influencer-persona.pdf`,HTML文件名为 `influencer-persona.html`。工具包中不得出现“Media Kit”。若发现,重命名。 |
| 语气页2×2网格溢出页面右侧或底部 | 使用 `position: absolute` 手动调整偏移,或不对称内边距(例如 `100px 120px`) | 四边设置相等的80px内边距;使用flex/grid布局,明确800×920px网格容器,每个素材设置 `width:100% / height:100%`。详见步骤4页面顺序核心布局规则。 |
| 关键下一步行动页面卡片下方有300+px空白 | 网格行自动适配内容;flex容器有多余高度未传递到行 | 设置 `.next-grid { height: 720px; grid-template-rows: 1fr 1fr 1fr; }`,使3行均匀分布在明确高度中。详见步骤4页面顺序最后一页的核心规则。 |
| 工具包包含联系卡片 / “合作联系”页面 | 继承旧版参考+联系最后一页 | 最后一页是关键下一步行动,而非联系卡片。用户在本技能中定义身份,而非品牌合作联系表单。 |
| 内容分类卡片照片裁剪掉人物下巴或居中错误内容 | 所有4张卡片统一使用 `background-position: center` | 每张卡片单独设置 `background-position`(例如 `50% 20%` 上半身、`50% 35%` 俯视厨房、`50% 50%` 居中主体)。图片容器为300×400,而非240×320。 |
| PDF每页顶部和底部有巨大空白 | 忘记向 `mcp__plugin_pika_pika__html_to_pdf` 传递 `pdf_options.paper_size`。仅 `@page` CSS规则不够;工具默认使用US Letter尺寸(792×612,1.29比例),1920×1080的HTML会被缩放并留出白边 | 始终传递 `pdf_options: { paper_size: {width: 1920, height: 1080, unit: "px"}, margins: {top:0, right:0, bottom:0, left:0, unit:"px"} }`。渲染后,调用 `mcp__plugin_pika_pika__analyze_media` 对PDF(`application/pdf`)执行 `all_pages: true` 检查,要求标记白边或非16:9页面。 |
| `mcp__plugin_pika_pika__analyze_media(all_pages: true)` 在图片密集的最终PDF上超时 | 多页PDF光栅化预算耗尽,Gemini未查看所有页面 | 仅当有可靠证据时使用步骤4e fallback:`mcp__plugin_pika_pika__html_to_pdf` / `mcp__plugin_pika_pika__task_status` 结果 `page_count: 12`(存在时 `pages[]` 长度为12)、原始 `body_pages.length === 12`、显式1920×1080 `pdf_options.paper_size`。然后对第1至12页分别调用 `mcp__plugin_pika_pika__analyze_media`,指定 `page: <n>`。若 `page_count` 缺失,使用更小素材重新渲染或终止并报告 `pdf_structural_metadata_unavailable`;不要使用本地PDF工具/库。 |
| 语气页AI素材在渲染后的PDF中仍显库存感 | 提示词构图是库存感,或最终 `body_pages` 仍引用原始生成PNG而非MCP转换的JPG URL | 使用真实生活场景提示词重新生成。然后更新 `asset_url_map`,替换 `body_pages` 中的原始生成URL,渲染前在源笔记中搜索过时的原始生成URL。 |
| 最终工具包无 `images/` 文件夹 | 将现有 `mood-board/` + `build/` 子文件夹视为足够 | 步骤5打包时,创建 `./images/` 并复制PDF中实际使用的所有图片(精选素材、MCP转换的生成素材、情绪板变体)。用户期望一个可浏览的“工具包中所有照片”文件夹;原始素材文件夹无法替代。 |
| 用户明确批准前交付工具包 | 将“I love this”或“make it more X”视为打包指令 | 仅接受明确批准。若模糊则重新询问。确认环节存在是因为工具包锁定所有决策。 |
| `mcp__plugin_pika_pika__html_to_pdf` 返回"File too large" (>50 MB) | 嵌入全分辨率gpt-image-2 PNG输出。10–20张PNG每张1–3 MB,总和超过限制 | 强制要求:最终渲染前,通过 `mcp__plugin_pika_pika__html_to_png` 将大尺寸PNG图片转换为JPG q85。详见步骤4d流程。12页工具包应远低于50 MB。 |
| 纹理/配色页显得“全是布料”或“无聊” | 尝试用材质纹理(羊绒、牡丹、丝绸、皮革、天鹅绒、尤加利叶)让视觉美学页更有趣,但纹理最终显得都是布料,无法向品牌合作方展示创作者的实际内容 | 将视觉美学页替换为内容分类页(第4页)。2×2网格的4张卡片,每张 = 图片 + 分类名称 + 一行描述 + 3–4个示例内容类型。分类基于用户实际动态。这是品牌合作方想看到的内容。 |
| 语气页素材与模板不符(例如自拍页用抽象氛围素材) | 将语气页视为“任何美学图片”,而非模板演示素材 | 若模板是“休闲自拍”,则展示真实自拍(镜面、3/4背影、手机遮挡脸)。若模板是“脆弱感”,则展示安静场景素材(未整理的床、蜡烛、日记)。匹配模板意图,这是演示语境中语气的核心目的。 |
| 语气页重复使用情绪板素材 | 为节省生成成本默认使用情绪板素材 | 语气页需要新素材:爬取动态中的其他精选内容(使用 `image_versions2.candidates[0].url` 的其他索引) + 新生成素材。情绪板是视觉风格展示;语气页需在新语境中展示语气。 |
| PDF第3页情绪板边缘有空白 | 使用 `mood-board.png`(带标题栏)并设置 `background-size: contain` | 预渲染 `mood-board-no-header.png` 变体(1920×1080,仅素材网格,无标题),全屏嵌入并设置 `background-size: cover`。带标题栏的版本作为独立磁盘成果。 |
| 全屏情绪板页面左侧/右侧(或行之间)有细空白 | 无标题HTML保留了标准外边距/间隙 | 无标题变体必须使用零外边距、零间隙,素材尺寸恰好填满1920×1080。独立磁盘的mood-board.png保留边距/标题栏。 |
| AI素材经过CSS处理后仍显AI痕迹 | AI痕迹是构图问题,而非仅摄影风格,呈现处理无法修复构图。常见问题:大理石上的护肤品平铺、丝巾上的香水、完美排列的肉桂面团 | 使用真实生活场景提示词重新生成:手持+真实杂乱环境+无脸人物部分入镜+"NOT a styled flat lay / product hero shot"。详见步骤4d.5提示词公式和推广模板示例。 |
| gpt-image-2因内容政策拒绝自拍风格提示词 | 提示词使用了风险词汇,如"mirror selfie"、"selfie"或"phone obscuring face" | 重新尝试一次,使用符合政策的措辞:删除“selfie”一词,描述手持手机拍摄身体的画面或用手持手机遮挡脸的画面,且无生成人脸。不得重复使用被拒绝的提示词。 |
| html_to_pdf拒绝图片资产,提示“MIME type 'text/html' is not in the allowlist” | `mcp__plugin_pika_pika__upload_asset` 后的并行子shell `curl PUT` 静默丢失 `Content-Type: image/jpeg` 头;R2将JPG存储为 `text/html` MIME类型 | 每次批量PUT后,使用 `curl -s -I <url> \| grep content-type` 验证,重新PUT类型错误的文件。或使用顺序PUT(但注意5分钟URL有效期,见下一行)。 |
| Cloudflare R2返回403 ExpiredRequest on PUT | 预签名URL有效期为300秒。顺序执行10+个 `curl PUT`,每个耗时8–12秒,最后几个URL过期 | 使用 `(curl ... &)` 子shell + `wait` 并行执行PUT。在一个周期内生成并PUT。若出现403则重新生成并PUT,不要尝试延长过期URL。 |
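The white-bar row in the failure table comes down to aspect-ratio arithmetic. The sketch below builds the `pdf_options` payload quoted in that row and estimates the bar size when 1920×1080 content lands on the US Letter default. Both function names are hypothetical, and the centered fit-inside scaling is an assumption about how the renderer behaves, not documented tool behavior.

```python
from fractions import Fraction


def pdf_options(width_px: int = 1920, height_px: int = 1080) -> dict:
    """Build the pdf_options payload quoted in the table, with zero margins."""
    return {
        "paper_size": {"width": width_px, "height": height_px, "unit": "px"},
        "margins": {"top": 0, "right": 0, "bottom": 0, "left": 0, "unit": "px"},
    }


def letterbox_bars(page_w: int, page_h: int, content_w: int, content_h: int) -> tuple[float, float]:
    """Empty space per side (horizontal, vertical), assuming a centered fit-inside scale."""
    scale = min(Fraction(page_w, content_w), Fraction(page_h, content_h))
    bar_x = (page_w - content_w * scale) / 2
    bar_y = (page_h - content_h * scale) / 2
    return float(bar_x), float(bar_y)


if __name__ == "__main__":
    print(pdf_options())
    # US Letter landscape default (792x612) versus the 1920x1080 layout.
    print(letterbox_bars(792, 612, 1920, 1080))
    # Matching paper size leaves no bars.
    print(letterbox_bars(1920, 1080, 1920, 1080))
```

Under that assumption, the Letter default leaves a bar of about 83 units at the top and bottom of a 612-unit page, roughly 27% of the page height in total. That is consistent with the "giant white bars" symptom, and it disappears once the paper size matches the layout.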