synthetic-session-generator

Synthetic Session Generator

Purpose

Generate fictional but believable coaching/therapy session transcripts that read like real recorded sessions, while remaining clearly synthetic. Outputs feed three jobs: eval datasets (with ground-truth labels to benchmark summarizers and analyzers), product demos (realistic sessions without exposing real client data), and training/prompt examples (few-shot material for a coaching or therapy assistant).

Realism comes from two disciplines: persona consistency (a client speaks the same way, carries the same history and presenting issues across a session arc) and modality fidelity (the practitioner uses the techniques, question forms, and pacing of the chosen framework). Every output is watermarked as synthetic so it can never be mistaken for a real clinical record.

When to Use

Use when a user asks for fake/synthetic/mock/demo coaching or therapy transcripts, eval or test data for session-analysis tools (e.g. the `coaching-session-summarizer`), few-shot dialogue examples, or persona-consistent session series. Do not use to analyze or summarize a real transcript; that is the job of `coaching-session-summarizer` or `transcript-analyzer`.

Workflow

Step 0 — Setup mode (configure defaults)

When the user wants to configure the skill ("setup", "set my defaults", "always use Russian / IFS / 50-minute sessions"), run setup mode. Offer the three choices via AskUserQuestion, then persist them:

- **Language**: output language for the transcript (`en`, `ru`, `de`, `es`, `fr`, `pt`, `it`, `nl`).
- **Modality**: default framework (`icf-grow`, `cbt`, `ifs`, `act-mi`).
- **Session duration**: minutes (e.g. 25 / 50 / 80); mapped to a turn budget (~0.6 turns/min).

```bash
python3 scripts/setup_config.py --language ru --modality cbt --duration 50 --show
python3 scripts/setup_config.py --show     # view current defaults
```

This writes `config.json` in the skill directory. Later `scaffold_session.py` runs inherit these defaults, so the user only specifies what differs (e.g. persona and session position). Per-run flags always override the saved config.
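
The duration-to-turns mapping and the flag-over-config precedence described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `setup_config.py` or `scaffold_session.py`: the function names `turn_budget` and `resolve_settings`, the dictionary keys, and the rounding rule are assumptions. Only the ~0.6 turns/min rate and the override order come from the text.

```python
TURNS_PER_MINUTE = 0.6  # rate quoted in the text; rounding below is an assumption


def turn_budget(duration_minutes: int) -> int:
    """Map a session length in minutes to an approximate number of turns."""
    return max(1, round(duration_minutes * TURNS_PER_MINUTE))


def resolve_settings(saved: dict, flags: dict) -> dict:
    """Overlay per-run flags on saved defaults; flags left as None are ignored."""
    merged = dict(saved)
    merged.update({key: value for key, value in flags.items() if value is not None})
    return merged


# Hypothetical saved defaults, mirroring the setup command shown earlier.
saved = {"language": "ru", "modality": "cbt", "duration": 50}

# The user overrides only the modality for this run.
settings = resolve_settings(saved, {"modality": "ifs", "duration": None})
print(settings)
print(turn_budget(settings["duration"]))
```

Under these assumptions a 50-minute session yields a budget of 30 turns, which matches the "standard ~30" figure given for `--length`.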

Step 1 — Gather the generation spec

Honour the setup-mode defaults (Step 0); only ask for parameters the user hasn't already fixed.

Collect (or infer sensible defaults for) these parameters. Ask only for what materially changes the output; default the rest.

- **Use case**: eval / demo / training (drives whether ground-truth labels are emitted).
- **Modality**: `icf-grow`, `cbt`, `ifs`, or `act-mi`. See `references/modalities.md` for the technique cheat-sheet, signature moves, and vocabulary of each.
- **Persona**: pick an existing persona from `references/personas.md`, or generate a new one and persist it back into that file so a session series stays consistent. A persona = name, demographics, presenting issue, history, speech register, defenses/resistances, goals.
- **Session position**: intake / early / mid-arc / breakthrough / rupture-and-repair / closing. This sets emotional tone and what prior material is referenced.
- **Format**: `fathom`, `plain`, `json`, or `markdown` (see Step 3). Markdown is always produced.
- **Language**: defaults from setup config; pass `--language`. Author all dialogue, persona voice, and the watermark-adjacent text in that language; keep eval tag keys in English.
- **Duration / length**: `--duration <minutes>` (preferred; maps to a turn budget) or the coarse `--length` (short ~15 / standard ~30 / long ~50+).
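
The persona fields listed above could be held in a small record like the one below before being written back to `references/personas.md`. This is only a sketch: the `Persona` class, its attribute names, and the `missing_fields` helper are hypothetical, and the example values are illustrative fiction rather than an entry from the real persona file.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Persona:
    """One fictional client; field names follow the persona definition in the text."""

    name: str
    demographics: str
    presenting_issue: str
    history: str
    speech_register: str
    defenses: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [key for key, value in asdict(self).items() if not value]


# Illustrative draft persona (all details invented for this example).
draft = Persona(
    name="Maya",
    demographics="34, product designer",
    presenting_issue="procrastination on a portfolio",
    history="recent job change",
    speech_register="fast, self-deprecating",
    defenses=["changes the subject with humor"],
)
print(draft.missing_fields())
```

A check like this makes it easy to see which attributes still need to be authored before a persona is reused across a session series.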

Step 2 — Build the session skeleton, then write the dialogue

Run the scaffolding script to turn the spec into a structured skeleton (phases, beat list, turn budget, JSON shell, and the synthetic watermark):

```bash
python3 scripts/scaffold_session.py --modality cbt --persona maya --position mid-arc \
    --length standard --format json --out /tmp/session_skeleton.json
```

Then write the actual dialogue by hand (model-authored), filling each beat. The script provides structure and guardrails; Claude provides the natural, non-templated language. Key realism rules (full list in `references/realism_guide.md`):

- Open with logistics/check-in small talk; never jump straight to deep work.
- Give the client disfluencies, hedges, self-interruption, and at least one moment of resistance or avoidance. Real clients don't deliver clean insights on cue.
- Keep the practitioner in-modality: CBT uses thought records and Socratic questioning; IFS uses parts language and "How do you feel toward that part?"; GROW moves Goal→Reality→Options→Will; ACT/MI uses values, defusion, and change talk. Avoid mixing modalities unless depicting eclectic practice deliberately.
- Maintain persona voice: vocabulary, sentence length, and recurring metaphors stay stable.
- End with a summary, a between-session task/experiment, and scheduling.
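
As one example of an in-modality check, the Goal→Reality→Options→Will progression could be verified against per-turn phase tags. This is a rough sketch under stated assumptions: the lowercase phase labels and the `phases_in_order` helper are hypothetical, and real GROW conversations sometimes loop back, so a failed check is a prompt for review rather than proof of an error.

```python
GROW_ORDER = ["goal", "reality", "options", "will"]  # assumed tag values


def phases_in_order(phases: list[str], order: list[str] = GROW_ORDER) -> bool:
    """True if the known phases never move backwards through `order`.

    Tags that are not part of `order` (e.g. an opening check-in) are ignored.
    """
    positions = [order.index(phase) for phase in phases if phase in order]
    return positions == sorted(positions)


# Phase tags as they might be read from a drafted session, one per turn.
drafted = ["opening", "goal", "goal", "reality", "options", "will", "closing"]
print(phases_in_order(drafted))
```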

Step 3 — Render formats (always include markdown)

Author once in the JSON turn structure, then convert. Always render the markdown format (it is the canonical, human-readable artifact); add any other formats the user asked for.

```bash
# markdown is always produced:
python3 scripts/convert_format.py --in /tmp/session.json --to markdown --auto-timestamps --out session.md
# plus any requested extras:
python3 scripts/convert_format.py --in /tmp/session.json --to fathom --auto-timestamps --out session.txt
```


- **markdown** *(always)* — Obsidian note with YAML frontmatter (persona id, modality, session
  position, synthetic flag) above the transcript.
- **fathom** — speaker-labeled, timestamped lines matching the Fathom/Granola export style, so the
  transcript flows through existing skills (`coaching-session-summarizer`, `transcript-analyzer`).
- **plain** — simple `Coach:` / `Client:` turn-taking markdown.
- **json** — the source itself: turns with `speaker`, `timestamp`, `text`, and eval tags
  (`technique`, `emotion`, `phase`); for evals, also the `ground_truth` block.
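
A quick structural check of the JSON source, based on the field names listed above, might look like the sketch below. The top-level `turns` key and the `check_turns` helper are assumptions made for illustration; the actual layout is whatever `scaffold_session.py` emits.

```python
# Per-turn keys named in the format description above.
PER_TURN_KEYS = {"speaker", "timestamp", "text", "technique", "emotion", "phase"}


def check_turns(session: dict, for_eval: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for index, turn in enumerate(session.get("turns", [])):
        missing = PER_TURN_KEYS - set(turn)
        if missing:
            problems.append(f"turn {index}: missing {sorted(missing)}")
    # Eval datasets additionally need the ground_truth block.
    if for_eval and "ground_truth" not in session:
        problems.append("missing ground_truth block")
    return problems


# Illustrative two-turn session; the second turn lacks its phase tag.
session = {
    "turns": [
        {"speaker": "Coach", "timestamp": "00:00:00", "text": "How was the week?",
         "technique": "check-in", "emotion": "neutral", "phase": "opening"},
        {"speaker": "Client", "timestamp": "00:00:03", "text": "Um, fine, I guess.",
         "technique": None, "emotion": "guarded"},
    ]
}
print(check_turns(session, for_eval=True))
```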

**Timestamps.** Do not hand-invent timestamps. Pass `--auto-timestamps` so the converter emulates
them from each turn's word count (~150 wpm + a short inter-turn gap), keeping timing internally
consistent. Tune pace with `--wpm`. See `assets/templates/` for a reference example of each format.
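
The word-count timing described above can be approximated as follows. This is a sketch of the idea, not the converter's implementation: the `auto_timestamps` function, the `HH:MM:SS` format, and the 1.5-second default gap are assumptions; only the ~150 wpm pace and the existence of an inter-turn gap come from the text.

```python
def auto_timestamps(turns: list[dict], wpm: int = 150, gap_seconds: float = 1.5) -> list[dict]:
    """Return copies of `turns` with a start timestamp derived from speaking time."""
    stamped = []
    clock = 0.0  # seconds elapsed since the start of the session
    for turn in turns:
        hours, remainder = divmod(int(clock), 3600)
        minutes, seconds = divmod(remainder, 60)
        stamped.append({**turn, "timestamp": f"{hours:02d}:{minutes:02d}:{seconds:02d}"})
        # Advance the clock by the time needed to speak this turn, plus a pause.
        words = len(turn["text"].split())
        clock += words * 60 / wpm + gap_seconds
    return stamped


turns = [
    {"speaker": "Coach", "text": "Good to see you again. Where would you like to start today?"},
    {"speaker": "Client", "text": "Honestly, I almost cancelled. It has been that kind of week."},
]
for turn in auto_timestamps(turns):
    print(turn["timestamp"], turn["speaker"])
```

Because every timestamp is computed from the running clock, longer turns automatically push later turns back, which keeps the timing internally consistent.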

Step 4 — (Optional) Case-conceptualization card with portrait

When the user wants a card summarizing the case (for demos, persona bibles, or eval context), build it from the same session JSON and pair it with a generated portrait:

```bash
python3 scripts/make_card.py --in /tmp/session.json --out /tmp/card.md            # scaffold
python3 scripts/make_card.py --in /tmp/session.json --print-prompt                # portrait prompt
```

- Run `make_card.py` to emit the card scaffold (modality-aware formulation skeleton + themes/goals pulled from `ground_truth` + a watermark + a ready portrait prompt).
- Fill the blocks with the clinical formulation (model-authored).
- Generate the portrait with the `gpt-image-2` skill using the prompt from `--print-prompt`. Keep it illustrative, not photoreal; a stylized image cannot be mistaken for a photo of a real person. Then re-run with `--image <path>` (or edit the card) to embed it.

Step 4b — (Optional) Render the card as an HTML page via `tufte-report`

When the user wants a shareable HTML page of the case card (portrait + conceptualization), hand the filled card to the `tufte-report` skill, which produces a standalone Tufte-style HTML file.

- Build and fill the card (Step 4), including the embedded portrait.
- Invoke the `tufte-report` skill with the card's conceptualization as the narrative content and the portrait as a figure. Map card sections to the report: Snapshot/Presenting issue → intro narrative; Formulation → the main 2-column narrative+data section; Working themes and Goals & experiments → a status/dashboard panel; Emotional arc → a sparkline or labelled sequence. Pass the portrait path so it renders as the hero figure.
- Keep the synthetic watermark visible in the HTML (header or footer), and confirm the output path (default: current working directory) before writing the `.html`.

The portrait must remain the illustrative, non-photoreal image from Step 4 — the HTML page is for demos and persona bibles, never presented as a real client record.

Step 5 — Watermark and save

Always apply the synthetic watermark — this is non-negotiable. The scaffold script injects it; verify it survived format conversion. Each output must carry, in a location appropriate to its format (frontmatter, JSON metadata, or a header/footer comment):

⚠️ SYNTHETIC — AI-generated fictional session. Not a real person, not clinical advice.

Confirm the save location before writing. Ask the user where to save and state the default: the current working directory. Only fall back to `/tmp/` for throwaway intermediate scaffolds the user will not keep. Use clear filenames (e.g. `<persona>_<modality>_<position>.md`). For eval batches, write one file per session into the chosen directory plus a manifest listing personas, modalities, and label coverage.
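
Verifying that the watermark survived conversion, and building filenames in the suggested pattern, could be done along these lines. The helper names and the marker-based check are assumptions made for this sketch: it looks for two phrases from the watermark line rather than the exact line, on the assumption that the English watermark wording is kept and because JSON output may escape the emoji and punctuation.

```python
# Two phrases taken from the watermark line in Step 5.
WATERMARK_MARKERS = ("SYNTHETIC", "AI-generated fictional session")


def has_watermark(text: str) -> bool:
    """True if every watermark marker appears somewhere in the rendered output."""
    return all(marker in text for marker in WATERMARK_MARKERS)


def session_filename(persona: str, modality: str, position: str, ext: str = "md") -> str:
    """Build a name following the <persona>_<modality>_<position> pattern."""
    return f"{persona}_{modality}_{position}.{ext}"


def unwatermarked(outputs: dict[str, str]) -> list[str]:
    """Return the names of outputs whose text lost the watermark."""
    return sorted(name for name, text in outputs.items() if not has_watermark(text))


# The watermark line from Step 5, written with escapes for the emoji and dash.
watermark = ("\u26a0\ufe0f SYNTHETIC \u2014 AI-generated fictional session. "
             "Not a real person, not clinical advice.")
outputs = {
    session_filename("maya", "cbt", "mid-arc"): watermark + "\n\nCoach: Good to see you.",
    session_filename("maya", "cbt", "mid-arc", ext="txt"): "Coach: Good to see you.",
}
print(unwatermarked(outputs))
```

Any filename this reports should be regenerated or fixed before the batch is saved.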

Limitations and Constraints

- **Synthetic only.** Never present output as a real session, real person, or clinical record. The watermark is mandatory and must never be stripped, even for demos (use the optional clean-body variant only when the user explicitly confirms, and keep provenance in metadata).
- **Not clinical guidance.** Generated dialogue is illustrative fiction; it must not be used as a source of therapeutic technique, diagnosis, or advice for real care. Do not reproduce real protocols verbatim or imply clinical validity.
- **No real PII.** Do not base personas on identifiable real individuals or copy details from real transcripts. If given a real transcript as a style reference, abstract patterns only, never names, specifics, or verbatim content (route true anonymization to `session-anonymizer`).
- **Portraits stay illustrative.** Generate card portraits as stylized illustrations, never photorealistic faces; a synthetic illustration cannot be mistaken for a photo of a real person. The card carries its own synthetic watermark; keep it.
- **Safety-sensitive content.** Crisis, self-harm, abuse, or risk scenarios may be depicted only when the use case clearly warrants it (e.g. red-team evals), must stay clearly fictional and watermarked, and must depict responsible practitioner handling, never operational harmful detail.
- **Stay in scope.** This skill generates; it does not analyze real sessions. Hand real-transcript summarization to `coaching-session-summarizer` and anonymization to `session-anonymizer`.
