wjs-editing-multicam

Combine N synced camera angles into a single rendered MP4. Cut decisions are driven by audio energy only: the cam whose mic is loudest relative to the other cams in a given second is the preferred angle for that second. Output is hard cuts, or hard cuts plus a corner PiP.

What this skill IS — and IS NOT

| Is | Is not |
|---|---|
| Audio-energy-driven cam switching | Face / framing detection (no face_recognition, no MediaPipe) |
| Single-source audio (one cam's mic) | Multi-mic mix / per-speaker gating |
| Hard cuts, with optional PiP inset | Crossfades / opacity transitions / sliding animations |
| `ffmpeg` concat + `overlay` filter renders | HyperFrames composition / `<hf-clip>` |
| Coverage-aware (never picks a cam outside the coverage window in its sidecar) | Frame-accurate beat alignment / VAD-edge cuts |

If you need face tracking, fade transitions, captions, or HyperFrames composition, use the hyperframes skill on top of this skill's MP4 output.

REQUIRED INPUT

Original camera files (untouched) plus their `.sync.json` sidecars next to them. If the sources aren't synced yet, run wjs-syncing-multicam first to write the sidecars. If a cam has no sidecar, it is assumed to have `delta = 0` and full coverage.

`autoedit.py` reads `delta_seconds` and `overlap_in_reference` from each sidecar, lifts the cam's audio envelope into the reference timeline, and only schedules a cam during its coverage window.

`render_cuts.py` and `render_pip.py` apply `ffmpeg -itsoffset` per input, using the EDL's `deltas[]` array.
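A minimal sketch of the sidecar lookup with the missing-sidecar fallback. The helper name `load_sync`, the assumption that `overlap_in_reference` is a `[start, end]` pair in reference-timeline seconds, and the fallback used when that key is absent are all hypothetical; `autoedit.py` may parse sidecars differently.

```python
import json
from pathlib import Path


def load_sync(media_path, media_duration_sec):
    """Return (delta_seconds, (cov_start, cov_end)) for one cam.

    Hypothetical helper. Assumes the sidecar is "<media>.sync.json" next to
    the media file and that overlap_in_reference is [start, end] in
    reference-timeline seconds.
    """
    sidecar = Path(str(media_path) + ".sync.json")
    if not sidecar.exists():
        # Missing sidecar: delta = 0 and full coverage of the cam's own length.
        return 0.0, (0.0, float(media_duration_sec))
    data = json.loads(sidecar.read_text())
    delta = float(data.get("delta_seconds", 0.0))
    # Assumed fallback if the overlap key is absent: the cam's span shifted by delta.
    start, end = data.get("overlap_in_reference",
                          [delta, delta + media_duration_sec])
    return delta, (float(start), float(end))
```

Keeping the fallback in one place means every later step can treat "no sidecar" exactly like a sidecar that says `delta = 0` with full coverage.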

When NOT to use

- One source: there is nothing to switch between; use video-segmentation instead.
- A polished NLE timeline already exists: don't fight the existing edit.
- You want fade transitions, overlay captions, or brand title cards: run this skill first to get the cut-down MP4, then feed it into wjs-overlaying-video or hyperframes.

Pipeline

1. Read each input's sidecar to get the lists `delta_seconds[k]` and `overlap_in_reference[k]`.
2. Extract per-cam mono PCM at 16 kHz from the original file.
3. Compute a log-RMS envelope at a 1 Hz frame rate (one value per second).
4. Lift each envelope into the reference timeline by indexing at `t_ref - delta_k`; uncovered seconds become `-inf`, so they're never picked.
5. Audio source = the cam with the largest envelope spread (90th − 10th percentile over its covered seconds), with a small bonus for coverage fraction.
6. Score per second: `cam[k] - mean(other covered cams)`. The highest score marks the best active-speaker candidate.
7. The editor decides the EDL in one of two modes:
   - `rotation` (default): random dwell in [`min_dwell=8`, `max_dwell=15`] s; at each cut, pick the best-scoring covered cam (≠ current).
   - `greedy`: hysteresis. Hold the current cam unless another cam's lookahead-window score beats it by `--switch-threshold`. Floor `min_dwell=4`, ceiling `max_dwell=18`.

   Both modes force a switch if the active cam exits its coverage window mid-shot.
8. Emit the EDL JSON.
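The envelope, lifting, scoring, and audio-source steps above can be sketched in plain Python. This is an illustration, not `autoedit.py`'s code: PCM decoding is left out (the functions take already-decoded float samples), and the dBFS scale, whole-second `floor` indexing, the neutral score for a cam with no covered rivals, and the `coverage_bonus` weight are all assumptions.

```python
import math
import statistics

NEG_INF = float("-inf")


def log_rms_envelope(samples, sample_rate=16_000):
    """Per-second log-RMS (dBFS) of mono PCM floats in [-1, 1].

    The trailing partial second is dropped (an assumption).
    """
    env = []
    for i in range(0, len(samples) - sample_rate + 1, sample_rate):
        frame = samples[i:i + sample_rate]
        rms = math.sqrt(sum(s * s for s in frame) / sample_rate)
        env.append(20 * math.log10(max(rms, 1e-10)))  # floor avoids log10(0)
    return env


def lift(env, delta, coverage, n_ref):
    """Index a cam envelope at t_ref - delta; uncovered seconds become -inf."""
    start, end = coverage
    out = []
    for t_ref in range(n_ref):
        t_cam = math.floor(t_ref - delta)
        ok = start <= t_ref < end and 0 <= t_cam < len(env)
        out.append(env[t_cam] if ok else NEG_INF)
    return out


def scores(lifted):
    """score[k][t] = lifted[k][t] - mean(other covered cams at t)."""
    n_cams, n_ref = len(lifted), len(lifted[0])
    out = [[NEG_INF] * n_ref for _ in range(n_cams)]
    for t in range(n_ref):
        for k in range(n_cams):
            if lifted[k][t] == NEG_INF:
                continue  # uncovered seconds keep a score of -inf
            others = [lifted[j][t] for j in range(n_cams)
                      if j != k and lifted[j][t] != NEG_INF]
            # Assumption: a cam with no covered rivals gets a neutral 0.0.
            out[k][t] = (lifted[k][t] - sum(others) / len(others)) if others else 0.0
    return out


def pick_audio_source(lifted, coverage_bonus=1.0):
    """Largest 90th-10th percentile spread, plus a small coverage-fraction bonus."""
    best_k, best_val = 0, NEG_INF
    for k, env in enumerate(lifted):
        covered = [v for v in env if v != NEG_INF]
        if len(covered) < 2:
            continue  # too little coverage to measure a spread
        deciles = statistics.quantiles(covered, n=10, method="inclusive")
        val = (deciles[-1] - deciles[0]) + coverage_bonus * len(covered) / len(env)
        if val > best_val:
            best_k, best_val = k, val
    return best_k
```

Subtracting the mean of the other covered cams makes the score relative, so a cam that is loud only because the whole room is loud does not win by default.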

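The two editor modes can be sketched on top of a per-second score matrix (`scores[k][t]`, with `-inf` for uncovered seconds). Hypothetical choices, not taken from the script: the `lookahead` window length and the `switch_threshold` default for greedy mode, ranking rotation candidates by their mean score over the upcoming dwell, merging back-to-back segments on the same cam, and the assumption that at least one cam covers every second.

```python
import random

NEG_INF = float("-inf")


def window_mean(scores_k, t, width):
    """Mean of one cam's covered scores over [t, t+width); -inf if uncovered at t."""
    if scores_k[t] == NEG_INF:
        return NEG_INF
    win = [s for s in scores_k[t:t + width] if s != NEG_INF]
    return sum(win) / len(win)


def best_cam(scores, t, width, exclude=None):
    """Best cam by window mean at t, or None if no candidate is covered."""
    cands = [(window_mean(scores[k], t, width), k)
             for k in range(len(scores)) if k != exclude]
    val, k = max(cands, default=(NEG_INF, None))
    return None if val == NEG_INF else k


def greedy_edl(scores, switch_threshold=3.0, lookahead=4, min_dwell=4, max_dwell=18):
    n = len(scores[0])
    cur, start, edl = best_cam(scores, 0, lookahead), 0, []
    for t in range(1, n):
        dwell = t - start
        rival = best_cam(scores, t, lookahead, exclude=cur)
        if rival is None:
            continue  # nobody else is covered: hold the current cam
        if scores[cur][t] == NEG_INF:
            switch = True  # active cam left its coverage window mid-shot
        elif dwell < min_dwell:
            switch = False  # floor: too early to cut
        else:
            gain = (window_mean(scores[rival], t, lookahead)
                    - window_mean(scores[cur], t, lookahead))
            switch = dwell >= max_dwell or gain > switch_threshold
        if switch:
            edl.append({"cam": cur, "start": start, "end": t})
            cur, start = rival, t
    edl.append({"cam": cur, "start": start, "end": n})
    return edl


def rotation_edl(scores, min_dwell=8, max_dwell=15, seed=None):
    rng = random.Random(seed)
    n = len(scores[0])
    t, cur, edl = 0, None, []
    while t < n:
        dwell = rng.randint(min_dwell, max_dwell)  # inclusive on both ends
        nxt = best_cam(scores, t, dwell, exclude=cur)
        if nxt is None:
            nxt = cur  # no other covered cam: stay on the current one
        end = t + 1
        while end < min(t + dwell, n) and scores[nxt][end] != NEG_INF:
            end += 1  # cut short if nxt leaves its coverage window mid-shot
        if edl and edl[-1]["cam"] == nxt:
            edl[-1]["end"] = end  # extend instead of emitting a same-cam cut
        else:
            edl.append({"cam": nxt, "start": t, "end": end})
        cur, t = nxt, end
    return edl
```

In this sketch the lookahead lets greedy mode cut a second or so before the energy actually changes hands, while `max_dwell` forces a cut even when the current cam is still winning.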

EDL schema (`edl.json`)

```json
{
  "_about": "EDL produced by wjs-editing-multicam/autoedit.py. Times in reference timeline. Render scripts apply ffmpeg -itsoffset deltas[k] per input.",
  "_help": {
    "inputs":        "Original media paths, in cam-index order (cam 0, cam 1, ...).",
    "deltas":        "Per-cam delta_seconds from each sidecar. Render uses ffmpeg -itsoffset deltas[k].",
    "duration_sec":  "Output duration in reference timeline.",
    "audio_source":  "Cam index whose audio track becomes the master. Single source — not a mix.",
    "coverage":      "[start, end] per cam in reference timeline.",
    "edl":           "List of {cam, start, end} segments. Times are reference-timeline seconds."
  },
  "inputs":       ["cam_a.MOV", "cam_b.MOV"],
  "deltas":       [0.0, 12.345],
  "duration_sec": 4512,
  "audio_source": 0,
  "coverage":     [[0.0, 4512.0], [12.345, 4499.835]],
  "edl":          [{"cam": 0, "start": 0, "end": 13}, {"cam": 1, "start": 13, "end": 28}, ...]
}
```

`autoedit.py` writes `_about` and `_help` directly into the file, so opening the JSON in any editor explains itself.
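To make the timeline conventions concrete, here is a hypothetical sanity check for an EDL loaded as a dict, plus the mapping from a reference-timeline segment back to in/out points in the cam's original file (`t_src = t_ref - deltas[cam]`, consistent with rendering via `-itsoffset deltas[k]`). Requiring segments to be contiguous is an assumption about what the renderers expect; the function names are illustrative.

```python
def check_edl(doc):
    """Return human-readable problems in an EDL dict; an empty list means OK."""
    problems = []
    rows, coverage = doc["edl"], doc["coverage"]
    for i, row in enumerate(rows):
        cam, start, end = row["cam"], row["start"], row["end"]
        if start >= end:
            problems.append(f"row {i}: empty or reversed segment {start}-{end}")
        lo, hi = coverage[cam]
        if start < lo or end > hi:
            problems.append(f"row {i}: cam {cam} used outside coverage [{lo}, {hi}]")
        # Assumption: hard cuts are contiguous, so each row starts where the last ended.
        if i and start != rows[i - 1]["end"]:
            problems.append(f"row {i}: gap or overlap at {start}")
    return problems


def source_span(doc, row):
    """In/out points inside the cam's original file: t_src = t_ref - deltas[cam]."""
    delta = doc["deltas"][row["cam"]]
    return row["start"] - delta, row["end"] - delta
```

With the example EDL, the second row (cam 1, reference 13 to 28 s) maps to roughly 0.655 to 15.655 s inside `cam_b.MOV`.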

Render

| Script | What it does |
|---|---|
| `scripts/render_cuts.py` | Hard cuts only. A `concat` filter graph over per-segment `trim+scale+pad`. Audio comes from the `audio_source` cam, trimmed to start at the first EDL row's `start`. |
| `scripts/render_pip.py` | Hard cuts plus a corner picture-in-picture overlay. The main cam is the EDL row's `cam`; the PiP cam is picked round-robin (or set via a per-row `pip` field). The PiP is scaled to `--pip-width` (default 480 px) and placed in a configurable corner with an optional white border. No fades and no opacity changes: the inset is a solid block that is either on or off. |

Both scripts apply `-itsoffset deltas[k]` per input.
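A rough Python sketch of the kind of `ffmpeg` command a hard-cut render could assemble from the EDL. It is a hypothetical reconstruction, not `render_cuts.py`: it relies on `-itsoffset deltas[k]` putting each input on the reference timeline so that `trim` can use reference-timeline seconds, scales and pads every segment to an assumed 1920×1080 canvas, and also ends the audio at the last row's `end`.

```python
def build_cuts_cmd(doc, out="multicam_render.mp4", width=1920, height=1080):
    """Build an ffmpeg argv list for a hard-cut render of an EDL dict (sketch)."""
    args = ["ffmpeg", "-y"]
    for path, delta in zip(doc["inputs"], doc["deltas"]):
        # -itsoffset applies to the -i that follows it.
        args += ["-itsoffset", f"{delta}", "-i", path]
    chains, labels = [], []
    for i, row in enumerate(doc["edl"]):
        # One chain per EDL row: cut the segment, restart its timestamps,
        # then letterbox to a common canvas so concat sees identical sizes.
        chains.append(
            f"[{row['cam']}:v]trim=start={row['start']}:end={row['end']},"
            "setpts=PTS-STARTPTS,"
            f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
            f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1[v{i}]"
        )
        labels.append(f"[v{i}]")
    first, last = doc["edl"][0]["start"], doc["edl"][-1]["end"]
    chains.append("".join(labels) + f"concat=n={len(labels)}:v=1:a=0[vout]")
    # Single-source audio: only the audio_source cam's track is used.
    chains.append(
        f"[{doc['audio_source']}:a]atrim=start={first}:end={last},"
        "asetpts=PTS-STARTPTS[aout]"
    )
    args += ["-filter_complex", ";".join(chains),
             "-map", "[vout]", "-map", "[aout]", out]
    return args
```

Returning an argv list (rather than a shell string) lets it be passed straight to `subprocess.run` without quoting issues around the filter graph.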

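For the PiP variant, a similar hypothetical sketch of the two decisions `render_pip.py` describes: choosing the inset cam per row (an explicit `pip` field first, otherwise round-robin over the other cams) and building an `overlay` chain with a white border drawn via `pad`. The 24 px margin, 4 px border, corner names, the optional coverage filter, and the `[m{i}]` label for the main-cam stream are assumptions, not the script's actual options.

```python
# Overlay x/y expressions: W/H = main frame size, w/h = inset size.
CORNERS = {
    "top-left": ("{m}", "{m}"),
    "top-right": ("W-w-{m}", "{m}"),
    "bottom-left": ("{m}", "H-h-{m}"),
    "bottom-right": ("W-w-{m}", "H-h-{m}"),
}


def pick_pip_cams(edl, n_cams, coverage=None):
    """Per-row PiP cam: explicit row["pip"] wins, else round-robin over other cams."""
    picks, turn = [], 0
    for row in edl:
        if "pip" in row:
            picks.append(row["pip"])
            continue
        cands = [k for k in range(n_cams) if k != row["cam"] and (
            coverage is None
            or (coverage[k][0] <= row["start"] and row["end"] <= coverage[k][1]))]
        if not cands:
            picks.append(None)  # nothing else covers this row: no inset
            continue
        picks.append(cands[turn % len(cands)])
        turn += 1
    return picks


def pip_chain(i, pip_cam, start, end, pip_width=480, corner="bottom-right",
              margin=24, border=4):
    """Filter snippet overlaying cam pip_cam onto the labelled main stream [m{i}]."""
    x, y = (c.format(m=margin) for c in CORNERS[corner])
    return (
        f"[{pip_cam}:v]trim=start={start}:end={end},setpts=PTS-STARTPTS,"
        f"scale={pip_width}:-2,"  # -2 keeps aspect ratio with an even height
        f"pad=w=iw+{2 * border}:h=ih+{2 * border}:x={border}:y={border}:color=white[p{i}];"
        f"[m{i}][p{i}]overlay=x={x}:y={y}[v{i}]"
    )
```

Because the inset is a plain `overlay` with no alpha or fade filters, it matches the solid on/off behavior described in the table.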

Brainstorm before running

Three real knobs to confirm with the user:

- Pacing: `--mode rotation` (varied dwell, easier on the ear) vs `--mode greedy` (energy-following, snappier).
- PiP: yes or no. If yes, which corner and what width?
- Minimum cut length: the `--min-dwell` floor. The 8 s rotation default is conservative; talking-head footage can go down to 4 s.

`audio_source` is auto-picked; override it with `--audio-source <cam-index>` if the auto-pick sounds wrong on a 30 s listen.

File layout

```
working_dir/
  cam_a.MOV                 # ORIGINAL, untouched
  cam_a.MOV.sync.json       # from wjs-syncing-multicam
  cam_b.MOV                 # ORIGINAL, untouched
  cam_b.MOV.sync.json
  edl.json                  # from autoedit.py
  multicam_render.mp4       # from render_cuts.py OR render_pip.py
```

Common pitfalls

- Trusting `audio_source` without listening. Envelope spread plus coverage is only a proxy. Always sample a 30 s clip before committing: a high-spread track can still be clipped or distorted.
- Running `autoedit.py` on the full 75-minute recording before tuning. Run it on a 2-min slice first (extract a 120 s slice per cam with `ffmpeg -ss A -t 120`, where `A` is the start time), listen, adjust `--min-dwell` / `--mode`, then commit to the full length.
- Expecting face-driven framing. This skill doesn't see the video, only the audio. If one cam is well framed but quiet, the editor won't favor it. Use `--audio-source` plus per-segment `pip` overrides as the manual escape hatch.
- Re-rendering when the sync was wrong. The EDL bakes in `deltas[]` at autoedit time. If you fix the sidecars later, re-run `autoedit.py` to regenerate the EDL before re-rendering.
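The last pitfall can be caught mechanically. A minimal sketch that compares the `deltas[]` baked into `edl.json` against the current sidecars and reports which cams need a fresh `autoedit.py` run; the function name `stale_cams`, resolving relative input paths against the EDL's folder, and the 1 ms tolerance are assumptions.

```python
import json
from pathlib import Path


def stale_cams(edl_path, tol=1e-3):
    """Cam indices whose current sidecar delta differs from the EDL's deltas[]."""
    edl_path = Path(edl_path)
    doc = json.loads(edl_path.read_text())
    stale = []
    for k, (media, baked) in enumerate(zip(doc["inputs"], doc["deltas"])):
        # Assumption: relative input paths are resolved against the EDL's folder.
        sidecar = edl_path.parent / (media + ".sync.json")
        current = 0.0  # a missing sidecar means delta = 0
        if sidecar.exists():
            current = float(json.loads(sidecar.read_text()).get("delta_seconds", 0.0))
        if abs(current - baked) > tol:
            stale.append(k)
    return stale
```

A non-empty result means the EDL was built against an older sync and should be regenerated before rendering.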
