wjs-editing-multicam
Combine N synced camera angles into a single rendered MP4. Decisions are audio-energy-driven only — the cam with the loudest mic each second wins. Output is hard cuts (or hard cuts plus a corner PiP).
What this skill IS — and IS NOT
| Is | Is not |
|---|---|
| Audio-energy-driven cam switching | Face / framing detection (no face_recognition, no MediaPipe) |
| Single-source audio (one cam's mic) | Multi-mic mix / per-speaker gating |
| Hard cuts, with optional PiP inset | Crossfades / opacity transitions / sliding animations |
| | HyperFrames composition |
| Coverage-aware (won't pick a cam outside its sidecar window) | Frame-accurate beat alignment / VAD-edge cuts |
If you need face tracking, fade transitions, captions, or HyperFrames composition, use the hyperframes skill on top of this skill's MP4 output.
REQUIRED INPUT
Original camera files (untouched) plus their `.sync.json` sidecars next to them. If sources aren't synced yet, run wjs-syncing-multicam first to write the sidecars. Missing sidecar = cam assumed at delta=0, full coverage.
`autoedit.py` reads `delta_seconds` and `overlap_in_reference` from each sidecar; `render_cuts.py` and `render_pip.py` then apply `ffmpeg -itsoffset` with the resulting `deltas[]`.
When NOT to use
- One source — nothing to switch between; use video-segmentation.
- Polished NLE timeline already exists — don't fight the editor.
- Want fade transitions / overlay captions / brand title cards — run this skill first to get the cut-down MP4, then feed it into wjs-overlaying-video or hyperframes.
Pipeline
- Read each input's sidecar → list of `delta_seconds[k]` + `overlap_in_reference[k]`.
- Extract per-cam mono PCM @ 16 kHz from the original file.
- Log-RMS envelope at 1 Hz frame rate (per-second).
- Lift each envelope into reference timeline by indexing at `t_ref - delta_k`; uncovered seconds become `-inf` so they're never picked.
- Audio source = the cam with the largest envelope spread (90th − 10th percentile over its covered seconds), with a small bonus for coverage fraction.
- Score per second: `cam[k] - mean(other covered cams)`. Highest score = best active-speaker candidate.
- Editor decides EDL, in one of two modes:
  - `rotation` (default): random dwell in [`min_dwell=8`, `max_dwell=15`] s, pick best-scoring covered cam (≠ current) at each cut.
  - `greedy`: hysteresis. Hold the current cam unless another cam's lookahead-window score beats it by `--switch-threshold`. Floor `min_dwell=4`, ceiling `max_dwell=18`.
  Both modes force-switch if the active cam exits its coverage window mid-shot.
- Emit EDL JSON.
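The envelope, lift, audio-source, and scoring steps above can be sketched in plain Python. This is a minimal illustration, not the code in `autoedit.py`: the function names, the rounding of `delta` to whole seconds, the nearest-rank percentiles, and the `coverage_bonus` weight are all assumptions, and coverage is derived from the envelope length rather than from `overlap_in_reference`.

```python
import math

NEG_INF = float("-inf")

def log_rms_envelope(samples, sample_rate=16000):
    """One log-RMS value per whole second of mono PCM (floats in [-1, 1])."""
    env = []
    for start in range(0, len(samples) - sample_rate + 1, sample_rate):
        window = samples[start:start + sample_rate]
        mean_sq = sum(x * x for x in window) / sample_rate
        env.append(math.log10(math.sqrt(mean_sq) + 1e-9))  # epsilon avoids log(0)
    return env

def lift_to_reference(env, delta, duration_sec):
    """Index the cam-local envelope at t_ref - delta; uncovered seconds become -inf."""
    lifted = []
    for t_ref in range(duration_sec):
        t_local = int(round(t_ref - delta))  # assumption: whole-second alignment
        lifted.append(env[t_local] if 0 <= t_local < len(env) else NEG_INF)
    return lifted

def pick_audio_source(lifted, coverage_bonus=0.1):
    """Largest (p90 - p10) spread over covered seconds, plus a small coverage bonus."""
    best_cam, best_value = None, NEG_INF
    for k, env in enumerate(lifted):
        covered = sorted(v for v in env if v != NEG_INF)
        if not covered:
            continue
        n = len(covered)
        spread = covered[(n - 1) * 9 // 10] - covered[(n - 1) // 10]
        value = spread + coverage_bonus * n / len(env)  # hypothetical weighting
        if value > best_value:
            best_cam, best_value = k, value
    return best_cam

def score_per_second(lifted):
    """score[k][t] = cam[k] - mean(other covered cams); -inf where cam k is uncovered."""
    n_cams, n_sec = len(lifted), len(lifted[0])
    scores = [[NEG_INF] * n_sec for _ in range(n_cams)]
    for t in range(n_sec):
        for k in range(n_cams):
            if lifted[k][t] == NEG_INF:
                continue
            others = [lifted[j][t] for j in range(n_cams)
                      if j != k and lifted[j][t] != NEG_INF]
            # Assumption: a cam with no covered rivals scores 0 for that second.
            baseline = sum(others) / len(others) if others else lifted[k][t]
            scores[k][t] = lifted[k][t] - baseline
    return scores
```

Keeping uncovered seconds at `-inf` means the later editing stage can treat "not covered" and "never the best candidate" with the same comparison.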
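As a rough illustration of the `rotation` mode, the sketch below draws a random dwell between `min_dwell` and `max_dwell`, switches to the best-scoring covered cam other than the current one at each cut, and force-switches when the active cam loses coverage. The function name, the `seed` parameter, and the choice to score only the single second at the cut are assumptions; it also assumes at least one cam covers second 0.

```python
import random

NEG_INF = float("-inf")

def rotation_edl(scores, min_dwell=8, max_dwell=15, seed=0):
    """scores[k][t] is cam k's score at reference second t, -inf when uncovered."""
    rng = random.Random(seed)
    n_cams, n_sec = len(scores), len(scores[0])

    def best_cam(t, exclude=None):
        candidates = [k for k in range(n_cams)
                      if k != exclude and scores[k][t] != NEG_INF]
        if not candidates:
            return exclude  # nothing else is covered: stay on the current cam
        return max(candidates, key=lambda k: scores[k][t])

    edl = []
    current, start = best_cam(0), 0
    next_cut = rng.randint(min_dwell, max_dwell)
    for t in range(1, n_sec):
        lost_coverage = scores[current][t] == NEG_INF
        if t >= next_cut or lost_coverage:
            new_cam = best_cam(t, exclude=current)
            if new_cam != current:
                edl.append({"cam": current, "start": start, "end": t})
                current, start = new_cam, t
            next_cut = t + rng.randint(min_dwell, max_dwell)
    edl.append({"cam": current, "start": start, "end": n_sec})
    return edl
```

With two cams this simply alternates, which is the "varied dwell" behavior; with three or more, the score decides which of the other cams gets the next shot.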
EDL schema (`edl.json`)
```json
{
  "_about": "EDL produced by wjs-editing-multicam/autoedit.py. Times in reference timeline. Render scripts apply ffmpeg -itsoffset deltas[k] per input.",
  "_help": {
    "inputs": "Original media paths, in cam-index order (cam 0, cam 1, ...).",
    "deltas": "Per-cam delta_seconds from each sidecar. Render uses ffmpeg -itsoffset deltas[k].",
    "duration_sec": "Output duration in reference timeline.",
    "audio_source": "Cam index whose audio track becomes the master. Single source — not a mix.",
    "coverage": "[start, end] per cam in reference timeline.",
    "edl": "List of {cam, start, end} segments. Times are reference-timeline seconds."
  },
  "inputs": ["cam_a.MOV", "cam_b.MOV"],
  "deltas": [0.0, 12.345],
  "duration_sec": 4512,
  "audio_source": 0,
  "coverage": [[0.0, 4512.0], [12.345, 4499.835]],
  "edl": [{"cam": 0, "start": 0, "end": 13}, {"cam": 1, "start": 13, "end": 28}, ...]
}
```
`autoedit.py` writes this file; the `_about` and `_help` keys describe the format for human readers.
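Before rendering, it can help to sanity-check an EDL that was edited by hand. The sketch below is a hypothetical checker, not part of the skill: `validate_edl` is an invented name, and it assumes segments are contiguous from 0 to `duration_sec` and that every segment must sit inside its cam's `coverage` window.

```python
def validate_edl(doc):
    """Return a list of problems in an edl.json-shaped dict (empty list = looks sane)."""
    n_cams = len(doc["inputs"])
    if len(doc["deltas"]) != n_cams or len(doc["coverage"]) != n_cams:
        return ["inputs, deltas and coverage need one entry per cam"]

    problems = []
    if not 0 <= doc["audio_source"] < n_cams:
        problems.append("audio_source is not a valid cam index")

    prev_end = 0
    for i, seg in enumerate(doc["edl"]):
        cam, start, end = seg["cam"], seg["start"], seg["end"]
        if end <= start:
            problems.append(f"segment {i}: end must be after start")
        if start != prev_end:
            problems.append(f"segment {i}: gap or overlap at {start}")
        if 0 <= cam < n_cams:
            cov_start, cov_end = doc["coverage"][cam]
            if start < cov_start or end > cov_end:
                problems.append(f"segment {i}: cam {cam} outside its coverage window")
        else:
            problems.append(f"segment {i}: unknown cam {cam}")
        prev_end = end

    if prev_end != doc["duration_sec"]:
        problems.append("segments do not end at duration_sec")
    return problems
```

Load the file with `json.load` and pass the resulting dict in; the `_about` and `_help` keys are ignored.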
Render
| Script | What it does |
|---|---|
| `render_cuts.py` | Hard cuts only. |
| `render_pip.py` | Hard cuts + corner picture-in-picture overlay. Main cam = EDL row's `cam`. |
Both apply `-itsoffset deltas[k]` per input.
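To make the per-input offset concrete, here is a small sketch of how the input half of an ffmpeg command line could be assembled from an EDL. Only `-itsoffset` and `-i` are real ffmpeg options here; the helper name is hypothetical, and the filter graph and output options that the render scripts actually use are not shown because the source does not describe them.

```python
def ffmpeg_input_args(edl_doc):
    """Build '-itsoffset <delta> -i <path>' for each cam, in cam-index order."""
    args = []
    for path, delta in zip(edl_doc["inputs"], edl_doc["deltas"]):
        # -itsoffset is an input option, so it must come before the -i it applies to.
        args += ["-itsoffset", f"{delta:.3f}", "-i", path]
    return args
```

Passing the list to `subprocess.run(["ffmpeg", *args, ...])` avoids shell quoting problems with file names that contain spaces.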
Brainstorm before running
Three real knobs to confirm with the user:
- Pacing: `--mode rotation` (varied dwell, easier on the ear) vs `--mode greedy` (energy-following, snappier).
- PiP: yes / no. If yes, which corner + width?
- Min cut length: `--min-dwell` floor. 8 s default for rotation is conservative; talking-heads can go to 4.
`audio_source` is picked automatically; pass `--audio-source <cam-index>` to override it.
File layout
```
working_dir/
  cam_a.MOV              # ORIGINAL, untouched
  cam_a.MOV.sync.json    # from wjs-syncing-multicam
  cam_b.MOV              # ORIGINAL, untouched
  cam_b.MOV.sync.json
  edl.json               # from autoedit.py
  multicam_render.mp4    # from render_cuts.py OR render_pip.py
```
Common pitfalls
- Trusting `audio_source` without listening. Spread + coverage is a proxy. Always sample a 30 s clip before committing: a high-spread track can still be clipped / distorted.
- Running `autoedit.py` on the full 75 min before tuning. Run on a 2-min slice first (extract one per cam with `ffmpeg -ss A -t 120`), listen, adjust `--min-dwell` / `--mode`, then commit to full length.
- Expecting face-driven framing. This skill doesn't see the video, only the audio. If one cam is well-framed but quiet, the editor won't favor it. Use `--audio-source` + per-segment `pip` overrides as the manual escape hatch.
- Re-rendering when sync was wrong. The EDL bakes in `deltas[]` at autoedit time. If you fix the sidecars later, re-run `autoedit.py` to regenerate the EDL before re-rendering.
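The last pitfall can be caught mechanically by comparing the deltas baked into `edl.json` with what the sidecars say now. The sketch below is a hypothetical helper, not part of the skill: it assumes each sidecar sits next to the EDL as `<input>.sync.json`, that `delta_seconds` is a top-level key in it, and that a missing sidecar means delta 0 (as stated under REQUIRED INPUT).

```python
import json
from pathlib import Path

def stale_deltas(edl_path, tolerance=1e-3):
    """Return (cam, baked_delta, current_delta) for cams whose sidecar has changed."""
    edl_path = Path(edl_path)
    doc = json.loads(edl_path.read_text())
    stale = []
    for cam, (media, baked) in enumerate(zip(doc["inputs"], doc["deltas"])):
        sidecar = edl_path.parent / (media + ".sync.json")
        current = 0.0  # missing sidecar = cam assumed at delta 0
        if sidecar.exists():
            # Assumption: delta_seconds is a top-level key in the sidecar.
            current = float(json.loads(sidecar.read_text())["delta_seconds"])
        if abs(current - baked) > tolerance:
            stale.append((cam, baked, current))
    return stale
```

A non-empty result means the EDL predates the sync fix and `autoedit.py` should be re-run before rendering.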