transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

NPX Install

npx skill4agent add openai/skills transcribe

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI so runs are consistent and repeatable.

Workflow

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify OPENAI_API_KEY is set. If it is missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under output/transcribe/ when working in this repo.
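For step 2, a preflight check can confirm the key is present without ever printing its value. A minimal bash sketch; require_openai_key is a hypothetical helper, not part of the bundled skill.

```bash
# Hypothetical preflight helper (not part of the bundled skill).
# Succeeds if OPENAI_API_KEY is non-empty; never echoes the key itself.
require_openai_key() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    return 0
  fi
  # Tell the user how to set the key locally; never ask for its value.
  echo "OPENAI_API_KEY is not set. Create a key in the OpenAI platform UI," >&2
  echo "then export OPENAI_API_KEY in your own shell." >&2
  return 1
}
```

Call it as `require_openai_key || exit 1` before invoking the CLI; on failure it only prints instructions to stderr.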

Decision rules

  • Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
  • If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
  • If audio is longer than ~30 seconds, keep --chunking-strategy auto.
  • Prompting is not supported for gpt-4o-transcribe-diarize, so omit any prompt when diarizing.
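The decision rules can be condensed into a small flag selector. This is a sketch under assumptions: transcribe_flags is a hypothetical helper, the caller supplies the audio duration in whole seconds, and it emits only the flags named above (--model, --response-format, --chunking-strategy). It emits no prompt, since prompting is unsupported for the diarize model.

```bash
# Hypothetical helper: print CLI flags that follow the decision rules.
# $1: "text" (default) or "diarize"; $2: audio duration in whole seconds (optional).
transcribe_flags() {
  local mode="${1:-text}" duration="${2:-0}"
  local flags
  if [ "$mode" = "diarize" ]; then
    # Speaker labels need the diarization model and diarized_json output.
    flags="--model gpt-4o-transcribe-diarize --response-format diarized_json"
  else
    # Fast default: mini model with plain-text output.
    flags="--model gpt-4o-mini-transcribe --response-format text"
  fi
  # For audio longer than ~30 seconds, keep automatic chunking explicit.
  if [ "$duration" -gt 30 ]; then
    flags="$flags --chunking-strategy auto"
  fi
  printf '%s\n' "$flags"
}
```

Usage: `python3 "$TRANSCRIBE_CLI" meeting.m4a $(transcribe_flags diarize 600) --out-dir output/transcribe/meeting`. The substitution is left unquoted on purpose so the flags split into separate arguments; none of the values contain spaces.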

Output conventions

  • Use output/transcribe/<job-id>/ for evaluation runs.
  • Use --out-dir when transcribing multiple files to avoid overwriting outputs.
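Both output conventions can be followed at once by giving each input its own subdirectory under output/transcribe/<job-id>/. A sketch; out_dir_for and transcribe_batch are hypothetical helpers, the job id is any string the caller chooses, and transcribe_batch assumes TRANSCRIBE_CLI is exported as in the "Skill path" section.

```bash
# Hypothetical helper: output directory for one input inside a job,
# output/transcribe/<job-id>/<file stem>, so inputs never overwrite each other.
out_dir_for() {
  local job_id="$1" audio="$2" stem
  stem="$(basename "$audio")"
  stem="${stem%.*}"   # drop the extension: meeting.m4a -> meeting
  printf 'output/transcribe/%s/%s\n' "$job_id" "$stem"
}

# Hypothetical batch runner: one CLI call per file, each with its own --out-dir.
# Assumes TRANSCRIBE_CLI points at the bundled transcribe_diarize.py.
transcribe_batch() {
  local job_id="$1" f dir
  shift
  for f in "$@"; do
    dir="$(out_dir_for "$job_id" "$f")"
    mkdir -p "$dir"
    python3 "$TRANSCRIBE_CLI" "$f" --out-dir "$dir" || return 1
  done
}
```

Usage: `transcribe_batch "$(date +%Y%m%d-%H%M%S)" interviews/*.mp3`. Keying on the file stem keeps outputs apart, but two inputs with the same name in different folders (a/call.wav and b/call.wav) would still share a directory; add a distinguishing prefix if that can happen.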

Dependencies (install if missing)

Prefer uv for dependency management:

uv pip install openai

If uv is unavailable:

python3 -m pip install openai
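The uv-first rule can be scripted so the install only happens when needed. A sketch; openai_install_cmd and ensure_openai are hypothetical helpers, and the install is skipped whenever python3 can already import openai.

```bash
# Hypothetical helper: print the install command this machine should use,
# preferring uv and falling back to pip.
openai_install_cmd() {
  if command -v uv >/dev/null 2>&1; then
    echo "uv pip install openai"
  else
    echo "python3 -m pip install openai"
  fi
}

# Hypothetical helper: install openai only if python3 cannot already import it.
ensure_openai() {
  if python3 -c 'import openai' >/dev/null 2>&1; then
    return 0
  fi
  local cmd
  # Split the printed command into words, then run it.
  read -r -a cmd <<< "$(openai_install_cmd)"
  "${cmd[@]}"
}
```

Note that uv pip install targets the active virtual environment (or a .venv it finds in the project); if none exists, create and activate one with uv venv first so python3 sees the installed package.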

Environment

  • OPENAI_API_KEY must be set for live API calls.
  • If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
  • Never ask the user to paste the full key in chat.

Skill path (set once)

export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"

User-scoped skills install under $CODEX_HOME/skills (default: ~/.codex/skills).
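After setting the paths, a quick existence check catches a wrong CODEX_HOME before any API call is made. A sketch; check_transcribe_cli is a hypothetical helper that falls back to the same default path when TRANSCRIBE_CLI is unset.

```bash
# Hypothetical check: confirm the bundled CLI exists where the skill expects it.
check_transcribe_cli() {
  local cli="${TRANSCRIBE_CLI:-${CODEX_HOME:-$HOME/.codex}/skills/transcribe/scripts/transcribe_diarize.py}"
  if [ ! -f "$cli" ]; then
    echo "transcribe CLI not found at: $cli" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_transcribe_cli || exit 1` right after the two export lines.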

CLI quick start

Single file (fast text default):

python3 "$TRANSCRIBE_CLI" \
  path/to/audio.wav \
  --out transcript.txt

Diarization with known speakers (up to 4):

python3 "$TRANSCRIBE_CLI" \
  meeting.m4a \
  --model gpt-4o-transcribe-diarize \
  --known-speaker "Alice=refs/alice.wav" \
  --known-speaker "Bob=refs/bob.wav" \
  --response-format diarized_json \
  --out-dir output/transcribe/meeting

Plain text output (explicit):

python3 "$TRANSCRIBE_CLI" \
  interview.mp3 \
  --response-format text \
  --out interview.txt
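Before a diarization run, the --known-speaker values can be checked locally so a typo fails fast instead of after an upload. A sketch; check_known_speakers is a hypothetical helper that assumes the Name=path form and the limit of 4 speakers shown in the diarization example above.

```bash
# Hypothetical pre-check for --known-speaker values of the form "Name=path/to/ref.wav":
# at most 4 speakers, each with a non-empty name and an existing reference file.
check_known_speakers() {
  if [ "$#" -gt 4 ]; then
    echo "at most 4 known speakers are supported (got $#)" >&2
    return 1
  fi
  local spec name ref
  for spec in "$@"; do
    name="${spec%%=*}"   # text before the first "="
    ref="${spec#*=}"     # text after the first "="
    if [ -z "$name" ] || [ "$name" = "$spec" ]; then
      echo "expected Name=path, got: $spec" >&2
      return 1
    fi
    if [ ! -f "$ref" ]; then
      echo "reference audio not found for $name: $ref" >&2
      return 1
    fi
  done
  return 0
}
```

Usage: `check_known_speakers "Alice=refs/alice.wav" "Bob=refs/bob.wav" || exit 1`. It only checks format, count, and file existence; reference-clip requirements are covered in references/api.md.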

Reference map

  • references/api.md: supported formats, limits, response formats, and known-speaker notes.