gemma-tuner-multimodal

Gemma Multimodal Fine-Tuner

Skill by ara.so — Daily 2026 Skills collection.
Fine-tune Gemma 4 and Gemma 3n models on text, images, and audio data entirely on Apple Silicon (MPS), with support for streaming large datasets from GCS/BigQuery without filling local storage.

What It Does

  • Text LoRA: instruction-tuning or completion fine-tuning from local CSV
  • Image + Text LoRA: captioning and VQA from local CSV
  • Audio + Text LoRA: the only Apple-Silicon-native path for this modality
  • Cloud streaming: train on terabytes from GCS/BigQuery without local copy
  • MPS-native: no NVIDIA GPU required — runs on MacBook Pro/Air/Mac Studio

Installation

Prerequisites

  • macOS 12.3+ with Apple Silicon (arm64)
  • Python 3.10+ (native arm64, not Rosetta)
  • Hugging Face account with Gemma access
```bash
# Install Python 3.12 if needed
brew install python@3.12

# Create venv
python3.12 -m venv .venv
source .venv/bin/activate

# Verify arm64 (must show arm64, not x86_64)
python -c "import platform; print(platform.machine())"

# Install PyTorch
pip install torch torchaudio

# Clone and install
git clone https://github.com/mattmireles/gemma-tuner-multimodal
cd gemma-tuner-multimodal
pip install -e .

# For Gemma 4 support (separate venv recommended)
pip install -r requirements/requirements-gemma4.txt
```

Authenticate with Hugging Face

```bash
huggingface-cli login
```

Or set an environment variable:

```bash
export HF_TOKEN=your_token_here
```

---

CLI Commands

```bash
# Check system is ready
gemma-macos-tuner system-check

# Guided setup wizard (recommended for first run)
gemma-macos-tuner wizard

# Prepare dataset
gemma-macos-tuner prepare <dataset-profile>

# Fine-tune a model
gemma-macos-tuner finetune <profile> --json-logging

# Evaluate a run
gemma-macos-tuner evaluate <profile-or-run>

# Export merged HF/SafeTensors (merges LoRA when adapter_config.json is present)
gemma-macos-tuner export <run-dir-or-profile>

# Blacklist bad samples from errors
gemma-macos-tuner blacklist <profile>

# List training runs
gemma-macos-tuner runs list
```

---

Configuration (`config/config.ini`)
The config is hierarchical INI: defaults → groups → models → datasets → profiles.

```ini
[defaults]
output_dir = output
batch_size = 2
gradient_accumulation_steps = 8
learning_rate = 2e-4
num_train_epochs = 3

[model:gemma-3n-e2b-it]
group = gemma
base_model = google/gemma-3n-E2B-it

[model:gemma-4-e2b-it]
group = gemma
base_model = google/gemma-4-E2B-it

[dataset:my-audio-dataset]
data_dir = data/datasets/my-audio-dataset
audio_column = audio_path
text_column = transcript

[profile:my-audio-profile]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
max_seq_length = 512
```

Use the `GEMMA_TUNER_CONFIG` environment variable to point to a config outside the repo root:

```bash
export GEMMA_TUNER_CONFIG=/path/to/my/config.ini
```
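The layering means a profile only needs to state what differs from the defaults. A minimal sketch of how such layered INI resolution can work with the standard library — illustrative only; the tool's actual merge logic may differ, and `resolve` is a hypothetical helper:

```python
import configparser

# Two layers of the hierarchy: [defaults] first, then a profile overlay.
cfg = configparser.ConfigParser()
cfg.read_string("""
[defaults]
batch_size = 2
learning_rate = 2e-4

[profile:my-audio-profile]
model = gemma-3n-e2b-it
lora_r = 16
""")

def resolve(cfg, profile):
    # Start from the defaults section, then let the profile override/extend.
    merged = dict(cfg["defaults"])
    merged.update(cfg[f"profile:{profile}"])
    return merged

settings = resolve(cfg, "my-audio-profile")
# settings now mixes inherited keys (batch_size) with profile keys (lora_r).
```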

Modality Configuration

Text-Only Fine-Tuning

Instruction tuning (user/assistant pairs):

```ini
[profile:text-instruction]
model = gemma-3n-e2b-it
dataset = my-text-dataset
modality = text
text_sub_mode = instruction
prompt_column = prompt
text_column = response
max_seq_length = 2048
lora_r = 16
lora_alpha = 32
```

Completion tuning (the full sequence is trained):

```ini
[profile:text-completion]
model = gemma-3n-e2b-it
dataset = my-text-dataset
modality = text
text_sub_mode = completion
text_column = text
max_seq_length = 2048
```

CSV format for instruction tuning (`data/datasets/my-text-dataset/train.csv`):

```csv
prompt,response
"What is photosynthesis?","Photosynthesis is the process by which plants..."
"Explain LoRA fine-tuning","LoRA (Low-Rank Adaptation) is a parameter-efficient..."
```

Image Fine-Tuning

```ini
[profile:image-caption]
model = gemma-3n-e2b-it
dataset = my-image-dataset
modality = image
image_sub_mode = captioning
image_token_budget = 256
prompt_column = prompt
text_column = caption
max_seq_length = 512
```

CSV format (`data/datasets/my-image-dataset/train.csv`):

```csv
image_path,prompt,caption
/data/images/img1.jpg,Describe this image,A dog sitting on a green lawn...
/data/images/img2.jpg,What is shown here,A bar chart showing quarterly revenue...
```

Audio Fine-Tuning

```ini
[profile:audio-asr]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
audio_column = audio_path
text_column = transcript
max_seq_length = 512
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
```

CSV format (`data/datasets/my-audio-dataset/train.csv`):

```csv
audio_path,transcript
/data/audio/recording1.wav,The patient presents with acute respiratory symptoms
/data/audio/recording2.wav,Counsel objects to the characterization of the evidence
```
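Bad audio paths are a common cause of mid-run failures, so it can pay to pre-flight the CSV before training. A hypothetical checker — `check_audio_csv` is not part of the tool — that verifies each referenced WAV exists and is non-empty using only the standard library:

```python
import csv
import wave
from pathlib import Path

def check_audio_csv(csv_path):
    """Return (path, reason) pairs for rows whose audio file is unusable."""
    problems = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            p = Path(row["audio_path"])
            if not p.exists():
                problems.append((row["audio_path"], "missing file"))
            elif p.suffix.lower() == ".wav":
                # wave is stdlib; other formats would need an external decoder.
                with wave.open(str(p), "rb") as w:
                    if w.getnframes() == 0:
                        problems.append((row["audio_path"], "empty audio"))
    return problems
```

Run it on `train.csv` before `gemma-macos-tuner prepare` and fix (or blacklist) anything it reports.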

Supported Models

| Model Key | Hugging Face ID | Notes |
| --- | --- | --- |
| `gemma-3n-e2b-it` | `google/gemma-3n-E2B-it` | Default, ~2B instruct |
| `gemma-3n-e4b-it` | `google/gemma-3n-E4B-it` | ~4B instruct |
| `gemma-4-e2b-it` | `google/gemma-4-E2B-it` | Needs `requirements-gemma4.txt` |
| `gemma-4-e4b-it` | `google/gemma-4-E4B-it` | Needs `requirements-gemma4.txt` |
| `gemma-4-e2b` | `google/gemma-4-E2B` | Base, needs Gemma 4 stack |
| `gemma-4-e4b` | `google/gemma-4-E4B` | Base, needs Gemma 4 stack |

Add custom models with a `[model:your-name]` section using `group = gemma`.

Dataset Directory Layout

```
data/
└── datasets/
    └── <dataset-name>/
        ├── train.csv       # required
        ├── validation.csv  # optional
        └── test.csv        # optional
```

Output Layout

```
output/
└── {run-id}-{profile}/
    ├── metadata.json
    ├── metrics.json
    ├── checkpoint-*/
    └── adapter_model/      # LoRA artifacts
```
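Since each run directory carries its own `metadata.json` and `metrics.json`, you can sweep the output tree to compare runs. A hypothetical helper — `summarize_runs` is not part of the tool, and the JSON schemas are not documented here, so it just surfaces whatever each run recorded:

```python
import json
from pathlib import Path

def summarize_runs(output_dir="output"):
    # Collect metadata/metrics for every run directory that has both files.
    summaries = {}
    for run in sorted(Path(output_dir).glob("*")):
        meta, metrics = run / "metadata.json", run / "metrics.json"
        if meta.is_file() and metrics.is_file():
            summaries[run.name] = {
                "metadata": json.loads(meta.read_text()),
                "metrics": json.loads(metrics.read_text()),
            }
    return summaries
```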

Python API Examples

Running Fine-Tuning Programmatically

```python
from gemma_tuner.core.config import load_config
from gemma_tuner.core.ops import run_finetune

# Load config
config = load_config("config/config.ini")

# Run fine-tuning for a profile
run_finetune(profile="my-audio-profile", config=config, json_logging=True)
```

Using Device Utilities

```python
from gemma_tuner.utils.device import get_device, memory_hint

device = get_device()   # Returns "mps", "cuda", or "cpu"
print(f"Training on: {device}")

hint = memory_hint(model_key="gemma-3n-e2b-it")
print(hint)
```

Loading and Inspecting Datasets

```python
from gemma_tuner.utils.dataset_utils import load_csv_dataset

train_df, val_df = load_csv_dataset(
    data_dir="data/datasets/my-text-dataset",
    text_column="response",
    prompt_column="prompt"
)
print(f"Train samples: {len(train_df)}, Val samples: {len(val_df)}")
```

Custom LoRA Config

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3n-E2B-it",
    torch_dtype="auto",
    device_map="mps"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Common Patterns

Full Workflow: Text Instruction Tuning

```bash
# 1. Prepare your data
mkdir -p data/datasets/my-dataset
cp train.csv data/datasets/my-dataset/
cp validation.csv data/datasets/my-dataset/

# 2. Add a profile to config/config.ini
cat >> config/config.ini << 'EOF'
[dataset:my-dataset]
data_dir = data/datasets/my-dataset

[profile:my-text-run]
model = gemma-3n-e2b-it
dataset = my-dataset
modality = text
text_sub_mode = instruction
prompt_column = prompt
text_column = response
max_seq_length = 2048
lora_r = 16
lora_alpha = 32
EOF

# 3. Prepare dataset
gemma-macos-tuner prepare my-dataset

# 4. Fine-tune
gemma-macos-tuner finetune my-text-run --json-logging

# 5. Export merged weights
gemma-macos-tuner export my-text-run
```

GCS Streaming for Large Datasets

```ini
[dataset:large-audio-gcs]
source = gcs
gcs_bucket = my-bucket
gcs_prefix = audio-training-data/
audio_column = audio_path
text_column = transcript

[profile:large-audio-run]
model = gemma-3n-e4b-it
dataset = large-audio-gcs
modality = audio
lora_r = 32
lora_alpha = 64
```

Set credentials:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
gemma-macos-tuner finetune large-audio-run
```
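The point of streaming is that rows are decoded one at a time from the remote byte stream, so memory stays flat regardless of dataset size. A toy illustration with an in-memory stream standing in for a GCS blob's file-like object — the tool's actual reader is not shown here:

```python
import csv
import io

def stream_rows(byte_stream):
    # Wrap the raw byte stream in a text decoder and yield rows lazily;
    # nothing is buffered beyond the current row.
    yield from csv.DictReader(io.TextIOWrapper(byte_stream, encoding="utf-8"))

# Simulated remote object; a real GCS client's blob stream would slot in here.
remote = io.BytesIO(b"audio_path,transcript\na.wav,hello\nb.wav,world\n")
transcripts = [row["transcript"] for row in stream_rows(remote)]
```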

Add a Custom Gemma Checkpoint

```ini
[model:my-custom-gemma]
group = gemma
base_model = my-org/my-gemma-checkpoint

[profile:custom-run]
model = my-custom-gemma
dataset = my-dataset
modality = text
text_sub_mode = instruction
```

Troubleshooting

Wrong architecture (x86_64 instead of arm64)

```bash
python -c "import platform; print(platform.machine())"
```

This must print `arm64`. If it prints `x86_64`, reinstall Python natively:

```bash
brew install python@3.12
python3.12 -m venv .venv && source .venv/bin/activate
```

MPS out of memory

  • Reduce `batch_size` (try 1)
  • Increase `gradient_accumulation_steps` to compensate
  • Use a smaller model (`e2b` instead of `e4b`)
  • Reduce `max_seq_length`
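The first two bullets trade off against each other: the optimizer effectively sees `batch_size × gradient_accumulation_steps` samples per update, so halving one while doubling the other keeps the effective batch constant while lowering peak memory. A quick sketch:

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    # The optimizer steps once per (batch_size * accumulation) samples.
    return batch_size * gradient_accumulation_steps

# Dropping batch_size from 2 to 1 while doubling accumulation from 8 to 16
# keeps the effective batch at 16 but roughly halves activation memory.
assert effective_batch_size(2, 8) == effective_batch_size(1, 16) == 16
```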

Gemma 4 model not loading

```bash
# Gemma 4 requires the updated Transformers stack
pip install -r requirements/requirements-gemma4.txt
```

Use a separate venv if you also need Gemma 3n.

Config not found outside repo root

```bash
export GEMMA_TUNER_CONFIG=/absolute/path/to/config/config.ini
gemma-macos-tuner finetune my-profile
```

Hugging Face auth errors

```bash
huggingface-cli login
```

Or:

```bash
export HF_TOKEN=your_hf_token
```

System check before debugging anything else

```bash
gemma-macos-tuner system-check
```

Audio tower loaded even for text-only runs

This is a known v1 issue: USM audio tower weights stay in memory even for `modality = text`. See `README/KNOWN_ISSUES.md`. Workaround: use a smaller model variant to stay within your RAM budget.

Architecture Reference

| File | Role |
| --- | --- |
| `gemma_tuner/cli_typer.py` | Main CLI entrypoint (`gemma-macos-tuner`) |
| `gemma_tuner/core/ops.py` | Dispatches prepare/finetune/evaluate/export |
| `gemma_tuner/scripts/finetune.py` | Router: Gemma models → `models/gemma/finetune.py` |
| `gemma_tuner/models/gemma/finetune.py` | Core training loop with LoRA |
| `gemma_tuner/scripts/export.py` | Merges LoRA → HF/SafeTensors tree |
| `gemma_tuner/utils/device.py` | MPS/CUDA/CPU selection and memory hints |
| `gemma_tuner/utils/dataset_utils.py` | CSV loading, blacklist/protection semantics |
| `gemma_tuner/wizard/` | Interactive CLI wizard (questionary + Rich) |
| `config/config.ini` | Hierarchical INI configuration |