gemma-tuner-multimodal

Gemma Multimodal Fine-Tuner

Skill by ara.so — Daily 2026 Skills collection.
Fine-tune Gemma 4 and Gemma 3n models on text, images, and audio data entirely on Apple Silicon (MPS), with support for streaming large datasets from GCS/BigQuery without filling local storage.

What It Does

  • Text LoRA: instruction-tuning or completion fine-tuning from local CSV
  • Image + Text LoRA: captioning and VQA from local CSV
  • Audio + Text LoRA: the only Apple-Silicon-native path for this modality
  • Cloud streaming: train on terabytes from GCS/BigQuery without local copy
  • MPS-native: no NVIDIA GPU required — runs on MacBook Pro/Air/Mac Studio

Installation

Prerequisites

  • macOS 12.3+ with Apple Silicon (arm64)
  • Python 3.10+ (native arm64, not Rosetta)
  • Hugging Face account with Gemma access
```bash
# Install Python 3.12 if needed
brew install python@3.12

# Create venv
python3.12 -m venv .venv
source .venv/bin/activate

# Verify arm64 (must show arm64, not x86_64)
python -c "import platform; print(platform.machine())"

# Install PyTorch
pip install torch torchaudio

# Clone and install
git clone https://github.com/mattmireles/gemma-tuner-multimodal
cd gemma-tuner-multimodal
pip install -e .

# For Gemma 4 support (separate venv recommended)
pip install -r requirements/requirements-gemma4.txt
```

Authenticate with Hugging Face

```bash
huggingface-cli login
```

Or set an environment variable:

```bash
export HF_TOKEN=your_token_here
```

---

CLI Commands

```bash
# Check system is ready
gemma-macos-tuner system-check

# Guided setup wizard (recommended for first run)
gemma-macos-tuner wizard

# Prepare dataset
gemma-macos-tuner prepare <dataset-profile>

# Fine-tune a model
gemma-macos-tuner finetune <profile> --json-logging

# Evaluate a run
gemma-macos-tuner evaluate <profile-or-run>

# Export merged HF/SafeTensors (merges LoRA when adapter_config.json is present)
gemma-macos-tuner export <run-dir-or-profile>

# Blacklist bad samples from errors
gemma-macos-tuner blacklist <profile>

# List training runs
gemma-macos-tuner runs list
```

---

Configuration (`config/config.ini`)
The config is hierarchical INI: defaults → groups → models → datasets → profiles.

```ini
[defaults]
output_dir = output
batch_size = 2
gradient_accumulation_steps = 8
learning_rate = 2e-4
num_train_epochs = 3

[model:gemma-3n-e2b-it]
group = gemma
base_model = google/gemma-3n-E2B-it

[model:gemma-4-e2b-it]
group = gemma
base_model = google/gemma-4-E2B-it

[dataset:my-audio-dataset]
data_dir = data/datasets/my-audio-dataset
audio_column = audio_path
text_column = transcript

[profile:my-audio-profile]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
max_seq_length = 512
```

Use the `GEMMA_TUNER_CONFIG` environment variable to point to a config outside the repo root:

```bash
export GEMMA_TUNER_CONFIG=/path/to/my/config.ini
```
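The layering means a profile only needs to state what differs from the defaults. A minimal sketch of how such layered INI resolution can work with the standard library — illustrative only; the tool's actual merge logic may differ, and `resolve` is a hypothetical helper:

```python
import configparser

# Two layers of the hierarchy: [defaults] first, then a profile overlay.
cfg = configparser.ConfigParser()
cfg.read_string("""
[defaults]
batch_size = 2
learning_rate = 2e-4

[profile:my-audio-profile]
model = gemma-3n-e2b-it
lora_r = 16
""")

def resolve(cfg, profile):
    # Start from the defaults section, then let the profile override/extend.
    merged = dict(cfg["defaults"])
    merged.update(cfg[f"profile:{profile}"])
    return merged

settings = resolve(cfg, "my-audio-profile")
# settings now mixes inherited keys (batch_size) with profile keys (lora_r).
```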

Modality Configuration

Text-Only Fine-Tuning

Instruction tuning (user/assistant pairs):

```ini
[profile:text-instruction]
model = gemma-3n-e2b-it
dataset = my-text-dataset
modality = text
text_sub_mode = instruction
prompt_column = prompt
text_column = response
max_seq_length = 2048
lora_r = 16
lora_alpha = 32
```

Completion tuning (the full sequence is trained):

```ini
[profile:text-completion]
model = gemma-3n-e2b-it
dataset = my-text-dataset
modality = text
text_sub_mode = completion
text_column = text
max_seq_length = 2048
```

CSV format for instruction tuning (`data/datasets/my-text-dataset/train.csv`):

```csv
prompt,response
"What is photosynthesis?","Photosynthesis is the process by which plants..."
"Explain LoRA fine-tuning","LoRA (Low-Rank Adaptation) is a parameter-efficient..."
```

Image Fine-Tuning

```ini
[profile:image-caption]
model = gemma-3n-e2b-it
dataset = my-image-dataset
modality = image
image_sub_mode = captioning
image_token_budget = 256
prompt_column = prompt
text_column = caption
max_seq_length = 512
```

CSV format (`data/datasets/my-image-dataset/train.csv`):

```csv
image_path,prompt,caption
/data/images/img1.jpg,Describe this image,A dog sitting on a green lawn...
/data/images/img2.jpg,What is shown here,A bar chart showing quarterly revenue...
```

Audio Fine-Tuning

```ini
[profile:audio-asr]
model = gemma-3n-e2b-it
dataset = my-audio-dataset
modality = audio
audio_column = audio_path
text_column = transcript
max_seq_length = 512
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
```

CSV format (`data/datasets/my-audio-dataset/train.csv`):

```csv
audio_path,transcript
/data/audio/recording1.wav,The patient presents with acute respiratory symptoms
/data/audio/recording2.wav,Counsel objects to the characterization of the evidence
```
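Bad audio paths are a common cause of mid-run failures, so it can pay to pre-flight the CSV before training. A hypothetical checker — `check_audio_csv` is not part of the tool — that verifies each referenced WAV exists and is non-empty using only the standard library:

```python
import csv
import wave
from pathlib import Path

def check_audio_csv(csv_path):
    """Return (path, reason) pairs for rows whose audio file is unusable."""
    problems = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            p = Path(row["audio_path"])
            if not p.exists():
                problems.append((row["audio_path"], "missing file"))
            elif p.suffix.lower() == ".wav":
                # wave is stdlib; other formats would need an external decoder.
                with wave.open(str(p), "rb") as w:
                    if w.getnframes() == 0:
                        problems.append((row["audio_path"], "empty audio"))
    return problems
```

Run it on `train.csv` before `gemma-macos-tuner prepare` and fix (or blacklist) anything it reports.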

Supported Models

| Model Key | Hugging Face ID | Notes |
| --- | --- | --- |
| `gemma-3n-e2b-it` | `google/gemma-3n-E2B-it` | Default, ~2B instruct |
| `gemma-3n-e4b-it` | `google/gemma-3n-E4B-it` | ~4B instruct |
| `gemma-4-e2b-it` | `google/gemma-4-E2B-it` | Needs `requirements-gemma4.txt` |
| `gemma-4-e4b-it` | `google/gemma-4-E4B-it` | Needs `requirements-gemma4.txt` |
| `gemma-4-e2b` | `google/gemma-4-E2B` | Base, needs Gemma 4 stack |
| `gemma-4-e4b` | `google/gemma-4-E4B` | Base, needs Gemma 4 stack |

Add custom models with a `[model:your-name]` section using `group = gemma`.

Dataset Directory Layout

```
data/
└── datasets/
    └── <dataset-name>/
        ├── train.csv       # required
        ├── validation.csv  # optional
        └── test.csv        # optional
```

Output Layout

```
output/
└── {run-id}-{profile}/
    ├── metadata.json
    ├── metrics.json
    ├── checkpoint-*/
    └── adapter_model/      # LoRA artifacts
```
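Since each run directory carries its own `metadata.json` and `metrics.json`, you can sweep the output tree to compare runs. A hypothetical helper — `summarize_runs` is not part of the tool, and the JSON schemas are not documented here, so it just surfaces whatever each run recorded:

```python
import json
from pathlib import Path

def summarize_runs(output_dir="output"):
    # Collect metadata/metrics for every run directory that has both files.
    summaries = {}
    for run in sorted(Path(output_dir).glob("*")):
        meta, metrics = run / "metadata.json", run / "metrics.json"
        if meta.is_file() and metrics.is_file():
            summaries[run.name] = {
                "metadata": json.loads(meta.read_text()),
                "metrics": json.loads(metrics.read_text()),
            }
    return summaries
```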

Python API Examples

Running Fine-Tuning Programmatically

```python
from gemma_tuner.core.config import load_config
from gemma_tuner.core.ops import run_finetune

# Load config
config = load_config("config/config.ini")

# Run fine-tuning for a profile
run_finetune(profile="my-audio-profile", config=config, json_logging=True)
```

Using Device Utilities

```python
from gemma_tuner.utils.device import get_device, memory_hint

device = get_device()   # Returns "mps", "cuda", or "cpu"
print(f"Training on: {device}")

hint = memory_hint(model_key="gemma-3n-e2b-it")
print(hint)
```

Loading and Inspecting Datasets

```python
from gemma_tuner.utils.dataset_utils import load_csv_dataset

train_df, val_df = load_csv_dataset(
    data_dir="data/datasets/my-text-dataset",
    text_column="response",
    prompt_column="prompt"
)
print(f"Train samples: {len(train_df)}, Val samples: {len(val_df)}")
```

Custom LoRA Config

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3n-E2B-it",
    torch_dtype="auto",
    device_map="mps"
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Common Patterns

Full Workflow: Text Instruction Tuning

```bash
# 1. Prepare your data
mkdir -p data/datasets/my-dataset
cp train.csv data/datasets/my-dataset/
cp validation.csv data/datasets/my-dataset/

# 2. Add a profile to config/config.ini
cat >> config/config.ini << 'EOF'
[dataset:my-dataset]
data_dir = data/datasets/my-dataset

[profile:my-text-run]
model = gemma-3n-e2b-it
dataset = my-dataset
modality = text
text_sub_mode = instruction
prompt_column = prompt
text_column = response
max_seq_length = 2048
lora_r = 16
lora_alpha = 32
EOF

# 3. Prepare dataset
gemma-macos-tuner prepare my-dataset

# 4. Fine-tune
gemma-macos-tuner finetune my-text-run --json-logging

# 5. Export merged weights
gemma-macos-tuner export my-text-run
```

GCS Streaming for Large Datasets

```ini
[dataset:large-audio-gcs]
source = gcs
gcs_bucket = my-bucket
gcs_prefix = audio-training-data/
audio_column = audio_path
text_column = transcript

[profile:large-audio-run]
model = gemma-3n-e4b-it
dataset = large-audio-gcs
modality = audio
lora_r = 32
lora_alpha = 64
```

Set credentials:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
gemma-macos-tuner finetune large-audio-run
```
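The point of streaming is that rows are decoded one at a time from the remote byte stream, so memory stays flat regardless of dataset size. A toy illustration with an in-memory stream standing in for a GCS blob's file-like object — the tool's actual reader is not shown here:

```python
import csv
import io

def stream_rows(byte_stream):
    # Wrap the raw byte stream in a text decoder and yield rows lazily;
    # nothing is buffered beyond the current row.
    yield from csv.DictReader(io.TextIOWrapper(byte_stream, encoding="utf-8"))

# Simulated remote object; a real GCS client's blob stream would slot in here.
remote = io.BytesIO(b"audio_path,transcript\na.wav,hello\nb.wav,world\n")
transcripts = [row["transcript"] for row in stream_rows(remote)]
```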

Add a Custom Gemma Checkpoint

```ini
[model:my-custom-gemma]
group = gemma
base_model = my-org/my-gemma-checkpoint

[profile:custom-run]
model = my-custom-gemma
dataset = my-dataset
modality = text
text_sub_mode = instruction
```

Troubleshooting

Wrong architecture (x86_64 instead of arm64)

```bash
python -c "import platform; print(platform.machine())"
```

This must print `arm64`. If it prints `x86_64`, reinstall Python natively:

```bash
brew install python@3.12
python3.12 -m venv .venv && source .venv/bin/activate
```

MPS out of memory

  • Reduce `batch_size` (try 1)
  • Increase `gradient_accumulation_steps` to compensate
  • Use a smaller model (`e2b` instead of `e4b`)
  • Reduce `max_seq_length`
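The first two bullets trade off against each other: the optimizer effectively sees `batch_size × gradient_accumulation_steps` samples per update, so halving one while doubling the other keeps the effective batch constant while lowering peak memory. A quick sketch:

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    # The optimizer steps once per (batch_size * accumulation) samples.
    return batch_size * gradient_accumulation_steps

# Dropping batch_size from 2 to 1 while doubling accumulation from 8 to 16
# keeps the effective batch at 16 but roughly halves activation memory.
assert effective_batch_size(2, 8) == effective_batch_size(1, 16) == 16
```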

Gemma 4 model not loading

```bash
# Gemma 4 requires the updated Transformers stack
pip install -r requirements/requirements-gemma4.txt
```

Use a separate venv if you also need Gemma 3n.

Config not found outside repo root

```bash
export GEMMA_TUNER_CONFIG=/absolute/path/to/config/config.ini
gemma-macos-tuner finetune my-profile
```

Hugging Face auth errors

```bash
huggingface-cli login
```

Or:

```bash
export HF_TOKEN=your_hf_token
```

System check before debugging anything else

```bash
gemma-macos-tuner system-check
```

Audio tower loaded even for text-only runs

This is a known v1 issue: USM audio tower weights stay in memory even for `modality = text`. See `README/KNOWN_ISSUES.md`. Workaround: use a smaller model variant to stay within your RAM budget.

Architecture Reference

| File | Role |
| --- | --- |
| `gemma_tuner/cli_typer.py` | Main CLI entrypoint (`gemma-macos-tuner`) |
| `gemma_tuner/core/ops.py` | Dispatches prepare/finetune/evaluate/export |
| `gemma_tuner/scripts/finetune.py` | Router: Gemma models → `models/gemma/finetune.py` |
| `gemma_tuner/models/gemma/finetune.py` | Core training loop with LoRA |
| `gemma_tuner/scripts/export.py` | Merges LoRA → HF/SafeTensors tree |
| `gemma_tuner/utils/device.py` | MPS/CUDA/CPU selection and memory hints |
| `gemma_tuner/utils/dataset_utils.py` | CSV loading, blacklist/protection semantics |
| `gemma_tuner/wizard/` | Interactive CLI wizard (questionary + Rich) |
| `config/config.ini` | Hierarchical INI configuration |