vision-bench

Vision Bench — LLM Image Evaluation

Compare images by scoring them with one or more vision LLM judges against structured rubric criteria.

Quick Start

```bash
# Install dependencies
pip install pyyaml openai anthropic mistralai

# Score a single image
python bench.py image.png --criteria photorealism --judge gemini-2.5-flash

# Compare two AI-generated images
python bench.py img_a.png img_b.png \
  --criteria text_to_image \
  --prompt "a fox in a snowy forest" \
  --judge gpt-4o

# Multi-judge consensus
python bench.py img.png \
  --criteria portrait \
  --judges gpt-4o gemini-2.5-flash claude-opus-4-5-20251022

# OpenRouter models (any vision-capable model)
python bench.py img_a.png img_b.png \
  --criteria artistic_style \
  --judges "openrouter/meta-llama/llama-4-maverick" "openrouter/mistralai/pixtral-large-2411"

# List all presets
python bench.py --list-presets

# Save report to file
python bench.py img.png --criteria chart_analysis --save report.md
```
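
Multi-judge mode implies some way of combining per-judge scores. This README does not say how `bench.py` aggregates them, so the following is only a minimal sketch of one common approach: average each criterion across judges and report the spread as a rough disagreement signal. The function name `consensus`, the input shape (judge name mapped to per-criterion scores), and the criterion names in the example are assumptions for illustration, not the tool's actual data model.

```python
from statistics import mean, pstdev


def consensus(scores_by_judge):
    """Average each criterion across judges.

    scores_by_judge: hypothetical shape {judge_name: {criterion: score}}.
    Returns {criterion: {"mean": float, "spread": float, "n": int}}.
    """
    per_criterion = {}
    for judge_scores in scores_by_judge.values():
        for criterion, score in judge_scores.items():
            per_criterion.setdefault(criterion, []).append(float(score))

    return {
        criterion: {
            "mean": mean(values),
            # Population standard deviation; 0.0 when only one judge scored it.
            "spread": pstdev(values),
            "n": len(values),
        }
        for criterion, values in per_criterion.items()
    }


if __name__ == "__main__":
    # Made-up scores and criterion names, purely for illustration.
    example = {
        "gpt-4o": {"lighting": 8, "anatomy": 6},
        "gemini-2.5-flash": {"lighting": 6, "anatomy": 7},
    }
    for criterion, stats in consensus(example).items():
        print(criterion, stats)
```

A large spread on a criterion suggests the judges disagree and the averaged score deserves a closer look before it is used to rank images.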

Presets

| Preset | Use Case |
| --- | --- |
| `text_to_image` | Compare AI image generators (Midjourney, DALL-E, Flux) |
| `photorealism` | How convincingly an image looks like a photo |
| `artistic_style` | Style consistency, composition, color harmony |
| `portrait` | AI-generated portrait quality and realism |
| `product_photo` | E-commerce product image quality |
| `document_ocr` | Document text extraction and layout understanding |
| `chart_analysis` | Chart and data visualization comprehension |
| `invoice` | Financial document field extraction accuracy |
| `ui_screenshot` | App/web screenshot understanding |
| `scientific` | Scientific/medical image accuracy |
| `alt_text` | Accessibility image description quality |

Custom criteria: pass any `.yaml` file as `--criteria path/to/my.yaml`.

Judge Providers

| Prefix | Provider | Example |
| --- | --- | --- |
| `gpt-`, `o1`, `o3`, `o4` | OpenAI | `gpt-4o` |
| `claude-` | Anthropic | `claude-sonnet-4-5-20251022` |
| `gemini-` | Google Gemini | `gemini-2.5-flash` |
| `pixtral-`, `mistral-`, `ministral-` | Mistral | `pixtral-12b-2409` |
| `openrouter/` | OpenRouter (any model) | `openrouter/meta-llama/llama-4-maverick` |
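
The prefix table above amounts to a simple lookup on the start of the model name. As a rough sketch of how such routing could work (the real logic lives in `judge.py` and may differ), the snippet below maps a model string to a provider label. The names `PROVIDER_PREFIXES` and `resolve_provider`, and the lowercase provider labels, are hypothetical.

```python
# Prefixes copied from the table; the provider labels are illustrative.
# No prefix here is a prefix of another, so lookup order does not matter.
PROVIDER_PREFIXES = [
    ("openrouter/", "openrouter"),
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
    ("o4", "openai"),
    ("claude-", "anthropic"),
    ("gemini-", "gemini"),
    ("pixtral-", "mistral"),
    ("mistral-", "mistral"),
    ("ministral-", "mistral"),
]


def resolve_provider(model: str) -> str:
    """Return the provider label for a model name, based on its prefix."""
    for prefix, provider in PROVIDER_PREFIXES:
        if model.startswith(prefix):
            return provider
    raise ValueError(f"No provider matches model name: {model!r}")


if __name__ == "__main__":
    for name in ("gpt-4o", "gemini-2.5-flash", "openrouter/meta-llama/llama-4-maverick"):
        print(name, "->", resolve_provider(name))
```

Because the match is on the start of the string, an OpenRouter name such as `openrouter/mistralai/pixtral-large-2411` resolves to OpenRouter rather than to Mistral.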

API Keys

Keys are loaded from `secrets.enc.yaml` (SOPS + age encrypted), with fallback to environment variables.

Supported keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`

To encrypt your own keys:

```bash
sops --config .sops.yaml --encrypt --input-type yaml --output-type yaml secrets.yaml > secrets.enc.yaml
```
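
The load order described in this section (encrypted file first, environment variables as fallback) can be sketched as follows. This is not the code in `vault.py`; it is a minimal illustration that assumes the decrypted file is a flat mapping whose keys match the variable names listed above, and that the `sops` binary is on `PATH`. The function name `load_api_keys` is hypothetical. It asks `sops` for JSON output so that only the standard library is needed to parse it.

```python
import json
import os
import subprocess

SUPPORTED_KEYS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "OPENROUTER_API_KEY",
)


def load_api_keys(path="secrets.enc.yaml"):
    """Return {key_name: value}, preferring the SOPS file over os.environ."""
    decrypted = {}
    try:
        # Newer sops releases also offer a `sops decrypt` subcommand.
        result = subprocess.run(
            ["sops", "--decrypt", "--output-type", "json", path],
            capture_output=True,
            text=True,
            check=True,
        )
        parsed = json.loads(result.stdout)
        if isinstance(parsed, dict):
            decrypted = parsed
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        # sops not installed, file missing, or decryption failed:
        # fall through and rely on environment variables only.
        decrypted = {}

    keys = {}
    for name in SUPPORTED_KEYS:
        value = decrypted.get(name) or os.environ.get(name)
        if value:
            keys[name] = value
    return keys


if __name__ == "__main__":
    found = load_api_keys()
    print("Available keys:", sorted(found))
```

Keys that appear in neither source are simply absent from the result, so a caller can check for the one provider it needs instead of failing on every missing key.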

Output Formats

`--output markdown` (default) · `--output json` · `--output table`

Files

- `bench.py`: CLI entry point
- `judge.py`: Multi-provider LLM judge logic
- `report.py`: Report generation
- `vault.py`: SOPS secrets decryption
- `criteria/`: 11 YAML preset files
- `.sops.yaml`: Age key config for encryption
- `secrets.enc.yaml`: Encrypted API keys
