Loading...
Loading...
Score and compare images using vision LLMs as judges. YAML-defined criteria presets for 11 use cases (text-to-image, photorealism, document OCR, charts, UI, portrait, product, scientific, invoice, alt-text, artistic style). Supports OpenAI, Anthropic, Gemini, Mistral, and OpenRouter as judge providers. Keys auto-decrypted via SOPS + age.
npx skill4agent add glebis/claude-skills vision-bench# Install dependencies
pip install pyyaml openai anthropic mistralai
# Score a single image
python bench.py image.png --criteria photorealism --judge gemini-2.5-flash
# Compare two AI-generated images
python bench.py img_a.png img_b.png \
--criteria text_to_image \
--prompt "a fox in a snowy forest" \
--judge gpt-4o
# Multi-judge consensus
python bench.py img.png \
--criteria portrait \
--judges gpt-4o gemini-2.5-flash claude-opus-4-5-20251022
# OpenRouter models (any vision-capable model)
python bench.py img_a.png img_b.png \
--criteria artistic_style \
--judges "openrouter/meta-llama/llama-4-maverick" "openrouter/mistralai/pixtral-large-2411"
# List all presets
python bench.py --list-presets
# Save report to file
python bench.py img.png --criteria chart_analysis --save report.md| Preset | Use Case |
|---|---|
| Compare AI image generators (Midjourney, DALL-E, Flux) |
| How convincingly an image looks like a photo |
| Style consistency, composition, color harmony |
| AI-generated portrait quality and realism |
| E-commerce product image quality |
| Document text extraction and layout understanding |
| Chart and data visualization comprehension |
| Financial document field extraction accuracy |
| App/web screenshot understanding |
| Scientific/medical image accuracy |
| Accessibility image description quality |
.yaml--criteria path/to/my.yaml| Prefix | Provider | Example |
|---|---|---|
| OpenAI | |
| Anthropic | |
| Google Gemini | |
| Mistral | |
| OpenRouter (any model) | |
secrets.enc.yamlOPENAI_API_KEYANTHROPIC_API_KEYGEMINI_API_KEYOPENROUTER_API_KEYsops --config .sops.yaml --encrypt --input-type yaml --output-type yaml secrets.yaml > secrets.enc.yaml--output markdown--output json--output tablebench.pyjudge.pyreport.pyvault.pycriteria/.sops.yamlsecrets.enc.yaml