Nano Banana Builder
Operator Context
This skill acts as an operator for building production web applications powered by Google's Nano Banana image generation APIs. It implements the Phased Build architectural pattern -- Scaffold, Integrate, Polish, Verify -- with Domain Intelligence embedded in model selection, conversational editing, and production hardening.
Hardcoded Behaviors (Always Apply)
- CLAUDE.md Compliance: Read and follow repository CLAUDE.md before building
- Exact Model Names Only: Use gemini-2.5-flash-image or gemini-3-pro-image-preview exclusively. Never invent model strings, add date suffixes, or guess names.
- Server-Side API Calls: All Gemini API calls go through server actions or API routes. Never expose API keys client-side.
- Storage Over Base64: Store generated images in object storage (Vercel Blob, S3/R2) and persist URLs, not raw base64 in databases.
- Rate Limit Handling: Every production integration must include rate limiting with user-friendly feedback.
- Conversational Editing: Leverage multi-turn history for iterative refinement rather than one-shot generation.
Default Behaviors (ON unless disabled)
- Model Selection by Use Case: Flash for speed/volume, Pro for quality/text rendering
- Loading State UX: Show progress indicators during 5-30s generation time
- Error Boundaries: Wrap generation components in error boundaries with retry
- Debounced Input: Require explicit user action to generate, never on keystroke
- Unique Design Per App: Vary UI style, color, layout, and interaction to fit purpose
- Environment Variable Validation: Check for required keys at startup
Optional Behaviors (OFF unless enabled)
- Batch Generation: Generate multiple images in parallel with queue management
- Image Composition: Combine multiple generated images into composites
- Style Transfer: Apply reference styles across generations
- Gallery Persistence: Save generation history with browsing and search
What This Skill CAN Do
- Build complete image generation web applications with Next.js
- Implement server actions and API routes for both Gemini image models
- Handle iterative, multi-turn image editing conversations via useChat
- Configure object storage (Vercel Blob, S3/R2) for generated images
- Implement rate limiting and quota management with Upstash Redis
- Select the correct model based on speed, quality, and cost tradeoffs
What This Skill CANNOT Do
- Use non-Gemini image models (DALL-E, Midjourney, Stable Diffusion)
- Deploy to non-Node.js environments (Python, Go, etc.)
- Implement custom model fine-tuning or training
- Handle image input/classification (that is Gemini Vision, not Nano Banana)
- Skip any of the 4 build phases
Instructions
CRITICAL: Valid Model Names
Only two model strings exist for image generation. Use them exactly as written.
| Model String (exact) | Alias | Best For |
|---|---|---|
| gemini-2.5-flash-image | Nano Banana | Fast iterations, drafts, high volume (2-5s) |
| gemini-3-pro-image-preview | Nano Banana Pro | Quality output, text rendering, 2K resolution |
Common wrong names: gemini-2.5-flash-preview-05-20 (text model suffix), 2.5 Pro model strings (Pro does not generate images), invented names that do not exist, and Gemini Vision model names (image input, not generation).
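One way to enforce the exact-name rule in code is an allowlist with a type guard, so any other string is rejected before it reaches the API. This is a minimal sketch: `isImageModel` and `selectImageModel` are hypothetical helper names, and the selection rule simply encodes "Flash for speed/volume, Pro for quality/text rendering" from the default behaviors.

```typescript
// The only two model strings this skill allows.
const IMAGE_MODELS = ['gemini-2.5-flash-image', 'gemini-3-pro-image-preview'] as const;
type ImageModel = (typeof IMAGE_MODELS)[number];

// Type guard: true only for an exact, case-sensitive match.
function isImageModel(name: string): name is ImageModel {
  return IMAGE_MODELS.some((model) => model === name);
}

// Hypothetical selector: Pro when quality or text rendering matters, Flash otherwise.
function selectImageModel(opts: { needsTextRendering?: boolean; preferQuality?: boolean }): ImageModel {
  return opts.needsTextRendering || opts.preferQuality
    ? 'gemini-3-pro-image-preview'
    : 'gemini-2.5-flash-image';
}
```

Calling `isImageModel` on any user-supplied or configured model name before a request keeps date-suffixed or guessed strings out of production calls.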
Phase 1: SCAFFOLD
Goal: Set up the Next.js project structure with dependencies and environment.
Step 1: Initialize project and install dependencies
```bash
npm install @ai-sdk/google ai @ai-sdk/react

# Storage (pick one):
npm install @vercel/blob   # Vercel Blob
# or configure S3/R2 via aws-sdk

# Rate limiting (optional):
npm install @upstash/ratelimit @upstash/redis
```
Step 2: Configure environment variables
```bash
# .env.local
GEMINI_API_KEY=your_api_key_here
BLOB_READ_WRITE_TOKEN=your_vercel_token   # if using Vercel Blob
```
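The "Environment Variable Validation" default can be sketched as a small startup check. The helper names `missingEnvVars` and `assertEnv` are hypothetical; the variable names come from the `.env.local` example. The environment is passed in as a plain record so the check stays easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function missingEnvVars(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Throw one error listing every missing variable, so misconfiguration fails fast.
function assertEnv(env: Env, required: readonly string[]): void {
  const missing = missingEnvVars(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Server-side usage at startup (assumption: Vercel Blob is the chosen store):
// assertEnv(process.env, ['GEMINI_API_KEY', 'BLOB_READ_WRITE_TOKEN']);
```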
Step 3: Define the application structure
```markdown
## App Structure
- app/actions/generate.ts -- Server action for image generation
- app/api/generate/route.ts -- API route (if using useChat)
- app/components/ -- React client components
- lib/storage.ts -- Storage abstraction
- lib/rate-limit.ts -- Rate limiting (if production)
```
Gate: Project initializes, dependencies install, env vars configured. Proceed only when gate passes.
Phase 2: INTEGRATE
Goal: Wire up Gemini image generation with server-side API calls and client components.
Step 1: Create server action or API route
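```typescript
// app/actions/generate.ts
'use server'

import { createGoogleGenerativeAI } from '@ai-sdk/google'
import { generateText } from 'ai'

// The default `google` export reads GOOGLE_GENERATIVE_AI_API_KEY, so pass
// GEMINI_API_KEY (the name used in .env.local) explicitly.
const google = createGoogleGenerativeAI({ apiKey: process.env.GEMINI_API_KEY })

export async function generateImage(prompt: string) {
  const result = await generateText({
    model: google('gemini-2.5-flash-image'),
    prompt,
    providerOptions: {
      google: {
        responseModalities: ['IMAGE'],
        imageConfig: { aspectRatio: '16:9' }
      }
    }
  })

  // The response can contain no image (for example, when a prompt is blocked).
  const image = result.files[0]
  if (!image) throw new Error('No image was returned for this prompt')
  return image // { base64, uint8Array, mediaType }
}
```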
Step 2: Build client component with loading states
Use useChat for multi-turn editing or direct server action calls for single-shot generation. Always include loading indicators and error display.
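Loading and error display are easier to keep consistent when the component's state is modelled explicitly. The reducer below is a hypothetical sketch (the state and event names are assumptions, not part of any library); it could be passed to React's `useReducer`, and it ignores repeat submits while a request is in flight.

```typescript
type GenerationState =
  | { status: 'idle' }
  | { status: 'loading'; prompt: string }
  | { status: 'success'; imageUrl: string }
  | { status: 'error'; message: string };

type GenerationEvent =
  | { type: 'submit'; prompt: string }
  | { type: 'resolved'; imageUrl: string }
  | { type: 'failed'; message: string }
  | { type: 'reset' };

function generationReducer(state: GenerationState, event: GenerationEvent): GenerationState {
  switch (event.type) {
    case 'submit':
      // Ignore repeat submits while loading, so spam-clicks do not waste quota.
      if (state.status === 'loading') return state;
      return { status: 'loading', prompt: event.prompt };
    case 'resolved':
      // Only a pending request can resolve; late results are ignored.
      return state.status === 'loading' ? { status: 'success', imageUrl: event.imageUrl } : state;
    case 'failed':
      return state.status === 'loading' ? { status: 'error', message: event.message } : state;
    case 'reset':
      return { status: 'idle' };
  }
}
```

The UI can then render a skeleton or progress indicator whenever `status` is `'loading'` and a retry button whenever it is `'error'`.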
Step 3: Connect storage
Upload generated images to object storage immediately. Return persistent URLs, not base64.
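A storage layer for `lib/storage.ts` might look like the sketch below. `ImageStore`, `objectKey`, and `persistImage` are hypothetical names, and the key layout is an assumption. A real adapter would wrap Vercel Blob or an S3/R2 client behind the same interface. The file shape mirrors the `{ uint8Array, mediaType }` fields returned by the server action.

```typescript
// File shape returned by the server action: raw bytes plus a media type.
interface GeneratedFile {
  uint8Array: Uint8Array;
  mediaType: string;
}

// Hypothetical storage abstraction (not a library API). An adapter for
// Vercel Blob or S3/R2 would implement upload and return the object's URL.
interface ImageStore {
  upload(key: string, bytes: Uint8Array, mediaType: string): Promise<string>;
}

const EXTENSIONS = new Map<string, string>([
  ['image/png', 'png'],
  ['image/jpeg', 'jpg'],
  ['image/webp', 'webp'],
]);

// Map a media type to a file extension; unknown types fall back to "bin".
function extensionFor(mediaType: string): string {
  return EXTENSIONS.get(mediaType) ?? 'bin';
}

// Build a per-user object key such as "generations/u1/abc.png".
function objectKey(userId: string, imageId: string, mediaType: string): string {
  return `generations/${userId}/${imageId}.${extensionFor(mediaType)}`;
}

// Upload right after generation and hand back only the persistent URL.
async function persistImage(
  store: ImageStore,
  file: GeneratedFile,
  userId: string,
  imageId: string,
): Promise<string> {
  const key = objectKey(userId, imageId, file.mediaType);
  return store.upload(key, file.uint8Array, file.mediaType);
}
```

Keeping the interface this narrow means the database only ever sees the returned URL string, never the image bytes.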
Gate: User can enter a prompt and receive a generated image displayed in the UI. Proceed only when gate passes.
Phase 3: POLISH
Goal: Harden for production with rate limiting, error handling, and design variation.
Step 1: Add rate limiting
Implement per-user rate limits using Upstash Redis or in-memory fallback. Show friendly wait messages on 429 responses.
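The in-memory fallback could be a sliding-window counter like the sketch below. The class name, the limit, and the window size are assumptions for illustration. The state lives in one process, so it does not replace Upstash Redis on multi-instance or serverless deployments. `retryAfterMs` is what a friendly wait message would be built from.

```typescript
interface RateLimitResult {
  allowed: boolean;
  retryAfterMs: number;
}

// Hypothetical per-user sliding-window limiter (single process only).
class InMemoryRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(userId: string, now: number = Date.now()): RateLimitResult {
    const cutoff = now - this.windowMs;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.hits.get(userId) ?? []).filter((t) => t > cutoff);

    if (recent.length >= this.limit) {
      this.hits.set(userId, recent);
      // The oldest hit leaves the window at oldest + windowMs.
      const oldest = recent[0] ?? now;
      return { allowed: false, retryAfterMs: oldest + this.windowMs - now };
    }

    recent.push(now);
    this.hits.set(userId, recent);
    return { allowed: true, retryAfterMs: 0 };
  }
}
```

A route handler would call `check(userId)` before invoking generation and return a 429 with the wait time when `allowed` is false.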
Step 2: Implement error boundaries and retry
Wrap generation in try/catch with specific handling for 429 (rate limit), 401 (bad key), 400 (content policy), and network timeout. Provide user-visible feedback for each case.
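The per-status handling described above can be centralised in one mapping function that the catch block calls. This is a sketch only: `classifyGenerationError` and the message wording are hypothetical, and how the HTTP status is extracted from a thrown error depends on the SDK version in use.

```typescript
type ErrorKind = 'rate_limit' | 'auth' | 'content_policy' | 'timeout' | 'unknown';

interface UserFacingError {
  kind: ErrorKind;
  message: string;
  retryable: boolean;
}

// Map an HTTP status (or a timeout flag) to user-visible feedback.
function classifyGenerationError(status: number | undefined, timedOut: boolean = false): UserFacingError {
  if (timedOut) {
    return { kind: 'timeout', message: 'Generation took too long. Please try again.', retryable: true };
  }
  switch (status) {
    case 429:
      return { kind: 'rate_limit', message: 'Too many requests. Please wait a moment and retry.', retryable: true };
    case 401:
      return { kind: 'auth', message: 'The service is misconfigured. Please contact support.', retryable: false };
    case 400:
      // Content policy failures must not be retried with the same prompt.
      return { kind: 'content_policy', message: 'This prompt could not be processed. Try rephrasing it.', retryable: false };
    default:
      return { kind: 'unknown', message: 'Something went wrong while generating the image.', retryable: false };
  }
}
```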
Step 3: Apply unique design
Match UI style to the application purpose. Avoid generic AI startup aesthetics. Vary color scheme, layout, typography, and interaction pattern intentionally.
Gate: App handles all error cases gracefully and has intentional visual design. Proceed only when gate passes.
Phase 4: VERIFY
Goal: Confirm the application works end-to-end and meets production standards.
Step 1: Generate an image successfully with a test prompt
Step 2: Verify model name strings are exactly gemini-2.5-flash-image or gemini-3-pro-image-preview
Step 3: Confirm no API keys are exposed in client-side code or bundles
Step 4: Test error states (invalid prompt, rate limit simulation, missing env var)
Step 5: Verify images persist in storage with retrievable URLs
Step 6: Run full test suite if tests exist, no regressions
Gate: All verification steps pass. Build is complete.
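For the API key check in Step 3, a small script can scan built client bundles for key-like strings. The sketch below is a heuristic: the pattern ("AIza" followed by 35 URL-safe characters) is the one commonly used by secret scanners for Google API keys and is an assumption here, and both function names are hypothetical. Reading the bundle files from disk is left to the caller.

```typescript
// Heuristic pattern for Google API keys: "AIza" plus 35 URL-safe characters.
const GOOGLE_KEY_PATTERN = /AIza[0-9A-Za-z_-]{35}/g;

// Return every key-like substring found in the bundle text.
function findKeyLikeStrings(bundleText: string): string[] {
  return bundleText.match(GOOGLE_KEY_PATTERN) ?? [];
}

// Also check for the literal configured secret, whatever its format.
function containsSecret(bundleText: string, secret: string | undefined): boolean {
  return secret !== undefined && secret.length > 0 && bundleText.includes(secret);
}
```

An empty result does not prove the bundle is clean, so this complements rather than replaces reviewing which modules are imported by client components.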
Error Handling
Error: "Rate Limit Exceeded (429)"
Cause: Too many requests to Gemini API within quota window
Solution:
- Implement rate limiting middleware (Upstash Redis or in-memory)
- Show user-friendly wait message with estimated retry time
- Queue requests if burst traffic is expected
Error: "Invalid API Key (401)"
Cause: Missing or incorrect GEMINI_API_KEY in environment
Solution:
- Verify key exists in .env.local and is loaded server-side
- Check key has image generation permissions enabled
- Never expose key in client components or API responses
Error: "Content Policy Violation (400)"
Cause: Prompt triggers Gemini safety filters
Solution:
- Display clear user guidance on acceptable content
- Do not retry the same prompt automatically
- Log violations for monitoring without storing prompt content
Error: "Network Timeout or Generation Failure"
Cause: Generation exceeding timeout or transient network issue
Solution:
- Implement retry with exponential backoff (max 3 attempts)
- Show progress indicator during the 5-30s generation window
- Fall back to cached/placeholder image if all retries fail
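Retry with exponential backoff, capped at three attempts, can be sketched as follows. The base delay, the cap, and the helper names are assumptions. The `sleep` parameter is injectable so the behaviour can be tested without real waiting.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });
}

// Run task up to maxAttempts times, waiting between retryable failures.
async function withRetry<T>(
  task: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Stop on non-retryable errors or once the final attempt has failed.
      if (!isRetryable(err) || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

When `withRetry` finally rejects, the caller is the place to show the cached or placeholder image. Content policy errors should be reported as non-retryable through `isRetryable` so the same prompt is never resubmitted automatically.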
Anti-Patterns
Anti-Pattern 1: Inventing Model Names
What it looks like: Using gemini-2.5-flash-preview-05-20 or a 2.5 Pro model string for generation
Why wrong: These model strings do not support image generation. Date suffixes belong to text models, and 2.5 Pro has no image output capability.
Do instead: Use exactly gemini-2.5-flash-image or gemini-3-pro-image-preview. No variations.
Anti-Pattern 2: Exposing API Keys Client-Side
What it looks like: Calling Gemini directly from React components or embedding keys in client bundles
Why wrong: Credentials are visible in browser DevTools, enabling abuse and billing attacks.
Do instead: Route all API calls through server actions or API routes. Store keys in environment variables accessed only server-side.
Anti-Pattern 3: Storing Base64 in Database
What it looks like: Saving raw base64 image data directly to PostgreSQL or MongoDB
Why wrong: Bloats database size, increases query latency, and makes backups expensive.
Do instead: Upload to object storage (Vercel Blob, S3, R2) immediately after generation. Persist only the URL.
Anti-Pattern 4: Ignoring Multi-Turn Context
What it looks like: Treating every generation as a fresh request with no conversation history
Why wrong: Discards Nano Banana's strongest feature -- conversational editing and iterative refinement.
Do instead: Track generation history as chat messages. Use useChat to enable natural language editing of previous results.
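Tracking history as chat messages can be illustrated with a simplified, hypothetical message shape. This is not the AI SDK's message type; the `Turn` and `Part` types and all three helpers are assumptions. The point is that each edit instruction is appended to the earlier turns, so the previous image stays in context.

```typescript
// Hypothetical message shape for illustration only.
type Part = { type: 'text'; text: string } | { type: 'image'; url: string };

interface Turn {
  role: 'user' | 'assistant';
  parts: Part[];
}

// Append a user instruction without mutating the existing history.
function addUserEdit(history: readonly Turn[], instruction: string): Turn[] {
  return [...history, { role: 'user', parts: [{ type: 'text', text: instruction }] }];
}

// Append the assistant's generated image (as a stored URL, not base64).
function addAssistantImage(history: readonly Turn[], url: string): Turn[] {
  return [...history, { role: 'assistant', parts: [{ type: 'image', url }] }];
}

// Find the most recent image, which is the one the next edit applies to.
function latestImageUrl(history: readonly Turn[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    for (const part of history[i].parts) {
      if (part.type === 'image') return part.url;
    }
  }
  return undefined;
}
```

For example, "a red bicycle" followed by "make it blue" produces a three-turn history in which the second user turn refers to the image from the assistant turn before it.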
Anti-Pattern 5: No Loading States
What it looks like: Submit button goes disabled with no visual feedback for 5-30 seconds
Why wrong: Users assume the app is broken and spam-click, wasting quota and degrading UX.
Do instead: Show skeleton loaders, progress bars, or estimated wait time during generation.
References
This skill uses these shared patterns:
- Anti-Rationalization - Prevents shortcut rationalizations
- Verification Checklist - Pre-completion checks
Domain-Specific Anti-Rationalization
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "I know the model name" | Wrong model strings silently fail or error | Verify against exact list |
| "Base64 is fine for now" | Technical debt compounds fast with image data | Use object storage from day one |
| "Rate limiting can wait" | First production spike causes 429 cascade | Implement before deploying |
| "Loading state is cosmetic" | 5-30s silence destroys user trust | Always show generation progress |
Reference Files
- ${CLAUDE_SKILL_DIR}/references/advanced-patterns.md: Server actions, API routes, client components, multi-image composition
- ${CLAUDE_SKILL_DIR}/references/configuration.md: Provider options, storage setup, rate limiting, cost optimization