Search Results: gemini

Found 570 Skills

AI & Machine Learning | cnemri/google-genai-skill...

google-genai-sdk-python

Expert guidance for writing Python code using the official Google GenAI SDK (google-genai) for the Gemini API and Vertex AI. Use for text generation, multimodal inputs, reasoning, tools, and media generation.

🇺🇸 | English | Translated

AI & Machine Learning | shiqkuangsan/oh-my-daily-...

tooyoung:nano-banana-builder

Build full-stack web applications powered by Google Gemini's Nano Banana & Nano Banana Pro image generation APIs. Use when creating Next.js image generation apps, text-to-image tools, or iterative image editors.

🇺🇸 | English | Translated

AI & Machine Learning | buildatscale-tv/claude-co...

generate

Nano Banana Pro (nano-banana-pro) image generation skill. Use this skill when the user asks to "generate an image", "generate images", "create an image", or "make an image", mentions "nano banana", or requests multiple images like "generate 5 images". Generates images using Google's Gemini 2.5 Flash for any purpose: frontend designs, web projects, illustrations, graphics, hero images, icons, backgrounds, or standalone artwork. Invoke this skill for ANY image generation request.

🇺🇸 | English | Translated

1 script / Checked

AI & Machine Learning | binhmuc/autobot-review

ai-multimodal

Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, multiple-image handling), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (in place of Claude's default vision capabilities, falling back to them only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.

🇺🇸 | English | Translated

7 scripts / Attention

AI & Machine Learning | miticojo/adk-skill

adk-skill

Build single-agent and multi-agent systems using Google's Agent Development Kit (ADK) in Python, Java, Go, or TypeScript. Use when creating AI agents with ADK, designing multi-agent architectures, implementing agent tools, configuring agent callbacks, managing agent state, orchestrating sequential/parallel/loop agent workflows, or when the user mentions ADK, google-adk, google agent development kit, agentic AI with Gemini, or agent orchestration with Google tools. Also use when setting up ADK projects, writing agent tests, deploying agents, or integrating MCP tools with ADK.

🇺🇸 | English | Translated

AI & Machine Learning | sundial-org/awesome-openc...

antigravity-image-gen

Generate images using the internal Google Antigravity API (Gemini 3 Pro Image). High quality, native generation without browser automation.

🇺🇸 | English | Translated

1 script / Checked

AI & Machine Learning | vinta/hal-9000

magi-ex

Use when brainstorming, evaluating architecture choices, or comparing trade-offs where independent perspectives from different model families (Claude/Codex/Gemini) would surface blind spots.

🇺🇸 | English | Translated

AI & Machine Learning | samhvw8/dot-claude

ai-multimodal

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

🇺🇸 | English | Translated

6 scripts / Attention

AI & Machine Learning | wlzh/skills

image-generator

General-purpose image generation Skill that supports multiple AI models (ModelScope, Gemini, etc.) and can be called by other Skills.

🇺🇸 | English | Translated

2 scripts / Checked

Tools & Utilities | huangwb8/chineseresearchl...

nsfc-schematic

Use this when users explicitly ask to "generate NSFC schematic diagram/mechanism diagram" or need to convert the research mechanism, algorithm architecture, and module relationships in a proposal into "editable + embeddable" diagrams. By default, it outputs editable source files (`.drawio`) and rendered files (`.pdf`/`.svg`/`.png`); when users explicitly mention the Nano Banana/Gemini image model, it can switch to PNG-only mode. ⚠️ Not applicable when users only want to polish the main text (rewrite the text directly), only want to change the format or size of existing images (use image-processing skills), or show no clear intent to request a schematic or mechanism diagram.

🇨🇳 | Chinese | Translated

22 scripts / Attention

AI & Machine Learning | samuraigpt/generative-med...

muapi-nano-banana

Reasoning-driven image generation using structured creative briefs (Gemini 3 style). Generates high-fidelity images via muapi.ai with logic-based prompting.

🇺🇸 | English | Translated

1 script / Checked

AI & Machine Learning | dinghuanghao/openword

openword-player

Operate OpenWord end-to-end for live adventure sessions. Use when Codex needs to download/install/start OpenWord, guide a human player in the browser, or play autonomously through REST API (create/load game, do_action loop, state/image retrieval), including configuring GEMINI_API_KEY and sharing interesting scenes and choices during play.

🇺🇸 | English | Translated

1 script / Attention