Search: llm-inference - AI Agent Skills

AI & Machine Learning | eyadsibai/ltk

llm-inference

Use when the request mentions "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", or "CPU inference".

8

AI & Machine Learning | skillssh/skills

agent-tools

Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

182.7k

AI & Machine Learning | skillssh/skills

infsh-cli

Run 250+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

182.7k

AI & Machine Learning | inference-sh/skills

inference-sh

Run 150+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

929

AI & Machine Learning | inference-sh/skills

skills

Run 150+ AI apps via inference.sh CLI - image generation, video creation, LLMs, search, 3D, Twitter automation. Models: FLUX, Veo, Gemini, Grok, Claude, Seedance, OmniHuman, Tavily, Exa, OpenRouter, and many more. Use when running AI apps, generating images/videos, calling LLMs, web search, or automating Twitter. Triggers: inference.sh, infsh, ai model, run ai, serverless ai, ai api, flux, veo, claude api, image generation, video generation, openrouter, tavily, exa search, twitter api, grok

412

AI & Machine Learning | davila7/claude-code-templ...

llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

16

AI & Machine Learning | davila7/claude-code-templ...

nowait-reasoning-optimizer

Implements the NOWAIT technique for efficient reasoning in R1-style LLMs. Use when optimizing inference of reasoning models (QwQ, DeepSeek-R1, Phi4-Reasoning, Qwen3, Kimi-VL, QvQ), reducing chain-of-thought token usage by 27-51% while preserving accuracy. Triggers on "optimize reasoning", "reduce thinking tokens", "efficient inference", "suppress reflection tokens", or when working with verbose CoT outputs.

15

1 scripts/Checked

AI & Machine Learning | davila7/claude-code-templ...

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

12

AI & Machine Learning | team-telnyx/skills

telnyx-ai-inference-javascript

Access Telnyx LLM inference APIs, embeddings, and AI analytics for call insights and summaries. This skill provides JavaScript SDK examples.

12

AI & Machine Learning | aradotso/trending-skills

dflash-mlx-speculative-decoding

Lossless DFlash speculative decoding for MLX on Apple Silicon — 1.7–4x faster LLM inference using block diffusion drafting with target model verification.

12

AI & Machine Learning | davila7/claude-code-templ...

speculative-decoding

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

11

AI & Machine Learning | jezweb/claude-skills

cloudflare-workers-ai

Run LLMs and AI models on Cloudflare's GPU network with Workers AI. Includes Llama 4, Gemma 3, Mistral 3.1, Flux images, BGE embeddings, streaming, and AI Gateway. Handles 2025 breaking changes. Prevents 7 documented errors. Use when: implementing LLM inference, images, RAG, or troubleshooting AI_ERROR, rate limits, max_tokens, BGE pooling, context window, neuron billing, Miniflare AI binding, NSFW filter, num_steps.

10

5 scripts/Attention

Search Results: llm-inference

llm-inference

agent-tools

infsh-cli

inference-sh

skills

llama-cpp

nowait-reasoning-optimizer

serving-llms-vllm

telnyx-ai-inference-javascript

dflash-mlx-speculative-decoding

speculative-decoding

cloudflare-workers-ai
