Search Results: openai-compatible-api

Found 10 Skills

chat

Generate chat completions using Sarvam AI's Sarvam-M model. Use when the user needs AI chat, text generation, question answering, or reasoning in Indian languages. Sarvam-M is a 24B parameter model with hybrid thinking, superior Indic language understanding, and OpenAI-compatible API. Free to use.

🇺🇸|EnglishTranslated

AI & Machine Learningwanshuiyin/auto-claude-co...

auto-review-loop-llm

Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".

🇺🇸|EnglishTranslated

AI & Machine Learningaradotso/hermes-skills

openclaw-zero-token

Use major AI models (Claude, ChatGPT, Gemini, DeepSeek, Qwen, etc.) without API tokens by leveraging browser authentication instead of paid API keys

🇺🇸|EnglishTranslated

AI & Machine Learningjau123/meigen-art

setup

Configure MeiGen plugin provider and API keys. Use this when the user runs /meigen:setup, asks to "configure meigen", "set up image generation", "add API key", or needs help configuring the plugin.

🇺🇸|EnglishTranslated

DevOps & Cloud Servicesvllm-project/vllm-skills

vllm-deploy-k8s

Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.

🇺🇸|EnglishTranslated

AI & Machine Learningkiterlin/intelligent-dete...

nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

🇺🇸|EnglishTranslated

AI & Machine Learningaradotso/trending-skills

club-3090-llm-serving

Recipes and configs for serving LLMs locally on RTX 3090 GPUs using vLLM, llama.cpp, and SGLang with OpenAI-compatible API

🇺🇸|EnglishTranslated

AI & Machine Learningveniceai/skills

venice-image-generate

Generate images with Venice. Covers POST /image/generate (Venice-native), POST /images/generations (OpenAI-compatible), GET /image/styles (style presets), request fields (prompt, dimensions, cfg_scale, seed, variants, style_preset, aspect_ratio, resolution, safe_mode, watermark), and response formats.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

deployment

Serve a quantized or unquantized LLM checkpoint as an OpenAI-compatible API endpoint using vLLM, SGLang, or TRT-LLM. Use when user says "deploy model", "serve model", "start vLLM server", "launch SGLang", "TRT-LLM deploy", "AutoDeploy", "benchmark throughput", "serve checkpoint", or needs an inference endpoint from a HuggingFace or ModelOpt-quantized checkpoint. Do NOT use for quantizing models (use ptq) or evaluating accuracy (use evaluation).

🇺🇸|EnglishTranslated

1 scripts/Attention

AI & Machine Learningvllm-project/vllm-skills

vllm-bench-serve

Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.

🇺🇸|EnglishTranslated