Search Results: gpu-inference

Found 7 Skills

AI & Machine Learning | microsoft/azure-skills

airunway-aks-setup

Set up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first deployment. WHEN: "setup AI Runway", "onboard AKS cluster", "install AI Runway", "airunway setup", "deploy model to AKS", "GPU inference on AKS", "KAITO setup on AKS", "run LLM on AKS", "vLLM on AKS", "set up model serving on AKS", "AI Runway controller".

152.9k

AI & Machine Learning | huggingface/skills

huggingface-community-evals

Run evaluations for Hugging Face Hub models using inspect-ai and lighteval on local hardware. Use for backend selection, local GPU evals, and choosing between vLLM / Transformers / accelerate. Not for HF Jobs orchestration, model-card PRs, .eval_results publication, or community-evals automation.

3 scripts/Checked

AI & Machine Learning | jakerains/agentskills

onnx-webgpu-converter

Convert HuggingFace transformer models to ONNX format for browser inference with Transformers.js and WebGPU. Use when given a HuggingFace model link to convert to ONNX, when setting up optimum-cli for ONNX export, when quantizing models (fp16, q8, q4) for web deployment, when configuring Transformers.js with WebGPU acceleration, or when troubleshooting ONNX conversion errors. Triggers on mentions of ONNX conversion, Transformers.js, WebGPU inference, optimum export, model quantization for browser, or running ML models in the browser.

1 scripts/Checked

Data Processing | eventual-inc/daft

daft-udf-tuning

Optimize Daft UDF performance. Invoke when user needs GPU inference, encounters slow UDFs, or asks about async/batch processing.

AI & Machine Learning | wanshuiyin/auto-claude-co...

serverless-modal

Run GPU workloads on Modal — training, fine-tuning, inference, batch processing. Zero-config serverless: no SSH, no Docker, auto scale-to-zero. Use when user says "modal run", "modal training", "modal inference", "deploy to modal", "need a GPU", "run on modal", "serverless GPU", or needs remote GPU compute.

AI & Machine Learning | davila7/claude-code-templ...

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2-bit to 8-bit quantization without requiring a GPU.

AI & Machine Learning | truefoundry/tfy-deploy-sk...

truefoundry-llm-deploy

Deploys ML and LLM models on TrueFoundry with GPU inference servers (vLLM, TGI, NVIDIA NIM). Uses YAML manifests with `tfy apply`. Use when serving language models, deploying Hugging Face models, or hosting GPU-accelerated inference endpoints.

2 scripts/Attention