Search Results: gguf-format

Found 2 Skills

AI & Machine Learning | davila7/claude-code-templ...

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2- to 8-bit quantization that does not require a GPU.

AI & Machine Learning | eyadsibai/ltk

llm-inference

Use when "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", "CPU inference"
