Loading...
Loading...
Found 103 Skills
Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.
Expert skill for integrating local Large Language Models using llama.cpp and Ollama. Covers secure model loading, inference optimization, prompt handling, and protection against LLM-specific vulnerabilities including prompt injection, model theft, and denial of service attacks.
Transcribe audio to text using local whisper.cpp. Use when user wants to convert audio/video to text, get transcription, or speech-to-text.
Dump Unity IL2CPP symbols from iOS/Android builds. Extract method names, addresses, and type info from IL2CPP binaries and global-metadata.dat, then generate IDA/Ghidra import scripts.
Recipes and configs for serving LLMs locally on RTX 3090 GPUs using vLLM, llama.cpp, and SGLang with OpenAI-compatible API
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.
Static analysis skill for C/C++ codebases. Use when hardening code quality, triaging noisy builds, running clang-tidy, cppcheck, or scan-build, interpreting check categories, suppressing false positives, or integrating static analysis into CI. Activates on queries about clang-tidy checks, cppcheck, scan-build, compile_commands.json, code hardening, or static analysis warnings.
Use when integrating Foundation Models framework, implementing on-device AI with Apple Intelligence, building tool-calling AI features, working with guided generation schemas, converting models with Core ML and coremltools, or running open-source LLMs on Apple Silicon. Covers Foundation Models (LanguageModelSession, @Generable, @Guide, SystemLanguageModel, structured output, tool calling), Core ML (coremltools, model conversion, quantization, palettization, pruning, Neural Engine, MLTensor), MLX Swift (transformer inference, unified memory), and llama.cpp (GGUF, cross-platform LLM).
Provides file paths to language-specific extension files for the code-testing pipeline. Call this skill to discover available extension guidance files (e.g., dotnet.md for .NET, cpp.md for C++). Do not use directly — invoked by code-testing agents and skills that need language-specific references.
Use this when you need to run static analysis with cppcheck, clang-tidy, or GCC analyzer on embedded C/C++ code, or perform MISRA-C compliance checks.
Use when "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", "CPU inference"
Run a free 35B AI coding agent on Apple Silicon Macs using local LLMs via llama.cpp or MLX with web search, shell, and file tools.