Loading...
Loading...
Found 1,564 Skills
Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.
OpenAI-compatible proxy aggregating 14 free-tier LLM providers with automatic failover and per-key rate tracking.
Script-First llms.txt generator. Uses a deterministic script to crawl the project structure, identify brand guides, and catalog content files. Provides a repo manifest for the agent to draft context-aware /llms.txt and /llms-full.txt files.
Systematic approach to exploring the TensorRT-LLM codebase before implementing new features or optimizations. Teaches how to discover existing infrastructure, trace code paths, and avoid reimplementing what already exists. Derived from real mistakes where ~250 lines of code were written and deleted because existing forward methods weren't discovered upfront. Use when starting any new feature, optimization, or code modification in TRT-LLM.
Karpathy's LLM Wiki: build/query interlinked markdown KB.
Router skill for LLMQuant investor-lens workflows. Use when the user wants an investor-style reasoning overlay grounded in LLMQuant Data evidence.
Produce an LLM Build Pack (prompt+tool contract, data/eval plan, architecture+safety, launch checklist). Use for building with LLMs, GPT/Claude apps, prompt engineering, RAG, and tool-using agents.
Use when setting up, deploying, or operating vLLM Studio (env keys, controller/frontend startup, Docker services, branch workflow, and release checklists).
Use when you want rubric based LLM quality scoring on generated outputs; pair with addon-deterministic-eval-suite.
Fact-checks LLM responses by extracting verifiable claims, verifying each via web search, producing an audit report with verdicts, and optionally revising inaccurate responses. Use when the user asks to audit, fact-check, double-check, or verify a response.
Train your own GPT-2 level LLM for under $100 using nanochat, Karpathy's minimal hackable harness covering tokenization, pretraining, finetuning, evaluation, inference, and chat UI.
Deploys ML and LLM models on TrueFoundry with GPU inference servers (vLLM, TGI, NVIDIA NIM). Uses YAML manifests with `tfy apply`. Use when serving language models, deploying Hugging Face models, or hosting GPU-accelerated inference endpoints.