Loading...
Loading...
Found 30 Skills
Run 397B parameter Mixture-of-Experts LLMs on a MacBook using pure C/Metal with SSD streaming
Build with Surf pay-per-use APIs at surf.cascade.fyi. Twitter data, Reddit data, web search/crawl, and LLM inference - no signup, no API keys, just pay per call. Use when working with Surf endpoints, fetching Twitter/X data, Reddit data, web crawling/search, pay-per-request LLM inference, setting up x402-proxy or @x402/fetch with Surf, or any mention of surf.cascade.fyi. Triggers on surf, surf.cascade.fyi, surf API, twitter data, reddit data, web crawl, surf inference, x402 endpoints, MCP surf tools.
Connect to local LLM endpoints (Ollama, llama.cpp, vLLM) with automatic provider fallback. Use when: (1) you need to run LLM inference locally for privacy/cost, (2) you want to use models not available via cloud APIs, (3) you need offline capability, (4) you want automatic fallback to cloud providers when local fails.
Command-line interface for Ollama - Local LLM inference and model management via Ollama REST API. Designed for AI agents and power users who need to manage models, generate text, chat, and create embeddings without a GUI.
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
LLM inference via paid API: OpenAI-compatible chat completions proxied through x402 providers. Supports Kimi K2.5, MiniMax M2.5. Uses x_payment tool for automatic USDC micropayments ($0.001-$0.003/call). Use when: (1) generating text with a specific model, (2) running chat completions through a pay-per-request LLM endpoint, (3) comparing outputs across models.