Search Results: llm-cost-reduction

Found 4 Skills

AI & Machine Learning | lebsral/dspy-programming-...

ai-cutting-costs

Reduce your AI API bill. Use when AI costs are too high, API calls are too expensive, or you want to use cheaper models, optimize token usage, reduce LLM spending, route easy questions to cheap models, or make your AI feature more cost-effective. Covers DSPy cost optimization: cheaper models, smart routing, per-module LMs, fine-tuning, caching, and prompt reduction.


AI & Machine Learning | sernote/audit-prompt-cach...

audit-prompt-caching

Use whenever the user mentions LLM prompt/prefix cache misses, cached_tokens=0, cache_read_input_tokens/cache_creation_input_tokens, prompt_cache_key, cache_control/cachePoint placement, stable prefixes, tool/schema stability, TTFT/prefill latency, OpenAI/Claude/Bedrock/OpenRouter routing, vLLM/SGLang KV reuse, or LLM cost/speed regressions on repeated long prompts. Use when reviewing LLM request shape changes: prompt text, message order, request builders, tools, schemas, response_format, provider API surface, model/router settings, agent loop structure, context compaction, or inference deployment. Use for speeding up agents only when prompt-cache stability, TTFT, or cache cost is central. Do not use for generic prompt writing, generic RAG design, token counting, or non-LLM performance.


8 scripts | Attention

AI & Machine Learning | yesilsin-netizen/save-tok...

save-token

💰 Save Token (Token Saver). TRIGGERS: Use when token cost is high, the conversation is long, files are read multiple times, or before complex tasks. A guiding skill that helps agents identify and avoid sending duplicate context to LLM APIs. Teaches agents to recognize repeated content and summarize it instead of re-sending it.


AI & Machine Learning | claude-dev-suite/claude-d...

token-optimization

Token optimization best practices for MCP server and tool interactions. Minimizes token consumption while maintaining effectiveness. USE WHEN: the user mentions "token usage", "optimize tokens", "reduce API calls", or "MCP efficiency", or asks about "how to use less tokens", "MCP best practices", "limit output size", or "efficient queries". DO NOT USE FOR: code optimization (use `performance` instead); text compression (this skill is about API usage patterns); infrastructure cost optimization (use cloud/DevOps skills).
