
AI × Forward$ moving-ai-forward


// tag

#Caching

1 post · page 1/1

Tech · 2026-04-21 · 5 min

Cutting LLM Token Costs: 12 Techniques That Actually Move the Bill

Most teams overpay for LLM tokens by 3–5× without realizing it. Here are 12 techniques, ordered by impact — from prompt caching that cuts 90% off repeated system prompts, to model routing that saves 80% on easy tasks, to the context-window mistake almost every team makes.

#caching #llm #rag #ai