// tag

#RAG

3 posts

Tech · 2026-05-27 · 5 min

The Smallest Agent That Works, Part 2: The Three Reach-Out Agents

When the cheap tiers run out, the agent has to reach beyond the model itself — into knowledge it doesn't have, tools it can't natively use, or its own previous answer. RAG, tool use, and self-critique: three patterns, three failure modes worth pricing in. Part 2 of three.

#llm #rag #agent-architecture #ai-engineering

Tech · 2026-04-23 · 5 min

Vectorless RAG Hits 98.7%. Here's What the Infographic Edited Out.

Tree-walking RAG really does beat chunked vector search on hierarchical documents — the 98.7% vs 50% gap on FinanceBench is real. But the headline hides the three costs that decide whether you should actually rip out your vector store: latency, per-query token burn, and the multi-document corpus problem that "vectorless" quietly punts on.

#llm #rag #retrieval #vector-databases

Tech · 2026-04-21 · 5 min

Cutting LLM Token Costs: 12 Techniques That Actually Move the Bill

Most teams overpay for LLM tokens by 3–5× without realizing it. Here are 12 techniques, ordered by impact — from prompt caching that cuts 90% off repeated system prompts, to model routing that saves 80% on easy tasks, to the context-window mistake almost every team makes.

#caching #llm #rag #ai