GPT-5.5: The Benchmark Jump Isn't the Story. Tokens Per Task Is.
OpenAI shipped GPT-5.5 today. Everyone will quote the Terminal-Bench 2.0 jump from 75.1% to 82.7% and miss the claim buried further down: significantly fewer tokens per Codex task, and state-of-the-art coding intelligence at half the cost of competitive frontier models. Here's which number to actually watch, and a decision rule for when to switch.