Why Your Coding Agent Burns Reasoning Tokens on Problems It Already Solved
Every time a coding agent hits a CI failure, it re-reasons from scratch. A structured dev-loop control plane routes feedback at the right context depth, eliminating redundant reasoning and cutting reasoning tokens by up to 93% on a benchmark task.
Your coding agent charged you $40 today. Roughly $37 of that was it thinking very hard about errors it already understood.
This is not an exaggeration. It is the hidden cost structure of unstructured agentic workflows, and most of the advice about reducing agent token spend is aimed at the wrong problem. Developers shorten prompts, cap output tokens, turn off the most powerful model. These changes affect the 10% of your token bill that is visible. The 90% that is invisible, the reasoning tokens burning through CI failure analysis and re-diagnosis and re-planning, stays exactly the same.
The Re-Reasoning Trap
Here is what happens when a coding agent hits a CI failure without a structured control plane. The agent writes code. CI fails on a type error. The CI output, often hundreds of lines of raw logs, gets dumped into a new context window. The agent reads the diff again. It re-diagnoses the error. It re-plans the fix. It writes new code. CI fails again, sometimes on the same error, sometimes on a new one introduced by the fix. The cycle repeats.
Each iteration of this loop triggers a full reasoning budget. The agent has no persistent memory of what it already diagnosed two cycles ago. Every CI failure is a cold start. The model reasons its way from the raw error output back to a plan, every single time.
Spotify's engineering team described this problem directly in their 2025 Honk series. Without structured verification loops, their agents "often produce code that simply doesn't work." An LLM judge rejected roughly 25% of all agent outputs. About half of those rejections triggered agent re-reasoning and course correction. Every one of those re-reasoning cycles was billed. A tighter feedback loop, one that routes the right signal at the right context depth instead of triggering a full re-analysis, would likely have avoided much of that spend.
Where the Token Budget Actually Goes
In a typical agentic coding session, the token budget breaks down roughly like this: most of it goes to diagnosis, a smaller chunk to planning, and a relatively small fraction to actual code generation and execution. The asymmetry matters because developers typically try to optimize the smallest part of the spend. Capping output tokens saves money on execution. It does nothing to the diagnosis cost, which can be 5 to 10 times larger.
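To make the asymmetry concrete, here is a back-of-the-envelope sketch. The per-phase token counts are hypothetical placeholders, chosen only to match the shape described above (diagnosis several times larger than execution). They are not measurements from any real session.

```python
# Hypothetical per-cycle token counts (illustrative only, not measured).
PHASE_TOKENS = {
    "diagnosis": 8_000,  # reasoning over raw CI output and the diff
    "planning": 3_000,   # deciding on the fix
    "execution": 1_000,  # the code actually emitted
}


def cycle_cost(phases: dict[str, int]) -> int:
    """Total tokens spent in one failure-and-retry cycle."""
    return sum(phases.values())


def savings_fraction(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of the per-cycle bill removed by a change."""
    return 1 - cycle_cost(after) / cycle_cost(before)


# Option A: cap output tokens, halving the execution cost.
capped_output = {**PHASE_TOKENS, "execution": PHASE_TOKENS["execution"] // 2}

# Option B: halve the diagnosis cost by delivering a targeted failure signal.
targeted_feedback = {**PHASE_TOKENS, "diagnosis": PHASE_TOKENS["diagnosis"] // 2}

print(f"cap output:        {savings_fraction(PHASE_TOKENS, capped_output):.1%}")
print(f"targeted feedback: {savings_fraction(PHASE_TOKENS, targeted_feedback):.1%}")
```

With these assumed numbers, halving execution trims about 4% of the cycle, while halving diagnosis trims about a third. The exact figures will differ for your workload; the point is which line item moves the total.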
A Hacker News thread about two weeks of Claude Code usage (392 points, 379 comments) surfaced a consistent pattern: developers spending $200 to $600 per month on Max plans, with the top survival tip being "turn off Opus" to stop reasoning token bleed. Simon Willison noted in the same thread that comprehensive tests are the single biggest unlock for making agent runs reliable and affordable. The implication, which most people missed, is that unguided repair loops are where the money goes. Tests are not just quality gates. They are the feedback mechanism that tells the agent when to stop re-reasoning.
A dev.to post on the real cost of running AI coding agents put the figure at $20 to $50 per day on medium-sized projects from full-codebase context reads alone. Reasoning tokens are the hidden multiplier on top of that. Most people looking at their token bill are reading the wrong line items.
What Structured Feedback Routing Changes
The fix is not about the model or the prompt. It is about how CI results are delivered back into the agent loop. When a control plane routes CI feedback at the correct context depth, meaning it delivers a structured, targeted summary of what failed and where, rather than dumping raw log output into a fresh context window, the agent can skip re-diagnosis and go straight to acting. It already knows the codebase structure. It already knows the task. It just needs the specific failure signal, not a cold-start context read.
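As a rough illustration of what "a structured, targeted summary" could look like, the sketch below reduces a raw CI log to a short, deduplicated list of failures. It is a minimal example under stated assumptions, not AgentRail's implementation: it assumes the failing step is a TypeScript compile emitting tsc's non-pretty diagnostic format, and the `Failure` type, `summarize_ci_log` function, file path, and sample log are all hypothetical.

```python
import json
import re
from dataclasses import asdict, dataclass

# Matches non-pretty tsc diagnostics: path(line,col): error TS1234: message
TSC_ERROR = re.compile(
    r"^(?P<file>[^\s(]+)\((?P<line>\d+),(?P<col>\d+)\): "
    r"error (?P<code>TS\d+): (?P<message>.+)$"
)


@dataclass(frozen=True)
class Failure:
    """One failure signal, small enough to hand straight to the agent."""
    file: str
    line: int
    col: int
    code: str
    message: str


def summarize_ci_log(raw_log: str) -> list[Failure]:
    """Keep only compiler errors from the log, dropping noise and duplicates."""
    failures: list[Failure] = []
    seen: set[Failure] = set()
    for text in raw_log.splitlines():
        match = TSC_ERROR.match(text.strip())
        if match is None:
            continue  # install output, step banners, exit codes, etc.
        failure = Failure(
            file=match["file"],
            line=int(match["line"]),
            col=int(match["col"]),
            code=match["code"],
            message=match["message"],
        )
        if failure not in seen:
            seen.add(failure)
            failures.append(failure)
    return failures


# Hypothetical CI output: mostly noise, one error reported twice.
RAW_LOG = """\
Run npm ci
added 512 packages in 14s
Run npx tsc --noEmit
src/billing/invoice.ts(42,7): error TS2322: Type 'string' is not assignable to type 'number'.
src/billing/invoice.ts(42,7): error TS2322: Type 'string' is not assignable to type 'number'.
Error: Process completed with exit code 2.
"""

summary = summarize_ci_log(RAW_LOG)
print(json.dumps([asdict(f) for f in summary], indent=2))
```

Here six lines of log collapse to a single record with a file, a position, an error code, and a message. A real log is usually hundreds of lines, so the reduction in what the agent has to read is correspondingly larger.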
This is the mechanism behind the 93% reduction in reasoning tokens that AgentRail achieved on a benchmark task versus plain Codex. The pipeline has five steps: structured issue intake (so the agent starts with a clean, scoped task rather than a vague prompt), agent action, CI result injection at the right depth, targeted retry if needed, and PR submission. Each step eliminates a category of redundant reasoning. The agent never re-reads the full diff to re-diagnose an error it encountered in the previous cycle. The control plane already has that state and routes only what the agent needs.
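The idea that the control plane "already has that state" can be sketched as a small piece of bookkeeping that survives across CI cycles. Everything below is a hypothetical illustration: `LoopState`, `route_feedback`, `record_diagnosis`, and the fingerprint format are invented for this example and do not describe AgentRail's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """State a hypothetical control plane keeps across CI cycles for one task."""
    task: str
    attempts: int = 0
    # failure fingerprint -> diagnosis the agent already produced for it
    diagnoses: dict[str, str] = field(default_factory=dict)


def record_diagnosis(state: LoopState, fingerprint: str, diagnosis: str) -> None:
    """Store the agent's diagnosis so a later cycle does not have to redo it."""
    state.diagnoses[fingerprint] = diagnosis


def route_feedback(state: LoopState, failures: dict[str, str]) -> dict:
    """Build the payload for the next agent turn from the latest CI result.

    `failures` maps a failure fingerprint to its one-line failure signal.
    Failures seen before are returned with their stored diagnosis; only
    unseen failures are passed along as raw signals needing fresh analysis.
    """
    state.attempts += 1
    recurring = {fp: state.diagnoses[fp] for fp in failures if fp in state.diagnoses}
    new = {fp: sig for fp, sig in failures.items() if fp not in state.diagnoses}
    return {
        "task": state.task,
        "attempt": state.attempts,
        "status": "failed" if failures else "passed",
        "new_failures": new,
        "recurring_failures": recurring,
    }


state = LoopState(task="Fix invoice total type error")
fingerprint = "TS2322:src/billing/invoice.ts:42"
failure = {fingerprint: "Type 'string' is not assignable to type 'number'."}

# Cycle 1: the failure is new, so the agent has to diagnose it once.
first = route_feedback(state, failure)
record_diagnosis(state, fingerprint, "total arrives as a string; convert before assigning")

# Cycle 2: the same failure comes back, now paired with the stored diagnosis.
second = route_feedback(state, failure)

print(first["new_failures"])
print(second["recurring_failures"])
```

In the second cycle the payload carries the earlier diagnosis instead of a raw error, which is the design choice the paragraph above describes: the retry starts from what is already known rather than from a cold read.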
The accompanying 47% reduction in total tokens is the downstream effect of eliminating those redundant loops. Fewer re-diagnosis cycles mean fewer context reads, fewer re-planning steps, and fewer retries that start from scratch.
The Practical Takeaway
If you are running coding agents and your token costs feel out of control, the first question to ask is not which model to switch to. It is: how is CI feedback getting back into the agent? If the answer is "it dumps the raw logs into the next context window," you are paying for re-diagnosis on every failure cycle.
AgentRail is a local-first, source-available control plane for Claude Code, Codex, and Cursor. It handles structured issue intake, CI feedback injection, PR submission, and review gating through one API. You can install it with npm install -g @agentrail-core/cli && agentrail init, or see how it works at https://agentrail.app. Run it on your next PR cycle and compare the reasoning token delta. The difference is visible in one task.