For years, the dominant constraint in software engineering was implementation cost. Writing software was expensive. Changing it was expensive. The whole discipline organized itself around that fact — estimation, sprints, backlogs, delivery teams — because throughput was the bottleneck.
AI is dismantling that bottleneck faster than most organizations have been able to think about what replaces it.
But there’s a less-noticed shift happening alongside the capability story. Quietly, and then quite visibly, the major AI vendors are moving coding workflows toward consumption-based pricing. GitHub Copilot has introduced premium requests. Anthropic has repeatedly adjusted Claude Code access as demand exploded. OpenAI has separated agent-style workflows into explicit credit models for enterprise use. The details differ — different token pools, different rate limits, different structures — but the direction is consistent: frontier inference is not economically unlimited, and the vendors are starting to price it that way.
This matters more than it might appear, for one reason: AI-assisted development is not reducing demand for software. It’s accelerating it. Developers produce more. Non-developers can now produce some. Agents execute implementation work in parallel. If software was already eating the world, AI is increasing the bite rate.
That means total inference spend is likely to rise, because demand is growing faster than efficiency gains are bringing per-token costs down. Even as models get cheaper per token, the sheer volume of generated code, tests, pipelines, reports, and scaffolding will keep growing. Organizations that route large amounts of that work through frontier reasoning models are building a cost structure that could become uncomfortable quickly.
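A simple way to see the dynamic: assume, purely for illustration, that spend is just token volume times unit price. Let volume $V$ grow by a fraction $g$ per period and the per-token price $p$ fall by a fraction $d$ per period. Then total spend $C$ evolves as

$$
C_t = V_t \, p_t = V_0 (1+g)^t \cdot p_0 (1-d)^t = C_0 \big[(1+g)(1-d)\big]^t
$$

and spend still rises whenever $(1+g)(1-d) > 1$. With hypothetical numbers, if volume doubles each period ($g = 1$) while prices fall 40% ($d = 0.4$), the factor is $2 \times 0.6 = 1.2$: spend grows 20% per period despite steep price cuts.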
The current pattern of AI-assisted development tends to look like this: open a long-running session with a flagship model, iterate continuously, keep large context windows alive, regenerate as needed. It works. The problem is what it’s spending expensive reasoning capacity on.
A significant portion of that work is not actually ambiguous. It’s repetitive. Deterministic. Structurally predictable. Generating a stable HTML report structure for the fifteenth time doesn’t require the same model that helped you design the architecture. Implementing a transformation you’ve already fully specified doesn’t require frontier-level reasoning. But because the workflow isn’t designed to distinguish between the two, it routes everything through the same expensive path.
That’s partly an economic inefficiency. But it’s also a signal. If implementation continuously requires frontier-level reasoning, that’s usually not a model problem — it’s a decomposition problem. It means ambiguity that should have been resolved upstream is still alive in the work.
Historically, you could absorb that ambiguity because implementation throughput was the real constraint anyway. When implementation gets cheap, the constraint moves. What’s left is ambiguity, architectural clarity, decomposition quality, validation, and organizational alignment. That’s where the work now actually is.
The architectural implication is straightforward, even if acting on it isn’t. Use stronger models where ambiguity and interpretation are genuinely high — architecture, requirements, tradeoff analysis, decomposition, synthesis. Once that work is done and structure has been established, the remaining implementation should be deterministic enough that it doesn’t need the same reasoning capacity.
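As a rough illustration of that placement rule, here is a minimal routing sketch. Everything in it is hypothetical: the two tiers, the set of "high-ambiguity" task kinds, and the `Task` fields (`has_spec`, `open_questions`) are assumptions standing in for whatever signals a real workflow would use. It is not tied to any vendor's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """Hypothetical model tiers; map these to whatever models you actually use."""
    FRONTIER = "frontier"  # expensive reasoning: architecture, requirements, tradeoffs
    SMALL = "small"        # cheaper model for fully specified implementation work


# Hypothetical task kinds treated as inherently high-ambiguity.
HIGH_AMBIGUITY_KINDS = {"architecture", "requirements", "tradeoff", "decomposition", "synthesis"}


@dataclass
class Task:
    kind: str                     # e.g. "architecture" or "implement"
    description: str
    has_spec: bool = False        # is there a canonical spec to execute against?
    open_questions: list[str] = field(default_factory=list)


def route(task: Task) -> Tier:
    """Send a task to the frontier tier only while ambiguity is still alive in it."""
    if task.kind in HIGH_AMBIGUITY_KINDS:
        return Tier.FRONTIER
    # Implementation work with no spec, or with unresolved questions, signals
    # incomplete decomposition, so escalate instead of letting a small model guess.
    if not task.has_spec or task.open_questions:
        return Tier.FRONTIER
    return Tier.SMALL


if __name__ == "__main__":
    # Hypothetical example tasks.
    tasks = [
        Task("architecture", "Choose storage layer for audit events"),
        Task("implement", "Render weekly HTML report from spec R-12", has_spec=True),
        Task("implement", "Add retry to export job", has_spec=True,
             open_questions=["What is the max retry budget?"]),
    ]
    for t in tasks:
        print(f"{route(t).value:8} {t.description}")
```

The escalation branch is the interesting design choice: rather than treating a frontier call during implementation as normal, the sketch treats it as evidence that ambiguity leaked past the decomposition step.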
I’ve been exploring what this looks like in practice through two experimental repositories, Rupify and Speckify. Rupify concentrates expensive reasoning at the front of a workflow: AI-assisted stakeholder interviews, requirements normalization, formal specification generation. The goal is to resolve ambiguity deliberately and early, producing canonical artifacts that downstream work can execute against. Speckify then takes those specifications and decomposes them into atomic, traceable implementation units — work that’s structured enough that smaller, cheaper models can handle it reliably.
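To make "atomic, traceable implementation units" concrete, here is a minimal sketch of what a traceability check over such units might look like. This is not Speckify's actual format or code; the `Requirement` and `WorkUnit` types, the ID scheme, and `check_traceability` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str   # hypothetical ID scheme, e.g. "REQ-1"
    text: str


@dataclass(frozen=True)
class WorkUnit:
    unit_id: str
    req_id: str        # trace link back to the requirement this unit implements
    instruction: str   # narrow, fully specified task intended for a smaller model


def check_traceability(reqs: list[Requirement], units: list[WorkUnit]) -> list[str]:
    """Return a list of problems; an empty list means every trace link is intact."""
    known = {r.req_id for r in reqs}
    covered = {u.req_id for u in units}
    # Units pointing at requirements that do not exist in the spec.
    problems = [f"{u.unit_id} traces to unknown {u.req_id}" for u in units if u.req_id not in known]
    # Requirements that no unit implements.
    problems += [f"{r} has no work units" for r in sorted(known - covered)]
    return problems


if __name__ == "__main__":
    reqs = [Requirement("REQ-1", "Export audit events as CSV"),
            Requirement("REQ-2", "Email weekly summary")]
    units = [WorkUnit("U-1", "REQ-1", "Write events_to_csv(events) -> str"),
             WorkUnit("U-2", "REQ-3", "Add summary footer")]
    for problem in check_traceability(reqs, units):
        print(problem)
```

Running it flags `U-2` as tracing to a requirement that does not exist and `REQ-2` as having no implementing unit: exactly the kind of gap that would otherwise push a cheap model back toward guessing.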
The point isn’t “better AI coding.” It’s redesigning the workflow so that frontier reasoning gets used once, where it’s genuinely needed, rather than continuously throughout the entire delivery cycle. The economics look very different when you do that.
For a while, the dominant question in AI-assisted engineering was capability: which model, which agent, which coding assistant. That question isn’t going away. But a second constraint, the cost of inference itself, is now becoming visible alongside it.
The organizations that scale this well probably won’t be the ones that put AI everywhere. They’ll be the ones that figure out where ambiguity actually lives in their workflows, structure everything else aggressively, and stop paying frontier-model prices for work that stopped being uncertain three steps ago.
The competitive advantage isn’t more AI. It’s better placement.
