When the Upgrade Broke the System
An OpenClaw upgrade exposed architectural fragility — and forced the move to middleware.
The Upgrade
This chapter didn’t start with architecture.
It started with an upgrade.
I updated OpenClaw.
At first, nothing looked obviously broken.
But then something strange happened.
Two prompts to Kai consumed nearly half of my available session tokens.
Two prompts.
That had never happened before.
That was the signal.
Something in the request assembly had changed. Context was being passed differently. Optimizations we had built were no longer behaving the same way.
The system wasn’t failing loudly.
It was drifting quietly.
And quiet drift is more dangerous than obvious failure.
That’s when I realized:
If optimization logic lives inside the tool you’re upgrading, you’re coupling stability to version churn.
The Hidden Fragility
Up until that point, context minification lived close to OpenClaw itself.
It worked.
But it was fragile.
An upgrade could:
- Change how prompts are assembled
- Modify fallback behavior
- Adjust internal request shape
- Introduce silent behavior shifts
The more optimization we added inside OpenClaw, the more brittle upgrades became.
That wasn’t sustainable.
The Real Catalyst: Token Exhaustion
Around the same time, another pressure surfaced.
Usage limits.
Not architectural limits.
Token limits.
I ran out.
Which forced a very practical question:
What happens when your primary provider hits a wall?
Fallback wasn’t theoretical anymore.
It was necessary.
But earlier context-window failures had already revealed something important:
Fallback isn’t just switching providers.
Different providers have:
- Different context limits
- Different token accounting
- Different response shapes
You can’t blindly forward the same prompt everywhere.
That realization shifted routing from convenience to governance.
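Those differences can be made explicit as data the middleware consults before forwarding anything. A minimal sketch, with hypothetical provider names and limits (the real values and accounting rules differ per provider):

```python
from dataclasses import dataclass

# Illustrative per-provider profiles. Names, limits, and the
# token-accounting flag are assumptions for the sketch, not the
# project's actual configuration.
@dataclass(frozen=True)
class ProviderProfile:
    name: str
    context_limit: int           # max tokens the model accepts
    counts_system_prompt: bool   # example of a token-accounting quirk

PROFILES = {
    "anthropic":   ProviderProfile("anthropic", 200_000, True),
    "local-small": ProviderProfile("local-small", 8_192, True),
}

def fits(profile: ProviderProfile, prompt_tokens: int) -> bool:
    """A prompt that is safe for one provider may overflow another."""
    return prompt_tokens <= profile.context_limit
```

The same 50k-token prompt passes one check and fails the other, which is exactly why blind forwarding breaks.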
Why Middleware Emerged
Owning a middleware layer wasn’t about flexibility.
It was about insulation.
If context trimming, routing policy, provider selection, and fallback logic live outside OpenClaw:
- OpenClaw can upgrade safely
- Routing logic remains stable
- Provider churn doesn’t require client rewrites
- Token governance can evolve independently
This wasn’t feature expansion.
It was decoupling.
Phase 1: Prove the Boundary
Instead of building a complex routing engine immediately, I paused.
The first version of middleware did one thing:
Proxy to LM Studio through an OpenAI-compatible interface.
That’s it.
Validate:
- Contract compatibility
- Streaming behavior
- Deployment stability
- Docker integration
No routing intelligence yet.
Just proving the boundary.
Because layering policy on top of instability guarantees fragility.
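The Phase 1 proxy can be reduced to a pass-through mapping: take an OpenAI-style request and forward it verbatim to the LM Studio upstream, with no routing and no trimming. A sketch, assuming LM Studio's default local endpoint (`localhost:1234/v1`):

```python
# Phase 1 sketch: prove the contract boundary, nothing else.
# The upstream URL is an assumption (LM Studio's default
# OpenAI-compatible endpoint); the payload is forwarded untouched.

UPSTREAM = "http://localhost:1234/v1"

def build_upstream_request(path: str, body: bytes) -> tuple[str, bytes]:
    """Map an incoming OpenAI-style path onto the upstream verbatim.

    No routing intelligence: if the contract holds here, policy can
    be layered on later without destabilizing the boundary.
    """
    if not path.startswith("/v1/"):
        raise ValueError(f"unexpected path: {path}")
    return UPSTREAM + path[len("/v1"):], body
```

Everything later (aliasing, routing, trimming) slots in behind this function without the clients noticing.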
Model Aliasing: Decoupling Identity from Provider
Once the boundary held, logical model names were introduced:
- claude-main
- local-small
- local-large
OpenClaw no longer referenced providers directly.
It referenced intent.
The middleware resolved the rest.
That small abstraction insulated the system from:
- Provider outages
- Cost spikes
- API changes
- Future migrations
Clients became stable.
Infrastructure became swappable.
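The aliasing layer is essentially one lookup table. A minimal sketch, with hypothetical provider and model identifiers standing in for the real targets:

```python
# Sketch of model aliasing: clients name intent, the middleware
# resolves it to a (provider, model) pair. The targets below are
# illustrative placeholders, not the project's actual mappings.
ALIASES = {
    "claude-main": ("anthropic", "claude-model-id"),
    "local-small": ("lmstudio", "small-local-model"),
    "local-large": ("lmstudio", "large-local-model"),
}

def resolve(alias: str) -> tuple[str, str]:
    """Swap infrastructure by editing this table, never the clients."""
    try:
        return ALIASES[alias]
    except KeyError:
        raise ValueError(f"unknown logical model: {alias}")
```

A provider migration becomes a one-line change to `ALIASES`; OpenClaw keeps sending `claude-main` regardless.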
The Context Failure That Reframed Everything
The error:
`cannot truncate prompt with n_keep >= n_ctx`
was more than a runtime issue.
OpenClaw built prompts optimized for large-context Anthropic models.
When fallback hit a smaller local model, the same prompt failed.
That exposed a deeper truth:
Routing isn’t model selection.
It’s prompt transformation.
Different providers require different context strategies.
The middleware wasn’t just a router anymore.
It was a policy engine.
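What "prompt transformation" means in practice can be sketched as a trimming pass the policy engine runs before falling back to a smaller model. This is a rough illustration: the whitespace word count stands in for real tokenization (a production version would use the target model's tokenizer), and the function name and defaults are hypothetical:

```python
# Sketch: transform the prompt for the fallback target instead of
# forwarding it verbatim. Counting words approximates counting
# tokens; a real implementation would tokenize with the target
# model's tokenizer.

def trim_for_context(messages: list[dict], n_ctx: int,
                     reserve: int = 512) -> list[dict]:
    """Drop the oldest non-system messages until the prompt fits,
    keeping `reserve` tokens of room for the model's reply."""
    def count(msgs):
        return sum(len(m["content"].split()) for m in msgs)

    msgs = list(messages)
    while count(msgs) > n_ctx - reserve:
        # Preserve system messages; drop the oldest conversational turn.
        for i, m in enumerate(msgs):
            if m["role"] != "system":
                del msgs[i]
                break
        else:
            break  # only system messages remain; nothing left to drop
    return msgs
```

Run against a small-context target, the oversized conversation shrinks to something the model can actually accept, instead of triggering the `n_keep >= n_ctx` failure.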
That set up the next phase: stepping back and deciding what this system had actually become.