OpenClaw Series

Minimizing the Context Window

Web search wasn’t the only source of bloat. The runtime itself was oversharing.

Part 4 of 7 · Feb 20, 2026
Cinematic image of data streams being narrowed before entering a machine core

Why Stop at Optimizing Web Results?

After minimizing Tavily output, a more uncomfortable realization surfaced.

Web search wasn’t the biggest source of token inflation.

OpenClaw was.

Each request to Anthropic was carrying:

  • Tool schemas
  • System instructions
  • Prior messages
  • Memory injections
  • Structured metadata
  • Full JSON responses

Individually reasonable.

Collectively expensive.

The First Prompt

… I want to try and minimize information being passed to Anthropic from a web search result. Can we look at implementing a translation layer to minimize JSON responses?

That was the technical starting point.

But the deeper issue wasn’t JSON.

It was context accumulation.

The Translation Layer

The solution began with a translation layer.

Instead of passing raw JSON from tools directly into the model:

  • Extract only relevant fields.
  • Remove nested structures.
  • Flatten the payload.
  • Normalize the format.
  • Strip redundant keys.

Anthropic didn’t need the entire response schema.

It needed the distilled meaning.

Translation became compression.

In practice, that meant dropping everything that made the payload feel like a raw API dump:

  • navigation junk
  • extra metadata
  • nested objects that added no decision value
  • anything the model would only paraphrase back to me anyway
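The steps above can be sketched as a small translation layer. This is a minimal illustration, not OpenClaw's actual code: the field names (`title`, `url`, `content`) are assumed stand-ins for whatever a given tool marks as decision-relevant.

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into dotted keys."""
    out = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = obj
    return out

# Hypothetical allow-list of decision-relevant fields.
KEEP = ("title", "url", "content")

def translate(raw: dict) -> dict:
    """Flatten a raw tool response, then keep only allow-listed fields."""
    flat = flatten(raw)
    return {k: v for k, v in flat.items() if k.split(".")[-1] in KEEP}
```

Everything else in the payload (scores, navigation metadata, nested housekeeping objects) simply never reaches the model, because the filter is an allow-list rather than a deny-list.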

The Second Realization

I hit usage limits.

Not because the system was unstable.

Because it was verbose.

I paused for two hours and asked a different question:

I want to look at how I can minimize context OpenClaw is sending per request.

That’s when the focus shifted from:

“Optimizing tool output”

to

“Budgeting the entire runtime.”

Context Is a Budget, Not a Bucket

Every request includes:

  • System prompts
  • Agent definitions
  • Tool descriptions
  • Prior turns
  • Memory injections
  • Tool outputs

Even well-structured systems can quietly expand.

And expansion increases:

  • Cost
  • Latency
  • Drift
  • Hallucination risk

Bigger context is not better context.

Better context is smaller and intentional.
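One way to make "budget, not bucket" concrete is a hard per-request allowance for conversation history. The sketch below is illustrative only: the ~4-characters-per-token estimate is a rough heuristic, and the budget number would be tuned per model.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fit_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # older messages are the first to go
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

The point is less the heuristic than the shape of the decision: history is admitted against a budget, not carried forward by default.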

What Changed

The new design principles:

  • Include only the tools needed for the current turn.
  • Inject only relevant memory, not full context packs.
  • Trim system instructions to what the task requires.
  • Avoid carrying forward redundant conversation history.
  • Minimize tool schemas where possible.

The goal wasn’t to shrink capability.

It was to shrink waste.
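The principles above can be sketched as a per-turn context assembler. Everything here is a hypothetical shape, not OpenClaw's runtime: the tool registry, keyword-based relevance checks, and the memory cap are all assumed for illustration.

```python
# Hypothetical tool registry: name -> short schema description.
TOOLS = {
    "web_search": "search(query: str) -> results",
    "calendar": "events(day: str) -> list",
    "files": "read(path: str) -> str",
}

def select_tools(user_message: str) -> dict:
    """Naive relevance filter: include a tool only if the turn hints at it."""
    keywords = {"web_search": ("search", "look up"),
                "calendar": ("meeting", "schedule"),
                "files": ("file", "read")}
    return {name: schema for name, schema in TOOLS.items()
            if any(k in user_message.lower() for k in keywords[name])}

def build_context(system: str, user_message: str, memories: list[str]) -> dict:
    """Assemble a lean request: trimmed system prompt, relevant tools and memory only."""
    relevant_memory = [m for m in memories if any(
        word in m.lower() for word in user_message.lower().split())]
    return {
        "system": system.strip(),          # only what the task requires
        "tools": select_tools(user_message),  # only tools for this turn
        "memory": relevant_memory[:3],     # cap injections, no full context packs
        "message": user_message,
    }
```

A real implementation would use something smarter than substring matching, but the structure is the same: every component earns its place in the request instead of being included by default.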

That also surfaced another problem: Telegram was great for interaction, but not for verbose output.
