OpenClaw Series

Minimizing the Context Window

Web search wasn’t the only source of bloat. The runtime itself was oversharing.

Part 4 of 7 · Feb 20, 2026
Cinematic image of data streams being narrowed before entering a machine core

Why Stop at Optimizing Web Results?

After minimizing Tavily output, a more uncomfortable realization surfaced.

Web search wasn’t the biggest source of token inflation.

OpenClaw was.

Each request to Anthropic was carrying:

  • Tool schemas
  • System instructions
  • Prior messages
  • Memory injections
  • Structured metadata
  • Full JSON responses

Individually reasonable.

Collectively expensive.

The First Prompt

… I want to try and minimize information being passed to Anthropic from a web search result. Can we look at implementing a translation layer to minimize JSON responses?

That was the technical starting point.

But the deeper issue wasn’t JSON.

It was context accumulation.

The Translation Layer

The solution began with a translation layer.

Instead of passing raw JSON from tools directly into the model:

  • Extract only relevant fields.
  • Remove nested structures.
  • Flatten the payload.
  • Normalize the format.
  • Strip redundant keys.

Anthropic didn’t need the entire response schema.

It needed the distilled meaning.

Translation became compression.

In practice, that meant dropping everything that made the payload feel like a raw API dump:

  • navigation junk
  • extra metadata
  • nested objects that added no decision value
  • anything the model would only paraphrase back to me anyway
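The steps above can be sketched as a small translation layer. This is a minimal illustration, not OpenClaw's actual code: the field names (`title`, `url`, `content`) are assumed stand-ins for whatever a given tool marks as decision-relevant.

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into dotted keys."""
    out = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = obj
    return out

# Hypothetical allow-list of decision-relevant fields.
KEEP = ("title", "url", "content")

def translate(raw: dict) -> dict:
    """Flatten a raw tool response, then keep only allow-listed fields."""
    flat = flatten(raw)
    return {k: v for k, v in flat.items() if k.split(".")[-1] in KEEP}
```

Everything else in the payload (scores, navigation metadata, nested housekeeping objects) simply never reaches the model, because the filter is an allow-list rather than a deny-list.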

The Second Realization

I hit usage limits.

Not because the system was unstable.

Because it was verbose.

I paused for two hours and asked a different question:

I want to look at how I can minimize context OpenClaw is sending per request.

That’s when the focus shifted from:

“Optimizing tool output”

to

“Budgeting the entire runtime.”

Context Is a Budget, Not a Bucket

Every request includes:

  • System prompts
  • Agent definitions
  • Tool descriptions
  • Prior turns
  • Memory injections
  • Tool outputs

Even well-structured systems can quietly expand.

And expansion increases:

  • Cost
  • Latency
  • Drift
  • Hallucination risk

Bigger context is not better context.

Better context is smaller and intentional.
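One way to make "budget, not bucket" concrete is a hard per-request allowance for conversation history. The sketch below is illustrative only: the ~4-characters-per-token estimate is a rough heuristic, and the budget number would be tuned per model.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fit_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # older messages are the first to go
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

The point is less the heuristic than the shape of the decision: history is admitted against a budget, not carried forward by default.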

What Changed

The new design principles:

  • Include only the tools needed for the current turn.
  • Inject only relevant memory, not full context packs.
  • Trim system instructions to what the task requires.
  • Avoid carrying forward redundant conversation history.
  • Minimize tool schemas where possible.

The goal wasn’t to shrink capability.

It was to shrink waste.
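The principles above can be sketched as a per-turn context assembler. Everything here is a hypothetical shape, not OpenClaw's runtime: the tool registry, keyword-based relevance checks, and the memory cap are all assumed for illustration.

```python
# Hypothetical tool registry: name -> short schema description.
TOOLS = {
    "web_search": "search(query: str) -> results",
    "calendar": "events(day: str) -> list",
    "files": "read(path: str) -> str",
}

def select_tools(user_message: str) -> dict:
    """Naive relevance filter: include a tool only if the turn hints at it."""
    keywords = {"web_search": ("search", "look up"),
                "calendar": ("meeting", "schedule"),
                "files": ("file", "read")}
    return {name: schema for name, schema in TOOLS.items()
            if any(k in user_message.lower() for k in keywords[name])}

def build_context(system: str, user_message: str, memories: list[str]) -> dict:
    """Assemble a lean request: trimmed system prompt, relevant tools and memory only."""
    relevant_memory = [m for m in memories if any(
        word in m.lower() for word in user_message.lower().split())]
    return {
        "system": system.strip(),          # only what the task requires
        "tools": select_tools(user_message),  # only tools for this turn
        "memory": relevant_memory[:3],     # cap injections, no full context packs
        "message": user_message,
    }
```

A real implementation would use something smarter than substring matching, but the structure is the same: every component earns its place in the request instead of being included by default.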

That also surfaced another problem: Telegram was great for interaction, but not for verbose output.
