OpenClaw Money-Saving Strategy: Saving Two Thousand a Month - What Am I Doing Right?
Original Article Title: Why My OpenClaw Sessions Burned 21.5M Tokens in a Day (And What Actually Fixed It)
Original Article Author: MOSHIII
Translation: Peggy, BlockBeats
Editor's Note: As Agent applications are rapidly adopted, many teams have run into a seemingly anomalous phenomenon: the system appears to be running smoothly, yet token costs keep climbing unnoticed. This article shows that in a real OpenClaw workload, the cost explosion often stems not from user input or model output but from overlooked cached prefix replay: the model re-reads a large historical context on every call, leading to significant token consumption.
Using concrete session data, the article demonstrates how large intermediate artifacts such as tool outputs, browser snapshots, and JSON logs are continuously written into the historical context and repeatedly re-read in the agent loop.
Through this case study, the author lays out a clear optimization path, from context structure design and tool output management to compaction configuration. For developers building Agent systems, this is not only a troubleshooting record but also a practical money-saving playbook.
Below is the original article:
I analyzed a real OpenClaw workload and discovered a pattern that I believe many Agent users will recognize:
The sessions look "active."
The replies appear normal.
But token consumption suddenly explodes.
Here is a breakdown of the structure, the root cause, and a practical fix path.
TL;DR
The biggest cost driver is not overly long user messages. It is the massive cached prefix being repeatedly replayed.
From the session data:
Total tokens: 21,543,714
cacheRead: 17,105,970 (79.40%)
input: 4,345,264 (20.17%)
output: 92,480 (0.43%)
In other words, the majority of the cost of inference is not in processing new user intent, but in repeatedly reading a massive historical context.
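The split above can be checked with simple arithmetic over the figures quoted from the session data:

```python
# Token split from the session data quoted above.
cache_read = 17_105_970
fresh_input = 4_345_264
output = 92_480

total = cache_read + fresh_input + output
assert total == 21_543_714

# Shares of total cost, in percent (rounded to two decimals).
print(round(100 * cache_read / total, 2))   # 79.4
print(round(100 * fresh_input / total, 2))  # 20.17
print(round(100 * output / total, 2))       # 0.43
```

Nearly four fifths of every dollar is spent re-reading cached context, not producing anything new.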
The "Wait, Why?" Moment
I originally thought high token usage came from: very long user prompts, extensive output generation, or expensive tool invocations.
But the predominant pattern is:
input: hundreds to thousands of tokens
cacheRead: 170k to 180k tokens per call
In other words, the model is rereading the same massive stable prefix every round.
Data Scope
I analyzed data at two levels:
1. Runtime logs
2. Session transcripts
It's worth noting that:
Runtime logs are primarily used to observe behavioral signals (e.g., restarts, errors, configuration issues)
Precise token counts come from the usage field in session JSONL
Scripts used:
scripts/session_token_breakdown.py
scripts/session_duplicate_waste_analysis.py
Analysis files generated:
tmp/session_token_stats_v2.txt
tmp/session_token_stats_v2.json
tmp/session_duplicate_waste.txt
tmp/session_duplicate_waste.json
tmp/session_duplicate_waste.png
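The author's scripts are not reproduced here, but the core aggregation is easy to sketch. This is a minimal version of the kind of pass `session_token_breakdown.py` performs, assuming each JSONL line is a JSON object that may carry a `usage` dict with `input`, `output`, and `cacheRead` counts (the field names are inferred from the stats in this article, not confirmed):

```python
import json
from collections import Counter
from pathlib import Path

def session_token_totals(sessions_dir: str) -> Counter:
    """Sum per-category token usage across all session JSONL files.

    Assumes each line is a JSON object that may carry a `usage` dict
    with `input`, `output`, and `cacheRead` counts (field names are
    an assumption based on the stats quoted in this article).
    """
    totals = Counter()
    for path in Path(sessions_dir).glob("*.jsonl"):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue  # skip blank lines
            usage = json.loads(line).get("usage") or {}
            for key in ("input", "output", "cacheRead"):
                totals[key] += usage.get(key, 0)
    totals["total"] = totals["input"] + totals["output"] + totals["cacheRead"]
    return totals
```

Records without a `usage` field (events, errors) contribute nothing, which keeps the totals aligned with billed tokens.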
Where is the Token Actually Being Consumed?
1) Session Concentration
There is one session that consumes significantly more than others:
570587c3-dc42-47e4-9dd4-985c2a50af86: 19,204,645 tokens
This is followed by a sharp drop-off:
ef42abbb-d8a1-48d8-9924-2f869dea6d4a: 1,505,038
ea880b13-f97f-4d45-ba8c-a236cf6f2bb5: 649,584
2) Behavior Concentration
The tokens mainly come from:
toolUse: 16,372,294
stop: 5,171,420
The issue is primarily with tool call chain loops rather than regular chat.
3) Time Concentration
The token peaks are not random but rather concentrated in a few time slots:
2026-03-08 16:00: 4,105,105
2026-03-08 09:00: 4,036,070
2026-03-08 07:00: 2,793,648
What Exactly Is in the Massive Cache Prefix?
It's not the conversation content but mainly large intermediate artifacts:
Massive toolResult data blocks
Lengthy reasoning/thinking traces
Large JSON snapshots
File lists
Browser fetch data
Sub-Agent conversation logs
In the largest session, the character count is approximately:
toolResult:text: 366,469 characters
assistant:thinking: 331,494 characters
assistant:toolCall: 53,039 characters
Once these contents are retained in the historical context, each subsequent invocation may retrieve them again via a cache prefix.
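That replay dynamic can be illustrated with a toy cost model (all figures below are hypothetical, chosen only to mirror the 170k-per-call prefix observed above):

```python
def loop_token_cost(rounds: int, prefix_tokens: int,
                    new_input: int = 1_000, output: int = 500) -> int:
    """Toy model of an agent loop: every round re-reads the entire
    stable prefix (billed as cacheRead) plus a little new input, and
    each round's output is appended to the prefix for the next round.
    All figures are illustrative, not measured."""
    total = 0
    for _ in range(rounds):
        total += prefix_tokens + new_input + output
        prefix_tokens += output  # this round's output joins the history
    return total

# A ~170k-token prefix replayed over 100 rounds dominates the bill,
# even though each round adds only ~1.5k genuinely new tokens.
print(loop_token_cost(rounds=100, prefix_tokens=170_000))
```

The total grows linearly in the prefix size times the round count, which is why a single oversized artifact written early in a session is so expensive.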
Specific Example (from session file)
Notably large context blocks appear at the following locations:
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:70
Large Gateway JSON Log (approx. 37,000 characters)
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:134
Browser Snapshot + Security Encapsulation (approx. 29,000 characters)
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:219
Large File List Output (approx. 41,000 characters)
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:311
session/status Status Snapshot + Large Prompt Structure (approx. 30,000 characters)
"Duplicate Content Waste" vs. "Cache Replay Burden"
I also measured the duplicate content ratio within a single invocation:
Approximate duplication ratio: 1.72%
It does exist but is not the primary issue.
The real problem is that the absolute volume of the cache prefix is too large.
The structure: a massive historical context, re-read on every round of invocation, with only a small amount of new input stacked on top.
Therefore, the optimization focus is not deduplication but context structure design.
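For reference, the 1.72% figure is the kind of number a chunk-hashing pass produces. A minimal sketch (the chunking scheme and size are assumptions, not the author's `session_duplicate_waste_analysis.py`):

```python
from collections import Counter

def duplicate_ratio(text: str, chunk: int = 200) -> float:
    """Rough duplicate-content ratio: split the context into fixed-size
    chunks and count the characters belonging to repeated chunks.
    (Chunk size and scheme are assumptions, not the author's script.)"""
    chunks = [text[i:i + chunk] for i in range(0, len(text), chunk)]
    counts = Counter(chunks)
    # Each extra copy of a chunk beyond the first counts as waste.
    dup_chars = sum(len(c) * (n - 1) for c, n in counts.items() if n > 1)
    return dup_chars / len(text) if text else 0.0
```

A low ratio like 1.72% confirms the point above: the context is mostly non-duplicated, it is simply far too large.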
Why Is the Agent Loop Particularly Prone to This Issue?
Three mechanisms overlapping:
1. A large amount of tool output is written to historical context
2. Tool looping generates a large number of short interval calls
3. Minimal prefix changes → cache is re-read every time
If context compaction does not trigger reliably, the issue escalates quickly.
Most Critical Remediation Strategies (by impact)
P0—Avoid stuffing massive tool output into long-lived context
For oversized tool output:
· Keep summary + reference path / ID
· Write original payload to a file artifact
· Do not retain the full original text in chat history
Priority to limit these categories:
· Large JSON
· Long directory lists
· Browser full snapshots
· Sub-Agent full transcripts
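The P0 strategy above can be sketched as a small wrapper around tool output handling. The artifact directory, the 2,000-character inline budget, and the function name are all illustrative assumptions, not OpenClaw internals:

```python
import hashlib
from pathlib import Path

ARTIFACT_DIR = Path("tmp/artifacts")  # illustrative location
MAX_INLINE_CHARS = 2_000              # illustrative budget

def shrink_tool_result(text: str) -> str:
    """Keep small tool outputs inline; spill oversized ones to a file
    artifact, leaving only a summary plus a reference path in history."""
    if len(text) <= MAX_INLINE_CHARS:
        return text
    ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(text.encode()).hexdigest()[:12]
    ref = ARTIFACT_DIR / f"{name}.txt"
    ref.write_text(text)  # full payload preserved on disk
    head = text[:500]     # short preview stays inline
    return (f"[tool output truncated: {len(text)} chars total; "
            f"full payload at {ref}]\n{head}")
```

The agent can still reach the full payload on demand (by reading the referenced file), but the long-lived context only ever carries a few hundred characters per tool call.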
P1—Ensure compaction mechanism truly takes effect
In this dataset, a configuration compatibility issue came up repeatedly: an invalid compaction key.
This will silently disable optimization mechanisms.
Correct approach: use only version-compatible configurations
Then verify:
openclaw doctor --fix
and check the startup logs to confirm compaction is accepted.
P1—Reduce reasoning text persistence
Avoid having long reasoning texts replayed round after round.
In production: save brief summaries instead of the complete reasoning.
P2—Improve prompt caching design
The goal is not to maximize cacheRead; it is to apply caching to compact, stable, high-value prefixes.
Recommendations:
· Put stable rules into system prompt
· Avoid putting unstable data under stable prefixes
· Avoid injecting large amounts of debug data each round
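The three recommendations above amount to a prompt-assembly ordering rule. A minimal sketch (the message shape is generic and illustrative, not any specific provider's API):

```python
def build_messages(system_rules: str, history: list, volatile_debug: str,
                   user_msg: str) -> list:
    """Order the prompt so the cacheable prefix stays compact and stable:
    stable rules first, compacted history next, and volatile data only at
    the tail, where it cannot invalidate the cached prefix.
    (The message shape is illustrative, not a specific provider's API.)"""
    return (
        [{"role": "system", "content": system_rules}]   # stable -> cacheable
        + history                                       # compacted history
        + [{"role": "user",
            "content": f"{user_msg}\n\n[debug]\n{volatile_debug}"}]  # volatile tail
    )
```

Because prefix caching matches from the start of the prompt, anything unstable placed early forces a re-tokenization of everything after it; pushing volatile data to the tail keeps the cache hit cheap and small.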
Implementation Stop-Loss Plan (if I were to tackle it tomorrow)
1. Identify the session with the highest cacheRead percentage
2. Run /compact on runaway sessions
3. Add truncation + artifacting to tool outputs
4. Rerun token stats after each modification
Focus on tracking four KPIs:
cacheRead / totalTokens
toolUse avgTotal/call
Calls with >= 100k tokens
Largest single session's share of total tokens
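The four KPIs can be computed from the same per-call usage records. A sketch, assuming (as before) that each record carries `input`, `output`, `cacheRead`, and `session` fields:

```python
from collections import Counter

def kpi_report(calls: list) -> dict:
    """Compute the four tracked KPIs from per-call usage records.
    Each record is assumed (for illustration) to have `input`, `output`,
    `cacheRead`, and `session` fields."""
    per_call = [c["input"] + c["output"] + c["cacheRead"] for c in calls]
    total = sum(per_call)
    cache = sum(c["cacheRead"] for c in calls)
    per_session = Counter()
    for c, t in zip(calls, per_call):
        per_session[c["session"]] += t
    return {
        "cacheRead_share": cache / total if total else 0.0,
        "avg_tokens_per_call": total / len(calls) if calls else 0.0,
        "calls_over_100k": sum(1 for t in per_call if t >= 100_000),
        "max_session_share": max(per_session.values()) / total if total else 0.0,
    }
```

Rerunning this after each change makes it obvious whether a fix actually moved the needle or merely shuffled tokens between categories.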
Success Signals
If the optimization is successful, you should see:
A noticeable reduction in calls with 100k+ tokens
A decrease in cacheRead percentage
A decrease in toolUse call weight
A decrease in the dominance of individual sessions
If these metrics do not change, your context policies are still too loose.
Reproducibility Experiment Command
python3 scripts/session_token_breakdown.py 'sessions' \
--include-deleted \
--top 20 \
--outlier-threshold 120000 \
--json-out tmp/session_token_stats_v2.json \
> tmp/session_token_stats_v2.txt
python3 scripts/session_duplicate_waste_analysis.py 'sessions' \
--include-deleted \
--top 20 \
--png-out tmp/session_duplicate_waste.png \
--json-out tmp/session_duplicate_waste.json \
> tmp/session_duplicate_waste.txt
Conclusion
If your Agent system appears to be working fine but costs are continually rising, you may want to first check for one issue: Are you paying for new inferences or for large-scale replay of old contexts?
In my case, the majority of costs actually came from context replays.
Once you realize this, the solution becomes clear: Strictly control the data entering long-lived contexts.