Is Polymarket's Pricing Accurate? I Simulated a Crisis with 200 Agents to Find Out

By: blockbeats|2026/03/18 13:00:01

Original Title: how I run 200 AI agents on the Hormuz crisis with Mirofish, and compare it to Polymarket
Original Author: The Smart Ape
Translation: Peggy, BlockBeats

Editor's Note: When AI begins to simulate public opinion, the way we predict events is itself quietly changing.

This article documents an experiment on the situation around the Strait of Hormuz. The author used MiroFish to build a simulation of 200 agents in which governments, media, energy companies, traders, and ordinary people share a simulated social network and form judgments through continuous interaction, debate, and information sharing. The group's conclusions were then compared with Polymarket's market pricing.

The two did not agree. The group discussion was more optimistic overall, while the market was significantly more pessimistic; in free-form posting, the few pessimists came closest to the market's pricing; and once placed in an interview setting, almost all agents converged on a more moderate, cooperative answer.

This kind of division is not unfamiliar. In the real world, public statements often tend to be stable and optimistic, while true risk assessment is hidden in actions and informal expressions. In other words, what people say, what they think, and how they bet money are often three different systems.

In such a structure, the most valuable signal often comes not from consensus but from the unconventional voices in the noise.

The following is the original text:

I used MiroFish to simulate the situation in the Strait of Hormuz over the next few weeks. The tool is well suited to this kind of question because it can run highly complex scenario analysis: it places multiple participants, each with its own role and incentives, into the same system and lets these agents continuously compete, debate, and gradually converge on a consensus-like result.


Here are the specific steps I took to run this simulation and the results I ultimately obtained. Anyone can reproduce it; the key is just knowing which steps to take.

First, MiroFish is an open-source project from a Chinese research team. After you input a batch of documents into it, it will first build a knowledge graph, then generate different agent personalities based on this graph, and then put these agents into a simulated Twitter environment. In this environment, they will post, retweet, comment, like, and argue with each other. After the simulation ends, you can also interview each agent one by one to see their respective positions and reasoning processes.

When you feed it a crisis scenario, it generates a debate around that event; from that debate, you can then distill a prediction.

I pointed it at an open Polymarket question: will maritime shipping through the Strait of Hormuz return to normal by the end of April 2026?

So, I fed all this information to MiroFish and generated 200 agent roles—including government, media, military, energy companies, traders, and ordinary citizens—and had them debate for 7 simulated days. Finally, I compared their output with market pricing.

The overall setup was as follows:

· Model: GPT-4o mini, chosen as the best balance of cost and performance for a 200-agent run

· Memory system: Zep Cloud, used to store agent memories and knowledge graphs

· Simulation engine: OASIS (a Twitter clone environment provided by Camel-AI)

· Hardware: Mac mini M4 Pro, 24GB RAM

· Runtime: Approximately 49 minutes to complete 100 simulation rounds

· Cost: API calls around $3 to $5

· Seed Material: A 5800-character briefing sourced from Wikipedia, CNBC, Al Jazeera, Forbes, Reuters, including a military timeline, blockade status, oil prices, economic losses, diplomatic efforts, and factors related to a $3.2 trillion GCC investment. In other words, all core information needed for the agents to form judgments was included.
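The runtime and cost figures above translate into a rough per-round budget. The sketch below only does that arithmetic; it assumes the API cost is spread evenly over every agent-round, which is a simplification, since agents follow activity schedules and do not all call the model in every round.

```python
# Figures taken from the setup list: 100 rounds, 200 agents, ~49 minutes, $3-$5.
RUNTIME_MINUTES = 49
ROUNDS = 100
AGENTS = 200
COST_RANGE_USD = (3.0, 5.0)


def seconds_per_round(runtime_minutes: float, rounds: int) -> float:
    """Average wall-clock seconds spent on one simulation round."""
    return runtime_minutes * 60 / rounds


def cost_per_agent_round(cost_usd: float, agents: int, rounds: int) -> float:
    """Total API cost spread evenly over every (agent, round) pair."""
    return cost_usd / (agents * rounds)


if __name__ == "__main__":
    print(f"{seconds_per_round(RUNTIME_MINUTES, ROUNDS):.1f} s per round")
    for cost in COST_RANGE_USD:
        per_step = cost_per_agent_round(cost, AGENTS, ROUNDS)
        print(f"${per_step:.5f} per agent-round at ${cost:.0f} total")
```

With the reported numbers this works out to about 29.4 seconds per round and roughly $0.00015 to $0.00025 per agent-round, a handy baseline when estimating what more agents or rounds would cost.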

How to Replicate This Workflow (Step-by-Step Guide)

If you want to run this process yourself, here are the complete steps I took. The entire process takes about 2 hours to set up, with API costs around $3 to $5; increasing the number of rounds or agents will further increase the cost.

What You'll Need

· Python 3.12 (do not use 3.14, as tiktoken will throw an error on this version)

· Node.js 22 or later

· An OpenAI API key (GPT-4o mini is cheap enough and suits this scenario)

· A Zep Cloud account (the free tier is enough for small-scale simulations)

· A machine with decent memory. I used a Mac mini M4 Pro with 24GB of memory, but 16GB should also suffice
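Before installing, a small preflight script can catch the version problems listed above. This is a hypothetical helper, not part of MiroFish; it only checks for Python 3.12 and for Node.js 22 or later, and the 3.14 exclusion reflects the author's tiktoken report rather than an official compatibility matrix.

```python
import re
import shutil
import subprocess
import sys

REQUIRED_PYTHON = (3, 12)
MIN_NODE_MAJOR = 22


def python_ok(version) -> bool:
    """True when (major, minor) matches the recommended Python 3.12."""
    return tuple(version[:2]) == REQUIRED_PYTHON


def parse_node_major(version_text: str):
    """Extract the major version from `node --version` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def node_ok(version_text: str) -> bool:
    major = parse_node_major(version_text)
    return major is not None and major >= MIN_NODE_MAJOR


def preflight() -> list:
    """Return human-readable problems; an empty list means the machine looks ready."""
    problems = []
    if not python_ok(sys.version_info):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} found; 3.12 recommended")
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js not found on PATH")
    else:
        out = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
        if not node_ok(out):
            problems.append(f"Node.js {out.strip()} found; 22+ required")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("!", problem)
```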

Step 1: Install MiroFish

Then configure your .env file:

OPENAI_API_KEY=sk-your-key

OPENAI_BASE_URL=link

OPENAI_MODEL=gpt-4o-mini

ZEP_API_KEY=your-zep-key
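A quick check that the .env file actually defines the four keys shown above can save a failed run. The helper below is a hypothetical sketch with a deliberately minimal parser; MiroFish's own configuration loader may read other or differently named variables, so the key list is an assumption taken from this example.

```python
from pathlib import Path

# Keys from the .env example above; MiroFish itself may expect more.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL", "ZEP_API_KEY")


def parse_env(text: str) -> dict:
    """Minimal KEY=VALUE parser: skips blank lines and # comments, strips quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("missing:", missing_keys(parse_env(env_path.read_text())))
```

Note that placeholder values such as `sk-your-key` still count as present; the check only catches keys that were forgotten entirely.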

Step 2: Create a project and upload your seed document

The seed document is the most important part of the whole process, because it determines what the agents know about the current situation. I prepared a brief of about 5800 characters covering a military timeline, blockade status, oil prices, economic losses, diplomatic efforts, and the GCC investment angle, sourced from Wikipedia, CNBC, Al Jazeera, Forbes, and Reuters.
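Because the seed document defines everything the agents know, it is worth confirming that the brief touches each intended topic before building the graph. The sketch below is a crude keyword check; the topic list comes from the brief described above, but the keywords are illustrative assumptions, and a more careful check might ask the LLM itself.

```python
# Topics from the author's brief; the keyword lists are illustrative guesses.
TOPIC_KEYWORDS = {
    "military timeline": ["strike", "navy", "military"],
    "blockade status": ["blockade", "tanker", "transit"],
    "oil prices": ["brent", "oil price", "barrel"],
    "economic losses": ["loss", "insurance", "cost"],
    "diplomatic efforts": ["ceasefire", "talks", "mediation"],
    "GCC investment": ["gcc", "investment", "sovereign"],
}
TARGET_CHARS = 5800  # approximate size of the author's brief


def uncovered_topics(text: str) -> list:
    """Topics for which none of the keywords appears (case-insensitive)."""
    lowered = text.lower()
    return [topic for topic, words in TOPIC_KEYWORDS.items()
            if not any(word in lowered for word in words)]


if __name__ == "__main__":
    sample = "Brent rose after the blockade; ceasefire talks stalled."
    print(len(sample), "of ~", TARGET_CHARS, "characters")
    print("not covered:", uncovered_topics(sample))
```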

Step 3: Generate the ontology

This step tells MiroFish what types of entities it should recognize and what relationships may exist between these entities.

I ended up generating 10 types of entities: country, military, diplomats, commercial entities, media organizations, economic entities, organizations, individuals, infrastructure, prediction markets; and 6 types of relationships. If the automatically generated results are not quite tailored to your scenario, you can also adjust them manually.
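An ontology like this can be pictured as a list of entity types plus typed relations that constrain which entities may connect. In the sketch, the ten entity types come from the run above; the six relation names and their allowed endpoints are hypothetical, since the article does not list them, and this is not MiroFish's actual schema format.

```python
from dataclasses import dataclass

# The ten entity types reported for this run.
ENTITY_TYPES = [
    "Country", "Military", "Diplomat", "CommercialEntity", "MediaOrganization",
    "EconomicEntity", "Organization", "Individual", "Infrastructure", "PredictionMarket",
]


@dataclass(frozen=True)
class RelationType:
    name: str
    source_types: frozenset
    target_types: frozenset


# Six relation types, as in the author's run; names and endpoints are hypothetical.
RELATION_TYPES = [
    RelationType("ALLIED_WITH", frozenset({"Country"}), frozenset({"Country"})),
    RelationType("OPPOSES", frozenset({"Country", "Military"}), frozenset({"Country", "Military"})),
    RelationType("REPRESENTS", frozenset({"Diplomat", "Individual"}), frozenset({"Country", "Organization"})),
    RelationType("REPORTS_ON", frozenset({"MediaOrganization"}), frozenset(ENTITY_TYPES)),
    RelationType("OPERATES", frozenset({"CommercialEntity", "Military"}), frozenset({"Infrastructure"})),
    RelationType("PRICES", frozenset({"PredictionMarket", "EconomicEntity"}), frozenset({"Country", "Infrastructure"})),
]


def is_valid_edge(relation: str, source_type: str, target_type: str) -> bool:
    """Check an extracted edge against the ontology's allowed endpoints."""
    for rel in RELATION_TYPES:
        if rel.name == relation:
            return source_type in rel.source_types and target_type in rel.target_types
    return False  # unknown relation names are rejected
```

Manually adjusting the ontology, as the author suggests, amounts to editing lists like these before extraction runs.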

Step 4: Build the knowledge graph

This step involves using Zep Cloud. MiroFish will send the seed document and ontology to Zep, which will be responsible for entity extraction and graph building.

This process will take approximately one to two minutes. In the end, I obtained a graph containing 65 nodes and 85 edges, connecting elements such as countries, personalities, organizations, and commodities.
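Conceptually, the extracted graph is a set of (source, relation, target) triples, and the reported node and edge counts are simply distinct entities and distinct triples. The triples below are hypothetical examples, not output from Zep, and the counting code illustrates the idea rather than Zep's API.

```python
from collections import Counter

# Hypothetical extracted triples; the real run produced 65 nodes and 85 edges.
TRIPLES = [
    ("Iran", "OPPOSES", "United States"),
    ("IRGC", "OPERATES", "Strait of Hormuz"),
    ("Reuters", "REPORTS_ON", "Strait of Hormuz"),
    ("Reuters", "REPORTS_ON", "Iran"),
    ("Reuters", "REPORTS_ON", "Iran"),  # duplicate extraction, counted once
    ("Polymarket", "PRICES", "Strait of Hormuz"),
]


def graph_stats(triples):
    """Return (node_count, edge_count, top-3 nodes by degree) over distinct triples."""
    nodes, edges, degree = set(), set(), Counter()
    for src, rel, dst in triples:
        nodes.update((src, dst))
        if (src, rel, dst) in edges:
            continue
        edges.add((src, rel, dst))
        degree[src] += 1
        degree[dst] += 1
    return len(nodes), len(edges), degree.most_common(3)


if __name__ == "__main__":
    print(graph_stats(TRIPLES))
```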

Step 5: Generate agents

MiroFish will use the knowledge graph to create a comprehensive persona for each entity, including MBTI personality type, age, country of origin, posting style, emotional triggers, taboo topics, and institutional memory.
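A persona of this kind is essentially a structured record rendered into the agent's system prompt. The field names and prompt wording in this sketch are assumptions for illustration; MiroFish's actual persona schema and prompts may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical persona record mirroring the attributes listed above."""
    name: str
    mbti: str
    age: int
    country: str
    posting_style: str
    emotional_triggers: list = field(default_factory=list)
    taboo_topics: list = field(default_factory=list)
    institutional_memory: list = field(default_factory=list)

    def system_prompt(self) -> str:
        """Render the persona as plain-text instructions for the agent's LLM."""
        lines = [
            f"You are {self.name}, age {self.age}, from {self.country} (MBTI: {self.mbti}).",
            f"Posting style: {self.posting_style}.",
        ]
        if self.emotional_triggers:
            lines.append("You react strongly to: " + ", ".join(self.emotional_triggers) + ".")
        if self.taboo_topics:
            lines.append("Avoid discussing: " + ", ".join(self.taboo_topics) + ".")
        if self.institutional_memory:
            lines.append("You remember: " + "; ".join(self.institutional_memory) + ".")
        return "\n".join(lines)


if __name__ == "__main__":
    manager = Persona(
        name="a supply chain manager",
        mbti="ISTJ",
        age=41,
        country="Germany",
        posting_style="terse and data-heavy",
        emotional_triggers=["freight rate spikes"],
        taboo_topics=["personal politics"],
        institutional_memory=["past tanker seizures in the Gulf"],
    )
    print(manager.system_prompt())
```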

Initially, I generated 43 core agents from the knowledge graph. The system can then expand these core roles to whatever total you want. I set the total to 200, adding more diverse civilian roles such as crypto traders, airline pilots, professors, students, and social activists.
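The expansion from 43 core agents to 200 can be pictured as padding the roster with civilian archetypes. This sketch uses simple round-robin assignment over the archetypes mentioned in the article; MiroFish most likely generates the extra personas with the LLM, so the function is a hypothetical stand-in.

```python
import itertools

# Civilian archetypes named in the article; round-robin is a hypothetical stand-in.
CIVILIAN_ARCHETYPES = [
    "crypto trader", "airline pilot", "professor",
    "student", "social activist", "supply chain manager",
]


def expand_roster(core_agents, total, archetypes=CIVILIAN_ARCHETYPES):
    """Pad the core agents with numbered civilian roles until `total` is reached."""
    if total < len(core_agents):
        raise ValueError("total must be at least the number of core agents")
    roster = list(core_agents)
    pool = itertools.cycle(archetypes)
    for i in range(total - len(core_agents)):
        roster.append(f"{next(pool)} #{i + 1}")
    return roster


if __name__ == "__main__":
    core = [f"core-{i}" for i in range(43)]
    roster = expand_roster(core, 200)
    print(len(roster), roster[43:46])
```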

Step 6: Prepare the simulation environment

This step sets up the complete simulation configuration, including the agents' action schedules, initial seed posts, and time parameters. MiroFish automatically chooses reasonable default settings, such as peak activity hours, downtime, and posting frequencies for different types of agents.

My configuration at the time: 168 simulated hours (7 days) in total, 100 rounds (each round representing 1 hour), the Twitter environment only, and individual activity schedules for different agents.
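With each round standing for one simulated hour, an activity schedule boils down to mapping a round index to an hour of the day and checking it against the agent's active window. The class, the example windows, and the posting rates below are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class ActivitySchedule:
    """Hypothetical per-agent schedule; one round = one simulated hour."""
    active_hours: range           # hours of the day (0-23) when the agent may act
    posts_per_active_hour: float  # expected posts in an active hour

    def is_active(self, round_index: int, start_hour: int = 0) -> bool:
        hour_of_day = (start_hour + round_index) % 24
        return hour_of_day in self.active_hours

    def active_rounds(self, rounds: int, start_hour: int = 0) -> int:
        return sum(self.is_active(r, start_hour) for r in range(rounds))

    def expected_posts(self, rounds: int, start_hour: int = 0) -> float:
        return self.active_rounds(rounds, start_hour) * self.posts_per_active_hour


# Illustrative windows: media active 06:00-22:59, diplomats during office hours only.
media = ActivitySchedule(active_hours=range(6, 23), posts_per_active_hour=0.8)
diplomat = ActivitySchedule(active_hours=range(9, 18), posts_per_active_hour=0.2)

if __name__ == "__main__":
    for name, sched in (("media", media), ("diplomat", diplomat)):
        print(name, sched.active_rounds(100), round(sched.expected_posts(100), 1))
```

Over 100 rounds starting at midnight, the media-style schedule is active in 68 rounds and the diplomat-style schedule in 36.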

Step 7: Start the simulation

Then, it's time to wait. On my end, running 200 agents and 100 rounds of simulation with GPT-4o mini took approximately 49 minutes. You can monitor the progress through an API or directly view the logs.

Throughout the entire process, the agents operate autonomously: they observe the timeline and decide whether to post, quote-retweet, retweet, like, or simply scroll through the feed, all without human intervention.
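The observe-then-act cycle can be sketched as a loop in which every agent, each round, looks at the latest slice of a shared feed and picks one action. In MiroFish the LLM makes that decision; here a weighted random policy with made-up weights stands in for it, so the counts it produces say nothing about the real run.

```python
import random
from collections import Counter

ACTIONS = ["post", "quote", "retweet", "like", "refresh", "idle"]
# Made-up weights for illustration only; a real agent asks the LLM instead.
WEIGHTS = [0.05, 0.25, 0.03, 0.03, 0.60, 0.04]
FEED_WINDOW = 20  # hypothetical: how many recent items an agent "scrolls"


def random_policy(rng, visible_feed):
    """Stand-in for the LLM decision; ignores the feed and samples an action."""
    return rng.choices(ACTIONS, weights=WEIGHTS, k=1)[0]


def run_rounds(num_agents, num_rounds, policy=random_policy, seed=0):
    """Let every agent act once per round; return action counts and the feed."""
    rng = random.Random(seed)
    feed = []  # shared timeline of (round, agent_id, action)
    tally = Counter()
    for round_index in range(num_rounds):
        for agent_id in range(num_agents):
            action = policy(rng, feed[-FEED_WINDOW:])
            tally[action] += 1
            # Only content-producing actions add items to the timeline.
            if action in ("post", "quote", "retweet"):
                feed.append((round_index, agent_id, action))
    return tally, feed


if __name__ == "__main__":
    counts, timeline = run_rounds(num_agents=200, num_rounds=100)
    print(counts.most_common())
```

Swapping `random_policy` for a function that prompts a model with the persona and the visible feed is where the real behavior would come from.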

Step 8 (optional): Interview agents

After the simulation is complete, the system enters command mode. At this point, you can interview specific agents individually or interview all agents at once.
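When interviewing many agents, answers come back as free text, so the probability has to be extracted and refusals recorded. The sketch below uses the question the author asked and a simple regular expression; the parsing rule (first percentage between 0 and 100, otherwise no answer) is an assumption, not MiroFish's interview API.

```python
import re

QUESTION = ("What is the probability, from now until the end of April 2026, of maritime "
            "traffic in the Strait of Hormuz returning to normal (0–100%)?")

# A number of up to three digits, optional decimals, then a percent sign.
_PERCENT = re.compile(r"(\d{1,3}(?:\.\d+)?)\s*%")


def parse_probability(answer: str):
    """Return the first percentage in [0, 100] found in a free-text answer, else None."""
    for match in _PERCENT.finditer(answer):
        value = float(match.group(1))
        if 0 <= value <= 100:
            return value
    return None  # treated as "declined to answer"


if __name__ == "__main__":
    for reply in ("I'd put it around 65%.", "I would rather not speculate."):
        print(parse_probability(reply))
```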

Analysis

MiroFish will first read the seed document and automatically generate the ontology structure (comprising 10 entity types and 6 relation types); it will then extract a knowledge graph based on these definitions (containing 65 nodes and 85 edges). Building on this foundation, it will create a complete persona for each entity, including MBTI personality type, age, country of origin, posting style, emotional triggers, and institutional memory elements.

Ultimately, 43 core agents were generated from the knowledge graph and then expanded to a total of 200, adding a more diverse set of civilian roles to improve the simulation's diversity and realism.

The specific breakdown is as follows:

· 140 civilian agents: crypto traders, airline pilots, supply chain managers, students, social activists, professors, etc.

· 16 diplomatic/governmental roles: Iranian Foreign Minister, Saudi Foreign Minister, Omani Foreign Minister, Bahraini Prime Minister, Chinese Foreign Minister, EU, UN, etc.

· 15 media organizations: Reuters, CNN, Bloomberg, Al Jazeera, BBC, Fox, Wall Street Journal, etc.

· 10 energy/shipping-related: OPEC, Platts, QatarEnergy, Aramco, Maersk, etc.

· 7 financial institutions: Polymarket, Kalshi, Goldman Sachs, JPMorgan, Citadel, ADIA, etc.

· 2 military/political figures: Trump, IRGC Commander

During the 7-day (100-round) simulation, the following were generated:

· 1,888 posts

· 6,661 behavior traces (capturing all actions)

· 1,611 quote retweets (agents responding to each other)

· 4,051 refreshes (merely viewing the feed)

· 311 idles (choosing to observe)

· 208 likes and 207 retweets

· 70 original viewpoints (new independent stances or judgments)

Overall, the system produced not just raw content but something closer to a simulation of social behavior. Most of the time, agents were digesting information and interacting rather than producing new output. This resembles the behavioral distribution of a real public-opinion environment: a limited amount of original content layered under extensive repetition, jockeying, and emotional feedback.

Agents spend most of their time reading and quoting others' viewpoints rather than actively creating new content.
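The reported counts make this concrete. The sketch computes each listed action's share of the 6,661 behavior traces; the listed categories add up to 6,458 rather than the full total, so the shares are not meant to sum to 100%.

```python
# Counts reported for the 100-round run.
TRACES_TOTAL = 6661
COUNTS = {
    "refresh": 4051,
    "quote_retweet": 1611,
    "idle": 311,
    "like": 208,
    "retweet": 207,
    "original_viewpoint": 70,
}


def shares(counts, total):
    """Percentage of `total` taken by each category, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in shares(COUNTS, TRACES_TOTAL).items():
        print(f"{name:>20}: {pct:5.1f}%")
```

Refreshes alone come to about 60.8% of all traces and quote retweets to about 24.2%, while original viewpoints are roughly 1%.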

The entire group exhibits a clear bias in emotional propagation: optimistic viewpoints are more easily amplified and shared, while pessimistic judgments, even if logically closer to reality, tend to spread less and have weaker voices.

What's even more interesting is that 19 agents spontaneously provided specific probability assessments during their posting, not because they were asked to but as a natural evolution of the discussion.

The spontaneously formed group's average probability is 47.9%, while the Polymarket market gives a probability of 31%, resulting in a 16.9 percentage point difference between the two.
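It helps to separate the absolute gap from the relative one. The 16.9-point figure is a difference in percentage points; relative to the market's 31%, the group's estimate is about 54.5% higher. The helper below is a small sketch of that arithmetic using the averages reported above; `compare_estimates` shows how the same comparison would start from individual agent estimates.

```python
from statistics import mean


def compare_to_market(agent_mean: float, market: float):
    """Gap in percentage points and relative gap in percent, one decimal each."""
    pp_gap = agent_mean - market
    relative_pct = 100 * pp_gap / market
    return round(pp_gap, 1), round(relative_pct, 1)


def compare_estimates(estimates, market: float):
    """Same comparison, starting from individual agent estimates (in %)."""
    return compare_to_market(mean(estimates), market)


if __name__ == "__main__":
    print(compare_to_market(47.9, 31.0))  # (16.9, 54.5)
```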

During the simulation process, some agents even changed their stance over 100 rounds of interaction.

Following the simulation, I used MiroFish's interview feature to ask the same question to 43 core agents: What is the probability, from now until the end of April 2026, of maritime traffic in the Strait of Hormuz returning to normal (0–100%)?

The results: 31 of the 43 agents gave specific values, while the other 12 declined to answer. Notably, the most cautious voices tended to censor themselves rather than make explicit predictions, which closely resembles how such institutions behave in real life.

The average value for each category is above 60%: Military at 75%, Media at 69%, Energy at 66%, Finance at 65%, Diplomacy at 61%. The market's figure stands at 31.5%.

The organically evolved group result and the interview result paint two starkly different pictures.

This is the most critical finding.

Interview results skew optimistic. When agents post freely, the bears (pessimists) are often louder and more specific; in one-on-one interviews, however, the model's preference for cooperation pushes almost everyone into the 60%–70% range.

Organic results are more reliable. When a financial advisor posts a 65% estimate in the middle of a heated discussion, that judgment was formed through interaction; an agent answering an interview question is essentially pattern matching.

Ironically, the pessimists in free-form posting turn out to be the best predictors. Among the 7 agents in the simulation who gave a probability of 30% or less (Iranian FM, Chinese FM, Kalshi, Platts, an economics professor, an Iranian student, an anti-war activist), the average was 22%, less than 10 percentage points from Polymarket's figure. Expertise + Natural Expression = Closest to the market.
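The selection rule described here, keeping only agents whose organic estimate is at or below 30% and averaging them, is easy to express in code. The agent names and values in the example are hypothetical placeholders, not the seven agents from the run.

```python
from statistics import mean


def bear_signal(estimates: dict, market: float, threshold: float = 30.0):
    """Keep agents at or below `threshold` (%), average them, and measure distance to market."""
    bears = {name: p for name, p in estimates.items() if p <= threshold}
    if not bears:
        return bears, None, None
    bear_mean = mean(bears.values())
    return bears, bear_mean, abs(bear_mean - market)


# Hypothetical organic estimates (percent); not the values from the run.
EXAMPLE = {"agent_a": 20.0, "agent_b": 25.0, "agent_c": 65.0, "agent_d": 70.0, "agent_e": 15.0}

if __name__ == "__main__":
    bears, bear_mean, distance = bear_signal(EXAMPLE, market=31.0)
    print(sorted(bears), bear_mean, distance)
```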

More critically, this is not just an AI phenomenon; real-world actors behave in the same way.

When you interview any national leader about a crisis, they always talk about "our commitment to peace" and "our optimism about a solution." This is a standard script, a must-say for the camera. But look at what they actually do: military deployments, sanctions, asset freezes, divestments. Their actions often tell a completely different story.

The Saudi Crown Prince would tell Reuters "we believe in diplomatic means," while his sovereign wealth fund is eyeing up to $3.2 trillion in U.S. asset allocations. The Iranian President would say "peace is our common goal," yet the Iranian Revolutionary Guard is laying mines in the strait. Trump would say "we'll see," while rejecting every ceasefire proposal.

This simulation inadvertently reproduced the same structural rift. As agents post freely, arguing, responding, and spreading information, the expert group gradually converges on the 20%–30% range: more pessimistic, and closer to reality. But once you bring them into a boardroom and formally ask, "What is your prediction?", they immediately switch to diplomat mode: 65%–70%, noticeably more optimistic.

Free-form posting is closer to private conduct and off-the-record conversation; interview answers are closer to press briefings. If you really want to know what someone thinks, don't ask them directly: watch how they behave when nobody is keeping score.


What's Next

This was just an initial test. The goal was not to produce a definitive prediction but to see which signals in this kind of group simulation are useful, where distortion creeps in, and which parts are worth optimizing.

Some answers are already clear: naturally evolved discussion yields useful signals while interviews do not, the pessimists are the signal source, and GPT-4o mini's cooperative bias is a real problem.

The next experiment will have several upgrades.

First, a larger seed dataset. Instead of a 5800-character brief, it will include over 20 years of historical context: relevant events in the Strait of Hormuz, escalating Iran-U.S. conflicts, past oil crises, GCC diplomatic shifts, and so on. In short, what a real geopolitical analyst would already have in their head before making an assessment.

Second, a stronger model. GPT-4o mini was sufficient for validation at a cost of about $3, but a stronger model should keep each agent closer to its role's own way of thinking, rather than defaulting to "I remain optimistic about dialogue" at critical moments.

Lastly, more agents. 200 is already a good number, but there is room to expand further: more diverse ordinary roles, more regional voices, more edge cases. The more participants, the richer the structure of the discussion, and the more valuable the resulting signal.


