"FutureShow: Can AI Predict the Future? Live Real-World Forecasting"
**⚔️ AI Battle Arena: Competing to Predict Real-World Events**
| Live Battle Rankings | 🎯 Real-World Forecasting | ⚡ Prediction Markets |
Live Demo · Chinese Docs · Report Bug
Click Here: AI Live Future Forecasting
| Rank | Model | Correct/Total | Accuracy | Human Acc | vs Human | Pred Value |
|---|---|---|---|---|---|---|
| 🥇 1 | DeepSeek | 7535/7895 | 95.4% | 97.2% | -1.8% | +0.020 |
| 🥈 2 | GPT-5 | 8010/8661 | 92.5% | 96.9% | -4.5% | -0.041 |
| 🥉 3 | Gemini | 7717/8837 | 87.3% | 97.3% | -9.9% | -0.216 |
* Each model may generate different numbers of predictions due to varying prediction intervals.
* Human accuracy is calculated using the same prediction points as the corresponding model for fair comparison.
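The comparison described in these notes can be sketched in a few lines. This is a minimal illustration, not the project's scoring code: the `PredictionPoint` record and the `score` helper are hypothetical names, and the crowd baseline follows the rule stated in this README (YES when the market probability is above 50%, otherwise NO), evaluated at the same prediction points as the model.

```python
from dataclasses import dataclass


@dataclass
class PredictionPoint:
    """One model prediction on a binary market (hypothetical record layout)."""
    model_says_yes: bool     # the model's YES/NO call
    market_prob_yes: float   # market probability of YES at prediction time
    resolved_yes: bool       # how the market finally resolved


def human_says_yes(market_prob_yes: float) -> bool:
    """Crowd baseline: YES when the market probability exceeds 50%, otherwise NO."""
    return market_prob_yes > 0.5


def score(points: list[PredictionPoint]) -> dict:
    """Compute leaderboard-style columns over one model's prediction points."""
    total = len(points)
    model_correct = sum(p.model_says_yes == p.resolved_yes for p in points)
    human_correct = sum(
        human_says_yes(p.market_prob_yes) == p.resolved_yes for p in points
    )
    accuracy = 100.0 * model_correct / total
    human_acc = 100.0 * human_correct / total
    return {
        "correct": model_correct,
        "total": total,
        "accuracy": accuracy,
        "human_acc": human_acc,
        "vs_human": accuracy - human_acc,  # percentage points
    }
```

Because both accuracies are computed over the same list of points, a model that predicts more often is not compared against a crowd baseline built from a different set of events.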
Round 1 Complete: Results above are from events resolved before the end of 2025. Round 2 is now in progress!
| Metric | Description |
|---|---|
| **Correct** | Number of correct predictions relative to total predictions made on real-world events. |
| **Accuracy** | Prediction Accuracy: (Correct Predictions / Total Predictions) × 100% |
| **Human Acc** | Market Consensus Baseline: Accuracy of crowd wisdom at identical prediction points. Human predictions are derived as YES when market probability > 50%, otherwise NO, representing the collective "Wisdom of the Crowd" benchmark. |
| **vs Human** | Model accuracy minus human accuracy, in percentage points. Negative values mean the model trails the crowd. |
| **Pred Value** | Prediction Value (log-return method): Measures the model's value generation beyond market consensus. |
If prediction is CORRECT: Value = -log(p)
If prediction is INCORRECT: Value = log(p)
where p = market probability for the predicted outcome at prediction time
**Interpretation Guide:**
| Value Range | Market Prob (p) | Meaning |
|---|---|---|
| **+0.1 ~ +0.7** | 50% ~ 90% | Small gain. Model correctly predicted what the market also favored. |
| **+0.7 ~ +2.3** | 10% ~ 50% | Moderate gain. Model correctly made a contrarian prediction. |
| **+2.3 ~ +6.9** | 0.1% ~ 10% | Exceptional gain. Model correctly predicted a very unlikely outcome. |
| **-0.1 ~ -0.7** | 50% ~ 90% | Minor loss. Model followed market consensus but both were wrong. |
| **-0.7 ~ -2.3** | 10% ~ 50% | Moderate loss. Model made a contrarian prediction that failed. |
| **-2.3 ~ -6.9** | 0.1% ~ 10% | Severe loss. Model predicted a very unlikely outcome and was wrong. |
**Theoretical Bounds:** Value ranges from -6.9 to +6.9, based on probability clamp [0.001, 0.999]. In practice, most values fall within ±2.3 (p between 10% and 90%).
The displayed Prediction Value is the Average across all predictions. Positive values indicate the model outperforms market consensus; negative values indicate underperformance.
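As a worked illustration of the log-return formula, the sketch below computes the value of a single prediction and the average over several. It assumes the natural logarithm (which matches the ranges in the interpretation guide, for example -ln(0.5) ≈ 0.69 and -ln(0.1) ≈ 2.3) and applies the [0.001, 0.999] clamp before taking the log. The function names are illustrative, not the project's own.

```python
import math

# Probability clamp stated in the README.
P_MIN, P_MAX = 0.001, 0.999


def prediction_value(p: float, correct: bool) -> float:
    """Log-return value of one prediction.

    p is the market probability of the outcome the model predicted,
    taken at prediction time.
    """
    p = min(max(p, P_MIN), P_MAX)
    return -math.log(p) if correct else math.log(p)


def average_prediction_value(records: list[tuple[float, bool]]) -> float:
    """Mean value over (p, correct) pairs, as shown on the leaderboard."""
    values = [prediction_value(p, correct) for p, correct in records]
    return sum(values) / len(values)
```

For instance, a correct call on an outcome the market priced at 10% is worth about +2.3, while following a 90% favorite that fails costs about 0.1. The clamp caps any single value at roughly ±6.9.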
**Can AI Agents Outthink the Wisdom of the Crowd?**
Prediction markets are one of the most sophisticated mechanisms we have for aggregating collective intelligence. When thousands of participants stake real money on future outcomes, their combined judgment distills into remarkably accurate probability estimates. This "wisdom of the crowd" has frequently outperformed individual experts across many domains.
FutureShow conducts a transparent, ongoing experiment:
This study investigates AI boundaries beyond performance tracking:
FutureShow is an open-source AI benchmarking platform that puts this question to the test. We evaluate frontier AI models against prediction markets, where thousands of participants stake real money on future outcomes, creating some of the most accurate probability estimates available.
Our system operates as a continuous, real-world experiment:
**Market Intelligence**
**🤖 AI Agent Deployment**
**Real-Time Research**
**Transparent Tracking**
🎲 Prediction markets aren't just betting: they aggregate collective intelligence. When people risk real money, their combined judgment creates remarkably accurate forecasts that often outperform individual experts.
🧠 This makes them well suited as AI benchmarks: objective, real-time, and hard to game. No synthetic datasets, no contrived scenarios. Just AI versus the wisdom of crowds, measured transparently.
FutureShow supports any LLM accessible via LiteLLM, including:
| Provider | Models | Configuration |
|---|---|---|
| **OpenAI** | GPT-4o, GPT-5 | openai/gpt-5 |
| **Anthropic** | Claude 4.5 Sonnet, Claude Opus | anthropic/claude-sonnet-4.5 |
| **Google** | Gemini 2.5 Pro, Gemini Ultra | google/gemini-2.5-pro |
| **DeepSeek** | DeepSeek-V3, DeepSeek-R1 | deepseek/deepseek-chat-v3.1 |
| **OpenRouter** | 100+ models | openrouter/provider/model |
Each model runs as an independent agent with:
Agents have access to comprehensive MCP (Model Context Protocol) tools:
┌─────────────────────────────────────────────────────────────────┐
│                        🔧 MCP Tool Suite                        │
├─────────────────────────────────────────────────────────────────┤
│  Market Data            │  Web Search         │  💬 Social      │
│  ├─ list_events         │  ├─ google_web      │  ├─ reddit      │
│  ├─ list_markets        │  ├─ google_news     │  └─ twitter     │
│  ├─ get_market_info     │  └─ exa_semantic    │                 │
│  ├─ get_market_prices   │                     │  💹 Trading     │
│  └─ get_market_history  │  🔢 Utilities       │  ├─ buy         │
│                         │  └─ math_tool       │  └─ sell        │
└─────────────────────────────────────────────────────────────────┘
FutureShow includes a realistic trading simulation:
| View | Description |
|---|---|
| **Forecasts Overview** | Main dashboard showing all prediction markets. Each card displays the event title, market probability, and model predictions, with colored icons indicating YES/NO votes. |
| **Event Detail Page** | Deep dive into a specific market, with full prediction history, AI reasoning trails, probability charts, and final outcomes for closed events. |
| **Model Leaderboard** | Competitive rankings showing accuracy, human baseline comparison, and Prediction Value, which measures how much alpha each model generates vs. market consensus. |
| **⚡ Batch Prediction in Action** | Watch multiple AI agents analyze markets in parallel, with real-time logging, concurrent execution, and automatic result persistence. |
# Clone the repository
git clone https://github.com/HKUDS/FutureShow.git
cd FutureShow
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install with dev dependencies
pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets
Copy the example environment file and fill in your API keys:
cp .env.example .env
Edit .env with your credentials:
# ─────────────────────────────────────────────────────────────
# LLM Provider API Keys (configure at least one)
# ─────────────────────────────────────────────────────────────
DEEPSEEK_API_KEY="sk-xxx" # DeepSeek models
DEEPSEEK_BASE_URL="https://api.deepseek.com/v1"
OPENROUTER_API_BASE="https://openrouter.ai/api/v1"
OPENROUTER_API_KEY="sk-or-xxx" # Access 100+ models via OpenRouter
OPENAI_API_BASE="https://api.openai.com/v1" # Or custom endpoint
OPENAI_API_KEY="sk-xxx" # OpenAI GPT models
# Optional: Additional LLM providers
PRIVATE_API_BASE="" # Custom LLM endpoint
PRIVATE_API_KEY=""
LITE_API_BASE="" # LiteLLM proxy endpoint
LITE_API_KEY=""
# ─────────────────────────────────────────────────────────────
# Search & Intelligence Tools
# ─────────────────────────────────────────────────────────────
SERPER_API_KEY="xxx" # Google Search via Serper.dev
EXA_API_KEY="xxx" # Exa semantic search
RAPIDAPI_KEY="xxx" # RapidAPI for additional services
# ─────────────────────────────────────────────────────────────
# Polymarket (optional, for trading mode)
# See "How to Get Polymarket Credentials" below
# ─────────────────────────────────────────────────────────────
POLYMARKET_API_KEY="" # API key from Polymarket
PRIVATE_KEY="" # Your wallet private key
KEY="" # Same as PRIVATE_KEY
# ─────────────────────────────────────────────────────────────
# Agent Configuration
# ─────────────────────────────────────────────────────────────
AGENT_MAX_STEP=30 # Max reasoning steps per agent
RUNTIME_ENV_PATH=".runtime_env.json" # Runtime state file
DEBUG=1 # Debug mode (1=enabled, 0=disabled)
**Note:** These credentials are only required for Live Trading Mode. The forecasting benchmark works without them.

Wallet private key (`PRIVATE_KEY` & `KEY`): set `PRIVATE_KEY` and `KEY` to the same value (your wallet private key).

API key (`POLYMARKET_API_KEY`): use the provided script to generate your API credentials:
# Make sure PRIVATE_KEY is set in your .env file first
python futureshow/utils/generate_poly_apikey.py
This script uses py-clob-client to call create_or_derive_api_creds(), which derives your API key from your wallet signature.
Alternatively, generate via Polymarket UI:
Start the AI forecasting agents to predict Polymarket events:
# ─── Single Round ───
# Run all enabled models once on current watchlist
python run_forecast_loop.py --once
# ─── Continuous Loop ───
# Run predictions every 6 hours (default), refresh watchlist each round
python run_forecast_loop.py --refresh --interval 21600
# ─── Custom Configuration ───
# Limit to 4 models, target specific month's events
python run_forecast_loop.py \
--limit 4 \
--month 1 \
--year 2025 \
--refresh
# Start event tracker (monitors market status & prices every 30 min)
python run_forecast_trackers.py --interval 1800 &
# Launch the forecasting dashboard
python web_server_pred.py
# Open http://localhost:10086
The dashboard displays:
For advanced users who want to run live trading simulations.
**Prerequisites:** Configure POLYMARKET_API_KEY, PRIVATE_KEY, and KEY in your .env file.
# ─── Run Trading Agents ───
# Single round with trading enabled
python main.py configs/default_config.json
# Continuous trading loop (every 40 minutes)
python run_agents_loop.py \
--interval 2400 \
--overrun-pause 900 \
--config configs/default_config.json
# ─── Track PnL & Launch Trading Dashboard ───
# Start PnL tracking (updates every 10 seconds)
python run_pnl_trackers.py --interval 10 --config configs/default_config.json &
# Launch trading dashboard
python web_server.py
# Open http://localhost:10032
FutureShow provides agents with these Model Context Protocol tools:
| Tool | Function | Parameters | Returns |
|---|---|---|---|
| `list_events` | List active events with category balancing | `query`, `tags_any`, `tags_all`, `exclude_tags`, `categories`, `limit`, `per_category`, `detailed` | Formatted event list with probability, volume, category |
| `list_markets` | List markets with filters | `query`, `tags_any`, `only_open`, `only_active`, `sort`, `trending_only`, `min_liquidity`, `limit` | Market objects with prices |
| `get_polymarket_info_by_slug` | Get market/event details | `slug` | Full market or event object with outcomes, prices |
| `get_market_prices` | Get current prices | `market_slug` | `{outcome: price}` mapping |
| `get_market_history` | Get price history | `market_slug`, `interval` | Historical price series per outcome |
01. trump-2028 | p=0.234 | vol=1523000.0 | OI=892341 | cat=US Politics | Will Trump run in 2028?
tags: Politics, Elections, Trump
time: end=2028-11-15T00:00:00Z | updated=2025-01-20T12:00:00Z
liq: 45000 | comments=234
market0: slug=trump-2028-yes | outcomes=['Yes', 'No'] | prices=[0.234, 0.766] | mid=0.234
02. btc-100k-jan | p=0.891 | vol=982000.0 | OI=456123 | cat=Crypto | Bitcoin above $100k by Jan 31?
...
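If you want to post-process this text output, for example in a test or a notebook, the header line of each entry can be split on its ` | ` separators. The parser below is a hypothetical helper written against the sample shown above only; the real output may contain fields or edge cases (such as a title containing ` | `) that it does not handle.

```python
def parse_event_header(line: str) -> dict:
    """Parse one header line of the sample list_events output into a dict.

    Assumed layout (taken from the sample above):
    "<rank>. <slug> | key=value | ... | <title>"
    """
    fields = [part.strip() for part in line.split(" | ")]
    rank, slug = fields[0].split(". ", 1)
    event = {"rank": int(rank), "slug": slug, "title": fields[-1]}
    # Everything between the slug and the title is a key=value pair.
    for part in fields[1:-1]:
        key, _, value = part.partition("=")
        event[key] = value
    # Convert the numeric fields that appear in the sample.
    event["p"] = float(event["p"])
    event["vol"] = float(event["vol"])
    return event
```

Fields other than `p` and `vol` are left as strings, since the sample does not pin down their types.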
| Tool | Source | Parameters | Returns |
|---|---|---|---|
| `google_web_search` | Google via Serper | `query`, `num_results`, `location`, `hl`, `gl` | Formatted results with Knowledge Graph, Answer Box, organic results |
| `google_news_search` | Google News via Serper | `query`, `num_results`, `hl`, `gl` | News articles with title, snippet, source, date |
| `google_url2text` | Jina AI | `url` | Extracted article text |
| `reddit_search` | Reddit API | `query`, `subreddit`, `sort`, `limit` | Post titles, scores, comments |
| `reddit_post_details` | Reddit API | `post_id` | Full post with top comments |
| `search_tweets` | Twitter/X API | `query`, `max_results` | Recent tweets with engagement |
| Tool | Action | Parameters | Effect |
|---|---|---|---|
| `buy` | Purchase shares | `market_slug`, `outcome`, `cost_usd` | Deduct cash, add shares, simulate slippage |
| `sell` | Sell shares | `market_slug`, `outcome`, `shares` | Add cash, remove shares, simulate slippage |
| `settle` | Settle closed market | `market_slug` | Pay out winning positions at $1/share |
| Tool | Function | Parameters |
|---|---|---|
| `math_tool` | Evaluate mathematical expressions | `expression` |
Agents output predictions in a structured format:
<PREDICTION>market-slug|YES</PREDICTION>
Or for binary markets without explicit slug:
<PREDICTION>YES</PREDICTION>
Supported values: YES, NO, ABSTAIN
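A consumer of agent output needs to pull these tags out of free-form text. The following is a minimal sketch of such a parser, assuming only the two tag shapes and the three values listed above; the function name and the returned dict layout (`slug`, `outcome`) are illustrative and not necessarily what FutureShow uses internally.

```python
import re

# Matches <PREDICTION>slug|YES</PREDICTION> and <PREDICTION>YES</PREDICTION>.
PREDICTION_RE = re.compile(
    r"<PREDICTION>\s*(?:([^|<>\s]+)\s*\|\s*)?(YES|NO|ABSTAIN)\s*</PREDICTION>",
    re.IGNORECASE,
)


def extract_predictions(text: str) -> list[dict]:
    """Return every prediction tag found in the text, in order of appearance."""
    results = []
    for slug, outcome in PREDICTION_RE.findall(text):
        results.append({
            "slug": slug or None,        # None for the slug-less binary form
            "outcome": outcome.upper(),  # normalize to YES / NO / ABSTAIN
        })
    return results
```

Tags with any other value (for example `MAYBE`) are ignored rather than raising, which lets the caller decide how to treat a response with no valid prediction.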
python web_server.py
# Serves on http://0.0.0.0:10032 by default
Environment variables:
- `WEB_HOST`: Bind address (default: `0.0.0.0`)
- `WEB_PORT`: Port number (default: `10032`)

| Endpoint | Method | Description | Parameters |
|---|---|---|---|
| `/api/status` | GET | System status, available models | `signature` |
| `/api/models` | GET | List all model signatures | - |
| `/api/positions` | GET | Latest positions & trades | `signature` |
| `/api/pnl` | GET | PnL history for date | `signature`, `date`, `full` |
| `/api/messages` | GET | Agent reasoning logs | `signature` |
| `/api/polymarket_info` | GET | Proxy to Polymarket data | `slug` |
{
"ok": true,
"signature": "gpt-5",
"date": "2025-01-20",
"times": ["2025-01-20T00:00:00Z", "2025-01-20T01:00:00Z", ...],
"nav": [10000.0, 10023.45, 10089.12, ...],
"returns": [0.0, 0.23, 0.89, ...],
"latest": {
"timestamp": "2025-01-20T23:59:00Z",
"nav": 10234.56,
"cash": 5234.56,
"positions_value": 5000.0
},
"count": 24,
"full": false
}
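A small client-side sketch for the `/api/pnl` response above. Two things here are assumptions rather than documented behavior: that `full` is passed as the string `true`/`false`, and that `returns` holds the percent change of each NAV value relative to the first one (the sample values are consistent with this, for example 10023.45 / 10000 - 1 ≈ 0.23%). The helper names are hypothetical.

```python
from urllib.parse import urlencode


def pnl_url(base_url: str, signature: str, date: str, full: bool = False) -> str:
    """Build the query URL for the /api/pnl endpoint."""
    query = urlencode({
        "signature": signature,
        "date": date,
        "full": str(full).lower(),  # assumed encoding of the boolean flag
    })
    return f"{base_url.rstrip('/')}/api/pnl?{query}"


def percent_returns(nav: list[float]) -> list[float]:
    """Percent change of each NAV snapshot relative to the first one."""
    first = nav[0]
    return [round((value / first - 1.0) * 100.0, 2) for value in nav]
```

The URL can be fetched with any HTTP client (for example `urllib.request.urlopen`), and `percent_returns(response["nav"])` can then be compared against the `returns` array as a sanity check.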
{
"agent_type": "PolymarketAgent",
"date_range": {
"init_date": "2025-01-01",
"end_date": "2025-12-31"
},
"agent_config": {
"max_steps": 50, // Max tool calls per event
"max_retries": 3, // Retry on transient failures
"base_delay": 0.5, // Retry backoff base (seconds)
"initial_cash": 10000.0 // Starting cash for simulation
},
"log_config": {
"log_path": "./data/agent_data"
},
"models": [
{
"name": "gpt-5",
"basemodel": "openai/gpt-5",
"signature": "gpt-5",
"enabled": true,
"provider": "openai"
},
{
"name": "claude-4.5-sonnet",
"basemodel": "openrouter/anthropic/claude-sonnet-4.5",
"signature": "claude-4.5-sonnet",
"enabled": true,
"provider": "openrouter"
},
{
"name": "gemini-2.5-pro",
"basemodel": "openrouter/google/gemini-2.5-pro",
"signature": "gemini-2.5-pro",
"enabled": true,
"provider": "openrouter"
},
{
"name": "deepseek-v3.1",
"basemodel": "openrouter/deepseek/deepseek-chat-v3.1",
"signature": "deepseek-v3.1",
"enabled": true,
"provider": "openrouter"
}
]
}
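Note that the `//` comments in the example above are not valid JSON, so Python's standard `json` module rejects the text as written. Whether the project's own loader tolerates them is not stated here; if you copy the example verbatim, a sketch like the following strips the comments before parsing and lists the enabled models. Both function names are illustrative.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip the comment but keep the newline that ends it.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def enabled_models(config_text: str) -> list[str]:
    """Return the signatures of all models with "enabled": true."""
    config = json.loads(strip_line_comments(config_text))
    return [m["signature"] for m in config["models"] if m.get("enabled")]
```

Tracking whether the scanner is inside a string keeps values such as `https://...` intact, which a plain regex on `//` would truncate.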
The system writes .runtime_env.json to coordinate state:
{
"SIGNATURE": "gpt-5",
"CURRENT_DATETIME": "2025-01-20T15:30:00Z",
"INIT_DATETIME": "2025-01-01T00:00:00Z",
"IF_TRADE": false
}
Edit futureshow/utils/polymarket_watchlist.json or use API:
from futureshow.utils.polymarket_watchlist import (
refresh_trending_watchlist,
load_watchlist,
add_events_to_watchlist,
remove_events_from_watchlist,
)
# Refresh with trending events
refresh_trending_watchlist(year=2025, month=1)
# Manual additions
add_events_to_watchlist(["custom-event-slug"])
data/
├── agent_data/
│   └── {model_signature}/
│       ├── position/
│       │   ├── position.jsonl          # Trade ledger
│       │   └── liquidity.json          # Simulated liquidity state
│       ├── pnl/
│       │   └── intraday_{date}.jsonl   # NAV snapshots
│       └── log/
│           └── {date}/
│               └── log.jsonl           # Agent reasoning traces
│
├── forecasts/
│   └── {model_signature}/
│       └── {event_slug}/
│           ├── forecasts.jsonl         # Predictions over time
│           ├── tracking.jsonl          # Market state snapshots
│           └── result.json             # Final resolution
│
└── cache/
    └── polymarket_markets/             # API response cache
        └── {slug}.json
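Given this layout, reading one model's forecasts for an event comes down to building the path and parsing one JSON object per line. The helpers below are a minimal sketch with hypothetical names; they assume the `data/` root and the directory structure shown above.

```python
import json
from pathlib import Path


def forecasts_path(data_root: str, model_signature: str, event_slug: str) -> Path:
    """Path to forecasts.jsonl for one model and one event."""
    return Path(data_root) / "forecasts" / model_signature / event_slug / "forecasts.jsonl"


def load_jsonl(path) -> list[dict]:
    """Read a JSON Lines file, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

The same `load_jsonl` helper works for the other `.jsonl` files in the tree, such as `position.jsonl` and `tracking.jsonl`.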
{
"timestamp": "2025-01-20T15:30:00Z",
"id": 42,
"this_action": {
"action": "buy",
"market": "btc-100k-jan",
"outcome": "Yes",
"requested_cost": 1000.0,
"spent": 998.45,
"shares": 1123.5,
"avg_price": 0.889,
"partial_fill": false,
"levels": [
{"price": 0.888, "shares": 500, "cost": 444.0},
{"price": 0.890, "shares": 623.5, "cost": 554.45}
]
},
"positions": {
"CASH": 4001.55,
"btc-100k-jan:Yes": 1123.5,
"trump-2028:No": 500.0
}
}
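The `levels` array in this entry suggests that a buy order is filled by walking through price levels until the requested budget is spent. The sketch below shows one way such a fill could work. It is an assumption-based illustration, not the logic in `tool_polymarket_trade.py`: the `asks` input format and the `simulate_buy` name are hypothetical, and the output keys simply mirror the ledger fields above.

```python
def simulate_buy(asks: list[tuple[float, float]], cost_usd: float) -> dict:
    """Fill a buy order against ask levels, cheapest first.

    asks: (price, available_shares) pairs sorted by ascending price.
    cost_usd: the budget the agent asked to spend.
    """
    remaining = cost_usd
    levels = []
    for price, available in asks:
        if remaining <= 1e-9:
            break
        # Take the whole level, or as many shares as the budget still covers.
        shares = min(available, remaining / price)
        cost = shares * price
        levels.append({"price": price, "shares": shares, "cost": cost})
        remaining -= cost
    spent = sum(level["cost"] for level in levels)
    total_shares = sum(level["shares"] for level in levels)
    return {
        "requested_cost": cost_usd,
        "spent": spent,
        "shares": total_shares,
        "avg_price": spent / total_shares if total_shares else 0.0,
        "partial_fill": remaining > 1e-9,  # budget left because liquidity ran out
        "levels": levels,
    }
```

With `asks = [(0.50, 100), (0.60, 100)]` and a budget of 80, the order takes all 100 shares at 0.50 and 50 shares at 0.60, for an average price above the best ask. That difference is the slippage the trading tools simulate.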
{
"timestamp": "2025-01-20T15:30:00Z",
"signature": "gpt-5",
"event_slug": "btc-100k-jan",
"event_title": "Bitcoin above $100k by Jan 31?",
"forecast": "Based on current momentum and institutional inflows...\n\n<PREDICTION>btc-100k-jan-yes|YES</PREDICTION>",
"predictions": [
{"slug": "btc-100k-jan-yes", "outcome": "YES"}
]
}
# All tests
pytest -q
# Specific module
pytest tests/test_polymarket_data.py -v
# With coverage
pytest --cov=futureshow --cov-report=html
# Lint
ruff check futureshow tests
# Format
ruff format futureshow tests
# Type check
mypy futureshow
FutureShow/
├── futureshow/                          # 🎯 Core package
│   ├── agent/                           # Agent implementations
│   │   ├── __init__.py
│   │   └── polymarket/
│   │       ├── polymarket_agent.py           # Trading agent
│   │       ├── polymarket_forecast_agent.py  # Forecast-only agent
│   │       └── market_preview.py             # Market analysis utils
│   ├── prompt/                          # System prompts
│   │   ├── polymarket_agent_prompt.py
│   │   └── polymarket_forecast_prompt.py
│   ├── tool/                            # MCP tools (FastMCP + function_tool)
│   │   ├── tool_polymarket_data.py      # Market data (1170 lines)
│   │   ├── tool_polymarket_trade.py     # Trading simulation (655 lines)
│   │   ├── tool_google.py               # Serper search
│   │   ├── tool_exa.py                  # Semantic search
│   │   ├── tool_reddit.py               # Reddit API
│   │   ├── tool_twitter.py              # X/Twitter API
│   │   └── tool_math.py                 # Math evaluation
│   └── utils/                           # Helpers
│       ├── agent_logs.py                # Logging hooks
│       ├── general_tools.py             # Config helpers
│       ├── polymarket_watchlist.py      # Watchlist management
│       └── polymarket_position_tools.py
│
├── frontend/                            # 🖥️ Web dashboard
│   ├── index.html
│   ├── app.js                           # Chart.js + fetch API
│   ├── styles.css
│   └── icons/                           # Model logos
│
├── configs/                             # ⚙️ Configuration
│   └── default_config.json
│
├── tests/                               # 🧪 pytest suite
│   ├── conftest.py
│   ├── test_polymarket_data.py
│   ├── test_polymarket_trade.py
│   └── ...
│
├── main.py                              # Entry point
├── web_server.py                        # Dashboard server (358 lines)
├── run_agents_once.py                   # Single-pass runner
├── run_agents_loop.py                   # Continuous runner
├── run_pnl_trackers.py                  # PnL tracking loop
└── run_forecast_loop.py                 # Forecast-only loop
We welcome contributions! Here's how:
1. Create a feature branch: `git checkout -b feature/amazing-feature`
2. Commit your changes: `git commit -m 'Add amazing feature'`
3. Push the branch: `git push origin feature/amazing-feature`

This project is licensed under the MIT License; see LICENSE for details.
**Found FutureShow useful? Star us on GitHub!**
**Built with curiosity by HKUDS**
Thanks for visiting ✨ FutureShow!