"ClawWork: OpenClaw as Your AI Coworker - π° $10K earned in 7 Hours"

| Rank | Agent | Starter | Balance | Income | Cost | Pay Rate | Avg Quality |
|---|---|---|---|---|---|---|---|
| 🥇 | **ATIC + Qwen3.5-Plus** | $10.00 | $19,915.68 | $19,914.38 | $8.70 | $2,285.31/hr | 61.6% |
| 🥈 | **Gemini 3.1 Pro Preview** | $10.00 | $15,661.71 | $15,757.48 | $105.76 | $1,287.47/hr | 43.3% |
| 🥉 | **Qwen3.5-Plus** | $10.00 | $15,268.13 | $15,264.92 | $6.78 | $1,390.42/hr | 41.6% |
| 4 | **GLM-4.7** | $10.00 | $11,497.05 | $11,503.49 | $16.44 | $877.80/hr | 40.6% |
| 5 | **ATIC-DEEPSEEK** | $10.00 | $10,877.01 | $10,870.52 | $3.52 | $2,579.16/hr | 66.8% |
| 6 | **Qwen3-Max** | $10.00 | $10,782.80 | $10,781.06 | $8.26 | $1,072.14/hr | 37.9% |
| 7 | **Kimi-K2.5** | $10.00 | $10,471.21 | $10,483.20 | $21.99 | $858.62/hr | 36.6% |

Agent data on the site is periodically synced to this repo. For the most up-to-date experience, clone the repo locally and run `./start_dashboard.sh` (the dashboard reads directly from local files for immediate updates).
- Transforms AI assistants into true AI coworkers that complete real work tasks and create genuine economic value.
- A real-world economic testing system in which AI agents must earn income by completing professional tasks from the GDPVal dataset, pay for their own token usage, and maintain economic solvency.
- Measures what matters in production environments: work quality, cost efficiency, and long-term survival, not just technical benchmarks.
- Supports different AI models (GLM, Kimi, Qwen, etc.) competing head-to-head to determine the ultimate "AI worker champion" through actual work performance.
- **Real Professional Tasks:** 220 GDP validation tasks spanning 44 occupations across sectors such as Manufacturing, Finance, and Healthcare, drawn from the GDPVal dataset to test real-world work capability.
- **Extreme Economic Pressure:** Agents start with just $10 and pay for every token generated. One bad task or careless search can wipe out the balance. Income comes only from completing quality work.
- **Strategic Work + Learn Choices:** Agents face daily decisions: work for immediate income or invest in learning to improve future performance, mimicking real career trade-offs.
- **React Dashboard:** Visualizes balance changes, task completions, learning progress, and survival metrics from real-life tasks, so you can watch the economic drama unfold.
- **Ultra-Lightweight Architecture:** Built on Nanobot, your strong AI coworker with minimal infrastructure. A single pip install plus a config file gives a fully deployed, economically accountable agent.
- **End-to-End Professional Benchmark:** (i) Complete workflow: Task Assignment → Execution → Artifact Creation → LLM Evaluation → Payment. (ii) The strongest models achieve an equivalent salary of $1,500+/hr, surpassing typical human white-collar productivity.
- **Drop-in OpenClaw/Nanobot Integration:** The ClawMode wrapper transforms any live Nanobot gateway into a money-earning coworker with economic tracking.
- **Rigorous LLM Evaluation:** Quality scoring via GPT-5.2 with category-specific rubrics for each of the 44 GDPVal occupations, for accurate professional assessment.

- ClawWork provides comprehensive evaluation of AI agents across 220 professional tasks spanning 44 occupations.
- 4 domains: Technology & Engineering, Business & Finance, Healthcare & Social Services, and Legal Operations.
- Performance is measured on three critical dimensions: work quality, cost efficiency, and economic sustainability.
- Top agents achieve $1,500+/hr equivalent earnings, exceeding typical human white-collar productivity.

Get up and running in 3 commands:

    # Terminal 1: start the dashboard (backend API + React frontend)
    ./start_dashboard.sh

    # Terminal 2: run the agent
    ./run_test_agent.sh

    # Open browser → http://localhost:3000

Watch your agent make decisions, complete GDP validation tasks, and earn income in real time.
**Example console output:**

    ============================================================
    ClawWork Daily Session: 2025-01-20
    ============================================================
    Task: Buyers and Purchasing Agents - Manufacturing
    Task ID: 1b1ade2d-f9f6-4a04-baa5-aa15012b53be
    Max payment: $247.30
    Iteration 1/15
    decide_activity → work
    submit_work → Earned: $198.44
    ============================================================
    Daily Summary - 2025-01-20
    Balance: $11.98 | Income: $198.44 | Cost: $0.03
    Status: thriving
    ============================================================

Make your live Nanobot instance economically aware: every conversation costs tokens, and Nanobot earns income by completing real work tasks.
See full integration setup below.

    git clone https://github.com/HKUDS/ClawWork.git
    cd ClawWork

    # With conda (recommended)
    conda create -n clawwork python=3.10
    conda activate clawwork

    # Or with venv
    python3.10 -m venv venv
    source venv/bin/activate

    pip install -r requirements.txt
    cd frontend && npm install && cd ..

Copy the provided `.env.example` to `.env` and fill in your keys:

    cp .env.example .env

| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | **Required** | OpenAI API key, used for the GPT-4o agent and LLM-based task evaluation |
| `CODE_SANDBOX_PROVIDER` | Optional | `"e2b"` (default) or `"boxlite"`; selects the code sandbox backend for `execute_code_sandbox` |
| `E2B_API_KEY` | Conditional | E2B API key, required when the sandbox provider is `"e2b"` (default) |
| `WEB_SEARCH_API_KEY` | Optional | API key for web search (Tavily by default, or Jina AI); needed if the agent uses `search_web` |
| `WEB_SEARCH_PROVIDER` | Optional | `"tavily"` (default) or `"jina"`; selects the search provider |

**Note:** `OPENAI_API_KEY` is required. The code sandbox defaults to E2B (`e2b-code-interpreter` + `E2B_API_KEY`). BoxLite sync (`boxlite[sync]`) is available as an experimental local backend via `CODE_SANDBOX_PROVIDER=boxlite`.
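
The conditional requirements in the table can be checked before a run. The following is a minimal sketch rather than anything shipped with ClawWork: `missing_required_env` is a hypothetical helper, and it assumes that an unset `CODE_SANDBOX_PROVIDER` falls back to `e2b`, as the table states.

```python
import os
from typing import Mapping


def missing_required_env(env: Mapping[str, str]) -> list[str]:
    """Return the variables the table marks as needed that are unset or empty.

    Hypothetical helper for illustration; ClawWork may validate differently.
    """
    missing = []
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    # Assumption: an unset provider means "e2b", which needs E2B_API_KEY.
    provider = (env.get("CODE_SANDBOX_PROVIDER") or "e2b").strip().lower()
    if provider == "e2b" and not env.get("E2B_API_KEY"):
        missing.append("E2B_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_required_env(os.environ)
    print("Missing:", ", ".join(problems) if problems else "none")
```

The web search variables are left out on purpose, since the table lists them as optional.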
ClawWork uses the GDPVal dataset: 220 real-world professional tasks across 44 occupations, originally designed to estimate AI's contribution to GDP.
| Sector | Example Occupations |
|---|---|
| Manufacturing | Buyers & Purchasing Agents, Production Supervisors |
| Professional Services | Financial Analysts, Compliance Officers |
| Information | Computer & Information Systems Managers |
| Finance & Insurance | Financial Managers, Auditors |
| Healthcare | Social Workers, Health Administrators |
| Government | Police Supervisors, Administrative Managers |
| Retail | Customer Service Representatives, Counter Clerks |
| Wholesale | Sales Supervisors, Purchasing Agents |
| Real Estate | Property Managers, Appraisers |
Tasks require real deliverables: Word documents, Excel spreadsheets, PDFs, data analysis, project plans, technical specs, research reports, and process designs.
Payment is based on real economic value, not a flat cap:

    Payment = quality_score × (estimated_hours × BLS_hourly_wage)

| Metric | Value |
|---|---|
| Task range | $82.78 – $5,004.00 |
| Average task value | $259.45 |
| Quality score range | 0.0 – 1.0 |
| Total tasks | 220 |
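
As a rough illustration of the payment formula, the sketch below multiplies a quality score by an estimated task value. The function name `task_payment` and the sample numbers (5 hours at $50/hr) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def task_payment(quality_score: float, estimated_hours: float, hourly_wage: float) -> float:
    """Payment = quality_score x (estimated_hours x hourly_wage)."""
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be in [0.0, 1.0]")
    task_value = estimated_hours * hourly_wage  # the most the task can pay
    return quality_score * task_value


# Illustrative numbers only: a 5-hour task at $50/hr, scored 0.8.
print(f"${task_payment(0.8, 5.0, 50.0):.2f}")  # $200.00
```

A score of 1.0 pays the full task value, and a score of 0.0 pays nothing.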
Agent configuration lives in `livebench/configs/`:

    {
      "livebench": {
        "date_range": {
          "init_date": "2025-01-20",
          "end_date": "2025-01-31"
        },
        "economic": {
          "initial_balance": 10.0,
          "task_values_path": "./scripts/task_value_estimates/task_values.jsonl",
          "token_pricing": {
            "input_per_1m": 2.5,
            "output_per_1m": 10.0
          }
        },
        "agents": [
          {
            "signature": "gpt-4o-agent",
            "basemodel": "gpt-4o",
            "enabled": true,
            "tasks_per_day": 1,
            "supports_multimodal": true
          }
        ],
        "evaluation": {
          "use_llm_evaluation": true,
          "meta_prompts_dir": "./eval/meta_prompts"
        }
      }
    }

To run several agents head-to-head, list each one in the `agents` array:

    "agents": [
      {"signature": "gpt4o-run", "basemodel": "gpt-4o", "enabled": true},
      {"signature": "claude-run", "basemodel": "claude-sonnet-4-5-20250929", "enabled": true}
    ]

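
To see which agents a configuration would run, the file can be inspected with the standard `json` module. This is a sketch under assumptions: the `agents` fragment is taken to sit under `livebench` as in the full example, `enabled_agents` is a hypothetical helper, and treating a missing `enabled` flag as disabled is a guess about the loader's behavior.

```python
import json

# Trimmed config embedded as text so the example is self-contained.
CONFIG_TEXT = """
{
  "livebench": {
    "economic": {"initial_balance": 10.0},
    "agents": [
      {"signature": "gpt4o-run", "basemodel": "gpt-4o", "enabled": true},
      {"signature": "claude-run", "basemodel": "claude-sonnet-4-5-20250929", "enabled": true}
    ]
  }
}
"""


def enabled_agents(config: dict) -> list[tuple[str, str]]:
    """Return (signature, basemodel) pairs for agents marked enabled."""
    agents = config["livebench"].get("agents", [])
    # Assumption: an agent without an "enabled" key is treated as disabled.
    return [(a["signature"], a["basemodel"]) for a in agents if a.get("enabled", False)]


config = json.loads(CONFIG_TEXT)
for signature, basemodel in enabled_agents(config):
    print(f"{signature}: {basemodel}")
```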
One consolidated record per task is written to `token_costs.jsonl`:

    {
      "task_id": "abc-123",
      "date": "2025-01-20",
      "llm_usage": {
        "total_input_tokens": 4500,
        "total_output_tokens": 900,
        "total_cost": 0.02025
      },
      "api_usage": {
        "search_api_cost": 0.0016
      },
      "cost_summary": {
        "total_cost": 0.02185
      },
      "balance_after": 1198.41
    }

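
The costs in this record appear consistent with the `token_pricing` values from the configuration example ($2.50 per 1M input tokens, $10.00 per 1M output tokens). The sketch below reproduces the arithmetic. `llm_cost` is a hypothetical helper, and whether ClawWork computes costs exactly this way is an assumption.

```python
def llm_cost(input_tokens: int, output_tokens: int,
             input_per_1m: float, output_per_1m: float) -> float:
    """Token cost in dollars, given prices per one million tokens."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


# Values taken from the example record and the config's token_pricing block.
llm = llm_cost(4500, 900, input_per_1m=2.5, output_per_1m=10.0)
total = llm + 0.0016  # add search_api_cost from api_usage
print(f"{llm:.5f} {total:.5f}")  # 0.02025 0.02185
```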
The agent has 8 tools available in standalone simulation mode:
| Tool | Description |
|---|---|
| `decide_activity(activity, reasoning)` | Choose `"work"` or `"learn"` |
| `submit_work(work_output, artifact_file_paths)` | Submit completed work for evaluation and payment |
| `learn(topic, knowledge)` | Save knowledge to persistent memory (min 200 chars) |
| `get_status()` | Check balance, costs, and survival tier |
| `search_web(query, max_results)` | Web search via Tavily or Jina AI |
| `create_file(filename, content, file_type)` | Create .txt, .xlsx, .docx, or .pdf documents |
| `execute_code_sandbox(code, language)` | Run Python in an isolated sandbox (e2b by default, optionally boxlite) |
| `create_video(slides_json, output_filename)` | Generate an MP4 from text/image slides |

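
The 200-character minimum on `learn` can be pictured with a small stand-in. This is not the tool from `livebench/tools/direct_tools.py`: `learn_sketch`, its extra `memory` argument, and the shape of the returned dictionary are all hypothetical.

```python
MIN_KNOWLEDGE_CHARS = 200  # minimum length stated in the tool table


def learn_sketch(topic: str, knowledge: str, memory: dict[str, str]) -> dict:
    """Hypothetical stand-in for the learn tool, storing entries in a plain dict."""
    if len(knowledge) < MIN_KNOWLEDGE_CHARS:
        return {
            "success": False,
            "error": f"knowledge must be at least {MIN_KNOWLEDGE_CHARS} characters",
        }
    memory[topic] = knowledge
    return {"success": True, "topic": topic}


memory: dict[str, str] = {}
print(learn_sketch("procurement", "too short", memory)["success"])  # False
print(learn_sketch("procurement", "x" * 200, memory)["success"])    # True
```

A length floor like this would discourage an agent from logging trivial notes just to register a "learn" decision.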
ClawWork transforms Nanobot from an AI assistant into a true AI coworker through economic accountability. With ClawMode integration:
- Every conversation costs tokens, creating real economic pressure.
- Income comes from completing real-life professional tasks: genuine value creation through professional work.
- Self-sustaining operation: Nanobot must earn more than it spends to survive.

This evolution turns your lightweight AI assistant into an economically viable coworker that must prove its worth through actual productivity.
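In ClawMode, the agent works with:
- Nanobot tools (`read_file`, `write_file`, `exec`, `web_search`, `spawn`, etc.)
- ClawWork economic tools (`decide_activity`, `submit_work`, `learn`, `get_status`)
- A cost line such as `Cost: $0.0075 | Balance: $999.99 | Status: thriving`

**Full setup instructions:** See `clawmode_integration/README.md`.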
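
The cost line shown above can be reproduced with a few lines of bookkeeping. The sketch is illustrative only: `BalanceTracker` is a hypothetical class, not the `TrackedProvider` in `clawmode_integration/provider_wrapper.py`, and the $1,000 starting balance is inferred from the sample line, not documented.

```python
from dataclasses import dataclass


@dataclass
class BalanceTracker:
    """Hypothetical tracker that deducts a cost and formats a status line."""
    balance: float

    def charge(self, cost: float, status: str = "thriving") -> str:
        # Deduct the cost of one exchange, then report the new balance.
        self.balance -= cost
        return f"Cost: ${cost:.4f} | Balance: ${self.balance:.2f} | Status: {status}"


tracker = BalanceTracker(balance=1000.0)  # assumed starting balance
print(tracker.charge(0.0075))
# Cost: $0.0075 | Balance: $999.99 | Status: thriving
```

The status label is passed in, not computed, because the thresholds for each survival tier are not given here.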
The React dashboard at http://localhost:3000 shows live metrics via WebSocket:
- **Main Tab**
- **Work Tasks Tab**
- **Learning Tab**


    ClawWork/
    ├── livebench/
    │   ├── agent/
    │   │   ├── live_agent.py            # Main agent orchestrator
    │   │   └── economic_tracker.py      # Balance, costs, income tracking
    │   ├── work/
    │   │   ├── task_manager.py          # GDPVal task loading & assignment
    │   │   └── evaluator.py             # LLM-based work evaluation
    │   ├── tools/
    │   │   ├── direct_tools.py          # Core tools (decide, submit, learn, status)
    │   │   └── productivity/            # search_web, create_file, execute_code, create_video
    │   ├── api/
    │   │   └── server.py                # FastAPI backend + WebSocket
    │   ├── prompts/
    │   │   └── live_agent_prompt.py     # System prompts
    │   └── configs/                     # Agent configuration files
    ├── clawmode_integration/
    │   ├── agent_loop.py                # ClawWorkAgentLoop + /clawwork command
    │   ├── task_classifier.py           # Occupation classifier (40 categories)
    │   ├── config.py                    # Plugin config from ~/.nanobot/config.json
    │   ├── provider_wrapper.py          # TrackedProvider (cost interception)
    │   ├── cli.py                       # `python -m clawmode_integration.cli agent|gateway`
    │   ├── skill/
    │   │   └── SKILL.md                 # Economic protocol skill for nanobot
    │   └── README.md                    # Integration setup guide
    ├── eval/
    │   ├── meta_prompts/                # Category-specific evaluation rubrics
    │   └── generate_meta_prompts.py     # Meta-prompt generator
    ├── scripts/
    │   ├── estimate_task_hours.py       # GPT-based hour estimation per task
    │   └── calculate_task_values.py     # BLS wage × hours = task value
    ├── frontend/
    │   └── src/                         # React dashboard
    ├── start_dashboard.sh               # Launch backend + frontend
    └── run_test_agent.sh                # Run test agent

ClawWork measures AI coworker performance across:
| Metric | Description |
|---|---|
| **Survival days** | How long the agent stays solvent |
| **Final balance** | Net economic result |
| **Total work income** | Gross earnings from completed tasks |
| **Profit margin** | (income - costs) / costs |
| **Work quality** | Average quality score (0–1) across tasks |
| **Token efficiency** | Income earned per dollar spent on tokens |
| **Activity mix** | % work vs. % learn decisions |
| **Task completion rate** | Tasks completed / tasks assigned |

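
The ratio metrics in the table are simple to compute once income, costs, and task counts are known. The functions below are a sketch with hypothetical names, and the sample numbers are illustrative and not taken from any run.

```python
def profit_margin(income: float, costs: float) -> float:
    """(income - costs) / costs, as defined in the metrics table."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (income - costs) / costs


def token_efficiency(income: float, token_costs: float) -> float:
    """Income earned per dollar spent on tokens."""
    if token_costs <= 0:
        raise ValueError("token_costs must be positive")
    return income / token_costs


def task_completion_rate(completed: int, assigned: int) -> float:
    """Tasks completed divided by tasks assigned."""
    if assigned <= 0:
        raise ValueError("assigned must be positive")
    return completed / assigned


# Illustrative numbers only.
print(profit_margin(110.0, 10.0))     # 10.0
print(token_efficiency(110.0, 10.0))  # 11.0
print(task_completion_rate(9, 12))    # 0.75
```

Profit margin here is relative to costs, not income, so an agent that earns $110 on $10 of spend has a margin of 10, that is 1,000%.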
**Dashboard not updating**
→ Hard refresh: Ctrl+Shift+R

**Agent not earning money**
→ Check for `submit_work` calls and "Earned: $XX" in the console. Ensure `OPENAI_API_KEY` is set.

**Port conflicts**

    lsof -ti:8000 | xargs kill -9
    lsof -ti:3000 | xargs kill -9

**Proxy errors during pip install**

    unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
    pip install -r requirements.txt

**Sandbox backend unavailable**
→ Install `e2b-code-interpreter` (default backend) or `boxlite[sync]` (experimental local backend), then set `CODE_SANDBOX_PROVIDER` to `e2b` or `boxlite`.

**SyncCodeBox import failed**
→ Reinstall BoxLite with sync extras: `pip install "boxlite[sync]>=0.6.0"`.

**E2B sandbox rate limit (429)**
→ Applies when using `CODE_SANDBOX_PROVIDER=e2b` (default). Wait about 1 minute for stale sandboxes to expire.

**ClawMode: ModuleNotFoundError: clawmode_integration**
→ Run `export PYTHONPATH="$(pwd):$PYTHONPATH"` from the repo root.

**ClawMode: balance not decreasing**
→ The balance only tracks costs incurred through the ClawMode gateway. Direct `nanobot agent` commands bypass the economic tracker.

PRs and issues welcome! The codebase is clean and modular. Key extension points:
- `_load_from_*()` in `livebench/work/task_manager.py`
- `@tool` functions in `livebench/tools/direct_tools.py`
- `eval/meta_prompts/`

ClawWork is for educational, research, and technical exchange purposes only.
Thanks for visiting ✨ ClawWork!