Agent Framework Wizard Battle
A head-to-head comparison of every major Python LLM agent framework, all tackling the same challenge: guiding a user through building a 5e (2024) Level 1 D&D Wizard.
Each framework implements the same agent as a web server with the same shared frontend, deployed side-by-side in this repo for direct comparison.
The Challenge
Each framework must serve the frontend chat interface, which displays the conversation and updates the character sheet in real time.
What We’re Testing
- Conversation Handling — Managing conversation history between agent and user.
- Few-Shot Examples — Concrete examples of desired agent behavior at each step of the wizard-building process.
- Tool Calling — Dice rolling is handled by tools, not the LLM.
- Retrieval Augmented Generation — Race and spell information surfaced into context based on user questions.
- Multi-Step Orchestration — Each step of building a wizard has its own prompt, behavior examples, and available tools (e.g., assigning ability scores vs. building a spellbook).
- State Management — The character sheet is a Pydantic model with validation preventing illegal states (e.g., picking a level 2 spell).
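As a rough illustration of the Tool Calling item above, the sketch below shows one way a dice-rolling tool could be written so that randomness lives in Python rather than in the model's output. The function name `roll_dice`, its parameters, and the `ROLL_DICE_TOOL` schema are hypothetical and are not taken from this repo. The schema follows the OpenAI-style function-calling format, and "4d6, drop the lowest" is only assumed here as a common way to roll ability scores.

```python
import random
from typing import Optional


def roll_dice(count: int, sides: int, drop_lowest: int = 0,
              rng: Optional[random.Random] = None) -> dict:
    """Roll `count` dice with `sides` faces, optionally dropping the lowest results.

    Hypothetical tool: the name and signature are assumptions for illustration.
    """
    if count < 1 or sides < 2:
        raise ValueError("count must be >= 1 and sides must be >= 2")
    if not 0 <= drop_lowest < count:
        raise ValueError("drop_lowest must be between 0 and count - 1")
    rng = rng or random.Random()
    rolls = [rng.randint(1, sides) for _ in range(count)]
    # Sort ascending and discard the lowest `drop_lowest` dice.
    kept = sorted(rolls)[drop_lowest:]
    return {"rolls": rolls, "kept": kept, "total": sum(kept)}


# OpenAI-style tool description the agent could hand to the model (assumed shape).
ROLL_DICE_TOOL = {
    "type": "function",
    "function": {
        "name": "roll_dice",
        "description": "Roll dice and return the individual results and the total.",
        "parameters": {
            "type": "object",
            "properties": {
                "count": {"type": "integer", "minimum": 1},
                "sides": {"type": "integer", "minimum": 2},
                "drop_lowest": {"type": "integer", "minimum": 0},
            },
            "required": ["count", "sides"],
        },
    },
}

if __name__ == "__main__":
    # One ability score rolled as 4d6, dropping the lowest die.
    print(roll_dice(4, 6, drop_lowest=1))
```

The model only ever sees the returned dictionary, so it can narrate the roll but cannot choose the numbers.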
Shared Infrastructure
- Frontend: Single-page index.html, shared across all agents.
- Communication: HTTP + SSE. Each agent exposes two endpoints: one that serves index.html, and one that handles chat and character sheet updates via POST + SSE.
- LLM Backend: All frameworks communicate with models via the LiteLLM Python SDK, making them equally model-agnostic.
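To make the Communication item above more concrete, here is a minimal, framework-agnostic sketch of how a chat reply and a character sheet update could be framed as Server-Sent Events. The event names `message` and `character_sheet`, the payload shapes, and the helper names are assumptions for illustration. The actual wire format used by the agents in this repo may differ.

```python
import json
from typing import Any, Iterable, Iterator


def format_sse(event: str, data: Any) -> str:
    """Frame one Server-Sent Event: an event name, a JSON data line, a blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"


def chat_stream(reply_chunks: Iterable[str], sheet: dict) -> Iterator[str]:
    """Yield the assistant reply chunk by chunk, then the updated character sheet.

    Hypothetical helper: event names and payload keys are assumed, not the repo's.
    """
    for chunk in reply_chunks:
        yield format_sse("message", {"text": chunk})
    yield format_sse("character_sheet", sheet)


if __name__ == "__main__":
    for frame in chat_stream(["Welcome, ", "apprentice!"], {"class": "Wizard", "level": 1}):
        print(frame, end="")
```

A web framework's streaming response would typically wrap a generator like `chat_stream` and send it with the `text/event-stream` content type, so the frontend can update the chat and the sheet from a single response.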
The Frameworks