Agent Framework Wizard Battle

A head-to-head comparison of every major Python LLM agent framework, all tackling the same challenge: guiding a user through building a 5e (2024) Level 1 D&D Wizard.

Each framework implements the same agent as a web server with the same shared frontend, deployed side-by-side in this repo for direct comparison.

The Challenge

Each framework must serve the shared frontend: a chat interface that displays the conversation and updates the character sheet in real time.

What We’re Testing

Conversation Handling — Managing conversation history between agent and user.
Few-Shot Examples — Concrete examples of desired agent behavior at each step of the wizard-building process.
Tool Calling — Dice rolling is handled by tools, not the LLM.
Retrieval Augmented Generation — Race and spell information surfaced into context based on user questions.
Multi-Step Orchestration — Each step of building a wizard has its own prompt, behavior examples, and available tools (e.g., assigning ability scores vs. building a spellbook).
State Management — The character sheet is a Pydantic model with validation preventing illegal states (e.g., picking a level 2 spell).
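
To make the Tool Calling and State Management items concrete, here is a minimal sketch, assuming Pydantic v2. The names `roll_ability_score`, `CharacterSheet`, and `SPELL_LEVELS` are hypothetical and not taken from this repo, and the 4d6-drop-lowest method is just one common way to roll ability scores. The dice roll happens in plain Python rather than in the model's output, and the sheet refuses a spell that is not level 1.

```python
import random

from pydantic import BaseModel, Field, ValidationError, field_validator

# Hypothetical lookup table: spell name -> spell level.
SPELL_LEVELS = {
    "Magic Missile": 1,
    "Shield": 1,
    "Misty Step": 2,
}


def roll_ability_score(rng: random.Random) -> int:
    """Tool: roll 4d6 and drop the lowest die, so the LLM never invents a roll."""
    dice = sorted(rng.randint(1, 6) for _ in range(4))
    return sum(dice[1:])  # the lowest die sits at index 0 after sorting


class CharacterSheet(BaseModel):
    """Hypothetical character sheet for a Level 1 wizard."""

    name: str = ""
    level: int = Field(default=1, ge=1, le=1)  # this challenge stops at level 1
    spellbook: list[str] = Field(default_factory=list)

    @field_validator("spellbook")
    @classmethod
    def only_level_one_spells(cls, spells: list[str]) -> list[str]:
        # Reject unknown spells and anything that is not a level 1 spell.
        for spell in spells:
            spell_level = SPELL_LEVELS.get(spell)
            if spell_level is None:
                raise ValueError(f"unknown spell: {spell}")
            if spell_level != 1:
                raise ValueError(f"{spell} is level {spell_level}, not level 1")
        return spells


# A legal sheet validates; an illegal one raises ValidationError.
legal_sheet = CharacterSheet(name="Elara", spellbook=["Magic Missile", "Shield"])
try:
    CharacterSheet(name="Elara", spellbook=["Misty Step"])
except ValidationError as exc:
    rejection_reason = exc.errors()[0]["msg"]
```

Putting the rule inside the model means every framework gets the same guardrail for free: whichever agent loop tries to write an illegal spell, construction fails before the state reaches the frontend.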

Shared Infrastructure

Frontend: Single-page index.html, shared across all agents.
Communication: HTTP + SSE. Two endpoints per agent: one serves index.html, and one accepts chat messages via POST and streams the reply and character sheet updates back over SSE.
LLM Backend: All frameworks communicate with models via the LiteLLM Python SDK, making them equally model-agnostic.
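
As a rough illustration of the POST + SSE channel, the sketch below encodes one chat turn as Server-Sent Events using only the standard library. The event names `message` and `sheet`, the payload shapes, and the helper names are assumptions made for this example; the repo's actual wire format may differ.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, payload: dict) -> str:
    """Encode one SSE frame: an event name, one JSON data line, and a blank line."""
    # json.dumps escapes newlines inside strings, so the data stays on one line.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


def stream_turn(reply_chunks: Iterable[str], sheet: dict) -> Iterator[str]:
    """Yield frames for one chat turn: reply text first, then the updated sheet."""
    for chunk in reply_chunks:
        yield format_sse("message", {"text": chunk})
    yield format_sse("sheet", sheet)


# Example: two reply chunks followed by a character sheet update.
frames = list(stream_turn(["Welcome, ", "apprentice."], {"name": "Elara", "level": 1}))
```

Sending the sheet as its own event type lets the frontend route chat text and sheet updates to different handlers on a single response stream.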

The Frameworks
