Intelligent error monitoring agent that uses Airweave to find context from your code, tickets, and Slack. Clusters similar errors, enriches with relevant context, and creates actionable alerts.
Build production-grade error monitoring with semantic understanding. This project shows how Airweave transforms raw errors into actionable insights by connecting your codebase, tickets, and Slack discussions.
> **Background:** This project is based on our internal agent “Donke” that handles ~40,000 Airweave queries/month. Read more about how we use Airweave to improve Airweave.
Error monitoring tools give you alerts. What you actually need is context:
This agent uses Airweave to automatically find that context and make intelligent decisions about severity, deduplication, and alerting.
```
Raw Errors → [Clustering] → [Context Search] → [Analysis] → Intelligent Alerts
                  │                │                  │
                  └── Groups       └── Airweave       └── Severity + status
                      similar          finds code,        determination with
                      errors           tickets, Slack     suppression logic
```

Example: 20 raw errors → 4 actionable clusters → 4 alerts (instead of 20)
The frontend provides a visual demonstration of how the error monitoring pipeline works. It’s designed to showcase the tool’s capabilities with sample data.
```bash
# Clone and setup
git clone <repo-url>
cd support-agent
cp .env.example .env

# Optional: add an LLM key for smarter clustering (works without it too)
# OPENAI_API_KEY=your_key (or ANTHROPIC_API_KEY)

# Start backend
cd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

# Start frontend (new terminal)
cd frontend
npm install && npm run dev
```
Open `http://localhost:3000` and click **Run Demo**.
The interactive visualization walks you through:
The demo uses:
To use this as an actual error monitoring agent for your application:
Replace sample data with real errors from your monitoring stack.
```bash
DATA_SOURCE=sentry
SENTRY_AUTH_TOKEN=your_token
SENTRY_ORG_SLUG=your_org
SENTRY_PROJECT_SLUG=your_project  # Optional: filter to a specific project
```
```bash
DATA_SOURCE=azure
AZURE_TENANT_ID=your_tenant
AZURE_CLIENT_ID=your_client
AZURE_CLIENT_SECRET=your_secret
AZURE_LOG_ANALYTICS_WORKSPACE_ID=your_workspace_id
```
Create `backend/sources/your_source.py`:
```python
from sources.base import DataSource
from schemas import RawError

class YourSource(DataSource):
    @property
    def name(self) -> str:
        return "Your Source"

    async def fetch_errors(self, window_minutes: int = 30, limit: int = 100):
        # Fetch from your API and normalize to the RawError format
        return [RawError(...) for error in your_api.get_errors()]
```
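As a concrete illustration of the normalization step, here is a minimal, self-contained sketch. The `RawError` fields and the fake API payload below are assumptions for illustration; match them to the actual models in `backend/schemas.py`.

```python
from dataclasses import dataclass

# Hypothetical RawError shape -- the real model lives in backend/schemas.py
@dataclass
class RawError:
    message: str
    error_type: str
    service: str
    timestamp: str

def normalize(api_errors: list[dict]) -> list[RawError]:
    """Map raw API payloads into the RawError format the pipeline expects."""
    return [
        RawError(
            message=e.get("title", "unknown"),
            error_type=e.get("type", "Exception"),
            service=e.get("service", "unknown"),
            timestamp=e.get("seen_at", ""),
        )
        for e in api_errors
    ]

sample = [{"title": "Connection refused", "type": "ConnectionError",
           "service": "sync_worker", "seen_at": "2024-01-01T00:00:00Z"}]
errors = normalize(sample)
print(errors[0].error_type)  # ConnectionError
```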
This is where the magic happens: Airweave searches your GitHub code, Linear tickets, and Slack threads.
```bash
AIRWEAVE_API_KEY=your_key
AIRWEAVE_COLLECTION_ID=your_collection
AIRWEAVE_API_URL=https://api.airweave.ai  # or self-hosted
```
**Set up your Airweave collection:**
Now when errors occur, the agent will find relevant context from your actual codebase and history.
```bash
LINEAR_ENABLED=true
LINEAR_API_KEY=lin_api_...
LINEAR_TEAM_ID=your_team_uuid
LINEAR_DEFAULT_LABEL_IDS=label1,label2  # Optional
```
```bash
SLACK_ENABLED=true
SLACK_BOT_TOKEN=xoxb-your-token
SLACK_CHANNEL_ID=C0123456789
```
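To see what the Slack integration roughly amounts to, here is a hedged sketch that only builds the alert payload; the function name and message format are illustrative, not the repo's `clients/slack.py`. The actual send (commented out) would use `slack_sdk`'s `chat_postMessage` with the bot token above.

```python
def build_alert_payload(channel_id: str, cluster_title: str,
                        severity: str, count: int) -> dict:
    """Build a chat.postMessage payload for an error-cluster alert."""
    return {
        "channel": channel_id,
        "text": f"[{severity}] {cluster_title} ({count} occurrences)",
    }

payload = build_alert_payload("C0123456789", "Timeout in sync_worker", "S2", 17)
# To actually send (requires SLACK_BOT_TOKEN):
# from slack_sdk import WebClient
# WebClient(token=os.environ["SLACK_BOT_TOKEN"]).chat_postMessage(**payload)
print(payload["text"])  # [S2] Timeout in sync_worker (17 occurrences)
```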
Run the pipeline on a schedule (e.g., every 5 minutes):
```python
# run_monitoring.py
import asyncio

from main import run_pipeline, PipelineConfig

async def main():
    config = PipelineConfig(use_sample_data=False)  # Use real data source
    # Runs the full pipeline: fetch → cluster → search → analyze → alert
    result = await run_pipeline(config)
    print(f"Pipeline completed: {result.status}")

if __name__ == "__main__":
    asyncio.run(main())
```
```bash
# Cron (every 5 minutes)
*/5 * * * * cd /path/to/backend && source venv/bin/activate && python run_monitoring.py

# Or use a scheduler like APScheduler, Celery, etc.
```
Run the FastAPI server and trigger via API:
```bash
uvicorn main:app --host 0.0.0.0 --port 8000

# Trigger pipeline
curl -X POST http://localhost:8000/api/run
```
Errors are grouped by semantic similarity to reduce alert noise:
| Stage | Method | Example |
|---|---|---|
| 1. Error Type | HTTP code, exception class | All 429 rate limit errors together |
| 2. Merge | Combine similar clusters | Auth errors from different services |
| 3. LLM (optional) | Semantic similarity | “Rate limit” + “Too many requests” |
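Stage 1 can be sketched as grouping by a coarse signature: the error type plus the message with volatile parts (numbers, paths) masked out. This is an illustrative sketch, not the repo's `pipeline/clustering.py`; field names are hypothetical.

```python
import re
from collections import defaultdict

def signature(error: dict) -> str:
    """Coarse cluster key: error type plus message with volatile parts masked."""
    msg = re.sub(r"\d+", "<n>", error["message"])  # mask numbers/IDs
    msg = re.sub(r"/[\w/.-]+", "<path>", msg)      # mask file paths
    return f'{error["type"]}:{msg}'

def cluster(errors: list[dict]) -> dict[str, list[dict]]:
    groups = defaultdict(list)
    for e in errors:
        groups[signature(e)].append(e)
    return groups

errors = [
    {"type": "HTTP429", "message": "Rate limit hit after 100 requests"},
    {"type": "HTTP429", "message": "Rate limit hit after 250 requests"},
    {"type": "TimeoutError", "message": "Timeout in sync_worker"},
]
clusters = cluster(errors)
print(len(clusters))  # 2: both rate-limit errors share one signature
```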
Not every error needs an alert:
| Status | Meaning | Action |
|---|---|---|
| `NEW` | First occurrence | Create ticket + Alert |
| `REGRESSION` | Was fixed, came back | Reopen ticket + Alert |
| `ONGOING` | Known issue | Comment on ticket (suppressed) |
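The status handling and severity override described in this README boil down to a small decision function. A hedged sketch, not the repo's `pipeline/analysis.py`:

```python
def should_alert(status: str, severity: str, has_open_ticket: bool) -> bool:
    """Suppression logic: always alert on NEW/REGRESSION or high severity;
    suppress ONGOING issues that already have an open ticket."""
    if status in ("NEW", "REGRESSION"):
        return True
    if severity in ("S1", "S2"):  # high severity overrides suppression
        return True
    if status == "ONGOING" and has_open_ticket:
        return False
    return True

print(should_alert("ONGOING", "S3", has_open_ticket=True))   # False
print(should_alert("ONGOING", "S1", has_open_ticket=True))   # True
print(should_alert("NEW", "S4", has_open_ticket=False))      # True
```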
- `NEW` or `REGRESSION` → Always alert
- S1/S2 severity → Always alert (overrides suppression)
- `ONGOING` with an open ticket → Suppress (don't spam)

Each error cluster is enriched with relevant context from your connected sources:
```python
# The agent searches your Airweave collection for each error cluster
from airweave import AirweaveSDK

client = AirweaveSDK(api_key=AIRWEAVE_API_KEY)

# Search with optional source filtering
results = client.search.search(
    collection_id=COLLECTION_ID,
    query="timeout in sync_worker batch processing",
    source_name="github",  # Filter to a specific source (github, linear, slack)
    limit=5,
)
```
The agent automatically searches for:
This context informs severity analysis and helps engineers understand the issue immediately.
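Before each search, a query like the one above has to be composed from the cluster's fields. A minimal sketch of that step (field names are hypothetical, not the repo's `pipeline/search.py`):

```python
def build_query(cluster: dict) -> str:
    """Compose an Airweave search query from an error cluster's salient fields."""
    parts = [cluster["error_type"], cluster["message"]]
    if cluster.get("service"):
        parts.append(f'in {cluster["service"]}')
    return " ".join(parts)

q = build_query({"error_type": "TimeoutError",
                 "message": "batch processing stalled",
                 "service": "sync_worker"})
print(q)  # TimeoutError batch processing stalled in sync_worker
```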
```
├── backend/
│   ├── main.py           # FastAPI + WebSocket + Pipeline orchestration
│   ├── config.py         # Configuration management
│   ├── state.py          # JSON-based state storage (signatures, mutes)
│   ├── schemas.py        # Pydantic data models
│   ├── samples/          # Sample error data for demo
│   ├── sources/          # Data sources (sample, sentry, azure)
│   ├── clients/          # External APIs (airweave, linear, slack)
│   └── pipeline/
│       ├── clustering.py # Error type clustering
│       ├── search.py     # Airweave context search
│       ├── analysis.py   # Severity + status determination
│       └── actions.py    # Linear/Slack integrations
├── frontend/             # React demo visualization (not needed for production)
│   ├── src/
│   │   ├── App.tsx       # Main app with WebSocket connection
│   │   └── components/   # Pipeline visualization components
├── docs/
│   ├── CONFIGURATION.md  # All configuration options
│   └── ARCHITECTURE.md   # Technical deep dive
└── .env.example          # All available environment variables
```
| Endpoint | Method | Description |
|---|---|---|
| `/api/config` | GET | Current configuration status |
| `/api/state` | GET | State storage statistics |
| `/api/samples` | GET | Sample error data |
| `/api/run` | POST | Trigger pipeline run |
| `/api/mute` | POST | Mute an error signature |
| `/api/mute/{sig}` | DELETE | Unmute a signature |
Connect to `/ws` for real-time pipeline updates (used by the frontend demo):
```javascript
const ws = new WebSocket('ws://localhost:8000/ws');
// Wait for the connection to open before sending
ws.onopen = () => ws.send(JSON.stringify({ action: 'run_pipeline' }));
ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  // Events: step_started, step_data_ready, step_completed, pipeline_completed
};
```
See `.env.example` for all options. Key settings:
| Variable | Description | Default |
|---|---|---|
| `DATA_SOURCE` | Error source: `sample`, `sentry`, `azure` | `sample` |
| `AIRWEAVE_API_KEY` | Airweave API key for context search | - |
| `OPENAI_API_KEY` | OpenAI key for LLM clustering/analysis | - |
| `LINEAR_ENABLED` | Enable real Linear ticket creation | `false` |
| `SLACK_ENABLED` | Enable real Slack notifications | `false` |
MIT License - use this as a starting point for your own error monitoring system.
Built with Airweave, context retrieval for AI agents across apps & databases.