Local-first agent operating system for multi-project development.
A lightweight local-first project cockpit — manage multiple
long-term projects through a single web chat surface, with structured
markdown memory and a bounded execution layer that can delegate code
work to a sandboxed Coding Agent.
Detailed implementation status, task history, and the roadmap live in
ROADMAP.md; the system’s shape (files, pipelines, invariants) is in
ARCHITECTURE.md. The stable operating guide for any agent working on
this repo is CLAUDE.md.
Agent OS is a small Agent Operating System built for one builder
running multiple projects. It combines:
.md files, not buried in chat scrollback.
execution_workspaces/{project_id}/repo/,
@code … or by confirming a
result.md.
Existing tools each solve a piece of the workflow but not the whole loop:
Agent OS combines the parts that matter for project work:
ChatGPT-like conversation per project + Claude Code-like execution
power on demand + readable local memory files. It stays lightweight
on purpose.
Phase 1 & 2 (workspace + memory + orchestration + semantic
writeback) are complete. Phase 3 — Execution Layer is complete
through Task 06.2E (automatic command verification + bounded repair).
The Coding Agent runs sandboxed jobs in the background, the runs panel
auto-refreshes, inferred coding intent is surfaced as a confirmable
plan, terminal runs reconcile back into project memory, the main
agent can inspect specific repo files on demand through a bounded
sandboxed channel, and post-run verification now covers both a
project-defined verify command (06.2A) and an opt-in headless-browser
screenshot of a project-managed dev server (06.2B). The whole
build-and-preview loop now lives in the chat thread (06.2D): a run
posts a natural “running” note, then a completion summary with a
Run browser verification button; clicking it installs dependencies,
starts the dev server on port 5174, captures a screenshot, and returns
a live preview URL + thumbnail inline — and keeps the dev server alive
so the URL stays usable. The Runs panel gained Start / Stop preview
controls; the run detail modal is now a detailed inspection view.
Command verification is now automatic (06.2E): Agent OS infers the
right checks from the repo (npm run build, pytest, or a syntax
check), gives the Coding Agent one bounded repair pass if they fail, and
marks a run completed only after they pass — then offers browser
verification in chat.
Full task log and the next-step plan are in ROADMAP.md.
┌────────────┐     ┌────────────────┐     ┌────────────────┐
│  Frontend  │ ←→  │    FastAPI     │ ←→  │ Anthropic API  │
│  (React)   │     │    /api/*      │     │   (Claude)     │
└────────────┘     └────────────────┘     └────────────────┘
                           │
          ┌────────────────┼────────────────────────┐
          │                │                        │
          ▼                ▼                        ▼
       memory/      projects/{id}/     execution_workspaces/{id}/
    (global .md)     (project .md)       ├─ repo/   ← Coding Agent
                                         ├─ runs/   ← per-run artifacts
                                         ├─ logs/
                                         ├─ AGENT.md
                                         └─ TASK.md
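The directory layout in the diagram, and the idea of a bounded channel into repo/, can be illustrated with a short Python sketch. The helper names `workspace_paths` and `resolve_in_repo` are hypothetical and are not taken from the codebase; the directory and file names are the ones shown in the diagram. The path check is one common way to keep file access inside a directory, offered here as an assumption about how such a boundary might be enforced.

```python
from pathlib import Path


def workspace_paths(root: Path, project_id: str) -> dict[str, Path]:
    """Map the locations shown in the diagram for one project (hypothetical helper)."""
    workspace = root / "execution_workspaces" / project_id
    return {
        "global_memory": root / "memory",
        "project_memory": root / "projects" / project_id,
        "repo": workspace / "repo",
        "runs": workspace / "runs",
        "logs": workspace / "logs",
        "agent_md": workspace / "AGENT.md",
        "task_md": workspace / "TASK.md",
    }


def resolve_in_repo(repo: Path, relative: str) -> Path:
    """Resolve a requested file under repo/, rejecting anything that escapes it."""
    repo = repo.resolve()
    candidate = (repo / relative).resolve()
    # Path.is_relative_to is available from Python 3.9.
    if not candidate.is_relative_to(repo):
        raise ValueError(f"path escapes the sandboxed repo: {relative}")
    return candidate
```

Resolving both paths before comparing them means that `..` segments, absolute paths, and symlinks are all judged by where they actually point, not by how the request was spelled.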
orchestrator.py
llm.py wraps the Anthropic SDK;
memory/; project memory in projects/{id}/.
backend/execution/ contains the sandbox,
@code and explicit
cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add ANTHROPIC_API_KEY
uvicorn main:app --reload --port 8000
# Frontend: in a second terminal, from the repository root
cd frontend
npm install
npm run dev
Open http://localhost:5173.
agent-os/
├── frontend/               # React + Vite + TypeScript UI
├── backend/                # Python + FastAPI
│   ├── main.py             # API endpoints
│   ├── orchestrator.py     # context assembly + memory judge + inspect loop
│   ├── llm.py              # Anthropic SDK wrapper
│   ├── database.py         # SQLite (conversations + messages + pending exec)
│   ├── execution/          # sandbox, runner, judges, reconciliation, inspect
│   └── tests/              # backend test suite (stubbed LLM, no API key needed)
├── memory/                 # global markdown memory (private; ships SOUL.md + *.example.md templates + README)
├── projects/               # per-project markdown memory (private; ships *.example.md templates + README)
├── execution_workspaces/   # Coding Agent workspaces (private; ships *.example.md templates + README)
├── README.md               # this file (public landing page)
├── ROADMAP.md              # detailed status + task log + next steps
├── ARCHITECTURE.md         # system shape: files, pipelines, invariants
└── CLAUDE.md               # stable operating guide for coding agents
cd backend
python tests/test_delegation_judge.py
python tests/test_pending_execution.py
python tests/test_pending_execution_db.py
python tests/test_memory_reconciliation.py
python tests/test_inspect.py
python tests/test_verification.py
python tests/test_verification_inference.py
python tests/test_browser_verification.py
python tests/test_runner_diagnostics.py
All tests stub the LLM caller, so no Anthropic API key is needed to
run them.
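As a rough illustration of what stubbing the LLM caller can look like, here is a minimal Python sketch in the same plain-script style as the test files above. Everything in it is hypothetical: `judge_delegation`, `make_stub`, the prompt wording, and the yes/no reply format are invented for the example and do not describe the real judges in backend/execution/. The point is only that a function which takes its LLM caller as a parameter can be tested with a canned reply and no network access.

```python
from typing import Callable

# Assumed shape of an LLM caller: prompt text in, reply text out.
LLMCaller = Callable[[str], str]


def judge_delegation(message: str, call_llm: LLMCaller) -> bool:
    """Hypothetical judge: ask the model whether a message needs code work."""
    prompt = (
        "Does this request need code changes? Answer yes or no.\n\n" + message
    )
    reply = call_llm(prompt)
    return reply.strip().lower().startswith("yes")


def make_stub(canned_reply: str) -> tuple[LLMCaller, list[str]]:
    """Build a stub caller that records prompts and always returns one reply."""
    prompts: list[str] = []

    def stub(prompt: str) -> str:
        prompts.append(prompt)
        return canned_reply

    return stub, prompts


def test_judge_uses_stubbed_reply() -> None:
    stub, prompts = make_stub("Yes, this needs a code change.")
    assert judge_delegation("Add a dark mode toggle", stub) is True
    assert len(prompts) == 1  # exactly one model call, and it was the stub
    assert "dark mode" in prompts[0]


if __name__ == "__main__":
    test_judge_uses_stubbed_reply()
    print("ok")
```

Recording the prompts alongside the canned reply lets a test assert both on the decision and on what would have been sent to the model.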