agent-os-earth

Local-first agent operating system for multi-project development.

0
0
0
Python
public
Forked

Agent OS

A lightweight local-first project cockpit — manage multiple
long-term projects through a single web chat surface, with structured
markdown memory and a bounded execution layer that can delegate code
work to a sandboxed Coding Agent.

Detailed implementation status, task history, and the roadmap live in
ROADMAP.md; the system’s shape (files, pipelines,
invariants) is in ARCHITECTURE.md. The stable
operating guide for any agent working on this repo is
CLAUDE.md.

What is Agent OS?

Agent OS is a small Agent Operating System built for one builder
running multiple projects. It combines:

  • Project-scoped conversations — each project has its own chat
    history, isolated from other projects.
  • Structured markdown memory — durable project state lives in
    readable, editable .md files, not buried in chat scrollback.
  • A thin orchestration layer — the main agent assembles context
    from memory, produces planning / explanation replies, and decides via
    a separate semantic-judge call whether memory files should be updated.
  • A bounded execution layer — a sandboxed Coding Agent runs inside
    a per-project workspace under execution_workspaces/{project_id}/repo/,
    dispatched explicitly by the user via @code … or by confirming a
    model-proposed plan.
  • A verification surface — a Runs panel and per-run detail modal
    show status, files changed, commands run, blockers, and result.md.

Why this exists

Existing tools each solve a piece of the workflow but not the whole loop:

  • ChatGPT is great for discussion and planning but can’t operate
    local code or persist project-shaped memory.
  • Claude Code is great at executing inside a repo but isn’t ideal
    as the long-term conversation surface for a project.
  • General agent platforms are too broad and not optimized for a
    multi-project workflow.

Agent OS combines the parts that matter for project work:
ChatGPT-like conversation per project + Claude Code-like execution
power on demand + readable local memory files. It stays lightweight
on purpose.

Current status

Phase 1 & 2 (workspace + memory + orchestration + semantic
writeback) are complete. Phase 3 — Execution Layer is complete
through Task 06.2E (automatic command verification + bounded repair).
The Coding Agent runs sandboxed jobs in the background, the runs panel
auto-refreshes, inferred coding intent is surfaced as a confirmable
plan, terminal runs reconcile back into project memory, the main
agent can inspect specific repo files on demand through a bounded
sandboxed channel, and post-run verification now covers both a
project-defined verify command (06.2A) and an opt-in headless-browser
screenshot of a project-managed dev server (06.2B). The whole
build-and-preview loop now lives in the chat thread (06.2D): a run
posts a natural “running” note, then a completion summary with a
Run browser verification button; clicking it installs dependencies,
starts the dev server on port 5174, captures a screenshot, and returns
a live preview URL + thumbnail inline — and keeps the dev server alive
so the URL stays usable. The Runs panel gained Start / Stop preview
controls; the run detail modal is now a detailed inspection view.
Command verification is now automatic (06.2E): Agent OS infers the
right checks from the repo (npm run build, pytest, or a syntax
check), gives the Coding Agent one bounded repair pass if they fail, and
marks a run completed only after they pass — then offers browser
verification in chat.

Full task log and the next-step plan are in ROADMAP.md.

Architecture

┌────────────┐    ┌────────────────┐    ┌────────────────┐
│  Frontend  │ ←→ │  FastAPI       │ ←→ │  Anthropic API │
│  (React)   │    │  /api/*        │    │  (Claude)      │
└────────────┘    └────────────────┘    └────────────────┘
                         │
        ┌────────────────┼────────────────────────┐
        │                │                        │
        ▼                ▼                        ▼
   memory/         projects/{id}/        execution_workspaces/{id}/
   (global .md)    (project .md)         ├─ repo/    ← Coding Agent
                                         ├─ runs/    ← per-run artifacts
                                         ├─ logs/
                                         ├─ AGENT.md
                                         └─ TASK.md
  • Frontend — React + Vite + TypeScript. Three-column layout
    (project list / chat / context + runs).
  • Backend orchestrator — Python + FastAPI. orchestrator.py
    assembles context from memory; llm.py wraps the Anthropic SDK;
    memory writeback is a second LLM call gated by a policy filter.
  • Memory layer — pure markdown files on disk. Global memory in
    memory/; project memory in projects/{id}/.
  • Execution layerbackend/execution/ contains the sandbox,
    tool runtime, runner, background dispatch manager, run store,
    delegation judge, pending-execution flow, memory reconciliation,
    and on-demand file inspection.

Design philosophy

  • Brain vs. hands. The main agent plans and remembers; the Coding
    Agent executes. They communicate through summaries, not by sharing
    context. This is the single most important architectural rule.
  • Memory as a structured context layer. Knowledge worth remembering
    belongs in named markdown files with stable sections, not in hidden
    conversation buffers. Memory writes are model-proposed but
    policy-filtered before they touch disk.
  • Local-first trusted execution. Everything that runs runs on the
    user’s machine, in a project-scoped sandbox, surfaced through a UI
    the user can read and refresh. No cloud queues, no opaque background
    workers, no implicit cross-project access.
  • Explicit trust boundary before automation. @code and explicit
    user confirmation of a model-proposed plan are the only two paths
    that dispatch a run.

Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • An Anthropic API key

Backend

cd backend
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add ANTHROPIC_API_KEY
uvicorn main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173.

Repository layout

agent-os/
├── frontend/              # React + Vite + TypeScript UI
├── backend/               # Python + FastAPI
│   ├── main.py            # API endpoints
│   ├── orchestrator.py    # context assembly + memory judge + inspect loop
│   ├── llm.py             # Anthropic SDK wrapper
│   ├── database.py        # SQLite (conversations + messages + pending exec)
│   ├── execution/         # sandbox, runner, judges, reconciliation, inspect
│   └── tests/             # backend test suite (stubbed LLM, no API key needed)
├── memory/                # global markdown memory  (private; ships SOUL.md + *.example.md templates + README)
├── projects/              # per-project markdown memory  (private; ships *.example.md templates + README)
├── execution_workspaces/  # Coding Agent workspaces  (private; ships *.example.md templates + README)
├── README.md              # this file (public landing page)
├── ROADMAP.md             # detailed status + task log + next steps
├── ARCHITECTURE.md        # system shape: files, pipelines, invariants
└── CLAUDE.md              # stable operating guide for coding agents

Running the tests

cd backend
python tests/test_delegation_judge.py
python tests/test_pending_execution.py
python tests/test_pending_execution_db.py
python tests/test_memory_reconciliation.py
python tests/test_inspect.py
python tests/test_verification.py
python tests/test_verification_inference.py
python tests/test_browser_verification.py
python tests/test_runner_diagnostics.py

All tests stub the LLM caller, so no Anthropic API key is needed to
run them.

v0.3.3[beta]