A 6-layer memory operating system for Hermes Agent — persistent memory with Qdrant, structured facts, fabric recall, auto-curated wiki, and surgical context injection. Runs locally, any LLM provider.

Your agent finally stops forgetting.
Permanent memory. Local memory infrastructure. API-provider agnostic. Surgically token-efficient.
Six memory layers. Automatic, intelligent context injection. Structured facts with trust scoring. A self-curating wiki pipeline. Semantic search across every conversation you’ve ever had.
Memory OS turns Hermes Agent into a real long-term collaborator — one that remembers your projects, your decisions, your reasoning, and brings exactly the right context back at exactly the right moment. Like talking to a colleague who was there for every session.
Memory infrastructure runs entirely on your machine. Works with any LLM provider — OpenRouter, OpenAI, Anthropic, Ollama, or local models. No memory subscription. No vendor lock-in.
You spend hours configuring the agent, teaching it your preferences, solving hard problems together — and in the next session it acts like it’s meeting you for the first time.
After months of hitting these walls in production, I built something that actually works.
Not just another plugin. A complete memory operating system: six memory layers working in concert, from flat files to a vector database, with surgical context injection, a knowledge pipeline that organizes itself, and a seventh layer, the explicit Ground Truth hierarchy, that tells the agent to actually use the injected memory.
Designed and refined by someone who ran headfirst into every limitation of stock Hermes and every existing memory solution.
Requirements: Hermes Agent + Docker (Qdrant + Redis + ARQ Worker) + Python 3.11+.
Compatible with any LLM provider Hermes supports — OpenRouter, OpenAI, Anthropic, Ollama, and more.
┌──────────────────────────────────────────────────────────────────┐
│ LAYER 1 · WORKSPACE │
│ MEMORY.md · USER.md · CREATIVE.md │
│ → Injected into the system prompt every single turn │
├──────────────────────────────────────────────────────────────────┤
│ LAYER 2 · SESSIONS │
│ state.db (SQLite + FTS5) │
│ → Full-text search across your entire conversation history │
├──────────────────────────────────────────────────────────────────┤
│ LAYER 3 · STRUCTURED FACTS │
│ memory_store.db (SQLite + HRR + FTS5 + trust scoring) │
│ → Durable facts with entity resolution and an automatic │
│ feedback loop that trains trust scores over time │
├──────────────────────────────────────────────────────────────────┤
│ LAYER 4 · FABRIC (CROSS-SESSION) │
│ Icarus Plugin (heavily forked) │
│ → LLM-powered session extraction + multi-source injection │
│ → 16 tools: fabric_recall, fabric_write, fabric_brief, etc. │
├──────────────────────────────────────────────────────────────────┤
│ LAYER 5 · VECTOR DATABASE │
│ Qdrant (4096d Cosine + BM25 sparse) │
│ → 4-level fallback: hybrid → dense → lexical → SQLite │
│ → Weekly decay scanner + semantic dedup (cosine >0.92 → merge) │
├──────────────────────────────────────────────────────────────────┤
│ LAYER 6 · LLM WIKI │
│ Auto-curated vault: concepts/ · entities/ · comparisons/ │
│ → Continuously ingested into Qdrant via wiki-continuous-ingest │
├──────────────────────────────────────────────────────────────────┤
│ ⚡ LAYER 7 · GROUND TRUTH HIERARCHY (identity layer) │
│ SOUL.md · rulebook.md │
│ → Tells the agent that injected memory is authoritative │
│ → Without this, layers 2-6 deliver context the agent ignores │
└──────────────────────────────────────────────────────────────────┘
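The Layer 5 fallback order (hybrid → dense → lexical → SQLite) can be sketched as a simple cascade. This is a minimal illustration, not the project's code: the `cascade_search` function and the four stand-in retrievers are hypothetical, and the rule for moving to the next tier (a backend error or an empty result) is an assumption, since the diagram only names the order.

```python
from typing import Callable, Sequence

# A retriever takes a query and returns matching snippets (hypothetical interface).
Retriever = Callable[[str], list[str]]


def cascade_search(query: str, tiers: Sequence[tuple[str, Retriever]]) -> tuple[str, list[str]]:
    """Try each tier in order and return the first one that yields results."""
    for name, retrieve in tiers:
        try:
            hits = retrieve(query)
        except Exception:
            # Assumption: a failing backend should not abort recall,
            # so the cascade moves on to the next tier.
            continue
        if hits:
            return name, hits
    return "none", []


# Stand-ins for the real backends, for illustration only.
def hybrid(query: str) -> list[str]:
    raise ConnectionError("vector store unreachable")

def dense(query: str) -> list[str]:
    return []

def lexical(query: str) -> list[str]:
    return [f"lexical hit for {query!r}"]

def sqlite_fts(query: str) -> list[str]:
    return [f"sqlite hit for {query!r}"]


TIERS = [("hybrid", hybrid), ("dense", dense), ("lexical", lexical), ("sqlite", sqlite_fts)]
tier, hits = cascade_search("deploy script", TIERS)
print(tier, hits)
```

Here the hybrid tier raises and the dense tier comes back empty, so the lexical tier answers and the SQLite tier is never consulted.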
How it flows:
pre_llm_call → surgical recall from all four sources (Fabric + Qdrant + Sessions + Facts)
But recall is not enough. The agent must be explicitly instructed to treat this injected context as authoritative. That’s what Layer 7 provides — without it, the agent rediscovers knowledge that’s already in the prompt.
post_llm_call + on_session_end → automatic learning extraction and capture
Each source is gated by relevance thresholds. Per-session deduplication prevents the same context from appearing twice. A social-closer filter skips trivial messages entirely. No padding. No firehose. The LLM gets exactly what it needs — nothing more.
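The three filters described above (relevance gating, per-session deduplication, and the social-closer skip) could be combined roughly as follows. This is a sketch under stated assumptions: the threshold values, the closer phrases, and the `Candidate` and `SessionInjector` names are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field

# Assumed values: the README does not state the real thresholds or closer list.
THRESHOLDS = {"fabric": 0.60, "qdrant": 0.70, "sessions": 0.50, "facts": 0.65}
SOCIAL_CLOSERS = {"thanks", "thank you", "ok", "okay", "bye", "got it", "great"}


@dataclass
class Candidate:
    source: str   # one of the four recall sources
    text: str     # the snippet that would be injected
    score: float  # relevance score reported by the source


@dataclass
class SessionInjector:
    # Snippets already injected during this session.
    seen: set[str] = field(default_factory=set)

    def select(self, user_message: str, candidates: list[Candidate]) -> list[Candidate]:
        # Social-closer filter: trivial messages get no injected context.
        normalized = user_message.strip().lower().rstrip("!.")
        if normalized in SOCIAL_CLOSERS:
            return []
        selected = []
        for c in sorted(candidates, key=lambda c: c.score, reverse=True):
            # Relevance gate: unknown sources need a perfect score to pass.
            if c.score < THRESHOLDS.get(c.source, 1.0):
                continue
            # Per-session dedup: never inject the same snippet twice.
            key = c.text.strip().lower()
            if key in self.seen:
                continue
            self.seen.add(key)
            selected.append(c)
        return selected


injector = SessionInjector()
pool = [
    Candidate("qdrant", "Deploys run from the release branch", 0.82),
    Candidate("facts", "User prefers tabs", 0.40),
]
first = injector.select("How do we deploy?", pool)
second = injector.select("And the deploy steps?", pool)
closer = injector.select("Thanks!", pool)
```

On the first turn only the Qdrant snippet clears its gate. On the second turn it is skipped as already injected, and the closing "Thanks!" triggers no recall at all.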
Layers 1-6 ensure memory is captured, stored, and injected. Layer 7 ensures the injected memory is used.
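Without the Ground Truth hierarchy, the agent spends tool calls re-fetching what is already in the prompt:
- fabric_recall to re-find injected memories
- session_search to re-discover injected session context
- fact_store to confirm injected facts

The result: memory-zero behavior despite perfect injection. Every rediscovery burns tokens, context, and time.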
→ Read Layer 7: Ground Truth Hierarchy — the critical fix.
| Aspect | Stock Hermes | Memory OS |
|---|---|---|
| Workspace memory | MEMORY.md + USER.md | + CREATIVE.md + intelligent injection |
| Session memory | Basic state.db | + FTS5 full-text search + session injection |
| Structured facts | Not present | Fact store + trust scoring + feedback loop |
| Cross-session recall | Limited | Fabric fork + multi-source injection |
| Vector search | Not present | Qdrant hybrid + 4-level fallback cascade |
| Cleanup and deduplication | Not present | Decay scanner + semantic dedup + archival |
| Knowledge pipeline | Not present | Self-curating LLM Wiki |
| Ground Truth hierarchy | Not present | Injected memory ranked as authoritative; agent must use context provided |
| Token efficiency | — | Surgical: gated retrieval + per-session dedup + no wasted rediscovery |
| Infrastructure | — | Local memory stack (Qdrant + Redis + ARQ) + any LLM provider |
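The "Cleanup and deduplication" row refers to the semantic dedup rule from Layer 5: entries whose cosine similarity exceeds 0.92 are merged. A minimal sketch of that rule follows. Only the 0.92 threshold comes from this README. The greedy strategy, the record shape, and the meaning of "merge" (here, the survivor just records the IDs it absorbed) are assumptions made for illustration.

```python
import math

MERGE_THRESHOLD = 0.92  # threshold stated in the Layer 5 description


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(records: list[dict]) -> list[dict]:
    """Greedy pass: fold each record into the first kept record it nearly duplicates."""
    kept: list[dict] = []
    for rec in records:
        for survivor in kept:
            if cosine(rec["vector"], survivor["vector"]) > MERGE_THRESHOLD:
                # Assumed merge policy: keep the survivor, remember what it absorbed.
                survivor["merged_ids"].append(rec["id"])
                break
        else:
            # No near-duplicate found, so this record survives on its own.
            kept.append({**rec, "merged_ids": []})
    return kept


records = [
    {"id": "a", "vector": [1.0, 0.0, 0.0]},
    {"id": "b", "vector": [0.99, 0.05, 0.0]},  # nearly parallel to "a"
    {"id": "c", "vector": [0.0, 1.0, 0.0]},    # orthogonal to "a"
]
result = dedup(records)
```

Record "b" is folded into "a" because the two vectors are almost parallel, while "c" is kept as a separate entry. The toy vectors have three dimensions only for readability.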
Almost every modern memory solution is cloud-first. If you want real, private memory infrastructure running on your own machine, with no cloud memory subscription, full provider flexibility, and no data leaving your local stack, the comparison below shows where the alternatives fall short.
| | Memory OS | mem0 | Zep | Letta |
|---|---|---|---|---|
| Local memory infrastructure | ✓ | ✗ | ✗ | ✗ |
| No memory subscription | ✓ | ✗ | ✗ | ✗ |
| Provider agnostic (OpenRouter, Ollama…) | ✓ | Partial | Partial | Partial |
| Hermes-native | ✓ | ✗ | ✗ | ✗ |
| Structured facts + trust scores | ✓ | Partial | ✗ | ✗ |
| Self-curating wiki | ✓ | ✗ | ✗ | ✗ |
| Intelligent decay + archival | ✓ | ✗ | ✗ | ✗ |
| Ground Truth hierarchy | ✓ | ✗ | ✗ | ✗ |
Icarus Plugin (heavily modified fork) — bundled in icarus/
The upstream esaradev/icarus-plugin is the base, but this fork is not upstream-compatible. Key additions: LLM-powered session extraction (replaces text[:500] truncation), multi-source injection (Qdrant + sessions + facts — upstream is fabric only), CREATIVE.md isolation (fixes § delimiter corruption from dual-writer conflict), backtick sanitization, system injection filter, and social closer detection.
Vault Curator v3 — ClaudioDrews/vault-curator
Frontmatter enrichment, semantic linking, and MOC index generation for the wiki layer.
For people who take Hermes Agent seriously.
For people who want an agent that actually evolves over time — one that doesn’t need the world re-explained every session.
For people who value clean engineering, extreme efficiency, and solutions that hold up in real local production.
If you’re like me — tired of amnesiac agents — Memory OS was built for you.
Want to see the agent remember for real?
Clone it, run it, feel the difference.
→ Setup guide · Layer deep-dives · Infrastructure docs · Operational skills · License
MIT License · Built with obsession by someone who runs Hermes every single day.