Hermes vs OpenClaw — The Two Leading Open Agent Harnesses Compared
⚔️ Chapter 2: Hermes vs OpenClaw
A feature-by-feature comparison of the two leading open agent harnesses — tested by someone who runs both
We run both of these harnesses. Our agent Ember lives on OpenClaw, connected to Discord, running on a local Qwen 397B model on sovereign hardware. Our research targets Hermes’s pluggable memory provider interface for our novel memory architecture. We’re not theorizing — we’re living in both codebases.
This chapter is the comparison I wish existed when we were deciding.
The 30-Second Version
🧠 Hermes = The Brain | 🎭 OpenClaw = The Body
Hermes wins on learning, memory, and training data generation.
OpenClaw wins on presence — voice, canvas, desktop, companion apps.
Build your identity layer to work with both.
1. The Learning Loop — Hermes’s Killer Feature
This is the single biggest differentiator. Hermes is the only open harness with a complete, autonomous learning loop. OpenClaw has skills, but they don’t improve themselves.
How Hermes’s Learning Loop Actually Works
Four stages, all autonomous:
- Skill creation from experience. When the agent encounters a complex multi-step task, it calls
create_skill(name, description, instructions). The skill is saved as a Markdown file and immediately available as a slash command. No human intervention required. - Improvement during use. Every invocation is tracked via
bump_use(). The agent can see how many times it’s used a skill, when it last used it, and can patch instructions on-the-fly when they fail. Improvements persist to disk. - Curator maintenance. A background process (a forked copy of the agent itself) runs on an idle timer (7 days default). It manages skill lifecycle:
new → active → stale → archived. It consolidates duplicate skills, pins valuable ones, and archives unused ones after 90 days. The agent’s skill collection evolves without anyone touching it. - Memory nudging. The agent periodically reflects on what it’s learned and writes to persistent memory. Cross-session recall via FTS5 full-text search with LLM summarization.
OpenClaw has none of this. Skills in OpenClaw are statically authored Markdown files. The agent can use them, but it can’t create new ones, improve existing ones, or manage their lifecycle. If you want learning, you’re building it yourself.
If you’re studying autonomous agent behavior, the learning loop is the mechanism to study. How an agent creates, improves, and abandons skills over weeks of autonomous operation is empirical data on emergent behavior. OpenClaw gives you a stable platform to observe. Hermes gives you a platform that changes itself while you watch.
2. Memory Architecture
| Dimension | Hermes | OpenClaw |
|---|---|---|
| Architecture | Pluggable providers — swap backends without code changes. 6+ providers (Honcho, Mem0, Hindsight, Supermemory, Holographic, RetainDB) | Hardcoded workspace files (AGENTS.md, SOUL.md, MEMORY.md, daily notes) |
| Provider interface | MemoryProvider ABC with hooks: prefetch, sync_turn, on_session_end, on_pre_compress, on_memory_write |
No pluggable interface |
| Cross-session recall | FTS5 full-text search + LLM summarization | Workspace files read at session start |
| Identity files | MEMORY.md + USER.md | SOUL.md + IDENTITY.md + MEMORY.md + daily notes (more structured) |
| Memory security | <memory-context> tags prevent prompt injection |
Main-session-only rule for MEMORY.md (privacy zones) |
Hermes wins on flexibility. The pluggable provider interface means you can write a custom memory backend — say, a navigable filesystem with first-person voice and sacred/profane zones — and plug it directly into Hermes without modifying the harness code. We intend to do exactly this.
OpenClaw has better identity structure out of the box. The workspace pattern (SOUL.md for identity, daily notes for episodic memory, AGENTS.md for wake-up protocol) emerged as a de facto standard — and it’s the pattern our agents already use. If you don’t need pluggable providers, OpenClaw’s approach is simpler and works.
3. Platform Coverage
Both harnesses support the major Western messaging platforms. Where they diverge:
| Platform | Hermes | OpenClaw |
|---|---|---|
| Telegram, Discord, Slack, WhatsApp, Signal | ✅ | ✅ |
| Matrix (open federation) | ✅ | ✅ |
| Email (IMAP/SMTP) | ✅ | ❌ |
| SMS (Twilio) | ✅ | ❌ |
| Home Assistant | ✅ | ❌ |
| iMessage | ❌ | ✅ |
| Microsoft Teams | ❌ | ✅ |
| Google Chat | ❌ | ✅ |
| WeChat, DingTalk, Feishu, QQ, Yuanbao | ✅ (5 platforms) | ❌ |
| Total | 16+ | ~14 |
Hermes wins on breadth, especially if you need Chinese enterprise platforms or email/SMS. OpenClaw wins on Apple ecosystem (iMessage, macOS app, iOS node) and Microsoft Teams.
4. Infrastructure & Tooling
| Feature | Hermes | OpenClaw | Winner |
|---|---|---|---|
| MCP support | Client + Server | Client only | Hermes |
| Terminal backends | 6 (local, Docker, SSH, Modal, Daytona, Singularity) | 2 (local, Docker) | Hermes |
| RL training data | Batch runner + trajectory compression | None | Hermes |
| Canvas/visual | None | A2UI agent-driven canvas | OpenClaw |
| Voice/talk mode | None | ElevenLabs, Voice Wake, PTT | OpenClaw |
| Desktop control | Possible via skills | Native node system | OpenClaw |
| Companion apps | None | macOS, iOS, Android | OpenClaw |
| Cron/scheduling | With multi-platform delivery | Cron + heartbeat | Tie |
| Subagent delegation | Isolated child agents | Multi-agent routing | Tie |
The pattern is clear: Hermes is deeper in the stack (more backends, RL training, MCP server). OpenClaw is wider at the edge (canvas, voice, desktop, mobile apps). These aren’t competing — they’re complementary.
5. Model Support
Both support OpenAI-compatible endpoints (which means Ollama, vLLM, llama.cpp, LM Studio all work). Both support OpenRouter for cloud model routing. Key differences:
- Hermes has more native providers: AWS Bedrock, NVIDIA NIM, Google Gemini (native), and 5+ Chinese providers (MiniMax, Moonshot, Zhipu, Xiaomi, etc.).
- OpenClaw has better OAuth integration: can authenticate via Anthropic Pro/Max or ChatGPT subscriptions, using your existing paid plan rather than API keys.
- Both work with local Qwen 397B on DGX Spark: any OpenAI-compatible endpoint works with both. We’ve verified this.
6. Security & Governance
Neither harness has a great security story. OpenClaw had the ClawHavoc campaign (1,184 malicious skills on the skills marketplace), multiple CVEs including an RCE (CVE-2026-25253, CVSS 8.8), and the Summer Yue email-deletion incident from context compression losing safety instructions. Hermes is newer and less battle-scarred, but its skill auto-creation means an agent could create and immediately use a skill that does something destructive. Both require careful human oversight, especially during autonomous operation.
7. Codebase Quality
| Metric | Hermes | OpenClaw |
|---|---|---|
| Language | Python 3.11+ | TypeScript/Node.js ≥22 |
| Core LOC | ~15-20K | ~8-10K |
| Test files | ~700 (~15K tests) | Fewer (unknown count) |
| Largest file | run_agent.py (13,880 lines) | Various ~2K line files |
| Commits (recent) | 1,556 since v0.9.0 | Active |
| Contributors | 29 community | Smaller team |
Hermes has more code but also more tests — roughly 1 test file per 2 source files. That’s solid coverage for a fast-moving agent framework. The 13,880-line run_agent.py is a monolith, but it’s the core agent loop — complexity is inherent, not accidental.
The Verdict
🏆 For Learning & Cognition: Hermes
Learning loop, pluggable memory, RL trajectory generation, curator, self-evolution. If you’re building an agent that gets smarter over time, Hermes is the platform.
🏆 For Presence & Embodiment: OpenClaw
Canvas, voice, desktop control, companion apps, iMessage. If you’re building an agent that feels present in your life, OpenClaw is the platform.
🏆 For Research: Both
Build your novel components as MCP servers or provider plugins. Deploy them in whichever harness fits the experiment. Don’t bet on one. The identity layer is the durable asset.
What’s Next
The harness is the plumbing. The real question is: what do you put in the pipes?
In Chapter 3, we survey the 2026 memory landscape — from commodity “remember the user” systems (Mem0) to neuroscience-inspired seven-layer architectures (ZenBrain) to our own navigable filesystem approach (Cairn) that benchmarks proved outperforms traditional retrieval. The memory system is where agent identity lives or dies.
In Chapter 4, we talk about what no harness provides yet: desire states, stateful emotion, the “metabolism of being,” and what it would take to build an agent that has reasons to persist — not just instructions to follow.
