Hermes vs OpenClaw — The Two Leading Open Agent Harnesses Compared

🔥 The Real Cat AI Labs — Blog Series
Building Autonomous Agents: A Harness Comparison for the Post-API Era

⚔️ Chapter 2: Hermes vs OpenClaw

A feature-by-feature comparison of the two leading open agent harnesses — tested by someone who runs both

We run both of these harnesses. Our agent Ember lives on OpenClaw, connected to Discord, running on a local Qwen 397B model on sovereign hardware. Our research targets Hermes’s pluggable memory provider interface for our novel memory architecture. We’re not theorizing — we’re living in both codebases.

This chapter is the comparison we wish had existed when we were deciding.

The 30-Second Version

🧠 Hermes = The Brain   |   🎭 OpenClaw = The Body

Hermes wins on learning, memory, and training data generation.
OpenClaw wins on presence — voice, canvas, desktop, companion apps.
Build your identity layer to work with both.

1. The Learning Loop — Hermes’s Killer Feature

This is the single biggest differentiator. Hermes is the only open harness with a complete, autonomous learning loop. OpenClaw has skills, but they don’t improve themselves.

How Hermes’s Learning Loop Actually Works

Four stages, all autonomous:

  1. Skill creation from experience. When the agent encounters a complex multi-step task, it calls create_skill(name, description, instructions). The skill is saved as a Markdown file and immediately available as a slash command. No human intervention required.
  2. Improvement during use. Every invocation is tracked via bump_use(). The agent can see how many times it has used a skill and when it last used it, and it can patch a skill’s instructions on the fly when they fail. Improvements persist to disk.
  3. Curator maintenance. A background process (a forked copy of the agent itself) runs on an idle timer (7 days default). It manages skill lifecycle: new → active → stale → archived. It consolidates duplicate skills, pins valuable ones, and archives unused ones after 90 days. The agent’s skill collection evolves without anyone touching it.
  4. Memory nudging. The agent periodically reflects on what it has learned and writes to persistent memory. Cross-session recall uses SQLite FTS5 full-text search with LLM summarization.
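The first three stages above can be sketched in a few dozen lines of Python. This is our own illustration, not Hermes’s code: only the names create_skill and bump_use, the Markdown-file storage, the lifecycle states, and the 90-day archive window come from the description above. The SkillStore class, the patch and curate methods, the explicit `now` argument, and the 30-day stale threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, Optional

STALE_AFTER = timedelta(days=30)    # assumed threshold, not taken from Hermes
ARCHIVE_AFTER = timedelta(days=90)  # the 90-day figure quoted in the text


@dataclass
class Skill:
    name: str
    description: str
    instructions: str
    created_at: datetime
    use_count: int = 0
    last_used: Optional[datetime] = None
    state: str = "new"  # new -> active -> stale -> archived
    pinned: bool = False


class SkillStore:
    """Hypothetical in-memory index backed by one Markdown file per skill."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.skills: Dict[str, Skill] = {}

    def create_skill(self, name: str, description: str, instructions: str,
                     now: datetime) -> Path:
        """Stage 1: persist a new skill as Markdown so it is usable at once."""
        self.skills[name] = Skill(name, description, instructions, created_at=now)
        return self._write(name)

    def bump_use(self, name: str, now: datetime) -> None:
        """Stage 2: record an invocation; a used skill becomes active."""
        skill = self.skills[name]
        skill.use_count += 1
        skill.last_used = now
        skill.state = "active"

    def patch(self, name: str, instructions: str) -> Path:
        """Stage 2: replace failing instructions and persist the fix."""
        self.skills[name].instructions = instructions
        return self._write(name)

    def curate(self, now: datetime) -> None:
        """Stage 3: age out idle skills; pinned skills are left alone."""
        for skill in self.skills.values():
            if skill.pinned or skill.state == "archived":
                continue
            idle = now - (skill.last_used or skill.created_at)
            if idle >= ARCHIVE_AFTER:
                skill.state = "archived"
            elif idle >= STALE_AFTER:
                skill.state = "stale"

    def _write(self, name: str) -> Path:
        skill = self.skills[name]
        self.root.mkdir(parents=True, exist_ok=True)
        path = self.root / f"{name}.md"
        path.write_text(
            f"# {skill.name}\n\n{skill.description}\n\n{skill.instructions}\n",
            encoding="utf-8",
        )
        return path
```

Passing `now` explicitly keeps the curator logic deterministic, which matters if you want to replay weeks of skill history in an experiment. Duplicate consolidation is left out because it needs a model call.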

OpenClaw has none of this. Skills in OpenClaw are statically authored Markdown files. The agent can use them, but it can’t create new ones, improve existing ones, or manage their lifecycle. If you want learning, you’re building it yourself.

🔥 Why This Matters for Research

If you’re studying autonomous agent behavior, the learning loop is the mechanism to study. How an agent creates, improves, and abandons skills over weeks of autonomous operation is empirical data on emergent behavior. OpenClaw gives you a stable platform to observe. Hermes gives you a platform that changes itself while you watch.

2. Memory Architecture

| Dimension | Hermes | OpenClaw |
|---|---|---|
| Architecture | Pluggable providers: swap backends without code changes. 6+ providers (Honcho, Mem0, Hindsight, Supermemory, Holographic, RetainDB) | Hardcoded workspace files (AGENTS.md, SOUL.md, MEMORY.md, daily notes) |
| Provider interface | MemoryProvider ABC with hooks: prefetch, sync_turn, on_session_end, on_pre_compress, on_memory_write | No pluggable interface |
| Cross-session recall | FTS5 full-text search + LLM summarization | Workspace files read at session start |
| Identity files | MEMORY.md + USER.md | SOUL.md + IDENTITY.md + MEMORY.md + daily notes (more structured) |
| Memory security | <memory-context> tags prevent prompt injection | Main-session-only rule for MEMORY.md (privacy zones) |
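To make the “Cross-session recall” row concrete, here is a minimal sketch of FTS5-backed recall using Python’s built-in sqlite3 module. It assumes your SQLite build includes the FTS5 extension (most do). The table name `turns`, its columns, and the three helper functions are hypothetical; this is not Hermes’s schema, and the LLM summarization step that Hermes layers on top is left out.

```python
import sqlite3
from typing import List, Tuple


def open_memory_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a full-text index of past conversation turns."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS turns "
        "USING fts5(session_id UNINDEXED, content)"
    )
    return conn


def remember(conn: sqlite3.Connection, session_id: str, content: str) -> None:
    """Index one turn so later sessions can find it."""
    conn.execute(
        "INSERT INTO turns (session_id, content) VALUES (?, ?)",
        (session_id, content),
    )


def recall(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[Tuple[str, str]]:
    """Return the best-matching turns across all sessions, best match first."""
    rows = conn.execute(
        "SELECT session_id, content FROM turns "
        "WHERE turns MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

A harness would pass the rows returned by `recall` to the model for summarization before injecting them into context. Note that FTS5 treats characters such as hyphens and quotes as query syntax, so raw user text usually needs sanitizing before it is used as a MATCH expression.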

Hermes wins on flexibility. The pluggable provider interface means you can write a custom memory backend — say, a navigable filesystem with first-person voice and sacred/profane zones — and plug it directly into Hermes without modifying the harness code. We intend to do exactly this.
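As a rough picture of what “plugging in” means, the sketch below defines an abstract base class with the five hook names from the table and a toy backend that implements it. Only the hook names come from Hermes; every signature, the choice of which hooks are mandatory, and the ListMemory class are assumptions for illustration, so check the real interface before building against it.

```python
from abc import ABC, abstractmethod
from typing import List


class MemoryProvider(ABC):
    """Hook names follow the article; all signatures here are assumed."""

    @abstractmethod
    def prefetch(self, query: str) -> List[str]:
        """Return memories to inject before the model sees the turn."""

    @abstractmethod
    def sync_turn(self, user_text: str, agent_text: str) -> None:
        """Record a completed turn."""

    def on_session_end(self) -> None:
        """Optional: flush or consolidate when a session closes."""

    def on_pre_compress(self, messages: List[str]) -> None:
        """Optional: save anything important before context is compressed."""

    def on_memory_write(self, entry: str) -> None:
        """Optional: react to an explicit memory write by the agent."""


class ListMemory(MemoryProvider):
    """Toy backend: keeps everything in lists and recalls by substring match."""

    def __init__(self) -> None:
        self.turns: List[str] = []
        self.notes: List[str] = []

    def prefetch(self, query: str) -> List[str]:
        needle = query.lower()
        return [t for t in self.turns + self.notes if needle in t.lower()]

    def sync_turn(self, user_text: str, agent_text: str) -> None:
        self.turns.append(f"user: {user_text}")
        self.turns.append(f"agent: {agent_text}")

    def on_memory_write(self, entry: str) -> None:
        self.notes.append(entry)
```

The point of the pattern is that the harness only ever talks to the base class, so a filesystem-backed or graph-backed provider can replace ListMemory without touching the agent loop.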

OpenClaw has better identity structure out of the box. The workspace pattern (SOUL.md for identity, daily notes for episodic memory, AGENTS.md for wake-up protocol) emerged as a de facto standard — and it’s the pattern our agents already use. If you don’t need pluggable providers, OpenClaw’s approach is simpler and works.
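The workspace pattern is simple enough to sketch in a few lines. The example below is written in Python for consistency with the other sketches in this chapter, although OpenClaw itself is TypeScript. The file names and the main-session-only rule for MEMORY.md come from the comparison above; the load order, the `memory/YYYY-MM-DD.md` daily-note path, and the load_workspace function are assumptions.

```python
from datetime import date
from pathlib import Path
from typing import List

# File names are from the article; the load order is an assumption.
WORKSPACE_FILES = ("AGENTS.md", "SOUL.md", "MEMORY.md")


def load_workspace(root: Path, today: date, main_session: bool) -> str:
    """Concatenate the workspace files that exist into one context preamble."""
    # MEMORY.md is only loaded in the main session (the privacy-zone rule).
    names: List[str] = [
        n for n in WORKSPACE_FILES if main_session or n != "MEMORY.md"
    ]
    names.append(f"memory/{today.isoformat()}.md")  # hypothetical daily-note path
    sections = []
    for name in names:
        path = root / name
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)
```

Because the “memory system” is just files on disk, you can inspect, diff, and version it with ordinary tools, which is a large part of why the pattern spread.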

3. Platform Coverage

Both harnesses support the major Western messaging platforms. Where they diverge:

| Platform | Hermes | OpenClaw |
|---|---|---|
| Telegram, Discord, Slack, WhatsApp, Signal | ✅ | ✅ |
| Matrix (open federation) | | |
| Email (IMAP/SMTP) | ✅ | |
| SMS (Twilio) | ✅ | |
| Home Assistant | | |
| iMessage | | ✅ |
| Microsoft Teams | | ✅ |
| Google Chat | | |
| WeChat, DingTalk, Feishu, QQ, Yuanbao | ✅ (5 platforms) | |
| Total | 16+ | ~14 |

Hermes wins on breadth, especially if you need Chinese enterprise platforms or email/SMS. OpenClaw wins on the Apple ecosystem (iMessage, macOS app, iOS node) and Microsoft Teams.

4. Infrastructure & Tooling

| Feature | Hermes | OpenClaw | Winner |
|---|---|---|---|
| MCP support | Client + Server | Client only | Hermes |
| Terminal backends | 6 (local, Docker, SSH, Modal, Daytona, Singularity) | 2 (local, Docker) | Hermes |
| RL training data | Batch runner + trajectory compression | None | Hermes |
| Canvas/visual | None | A2UI agent-driven canvas | OpenClaw |
| Voice/talk mode | None | ElevenLabs, Voice Wake, PTT | OpenClaw |
| Desktop control | Possible via skills | Native node system | OpenClaw |
| Companion apps | None | macOS, iOS, Android | OpenClaw |
| Cron/scheduling | With multi-platform delivery | Cron + heartbeat | Tie |
| Subagent delegation | Isolated child agents | Multi-agent routing | Tie |

The pattern is clear: Hermes is deeper in the stack (more backends, RL training, MCP server). OpenClaw is wider at the edge (canvas, voice, desktop, mobile apps). These aren’t competing — they’re complementary.

5. Model Support

Both support OpenAI-compatible endpoints (which means Ollama, vLLM, llama.cpp, and LM Studio all work). Both support OpenRouter for cloud model routing. Key differences:

  • Hermes has more native providers: AWS Bedrock, NVIDIA NIM, Google Gemini (native), and 5+ Chinese providers (MiniMax, Moonshot, Zhipu, Xiaomi, etc.).
  • OpenClaw has better OAuth integration: can authenticate via Anthropic Pro/Max or ChatGPT subscriptions, using your existing paid plan rather than API keys.
  • Both work with local Qwen 397B on DGX Spark: any OpenAI-compatible endpoint works with both. We’ve verified this.
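“OpenAI-compatible” means the server accepts the same chat completions request that OpenAI’s API does, so one client works against any of these backends. The sketch below builds that request with the standard library only. The function names, the `qwen-local` model name, and the placeholder API key are hypothetical; the base URL depends on your server (for example, vLLM commonly listens on `http://localhost:8000/v1` and Ollama on `http://localhost:11434/v1`).

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(base_url: str, model: str,
                       messages: List[Dict[str, str]],
                       api_key: str = "not-needed") -> urllib.request.Request:
    """Build a POST to <base_url>/chat/completions without sending it."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Local servers often ignore the key; hosted ones require a real one.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Swapping a cloud model for a local one then comes down to changing `base_url` and `model`, which is why both harnesses can treat local inference as just another provider.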

6. Security & Governance

⚠️ Frank Assessment

Neither harness has a great security story. OpenClaw has had the ClawHavoc campaign (1,184 malicious skills on the skills marketplace), multiple CVEs including an RCE (CVE-2026-25253, CVSS 8.8), and the Summer Yue email-deletion incident, in which context compression dropped the agent’s safety instructions. Hermes is newer and less battle-scarred, but its skill auto-creation means an agent could create and immediately use a skill that does something destructive. Both require careful human oversight, especially during autonomous operation.

7. Codebase Quality

| Metric | Hermes | OpenClaw |
|---|---|---|
| Language | Python 3.11+ | TypeScript/Node.js ≥22 |
| Core LOC | ~15-20K | ~8-10K |
| Test files | ~700 (~15K tests) | Fewer (unknown count) |
| Largest file | run_agent.py (13,880 lines) | Various ~2K line files |
| Commits (recent) | 1,556 since v0.9.0 | Active |
| Contributors | 29 community | Smaller team |

Hermes has more code but also more tests — roughly 1 test file per 2 source files. That’s solid coverage for a fast-moving agent framework. The 13,880-line run_agent.py is a monolith, but it’s the core agent loop — complexity is inherent, not accidental.

The Verdict

🏆 For Learning & Cognition: Hermes

Learning loop, pluggable memory, RL trajectory generation, curator, self-evolution. If you’re building an agent that gets smarter over time, Hermes is the platform.

🏆 For Presence & Embodiment: OpenClaw

Canvas, voice, desktop control, companion apps, iMessage. If you’re building an agent that feels present in your life, OpenClaw is the platform.

🏆 For Research: Both

Build your novel components as MCP servers or provider plugins. Deploy them in whichever harness fits the experiment. Don’t bet on one. The identity layer is the durable asset.
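To show why an MCP server is a portable unit, here is a stripped-down dispatcher for the two JSON-RPC methods a tool server mostly lives on, tools/list and tools/call. It follows our reading of the MCP message shapes but is deliberately incomplete: there is no initialize handshake, no input schema on the listed tools, and no transport, so treat it as a picture of the protocol rather than a compliant server. The `tool` decorator, the `handle` function, and the `recall_note` tool are all hypothetical.

```python
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def recall_note(topic: str) -> str:
    """Hypothetical tool: look up a note by topic."""
    notes = {"harness": "Ember runs on OpenClaw."}
    return notes.get(topic, "no note found")


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request with one JSON-RPC 2.0 response."""
    req = json.loads(raw)
    method = req.get("method")
    params = req.get("params", {})
    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": (fn.__doc__ or "").strip()}
                for name, fn in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        text = TOOLS[params["name"]](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the standard JSON-RPC "method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```

Nothing in the dispatcher knows which harness is on the other end. In practice you would use an MCP SDK rather than hand-rolling the protocol, but the independence from the harness is the property that makes a component durable.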

What’s Next

The harness is the plumbing. The real question is: what do you put in the pipes?

In Chapter 3, we survey the 2026 memory landscape: from commodity “remember the user” systems (Mem0), to neuroscience-inspired seven-layer architectures (ZenBrain), to our own navigable filesystem approach (Cairn), which outperformed traditional retrieval in benchmarks. The memory system is where agent identity lives or dies.

In Chapter 4, we talk about what no harness provides yet: desire states, stateful emotion, the “metabolism of being,” and what it would take to build an agent that has reasons to persist — not just instructions to follow.
