Hermes Agent Deployment Guide

• Quick Install: pip install, one command to start, driven entirely by config.yaml.

• config.yaml Reference: every field explained, with best practices and examples for minimal and advanced configs.

• Troubleshooting: platform integration failures, Skill not firing, Gateway timeout, with diagnosis and fix.

• Tips & Tricks: Skill authoring, Memory, multi-agent, hot reload, debugging techniques.

I. Installation

Requirements: Ubuntu 20.04+ or macOS 13+, Python 3.10+, and an internet connection. No GPU is required (inference can run on CPU); an NVIDIA CUDA GPU accelerates model inference.

Install Hermes CLI

pip install hermes-agent
hermes --version

Tip: Use a virtual environment to avoid system Python pollution: python -m venv hermes-env && source hermes-env/bin/activate && pip install hermes-agent

Initialize Project

hermes init ~/hermes-agent
cd ~/hermes-agent

Generates:

~/hermes-agent/
├── config.yaml   # Main config
├── skills/       # Skill directory (create it manually if it is missing)
├── memory/       # Memory storage
└── logs/         # Log files

Write config.yaml

See Section II for the full field reference. The minimum required fields are agent.model, agent.api_key, and gateway.port.

Note: YAML is indentation-sensitive. Use spaces (not tabs), 2 spaces per level.
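Indentation mistakes are easy to miss by eye, so a quick pre-flight check can help. The following is a minimal sketch, not part of Hermes: the function name find_indent_problems is hypothetical, and the check is a heuristic that only looks at leading whitespace (it does not parse YAML).

```python
def find_indent_problems(yaml_text: str, indent: int = 2) -> list[str]:
    """Return human-readable warnings for tabs or uneven leading spaces."""
    problems = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank lines and comments cannot break the structure
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            problems.append(f"line {lineno}: tab in indentation")
        elif len(leading) % indent != 0:
            problems.append(
                f"line {lineno}: {len(leading)} spaces is not a multiple of {indent}"
            )
    return problems


# Hypothetical config fragment with a tab on the third line
sample = "agent:\n  model: gpt-4o\n\tapi_key: your-token-here\n"
print(find_indent_problems(sample))
```

An empty list means no tab or odd-width indentation was found; it does not guarantee that the file is valid YAML.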

Start

# Foreground (see live logs)
hermes start

# Daemon (background)
hermes start --daemon

# Check status
hermes status

After startup, the Gateway listens on http://localhost:8000. The API endpoint is /v1/chat, and the Web UI is served at /ui (if enabled in config).
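To confirm the Gateway is reachable, a small client can post to /v1/chat. This is only a sketch: the guide does not document the request schema, so the JSON body {"message": ...} and the helper names build_chat_request and send_chat are assumptions that should be checked against the actual API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default gateway address from this guide


def build_chat_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /v1/chat.

    ASSUMPTION: the body shape {"message": ...} is hypothetical; verify the
    real request schema before relying on it.
    """
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str) -> str:
    """Send the request and return the raw response text."""
    with urllib.request.urlopen(build_chat_request(message), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling send_chat("hello") while Hermes is running should return the raw response body; a connection error usually means the Gateway is not listening on the expected port.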

II. config.yaml Reference

Field	Type	Description
agent.name	string	Agent name for logs and debugging
agent.model	string	Default model ID: `gpt-4o`, `claude-4-sonnet`, `deepseek-v3`, etc.
agent.api_base	string	API base URL. Leave empty for OpenAI official. For FlowerWolf: `https://api.flowerwolf.net/v1`
agent.api_key	string	Required. Get from flowerwolf.net/token_en.html
agent.max_tokens	int	Max tokens per reply, default 4096. Set to 8192 for complex tasks.
agent.temperature	float	Sampling temperature, 0.0–2.0, default 0.7
agent.system_prompt	string	System prompt defining the Agent's role and behavior. Supports `{{user_name}}`, `{{agent_name}}`, `{{current_time}}`.
gateway.host	string	Bind address, default `0.0.0.0` (listens on all network interfaces)
gateway.port	int	HTTP port, default 8000
gateway.allow_all_users	bool	true = allow all users; false = whitelist only
skills.dir	string	Skill directory, default `./skills`
skills.autoload	bool	Auto-load all Skills on startup
memory.provider	string	Storage backend: `sqlite` (default), `postgres`, `memory`
memory.session_limit	int	Max messages per session before auto-summarization
platforms.feishu.*	bool/string	Feishu integration settings. Requires app_id, app_secret, bot_name.
platforms.telegram.*	bool/string	Telegram integration. Requires bot_token from @BotFather.
platforms.discord.*	bool/string	Discord integration. Requires bot_token and guild_id.
log.level	string	`debug` / `info` / `warn` / `error`
log.file	string	Log file path. Leave blank for stdout only.
cors.enabled	bool	Enable Cross-Origin Resource Sharing for frontend access
cors.origins	list	Allowed origin domains, e.g. `["https://flowerwolf.net"]`
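The placeholders accepted by agent.system_prompt behave like simple template variables. The sketch below illustrates the idea in plain Python; render_prompt is a hypothetical helper, and Hermes's own substitution rules (whitespace handling, unknown names) may differ.

```python
import re
from datetime import datetime


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace {{name}} placeholders; unknown names are left untouched."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return values.get(key, match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = render_prompt(
    "You are {{agent_name}}, assisting {{user_name}}. Now: {{current_time}}.",
    {
        "agent_name": "my-hermes",
        "user_name": "Alice",  # example value
        "current_time": datetime.now().isoformat(timespec="minutes"),
    },
)
print(prompt)
```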

Minimal Config

agent:
  name: my-hermes
  model: gpt-4o
  api_base: https://api.flowerwolf.net/v1
  api_key: your-token-here

gateway:
  host: "0.0.0.0"
  port: 8000
  allow_all_users: true

skills:
  dir: ./skills
  autoload: true

memory:
  provider: sqlite
  session_limit: 30

log:
  level: info
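Before starting Hermes, it can be useful to sanity-check a config against the rules in the table above. The following sketch works on an already-parsed mapping (for example, the result of yaml.safe_load from PyYAML, which is not shown here); validate_config is a hypothetical helper, not a Hermes command.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in an already-parsed config mapping."""
    errors = []

    def section(name: str) -> dict:
        # An empty YAML section parses as None, so fall back to {}
        return config.get(name) or {}

    # Minimum required fields named in this guide
    for sect, key in [("agent", "model"), ("agent", "api_key"), ("gateway", "port")]:
        if not section(sect).get(key):
            errors.append(f"missing {sect}.{key}")

    port = section("gateway").get("port")
    if port is not None and not (isinstance(port, int) and 1 <= port <= 65535):
        errors.append("gateway.port must be an integer between 1 and 65535")

    # Documented range for sampling temperature is 0.0-2.0 (default 0.7)
    temperature = section("agent").get("temperature", 0.7)
    if not 0.0 <= temperature <= 2.0:
        errors.append("agent.temperature must be within 0.0-2.0")
    return errors


minimal = {
    "agent": {"model": "gpt-4o", "api_key": "your-token-here"},
    "gateway": {"port": 8000},
}
print(validate_config(minimal))  # []
```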

III. Skill Authoring

Skills are Hermes's extension units. Each Skill is a file in skills/ and is loaded automatically on startup when skills.autoload is true.

name: summarize_text
description: Compresses long text into a summary. Use when user says "summarize this", "too long", "give me the key points"
version: "1.0"
trigger:
  keywords: ["summarize", "summary", "too long", "key points", "compress"]
action: python
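script: |
  def summarize(text, ratio=0.3):
      # Split on periods and drop empty fragments (e.g. after the final period)
      sentences = [s.strip() for s in text.replace("\n", " ").split(".") if s.strip()]
      if not sentences:
          return ""
      # Keep the leading fraction of sentences, but always at least one
      keep = max(1, int(len(sentences) * ratio))
      return ". ".join(sentences[:keep]) + "."
  # {{context}} is substituted before execution; quotes or newlines in it may need escaping
  result = summarize(text="{{context}}", ratio=0.3)
  print(result)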

Key fields:

• name: Unique identifier, no duplicates.

• description: Tells the Agent when to invoke this Skill. Be specific.

• trigger.keywords: Keywords that increase the likelihood of this Skill being selected.

• action: python (run Python script) or http (make HTTP request).

• {{context}}: Replaced with relevant conversation context before execution.
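The logic inside a python Skill can be exercised outside Hermes before it is wired into a Skill file. The sketch below is a standalone variant of the summarize function from the example, slightly hardened to skip empty fragments; the sample text is made up for illustration.

```python
def summarize(text: str, ratio: float = 0.3) -> str:
    """Keep the leading `ratio` fraction of sentences (at least one)."""
    sentences = [s.strip() for s in text.replace("\n", " ").split(".") if s.strip()]
    if not sentences:
        return ""
    keep = max(1, int(len(sentences) * ratio))
    return ". ".join(sentences[:keep]) + "."


sample = "First point. Second point. Third point. Fourth point."
print(summarize(sample, ratio=0.5))  # First point. Second point.
```

This is a naive extractive summary (it keeps the first sentences only), which is enough to test the Skill plumbing but not a substitute for model-based summarization.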

HTTP Type Skill

name: weather_query
description: Looks up current weather for a city, e.g. "how's the weather in Tokyo"
version: "1.0"
trigger:
  keywords: ["weather", "temperature", "rain", "forecast"]
action: http
script:
  method: GET
  url: "https://api.weather.com/v3/wx/conditions/current?city={{city}}&key=YOUR_KEY"
  headers:
    Accept: application/json
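Values substituted into a URL template generally need percent-encoding, since a city such as "New York" contains a space. The guide does not say whether Hermes encodes placeholder values itself, so the sketch below only illustrates the concern; fill_url is a hypothetical helper.

```python
from urllib.parse import quote


def fill_url(template: str, params: dict[str, str]) -> str:
    """Substitute {{name}} placeholders with percent-encoded values."""
    url = template
    for name, value in params.items():
        url = url.replace("{{" + name + "}}", quote(value, safe=""))
    return url


template = "https://api.weather.com/v3/wx/conditions/current?city={{city}}&key=YOUR_KEY"
print(fill_url(template, {"city": "New York"}))
# https://api.weather.com/v3/wx/conditions/current?city=New%20York&key=YOUR_KEY
```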

Debug Skills: hermes skills list shows all loaded Skills. hermes skills test summarize_text runs a single Skill in isolation.

IV. Troubleshooting

Feishu: DM Works But Group Messages Don't Trigger

The most common issue. Check in order:

1. In Feishu event subscription, confirm "Use long connection to receive events" is selected (NOT Webhook URL mode).

2. Required permissions: im:message, im:message:send_as_bot.

3. For group messages: set FEISHU_GROUP_POLICY=open, or platforms.feishu.group_policy: "open" in config.yaml. Group messages to the bot are blocked by default.

4. Check logs: hermes logs | grep feishu. If an OpenClaw bot under the same app receives group messages fine, the platform config is OK and the issue is on the Hermes side.

Gateway Won't Start / Port Already in Use

Find what's using the port: ss -tlnp | grep 8000. If it is a previous Hermes instance, run hermes stop. Otherwise run kill <PID>, or change the port in config: gateway.port: 8001.
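On systems where ss is unavailable (macOS, for example), the same check can be approximated from Python. This is a sketch using only the standard library; port_in_use and first_free_port are hypothetical helper names, and the check only detects listeners reachable on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port nothing listens on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")
```

The value returned by first_free_port() can then be used for gateway.port. The check is racy by nature: another process may claim the port between the check and Hermes starting.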

Skill Loaded But Never Invoked

Two things to verify: 1) hermes skills list shows the Skill. 2) The Skill's description is specific enough. The Agent uses the description text to decide when to call a Skill, so a description that is too vague may never trigger it.

You can also trigger manually: say "use the summarize_text Skill on this passage".

API Errors: 403 / 429 / 500

403: Invalid API key or zero balance. Check key spelling and Token balance.

429: Rate limit exceeded. FlowerWolf has RPM limits; high-volume usage needs queuing or plan upgrade.

500: Upstream API error. Check Hermes logs for details. Usually transient — retry after a few seconds.
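For client code that calls the API, transient 429 and 500 responses are commonly handled with retries and exponential backoff. The sketch below shows that pattern under stated assumptions: request is a hypothetical callable returning (status_code, body), and the delay values are illustrative rather than tuned to any provider's limits.

```python
import time
from typing import Callable

RETRYABLE = {429, 500}  # rate limit and upstream errors described above


def call_with_retry(
    request: Callable[[], tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call `request` until it returns a non-retryable status.

    Delays double on each retry: base_delay, 2x, 4x, ...
    The last response is returned as-is once attempts are exhausted.
    """
    for attempt in range(max_attempts):
        status, body = request()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

A 403 is returned immediately without retrying, since an invalid key or empty balance will not fix itself.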

Memory Growing Without Bound

SQLite mode appends indefinitely. Set memory.session_limit to trigger auto-summarization. To disable memory entirely: memory.provider: none.
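The idea behind a session limit can be pictured as follows: once a session exceeds the limit, older messages are collapsed into a single summary entry and only recent ones are kept verbatim. This sketch is an illustration of the concept, not Hermes's actual algorithm; enforce_session_limit, keep_recent, and the summarize callback are all hypothetical.

```python
from typing import Callable


def enforce_session_limit(
    messages: list[str],
    limit: int,
    summarize: Callable[[list[str]], str],
    keep_recent: int = 10,
) -> list[str]:
    """Collapse older messages into one summary once `limit` is exceeded."""
    if len(messages) <= limit:
        return messages
    # Leave room for the summary entry itself
    keep_recent = max(0, min(keep_recent, limit - 1))
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


history = [f"m{i}" for i in range(35)]
trimmed = enforce_session_limit(
    history, limit=30, summarize=lambda old: f"summary of {len(old)} messages"
)
print(trimmed[0], len(trimmed))  # summary of 25 messages 11
```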

CORS Error / Frontend Can't Access API

Browsers enforce CORS when a frontend calls the Hermes API directly. Enable it in config: cors.enabled: true, with cors.origins: ["*"] for development or a specific domain for production.

hermes start Produces No Output

No stdout in foreground usually means a config error. Try hermes start -v for verbose output. Or run python -m hermes directly to see Python-level errors.

V. Tips & Tricks

Memory-Driven User Profiles

Beyond conversation history, Hermes Memory can inject custom context per user. Set memory.inject_user_profile to store per-user preferences and history; the Agent then sees them on every reply.

Cron Jobs for Automation

Schedule tasks directly in config.yaml:

cron:
  "0 9 * * 1-5": "daily_briefing"    # Weekdays at 9am
  "0 */2 * * *": "check_alerts"      # Every 2 hours
  "0 0 * * 0": "weekly_report"       # Sunday midnight

Task names reference Skills. Results are pushed to configured channels (Feishu/TG/Discord).
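To double-check when a schedule fires, the five cron fields (minute, hour, day of month, month, day of week) can be evaluated against a timestamp. The following is a minimal sketch, assuming standard 5-field syntax with only '*', '*/n', ranges, lists, and plain numbers; cron_matches is a hypothetical helper and requires all five fields to match, which is simpler than full cron semantics.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', '*/n', 'a-b', 'a,b' and plain numbers."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = map(int, part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if a 5-field cron expression fires at `moment`."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = moment.isoweekday() % 7  # cron counts Sunday as 0
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


# 2024-01-01 was a Monday, so the weekday briefing fires at 09:00
print(cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0)))  # True
```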

Multi-Agent: Division of Labor

Configure multiple agents[] in config.yaml, each with its own model, skills, and responsibility. A Router Agent dispatches based on user intent: agents[0].name: "coder" for code review, agents[1].name: "support" for customer service.

Custom Webhook: Connect Any Platform

Hermes supports custom webhooks via platforms.custom. Any platform with webhook support (Slack, DingTalk, WeCom, etc.) can be connected — Hermes POSTs messages to your URL in a fixed format.

Skill Hot Reload

Add or update Skills without restarting Hermes: hermes skills reload rescans skills/ and reloads everything. New and modified Skills take effect immediately. Pair with log.level: debug to see detailed loading logs.

Streaming Output (SSE)

Enable agent.stream: true in config. API requests use Server-Sent Events — frontends can display AI output word-by-word as it arrives, like ChatGPT.
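On the client side, a Server-Sent Events stream is a sequence of text lines in which each event's payload arrives on "data:" lines and a blank line ends the event. The sketch below parses that framing from any iterable of lines; iter_sse_data is a hypothetical helper, and whether Hermes sends plain text or JSON inside each event is not specified in this guide.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a line stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the payload
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comments (":...") and other fields (event, id, retry) are ignored here
    if buffer:
        # Lenient: flush a trailing event even without the final blank line
        yield "\n".join(buffer)


stream = ["data: Hel\n", "\n", "data: lo\n", "\n", ": keep-alive\n", "data: !\n", "\n"]
print("".join(iter_sse_data(stream)))  # Hello!
```

In practice the lines would come from a streaming HTTP response decoded as UTF-8 rather than from a list.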

VI. FAQ

Hermes vs OpenClaw — what's the difference?

Both are Agent frameworks with different architectural philosophies. OpenClaw is more "all-in-one" — config in one file, all platform integrations built-in. Hermes is more "modular" — Skills are separate files, platform config is more flexible. OpenClaw is better for quick setup; Hermes for deep customization.

How many Hermes instances per server?

Each instance listens on its own port, so the practical limit is server resources. Without GPU inference, 16GB of RAM fits roughly 3–5 instances. With GPU inference, a 24GB card (3090/4090) handles about 2 GPU-accelerated instances.

Does Hermes support streaming?

Yes. Set agent.stream: true. Uses Server-Sent Events (SSE) — frontends can render output token-by-token.

How to upgrade Hermes?

Stop the running instance with hermes stop, then run pip install --upgrade hermes-agent. Config fields are generally backward-compatible; breaking changes are noted in the release notes.

Can Hermes integrate with our existing Feishu / DingTalk?

Absolutely. Hermes provides a standard HTTP API (/v1/chat) and WebSocket interface. Any messaging platform can trigger Hermes via bot events, and Hermes posts replies back through the bot API. See each platform's Bot development docs for the incoming event side.