Configuration

All settings are stored in the ~/.hermes/ directory for easy access.

Directory structure

~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md # Primary agent identity (slot #1 in system prompt)
├── memories/ # Persistent memory (MEMORY.md, USER.md)
├── skills/ # Agent-created skills (managed via skill_manage tool)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs (errors.log, gateway.log — secrets auto-redacted)

Managing configuration

hermes config               # View current configuration
hermes config edit          # Open config.yaml in your editor
hermes config set KEY VAL   # Set a specific value
hermes config check         # Check for missing options (after updates)
hermes config migrate       # Interactively add missing options

# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-... # Saves to .env
Tip

The hermes config set command automatically routes values to the right file: API keys are saved to .env, everything else goes to config.yaml.

Configuration precedence

Settings are resolved in the following order (highest priority first):

  1. CLI arguments, e.g. hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
  2. ~/.hermes/config.yaml, the primary configuration file for all non-secret settings
  3. ~/.hermes/.env, the fallback for environment variables; required for secrets (API keys, tokens, passwords)
  4. Built-in defaults, hardcoded safe defaults used when nothing else is set
Rule of Thumb

Secrets (API keys, bot tokens, passwords) go in .env. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in config.yaml. When both are set, config.yaml wins for non-secret settings.
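The resolution order above can be sketched as a first-match lookup across the four sources. This is an illustrative sketch, not Hermes's actual implementation: resolve_setting, the DEFAULTS table, and the use of one shared key name across all sources are assumptions made for the example (real .env entries use environment-variable names).

```python
from typing import Any, Mapping, Optional

# Hypothetical built-in defaults, for illustration only.
DEFAULTS = {"model": "default-model", "terminal.backend": "local"}


def resolve_setting(
    key: str,
    cli_args: Mapping[str, Any],
    config_yaml: Mapping[str, Any],
    dotenv: Mapping[str, Any],
    defaults: Mapping[str, Any] = DEFAULTS,
) -> Optional[Any]:
    """Return the value from the highest-priority source that defines `key`."""
    # Order mirrors the list above: CLI > config.yaml > .env > built-in defaults.
    for source in (cli_args, config_yaml, dotenv, defaults):
        if source.get(key) is not None:
            return source[key]
    return None
```

A CLI value shadows config.yaml, which in turn shadows .env; a key that no source defines resolves to None.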

Environment variable substitution

You can reference environment variables in config.yaml using the ${VAR_NAME} syntax:

auxiliary:
  vision:
    api_key: ${GOOGLE_API_KEY}
    base_url: ${CUSTOM_VISION_URL}

delegation:
  api_key: ${DELEGATION_KEY}

Multiple references in a single value work: url: "${HOST}:${PORT}". If a referenced variable is not set, the placeholder is kept verbatim (${UNDEFINED_VAR} stays as-is). Only the ${VAR} syntax is supported; bare $VAR is not expanded.
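The substitution rules can be illustrated with a small regex-based expander. This is a sketch of the described behavior only; expand_placeholders and its regular expression are assumptions, not the code Hermes ships.

```python
import re
from typing import Mapping

# Matches ${NAME} only; a bare $NAME is intentionally not matched.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} with env[NAME]; leave unknown names untouched."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Undefined variables keep their original placeholder text.
        return env.get(name, match.group(0))

    return _PLACEHOLDER.sub(replace, value)
```

For example, expand_placeholders("${HOST}:${PORT}", {"HOST": "localhost", "PORT": "8080"}) yields "localhost:8080", while "$HOST" passes through unchanged.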

For AI provider setup (OpenRouter, Anthropic, Copilot, custom endpoints, self-hosted LLMs, fallback models, etc.), see AI Providers.

Terminal backend configuration

Hermes supports six terminal backends. Each determines where the agent's shell commands actually execute: your local machine, a Docker container, a remote server via SSH, a Modal cloud sandbox, a Daytona workspace, or a Singularity/Apptainer container.

terminal:
  backend: local        # local | docker | ssh | modal | daytona | singularity
  cwd: "."              # Working directory ("." = current dir for local, "/root" for containers)
  timeout: 180          # Per-command timeout in seconds
  env_passthrough: []   # Env var names to forward to sandboxed execution (terminal + execute_code)
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"  # Container image for Singularity backend
  modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"                 # Container image for Modal backend
  daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20"               # Container image for Daytona backend

For cloud sandboxes such as Modal and Daytona, container_persistent: true means Hermes will try to preserve filesystem state across sandbox recreation. It does not promise that the same live sandbox, PID space, or background processes will still be running later.

Backend overview

Backend | Where commands run | Isolation | Best for
--- | --- | --- | ---
local | Your machine, directly | None | Development, personal use
docker | Docker container | Full (namespaces, cap-drop) | Secure sandboxing, CI/CD
ssh | Remote server via SSH | Network boundary | Remote dev, powerful hardware
modal | Modal cloud sandbox | Full (cloud VM) | Ephemeral cloud compute, evals
daytona | Daytona workspace | Full (cloud container) | Managed cloud dev environments
singularity | Singularity/Apptainer container | Namespaces (--containall) | HPC clusters, shared machines

Local backend

The default. Commands run directly on your machine with no isolation. No special setup is required.

terminal:
  backend: local

Warning

The agent has the same filesystem access as your user account. Use hermes tools to disable tools you don't want, or switch to Docker for sandboxing.

Docker backend

Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).

terminal:
  backend: docker
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_mount_cwd_to_workspace: false   # Mount launch dir into /workspace
  docker_forward_env:                    # Env vars to forward into container
    - "GITHUB_TOKEN"
  docker_volumes:                        # Host directory mounts
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"         # :ro for read-only

  # Resource limits
  container_cpu: 1                       # CPU cores (0 = unlimited)
  container_memory: 5120                 # MB (0 = unlimited)
  container_disk: 51200                  # MB (requires overlay2 on XFS+pquota)
  container_persistent: true             # Persist /workspace and /root across sessions

Requirements: Docker Desktop or Docker Engine installed and running. Hermes probes $PATH plus common macOS install locations (/usr/local/bin/docker, /opt/homebrew/bin/docker, the Docker Desktop app bundle).

Container lifecycle: Each session starts one long-lived container (docker run -d ... sleep 2h). Commands run via docker exec with a login shell. On cleanup, the container is stopped and removed.

Security hardening:

  • --cap-drop ALL with only DAC_OVERRIDE, CHOWN, FOWNER added back
  • --security-opt no-new-privileges
  • --pids-limit 256
  • Size-limited tmpfs for /tmp (512MB), /var/tmp (256MB), /run (64MB)
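To make the hardening flags concrete, the sketch below assembles a docker run argument list from them. The flags themselves are standard Docker CLI options; the function name hardened_docker_args, the flag ordering, and the exact tmpfs size spelling are assumptions, and the real command Hermes builds is not shown in this document.

```python
from typing import List


def hardened_docker_args(image: str) -> List[str]:
    """Build a `docker run` argv applying the hardening options listed above."""
    args = ["docker", "run", "-d", "--cap-drop", "ALL"]
    # Only these three capabilities are added back after dropping everything.
    for cap in ("DAC_OVERRIDE", "CHOWN", "FOWNER"):
        args += ["--cap-add", cap]
    args += ["--security-opt", "no-new-privileges", "--pids-limit", "256"]
    # Size-limited tmpfs mounts for the scratch locations.
    for path, size in (("/tmp", "512m"), ("/var/tmp", "256m"), ("/run", "64m")):
        args += ["--tmpfs", f"{path}:size={size}"]
    # Long-lived placeholder process; commands would later run via `docker exec`.
    args += [image, "sleep", "2h"]
    return args
```

Passing the list to subprocess.run(args) would start the container; building it as a list rather than a shell string avoids quoting problems.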

Credential forwarding: Env vars listed in docker_forward_env are resolved from your shell environment first, then from ~/.hermes/.env. Skills can also declare required_environment_variables, which are merged automatically.

SSH backend

Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (kept alive for 5 minutes when idle). Persistent shell is enabled by default: state (cwd, env vars) survives across commands.

terminal:
  backend: ssh
  persistent_shell: true   # Keep a long-lived bash session (default: true)

Required environment variables:

TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=ubuntu

Optional:

Variable | Default | Description
--- | --- | ---
TERMINAL_SSH_PORT | 22 | SSH port
TERMINAL_SSH_KEY | (system default) | Path to the SSH private key
TERMINAL_SSH_PERSISTENT | true | Enable persistent shell

How it works: Connects at startup with BatchMode=yes and StrictHostKeyChecking=accept-new. Persistent shell keeps a single bash -l process alive on the remote host and communicates through temporary files. Commands that need stdin_data or sudo automatically fall back to one-shot mode.

Modal backend

Runs commands in a Modal cloud sandbox. Each task gets an isolated VM with configurable CPU, memory, and disk. The filesystem can be snapshotted and restored across sessions.

terminal:
  backend: modal
  container_cpu: 1             # CPU cores
  container_memory: 5120       # MB (5GB)
  container_disk: 51200        # MB (50GB)
  container_persistent: true   # Snapshot/restore filesystem

Required: The MODAL_TOKEN_ID + MODAL_TOKEN_SECRET environment variables, or a ~/.modal.toml config file.

Persistence: When enabled, the sandbox filesystem is snapshotted on cleanup and restored in the next session. Snapshots are tracked in ~/.hermes/modal_snapshots.json. This preserves filesystem state, not live processes, PID space, or background jobs.

Credential files: Automatically mounted from ~/.hermes/ (OAuth tokens, etc.) and synced before each command.

Daytona backend

Runs commands in a Daytona managed workspace. Supports stop/resume for persistence.

terminal:
  backend: daytona
  container_cpu: 1             # CPU cores
  container_memory: 5120       # MB → converted to GiB
  container_disk: 10240        # MB → converted to GiB (max 10 GiB)
  container_persistent: true   # Stop/resume instead of delete

Required: The DAYTONA_API_KEY environment variable.

Persistence: When enabled, the sandbox is stopped (not deleted) on cleanup and resumed in the next session. Sandbox names follow the pattern hermes-{task_id}.

Disk limit: Daytona enforces a 10 GiB maximum. Requests above this are capped with a warning.
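The MB-to-GiB conversion and the 10 GiB cap can be worked through in a few lines. This is a sketch under stated assumptions: daytona_disk_gib is a hypothetical helper, and rounding up to a whole GiB is a guess at the conversion rule, which this page does not specify.

```python
import math
import warnings

DAYTONA_MAX_DISK_GIB = 10  # maximum stated above


def daytona_disk_gib(container_disk_mb: int) -> int:
    """Convert a container_disk value in MB to GiB, capped at the Daytona limit."""
    gib = math.ceil(container_disk_mb / 1024)  # assumption: round up to whole GiB
    if gib > DAYTONA_MAX_DISK_GIB:
        warnings.warn(
            f"Requested {gib} GiB exceeds the {DAYTONA_MAX_DISK_GIB} GiB limit; capping."
        )
        return DAYTONA_MAX_DISK_GIB
    return gib
```

With the values used on this page, 10240 MB maps to 10 GiB, and the 51200 MB default from the Docker and Modal examples would be capped to 10 GiB with a warning.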

Singularity/Apptainer backend

Runs commands in a Singularity/Apptainer container. Designed for HPC clusters and shared machines where Docker is not available.

terminal:
  backend: singularity
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
  container_cpu: 1             # CPU cores
  container_memory: 5120       # MB
  container_persistent: true   # Writable overlay persists across sessions

Requirements: The apptainer or singularity binary in $PATH.

Image handling: Docker URLs (docker://...) are automatically converted to SIF files and cached. Existing .sif files are used directly.

Scratch directory: Resolved in order: TERMINAL_SCRATCH_DIR → TERMINAL_SANDBOX_DIR/singularity → /scratch/$USER/hermes-agent (HPC convention) → ~/.hermes/sandboxes/singularity.

Isolation: Uses --containall --no-home for full namespace isolation without mounting the host home directory.

Common terminal backend issues

If terminal commands fail immediately or the terminal tool is reported as disabled:

  • Local: No special requirements. The safest default when getting started.
  • Docker: Run docker version to verify Docker is working. If it fails, fix Docker or run hermes config set terminal.backend local.
  • SSH: Both TERMINAL_SSH_HOST and TERMINAL_SSH_USER must be set. Hermes logs a clear error if either is missing.
  • Modal: Needs the MODAL_TOKEN_ID env var or ~/.modal.toml. Run hermes doctor to check.
  • Daytona: Needs DAYTONA_API_KEY. The Daytona SDK handles server URL configuration.
  • Singularity: Needs apptainer or singularity in $PATH. Common on HPC clusters.

When in doubt, set terminal.backend back to local and verify that commands run there first.

Docker Volume Mounts

When using the Docker backend, docker_volumes lets you share host directories with the container. Each entry uses standard Docker -v syntax: host_path:container_path[:options].

terminal:
  backend: docker
  docker_volumes:
    - "/home/user/projects:/workspace/projects"   # Read-write (default)
    - "/home/user/datasets:/data:ro"              # Read-only
    - "/home/user/outputs:/outputs"               # Agent writes, you read

This is useful for:

  • Providing files to the agent (datasets, configs, reference code)
  • Receiving files from the agent (generated code, reports, exports)
  • Shared workspaces where both you and the agent access the same files

Can also be set via environment variable: TERMINAL_DOCKER_VOLUMES='["/host:/container"]' (JSON array).

Docker Credential Forwarding

By default, Docker terminal sessions do not inherit arbitrary host credentials. If you need a specific token inside the container, add it to terminal.docker_forward_env.

terminal:
  backend: docker
  docker_forward_env:
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"

Hermes resolves each listed variable from your current shell first, then falls back to ~/.hermes/.env if it was saved with hermes config set.

Warning

Anything listed in docker_forward_env becomes visible to commands run inside the container. Only forward credentials you are comfortable exposing to the terminal session.

Optional: Mount the Launch Directory into /workspace

Docker sandboxes stay isolated by default. Hermes does not pass your current host working directory into the container unless you explicitly opt in.

Enable it in config.yaml:

terminal:
  backend: docker
  docker_mount_cwd_to_workspace: true

When enabled:

  • if you launch Hermes from ~/projects/my-app, that host directory is bind-mounted to /workspace
  • the Docker backend starts in /workspace
  • file tools and terminal commands both see the same mounted project

When disabled, /workspace stays sandbox-owned unless you explicitly mount something via docker_volumes.

Security tradeoff:

  • false preserves the sandbox boundary
  • true gives the sandbox direct access to the directory you launched Hermes from

Use the opt-in only when you intentionally want the container to work on live host files.

Persistent Shell

By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When persistent shell is enabled, a single long-lived bash process is kept alive across execute() calls so that state survives between commands.

This is most useful for the SSH backend, where it also eliminates per-command connection overhead. Persistent shell is enabled by default for SSH and disabled for the local backend.

terminal:
  persistent_shell: true   # default; enables persistent shell for SSH

To disable:

hermes config set terminal.persistent_shell false

What persists across commands:

  • Working directory (cd /tmp sticks for the next command)
  • Exported environment variables (export FOO=bar)
  • Shell variables (MY_VAR=hello)

Precedence:

Level | Variable | Default
--- | --- | ---
Config | terminal.persistent_shell | true
SSH override | TERMINAL_SSH_PERSISTENT | follows config
Local override | TERMINAL_LOCAL_PERSISTENT | false

Per-backend environment variables take highest precedence. If you want persistent shell on the local backend too:

export TERMINAL_LOCAL_PERSISTENT=true
Note

Commands that require stdin_data or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol.
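The precedence rules for persistent shell can be summarized in a small decision function. This is a hedged sketch: persistent_shell_enabled is a hypothetical helper, the set of strings treated as "true" is an assumption, and backends other than ssh and local are rejected because the precedence table does not describe them.

```python
from typing import Mapping

# Assumption: which strings count as "true" in an environment variable.
_TRUTHY = {"1", "true", "yes", "on"}


def persistent_shell_enabled(
    backend: str, config_value: bool, env: Mapping[str, str]
) -> bool:
    """Decide whether persistent shell is on for the ssh or local backend."""
    if backend == "ssh":
        override = env.get("TERMINAL_SSH_PERSISTENT")
        # Without an override, SSH follows terminal.persistent_shell.
        return config_value if override is None else override.strip().lower() in _TRUTHY
    if backend == "local":
        override = env.get("TERMINAL_LOCAL_PERSISTENT")
        # Without an override, the local backend stays off.
        return False if override is None else override.strip().lower() in _TRUTHY
    raise ValueError(f"backend {backend!r} is not covered by the precedence table")
```

The per-backend environment variable, when present, always wins over the config value.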

See Code Execution and the Terminal section of the README for details on each backend.

Skill Settings

Skills can declare their own configuration settings via their SKILL.md frontmatter. These are non-secret values (paths, preferences, domain settings) stored under the skills.config namespace in config.yaml.

skills:
  config:
    wiki:
      path: ~/wiki   # Used by the llm-wiki skill

How skill settings work:

  • hermes config migrate scans all enabled skills, finds unconfigured settings, and offers to prompt you
  • hermes config show displays all skill settings under "Skill Settings" with the skill they belong to
  • When a skill loads, its resolved config values are injected into the skill context automatically

Setting values manually:

hermes config set skills.config.wiki.path ~/my-research-wiki

For details on declaring config settings in your own skills, see Creating Skills — Config Settings.

Memory Configuration

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens

File Read Safety

Controls how much content a single read_file call can return. Reads that exceed the limit are rejected with an error telling the agent to use offset and limit for a smaller range. This prevents a single read of a minified JS bundle or large data file from flooding the context window.

file_read_max_chars: 100000  # default — ~25-35K tokens

Raise it if you're on a model with a large context window and frequently read big files. Lower it for small-context models to keep reads efficient:

# Large context model (200K+)
file_read_max_chars: 200000

# Small local model (16K context)
file_read_max_chars: 30000

The agent also deduplicates file reads automatically — if the same file region is read twice and the file hasn't changed, a lightweight stub is returned instead of re-sending the content. This resets on context compression so the agent can re-read files after their content is summarized away.

Git Worktree Isolation

Enable isolated git worktrees for running multiple agents in parallel on the same repo:

worktree: true    # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed

When enabled, each CLI session creates a fresh worktree under .worktrees/ with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.

You can also list gitignored files to copy into worktrees via .worktreeinclude in your repo root:

# .worktreeinclude
.env
.venv/
node_modules/

Context Compression

Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint.

All compression settings live in config.yaml (no environment variables).

Full reference

compression:
  enabled: true                                    # Toggle compression on/off
  threshold: 0.50                                  # Compress at this % of context limit
  target_ratio: 0.20                               # Fraction of threshold to preserve as recent tail
  protect_last_n: 20                               # Min recent messages to keep uncompressed
  summary_model: "google/gemini-3-flash-preview"   # Model for summarization
  summary_provider: "auto"                         # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
  summary_base_url: null                           # Custom OpenAI-compatible endpoint (overrides provider)
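As a worked example of how threshold and target_ratio combine, the sketch below computes the token count at which compression would fire and the size of the recent tail that would be preserved. It assumes both values are applied multiplicatively, as the comments above describe; compression_budget is a hypothetical helper, not a Hermes function.

```python
from typing import Tuple


def compression_budget(
    context_limit: int, threshold: float = 0.50, target_ratio: float = 0.20
) -> Tuple[int, int]:
    """Return (trigger_tokens, tail_tokens) for a given context window size."""
    # Compression fires once the conversation reaches this many tokens.
    trigger_tokens = int(context_limit * threshold)
    # Portion of the trigger size kept verbatim as the recent tail.
    tail_tokens = int(trigger_tokens * target_ratio)
    return trigger_tokens, tail_tokens
```

With the defaults and a 200,000-token window, compression would trigger at 100,000 tokens and keep roughly the most recent 20,000 tokens uncompressed.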

Common setups

Default (auto-detect) — no configuration needed:

compression:
  enabled: true
  threshold: 0.50

Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash.

Force a specific provider (OAuth or API-key based):

compression:
  summary_provider: nous
  summary_model: gemini-3-flash

Works with any provider: nous, openrouter, codex, anthropic, main, etc.

Custom endpoint (self-hosted, Ollama, zai, DeepSeek, etc.):

compression:
  summary_model: glm-4.7
  summary_base_url: https://api.z.ai/api/coding/paas/v4

Points at a custom OpenAI-compatible endpoint. Uses OPENAI_API_KEY for auth.

How the three knobs interact

summary_provider | summary_base_url | Result
--- | --- | ---
auto (default) | not set | Auto-detect best available provider
nous / openrouter / etc. | not set | Force that provider, use its auth
any | set | Use the custom endpoint directly (provider ignored)

The summary_model must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.

Iteration Budget Pressure

When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:

Threshold | Level | What the model sees
--- | --- | ---
70% | Caution | [BUDGET: 63/90. 27 iterations left. Start consolidating.]
90% | Warning | [BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]

Warnings are injected into the last tool result's JSON (as a _budget_warning field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.
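The threshold logic and the injection into a tool result can be sketched as follows. The message formats and the _budget_warning field name come from this section; the function names and the choice to skip non-object JSON payloads are assumptions for illustration.

```python
import json
from typing import Optional


def budget_warning(used: int, max_turns: int = 90) -> Optional[str]:
    """Return the warning text for the current iteration count, if any."""
    remaining = max_turns - used
    # Integer comparisons avoid floating-point edge cases at exactly 70% / 90%.
    if used * 100 >= max_turns * 90:
        return f"[BUDGET WARNING: {used}/{max_turns}. Only {remaining} left. Respond NOW.]"
    if used * 100 >= max_turns * 70:
        return f"[BUDGET: {used}/{max_turns}. {remaining} iterations left. Start consolidating.]"
    return None


def attach_budget_warning(tool_result_json: str, used: int, max_turns: int = 90) -> str:
    """Add a _budget_warning field to a JSON object tool result when needed."""
    warning = budget_warning(used, max_turns)
    if warning is None:
        return tool_result_json
    payload = json.loads(tool_result_json)
    if not isinstance(payload, dict):
        return tool_result_json  # assumption: only JSON objects get the field
    payload["_budget_warning"] = warning
    return json.dumps(payload)
```

Because the warning rides along inside an existing tool result, no extra message is added to the conversation.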

agent:
  max_turns: 90   # Max iterations per conversation turn (default: 90)

Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.

Context Pressure Warnings

Separate from iteration budget pressure, context pressure tracks how close the conversation is to the compaction threshold — the point where context compression fires to summarize older messages. This helps both you and the agent understand when the conversation is getting long.

Progress | Level | What happens
--- | --- | ---
≥ 60% to threshold | Info | CLI shows a cyan progress bar; gateway sends an informational notice
≥ 85% to threshold | Warning | CLI shows a bold yellow bar; gateway warns compaction is imminent

In the CLI, context pressure appears as a progress bar in the tool output feed:

  ◐ context ████████████░░░░░░░░ 62% to compaction  48k threshold (50%) · approaching compaction

On messaging platforms, a plain-text notification is sent:

◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).

If auto-compression is disabled, the warning tells you context may be truncated instead.

Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.
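The progress figure and bar shown earlier in this section can be reproduced with simple integer arithmetic. This is a sketch only: pressure_level and pressure_bar are hypothetical helpers, and the 20-cell width and round-down behavior are inferred from the sample output rather than taken from the Hermes source.

```python
from typing import Optional


def percent_to_threshold(used_tokens: int, threshold_tokens: int) -> int:
    """Whole-number progress toward the compaction threshold, capped at 100."""
    return min(used_tokens * 100 // threshold_tokens, 100)


def pressure_level(used_tokens: int, threshold_tokens: int) -> Optional[str]:
    """Map progress to the notification level from the table above."""
    percent = percent_to_threshold(used_tokens, threshold_tokens)
    if percent >= 85:
        return "warning"
    if percent >= 60:
        return "info"
    return None


def pressure_bar(used_tokens: int, threshold_tokens: int, width: int = 20) -> str:
    """Render a fixed-width bar followed by the percentage."""
    percent = percent_to_threshold(used_tokens, threshold_tokens)
    filled = percent * width // 100
    return "█" * filled + "░" * (width - filled) + f" {percent}% to compaction"
```

With a 48k-token threshold, a conversation at 29,760 tokens is 62% of the way to compaction, which fills 12 of the 20 cells and sits in the Info band.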

Credential Pool Strategies

When you have multiple API keys or OAuth tokens for the same provider, configure the rotation strategy:

credential_pool_strategies:
  openrouter: round_robin   # cycle through keys evenly
  anthropic: least_used     # always pick the least-used key

Options: fill_first (default), round_robin, least_used, random. See Credential Pools for full documentation.

Auxiliary Models

Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use Gemini Flash via auto-detection — you don't need to configure anything.

The universal config pattern

Every model slot in Hermes — auxiliary tasks, compression, fallback — uses the same three knobs:

Key | What it does | Default
--- | --- | ---
provider | Which provider to use for auth and routing | "auto"
model | Which model to request | provider's default
base_url | Custom OpenAI-compatible endpoint (overrides provider) | not set

When base_url is set, Hermes ignores the provider and calls that endpoint directly (using api_key or OPENAI_API_KEY for auth). When only provider is set, Hermes uses that provider's built-in auth and base URL.
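The rule that base_url overrides provider can be expressed as a short routing function. This is a sketch of the behavior described above, not Hermes's implementation; resolve_route and the shape of the returned dictionary are assumptions.

```python
from typing import Dict, Mapping, Optional


def resolve_route(
    slot: Mapping[str, str], env: Mapping[str, str]
) -> Dict[str, Optional[str]]:
    """Decide how a model slot (provider / model / base_url / api_key) is routed."""
    base_url = slot.get("base_url") or None
    if base_url:
        # Direct endpoint: provider is ignored; api_key falls back to OPENAI_API_KEY.
        return {
            "mode": "direct",
            "provider": None,
            "base_url": base_url,
            "api_key": slot.get("api_key") or env.get("OPENAI_API_KEY"),
        }
    # Provider routing: the provider's built-in auth and base URL apply.
    return {
        "mode": "provider",
        "provider": slot.get("provider") or "auto",
        "base_url": None,
        "api_key": None,
    }
```

An empty string counts as "not set" here, matching the empty defaults in the reference config.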

Available providers: auto, openrouter, nous, codex, copilot, anthropic, main, zai, kimi-coding, minimax, any provider registered in the provider registry, or any named custom provider from your custom_providers list (e.g. provider: "beans").

Full auxiliary config reference

auxiliary:
  # Image analysis (vision_analyze tool + browser screenshots)
  vision:
    provider: "auto"       # "auto", "openrouter", "nous", "codex", "main", etc.
    model: ""              # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
    base_url: ""           # Custom OpenAI-compatible endpoint (overrides provider)
    api_key: ""            # API key for base_url (falls back to OPENAI_API_KEY)
    timeout: 30            # seconds; LLM API call, increase for slow local vision models
    download_timeout: 30   # seconds; image HTTP download, increase for slow connections

  # Web page summarization + browser page text extraction
  web_extract:
    provider: "auto"
    model: ""              # e.g. "google/gemini-2.5-flash"
    base_url: ""
    api_key: ""
    timeout: 360           # seconds (6min); per-attempt LLM summarization

  # Dangerous command approval classifier
  approval:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30            # seconds

  # Context compression timeout (separate from compression.* config)
  compression:
    timeout: 120           # seconds; compression summarizes long conversations, needs more time

  # Session search: summarizes past session matches
  session_search:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # Skills hub: skill matching and search
  skills_hub:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # MCP tool dispatch
  mcp:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

  # Memory flush: summarizes conversation for persistent memory
  flush_memories:
    provider: "auto"
    model: ""
    base_url: ""
    api_key: ""
    timeout: 30

Tip

Each auxiliary task has a configurable timeout (in seconds). Defaults: vision 30s, web_extract 360s, approval 30s, compression 120s. Increase these if you use slow local models for auxiliary tasks. Vision also has a separate download_timeout (default 30s) for the HTTP image download — increase this for slow connections or self-hosted image servers.

Info

Context compression has its own top-level compression: block with summary_provider, summary_model, and summary_base_url — see Context Compression above. The fallback model uses a fallback_model: block — see Fallback Model. All three follow the same provider/model/base_url pattern.

Changing the Vision Model

To use GPT-4o instead of Gemini Flash for image analysis:

auxiliary:
  vision:
    model: "openai/gpt-4o"

Or via environment variable (in ~/.hermes/.env):

AUXILIARY_VISION_MODEL=openai/gpt-4o

Provider Options

Provider | Description | Requirements
--- | --- | ---
"auto" | Best available (default). Vision tries OpenRouter → Nous → Codex. | -
"openrouter" | Force OpenRouter, which routes to any model (Gemini, GPT-4o, Claude, etc.) | OPENROUTER_API_KEY
"nous" | Force Nous Portal | hermes auth
"codex" | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | hermes model → Codex
"main" | Use your active custom/main endpoint. This can come from OPENAI_BASE_URL + OPENAI_API_KEY or from a custom endpoint saved via hermes model / config.yaml. Works with OpenAI, local models, or any OpenAI-compatible API. | Custom endpoint credentials + base URL

Common Setups

Using a direct custom endpoint (clearer than provider: "main" for local/self-hosted APIs):

auxiliary:
  vision:
    base_url: "http://localhost:1234/v1"
    api_key: "local-key"
    model: "qwen2.5-vl"

base_url takes precedence over provider, so this is the most explicit way to route an auxiliary task to a specific endpoint. For direct endpoint overrides, Hermes uses the configured api_key or falls back to OPENAI_API_KEY; it does not reuse OPENROUTER_API_KEY for that custom endpoint.

Using OpenAI API key for vision:

# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...

auxiliary:
  vision:
    provider: "main"
    model: "gpt-4o"   # or "gpt-4o-mini" for cheaper

Using OpenRouter for vision (route to any model):

auxiliary:
  vision:
    provider: "openrouter"
    model: "openai/gpt-4o"   # or "google/gemini-2.5-flash", etc.

Using Codex OAuth (ChatGPT Pro/Plus account — no API key needed):

auxiliary:
  vision:
    provider: "codex"   # uses your ChatGPT OAuth token
    # model defaults to gpt-5.3-codex (supports vision)

Using a local/self-hosted model:

auxiliary:
  vision:
    provider: "main"   # uses your active custom endpoint
    model: "my-local-model"

provider: "main" uses whatever provider Hermes uses for normal chat — whether that's a named custom provider (e.g. beans), a built-in provider like openrouter, or a legacy OPENAI_BASE_URL endpoint.

Tip

If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision.

Warning

Vision requires a multimodal model. If you set provider: "main", make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.

Environment Variables (legacy)

Auxiliary models can also be configured via environment variables. However, config.yaml is the preferred method — it's easier to manage and supports all options including base_url and api_key.

Setting | Environment Variable
--- | ---
Vision provider | AUXILIARY_VISION_PROVIDER
Vision model | AUXILIARY_VISION_MODEL
Vision endpoint | AUXILIARY_VISION_BASE_URL
Vision API key | AUXILIARY_VISION_API_KEY
Web extract provider | AUXILIARY_WEB_EXTRACT_PROVIDER
Web extract model | AUXILIARY_WEB_EXTRACT_MODEL
Web extract endpoint | AUXILIARY_WEB_EXTRACT_BASE_URL
Web extract API key | AUXILIARY_WEB_EXTRACT_API_KEY

Compression and fallback model settings are config.yaml-only.

Tip

Run hermes config to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.

Reasoning Effort

Control how much "thinking" the model does before responding:

agent:
  reasoning_effort: ""   # empty = medium (default). Options: xhigh (max), high, medium, low, minimal, none

When unset (default), reasoning effort defaults to "medium" — a balanced level that works well for most tasks. Setting a value overrides it — higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.

You can also change the reasoning effort at runtime with the /reasoning command:

/reasoning        # Show current effort level and display state
/reasoning high   # Set reasoning effort to high
/reasoning none   # Disable reasoning
/reasoning show   # Show model thinking above each response
/reasoning hide   # Hide model thinking

Tool-Use Enforcement

Some models (especially GPT-family) occasionally describe intended actions as text instead of making tool calls. Tool-use enforcement injects guidance that steers the model back to actually calling tools.

agent:
  tool_use_enforcement: "auto"   # "auto" | true | false | ["model-substring", ...]

Value | Behavior
--- | ---
"auto" (default) | Enabled for GPT models (gpt-, openai/gpt-) and disabled for all others.
true | Always enabled for all models.
false | Always disabled.
["gpt-", "o1-", "custom-model"] | Enabled only for models whose name contains one of the listed substrings.

When enabled, the system prompt includes guidance reminding the model to make actual tool calls rather than describing what it would do. This is transparent to the user and has no effect on models that already use tools reliably.
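The four accepted values can be summarized in a small decision function. This is an illustrative sketch: enforcement_enabled is a hypothetical name, and treating "auto" as a prefix match on gpt- and openai/gpt- is an assumption, since this page does not say whether the check is a prefix or substring match.

```python
from typing import List, Union

# Assumption: "auto" matches these model-name prefixes.
_AUTO_PREFIXES = ("gpt-", "openai/gpt-")


def enforcement_enabled(setting: Union[str, bool, List[str]], model: str) -> bool:
    """Decide whether tool-use enforcement applies to `model`."""
    if isinstance(setting, bool):
        return setting  # true / false force the behavior for every model
    if setting == "auto":
        return model.startswith(_AUTO_PREFIXES)
    if isinstance(setting, list):
        # Custom list: enabled when the name contains any listed substring.
        return any(fragment in model for fragment in setting)
    raise ValueError(f"unsupported tool_use_enforcement value: {setting!r}")
```

For example, the list form ["gpt-", "o1-", "custom-model"] would enable enforcement for a model named my-custom-model-v2 but not for anthropic/claude-opus-4.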

TTS Configuration

tts:
  provider: "edge"                          # "edge" | "elevenlabs" | "openai" | "neutts"
  edge:
    voice: "en-US-AriaNeural"               # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"                          # alloy, echo, fable, onyx, nova, shimmer
    base_url: "https://api.openai.com/v1"   # Override for OpenAI-compatible TTS endpoints
  neutts:
    ref_audio: ''
    ref_text: ''
    model: neuphonic/neutts-air-q4-gguf
    device: cpu

This controls both the text_to_speech tool and spoken replies in voice mode (/voice tts in the CLI or messaging gateway).

Display Settings

display:
  tool_progress: all             # off | new | all | verbose
  tool_progress_command: false   # Enable /verbose slash command in messaging gateway
  skin: default                  # Built-in or custom CLI skin (see user-guide/features/skins)
  personality: "kawaii"          # Legacy cosmetic field still surfaced in some summaries
  compact: false                 # Compact output mode (less whitespace)
  resume_display: full           # full (show previous messages on resume) | minimal (one-liner only)
  bell_on_complete: false        # Play terminal bell when agent finishes (great for long tasks)
  show_reasoning: false          # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
  streaming: false               # Stream tokens to terminal as they arrive (real-time output)
  show_cost: false               # Show estimated $ cost in the CLI status bar
  tool_preview_length: 0         # Max chars for tool call previews (0 = no limit, show full paths/commands)

Mode | What you see
--- | ---
off | Silent: just the final response
new | Tool indicator only when the tool changes
all | Every tool call with a short preview (default)
verbose | Full args, results, and debug logs

In the CLI, cycle through these modes with /verbose. To use /verbose in messaging platforms (Telegram, Discord, Slack, etc.), set tool_progress_command: true in the display section above. The command will then cycle the mode and save to config.

Privacy

privacy:
  redact_pii: false   # Strip PII from LLM context (gateway only)

When redact_pii is true, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM on supported platforms:

Field | Treatment
--- | ---
Phone numbers (user ID on WhatsApp/Signal) | Hashed to user_<12-char-sha256>
User IDs | Hashed to user_<12-char-sha256>
Chat IDs | Numeric portion hashed, platform prefix preserved (telegram:<hash>)
Home channel IDs | Numeric portion hashed
User names / usernames | Not affected (user-chosen, publicly visible)

Platform support: Redaction applies to WhatsApp, Signal, and Telegram. Discord and Slack are excluded because their mention systems (<@user_id>) require the real ID in the LLM context.

Hashes are deterministic — the same user always maps to the same hash, so the model can still distinguish between users in group chats. Routing and delivery use the original values internally.
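A deterministic hash of this shape can be sketched with the standard library. The details here are assumptions: the functions are hypothetical, and using the first 12 hex characters of an unsalted SHA-256 digest is one plausible reading of user_<12-char-sha256>, not a confirmed description of the gateway's code.

```python
import hashlib


def _short_hash(value: str) -> str:
    """First 12 hex characters of the SHA-256 digest (assumed format)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def redact_user_id(raw_id: str) -> str:
    """Map a phone number or user ID to a stable pseudonym."""
    return f"user_{_short_hash(raw_id)}"


def redact_chat_id(chat_id: str) -> str:
    """Hash the ID portion of 'platform:id' while keeping the platform prefix."""
    prefix, sep, identifier = chat_id.partition(":")
    if not sep:
        return _short_hash(chat_id)  # no platform prefix present
    return f"{prefix}:{_short_hash(identifier)}"
```

Because the mapping is a pure function of the input, the same user always receives the same pseudonym, which is what lets the model tell participants apart in group chats.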

Speech-to-Text (STT)

stt:
  provider: "local"      # "local" | "groq" | "openai"
  local:
    model: "base"        # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1"   # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
  # model: "whisper-1"   # Legacy fallback key still respected

Provider behavior:

  • local uses faster-whisper running on your machine. Install it separately with pip install faster-whisper.
  • groq uses Groq's Whisper-compatible endpoint and reads GROQ_API_KEY.
  • openai uses the OpenAI speech API and reads VOICE_TOOLS_OPENAI_KEY.

If the requested provider is unavailable, Hermes falls back automatically in this order: local → groq → openai.
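
As a rough sketch of that fallback rule, assuming availability is already known as a set of provider names (pick_stt_provider is a hypothetical helper, not a Hermes function):

```python
from typing import Optional, Set

# Fallback order as documented: local, then groq, then openai.
FALLBACK_ORDER = ["local", "groq", "openai"]


def pick_stt_provider(requested: str, available: Set[str]) -> Optional[str]:
    """Return the requested provider if usable, else the first available fallback."""
    if requested in available:
        return requested
    for name in FALLBACK_ORDER:
        if name in available:
            return name
    return None  # nothing usable; the caller decides how to report this
```

For example, requesting groq on a machine with only faster-whisper installed would resolve to local under this sketch.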

Groq and OpenAI model overrides are environment-driven:

STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1

Voice Mode (CLI)

voice:
  record_key: "ctrl+b" # Push-to-talk key inside the CLI
  max_recording_seconds: 120 # Hard stop for long recordings
  auto_tts: false # Enable spoken replies automatically when /voice on
  silence_threshold: 200 # RMS threshold for speech detection
  silence_duration: 3.0 # Seconds of silence before auto-stop

Use /voice on in the CLI to enable microphone mode, record_key to start/stop recording, and /voice tts to toggle spoken replies. See Voice Mode for end-to-end setup and platform-specific behavior.

Streaming

Stream tokens to the terminal or messaging platforms as they arrive, instead of waiting for the full response.

CLI Streaming

display:
  streaming: true # Stream tokens to terminal in real-time
  show_reasoning: true # Also stream reasoning/thinking tokens (optional)

When enabled, responses appear token-by-token inside a streaming box. Tool calls are still captured silently. If the provider doesn't support streaming, it falls back to the normal display automatically.

Gateway Streaming (Telegram, Discord, Slack)

streaming:
  enabled: true # Enable progressive message editing
  transport: edit # "edit" (progressive message editing) or "off"
  edit_interval: 0.3 # Seconds between message edits
  buffer_threshold: 40 # Characters before forcing an edit flush
  cursor: " ▉" # Cursor shown during streaming

When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.

Overflow handling: If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
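
The overflow rule can be pictured with a small sketch. It assumes a hard cut at the character limit; the real gateway may prefer word or line boundaries, and split_for_platform is a hypothetical name used only for illustration.

```python
from typing import List


def split_for_platform(text: str, limit: int = 4096) -> List[str]:
    """Split streamed text into consecutive messages no longer than `limit` characters."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = [text[i:i + limit] for i in range(0, len(text), limit)]
    return chunks or [""]  # an empty stream still maps to one (empty) message
```

Each chunk except the last corresponds to a finalized message; the last chunk is the message still being edited.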

Note

Streaming is disabled by default. Enable it in ~/.hermes/config.yaml to try the streaming UX.

Group Chat Session Isolation

Control whether shared chats keep one conversation per room or one conversation per participant:

group_sessions_per_user: true  # true = per-user isolation in groups/channels, false = one shared session per chat
  • true is the default and recommended setting. In Discord channels, Telegram groups, Slack channels, and similar shared contexts, each sender gets their own session when the platform provides a user ID.
  • false reverts to the old shared-room behavior. That can be useful if you explicitly want Hermes to treat a channel like one collaborative conversation, but it also means users share context, token costs, and interrupt state.
  • Direct messages are unaffected. Hermes still keys DMs by chat/DM ID as usual.
  • Threads stay isolated from their parent channel either way; with true, each participant also gets their own session inside the thread.
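
One way to picture these rules is as a session-key function. The sketch below is an assumption about how such keys could be built (the key layout and the name session_key are hypothetical), not a description of Hermes's internal format.

```python
from typing import Optional


def session_key(
    platform: str,
    chat_id: str,
    user_id: Optional[str] = None,
    thread_id: Optional[str] = None,
    *,
    is_dm: bool = False,
    per_user: bool = True,
) -> str:
    """Build an illustrative session key for a message."""
    parts = [platform, chat_id]
    if thread_id is not None:
        # Threads get their own key regardless of the per-user setting.
        parts.append(f"thread:{thread_id}")
    if per_user and not is_dm and user_id is not None:
        # Per-user isolation applies only in shared chats with a known sender.
        parts.append(f"user:{user_id}")
    return ":".join(parts)
```

With per_user=True two senders in the same channel get different keys; with per_user=False they share one, and DMs are keyed by chat ID either way.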

For the behavior details and examples, see Sessions and the Discord guide.

Unauthorized DM Behavior

Control what Hermes does when an unknown user sends a direct message:

unauthorized_dm_behavior: pair

whatsapp:
  unauthorized_dm_behavior: ignore
  • pair is the default. Hermes denies access, but replies with a one-time pairing code in DMs.
  • ignore silently drops unauthorized DMs.
  • Platform sections override the global default, so you can keep pairing enabled broadly while making one platform quieter.
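
The override rule amounts to a two-level lookup. The following sketch assumes the parsed config is a plain dictionary shaped like the YAML above; dm_behavior is a hypothetical helper name.

```python
from typing import Mapping


def dm_behavior(config: Mapping, platform: str) -> str:
    """Resolve unauthorized-DM behavior: platform section, then global key, then "pair"."""
    platform_cfg = config.get(platform) or {}
    return (
        platform_cfg.get("unauthorized_dm_behavior")
        or config.get("unauthorized_dm_behavior")
        or "pair"  # documented default
    )
```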

Quick Commands

Define custom commands that run shell commands without invoking the LLM — zero token usage, instant execution. Especially useful from messaging platforms (Telegram, Discord, etc.) for quick server checks or utility scripts.

quick_commands:
  status:
    type: exec
    command: systemctl status hermes-agent
  disk:
    type: exec
    command: df -h /
  update:
    type: exec
    command: cd ~/.hermes/hermes-agent && git pull && pip install -e .
  gpu:
    type: exec
    command: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader

Usage: type /status, /disk, /update, or /gpu in the CLI or any messaging platform. The command runs locally on the host and returns the output directly — no LLM call, no tokens consumed.

  • 30-second timeout — long-running commands are killed with an error message
  • Priority — quick commands are checked before skill commands, so you can override skill names
  • Autocomplete — quick commands are resolved at dispatch time and are not shown in the built-in slash-command autocomplete tables
  • Type — only exec is supported (runs a shell command); other types show an error
  • Works everywhere — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant
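
To make the exec semantics concrete, here is a minimal sketch of a runner with the documented 30-second timeout and type check. It is an approximation built on Python's subprocess module; run_quick_command and the error message wording are hypothetical, not Hermes's own.

```python
import subprocess


def run_quick_command(spec: dict, timeout: float = 30.0) -> str:
    """Run one quick-command entry and return its output or an error message."""
    if spec.get("type") != "exec":
        # Only "exec" is documented as supported.
        return f"Error: unsupported quick command type {spec.get('type')!r}"
    try:
        result = subprocess.run(
            spec["command"],
            shell=True,            # the command is a shell string, as in the YAML above
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout:g}s"
    return (result.stdout + result.stderr).strip()
```

No model is involved at any point, which is why these commands cost no tokens.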

Human Delay

Simulate human-like response pacing in messaging platforms:

human_delay:
  mode: "off" # off | natural | custom
  min_ms: 800 # Minimum delay (custom mode)
  max_ms: 2500 # Maximum delay (custom mode)
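
A sketch of how the custom mode could pick a delay, assuming a uniform draw between min_ms and max_ms. The distribution is an assumption, the natural mode is not specified here and is left out, and pick_delay_seconds is a hypothetical name.

```python
import random
from typing import Mapping, Optional


def pick_delay_seconds(cfg: Mapping, rng: Optional[random.Random] = None) -> float:
    """Return the delay to wait before replying, in seconds."""
    rng = rng or random.Random()
    mode = cfg.get("mode", "off")
    if mode == "off":
        return 0.0
    if mode == "custom":
        low = cfg.get("min_ms", 800)
        high = cfg.get("max_ms", 2500)
        return rng.uniform(low, high) / 1000.0  # milliseconds to seconds
    raise NotImplementedError(f"mode {mode!r} is not covered by this sketch")
```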

Code Execution

Configure the sandboxed Python code execution tool:

code_execution:
  timeout: 300 # Max execution time in seconds
  max_tool_calls: 50 # Max tool calls within code execution

Web Search Backends

The web_search, web_extract, and web_crawl tools support four backend providers. Configure the backend in config.yaml or via hermes tools:

web:
  backend: firecrawl # firecrawl | parallel | tavily | exa

| Backend | Env Var | Search | Extract | Crawl |
|---|---|---|---|---|
| Firecrawl (default) | FIRECRAWL_API_KEY | | | |
| Parallel | PARALLEL_API_KEY | | | |
| Tavily | TAVILY_API_KEY | | | |
| Exa | EXA_API_KEY | | | |

Backend selection: If web.backend is not set, the backend is auto-detected from available API keys. If only EXA_API_KEY is set, Exa is used. If only TAVILY_API_KEY is set, Tavily is used. If only PARALLEL_API_KEY is set, Parallel is used. Otherwise Firecrawl is the default.
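
The selection rule can be sketched as follows. This reads "only" as "the single backend key present among the four"; that reading, the helper name detect_web_backend, and the behavior when several keys are set are assumptions.

```python
import os
from typing import Mapping, Optional

KEY_TO_BACKEND = {
    "EXA_API_KEY": "exa",
    "TAVILY_API_KEY": "tavily",
    "PARALLEL_API_KEY": "parallel",
    "FIRECRAWL_API_KEY": "firecrawl",
}


def detect_web_backend(configured: Optional[str], env: Optional[Mapping] = None) -> str:
    """Pick a web backend: explicit web.backend wins, else infer from API keys."""
    env = os.environ if env is None else env
    if configured:
        return configured
    present = [backend for key, backend in KEY_TO_BACKEND.items() if env.get(key)]
    if len(present) == 1:
        return present[0]  # exactly one key set, so use that backend
    return "firecrawl"     # no keys or several keys: fall back to the default
```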

Self-hosted Firecrawl: Set FIRECRAWL_API_URL to point at your own instance. When a custom URL is set, the API key becomes optional (set USE_DB_AUTHENTICATION=false on the server to disable auth).

Parallel search modes: Set PARALLEL_SEARCH_MODE to control search behavior — fast, one-shot, or agentic (default: agentic).

Browser

Configure browser automation behavior:

browser:
  inactivity_timeout: 120 # Seconds before auto-closing idle sessions
  command_timeout: 30 # Timeout in seconds for browser commands (screenshot, navigate, etc.)
  record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
  camofox:
    managed_persistence: false # When true, Camofox sessions persist cookies/logins across restarts

The browser toolset supports multiple providers. See the Browser feature page for details on Browserbase, Browser Use, and local Chrome CDP setup.

Timezone

Override the server-local timezone with an IANA timezone string. Affects timestamps in logs, cron scheduling, and system prompt time injection.

timezone: "America/New_York"   # IANA timezone (default: "" = server-local time)

Supported values: any IANA timezone identifier (e.g. America/New_York, Europe/London, Asia/Kolkata, UTC). Leave empty or omit for server-local time.

Discord

Configure Discord-specific behavior for the messaging gateway:

discord:
  require_mention: true # Require @mention to respond in server channels
  free_response_channels: "" # Comma-separated channel IDs where bot responds without @mention
  auto_thread: true # Auto-create threads on @mention in channels
  • require_mention — when true (default), the bot only responds in server channels when mentioned with @BotName. DMs always work without mention.
  • free_response_channels — comma-separated list of channel IDs where the bot responds to every message without requiring a mention.
  • auto_thread — when true (default), mentions in channels automatically create a thread for the conversation, keeping channels clean (similar to Slack threading).
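
Taken together, require_mention and free_response_channels form a small decision rule. The sketch below assumes channel IDs are compared as strings and that the config has been parsed into a dictionary; should_respond is a hypothetical helper.

```python
from typing import Mapping


def should_respond(cfg: Mapping, *, channel_id: str, is_dm: bool, mentioned: bool) -> bool:
    """Decide whether the bot answers a Discord message."""
    if is_dm:
        return True  # DMs always work without a mention
    if not cfg.get("require_mention", True):
        return True  # mention requirement switched off
    raw = str(cfg.get("free_response_channels", ""))
    free_channels = {c.strip() for c in raw.split(",") if c.strip()}
    return mentioned or channel_id in free_channels
```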

Security

Pre-execution security scanning and secret redaction:

security:
  redact_secrets: true # Redact API key patterns in tool output and logs
  tirith_enabled: true # Enable Tirith security scanning for terminal commands
  tirith_path: "tirith" # Path to tirith binary (default: "tirith" in $PATH)
  tirith_timeout: 5 # Seconds to wait for tirith scan before timing out
  tirith_fail_open: true # Allow command execution if tirith is unavailable
  website_blocklist: # See Website Blocklist section below
    enabled: false
    domains: []
    shared_files: []
  • redact_secrets — automatically detects and redacts patterns that look like API keys, tokens, and passwords in tool output before it enters the conversation context and logs.
  • tirith_enabled — when true, terminal commands are scanned by Tirith before execution to detect potentially dangerous operations.
  • tirith_path — path to the tirith binary. Set this if tirith is installed in a non-standard location.
  • tirith_timeout — maximum seconds to wait for a tirith scan. Commands proceed if the scan times out.
  • tirith_fail_open — when true (default), commands are allowed to execute if tirith is unavailable or fails. Set to false to block commands when tirith cannot verify them.

Website Blocklist

Block specific domains from being accessed by the agent's web and browser tools:

security:
  website_blocklist:
    enabled: false # Enable URL blocking (default: false)
    domains: # List of blocked domain patterns
      - "*.internal.company.com"
      - "admin.example.com"
      - "*.local"
    shared_files: # Load additional rules from external files
      - "/etc/hermes/blocked-sites.txt"

When enabled, any URL matching a blocked domain pattern is rejected before the web or browser tool executes. This applies to web_search, web_extract, browser_navigate, and any tool that accesses URLs.

Domain rules support:

  • Exact domains: admin.example.com
  • Wildcard subdomains: *.internal.company.com (blocks all subdomains)
  • TLD wildcards: *.local

Shared files contain one domain rule per line (blank lines and # comments are ignored). Missing or unreadable files log a warning but don't disable other web tools.

The policy is cached for 30 seconds, so config changes take effect quickly without restart.
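
For intuition, the rule formats above behave much like shell-style globs applied to a URL's hostname. The sketch below uses Python's fnmatch for that purpose; whether Hermes matches this way internally is an assumption, and parse_rules and is_blocked are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from typing import List
from urllib.parse import urlparse


def parse_rules(text: str) -> List[str]:
    """Read one rule per line, skipping blank lines and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line.lower())
    return rules


def is_blocked(url: str, patterns: List[str]) -> bool:
    """Return True if the URL's hostname matches any blocklist pattern."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
```

Note that under glob matching a pattern such as *.internal.company.com covers subdomains at any depth but not the bare internal.company.com host itself.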

Smart Approvals

Control how Hermes handles potentially dangerous commands:

approvals:
  mode: manual # manual | smart | off

| Mode | Behavior |
|---|---|
| manual (default) | Prompt the user before executing any flagged command. In the CLI, shows an interactive approval dialog. In messaging, queues a pending approval request. |
| smart | Use an auxiliary LLM to assess whether a flagged command is actually dangerous. Low-risk commands are auto-approved with session-level persistence. Genuinely risky commands are escalated to the user. |
| off | Skip all approval checks. Equivalent to HERMES_YOLO_MODE=true. Use with caution. |

Smart mode is particularly useful for reducing approval fatigue — it lets the agent work more autonomously on safe operations while still catching genuinely destructive commands.
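
The three modes can be summarized as a decision function. In this sketch the auxiliary LLM is replaced by an injected callable (assess, returning True for risky commands) and session-level persistence is modeled as a set of approved command strings; both are stand-ins chosen for illustration, and needs_user_approval is a hypothetical name.

```python
from typing import Callable, Set


def needs_user_approval(
    mode: str,
    command: str,
    assess: Callable[[str], bool],
    approved_this_session: Set[str],
) -> bool:
    """Return True if a flagged command must be escalated to the user."""
    if mode == "off":
        return False  # all approval checks skipped
    if mode == "manual":
        return True   # always ask for flagged commands
    if mode == "smart":
        if command in approved_this_session:
            return False  # already auto-approved earlier in this session
        if assess(command):
            return True   # judged risky: escalate
        approved_this_session.add(command)
        return False
    raise ValueError(f"unknown approvals mode: {mode!r}")
```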

Warning

Setting approvals.mode: off disables all safety checks for terminal commands. Only use this in trusted, sandboxed environments.

Checkpoints

Automatic filesystem snapshots before destructive file operations. See Checkpoints & Rollback for details.

checkpoints:
  enabled: true # Enable automatic checkpoints (also: hermes --checkpoints)
  max_snapshots: 50 # Max checkpoints to keep per directory

Delegation

Configure subagent behavior for the delegate tool:

delegation:
  # model: "google/gemini-3-flash-preview" # Override model (empty = inherit parent)
  # provider: "openrouter" # Override provider (empty = inherit parent)
  # base_url: "http://localhost:1234/v1" # Direct OpenAI-compatible endpoint (takes precedence over provider)
  # api_key: "local-key" # API key for base_url (falls back to OPENAI_API_KEY)

Subagent provider:model override: By default, subagents inherit the parent agent's provider and model. Set delegation.provider and delegation.model to route subagents to a different provider:model pair — e.g., use a cheap/fast model for narrowly-scoped subtasks while your primary agent runs an expensive reasoning model.

Direct endpoint override: To send subagents straight to a custom endpoint, set delegation.base_url, delegation.api_key, and delegation.model. Subagents then call that OpenAI-compatible endpoint directly, and the setting takes precedence over delegation.provider. If delegation.api_key is omitted, Hermes falls back to OPENAI_API_KEY only.

The delegation provider uses the same credential resolution as CLI/gateway startup. All configured providers are supported: openrouter, nous, copilot, zai, kimi-coding, minimax, minimax-cn. When a provider is set, the system automatically resolves the correct base URL, API key, and API mode — no manual credential wiring needed.

Precedence: delegation.base_url in config → delegation.provider in config → parent provider (inherited). delegation.model in config → parent model (inherited). Setting just model without provider changes only the model name while keeping the parent's credentials (useful for switching models within the same provider like OpenRouter).
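
The precedence chain can be written out as a small resolver. This is a sketch under assumptions: the dictionaries stand in for the parsed delegation section and the parent agent's settings, the returned shape is invented for illustration, and resolve_subagent_target is a hypothetical name.

```python
from typing import Mapping, Optional


def resolve_subagent_target(
    delegation: Mapping,
    parent: Mapping,
    env: Optional[Mapping] = None,
) -> dict:
    """Resolve where subagent requests go, following the documented precedence."""
    env = env or {}
    # Model: delegation.model if set, otherwise inherit the parent's model.
    model = delegation.get("model") or parent["model"]

    base_url = delegation.get("base_url")
    if base_url:
        # Direct endpoint wins over delegation.provider.
        api_key = delegation.get("api_key") or env.get("OPENAI_API_KEY")
        return {"via": "base_url", "base_url": base_url, "api_key": api_key, "model": model}

    provider = delegation.get("provider")
    if provider:
        return {"via": "provider", "provider": provider, "model": model}

    # Nothing overridden: inherit the parent's provider.
    return {"via": "inherited", "provider": parent["provider"], "model": model}
```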

Clarify

Configure the clarification prompt behavior:

clarify:
  timeout: 120 # Seconds to wait for user clarification response

Context Files (SOUL.md, AGENTS.md)

Hermes uses two different context scopes:

| File | Purpose | Scope |
|---|---|---|
| SOUL.md | Primary agent identity: defines who the agent is (slot #1 in the system prompt) | ~/.hermes/SOUL.md or $HERMES_HOME/SOUL.md |
| .hermes.md / HERMES.md | Project-specific instructions (highest priority) | Walks to git root |
| AGENTS.md | Project-specific instructions, coding conventions | Recursive directory walk |
| CLAUDE.md | Claude Code context files (also detected) | Working directory only |
| .cursorrules | Cursor IDE rules (also detected) | Working directory only |
| .cursor/rules/*.mdc | Cursor rule files (also detected) | Working directory only |

  • SOUL.md is the agent's primary identity. It occupies slot #1 in the system prompt, completely replacing the built-in default identity. Edit it to fully customize who the agent is.
  • If SOUL.md is missing, empty, or cannot be loaded, Hermes falls back to a built-in default identity.
  • Project context files use a priority system in which only ONE type is loaded (first match wins): .hermes.md → AGENTS.md → CLAUDE.md → .cursorrules. SOUL.md is always loaded independently.
  • AGENTS.md is hierarchical: if subdirectories also have AGENTS.md, all are combined.
  • Hermes automatically seeds a default SOUL.md if one does not already exist.
  • All loaded context files are capped at 20,000 characters with smart truncation.
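
The first-match rule and the size cap can be sketched like this. The priority order is .hermes.md, then AGENTS.md, then CLAUDE.md, then .cursorrules; plain head truncation stands in for the unspecified "smart truncation", and both helper names are hypothetical.

```python
from typing import Iterable, Optional

# Project context priority, highest first.
PROJECT_CONTEXT_PRIORITY = [".hermes.md", "AGENTS.md", "CLAUDE.md", ".cursorrules"]
MAX_CONTEXT_CHARS = 20_000


def pick_project_context(filenames: Iterable[str]) -> Optional[str]:
    """Return the single project context file type to load, or None if none exist."""
    present = set(filenames)
    for name in PROJECT_CONTEXT_PRIORITY:
        if name in present:
            return name  # first match wins; lower-priority types are ignored
    return None


def cap_context(text: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Naive stand-in for smart truncation: keep only the first `limit` characters."""
    return text[:limit]
```

So a project containing both AGENTS.md and CLAUDE.md would load only AGENTS.md under this sketch, while SOUL.md is handled separately.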

Working Directory

| Context | Default |
|---|---|
| CLI (hermes) | Current directory where you run the command |
| Messaging gateway | Home directory ~ (override with MESSAGING_CWD) |
| Docker / Singularity / Modal / SSH | User's home directory inside the container or remote machine |

Override the working directory:

# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects # Gateway sessions
TERMINAL_CWD=/workspace # All terminal sessions