GCP in Action: Building a Persistent AI Assistant with GCE, Hermes Agent, and Telegram

Published: May 2, 2026 at 08:01 AM EDT
3 min read
Source: Dev.to

Introduction

After solving the LINE Bot’s Vertex AI migration, I wondered whether an AI assistant could be more proactive and have long‑term memory. I turned to NousResearch’s open‑source Hermes Agent.
Unlike a typical chatbot, Hermes is designed as an “operating system that breathes”: it can execute shell commands, write Python scripts, manage long‑term memory, and stay in touch via various gateways (Telegram, Discord).

To keep it running 24/7, I deployed it on Google Compute Engine (GCE). This guide documents the deployment from scratch and the pitfalls encountered when configuring the latest Gemini 2.5 Flash model.

Prerequisites

| Parameter | Description |
| --- | --- |
| PROJECT_ID | Your Google Cloud project ID |
| LOCATION | global |
| GOOGLE_API_KEY | API key from Google AI Studio |
| Machine type | e2-medium (recommended for tool use) |
| OS image | Ubuntu 22.04 LTS |
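
These parameters can be kept in shell variables so later commands stay copy-pasteable. The variable names below are just a convention for this walkthrough; substitute your own values:

```shell
# Hypothetical environment setup -- replace the placeholder values with your own.
export PROJECT_ID="your-gcp-project-id"   # Google Cloud project ID
export LOCATION="global"                  # location used by the Gemini API
export GOOGLE_API_KEY="your-api-key"      # key from Google AI Studio
echo "Deploying to project: $PROJECT_ID"
```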

Create the VM

gcloud compute instances create hermes-agent-vm \
    --project=YOUR_PROJECT_ID \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=30GB \
    --metadata=startup-script='#!/bin/bash
        apt-get update
        apt-get install -y git curl python3-pip python3-venv nodejs npm
    '

Install Hermes Agent

SSH into the instance

gcloud compute ssh hermes-agent-vm --zone=us-central1-a

Run the one‑click installer

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc

Configure Hermes

Model configuration

Create (or edit) ~/.hermes/config.yaml and explicitly specify the Gemini 2.5 Flash model without the google/ prefix, e.g.:

provider:
  name: gemini
  model: gemini-2.5-flash
  # auxiliary models (titles, summarization, etc.)
  auxiliary:
    title: gemini-2.5-flash
    summary: gemini-2.5-flash
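
Since a bad model identifier is the most common failure mode here (see Troubleshooting below), a quick sanity check can catch it before restarting the service. This is a minimal sketch, not part of Hermes itself; `validate_model_name` is a hypothetical helper that rejects the `google/`-prefixed form, which the article reports causes "404 Model Not Found" errors:

```python
import re

def validate_model_name(name: str) -> bool:
    """Return True if the identifier uses the short form (e.g. 'gemini-2.5-flash')."""
    if name.startswith("google/"):
        return False  # prefixed names are routed incorrectly by the API
    # short names look like "gemini-<version>-<variant>"
    return re.fullmatch(r"gemini-[\w.]+(-[\w.]+)*", name) is not None

print(validate_model_name("gemini-2.5-flash"))         # expect True
print(validate_model_name("google/gemini-2.5-flash"))  # expect False
```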

API key

Store the API key and any required environment variables in ~/.hermes/.env:

GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

Set Up Systemd for Persistence

Create a Systemd service file at /etc/systemd/system/hermes.service:

[Unit]
Description=Hermes Agent Gateway
After=network.target

[Service]
Type=simple
User=root
Environment=HOME=/root
Environment=PYTHONUNBUFFERED=1
ExecStartPre=-/usr/bin/pkill -9 -f hermes
ExecStart=/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable hermes
sudo systemctl restart hermes
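
After starting, confirm the unit is actually active and inspect its logs. These are standard systemd commands; the service name `hermes` matches the unit file above, and the fallback messages only appear if the unit is missing:

```shell
# Confirm the unit's enablement and runtime state.
systemctl is-enabled hermes 2>/dev/null || echo "hermes not enabled on this machine"
systemctl is-active hermes 2>/dev/null || echo "hermes not active on this machine"
# Tail the most recent log lines from the service.
journalctl -u hermes -n 20 --no-pager 2>/dev/null || echo "no journal available"
```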

Troubleshooting Common Issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| Agent reads messages but does not reply | Configured model identifier gemini-3-flash-preview (deprecated) | Change all model references to gemini-2.5-flash in config.yaml or patch auxiliary_client.py |
| "404 Model Not Found" errors | Using the google/ prefix (e.g., google/gemini-2.5-flash) | Use the short name gemini-2.5-flash |
| "Gateway already running (PID …)" on service start | A previous Hermes process is still alive | The ExecStartPre line in the Systemd unit kills any stray process before starting a new one |
| Logs show errors from auxiliary functions (title generation, etc.) | Default auxiliary model identifiers are outdated | Explicitly set auxiliary models in config.yaml as shown above |
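
If the config still contains a prefixed model identifier, a single in-place substitution fixes every reference at once. This sketch operates on a demo file in /tmp; in practice you would point it at ~/.hermes/config.yaml (and the -i.bak flag keeps a backup either way):

```shell
# Create a sample config to demonstrate (in practice, edit ~/.hermes/config.yaml).
cat > /tmp/hermes-config-demo.yaml <<'EOF'
provider:
  name: gemini
  model: google/gemini-2.5-flash
EOF
# Strip the "google/" prefix from any model reference, keeping a .bak backup.
sed -i.bak 's#google/gemini#gemini#g' /tmp/hermes-config-demo.yaml
grep 'model:' /tmp/hermes-config-demo.yaml
```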

Conclusion

With the steps above, a dedicated Hermes Agent runs stably on GCE and is reachable via Telegram at any time. It can fetch information, execute scripts, and maintain long‑term memory on the cloud VM.

Key takeaways

  • Model identifiers change rapidly; always verify the exact name against the official documentation or the MCP tool.
  • Using the short model name (gemini-2.5-flash) avoids routing errors.
  • Systemd ensures the agent survives SSH disconnects and restarts automatically on failure.

If you want a 24‑hour AI digital double, follow this SOP to set up your own persistent Hermes Agent.
