GCP in Action: Building a Persistent AI Assistant with GCE, Hermes Agent, and Telegram
Source: Dev.to
Introduction
After solving the LINE Bot’s Vertex AI migration, I wondered whether an AI assistant could be more proactive and have long‑term memory. I turned to NousResearch’s open‑source Hermes Agent.
Unlike a typical chatbot, Hermes is designed as an “operating system that breathes”: it can execute shell commands, write Python scripts, manage long‑term memory, and stay in touch via various gateways (Telegram, Discord).
To keep it running 24/7, I deployed it on Google Compute Engine (GCE). This guide documents the deployment from scratch and the pitfalls encountered when configuring the latest Gemini 2.5 Flash model.
Prerequisites
| Parameter | Description |
|---|---|
| PROJECT_ID | Your Google Cloud project ID |
| LOCATION | global |
| GOOGLE_API_KEY | API key from Google AI Studio |
| Machine type | e2-medium (recommended for tool use) |
| OS image | Ubuntu 22.04 LTS |
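To avoid repeating these values in every command, I export them as shell variables up front (the variable names are my own convention, not required by any tool; substitute your real values):

```shell
# Placeholder values; replace with your own project details.
export PROJECT_ID="your-project-id"
export ZONE="us-central1-a"
export GOOGLE_API_KEY="your-api-key-from-ai-studio"
```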
Create the VM
```bash
gcloud compute instances create hermes-agent-vm \
  --project=YOUR_PROJECT_ID \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --metadata=startup-script='#!/bin/bash
apt-get update
apt-get install -y git curl python3-pip python3-venv nodejs npm
'
```
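Once the command returns, it is worth confirming the instance is actually up and that the startup script finished. This is standard gcloud, assuming the same instance name and zone as above:

```shell
# Check the VM status (should print RUNNING once provisioning finishes)
gcloud compute instances describe hermes-agent-vm \
  --zone=us-central1-a \
  --format='value(status)'

# Tail the serial-port output to confirm the startup script's packages installed
gcloud compute instances get-serial-port-output hermes-agent-vm \
  --zone=us-central1-a | tail -n 20
```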
Install Hermes Agent
SSH into the instance
```bash
gcloud compute ssh hermes-agent-vm --zone=us-central1-a
```
Run the one‑click installer
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc
```
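Before moving on, check that the installer actually put the hermes CLI on the PATH. The exact install location may differ by installer version; the venv path below is the one referenced by the Systemd unit later in this guide:

```shell
# Either check should succeed if the installer completed
command -v hermes || ls /usr/local/lib/hermes-agent/venv/bin/hermes
```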
Configure Hermes
Model configuration
Create (or edit) ~/.hermes/config.yaml and explicitly specify the Gemini 2.5 Flash model without the google/ prefix, e.g.:
```yaml
provider:
  name: gemini
  model: gemini-2.5-flash

# auxiliary models (titles, summarization, etc.)
auxiliary:
  title: gemini-2.5-flash
  summary: gemini-2.5-flash
```
API key
Store the API key and any required environment variables in ~/.hermes/.env:
```bash
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
```
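A quick way to confirm the key and the short model name work together, before involving Hermes at all, is a direct call to the public Gemini REST endpoint (the prompt here is arbitrary):

```shell
# Expect a JSON response containing generated text, not a 404
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GOOGLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Say hello"}]}]}'
```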
Set Up Systemd for Persistence
Create a Systemd service file at /etc/systemd/system/hermes.service:
```ini
[Unit]
Description=Hermes Agent Gateway
After=network.target

[Service]
Type=simple
User=root
Environment=HOME=/root
Environment=PYTHONUNBUFFERED=1
# The leading "-" tells systemd to ignore pkill's exit status
# (pkill returns nonzero when no process matched; shell "|| true" does not work here)
ExecStartPre=-/usr/bin/pkill -9 -f hermes
ExecStart=/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable hermes
sudo systemctl restart hermes
```
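To confirm the gateway is healthy and to watch it handle incoming Telegram messages, standard systemd tooling is enough:

```shell
# Should report "active (running)"
systemctl status hermes --no-pager

# Follow live logs; useful when debugging model-name errors
sudo journalctl -u hermes -f
```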
Troubleshooting Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Agent reads messages but does not reply | Configured model identifier gemini-3-flash-preview (deprecated) | Change all model references to gemini-2.5-flash in config.yaml or patch auxiliary_client.py |
| “404 Model Not Found” errors | Using the google/ prefix (e.g., google/gemini-2.5-flash) | Use the short name gemini-2.5-flash |
| “Gateway already running (PID …)” on service start | A previous Hermes process is still alive | The ExecStartPre line in the Systemd unit kills any stray process before starting a new one |
| Logs show errors from auxiliary functions (title generation, etc.) | Default auxiliary model identifiers are outdated | Explicitly set auxiliary models in config.yaml as shown above |
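When a model name is rejected, listing the models your key can actually see removes the guesswork. This uses the standard ListModels endpoint of the Gemini API:

```shell
# Print the model names available to this API key
curl -s "https://generativelanguage.googleapis.com/v1beta/models" \
  -H "x-goog-api-key: $GOOGLE_API_KEY" | grep '"name"'
```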
Conclusion
With the steps above, a dedicated Hermes Agent runs stably on GCE and is reachable via Telegram at any time. It can fetch information, execute scripts, and maintain long‑term memory on the cloud VM.
Key takeaways
- Model identifiers change rapidly; always verify the exact name against the official documentation or the MCP tool.
- Using the short model name (gemini-2.5-flash) avoids routing errors.
- Systemd ensures the agent survives SSH disconnects and restarts automatically on failure.
If you want a 24‑hour AI digital double, follow this SOP to set up your own persistent Hermes Agent.