GCP in Action: Building a Persistent AI Assistant with GCE, Hermes Agent, and Telegram
Source: Dev.to
Introduction
After solving the LINE Bot’s Vertex AI migration, I wondered whether an AI assistant could be more proactive and have long‑term memory. I turned to NousResearch’s open‑source Hermes Agent.
Unlike a typical chatbot, Hermes is designed as an “operating system that breathes”: it can execute shell commands, write Python scripts, manage long‑term memory, and stay in touch via various gateways (Telegram, Discord).
To keep it running 24/7, I deployed it on Google Compute Engine (GCE). This guide documents the deployment from scratch and the pitfalls encountered when configuring the latest Gemini 2.5 Flash model.
Prerequisites
| Parameter | Description |
|---|---|
| PROJECT_ID | Your Google Cloud project ID |
| LOCATION | global |
| GOOGLE_API_KEY | API key from Google AI Studio |
| Machine type | e2-medium (recommended for tool use) |
| OS image | Ubuntu 22.04 LTS |
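To avoid repeating these values in every command, I export them as shell variables up front (the variable names are my own convention, not required by any tool; substitute your real values):

```shell
# Placeholder values; replace with your own project details.
export PROJECT_ID="your-project-id"
export ZONE="us-central1-a"
export GOOGLE_API_KEY="your-api-key-from-ai-studio"
```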
Create the VM
```bash
gcloud compute instances create hermes-agent-vm \
  --project=YOUR_PROJECT_ID \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --metadata=startup-script='#!/bin/bash
apt-get update
apt-get install -y git curl python3-pip python3-venv nodejs npm
'
```
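Once the command returns, it is worth confirming the instance is actually up and that the startup script finished. This is standard gcloud, assuming the same instance name and zone as above:

```shell
# Check the VM status (should print RUNNING once provisioning finishes)
gcloud compute instances describe hermes-agent-vm \
  --zone=us-central1-a \
  --format='value(status)'

# Tail the serial-port output to confirm the startup script's packages installed
gcloud compute instances get-serial-port-output hermes-agent-vm \
  --zone=us-central1-a | tail -n 20
```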
Install Hermes Agent
SSH into the instance
```bash
gcloud compute ssh hermes-agent-vm --zone=us-central1-a
```
Run the one‑click installer
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc
```
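Before moving on, check that the installer actually put the hermes CLI on the PATH. The exact install location may differ by installer version; the venv path below is the one referenced by the Systemd unit later in this guide:

```shell
# Either check should succeed if the installer completed
command -v hermes || ls /usr/local/lib/hermes-agent/venv/bin/hermes
```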
Configure Hermes
Model configuration
Create (or edit) ~/.hermes/config.yaml and explicitly specify the Gemini 2.5 Flash model without the google/ prefix, e.g.:
```yaml
provider:
  name: gemini
  model: gemini-2.5-flash

# auxiliary models (titles, summarization, etc.)
auxiliary:
  title: gemini-2.5-flash
  summary: gemini-2.5-flash
```
API key
Store the API key and any required environment variables in ~/.hermes/.env:
```bash
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
```
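A quick way to confirm the key and the short model name work together, before involving Hermes at all, is a direct call to the public Gemini REST endpoint (the prompt here is arbitrary):

```shell
# Expect a JSON response containing generated text, not a 404
curl -s "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GOOGLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"Say hello"}]}]}'
```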
Set Up Systemd for Persistence
Create a Systemd service file at /etc/systemd/system/hermes.service:
```ini
[Unit]
Description=Hermes Agent Gateway
After=network.target

[Service]
Type=simple
User=root
Environment=HOME=/root
Environment=PYTHONUNBUFFERED=1
# The leading "-" tells systemd to ignore pkill's exit status
# (pkill returns nonzero when no process matched; shell "|| true" does not work here)
ExecStartPre=-/usr/bin/pkill -9 -f hermes
ExecStart=/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable hermes
sudo systemctl restart hermes
```
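To confirm the gateway is healthy and to watch it handle incoming Telegram messages, standard systemd tooling is enough:

```shell
# Should report "active (running)"
systemctl status hermes --no-pager

# Follow live logs; useful when debugging model-name errors
sudo journalctl -u hermes -f
```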
Troubleshooting Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Agent reads messages but does not reply | Configured model identifier gemini-3-flash-preview (deprecated) | Change all model references to gemini-2.5-flash in config.yaml or patch auxiliary_client.py |
| “404 Model Not Found” errors | Using the google/ prefix (e.g., google/gemini-2.5-flash) | Use the short name gemini-2.5-flash |
| “Gateway already running (PID …)” on service start | A previous Hermes process is still alive | The ExecStartPre line in the Systemd unit kills any stray process before starting a new one |
| Logs show errors from auxiliary functions (title generation, etc.) | Default auxiliary model identifiers are outdated | Explicitly set auxiliary models in config.yaml as shown above |
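When a model name is rejected, listing the models your key can actually see removes the guesswork. This uses the standard ListModels endpoint of the Gemini API:

```shell
# Print the model names available to this API key
curl -s "https://generativelanguage.googleapis.com/v1beta/models" \
  -H "x-goog-api-key: $GOOGLE_API_KEY" | grep '"name"'
```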
Conclusion
With the steps above, a dedicated Hermes Agent runs stably on GCE and is reachable via Telegram at any time. It can fetch information, execute scripts, and maintain long‑term memory on the cloud VM.
Key takeaways
- Model identifiers change rapidly; always verify the exact name against the official documentation or the MCP tool.
- Using the short model name (gemini-2.5-flash) avoids routing errors.
- Systemd ensures the agent survives SSH disconnects and restarts automatically on failure.
If you want a 24‑hour AI digital double, follow this SOP to set up your own persistent Hermes Agent.