Turning block/goose into an AI SRE Agent

Published: January 13, 2026 at 08:25 PM EST
4 min read
Source: Dev.to

Cristian Angulo

Modern SRE work is dominated by repetitive investigation: the same queries, the same dashboards, the same correlation work on every incident. That’s exactly why I started experimenting with block/goose as an AI‑powered SRE agent.

In this post, I’ll explain how I configured Goose to behave like a real SRE: querying AWS CloudWatch, reasoning about incidents, and operating inside a fully reproducible Nix environment.

What is block/goose?

block/goose is an autonomous‑agent framework designed to execute workflows using LLMs plus external tools. Think of it as a CLI‑native AI operator that can:

  • Call tools (APIs, CLIs, MCP servers)
  • Maintain session context
  • Execute multi‑step reasoning loops

What makes Goose interesting for SRE is that it does not try to replace tools — it orchestrates them.

Why an AI SRE Agent?

A real SRE spends most of their time doing:

  • Investigations across logs, metrics, and alarms
  • Repeating the same diagnostic steps
  • Cross‑referencing infra state with recent changes
  • Reducing operational toil

This repo turns Goose into an agent that can:

  • Query CloudWatch logs, metrics, and alarms
  • Reason about AWS infrastructure health
  • Follow structured incident workflows
  • Act as a first‑response investigator

This is not ChatOps fluff — it’s an operational assistant.

Architecture Overview

The setup has three pillars:

  • Goose as the reasoning engine
  • MCP servers as tool providers
  • Nix Flakes as the environment orchestrator

Goose (LLM Agent)
   ├── CloudWatch MCP Server
   ├── AWS CLI
   ├── GitHub MCP (optional)
   └── Local tooling (jq, httpie, docker, task)

Everything runs inside a reproducible dev shell.

The Reproducible Environment (Nix)

Just like all my other projects, this one is driven by a flake.nix.

Why? Because SRE tooling is fragile:

  • AWS CLI versions
  • Node vs. Python tooling
  • MCP servers via uvx or npx

Nix eliminates all of that drift. Once inside the shell you already have:

  • AWS credentials
  • Goose CLI
  • CloudWatch tooling
  • JSON processing tools

No README ritual. No “install this first”.
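I won’t reproduce the repo’s actual flake here, but a minimal sketch along these lines would give you the same kind of shell. The package attribute names (especially goose-cli) are assumptions to verify against your nixpkgs channel:

```nix
{
  description = "Reproducible dev shell for an AI SRE agent (illustrative sketch)";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  inputs.flake-utils.url = "github:numtide/flake-utils";

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs { inherit system; };
      in {
        devShells.default = pkgs.mkShell {
          packages = with pkgs; [
            goose-cli  # Goose CLI (attribute name assumed; check your channel)
            awscli2    # AWS CLI for CloudWatch access
            jq         # JSON processing
            httpie
            uv         # provides uvx for Python-based MCP servers
            nodejs     # provides npx for Node-based MCP servers
          ];
        };
      });
}
```

`nix develop` then drops you into a shell where all of these are pinned to the same versions for everyone who clones the repo.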

Configuring Goose as an SRE Agent

Provider & Model

NIXPKGS_ALLOW_UNFREE=1
GOOGLE_GEMINI_MODEL_NAME=gemini-2.5-flash
GITHUB_PERSONAL_ACCESS_TOKEN=
SENTRY_ACCESS_TOKEN=

Gemini 2.5 Flash is fast, cheap, and good at tool orchestration, which makes it a solid default for SRE tasks.

CloudWatch as a First‑Class Tool

Goose config (config.yaml)

GOOSE_PROVIDER: google
GOOSE_MODEL: gemini-2.5-flash
extensions:
  cloudwatch:
    enabled: true
    type: stdio
    name: cloudwatch
    description: AWS CloudWatch Observability (metrics/logs/alarms) via awslabs.cloudwatch-mcp-server
    cmd: uvx
    args:
      - awslabs.cloudwatch-mcp-server@latest
    envs:
      AWS_REGION: us-east-1
      AWS_PROFILE: default
      FASTMCP_LOG_LEVEL: ERROR
    env_keys: []
    timeout: 30000
    bundled: null
    available_tools: []
  github:
    enabled: true
    type: stdio
    name: github
    description: GitHub MCP Server
    cmd: npx
    args:
      - -y
      - "@modelcontextprotocol/server-github"
    timeout: 30000
  sentry:
    enabled: true
    type: stdio
    name: sentry
    description: Sentry MCP Server
    cmd: npx
    args:
      - -y
      - mcp-remote@latest
      - https://mcp.sentry.dev/mcp
    timeout: 30000

This gives the agent the ability to:

  • Fetch metrics
  • Query logs
  • Inspect alarms
  • Reason over time‑series data

It’s the same data an SRE would manually inspect — just faster.
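For comparison, the manual equivalents the agent wraps look roughly like this. These are real AWS CLI commands, but the log group, load balancer, and metric dimensions are placeholders, and running them requires valid AWS credentials (the `date -d` syntax is GNU date):

```shell
# Alarms currently firing
aws cloudwatch describe-alarms --state-value ALARM --region us-east-1

# 5xx count for an ALB over the last 30 minutes (dimension value is a placeholder)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_Target_5XX_Count \
  --dimensions Name=LoadBalancer,Value=app/my-alb/abc123 \
  --start-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 --statistics Sum

# Recent error logs for a service (log group name is a placeholder)
aws logs filter-log-events \
  --log-group-name /my-service/app \
  --filter-pattern "ERROR" \
  --start-time "$(( ($(date +%s) - 1800) * 1000 ))"
```

The agent runs this class of query for you and, crucially, correlates the results.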

How Goose Behaves Like an SRE

Instead of prompting Goose with generic questions, I treat it like a teammate.

Examples

  • “Investigate elevated 5xx errors in the last 30 minutes.”
  • “Check if latency correlates with a deploy.”
  • “Summarize CloudWatch alarms for this service.”

Goose:

  1. Chooses the right tool
  2. Queries CloudWatch
  3. Interprets the output
  4. Produces a human‑readable diagnosis

No dashboards. No clicking. Just answers.
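To make the loop concrete, here is a toy sketch of those four steps in Python. None of this is Goose’s actual code: the tool names, sample data, and anomaly threshold are invented purely for illustration.

```python
# Toy sketch of the four-step loop: choose a tool, query it, interpret
# the result, and emit a human-readable diagnosis. SAMPLE_METRICS stands
# in for a real CloudWatch response.

SAMPLE_METRICS = {
    "5xx_rate": [0.2, 0.3, 4.1, 5.7],        # last four 5-minute windows, in %
    "p99_latency_ms": [180, 190, 850, 910],
}

def choose_tool(question: str) -> str:
    """Step 1: route the question to a tool (Goose uses the LLM for this)."""
    return "describe_alarms" if "alarm" in question.lower() else "get_metric_data"

def query_cloudwatch(tool: str) -> dict:
    """Step 2: stand-in for the real MCP/CloudWatch call."""
    return SAMPLE_METRICS if tool == "get_metric_data" else {"alarms": []}

def interpret(data: dict) -> list:
    """Step 3: flag series whose latest value is far above the first (baseline)."""
    findings = []
    for name, series in data.items():
        if not isinstance(series, list) or len(series) < 2:
            continue
        baseline, latest = series[0], series[-1]
        if latest > 3 * baseline:
            findings.append(f"{name} jumped from {baseline} to {latest}")
    return findings

def diagnose(question: str) -> str:
    """Step 4: tie the steps together into a readable summary."""
    findings = interpret(query_cloudwatch(choose_tool(question)))
    return "Possible incident: " + "; ".join(findings) if findings else "No anomalies found."

print(diagnose("Investigate elevated 5xx errors in the last 30 minutes."))
```

The real agent replaces each stub with an LLM decision or a live API call, but the shape of the loop is the same.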

Why This Works

  1. Tools, Not Plugins – Goose doesn’t hallucinate metrics; it queries real AWS APIs.
  2. Reproducibility – Anyone can clone the repo and get the same SRE agent.
  3. Composability – You can add:
    • GitHub MCP (for correlating PRs)
    • PagerDuty
    • Terraform state inspection
    • Custom runbooks

The agent evolves with your infrastructure.
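Concretely, adding one of these is just another stanza under `extensions:` in the same config. For example, a PagerDuty integration would follow the pattern above; the server name and npm package shown here are hypothetical placeholders, not a real published server:

```yaml
  pagerduty:
    enabled: true
    type: stdio
    name: pagerduty
    description: PagerDuty MCP Server (hypothetical package)
    cmd: npx
    args:
      - -y
      - "@example/pagerduty-mcp-server"   # placeholder; substitute a real server
    timeout: 30000
```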

Entering the Environment

nix --extra-experimental-features 'nix-command flakes' develop --impure

Why --impure?

  • AWS credentials
  • Docker socket
  • Local MCP binaries

This is intentional and controlled.
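Once inside the shell, a few real commands confirm the impure pieces actually came through (they assume credentials are configured and a Docker daemon is running):

```shell
aws sts get-caller-identity                 # AWS credentials resolve?
docker info --format '{{.ServerVersion}}'   # Docker socket reachable?
command -v uvx npx jq                       # MCP launchers and tooling on PATH?
```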

What This Is Not

  • ❌ A replacement for on‑call engineers
  • ❌ A magic “fix production” bot
  • ❌ Another ChatOps toy

This is:

  • A first‑response investigator
  • A context‑gathering machine
  • A force multiplier for SREs

Conclusion

SRE work is pattern-based, repetitive, and procedural, and it is highly observable. By coupling an LLM-driven agent with real, reproducible tooling, we can automate the noisy, repetitive parts of incident response while keeping the human in the loop for the nuanced decision-making that only engineers can provide. block/goose demonstrates a practical path toward AI-augmented SRE that is both trustworthy and extensible.

That makes this kind of work a natural fit for AI agents, as long as they:

  • Use real tools
  • Run in real environments
  • Respect operational boundaries

With block/goose, MCP servers, and Nix, you get exactly that.

If you’re curious about AI agents that actually do ops, this is a solid place to start.
