How Cleric uses tsnet to securely automate software operations

Published: May 1, 2026 at 10:00 AM EDT

Source: Tailscale Blog

Cleric – Autonomous AI Site Reliability Engineer (SRE)

Cleric is building the first autonomous AI SRE to handle the heavy lifting of software operations. Our AI:

Investigates incidents
Triages alerts
Captures production context

This lets engineering teams focus on building, while the AI takes care of operational toil.

The Core Challenge

An effective AI SRE must be able to query the same internal tools, databases, and telemetry providers that a human engineer would. The key architectural question is:

How can we obtain secure, low‑latency access to customers’ private resources without imposing a heavy burden on their platform and security teams?

Our Solution

We leverage Tailscale and its tsnet library to create a secure connectivity layer that:

Is easy for customers to integrate
Requires minimal operational overhead for us

By using Tailscale’s zero‑trust networking and the lightweight tsnet client, we provide the AI SRE with the necessary access while maintaining strong security guarantees.

The Connectivity Problem

Our customers’ environments are as diverse as the services they run. We encounter a wide range of setups:

AWS‑only shops with strict VPC requirements
Identity‑perimeter setups where every service is authenticated via an IdP
VPN‑dependent environments that require a tunnel for any internal access
Multi‑cloud architectures that rely on complex peering models

When we set out to build Cleric’s connectivity layer, we defined three non‑negotiable principles:

Security – Follow the principle of least privilege. We must never access network endpoints that a customer does not explicitly allow.
Ease of Use – If a customer needs to spend weeks configuring routing tables and passing security reviews just to try Cleric, we have already lost them.
Scalability – We cannot maintain a unique, bespoke networking stack for every individual customer.

Why Traditional Methods Fall Short

Before landing on Tailscale, we evaluated the “standard” ways of connecting to private infrastructure. Each option introduced trade‑offs that were ultimately unacceptable.

The Reverse Proxy

We could ask customers to set up an authenticated reverse proxy to expose specific internal endpoints to our agent. While this sounds reasonable at first glance, in practice it becomes a nightmare for everyone involved:

Customer impact – extra infrastructure to maintain and a high‑friction security review.
Our impact – “implementation debt” because we must maintain custom logic for each customer’s proxy configuration.

Cloud‑Native Connectivity (VPC Peering & PrivateLink)

Solutions such as AWS VPC peering or Azure VNet peering are robust, but they require significant coordination:

CIDR block overlaps must be avoided.
Routing tables need to be updated manually.
Cross‑account permissions have to be granted.

Because our customers span multiple clouds, managing a heterogeneous mix of provider‑specific connection mechanisms would quickly become an operational bottleneck for our team.
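As a small illustration of the coordination involved, the sketch below checks a set of address plans for overlapping CIDR blocks, the kind of pre-flight check a peering setup needs before any routes are added. The CIDR values and the `overlapping` helper are hypothetical; the check itself uses Go's standard `net/netip` package.

```go
package main

import (
	"fmt"
	"net/netip"
)

// overlapping returns every pair of CIDR blocks that share at least one address.
func overlapping(cidrs []string) ([][2]string, error) {
	prefixes := make([]netip.Prefix, len(cidrs))
	for i, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", c, err)
		}
		prefixes[i] = p
	}
	var pairs [][2]string
	for i := range prefixes {
		for j := i + 1; j < len(prefixes); j++ {
			if prefixes[i].Overlaps(prefixes[j]) {
				pairs = append(pairs, [2]string{cidrs[i], cidrs[j]})
			}
		}
	}
	return pairs, nil
}

func main() {
	// Hypothetical address plans: two networks that both picked a 10.0.x.x range.
	pairs, err := overlapping([]string{"10.0.0.0/16", "10.0.128.0/17", "172.16.0.0/12"})
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		fmt.Printf("%s overlaps %s\n", p[0], p[1])
	}
}
```

With peering, a check like this has to pass for every customer network on every provider, which is part of why the approach scales poorly.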

Building a private overlay with Tailscale

We realized that what we actually needed wasn’t direct connectivity into the customer’s network; we needed a private, programmable overlay network.
We wanted to control exactly which endpoints were exposed, with zero changes to the customer’s existing network configuration.

Tailscale provided the perfect foundation. By using Tailscale’s WireGuard™‑based mesh, we can establish encrypted, peer‑to‑peer connections between our agent and customer resources regardless of where they sit.

The breakthrough for us was tsnet, Tailscale’s library that allows you to embed Tailscale directly into a Go binary.

[Figure: Tailscale overlay diagram]

How it works: virtualizing the network

Instead of asking a customer to install a VPN client on a gateway or bastion host, we provide a simple binary or a Helm chart. This Cleric Connector acts as a device (or several) on a Tailscale tailnet.

Implementation outline

Endpoint modeling – Each customer resource (database, Prometheus instance, etc.) is modeled as a distinct “device” within our internal architecture.
tsnet advantage – Using tsnet, we virtualize these devices into a single process. The agent sees a flat, secure namespace of authorized services instead of a complex web of subnets.
Zero‑config deployment – To the customer, the setup is “drop‑in.” They supply the Connector with a list of internal endpoints they wish to expose; Tailscale handles NAT traversal and encrypted tunneling automatically. No firewall holes, no public IPs, and no routing‑table headaches.
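The outline above can be sketched roughly as follows. This is not Cleric's actual Connector: the `Endpoint` type, the `Serve` and `forward` functions, and the device and target names are all hypothetical, and how the real Connector forwards traffic is an assumption. To keep the sketch runnable with the standard library alone, the overlay listener is passed in as a function; with tsnet, that function could create a `tsnet.Server` with the device name as its `Hostname` and call its `Listen` method.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// Endpoint is a hypothetical config entry: one internal resource the customer
// chose to expose, plus the device name it is given on the overlay.
type Endpoint struct {
	Device string // overlay hostname, e.g. "prometheus"
	Target string // internal address, e.g. "10.0.3.7:9090"
}

// ListenFunc opens the overlay-side listener for one device. With tsnet this
// could be a tsnet.Server{Hostname: device} whose Listen method is called;
// it is pluggable here so the sketch runs on the standard library alone.
type ListenFunc func(device string) (net.Listener, error)

// Serve opens one listener per endpoint and forwards its connections to the target.
func Serve(eps []Endpoint, listen ListenFunc) (map[string]net.Listener, error) {
	listeners := make(map[string]net.Listener, len(eps))
	for _, ep := range eps {
		ln, err := listen(ep.Device)
		if err != nil {
			for _, open := range listeners {
				open.Close()
			}
			return nil, fmt.Errorf("listen for %s: %w", ep.Device, err)
		}
		listeners[ep.Device] = ln
		go forward(ln, ep.Target)
	}
	return listeners, nil
}

// forward accepts connections until the listener is closed and pipes each to target.
func forward(ln net.Listener, target string) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		go func(c net.Conn) {
			defer c.Close()
			up, err := net.Dial("tcp", target)
			if err != nil {
				return
			}
			defer up.Close()
			go io.Copy(up, c) // client -> target
			io.Copy(c, up)    // target -> client
		}(conn)
	}
}

// loopback stands in for the overlay in this demo.
func loopback(device string) (net.Listener, error) {
	return net.Listen("tcp", "127.0.0.1:0")
}

// demo exposes a fake internal service through Serve and reads from it via the device listener.
func demo() (string, error) {
	internal, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer internal.Close()
	go func() {
		for {
			c, err := internal.Accept()
			if err != nil {
				return
			}
			c.Write([]byte("hello from prometheus"))
			c.Close()
		}
	}()

	lns, err := Serve([]Endpoint{{Device: "prometheus", Target: internal.Addr().String()}}, loopback)
	if err != nil {
		return "", err
	}
	defer lns["prometheus"].Close()

	conn, err := net.Dial("tcp", lns["prometheus"].Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	body, err := io.ReadAll(conn)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The design point is that each exposed resource gets its own listener and its own name, so the set of reachable services is exactly the configured list rather than whatever happens to be routable from the host.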

Security by design: beyond the bastion

One of the most significant advantages of this model is that it avoids the “all‑or‑nothing” trap of traditional VPNs.

In a traditional VPN or bastion host model, once you are “inside,” you often have broad visibility into the private network. If internal firewall rules are misconfigured, an external agent could potentially reach any resource on that subnet.

With the Cleric + Tailscale model, lateral movement is impossible:

Each endpoint is a unique device.
The agent can only see the specific endpoints it has been explicitly granted access to.
We don’t route based on IP ranges; we connect based on identity.

If a resource isn’t explicitly configured for the Connector, it doesn’t exist from the Cleric agent’s perspective.

For the security team, this model is trivially auditable: simply review the list of endpoints configured on the Connector. Cleric has no access to private endpoints beyond what’s on that list.
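A minimal sketch of that allowlist idea, assuming the Connector's configuration can be reduced to a map from device name to internal target. The `Allowlist` type, its methods, and the example names are hypothetical, not Cleric's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrNotExposed is returned for any device name the customer did not configure.
var ErrNotExposed = errors.New("endpoint not exposed")

// Allowlist maps a hypothetical overlay device name to its internal target.
type Allowlist map[string]string

// Resolve returns the target for a device, or ErrNotExposed if it is not listed.
func (a Allowlist) Resolve(device string) (string, error) {
	target, ok := a[device]
	if !ok {
		return "", fmt.Errorf("%w: %q", ErrNotExposed, device)
	}
	return target, nil
}

// Audit returns the full list in a stable order, which is what a reviewer would read.
func (a Allowlist) Audit() []string {
	lines := make([]string, 0, len(a))
	for device, target := range a {
		lines = append(lines, device+" -> "+target)
	}
	sort.Strings(lines)
	return lines
}

func main() {
	allow := Allowlist{
		"prometheus": "10.0.3.7:9090",
		"orders-db":  "10.0.5.12:5432",
	}
	for _, line := range allow.Audit() {
		fmt.Println(line)
	}
	if _, err := allow.Resolve("billing-db"); err != nil {
		fmt.Println(err)
	}
}
```

Anything absent from the map fails to resolve, so the audit output and the reachable set are the same list by construction.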

The Result

By building on Tailscale, we’ve turned a complex infrastructure hurdle into a competitive advantage. This solution has been running in production for several months now and has cut our time‑to‑value substantially, in some cases eliminating weeks of process time.

We’ve stopped worrying about network topologies and started focusing on what we do best: building the future of autonomous operations.