SRE Weekly Issue #506
A message from our sponsor, Costory: You didn’t sign up to do FinOps. Costory automatically explains why your cloud costs change, and repor...
Understanding Identity in Kubernetes Beginner Level Authentication vs Authorization - Authentication – Who are you? - Authorization – What can you do? Kubernet...
Overview If you’re a platform engineer or SRE, you know that managing infrastructure and efficiently managing it are two very different things. You’ve been abl...
A message from our sponsor, Hopp: Paging at 2am? 🚨 Make incident triage feel like you’re at the same keyboard with Hopp. Crisp, readable...
I used to think capacity planning was about setting up CloudWatch alarms and hoping they’d fire before things broke. Spoiler: that’s not capacity planning—that’...
At 2:07 a.m., a core production node went down. CPU usage spiked, latency ballooned and requests started timing out across the cluster. Monitoring tools caught...
10 AWS Production Incidents — What Really Broke & How I Fixed Them After handling hundreds of AWS production incidents, I’ve learned that textbook solutions ra...
Your 30-Minute Morning Monitoring Routine? The Problem Isn't Too Much Data.
Traditional DevOps works well… until the organization grows. At small scale, a central DevOps team deploying, fixing, and firefighting everything feels efficien...
Finding the grain of sand in a heap of Salt Salt is Cloudflare’s configuration management tool. How do you find the root cause of a config...
AWS DevOps Agent – Best‑Practice Guide One of the key releases at AWS re:Invent 2025 was the launch of new frontier autonomous agents: - AWS DevOps Agent - AWS...
As Kubernetes adoption grows, so does operational complexity. What starts as a small cluster running a handful of services can quickly evolve into dozens of app...