SRE | EUNO.NEWS

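16 hours ago · devops

SRE Weekly Issue #506

View on sreweekly.com. A message from our sponsor, Costory: You didn't sign up to do FinOps. Costory automatically explains why your cloud costs changed and reports…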

#SRE #FinOps #cloud cost management #AWS Marketplace #GCP Marketplace #Slack integration #cost optimization
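3 days ago · devops

Kubernetes IAM and RBAC for DevOps and SRE

Understanding identity in Kubernetes (beginner level). Authentication vs. authorization: authentication asks "who are you?"; authorization asks "what can you do?" Kubernetes…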

#kubernetes #IAM #RBAC #SRE #authentication #authorization #serviceaccounts #cloud-security
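4 days ago · devops

New Efficiency Upgrades in Red Hat Advanced Cluster Management for Kubernetes 2.15

Overview: If you're a platform engineer or SRE, you know that managing infrastructure and managing it efficiently are two very different things. You've been able to...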

#Red Hat #Advanced Cluster Management #Kubernetes #Version 2.15 #Platform Engineering #SRE #Cluster Management #Infrastructure Automation
1 week ago · devops

SRE Weekly Issue #505

View on sreweekly.com. A message from our sponsor, Hopp: Paged at 2 AM? 🚨 Make incident triage feel like you and Hopp are working at the same keyboard. Clean, readable…

#devops #sre #reliability
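1 week ago · devops

From Reactive to Predictive: Capacity Planning Systems That Actually Work

I used to think capacity planning just meant setting CloudWatch alarms and hoping they fired before the system broke. Spoiler: that isn't capacity planning; that's…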

#capacity planning #predictive scaling #cloud monitoring #infrastructure management #SRE #performance optimization
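1 week ago · devops

The System Runs but No One Wakes Up: The Failure Between Monitoring and Human Response

At 2:07 AM, a core production node went down. CPU usage spiked, latency climbed sharply, and requests across the cluster began timing out. The monitoring tools caught…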

#monitoring #incident-response #alert-fatigue #observability #on-call #reliability #SRE
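1 week ago · devops

10 AWS Production Incidents That Taught Me Real-World SRE

10 AWS production incidents: what actually went wrong and how I fixed them. After handling hundreds of AWS production incidents, I've found that textbook solutions often…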

#AWS #SRE #incident-response #cloud #Lambda
1 week ago · devops

Your 30-Minute Morning Monitoring Routine? The Problem Isn't Too Much Data.

#monitoring #AWS #observability #dashboards #incident-response #SRE
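1 week ago · devops

Why Traditional DevOps Stops Scaling

Traditional DevOps works well… until the organization grows. At small scale, a centralized DevOps team that handles deployments, fixes, and every other problem feels efficient…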

#devops #scaling #bottlenecks #automation #toolchain #self-service #kubernetes #terraform #sre #infrastructure-as-code
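2 weeks ago · devops

SRE Weekly Issue #504

View on sreweekly.com. Finding a grain of sand in a heap of Salt: Salt is Cloudflare's configuration management tool. How to find the root cause of a configuration…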

#SRE #Cloudflare #configuration management #root cause analysis #incident response #change management #observability
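3 weeks ago · devops

AWS DevOps Agent: 10 Best Practices to Get the Most Out of It

AWS DevOps Agent: Best Practices Guide. At AWS re:Invent 2025, one of the key announcements was the launch of new frontier autonomous agents: AWS DevOps Agent, AWS…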

#AWS #DevOps Agent #best practices #SRE #re:Invent #automation #monitoring #service level objectives
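3 weeks ago · devops

Scaling Kubernetes Without Adding Headcount

As Kubernetes adoption grows, so does operational complexity. A small cluster that started out running a handful of services can quickly evolve into one containing dozens of apps…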

#kubernetes #cluster-scaling #platform-engineering #operational-automation #devops #sre #infrastructure-management
