Solved: Help us understand FinOps maturity & cloud cost challenges

Published: 2 months ago (February 25, 2026 at 08:38 AM EST)


Source: Dev.to


TL;DR: Cloud cost overruns stem from poor visibility and lack of ownership, exemplified by forgotten high‑cost instances. The solution involves a multi‑pronged FinOps approach, combining automated cleanup scripts, proactive policy‑as‑code guardrails, and fundamental organizational shifts toward showback and chargeback for sustained financial accountability.

Core Recommendations

Implement “Janitor” scripts (e.g., AWS Lambda) to automatically identify and terminate untagged or abandoned cloud resources – a reactive cost‑control measure.
Enforce “Policy as Code” using tools like Sentinel, Open Policy Agent (OPA), or Service Control Policies (SCPs) to prevent expensive or untagged resource provisioning at the IaC or AWS Organization level.
Drive Organizational Change through FinOps practices such as showback (displaying team‑specific cloud spend) and chargeback (allocating costs to team budgets) to foster a culture of financial ownership.

The Problem

“Struggling with runaway cloud costs and immature FinOps practices? This guide, from a Senior DevOps Engineer, breaks down the real reasons for cloud waste and offers three concrete solutions, from quick scripts to permanent cultural shifts, to get your spending under control.”

I still remember the Monday‑morning Slack message from Finance:

“Darian, can you explain this AWS spike?”

Opening the billing console, my stomach dropped. A developer had spun up a p4d.24xlarge EC2 instance on Friday afternoon for a “quick test” of a new ML model and then forgot about it. Over a single weekend that instance generated a five‑figure bill.

We had no guardrails, alerts, or ownership policies. It was a free‑for‑all, and we were paying for it—literally.

This isn’t a unique story. Teams are handed the keys to the cloud kingdom with immense power to innovate, but without the financial literacy or guardrails to do it responsibly. That’s the core of the FinOps maturity struggle. It’s not about being cheap; it’s about being efficient and accountable.

Root Causes

Issue	Description
Lack of Visibility	Engineers can’t see the cost of the infrastructure they’re provisioning in real‑time. `terraform apply` doesn’t show a price tag. Billing is an abstract concept dealt with weeks later.
Lack of Ownership	When no one is directly accountable for a resource (e.g., `dev‑test‑data‑processing‑cluster‑04`), no one has an incentive to shut it down. It becomes “the company’s infrastructure,” a shared problem.

Fixing this isn’t just about finding zombie servers. It’s about fundamentally changing how your teams interact with the cloud.

Solution #1 – Reactive “Stop the Bleeding” (Janitor Scripts)

“This is the reactive, ‘stop the bleeding’ approach. You’re not fixing the culture, but you are stopping the immediate waste.”

We built a simple AWS Lambda function, triggered nightly by EventBridge, that:

Scans all EC2 instances and RDS databases in our dev accounts.
Flags resources missing an owner tag or a TTL (Time‑To‑Live) tag.
Posts a warning to a Slack channel, tagging the creator (if identifiable via CloudTrail).
If the resource remains untagged after 24 hours, a second Lambda terminates it.

Result: Harsh? Yes. Effective? Absolutely.

Sample Python (Boto3) – Lambda Janitor

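import boto3

def find_untagged_instances(event, context):
    ec2 = boto3.client('ec2', region_name='us-east-1')
    # Paginate: a single describe_instances call may not return every instance
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    flagged = []
    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                # Tag keys are case-sensitive: 'Owner' does not satisfy 'owner'
                tag_keys = {tag['Key'] for tag in instance.get('Tags', [])}

                if 'owner' not in tag_keys:
                    print(f"ALERT: Instance {instance_id} is missing 'owner' tag.")
                    flagged.append(instance_id)
                    # In a real script, you'd post this to Slack or SNS
                    # and maybe add a "pending_termination" tag

    # Return the flagged IDs so the invocation result is inspectable
    return flagged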
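
The sample only flags instances. The 24-hour grace period from the steps above could be sketched as a small decision function. This is a minimal sketch, not the original Lambda: it assumes the first pass stores the flag time as an ISO 8601 UTC timestamp in the `pending_termination` tag, that the EC2 tag list has already been converted to a plain dict, and the names `should_terminate` and `GRACE_PERIOD` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the 24-hour window described in the steps
GRACE_PERIOD = timedelta(hours=24)

def should_terminate(tags, now):
    """Decide whether a flagged instance is due for termination.

    tags: dict of tag key -> tag value
    now:  timezone-aware datetime of the current run
    """
    if 'owner' in tags:
        return False  # someone claimed the resource in time
    flagged_at = tags.get('pending_termination')
    if flagged_at is None:
        return False  # never warned, so never terminate on this pass
    # Assumption: the flag time was written with an explicit UTC offset
    return now - datetime.fromisoformat(flagged_at) >= GRACE_PERIOD

now = datetime(2026, 2, 21, 19, 0, tzinfo=timezone.utc)
flag = {'pending_termination': '2026-02-20T18:00:00+00:00'}
print(should_terminate(flag, now))                          # flagged 25 hours ago, unclaimed
print(should_terminate({**flag, 'owner': 'team-a'}, now))   # claimed after the warning
```

Keeping the decision separate from the `terminate_instances` call makes the rule easy to unit test before it is allowed to delete anything.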

Warning: This is a hack, not a strategy. It cleans up the mess but doesn’t teach anyone not to make one. You’ll spend time maintaining the script and dealing with angry developers whose “important test server” got terminated. Use it to gain initial control, but don’t stop here.

Solution #2 – “Shift‑Left” Prevention (Policy‑as‑Code)

“This is where you ‘shift left’ and prevent the problem from happening in the first place. Instead of cleaning up messes, you make it impossible to create them.”

Core Principle

Embed cost controls directly into your IaC pipeline and cloud account structure.

Mandatory Tagging with IaC Policies

Tools: Sentinel (Terraform Cloud), OPA integrated into CI/CD.
Policy Example: Fail a `terraform plan` if a resource lacks an `owner` tag or if an S3 bucket lacks a lifecycle policy.
Outcome: Developers receive immediate feedback before anything is deployed.
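
The tagging rule can be illustrated without Sentinel or OPA. The following is a plain Python stand-in for such a policy, not Sentinel or Rego syntax: it assumes the plan was exported with `terraform show -json` and reads the `resource_changes` list from that JSON. The function name `untagged_resources` is hypothetical, and the sketch only inspects a resource's own `tags` attribute, so provider-level default tags are not considered.

```python
def untagged_resources(plan, required_tag='owner'):
    """Return addresses of created or updated resources missing required_tag."""
    violations = []
    for rc in plan.get('resource_changes', []):
        change = rc.get('change', {})
        actions = change.get('actions', [])
        if 'create' not in actions and 'update' not in actions:
            continue  # deletes and no-ops cannot introduce untagged resources
        after = change.get('after') or {}
        if 'tags' not in after:
            continue  # assumption: this resource type does not support tags
        tags = after.get('tags') or {}
        if required_tag not in tags:
            violations.append(rc['address'])
    return violations

# Hand-written sample in the shape of `terraform show -json` output
sample_plan = {
    'resource_changes': [
        {'address': 'aws_instance.ok',
         'change': {'actions': ['create'], 'after': {'tags': {'owner': 'team-a'}}}},
        {'address': 'aws_instance.bad',
         'change': {'actions': ['create'], 'after': {'tags': None}}},
        {'address': 'aws_instance.gone',
         'change': {'actions': ['delete'], 'after': None}},
    ]
}
print(untagged_resources(sample_plan))
```

In a CI job, a non-empty result would fail the pipeline step, which gives the same immediate feedback a Sentinel or OPA policy provides.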

Service Control Policies (SCPs)

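Apply SCPs at the AWS Organization level to developer accounts. SCPs act as “IAM policies on steroids”: they cap what any principal in a member account can do, so you can deny launching specific instance types or families regardless of the IAM permissions a developer holds.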

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyExpensiveInstanceTypesInDev",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:InstanceType": [
            "p4d.24xlarge",
            "g5.12xlarge",
            "g5.24xlarge"
          ]
        }
      }
    }
  ]
}

Use‑Case: Block expensive GPU instance types (p4, g5, etc.) in every account outside the designated “ML Research” OU by attaching the SCP to the other OUs.
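
Before attaching a deny list, it can help to check which instance types it would actually catch. The sketch below approximates wildcard matching on `ec2:InstanceType` with Python's shell-style `fnmatchcase`; it is an offline sanity check, not an IAM policy simulator. The function name `is_blocked` and the pattern list are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical deny list: whole families via wildcard, plus one exact type
DENIED_PATTERNS = ['p4d.*', 'g5.*', 'u-12tb1.metal']

def is_blocked(instance_type, patterns=DENIED_PATTERNS):
    """Return True if instance_type matches any denied pattern (case-sensitive)."""
    return any(fnmatchcase(instance_type, pattern) for pattern in patterns)

for candidate in ['p4d.24xlarge', 'g5.xlarge', 't3.micro', 'g5g.xlarge']:
    print(candidate, is_blocked(candidate))
```

Note that `g5.*` does not match `g5g.xlarge`, because the dot is part of the pattern. Each family that should be blocked needs its own entry.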

Solution #3 – Organizational Change (FinOps Practices)

“Drive ‘Organizational Change’ through FinOps practices like ‘showback’ and ‘chargeback’ to foster a culture of financial ownership.”

Steps to Implement

Showback – Publish weekly/monthly dashboards that break down cloud spend by team, project, or tag.
Chargeback – Allocate actual costs to each team’s budget, making overspend a direct responsibility.
FinOps Council – Form a cross‑functional group (Engineering, Finance, Product) to review spend, set budgets, and refine policies.
Education & Training – Run regular workshops on cloud pricing, cost‑effective architecture patterns, and tagging standards.
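
The showback step can be sketched as a small aggregation over cost data grouped by tag. This is a minimal sketch under stated assumptions: the input is shaped like a Cost Explorer `get_cost_and_usage` response grouped by the `owner` tag, group keys are assumed to arrive as `owner$<value>` with an empty value for untagged spend, and the function name `spend_by_tag` is hypothetical. The sample response is hand-written, not real billing data.

```python
from collections import defaultdict

def spend_by_tag(response, tag_key='owner'):
    """Sum UnblendedCost per tag value across all returned time periods."""
    totals = defaultdict(float)
    prefix = f"{tag_key}$"
    for period in response['ResultsByTime']:
        for group in period['Groups']:
            key = group['Keys'][0]
            team = key[len(prefix):] if key.startswith(prefix) else key
            amount = float(group['Metrics']['UnblendedCost']['Amount'])
            totals[team or 'untagged'] += amount
    # Largest spenders first, ready to print in a dashboard or Slack digest
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

# Hand-written sample in the assumed response shape
sample_response = {
    'ResultsByTime': [
        {'Groups': [
            {'Keys': ['owner$team-a'], 'Metrics': {'UnblendedCost': {'Amount': '100.5'}}},
            {'Keys': ['owner$'], 'Metrics': {'UnblendedCost': {'Amount': '20'}}},
        ]},
        {'Groups': [
            {'Keys': ['owner$team-a'], 'Metrics': {'UnblendedCost': {'Amount': '50'}}},
        ]},
    ]
}
print(spend_by_tag(sample_response))
```

Surfacing the `untagged` bucket alongside team totals keeps unowned spend visible, which is the gap the tagging policies in Solution #2 are meant to close.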

Expected Benefits

Benefit	Description
Transparency	Teams see the financial impact of their decisions in near‑real‑time.
Accountability	Ownership is assigned; teams are incentivized to optimize.
Continuous Improvement	Regular reviews surface new waste patterns and drive policy updates.

Putting It All Together

Phase	Action	Owner
1️⃣ Reactive	Deploy Lambda Janitor + Slack alerts.	Cloud Ops / SRE
2️⃣ Preventive	Implement Sentinel/OPA policies & SCPs.	Platform Engineering
3️⃣ Cultural	Roll out showback/chargeback dashboards, form FinOps council, run training.	Finance + Engineering Leadership

Bottom line: Start with the quick win (janitor script) to halt immediate waste, then lock down provisioning with policy‑as‑code, and finally embed financial responsibility into the organization’s DNA. This three‑layered approach moves you from “fire‑fighting” to “financially‑smart cloud engineering.”

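A broader variant of the SCP from Solution #2 uses StringLike wildcards to deny entire instance families instead of listing individual types:

{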
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringLike": {
          "ec2:InstanceType": [
            "p4d.*",
            "p3.*",
            "g5.*",
            "x2iezn.*",
            "u-12tb1.metal"
          ]
        }
      }
    }
  ]
}

Approaches Overview

Approach	Effort	Time to Implement	Long‑Term Impact
1. The Janitor Script	Low	Days	Low (Reactive)
2. Policy & Guardrails	Medium	Weeks	High (Proactive)
3. Organizational Change	High	Months/Quarters	Transformational

Ultimately, a mature FinOps practice uses a combination of all three:

Janitor script for what slips through.
Guardrails to prevent most issues.
Cultural ownership to make everyone a responsible steward of cloud resources.

Stop chasing surprise bills and start building a platform that makes financial responsibility the path of least resistance.

👉 Read the original article on TechResolve.blog