Terraform Advanced

Published: December 1, 2025 at 07:13 PM EST
4 min read
Source: Dev.to

Why Terraform?

Terraform is used to automate cloud infrastructure so humans don’t manually create:

  • VPCs
  • Subnets
  • Security Groups
  • ECS clusters
  • Task definitions
  • Load balancers
  • IAM roles
  • RDS
  • S3
  • DynamoDB
  • Secrets
  • ECR registries
  • Route53
  • CloudWatch alarms

Without Terraform

  • Engineers click around the console
  • No history
  • No review/tracking
  • Accidental misconfiguration
  • Hard to reproduce environments
  • Hard to recover after failure
  • Hard to scale
  • Impossible to maintain dozens of environments

With Terraform

Everything is code and the entire infrastructure is repeatable:

  • Recreate the entire system from scratch
  • Teams use Git history, PR reviews, CI/CD pipelines
  • One command updates all cloud resources correctly

Your project includes:

  • 7 microservices (Python)
  • ECS cluster
  • ALB
  • Couchbase
  • Confluent Cloud
  • PostgreSQL/RDS
  • VPN / VPC
  • Kafka Brokers
  • Lambda (optional)
  • S3 state backend

Terraform Core Concepts EVERY Senior DevOps Must Know

(A) Terraform is declarative

  • You describe WHAT you want; Terraform decides HOW to build it.
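As a minimal sketch of what "declarative" means in practice (the bucket name is hypothetical), you state the desired end state and Terraform works out the API calls:

```hcl
# You declare the bucket you want to exist; Terraform decides
# whether to create it, update its tags, or do nothing.
resource "aws_s3_bucket" "artifacts" {
  bucket = "my-team-artifacts-example" # hypothetical name

  tags = {
    ManagedBy = "terraform"
  }
}
```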

(B) Terraform Files

  • provider.tf – AWS provider configuration
  • backend.tf – S3 + DynamoDB state storage
  • variables.tf – Inputs used by your modules
  • ecs.tf – ECS cluster, services, task definitions
  • alb.tf – Load balancer, listeners, target groups
  • rds.tf – PostgreSQL RDS instance
  • network.tf – VPC + subnets (or reused existing VPC)
  • sg.tf – Security groups
  • outputs.tf – Export useful values (ALB DNS, SG IDs, etc.)

In an interview you must be able to explain why each file exists.
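For example, a minimal provider.tf for this layout might look like the following (the region and version constraints are illustrative, not from the project):

```hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative version constraint
    }
  }
}

provider "aws" {
  region = "us-east-1" # illustrative region
}
```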


Terraform State — What Companies Expect You to Know

Terraform keeps a “memory” of everything it created in:

terraform.tfstate

The state file contains:

  • All AWS resource IDs
  • Dependencies between resources
  • Current configuration
  • Attributes of each resource

🚨 This file is critical — losing it is catastrophic.
Enterprises never store tfstate locally; they use remote backends:

  • S3 → stores the .tfstate file
  • DynamoDB → manages state locks
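A typical backend.tf wiring the two together might look like this sketch (the bucket name matches the lock ID shown later in this post; the DynamoDB table name is a placeholder):

```hcl
terraform {
  backend "s3" {
    bucket         = "kafka-enterprise-orders-tfstate" # state bucket
    key            = "terraform.tfstate"
    region         = "us-east-1"                       # illustrative region
    dynamodb_table = "terraform-locks"                 # placeholder lock table
    encrypt        = true
  }
}
```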

Why S3 Backend? (Your Project Example)

Your project involves:

  • GitHub Actions
  • Local development on different machines
  • Multiple terraform apply executions

If each machine stored a local tfstate, you would face:

  • Overwrites between concurrent applies
  • GitHub Actions unable to see local changes
  • Terraform confusion leading to duplicate or destructive actions

S3 solves these problems:

  • One central state for all Terraform runs
  • GitHub Actions and local machines read the same state
  • Safe to delete local terraform.tfstate files
  • State is versioned → rollback possible
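Versioning on the state bucket is what makes rollback possible. If the bucket itself is managed by Terraform, enabling it can be sketched like this (resource names are assumptions):

```hcl
resource "aws_s3_bucket" "tfstate" {
  bucket = "kafka-enterprise-orders-tfstate" # assumed state bucket name
}

# Every write to terraform.tfstate keeps the previous version,
# so a corrupted or bad state can be rolled back in S3.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}
```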

Why DynamoDB Lock? (Your Exact Issue Today)

Terraform must prevent concurrent applies. During a run:

terraform plan
terraform apply

Terraform writes a lock into DynamoDB:

LockID = kafka-enterprise-orders-tfstate/terraform.tfstate

  • If GitHub Actions starts a run while a local run is active, the lock prevents state corruption.
  • If a previous run crashes, the lock can remain indefinitely and must be removed manually (for example with terraform force-unlock <LOCK_ID>).

Terraform Apply Flow (Your Project Real Flow)

  1. terraform init

    • Downloads AWS provider
    • Reads backend config
    • Connects to S3 and DynamoDB
  2. terraform plan

    • Compares desired infrastructure (code) with actual AWS state
    • Shows planned changes
  3. terraform apply

    • Creates/updates ECS task definitions and services
    • Updates ALB target groups and listeners
    • Modifies security groups and subnet associations
    • Manages IAM roles
    • Creates RDS instances if needed
    • Produces outputs (ALB DNS, SG IDs, etc.)
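Step 3 is where most of this project's resources live. A trimmed ECS task definition and service sketch (resource names, sizing, and the port are illustrative; the image variable matches the TF_VAR_ naming discussed below):

```hcl
resource "aws_ecs_task_definition" "producer" {
  family                   = "producer"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256" # illustrative sizing
  memory                   = "512"

  container_definitions = jsonencode([{
    name  = "producer"
    image = var.container_image_producer # injected by CI via TF_VAR_container_image_producer
    portMappings = [{ containerPort = 8080 }] # illustrative port
  }])
}

resource "aws_ecs_service" "producer" {
  name            = "producer"
  cluster         = aws_ecs_cluster.main.id # assumes a cluster defined in ecs.tf
  task_definition = aws_ecs_task_definition.producer.arn
  desired_count   = 1
  launch_type     = "FARGATE"
}
```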

Terraform Drift Detection (VERY important for senior roles)

Terraform detects drift between the state file and the real resources whenever it refreshes state, i.e. during terraform plan (or explicitly with terraform plan -refresh-only).

Case 1 – A subnet was deleted manually

Terraform reports:

InvalidSubnetID.NotFound

The next plan then proposes recreating the missing subnet, or the run fails and prompts manual correction of the subnet IDs.


Terraform Variables & Secrets (Your Project)

You pass the following variables from GitHub Actions to Terraform:

TF_VAR_container_image_producer
TF_VAR_container_image_payment
TF_VAR_container_image_fraud
TF_VAR_container_image_analytics
TF_VAR_web_backend_image
TF_VAR_web_frontend_image
TF_VAR_confluent_bootstrap_servers
TF_VAR_confluent_api_key
TF_VAR_confluent_api_secret
TF_VAR_rds_password
TF_VAR_existing_vpc_id
TF_VAR_existing_public_subnet_ids
TF_VAR_existing_private_subnet_ids
TF_VAR_existing_ecs_tasks_sg_id
TF_VAR_existing_alb_sg_id
TF_VAR_existing_rds_sg_id

These variables allow the code to dynamically read container images and other secrets for each deployment. When code is pushed, CI/CD builds images, pushes them to GHCR, and injects the image tags into Terraform, which updates the ECS task definitions.
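On the Terraform side, each TF_VAR_ environment variable maps to a declared variable of the same name. A few representative declarations (marking secrets as sensitive keeps them out of plan output):

```hcl
variable "container_image_producer" {
  type        = string
  description = "Image tag for the producer service, injected by CI"
}

variable "confluent_api_secret" {
  type      = string
  sensitive = true # redacted in plan/apply output
}

variable "rds_password" {
  type      = string
  sensitive = true
}
```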


Terraform in Senior DevOps Interviews — What They Expect

Must‑Know Topics

  • Terraform state
  • Backends (S3, Azure Blob, GCS)
  • Locking (DynamoDB)
  • Modules
  • Workspaces
  • Providers
  • Data sources
  • Variables & outputs
  • Dependency graph
  • Lifecycle rules
  • terraform import
  • terraform taint
  • terraform graph
  • CI/CD pipelines
  • Secrets management
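Of these, lifecycle rules come up especially often in interviews. A hedged sketch of the three common settings on a hypothetical RDS resource (required arguments like engine and instance class are elided for brevity):

```hcl
resource "aws_db_instance" "orders" {
  identifier = "orders-db" # hypothetical
  # ... engine, instance_class, storage, etc. elided ...

  lifecycle {
    prevent_destroy       = true       # refuse any plan that would delete the DB
    create_before_destroy = false      # default: destroy old before creating new
    ignore_changes        = [password] # let the password rotate outside Terraform
  }
}
```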

How to Explain with Real Examples

“Our Terraform manages ECS, ALB, VPC, RDS, SGs, and Kafka resources.
State is centralized in S3 with DynamoDB locking to avoid concurrency issues.
GitHub Actions injects images built in CI into ECS task definitions via TF_VAR_ variables.”


Why Terraform Is Required in Your Project

You have 17+ dependent AWS resources that must be updated together:

  • Changing a container image → ECS tasks update
  • Changing a port → ALB target group updates
  • Changing AWS region → VPC, subnets, and RDS must be recreated
  • Changing Kafka clusters → environment variables update
  • Changing security groups → ECS & ALB update

Terraform guarantees correct ordering and consistency across all resources.


How Terraform Works Internally (“Graph Theory”)

Terraform builds a dependency graph:

aws_vpc -> subnets -> route tables -> security groups -> 
alb -> ecs_cluster -> ecs_services -> tasks

It then executes operations:

  • In parallel where resources are independent
  • Sequentially where dependencies exist
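Dependencies are usually implicit: referencing one resource's attribute from another creates the graph edge. A minimal sketch (CIDRs and names are illustrative):

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# The reference to aws_vpc.main.id makes each subnet depend on the VPC,
# so Terraform creates the VPC first.
resource "aws_subnet" "public_a" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

# The two subnets are independent of each other,
# so Terraform creates them in parallel.
resource "aws_subnet" "public_b" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}
```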

Your ECS Errors Today — Terraform Detecting Infra Issues

  • Invalid security group – a secret contained an invalid value; Terraform rejected it.
  • Subnet deleted manually – Terraform warned about the missing subnet; you corrected the subnet IDs.

Terraform CI/CD — Your Pipeline

  1. Build Docker images
  2. Push images to GitHub Container Registry
  3. Run Terraform (plan & apply)
  4. Update ECS services with new task definitions
  5. New containers deploy instantly

This workflow provides an enterprise‑grade, repeatable deployment process.
