Stop Creating Everything: The Art of Terraform Data Sources

Published: December 7, 2025 at 07:09 PM EST

Source: Dev.to

Introduction

It’s Day 13 of the AWS Challenge, and today I learned something that reshaped my perspective on Infrastructure as Code: you don’t have to manage everything.

Up until now, every Terraform exercise I did was about creating resources: VPCs, subnets, security groups, and so on. In real‑world organizations, though, multiple teams own different pieces of the infrastructure. The networking team builds the VPC, the security team manages security groups, and your job is to deploy your app into that existing infrastructure.

That’s where data sources come in, and they’re absolutely game‑changing.

Resources vs. Data Sources

Resource Block

resource "aws_vpc" "my_vpc" {
  cidr_block = "10.0.0.0/16"
  # Terraform creates, updates, and destroys this
}

Data Block

data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
  # Terraform just reads this, never touches it
}

The distinction is simple:

Resource – “I own this. I manage its entire lifecycle.”
Data source – “This already exists. I just need to reference it.”

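The implications are huge. Terraform never plans changes to whatever a data source points at, and running terraform destroy on your configuration leaves it untouched.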
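You reference a data source's attributes with the data.TYPE.NAME.ATTRIBUTE syntax, the same way you would reference a resource. The following is a minimal sketch that assumes the existing_vpc lookup above. The subnet name and the netnum passed to cidrsubnet are hypothetical.

```hcl
# Your team owns this subnet; the VPC it sits in is only looked up.
resource "aws_subnet" "app" {
  # Read the VPC ID from the data source instead of hard-coding it
  vpc_id = data.aws_vpc.existing_vpc.id

  # Derive a subnet range from the shared VPC's CIDR
  # (with a /16 VPC, adding 8 bits yields a /24; netnum 2 is hypothetical)
  cidr_block = cidrsubnet(data.aws_vpc.existing_vpc.cidr_block, 8, 2)

  tags = {
    Name = "app-subnet" # hypothetical name
  }
}
```

Only the subnet appears in your plan. The VPC is read, never created or changed.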

Why This Matters: The Multi‑Team Reality

Your company has:

A networking team that manages all VPCs and subnets
A security team that maintains security groups and IAM policies
An infrastructure team (that’s you!) that deploys applications

Without data sources

You’d need access to everyone’s Terraform state files (good luck with that)
You’d end up copy‑pasting IDs manually
You might create duplicate resources, causing conflicts

With data sources

You simply query what you need:

# Find the VPC the cloud networking team created
data "aws_vpc" "company_vpc" {
  filter {
    name   = "tag:ManagedBy"
    values = ["cloud-networking-team"]
  }
}

# Find the security group the security team manages
# (the lookup must match exactly one group, so scope it to the VPC)
data "aws_security_group" "approved_sg" {
  vpc_id = data.aws_vpc.company_vpc.id

  filter {
    name   = "tag:ManagedBy"
    values = ["security-team"]
  }
}

# Deploy your app using their infrastructure
# (data.aws_ami.latest_amazon_linux and data.aws_subnet.app_subnet
#  are lookups declared elsewhere in the configuration)
resource "aws_instance" "my_app" {
  ami                    = data.aws_ami.latest_amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = data.aws_subnet.app_subnet.id
  vpc_security_group_ids = [data.aws_security_group.approved_sg.id]
}

Result: clean, collaborative, and conflict‑free.
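The instance above also depends on two lookups that the snippet doesn't declare. Here is one way they could look. This is a sketch only: the Tier = "app" subnet tag is a hypothetical convention the networking team would need to follow, and the AMI filter mirrors the Amazon Linux 2 lookup used later in this post.

```hcl
# Newest Amazon Linux 2 AMI published by Amazon
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# The app subnet inside the networking team's VPC
data "aws_subnet" "app_subnet" {
  vpc_id = data.aws_vpc.company_vpc.id # only search inside the shared VPC

  filter {
    name   = "tag:Tier"
    values = ["app"] # hypothetical tag set by the networking team
  }
}
```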

My Hands‑On Demo: The Three Essential Data Sources

Setup: Simulating Existing Infrastructure

# This simulates what the networking team already deployed
resource "aws_vpc" "shared" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "shared-network-vpc"
  }
}

resource "aws_subnet" "shared" {
  vpc_id     = aws_vpc.shared.id
  cidr_block = "10.0.1.0/24"
  tags = {
    Name = "shared-primary-subnet"
  }
}

After applying this, imagine you’ve “forgotten” it exists, just as you would in a large organization where another team owns it.
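Suppose you keep this setup and the lookups in one configuration and run a single apply from scratch. The first plan can then fail: Terraform reads the data source before the VPC exists, so no VPC matches the filter. One workaround is a depends_on on the data source, which defers the read until after its dependency has been applied. The trade-off is that the VPC's attributes show as "known after apply" in that plan. This is a sketch of a variant of the Data Source #1 lookup that follows, not an extra block.

```hcl
# Demo-only variant: read the VPC after the simulated setup is created
data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }

  # Wait for the resource that simulates the networking team's VPC
  depends_on = [aws_vpc.shared]
}
```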

Data Source #1: Finding the VPC

data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}

What this does

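Queries AWS for a VPC with the specified tag
Returns the VPC ID, CIDR block, and all other attributes
Is re-read on every terraform plan and terraform apply, so it reflects what currently exists in AWS
Fails with an error if no VPC, or more than one VPC, matches the filter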
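Besides the console, you can expose the looked-up attributes as outputs so they print after every apply. A small sketch follows; the output names are hypothetical.

```hcl
# Print the values the data source resolved
output "shared_vpc_id" {
  value = data.aws_vpc.shared.id
}

output "shared_vpc_cidr" {
  value = data.aws_vpc.shared.cidr_block
}
```

After an apply, terraform output shows both values.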

Console test

terraform console
> data.aws_vpc.shared.id
"vpc-0a1b2c3d4e5f6"
> data.aws_vpc.shared.cidr_block
"10.0.0.0/16"

Data Source #2: Finding the Subnet (Chained!)

data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }
  vpc_id = data.aws_vpc.shared.id  # Use the VPC data source
}

Key takeaways

Data sources can be chained; one feeds into another
vpc_id narrows the search, preventing accidental matches when multiple subnets share a tag
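The singular aws_subnet lookup must resolve to exactly one subnet and errors otherwise. If you want every subnet in the shared VPC, recent versions of the AWS provider offer the plural aws_subnets data source, which returns a list of IDs. It chains off the VPC lookup in the same way. The sketch below uses hypothetical names.

```hcl
# All subnet IDs in the shared VPC, however many there are
data "aws_subnets" "shared_vpc_all" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }
}

output "shared_subnet_ids" {
  value = data.aws_subnets.shared_vpc_all.ids # list of subnet IDs
}
```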

Data Source #3: Latest AMI (The Dynamic One)

data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

Why this is brilliant

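most_recent = true picks the newest AMI that matches the filters
Wildcards (*) give flexible pattern matching on the AMI name
Multiple filters narrow the search to exactly the image family you want, so new instances always launch from the latest build
Caveat: a newly published AMI changes the ami argument, which forces Terraform to replace instances that already use it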
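You may prefer that existing instances not be replaced whenever a newer AMI is published. One option is to tell Terraform to ignore later changes to ami, so that only newly created instances pick up the latest image. A minimal sketch follows; the resource name pinned is hypothetical.

```hcl
resource "aws_instance" "pinned" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet.shared.id

  lifecycle {
    # Keep the AMI this instance launched with; a newer AMI ID from the
    # data source will no longer show up as a replacement in the plan
    ignore_changes = [ami]
  }
}
```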

Putting It All Together: The Final Resource

resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet.shared.id
  private_ip    = "10.0.1.50"

  tags = {
    Name = "day13-instance"
  }
}

Running terraform plan yields:

Plan: 1 to add, 0 to change, 0 to destroy.

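Only one resource is added: the EC2 instance. The VPC and subnet from the setup step already exist, and the instance reaches them only through data sources. In a real organization they would live in another team’s configuration, so your code would never manage them at all.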

The Power Move: Terraform Console Testing

terraform console

# Test VPC data source
> data.aws_vpc.shared.id
"vpc-0a1b2c3d4e5f6"

# Test subnet data source
> data.aws_subnet.shared.cidr_block
"10.0.1.0/24"

# Test AMI data source
> data.aws_ami.amazon_linux_2.name
"amzn2-ami-hvm-2.0.20231218.0-x86_64-gp2"

Use the console to verify filters and values before any real deployment.
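The console checks values by hand. To make Terraform enforce the same expectations on every plan, you can attach preconditions to the resource that consumes the data sources (this requires Terraform 1.2 or later). The sketch below assumes the lookups from this post; the resource name checked is hypothetical.

```hcl
resource "aws_instance" "checked" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet.shared.id

  lifecycle {
    # Stop with an error if the AMI filter ever matches a non-x86_64 image
    precondition {
      condition     = data.aws_ami.amazon_linux_2.architecture == "x86_64"
      error_message = "The selected AMI must be built for x86_64."
    }

    # Stop with an error if the subnet lookup lands outside the shared VPC
    precondition {
      condition     = data.aws_subnet.shared.vpc_id == data.aws_vpc.shared.id
      error_message = "The subnet must belong to the shared VPC."
    }
  }
}
```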

Common Data Sources You’ll Actually Use

Network Resources

# VPC lookup
data "aws_vpc" "main" { ... }

# Subnet lookup
data "aws_subnet" "main" { ... }

# Security Group lookup
data "aws_security_group" "main" { ... }

# All availability zones in the current region
data "aws_availability_zones" "available" {
  state = "available"
}
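A typical use of the availability-zones lookup is spreading resources across zones. Here is a sketch that assumes data.aws_vpc.main above resolves to a real VPC and that the region has at least two available zones. The count and the CIDR netnum offset are hypothetical choices.

```hcl
# One hypothetical subnet in each of the first two available AZs
resource "aws_subnet" "per_az" {
  count             = 2
  vpc_id            = data.aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]

  # Offset the netnum (10, 11) to avoid ranges other teams may already use
  cidr_block = cidrsubnet(data.aws_vpc.main.cidr_block, 8, count.index + 10)
}
```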

Compute Resources

# Latest AMI (super common)
data "aws_ami" "latest" { ... }

# Existing EC2 instance
data "aws_instance" "existing" { ... }
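Here is one way an existing-instance lookup might be filled in. It is a sketch only: the Name tag value legacy-app is hypothetical, and like the other singular lookups it must match exactly one instance.

```hcl
# An instance someone else launched, found by a hypothetical Name tag
data "aws_instance" "existing" {
  filter {
    name   = "tag:Name"
    values = ["legacy-app"] # hypothetical tag value
  }

  # Ignore stopped or terminated instances that share the tag
  filter {
    name   = "instance-state-name"
    values = ["running"]
  }
}

output "legacy_app_private_ip" {
  value = data.aws_instance.existing.private_ip
}
```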

These data sources form the backbone of any Terraform configuration that needs to interoperate with pre‑existing cloud resources.