Stop Creating Everything: The Art of Terraform Data Sources
Source: Dev.to
Introduction
It’s Day 13 of the AWS Challenge, and today I learned something that reshaped my perspective on Infrastructure as Code: you don’t have to manage everything.
Up until now, every Terraform exercise I did was about creating resources: VPCs, subnets, security groups, and so on. In real‑world organizations, though, multiple teams own different pieces of the infrastructure. The networking team builds the VPC, the security team manages security groups, and your job is to deploy your app into that existing infrastructure.
That’s where data sources come in, and they’re absolutely game‑changing.
Resources vs. Data Sources
Resource Block
resource "aws_vpc" "my_vpc" {
  cidr_block = "10.0.0.0/16"
  # Terraform creates, updates, and destroys this
}
Data Block
data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
  # Terraform just reads this, never touches it
}
The distinction is simple:
- Resource – “I own this. I manage its entire lifecycle.”
- Data source – “This already exists. I just need to reference it.”
The implications are huge.
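To make the difference concrete, here is a minimal sketch that mixes both kinds of block: it reads the existing VPC and creates one new subnet inside it. The subnet name `app` and the `cidrsubnet` arguments are illustrative assumptions, not part of the demo further down.

```hcl
# Read-only lookup: Terraform never modifies this VPC
data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}

# Managed resource: Terraform owns this subnet's lifecycle
resource "aws_subnet" "app" {
  vpc_id = data.aws_vpc.existing_vpc.id

  # Carve a /24 out of the VPC range; for 10.0.0.0/16 this is 10.0.10.0/24
  cidr_block = cidrsubnet(data.aws_vpc.existing_vpc.cidr_block, 8, 10)
}
```

With a configuration like this, `terraform destroy` would remove only the subnet. The VPC is not in this configuration's ownership, so it stays put.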
Why This Matters: The Multi‑Team Reality
Imagine your company has:
- A networking team that manages all VPCs and subnets
- A security team that maintains security groups and IAM policies
- An infrastructure team (that’s you!) that deploys applications
Without data sources
- You’d need access to everyone’s Terraform state files (good luck with that)
- You’d end up copy‑pasting IDs manually
- You might create duplicate resources, causing conflicts
With data sources
You simply query what you need:
# Find the VPC the cloud networking team created
data "aws_vpc" "company_vpc" {
  filter {
    name   = "tag:ManagedBy"
    values = ["cloud-networking-team"]
  }
}

# Find the security group the security team manages
data "aws_security_group" "approved_sg" {
  filter {
    name   = "tag:ManagedBy"
    values = ["security-team"]
  }
}

# Deploy your app using their infrastructure
# (the aws_ami and aws_subnet lookups referenced here are assumed to be defined elsewhere in the configuration)
resource "aws_instance" "my_app" {
  ami                    = data.aws_ami.latest_amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = data.aws_subnet.app_subnet.id
  vpc_security_group_ids = [data.aws_security_group.approved_sg.id]
}
Result: clean, collaborative, and conflict‑free.
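The instance block above leans on a subnet lookup and an AMI lookup that the snippet doesn’t show. A rough sketch of what they could look like follows. The `Tier` tag and its value are hypothetical, so substitute whatever tagging convention your networking team actually uses.

```hcl
# Hypothetical: assumes the networking team tags app subnets with Tier = "app"
data "aws_subnet" "app_subnet" {
  vpc_id = data.aws_vpc.company_vpc.id

  filter {
    name   = "tag:Tier"
    values = ["app"]
  }
}

# Newest Amazon Linux 2 image published by Amazon
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}
```

One thing to keep in mind: singular data sources such as `aws_vpc`, `aws_subnet`, and `aws_security_group` expect exactly one match. A filter that matches nothing, or matches several objects, fails the plan, so it pays to keep filters tight.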
My Hands‑On Demo: The Three Essential Data Sources
Setup: Simulating Existing Infrastructure
# This simulates what the networking team already deployed
resource "aws_vpc" "shared" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "shared-network-vpc"
  }
}

resource "aws_subnet" "shared" {
  vpc_id     = aws_vpc.shared.id
  cidr_block = "10.0.1.0/24"
  tags = {
    Name = "shared-primary-subnet"
  }
}
After applying this, imagine you “forgot” it exists—just like in large orgs.
Data Source #1: Finding the VPC
data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}
What this does
- Queries AWS for a VPC with the specified tag
- Returns the VPC ID, CIDR block, and all other attributes
- Refreshes on every terraform plan and terraform apply
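Those attributes can also be surfaced as outputs, which is handy when you want to see what the lookup resolved to without opening a console session. A small sketch, with output names chosen for illustration:

```hcl
output "shared_vpc_id" {
  description = "ID of the VPC found by the data source"
  value       = data.aws_vpc.shared.id
}

output "shared_vpc_cidr" {
  description = "CIDR block of the VPC found by the data source"
  value       = data.aws_vpc.shared.cidr_block
}
```

After an apply, running `terraform output` prints both values.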
Console test
terraform console
> data.aws_vpc.shared.id
"vpc-0a1b2c3d4e5f6"
> data.aws_vpc.shared.cidr_block
"10.0.0.0/16"
Data Source #2: Finding the Subnet (Chained!)
data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }
  vpc_id = data.aws_vpc.shared.id # Use the VPC data source
}
Key takeaways
- Data sources can be chained; one feeds into another
- vpc_id narrows the search, preventing accidental matches when multiple subnets share a tag
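The same chaining idea works when you want every subnet in the VPC rather than a single one. A sketch using the plural `aws_subnets` data source, assuming a recent AWS provider version; the names `in_shared_vpc` and `shared_subnet_ids` are made up for this example:

```hcl
# Plural data source: returns the IDs of every subnet in the VPC
data "aws_subnets" "in_shared_vpc" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }
}

output "shared_subnet_ids" {
  value = data.aws_subnets.in_shared_vpc.ids
}
```

Unlike the singular `aws_subnet`, this one is built to return a list, so several matches are expected rather than an error.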
Data Source #3: Latest AMI (The Dynamic One)
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
Why this is brilliant
- most_recent = true always grabs the newest matching AMI
- Wildcards (*) give flexible pattern matching
- Multiple filters ensure you get exactly what you need, so new deployments pick up the current image automatically
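One consequence worth knowing: changing the `ami` argument forces the instance to be replaced, so when a newer image is published, the next plan will propose recreating any instance that uses this data source. If that isn’t what you want, one option is to ignore AMI drift on the instance. A sketch, with the resource name `pinned` made up for illustration:

```hcl
resource "aws_instance" "pinned" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"

  lifecycle {
    # Keep the AMI the instance was launched with,
    # even if the lookup later returns a newer one
    ignore_changes = [ami]
  }
}
```

Whether to pin or to let instances roll forward is a judgment call; this only shows that the choice exists.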
Putting It All Together: The Final Resource
resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet.shared.id
  private_ip    = "10.0.1.50"
  tags = {
    Name = "day13-instance"
  }
}
Running terraform plan yields:
Plan: 1 to add, 0 to change, 0 to destroy.
Only one resource is created—the EC2 instance. The VPC and subnet are merely referenced, not managed.
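The hard‑coded private_ip only works while the subnet stays 10.0.1.0/24. A small variation on the demo derives the address from the subnet the data source returned, using the built‑in `cidrhost` function. The local name `app_private_ip` is an assumption for this sketch:

```hcl
locals {
  # Host number 50 inside the looked-up subnet;
  # for 10.0.1.0/24 this evaluates to 10.0.1.50
  app_private_ip = cidrhost(data.aws_subnet.shared.cidr_block, 50)
}
```

Setting `private_ip = local.app_private_ip` on the instance gives the same address today and keeps working if the networking team ever re‑addresses the subnet.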
The Power Move: Terraform Console Testing
terraform console
# Test VPC data source
> data.aws_vpc.shared.id
"vpc-0a1b2c3d4e5f6"
# Test subnet data source
> data.aws_subnet.shared.cidr_block
"10.0.1.0/24"
# Test AMI data source
> data.aws_ami.amazon_linux_2.name
"amzn2-ami-hvm-2.0.20231218.0-x86_64-gp2"
Use the console to verify filters and values before any real deployment.
Common Data Sources You’ll Actually Use
Network Resources
# VPC lookup
data "aws_vpc" "main" { ... }
# Subnet lookup
data "aws_subnet" "main" { ... }
# Security Group lookup
data "aws_security_group" "main" { ... }
# All availability zones in the current region
data "aws_availability_zones" "available" {
  state = "available"
}
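The availability zone lookup is most useful through its `names` attribute. A minimal sketch, with the local and output names invented for illustration, that picks up to two zones to spread resources across:

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # All zone names the lookup returned for the current region
  all_azs = data.aws_availability_zones.available.names

  # Take at most the first two, without failing in a region that has fewer
  az_names = slice(local.all_azs, 0, min(2, length(local.all_azs)))
}

output "az_names" {
  value = local.az_names
}
```

Each entry in `local.az_names` can then be passed as the `availability_zone` of a subnet or instance instead of hard‑coding a zone name.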
Compute Resources
# Latest AMI (super common)
data "aws_ami" "latest" { ... }
# Existing EC2 instance
data "aws_instance" "existing" { ... }
These data sources form the backbone of any Terraform configuration that needs to interoperate with pre‑existing cloud resources.