Stop Creating Everything: The Art of Terraform Data Sources

Published: December 7, 2025 at 07:09 PM EST
4 min read
Source: Dev.to

Introduction

It’s Day 13 of the AWS Challenge, and today I learned something that reshaped my perspective on Infrastructure as Code: you don’t have to manage everything.

Up until now, every Terraform exercise I did was about creating resources: VPCs, subnets, security groups, and so on. In real‑world organizations, though, different teams own different pieces of the infrastructure. The networking team builds the VPC, the security team manages security groups, and your job is to deploy your app into that existing infrastructure.

That’s where data sources come in, and they’re absolutely game‑changing.

Resources vs. Data Sources

Resource Block

resource "aws_vpc" "my_vpc" {
  cidr_block = "10.0.0.0/16"
  # Terraform creates, updates, and destroys this
}

Data Block

data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
  # Terraform just reads this, never touches it
}

The distinction is simple:

  • Resource – “I own this. I manage its entire lifecycle.”
  • Data source – “This already exists. I just need to reference it.”
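The difference also shows up in how you reference the two: data source attributes carry a data. prefix. A minimal sketch, assuming both blocks above live in the same configuration (the output names are illustrative):

```hcl
# Managed resource attributes: <TYPE>.<NAME>.<ATTRIBUTE>
output "my_vpc_id" {
  value = aws_vpc.my_vpc.id
}

# Data source attributes: data.<TYPE>.<NAME>.<ATTRIBUTE>
output "existing_vpc_id" {
  value = data.aws_vpc.existing_vpc.id
}
```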

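The implications are huge: because a data source is read‑only, your configuration can reference infrastructure another team owns without any risk of modifying or destroying it.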

Why This Matters: The Multi‑Team Reality

Your company has:

  • A networking team that manages all VPCs and subnets
  • A security team that maintains security groups and IAM policies
  • An infrastructure team (that’s you!) that deploys applications

Without data sources

  • You’d need access to everyone’s Terraform state files (good luck with that)
  • You’d end up copy‑pasting IDs manually
  • You might create duplicate resources, causing conflicts

With data sources

You simply query what you need:

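# Find the VPC the cloud networking team created
data "aws_vpc" "company_vpc" {
  filter {
    name   = "tag:ManagedBy"
    values = ["cloud-networking-team"]
  }
}

# Find an app subnet inside that VPC (the "Tier" tag is a hypothetical example)
data "aws_subnet" "app_subnet" {
  vpc_id = data.aws_vpc.company_vpc.id
  filter {
    name   = "tag:Tier"
    values = ["app"]
  }
}

# Find the security group the security team manages
data "aws_security_group" "approved_sg" {
  vpc_id = data.aws_vpc.company_vpc.id
  filter {
    name   = "tag:ManagedBy"
    values = ["security-team"]
  }
}

# Look up the latest Amazon Linux 2 AMI
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Deploy your app using their infrastructure
resource "aws_instance" "my_app" {
  ami                    = data.aws_ami.latest_amazon_linux.id
  instance_type          = "t2.micro"
  subnet_id              = data.aws_subnet.app_subnet.id
  vpc_security_group_ids = [data.aws_security_group.approved_sg.id]
}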

Result: clean, collaborative, and conflict‑free.

My Hands‑On Demo: The Three Essential Data Sources

Setup: Simulating Existing Infrastructure

# This simulates what the networking team already deployed
resource "aws_vpc" "shared" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "shared-network-vpc"
  }
}

resource "aws_subnet" "shared" {
  vpc_id     = aws_vpc.shared.id
  cidr_block = "10.0.1.0/24"
  tags = {
    Name = "shared-primary-subnet"
  }
}

After applying this, imagine you “forgot” it exists—just like in large orgs.

Data Source #1: Finding the VPC

data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}

What this does

  • Queries AWS for a VPC with the specified tag
  • Returns the VPC ID, CIDR block, and all other attributes
  • Is re‑read on every terraform plan and apply, so it always reflects what currently exists in AWS
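The aws_vpc data source also accepts a tags map, which reads a little more naturally for simple tag matches. A sketch of the same lookup, assuming the VPC from the setup above (the block name shared_by_tags is illustrative). Either way, the query must match exactly one VPC, otherwise Terraform reports an error during plan.

```hcl
# Equivalent lookup using the tags argument instead of a filter block
data "aws_vpc" "shared_by_tags" {
  tags = {
    Name = "shared-network-vpc"
  }
}
```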

Console test

terraform console
> data.aws_vpc.shared.id
"vpc-0a1b2c3d4e5f6"
> data.aws_vpc.shared.cidr_block
"10.0.0.0/16"

Data Source #2: Finding the Subnet (Chained!)

data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }
  vpc_id = data.aws_vpc.shared.id  # Use the VPC data source
}

Key takeaways

  • Data sources can be chained; one feeds into another
  • vpc_id narrows the search so a subnet with the same tag in another VPC can't match; the aws_subnet lookup fails if more than one subnet matches
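When you need every matching subnet rather than exactly one, the plural aws_subnets data source returns a list of IDs instead of failing on multiple matches. A sketch chained off the same VPC lookup, assuming a current AWS provider version where aws_subnets is available (the names in_shared_vpc and shared_subnet_ids are illustrative):

```hcl
# Collect the IDs of every subnet in the shared VPC
data "aws_subnets" "in_shared_vpc" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.shared.id]
  }
}

output "shared_subnet_ids" {
  value = data.aws_subnets.in_shared_vpc.ids
}
```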

Data Source #3: Latest AMI (The Dynamic One)

data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

Why this is brilliant

  • most_recent = true picks the newest AMI among those that match
  • Wildcards (*) give flexible pattern matching on the AMI name
  • Multiple filters narrow the match to exactly the image you need, so new deployments pick up the latest Amazon Linux 2 build without code changes
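One caveat with most_recent = true: changing ami on an existing aws_instance forces a replacement, so the first plan after Amazon publishes a new image will want to destroy and recreate the instance. If that isn't what you want, a lifecycle rule can keep the instance on the AMI it was launched with. A sketch of that variation of the aws_instance.main block used below (use it instead of, not alongside, that block):

```hcl
resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet.shared.id

  lifecycle {
    # Ignore later AMI changes; only newly created instances get the newest image
    ignore_changes = [ami]
  }
}
```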

Putting It All Together: The Final Resource

resource "aws_instance" "main" {
  ami           = data.aws_ami.amazon_linux_2.id
  instance_type = "t2.micro"
  subnet_id     = data.aws_subnet.shared.id
  private_ip    = "10.0.1.50"

  tags = {
    Name = "day13-instance"
  }
}

Running terraform plan yields:

Plan: 1 to add, 0 to change, 0 to destroy.

Only one resource is added: the EC2 instance. The VPC and subnet already exist and are only read through data sources.

The Power Move: Terraform Console Testing

terraform console

# Test VPC data source
> data.aws_vpc.shared.id
"vpc-0a1b2c3d4e5f6"

# Test subnet data source
> data.aws_subnet.shared.cidr_block
"10.0.1.0/24"

# Test AMI data source
> data.aws_ami.amazon_linux_2.name
"amzn2-ami-hvm-2.0.20231218.0-x86_64-gp2"

Use the console to verify filters and values before any real deployment.
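The console check is manual. Since Terraform 1.2, a data source can also carry a postcondition that fails the plan when an assumption breaks. A sketch, assuming the thing to protect is the instance's hard‑coded private_ip of 10.0.1.50, which only works inside 10.0.1.0/24. It's shown as a standalone lookup with an illustrative name; in practice you'd add the lifecycle block to data.aws_subnet.shared itself.

```hcl
# Requires Terraform 1.2+ (custom conditions on data sources)
data "aws_subnet" "shared_checked" {
  vpc_id = data.aws_vpc.shared.id

  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }

  lifecycle {
    postcondition {
      # The instance's private_ip (10.0.1.50) assumes this CIDR block
      condition     = self.cidr_block == "10.0.1.0/24"
      error_message = "shared-primary-subnet no longer uses the expected 10.0.1.0/24 CIDR block."
    }
  }
}
```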

Common Data Sources You’ll Actually Use

Network Resources

# VPC lookup
data "aws_vpc" "main" { ... }

# Subnet lookup
data "aws_subnet" "main" { ... }

# Security Group lookup
data "aws_security_group" "main" { ... }

# Availability zones that are currently available in this region
data "aws_availability_zones" "available" {
  state = "available"
}
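Unlike the single‑object lookups above, aws_availability_zones returns lists. A quick sketch of consuming it (output names are illustrative):

```hcl
# The zone names come back as a list of strings
output "available_azs" {
  value = data.aws_availability_zones.available.names
}

# Index into the list, e.g. to pin something to the first available zone
output "first_az" {
  value = data.aws_availability_zones.available.names[0]
}
```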

Compute Resources

# Latest AMI (super common)
data "aws_ami" "latest" { ... }

# Existing EC2 instance
data "aws_instance" "existing" { ... }

These data sources form the backbone of any Terraform configuration that needs to interoperate with pre‑existing cloud resources.
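To round out the compute side, here's a sketch of the existing‑instance lookup filled in. The Name tag value legacy-reporting-server is a made‑up example. The extra state filter keeps recently terminated instances, which stay visible with their tags for a short time, from causing a multiple‑match error.

```hcl
# Look up an instance another team runs (hypothetical tag value)
data "aws_instance" "existing" {
  filter {
    name   = "tag:Name"
    values = ["legacy-reporting-server"]
  }

  filter {
    name   = "instance-state-name"
    values = ["running"]
  }
}

# Use its private IP, e.g. to point your app at it
output "legacy_private_ip" {
  value = data.aws_instance.existing.private_ip
}
```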
