Terraform Data Source (AWS)
Source: Dev.to
What Are Terraform Data Sources?
A data source in Terraform is a read‑only lookup to an existing resource. Instead of creating something new, Terraform queries the cloud provider (AWS in this case) and returns information that can be used inside your configuration.
When to Use Data Sources
- A resource is already created (e.g., shared VPCs, existing AMIs).
- Another team manages the resource (network or security team).
- Your Terraform module should not own or recreate the resource.
- You need the latest or filtered version of something (e.g., the newest AMI).
- You want to avoid hard‑coding identifiers such as IDs or ARNs.
Using data sources leads to cleaner, more dynamic infrastructure code.
Example 1: Fetching VPC ID Using a Data Source
In many organizations networking is centralized. The VPC already exists, and your Terraform code will only deploy application resources inside it.
data "aws_vpc" "vpc_name" {
  filter {
    name   = "tag:Name"
    values = ["default-vpc"]
  }
}
- Searches for a VPC where the tag Name = default-vpc.
- Returns the VPC’s ID, accessible as data.aws_vpc.vpc_name.id.
- Avoids the need to manually capture or maintain the VPC ID.
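As a rough sketch of how the looked-up VPC might be consumed, the outputs below expose its ID and CIDR block. The output names are illustrative and not part of the original configuration; id and cidr_block are attributes exported by the aws_vpc data source.

```hcl
# Hypothetical outputs; the names are chosen for illustration only.
output "shared_vpc_id" {
  description = "ID of the VPC found by the Name tag lookup"
  value       = data.aws_vpc.vpc_name.id
}

output "shared_vpc_cidr" {
  description = "Primary CIDR block of the looked-up VPC"
  value       = data.aws_vpc.vpc_name.cidr_block
}
```

Note that the aws_vpc data source expects exactly one match: the plan fails if the filter matches no VPC or more than one, so the tag value should be unique within the account and region.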
Example 2: Fetching Subnet ID from a Specific VPC
Once the VPC is retrieved, you often need a subnet inside it.
data "aws_subnet" "shared_subnet" {
  filter {
    name   = "tag:Name"
    values = ["subnet-a"]
  }
  vpc_id = data.aws_vpc.vpc_name.id
}
- Fetches a subnet with Name = subnet-a.
- Ensures the subnet belongs to the VPC fetched earlier.
- Returns a single subnet ID, usable as data.aws_subnet.shared_subnet.id.
This lets Terraform deploy EC2 or Lambda resources into the correct shared subnet without hard‑coding anything.
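When more than one subnet is needed, for example to spread resources across availability zones, the plural aws_subnets data source (available in version 4 and later of the AWS provider) can return every subnet ID in the VPC. This is a sketch rather than part of the original example; the label all_in_vpc and the output name are hypothetical.

```hcl
# Assumes data.aws_vpc.vpc_name from Example 1 is defined in the same module.
data "aws_subnets" "all_in_vpc" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.vpc_name.id]
  }
}

# ids is a list containing the ID of every subnet that matched the filter.
output "vpc_subnet_ids" {
  value = data.aws_subnets.all_in_vpc.ids
}
```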
Example 3: Fetching the Latest Amazon Linux 2 AMI
AMI IDs differ between regions and change whenever a new image is published, so a hard‑coded AMI ID can become outdated or fail outright in another region.
data "aws_ami" "linux2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
- Retrieves the latest Amazon Linux 2 AMI.
- Limits results to official images owned by the Amazon account.
- Ensures the AMI matches the required architecture and virtualization type.
This is a good example of how a data source keeps the image reference current without manual edits.
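Because most_recent = true can resolve to a different AMI whenever a new image is published, and changing the ami of an existing aws_instance normally forces its replacement, some teams allow an explicit pin. A minimal sketch, assuming a hypothetical variable ami_id_override and a hypothetical local ami_id, neither of which appears in the original:

```hcl
# Hypothetical variable: set it to a specific AMI ID to pin the image,
# or leave it null to fall back to the data source lookup.
variable "ami_id_override" {
  type        = string
  default     = null
  description = "Optional AMI ID that takes precedence over the data source result"
}

locals {
  # coalesce returns the first argument that is neither null nor an empty string.
  ami_id = coalesce(var.ami_id_override, data.aws_ami.linux2.id)
}
```

An instance would then reference local.ami_id instead of the data source directly.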
Using the Data Sources to Launch an EC2 Instance
After fetching the VPC, subnet, and AMI, you can provision an EC2 instance using those dynamic values:
resource "aws_instance" "ec2_one" {
  ami           = data.aws_ami.linux2.id
  instance_type = var.instance_type
  subnet_id     = data.aws_subnet.shared_subnet.id
  tags          = var.tags
}
- Uses the AMI from the data source.
- Places the EC2 instance inside the shared subnet.
- Applies the user‑provided instance type and tags.
The result is a reusable configuration that contains no environment-specific IDs, so the same code can run in any account or region where resources with matching tags and images exist.
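The instance block refers to var.instance_type and var.tags, which are not declared in the snippets above. One possible set of declarations is sketched here; the default values are assumptions for illustration, not values from the original.

```hcl
variable "instance_type" {
  type        = string
  description = "EC2 instance type for ec2_one"
  default     = "t3.micro" # assumed default
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the instance"
  default = {
    Environment = "dev" # assumed example tag
  }
}
```

Either value can be overridden per environment, for example with a .tfvars file or the -var flag on the command line.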
Why Data Sources Matter
- Avoids Hardcoding – No need to store IDs, ARNs, or AMIs manually.
- Enables Multi‑Team, Multi‑Account Use – Teams can reference central resources without needing permissions to modify them.
- Improves Reusability – Modules become generic and work across dev, test, and prod seamlessly.
- Supports Dynamic and Automated Infrastructure – Fetching the latest AMI helps keep new instances on current, patched images.
- Reduces Human Error – Eliminates error‑prone copy‑pasting of IDs.
Conclusion
Terraform data sources are essential for building dynamic, production-ready infrastructure. They let your code read existing AWS resources, such as VPCs, subnets, and AMIs, without recreating them. The examples above reflect common real-world scenarios, especially shared network environments where a central team owns the network. Used effectively, data sources make a Terraform setup more scalable, maintainable, and aligned with DevOps best practices.