Terraform Rules

Project structure rules

Use variables.tf for variables

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate variables into their own file to improve readability. This is especially valid for modules. Most of the public modules utilize this convention. This file is often named variables.tf. Clear resources separation allows easy & productive code review as well as improves code learning curve.

Example:

# bad example
# main.tf
variable "create_networking" {
    description = "Creates VPC, subnets, VPNs, Routing tables if true"
    type = bool
    default = true
}
resource "aws_vpc" "this" {
  count = var.create_networking ? 1 : 0
  ###
}

# good example
# variables.tf
variable "create_networking" {
    description = "Creates VPC, subnets, VPNs, Routing tables if true"
    type = bool
    default = true
}
# main.tf
resource "aws_vpc" "this" {
  count = var.create_networking ? 1 : 0
  ###
}

Don’t add anything to variables.tf except variables

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate variables into their own file to improve readability. This is especially valid for modules. Most of the public modules utilize this convention. Having other things defined in a variables.tf file besides variables leads to confusion. Clear resources separation allows easy & productive code review as well as improves code learning curve.

Example:

# bad example
# variables.tf
locals {
  networks_number = var.create_networking ? 1 : 0
}

variable "create_networking" {
    description = "Creates VPC, subnets, VPNs, Routing tables if true. Skips creation, or deletes existing if sets to false. Default value is true"
    type = bool
    default = true
}

# good example
# variables.tf
variable "create_networking" {
    description = "Creates VPC, subnets, VPNs, Routing tables if true. Skips creation, or deletes existing if sets to false. Default value is true"
    type = bool
    default = true
}
# main.tf
locals {
  networks_number = var.create_networking ? 1 : 0
}

Use outputs.tf for outputs

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate outputs into their own file to improve readability. This is especially valid for modules and projects exposing their state to others. Most of the public modules utilize this convention. This file is often named outputs.tf.Clear resources separation allows easy & productive code review as well as improves code learning curve.

Example:

# bad example
# main.tf
resource "google_pubsub_topic" "this" {
  name = "platform-events"
}
outputs "this_pubsub_topic_name" {
    description = "Platform events topic name. Can be passed as an environment variable to backend applications to send general platform-specifc events"
    value = google_pubsub_topic.this.name
}

# good example
# outputs.tf
outputs "this_pubsub_topic_name" {
    description = "Platform events topic name. Can be passed as an environment variable to backend applications to send general platform-specifc events"
    value = google_pubsub_topic.this.name
}
# main.tf
resource "google_pubsub_topic" "this" {
  name = "platform-events"
}

Don’t add anything to outputs.tf except outputs

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate outputs into their own file to improve readability. This is especially valid for modules and projects exposing their state to others. Most of the public modules utilize this convention. Having other things defined in an outputs.tf file besides outputs leads to confusion. Clear resources separation allows easy & productive code review as well as improves code learning curve.

Example:

# bad example
# outputs.tf
locals {
  subnets_names = [for network in module.subnets.subnets : network.name]
}
output "subnets_names" {
  description = "The names of the subnets being created"
  value       = local.subnets_names
}

# good example
# locals.tf
locals {
  subnets_names = [for network in module.subnets.subnets : network.name]
}
# outputs.tf
output "subnets_names" {
  description = "The names of the subnets being created"
  value       = local.subnets_names
}

Use providers.tf for providers

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate providers into their own file to improve readability. This is especially valid for root modules. This file is often named providers.tf. Clear resources separation allows easy & productive code review as well as improves code learning curve.

Example:

# bad example
# main.tf
provider "aws" {
  region = "us-east-1"
}
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

# good example
# providers.tf
provider "aws" {
  region = "us-east-1"
}
# main.tf
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

Don’t add anything to providers.tf except providers

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate providers into their own file to improve readability. This is especially valid for root modules. Having other things defined in a providers.tf file besides providers leads to confusion. Clear resources separation allows easy & productive code review as well as improves code learning curve.

Example:

# bad example
# providers.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.0.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "StorageAccount-ResourceGroup"
    storage_account_name = "abcd1234"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
    use_azuread_auth     = true
  }
}
provider "azurerm" {
  features {}
}

# good example
# providers.tf
provider "azurerm" {
  features {}
}
# versions.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.0.0"
    }
  }
}
# backend.tf
terraform {
  backend "azurerm" {
    resource_group_name  = "StorageAccount-ResourceGroup"
    storage_account_name = "abcd1234"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
    use_azuread_auth     = true
  }
}

Use versions.tf for required providers versions

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate required_versions into their own file to improve readability. This is especially valid for modules. This file is often named versions.tf

Example:

# bad example
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

# good example
# versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
# main.tf
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

Only required providers versions should be in versions.tf

Severity: 🟥 Won’t affect the state. Can be fixed in the scope of a PR

Description: Typically, when developing a Terraform project, one would separate variables into their own file to improve readability. This is especially valid for modules. Most of the public modules utilize this convention. Having other things defined in a variables.tf file besides variables leads to confusion.

Example:

# bad example
# variables.tf
locals {
  networks_number = var.create_networking ? 1 : 0
}

variable "create_networking" {
    description = "Creates VPC, subnets, VPNs, Routing tables if true. Skips creation, or deletes existing if sets to false. Default value is true"
    type = bool
    default = true
}

# good example
# variables.tf
variable "create_networking" {
    description = "Creates VPC, subnets, VPNs, Routing tables if true. Skips creation, or deletes existing if sets to false. Default value is true"
    type = bool
    default = true
}
# main.tf
locals {
  networks_number = var.create_networking ? 1 : 0
}

Create new terraform project if you have more than 50 resources

Severity: 🟧 Affects the state. Could be fixed in the scope of another PR

Description: Large Terraform projects have pretty significant blast radius, as well as impact productivity because producing a terraform plan becomes a heavy operation that can lead to throttling from a cloud provider’s API and . Especially if you mix foundational resources, such as networking and databases. Splitting existing state into multiple state has to be approached with caution because it could lead to destructive actions such as deletion of resources and ultimately terraform’s state corruption. Ideally you’d account for it during the design face but we understand that it’s not always possible.

Example: For example you manage your organization’s structure via terraform and you have the following files structure:

# Files Structure:
# .
# ├── main.tf
# ├── variables.tf
# ├── outputs.tf
# ├── providers.tf
# ├── versions.tf
# ├── projects.tf
# ├── groups.tf
# ├── permissions.tf
# └── locals.tf

And between groups.tf, permissions.tf and projects.tf there are 50 different resources managed by terraform, such as folders, projects, IAM groups, etc defined explicitly. Update to this project affects all resources managed and can potentially lead to human-error and accidental deletion. The better structure would look like:

# Files Structure:
# .
# ├── projects/
# │   ├── main.tf
# │   ├── variables.tf
# │   ├── outputs.tf
# │   ├── providers.tf
# │   ├── versions.tf
# │   └── projects.tf
# ├── permissions/
# │   ├── main.tf
# │   ├── variables.tf
# │   ├── outputs.tf
# │   ├── providers.tf
# │   ├── versions.tf
# │   ├── groups.tf
# │   └── permissions.tf

Such a structure would allow to separate projects management from permissions management and as the result improve the potential blast radius. Another example: For example you manage everything related to your networks in a single terraform project:

# Files Structure:
# .
# ├── variables
# ├── outputs.tf
# ├── providers.tf
# ├── versions.tf
# ├── networks.tf
# ├── client_vpn.tf
# └── site-to-site_vpn.tf

Such a project can easily go over 50 resources. However splitting them is a significant challenge. So, this rule is not exactly set in stone, and if something is too complicated of a task that overshadows the benefits (especially if for example you make changes to this project once every 3-4 months) then this recommendation could be ignored.

Naming conventions

Use {type}_{name}_{attribute} for output names

Severity: 🟧 Affects the state. Could be fixed in the scope of another PR

Description: When upstream outputs are used, they help developers identify what type & resource they are coming from. This is helpful when you have many things exposed in your module or state. Typical naming strategy is to use the attribute’s name, however more clear one is to also mention resource type without a provider (like instance, instead of aws_instance), and resource name (not to confuse with the attribute your resource could have called “name”).

Example:

# main.tf
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    Name = "HelloWorld"
  }
}

# bad example
# outputs.tf
output "instance_id" {
  description = "EC2 instance ID"
  value       = aws_instance.web.id
}

# good example
# outputs.tf
output "instance_web_id" {
  description = "EC2 instance ID"
  value       = aws_instance.web.id
}

Use {name}_{attribute} as output name when referencing a module

Severity: 🟧 Affects the state’s outputs. Potentially affects downstream. Should be treated as an API schema change

Description: When upstream outputs are used, they help developers identify what type & resource they are coming from. This is helpful when you have many things exposed in your module or state. Typical naming strategy is to use the attribute’s name, however more clear one is to also mention module name (like vpc), and module’s output name.

Example:

# main.tf
module "cluster" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/private-cluster"
  version = "~> 34.0"

  project_id              = var.project_id
  name                    = "${local.cluster_type}-cluster${var.cluster_name_suffix}"
  regional                = false
  region                  = var.region
  zones                   = var.zones
  network                 = var.network
  subnetwork              = var.subnetwork
  ip_range_pods           = var.ip_range_pods
  ip_range_services       = var.ip_range_services
  create_service_account  = false
  service_account         = var.compute_engine_service_account
  enable_private_endpoint = true
  enable_private_nodes    = true
  master_ipv4_cidr_block  = "172.16.0.0/28"
  deletion_protection     = false

  master_authorized_networks = [
    {
      cidr_block   = data.google_compute_subnetwork.subnetwork.ip_cidr_range
      display_name = "VPC"
    },
  ]
}

# bad example
# outputs.tf
output "kubernetes_endpoint" {
  description = "Private Control Plane endpoint for a k8s cluster"
  value       = module.cluster.endpoint
}

# good example
# outputs.tf
output "cluster_endpoint" {
  description = "Private Control Plane endpoint for a k8s cluster"
  value       = module.cluster.endpoint
}

Always prefer snake case

Severity: 🟧 Will trigger resource/module recreation if applied to existing resources. Potentially affects downstream with outputs.

Description: Snake case is being used by majority of the providers. This is practically the standard for naming conventions of terraform resources and variables/outputs. It’s recommended to use it instead of “-“ and CamelCase.

Example:

# bad example
# outputs.tf
output "cluster-endpoint" {
  description = "Private Control Plane endpoint for a k8s cluster"
  value       = module.cluster.endpoint
  sensitive   = true
}

# good example
# outputs.tf
output "cluster_endpoint" {
  description = "Private Control Plane endpoint for a k8s cluster"
  value       = module.cluster.endpoint
  sensitive   = true
}

Don’t repeat resource type in a resource name

Severity: 🟧 Will trigger resource/module recreation if applied to existing resources.

Description: Terraform resource names are used to identify the purpose or the number of resources being created. Repeating the resource type in any manner doesn’t really help. Such as using group as a name for azurerm_resource_group resource type. If it’s a single resource group you’re creating, the more suitable name is this and if you create many, then names such as these or projects should be used (just as an example).

Example:

# bad example
# main.tf
resource "azurerm_resource_group" "group" {
  name     = "example-resources"
  location = "West Europe"
}

# bad example
# main.tf
resource "azurerm_resource_group" "this" {
  name     = "example-resources"
  location = "West Europe"
}

Prefer “this” when naming a single terraform resource

Severity: 🟧 Will trigger resource/module recreation if applied to existing resources.

Description: Terraform doesn’t pose any restrictions on resource naming, however for the transparency purposes, especially in the modules, it’s better to name a resource this if it’s the only resource type you’re using, such as VPC.

Example:

# bad example
# main.tf
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# good example
# main.tf
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

Use descriptive names if can’t use this

Severity: 🟧 Will trigger resource/module recreation if applied to existing resources.

Description: Terraform doesn’t pose any restrictions on resource naming, however for the transparency purposes, especially in the modules. this should reference only to a single module/resource you’re creating. If count/for_each are used or you have multiple copies of the same resource type such as aws_subnet then this isn’t a clear enough name.

Example:

# bad example
# main.tf
resource "aws_subnet" "this" {
  count = len(var.subnets)

  vpc_id = aws_vpc.this.id
  cidr_block = var.subnets[count.index]

  tags = {
    Name = "subnet-${count.index}"
  }
}

# good example
# main.tf
resource "aws_subnet" "private" {
  count = len(var.subnets)

  vpc_id = aws_vpc.this.id
  cidr_block = var.subnets[count.index]

  tags = {
    Name = "subnet-${count.index}"
  }
}

Code quality improvements

Meta-arguments for_each, if used, should be the first attribute in a block

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: for_each helps to create multiple resources using the same terraform block. Either by using resources/data/modules or dynamic blocks. Listing it first right after { is crucial for the code readability. This is especially useful if you define a map for the aws_subnet attribute in the argument’s value and doesn’t abstract it with a “local” or by other means. This way the attribute stands out and it’s easy to review.

Example:

# bad example
# main.tf
resource "google_clouddeploy_automation" "b_automation" {
  name              = "${each.key}-cd-automation"
  project           = google_clouddeploy_delivery_pipeline.pipeline[each.key].project
  location          = google_clouddeploy_delivery_pipeline.pipeline[each.key].location
  delivery_pipeline = google_clouddeploy_delivery_pipeline.pipeline[each.key].name
  for_each          = local.projects
  service_account   = each.value.email
  selector {
    targets {
      id = "*"
    }
  }
  suspended = false
  rules {
    promote_release_rule {
      id = "promote-release"
    }
  }
}

# good example
# main.tf
resource "google_clouddeploy_automation" "b_automation" {
  for_each = local.projects

  name              = "${each.key}-cd-automation"
  project           = google_clouddeploy_delivery_pipeline.pipeline[each.key].project
  location          = google_clouddeploy_delivery_pipeline.pipeline[each.key].location
  delivery_pipeline = google_clouddeploy_delivery_pipeline.pipeline[each.key].name
  service_account   = each.value.email
  selector {
    targets {
      id = "*"
    }
  }
  suspended = false
  rules {
    promote_release_rule {
      id = "promote-release"
    }
  }
}

Meta-argument count, if used, should be the first attribute in a block

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: count helps to create multiple resources using the same terraform block. Mostly used for conditional resources creation. Listing it first right after { is crucial for the code readability. This way the attribute stands out and it’s easy to review.

Example:

# bad example
# main.tf
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  count         = var.enabled ? 1 : 0
  instance_type = "t3.micro"

  tags = {
    Name = "HelloWorld"
  }
}

# good example
# main.tf
resource "aws_instance" "web" {
  count = var.enabled ? 1 : 0

  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    Name = "HelloWorld"
  }
}

Meta-argument depends_on should be the last of the resource’s attributes

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: Placing depends_on as the last argument in a terraform resource configuration ensures readability and clear separation from the resource-specific arguments (by adding a single empty line depends_on).

Example:

# bad example
# main.tf
resource "aws_instance" "this" {
  ami                    = "ami-0c94855ba95c574c9"
  depends_on             = [aws_security_group.this]
  instance_type          = "t2.micro"
}
resource "aws_security_group" "this" {
  name = "example"

  # ... other arguments ...
}


# good example
# main.tf
resource "aws_instance" "this" {
  ami                    = "ami-0c94855ba95c574c9"
  instance_type          = "t2.micro"

  depends_on = [aws_security_group.this]
}
resource "aws_security_group" "this" {
  name = "example"

  # ... other arguments ...
}

Meta-block lifecycle should be the last in a resource definition

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: Placing lyfecycle as the last block in a terraform resource configuration ensures readability and clear separation from the resource-specific blocks & attributes (by adding a single empty line before lyfecycle).

Example:

# bad example main.tf
resource "aws_instance" "this" {
  ami = "ami-0c94855ba95c574c9"
  lifecycle {
    create_before_destroy = true
  }
  instance_type = "t2.micro"
  depends_on = [aws_security_group.this]
}
resource "aws_security_group" "this" {
  name = "example"
  # ... other arguments ...
}


# good example main.tf
resource "aws_instance" "this" {
  ami           = "ami-0c94855ba95c574c9"
  instance_type = "t2.micro"
  depends_on = [aws_security_group.this]
  lifecycle {
    create_before_destroy = true
  }
}
resource "aws_security_group" "this" {
  name = "example"
  # ... other arguments ...
}

Group resources using for_each

Severity: 🟧 Will trigger resource/module recreation if applied to existing resources.

Description: If you copy paste the same resource & slightly change the value 4+ times, it’s better to group them all using for_each meta argument. This will ensure DRYness of the code without going extreme.

Example:

# bad example
# main.tf
resource "aws_subnet" "main_1" {
  vpc_id               = aws_vpc.main.id
  cidr_block           = "10.0.1.0/24"
  availability_zone_id = "use1-az1"

  tags = {
    Name = "Main-1"
  }
}
resource "aws_subnet" "main_2" {
  vpc_id               = aws_vpc.main.id
  cidr_block           = "10.0.2.0/24"
  availability_zone_id = "use1-az2"

  tags = {
    Name = "Main-2"
  }
}
resource "aws_subnet" "main_3" {
  vpc_id               = aws_vpc.main.id
  cidr_block           = "10.0.3.0/24"
  availability_zone_id = "use1-az3"

  tags = {
    Name = "Main-3"
  }
}
resource "aws_subnet" "main_4" {
  vpc_id               = aws_vpc.main.id
  cidr_block           = "10.0.4.0/24"
  availability_zone_id = "use1-az4"

  tags = {
    Name = "Main-4"
  }
}

# good example
# main.tf
locals {
  private_subnets = {
    az1 = "10.0.1.0/24"
    az2 = "10.0.2.0/24"
    az3 = "10.0.3.0/24"
    az4 = "10.0.4.0/24"
  }
}
resource "aws_subnet" "private" {
  for_each = local.private_subnets

  vpc_id               = aws_vpc.main.id
  cidr_block           = each.value
  availability_zone_id = "use1-${each.key}"

  tags = {
    Name = "Main-${each.key}"
  }
}

Define both description & type for a variable

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: This rule is by far the most essential, because your variables control what’s going on with the state. Even something obvious as “environment”, can and will be interpreted in many different ways: “is it stage, or staging?” or “is it included into resources prefixes?” that’s what description is for. Variable’s type in addition to being useful in “documentation” also helps terraform to throw meaningful errors if something doesn’t match.

Example:

# bad example
variable "app_settings" {}
variable "app" {}

# good example
variable "app_setting" {
  description = "Configurable settings for the backend application. Any changes trigger application redeployment"
  type        = object({
    cpu           = number
    memory        = number
    min_replicas  = number
    internal_port = number
    attach_volume = optional(bool, false)
  })
}
variable "app" {
  description = "Application name. Used in the cloud resources names, as well as container image name. It must be a lowercase string"
  type       = string

  # good idea to throw some validation if it makes sense
  validation {
    condition     = lower(var.app) == var.app
    error_message = "Application must be lowercase"
  }
}

Follow specific order for arguments in variables

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: This is essentially your state’s API documentation. It doesn’t mean you have to specify all of them, but if you do, they should be in the listed order to make sure it’s readable in the long run.

Example:

# bad example
variable "app" {
  type = string
  validation {
    condition     = lower(var.app) == var.app
    error_message = "Application must be lowercase"
  }
  description = "Application name. Used in the cloud resources names, as well as container image name. It must be a lowercase string"
}

# good example
variable "app" {
  description = "Application name. Used in the cloud resources names, as well as container image name. It must be a lowercase string"
  type        = string

  validation {
    condition     = lower(var.app) == var.app
    error_message = "Application must be lowercase"
  }
}

Set description for your output

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: This is essentially your state’s API documentation. Even something obvious as “environment”, can and will be interpreted in many different ways: “should I used it as a prefix everywhere?” that’s what description is for.

Example:

# bad example
output "main_db_instance_address" {
  value = try(aws_db_instance.main[0].address, null)
}

# good example
output "main_db_instance_address" {
  description = "Full address of the main database if created, otherwise null"
  value       = try(aws_db_instance.main[0].address, null)
}

Don’t output plain strings

Severity: 🟥 Doesn’t affect the state. Could be fixed in the scope of an existing PR

Description: By using plain strings as values for your outputs you potentially disrupt dependncy graph. Consider moving the string to either local, variable, or retrieve it dynamically from resources you have.

Example:

# bad example
output "this_instance_availability_zone" {
  description = "Availability Zone used by the instance"
  value       = "us-east-2a"
}

# good example
output "this_instance_availability_zone" {
  description = "Availability Zone used by the instance"
  value       = aws_instance.this.availability_zone
}