Terragrunt vs Terraform Workspaces: A Practical Decision Guide
You've got 47 Terraform state files scattered across three AWS accounts, and every terraform apply feels like defusing a bomb. Should you reach for Terragrunt or finally figure out what workspaces actually do?
The answer isn't "it depends" — it's "they solve fundamentally different problems." Let me show you when each one saves your sanity and when it makes things worse.
The Core Problem Both Tools Address
Terraform's default mode assumes you're managing one thing. One environment. One state file. One backend config. This falls apart the moment you need dev/staging/prod, or multiple regions, or — god forbid — both.
Workspaces and Terragrunt both tackle state isolation and configuration reuse, but their approaches are radically different:
- Workspaces = single codebase, multiple state files, minimal abstraction
- Terragrunt = wrapper tool, DRY configurations, opinionated folder structure
Think of workspaces as namespaces within a single Terraform project. Terragrunt is more like a build system that orchestrates multiple Terraform projects.
When Terraform Workspaces Actually Work
Workspaces shine when your environments are nearly identical and differ only by variable values. Classic use case: deploying the same Lambda function to dev, staging, and prod where the only changes are memory allocation and environment variables.
# main.tf
resource "aws_lambda_function" "api" {
  function_name = "api-${terraform.workspace}"
  memory_size   = var.memory_sizes[terraform.workspace]
  environment {
    variables = {
      LOG_LEVEL = terraform.workspace == "prod" ? "warn" : "debug"
    }
  }
}
variable "memory_sizes" {
  default = {
    dev     = 128
    staging = 256
    prod    = 1024
  }
}
# Usage
terraform workspace new staging      # creates the workspace and switches to it
terraform workspace select staging   # only needed when returning to an existing workspace
terraform apply -var-file="staging.tfvars"
# List all workspaces
terraform workspace list
# default
# * staging
# prod
State files live in the same backend but under workspace-specific paths: env:/staging/terraform.tfstate. Clean, simple, built-in.
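If the env: prefix looks odd in your bucket, the S3 backend lets you rename it. Here is a minimal sketch, assuming an S3 backend. The bucket name, key, and prefix are made up for illustration and are not taken from any real setup.

```hcl
# backend.tf
# Sketch only: bucket, key, and prefix values are hypothetical.
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"
    key     = "api/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Optional: replaces the default "env:" prefix used for
    # non-default workspaces.
    workspace_key_prefix = "workspaces"
  }
}

# Resulting object keys in the bucket:
#   default workspace: api/terraform.tfstate
#   staging workspace: workspaces/staging/api/terraform.tfstate
```

Without workspace_key_prefix, the staging key would be env:/staging/api/terraform.tfstate, which matches the path pattern described above.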
Use workspaces when:
- Environments differ only in sizing, not architecture
- You're a small team (< 5 engineers touching infra)
- Your CI/CD can handle workspace-aware deploys
- You don't need cross-stack dependencies
Workspace limitations that will bite you:
- No native way to share outputs between workspaces
- Backend config is still duplicated
- Provider configs don't vary by workspace out of the box (same AWS account, same region) unless you add terraform.workspace lookups to the provider block
- terraform.workspace scattered through code becomes maintenance hell
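One way to soften that last limitation is to read terraform.workspace in exactly one place and pass plain values everywhere else. This is a sketch, not a prescribed pattern. The settings and env names are hypothetical, and the values simply mirror the Lambda example.

```hcl
# locals.tf
# Sketch: one lookup table keyed by workspace name.
# "settings" and "env" are hypothetical names.
locals {
  settings = {
    dev     = { memory_size = 128, log_level = "debug" }
    staging = { memory_size = 256, log_level = "debug" }
    prod    = { memory_size = 1024, log_level = "warn" }
  }

  # The only place terraform.workspace is read. A workspace with no
  # entry in the table (including "default") fails at plan time
  # instead of silently falling through a conditional.
  env = merge(
    local.settings[terraform.workspace],
    { name = terraform.workspace }
  )
}

# Resources then reference local.env.* instead of terraform.workspace.
output "lambda_function_name" {
  value = "api-${local.env.name}"
}

output "lambda_memory_size" {
  value = local.env.memory_size
}
```

The trade-off is one more indirection, but a new environment becomes a single new entry in the table rather than a hunt through every ternary.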
When Terragrunt Becomes Necessary
The moment your dev environment lives in a different AWS account than prod, or your database module needs to reference outputs from your VPC module, workspaces start fighting you.
Terragrunt's killer feature isn't DRY configs — it's dependency management and multi-account orchestration.
# live/prod/us-east-1/vpc/terragrunt.hcl
terraform {
  source = "git::git@github.com:acme/modules.git//vpc?ref=v2.3.1"
}
include "root" {
  path = find_in_parent_folders()
}
inputs = {
  vpc_cidr    = "10.0.0.0/16"
  environment = "prod"
  azs         = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
# live/prod/us-east-1/eks/terragrunt.hcl
terraform {
  source = "git::git@github.com:acme/modules.git//eks?ref=v1.8.0"
}
include "root" {
  path = find_in_parent_folders()
}
dependency "vpc" {
  config_path = "../vpc"
}
inputs = {
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnet_ids
  cluster_version = "1.29"
}
# live/terragrunt.hcl (root config)
locals {
  # Assumption: child paths follow <environment>/<region>/<module>,
  # as in live/prod/us-east-1/vpc. Adjust if your layout differs.
  path_parts  = split("/", path_relative_to_include())
  environment = local.path_parts[0]
  region      = local.path_parts[1]
}
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "acme-terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "aws" {
  region = "${local.region}"
  default_tags {
    tags = {
      Environment = "${local.environment}"
      ManagedBy   = "terragrunt"
    }
  }
}
EOF
}
Now run terragrunt run-all apply from live/prod/us-east-1/ and it deploys VPC first, waits for outputs, then deploys EKS with those outputs injected. No manual copying of VPC IDs.
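One wrinkle with that ordering: on a brand-new environment, a plan of the EKS module has no VPC outputs to read yet. Terragrunt's dependency block accepts mock outputs for this case. The sketch below shows only the dependency block, and the placeholder IDs are invented.

```hcl
# live/prod/us-east-1/eks/terragrunt.hcl (dependency block only)
dependency "vpc" {
  config_path = "../vpc"

  # Placeholder values, used only when the VPC module has no outputs
  # yet (for example, the first plan of a new environment).
  # The IDs below are made up.
  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000", "subnet-11111111"]
  }

  # Restrict mocks to read-only commands so an apply never runs
  # against fake IDs.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```

Once the VPC has been applied, the real outputs take over and the mocks are ignored.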
The Hidden Costs of Each Approach
Workspace overhead:
- Every terraform plan requires explicit workspace selection
- CI/CD pipelines need workspace-aware logic
- New team members inevitably run against the wrong workspace
- No built-in visualization of what's deployed where
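The wrong-workspace problem can be partly guarded against in the configuration itself. Here is a rough sketch, assuming Terraform 1.4 or newer (for terraform_data). It also assumes each var-file, such as staging.tfvars, sets a hypothetical expected_workspace variable.

```hcl
# guard.tf
# Sketch: "expected_workspace" is a hypothetical variable that each
# var-file sets, e.g. expected_workspace = "staging" in staging.tfvars.
variable "expected_workspace" {
  type        = string
  description = "Workspace this var-file is meant for."
}

# A resource with no side effects. Its precondition fails the plan
# when the selected workspace and the var-file disagree.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = terraform.workspace == var.expected_workspace
      error_message = "Selected workspace does not match the var-file in use."
    }
  }
}
```

This doesn't stop someone from pairing prod.tfvars with the prod workspace by mistake, but it does catch the common case of applying staging values while prod is still selected.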
Terragrunt overhead:
- Extra binary to install, version, and maintain
- Learning curve for find_in_parent_folders(), dependency blocks, include patterns
- Debugging generated configs requires terragrunt render-json
- Cache directories (.terragrunt-cache) balloon disk usage
- Some Terraform Cloud/Enterprise features don't play nicely
From benchmarks I've run, Terragrunt adds roughly 3-8 seconds of overhead per module initialization due to source downloading and config generation. On a 15-module stack, that's 45-120 seconds, so about 1-2 extra minutes on run-all plan when modules initialize one after another. Not catastrophic, but not free.
Real Decision Framework
Ask these three questions:
1. Do environments live in different AWS accounts or require different provider configs? Yes → Terragrunt. Workspaces can't swap provider configs.
2. Do you have cross-module dependencies (EKS needs VPC outputs, RDS needs security group outputs)? Yes → Terragrunt. The dependency block is worth the learning curve.
3. Is your infrastructure essentially the same resources scaled differently? Yes → Workspaces might be enough. Keep it simple.
For a concrete example: a SaaS company I worked with had 3 environments, single AWS account, identical architectures. Workspaces worked fine for two years. Then they added a data analytics stack that only existed in prod, needed to reference the main VPC, and ran in a separate account for compliance. That's when they migrated to Terragrunt — the workspace model couldn't express "this module only exists in prod and needs outputs from a different workspace."
Migration Path: Workspaces to Terragrunt
If you're currently on workspaces and hitting walls, here's the extraction pattern:
# Export existing state
terraform workspace select prod
terraform state pull > prod-state.json
# Initialize new Terragrunt structure
mkdir -p live/prod/us-east-1/app
cd live/prod/us-east-1/app
# Create terragrunt.hcl (as shown above)
terragrunt init
# Import state (the export sits four directories up from app/)
terragrunt state push ../../../../prod-state.json
# Verify
terragrunt plan # Should show no changes
Do this module by module, environment by environment. Don't try to migrate everything at once.
Make the Call
If your infrastructure fits in one AWS account with nearly identical environments, start with workspaces. You can migrate later. Adding Terragrunt to a simple setup creates overhead you don't need.
If you're already juggling multiple accounts, have stacks that reference each other, or your terraform.workspace conditionals are spreading like mold — stop fighting it. Install Terragrunt, set up the folder structure, and embrace the dependency graph.
Next step: if you're leaning toward Terragrunt, start with their quick start and convert a single non-critical module. Don't refactor your entire infrastructure based on a blog post. Prove it works for your team first.
Written by GeekOnCloud
DevOps & Infrastructure engineer at geekoncloud.com