Battle-Tested Terraform Techniques: Advanced Patterns for Scaling Cloud Infrastructure

    Dynamic Configuration Using YAML/JSON Files

    One of the most powerful yet underused capabilities in Terraform is driving dynamic resource creation from external YAML or JSON files. This technique, recommended by Google, offers a more flexible and maintainable approach to infrastructure configuration than hard-coding values in HCL.

    The Basic Pattern

    When managing multiple environments or components, you can organize configurations into separate YAML files:

    .
    └── terraform/
        ├── main.tf
        └── environments/
            ├── prod/
            │   ├── storage.yaml
            │   └── compute.yaml
            └── staging/
                ├── storage.yaml
                └── compute.yaml

    Here’s how the YAML files might look:

    # environments/prod/storage.yaml
    buckets:
      - name: production-logs
        location: US
        storage_class: STANDARD
        versioning: true
        lifecycle_rules:
          - action: Delete
            age: 90
      
      - name: backup-archive
        location: EU
        storage_class: COLDLINE
        versioning: false
        lifecycle_rules:
          - action: Delete
            age: 365

    And the Terraform configuration that consumes these YAML files:

    # main.tf
    locals {
      environments = toset(["prod", "staging"])

      # Decode the storage config for each environment
      storage_configs = {
        for env in local.environments : env => yamldecode(
          file("${path.module}/environments/${env}/storage.yaml")
        )
      }
    }

    resource "google_storage_bucket" "buckets" {
      # Flatten the per-environment configs into a single map so each
      # bucket gets a stable, unique for_each key: "<env>-<bucket name>"
      for_each = merge([
        for env, config in local.storage_configs : {
          for bucket in config.buckets : "${env}-${bucket.name}" => merge(bucket, {
            environment = env
          })
        }
      ]...)

      name          = each.value.name
      location      = each.value.location
      storage_class = each.value.storage_class

      versioning {
        enabled = each.value.versioning
      }

      dynamic "lifecycle_rule" {
        for_each = each.value.lifecycle_rules
        content {
          condition {
            age = lifecycle_rule.value.age
          }
          action {
            type = lifecycle_rule.value.action
          }
        }
      }
    }
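    The merge([for …]…) expression is the trickiest part of this pattern: it flattens the per-environment maps into a single map keyed by environment and bucket name, which is what for_each iterates over. A rough Python sketch of the same flattening (the data here is illustrative, and any extra key components work the same way):

```python
# Per-environment configs, shaped like what yamldecode would produce
# (bucket entries trimmed to just names for brevity).
configs = {
    "prod":    {"buckets": [{"name": "production-logs"}, {"name": "backup-archive"}]},
    "staging": {"buckets": [{"name": "app-logs"}]},
}

# Equivalent of merge([for env, config in ... : { for bucket in ... }]...):
# one flat map keyed "<env>-<bucket name>", each value tagged with its env.
flattened = {
    f"{env}-{bucket['name']}": {**bucket, "environment": env}
    for env, config in configs.items()
    for bucket in config["buckets"]
}

print(sorted(flattened))
```

    Note that duplicate keys would silently collapse in a Python dict comprehension, and Terraform's merge behaves the same way (later entries win), which is why the key must be unique per bucket.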

    Advantages of YAML Terraform Inputs

    1. Separation of Configuration and Logic: Configuration details are isolated from Terraform code, making both easier to maintain.
    2. Version Control Friendly: Changes to configurations are clearly visible in version control.
    3. Programmatic Updates: Easy to integrate with automation tools or CI/CD pipelines.
    4. Reduced Error Risk: Compared to maintaining large locals blocks, YAML provides better structure and validation.
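    To make the "Programmatic Updates" point concrete: because the inputs are plain data files, a pipeline step can edit them with ordinary tooling rather than patching HCL. A sketch using only Python's standard library, on a JSON flavor of the config (Terraform's jsondecode reads JSON the same way yamldecode reads YAML; the bucket values here are illustrative):

```python
import json

# The storage config as a plain data structure, e.g. loaded from a JSON file.
config = {
    "buckets": [
        {"name": "production-logs", "location": "US",
         "storage_class": "STANDARD", "versioning": True},
    ]
}

# A CI job can append a new bucket without touching any Terraform code.
config["buckets"].append({
    "name": "audit-trail", "location": "EU",
    "storage_class": "COLDLINE", "versioning": False,
})

print(json.dumps(config, indent=2))
```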

    Challenges and Solutions for YAML Terraform Inputs

    Input Validation

    You can implement validation in two ways:

    Using Preconditions:

    resource "google_storage_bucket" "buckets" {
      for_each = { for bucket in local.bucket_config.buckets : bucket.name => bucket }
    
      lifecycle {
        precondition {
          condition = contains(["US", "EU", "ASIA"], each.value.location)
          error_message = "Location must be one of: US, EU, ASIA"
        }
    
        precondition {
          condition = contains(["STANDARD", "NEARLINE", "COLDLINE"], each.value.storage_class)
          error_message = "Invalid storage class specified"
        }
      }
      # ... rest of resource configuration
    }

    Using Variable Validation:

    variable "bucket_config" {
      type = object({
        buckets = list(object({
          name          = string
          location      = string
          storage_class = string
          versioning    = bool
          lifecycle_rules = list(object({
            action = string
            age    = number
          }))
        }))
      })
    
      validation {
        condition = alltrue([
          for bucket in var.bucket_config.buckets :
          contains(["US", "EU", "ASIA"], bucket.location)
        ])
        error_message = "Invalid location specified in bucket configuration"
      }
    
      validation {
        condition = alltrue([
          for bucket in var.bucket_config.buckets :
          contains(["STANDARD", "NEARLINE", "COLDLINE"], bucket.storage_class)
        ])
        error_message = "Invalid storage class specified in bucket configuration"
      }
    }

    User Interface Considerations

    While YAML/JSON configuration offers flexibility, it can be challenging for users unfamiliar with these formats. Consider implementing:

    • A web form that generates valid YAML/JSON
    • CLI tools with interactive prompts
    • Schema documentation with examples
    • Pre-commit hooks for validation
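    The "CLI tools" idea can be as small as a helper that refuses to emit invalid values in the first place. A minimal sketch (the function name and defaults are hypothetical; the allowed values mirror the validations shown earlier):

```python
import json

ALLOWED_LOCATIONS = {"US", "EU", "ASIA"}
ALLOWED_CLASSES = {"STANDARD", "NEARLINE", "COLDLINE"}

def make_bucket(name, location, storage_class="STANDARD", versioning=False):
    """Build one bucket entry, rejecting values Terraform would also reject."""
    if location not in ALLOWED_LOCATIONS:
        raise ValueError(f"location must be one of {sorted(ALLOWED_LOCATIONS)}")
    if storage_class not in ALLOWED_CLASSES:
        raise ValueError(f"storage_class must be one of {sorted(ALLOWED_CLASSES)}")
    return {
        "name": name,
        "location": location,
        "storage_class": storage_class,
        "versioning": versioning,
    }

config = {"buckets": [make_bucket("production-logs", "US", versioning=True)]}
print(json.dumps(config, indent=2))
```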

    Best Practices

    1. Always include a schema file (e.g., JSON Schema) to document the expected structure
    2. Implement comprehensive validation at both the variable and resource level
    3. Provide example configurations for common use cases
    4. Consider building helper tools for configuration generation
    5. Use descriptive error messages in validations to guide users

    This pattern becomes particularly valuable in large-scale infrastructure where configuration changes are frequent and need to be managed systematically.

    Schema Validation for Configuration Files

    While YAML/JSON configurations offer flexibility, they also introduce potential errors. Implementing schema validation in your CI/CD pipeline can catch issues before they reach your Terraform workflow.

    Creating a JSON Schema

    First, define a schema that describes your expected configuration structure:

    {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "required": ["buckets"],
      "properties": {
        "buckets": {
          "type": "array",
          "items": {
            "type": "object",
            "required": ["name", "location", "storage_class", "versioning"],
            "properties": {
              "name": {
                "type": "string",
                "pattern": "^[a-z0-9][-a-z0-9_.]*[a-z0-9]$",
                "maxLength": 63,
                "description": "Bucket name must be between 3-63 chars, start/end with number or letter"
              },
              "location": {
                "type": "string",
                "enum": ["US", "EU", "ASIA"],
                "description": "Geographic location of the bucket"
              },
              "storage_class": {
                "type": "string",
                "enum": ["STANDARD", "NEARLINE", "COLDLINE"],
                "description": "Storage class affecting availability and cost"
              },
              "versioning": {
                "type": "boolean",
                "description": "Whether object versioning is enabled"
              },
              "lifecycle_rules": {
                "type": "array",
                "items": {
                  "type": "object",
                  "required": ["action", "age"],
                  "properties": {
                    "action": {
                      "type": "string",
                      "enum": ["Delete", "SetStorageClass"],
                      "description": "Action to take on matching objects"
                    },
                    "age": {
                      "type": "integer",
                      "minimum": 1,
                      "description": "Age in days when the action should occur"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
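    A real JSON Schema validator (such as the ajv-cli used in the workflow below, or an IDE plugin) is the right tool for enforcing this schema. As a dependency-free illustration of what it checks, here is a minimal Python stand-in covering only the required-field and enum rules from the schema above:

```python
def check_bucket(bucket):
    """Minimal stand-in for JSON Schema validation of one bucket entry."""
    errors = []
    # required: ["name", "location", "storage_class", "versioning"]
    for field in ("name", "location", "storage_class", "versioning"):
        if field not in bucket:
            errors.append(f"missing required field: {field}")
    # enum constraints from the schema
    if bucket.get("location") not in ("US", "EU", "ASIA"):
        errors.append("location must be one of: US, EU, ASIA")
    if bucket.get("storage_class") not in ("STANDARD", "NEARLINE", "COLDLINE"):
        errors.append("storage_class must be STANDARD, NEARLINE, or COLDLINE")
    return errors

good = {"name": "production-logs", "location": "US",
        "storage_class": "STANDARD", "versioning": True}
print(check_bucket(good))  # []
print(check_bucket({"name": "x", "location": "MOON"}))
```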

    Implementing GitHub Workflow Validation

    Create a GitHub workflow to validate YAML files against your schema:

    name: Validate Terraform YAML Configs
    
    on:
      pull_request:
        paths:
          - 'terraform/environments/**/*.yaml'
    
    jobs:
      validate-yaml:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          
          - name: Install ajv-cli
            run: npm install -g ajv-cli
          
          - name: Validate YAML files
            run: |
              for file in $(find terraform/environments -name "*.yaml"); do
                echo "Validating $file..."
                # Convert YAML to JSON for validation
                yq eval -o=json "$file" > temp.json
                
                # Validate against schema
                if ! ajv validate -s schema.json -d temp.json; then
                  echo "❌ Validation failed for $file"
                  exit 1
                fi
                rm temp.json
              done
            
          - name: Comment on PR
            if: failure()
            uses: actions/github-script@v6
            with:
              script: |
                github.rest.issues.createComment({
                  issue_number: context.issue.number,
                  owner: context.repo.owner,
              repo: context.repo.repo,
                  body: '❌ YAML validation failed. Please check the workflow logs and ensure your configuration matches the schema.'
                })

    This workflow:

    1. Triggers on YAML file changes
    2. Converts YAML to JSON
    3. Validates against schema
    4. Fails PR if validation fails

    Benefits of Schema Validation

    1. Catch Errors Early: Validation happens before Terraform runs
    2. Self-Documenting: Schema serves as configuration documentation
    3. IDE Integration: Many editors can validate against JSON schema
    4. Standardization: Enforces consistent configuration patterns

    Unit Testing YAML Configurations with Git Hooks

    While schema validation in CI/CD provides a safety net, catching configuration errors earlier in the development cycle saves time and reduces failed pipelines. Git pre-commit hooks offer an elegant solution for “unit testing” YAML configurations before they even reach your repository.

    Implementing Pre-commit Validation

    Create a pre-commit hook to validate YAML files against your schema:

    #!/bin/bash
    # File: .git/hooks/pre-commit
    
    # Check if yaml files are being committed
    yaml_files=$(git diff --cached --name-only --diff-filter=ACMR | grep "^terraform/environments/.*\.yaml$")
    
    if [ -z "$yaml_files" ]; then
        exit 0
    fi
    
    # Ensure required tools are installed
    command -v yq >/dev/null 2>&1 || { echo "yq is required but not installed. Aborting." >&2; exit 1; }
    command -v ajv >/dev/null 2>&1 || { echo "ajv-cli is required but not installed. Aborting." >&2; exit 1; }
    
    echo "Validating YAML files against schema..."
    
    # Validate each yaml file
    for file in $yaml_files; do
        if [ -f "$file" ]; then
            echo "Checking $file..."
            # ajv-cli reads data via -d, so convert to a temp JSON file first
            tmp=$(mktemp)
            yq eval -o=json "$file" > "$tmp"
            if ! ajv validate -s schema.json -d "$tmp"; then
                rm -f "$tmp"
                echo "❌ $file failed schema validation"
                exit 1
            fi
            rm -f "$tmp"
        fi
    done

    Setting Up the Testing Environment

    1. Save the hook as .git/hooks/pre-commit
    2. Make it executable: chmod +x .git/hooks/pre-commit
    3. Install dependencies:
    npm install -g ajv-cli
    brew install yq  # macOS; on Linux, install the Go-based yq (which supports `yq eval`) from a release binary or your package manager

    Benefits of Pre-commit Testing

    1. Instant Feedback: Developers get immediate validation results
    2. Offline Validation: No need for CI pipeline to catch basic errors
    3. Reduced Pipeline Load: Catch errors before they trigger CI/CD runs
    4. Standardized Testing: Every developer uses the same validation rules

    Best Practices

    1. Include setup instructions in your repository’s README
    2. Version control your schema file
    3. Consider using pre-commit framework for more complex validations
    4. Keep schema and hooks in sync with Terraform validation rules
