Skip to main content
GenioCT

YAML-Driven Terraform: Building a Self-Service Infrastructure Catalog

By GenioCT | | 10 min read
Terraform DevOps Azure IaC

In this article

A YAML-driven product catalog sits between application teams and the Terraform codebase, providing a self-service interface with guardrails.

Every platform team hits the same wall. Application teams want cloud resources. They don’t want to learn Terraform. They want to fill in a form - or better, commit a YAML file - and get a working environment. Meanwhile, the platform team wants consistency, governance, and the ability to sleep at night.

The usual answer is Terraform modules. You write reusable modules, document them, and ask teams to use them. This works for a while. Then you end up with 30 slightly different main.tf files across 30 repositories, each one a creative interpretation of your module documentation.

We have built a different approach for several enterprise clients: a YAML-driven infrastructure catalog. Infrastructure is defined entirely in YAML. Terraform reads the YAML, transforms it into resource maps, and applies the infrastructure. The platform team controls the stacks and the pipeline. Everyone stays in their lane.

The Problem with “Just Use Modules”

Terraform modules are the right building block, but they are the wrong interface for application teams. This is why:

  • HCL is a barrier. Application developers think in YAML, JSON, or environment variables - not in resource blocks and data sources
  • Copy-paste drift. When teams copy a Terraform root module and modify it, you get drift. Different backend configurations, inconsistent naming, forgotten tags
  • Review bottleneck. If the platform team reviews every Terraform PR, you become the bottleneck. If you do not, you get surprises in production
  • Upgrade pain. Updating a module version across 30 consumers means 30 PRs, 30 plan reviews, and 30 deployment windows

The YAML-driven approach solves all of these by separating what teams want from how it gets provisioned.

Related reading: Terraform module best practices · Azure CAF Terraform modules

Architecture: How It Works

The pattern has two repositories and a CLI tool that ties them together.

The Config Repository

This is where infrastructure is defined. A central service registry maps resource types to their config paths, and each resource type has its own directory with YAML files per environment:

# services.yml - the service registry
network_base:
  config_path: resources/network_base
keyvault:
  config_path: resources/keyvault
storage_account:
  config_path: resources/storage_account
virtual_machine:
  config_path: resources/virtual_machine
config/
├── services.yml                    # Service registry
└── resources/
    ├── network_base/
    │   ├── dev/
    │   │   └── platform.yml        # Dev network config
    │   └── prd/
    │       └── platform.yml        # Prod network config
    ├── keyvault/
    │   ├── dev/
    │   │   └── platform.yml
    │   └── prd/
    │       └── platform.yml
    └── virtual_machine/
        ├── dev/
        │   └── workstations.yml
        └── prd/
            └── app-servers.yml

The YAML files aren’t abstract orders with a product reference. They are direct resource definitions - rich, detailed configurations that describe exactly what infrastructure looks like:

# resources/network_base/dev/platform.yml
vnets:
  - name: vnet-platform-dev-we-001
    resource_group: rg-network-dev-we-001
    location: westeurope
    address_spaces:
      - 10.10.0.0/16
    subnets:
      - name: snet-app-dev-we-001
        address_prefix: 10.10.1.0/24
        nsg: nsg-snet-app-dev-we-001
        route_table: rt-app-dev-we-001
        service_endpoints:
          - Microsoft.KeyVault
          - Microsoft.Storage
      - name: snet-data-dev-we-001
        address_prefix: 10.10.2.0/24
        nsg: nsg-snet-data-dev-we-001
        delegation:
          name: databricks
          service: Microsoft.Databricks/workspaces

nsgs:
  - name: nsg-snet-app-dev-we-001
    resource_group: rg-network-dev-we-001
    location: westeurope
    rules:
      - name: AllowHTTPS
        priority: 100
        direction: Inbound
        access: Allow
        protocol: Tcp
        source_prefix: VirtualNetwork
        destination_ports: ["443"]
      - name: DenyAllInbound
        priority: 4096
        direction: Inbound
        access: Deny
        protocol: "*"
        source_prefix: "*"
        destination_ports: ["*"]

route_tables:
  - name: rt-app-dev-we-001
    resource_group: rg-network-dev-we-001
    location: westeurope
    enable_bgp_route_propagation: false
    routes:
      - name: to-firewall
        address_prefix: "0.0.0.0/0"
        next_hop_type: VirtualAppliance
        next_hop_in_ip_address: "10.0.1.4"

This is what application teams submit in a pull request. It is easy to read, easy to review, and the schema is well-understood. No HCL knowledge needed.

The IaC Repository

The IaC repo contains one Terraform stack per resource type. Each stack follows the same structure:

iac/
└── tf/
    ├── network_base/
    │   ├── versions.tf
    │   ├── providers.tf
    │   ├── variables.tf
    │   ├── locals.tf      # YAML loading + transformation
    │   └── main.tf        # Resource definitions
    ├── keyvault/
    │   ├── ...
    ├── storage_account/
    │   ├── ...
    └── virtual_machine/
        ├── ...

The critical piece is locals.tf. This is where the YAML gets loaded, decoded, and transformed into resource maps that Terraform’s for_each can consume:

# tf/network_base/locals.tf

# Step 1: Load the service registry and config file
locals {
  services    = yamldecode(file("../../config/services.yml"))
  config_path = "../../config/${local.services[var.stack].config_path}/${var.env}/${var.file}.yml"
  config      = yamldecode(file(local.config_path))
}

# Step 2: Transform YAML lists into keyed maps for for_each
locals {
  vnets = { for v in try(local.config.vnets, []) : v.name => v }
  nsgs  = { for n in try(local.config.nsgs, [])  : n.name => n }
  rts   = { for r in try(local.config.route_tables, []) : r.name => r }
}

# Step 3: Flatten nested structures (subnets live inside vnets)
locals {
  subnets = {
    for s in flatten([
      for v in values(local.vnets) : [
        for sn in try(v.subnets, []) : merge(sn, {
          vnet_name      = v.name
          resource_group = v.resource_group
        })
      ]
    ]) : "${s.vnet_name}/${s.name}" => s
  }
}

# Step 4: Collect all referenced resource groups for data lookups
locals {
  rg_names = distinct(concat(
    [for v in values(local.vnets) : v.resource_group],
    [for n in values(local.nsgs)  : n.resource_group],
    [for r in values(local.rts)   : r.resource_group]
  ))
}

# Step 5: Platform-enforced tags
locals {
  tags = {
    catalog_env   = var.env
    catalog_file  = var.file
    catalog_stack = var.stack
    managedBy = "Terraform"
  }
}

Then main.tf uses for_each on these transformed maps:

# tf/network_base/main.tf

data "azurerm_resource_group" "rg" {
  for_each = toset(local.rg_names)
  name     = each.value
}

resource "azurerm_virtual_network" "this" {
  for_each = local.vnets

  name                = each.value.name
  location            = each.value.location
  resource_group_name = each.value.resource_group
  address_space       = each.value.address_spaces
  dns_servers         = try(each.value.dns_servers, [])

  tags = merge(
    data.azurerm_resource_group.rg[each.value.resource_group].tags,
    try(each.value.tags, {}),
    local.tags
  )
}

resource "azurerm_subnet" "this" {
  for_each = local.subnets

  name                 = each.value.name
  resource_group_name  = each.value.resource_group
  virtual_network_name = azurerm_virtual_network.this[each.value.vnet_name].name
  address_prefixes     = [each.value.address_prefix]

  service_endpoints = try(each.value.service_endpoints, [])

  dynamic "delegation" {
    for_each = try(each.value.delegation, null) != null ? [each.value.delegation] : []
    content {
      name = delegation.value.name
      service_delegation {
        name = delegation.value.service
      }
    }
  }
}

Notice: no child modules, no abstraction layers. The stack directly creates resources using for_each on the YAML-derived maps. The try() function handles optional fields with sensible defaults. This is deliberately simple - and that simplicity is the point.

Azure docs: Cloud Adoption Framework - Platform automation · Terraform on Azure best practices

The CI/CD Pipeline

The pipeline orchestrates stacks with explicit dependency ordering. Infrastructure resources have natural dependencies - you need resource groups before networking, networking before VMs - and the pipeline enforces this:

# azure-pipelines.yml (simplified)
parameters:
  - name: run_mode
    type: string
    default: plan
    values: [plan, apply, destroy]
  - name: env
    type: string
  - name: file
    type: string

stages:
  - template: templates/run_stack.yml
    parameters:
      stack: resource_group
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: []

  - template: templates/run_stack.yml
    parameters:
      stack: network_base
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [resource_group]

  - template: templates/run_stack.yml
    parameters:
      stack: keyvault
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [network_base]

  - template: templates/run_stack.yml
    parameters:
      stack: virtual_machine
      env: ${{ parameters.env }}
      file: ${{ parameters.file }}
      run_mode: ${{ parameters.run_mode }}
      dependencies: [keyvault]

Each stack stage runs terraform init with the right backend key, then plan or apply depending on the run mode. The state key is derived from the stack, environment, and file name - giving you isolated state per deployment unit.

A CLI wrapper tool handles the plumbing: loading backend configuration, constructing the config path from the service registry, and calling Terraform with the right variables:

# Example: plan networking for the dev platform
tfcat plan --stack network_base --env dev --file platform

# Apply everything for production
tfcat apply --stack keyvault --env prd --file platform

Azure docs: Azure DevOps Pipelines with Terraform · Environment approvals and gates

Why This Works Better Than Terraform Modules Alone

You might be thinking: “just write better modules.” Fair point. The key differences with the YAML-driven approach:

  • Separation of repos. The config repo is the interface. The IaC repo is the engine. Different teams, different review cycles, different release cadences
  • No HCL for consumers. Teams that define infrastructure never see a .tf file. They write YAML that describes what they want, not how to build it
  • Flat resource maps, not nested modules. Each stack uses for_each directly on resources. No module inception, no abstraction layers to debug through
  • Governance by schema. The YAML structure is the policy. You can’t order a VM size that isn’t in the config. You can’t skip tags. The shape of the YAML enforces the rules
  • Audit trail. Every infrastructure change is a YAML diff in git. Who changed what, when, and why - all in commit history

Patterns We Have Learned

After building this for multiple clients across 30+ resource types, here are the patterns that matter:

One stack per resource type

Don’t bundle unrelated resources. A network_base stack handles VNets, subnets, NSGs, and route tables - things that are naturally related. But VMs and networking go in separate stacks with separate state files and separate pipeline stages.

Use try() liberally for optional fields

The YAML should only contain what needs to vary. Everything else gets a default via Terraform’s try() function:

dns_servers         = try(each.value.dns_servers, [])
enable_bgp          = try(each.value.enable_bgp_route_propagation, true)
private_ip_address  = try(each.value.private_ip_address, null)

This keeps the YAML configs clean. A simple subnet doesn’t need 20 fields - just the ones that differ from the default.

Flatten nested structures into keyed maps

YAML is hierarchical. Terraform’s for_each wants flat maps. The locals.tf transformation layer bridges this gap. Subnets live inside VNets in YAML but need to be a flat map for azurerm_subnet:

subnets = {
  for s in flatten([
    for v in values(local.vnets) : [
      for sn in try(v.subnets, []) : merge(sn, {
        vnet_name      = v.name
        resource_group = v.resource_group
      })
    ]
  ]) : "${s.vnet_name}/${s.name}" => s
}

This pattern works for any parent-child relationship: NSG rules inside NSGs, routes inside route tables, containers inside storage accounts.

The product catalog flow: teams submit YAML orders, the engine validates and provisions, the platform team controls the catalog.

Merge tags from multiple sources

Tags should come from three layers: the resource group (inherited), the YAML config (custom per resource), and the platform (automatic metadata). Merge them in that order:

tags = merge(
  data.azurerm_resource_group.rg[each.value.resource_group].tags,
  try(each.value.tags, {}),
  local.tags  # catalog_env, catalog_stack, managedBy, etc.
)

This gives you cost allocation, compliance tagging, and management metadata without trusting anyone to remember.

Separate state per stack, per environment

Each combination of stack + environment + file gets its own state file: tfstate/dev-platform-network_base.tfstate. This isolates blast radius. A failed VM deployment doesn’t lock or corrupt networking state.

Getting Started

You don’t need to build the full catalog on day one. Start with:

  1. Pick one resource type. Networking is usually the best first candidate - every environment needs it, and it has natural sub-resources (VNets, subnets, NSGs)
  2. Define the YAML schema. Write a sample YAML that describes your current environment. If the YAML is readable by a non-Terraform person, the schema is right
  3. Write the stack. One locals.tf that transforms YAML to maps. One main.tf that creates resources with for_each. Keep it flat
  4. Add a pipeline. Start with plan-only. Add apply after you trust the pattern
  5. Expand. Add resource types as demand grows. Each new stack follows the same pattern: YAML → locals → resources

The teams that succeed with this approach start small, prove the pattern works, and expand organically. The teams that fail try to build a platform for every possible use case before anyone has submitted a single YAML file.

Azure docs: Azure landing zone Terraform modules · Terraform state management

What This Means in Practice

A YAML-driven infrastructure catalog isn’t a framework or a product you install. It is a pattern - a way of organizing your Terraform codebase that separates configuration from implementation.

The platform team owns the stacks: the Terraform code, the pipeline, the CLI tooling. Application teams own the config: YAML files that describe what they need, in which environment, with what settings. The YAML boundary between them is the contract.

This pattern has saved our clients weeks of back-and-forth per quarter. More importantly, it turns infrastructure provisioning from a ticket-based process into a self-service workflow - without sacrificing governance or traceability.

Your platform team has better things to do than reviewing copy-pasted Terraform modules. Give them a catalog instead.

Related: Bicep vs Terraform: Why We Default to Terraform (and When Bicep Wins) explains our thinking on when to use Terraform and when Bicep is the better choice.

Ready to automate your infrastructure?

From Infrastructure as Code to CI/CD pipelines, we help teams ship faster with confidence and less manual overhead.

Typical engagement: 2-6 weeks depending on scope.
Discuss your automation goals
Share this article

Start with a Platform Health Check

Not sure where to begin? A quick architecture review gives you a clear picture. No obligation.