Azure Policy Guardrails That Developers Don't Hate
We have seen dozens of Azure Policy implementations across enterprise clients. The majority share the same failure mode: the governance team enables 80 policies on a Monday, sets them all to Deny, and by Friday the helpdesk is drowning in tickets from developers who can’t deploy anything. Policy gets a reputation as the thing that blocks work. Teams find workarounds. The governance team loses credibility. Everyone loses.
Azure Policy is one of the strongest governance tools in any cloud platform. It is best at enforcing clear, deterministic rules at deployment and attaching baseline configuration automatically. It is much weaker as a synchronization mechanism for business metadata, exception lifecycle management, or ongoing reconciliation with systems outside Azure. But within its sweet spot, implementation matters more than the tool itself. The difference between policies that protect the platform and policies that paralyse it comes down to which policies you pick, how you roll them out, and whether developers have a path forward when something gets blocked.
Why Most Azure Policy Implementations Fail
The pattern is predictable. An organisation decides to “get serious about governance” and turns on everything at once. Too many policies enabled in a single wave. Deny mode from day one with no transition period. No exception path for legitimate edge cases. Assignments that nobody reviews or maintains after the initial deployment. The result: developers see policies as obstacles, not guardrails.
The root cause is treating policy as a one-time compliance exercise instead of a product. Policies need ownership, a rollout plan, and an ongoing maintenance cycle. We wrote about this broader principle in our piece on why developers need guardrails, not more tools. The same thinking applies here: governance should make the right thing easy, not make everything hard.
The Policies That Actually Matter
Not all policies carry equal weight. After years of Azure engagements, we’ve narrowed it down to four categories that deliver the highest governance value with the lowest developer friction.
Tag Enforcement
Tags are the foundation of cost management, ownership tracking, and operational accountability. Without consistent tags, your FinOps reporting is unreliable, incident responders can’t find the team that owns a resource, and nobody knows which cost centre is paying for that forgotten VM.
The practical approach: enforce tags on resource groups, not on every individual resource. Resources inherit the resource group’s tags in cost reporting, and enforcing at the resource group level avoids the constant friction of every single resource creation requiring tag input.
Here is a policy definition that denies resource group creation when a required tag is missing:
{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Resources/subscriptions/resourceGroups"
        },
        {
          "field": "[concat('tags[', parameters('tagName'), ']')]",
          "exists": "false"
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  },
  "parameters": {
    "tagName": {
      "type": "String",
      "metadata": {
        "displayName": "Required Tag Name",
        "description": "Name of the tag that must be present on resource groups"
      }
    }
  }
}
Assign this three times with different parameter values: CostCenter, Environment, and Owner. Three assignments, one definition. Clean and maintainable.
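As a sketch, one of those three assignments might look like this. The definition name `require-tag-on-rg` and the `contoso` management group are placeholders for illustration, not real identifiers:

```json
{
  "name": "require-costcenter-tag",
  "properties": {
    "displayName": "Require CostCenter tag on resource groups",
    "policyDefinitionId": "/providers/Microsoft.Management/managementGroups/contoso/providers/Microsoft.Authorization/policyDefinitions/require-tag-on-rg",
    "parameters": {
      "tagName": {
        "value": "CostCenter"
      }
    }
  }
}
```

The other two assignments are identical except for the `name` and the `tagName` value (`Environment`, `Owner`).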
One caveat: tag inheritance in Azure Cost Management is not automatic. You need to explicitly enable it in Cost Management settings, and not all billing account types support it (EA, MCA, and MPA with Azure Plan are supported). If tag inheritance is not enabled, tagging only at the resource group level will leave gaps in your usage records because individual resource rows in cost exports will show empty tag values. Verify that your billing account supports tag inheritance before relying on the RG-only tagging strategy for FinOps reporting.
A broader caveat: if your real problem is that cost centres, ownership, or application mappings drift over time, Policy is only part of the answer. Use Policy to require tags and apply safe inheritance patterns. Use Cost Management tag inheritance where supported for reporting. But use a separate reconciliation process against the authoritative source (HR, CMDB, finance system) for ongoing accuracy. Policy enforces tagging shape. It does not maintain tagging truth.
If your teams use Terraform or Bicep with standard templates, the tags are already in the code. Developers rarely even notice this policy exists. That is the goal: governance that works in the background.
SKU and Region Restrictions
Unrestricted SKU access leads to two problems. Developers accidentally provision expensive VM sizes in development environments. And resources end up in regions that violate data residency requirements or sit outside your network topology.
For VM sizes, use the built-in Allowed virtual machine size SKUs policy. Assign it at the landing zone management group level with a list of approved SKUs per environment. Development subscriptions get B-series and D2s. Production gets the full approved list. Simple, effective, rarely causes complaints because developers don’t usually care which specific SKU they get as long as it works.
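A per-environment parameter file for that built-in might look like the following sketch. The SKU list is illustrative, not a recommendation; pick sizes that match your own workload profiles:

```json
{
  "listOfAllowedSKUs": {
    "value": [
      "Standard_B2s",
      "Standard_B2ms",
      "Standard_D2s_v5"
    ]
  }
}
```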
For regions, the built-in Allowed locations policy handles this:
{
  "listOfAllowedLocations": {
    "value": [
      "westeurope",
      "northeurope"
    ]
  }
}
Assign at the organisation management group level. Every resource must live in West Europe or North Europe. If your organisation operates globally, adjust the list per management group. The key is that the restriction exists somewhere in the hierarchy.
These two policies together prevent the most common accidental cost overruns and compliance violations we see in the field.
Network Controls
Network policies carry the highest security value and require the most care in rollout. Three policies matter most for production environments.
Deny public IPs on network interfaces. Production workloads should not have public IP addresses directly attached to NICs. Traffic should flow through a load balancer, Application Gateway, or Azure Firewall. Use the built-in Network interfaces should not have public IPs policy in Deny mode on production subscriptions.
Require private endpoints for PaaS services. Storage accounts, Key Vaults, and SQL databases should not be accessible over the public internet in production. Azure provides built-in policies for each service type. For example, Azure Key Vault should disable public network access and Storage accounts should use private link. Start with Audit to see your current exposure, then move to Deny once existing resources are remediated.
Deny subnets without NSG association. Every subnet should have a Network Security Group attached. The built-in Subnets should be associated with a Network Security Group policy catches subnets that were created manually or through automation that skipped the NSG step. Assign in Audit mode first, because this one surfaces a surprising number of existing violations in most environments.
Network policies are where the exception path matters most. Some Azure services (like Azure Bastion or Azure Firewall subnets) have specific requirements that conflict with blanket rules. Plan your exemptions before switching to Deny.
Diagnostic Settings
Of all the policy categories, diagnostic settings are the “invisible guardrail” that developers appreciate most, because they never have to think about it.
A DeployIfNotExists policy automatically creates diagnostic settings on resources that don’t have them, routing logs and metrics to your central Log Analytics workspace. The developer creates a Key Vault. The policy silently adds diagnostic settings. Logs start flowing. Nobody filed a ticket. Nobody forgot to configure monitoring.
{
  "policyRule": {
    "if": {
      "field": "type",
      "equals": "Microsoft.KeyVault/vaults"
    },
    "then": {
      "effect": "DeployIfNotExists",
      "details": {
        "type": "Microsoft.Insights/diagnosticSettings",
        "roleDefinitionIds": [
          "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ],
        "existenceCondition": {
          "field": "Microsoft.Insights/diagnosticSettings/workspaceId",
          "equals": "[parameters('logAnalyticsWorkspaceId')]"
        },
        "deployment": {
          "properties": {
            "mode": "incremental",
            "parameters": {
              "vaultName": {
                "value": "[field('name')]"
              },
              "logAnalyticsWorkspaceId": {
                "value": "[parameters('logAnalyticsWorkspaceId')]"
              }
            },
            "template": {
              "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
              "contentVersion": "1.0.0.0",
              "parameters": {
                "vaultName": {
                  "type": "string"
                },
                "logAnalyticsWorkspaceId": {
                  "type": "string"
                }
              },
              "resources": [
                {
                  "type": "Microsoft.KeyVault/vaults/providers/diagnosticSettings",
                  "apiVersion": "2021-05-01-preview",
                  "name": "[concat(parameters('vaultName'), '/Microsoft.Insights/setByPolicy')]",
                  "properties": {
                    "workspaceId": "[parameters('logAnalyticsWorkspaceId')]",
                    "logs": [
                      {
                        "categoryGroup": "allLogs",
                        "enabled": true
                      }
                    ],
                    "metrics": [
                      {
                        "category": "AllMetrics",
                        "enabled": true
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "parameters": {
    "logAnalyticsWorkspaceId": {
      "type": "String",
      "metadata": {
        "displayName": "Log Analytics Workspace ID",
        "description": "Resource ID of the central Log Analytics workspace"
      }
    }
  }
}
Two operational details that trip people up with DeployIfNotExists. First, the policy assignment needs a managed identity with the right RBAC roles to create diagnostic settings on target resources. The roleDefinitionIds in the policy definition tell Azure which roles to assign, but the identity must actually have those permissions at the scope where the policy is assigned. If the identity lacks permissions, the remediation silently fails. Second, DeployIfNotExists only triggers on new resource creation or updates. Resources that already exist and are noncompliant will not be fixed by the assignment alone. You need to run a remediation task to bring existing resources into compliance. Plan for an initial remediation pass when you first assign these policies.
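A remediation task is itself an Azure resource (Microsoft.PolicyInsights/remediations). A minimal request body might look like this sketch, where the subscription ID and the assignment name deploy-kv-diagnostics are placeholders:

```json
{
  "properties": {
    "policyAssignmentId": "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/deploy-kv-diagnostics",
    "resourceDiscoveryMode": "ExistingNonCompliant"
  }
}
```

ExistingNonCompliant remediates resources already flagged in compliance data; ReEvaluateCompliance re-scans the scope first, which is slower but catches resources evaluated before the assignment existed.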
You need one definition per resource type (Key Vault, Storage, SQL, and so on), but the ALZ accelerator includes most of these out of the box. Check the Azure Landing Zone policies reference to see which diagnostics policies are already available as built-in definitions before writing custom ones.
The Exception Path
Before you switch any policy to Deny mode, you need an exception path. Without one, the first legitimate edge case that gets blocked will turn developers against the entire policy framework.
Azure Policy supports exemptions with two categories: Waiver (the policy applies but we’re accepting the risk) and Mitigated (the risk is addressed through an alternative control). Use the right category. It matters for audit trails.
Every exemption should be time-limited. Set an expiration date, typically 90 days, with a mandatory review before renewal. Document the business justification directly in the exemption resource’s description field. When auditors ask why a production storage account has public access, the answer should be in the exemption metadata, not in someone’s memory.
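Put together, a minimal exemption resource might look like this sketch. The names, subscription ID, and date are placeholders; the important parts are the category, the expiry, and the justification living in the resource itself:

```json
{
  "name": "bastion-subnet-nsg-waiver",
  "properties": {
    "displayName": "AzureBastionSubnet exemption from NSG policy",
    "policyAssignmentId": "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/policyAssignments/deny-subnet-without-nsg",
    "exemptionCategory": "Waiver",
    "expiresOn": "2026-09-30T00:00:00Z",
    "description": "Bastion subnet has service-specific requirements that conflict with the blanket NSG rule. Risk accepted by platform team; review before expiry."
  }
}
```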
The exemption workflow should be lightweight: a pull request to the policy repository, reviewed by the security or platform team, merged and applied through the same pipeline that manages policy assignments. If requesting an exception requires a three-week change advisory board process, developers will find workarounds instead of requesting exceptions. That is worse for governance, not better.
Rollout Strategy
Rolling out Azure Policy is a phased process. Rushing it is the single most common mistake we see.
Week 1-2: Audit everything. Enable your target policies in Audit mode across the hierarchy. Audit mode logs non-compliance without blocking anything. Let it run for at least two weeks to capture a realistic compliance baseline.
Week 3: Review compliance reports. Look at the Azure Policy compliance dashboard. Sort by non-compliant resources. You will find patterns: missing tags on old resource groups, storage accounts in unapproved regions, subnets without NSGs. Categorise the findings into quick fixes (tag remediation, region moves for non-production resources) and items that need planning (network architecture changes, PaaS service reconfiguration).
Week 4-6: Fix the easy ones. Run remediation tasks for tag policies. Clean up test resources in wrong regions. Attach NSGs to unprotected subnets. Get compliance scores up before switching to enforcement.
From week 7 onward, move policies to Deny one at a time. Start with tag enforcement, which is the least controversial. Then region restrictions. Then SKU limits. Network policies go last because they have the most edge cases. Each switch should be announced to affected teams with at least a week’s notice and a documented exception path.
Maintain a policy backlog like any product backlog. New compliance requirements, new Azure services that need diagnostic settings, policies that need updating as built-in definitions improve. The platform team reviews this backlog on the same cadence as their other work, as we discussed in the landing zones maintenance piece.
Your Starter Policy Set
If you’re starting from scratch, enable these seven policies first. All in Audit mode. Move to Deny or DeployIfNotExists after your baseline review.
- Require CostCenter tag on resource groups (Deny after baseline)
- Require Owner tag on resource groups (Deny after baseline)
- Require Environment tag on resource groups (Deny after baseline)
- Allowed locations: West Europe, North Europe (Deny after baseline)
- Allowed VM SKUs per subscription tier (Deny after baseline)
- Subnets should have NSG association (Audit, then Deny on new subscriptions)
- Deploy diagnostic settings for Key Vault to Log Analytics (DeployIfNotExists)
Seven policies. Three tag rules, two restriction rules, one network hygiene rule, one invisible automation. That set covers cost governance, data residency, spend control, basic network security, and observability. Add more only after these are stable and your exception path is working.
Start with Audit. Fix what’s broken. Switch to Deny when compliance is above 90%. Keep the exception path lightweight. Review quarterly. That is the entire playbook.
Related: Your Developers Don’t Need More Tools. They Need a Paved Path. explains the broader philosophy behind guardrails that work with teams instead of against them. Azure Landing Zones in 2026 and What Actually Matters Now covers policy hygiene as part of ongoing landing zone operations.
Looking for Azure architecture guidance?
We design and build Azure foundations that scale - landing zones, networking, identity, and governance tailored to your organisation.