KEDA Graduates CNCF and What It Means for Event-Driven Kubernetes Autoscaling

By GenioCT | Published on 22 August 2023 | 10 min read


In this article

What KEDA Is
The Architectural Premise
Where KEDA Earns Its Place
Where KEDA Does Not Pay Off
KEDA on Azure
Practical Scaler Choices
The Honest Trade-off
Where This Leaves Architects

A row of pod-shaped icons growing from zero to a peak and back to zero, with message-queue icons above feeding the scaling decision: event-driven autoscaling that reacts to queue depth, schedules, and external metrics rather than just CPU and memory.

In short: KEDA reached CNCF Graduated status in August 2023, a little over four years after the project was first announced and after passing through the Sandbox and Incubation tiers. Event-driven Kubernetes autoscaling is now a low-risk, production-mature choice for queue-consumer workloads, scheduled batch, and any service that should scale on something other than CPU and memory. The architectural decision is no longer whether to use KEDA, but which scalers earn their place, where to deploy it, and where the default Horizontal Pod Autoscaler is still enough.

The Cloud Native Computing Foundation moved KEDA (Kubernetes Event-Driven Autoscaling) to Graduated status in August 2023, the final tier in the CNCF maturity model. Graduation requires production adoption at scale, a multi-vendor maintainer base, a completed independent security audit, and a stable governance model. The signal for architects is that KEDA can now be treated the way platform teams already treat Prometheus, Envoy, or Helm: a graduated CNCF project with a credible long-term trajectory and the procurement-defensibility that goes with it.

KEDA was announced in May 2019 as a joint Microsoft and Red Hat project and shipped v1.0 in November 2019. It entered the CNCF Sandbox in March 2020, and Incubation followed in August 2021. KEDA v2.0 in November 2020 reworked the resource model: ScaledJob became its own resource, a ScaledObject could carry multiple triggers, and scaling targets widened from Deployments to StatefulSets and any custom resource with a scale subresource. That v2 redesign is what made KEDA viable for production clusters at scale. The August 2023 graduation confirms that trajectory.

What KEDA Is

KEDA extends the Kubernetes Horizontal Pod Autoscaler. Out of the box, the default HPA scales only on CPU and memory metrics. KEDA brings in external signals (queue depth, message age, stream lag, scheduled time, custom Prometheus expressions, and sixty-plus other sources) and makes them usable as scaling triggers without writing a controller per source.

The architecture has three parts: the operator, the scalers, and the metrics adapter. The KEDA operator watches the cluster for ScaledObject and ScaledJob custom resources. Each ScaledObject points at a workload (a Deployment, StatefulSet, or custom resource that implements the scale subresource) and declares one or more triggers, each backed by a scaler that knows how to query one kind of event source. The KEDA metrics adapter exposes those external signals through the standard Kubernetes external-metrics API, and an HPA that KEDA creates for the workload consumes them to scale between one and N replicas. KEDA itself manages the 1-to-0 and 0-to-1 transitions through an activation-threshold mechanism, since the HPA does not scale to or from zero replicas by default.
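As a rough illustration of what a ScaledObject looks like, the sketch below builds one as a plain Python dict and prints it as JSON, which kubectl accepts alongside YAML. The field names follow the KEDA v2 ScaledObject schema and the azure-servicebus trigger as we understand them; the resource names (orders-consumer, orders, sb-example, servicebus-auth) and the target of 100 messages per replica are hypothetical placeholders, not values from any real deployment.

```python
import json


def scaled_object(name, target, queue, namespace, auth_ref,
                  messages_per_replica=100, max_replicas=20):
    """Build a KEDA ScaledObject manifest for a Service Bus queue consumer."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # The workload to scale (a Deployment unless stated otherwise).
            "scaleTargetRef": {"name": target},
            "minReplicaCount": 0,  # allow scale-to-zero
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "azure-servicebus",
                "metadata": {
                    "queueName": queue,
                    "namespace": namespace,
                    # Trigger metadata values are strings in the manifest.
                    "messageCount": str(messages_per_replica),
                },
                # Credentials live in a separate TriggerAuthentication.
                "authenticationRef": {"name": auth_ref},
            }],
        },
    }


# All names below are hypothetical.
manifest = scaled_object("orders-consumer-scaler", "orders-consumer",
                         "orders", "sb-example", "servicebus-auth")
print(json.dumps(manifest, indent=2))
```

Generating the manifest from a function is only one way to keep many near-identical ScaledObjects consistent; a Helm template or Kustomize overlay does the same job.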

Credentials for the trigger source (storage account keys, Service Bus connection strings, AWS access keys) are decoupled through a TriggerAuthentication resource that can pull from Kubernetes secrets, Azure Key Vault, AWS Secrets Manager, HashiCorp Vault, or workload identity. The split keeps scaling configuration in one place and identity configuration in another, which matches how most platform teams already organise RBAC.
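To make the split concrete, here is a minimal sketch of a TriggerAuthentication that maps one key of a Kubernetes Secret onto the connection parameter of a trigger. The secretTargetRef shape follows the KEDA v2 schema as we understand it; the secret name, key, and resource name are hypothetical.

```python
import json


def trigger_authentication(name, secret_name, secret_key, parameter="connection"):
    """Build a TriggerAuthentication that reads one value from a Kubernetes Secret."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "TriggerAuthentication",
        "metadata": {"name": name},
        "spec": {
            "secretTargetRef": [{
                # Which trigger parameter the secret value is bound to.
                "parameter": parameter,
                # Which Secret, and which key inside it, holds the value.
                "name": secret_name,
                "key": secret_key,
            }],
        },
    }


# Hypothetical names: the Secret must exist in the same namespace.
auth = trigger_authentication("servicebus-auth", "servicebus-secret", "connection-string")
print(json.dumps(auth, indent=2))
```

A ScaledObject then refers to this resource by name only, so rotating the secret or moving to workload identity does not touch the scaling configuration.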

The Architectural Premise

CPU is a poor proxy for backlog. A queue consumer waiting for messages sits at near-zero CPU even when fifty thousand messages are queued. A traditional HPA reading CPU would not scale up because the consumer is idle. KEDA lets the same workload scale on the actual signal: the message count in the queue, the age of the oldest message, the consumer-group lag on a Kafka topic.
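The arithmetic behind that claim can be sketched with the replica formula the HPA documents, desired = ceil(currentReplicas × currentValue / targetValue), next to its average-value form for an external metric, desired = ceil(total / targetPerReplica). This is a simplified model: it ignores the HPA tolerance band and stabilisation windows, and the numbers (3% CPU, a 60% target, 100 messages per replica) are illustrative assumptions.

```python
import math


def cpu_desired_replicas(current_replicas, current_util_pct, target_util_pct,
                         min_replicas=1, max_replicas=100):
    """HPA-style utilisation scaling: ceil(current * value / target), then clamp."""
    wanted = math.ceil(current_replicas * current_util_pct / target_util_pct)
    return max(min_replicas, min(max_replicas, wanted))


def queue_desired_replicas(queue_depth, target_per_replica,
                           min_replicas=1, max_replicas=100):
    """Average-value scaling on an external metric: ceil(total / target), then clamp."""
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Two idle consumers at 3% CPU with a 60% target: CPU sees nothing to do.
cpu_view = cpu_desired_replicas(2, 3, 60, min_replicas=2)

# The same moment seen through the queue: 50,000 messages, 100 per replica.
queue_view = queue_desired_replicas(50_000, 100, min_replicas=2, max_replicas=100)

print(cpu_view, queue_view)
```

On CPU the workload stays at its minimum of two replicas; on queue depth it asks for 500 and is clamped to the configured maximum of 100.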

The same principle applies to scheduled batch work. A workload that should only exist for a daily processing window does not need a 24/7 deployment; it needs a cron trigger that scales the workload up at 02:00, keeps it running for an hour, and scales it back to zero. That used to mean a Kubernetes CronJob plus a separate scaling story; KEDA collapses the two into one ScaledObject.

The activation threshold tells KEDA “do not bother scaling up from zero unless more than N events are present”, which prevents thrashing when a few stray events arrive after a quiet period. With the default activation value of zero, a scale-to-zero workload wakes up for every single message, which can defeat the cost benefit of running it at zero in the first place.
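A small model of the two phases may help here. It assumes, as the KEDA documentation describes it, that a workload activates only when the metric is strictly greater than the activation value, and that once active the replica count follows the usual ceil(metric / target) rule. The function is a sketch for reasoning about thresholds, not KEDA's implementation, and it leaves out cooldown periods.

```python
import math


def replicas_for(metric, target, activation=0, max_replicas=10):
    """Model a scale-to-zero workload: 0 until activated, then ceil(metric / target)."""
    if metric <= activation:
        # Activation phase: not enough events to justify waking the workload.
        return 0
    # Scaling phase: at least one replica, clamped to the maximum.
    return max(1, min(max_replicas, math.ceil(metric / target)))


for depth in (0, 3, 5, 6, 250):
    print(depth, replicas_for(depth, target=100, activation=5))
```

With an activation value of 5, a backlog of three or five messages leaves the workload at zero, six messages wakes a single replica, and 250 messages asks for three.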

Where KEDA Earns Its Place

KEDA pays off when the workload has a clear external signal that drives demand and the team is comfortable operating one more cluster component.

Queue-consumer workloads are the canonical fit. Service Bus, Storage Queue, Event Hubs, Kafka, RabbitMQ, SQS, and Google Pub/Sub all have first-party scalers that work out of the box. A consumer service that processes messages in batches benefits from scaling on queue depth or message age, and from scaling to zero when the queue drains.

Scheduled batch jobs benefit from the cron scaler. Workloads that should only exist during a known window cost less when KEDA tears the deployment down outside that window.

Event-driven microservices that idle most of the time and burst on demand fit the same shape. So do disaster-recovery workloads that should be at zero replicas in the standby region unless promoted. Both patterns rely on the activation threshold to avoid waking up the workload unnecessarily.

Cost-conscious AKS clusters where idle replicas would dominate the bill benefit indirectly. KEDA does not reduce the cost of an active workload, but it makes it practical to run more workloads in the same cluster because their idle baseline drops to near zero.

Where KEDA Does Not Pay Off

Steady-state workloads with predictable load that tracks CPU or memory closely do not need KEDA. The default HPA is enough, simpler to operate, and one fewer component to upgrade.

Latency-critical synchronous APIs where pre-warmed capacity matters more than precise scaling do not benefit. Scale-to-zero implies a cold start on the first request after the scale-up, and for some workloads that cost is unacceptable. The KEDA decision here is the same shape as any serverless decision: scale-to-zero only pays off when the workload tolerates the cold start.

Workloads where the external signal is noisy or unreliable cause more trouble than they solve. A Prometheus metric that lags by minutes will produce scaling decisions based on stale data. A queue that occasionally reports phantom messages will cause thrashing. Validate the signal before relying on it.

Teams without the appetite to debug one more thing will find KEDA harder than it looks on the demo. Trigger auth misconfigurations, metric server replicas that fail to come back after a node drain, RBAC issues across namespaces, and the timing of HPA stabilisation windows all surface in production sooner or later. Container Apps absorbs most of that operational weight. On AKS, the team owns it.

KEDA on Azure

Azure offers three integration shapes and they differ significantly in operational weight.

The AKS managed KEDA add-on installs and updates KEDA as a first-party component of the cluster. Microsoft owns the lifecycle: install, upgrades, security patches, integration with Azure Monitor. At time of writing the add-on is in public preview with general availability on the near-term roadmap. For platform teams already running AKS, the managed add-on is the lowest-friction way to get KEDA into the cluster.

Container Apps uses KEDA as the autoscaling engine under the hood. The ScaledObject CRD is abstracted away into “scale rules” defined in the Container Apps configuration model. You write a Service Bus or Storage Queue rule against a connection string or managed identity, and the platform translates that into the underlying KEDA configuration. The result is KEDA without operating KEDA. The trade-off is that custom scaler authorship and advanced KEDA features (some metadata fields, custom auth modes, certain CRD-only options) are not exposed.

Self-hosted KEDA via the official Helm chart is the third option. Platform teams that need full control over the configuration, that develop custom scalers, or that run multi-cluster patterns where the managed add-on does not fit, install KEDA themselves and own the lifecycle. This is the default on non-AKS Kubernetes distributions and remains a valid choice on AKS for teams already deep in cluster operations.

Practical Scaler Choices

The KEDA scaler catalogue passed sixty entries by graduation. A handful carry most of the production weight on Azure, and each has its own operational sharp edges.

Service Bus (queue and topic/subscription) is the most-deployed Azure scaler. It is well-tested, has clear semantics, and matches the patterns most enterprise message workloads already use. The metric is queue length by default and can be switched to active message count.

Storage Queue is the simplest scaler on Azure. The metric is queue length, the credential model is straightforward, and the scaler has been stable across KEDA versions.

Event Hubs scales on either unprocessed event count or consumer-group lag. The configuration that catches teams out is the checkpoint store: KEDA needs to read the same Storage account where the consumer is committing checkpoints, and the consumer-group lag calculation is only as accurate as the latest checkpoint. Verify the checkpoint cadence before tuning the scaler.

Kafka scales on consumer-group lag per partition. The scaler works but partition-lag scaling needs tuning. A workload with twelve partitions and a slow consumer can sit with five hundred messages of lag for hours before the scaler reacts, because activation thresholds and stabilisation windows are conservative by default. Tighten them only after measuring.
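To see why the thresholds matter, the sketch below models lag-based scaling in a simplified way: total lag divided by the per-replica lag threshold, capped at the partition count unless idle consumers are allowed, which is how we understand the scaler's lagThreshold and allowIdleConsumers settings to interact. The partition lags and threshold values are made-up numbers, and the model assumes the workload is already active.

```python
import math


def kafka_desired_replicas(partition_lags, lag_threshold, max_replicas,
                           allow_idle_consumers=False):
    """Simplified lag scaling: ceil(total lag / threshold), capped by partitions."""
    wanted = math.ceil(sum(partition_lags) / lag_threshold)
    if not allow_idle_consumers:
        # More consumers than partitions would sit idle in the consumer group.
        wanted = min(wanted, len(partition_lags))
    return max(1, min(wanted, max_replicas))


# Twelve partitions carrying 500 messages of lag in total (illustrative).
lags = [40] * 11 + [60]

for threshold in (1000, 50, 10):
    print(threshold, kafka_desired_replicas(lags, threshold, max_replicas=30))
```

With a threshold of 1000 the same 500 messages of lag never justify a second replica; at 50 the model asks for ten, and at 10 it hits the twelve-partition ceiling.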

Prometheus is powerful and introduces a hard dependency on the Prometheus stack being healthy. If Prometheus is down or stale, the scaler reports stale metrics. For environments where Prometheus is part of the platform contract this is fine. For environments where Prometheus is best-effort, prefer a source closer to the work.
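One way to soften that dependency is the ScaledObject fallback block, which tells KEDA how many replicas to hold when a scaler fails repeatedly. The sketch below assumes the prometheus trigger fields (serverAddress, query, threshold) and the fallback fields (failureThreshold, replicas) as documented for KEDA v2; the server address, query, and workload names are hypothetical.

```python
import json


def prometheus_scaled_object(name, target, server, query, threshold,
                             fallback_replicas, max_replicas=10):
    """Build a ScaledObject with a Prometheus trigger and a fallback replica count."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": target},
            "minReplicaCount": 1,
            "maxReplicaCount": max_replicas,
            # After 3 consecutive scaler failures, hold this many replicas.
            "fallback": {"failureThreshold": 3, "replicas": fallback_replicas},
            "triggers": [{
                "type": "prometheus",
                "metadata": {
                    "serverAddress": server,
                    "query": query,
                    "threshold": str(threshold),
                },
            }],
        },
    }


# Hypothetical address and query.
so = prometheus_scaled_object(
    "api-scaler", "api",
    "http://prometheus.monitoring.svc:9090",
    'sum(rate(http_requests_total{app="api"}[2m]))',
    threshold=100, fallback_replicas=4,
)
print(json.dumps(so, indent=2))
```

Fallback covers a scaler that errors out. It does not help when Prometheus answers successfully with stale data, so the advice to validate the signal still applies.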

Cron is the simplest trigger to configure and one of the most underused. A workload that should only exist between 02:00 and 04:00 each day needs a single short trigger definition: a timezone, a start and end schedule, and a replica count.
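For reference, a cron trigger for that 02:00 to 04:00 window could be sketched as below. The metadata keys (timezone, start, end, desiredReplicas) follow the KEDA cron scaler as we understand it; the timezone and the replica count of three are assumptions for the example.

```python
import json


def cron_trigger(timezone, start, end, desired_replicas):
    """Build a KEDA cron trigger entry for a ScaledObject's triggers list."""
    return {
        "type": "cron",
        "metadata": {
            "timezone": timezone,   # IANA timezone name
            "start": start,         # cron expression: when to scale up
            "end": end,             # cron expression: when to scale back down
            "desiredReplicas": str(desired_replicas),
        },
    }


# Scale to three replicas at 02:00 and release them at 04:00, every day.
trigger = cron_trigger("Europe/Brussels", "0 2 * * *", "0 4 * * *", 3)
print(json.dumps(trigger, indent=2))
```

Outside the window the trigger stops contributing replicas, so a ScaledObject with minReplicaCount set to zero and no other active trigger returns to zero.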

CPU and memory scalers exist too. KEDA can manage these in the same ScaledObject framework as everything else, which is useful when the team wants one scaling concept across all workloads instead of mixing default HPA configuration with KEDA configuration.

The Honest Trade-off

KEDA solves a real Kubernetes gap and has matured into a low-risk component to add to a cluster. The operational weight is small but not zero: one extra controller, one extra CRD set, one extra IAM/RBAC story to keep healthy through cluster upgrades.

The scale-to-zero benefit is real but the cold-start cost has to be measured for the specific workload. Some workloads can tolerate a five-second cold start; others cannot tolerate a five-millisecond one. The KEDA decision is the same shape as any serverless decision and lives or dies on whether the workload behaviour matches the platform behaviour.

The CNCF graduation does not change the architectural decision. It changes the risk profile. Adopting a graduated project at a board-level architecture review is a defensible position with less paperwork than adopting an incubating project. The August 2023 milestone moves KEDA into the same long-term-bet category as Prometheus, Envoy, and Kubernetes itself.

Where This Leaves Architects

Default to KEDA on AKS for any workload that should scale on something other than CPU and memory, and prefer the managed add-on once it goes GA. Trust Container Apps to handle KEDA on your behalf for the HTTP and event-driven workloads it manages; the abstraction covers the common cases and the cost in lost flexibility is rarely the deciding factor for teams already on ACA.

Reach for self-hosted KEDA and custom scaler authorship only when the platform built-in catalogue does not cover the trigger you need. The barrier to entry is lower than writing a controller from scratch, but it is real.

Skip KEDA where default HPA already does the job. A steady-state web service that scales cleanly on CPU does not need a second scaling component in the mix. Adding KEDA there is operational weight without architectural benefit.

If your team is sizing a Kubernetes autoscaling strategy for queue-consumer workloads, scheduled batch, or scale-to-zero microservices, we help Azure-first organisations choose between AKS managed KEDA, Container Apps scale rules, and self-hosted installs, identify which scalers carry the most operational weight in practice, and define a clear default for which scaling concern lives where. Get in touch about a Kubernetes autoscaling review.

Related: Azure Container Apps vs AKS Decision Framework (the broader platform decision KEDA sits on top of) · Container Apps Express in 2026 (the new ACA tier that scales to zero but does not yet expose custom KEDA scalers) · AKS in 2026 and When It Still Wins (the cluster-ownership end of the spectrum where custom scaler authorship lives) · Dapr Graduates CNCF (the sister CNCF graduation that completes the distributed-application building-block picture).
