🐳 Azure Kubernetes Service (AKS)

Fully managed Kubernetes — microservices orchestration with full control


Table of Contents

  1. Product Overview
  2. Node Pools
  3. Networking
    1. Network Policy
  4. RBAC & Security
  5. Scaling
  6. Upgrades
  7. SLA
  8. Monitoring
  9. Common Exam Scenarios

Product Overview

Azure Kubernetes Service (AKS) is a managed Kubernetes service that offloads cluster control-plane management (API server, etcd, scheduler) to Azure. You manage the worker nodes and, on the Free tier, pay only for them; the Standard tier adds a per-cluster control-plane charge in exchange for an uptime SLA. AKS is the right choice when you need full Kubernetes flexibility: custom controllers, advanced networking, stateful workloads, GPU scheduling, or fine-grained resource control at scale.

flowchart TD
    subgraph AKS["AKS Cluster"]
        CP["Control Plane\n(Azure-managed; no charge on Free tier)"]
        subgraph NP["Node Pools"]
            SYS["System Node Pool\n(kube-system, DNS, metrics)"]
            USR["User Node Pool(s)\n(application workloads)"]
            SPT["Spot Node Pool\n(optional, interruptible)"]
        end
        CP --> NP
    end
    subgraph Net["Networking"]
        KBN["Kubenet\n(simpler, NAT)"]
        CNI["Azure CNI\n(pod IPs from VNet)"]
        OVL["Azure CNI Overlay\n(pods on private overlay)"]
    end
    subgraph Add["Add-ons & Integrations"]
        MON["Azure Monitor\n(Container Insights)"]
        KV["Key Vault\n(CSI driver)"]
        ACR["Azure Container Registry"]
        KEDA["KEDA\n(event-driven scaling)"]
        DAPR["Dapr\n(sidecar application runtime)"]
    end
    AKS --> Net
    AKS --> Add

Node Pools

| Pool Type | Purpose | Notes |
|---|---|---|
| System | Runs Kubernetes system pods (DNS, metrics-server) | Every cluster needs at least one; add the CriticalAddonsOnly=true:NoSchedule taint to keep application pods off it |
| User | Runs application workloads | One or more; different VM sizes per pool |
| Spot | Uses Azure Spot VMs, up to 90% cheaper | Evictable; use for fault-tolerant batch workloads |
| Virtual nodes | Burst to Azure Container Instances | Fast scale-out without provisioning nodes |

⚠️ Exam Caveat (System vs User Node Pool): A cluster must always have at least one system node pool, so you cannot delete the last one while the cluster exists. For compliance or isolation, run application workloads on a dedicated User node pool and taint the system pool with CriticalAddonsOnly=true:NoSchedule so that application pods are not scheduled alongside system pods.
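
The taint and toleration mechanism mentioned above can be modelled in a few lines. This is a simplified sketch of the Kubernetes matching rules, not the scheduler's actual code. The class and function names are made up for illustration, and the example assumes the system pool carries the CriticalAddonsOnly=true:NoSchedule taint that AKS documents for dedicating system node pools.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None      # None with operator "Exists" matches any key
    operator: str = "Equal"        # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None   # None matches any effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches this single taint."""
    if toleration.effect is not None and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        return toleration.key is None or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(tolerations: List[Toleration], node_taints: List[Taint]) -> bool:
    """A pod fits only if every hard taint on the node is tolerated."""
    # PreferNoSchedule is a soft preference, so it never blocks scheduling.
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


# Assumed taint on the system node pool.
SYSTEM_POOL_TAINT = Taint("CriticalAddonsOnly", "true", "NoSchedule")

app_pod_tolerations: List[Toleration] = []
system_pod_tolerations = [Toleration(key="CriticalAddonsOnly", operator="Exists")]

print(can_schedule(app_pod_tolerations, [SYSTEM_POOL_TAINT]))     # False
print(can_schedule(system_pod_tolerations, [SYSTEM_POOL_TAINT]))  # True
```

An application pod without a matching toleration is rejected by the tainted system pool and therefore lands on a User node pool.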


Networking

| Mode | How Pods Get IPs | Pros | Cons |
|---|---|---|---|
| Kubenet | Pods get IPs from a separate private CIDR; outbound traffic is NATed to the node IP | Simpler, smaller VNet IP usage | Pods are not directly routable from the VNet; requires UDRs for hybrid access |
| Azure CNI | Each pod gets a real VNet IP | Full VNet routing, accessible from on-prem | Requires a large IP address space (roughly nodes × (max pods + 1)) |
| Azure CNI Overlay | Pods on a private overlay CIDR; nodes get VNet IPs | Reduces IP consumption vs Azure CNI | Pod IPs are not directly routable from outside the cluster |
| Azure CNI Powered by Cilium | eBPF-based networking | High performance, network policy, Hubble observability | Newer; not supported in all scenarios |

⚠️ Exam Caveat (Kubenet vs Azure CNI): If the scenario says pods must be directly reachable from on-premises or from other VNet resources without NAT, the answer is Azure CNI (pods get real VNet IPs). With Kubenet, pod IPs come from a separate address space behind NAT, so pods are not directly routable from outside the cluster.
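
The IP address trade-off between the modes can be made concrete with a small calculation. The sketch below assumes the planning rule from the Azure CNI documentation (one IP per node plus one per potential pod, with one extra node reserved for upgrade surge), the default of 30 max pods per node, and the five addresses Azure reserves in every subnet. The function names are illustrative.

```python
def azure_cni_ips(nodes: int, max_pods: int = 30, surge_nodes: int = 1) -> int:
    """VNet IPs needed when every pod takes an address from the node subnet."""
    total_nodes = nodes + surge_nodes
    return total_nodes + total_nodes * max_pods


def overlay_node_ips(nodes: int, surge_nodes: int = 1) -> int:
    """VNet IPs needed when only nodes take VNet addresses (overlay or Kubenet)."""
    return nodes + surge_nodes


def smallest_subnet_prefix(required_ips: int, reserved: int = 5) -> int:
    """Longest IPv4 prefix (smallest subnet) that still fits required_ips."""
    for prefix in range(32, -1, -1):
        if 2 ** (32 - prefix) - reserved >= required_ips:
            return prefix
    raise ValueError("requirement exceeds the IPv4 address space")


nodes = 50
cni = azure_cni_ips(nodes)
overlay = overlay_node_ips(nodes)
print(cni, f"/{smallest_subnet_prefix(cni)}")          # 1581 /21
print(overlay, f"/{smallest_subnet_prefix(overlay)}")  # 51 /26
```

For the same 50-node cluster, pod-per-VNet-IP networking needs a /21 node subnet, while a mode that keeps pods off the VNet fits in a /26.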

Network Policy

| Engine | Notes |
|---|---|
| Calico | Open-source; supports Kubenet and Azure CNI |
| Azure Network Policy | Native Azure; Azure CNI only |
| Cilium | eBPF-based; Azure CNI Powered by Cilium only |

RBAC & Security

| Feature | Detail |
|---|---|
| Kubernetes RBAC | Role/ClusterRole + RoleBinding/ClusterRoleBinding inside the cluster |
| Azure RBAC for Kubernetes | Entra ID users and groups are authorised through Azure role assignments; no local Kubernetes accounts or static kubeconfig credentials to manage |
| Workload Identity | Pods authenticate to Azure services via Entra ID managed identity (replaces Pod Identity) |
| Key Vault CSI Driver | Mount secrets from Azure Key Vault directly into pod file systems |
| Microsoft Defender for Containers | Runtime threat detection, image scanning, Kubernetes audit log analysis |
| Private cluster | AKS API server exposed only on a private IP within the VNet |

⚠️ Exam Caveat (Azure RBAC vs Kubernetes RBAC): With Azure RBAC for Kubernetes, you manage access through Entra ID and Azure role assignments. Users still fetch a kubeconfig, but it contains no static credentials because they sign in with Entra ID, so there are no client certificates to distribute and no local Kubernetes accounts to manage. This is the preferred model for enterprise environments.
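
For contrast, the in-cluster Kubernetes RBAC row of the table can be illustrated with a Role and RoleBinding whose subject is an Entra ID group. The sketch below builds the two manifests as plain dictionaries and prints them as JSON, which kubectl accepts as well as YAML. The namespace, role name, and group object ID are placeholders, and the example assumes a cluster with Entra ID integration, where groups are referenced by object ID. It does not show an Azure role assignment.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def pod_reader_role(namespace: str, name: str = "pod-reader") -> dict:
    """Namespaced Role granting read-only access to pods."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}
        ],
    }


def bind_group_to_role(namespace: str, role_name: str, group_object_id: str) -> dict:
    """RoleBinding that grants the Role to a group identified by its object ID."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "Group", "name": group_object_id, "apiGroup": RBAC_API_GROUP}
        ],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API_GROUP},
    }


# Placeholder values for illustration only.
NAMESPACE = "team-a"
GROUP_OBJECT_ID = "00000000-0000-0000-0000-000000000000"

role = pod_reader_role(NAMESPACE)
binding = bind_group_to_role(NAMESPACE, role["metadata"]["name"], GROUP_OBJECT_ID)

for manifest in (role, binding):
    print(json.dumps(manifest, indent=2))
```

With Azure RBAC for Kubernetes, the equivalent permission would instead be granted as an Azure role assignment scoped to the cluster or a namespace, with no RoleBinding object in the cluster.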


Scaling

| Mechanism | Description |
|---|---|
| Horizontal Pod Autoscaler (HPA) | Scales pod replicas based on CPU, memory, or custom metrics |
| Vertical Pod Autoscaler (VPA) | Adjusts pod CPU/memory requests and limits automatically |
| Cluster Autoscaler | Adds nodes when pods cannot be scheduled (scale-out) and removes nodes that are underutilised (scale-in) |
| KEDA | Event-driven scaling of pods based on external event sources (queues, Event Hubs, HTTP) |
| Virtual Nodes (ACI burst) | Quickly burst workloads to ACI without provisioning new nodes |

⚠️ Exam Caveat — Cluster Autoscaler vs HPA: HPA scales pods; Cluster Autoscaler scales nodes. They complement each other: HPA adds pods → if no capacity, Cluster Autoscaler adds nodes. Both should be enabled together for elastic production workloads.
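
The hand-off from HPA to Cluster Autoscaler can be sketched numerically. The first function follows the replica formula from the Kubernetes HPA documentation, desired = ceil(current × currentMetric / targetMetric), with its default 10% tolerance. The second is a deliberately crude stand-in for Cluster Autoscaler, which in reality reacts to unschedulable pods rather than dividing by a fixed pods-per-node figure. All names and numbers are illustrative.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Replica count the HPA would aim for, clamped to the configured bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def nodes_needed(replicas: int, pods_per_node: int) -> int:
    """Simplified capacity check: nodes required to place all replicas."""
    return math.ceil(replicas / pods_per_node)


# 4 replicas at 90% average CPU against a 60% target.
replicas = hpa_desired_replicas(4, current_metric=90, target_metric=60)
print(replicas)                                 # 6
print(nodes_needed(replicas, pods_per_node=2))  # 3

# Within the tolerance band nothing changes.
print(hpa_desired_replicas(4, current_metric=63, target_metric=60))  # 4
```

If the pool currently has two nodes that each fit two of these pods, the two extra replicas stay Pending until Cluster Autoscaler adds a third node.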


Upgrades

AKS supports Kubernetes version upgrades for both the control plane and node pools:

| Upgrade Type | Detail |
|---|---|
| Control plane | Upgraded first; node pools may lag the control plane by up to two minor versions |
| Node pool | Upgraded separately after control plane; can be staged |
| Node surge | Extra nodes provisioned during upgrade to maintain capacity (configurable %) |
| Auto-upgrade channels | patch, stable, rapid, node-image; each sets an automated upgrade cadence |

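⚠️ Exam Caveat: AKS only supports the latest 3 minor Kubernetes versions (N, N-1, N-2). A cluster on an older version is out of support. Upgrades must step through minor versions one at a time, and a cluster that has fallen far behind can be upgraded only without functional guarantees, so recreating it on a supported version is often the recommended path.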
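
The N, N-1, N-2 support window and the control-plane to node-pool skew can be checked mechanically. The helper names and the version numbers below are illustrative, not tied to any particular AKS release, and the upgrade path assumes minor versions are stepped through one at a time.

```python
from typing import List, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.29' or '1.29.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supported_minors(latest_ga: str) -> List[str]:
    """The three supported minor versions: N, N-1, N-2."""
    major, minor = parse_minor(latest_ga)
    return [f"{major}.{minor - i}" for i in range(3)]


def is_supported(cluster_version: str, latest_ga: str) -> bool:
    major, minor = parse_minor(cluster_version)
    return f"{major}.{minor}" in supported_minors(latest_ga)


def node_pool_skew_ok(control_plane: str, node_pool: str, max_skew: int = 2) -> bool:
    """Node pools may trail the control plane by at most max_skew minor versions."""
    cp_major, cp_minor = parse_minor(control_plane)
    np_major, np_minor = parse_minor(node_pool)
    return cp_major == np_major and 0 <= cp_minor - np_minor <= max_skew


def upgrade_path(current: str, target: str) -> List[str]:
    """Minor versions to step through, one at a time, to reach the target."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("target must be a later minor of the same major version")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


LATEST_GA = "1.30"  # hypothetical latest GA minor version
print(supported_minors(LATEST_GA))        # ['1.30', '1.29', '1.28']
print(is_supported("1.27.9", LATEST_GA))  # False
print(node_pool_skew_ok("1.30", "1.28"))  # True
print(upgrade_path("1.27", "1.30"))       # ['1.28', '1.29', '1.30']
```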


SLA

| Configuration | SLA |
|---|---|
| Free control plane tier | No financially backed SLA |
| Standard tier (paid control plane), no Availability Zones | 99.9% |
| Standard tier + Availability Zones | 99.95% |

⚠️ Exam Caveat: The AKS control plane SLA (99.9% without Availability Zones, 99.95% with them) requires the Standard tier (not Free). Free tier clusters have no financially backed uptime guarantee, so they are suitable only for dev/test.
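
An availability percentage is easier to compare once it is converted into a downtime budget. The sketch below does that arithmetic for a few common availability targets, assuming a 30-day month. The function name is illustrative, and the figures are plain arithmetic rather than values quoted from the SLA document.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime, in minutes, that still meets the availability target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} min per 30 days")
# 99.9% -> 43.2 min per 30 days
# 99.95% -> 21.6 min per 30 days
# 99.99% -> 4.3 min per 30 days
```

Each additional step in the availability target roughly halves, or better, the time the API server may be unreachable in a month.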


Monitoring

| Tool | Coverage |
|---|---|
| Container Insights (Azure Monitor) | Node/pod CPU, memory, logs, live data |
| Prometheus + Grafana | Open-source metrics; Azure Managed Prometheus available |
| Azure Monitor Alerts | Alert on pod restarts, node CPU, OOM events |
| Kubernetes Events | Cluster-level event stream for scheduling, scaling, errors |

Common Exam Scenarios

| Scenario | Answer |
|---|---|
| Full Kubernetes control, custom controllers | AKS |
| Pods must be directly routable from on-premises | AKS + Azure CNI |
| Manage cluster access via Entra ID groups | Azure RBAC for Kubernetes |
| Auto-scale pods on Event Hub message count | KEDA on AKS |
| Burst to immediate capacity without new nodes | Virtual Nodes (ACI integration) |
| Highest AKS SLA | Standard tier + Availability Zones (99.95%) |
| Protect API server from public internet | Private AKS cluster |
| Mount Key Vault secrets into pod filesystem | Key Vault CSI Driver |
| Automated patch version upgrades | Auto-upgrade channel: patch |
| Cost-optimise interruptible batch node pool | Spot node pool |