📊 05 — Monitor and Maintain Azure Resources
Domain Weight: 10–15% of exam
Covers Azure Monitor, Log Analytics, alerts, action groups, VM Insights, Network Watcher, Azure Backup, Recovery Services vaults, and Azure Site Recovery.
Table of Contents
📈 Monitor Resources in Azure
Azure Monitor Architecture
Azure Monitor is the central monitoring platform for all Azure resources.
1
2
3
4
5
6
7
8
9
10
11
12
Azure Resources (VMs, Storage, SQL, App Service...)
↓
Azure Monitor (collection layer)
├── Metrics (time-series numerical data)
└── Logs (structured/unstructured records)
↓
Log Analytics Workspace (logs storage + query)
↓
Action Layers
├── Alerts → Action Groups → (Email, SMS, Webhook, ITSM, runbook...)
├── Dashboards & Workbooks
└── Insights (VM Insights, Network Insights, Container Insights...)
Metrics in Azure Monitor
Metrics are lightweight, near-real-time numerical values (e.g. CPU %, requests/sec).
| Property | Value |
|---|---|
| Retention | 93 days (standard) |
| Granularity | 1 minute to 1 hour |
| Storage | Azure Monitor Metrics store (no workspace needed) |
| Query | Metrics Explorer in portal |
Common metric namespaces:
| Resource | Key Metrics |
|---|---|
| VM | Percentage CPU, Network In/Out, Disk Read/Write |
| Storage Account | Transactions, Availability, Egress, Ingress |
| App Service | HTTP 2xx/4xx/5xx, Response Time, CPU Time |
| SQL Database | DTU consumption, deadlocks, connections |
1
2
3
4
5
6
7
8
9
10
11
12
# List available metrics for a VM
az monitor metrics list-definitions \
--resource /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
-o table
# Query CPU metric for last hour
az monitor metrics list \
--resource /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
--metric "Percentage CPU" \
--interval PT1M \
--aggregation Average \
-o table
Log Analytics Workspace
Log Analytics Workspace is the storage and query engine for Azure Monitor Logs (also called Log Analytics logs or Azure Monitor Logs).
1
2
3
4
5
6
7
# Create a workspace
az monitor log-analytics workspace create \
--resource-group MyRG \
--workspace-name MyWorkspace \
--location westeurope \
--sku PerGB2018 \
--retention-time 90
Connecting resources to send logs:
| Resource | Method |
|---|---|
| VM | Azure Monitor Agent (AMA) + Data Collection Rule |
| Storage Account | Diagnostic settings |
| VNet NSG | NSG flow logs |
| Activity Log (subscription) | Diagnostic settings to workspace |
| App Service | Diagnostic settings |
1
2
3
4
5
6
7
# Create diagnostic setting to send VM metrics/logs to workspace
az monitor diagnostic-settings create \
--name VMDiagSettings \
--resource /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
--workspace /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.OperationalInsights/workspaces/MyWorkspace \
--metrics '[{"category": "AllMetrics", "enabled": true}]' \
--logs '[{"category": "VMInsightsHealth", "enabled": true}]'
Querying Logs with KQL
Azure Monitor Logs uses Kusto Query Language (KQL).
Common KQL patterns:
// Heartbeat — check which agents are reporting in
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(1h)
// CPU over 80% in last 24h
Perf
| where CounterName == "% Processor Time"
| where CounterValue > 80
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
// Failed logins on Windows VMs
SecurityEvent
| where EventID == 4625
| summarize count() by Account, Computer
| order by count_ desc
// Application exceptions from App Insights
exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc
// Storage account transactions
AzureMetrics
| where ResourceProvider == "MICROSOFT.STORAGE"
| where MetricName == "Transactions"
| summarize sum(Total) by bin(TimeGenerated, 1h)
KQL key operators:
| Operator | Purpose |
|---|---|
where |
Filter rows |
project |
Select/rename columns |
summarize |
Aggregate (count, avg, sum, max) |
extend |
Add computed columns |
join |
Join two tables |
render |
Visualise (timechart, barchart, etc.) |
ago(1h) |
Time expression: 1 hour ago |
bin(Time, 5m) |
Round time to 5-minute buckets |
Setting Up Alert Rules
Alerts trigger when a condition is met. Components: Alert rule → Condition → Action group.
Alert rule types:
| Type | Source | Use Case |
|---|---|---|
| Metric alert | Azure Monitor Metrics | CPU > 90%, response time > 2s |
| Log search alert | Log Analytics KQL query | Error count > 10 in 5 min |
| Activity log alert | Azure Activity Log | VM deleted, policy assignment changed |
| Smart detection alert | Application Insights | Anomalous failure rates |
1
2
3
4
5
6
7
8
9
10
11
# Create a metric alert for CPU > 90%
az monitor metrics alert create \
--resource-group MyRG \
--name HighCPUAlert \
--resource /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
--condition "avg Percentage CPU > 90" \
--evaluation-frequency PT1M \
--window-size PT5M \
--severity 2 \
--action /subscriptions/<sub>/resourceGroups/MyRG/providers/microsoft.insights/actionGroups/MyActionGroup \
--description "VM CPU above 90% for 5 minutes"
Alert severity levels:
| Severity | Level |
|---|---|
| 0 | Critical |
| 1 | Error |
| 2 | Warning |
| 3 | Informational |
| 4 | Verbose |
Action Groups
Action groups define what happens when an alert fires. One alert can trigger multiple actions.
Action types:
| Type | Description |
|---|---|
| Email/SMS/Push/Voice | Notify a person |
| Azure Function | Trigger a function |
| Logic App | Start a workflow |
| Webhook | HTTP POST to a URL |
| ITSM | Create ticket in ServiceNow, etc. |
| Automation Runbook | Run a PowerShell runbook |
| Event Hubs | Stream alert to Event Hubs |
| Secure Webhook | Webhook with Entra ID auth |
1
2
3
4
5
6
# Create action group
az monitor action-group create \
--resource-group MyRG \
--name MyActionGroup \
--short-name MyAG \
--action email alice alice@contoso.com
Alert Processing Rules
Alert processing rules suppress or reroute alerts — useful for maintenance windows.
- Suppression: Silence alerts during scheduled maintenance
- Action group override: Add/remove action groups from triggered alerts
1
2
3
4
5
6
7
az monitor alert-processing-rule create \
--resource-group MyRG \
--name MaintenanceSuppress \
--scopes /subscriptions/<sub>/resourceGroups/MyRG \
--rule-type Suppression \
--schedule-start-datetime "2025-12-24 00:00:00" \
--schedule-end-datetime "2025-12-26 00:00:00"
VM Insights, Storage Insights, and Network Insights
VM Insights: Pre-built workbooks for VM performance and dependency mapping.
- Requires Azure Monitor Agent (AMA) + Dependency agent
- Provides: CPU, memory, disk, network utilisation; Map view of connections
Storage Insights: Overview of all storage accounts — transactions, errors, latency, capacity.
Network Insights: Topology view, health overview, metrics for VNets, LBs, VPN Gateways.
Azure Network Watcher and Connection Monitor
| Tool | Frequency | Use Case |
|---|---|---|
| Connection Troubleshoot | On-demand | One-time connectivity test |
| Connection Monitor | Continuous | Ongoing network health monitoring |
| NSG Flow Logs | Continuous | Log all traffic decisions for analysis |
| Packet Capture | On-demand | Capture packets for deep analysis |
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Enable Network Watcher (per region)
az network watcher configure \
--locations westeurope \
--enabled true \
--resource-group NetworkWatcherRG
# Enable NSG flow logs
az network watcher flow-log create \
--resource-group MyRG \
--nsg MyNSG \
--storage-account mystorageacct \
--workspace MyWorkspace \
--enabled true \
--name MyNSGFlowLog
💾 Implement Backup and Recovery
Recovery Services Vault
A Recovery Services vault stores backup data and is used by both Azure Backup and Azure Site Recovery.
1
2
3
4
5
6
7
8
9
10
11
# Create Recovery Services vault
az backup vault create \
--resource-group MyRG \
--name MyRSVault \
--location westeurope
# Enable soft delete (default: enabled, 14 days)
az backup vault backup-properties set \
--resource-group MyRG \
--name MyRSVault \
--soft-delete-feature-state Enable
Vault redundancy options:
| Setting | Replication | Use Case |
|---|---|---|
| Locally Redundant (LRS) | 3 copies, 1 datacenter | Low cost; acceptable if no DR needed |
| Geo-Redundant (GRS) | 6 copies, 2 regions | Default; recommended for DR |
| Zone-Redundant (ZRS) | 3 copies, 3 AZs | HA within region |
⚠️ Exam Caveat: Vault redundancy must be set before the first backup item is registered. You cannot change it after items are protected.
Azure Backup Vault
Backup Vault is a newer vault type used for newer backup data sources (not Recovery Services):
| Data Source | Vault Type |
|---|---|
| Azure VMs | Recovery Services Vault |
| SQL in VMs | Recovery Services Vault |
| Azure Files | Recovery Services Vault |
| Azure Blobs | Backup Vault |
| Azure Disks | Backup Vault |
| PostgreSQL | Backup Vault |
Backup Policies
Backup policies define schedule and retention.
1
2
3
4
5
6
7
8
9
10
11
12
# Show default VM backup policy
az backup policy show \
--resource-group MyRG \
--vault-name MyRSVault \
--name DefaultPolicy
# Enable VM backup
az backup protection enable-for-vm \
--resource-group MyRG \
--vault-name MyRSVault \
--vm /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
--policy-name DefaultPolicy
Policy retention examples:
| Recovery Point | Retention |
|---|---|
| Daily | 7–9999 days |
| Weekly | 1–5163 weeks |
| Monthly | 1–1188 months |
| Yearly | 1–99 years |
Perform Backup and Restore
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Run an on-demand backup immediately
az backup protection backup-now \
--resource-group MyRG \
--vault-name MyRSVault \
--container-name iaasvmcontainer;iaasvmcontainerv2;MyRG;MyVM \
--item-name MyVM \
--backup-management-type AzureIaasVM \
--retain-until 2025-12-31
# List restore points
az backup recoverypoint list \
--resource-group MyRG \
--vault-name MyRSVault \
--container-name iaasvmcontainer;iaasvmcontainerv2;MyRG;MyVM \
--item-name MyVM \
--backup-management-type AzureIaasVM \
-o table
# Restore a VM (to new VM)
az backup restore restore-disks \
--resource-group MyRG \
--vault-name MyRSVault \
--container-name iaasvmcontainer;iaasvmcontainerv2;MyRG;MyVM \
--item-name MyVM \
--rp-name <recovery-point-name> \
--storage-account mystorageacct \
--restore-to-staging-storage-account true
Restore types:
| Type | Description |
|---|---|
| Restore VM | Create new VM from recovery point |
| Restore disks | Restore managed disks; attach manually |
| File recovery | Mount recovery point as drive; copy individual files |
| Cross-region restore | Restore to secondary region (requires GRS vault) |
⚠️ Exam Caveat: Azure Backup does not protect temporary disks. Data on the temp disk (
D://dev/sdb) is not included in backups.
Azure Site Recovery (ASR)
ASR provides disaster recovery — replicates VMs to a secondary region so you can failover.
ASR components:
| Component | Role |
|---|---|
| Source region | Where VMs currently run |
| Target region | Failover destination |
| Recovery Services Vault | Stores replication config (must be in target region) |
| Replication policy | RPO/RTO targets; app-consistent snapshot frequency |
| Recovery plan | Ordered failover of multiple VMs with custom scripts |
1
2
3
4
5
6
7
8
# Enable replication for a VM (portal is easier; CLI shows concept)
az site-recovery protection-container-mapping create \
--resource-group MyRG \
--vault-name MyRSVault \
--fabric-name westeurope \
--protection-container-name westeurope-container \
--name MyReplicationMapping \
--policy-id <policy-id>
ASR failover types:
| Failover Type | Description | Use Case |
|---|---|---|
| Test failover | Non-disruptive; creates VM in isolated network | DR drill |
| Planned failover | Zero data loss; shut down source first | Planned migration |
| Unplanned failover | Possible data loss; uses latest recovery point; source may be down; | Actual disaster |
⚠️ Exam Caveat: After failover, you must commit the failover to complete the process, then set up replication back (re-protect) to the original region if you want the ability to fail back.
Configure Reports and Alerts for Backups
Backup Center: Unified view of all backup data across vaults, subscriptions, and regions.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
# List backup jobs
az backup job list \
--resource-group MyRG \
--vault-name MyRSVault \
-o table
# Set up backup alerts (via Azure Monitor)
az monitor metrics alert create \
--resource-group MyRG \
--name BackupFailureAlert \
--resource /subscriptions/<sub>/resourceGroups/MyRG/providers/Microsoft.RecoveryServices/vaults/MyRSVault \
--condition "count BackupHealthEvent > 0" \
--window-size PT1H \
--severity 1
Built-in backup alerts types:
| Alert | Trigger |
|---|---|
| Backup failure | Job fails |
| Delete backup data | Protected item deleted |
| Soft delete | Item moved to soft-delete state |
| Stop protection | Backup protection disabled |
🔄 Domain 5 Scenario Quick Reference
| Scenario | Solution |
|---|---|
| Get notified when VM CPU stays above 90% for 5 minutes | Create metric alert (Percentage CPU > 90, 5-min window) + action group |
| Silence alerts during a weekend maintenance window | Create alert processing rule with suppression schedule |
| Back up all VMs in a resource group daily and keep for 30 days | Create Recovery Services Vault + backup policy + enable protection |
| Recover a single file from a VM backup | Use File Recovery mount point from recovery point |
| Replicate VMs to a secondary region for DR | Enable Azure Site Recovery replication |
| Run a DR drill without affecting production | Test failover to isolated network in ASR |
| Analyse which VMs haven’t sent a heartbeat in 1 hour | KQL: Heartbeat \| summarize max(TimeGenerated) by Computer \| where ... < ago(1h) |
| Send backup failure notifications to the ops team | Configure Backup alerts + action group with email |
| Monitor network traffic decisions from NSGs | Enable NSG flow logs to a storage account and Log Analytics |
| Continuously test that VMs can reach an internal API | Configure Connection Monitor in Network Watcher |