3f: Maintain Pipelines
Monitor Pipeline Health
Key Pipeline Health Metrics
| Metric | Description | Target |
|---|---|---|
| Build success rate | % of builds that succeed | > 95% |
| Pipeline duration | Total time from trigger to completion | Trending down over time |
| Test pass rate | % of tests passing | > 99% |
| Flaky test rate | % of tests that both pass and fail on the same code | < 1% |
| MTTR (pipeline) | Time to fix a broken build | < 30 minutes |
| Queue time | Time waiting for an available agent | < 5 minutes |
| Agent utilization | % of time agents are busy | < 80% (leave headroom) |
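Build success rate and pipeline MTTR are the two metrics most often miscounted by hand. The Python sketch below assumes run results have been exported as simple (finish time, succeeded) records. The `Run` record shape and the function names are hypothetical illustrations, not an Azure DevOps API. MTTR is measured here from the first red run of a streak to the next green run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    """One pipeline run (hypothetical record shape, e.g. parsed from an export)."""
    finished_at: datetime
    succeeded: bool


def success_rate(runs: list[Run]) -> float:
    """Percentage of runs that succeeded."""
    if not runs:
        return 0.0
    return 100.0 * sum(r.succeeded for r in runs) / len(runs)


def pipeline_mttr(runs: list[Run]) -> timedelta:
    """Mean time from the first failed run of a red streak to the next green run."""
    repair_times: list[timedelta] = []
    broken_since: datetime | None = None
    for run in sorted(runs, key=lambda r: r.finished_at):
        if not run.succeeded and broken_since is None:
            broken_since = run.finished_at  # the build just went red
        elif run.succeeded and broken_since is not None:
            repair_times.append(run.finished_at - broken_since)  # back to green
            broken_since = None
    if not repair_times:
        return timedelta(0)
    return sum(repair_times, timedelta(0)) / len(repair_times)


t0 = datetime(2024, 1, 1, 9, 0)
runs = [Run(t0 + timedelta(minutes=m), ok) for m, ok in
        [(0, True), (10, False), (30, True), (60, False), (80, False), (100, True)]]
print(success_rate(runs))   # 50.0
print(pipeline_mttr(runs))  # 0:30:00
```

The second red streak (two failures in a row) counts as one incident. This matches the "time to fix a broken build" definition in the table above.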
Monitoring in Azure DevOps
Pipeline analytics (built-in):
- Pipelines → Analytics: View pass rate, run duration, test results over time
- Test Plans → Analytics: Test pass rate trends, top failing tests
- Dashboards: Add Pipeline health widgets
Pipeline dashboard widgets:
| Widget | Shows |
|---|---|
| Build History | Recent build results timeline |
| Test Results Trend | Pass/fail rate over time |
| Deployment Status | Latest deployment status per environment |
| Pipeline Duration | Build time trends |
| Code Coverage | Coverage % over time |
Identifying Flaky Tests
Flaky test = test that produces inconsistent results without code changes.
Causes of flaky tests:
- Race conditions / async issues
- Dependency on test execution order
- External service timeouts
- Date/time dependencies
- Random data without fixed seeds
Azure DevOps flaky test detection:
- Automatically flagged in Test Results view
- “Flaky” badge appears on identified tests
- Configure in: Project Settings → Pipelines → Test management (flaky test detection)
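The definition above ("inconsistent results without code changes") translates directly into a check. The following is a minimal sketch of that definition, not Azure DevOps' own detection algorithm. It assumes results are available as hypothetical `(commit_sha, test_name, passed)` tuples collected across reruns.

```python
from collections import defaultdict


def find_flaky_tests(results: list[tuple[str, str, bool]]) -> set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for commit, test, passed in results:
        outcomes[(commit, test)].add(passed)
    # Seeing both True and False for identical code is what makes a test flaky.
    return {test for (_, test), seen in outcomes.items() if len(seen) == 2}


results = [
    ("abc123", "LoginTest", True),
    ("abc123", "LoginTest", False),  # same code, different outcome: flaky
    ("abc123", "CartTest", False),
    ("abc123", "CartTest", False),
    ("def456", "CartTest", True),    # passed only after the code changed: a real fix
]
print(find_flaky_tests(results))  # {'LoginTest'}
```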
Strategies to handle:
- Quarantine: Remove from required test suite temporarily
- Retry: Auto-retry failed tests (this masks the symptom, so use it cautiously)
- Fix the root cause: Await async work properly, mock time dependencies, seed random data
- Mark as flaky: Track separately, don’t block CI
```yaml
# Auto-retry flaky tests (Azure Pipelines)
- task: VSTest@3
  inputs:
    rerunFailedTests: true
    rerunMaxAttempts: 3
```
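Retries hide flakiness because the chance that at least one attempt passes rises quickly with each extra attempt. The worked sketch below makes two assumptions. First, it treats attempts as independent, which real flaky failures often are not. Second, it reads `rerunMaxAttempts: 3` as up to three reruns after the first execution, so four executions in total.

```python
def pass_probability(p_pass: float, executions: int) -> float:
    """Chance that at least one of `executions` independent runs passes."""
    return 1 - (1 - p_pass) ** executions


# A test that fails 30% of the time on unchanged code:
single = pass_probability(0.7, 1)       # about 0.70
with_reruns = pass_probability(0.7, 4)  # about 0.99 (1 run + 3 reruns)
print(round(single, 4), round(with_reruns, 4))  # 0.7 0.9919
```

With reruns, a test that is broken almost a third of the time looks green about 99% of the time. This is why the notes pair retries with tracking and root-cause fixes.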
Optimize Pipeline Performance
Optimize for Cost
| Strategy | Saving |
|---|---|
| Use Microsoft-hosted agents for short jobs | No agent maintenance cost |
| Use self-hosted agents for long/frequent jobs | No per-minute cost |
| Run tests in parallel | Shorter wall-clock time and faster feedback (total agent-minutes stay about the same) |
| Cache dependencies between runs | Avoid re-downloading packages |
| Skip unchanged components (incremental builds) | Run only what changed |
| Adjust retention (delete old artifacts/runs) | Reduce storage costs |
Agent cost comparison:
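- Microsoft-hosted (Azure Pipelines): Billed per parallel job per month (~$40 each). The free tier for private projects is 1 parallel job capped at 1,800 minutes/month. GitHub Actions hosted runners, by contrast, are billed per minute beyond the included quota.
- Self-hosted: Infrastructure cost (VM, AKS) + agent management, plus a lower per-parallel-job fee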
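The comparison above comes down to simple arithmetic. In this sketch, the $40 Microsoft-hosted figure comes from the concurrency section of these notes. The $15 self-hosted per-job fee and the $70/month VM price are assumptions for illustration, so check current pricing. Agent maintenance effort is not modelled.

```python
def hosted_monthly_cost(paid_parallel_jobs: int, price_per_job: float = 40.0) -> float:
    """Azure Pipelines Microsoft-hosted: a flat fee per paid parallel job."""
    return paid_parallel_jobs * price_per_job


def self_hosted_monthly_cost(paid_parallel_jobs: int, agent_vms: int,
                             vm_cost_per_month: float,
                             price_per_job: float = 15.0) -> float:
    """Self-hosted: a cheaper per-job fee plus the VMs that run the agents."""
    return paid_parallel_jobs * price_per_job + agent_vms * vm_cost_per_month


print(hosted_monthly_cost(4))                # 160.0
print(self_hosted_monthly_cost(4, 4, 70.0))  # 340.0 (60 in fees + 280 in VMs)
```

Self-hosted agents tend to pay off when VMs are cheap (reserved or shared capacity) or when warm caches on a persistent agent cut build time. Otherwise, the hosted fee can be the cheaper option.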
Optimize for Time
Parallelism:
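```yaml
# Fan-out to multiple agents simultaneously
# (jobs without dependsOn run in parallel when enough parallel jobs are free)
jobs:
- job: Test1
  steps: [...]
- job: Test2
  steps: [...]
- job: Test3
  steps: [...]
- job: Finalize
  dependsOn: [Test1, Test2, Test3]  # fan-in: runs after all three test jobs
  steps: [...]
```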
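The fan-out above shortens the wait but not the total work. This sketch uses hypothetical job durations. It assumes enough free parallel jobs for all three test jobs, and it ignores per-job agent startup and checkout overhead, which in practice makes fan-out consume slightly more agent time.

```python
def fan_out_minutes(test_jobs: list[float], finalize: float) -> float:
    """Wall-clock time: the slowest test job, then Finalize (fan-in)."""
    return max(test_jobs) + finalize


def sequential_minutes(test_jobs: list[float], finalize: float) -> float:
    """Wall-clock time if the same jobs ran one after another."""
    return sum(test_jobs) + finalize


def agent_minutes(test_jobs: list[float], finalize: float) -> float:
    """Agent time consumed: the same in both layouts."""
    return sum(test_jobs) + finalize


jobs = [12.0, 15.0, 10.0]  # hypothetical durations for Test1..Test3, in minutes
print(fan_out_minutes(jobs, 3.0))     # 18.0
print(sequential_minutes(jobs, 3.0))  # 40.0
print(agent_minutes(jobs, 3.0))       # 40.0
```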
Caching:
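```yaml
# Azure Pipelines: cache NuGet packages
# (needs packages.lock.json, i.e. RestorePackagesWithLockFile=true)
variables:
  NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages

steps:
- task: Cache@2
  displayName: 'Cache NuGet packages'
  inputs:
    key: 'nuget | "$(Agent.OS)" | **/packages.lock.json,!**/bin/**'
    restoreKeys: |
      nuget | "$(Agent.OS)"
    path: '$(NUGET_PACKAGES)'
```

```yaml
# GitHub Actions: cache the npm download cache (~/.npm)
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
```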
Incremental builds:
```yaml
# Only trigger pipeline when relevant files change
trigger:
  paths:
    include:
    - src/api/**
    exclude:
    - docs/**
    - '**/*.md'
```
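The path filter above triggers a run when at least one changed file is included and not excluded. This is a simplified sketch of that rule, not Azure's matcher. Python's `fnmatch` lets `*` cross `/`, so `*` and `**` behave the same here. A root-level `README.md` would also not match `**/*.md` in this sketch.

```python
from fnmatch import fnmatchcase


def should_trigger(changed_files: list[str], include: list[str],
                   exclude: list[str]) -> bool:
    """True if any changed file matches an include pattern and no exclude pattern."""
    def matches(path: str, patterns: list[str]) -> bool:
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    return any(
        (not include or matches(path, include)) and not matches(path, exclude)
        for path in changed_files
    )


include = ["src/api/**"]
exclude = ["docs/**", "**/*.md"]
print(should_trigger(["src/api/Program.cs"], include, exclude))  # True
print(should_trigger(["docs/guide.md"], include, exclude))       # False
print(should_trigger(["src/api/README.md"], include, exclude))   # False (excluded)
```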
Checkout depth:
```yaml
# Shallow clone (faster): only fetch recent history
- checkout: self
  fetchDepth: 1  # Only the latest commit (not full history)
```
⚠️ Avoid `fetchDepth: 1` if you use GitVersion, SonarQube, or git blame. These need full history (set `fetchDepth: 0` to fetch it).
Optimize for Reliability
- Retry transient failures: Network flakiness, service unavailability
- Timeout settings: Fail fast rather than hanging indefinitely
- Health gates: Don’t deploy to prod if monitoring shows issues
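- Idempotent scripts: Scripts that can safely re-run and still reach the same end state (e.g. create-if-not-exists)
```yaml
# Set job timeout
jobs:
- job: Build
  timeoutInMinutes: 30        # Fail the job after 30 minutes
  cancelTimeoutInMinutes: 5   # Time allowed for cleanup steps after cancellation
  steps:
  # Step retry
  - script: ./deploy.sh
    retryCountOnTaskFailure: 3  # Retry this step up to 3 times if it fails
```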
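`retryCountOnTaskFailure: 3` means one run plus up to three retries. Retries are only safe when the step is idempotent, as noted above. The sketch below shows that retry loop in Python. The exponential backoff schedule (1s, 2s, 4s, ...) is an assumption for illustration, not Azure's documented timing. `sleep` is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[], T], retries: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `step`; on failure, retry up to `retries` more times with backoff."""
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_deploy() -> str:
    """Hypothetical idempotent step that hits two transient errors first."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "deployed"


delays: list[float] = []
print(run_with_retries(flaky_deploy, sleep=delays.append))  # deployed
print(delays)  # [1.0, 2.0]
```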
Pipeline Concurrency
Azure Pipelines Concurrency
Organization-level parallel jobs:
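- Microsoft-hosted: 1 free parallel job for private projects (capped at 1,800 minutes/month); public projects can get up to 10 free
- Each additional Microsoft-hosted parallel job: ~$40/month (self-hosted parallel jobs cost less)
- Parallel jobs cap how many jobs can run at the same time across the organization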
Concurrency controls in YAML:
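```yaml
# Only 1 run at a time may deploy to production (Azure Pipelines).
# Pair this with an Exclusive Lock check on the 'production' environment
# (Environments → production → Approvals and checks → Exclusive Lock).
lockBehavior: sequential  # Queue runs in order; 'runLatest' keeps only the newest waiting run

stages:
- stage: DeployProd
  lockBehavior: sequential  # Can also be set per stage
  jobs:
  - deployment: Deploy
    environment: production
    pool:
      vmImage: 'ubuntu-latest'
    strategy:
      runOnce:
        deploy:
          steps:
          - script: ./deploy.sh
```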
GitHub Actions concurrency:
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true  # Cancel the previous run if a new one starts
```
⭐ Use case for `cancel-in-progress: true`: On PRs, cancel old runs when a new commit is pushed.
⭐ Use case for `cancel-in-progress: false`: On production deployments, let the current deployment finish.
Retention Strategy for Pipeline Artifacts and Dependencies
Azure Pipelines Retention
Default retention:
- Build runs: 30 days
- Release deployments: 60 days
- Pipeline artifacts: Tied to build retention
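Configure at project level:
- Project Settings → Pipelines → Settings → Retention policy
- Days to keep runs, artifacts, and pull request runs; number of recent runs to retain per pipeline (classic releases: Project Settings → Pipelines → Release retention)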
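Keep a specific run longer (retention lease from inside the pipeline):
```yaml
# Retention lease: prevent this run from being deleted for 365 days
- task: PowerShell@2
  displayName: 'Retain this run'
  condition: succeeded()
  inputs:
    targetType: 'inline'
    script: |
      $headers = @{ Authorization = "Bearer $(System.AccessToken)" }
      $lease = @{
        daysValid    = 365
        definitionId = $(System.DefinitionId)
        runId        = $(Build.BuildId)
        ownerId      = "User:$(Build.RequestedForId)"
      }
      $body = ConvertTo-Json @($lease)
      $uri = "$(System.CollectionUri)$(System.TeamProject)/_apis/build/retention/leases?api-version=6.0-preview.1"
      Invoke-RestMethod -Uri $uri -Method POST -Headers $headers -ContentType 'application/json' -Body $body
```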
Retention policies — recommendations:
| Artifact Type | Recommended Retention |
|---|---|
| Feature branch builds | 7–14 days |
| Main branch builds | 30–90 days |
| Release builds (tagged) | 1 year or permanent |
| Test results | 30 days |
| Staging deployments | 30 days |
| Production deployments | 1 year |
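Run retention combines three rules: an age limit (e.g. the day counts in the table above), a minimum number of recent runs to keep per pipeline, and leases that protect individual runs. This sketch assumes runs are available as hypothetical `(run_id, finished_on, has_lease)` tuples. It models the decision only, not how Azure DevOps schedules the cleanup.

```python
from datetime import date


def runs_to_delete(runs: list[tuple[str, date, bool]], keep_days: int,
                   min_recent: int, today: date) -> list[str]:
    """Run IDs that are older than keep_days, not among the newest
    min_recent runs, and not protected by a retention lease."""
    newest_first = sorted(runs, key=lambda run: run[1], reverse=True)
    protected = {run_id for run_id, _, _ in newest_first[:min_recent]}
    return [
        run_id
        for run_id, finished_on, leased in newest_first
        if run_id not in protected
        and not leased
        and (today - finished_on).days > keep_days
    ]


runs = [
    ("r1", date(2024, 1, 1), False),
    ("r2", date(2024, 1, 5), True),    # leased: kept regardless of age
    ("r3", date(2024, 2, 25), False),
    ("r4", date(2024, 1, 10), False),  # old, but among the 2 most recent runs
]
print(runs_to_delete(runs, keep_days=30, min_recent=2, today=date(2024, 3, 1)))  # ['r1']
```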
Azure Artifacts Retention
Azure Artifacts stores packages — manage retention per feed:
- Feed → Settings → Retention policies
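- Maximum number of versions per package: older versions beyond this count become eligible for deletion
- Days to keep recently downloaded packages: versions downloaded within this window are kept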
Migrate Classic to YAML Pipelines
Why Migrate?
| Classic | YAML |
|---|---|
| GUI-based, no code | Code-defined (stored in repo) |
| Harder to review/audit | Full code review via PRs |
| No versioning | Versioned with source code |
| Limited reusability | Templates, reusable workflows |
| UI-only configuration | Infrastructure as Code approach |
Migration Steps
- Audit the existing classic pipeline:
  - Document all tasks, variables, triggers, environments
  - Identify linked variable groups and service connections
- Export as YAML (Azure DevOps UI):
  - Classic pipeline → View YAML (for build pipelines)
  - Note: Classic release pipelines cannot be directly exported to YAML
- Create YAML file:
```yaml
# Start with the exported YAML and refine
trigger:
  branches:
    include: [main]

pool:
  vmImage: 'ubuntu-latest'

# Migrate tasks one-by-one
steps:
- task: DotNetCoreCLI@2
  ...
```
- Migrate environments:
  - Classic deployment groups → YAML environments
  - Recreate approval gates as environment checks
- Migrate variables:
  - Classic variables → YAML `variables:` section
  - Secret variables → Variable groups linked to Key Vault
- Test in parallel:
  - Run classic and YAML pipelines side by side
  - Compare outputs and behaviour
- Switch and decommission:
  - Update branch policies to use the new YAML pipeline
  - Disable/delete the classic pipeline
Common Migration Challenges
| Challenge | Solution |
|---|---|
| Release gates | YAML environments with approvals and checks |
| Manual intervention tasks | Use environment approval checks |
| Artifact links between classic build + release | Switch to pipeline artifacts in YAML |
| GUI-configured task parameters | Export YAML, capture all inputs |
| Classic agent queues | Map to YAML pool configurations |
🧠 Key Exam Tips for Maintaining Pipelines
| Scenario | Answer |
|---|---|
| Build takes 45 minutes — speed it up | Cache dependencies, parallelize test jobs, shallow clone |
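| Too many parallel pipelines consuming agent pool | Batch CI runs (`trigger: batch: true`), cap matrix jobs with `maxParallel`, serialize with an Exclusive Lock; GitHub Actions: `concurrency` groups |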
| Test randomly fails but no code changed | Flaky test — quarantine and fix root cause |
| Keep production release artifacts forever | Set retention lease or increase retention to permanent |
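| Prevent two deploys to production at the same time | Azure Pipelines: Exclusive Lock check on the environment + `lockBehavior: sequential`; GitHub Actions: `concurrency` group with `cancel-in-progress: false` |
| Cancel previous PR build when new commit pushed | Azure Pipelines: `pr: autoCancel: true` (the default); GitHub Actions: `cancel-in-progress: true` |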
| Migrate classic release pipeline to YAML | Recreate as YAML stages with environment approvals/checks |
| Delete old pipeline artifacts automatically | Configure retention policies in Project Settings → Pipelines → Settings |