Skip to content

Conversation

@MantavyaDh
Copy link
Contributor

@MantavyaDh MantavyaDh commented Sep 5, 2025

Context

Container-related issues in Azure DevOps pipelines can be difficult to diagnose due to limited visibility into the container operations. Users often encounter authentication failures, image pull issues, and platform compatibility problems without sufficient diagnostic information to resolve them effectively, enhanced logging is added to relevant functions, which will help in identifying the core issues.

ADO work item - AB#2296204


Description

This PR enhances the logging capabilities of the Azure Pipelines Agent's container operations to provide better visibility into container lifecycle management, authentication flows, and troubleshooting information. Around the Initialize Container logic, this PR mainly covers enhanced logging around the container Operations during initialization and starting of containers for a Job. The main functions it covers are StartContainersAsync , PullContainer and StopContainersAsync . The PR handles logging and exception handling around various authentications we support like: Managed Identity, WIF, Registry Login around those functions.


Risk Assessment (Low / Medium / High)

Low


Unit Tests Added or Updated (Yes / No)

No


Additional Testing Performed

Locally validated on Private test Org


Change Behind Feature Flag (Yes / No)

TBD


Tech Design / Approach

  • Design has been written and reviewed.
  • Any architectural decisions, trade-offs, and alternatives are captured.

Documentation Changes Required (Yes/No)

No


Logging Added/Updated (Yes/No)

Yes


Rollback Scenario and Process (Yes/No)

TBD


Dependency Impact Assessed and Regression Tested (Yes/No)

N/A, no functional changes


Pipeline Debug Logs (system.debug: true)

Starting: Initialize containers

##[debug]PullContainerAsync initiated
##[debug]Authenticating to private container registry

##[debug]Retrieving AAD token using Workload Identity Federation
##[debug]Successfully retrieved AAD token using Workload Identity Federation

##[debug]Converting AAD token to ACR refresh token
##[debug]Successfully converted AAD token to ACR refresh token

C:\Program Files\Docker\Docker\resources\bin\docker.EXE login --username "00000000-0000-0000-0000-000000000000" --password "***" https://myacrmantavya.azurecr.io
Login Succeeded
Final exit code for Docker login: 0
##[debug]Successfully authenticated to container registry

Pulling container image: myacrmantavya.azurecr.io/node:18-alpine
C:\Program Files\Docker\Docker\resources\bin\docker.EXE pull myacrmantavya.azurecr.io/node:18-alpine
18-alpine: Pulling from node
Digest: sha256:929b04d7c782f04f615cf785488fed452b6569f87c73ff666ad553a7554f0006
Status: Downloaded newer image for myacrmantavya.azurecr.io/node:18-alpine
Final exit code for Docker pull: 0
##[debug]Successfully pulled container image

C:\Program Files\Docker\Docker\resources\bin\docker.EXE logout https://myacrmantavya.azurecr.io
Removing login credentials for myacrmantavya.azurecr.io

##[debug]Starting container setup: test_wif_container
##[debug]Configuring container Node.js runtime

Creating Docker container...
C:\Program Files\Docker\Docker\resources\bin\docker.EXE create --name test_wif_container_myacrmantavyaazurecrionode18alpine_426e1b [...] myacrmantavya.azurecr.io/node:18-alpine "node" -e "setInterval(function(){}, 24 * 60 * 60 * 1000);"
2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12

Starting Docker container...
C:\Program Files\Docker\Docker\resources\bin\docker.EXE start 2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12
2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12
Final exit code for Docker start: 0
Container started successfully

Finishing: Initialize containers

Complete Trace Log Flow

1. Container Initialization Start

[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Entering StartContainersAsync
[2025-10-26 09:59:26Z INFO DockerCommandManager] Checking Docker version
[2025-10-26 09:59:26Z INFO DockerCommandManager] Docker server version: 1.41
[2025-10-26 09:59:26Z INFO DockerCommandManager] Docker client version: 1.41
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Docker version check passed

2. Cleanup Stale Containers

[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Cleaning up stale containers from previous jobs
[2025-10-26 09:59:26Z INFO DockerCommandManager] Running: docker ps --all --quiet --no-trunc --filter "label=<instance-label>"
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Found 0 stale containers
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Pruning stale container networks
[2025-10-26 09:59:26Z INFO DockerCommandManager] Running: docker network prune --force --filter "label=<instance-label>"
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Network cleanup completed

3. Container Pull - Entering PullContainerAsync

[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Entering PullContainerAsync
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Container name: test_wif_container
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Container image: myacrmantavya.azurecr.io/node:18-alpine
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Container registry: 164c205d-e6ea-444d-8df9-8b0ae789620c
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Container options: 
[2025-10-26 09:59:26Z INFO ContainerOperationProvider] Skip container image pull: False

4. Registry Authentication Discovery

[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Registry endpoint ID: 164c205d-e6ea-444d-8df9-8b0ae789620c
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Registry endpoint name: ACR-WIF-Connection
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Registry endpoint URL: https://management.azure.com/
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Registry type: ACR
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Processing Azure Container Registry (ACR) authentication
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] ACR authorization scheme: WorkloadIdentityFederation
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] ACR login server: myacrmantavya.azurecr.io
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Registry server URL: https://myacrmantavya.azurecr.io
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Using Workload Identity Federation authentication
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Tenant ID: 72f988bf-86f1-41af-91ab-2d7cd011db47

5. WIF Authentication Flow

[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Entering GetAccessTokenUsingWorkloadIdentityFederation
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Starting WIF authentication at 2025-10-26 09:59:27Z
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Tenant ID: 72f988bf-86f1-41af-91ab-2d7cd011db47
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Client ID: 612608af-f423-4011-9dfc-60ed6bd26978
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Resource ID: https://management.azure.com/
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Building MSAL ConfidentialClientApplication
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Creating OIDC token for client assertion
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] Collection URI: https://dev.azure.com/mantavyad/
[2025-10-26 09:59:27Z INFO ContainerOperationProvider] OIDC token created successfully
[2025-10-26 09:59:28Z INFO ContainerOperationProvider] Acquiring access token for resource scope: https://management.azure.com/.default
[2025-10-26 09:59:29Z INFO ContainerOperationProvider] Token expires at: 2025-10-26 10:59:28 UTC
[2025-10-26 09:59:29Z INFO ContainerOperationProvider] Leaving GetAccessTokenUsingWorkloadIdentityFederation (Duration: 00:00:01.8234567)

6. ACR Token Exchange

[2025-10-26 09:59:29Z INFO ContainerOperationProvider] Entering GetAcrPasswordFromAADToken
[2025-10-26 09:59:29Z INFO ContainerOperationProvider] Starting ACR token exchange at 2025-10-26 09:59:29Z
[2025-10-26 09:59:29Z INFO ContainerOperationProvider] ACR OAuth2 exchange endpoint: https://myacrmantavya.azurecr.io/oauth2/exchange
[2025-10-26 09:59:29Z INFO ContainerOperationProvider] Login server: myacrmantavya.azurecr.io, Tenant: 72f988bf-86f1-41af-91ab-2d7cd011db47
[2025-10-26 09:59:29Z INFO ContainerOperationProvider] ACR token exchange attempt 1/6
[2025-10-26 09:59:30Z INFO ContainerOperationProvider] ACR OAuth2 response status: OK
[2025-10-26 09:59:30Z INFO ContainerOperationProvider] Successfully received ACR refresh token
[2025-10-26 09:59:30Z INFO ContainerOperationProvider] Leaving GetAcrPasswordFromAADToken (Duration: 00:00:00.9123456)

7. Docker Login

[2025-10-26 09:59:30Z INFO DockerCommandManager] Executing: docker login --username "00000000-0000-0000-0000-000000000000" --password "***" https://myacrmantavya.azurecr.io
[2025-10-26 09:59:31Z INFO DockerCommandManager] Docker login output: Login Succeeded
[2025-10-26 09:59:31Z INFO DockerCommandManager] Docker login exit code: 0
[2025-10-26 09:59:31Z INFO ContainerOperationProvider] Successfully authenticated to container registry

8. Docker Image Pull

[2025-10-26 09:59:31Z INFO ContainerOperationProvider] Image repository: myacrmantavya.azurecr.io, name: node, tag: 18-alpine
[2025-10-26 09:59:31Z INFO DockerCommandManager] Executing: docker pull myacrmantavya.azurecr.io/node:18-alpine
[2025-10-26 09:59:32Z INFO DockerCommandManager] 18-alpine: Pulling from node
[2025-10-26 09:59:32Z INFO DockerCommandManager] Digest: sha256:929b04d7c782f04f615cf785488fed452b6569f87c73ff666ad553a7554f0006
[2025-10-26 09:59:33Z INFO DockerCommandManager] Status: Downloaded newer image for myacrmantavya.azurecr.io/node:18-alpine
[2025-10-26 09:59:33Z INFO DockerCommandManager] Docker pull exit code: 0
[2025-10-26 09:59:33Z INFO ContainerOperationProvider] Successfully pulled container image

9. Container OS Detection

[2025-10-26 09:59:33Z INFO DockerCommandManager] Executing: docker inspect --format="{{.Os}}" myacrmantavya.azurecr.io/node:18-alpine
[2025-10-26 09:59:33Z INFO DockerCommandManager] Container OS: linux
[2025-10-26 09:59:33Z INFO ContainerOperationProvider] Container OS detected: Linux
[2025-10-26 09:59:33Z INFO ContainerOperationProvider] Verifying Node.js availability in Linux container
[2025-10-26 09:59:33Z INFO DockerCommandManager] Executing: docker run --rm myacrmantavya.azurecr.io/node:18-alpine node --version
[2025-10-26 09:59:34Z INFO DockerCommandManager] Node version output: v18.19.0
[2025-10-26 09:59:34Z INFO ContainerOperationProvider] Node.js detected: v18.19.0

10. Docker Logout

[2025-10-26 09:59:34Z INFO DockerCommandManager] Executing: docker logout https://myacrmantavya.azurecr.io
[2025-10-26 09:59:34Z INFO DockerCommandManager] Docker logout output: Removing login credentials for myacrmantavya.azurecr.io
[2025-10-26 09:59:34Z INFO DockerCommandManager] Docker logout exit code: 0
[2025-10-26 09:59:34Z INFO ContainerOperationProvider] Leaving PullContainerAsync (Duration: 00:00:08.1234567)

11. Container Network Creation

[2025-10-26 09:59:34Z INFO ContainerOperationProvider] Creating container network for job
[2025-10-26 09:59:34Z INFO ContainerOperationProvider] Network name: vsts_network_ac600203cc0646aa93d85d21792d21ce
[2025-10-26 09:59:34Z INFO DockerCommandManager] Executing: docker network create --label <instance-label> vsts_network_ac600203cc0646aa93d85d21792d21ce
[2025-10-26 09:59:35Z INFO DockerCommandManager] Network ID: f8a9d2e1c4b7a3f6e9d8c7b6a5d4e3f2
[2025-10-26 09:59:35Z INFO DockerCommandManager] Docker network create exit code: 0
[2025-10-26 09:59:35Z INFO ContainerOperationProvider] Container network created successfully

12. Container Creation

[2025-10-26 09:59:35Z INFO ContainerOperationProvider] Entering StartContainerAsync
[2025-10-26 09:59:35Z INFO ContainerOperationProvider] Container name: test_wif_container
[2025-10-26 09:59:35Z INFO ContainerOperationProvider] Creating Docker container
[2025-10-26 09:59:35Z INFO DockerCommandManager] Executing: docker create --name test_wif_container_myacrmantavyaazurecrionode18alpine_426e1b --label <instance-label> --workdir /__w/1/s --network vsts_network_ac600203cc0646aa93d85d21792d21ce -v "C:\azure-pipelines-agent\_work\1":"/__w/1" -v "C:\azure-pipelines-agent\externals":"/__a/externals":ro -v "C:\azure-pipelines-agent\_work\_temp":"/__w/_temp" -v "C:\azure-pipelines-agent\_work\_tasks":"/__w/_tasks":ro -v "C:\azure-pipelines-agent\_work\_tool":"/__w/_tool" myacrmantavya.azurecr.io/node:18-alpine "node" -e "setInterval(function(){}, 24 * 60 * 60 * 1000);"
[2025-10-26 09:59:36Z INFO DockerCommandManager] Container ID: 2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12
[2025-10-26 09:59:36Z INFO DockerCommandManager] Docker create exit code: 0

13. Container Start

[2025-10-26 09:59:36Z INFO ContainerOperationProvider] Starting Docker container
[2025-10-26 09:59:36Z INFO DockerCommandManager] Executing: docker start 2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12
[2025-10-26 09:59:37Z INFO DockerCommandManager] Container started: 2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12
[2025-10-26 09:59:37Z INFO DockerCommandManager] Docker start exit code: 0
[2025-10-26 09:59:37Z INFO ContainerOperationProvider] Container started successfully
[2025-10-26 09:59:37Z INFO ContainerOperationProvider] Leaving StartContainerAsync (Duration: 00:00:02.3456789)

14. Container Health Check (for service containers)

[2025-10-26 09:59:37Z INFO ContainerOperationProvider] Skipping health check for job container (only service containers require health checks)

15. Container Mapping

[2025-10-26 09:59:37Z INFO ContainerOperationProvider] Building container mapping for environment variables
[2025-10-26 09:59:37Z INFO ContainerOperationProvider] Container mapping: {"test_wif_container":{"id":"2d963824579a85e284e4c20ec590787f87bee7f4b65842339c9d5d1024cc9a12"}}
[2025-10-26 09:59:37Z INFO ContainerOperationProvider] Leaving StartContainersAsync (Duration: 00:00:11.1234567)

Auth Failure Before

image

Auth Failure After (no behavior change)

image

@MantavyaDh MantavyaDh requested review from a team as code owners September 5, 2025 12:38
@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MantavyaDh MantavyaDh merged commit 2413afc into master Nov 7, 2025
22 checks passed
@MantavyaDh MantavyaDh deleted the users/mdhingra/ImproveContainerLogs branch November 7, 2025 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants