dillon-giacoppo commented Jun 29, 2025

Fix for: #3643

The containerd snapshotter is an experimental feature (https://github.com/docker/docs/blob/1da5f51da8a9c40c4318c8cec90b3939f0a25ca2/content/manuals/engine/storage/containerd.md) that uses the containerd image store.

Using the containerd image store offers multiple benefits, primarily multi-platform images and Wasm container support.

When using containerd-snapshotter, rootfs points to the wrong directory, leading to this error in cAdvisor:

manager.go:1116] Failed to create existing container: ... failed to identify the read-write layer ID for container "<CONTAINER_ID>". - open /rootfs/var/lib/docker/image/overlayfs/layerdb/mounts/<CONTAINER_ID>/mount-id: no such file or directory

This PR adds support for the overlayfs driver and retrieves the root path directly from the containerd spec. With this PR, the containers load successfully:

[image]
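For readers who want to check whether a host is affected, the failing lookup from the error above can be reproduced outside cAdvisor. The following is a minimal Python sketch, not cAdvisor's actual Go code: the function names are hypothetical, and the `/rootfs` prefix, the Docker root, and the `overlayfs` directory name are assumptions copied from the error message.

```python
import os
from pathlib import PurePosixPath


def legacy_mount_id_path(container_id: str,
                         docker_root: str = "/var/lib/docker",
                         storage_driver: str = "overlayfs",
                         rootfs_prefix: str = "/rootfs") -> PurePosixPath:
    """Build the layerdb path that appears in the error message.

    All defaults are assumptions taken from that message; adjust them for
    hosts with a different Docker root or bind-mount prefix.
    """
    base = PurePosixPath(rootfs_prefix + docker_root)
    return (base / "image" / storage_driver / "layerdb" / "mounts"
            / container_id / "mount-id")


def has_legacy_layerdb_entry(container_id: str, **kwargs) -> bool:
    """Return True if the mount-id file exists for this container."""
    return os.path.isfile(str(legacy_mount_id_path(container_id, **kwargs)))
```

On a host that uses the containerd snapshotter, `has_legacy_layerdb_entry` would be expected to return `False` for running containers, which matches the `no such file or directory` part of the error.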

iwankgb left a comment

I would be more than happy to give it a chance, but I am no longer able to merge PRs in cAdvisor.

@wolfspyre

curious why this isn't merged?

Clovel commented Nov 18, 2025

Any news on this?

@wolfspyre

is something preventing this from merging? it LOOKS like all's copacetic?
any idea why pull-cadvisor-e2e (Expected) is stuck in "Waiting for status to be reported"?

@faizan-syed

@iwankgb
@dillon-giacoppo
Can this be merged? It's required, as there is a problem with cAdvisor on the new Docker version 29.x.

JPar99 commented Nov 29, 2025

cAdvisor Docker 29 Compatibility Fix - Changelog

Problem Statement

After upgrading to Docker 29.x on Ubuntu 24.04, cAdvisor was unable to monitor Docker containers. Prometheus metrics showed no container data, only the root filesystem metrics.

Root Cause: Docker 29.x introduced containerd-snapshotter as the default image storage backend, which changed the internal metadata structure. This broke cAdvisor's ability to read the expected /var/lib/docker/image/overlayfs/layerdb metadata directory.

Error encountered:


failed to identify the read-write layer ID for container "{container_id}".
open /rootfs/var/lib/docker/image/overlayfs/layerdb/mounts/{container_id}/mount-id: no such file or directory

Reference: #3749


Solution Summary

The fix involves two key changes:

  1. Disable containerd-snapshotter in Docker daemon configuration

  2. Upgrade cAdvisor to v0.53.0 (supports Docker 29's API v1.44+)

This allows cAdvisor to monitor containers via systemd cgroup integration instead of relying on the deprecated layerdb metadata.


Changes Made

1. Create/Update Docker Daemon Configuration

File: /etc/docker/daemon.json

Before: File did not exist (or had different configuration)

After:

{
  "features": {
    "containerd-snapshotter": false
  }
}

Action Required:

sudo tee /etc/docker/daemon.json > /dev/null << 'EOF'
{
  "features": {
    "containerd-snapshotter": false
  }
}
EOF

Then restart Docker:

sudo systemctl restart docker
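Note that the `tee` command above replaces the whole file, so any settings already present in `/etc/docker/daemon.json` would be lost. A minimal Python sketch of a merge that keeps existing keys could look like the following. The function name `set_containerd_snapshotter` is hypothetical; only the `features` / `containerd-snapshotter` key comes from the configuration shown above.

```python
import json


def set_containerd_snapshotter(config_text: str, enabled: bool) -> str:
    """Return daemon.json content with features.containerd-snapshotter set.

    Other keys in the existing configuration are preserved. An empty
    string is treated as "the file did not exist".
    """
    config = json.loads(config_text) if config_text.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    features = config.setdefault("features", {})
    if not isinstance(features, dict):
        raise ValueError('"features" must be a JSON object')
    features["containerd-snapshotter"] = enabled
    return json.dumps(config, indent=2) + "\n"
```

One way to use it is to read the current file (if any), pass its text through this function, write the result back with root privileges, and then restart Docker as described above.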

2. Update cAdvisor Image and Configuration

File: /home/user/docker/prometheus/docker-compose.yml

Before:

cadvisor:
  image: gcr.io/cadvisor/cadvisor:v0.45.0
  command:
    - '--housekeeping_interval=10s'
    - '--store_container_labels=true'

After:

cadvisor:
  image: ghcr.io/google/cadvisor:v0.53.0
  command:
    - '--housekeeping_interval=10s'
    - '--store_container_labels=true'

Key Changes:

  • Registry changed: gcr.io → ghcr.io (Google moved newer images to GitHub Container Registry)

  • Version changed: v0.45.0 → v0.53.0 (supports Docker 29's API v1.44+)

  • Removed --docker_only=true flag (doesn't work without layerdb structure)
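To check a compose file against these changes, an image reference can be split into registry, repository, and tag, and the tag compared with the minimum version. This is a rough Python sketch under simplifying assumptions: the helper names are hypothetical, the reference must carry a plain `vMAJOR.MINOR.PATCH` tag, and registries with a port number or digest-pinned references are not handled.

```python
def parse_image(ref: str) -> tuple:
    """Split 'registry/repo:tag' into (registry, repository, tag)."""
    name, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        raise ValueError(f"no tag found in image reference: {ref!r}")
    registry, _, repository = name.partition("/")
    return registry, repository, tag


def version_tuple(tag: str) -> tuple:
    """Turn 'v0.53.0' into (0, 53, 0) so versions compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def meets_minimum(ref: str, minimum: str = "v0.53.0") -> bool:
    """Return True if the image tag is at least the given version."""
    return version_tuple(parse_image(ref)[2]) >= version_tuple(minimum)
```

With the two images from this section, `meets_minimum("gcr.io/cadvisor/cadvisor:v0.45.0")` is false and `meets_minimum("ghcr.io/google/cadvisor:v0.53.0")` is true.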

Action Required:

cd /home/user/docker/prometheus
docker compose up -d cadvisor
docker compose restart cadvisor

Verification Steps

1. Verify Docker Configuration

cat /etc/docker/daemon.json

Expected output:

{
  "features": {
    "containerd-snapshotter": false
  }
}

2. Verify Docker Info

docker info | grep -i snapshotter

Expected: the output contains no driver-type: io.containerd.snapshotter.v1 line (empty output is fine).
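The same check can be scripted against captured `docker info` text. Below is a small Python sketch; the function name is hypothetical, and it assumes the snapshotter is reported on a `driver-type:` line as quoted in this step.

```python
def uses_containerd_snapshotter(docker_info: str) -> bool:
    """Return True if any 'driver-type' line mentions a snapshotter."""
    for line in docker_info.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower() == "driver-type" and "snapshotter" in value.lower():
            return True
    return False
```

Feeding it the output of `docker info` should return `False` once the daemon has been restarted with the snapshotter disabled.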

3. Verify cAdvisor Logs

docker logs prometheus-cadvisor | grep "Registration of the docker container factory"

Expected output:


I... factory.go:223] Registration of the docker container factory successfully

4. Verify Metrics Collection

curl -s http://localhost:8029/metrics | grep "container_memory_usage_bytes" | grep "docker-" | head -5

Should show metrics like:


container_memory_usage_bytes{...id="/system.slice/docker-{container_id}.scope"...} {value} {timestamp}
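To list which containers actually appear in the scrape, the container IDs can be pulled out of the `id` label. The following Python sketch assumes the label format shown above (`/system.slice/docker-{container_id}.scope`, with a hexadecimal ID); the function name and the sample values in the test are illustrative only.

```python
import re

# Matches the id label only, not other labels whose names end in "id".
_SCOPE_ID = re.compile(r'[{,]id="/system\.slice/docker-([0-9a-f]+)\.scope"')


def docker_container_ids(metrics_text: str,
                         metric: str = "container_memory_usage_bytes") -> set:
    """Collect container IDs from lines of one metric in /metrics output."""
    ids = set()
    for line in metrics_text.splitlines():
        if not line.startswith(metric + "{"):
            continue  # skip comments and other metrics
        match = _SCOPE_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids
```

Comparing the returned set with the IDs from `docker ps --no-trunc -q` is one way to spot containers that cAdvisor is not reporting.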

5. Verify Prometheus Scraping

curl -s 'http://localhost:12090/api/v1/query?query=container_memory_usage_bytes' | jq '.data.result | length'

Should return a number greater than 20 (at least system cgroups + Docker containers)
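The total count alone does not show how many of the series belong to Docker containers. A short Python sketch that splits the count is given below; it assumes the standard Prometheus HTTP API response shape (`status`, `data.result[].metric`), and the function name is hypothetical.

```python
def count_series(response: dict) -> tuple:
    """Return (total series, series whose id label contains 'docker-')."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    results = response["data"]["result"]
    docker = sum(
        1 for series in results
        if "docker-" in series.get("metric", {}).get("id", "")
    )
    return len(results), docker
```

The input is the decoded JSON body of the query above, for example the result of `json.loads` on the `curl` output.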


How It Works After Fix

Container Monitoring Flow:


Docker Container
↓
systemd cgroup scope: /system.slice/docker-{container_id}.scope
↓
cAdvisor reads cgroup stats directly (no layerdb needed)
↓
cAdvisor exports metrics: container_memory_usage_bytes, container_cpu_usage_seconds_total, etc.
↓
Prometheus scrapes metrics from http://cadvisor:8080/metrics
↓
Grafana visualizes the data

Key Insight: Instead of reading Docker metadata from layerdb, cAdvisor now reads container resource metrics directly from the systemd cgroup hierarchy. This is actually more reliable and doesn't require Docker-specific metadata.
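The cgroup values behind these metrics can be inspected by hand. The Python sketch below is not cAdvisor's implementation; it assumes cgroup v2 mounted at `/sys/fs/cgroup`, the systemd cgroup driver, and the `system.slice` scope naming shown in the flow above. Both function names are hypothetical.

```python
from pathlib import Path, PurePosixPath


def docker_scope_dir(container_id: str,
                     cgroup_root: str = "/sys/fs/cgroup",
                     slice_name: str = "system.slice") -> PurePosixPath:
    """Build the cgroup directory for a container's systemd scope."""
    return PurePosixPath(cgroup_root) / slice_name / f"docker-{container_id}.scope"


def read_memory_current(container_id: str, **kwargs) -> int:
    """Read current memory usage in bytes from the cgroup v2 memory.current file."""
    path = Path(str(docker_scope_dir(container_id, **kwargs) / "memory.current"))
    return int(path.read_text().strip())
```

Here `container_id` is the full container ID, as printed by `docker ps --no-trunc -q`.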


Reverting the Fix (When cAdvisor Fixes containerd-snapshotter Support)

When to revert: After cAdvisor releases a version with full containerd-snapshotter support (currently being worked on in PR #3709)

Step 1: Re-enable containerd-snapshotter in Docker

File: /etc/docker/daemon.json

Change from:

{
  "features": {
    "containerd-snapshotter": false
  }
}

Change to:

{
  "features": {
    "containerd-snapshotter": true
  }
}

Or delete the file entirely (containerd-snapshotter is the default in Docker 29.x):

sudo rm /etc/docker/daemon.json

Restart Docker:

sudo systemctl restart docker

Step 2: Update cAdvisor to the Fixed Version

File: /home/user/docker/prometheus/docker-compose.yml

Change from:

cadvisor:
  image: ghcr.io/google/cadvisor:v0.53.0
  command:
    - '--housekeeping_interval=10s'
    - '--store_container_labels=true'

Change to:

cadvisor:
  image: ghcr.io/google/cadvisor:v0.55.0 # or newer version with containerd-snapshotter support
  command:
    - '--housekeeping_interval=10s'
    - '--store_container_labels=true'
    - '--docker_only=true' # Can re-enable this flag

Note: Replace v0.55.0 with the actual version that includes the PR #3709 fixes. Check the cAdvisor releases page for a version with containerd-snapshotter support.

Step 3: Restart cAdvisor

cd /home/user/docker/prometheus
docker compose up -d cadvisor

Step 4: Verify

docker logs prometheus-cadvisor | tail -20
curl -s http://localhost:8029/metrics | grep container_memory_usage_bytes | head -5

Why This Temporary Fix is Safe

  1. No data loss: Existing Docker containers continue to run normally

  2. Minimal performance impact: the difference between the classic storage driver (snapshotter disabled) and containerd-snapshotter is negligible for most workloads

  3. Reversible: Can re-enable containerd-snapshotter at any time

  4. Widely used: This is the recommended community workaround until cAdvisor releases a proper fix

  5. Better visibility: You now have full container monitoring working, which is more critical than the storage backend optimization


Related Issues and PRs


Files Modified

  1. /etc/docker/daemon.json - Created/Updated

  2. /home/user/docker/prometheus/docker-compose.yml - Updated cadvisor section

  3. /home/user/docker/prometheus/prometheus.yml - Added metric_relabel_configs (optional, for cleaner metrics)


Testing Checklist

  • Docker daemon restarts successfully after config change

  • cAdvisor container starts without errors

  • cAdvisor logs show successful Docker factory registration

  • cAdvisor /metrics endpoint returns container metrics

  • Prometheus successfully scrapes cAdvisor metrics

  • Container memory/CPU metrics visible in Prometheus

  • Multiple Docker containers being monitored

  • No metric collection errors in logs


Support and Monitoring

Monitor these metrics going forward:

# Docker container count
curl -s 'http://localhost:12090/api/v1/query?query=container_memory_usage_bytes' | jq '.data.result[] | select(.metric.id | contains("docker-")) | .metric.id' | wc -l

# Disk usage (if concerned about storage)
df -h /var/lib/docker

# Docker info
docker info

Last Updated: November 29, 2025

Status: ✅ Working - cAdvisor v0.53.0 successfully monitoring all Docker containers

allenday commented Dec 1, 2025

please merge it already!

dims merged commit 7eda190 into google:master on Dec 2, 2025
7 checks passed