[Feature]: Support multi-architecture container images via CVMFS multiarch layout #8454

@aldbr

User Story

As a DIRAC site administrator deploying ARM (aarch64) and other non-x86_64 worker nodes, I want DIRAC to automatically resolve the correct container image for the worker node's architecture, so that Pilot-Jobs can seamlessly execute payloads on heterogeneous clusters without manual per-architecture configuration.

Feature Description

DIRAC currently hardcodes or configures a single container image path (via ContainerRoot) that implicitly assumes x86_64. With the introduction of ARM resources and the CVMFS unpacked.cern.ch support for multi-architecture images, DIRAC needs to:

  1. Detect the worker node architecture at runtime (using platform.machine() / uname -m).
  2. Map it to the OCI architecture name used by the CVMFS multiarch layout (e.g. x86_64 maps to amd64, aarch64 maps to arm64).
  3. Resolve the correct image path under the CVMFS .multiarch directory structure: /cvmfs/unpacked.cern.ch/.multiarch/<arch>/<registry>/<image>:<tag>
    For example: /cvmfs/unpacked.cern.ch/.multiarch/arm64/registry.hub.docker.com/library/alma9:latest
    Note: Variant handling is deferred. Initially, DIRAC only resolves by architecture (uname -m), not by variant. Variant support can be added later if needed.
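
The three steps above could be sketched roughly as follows. This is only an illustration, not existing DIRAC code: the names MACHINE_TO_OCI_ARCH, oci_arch and multiarch_image_path are hypothetical, and the mapping only covers the two architectures named in this issue.

```python
import platform

# Hypothetical mapping from `uname -m` names to OCI architecture names.
# Only the two pairs mentioned in this issue are listed.
MACHINE_TO_OCI_ARCH = {"x86_64": "amd64", "aarch64": "arm64"}

# Root of the CVMFS multiarch layout described in this issue.
CVMFS_MULTIARCH_ROOT = "/cvmfs/unpacked.cern.ch/.multiarch"


def oci_arch(machine=None):
    """Return the OCI architecture name for a machine name, or None if unknown.

    When no machine name is given, the worker node is probed at runtime.
    """
    if machine is None:
        machine = platform.machine()
    return MACHINE_TO_OCI_ARCH.get(machine)


def multiarch_image_path(image_ref, machine=None, root=CVMFS_MULTIARCH_ROOT):
    """Build <root>/<arch>/<registry>/<image>:<tag> for the given image reference.

    `image_ref` is assumed to already be of the form <registry>/<image>:<tag>.
    Returns None when the architecture is not in the mapping.
    """
    arch = oci_arch(machine)
    if arch is None:
        return None
    return f"{root}/{arch}/{image_ref}"


if __name__ == "__main__":
    print(multiarch_image_path("registry.hub.docker.com/library/alma9:latest"))
```

Returning None for an unmapped architecture leaves the decision (fail, or fall back to the existing configuration) to the caller, which keeps the sketch neutral on that point.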

It should affect both code paths that launch containers:

  • SingularityComputingElement (job execution via the JobAgent)
  • dirac-apptainer-exec (Pilot-Job command execution)

It should maintain backward compatibility with the existing ContainerRoot configuration option: if ContainerRoot is set and the new multiarch path doesn't exist, fall back to ContainerRoot.
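
One possible reading of this fallback rule, as a minimal sketch: prefer the multiarch path when it exists on disk, otherwise use the configured ContainerRoot. The function name select_container_root is hypothetical, and treating "exists" as "is a directory" is an assumption based on unpacked images being directory trees.

```python
import os


def select_container_root(multiarch_path, container_root):
    """Pick the image location to use, or None if neither is usable.

    multiarch_path: candidate path under the CVMFS .multiarch layout (may be None).
    container_root: value of the existing ContainerRoot option (may be None or empty).
    """
    # Assumption: an unpacked image is a directory, so check with isdir().
    if multiarch_path and os.path.isdir(multiarch_path):
        return multiarch_path
    # Backward compatibility: fall back to the configured ContainerRoot.
    if container_root:
        return container_root
    return None
```

With this ordering, sites that only set ContainerRoot and have no multiarch image published would see no change in behaviour.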

Definition of Done

  • Multi-architecture images hosted in CVMFS unpacked are supported
  • DIRAC should pick the right container based on the underlying architecture
  • Unit tests
  • Documentation
  • Backward compatibility (no breaking change)

Alternatives Considered

No response

Related Issues

First attempt: #7589. At that time, CVMFS unpacked did not yet support multi-architecture images; we are now in the process of testing that support with LHCb images.

Additional Context

No response
