{2025.06}[2024a] PyTorch 2.6.0 #1314
Conversation
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace |
|
New job on instance
edit: job exceeded its 1-day time limit. According to https://gist.github.com/boegelbot/b64a7290ab9a66973b6aed13ec38a1dd, this could take ~2 days. |
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2 |
|
New job on instance
|
|
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2 |
|
New job on instance
|
I'm seeing lots of errors that look like the following one (i.e. with
@Flamefire, have you seen such errors before by any chance? |
|
May be due to having a too new glibc, according to conda-forge/pytorch-cpu-feedstock#350 (comment)? |
Is /tmp mounted with
What is really strange: the dlopen error leads to a tensor comparison error, which doesn't make sense to me.
You could also try with 2.7.1: easybuilders/easybuild-easyconfigs#23923 |
I checked in our build container, but it doesn't seem to use that mount option |
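For anyone wanting to repeat that check, here is a minimal sketch of how the mount options of `/tmp` could be listed from inside the container. It assumes a Linux system exposing `/proc/mounts`; the helper name `mount_options` is hypothetical, and the sketch deliberately does not name the specific option asked about above.

```python
from typing import List, Optional


def mount_options(mounts_text: str, mount_point: str) -> Optional[List[str]]:
    """Return the options of the last entry mounted at mount_point, or None.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Later entries shadow earlier ones, so keep overwriting on a match.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options
```

Typical use would be `mount_options(open("/proc/mounts").read(), "/tmp")`; a result of `None` means `/tmp` is not a separate mount, in which case the options of the parent filesystem apply.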
|
Also see a lot of these: |
That will require some more work, e.g. #1278 needs to be deployed first. We don't have a CPU-only version of 2.7.1 as far as I can see? |
This could be the culprit: I see some Google results suggesting "cannot enable executable stack as shared object requires: Invalid argument" can be fixed with
But isn't |
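To narrow down which shared objects trigger the "cannot enable executable stack" message, one option is to inspect the `PT_GNU_STACK` program header of each library. The following is a rough sketch under stated assumptions: it handles only 64-bit ELF images (which covers the aarch64 and x86_64 targets in this thread), the function name `requests_exec_stack` is hypothetical, and it only reports the flag without changing anything.

```python
import struct
from typing import Optional

PT_GNU_STACK = 0x6474E551  # program header type that carries stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def requests_exec_stack(data: bytes) -> Optional[bool]:
    """Return True/False if a PT_GNU_STACK header is present, else None.

    Assumption: data holds a complete 64-bit ELF image.
    """
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2:
        raise ValueError("not a 64-bit ELF image")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    # ELF64 header: e_phoff at offset 32, e_phentsize and e_phnum at 54 and 56.
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    for i in range(e_phnum):
        # ELF64 program header starts with p_type (4 bytes) and p_flags (4 bytes).
        p_type, p_flags = struct.unpack_from(
            endian + "II", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None
```

A call such as `requests_exec_stack(open(path, "rb").read())` returning `True` would mark that library as a candidate for the dlopen failures; `None` means the header is absent and the loader's default applies.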
I stopped creating a CPU-only version after users complained that the "strong(ly) GPU-accelerated" module doesn't support GPUs at all. |
I also found similar results, and when searching in the PyTorch repo I also found a commit that adds that flag here:
We filter binutils in EESSI (https://github.com/EESSI/software-layer-scripts/blob/main/EESSI-extend-easybuild.eb#L48), so in that sense it's correct that it's picking up this
That definitely makes sense! I wanted to try the CPU-only version first, as I imagined it would cause fewer build issues 😅. But then I'll wait until CUDA for 2025.06 is ingested, and will then give 2.7.1 a try. |
This will be fun.