Update NVHPC version to 26 #386

Draft
samhatfield wants to merge 13 commits into develop from update-ci-nvhpc-26.3

Conversation

@samhatfield
Collaborator

This PR updates the NVHPC version to 26 (currently 26.1 but we could consider 26.3).

At some point, nvfortran with bounds checking enabled (-Mbounds) became much stricter about how it handles zero-sized arrays. These arrays crop up in a few places in ecTrans, especially when running with NPRTRV /= 1 and NPROC > 1. A task can have no resident data for a spectral field if IVSET hasn't assigned it any, so we pass around arrays with shapes like (0, n). Mathematically this shouldn't be a problem as long as we index them appropriately (the alternative would be guards all over the place that skip subroutine calls when no local fields are present). But nvfortran now sometimes flags accesses to these arrays as "out of bounds", e.g. PSP(:,:) where the first dimension is zero-sized.

So this PR attempts to resolve all of the offending cases as well. A few of my fixes are a bit hacky so I'm open to suggestions for better ways around this.
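
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the zero-sized array situation described above. It is not taken from ecTrans: the program, the subroutine `scale_fields` and the value of `n` are hypothetical, and only the array name `psp` echoes the PSP example. It shows a (0, n) array being passed both as a whole array and as a `(:,:)` section, with loops that never touch an element. Both calls are standard-conforming Fortran; whether a particular nvfortran release flags them under -Mbounds is exactly what this PR is trying to establish, so no compiler behaviour is asserted here.

```fortran
program zero_sized_demo
  implicit none
  integer, parameter :: n = 4          ! hypothetical second extent
  real, allocatable :: psp(:,:)

  ! A task that owns no spectral fields ends up with a (0, n) array
  allocate(psp(0, n))

  print *, 'size  =', size(psp)
  print *, 'shape =', shape(psp)

  ! Whole-array and full-section arguments; neither call touches an element
  call scale_fields(psp)
  call scale_fields(psp(:,:))

contains

  subroutine scale_fields(p)
    real, intent(inout) :: p(:,:)
    integer :: jf, jn
    do jn = 1, size(p, 2)
      do jf = 1, size(p, 1)            ! zero trips when the first extent is 0
        p(jf, jn) = 2.0 * p(jf, jn)
      end do
    end do
  end subroutine scale_fields

end program zero_sized_demo
```

Compiling this with and without bounds checking on the compiler versions under discussion might be a quick way to separate compiler behaviour from anything specific to ecTrans.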

Comment thread src/etrans/cpu/internal/eftdir_ctl_mod.F90
Comment thread src/programs/ectrans-lam-benchmark.F90 Outdated
Comment on lines +1649 to +1651
if (size(sp3d) > 0) then
call egath_spec(kfgathg=numfld, kto=[(1, i = 1, numfld)], kvset=ivset, pspec=sp3d(:,:,jfld))
endif
Collaborator

@wdeconinck wdeconinck Apr 22, 2026

This could cause an MPI deadlock if this rank is expected to communicate within this routine (SEND/RECEIVE, BARRIER, etc.), i.e. when size(sp3d) > 0 on some ranks and size(sp3d) == 0 on others.
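
To illustrate the concern in generic terms, here is a small MPI sketch. It does not reproduce what egath_spec does internally (that is not shown in this thread); the data distribution, variable names and the choice of MPI_Gatherv are assumptions made purely for illustration. The point is that ranks owning no data still enter the collective with a count of zero instead of skipping the call.

```fortran
program gather_with_empty_ranks
  use mpi
  implicit none
  integer :: ierr, rank, nproc, nlocal, i
  integer, allocatable :: counts(:), displs(:)
  real, allocatable :: local(:), global(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  ! Hypothetical distribution: even ranks own two values, odd ranks own none
  nlocal = merge(2, 0, mod(rank, 2) == 0)
  allocate(local(nlocal))
  local = real(rank)

  ! Every rank learns how much each rank will contribute
  allocate(counts(nproc), displs(nproc))
  call MPI_Allgather(nlocal, 1, MPI_INTEGER, counts, 1, MPI_INTEGER, &
    &                MPI_COMM_WORLD, ierr)
  displs(1) = 0
  do i = 2, nproc
    displs(i) = displs(i - 1) + counts(i - 1)
  end do
  allocate(global(sum(counts)))

  ! All ranks enter the collective, including those with size(local) == 0.
  ! Wrapping this call in "if (size(local) > 0)" would mean some ranks never
  ! reach it, which can leave the participating ranks waiting indefinitely.
  call MPI_Gatherv(local, nlocal, MPI_REAL, global, counts, displs, &
    &              MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'gathered', size(global), 'values'

  call MPI_Finalize(ierr)
end program gather_with_empty_ranks
```

If the gather routine here behaves like a collective, any guard would need to be evaluated identically on every participating rank, or the zero-sized case would need to be handled inside the routine.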

Collaborator Author

Damn, I think you're right. The other tricks (LBOUND etc.) didn't work here. Let me investigate further.

Comment thread src/trans/cpu/internal/read_legpol_mod.F90 Outdated
Comment thread src/trans/cpu/internal/spnormc_mod.F90 Outdated

  IF (NPROC > 1.AND.MYPROC /= KMASTER) THEN
-   CALL MPL_SEND(PSM(:,:),KDEST=NPRCIDS(KMASTER),KTAG=ITAG,&
+   CALL MPL_SEND(PSM,KDEST=NPRCIDS(KMASTER),KTAG=ITAG,&
Contributor

@dhaumont dhaumont Apr 23, 2026

I believe PSM(:,:) was required for some other compilers, otherwise it would not be there, no?

Collaborator Author

Ah, interesting, didn't know that. I thought the coder was just following our style guide, which I think recommends always passing colon indices even when every element is requested.

Contributor

@dhaumont dhaumont Apr 23, 2026

Not 100% sure, it was more of a question. Maybe @ddegrauwe or @RyadElKhatibMF will know better?
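
As background for the PSM versus PSM(:,:) question, the sketch below shows the language-level difference between a whole array and a full array section. It is a standalone illustration with a hypothetical callee `show_bounds` and made-up bounds; it does not use the MPL_SEND interface and says nothing about whether any particular compiler historically needed the section form.

```fortran
program section_vs_whole
  implicit none
  real :: psm(0:2, 5:7)                ! deliberately non-default lower bounds

  psm = 0.0

  ! A whole array keeps its declared bounds; an array section always has
  ! lower bounds of 1 in every dimension.
  print *, 'lbound(psm)      =', lbound(psm)
  print *, 'lbound(psm(:,:)) =', lbound(psm(:,:))

  ! An assumed-shape dummy sees lower bounds of 1 either way
  call show_bounds(psm)
  call show_bounds(psm(:,:))

contains

  subroutine show_bounds(p)
    real, intent(in) :: p(:,:)
    print *, 'in callee: lbound =', lbound(p), ' shape =', shape(p)
  end subroutine show_bounds

end program section_vs_whole
```

For a callee with an assumed-shape dummy the two forms look the same from the inside, so any difference in behaviour would have to come from how the compiler evaluates or checks the actual argument at the call site.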

@lukasm91
Collaborator

lukasm91 commented Apr 23, 2026

@samhatfield This is a known regression and should be fixed in 26.3.

This provides some context:
https://forums.developer.nvidia.com/t/a-bug-of-nvfortran-26-1-regarding-empty-arrays/358946/5

@samhatfield
Collaborator Author

samhatfield commented Apr 23, 2026

> @samhatfield This is a known regression and should be fixed in 26.3.
>
> This provides some context: https://forums.developer.nvidia.com/t/a-bug-of-nvfortran-26-1-regarding-empty-arrays/358946/5

😮 we were assuming 26.1 and 26.3 behave similarly. So maybe none of these changes are needed??

Edit: let's see... reverting all workarounds.

@wdeconinck
Collaborator

I thought I was testing all this with 26.3 as well and still seeing these issues.

@lukasm91
Collaborator

It is at least worth testing... if you still see issues with 26.3, I am happy to also have a look. If it is a compiler bug, I would prefer fixing the compiler rather than adding workarounds :D

@samhatfield
Collaborator Author

> It is at least worth testing... if you still see issues with 26.3, I am happy to also have a look. If it is a compiler bug, I would prefer fixing the compiler rather than adding workarounds :D

Yes, the issue is still present in 26.3, see https://github.com/ecmwf-ifs/ectrans/actions/runs/24828523187/job/72670702893?pr=386.

@lukasm91
Collaborator

Thanks, I will have a look!

@wdeconinck
Collaborator

Thanks Lukas for being proactive about this! Always nicer to have a more robust compiler without workarounds.

@samhatfield
Collaborator Author

We will revive this PR once 26.5 is released, which should be soon (next month perhaps).

@samhatfield samhatfield marked this pull request as draft May 5, 2026 07:35
@wdeconinck
Collaborator

@samhatfield could we just turn off bounds-checking for those compiler versions? Would that work for now?

@samhatfield samhatfield force-pushed the update-ci-nvhpc-26.3 branch from 9ec7c62 to 32c7b2e Compare May 12, 2026 11:31
@samhatfield
Collaborator Author

> @samhatfield could we just turn off bounds-checking for those compiler versions? Would that work for now?

I'm not actually sure where ecTrans gets its debug flags from: -Mlarge_arrays -g -O0 -Mbounds -traceback.

-Mlarge_arrays and -traceback are added explicitly by us, but who sets -g -O0 -Mbounds? Is it CMake itself? It's not ecBuild.

@lukasm91
Collaborator

Yes, it comes from -DCMAKE_BUILD_TYPE=Debug:
https://github.com/Kitware/CMake/blob/61510710b12ddff97c2c22e146cdfb399daa2dad/Modules/Compiler/PGI-Fortran.cmake#L13


4 participants