Move towards using MPL in the GPU version #335
This is a great step!
So the logic needs to be a bit different for this to work.
There are preexisting references to
Following offline discussions with @wdeconinck, I've added support for FIAT's MPI_F08 feature (on by default). This further reduces the number of configurations where we need to call MPI directly (what I call "raw" MPI). In fact, the only remaining configuration is when ecTrans is built against a FIAT version earlier than {next version to be released} (a release with MPI_F08 compatibility hasn't been made yet). If we in future make {next FIAT version to be released} the minimum supported FIAT version, we can simply delete all raw MPI calls. I will do some testing to make sure everything works before this is merged.
Problems on LUMI... I wonder if we have to add an exception for CCE.
I think this is the same Cray issue biting us again: #157 (comment)
Unfortunately I think we will have to enable
Force-pushed from 8f77c9d to d9dba97.
Wow, what a nightmare. After a lot of tedious debugging, I noticed that I had removed the
During debugging I noticed some issues with
The plot thickens:
Force-pushed from b40c54a to 61593ab.
I completely forgot about this PR. The non-GPU-aware MPI functionality is currently broken, so it would be good to get these changes in to fix it. As a reminder: with this PR, when GPU-aware MPI is disabled we fall back on MPL, so we don't need to search for an MPI library in that case. That search is currently missing, which is why configuration fails when GPU-aware MPI is disabled. Based on my experiments above, it seems that we can't yet rely on MPL for direct GPU-to-GPU communication, so I suggest that for now we continue to rely on raw MPI calls for that. Happy to merge this, @wdeconinck?
OK for me, but can we verify that it works on LUMI-G?
I'll take a look.
LUMI seems to be a bit messed up at the moment. We use CCE 17 in the CI, but it isn't available anymore; only CCE 19 is, and I can't even build FIAT with that version (internal compiler error).
Force-pushed from 1cc80b1 to 727f8f2.
This is set when GPU-aware communication is enabled and FIAT doesn't support MPI_F08 (either because MPI_F08 support is disabled, or because we're using an older version of FIAT that has no MPI_F08 support at all).
Co-authored-by: Willem Deconinck <willem.deconinck@ecmwf.int>
Force-pushed from 727f8f2 to 499a4c1.
Just adding a link to ecmwf-ifs/fiat#90 here. We should revisit the use of raw MPI once MPL supports device-resident arrays.
This is a slightly less brute-force alternative to PR #334 that also lays the groundwork for eventually relying entirely on MPL in the GPU code path. Let me explain...
With this branch, if you disable GPU_AWARE_MPI, ecTrans does not require an MPI library. None will be linked against, and there will be no calls to MPI in any compiled ecTrans object code. Whether MPI is called "under the hood" of MPL depends entirely on whether FIAT was compiled with or without MPI; in the latter case, the MPI serial fallback is used. This means you can test on GPU platforms without an MPI installation simply by building FIAT without MPI and disabling GPU_AWARE_MPI.
For now, GPU_AWARE_MPI requires direct calls to MPI, so only that configuration needs to link against MPI::MPI_Fortran explicitly. Eventually MPL should support being passed GPU buffers, and when that happens we can finally delete all references to MPI from ecTrans and rely entirely on MPL, as we already do for the CPU version.
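To make the configuration matrix described above concrete, here is a minimal Python sketch that models which communication path the GPU build ends up using. It is purely illustrative: CommsConfig and comms_config are hypothetical names that exist in neither ecTrans nor FIAT, the two inputs stand for the GPU_AWARE_MPI option and whether FIAT was built with MPI, and the sketch deliberately ignores the MPI_F08 detail discussed in the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommsConfig:
    """Hypothetical summary of how the GPU build of ecTrans communicates."""
    links_mpi_fortran: bool  # ecTrans links MPI::MPI_Fortran explicitly
    raw_mpi_calls: bool      # ecTrans object code calls MPI directly
    mpl_backend: str         # what MPL uses "under the hood"


def comms_config(gpu_aware_mpi: bool, fiat_built_with_mpi: bool) -> CommsConfig:
    """Model the behaviour described in the PR (illustrative only)."""
    # MPL's backend is decided by how FIAT was built, not by ecTrans.
    backend = "MPI" if fiat_built_with_mpi else "MPI serial fallback"
    if gpu_aware_mpi:
        # For now, GPU buffers are handed straight to MPI, so ecTrans
        # must call MPI directly and link against MPI::MPI_Fortran itself.
        return CommsConfig(links_mpi_fortran=True, raw_mpi_calls=True,
                           mpl_backend=backend)
    # Without GPU-aware MPI, all communication goes through MPL and
    # ecTrans needs no MPI library of its own.
    return CommsConfig(links_mpi_fortran=False, raw_mpi_calls=False,
                       mpl_backend=backend)


if __name__ == "__main__":
    # Print the full matrix of the two hypothetical inputs.
    for gpu in (False, True):
        for fiat_mpi in (False, True):
            print(f"GPU_AWARE_MPI={gpu}, FIAT with MPI={fiat_mpi}:",
                  comms_config(gpu, fiat_mpi))
```

The MPI-free testing setup mentioned above corresponds to comms_config(gpu_aware_mpi=False, fiat_built_with_mpi=False). The sketch does not reject questionable combinations such as GPU_AWARE_MPI with an MPI-less FIAT; once MPL accepts GPU buffers, the GPU_AWARE_MPI branch would simply collapse into the MPL-only branch.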