DART-MPI: call into MPI every once in a while for local put/get#712
DART-MPI: call into MPI every once in a while for local put/get#712devreal wants to merge 1 commit intodevelopmentfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## development #712 +/- ##
===============================================
+ Coverage 84.06% 85.19% +1.12%
===============================================
Files 336 336
Lines 24954 24945 -9
Branches 11349 11540 +191
===============================================
+ Hits 20977 21251 +274
Misses 3693 3693
+ Partials 284 1 -283
|
| #include <alloca.h> | ||
|
|
||
| /* the number of consecutive memcpy for local put/get before calling into MPI */ | ||
| #define NUM_CONSECUTIVE_MEMCPY 16 |
There was a problem hiding this comment.
Yes, I'm open to other suggestions :D
There was a problem hiding this comment.
at least document, that 16 is just an educated guess
| #include <alloca.h> | ||
|
|
||
| /* the number of consecutive memcpy for local put/get before calling into MPI */ | ||
| #define NUM_CONSECUTIVE_MEMCPY 16 |
There was a problem hiding this comment.
at least document, that 16 is just an educated guess
| #define NUM_CONSECUTIVE_MEMCPY 16 | ||
|
|
||
| /* number of performed local memcpy between calling into MPI */ | ||
| static _Thread_local int num_local_memcpy = 0; |
There was a problem hiding this comment.
this needs C11, is this already ensured by CMake? I also think its better to use <threads.h> and thread_local here.
There was a problem hiding this comment.
You're right, we don't officially require a C11 compiler. I am not going to open the discussion on whether we should abandon support for >20 year old compilers though...
| num_local_memcpy = 0; | ||
| return DART_OK; | ||
| } | ||
|
|
There was a problem hiding this comment.
get_shared_mem and put_shared_mem also call memcpy, are they also effected by this?
There was a problem hiding this comment.
They do not, as that doesn't depend on progress triggered locally.
6105977 to
151d79d
Compare
|
After having given this some more thought I modified the PR to always call into MPI for local memory accesses. If fast local access without progress is required the application should just dereference the native pointer provided by DASH. Once we are in DART we have lost the latency race anyway. Everything else just adds complexity. |
|
+1 |
We need to call into MPI every once in a while as otherwise polling on a local variable may not trigger progress if that is needed.