-
Notifications
You must be signed in to change notification settings - Fork 592
perf(cluster): use multithreads to optimize sendSnapshotByRawKV #3240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
Removed unused future header from slot_migrate.cc
|
@hll1213181368 Do you test how much it will be improved if you increase the number of threads? |
|
|
||
| // Wait til finish | ||
| for (auto &result : results) { | ||
| auto s = result.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if one of the threads fails? Others are still migrating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hll1213181368 Seems this comment was resolved? Did I miss anything?
| auto prefix = ComposeSlotKeyPrefix(namespace_, slot_range.start); | ||
| auto upper_bound = ComposeSlotKeyUpperBound(namespace_, slot_range.end); | ||
| int total_slots = slot_range.end - slot_range.start + 1; | ||
| unsigned int max_parallelism = std::thread::hardware_concurrency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to have an option instead of just querying nproc.
PragmaTwice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a performance improvement, I think it would be great if a benchmark (before and after this patch) is attached.
Refactor BatchSender to use global rate limiter instead of local bytes per second limit.
Removed the 'parallelism' parameter from the migrateSlotRange function and added a global rate limiter.
|
@hll1213181368 There are still two comments to be resolved, and you won't be able to mark them as resolved without any answer or actions. For #3240 (comment), we need to handle the partial failure case. For #3240 (comment), you should support a configuration, or explain what do you think. |
@git-hulk @PragmaTwice I have response this two comment. Do you have good idea? |
|
@hll1213181368 I didn't see that. Please check if your reply comments are in pending status. |
|
if one of the threads fails, Others thread still migrate. But the total result is NotOK. This slotRange migrate failed. we need to analysis the problem or retry to migrate this slotRange. @git-hulk Do you think so? Or do you have any good idea? |
|
@PragmaTwice how to set threads for this migrate task. I don't think more better idea. for example: migrate-fixed-thread、migrate-min-threads、migrate-max-threads. @git-hulk @PragmaTwice Later I will commit another pr. parallel to improve AsyncScanDBSize task. Similar to this deal principle. |
Those comments are pending and not visible to others. Please release them via the comment button. |
|
be542cf to
ca04900
Compare
|
@git-hulk Above I have reply comments. Better to solve one thread fails is to try migrate this failed slotRange again. |
I'm good with that allows the partial failure while doing the parallel migration. But the current implementation doesn't respect this behavior. We should collect and wait for all migration threads to finish, then check whether each is successful, partially successful, or a failure. |
@git-hulk The kvrocks-controller sends a range slots migration request, and the kvrocks node returns either success or failure result for the entire range. For a failed range, the kvrocks-controller is called again to retry the entire range. Successful slots within the range will be returned as successfully migrated, while failed slots will be migrated again. I think it's unnecessary for the kvrocks node to return which slots in the range succeeded and which failed. |





Use multithreads to improve slotRange migration speed of sendSnapshotByRawKV.