Prevent concurrent DeleteVolume/DeleteSnapshot during CreateVolume #6322
Open
nixpanic wants to merge 2 commits into
Open
Prevent concurrent DeleteVolume/DeleteSnapshot during CreateVolume #6322nixpanic wants to merge 2 commits into
nixpanic wants to merge 2 commits into
Conversation
6 tasks
| defer cs.VolumeLocks.Release(parentVol.VolID) | ||
| } | ||
| if rbdSnap != nil { | ||
| if acquired := cs.SnapshotLocks.TryAcquire(rbdSnap.VolID); !acquired { |
Contributor
There was a problem hiding this comment.
CreateVolumeFromSnapshot will fail with operation already exists since we already have locks there?
ceph-csi/internal/rbd/controllerserver.go
Lines 679 to 691 in 9d08300
Member
Author
There was a problem hiding this comment.
Good catch, I would expect the same indeed too. Will check it in more detail and correct it later.
Member
Author
There was a problem hiding this comment.
moved the locking all together, easier to follow the working that way.
When creating a volume from a source (either cloning from a volume or restoring from a snapshot), acquire locks on the source to prevent concurrent operations that could interfere with the clone/restore process. This prevents race conditions where the source volume or snapshot could be modified or deleted while being used as a source for creating a new volume. Assisted-by: AskBob <askbob@ibm.com> Signed-off-by: Niels de Vos <ndevos@ibm.com>
When creating a volume from a source (either cloning from a volume or restoring from a snapshot), acquire locks on the source to prevent concurrent operations that could interfere with the clone/restore process. This prevents race conditions where the source volume or snapshot could be modified or deleted while being used as a source for creating a new volume. Assisted-by: AskBob <askbob@ibm.com> Signed-off-by: Niels de Vos <ndevos@ibm.com>
231b980 to
f63a16c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Describe what this PR does
There is a race condition possible when
CreateVolumeuses a source volume/snapshot and the source is deleted at the same time. The creation of the volume may fail in weird ways. By grabbing a lock for the source volume/snapshot, a concurrentDeleteVolumeprocedure has to wait until theCreateVolumeprocedure has finished.Related issues
Fixes: #6321
Note: NFS does not use the source volume/snapshot in any way, it forwards the
CreateVolumecall on to the CephFS Controller. There is no need for the added locking in NFS. NVMe-oF only gets the Clone/Restore functionality through the work-in-progress #6277.Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the<job-name>after unrelatedfailure (please report the failure too!)