
Prevent concurrent DeleteVolume/DeleteSnapshot during CreateVolume #6322

Open
nixpanic wants to merge 2 commits into ceph:devel from nixpanic:source-vol-locking

Conversation

@nixpanic nixpanic commented Jun 4, 2026

Member

Describe what this PR does

There is a possible race condition when CreateVolume uses a source volume/snapshot and the source is deleted at the same time: the creation of the new volume may then fail in unexpected ways. By grabbing a lock on the source volume/snapshot, a concurrent DeleteVolume (or DeleteSnapshot) procedure has to wait until the CreateVolume procedure has finished.
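The locking idea can be illustrated with a minimal Go sketch. The `volumeLocks` type, `newVolumeLocks`, and `deleteVolume` below are hypothetical stand-ins, only modeled on the `TryAcquire`/`Release` pattern visible in the controller code quoted later in this thread; they are not the ceph-csi implementation. Note that `TryAcquire` does not block: the competing call fails immediately. A CSI driver would return `codes.Aborted` in that case, and the CSI spec asks the caller to retry with backoff, which is what makes DeleteVolume effectively wait.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeLocks is a hypothetical, minimal per-ID lock set, similar in spirit
// to the VolumeLocks/SnapshotLocks used by the controller. TryAcquire never
// blocks: it only reports whether the ID was free.
type volumeLocks struct {
	mu    sync.Mutex
	locks map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{locks: make(map[string]struct{})}
}

// TryAcquire marks id as busy and returns true, or returns false if id is
// already held by another operation.
func (l *volumeLocks) TryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, held := l.locks[id]; held {
		return false
	}
	l.locks[id] = struct{}{}
	return true
}

// Release frees id so that other operations can acquire it.
func (l *volumeLocks) Release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.locks, id)
}

// deleteVolume is a hypothetical DeleteVolume: it refuses to run while
// another operation holds the lock for the same volume ID. A CSI driver
// would return codes.Aborted here so that the CO retries later.
func deleteVolume(locks *volumeLocks, volID string) error {
	if !locks.TryAcquire(volID) {
		return fmt.Errorf("volume %s is busy, retry later", volID)
	}
	defer locks.Release(volID)
	// ... remove the volume ...
	return nil
}

func main() {
	locks := newVolumeLocks()

	// CreateVolume (clone) holds the lock on its source while copying.
	src := "source-vol"
	if !locks.TryAcquire(src) {
		panic("source unexpectedly busy")
	}
	fmt.Println("delete during clone:", deleteVolume(locks, src))

	// Once CreateVolume finishes and releases the lock, deletion succeeds.
	locks.Release(src)
	fmt.Println("delete after clone:", deleteVolume(locks, src))
}
```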

Related issues

Fixes: #6321

Note: NFS does not use the source volume/snapshot in any way; it forwards the CreateVolume call to the CephFS controller, so there is no need for the added locking in NFS. NVMe-oF gets the Clone/Restore functionality only with the work-in-progress #6277.


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@nixpanic nixpanic requested a review from a team June 4, 2026 14:44
@nixpanic nixpanic added component/cephfs Issues related to CephFS component/rbd Issues related to RBD labels Jun 4, 2026
		defer cs.VolumeLocks.Release(parentVol.VolID)
	}
	if rbdSnap != nil {
		if acquired := cs.SnapshotLocks.TryAcquire(rbdSnap.VolID); !acquired {

Contributor

Won't createVolumeFromSnapshot then fail with "operation already exists", since it already takes the snapshot lock itself?

func (cs *ControllerServer) createVolumeFromSnapshot(
	ctx context.Context,
	cr *util.Credentials,
	secrets map[string]string,
	rbdVol *rbdVolume,
	snapshotID string,
) error {
	if acquired := cs.SnapshotLocks.TryAcquire(snapshotID); !acquired {
		log.ErrorLog(ctx, util.SnapshotOperationAlreadyExistsFmt, snapshotID)
		return status.Errorf(codes.Aborted, util.VolumeOperationAlreadyExistsFmt, snapshotID)
	}
	defer cs.SnapshotLocks.Release(snapshotID)

Member Author

Good catch, I would expect the same. I will check it in more detail and correct it later.

Member Author

Moved all the locking together; that way it is easier to follow how it works.

nixpanic added 2 commits June 10, 2026 11:55
When creating a volume from a source (either cloning from a volume or
restoring from a snapshot), acquire locks on the source to prevent
concurrent operations that could interfere with the clone/restore process.

This prevents race conditions where the source volume or snapshot could
be modified or deleted while being used as a source for creating a new
volume.

Assisted-by: AskBob <askbob@ibm.com>
Signed-off-by: Niels de Vos <ndevos@ibm.com>
@nixpanic nixpanic force-pushed the source-vol-locking branch from 231b980 to f63a16c Compare June 10, 2026 09:55
@nixpanic nixpanic requested review from a team and iPraveenParihar June 10, 2026 09:55


Development

Successfully merging this pull request may close these issues.

Race Condition During Concurrent Clone and Delete (CephFS & RBD)
