Commit 2affae3

WIP: pkg/agent: wait for all volumes to be detached before rebooting
This commit provides a PoC implementation of the agent waiting for all
volumes attached to the node to be detached, as an extra step after
draining the node. Shutting down a Pod does not mean its volume has been
detached: usually the CSI agent runs as a DaemonSet on the node and takes
care of detaching the volume from the node when the Pod shuts down.

This commit improves the rebooting experience. Right now, if the CSI
agent does not have enough time to detach the volumes from the node, the
node gets rebooted while the volumes are still attached, so the pods
using them have no way to move to other nodes. This effectively
increases the downtime for stateful workloads.

This commit still requires tests and a better interface for users.

If someone wants to try this feature on their own cluster, I've published
the image I've been testing with:

quay.io/invidian/flatcar-linux-update-operator:97c0dee50c807dbba7d2debc59b369f84002797e

Closes #30

Signed-off-by: Mateusz Gozdek <[email protected]>
1 parent 73117aa commit 2affae3

2 files changed: 30 additions & 0 deletions

examples/deploy/rbac/cluster-role.yaml

Lines changed: 6 additions & 0 deletions
@@ -55,6 +55,12 @@ rules:
       - daemonsets
     verbs:
       - get
+  - apiGroups:
+      - storage.k8s.io
+    resources:
+      - volumeattachments
+    verbs:
+      - list
   - apiGroups:
       - policy
     resourceNames:

pkg/agent/agent.go

Lines changed: 24 additions & 0 deletions
@@ -291,6 +291,30 @@ func (k *klocksmith) process(ctx context.Context) error {
 
 	klog.Info("Node drained, rebooting")
 
+	for {
+		attachments, err := k.clientset.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
+		if err != nil {
+			klog.Errorf("Listing volume attachments: %v", err)
+			continue
+		}
+
+		anyVolumeAttached := false
+
+		for _, attachment := range attachments.Items {
+			if attachment.Status.Attached && attachment.Spec.NodeName == k.nodeName {
+				anyVolumeAttached = true
+				klog.Infof("Volume %q is still attached, waiting for detach", attachment.Name)
+			}
+		}
+
+		if !anyVolumeAttached {
+			klog.Info("All volumes are detached from node, rebooting.")
+			break
+		}
+
+		time.Sleep(5 * time.Second)
+	}
+
 	// Reboot.
 	k.lc.Reboot(false)
