Signed-off-by: Thibault NORMAND <thibault.normand@datadoghq.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Thibault NORMAND <me@zenithar.org>
d238abd to 6109573
🎉 All green! ❄️ No new flaky tests detected. 🔗 Commit SHA: fb46e35
The diskFull disruption creates a ballast file on the host filesystem via the injector pod. The injector pod mounts the host root at /mnt/host, but that mount has ReadOnly: true, which was correct for all existing injectors (network, CPU, etc.) because they only read from the host. diskFull must write to the host, so it hits a "read-only file system" error before it even starts, instead of the expected ENOSPC.
Root cause
{
    Name:      "host",
    MountPath: "/mnt/host",
    ReadOnly:  true, // ← must be false for diskFull
},
Fix
File:
Change signature:
func (m *chaosPodService) generateChaosPodSpec(..., hostWritable bool) corev1.PodSpec {
Inside the function, use the parameter:
File:
Spec: m.generateChaosPodSpec(
    targetNodeName,
    terminationGracePeriod,
    activeDeadlineSeconds,
    args,
    hostPathDirectory,
    hostPathFile,
    kind == chaostypes.DisruptionKindDiskFull, // hostWritable
),
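The suggested change can be illustrated with a minimal, self-contained sketch. It assumes a local `volumeMount` struct as a stand-in for `corev1.VolumeMount` (so it compiles without the Kubernetes modules); the `hostMount` helper and the kind strings are hypothetical and are not taken from the chaos-controller code.

```go
package main

import "fmt"

// volumeMount is a local stand-in for corev1.VolumeMount (assumption: only
// the three fields shown in the review comment matter here).
type volumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

// hostMount is a hypothetical helper: it returns the injector's host root
// mount, read-only unless the caller explicitly asks for a writable host.
func hostMount(hostWritable bool) volumeMount {
	return volumeMount{
		Name:      "host",
		MountPath: "/mnt/host",
		ReadOnly:  !hostWritable,
	}
}

func main() {
	// The kind names below are illustrative strings, not the real constants.
	for _, kind := range []string{"network-disruption", "disk-full"} {
		m := hostMount(kind == "disk-full")
		fmt.Printf("%s: readOnly=%t\n", kind, m.ReadOnly)
	}
}
```

Deriving the flag from the disruption kind at the call site keeps every other injector on the read-only mount by default.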
Many thanks for the deep investigation. I will fix that ASAP. I still have concerns about making the entire filesystem writable just to write a ballast file in a dedicated directory. It would allow someone with access to the pod to alter the disrupted pod or node for purposes other than the expected disruption. I will propose a security gate.
Could you also create an example file to test the disruption locally:
# Unless explicitly stated otherwise all files in this repository are licensed
# under the Apache License Version 2.0.
# This product includes software developed at Datadog (https://www.datadoghq.com/).
# Copyright 2026 Datadog, Inc.
apiVersion: chaos.datadoghq.com/v1beta1
kind: Disruption
metadata:
  name: disk-full
  namespace: chaos-demo
spec:
  level: pod
  selector:
    service: demo-curl
  count: 1
  duration: 10m
  diskFull:
    path: "/mnt/data"
    capacity: "95%"
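To make the `capacity: "95%"` field concrete, here is a rough sketch of how a ballast size could be derived from it. It assumes that capacity means the target used-space percentage of the volume, which this thread does not spell out; `ballastSize` is a hypothetical helper, and in a real injector the total and used byte counts would come from a filesystem statistics call rather than literals.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ballastSize returns how many bytes must be added so that used space reaches
// the requested percentage of the volume. ASSUMPTION: capacity is a target
// used-space percentage such as "95%".
func ballastSize(capacity string, totalBytes, usedBytes uint64) (uint64, error) {
	trimmed := strings.TrimSpace(capacity)
	if !strings.HasSuffix(trimmed, "%") {
		return 0, fmt.Errorf("capacity %q must end with %%", capacity)
	}
	pct, err := strconv.ParseUint(strings.TrimSuffix(trimmed, "%"), 10, 64)
	if err != nil || pct > 100 {
		return 0, fmt.Errorf("invalid capacity %q", capacity)
	}
	// Split the multiplication to avoid overflowing uint64 on huge volumes.
	target := (totalBytes/100)*pct + (totalBytes%100)*pct/100
	if usedBytes >= target {
		return 0, nil // already at or above the target, nothing to add
	}
	return target - usedBytes, nil
}

func main() {
	// Illustrative numbers: a 10 GiB volume with 4 GiB already used.
	size, err := ballastSize("95%", 10<<30, 4<<30)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ballast bytes: %d\n", size)
}
```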
Could you also update the
Could you also update the
… address PR comments.
Add diskFull to 5 missing registration points in validateGlobalDisruptionScope (at-least-one-kind check, ContainerFailure/NodeFailure/PodReplacement compatibility, OnInit compatibility), DisruptionCount(), and Explain().
Add writable shadow mount for the target path in chaos pod spec so the injector can write ballast files while keeping /mnt/host read-only.
Add capacity mode test coverage, disk_full example, complete.yaml entry, and docs/README.md link.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
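The shadow-mount idea in this commit message might look roughly like the sketch below: the host root stays read-only and a second, writable mount is layered over the target directory only. This is an illustration under assumptions, not the commit's actual code; `volumeMount` is a local stand-in for `corev1.VolumeMount`, and the `injectorMounts` helper, the `disk-full-target` name, and the way the target path maps under /mnt/host are all hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// volumeMount is a local stand-in for corev1.VolumeMount.
type volumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

// injectorMounts is a hypothetical helper: it keeps the host root read-only
// and adds one writable mount for the directory that will hold the ballast.
func injectorMounts(targetPath string) []volumeMount {
	return []volumeMount{
		{Name: "host", MountPath: "/mnt/host", ReadOnly: true},
		{
			Name:      "disk-full-target", // hypothetical volume name
			MountPath: path.Join("/mnt/host", targetPath),
			ReadOnly:  false,
		},
	}
}

func main() {
	for _, m := range injectorMounts("/var/lib/data") {
		fmt.Printf("%-18s %-26s readOnly=%t\n", m.Name, m.MountPath, m.ReadOnly)
	}
}
```

Compared with flipping the whole host mount to writable, this narrows what the injector pod can write to, which is the concern raised earlier in the review.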
What does this PR do?
Adds a new diskFull disruption kind that genuinely fills a target pod volume using the fallocate(2) syscall, causing real ENOSPC errors on all subsequent write operations. This fills a gap where existing disruptions (DiskPressure = I/O throttling, DiskFailure = eBPF on openat only) don't simulate actual disk exhaustion visible to monitoring and all syscalls.
Features
fallocate(2) syscall (instant, O(1) on ext4/xfs) to genuinely consume disk space. Falls back to writing zeros on unsupported filesystems.
unsafeMode.allowDiskFullNoFloor). Pod-level only. Webhook warning for ephemeral-storage eviction risk.
fallocate/ package (adapted from detailyang/go-fallocate, MIT), with no dependency on fallocate or dd binaries in the injector image.
How it differs from existing disruptions
df/monitoring?
openat only
Example
Code Quality Checklist
Testing
unit tests.
Test coverage
Files changed (24 files, ~1350 lines)
api/v1beta1/disk_full.go, disruption_types.go, disruption_webhook.go, safemode.go
injector/disk_full.go (ballast file via fallocate)
cli/injector/disk_full.go, cli/injector/main.go
fallocate/ (4 platform-specific files, adapted from go-fallocate MIT)
safemode/safemode_disk_full.go, safemode/safemode.go
types/types.go (DisruptionKindDiskFull)
docs/disk_full.md, docs/disruption_catalogue.md
api/v1beta1/disk_full_test.go, injector/disk_full_test.go