Disk fill

Disk fill is a Kubernetes pod-level chaos fault that applies disk stress by filling the pod's ephemeral storage on a node. The fault evicts the application pod if the filled storage exceeds the pod's ephemeral storage limit.

Use cases

Disk fill:

  • Tests the ephemeral storage limits and ensures that the parameters are sufficient.
  • Determines the resilience of the application to unexpected storage exhaustion.
  • Evaluates the application's resilience to disk stress or replica evictions.
  • Simulates filled data mount points.
  • Verifies file system performance and thin-provisioning support.
  • Verifies space reclamation (UNMAP) capabilities on storage.

Permissions required

Below is a sample Kubernetes role that defines the permissions required to execute the fault.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: disk-fill
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
    permissions:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "get", "list", "patch", "update"]
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get", "list", "watch"]
      # workload controllers belong to the "apps" API group
      - apiGroups: ["apps"]
        resources: ["deployments", "statefulsets"]
        verbs: ["get", "list"]
      - apiGroups: ["apps"]
        resources: ["replicasets", "daemonsets"]
        verbs: ["get", "list"]
      # chaos custom resources belong to the "litmuschaos.io" API group
      - apiGroups: ["litmuschaos.io"]
        resources: ["chaosengines", "chaosexperiments", "chaosresults"]
        verbs: ["create", "delete", "get", "list", "patch", "update"]
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create", "delete", "get", "list", "deletecollection"]

Prerequisites

  • Kubernetes > 1.16
  • The application pods should be in the running state before and after injecting chaos.
  • Appropriate ephemeral storage requests and limits should be set for the application before running the fault. An example specification is shown below:
    apiVersion: v1
    kind: Pod
    metadata:
      name: frontend
    spec:
      containers:
        - name: db
          image: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "password"
          resources:
            requests:
              ephemeral-storage: "2Gi"
            limits:
              ephemeral-storage: "4Gi"
        - name: wp
          image: wordpress
          resources:
            requests:
              ephemeral-storage: "2Gi"
            limits:
              ephemeral-storage: "4Gi"
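
As a quick pre-flight check for the prerequisite above, the following sketch scans a pod manifest that has already been loaded into a Python dictionary and reports the containers that lack an ephemeral-storage request or limit. The function name `containers_missing_ephemeral_storage` and the sample manifest are hypothetical and are not part of the fault or any chaos tooling.

```python
def containers_missing_ephemeral_storage(pod):
    """Return names of containers lacking an ephemeral-storage request or limit.

    `pod` is a pod manifest as a plain dict (for example, parsed from YAML).
    """
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        has_request = "ephemeral-storage" in resources.get("requests", {})
        has_limit = "ephemeral-storage" in resources.get("limits", {})
        if not (has_request and has_limit):
            missing.append(container["name"])
    return missing


# Illustrative manifest: "db" is fully configured, "wp" has no limit.
pod = {
    "spec": {
        "containers": [
            {"name": "db", "resources": {
                "requests": {"ephemeral-storage": "2Gi"},
                "limits": {"ephemeral-storage": "4Gi"}}},
            {"name": "wp", "resources": {
                "requests": {"ephemeral-storage": "2Gi"}}},
        ]
    }
}
print(containers_missing_ephemeral_storage(pod))
```

An empty result suggests that every container in the manifest declares both values.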

Mandatory tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
| FILL_PERCENTAGE | Percentage of the ephemeral storage limit to fill. This limit is set in the target pod. | It can be set to more than 100, which forces eviction of the pod. For more information, go to disk fill percentage. |
| EPHEMERAL_STORAGE_MEBIBYTES | Ephemeral storage required to be filled (in mebibytes). | It is mutually exclusive with the FILL_PERCENTAGE environment variable. If both are provided, FILL_PERCENTAGE takes precedence. For more information, go to disk fill mebibytes. |
| CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime. |
| SOCKET_PATH | Path to the containerd/crio/docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path. |

Optional tunables

| Tunable | Description | Notes |
| --- | --- | --- |
| TARGET_CONTAINER | Name of the container subject to disk fill. | If it is not provided, the first container in the target pod will be subject to chaos. For more information, go to kill specific container. |
| LIB_IMAGE | Image used to run the stress command. | Default: chaosnative/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. |
| TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 60 s. For more information, go to duration of the chaos. |
| TARGET_PODS | Comma-separated list of application pod names subject to disk fill chaos. | If not provided, the fault selects the target pods randomly based on provided appLabels. For more information, go to target specific pods. |
| DATA_BLOCK_SIZE | Data block size used to fill the disk (in KB). | Default: 256 KB. For more information, go to data block size. |
| PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage. |
| RAMP_TIME | Period to wait before injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
| SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution. |
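
The PODS_AFFECTED_PERC default of 0 mapping to one replica can be pictured with the small sketch below. It is only an illustration: the function name `pods_to_target` is hypothetical, and rounding the percentage down to a whole number of pods is an assumption, since the rounding rule is not stated here.

```python
def pods_to_target(total_pods, pods_affected_perc=0):
    """Number of pods to target; at least one pod is always selected.

    Assumption: the percentage is rounded down to a whole number of pods.
    """
    if total_pods < 1:
        raise ValueError("no candidate pods to select from")
    if not 0 <= pods_affected_perc <= 100:
        raise ValueError("PODS_AFFECTED_PERC must be between 0 and 100")
    return max(1, total_pods * pods_affected_perc // 100)


for perc in (0, 50, 100):
    print(f"PODS_AFFECTED_PERC={perc}: {pods_to_target(10, perc)} of 10 pods")
```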

Disk fill percentage

Percentage of the ephemeral storage limit to fill. The limit is the value specified at resources.limits.ephemeral-storage within the target application. Tune it by using the FILL_PERCENTAGE environment variable.

The following YAML snippet illustrates the use of this environment variable:

## percentage of ephemeral storage limit specified at `resources.limits.ephemeral-storage` inside target application
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: disk-fill
      spec:
        components:
          env:
            ## percentage of ephemeral storage limit, which needs to be filled
            - name: FILL_PERCENTAGE
              value: "80" # in percentage
            - name: TOTAL_CHAOS_DURATION
              value: "60"
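
To make the percentage concrete, the sketch below works out how much data a given FILL_PERCENTAGE corresponds to for an ephemeral storage limit, and whether that amount exceeds the limit. The helpers `parse_quantity` and `fill_target_bytes` are hypothetical names used for illustration; they handle only the Ki, Mi and Gi suffixes and are not how the fault itself computes the value.

```python
# Binary suffixes used by Kubernetes resource quantities (subset only).
_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(quantity):
    """Convert a quantity such as "4Gi" or "512Mi" to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def fill_target_bytes(limit, fill_percentage):
    """Bytes to write for an ephemeral-storage limit and a FILL_PERCENTAGE."""
    return parse_quantity(limit) * fill_percentage // 100


limit = "4Gi"
for pct in (80, 120):
    target = fill_target_bytes(limit, pct)
    over = target > parse_quantity(limit)
    print(f"FILL_PERCENTAGE={pct}: {target / 1024**2:.1f} MiB, exceeds limit: {over}")
```

With a 4Gi limit, a value of 80 corresponds to roughly 3276.8 MiB, while any value above 100 asks for more than the limit allows, which is the case where eviction is expected.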

Disk fill mebibytes

Amount of ephemeral storage (in mebibytes) to fill in the target pod. Tune it by using the EPHEMERAL_STORAGE_MEBIBYTES environment variable.

EPHEMERAL_STORAGE_MEBIBYTES is mutually exclusive with the FILL_PERCENTAGE environment variable. If the FILL_PERCENTAGE environment variable is set, the fault uses FILL_PERCENTAGE for the fill. Otherwise, the fault fills the ephemeral storage based on the EPHEMERAL_STORAGE_MEBIBYTES environment variable.

The following YAML snippet illustrates the use of this environment variable:

# ephemeral storage to fill in the target application
# when ephemeral-storage limits are not specified inside the target application
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: disk-fill
      spec:
        components:
          env:
            ## ephemeral storage size, which needs to be filled
            - name: EPHEMERAL_STORAGE_MEBIBYTES
              value: "256" # in MiB
            - name: TOTAL_CHAOS_DURATION
              value: "60"
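
The precedence between the two tunables can be sketched as follows. This is a minimal illustration of the rule described in this section, not the fault's actual code: `resolve_fill_mib` is a hypothetical name, and treating an empty string as "not set" is an assumption.

```python
def resolve_fill_mib(env, limit_mib=None):
    """Return the amount to fill in MiB, giving FILL_PERCENTAGE precedence.

    `env` maps tunable names to string values; `limit_mib` is the pod's
    ephemeral-storage limit in MiB, needed only for the percentage case.
    """
    percentage = env.get("FILL_PERCENTAGE", "").strip()
    mebibytes = env.get("EPHEMERAL_STORAGE_MEBIBYTES", "").strip()
    if percentage:
        if limit_mib is None:
            raise ValueError("FILL_PERCENTAGE needs an ephemeral-storage limit")
        return limit_mib * int(percentage) / 100
    if mebibytes:
        return float(mebibytes)
    raise ValueError("set FILL_PERCENTAGE or EPHEMERAL_STORAGE_MEBIBYTES")


both = {"FILL_PERCENTAGE": "50", "EPHEMERAL_STORAGE_MEBIBYTES": "256"}
only_mib = {"EPHEMERAL_STORAGE_MEBIBYTES": "256"}
print(resolve_fill_mib(both, limit_mib=4096))  # percentage wins
print(resolve_fill_mib(only_mib))              # falls back to mebibytes
```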

Data block size

Size of the data block (in KB) used to fill the ephemeral storage of the target pod. Tune it by using the DATA_BLOCK_SIZE environment variable. The default value is 256 KB.

The following YAML snippet illustrates the use of this environment variable:

# size of the data block used to fill the disk
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: disk-fill
      spec:
        components:
          env:
            ## size of data block used to fill the disk
            - name: DATA_BLOCK_SIZE
              value: "256" # in KB
            - name: TOTAL_CHAOS_DURATION
              value: "60"
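
The block size determines how many writes are needed to reach a given fill size. The sketch below shows that arithmetic. The name `blocks_needed` is hypothetical, and the calculation assumes that 1 KB means 1024 bytes, which this page does not state.

```python
import math


def blocks_needed(fill_mib, block_size_kb=256):
    """Number of block writes needed to write `fill_mib` MiB.

    Assumption: 1 KB = 1024 bytes, so 1 MiB = 1024 KB.
    """
    if block_size_kb <= 0:
        raise ValueError("DATA_BLOCK_SIZE must be positive")
    return math.ceil(fill_mib * 1024 / block_size_kb)


for size_kb in (64, 256, 1024):
    print(f"DATA_BLOCK_SIZE={size_kb}: {blocks_needed(256, size_kb)} writes for 256 MiB")
```

Under this assumption, a larger block size means fewer, larger writes for the same amount of data.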

Container runtime and socket path

Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and the socket file path, respectively.

  • CONTAINER_RUNTIME: It supports the docker, containerd, and crio runtimes. The default value is containerd.
  • SOCKET_PATH: It contains the path of the containerd socket file by default (/run/containerd/containerd.sock). For docker, specify the path as /var/run/docker.sock. For crio, specify the path as /var/run/crio/crio.sock.

The following YAML snippet illustrates the use of these environment variables:

## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
    - name: disk-fill
      spec:
        components:
          env:
            # runtime for the container
            # supports docker, containerd, crio
            - name: CONTAINER_RUNTIME
              value: "containerd"
            # path of the socket file
            - name: SOCKET_PATH
              value: "/run/containerd/containerd.sock"
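
Because the socket path has to match the chosen runtime, it can help to generate both environment entries together. The sketch below does this with the three paths listed in this section. The names `DEFAULT_SOCKET_PATHS` and `runtime_env` are hypothetical, and a cluster may expose its socket at a different location, in which case the path should be passed explicitly.

```python
# Socket paths listed in this section, keyed by runtime name.
DEFAULT_SOCKET_PATHS = {
    "containerd": "/run/containerd/containerd.sock",
    "docker": "/var/run/docker.sock",
    "crio": "/var/run/crio/crio.sock",
}


def runtime_env(container_runtime="containerd", socket_path=None):
    """Build the env entries for CONTAINER_RUNTIME and SOCKET_PATH."""
    if container_runtime not in DEFAULT_SOCKET_PATHS:
        raise ValueError(f"unsupported runtime: {container_runtime}")
    path = socket_path or DEFAULT_SOCKET_PATHS[container_runtime]
    return [
        {"name": "CONTAINER_RUNTIME", "value": container_runtime},
        {"name": "SOCKET_PATH", "value": path},
    ]


for entry in runtime_env("docker"):
    print(f'- name: {entry["name"]}\n  value: "{entry["value"]}"')
```

The printed entries can be pasted under `env` in the ChaosEngine specification.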