Disk fill
Disk fill is a Kubernetes pod-level chaos fault that applies disk stress by filling the pod's ephemeral storage on a node. The application pod is evicted if its ephemeral storage usage exceeds the pod's ephemeral storage limit.
Use cases
Disk fill:
- Tests the ephemeral storage limits and ensures that the parameters are sufficient.
- Determines the resilience of the application to unexpected storage exhaustion.
- Evaluates the application's resilience to disk stress or replica evictions.
- Simulates filled data mount points.
- Verifies file system performance and thin-provisioning support.
- Verifies space reclamation (UNMAP) capabilities on storage.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: disk-fill
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
permissions:
  # manage the target and helper pods
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  # record chaos events
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  # read pod logs
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  # look up the parent workloads of the target pods (these live in the "apps" API group)
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["get", "list"]
  # manage the chaos custom resources
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosengines", "chaosexperiments", "chaosresults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  # run the chaos jobs
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
Prerequisites
- Kubernetes > 1.16
- The application pods should be in the running state before and after injecting chaos.
- Appropriate ephemeral storage requests and limits should be set for the application before running the fault. An example specification is shown below:
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
  - name: wp
    image: wordpress
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
Mandatory tunables
Tunable | Description | Notes |
---|---|---|
NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
FILL_PERCENTAGE | Percentage of the ephemeral storage limit to fill. This limit is set in the target pod. | It can be set to more than 100, which forces eviction of the pod. For more information, go to disk fill percentage. |
EPHEMERAL_STORAGE_MEBIBYTES | Amount of ephemeral storage to fill (in mebibytes). It is mutually exclusive with the FILL_PERCENTAGE environment variable. If both are provided, FILL_PERCENTAGE takes precedence. | For more information, go to disk fill mebibytes. |
CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd, and crio. For more information, go to container runtime. |
SOCKET_PATH | Path to the containerd/crio/docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path. |
Optional tunables
Tunable | Description | Notes |
---|---|---|
TARGET_CONTAINER | Name of the container subject to disk fill. | If it is not provided, the first container in the target pod will be subject to chaos. For more information, go to kill specific container |
LIB_IMAGE | Image used to run the stress command. | Default: chaosnative/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. |
TOTAL_CHAOS_DURATION | Duration for which to inject chaos (in seconds). | Default: 60 s. For more information, go to duration of the chaos |
TARGET_PODS | Comma-separated list of application pod names subject to disk fill chaos. | If not provided, the fault selects the target pods randomly based on provided appLabels. For more information, go to target specific pods |
DATA_BLOCK_SIZE | Data block size used to fill the disk (in KB). | Default: 256 KB. For more information, go to data block size |
PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage |
RAMP_TIME | Period to wait before injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time |
SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution |
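As an illustration of how several optional tunables from the table above might be combined, the following sketch reuses the engine-nginx ChaosEngine from the examples in this document. The container name (nginx) and the chosen values (50, 30, serial) are assumptions made for the example, not recommended settings.

```yaml
# Sketch only: the container name and all values below are illustrative assumptions
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        # container to fill; if omitted, the first container in the pod is targeted
        - name: TARGET_CONTAINER
          value: "nginx" # hypothetical container name
        # percentage of matching pods to target
        - name: PODS_AFFECTED_PERC
          value: "50"
        # wait 30 seconds before injecting chaos
        - name: RAMP_TIME
          value: "30"
        # fill the selected pods one after another instead of in parallel
        - name: SEQUENCE
          value: "serial"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```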
Disk fill percentage
Percentage of the ephemeral storage limit to be filled. The limit is the one specified at resources.limits.ephemeral-storage in the target application. Tune it by using the FILL_PERCENTAGE environment variable.
The following YAML snippet illustrates the use of this environment variable:
## percentage of ephemeral storage limit specified at `resources.limits.ephemeral-storage` inside target application
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        ## percentage of ephemeral storage limit, which needs to be filled
        - name: FILL_PERCENTAGE
          value: "80" # in percentage
        - name: TOTAL_CHAOS_DURATION
          value: "60"
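The mandatory tunables table notes that FILL_PERCENTAGE can be set above 100 to force eviction of the pod. A minimal sketch of that variant follows. The value 120 is an arbitrary assumption chosen for illustration, and the rest of the manifest mirrors the engine-nginx example used in this document.

```yaml
# Sketch only: "120" is an illustrative value above 100
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        # more than 100 percent of the ephemeral storage limit,
        # intended to push the pod past its limit so that it is evicted
        - name: FILL_PERCENTAGE
          value: "120"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

With a value like this, the replica eviction use case can be exercised, so the application should have enough replicas to tolerate losing the targeted pod.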
Disk fill mebibytes
Amount of ephemeral storage to be filled in the target pod, in mebibytes. Tune it by using the EPHEMERAL_STORAGE_MEBIBYTES environment variable.
EPHEMERAL_STORAGE_MEBIBYTES is mutually exclusive with the FILL_PERCENTAGE environment variable. If the FILL_PERCENTAGE environment variable is set, the fault uses FILL_PERCENTAGE for the fill. Otherwise, the fault fills the ephemeral storage based on the EPHEMERAL_STORAGE_MEBIBYTES environment variable.
The following YAML snippet illustrates the use of this environment variable:
# amount of ephemeral storage to fill in the target application
# if ephemeral-storage limits are not specified inside the target application
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        ## ephemeral storage size, which needs to be filled
        - name: EPHEMERAL_STORAGE_MEBIBYTES
          value: "256" # in MiB
        - name: TOTAL_CHAOS_DURATION
          value: "60"
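To make the precedence rule concrete, the sketch below sets both variables in one manifest. According to the rule described in this section, the fault would use FILL_PERCENTAGE and ignore EPHEMERAL_STORAGE_MEBIBYTES. The values are assumptions chosen for illustration, and in practice only one of the two should be set.

```yaml
# Sketch only: both variables are set here purely to illustrate precedence
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        # takes precedence when both variables are provided
        - name: FILL_PERCENTAGE
          value: "80"
        # ignored in this manifest because FILL_PERCENTAGE is set
        - name: EPHEMERAL_STORAGE_MEBIBYTES
          value: "256"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```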
Data block size
Size of the data block used to fill the ephemeral storage of the target pod, in KB. The default value of DATA_BLOCK_SIZE is 256 KB. Tune it by using the DATA_BLOCK_SIZE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# size of the data block used to fill the disk
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        ## size of data block used to fill the disk
        - name: DATA_BLOCK_SIZE
          value: "256" # in KB
        - name: TOTAL_CHAOS_DURATION
          value: "60"
Container runtime and socket path
Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and socket file path, respectively.
- CONTAINER_RUNTIME: Supports the docker, containerd, and crio runtimes. The default value is containerd.
- SOCKET_PATH: Contains the path of the containerd socket file by default (/run/containerd/containerd.sock). For docker, specify the path as /var/run/docker.sock. For crio, specify the path as /var/run/crio/crio.sock.
The following YAML snippet illustrates the use of these environment variables:
## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        # runtime for the container
        # supports docker, containerd, crio
        - name: CONTAINER_RUNTIME
          value: "containerd"
        # path of the socket file
        - name: SOCKET_PATH
          value: "/run/containerd/containerd.sock"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
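For a cluster that runs CRI-O instead of containerd, the same two variables would point at the crio runtime and the crio socket path listed above. The following is a minimal sketch under that assumption. The engine name and application labels are carried over from the other examples in this document, and the socket path should be confirmed against the actual node configuration.

```yaml
# Sketch only: assumes the nodes expose the CRI-O socket at the path given in this section
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: disk-fill
    spec:
      components:
        env:
        # runtime for the container
        - name: CONTAINER_RUNTIME
          value: "crio"
        # path of the crio socket file
        - name: SOCKET_PATH
          value: "/var/run/crio/crio.sock"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

For docker, the equivalent change would be CONTAINER_RUNTIME set to docker and SOCKET_PATH set to /var/run/docker.sock.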