Pod IO stress

Pod I/O stress is a Kubernetes pod-level chaos fault that stresses the disk I/O of the application pod by increasing the number of input and output requests. Continuous, heavy disk I/O degrades read and write performance for the microservices that use the disk. Scratch space consumed on a node can also leave too little space for new containers to be scheduled. Injecting this fault helps verify how resilient the application is to these conditions.

Use cases

Pod IO stress:

  • Aims to verify the resilience of applications that share the disk resource for ephemeral (or persistent) storage.
  • Simulates slower disk operations by the application.
  • Simulates noisy neighbour problems by hogging the disk bandwidth.
  • Verifies the disk performance on increasing I/O threads and varying I/O block sizes.
  • Checks how the application functions under high disk latency conditions and when I/O traffic is very high.
  • Checks how the application functions under large I/O blocks, and when other services monopolize the I/O disks.

Permissions required

Below is a sample Kubernetes role that defines the permissions required to execute the fault.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-io-stress
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
    permissions:
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "get", "list", "patch", "update"]
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get", "list", "watch"]
      # deployments, statefulsets, replicasets and daemonsets belong to the "apps" API group
      - apiGroups: ["apps"]
        resources: ["deployments", "statefulsets"]
        verbs: ["get", "list"]
      - apiGroups: ["apps"]
        resources: ["replicasets", "daemonsets"]
        verbs: ["get", "list"]
      # chaos custom resources belong to the "litmuschaos.io" API group
      - apiGroups: ["litmuschaos.io"]
        resources: ["chaosEngines", "chaosExperiments", "chaosResults"]
        verbs: ["create", "delete", "get", "list", "patch", "update"]
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create", "delete", "get", "list", "deletecollection"]

Prerequisites

  • Kubernetes > 1.16
  • The application pods should be in the running state before and after injecting chaos.

Optional tunables

| Tunable | Description | Notes |
|---|---|---|
| FILESYSTEM_UTILIZATION_PERCENTAGE | Specifies the size as a percentage of free space on the file system. | Default: 10 %. For more information, go to file system utilization percentage. |
| FILESYSTEM_UTILIZATION_BYTES | Specifies the size in gigabytes (GB). | FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES are mutually exclusive. If both values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes priority. For more information, go to file system utilization bytes. |
| NUMBER_OF_WORKERS | Number of I/O workers involved in the disk I/O stress. | Default: 4. For more information, go to workers for stress. |
| TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 120 s. For more information, go to duration of the chaos. |
| NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
| VOLUME_MOUNT_PATH | Fills the given volume mount path. | For more information, go to mount path. |
| TARGET_PODS | Comma-separated list of application pod names subject to pod I/O stress. | If not provided, the fault selects target pods randomly based on the provided appLabels. For more information, go to target specific pods. |
| PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage. |
| LIB_IMAGE | Image used to inject chaos. | Default: chaosnative/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. |
| CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supports docker, containerd and crio. For more information, go to container runtime. |
| SOCKET_PATH | Path of the containerd, crio, or docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path. |
| RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
| SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution. |
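
Several of the tunables listed above, such as PODS_AFFECTED_PERC, SEQUENCE, and RAMP_TIME, can be set together in one ChaosEngine. The following is a minimal sketch rather than a recommended configuration: the engine name, namespace, and app label are borrowed from the other examples on this page, and the values 50, serial, and 30 are assumptions chosen only for illustration.

```yaml
# sketch: select a share of the matching pods and stress them one at a time
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # illustrative value: target half of the pods that match the app label
        - name: PODS_AFFECTED_PERC
          value: "50"
        # stress the selected pods one after another instead of in parallel
        - name: SEQUENCE
          value: "serial"
        # illustrative value: wait 30 seconds before and after injecting chaos
        - name: RAMP_TIME
          value: "30"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

If TARGET_PODS is set instead, the fault stresses the named pods rather than picking them at random from the app label.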

File system utilization percentage

Size of the I/O stress, expressed as a percentage of the free space on the pod's file system. Tune it by using the FILESYSTEM_UTILIZATION_PERCENTAGE environment variable.

The following YAML snippet illustrates the use of this environment variable:

# stress the i/o of the targeted pod with FILESYSTEM_UTILIZATION_PERCENTAGE of total free space
# it is mutually exclusive with the FILESYSTEM_UTILIZATION_BYTES.
# if both are provided then it will use FILESYSTEM_UTILIZATION_PERCENTAGE for stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # percentage of free file system space to be stressed
        - name: FILESYSTEM_UTILIZATION_PERCENTAGE
          value: "10" # in percentage
        - name: TOTAL_CHAOS_DURATION
          value: "60"

File system utilization bytes

Size of the I/O stress in gigabytes (GB). Tune it by using the FILESYSTEM_UTILIZATION_BYTES environment variable. The FILESYSTEM_UTILIZATION_PERCENTAGE and FILESYSTEM_UTILIZATION_BYTES environment variables are mutually exclusive. If both values are provided, FILESYSTEM_UTILIZATION_PERCENTAGE takes priority.

The following YAML snippet illustrates the use of this environment variable:

# stress the i/o of the targeted pod with given FILESYSTEM_UTILIZATION_BYTES
# it is mutually exclusive with the FILESYSTEM_UTILIZATION_PERCENTAGE.
# if both are provided then it will use FILESYSTEM_UTILIZATION_PERCENTAGE for stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # size of io to be stressed
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: "1" # in GB
        - name: TOTAL_CHAOS_DURATION
          value: "60"

Container runtime and socket path

Use the CONTAINER_RUNTIME and SOCKET_PATH environment variables to set the container runtime and the socket file path, respectively.

  • CONTAINER_RUNTIME: Supports the docker, containerd, and crio runtimes. The default value is containerd.
  • SOCKET_PATH: Contains the path of the containerd socket file by default (/run/containerd/containerd.sock). For docker, specify the path as /var/run/docker.sock. For crio, specify the path as /var/run/crio/crio.sock.

The following YAML snippet illustrates the use of these environment variables:

## provide the container runtime and socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # runtime for the container
        # supports docker, containerd, crio
        - name: CONTAINER_RUNTIME
          value: "containerd"
        # path of the socket file
        - name: SOCKET_PATH
          value: "/run/containerd/containerd.sock"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
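
On a cluster that uses a runtime other than containerd, both variables need to change together. As a sketch, assuming a docker-based cluster whose socket is at the path mentioned above (the engine name and app label are the same placeholders used elsewhere on this page), the ChaosEngine might look like this:

```yaml
## sketch: docker runtime with its matching socket file path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # override the default runtime (containerd)
        - name: CONTAINER_RUNTIME
          value: "docker"
        # socket path must match the chosen runtime
        - name: SOCKET_PATH
          value: "/var/run/docker.sock"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

For crio, the same pattern applies with CONTAINER_RUNTIME set to crio and SOCKET_PATH set to /var/run/crio/crio.sock.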

Mount path

Path of the volume mount inside the container that the fault fills. Tune it by using the VOLUME_MOUNT_PATH environment variable.

The following YAML snippet illustrates the use of this environment variable:

# provide the volume mount path, which needs to be filled
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # path that needs to be stressed/filled
        - name: VOLUME_MOUNT_PATH
          value: "/some-dir-in-container"
        - name: TOTAL_CHAOS_DURATION
          value: "60"

Workers for stress

Number of I/O workers that generate the stress. Tune it by using the NUMBER_OF_WORKERS environment variable.

The following YAML snippet illustrates the use of this environment variable:

# number of workers for the stress
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # number of io workers
        - name: NUMBER_OF_WORKERS
          value: "4"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
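
The tunables described in the sections above can also be combined, for example to approximate the "increasing I/O threads" use case on a specific volume. The following is only a sketch under assumptions: the worker count of 8 is an arbitrary illustrative value, and /some-dir-in-container is the same placeholder path used in the mount path example, not a real directory.

```yaml
# sketch: more io workers writing a fixed amount of data to one mount path
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-io-stress
    spec:
      components:
        env:
        # illustrative value: double the default of 4 io workers
        - name: NUMBER_OF_WORKERS
          value: "8"
        # size of io to be stressed
        - name: FILESYSTEM_UTILIZATION_BYTES
          value: "1" # in GB
        # placeholder path; replace with a mount path that exists in the target container
        - name: VOLUME_MOUNT_PATH
          value: "/some-dir-in-container"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
```

FILESYSTEM_UTILIZATION_PERCENTAGE is left out here on purpose, because it takes priority over FILESYSTEM_UTILIZATION_BYTES when both are set.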