Pod delete
Pod delete is a Kubernetes pod-level chaos fault that causes specific (or random) replicas of an application resource to fail forcibly (or gracefully).
- To keep the service available while the fault runs, the application must have a minimum number of available replicas.
- When the load on the remaining replicas increases, the horizontal pod autoscaler (if one is configured) scales the application based on the observed resource utilization.
Use cases
Pod delete:
- Helps check the application's deployment sanity (replica availability and uninterrupted service) and recovery workflow.
- Can be used to verify:
  - Disk (or volume) re-attachment times in stateful applications.
  - Application start-up times and readiness probe configuration (health endpoints and delays).
  - Adherence to topology constraints (node selectors, tolerations, zone distribution, and affinity or anti-affinity policies).
  - Proxy registration times in service-mesh environments.
  - Post (lifecycle) hooks and termination seconds configuration for the microservices under active load, that is, graceful termination handling.
  - Resource budgeting on cluster nodes (whether request or limit settings are honored on available nodes for successful scheduling).
- Simulates:
  - Graceful delete, or rescheduling, of pods as a result of upgrades.
  - Forced deletion of pods as a result of eviction.
  - Leader-election in complex applications.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-delete
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
    permissions:
      # core API group: pods, events, and pod logs
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "get", "list", "patch", "update"]
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get", "list", "watch"]
      # workload controllers belong to the "apps" API group; each resource is a separate list entry
      - apiGroups: ["apps"]
        resources: ["deployments", "statefulsets"]
        verbs: ["get", "list"]
      - apiGroups: ["apps"]
        resources: ["replicasets", "daemonsets"]
        verbs: ["get", "list"]
      # chaos custom resources belong to the litmuschaos.io API group (lowercase plural names)
      - apiGroups: ["litmuschaos.io"]
        resources: ["chaosengines", "chaosexperiments", "chaosresults"]
        verbs: ["create", "delete", "get", "list", "patch", "update"]
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create", "delete", "get", "list", "deletecollection"]
Prerequisites
- Kubernetes > 1.16
- The application pods are in the running state before and after chaos injection.
Optional tunables
Tunable | Description | Notes |
---|---|---|
TARGET_CONTAINER | Name of the container subject to pod deletion. | None. For more information, go to target specific container. |
NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 15 s. Overall run duration of the fault may exceed the TOTAL_CHAOS_DURATION by a few minutes. For more information, go to duration of the chaos. |
CHAOS_INTERVAL | Time interval between two successive pod failures (in seconds). | Default: 5 s. For more information, go to chaos interval. |
RANDOMNESS | Introduces randomness into pod deletions with a minimum period defined by CHAOS_INTERVAL. | Default: false. Supports true and false. For more information, go to random interval. |
FORCE | Application pod deletion mode. false indicates graceful deletion with the default termination period of 30 s, and true indicates an immediate forceful deletion with a 0 s grace period. | Default: true, with terminationGracePeriodSeconds=0. For more information, go to force delete. |
TARGET_PODS | Comma-separated list of application pod names subject to chaos. | If it is not provided, the fault selects target pods based on the provided appLabels. For more information, go to target specific pods. |
PODS_AFFECTED_PERC | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pod affected percentage. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial as well. For more information, go to sequence of chaos execution. |
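Several of the tunables above can be combined in one ChaosEngine. The following is a minimal sketch rather than a recommended configuration: it reuses the engine-nginx name, the app=nginx label, and the litmus-admin service account from the other snippets on this page, and the values chosen for PODS_AFFECTED_PERC, SEQUENCE, RAMP_TIME, and CHAOS_INTERVAL are illustrative assumptions.

```yaml
# sketch: target a percentage of pods one after another, with a ramp time
# all values below are illustrative assumptions
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        # percentage of the pods matching the app label to target
        - name: PODS_AFFECTED_PERC
          value: "50"
        # run chaos on the selected pods serially instead of in parallel (the default)
        - name: SEQUENCE
          value: "serial"
        # wait period (in seconds) before and after injecting chaos
        - name: RAMP_TIME
          value: "30"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
        - name: CHAOS_INTERVAL
          value: "10"
```

To target named pods instead of a percentage, TARGET_PODS takes a comma-separated list of pod names, as described in the table.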
Force delete
Specifies whether the target pod is deleted forcefully or gracefully. The fault deletes the pod forcefully if FORCE is set to true and gracefully if FORCE is set to false. Tune it by using the FORCE environment variable.
The following YAML snippet illustrates the use of this environment variable:
# tune the deletion of target pods forcefully or gracefully
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        # provided as true for the force deletion of pod
        # supports true and false value
        - name: FORCE
          value: "true"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
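For comparison, a graceful deletion changes only the FORCE value. The fragment below is a sketch of the env section alone, assuming the rest of the ChaosEngine stays the same as in the snippet above. Per the tunables table, false results in a graceful deletion with the default termination period of 30 s.

```yaml
# env fragment only; it belongs under experiments[].spec.components in the ChaosEngine
env:
# false requests a graceful deletion (default termination period of 30s)
- name: FORCE
  value: "false"
- name: TOTAL_CHAOS_DURATION
  value: "60"
```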
Random interval
Specifies whether to enable randomness in the chaos interval. Enable it by setting the RANDOMNESS environment variable to true. It supports boolean values, and the default value is false. When randomness is enabled, the CHAOS_INTERVAL environment variable defines the range from which the interval is selected:
- If CHAOS_INTERVAL is set in the form l-r (for example, 5-10), the fault selects a random interval between l and r.
- If CHAOS_INTERVAL is set in the form value (for example, 10), the fault selects a random interval between 0 and value.
The following YAML snippet illustrates the use of this environment variable:
# contains random chaos interval with lower and upper bound of range i.e [l,r]
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-delete
    spec:
      components:
        env:
        # randomness enables iterations at random time interval
        # it supports true and false value
        - name: RANDOMNESS
          value: "true"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
        # it will select a random interval within this range
        # if only one value is provided then it will select a random interval within 0-CHAOS_INTERVAL range
        - name: CHAOS_INTERVAL
          value: "5-10"
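The single-value form of CHAOS_INTERVAL can be sketched in the same way. The fragment below shows only the env section and assumes the rest of the ChaosEngine matches the snippet above. The value 10 is an illustrative assumption.

```yaml
# env fragment only; it belongs under experiments[].spec.components in the ChaosEngine
env:
- name: RANDOMNESS
  value: "true"
- name: TOTAL_CHAOS_DURATION
  value: "60"
# single value: a random interval is selected within the 0-10 range
- name: CHAOS_INTERVAL
  value: "10"
```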