Pod network partition
Pod network partition is a Kubernetes pod-level fault that blocks 100 percent of the ingress and egress traffic of the target application by creating a network policy.
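To make the mechanism concrete, here is a minimal sketch of a standard Kubernetes NetworkPolicy that denies all ingress and egress traffic for pods labelled app=nginx. It is an illustration only, not the exact policy the fault generates; the policy name, namespace, and label are assumptions chosen for this example.

```yaml
# Illustrative sketch: not the exact object created by the fault
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-traffic-sketch   # hypothetical name
  namespace: default              # assumed namespace of the target application
spec:
  # selects the target application pods by label
  podSelector:
    matchLabels:
      app: nginx
  # both policy types are listed and no ingress or egress rules are defined,
  # so all traffic to and from the selected pods is denied
  policyTypes:
    - Ingress
    - Egress
```

A policy like this takes effect only when the cluster's network plugin enforces NetworkPolicy objects.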
Use cases
Pod network partition tests the application's resilience to a lossy or flaky network by cutting off its network traffic for the duration of the fault.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-network-partition
spec:
  definition:
    scope: Cluster # Supports "Namespaced" mode too
permissions:
  # core API group: pods, events, and pod logs
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "get", "list", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  # network policies are served by the networking.k8s.io API group
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["create", "delete", "get", "list"]
  # workload controllers are served by the apps API group
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["get", "list"]
  # chaos custom resources are served by the litmuschaos.io API group
  - apiGroups: ["litmuschaos.io"]
    resources: ["chaosEngines", "chaosExperiments", "chaosResults"]
    verbs: ["create", "delete", "get", "list", "patch", "update"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "get", "list", "deletecollection"]
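For readers who prefer to express permissions as a plain Kubernetes RBAC manifest, the following is a minimal sketch of how two of the rules could look in standard Role syntax, together with a RoleBinding. It covers only a subset of the permissions. The binding name and the service account namespace are assumptions; litmus-admin is the service account used in the ChaosEngine examples on this page.

```yaml
# Sketch only: a subset of the permissions, written with the standard `rules` field
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-network-partition
rules:
  # network policies are served by the networking.k8s.io API group
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["create", "delete", "get", "list"]
  # workload controllers are served by the apps API group
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "replicasets", "daemonsets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: hce
  name: pod-network-partition   # hypothetical binding name
subjects:
  - kind: ServiceAccount
    name: litmus-admin
    namespace: hce              # assumption: the service account lives in this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-network-partition
```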
Prerequisites
- Kubernetes > 1.16
- The application pods should be in the running state before and after injecting chaos.
Optional tunables
Tunable | Description | Notes |
---|---|---|
TOTAL_CHAOS_DURATION | Duration for which to insert chaos (in seconds). | Default: 60 s. For more information, go to duration of the chaos. |
NODE_LABEL | Node label used to filter the target node if the TARGET_NODE environment variable is not set. | It is mutually exclusive with the TARGET_NODE environment variable. If both are provided, the fault uses TARGET_NODE. For more information, go to node label. |
POLICY_TYPES | Type of the network policy. | Supports the values egress, ingress, and all. For more information, go to policy type. |
POD_SELECTOR | Labels of the destination pods. | For example, app=cart. For more information, go to target pods. |
NAMESPACE_SELECTOR | Labels of the destination namespaces. | For example, env=prod. For more information, go to target namespaces. |
PORTS | Comma-separated list of the target ports. | For example, 80,443,22. For more information, go to destination ports. |
DESTINATION_IPS | IP addresses of the services or pods, or the CIDR blocks (ranges of IPs), whose accessibility is impacted. Comma-separated IPs or CIDRs can be provided. | If values are not provided, the fault induces network chaos on all IPs or destinations. For more information, go to destination IPs. |
DESTINATION_HOSTS | DNS names or FQDNs of the services whose accessibility is impacted. | If not provided, the fault induces network chaos on all IPs or destinations, or only on DESTINATION_IPS if that variable is defined. For more information, go to destination hosts. |
LIB_IMAGE | Image used to inject chaos. | Default: chaosnative/chaos-go-runner:main-latest. For more information, go to image used by the helper pod. |
RAMP_TIME | Period to wait before and after injecting chaos (in seconds). | For example, 30 s. For more information, go to ramp time. |
If the environment variables DESTINATION_HOSTS and DESTINATION_IPS are both left empty, the default behaviour is to target all hosts. To limit the impact to specific hosts, specify the IP addresses of the services in DESTINATION_IPS, or the DNS names or FQDNs of the services in DESTINATION_HOSTS. Use commas to separate multiple values.
Destination IPs and destination hosts
The IPs and hosts whose traffic is interrupted by the network fault. Tune them by using the DESTINATION_IPS and DESTINATION_HOSTS environment variables, respectively.
- DESTINATION_IPS: It contains the IP addresses of the services or pods, or the CIDR blocks (ranges of IPs), whose accessibility is impacted.
- DESTINATION_HOSTS: It contains the DNS names or FQDNs of the services whose accessibility is impacted.
The following YAML snippet illustrates the use of these environment variables:
# it injects the chaos for specific ips/hosts
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-partition
    spec:
      components:
        env:
        # supports comma separated destination ips
        - name: DESTINATION_IPS
          value: "8.8.8.8,192.168.5.6"
        # supports comma separated destination hosts
        - name: DESTINATION_HOSTS
          value: "nginx.default.svc.cluster.local,google.com"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
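DESTINATION_IPS is described as accepting CIDR blocks as well as single IP addresses. The fragment below is a small sketch of the env list with one IP and one CIDR block combined. The addresses are placeholders chosen for illustration, and the fragment belongs under spec.experiments[].spec.components in the ChaosEngine.

```yaml
# env fragment only: place it under spec.experiments[].spec.components
env:
  # one single IP and one CIDR block, comma separated (placeholder addresses)
  - name: DESTINATION_IPS
    value: "192.168.5.6,10.0.0.0/24"
  - name: TOTAL_CHAOS_DURATION
    value: "60"
```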
Target specific namespaces
Controls access to and from the pods in specific namespaces, selected by namespace labels. By default, the network partition fault interrupts traffic for all namespaces. Tune it by using the NAMESPACE_SELECTOR environment variable.
The following YAML snippet illustrates the use of this environment variable:
# it injects the chaos for specified namespaces, matched by labels
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-partition
    spec:
      components:
        env:
        # labels of the destination namespace
        - name: NAMESPACE_SELECTOR
          value: "key=value"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
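Because NAMESPACE_SELECTOR matches namespaces by label, the destination namespace has to carry the label for the selector to find it. A minimal sketch, assuming a hypothetical namespace named payments and the env=prod label used as the example value for this tunable:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments   # hypothetical namespace name
  labels:
    env: prod      # matched by a NAMESPACE_SELECTOR value of "env=prod"
```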
Target specific pods
Controls access to and from specific pods, selected by pod labels. By default, the network partition fault interrupts traffic for all the external pods. Tune it by using the POD_SELECTOR environment variable.
The following YAML snippet illustrates the use of this environment variable:
# it injects the chaos for specified pods, matched by labels
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-partition
    spec:
      components:
        env:
        # labels of the destination pods
        - name: POD_SELECTOR
          value: "key=value"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
Policy type
Specifies which traffic direction the fault interrupts: ingress, egress, or both. By default, the network partition fault interrupts both ingress and egress traffic. Tune it by using the POLICY_TYPES environment variable.
The following YAML snippet illustrates the use of this environment variable:
# it interrupts only ingress, only egress, or all traffic
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-partition
    spec:
      components:
        env:
        # provide the network policy type
        # it supports `ingress`, `egress`, and `all` values
        # default value is `all`
        - name: POLICY_TYPES
          value: "all"
        - name: TOTAL_CHAOS_DURATION
          value: "60"
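To restrict the fault to one direction, set POLICY_TYPES to one of the other supported values. The fragment below is a sketch of the env list for an egress-only run, on the assumption that the egress value leaves ingress traffic untouched, as its name suggests. It belongs under spec.experiments[].spec.components in the ChaosEngine.

```yaml
# env fragment only: place it under spec.experiments[].spec.components
env:
  # interrupt only outgoing traffic from the target pods
  - name: POLICY_TYPES
    value: "egress"
  - name: TOTAL_CHAOS_DURATION
    value: "60"
```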
Destination ports
Comma-separated list of ports whose traffic is affected during a network partition fault. Tune it by using the PORTS environment variable.
- If PORTS is not set and none of POD_SELECTOR, NAMESPACE_SELECTOR, or DESTINATION_IPS is provided, then the fault blocks the traffic on all ports for all pods and IPs.
- If PORTS is not set but at least one of POD_SELECTOR, NAMESPACE_SELECTOR, or DESTINATION_IPS is provided, then the fault allows all the ports for all the pods and IPs filtered by the specified selectors.
The following YAML snippet illustrates the use of this environment variable:
# it injects the chaos for specified ports
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  annotationCheck: "false"
  appinfo:
    appns: "default"
    applabel: "app=nginx"
    appkind: "deployment"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: pod-network-partition
    spec:
      components:
        env:
        # comma separated list of ports
        - name: PORTS
          value: "tcp: [8080,80], udp: [9000,90]"
        - name: TOTAL_CHAOS_DURATION
          value: "60"