Chaos faults for AWS

Introduction

AWS faults disrupt resources running on different AWS services, and are run from the EKS cluster. To perform these chaos experiments, you need to authenticate CE with the AWS platform. You can do this in one of two ways.

Using secrets: You can use secrets to authenticate CE with AWS regardless of whether the Kubernetes cluster is used for the deployment. This is the Kubernetes-native way of authenticating CE with AWS.
IAM integration: You can authenticate CE with AWS using IAM when chaos is deployed on an EKS cluster. You associate an IAM role with a Kubernetes service account, and that service account provides AWS permissions to the experiment pods that use it.
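
As a rough illustration of the two options above, the following Python sketch builds the Kubernetes manifests each approach typically involves. The resource names, namespace, role ARN, and the secret key name `cloud_config.yml` are hypothetical placeholders, not values confirmed by this page. The `eks.amazonaws.com/role-arn` annotation is the standard EKS mechanism for linking a service account to an IAM role.

```python
import json


def aws_credentials_secret(name, namespace, access_key_id, secret_access_key,
                           key="cloud_config.yml"):
    """Build a Secret manifest that holds an AWS credentials file.

    The key name defaults to a hypothetical value; use whatever key
    your chaos installation expects.
    """
    credentials = (
        "[default]\n"
        f"aws_access_key_id = {access_key_id}\n"
        f"aws_secret_access_key = {secret_access_key}\n"
    )
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData lets the API server handle base64 encoding.
        "stringData": {key: credentials},
    }


def iam_service_account(name, namespace, role_arn):
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sa = iam_service_account(
        "chaos-sa", "chaos", "arn:aws:iam::111122223333:role/chaos-role"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(sa, indent=2))
```

With the secret approach, the experiment pod reads the credentials from the mounted secret. With the IAM approach, no long-lived keys are stored in the cluster, because the pod obtains permissions through its service account.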

Here are AWS faults that you can execute and validate.

ALB AZ down

ALB AZ down takes down Availability Zones (AZs) on a target Application Load Balancer (ALB) for a specific duration.

CLB AZ down

CLB AZ down takes down Availability Zones (AZs) on a target Classic Load Balancer (CLB) for a specific duration.

EBS loss by ID

EBS loss by ID disrupts the state of an EBS volume by detaching it from the node (EC2 instance) for a certain duration. The target volume is selected by its volume ID.

EBS loss by tag

EBS loss by tag disrupts the state of an EBS volume by detaching it from the node (EC2 instance) for a certain duration. The target volume is selected by its volume tag.
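
The detach-and-restore behaviour of the two EBS faults can be sketched in Python as follows. This is a minimal illustration, not the fault's actual implementation. It assumes `ec2` is a boto3 EC2 client (or any object exposing the same `describe_volumes`, `detach_volume`, `attach_volume`, and `get_waiter` calls), and the function names are hypothetical.

```python
import time


def find_volume_ids_by_tag(ec2, tag_key, tag_value):
    """Return IDs of volumes carrying the given tag (first page only)."""
    resp = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    return [volume["VolumeId"] for volume in resp["Volumes"]]


def ebs_loss(ec2, volume_id, duration_seconds, sleep=time.sleep):
    """Detach a volume, wait for the chaos duration, then reattach it."""
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    if not volume["Attachments"]:
        raise RuntimeError(f"{volume_id} is not attached to any instance")

    # Remember where the volume was attached so it can be restored.
    attachment = volume["Attachments"][0]
    instance_id = attachment["InstanceId"]
    device = attachment["Device"]

    ec2.detach_volume(VolumeId=volume_id)
    try:
        # The volume must be fully detached before it can be attached again.
        ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
        sleep(duration_seconds)
    finally:
        # Always try to restore the original attachment.
        ec2.attach_volume(
            VolumeId=volume_id, InstanceId=instance_id, Device=device
        )
    return instance_id, device
```

Selecting by tag only changes how the volume IDs are found. Each ID returned by `find_volume_ids_by_tag` can be passed to `ebs_loss` in the same way as an explicitly supplied ID.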

EC2 CPU hog

EC2 CPU hog disrupts the state of infrastructure resources. It induces CPU stress on the target AWS EC2 instance using the Amazon SSM Run Command, which is executed through an SSM document that is built into the fault.
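
To make the SSM Run Command mechanism concrete, here is a minimal Python sketch that sends a CPU stress command to one instance. It is not the fault's built-in SSM document: it substitutes the AWS-managed `AWS-RunShellScript` document, and it assumes that `stress-ng` is installed on the instance and that `ssm` is a boto3 SSM client (or an object with the same `send_command` call). The function name is hypothetical.

```python
def cpu_stress_via_ssm(ssm, instance_id, cpu_workers, duration_seconds):
    """Send a CPU stress command to one instance and return the command ID."""
    # Assumption: stress-ng is available on the target instance.
    command = f"stress-ng --cpu {cpu_workers} --timeout {duration_seconds}s"
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [command]},
    )
    # The command ID can be used later to look up the invocation status.
    return response["Command"]["CommandId"]
```

The target instance must be managed by SSM (agent running, with an instance profile that permits SSM) for the command to be delivered.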

EC2 DNS chaos

EC2 DNS chaos causes DNS errors on the specified EC2 instance for a specific duration.

NLB AZ down

NLB AZ down takes down access to Availability Zones (AZs) on a target Network Load Balancer (NLB) for a specific duration. This fault:

Restricts access to certain availability zones for a specific duration.
Tests the application's ability to handle the loss of availability zones and maintain uninterrupted traffic flow.

Use cases

Disrupts traffic routing through the network load balancer to test the application's resilience to AZ failures.
Simulates network failures and verifies the application's ability to recover and redirect traffic appropriately.

Introduction

NLB AZ down

ECS Fargate Memory Hog

Resource Access Restrict

ECS Container Volume Detach

ECS Fargate CPU Hog

ALB AZ down

CLB AZ down

EBS loss by ID

EBS loss by tag

EC2 CPU hog

EC2 DNS chaos

EC2 HTTP latency

EC2 HTTP modify body

EC2 HTTP modify header

EC2 HTTP reset peer

EC2 HTTP status code

EC2 IO stress

EC2 memory hog

EC2 network latency

EC2 network loss

EC2 process kill

EC2 stop by ID

EC2 stop by tag

ECS agent stop

ECS container CPU hog

ECS container IO stress

ECS container memory hog

ECS container network latency

ECS container network loss

ECS instance stop

ECS task stop

ECS task scale

ECS container HTTP latency

ECS container HTTP modify body

ECS container HTTP reset peer

ECS container HTTP status code

ECS invalid container image

ECS network restrict

ECS update container resource limit

ECS update container timeout

ECS update task role

Lambda delete event source mapping

Lambda toggle event mapping state

Lambda update function memory

Lambda update function timeout

Lambda update role permission

Lambda delete function concurrency

RDS instance delete

RDS instance reboot

SSM chaos by ID

Windows EC2 blackhole chaos

Windows EC2 CPU hog

Windows EC2 memory hog

SSM chaos by tag