How to run Azure Disk Loss Experiment in LitmusChaos

This article is a guide for setting up and running the Azure Virtual Disk Loss experiment on LitmusChaos 2.0. The experiment detaches one or more virtual disks from an instance for a set chaos duration and then re-attaches them. The broad objective of this experiment is to extend LitmusChaos support to non-Kubernetes targets while ensuring resiliency for all kinds of targets, as part of a single chaos workflow for the entirety of a business.

Currently, the experiment is available only as a technical preview in the chaos hub, so we will have to use the master branch of the chaos hub to access it.

If you are looking for the Azure Instance Stop experiment, you can find it here.

Pre-Requisites

To run this experiment, we need a few things beforehand:

An Azure account
Disk(s) attached to an instance in a Virtual Machine Scale Set (or to a standalone instance)
A Kubernetes cluster with LitmusChaos 2.0 installed (you can follow this blog to set up LitmusChaos 2.0 on AKS — Getting Started with LitmusChaos 2.0 in Azure Kubernetes Service)

Setting up Azure Credentials as Kubernetes Secret

To let LitmusChaos access your Azure instances, you need to set up the Azure credentials as a Kubernetes secret. The process is simple. First, install the Azure CLI (if you haven't already) and log in to it. Then run this command to save the Azure credentials in an azure.auth file.

az ad sp create-for-rbac --sdk-auth > azure.auth
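
Before moving on, it can help to confirm that the file really contains the credentials. The snippet below is a minimal sketch, not part of the official setup: missing_auth_keys is a hypothetical helper name, and the four fields it looks for are an assumption about which values matter most, not a complete list of the keys in the file.

```shell
# Print every expected field that is missing from an Azure auth file.
# Assumption: these four fields are the ones a client needs most;
# the real file contains more keys than are checked here.
missing_auth_keys() {
  auth_file="$1"
  for key in clientId clientSecret subscriptionId tenantId; do
    # -q: print nothing, -s: stay quiet if the file does not exist
    grep -qs "\"$key\"" "$auth_file" || echo "$key"
  done
}

# Usage: an empty result means all four fields were found.
# missing_auth_keys azure.auth
```

If the command prints nothing, all four fields are present. Any name it prints is a field that could not be found in the file.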

Next, create a secret.yaml file with the following content. Replace the placeholder JSON under azure.auth with the contents of your own azure.auth file.

apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
    {
      "clientId": "XXXXXXXXX",
      "clientSecret": "XXXXXXXXX",
      "subscriptionId": "XXXXXXXXX",
      "tenantId": "XXXXXXXXX",
      "activeDirectoryEndpointUrl": "XXXXXXXXX",
      "resourceManagerEndpointUrl": "XXXXXXXXX",
      "activeDirectoryGraphResourceId": "XXXXXXXXX",
      "sqlManagementEndpointUrl": "XXXXXXXXX",
      "galleryEndpointUrl": "XXXXXXXXX",
      "managementEndpointUrl": "XXXXXXXXX"
    }
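
If you would rather not paste the credentials by hand, the manifest above can be generated from the azure.auth file. This is only a sketch under the assumption that the file holds plain JSON. render_secret is a hypothetical helper name, and the manifest it prints is the same one shown above.

```shell
# Render the Secret manifest from an auth file, indenting its contents
# so they sit inside the "azure.auth: |-" block scalar.
render_secret() {
  auth_file="$1"
  cat <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  azure.auth: |-
EOF
  # Four leading spaces place each line inside the block scalar.
  sed 's/^/    /' "$auth_file"
}

# Usage: render_secret azure.auth > secret.yaml
```

Redirecting the output to secret.yaml gives the same file you would otherwise write by hand.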

Now run the following command. Remember to change the namespace if you have installed LitmusChaos in a different namespace.

kubectl apply -f secret.yaml -n litmus
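
To double-check that the secret landed in the cluster, we can read it back. The sketch below assumes the secret name and namespace used above. secret_has_auth is a hypothetical helper name, and base64 -d is the GNU coreutils spelling of the decode flag, which may differ on your system.

```shell
# Succeed if the named secret has an azure.auth entry that contains a
# clientId field. Kubernetes stores secret values base64-encoded, so
# the value is decoded before it is searched.
secret_has_auth() {
  secret_name="$1"
  namespace="$2"
  kubectl get secret "$secret_name" -n "$namespace" \
    -o jsonpath='{.data.azure\.auth}' | base64 -d | grep -q '"clientId"'
}

# Usage:
# secret_has_auth cloud-secret litmus && echo "secret looks good"
```

The check only looks for the clientId field, so it confirms that the credentials were stored, not that they are valid.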

Updating ChaosHub

As the experiment is only available as a technical preview right now, we will have to update the ChaosHub to use the technical preview (master) branch.

Now change the branch to “master”.

Click on Submit Now, and the ChaosHub will show the Azure Disk Loss experiment.

Scheduling the Experiment Workflow

Now move to the Workflows section and click on Schedule a Workflow. Select the Self-Agent (or any other one if you have multiple agents installed) and click on Next.

Select the third option to create a workflow from experiments using ChaosHub. Click on Next.

Click Next again (or edit the workflow name first, if you want to). Then, on the Experiments page, click on Add a new Experiment and select the Azure Virtual Disk Loss experiment.

Next, click on Edit YAML. You will need to add the disk name(s) and resource group name to the ChaosEngine environment variables. Scroll down to the ChaosEngine artefacts, where you will see the environment variables, and set the values accordingly. If your disks are connected to an instance that is part of a Scale Set, set SCALE_SET to “enable”. Save the changes and schedule your workflow.

Note: For Scale Sets and node pools, the experiment works only for disk(s) attached to a specific instance in the scale set, not for disks attached to the scale set itself.
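
To make the values described above concrete, here is a small sketch that prints the env entries for a given set of targets. Only SCALE_SET is named in this post. VIRTUAL_DISK_NAMES, RESOURCE_GROUP, and the "disable" default are assumptions based on my reading of the experiment docs, so confirm them against the YAML shown in the editor. disk_loss_env is a hypothetical helper name.

```shell
# Print the ChaosEngine env entries for the given targets.
# Assumption: the variable names match those in the editor's YAML.
disk_loss_env() {
  disk_names="$1"            # comma-separated, e.g. "disk-1,disk-2"
  resource_group="$2"
  scale_set="${3:-disable}"  # "enable" for disks on a scale set instance
  cat <<EOF
- name: VIRTUAL_DISK_NAMES
  value: "$disk_names"
- name: RESOURCE_GROUP
  value: "$resource_group"
- name: SCALE_SET
  value: "$scale_set"
EOF
}

# Usage (disk and resource group names are placeholders):
# disk_loss_env "disk-1,disk-2" my-resource-group enable
```

The printed entries are meant for comparison with the env list in the editor, not as a replacement for the full ChaosEngine definition.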

Observing the Experiment Run

Great, your workflow is now running. To check on it, click on Go to Workflow and then select your workflow.

You can check the status of your disk(s) in the Azure Portal to verify that the experiment is working as expected.
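
If you prefer the terminal to the portal, the Azure CLI can report the same status. This is a minimal sketch. disk_state is a hypothetical helper name, and the disk and resource group names in the usage line are placeholders for your own.

```shell
# Print the state Azure reports for a managed disk,
# such as Attached or Unattached.
disk_state() {
  disk_name="$1"
  resource_group="$2"
  az disk show --name "$disk_name" --resource-group "$resource_group" \
    --query diskState --output tsv
}

# Usage (placeholder names):
# disk_state disk-1 my-resource-group
```

Running it once during the chaos duration and once afterwards should show the state change from detached back to attached.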

You can also click on azure-disk-loss to view the experiment logs. After the given chaos duration, the experiment will automatically re-attach the disk(s) and give a pass/fail verdict. If the experiment fails, verify through the logs and the portal that the disk(s) have been re-attached.
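
The verdict can also be read from the ChaosResult resource that Litmus creates for the run. The sketch below assumes the verdict is stored under status.experimentStatus.verdict and that the result lives in the litmus namespace. chaos_verdict is a hypothetical helper name, and the result name in the usage line is a placeholder, so list the real ones first.

```shell
# Print the verdict recorded in a ChaosResult.
# Assumption: the verdict is at .status.experimentStatus.verdict.
chaos_verdict() {
  result_name="$1"
  namespace="$2"
  kubectl get chaosresult "$result_name" -n "$namespace" \
    -o jsonpath='{.status.experimentStatus.verdict}'
}

# Usage: list the results, then query one of them.
# kubectl get chaosresult -n litmus
# chaos_verdict my-engine-azure-disk-loss litmus
```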

That's it! You have successfully run the Azure Disk Loss experiment using the LitmusChaos 2.0 Chaos Center.

In this blog, we saw how to perform the Azure Disk Loss experiment using LitmusChaos 2.0. You can learn more about this experiment from the docs. It is one of the many non-Kubernetes experiments in LitmusChaos, alongside experiments for AWS, GCP, and VMware, all aimed at making Litmus a complete Chaos Engineering toolset for every enterprise, regardless of the technology stack used.

You can join the LitmusChaos community on GitHub and Slack. The community is very active and tries to answer queries quickly.

I hope you enjoyed this journey and found the blog interesting. You can leave your queries or suggestions (appreciation as well) in the comments below.

Show your ❤️ with a ⭐ on our GitHub. To learn more about Litmus, check out the Litmus documentation. Thank you! 🙏

Thank you for reading

Akash Shrivastava

Software Engineer at ChaosNative