The Litmus developer community is not unfamiliar with the scaffolding utilities that the project provides to bootstrap chaos experiment code. There is a cool blog about it too. While the procedure helped developers get started quickly by generating all the necessary artifacts & laying out the standard experiment sequence, performing developer testing (popularly called DevTest) before the experiment got added into the litmus-e2e for regular pipeline runs was still a tad cumbersome.
Why Dev Testing On the Cluster is Necessary
As with any application for Kubernetes, the chaos experiments need to be tested, as we code along, in a Kubernetes cluster. While one can still run the experiment binaries by providing the right config (or the KUBECONFIG env variable), the developer is bound to make multiple runs with the experiment business logic running out of a pod in the cluster, as that is the eventual execution mode. This becomes especially important when these experiments need to be injected with config information in the form of ENV variables, config maps, secrets, or if they make use of host files at runtime.
Sometimes, the experiment may need to be run with a specific set of permissions or security policies, in which case the operational characteristics and experiment stability can be known only when it is run on an appropriate cluster environment.
The Problem Statement
Until now, this was achieved by building Kubernetes Job manifests with all the run characteristics burned into the spec & running a private docker image from the developer's repository. However, in this model, any fix, correction, or enhancement would mean repeating the cycle - Fix Code -> Build Experiment -> Build Docker Image -> Push Docker Image -> Re-run Kubernetes Experiment Job. While you could churn out a quick script to do this, you would still need to maintain it & also update it for different experiments' needs. I would like to avoid this additional complexity, wouldn't you?
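To make the cost of that cycle concrete, here is a minimal sketch of the kind of "quick script" one might write. The image name and Job manifest path are hypothetical placeholders, not values from the litmus-go repository, and the script defaults to a dry run that only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the manual DevTest cycle described above.
# IMAGE and JOB_MANIFEST are hypothetical placeholders; adjust them to your setup.
IMAGE="${IMAGE:-example.registry/dev/litmus-experiment:dev}"
JOB_MANIFEST="${JOB_MANIFEST:-experiment-job.yaml}"
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

devtest_cycle() {
  # Each step runs only if the previous one succeeded.
  run go build ./... &&                                      # Build Experiment
  run docker build -t "$IMAGE" . &&                          # Build Docker Image
  run docker push "$IMAGE" &&                                # Push Docker Image
  run kubectl delete -f "$JOB_MANIFEST" --ignore-not-found &&  # remove the previous Job
  run kubectl apply -f "$JOB_MANIFEST"                       # Re-run Kubernetes Experiment Job
}

devtest_cycle
```

Running it with DRY_RUN=0 executes the steps for real. Even in this small form, the script has to know the image name and the Job manifest of each experiment, which is the maintenance burden mentioned above.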
Okteto is a popular open-source project that enables what is called "in-cluster development". It spins up a dev environment (essentially a Kubernetes deployment on the cluster, with the right base image replete with your favorite dev tools, described by a simple okteto.yaml manifest) that contains the application source code. The terminal for this development container is provided right within your workspace, and any code changes made in your IDE are reflected on the dev container via a sync mechanism. This enables developers to run the latest code directly in the cluster, thereby accelerating what the Okteto community calls the "inner loop of development".
Okteto for Litmus Chaos Experiments
While Okteto is pretty nifty & helps people develop in the cluster, as the Litmus dev team, we needed a mechanism where the development container is spun up with exactly the same dependencies and config parameters that the specific experiment desires. For example, the network chaos experiments need a different set of configuration params than, say, a disk-based experiment. Here is where the swap mode of Okteto helps. While okteto can launch custom dev environments for your code, it can also swap the container image on an existing deployment, thereby inheriting all of that deployment's properties. Here is a nice blog explaining this with a simple demo app.
Based on this knowledge, we decided to tweak the scaffold utilities to generate a test deployment, instead of the experiment job spec that you have been used to seeing. This deployment has the standard chaos experiment ENVs & other config details (such as the experiment-specific RBAC/service account) burned into the spec, while using a busybox image that does nothing. With this, you could update the manifest to include any missing info, deploy it on the cluster, and eventually use okteto to swap the image on this deployment for that of the development container to kickstart your dev-test process.
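As a rough illustration of the idea, the sketch below prints a test deployment of this shape: an idle busybox container carrying the experiment's service account and ENVs. It is not the actual scaffold output; the deployment name, service account name, and ENV names/values are assumptions chosen for illustration.

```shell
#!/bin/sh
# Print an illustrative test deployment manifest (NOT the real scaffold output).
# $1: deployment name, $2: service account name (both hypothetical here).
render_test_deployment() {
  name="$1"
  service_account="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      # The dev container swapped in by okteto inherits this service account.
      serviceAccountName: ${service_account}
      containers:
        - name: ${name}
          image: busybox
          # Do nothing; just keep the pod running until the image is swapped.
          command: ["sh", "-c", "tail -f /dev/null"]
          env:
            # Illustrative ENVs; use the ones your experiment reads.
            - name: APP_NAMESPACE
              value: "default"
            - name: APP_LABEL
              value: "app=nginx"
            - name: TOTAL_CHAOS_DURATION
              value: "30"
EOF
}

render_test_deployment litmus-experiment example-experiment-sa
```

The output could be redirected to a file and applied with kubectl apply -f once the values match your experiment.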
Steps to Perform DevTest of Litmus Chaos Experiments
Note: Refer to the previous (part-1) blog to go through the steps to bootstrap your experiment code. For the subsequent instructions, I assume you have gone through these initial steps. By this point, you must:
- Have already cloned the [litmus-go](https://github.com/litmuschaos/litmus-go) repository (your fork, that is)
- Live at the path
- Have generated the experiment artifacts & written the first-cut business logic of the chaos experiment
- Have access to a dev cluster (minikube or the like) with the kubeconfig set up from your workspace. (You could also use the Okteto Cloud, which allows you a free namespace with enough resources and even SSL endpoints for your apps!)
Now, let's get testing our changes in the cluster!!
Install the Okteto CLI
curl https://get.okteto.com -sSfL | sh
(Optional) Create a sample nginx deployment that can be used as the application under test (AUT).
kubectl create deployment nginx --image=nginx
Set up the RBAC necessary for execution of this experiment by applying the generated rbac.yaml
kubectl apply -f rbac.yaml
Update test/test.yaml with the desired values (app & chaos info) in the ENV and the appropriate chaosServiceAccount, along with any other dependencies, if applicable (configmaps, volumes, etc.), & create this deployment
kubectl apply -f test/test.yml
Go to the root of this repository (litmuschaos/litmus-go) & launch the Okteto development environment in your workspace. This should take you to the bash prompt on the dev container into which the content of the litmus-go repo is loaded.
root@test:~/okteto/litmus-go# okteto up
Deployment litmus-go doesn't exist in namespace litmus. Do you want to create a new one? [y/n]: y
 ✓  Development container activated
 ✓  Files synchronized
The value of /proc/sys/fs/inotify/max_user_watches in your cluster nodes is too low.
This can affect file synchronization performance.
Visit https://okteto.com/docs/reference/known-issues/index.html for more information.
    Namespace: default
    Name:      litmus-experiment
    Forward:   2345 -> 2345
               8080 -> 8080
Welcome to your development container. Happy coding!
This dev container inherits the env, serviceaccount & other properties specified on the test deployment & is now suitable for running the experiment.
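Before running the experiment, it can be worth confirming that the inherited ENVs are actually present in the dev container. The helper below is a small sketch for that; the function name and the variable names passed to it are illustrative assumptions and should be replaced with the ENVs set on your test deployment. It also assumes printenv is available in the dev container image.

```shell
#!/bin/sh
# Print the name of every expected variable that is NOT set in the environment.
# Prints nothing when all of them are present.
missing_env() {
  for var in "$@"; do
    # printenv exits non-zero when the variable is not in the environment.
    printenv "$var" >/dev/null || echo "$var"
  done
}

# Illustrative variable names; list the ENVs from your test deployment here.
missing_env APP_NAMESPACE APP_LABEL TOTAL_CHAOS_DURATION
```

An empty result means every listed variable was inherited; any name printed points at an ENV that is missing from the test deployment spec.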
Execute the experiment against the sample app chosen & verify the steps via logs printed on the console.
go run experiments/<chaos-category>/<experiment-name>/<experiment-name>.go
In parallel, observe the experiment execution via the changes to the chaos & application pods
watch -n 1 kubectl get pods
If there are necessary changes to the code based on the run, make them via your favorite IDE. These changes are automatically reflected in the dev container. Re-run the experiment to confirm changes.
Once the experiment code is validated, stop/remove the development environment
root@test:~/okteto/litmus-go# okteto down
 ✓  Development container deactivated
 i  Run 'okteto push' to deploy your code changes to the cluster
You could also run a final test using the ChaosExperiment CR with the final (pushed) image & a ChaosEngine CR mapping the experiment to the sample app we used earlier. For this, you would need the chaos operator running on the cluster along with the chaos CRDs installed. This is a breeze in the Okteto Cloud, which provides you with a single-click install of the litmus infra.
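For reference, a ChaosEngine for such a final test might look roughly like the output of the sketch below. The engine name, experiment name, service account, and ENV value are hypothetical, and the field layout follows the litmuschaos.io/v1alpha1 ChaosEngine schema as I understand it, so check it against the CRD version installed on your cluster.

```shell
#!/bin/sh
# Print an illustrative ChaosEngine that maps an experiment to the nginx sample app.
# $1: experiment name, $2: chaos service account (both hypothetical here).
render_chaos_engine() {
  experiment="$1"
  service_account="$2"
  cat <<EOF
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  engineState: active
  appinfo:
    appns: default
    # Label set by "kubectl create deployment nginx --image=nginx".
    applabel: app=nginx
    appkind: deployment
  chaosServiceAccount: ${service_account}
  experiments:
    - name: ${experiment}
      spec:
        components:
          env:
            # Illustrative tunable; use the ENVs your experiment reads.
            - name: TOTAL_CHAOS_DURATION
              value: "30"
EOF
}

render_chaos_engine example-experiment example-experiment-sa
```

The experiment name must match the ChaosExperiment CR created with the pushed image, and the service account should be the one set up by the experiment's RBAC manifest.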
As the Litmus community, we are always on the lookout to collaborate with cool projects & build/adopt techniques that simplify your life as chaos engineers & litmus developers. Give this a try & let us know your feedback!
Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you? Join our community on Slack for detailed discussions & regular updates on Chaos Engineering for Kubernetes.
Check out the LitmusChaos GitHub repo and do share your feedback. Submit a pull request if you identify any necessary changes.