Chaos in the World of Jenkins

Hello everyone!

Imagine a tool that regularly stops certain build processes in an existing CI/CD pipeline to verify that the underlying infrastructure is truly fault-tolerant.

Chaos engineering has been in the news for quite some time now, and it gives software engineers a way to determine how resilient their systems really are. We often build scalable, complex microservices that connect to multiple systems, but we seldom ask what happens if one of those systems stops working or is shut down completely. How will our system respond in that case?

If you rely heavily on Jenkins, this post should be useful. So please stick around! 😄

Jenkins offers a plugin called Chaos Butler that brings down build agents at a predefined interval to test the resiliency of the system.

Let’s understand how this plugin works!

Prerequisites: Jenkins version 2.303.2 or above
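If you are unsure whether your controller meets this requirement, the check is a simple part-by-part numeric comparison. The Python sketch below is a hypothetical helper (`meets_minimum` is not part of Jenkins or the plugin), and it assumes a purely numeric dotted version string such as `2.303.2`, with no suffixes.

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version such as '2.303.2' into a tuple of integers.

    Assumes purely numeric parts (no suffixes such as '-SNAPSHOT').
    """
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(current: str, minimum: str = "2.303.2") -> bool:
    """Return True if `current` is the same as or newer than `minimum`.

    Tuples compare element by element, so (2, 319, 1) > (2, 303, 2),
    and a shorter prefix such as (2, 303) sorts before (2, 303, 2).
    """
    return parse_version(current) >= parse_version(minimum)
```

For example, `meets_minimum("2.319.1")` returns `True`, while `meets_minimum("2.289.3")` returns `False`. Comparing integers rather than raw strings matters here: as plain strings, `"2.40"` would sort before `"2.303.2"`.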

Set up a new job in Jenkins

For this example, I set up a simple Spring Boot project in Java. I had already installed the Chaos Butler plugin, but if you need to install it, go to:

Manage Jenkins

Manage Plugins

Click on the Available tab and search for Chaos Butler


Configure Chaos Butler plugin

Once it is installed, you can configure the plugin by navigating to:

Manage Jenkins

Configure System

Chaos Butler


You can configure how often the Chaos Butler runs. The available options are:

  • Never (default) - the Chaos Butler is disabled
  • Every minute
  • Every 15 minutes
  • Every hour
  • Every 8 hours
  • Once per day
  • Once per week
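To get a feel for how disruptive each setting is, it helps to turn these options into intervals. The Python sketch below is purely illustrative: the `CHAOS_BUTLER_FREQUENCIES` dictionary and `runs_per_day` are hypothetical names, not plugin code, and the intervals are simply read off the option labels.

```python
from datetime import timedelta

# Hypothetical mapping of the frequency options above to run intervals.
# None stands for "Never", the default, which disables the Chaos Butler.
CHAOS_BUTLER_FREQUENCIES = {
    "Never": None,
    "Every minute": timedelta(minutes=1),
    "Every 15 minutes": timedelta(minutes=15),
    "Every hour": timedelta(hours=1),
    "Every 8 hours": timedelta(hours=8),
    "Once per day": timedelta(days=1),
    "Once per week": timedelta(weeks=1),
}


def runs_per_day(option: str) -> float:
    """Return how many times per day the given option would trigger a run."""
    interval = CHAOS_BUTLER_FREQUENCIES[option]
    if interval is None:
        return 0.0  # disabled
    # Dividing one timedelta by another yields a float ratio.
    return timedelta(days=1) / interval


for option in CHAOS_BUTLER_FREQUENCIES:
    print(f"{option}: {runs_per_day(option):g} runs/day")
```

"Every minute" works out to 1,440 runs per day, whereas "Every 8 hours" is only 3, so the choice of setting makes a large difference to how often your builds can be interrupted.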

Configuring jobs

By default, Chaos Butler is enabled for every job you create in Jenkins. However, if you need to exclude a particular job, you can opt out by checking the Chaos Butler Opt-out option as shown below:


Chaos Butler in action

With the plugin configured, I was able to see the Chaos Butler in action during one of my build executions. This helped me identify the error scenarios and handle them gracefully in my client-side application.
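The post does not say exactly how those errors were handled. One common technique for transient failures, such as a dropped connection, is retrying with exponential backoff. The Python sketch below is a generic illustration of that technique, not the author's code: `retry_with_backoff` and its parameters are hypothetical, and treating `ConnectionError` and `TimeoutError` as the transient cases is an assumption you would adapt to your own client.

```python
import time


def retry_with_backoff(operation, attempts=3, base_delay=1.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call `operation`, retrying when it raises one of `retry_on`.

    The delay doubles after each failed attempt: base_delay, 2*base_delay, ...
    The last exception is re-raised once all attempts are used up.
    `sleep` is injectable so the behaviour can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

With `attempts=3` and `base_delay=0.5`, a call that fails twice and then succeeds waits 0.5 s and then 1.0 s before returning its result. Retrying only on specific exception types keeps genuine bugs from being hidden behind retries.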


Note: It is recommended that agents which cannot reconnect automatically be opted out of the Chaos Butler.
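To make the opt-out logic concrete, here is a toy Python model of a single Chaos Butler run. It is not the plugin's implementation: the `Agent` class, the `opted_out` flag, and the policy of taking one randomly chosen eligible agent offline per run are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    online: bool = True
    # Hypothetical stand-in for the "Chaos Butler Opt-out" setting.
    opted_out: bool = False


def chaos_butler_run(agents: List[Agent], rng: random.Random) -> Optional[Agent]:
    """Simulate one run: take one random eligible agent offline.

    Agents that are already offline or have opted out are skipped.
    Returns the agent taken offline, or None if no agent was eligible.
    """
    eligible = [agent for agent in agents if agent.online and not agent.opted_out]
    if not eligible:
        return None
    victim = rng.choice(eligible)
    victim.online = False  # this model never reconnects agents on its own
    return victim


agents = [
    Agent("linux-agent-1"),
    Agent("linux-agent-2"),
    # Cannot reconnect automatically, so it is opted out as the note recommends.
    Agent("static-agent", opted_out=True),
]
rng = random.Random(42)
for run in range(3):
    victim = chaos_butler_run(agents, rng)
    print(f"run {run + 1}: {victim.name if victim else 'no eligible agent'}")
```

Whichever order the first two runs pick the Linux agents in, the third run finds no eligible agent, and `static-agent` is never taken down.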

I hope this was useful. In upcoming posts, I will talk more about chaos engineering using other tools and solutions.