Pitest: Measure the Quality of your Unit Tests with Mutation Testing

It is not uncommon for developers to discuss the quality of automated unit tests: Are they testing enough of the application code? And more importantly, are they really verifying the expected behavior?

The first question has a relatively simple answer: use automated code coverage tools that track which lines of code and which branches in the execution flow are being tested. Code coverage reports are very helpful to 1) determine which portions of application code are not being tested; and 2) if measuring code coverage per individual test, determine whether each test is effectively testing the appropriate piece of application code. If you are interested in techniques for that, you may want to look at this other blog post: https://deors.wordpress.com/2014/07/04/individual-test-coverage-sonarqube-jacoco/

However, no matter how useful it is to measure code coverage, these reports will not tell you one fundamental aspect of your tests: which behavior is being verified!

Simply put, your test code may be passing through every single line of your code and not verifying anything. In JUnit terms, your test code may not contain a single assertion!
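To make the point concrete, here is a minimal sketch in plain Java (no JUnit, so it runs on its own). The class, the finalPrice method and both "tests" are hypothetical names invented for this illustration. The first test executes every line and branch of finalPrice yet would pass no matter what the method returned; the second executes exactly the same code and actually verifies the results.

```java
// Hypothetical example: all names here are illustrative, not from a real project.
class CoverageWithoutAssertions {

    /** Applies a 10% discount for premium customers. */
    static int finalPrice(int price, boolean premium) {
        if (premium) {
            return price - price / 10;
        }
        return price;
    }

    /** Runs both branches (full line and branch coverage) but verifies nothing. */
    static void testWithoutAssertions() {
        finalPrice(100, true);
        finalPrice(100, false);
    }

    /** Runs the same branches and checks the returned values. */
    static void testWithAssertions() {
        check(finalPrice(100, true) == 90, "premium price");
        check(finalPrice(100, false) == 100, "regular price");
    }

    /** Tiny stand-in for a JUnit assertion. */
    static void check(boolean condition, String label) {
        if (!condition) {
            throw new AssertionError(label);
        }
    }

    public static void main(String[] args) {
        testWithoutAssertions();
        testWithAssertions();
        System.out.println("both tests pass");
    }
}
```

A coverage tool would report the same figures for both tests, which is exactly why coverage alone cannot tell them apart.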

To overcome this limitation of automated unit testing, one technique that can be of great help is Mutation Testing.

Mutation Testing… Explained

Let’s assume you have your application code and your test code as usual. A mutation testing tool will take your application code and make small, surgical changes, one at a time, each of them a so-called “mutation”. It could be changing a logical operator in an if statement (e.g. > is changed to <=), it could be removing some service call, it could be changing some for loop, it could be altering some return value, and so forth.

Mutation testing is, therefore, based on the assumption that if you are testing your code and making the right assertions to verify its behavior, at least some of your unit tests should fail once you re-execute them against a mutation of the application code.
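The idea can be sketched by hand in a few lines of Java. This is only an illustration under stated assumptions: the isAdult rule, the hand-written mutant (a boundary change from >= to >) and both tests are hypothetical, and a real tool would generate the mutant automatically instead of having it written in the source. Each test returns true when it passes, so a mutation is detected when a test that passes against the original fails against the mutant.

```java
import java.util.function.IntPredicate;

// Hypothetical example: the rule, the mutant and the tests are illustrative only.
class MutationSketch {

    /** Original application logic: adults are 18 or older. */
    static final IntPredicate ORIGINAL = age -> age >= 18;

    /** Hand-written mutant: the operator >= has been changed to >. */
    static final IntPredicate MUTANT = age -> age > 18;

    /** Weak test: only checks values far away from the boundary. */
    static boolean weakTest(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    /** Strong test: checks the values on both sides of the boundary. */
    static boolean strongTest(IntPredicate isAdult) {
        return isAdult.test(18) && !isAdult.test(17);
    }

    public static void main(String[] args) {
        // Both tests pass against the original code.
        System.out.println("weak test, original:   " + weakTest(ORIGINAL));
        System.out.println("strong test, original: " + strongTest(ORIGINAL));
        // The weak test still passes against the mutant: the mutation goes undetected.
        System.out.println("weak test, mutant:     " + weakTest(MUTANT));
        // The strong test fails against the mutant: the mutation is detected.
        System.out.println("strong test, mutant:   " + strongTest(MUTANT));
    }
}
```

Both tests give the same code coverage, but only the strong one notices that the behavior has changed.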

Pitest – A Mutation Testing Framework for Java

Although very interesting, such a technique would be useless without the proper tools. There are some, and for different languages, like Jester, Jumble or NinjaTurtles, but probably the most mature and powerful we’ve seen to date is Pitest (http://pitest.org).

Working with Pitest is very simple and requires minimal effort to start. It can be integrated with build tools like Maven, Ant or Gradle, with IDEs like Eclipse (Pitclipse plug-in) or IntelliJ, and with quality tools like SonarQube.

Regardless of the way you execute it, Pitest will analyze the application bytecode and decide which mutations will be introduced (for a full list and description of the mutators available in Pitest, check their site: http://pitest.org/quickstart/mutators/).

To optimize the test execution as much as possible, Pitest gathers code coverage metrics in a “normal execution” and then, for each mutation, re-executes only the test cases that cover the mutated code. Total execution time is still noticeably longer than a normal unit test execution, basically because of the sheer number of mutations and test runs that Pitest generates even for the simplest of code bases.
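The selection step described above can be pictured with a small sketch. This is not Pitest's actual implementation, only an illustration of the idea under simple assumptions: coverage is modeled as a map from a (hypothetical) test name to the line numbers that test executed, and a mutation is identified only by the line it changes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: test names and line numbers are made up.
class CoverageGuidedSelection {

    /** Returns the tests that executed the mutated line, i.e. the only ones worth re-running. */
    static Set<String> testsFor(Map<String, Set<Integer>> coverage, int mutatedLine) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> entry : coverage.entrySet()) {
            if (entry.getValue().contains(mutatedLine)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    /** Coverage data as it could look after one normal run of three tests. */
    static Map<String, Set<Integer>> sampleCoverage() {
        Map<String, Set<Integer>> coverage = new LinkedHashMap<>();
        coverage.put("testPremium", new HashSet<>(Arrays.asList(10, 11, 12)));
        coverage.put("testRegular", new HashSet<>(Arrays.asList(10, 11, 14)));
        coverage.put("testFormatting", new HashSet<>(Arrays.asList(20, 21)));
        return coverage;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> coverage = sampleCoverage();
        // A mutation on line 12 only needs testPremium to be re-executed.
        System.out.println("line 12: " + testsFor(coverage, 12));
        // A mutation on line 30 is covered by no test, so no test can detect it.
        System.out.println("line 30: " + testsFor(coverage, 30));
    }
}
```

The second case also shows why uncovered code needs no test run at all: a mutation that no test executes cannot be detected by any of them.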

As a result, Pitest generates a fully detailed report showing which mutations “lived” after the execution, that is, which mutations were not detected by any existing assertion. These “lived” mutations are your main focus, because they mean that there is some logic, some return value, or some call that is not being verified.

Of course, not all of the mutations will be meaningful. Some may produce out-of-memory errors or infinite loops. For those cases, Pitest does its best to detect them and remove them from the resulting reports. This detection can be fine-tuned if needed, for example by adjusting timeouts and other parameters, but the sensible defaults work really well to start with.

Pitest in Action

Seeing is believing, so we put Pitest to work on a simple 10-class Java library. We decided to use the Maven plug-in, as this method requires zero configuration to start. We opened a command prompt in the project directory and just executed this command:

> mvn org.pitest:pitest-maven:1.0.0:mutationCoverage

After a few minutes (5 to 6 for this project) and lots of iterations showing in the console, the build finished and the reports were generated in the target directory:

> target\pit-reports\201408181908\index.html

When the report loaded in the browser, the first fact that caught our attention was that one class that we had worked hard to test fully, AbstractContext, showed one lived mutation despite its 100% code coverage. Oops, something was not properly verified. Was Pitest right?


After clicking the class name, we could see the detail on where the lived mutation was found:


Pitest was right! Although that method is fully tested, and there are test cases for every single execution flow, we were missing the proper assertion for that if statement. This is really, really hard to catch without a good tool helping us find out more about our unit tests.

Of course, the next step was to add the forgotten assertion to the relevant test method. Once done, we re-launched Pitest. After a few minutes, a new set of reports was created and, once loaded in the browser… a clean result for that class!



Although we were arguably a bit fortunate to obtain such a fabulous result on the first try, it is true that after a more thorough inspection of the reports we found many other places where assertions were missing.

Our view is that Pitest is a very valuable tool for writing really meaningful and truly useful automated unit test suites, and that it should be standard gear for Java projects going forward. It is simple to use, requires zero or minimal configuration, and produces valuable results that directly impact the quality of the tests we create, and therefore the quality of our deliverables.

To mutate, or not to mutate: that is the question.
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous unit tests.

Script to List Key Job Settings in Jenkins at a Glance

One can get addicted to scripting in Jenkins quickly! 😉

When you have dozens or even hundreds of jobs in Jenkins, it is really important to have a way to review or change job settings in one shot. One of my favorite scripts, which I use when I want to get key settings from all jobs at a glance, is this one:

import hudson.model.*
import hudson.maven.*

// Print name, JDK, Maven installation and goals for every Maven job
for (job in Hudson.instance.items) {
  if (job instanceof MavenModuleSet) {
    def mms = (MavenModuleSet) job
    def name = mms.name
    def jdk = "def"  // fallback label when no JDK is set on the job
    if (mms.JDK != null) {
      jdk = mms.JDK.name
    }
    def mvn = mms.mavenName
    def goals = mms.goals
    printf("%-50s | %-10s | %-15s | %-50s\n", name, jdk, mvn, goals)
  }
}

And this is an example output. I love it! 😀  (Be sure to scroll right to see the full output.)

deors.demos.annotations.base                       | jdk-8      | null            | clean install                                     
deors.demos.annotations.base.client                | jdk-8      | null            | clean test                                        
deors.demos.annotations.base.processors            | jdk-8      | null            | clean install                                     
deors.demos.annotations.beaninfo                   | jdk-8      | null            | clean install                                     
deors.demos.annotations.beaninfo.client            | jdk-8      | null            | clean test                                        
deors.demos.annotations.beaninfo.processors        | jdk-8      | null            | clean install                                     
deors.demos.annotations.velocity.client            | jdk-8      | null            | clean test                                        
deors.demos.annotations.velocity.processors        | jdk-8      | null            | clean install                                     
deors.demos.batch.springbatch2                     | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.cloud.gae                              | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.cloud.heroku                           | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.cloud.rhc                              | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.cloud.vmc                              | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.java8                                  | jdk-8      | maven-3.2.1     | clean verify                                      
deors.demos.testing.arquillian                     | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.testing.arquillian-glassfish-embedded  | jdk-7      | null            | clean verify -Parquillian-glassfish-embedded      
deors.demos.testing.arquillian-glassfish-remote    | jdk-7      | null            | clean verify -Parquillian-glassfish-remote,!arquillian-glassfish-embedded
deors.demos.testing.arquillian-jboss-managed       | jdk-7      | null            | clean verify -Parquillian-jboss-managed,!arquillian-glassfish-embedded
deors.demos.testing.arquillian-jboss-remote        | jdk-7      | null            | clean verify -Parquillian-jboss-remote,!arquillian-glassfish-embedded
deors.demos.testing.arquillian-weld-embedded       | jdk-7      | null            | clean verify -Parquillian-weld-embedded,!arquillian-glassfish-embedded
deors.demos.testing.htmlunit                       | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.testing.htmlunit-cargo-glassfish       | jdk-7      | null            | -P glassfish cargo:redeploy                       
deors.demos.testing.htmlunit-cargo-jboss           | jdk-7      | null            | -P jboss cargo:redeploy                           
deors.demos.testing.htmlunit-cargo-tomcat          | jdk-7      | null            | -P tomcat cargo:redeploy                          
deors.demos.testing.htmlunit-deploy-glassfish      | jdk-7      | null            | clean install                                     
deors.demos.testing.htmlunit-deploy-tomcat         | jdk-7      | null            | clean install                                     
deors.demos.testing.mocks                          | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.testing.selenium                       | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.testing.selenium-cargo-glassfish       | jdk-7      | null            | -P glassfish cargo:redeploy                       
deors.demos.testing.selenium-cargo-jboss           | jdk-7      | null            | -P jboss cargo:redeploy                           
deors.demos.testing.selenium-cargo-tomcat          | jdk-7      | null            | -P tomcat cargo:redeploy                          
deors.demos.testing.selenium-deploy-glassfish      | jdk-7      | null            | clean install                                     
deors.demos.testing.selenium-deploy-tomcat         | jdk-7      | null            | clean install                                     
deors.demos.web.gwt2                               | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.web.gwt2spring                         | jdk-7      | maven-3.2.1     | clean verify                                      
deors.demos.web.springmvc3                         | jdk-7      | maven-3.2.1     | clean verify                                      
petclinic-1-build-test                             | jdk-7      | maven-3.2.1     | clean test                                        
petclinic-2-package                                | jdk-7      | null            | package -DskipTests=true                          
petclinic-3-tomcat-run                             | jdk-7      | null            | cargo:run -Pcargo-tomcat                          
petclinic-4-verify-selenium-htmlunit               | jdk-7      | maven-3.2.1     | failsafe:integration-test -P selenium-tests       
petclinic-5-verify-jmeter                          | jdk-7      | maven-3.2.1     | jmeter:jmeter -P jmeter-tests                     
petclinic-6-tomcat-stop                            | jdk-7      | null            | cargo:stop -Pcargo-tomcat                         
petclinic-9a-verify-selenium-openshift             | jdk-7      | maven-3.2.1     | failsafe:integration-test -P selenium-tests       
petclinic-9b-verify-selenium-heroku                | jdk-7      | maven-3.2.1     | failsafe:integration-test -P selenium-tests       
petclinic-full-all-browsers                        | jdk-7      | maven-3.2.1     | clean verify -P cargo-tomcat,selenium-tests       
petclinic-full-htmlunit-sonar                      | jdk-7      | maven-3.2.1     | clean verify -P cargo-tomcat,selenium-tests,jmeter-tests