The impact of flaky tests

Sep 12, 2023

Flaky tests, often the Achilles' heel of a software development process, are tests that both pass and fail against the same code. They derail a software engineer's workflow by behaving non-deterministically, which complicates debugging and makes test results unreliable. Understanding the common causes of flakiness, such as concurrency issues and test order dependency, can help developers prevent flaky tests and maintain the quality of their test suites.

The Time Drain Caused by Flaky Tests

One of the primary impacts of flaky tests is wasted time. Google, for instance, has reported that about 16% of its more than 4 million tests show some level of flakiness. Debugging failed tests and rerunning test suites, only to find out that the failure was due to a flaky test, can substantially slow down the software development process. When flaky tests affect end-to-end tests or regression tests, which take longer to execute, the lost time is magnified.

Flaky Tests and Release Readiness

Flaky tests can also undermine release readiness by introducing uncertainty into the continuous integration (CI) and continuous delivery (CD) pipeline. When test results are unreliable, teams are more likely to move forward with code changes that introduce unforeseen bugs or break application functionality, which can result in costly rollbacks or negative user experiences.

Mitigating the Impact of Flaky Tests

Software development teams can mitigate the impact of flaky tests through early detection, effective management, and proactive prevention.

Early Detection and Management of Flaky Tests

Early detection of flaky tests can be achieved through automated test retries in the CI pipeline. With an automated retry mechanism, a test that fails and then passes on a retry against the same code can be flagged as flaky, while a test that fails on every attempt points to a persistent error. This improves the reliability of test runs and the validity of test results. Tools like GitHub and BuildPulse offer comprehensive platforms to detect flaky tests, measure their impact, and alert stakeholders to any issues.
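As a rough illustration of how retries separate flaky failures from persistent ones, here is a minimal Python sketch. The function names classify_with_retries and make_intermittent_test are hypothetical and not part of any CI tool; a real pipeline would usually rely on the retry support of its test runner instead.

```python
from typing import Callable


def classify_with_retries(test: Callable[[], None], max_attempts: int = 3) -> str:
    """Run `test` up to `max_attempts` times and classify the outcome.

    Returns "pass" if the first attempt succeeds, "flaky" if a later
    attempt succeeds after at least one failure, and "fail" if every
    attempt fails.
    """
    failures = 0
    for _ in range(max_attempts):
        try:
            test()
        except Exception:  # AssertionError is a subclass of Exception
            failures += 1
            continue
        return "pass" if failures == 0 else "flaky"
    return "fail"


def make_intermittent_test(fail_first_n: int) -> Callable[[], None]:
    """Hypothetical test double that fails on its first `fail_first_n` calls."""
    calls = {"count": 0}

    def intermittent() -> None:
        calls["count"] += 1
        assert calls["count"] > fail_first_n, "simulated intermittent failure"

    return intermittent


print(classify_with_retries(make_intermittent_test(0)))  # pass
print(classify_with_retries(make_intermittent_test(1)))  # flaky
print(classify_with_retries(make_intermittent_test(5)))  # fail
```

Recording the "flaky" outcome, rather than silently treating a retried pass as a pass, is what turns retries into a detection mechanism.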

Google, for instance, employs a "quarantine" strategy, isolating flaky tests from the main test suite. This approach allows developers to prevent flaky tests from blocking progress and reducing the stability of the main branch, which can dramatically improve the overall software development process.
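The idea behind quarantining can be sketched in a few lines of Python. This is only an assumed, simplified model and not Google's implementation: the Outcome type, the test names, and the quarantined set are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    passed: bool


def blocking_failures(outcomes, quarantined):
    """Failures that should block the build: failed and not quarantined."""
    return sorted(o.name for o in outcomes if not o.passed and o.name not in quarantined)


def quarantined_failures(outcomes, quarantined):
    """Failures that are reported for follow-up but do not block the build."""
    return sorted(o.name for o in outcomes if not o.passed and o.name in quarantined)


# Hypothetical results from one CI run.
outcomes = [
    Outcome("test_checkout", True),
    Outcome("test_search", False),
    Outcome("test_upload", False),
]
quarantined = {"test_upload"}  # known-flaky tests, tracked separately

print(blocking_failures(outcomes, quarantined))     # ['test_search']
print(quarantined_failures(outcomes, quarantined))  # ['test_upload']
```

Quarantined tests still run and still get reported in this sketch. The quarantine list only changes whether a failure blocks the main branch, so the flaky test stays visible until someone fixes it.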

Proactive Prevention of Flaky Tests

Proactive prevention involves creating robust implementations that address common causes of test flakiness, such as order dependency, timeouts, and concurrency issues.

Developers can ensure each unit test is self-contained and independent, which can help eliminate dependencies on external factors or shared resources. Implementing proper timeout handling can minimize flakiness caused by slow or unresponsive dependencies. Following best practices for writing reliable, straightforward, and maintainable test code can also significantly reduce the occurrence of flaky tests.
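Both ideas, self-contained tests and explicit timeout handling, can be sketched with Python's standard library. The helper call_with_timeout, the class ReportTests, and the slow_dependency stand-in are hypothetical names chosen for this example.

```python
import concurrent.futures
import tempfile
import time
import unittest
from pathlib import Path


def call_with_timeout(fn, timeout_s):
    """Run `fn` in a worker thread; raise TimeoutError if it exceeds `timeout_s`.

    The worker thread is not killed on timeout, only abandoned, so `fn`
    should itself finish in bounded time.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"dependency did not respond within {timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)


class ReportTests(unittest.TestCase):
    def setUp(self):
        # Each test gets its own empty directory, so no test can see
        # files left behind by another one, whatever the execution order.
        self._tmp = tempfile.TemporaryDirectory()
        self.workdir = Path(self._tmp.name)

    def tearDown(self):
        self._tmp.cleanup()

    def test_writes_report(self):
        (self.workdir / "report.txt").write_text("ok")
        self.assertEqual([p.name for p in self.workdir.iterdir()], ["report.txt"])

    def test_starts_empty(self):
        self.assertEqual(list(self.workdir.iterdir()), [])

    def test_slow_dependency_fails_fast(self):
        def slow_dependency():
            time.sleep(0.5)  # hypothetical stand-in for an unresponsive service
            return "late"

        # The test fails quickly and explicitly instead of hanging.
        with self.assertRaises(TimeoutError):
            call_with_timeout(slow_dependency, timeout_s=0.05)
```

Because every test builds its own state in setUp, the three tests give the same result in any order. The timeout turns an unresponsive dependency into a clear, immediate failure rather than a hang that may or may not finish before the CI job is killed.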

Leveraging a robust testing framework tailored to the specifics of the application, such as asynchronous JavaScript APIs or HTML DOM interactions in a frontend test environment, can further mitigate the occurrence of flaky tests.
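Many frameworks for asynchronous or DOM-heavy code offer some form of explicit wait, which replaces fixed sleeps with "wait until this condition holds, up to a deadline". The Python sketch below shows the general idea only. The wait_until helper is a hypothetical name, and the timer merely simulates an asynchronous page load.

```python
import threading
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 2.0,
               interval_s: float = 0.01) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Simulated asynchronous event: something becomes ready after about 50 ms.
ready = threading.Event()
threading.Timer(0.05, ready.set).start()

# A fixed sleep would be either too short (flaky) or too long (slow).
# Polling returns as soon as the condition holds, up to the deadline.
assert wait_until(ready.is_set, timeout_s=1.0)
```

A fixed sleep encodes a guess about timing, and that guess is exactly what breaks on a slow CI machine. Waiting on the condition itself removes the guess.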

Importance of Data Analysis and Metrics

Data analysis and tracking metrics are essential in dealing with flaky tests. Comprehensive test reports provide detailed insights into each test's behavior, allowing developers to pinpoint the root cause of test failure. Tracking metrics over time can reveal patterns and trends related to test flakiness, enabling teams to focus their efforts on the most problematic areas.
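One simple metric is how often a test produces conflicting results on the same commit. The Python sketch below assumes a run history with a test name, a commit identifier, and a pass/fail flag per run. The Run type and the sample data are invented for illustration and do not reflect any particular tool's report format.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Run(NamedTuple):
    test: str
    commit: str
    passed: bool


def flaky_commit_rate(runs: Iterable[Run]) -> Dict[str, float]:
    """For each test, the fraction of commits on which it both passed and failed."""
    outcomes = defaultdict(lambda: defaultdict(set))
    for run in runs:
        outcomes[run.test][run.commit].add(run.passed)
    return {
        test: sum(len(seen) == 2 for seen in by_commit.values()) / len(by_commit)
        for test, by_commit in outcomes.items()
    }


# Hypothetical run history.
runs = [
    Run("test_login", "a1", True),
    Run("test_login", "a1", False),  # same commit, different result: flaky
    Run("test_login", "b2", True),
    Run("test_cart", "a1", True),
    Run("test_cart", "b2", False),
    Run("test_cart", "b2", False),   # fails consistently: likely a real bug
]

print(flaky_commit_rate(runs))  # {'test_login': 0.5, 'test_cart': 0.0}
```

Ranking tests by a rate like this over time is one way to decide which flaky tests to fix first.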

Conclusion

While flaky tests pose significant challenges to software development workflows, proactive detection, effective management, and targeted prevention strategies can significantly mitigate their impact. BuildPulse provides tools to help you in this journey, end-to-end.


FAQ

What is the difference between a flaky test and a false positive?

A false positive is a test failure that does not correspond to an actual error in the code being executed, for example because of a mismatch between what the test expects and what the code is meant to do. It typically fails the same way on every run.

A flaky test is one that produces conflicting results for the same code. For example, if a test fails on one run and passes on another while the code hasn’t changed, it’s a flaky test. There are many causes of flakiness.

What is an example of a flaky test?

An example can be seen in growing CI/CD pipelines, when pull request builds fail for changes you haven’t made.

What causes tests to be flaky?

Broken assumptions in test automation can introduce flaky tests. For example, if test data is shared between different tests, whether they run asynchronously or sequentially, the results of one test can affect another. Poorly written test code can also be a factor, such as improper polling, event, or timeout handling for network requests or page loads. Any of these can lead to flaky test failures.
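The shared test data problem can be reproduced in a few lines. In this Python sketch, all names are hypothetical: two checks run against the same mutable list, and the second check passes or fails depending on the order. Giving each check its own copy of the data removes the dependency.

```python
import copy

# Hypothetical seed data that several checks rely on.
SEED_USERS = [{"name": "ada", "active": True}]


def deactivates_everyone(users) -> bool:
    """Check that mutates its input: deactivates all users, then verifies it."""
    for user in users:
        user["active"] = False
    return not any(user["active"] for user in users)


def sees_one_active_user(users) -> bool:
    """Check that assumes the seed data is untouched."""
    return sum(user["active"] for user in users) == 1


def run_shared(checks):
    """Anti-pattern: every check reads and mutates the same list."""
    users = copy.deepcopy(SEED_USERS)
    return [check(users) for check in checks]


def run_isolated(checks):
    """Each check gets its own deep copy of the seed data."""
    return [check(copy.deepcopy(SEED_USERS)) for check in checks]


print(run_shared([sees_one_active_user, deactivates_everyone]))    # [True, True]
print(run_shared([deactivates_everyone, sees_one_active_user]))    # [True, False]
print(run_isolated([deactivates_everyone, sees_one_active_user]))  # [True, True]
```

With shared data, the suite passes or fails depending on execution order, which is how a test runner that shuffles or parallelizes tests ends up exposing flakiness.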

What is the best way to resolve or fix flaky tests?

DevOps, software engineering, and software development teams will often need to compare code changes, logs, and other context from before and after the test instability started. Adding retries can also help. Flaky test detection and test execution tooling can help automate this process as well: BuildPulse enables you to find flaky tests, assess their impact, quarantine them, and fix them.

What are some strategies for preventing flaky tests?

Paying attention to flaky tests and prioritizing them as they come up can be a good way to prevent them from becoming an issue. This is where a testing culture is important: if an engineer spots a flaky test, it should be logged right away. This, however, takes a certain level of hygiene, and BuildPulse can provide monitoring so flaky tests are caught right away.