
What are flaky tests?

Aug 25, 2023

Flaky tests are tests that pass and fail nondeterministically: with the same code and the same test conditions, they sometimes pass and sometimes fail. Because their results are inconsistent, flaky tests are hard to debug and can considerably undermine developers' confidence in their test infrastructure. Identifying and fixing them promptly is therefore an integral part of keeping your testing process reliable and efficient.

Significant Impact and the Subsequent Challenges

Flaky tests can introduce substantial challenges into the development workflow. First, they waste time and effort: developers scramble to debug failed tests, only to discover that the failures do not point to any real problem with the functionality. Once developers realize that their code changes are not causing the failures, they often resort to re-running the tests until they pass.

However, this cycle of re-running failed tests, combined with the nondeterministic nature of flaky tests, makes root cause analysis harder. Intermittent failures also add unnecessary failure points to Continuous Integration (CI) pipelines, reducing the effectiveness of the test suite and slowing down the overall development process.

Automated tests are a crucial component of CI workflows because they validate every code change. Flaky tests, however, produce false positives and false negatives, which make those results unreliable. These issues can create bottlenecks in the engineering process, reduce product stability, and affect release readiness. It is therefore important to address flakiness quickly to prevent failed test runs and delayed deployments.

Common Causes Behind Flaky Tests

Flaky tests can be attributed to a myriad of factors and dependencies within the testing environment. Some of the most commonly observed causes include:

Poorly written tests: Tests that rest on flawed assumptions or do not check results rigorously enough can be flaky. If a test case does not accurately capture the expected behavior, its outcome can be unpredictable.
Asynchronous operations: Tests that involve timing-dependent asynchronous operations can be flaky. For instance, waiting for API responses or handling front-end interactions driven by asynchronous JavaScript code can introduce nondeterminism.
Test order dependency: Ideally, tests should be independent and should not rely on the order of execution or shared resources. If a test case presupposes a specific execution order or alters shared data, it can lead to inconsistent test outcomes.
Concurrency issues: Tests that involve concurrency can be prone to flakiness. If a test makes incorrect assumptions about the order in which different threads perform operations, its outcome becomes nondeterministic.

Impact on DOM and End-to-End Testing

Flaky tests are a particular problem in end-to-end testing that interacts with the Document Object Model (DOM). Interactions with HTML and JavaScript, such as elements that only render after an asynchronous request completes, introduce nondeterminism that can make test outcomes flaky. Address this challenge by designing test cases carefully, incorporating suitable waits (ideally waiting for a specific condition rather than a fixed duration), and using testing frameworks designed specifically for end-to-end scenarios.
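One way to incorporate suitable waits is to wait for a condition instead of sleeping for a fixed duration. The Python sketch below is framework-agnostic and does not drive a real browser: `wait_until` is a hypothetical helper, and the `page` dictionary updated by a `threading.Timer` stands in for a DOM element that appears after a variable delay. End-to-end frameworks usually ship their own explicit-wait mechanisms (for example, Selenium's `WebDriverWait`), which are preferable when available.

```python
import threading
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll `condition` until it returns True, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for a DOM element that renders after an async request.
page = {"loaded": False}


def finish_loading() -> None:
    page["loaded"] = True


threading.Timer(0.2, finish_loading).start()

# Flaky alternative: time.sleep(0.1); assert page["loaded"]
wait_until(lambda: page["loaded"], timeout=2.0)
assert page["loaded"]
```

A fixed sleep is either too short on a slow CI machine (a flaky failure) or too long everywhere else (a slow suite). Polling with a deadline returns as soon as the condition holds, and the timeout turns a hang into a clear failure.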

Role in Unit Tests and Regression Testing

Flaky tests can significantly undermine the effectiveness of unit tests, which play a crucial role in identifying issues at the code level. Flakiness in unit tests can lead to false positives or missed regressions. Therefore, prioritizing regression testing and ensuring the reliability of unit tests is essential to maintain the stability and quality of your codebase.

FAQ

What is the difference between a flaky test and a false positive?

A false positive is a test failure that does not reflect a real defect in the code under test, for example because the test code itself contains an error or expects behavior the code is not meant to have.

A flaky test is one that produces conflicting results for the same code. For example, if a test fails on one run and passes on another while the code hasn’t changed, it’s a flaky test. Flakiness has many causes.
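This definition suggests a simple way to spot flaky tests in CI history: group results by test and by the exact code they ran against, and flag any test that has both a pass and a fail for the same code. The Python sketch below assumes each recorded result carries a commit identifier; `RunResult`, `find_flaky`, and the sample history are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunResult(NamedTuple):
    test: str
    commit: str  # identifies the exact code the test ran against
    passed: bool


def find_flaky(results: Iterable[RunResult]) -> set[str]:
    """Return tests that both passed and failed against the same commit."""
    seen: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for r in results:
        seen[(r.test, r.commit)].add(r.passed)
    # A set holding both True and False means conflicting results for the same code.
    return {test for (test, _commit), outcomes in seen.items() if len(outcomes) == 2}


# Hypothetical CI history; the commit IDs are made up for illustration.
history = [
    RunResult("test_login", "a1b2c3", True),
    RunResult("test_login", "a1b2c3", False),
    RunResult("test_checkout", "a1b2c3", True),
    RunResult("test_checkout", "d4e5f6", False),
]

print(find_flaky(history))  # {'test_login'}
```

`test_checkout` is not flagged: it passed on one commit and failed on another, which may be a genuine regression rather than flakiness. Likewise, a test with no mixed results so far is not guaranteed to be stable; it may simply not have flaked yet.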

What is an example of a flaky test?

A common example appears in growing CI/CD pipelines: a pull request build fails in a test unrelated to the changes you made, then passes when the build is re-run.

What causes tests to be flaky?

Broken assumptions in test automation can introduce flaky tests. For example, if test data is shared between different tests, whether they run asynchronously or sequentially, the result of one test can affect another. Poorly written test code can also be a factor, such as improper polling, event, or timeout handling for network requests or page loads.
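Improper event handling often shows up when a test starts background work and checks the result immediately. In this hypothetical Python sketch, `start_worker` stands in for code under test that runs on another thread. `test_worker_racy` assumes the worker has already finished, so its result depends on thread scheduling; `test_worker_synchronized` waits for an explicit completion signal with a timeout.

```python
import threading


def start_worker(results: list[int], done: threading.Event) -> threading.Thread:
    """Hypothetical code under test: computes a value on a background thread."""

    def work() -> None:
        results.append(42)
        done.set()  # signal completion only after the result is stored

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def test_worker_racy() -> None:
    results: list[int] = []
    start_worker(results, threading.Event())
    # Flaky: passes only if the worker happens to finish before this line runs.
    assert results == [42]


def test_worker_synchronized() -> None:
    results: list[int] = []
    done = threading.Event()
    start_worker(results, done)
    # Wait for the explicit signal; the timeout turns a hang into a clear failure.
    assert done.wait(timeout=5.0), "worker did not finish in time"
    assert results == [42]


test_worker_synchronized()
```

The timeout matters: without it, a worker that never signals would block the whole suite instead of producing a readable failure.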

What is the best way to resolve or fix flaky tests?

DevOps, software engineering, and software development teams often need to compare code changes, logs, and other context from before and after the test instability started. Adding retries can also help reduce noise, though a test that passes only on retry is still flaky. Flaky test detection and test execution tooling can help automate this process as well: BuildPulse enables you to find flaky tests, assess their impact, quarantine them, and fix them.

What are some strategies for preventing flaky tests?

Paying attention to flaky tests and prioritizing them as they come up is a good way to keep them from becoming a larger problem. This is where testing culture matters: when an engineer spots a flaky test, they should log it right away. This takes a certain level of discipline, and BuildPulse can provide monitoring so that flaky tests are caught as soon as they appear.