Are GUI tests flaky by their nature?

This is a translation of a post originally written in Portuguese on my WordPress blog. By Walmyr.

Walmyr Filho
6 min read · Oct 20, 2016

How many times have you heard that GUI (graphical user interface) tests are flaky, fragile, or unreliable?

I researched this topic for some time before writing this post. These are my findings, along with some advice on how to deal with flaky tests.

However, before starting, I would like to share a sentence from the book Continuous Delivery, by Jez Humble and Dave Farley, which I consider appropriate:

By following best practices and using the appropriate tools, you can dramatically reduce the cost of creating and maintaining automated acceptance testing, to the point that the benefits outweigh the costs.

Ok, now we can start.

According to an article on the Google Testing Blog, even Google suffers from flaky tests, whether for reasons related to concurrency, reliance on undefined or non-deterministic behavior, fragile third-party code, or infrastructure issues, among others.

However, the same article offers some interesting advice: find flaky tests and put them in quarantine for further investigation, in order to make them more robust and reliable; and re-run only the failed tests to determine whether they are really flaky.

Although this is not a perfect solution, it at least reduces the time spent investigating tests that could be failing as false negatives, and it keeps feedback quick after the tests are executed.

Another article, also from the Google Testing Blog, contrasts GUI tests in theory and in practice. I disagree with some of its points, as I will explain shortly.
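The "re-run only the failed tests" advice can be sketched in a few lines. This is a minimal illustration of the idea, not how Google implements it: the function name `classify_tests`, the `reruns` parameter, and the three labels are my own assumptions.

```python
from typing import Callable, Dict


def _passes(test: Callable[[], None]) -> bool:
    """Return True if the test runs without raising, False otherwise."""
    try:
        test()
        return True
    except Exception:
        return False


def classify_tests(tests: Dict[str, Callable[[], None]], reruns: int = 3) -> Dict[str, str]:
    """Run every test once, then re-run only the failures.

    A test that fails and later passes is labeled "flaky" (a quarantine
    candidate). A test that fails on every attempt is labeled "failed".
    """
    results: Dict[str, str] = {}
    for name, test in tests.items():
        if _passes(test):
            results[name] = "passed"
        elif any(_passes(test) for _ in range(reruns)):
            # The outcome changed without any code change: non-deterministic.
            results[name] = "flaky"
        else:
            results[name] = "failed"
    return results
```

Tests labeled "flaky" would go to quarantine for investigation, while tests labeled "failed" most likely point to a real problem and deserve attention first.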

It suggests that, in theory, end-to-end tests are good for three groups: developers, because the task of writing them falls to others (usually the testers); managers and decision makers, because tests that simulate real-world user scenarios help determine the impact of a failure; and testers, because they are concerned with not letting bugs slip through and with writing tests that check “real world” behavior.

The practical view of end-to-end testing in the same article is where I disagree on some points: it tries to demonstrate the weaknesses of such tests in a scenario where other approaches could have been taken to better mitigate the problems.

In the article’s example, the tests initially fail because of a real defect in the application: the sign-in functionality is not working, and since the vast majority of test scenarios depend on signing in, the vast majority of them fail. The point here is: create tests that are as independent as possible.

In this case, a better approach, in my opinion, would be an earlier stage in the pipeline that runs only a smoke suite of UI tests, including a successful sign-in test case. Feedback from that suite would be much quicker, and a sign-in failure would be caught there instead of compromising the vast majority of the remaining tests. Besides, there really was a failure in the application: the sign-in functionality wasn’t working.
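To make the smoke-stage idea concrete, here is a rough sketch of a pipeline that runs a small smoke suite first and only continues to the full suite if the smoke suite passes. The names `run_suite` and `run_pipeline`, and the shape of the returned report, are hypothetical and only serve the illustration.

```python
from typing import Callable, Dict, List, Tuple

Test = Tuple[str, Callable[[], None]]


def run_suite(tests: List[Test]) -> List[str]:
    """Run every test in the suite and return the names of those that failed."""
    failed = []
    for name, test in tests:
        try:
            test()
        except Exception:
            failed.append(name)
    return failed


def run_pipeline(smoke: List[Test], full: List[Test]) -> Dict[str, object]:
    """Run the smoke suite first; run the full suite only if it passes."""
    smoke_failures = run_suite(smoke)
    if smoke_failures:
        # Stop early: one clear failure instead of hundreds of dependent ones.
        return {
            "stage": "smoke",
            "failed": smoke_failures,
            "skipped": [name for name, _ in full],
        }
    return {"stage": "full", "failed": run_suite(full), "skipped": []}
```

With a broken sign-in, this reports a single failure in the smoke stage and lists the remaining tests as skipped, rather than showing most of the full suite in red for the same root cause.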

Another point where I disagree with the “bad things” said about end-to-end testing is that flaky tests are identified only in the last set of test runs. Until then, the problems were real, whether in the application, in a third-party application that the main application depends on, or in the infrastructure where the application was running. In other words, the tests were not failing at random, or at least they were not the only ones to blame for the failures.

However, the article makes some interesting points worth considering: finding the root cause of a failing end-to-end test is not simple; a failing test does not benefit the end user until the bug is fixed; and end-to-end tests are slow compared to unit tests and (some) integration tests.

So this second article suggests following the so-called test pyramid, a suggestion I fully agree with, as a way to “escape” flaky tests. Unit and integration tests provide faster feedback after changes in the application and make problems easier to locate, since they exercise a smaller scope of the application than end-to-end tests, which exercise the application as a whole.

However, there are ways to make end-to-end tests faster and more robust, such as running them in parallel and using good test development practices. We cannot fool ourselves into thinking there will be no fragile tests; it is up to us to identify them and work to improve them.

From the article Front-End Testing Done Right, I would like to highlight some interesting points:

  1. Many developers do not consider the creation of automated tests to be their responsibility. In this context, I always remember a phrase from Klaus Wuestefeld: “Manually testing software in the 21st century is unethical; it is like a surgeon operating with unwashed hands.” In other words, it is the developer’s ethical responsibility to create automated tests.
  2. If writing automated tests seems difficult for developers, something is wrong. Either they do not know how to use the test framework, or it is poorly written. Here the message is: “We all have to learn. All the time. That’s how it is.”
  3. “An unreliable test is not a test.” In this case, the best thing to do is to delete it (but remember the option of putting it in quarantine to make it robust).
  4. As already mentioned, consider running tests in parallel to shorten the feedback loop.
  5. Identify non-deterministic tests and find ways to make them reliable. For example, if a test depends on certain data existing in the application, run routines that write that data to the database before the tests, so that it is guaranteed to be there when they run. Or use some other way of injecting data into the application, but do it.
  6. Wait for the correct state of the application. That is, instead of putting “sleeps” in your tests, check that the elements are actually available before trying to interact with them or make assertions about them.
  7. Identify elements by their IDs whenever possible and use Page Objects. IDs must be unique, which ensures that your tests interact with the correct elements, so whenever an ID is available for an element that a test needs to interact with or verify, do not hesitate to use it. And use Page Objects to respect the DRY principle (don’t repeat yourself), which keeps the tests easy to maintain.
  8. Do not test everything through the GUI. As mentioned in the discussion of the test pyramid, identify the best layer at which to test each feature of the application. Sometimes a unit test is enough to cover a given piece of functionality, and it avoids creating a flaky test.
  9. Tests must be implemented independently of environments. Try to write tests that can run in a local development environment, a QA environment, and even a production environment, if possible. This reduces the incidence of tests that are flaky in one environment and reliable in another.
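Points 6 and 7 above can be sketched together. The example below is a minimal illustration under stated assumptions: `FakeBrowser` is a hypothetical stand-in for a real browser driver (its elements only "render" after a short delay), and `wait_until`, `SignInPage`, and the element IDs are names I made up for the example.

```python
import time
from typing import Callable, Dict, List, Optional


def wait_until(condition: Callable[[], object], timeout: float = 5.0, interval: float = 0.1):
    """Poll `condition` until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakeBrowser:
    """Hypothetical driver: elements become available after `render_delay` seconds."""

    def __init__(self, render_delay: float = 0.2) -> None:
        self._ready_at = time.monotonic() + render_delay
        self.typed: Dict[str, str] = {}
        self.clicked: List[str] = []

    def find_by_id(self, element_id: str) -> Optional[str]:
        # Returns the element (here just its ID) once rendered, otherwise None.
        return element_id if time.monotonic() >= self._ready_at else None

    def type_into(self, element: str, text: str) -> None:
        self.typed[element] = text

    def click(self, element: str) -> None:
        self.clicked.append(element)


class SignInPage:
    """Page Object: the only place that knows the sign-in page's element IDs."""

    EMAIL_ID = "email"
    PASSWORD_ID = "password"
    SUBMIT_ID = "sign-in"

    def __init__(self, browser: FakeBrowser) -> None:
        self.browser = browser

    def _element(self, element_id: str) -> str:
        # Wait for the element instead of sleeping for a fixed amount of time.
        return wait_until(lambda: self.browser.find_by_id(element_id), timeout=2.0, interval=0.05)

    def sign_in(self, email: str, password: str) -> None:
        self.browser.type_into(self._element(self.EMAIL_ID), email)
        self.browser.type_into(self._element(self.PASSWORD_ID), password)
        self.browser.click(self._element(self.SUBMIT_ID))
```

A test then reads as `SignInPage(browser).sign_in("user@example.com", "secret")` and never mentions IDs or waits directly, so a change to the page only requires a change in one class. Real drivers usually ship their own explicit waits (in Selenium’s Python bindings, for instance, `WebDriverWait` combined with `expected_conditions`), which would replace the hand-written `wait_until` here.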

Finally, I would like to leave a closing message, mixed with the closing message of another article, called Why system tests can’t replace unit tests. End-to-end tests and unit tests do not exist for the same reason, although both exist to provide fast feedback after changes in the application. They actually complement each other, so take care to write both, and in the right measure. While unit tests give you feedback in seconds and easy debugging when problems are found, GUI tests dramatically reduce the time spent on, and the need for, manual testing, which is subject to human error.

As further reading I suggest the following posts:

The Forgotten Layer of the Test Automation Pyramid

Introducing the Software Testing Cupcake (Anti-Pattern)

If you liked this post, clap, share it, or leave a comment!

See you in the next post, and good testing!
