Designing a GUI for Automated Testing

The Problem and Its Solutions

If we were living in a perfect world, the business logic would be separated from the presentation layer. Since Rave sits atop a rich GUI, where event handlers can execute arbitrary code, there exists a strong temptation to put business logic in the presentation layer. The fact that we code both parts in the same language (C++) makes this temptation doubly hard to resist. Indeed, sometimes a clear-cut separation doesn't exist. So we shouldn't find it at all surprising that our founding coders may not have kept up a wall of separation between the GUI and the business logic.

Let’s walk through an example that I have adapted from Martin Fowler’s post on GUI Architectures.

Suppose we have a system requirement that the GUI must display a red box when a seat is disconnected, a green box when connected, and a yellow box for a slow connection. Suppose further that the application already uses an integer to represent the connection state, and presents it through the function linkRate(). It ranges from 0 Mb/s (disconnected) to 1000 Mb/s (full connection), taking various intermediate values depending on a measured traffic rate (not just the OS ethernet link state). The green box represents any measured rate above 700 Mb/s, while the red box represents any rate below 5 Mb/s.

Where should the logic for choosing the box color reside?
Where should the listing of boundary values for each category reside?
Where should the listing of colors reside?
If you had to write tests to prove your solution worked, would you change your mind on where to place that code?

There are three candidate placements for the logic:

  GUI: The GUI contains all the smarts. It reads the value of linkRate() from the application and then performs its own calculation to determine the color.
  Shared: The GUI and application share responsibility. The application provides a linkRateState() that presents an enum, which the GUI then maps to a color.
  Application: The application contains all the smarts. It does the heavy lifting and provides a linkRateColor() method that tells a really dumb GUI what color to show.

Depending on how you weigh the trade-offs, all of the above can be reasonable decisions. I have a bias towards making the GUI as easy to test as possible. You might think that idealism would incline me to favor the dumbest GUI possible, and you'd be right in most cases, but I want to draw out some reasons to make an exception for this example.

The Case for the Dumb GUI

Mostly I want a dumb GUI because testing a GUI is very hard. To test the GUI, I must launch it within a harness that intermediates all the events, introducing programmability for clicks, drags, and key presses. The harness requires a full simulation of the application, including connections to external services (database, file system, SCUs, etc.). Finally, at least with Squish and RAVE, the test scripts execute at glacially slow human speeds, sleeping for entire seconds to allow for menu animations and other GUI renderings.

Having the dumbest possible GUI would mean having a presentation layer so lightweight that it would be very improbable to get wrong. When the application tells it what color to show, the GUI has very little opportunity to make a mistake. The mapping logic of linkRateColor() would have a unit test in the application, ensuring conformance to system requirements. With a thin enough GUI, I wouldn't care that it didn't have automated tests.
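
For concreteness, here is a minimal sketch of what that application-side mapping could look like, assuming Qt's QColor. The boundary values come straight from the stated requirement; the function signature is just the one this example has been using.

```cpp
#include <QColor>

// Minimal sketch of the application-side mapping a dumb GUI would consume.
// The thresholds come from the system requirement; everything else is
// illustrative.
QColor linkRateColor(int linkRateMbps)
{
    if (linkRateMbps < 5)
        return Qt::red;      // disconnected (below 5 Mb/s)
    if (linkRateMbps > 700)
        return Qt::green;    // full connection (above 700 Mb/s)
    return Qt::yellow;       // slow connection (everything in between)
}
```

A unit test in the application can pin the boundary cases (4 vs. 5, 700 vs. 701) without ever launching the GUI.
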
But placing linkRateColor() in the application muddies the purity of the application. Now it must always and forever link against whatever library provides QColor; I can no longer build the application without some GUI library. If I want to re-use that component, I drag the dependency along with it. And, finally, no part of the application actually uses linkRateColor(); it exists only to support the GUI.

The Case for the Shared Responsibility

I have nothing to say here but “eeewww gross”. Unless there is an application-side consumer for linkRateState(), it’s not worth coupling the GUI and application with such a specific API. Should the specification change, say by adding a new category, then both the GUI and the application will need updating. We shouldn’t use designs that increase our maintenance overhead.
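
For concreteness, a rough sketch of the design being rejected here, with illustrative names: the application owns the boundary values behind an enum, and the GUI owns the colors.

```cpp
#include <QColor>

// Sketch of the shared-responsibility split (the design rejected above).
// Names are illustrative; thresholds come from the requirement.
enum class LinkRateState { Disconnected, Slow, Connected };

// Application side: owns the boundary values.
LinkRateState linkRateState(int linkRateMbps)
{
    if (linkRateMbps < 5)
        return LinkRateState::Disconnected;
    if (linkRateMbps > 700)
        return LinkRateState::Connected;
    return LinkRateState::Slow;
}

// GUI side: owns the colors.
QColor colorFor(LinkRateState state)
{
    switch (state) {
    case LinkRateState::Disconnected: return Qt::red;
    case LinkRateState::Connected:    return Qt::green;
    case LinkRateState::Slow:         return Qt::yellow;
    }
    return Qt::yellow; // unreachable; keeps compilers quiet
}
```

Note how a new category (say, a blinking box for a flapping link) forces edits on both sides of the API, which is exactly the maintenance overhead objected to above.
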

The Case for the Smarter GUI

If the application never has a need to know the boundary values for the different link rates, then we can assume that those values represent a specific presentation requirement. Given that circumstance, I favor placing the logic into the presentation layer.

Yes, this solution increases the GUI’s responsibilities, making it more complicated to test. Counter-intuitively, the increased testing difficulty of a more complex GUI has pushed me to advocate for making the GUI a stand-alone piece. The situation only makes my next point, about a stricter separation between presentation and application logic, stronger.

The Wall of Separation (between GUI and Application Logic)

In an ideal world, the business logic carried out by the application and the presentation logic carried out by the GUI remain strictly separated. So separated that we can pull apart the two pieces and test them independently. We can even build a second GUI (for a new customer) without impacting the underlying application. With this separation, the application acts as a data Model while the GUI(s) merely present a View of that data.
For testability purposes, let’s pull apart the two pieces and envision a wall between them. The only communication link through that wall is an API, depicted here as a network socket. The application (network server / data model) responds only to specific messages (requests for, and updates to, data) sent over the socket. It keeps the GUI informed about changes by emitting other messages (events).

[Figure: app_vs_gui, the application (data model) and the GUI (view) separated by a wall, communicating through a socket-like API]
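
To make the wall concrete in C++, here is one possible shape for the protocol, expressed as a pair of abstract interfaces. The message names follow the example; the interfaces themselves are assumptions of this sketch.

```cpp
// One possible C++ rendering of the wall: two narrow interfaces that stand
// in for the socket protocol. Real and fake implementations of each side
// can then be swapped freely.
struct AppEvents {
    // Messages the application emits toward the GUI.
    virtual void linkRateChanged(int mbps) = 0;
    virtual ~AppEvents() = default;
};

struct AppRequests {
    // Messages the GUI may send toward the application.
    virtual int  linkRate() const = 0;
    virtual void updateLinkRate(int mbps) = 0;
    virtual ~AppRequests() = default;
};
```
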

The clear separation between application and GUI serves dual purposes:

  1. It makes us think harder about which piece (GUI or app) should receive new logic.
  2. It allows tests for each piece to remain laser-focused on that piece without getting distracted by the other parts of the system.

Testing the Application

To test the application, we simply fake the GUI. Because of the separation I’ve made here, that amounts to implementing a network client that generates a sequence of data updates or requests, and asserts that it receives the expected data-update events and delivery of requested data. In a different world (the real one), where the API exists as method calls instead of a network socket, we create a headless driver that makes the calls and receives the events. Even more granularity can be achieved by single-stepping the event loop (when that makes sense), to assert that certain events do NOT occur.
In our example, we have the fake GUI assert that it receives a linkRateChanged() event after the test modifies the linkRate variable through an internal update function. If the GUI can set the linkRate, then we can also test a round trip in three steps (sketched just after the list):

  1. Have the GUI send the update data request
  2. Step the application event loop
  3. Assert that the fake GUI receives the expected data changed event.
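
Using the interfaces sketched earlier, the round trip might look like this; Application, attach(), and stepEventLoop() are assumed names for the real application’s hookup points.

```cpp
#include <cassert>

// Hypothetical round-trip test against the real application. The fake GUI
// only records the events it receives.
struct FakeGui : AppEvents {
    int  lastRate  = -1;
    bool sawChange = false;
    void linkRateChanged(int mbps) override { lastRate = mbps; sawChange = true; }
};

void testLinkRateRoundTrip(Application& app)  // Application: the real app under test
{
    FakeGui gui;
    app.attach(gui);            // assumed registration point for event listeners
    app.updateLinkRate(850);    // step 1: send the update request
    app.stepEventLoop();        // step 2: let the application process it
    assert(gui.sawChange && gui.lastRate == 850);  // step 3: expected event arrived
}
```
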

In both circumstances, we assert that the application generates events according to a specified protocol. With a large enough suite of individualized tests, we cover the application’s behavior for all the actions the GUI can take. When we miss an action, we simply record it in a new test as an expected event/response sequence.

Testing the GUI

To test the GUI, we simply fake the application. A test harness drives the GUI from one end, clicking and dragging on widgets and buttons, while the fake application it links to provides a pre-programmed series of responses. If we generate the GUI events directly, e.g. by calling the event handlers for specific widgets, we can even drive the GUI in a headless environment (by virtualizing X11). The tests remain focused on the accuracy of presentation.
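
If the GUI is built on Qt, Qt Test’s synthetic events are one way to generate those events directly; the widgets named here are assumptions of the sketch.

```cpp
#include <QtTest/QtTest>
#include <QLineEdit>
#include <QPushButton>

// Sketch of direct event generation with Qt Test. The rate editor and apply
// button are illustrative widget names; the same calls work headlessly under
// a virtual X server such as Xvfb.
void driveLinkRateDialog(QLineEdit* rateEdit, QPushButton* applyButton, const QString& rate)
{
    rateEdit->clear();
    QTest::keyClicks(rateEdit, rate);               // type the new rate
    QTest::mouseClick(applyButton, Qt::LeftButton); // press Apply
}
```
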
In our example, we have the fake application emit a linkRateChanged() event, and assert that the GUI updates the color according to the presentation requirements. If the GUI can set the linkRate, then we can also test a round trip, using three similar steps (a sketch follows the list):

  1. Drive the GUI to go through the update link rate dialogs/widgets.
  2. Assert that the fake application receives the updateLinkRate() request, and have it respond with a pre-programmed linkRateChanged() event.
  3. Assert that the real GUI updates the rendered color.
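
Continuing the sketch, a fake application that replays a canned response makes the three steps mechanical. Gui, rateEdit(), applyButton(), and boxColor() are illustrative stand-ins for the real presentation layer.

```cpp
#include <QColor>
#include <cassert>

// Hypothetical GUI-side round trip, mirroring the application-side test.
struct FakeApplication : AppRequests {
    AppEvents* listener      = nullptr;  // the GUI under test registers here
    int        lastRequested = -1;
    int  linkRate() const override { return lastRequested; }
    void updateLinkRate(int mbps) override {
        lastRequested = mbps;                           // request observed...
        if (listener) listener->linkRateChanged(mbps);  // ...canned reply sent
    }
};

void testGuiRoundTrip(Gui& gui, FakeApplication& app)  // Gui: the real GUI under test
{
    driveLinkRateDialog(gui.rateEdit(), gui.applyButton(), "850");  // step 1
    assert(app.lastRequested == 850);                               // step 2
    assert(gui.boxColor() == QColor(Qt::green));                    // step 3
}
```
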

In both circumstances, we assert that the GUI performs renderings according to the events it receives from the fake application. Again, a large enough suite of individualized tests covers the presentation layer for all data states the application can take. We still record any missed behaviors in a new test taking the form of an expected event/render sequence.

What the Wall of Separation Achieves

Separating the GUI from the application, and treating it as a View or presentation layer only (with the application taking the role of a data model), gives us the ability to test each piece separately. The wall itself represents an expected set of behaviors: a command/response protocol. In the ordinary implementation we have direct C++ API calls, but that just muddies the idealized separation, which is what motivated me to start out with a network messaging description. Conceptually, testing the GUI can be approached with the same techniques as testing a client/server pair implementing a network protocol. If we clearly state the expected behavior, then each side of the fence merely has to uphold its end of the protocol.

Yes, the separation probably means more tests. But those tests will be smaller, faster to execute, and easier to write and maintain. When we do perform whole-system testing (which happens rarely relative to the automated protocol testing, because of the costs involved), it will catch use-cases of the interaction not already covered by the piece-wise tests. A record of the command/response sequence in each failing whole-system use-case can then be rolled back into separate piece-wise automated tests, one for each side.

Ultimately, our goal is to catch bugs earlier by exercising the behavior protocol of each side separately. By working toward that goal in this way, we can also ensure that we meet our system requirements by encoding them into automated behavior tests exercised against both sides of the wall.