Gui Testing

I may have missed it, but I've searched all over for a Wiki page devoted to testing user interfaces. It's a difficult problem in general, but with such enthusiasm for testing in the Wiki community, I'm sure a lot of people have thoughts about how to do it well. Anyone? (Recently some other thoughts on the topic have popped up on ThoughtfulReactionsUnitTests.) (For another nice article on testing java GUIs, see http://www.xp123.com/xplor/xp0001/index.shtml.)


The best method I know of is effective, but requires up-front work by the developers, and some test framework development by at least a Visual Basic level programmer. In practice, useful tests can be developed a half step behind the GUI developers and could be done earlier if anyone ever scheduled it like that. Unfortunately, what needs to happen is sufficiently specific to the technology/functionality involved that I don't know of a full commercial product. (Or maybe no one wants to buy or support a product that actually does it all?) But I haven't looked for a year+ : not since Q1 1997.

The standard cycle I have abstracted (and perhaps over-stated) from observation of multiple organizations goes:

0) GUIs are tested manually, often by the developers themselves. This is very unreliable and expensive. For new GUIs or those being significantly changed, quality is low, and failures at integration time or during user acceptance tests are common.

1) ScreenScraper/replay based GUI test techniques are adopted. In a few months, someone notices that these don't work, though they are beloved by managers looking for cheap solutions. The problem is that every time you change the screen layout all existing tests become useless, which means you have no regression tests. [See RegressionTesting.] Also, you can't write tests till the GUIs are finished, which means that the developers (often contractors) are unavailable to answer questions and the cost of writing the tests includes the cost of having the test authors reverse engineer the GUI.

2) Someone convinces the org that the methods that do work require the GUI tests to be written at the "screen object level": e.g. in terms of sending a message (or better, performing an abstracted operation) to a specific control. For this to work, EVERY control must have a unique name, preferably a meaningful one. (NOT just button1 ... button89.) This requires up-front developer assistance and enforced "GUI coding standards", and forbids the use of some standard Wizards and other tools which uselessly rename all GUI controls every time you make a change to the visual screen layout.
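In Java Swing terms the naming discipline is cheap to follow. A minimal sketch (all the names here are invented):

 import javax.swing.*;

 // Hypothetical example of the naming standard: every control gets a
 // stable, meaningful name so tests can find it regardless of layout.
 class CustomerEditorPanel extends JPanel {
     CustomerEditorPanel() {
         JTextField customerName = new JTextField(20);
         customerName.setName("CustomerEditor.customerName");  // NOT "textField1"
         JButton save = new JButton("Save");
         save.setName("CustomerEditor.saveButton");
         add(customerName);
         add(save);
     }
 }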

3) For each screen (or better, UseCase) one can write a test script which exercises it. Use a commercial testing tool which allows writing these at the object/event level if possible. (Windows test modes are full of nasty special cases.) Structure the scripts with initialization, body, cleanup ...subscripts or sections which you can reuse later as appropriate to the GUI functionality you are testing. Each run of the test script should be independent of all others: you will often want to run a specific one of what will be many scripts.

4) However, for each such script there are MANY "test cases", representing different values/choices for the controls. If you just write the scripts, you will have an incredible number of scripts to maintain.

Instead, write the scripts generically, with named variables rather than literals and appropriate handling of optional (hence possibly missing) values. You will have a small number of scripts (maybe 1) per screen or task you are testing.

Define the test cases as rows in a database or flat file. MS Access works well for this: the "script runner" can be a VB program inside an Access DB which invokes the test script, passing in the values (or the test script can fetch the values itself....) Sometimes you store (control_name, value) tuples to define a test case, which avoids needing a new relational table/screen.
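As a rough sketch of the "generic script plus stored tuples" idea (in Java rather than VB, and with invented helper names standing in for whatever the test framework provides):

 import java.util.List;

 // Sketch only: the abstract helpers are assumed framework hooks,
 // not a real API.
 abstract class DataDrivenScript {
     abstract void openScreen(String name);
     abstract void setControlValue(String control, String value);
     abstract void clickControl(String control);
     abstract void verifyAndLogResult(String caseId);
     abstract void closeScreen(String name);

     // One generic script serves many test cases: each case is a row
     // of (controlName, value) tuples from a flat file or database.
     void runTestCase(String caseId, List<String[]> tuples) {
         openScreen("OrderEntry");                    // initialization
         for (String[] t : tuples) {
             if (t[1] != null) {                      // optional values may be missing
                 setControlValue(t[0], t[1]);
             }
         }
         clickControl("OrderEntry.submitButton");     // body
         verifyAndLogResult(caseId);                  // failures go to the summary table
         closeScreen("OrderEntry");                   // cleanup
     }
 }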

A real UsersManual is very helpful here. If you already have one, plus the full toolset this process builds, you can start here before the GUI exists, filling in screen control names later.

Failing results should normally be logged and summarized (in another table) and possibly mailed to a contact for each screen or GUI. You really do NOT want to find only one failure at a time. However, a single test script should typically terminate on its first failure, as other problems will then cascade.
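That two-level policy (fail fast within a script, collect everything across scripts) might be sketched like this; TestScript and FailureLog are assumed framework types, nothing standard:

 import java.util.List;

 interface TestScript { String name(); void run(); }  // run() throws on first failed step
 interface FailureLog { void record(String script, String detail); void mailSummary(); }

 class SuiteRunner {
     // One failing script must not stop the suite: log it, move on,
     // and mail a single summary rather than one report per failure.
     void runAll(List<TestScript> scripts, FailureLog log) {
         for (TestScript s : scripts) {
             try { s.run(); }
             catch (AssertionError e) { log.record(s.name(), e.getMessage()); }
         }
         log.mailSummary();
     }
 }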

5) At this point, it is then a "simple" extension to create a test development environment, complete with versioning tests for different GUI versions, screens to automate setting up a new test/script, reviewing and reporting test development status, semi-randomly choosing values based on min/max meta-data for the controls, etc.

You now have a method for very efficiently developing GUI regression tests which can survive across many releases. [See RegressionTesting.] Of course, the whole process took longer than the original GUI project that asked for it. But later projects will benefit.

Of course, this being Windows, none of your tools will likely be under change control. Nor well documented.

At this point, the cycle usually continues:

6) The person developing the test environment falls in love with it and stops developing tests. The environment becomes increasingly Gothic, the MS Access infrastructure is prone to data and version lossage, and the sheer volume of stuff you are dumping into MS Access starts to overwhelm it.

Meanwhile, other new hires (testing groups have high turnover) find it too hard/undocumented and long for a simpler method (or at least their method.)

The constraints on developing GUIs which fit the framework annoy new GUI developers, who complain to their managers that the coding rules are uselessly delaying their projects and forbidding their use of the most recent GUI technology. (Normally the test framework needs some support built in for each class of screen control. If "overhead" expenditures for things like updating the framework are being controlled....)

7) A new senior manager comes in who thinks testing is fundamentally simple ("Just record the key clicks!") and doesn't see why he should be paying for all this VB development effort and support. Many of the GUI developers and some of the testing group find the setup effort, support/coding constraints and learning curve excessive.

8) To "reduce overhead", the central testing group is dissolved, with each project told to buy some screen scrapers. The original test system developers leave (also lots of places for them to go: they have very valuable skills/experience). No one is maintaining the GUI regression test suites (see RegressionTesting) nor the environment/test apps which run them, which therefore become quickly obsolete....

9) A year or more later, someone notices that the GUIs have no automated tests, are of increasingly low quality, and have long manual test cycles. A new manager decides to do something about it and we proceed to step 1.


Since senior management regard testing issues, especially for GUIs, as a technical detail beneath them, this cycle can continue as long as the organization remains solvent. The shift to Web based application architectures may represent an effective exit strategy. -- MarkSwanson


This whole piece is just delicious to read. Thanks! Couldn't agree more, you should write a book Mark... -- gb


Help help help!!

In my new project we are focusing early attention on the GUI, as it's absolutely central to the app. (I'm almost willing to say the GUI is the problem domain.) We're talking win32, here, and that great horrible beast called MFC.

How am I going to develop reasonable UnitTests? For that matter, how am I going to develop reasonable AcceptanceTests? I am quite concerned about this, as I've been imbibing too much XP juice (contains guarana [lots of caffeine]), and I want to launch my new team into XP by exposing them to the joys of pre-testing code.

Here's a first pass, but it seems unsatisfactory. Please comment and theorize!

  1. Add controllers: MFC (more correctly, Win32) tends to combine view and controller (and even model at times) into a single object, which makes it difficult to isolate the functionality. So replace in-situ control functionality with calls into an explicit controller. In essence, we add a layer of controllers between the windows and the model. Once this is done, we can test the controller-model functionality and the view-controller functionality independently.

  2. Build/buy a minimalist layout system: A very important gui issue is the physical layout of the gui components on the screen, which will vary from box to box and even run to run. A simple layout system can be readily tested w/o having a user watch, because it will have constraints that can be asserted against.

  3. Build/buy a robust multi-channel log: Ultimately, there's no way to get around writing scripts that simulate user interaction. This is not terribly difficult, and we should be able to buy the desired scripting software. The hard part is testing the results. A 'multi-channel' logging unit might simplify the testing process. The log has a number of channels, each of which can be written to independently, e.g. LOG(channel,text). A given acceptance test drives the program via script then compares the by-channel logs for the results. To develop a test, we turn on the script-capturer, interact with the program, then quit. Examine the logs manually and verify their correctness, then build an acceptance test that will re-run the script and compare the logs. The multiple channels are important, because we don't want to break existing script/log pairs just because we add function at another level.
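For item 3, such a multi-channel log might look roughly like this (a Java sketch with invented names, even though the project above is MFC/C++):

 import java.io.File;
 import java.io.IOException;
 import java.nio.file.Files;
 import java.util.HashMap;
 import java.util.Map;

 // Sketch of a multi-channel log: each channel keeps its own
 // transcript, so logging added at a new level cannot break the
 // expected output already recorded for existing script/log pairs.
 class MultiChannelLog {
     private final Map<String, StringBuilder> channels = new HashMap<>();

     void log(String channel, String text) {
         channels.computeIfAbsent(channel, c -> new StringBuilder())
                 .append(text).append('\n');
     }

     // An acceptance test replays a captured script, then diffs one
     // channel against the transcript that was verified by hand when
     // the test was created.
     boolean matches(String channel, File expected) throws IOException {
         String actual = channels.getOrDefault(channel, new StringBuilder()).toString();
         return actual.equals(new String(Files.readAllBytes(expected.toPath())));
     }
 }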

So. Yech. Sounds like a lot of work. Worse, it sounds anti-XP because of all the thinking too far ahead I just did. Please help me figure out what's wrong with this picture-testing! -- MichaelHill


I have a similar question. I just met with one of our consultants who is doing some graphing/plotting software for a client. I was talking to him about the importance of tests, good test data, and so forth. I was emphasizing them along with their XP friends - refactoring, etc. In many ways, the XP method will work for him, but his function tests will have to be visually verified every single time. Won't they? Is there any other way?

This seems to be a harder problem than GUI testing in general, because often with GUI testing the layout isn't a huge concern; rather, you just want to verify the interactions between components, and that can be done by watching events and similar strategies. But for plotting and graphing, tiny details (individual pixels, even) are important. But you can't get that just from tracking drawing primitives, because optimization might change drawing strategies drastically while producing the same results.

I suppose he could do screen captures and compare them pixel-for-pixel ... is there an Extreme Programming challenge here? -- GlennVanderburg


How about this ...

Write AcceptanceTests that check whether the system's "analytical" features produce the right "vectors" of values to feed into the graphing tool. These can all be checked with the usual testing tools.

Write a small number of tests that convert provided constant vectors into pictures on the screen. Capture these and compare them bit for bit.
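For the second half, a rough Swing sketch (assuming the plotting is done by a JComponent; note the warning later on this page that bitmap comparison is fragile across platforms and font settings): render the component off-screen into a BufferedImage and compare it against a golden image that was verified by eye once.

 import java.awt.image.BufferedImage;
 import javax.swing.JComponent;

 class PlotImageTest {
     // Render the plot off-screen; no visible window is needed.
     static BufferedImage render(JComponent plot, int w, int h) {
         plot.setSize(w, h);
         BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
         plot.paint(img.getGraphics());
         return img;
     }

     // Bit-for-bit comparison against the stored golden image.
     static boolean samePixels(BufferedImage a, BufferedImage b) {
         if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) return false;
         for (int y = 0; y < a.getHeight(); y++)
             for (int x = 0; x < a.getWidth(); x++)
                 if (a.getRGB(x, y) != b.getRGB(x, y)) return false;
         return true;
     }
 }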


I've encountered QA departments with serious internal knowledge problems. One has to mentor them gently on some deep insights of software development. Like, save your test plan, because in a few months we'll give you the next release of the software and you won't want to write the test plan again from scratch.


The JDK 1.3 offers a special class for GuiTesting: http://java.sun.com/products/jdk/1.3/docs/api/java/awt/Robot.html. -- HelmutMerz [BrokenLink, 06 Apr 2004]

Hmm, Robot appears to be a ScreenScraper utility; it has mouseMove and mousePress methods, for example. So it probably is not as useful as it could be.

That's true, it's really only about native events. So it's just an API for building platform-independent ScreenScraper utilities. -- HelmutMerz (slightly disappointed :-|)

I suppose that you could get it to do something cleverer without too much fuss: rather than remembering screen locations and putting mouse-clicks on them (and so on), one could do something like:

 robot.mouseMove(myComponent.getX(), myComponent.getY());
 robot.mousePress(InputEvent.BUTTON1_MASK);
 robot.mouseRelease(InputEvent.BUTTON1_MASK);

which would make a good definition for a clickButtonOn(Component myComponent) method. Hmm. Well, it's a thought anyway.

Unfortunately, if you already have myComponent you can just call the doClick() method on it. I for one would like to do the GUI testing without having to keep references to each component. -- BrianRobinson
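One caveat with the snippet above: getX() and getY() are relative to the parent container, not the screen, and they point at the top-left corner. A corrected helper might look like this (a sketch using only standard AWT calls):

 import java.awt.Component;
 import java.awt.Point;
 import java.awt.Robot;
 import java.awt.event.InputEvent;

 class RobotHelper {
     // Use the absolute on-screen location and aim at the centre of
     // the component so the click cannot land on a border or miss.
     static void clickButtonOn(Robot robot, Component c) {
         Point p = c.getLocationOnScreen();
         robot.mouseMove(p.x + c.getWidth() / 2, p.y + c.getHeight() / 2);
         robot.mousePress(InputEvent.BUTTON1_MASK);
         robot.mouseRelease(InputEvent.BUTTON1_MASK);
     }
 }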


A cycle I've used a few times (not enough to know, but enough to smell the existence of something useful) involves the use of a WikiClone and some associated stuff. It goes something like this:

  1. Clone a Wiki that pertains to the project at hand, and make sure the team and some stakeholders have access to it and know how to use it. I seed this with a few interesting index pages...UserProfileIndex, UseCaseIndex, ScenarioIndex, ChoreographyIndex, FeatureIndex, DesignIndex, and some others.

  2. Write a handful of key UseCases. Usually more than two...ten is too many. I use a common form (in Wiki), and I end each name with 'UseCase' as a suffix (like PayPerViewUseCase). Each agent in a UseCase has a user profile, like DomainExpertUserProfile. Generally the community can get the UseCases and UserProfiles put together reasonably soon, and the experience usually gets them fired up about Wiki as well. Also, the ability to adjust them is a huge win.

  3. Write a few Scenarios (collected on the ScenarioIndex) for each UseCase, as you need them. Each describes how a user (referenced by UserProfile) accomplishes the task.

  4. Design a choreography. This is the flow through the application - although a good graphic designer or industrial designer can help, the choreography is more about behavior than appearance. I draw mine, using Visio. They look a lot like flowcharts. For websites, these sort-of correspond to screens. I hacked Wiki so that my users see the drawings (I turn them into GIFs or JPGs, then expand them inline), and I embed each in a page that includes wiki links to others. So the general theme is that you can walk through the application in Wiki. My teams like this because Wiki lets them annotate the things they like and don't like, and we can always change stuff. [A good resource for this kind of diagramming is JesseJamesGarrett's VisualVocabulary: http://www.jjg.net/ia/visvocab/ - DinahSanders]

  5. The boxes on a choreography end up being actions the application has to do. These turn into something like services.

  6. Now, each scenario can get a drawing that shows several choreographies.

  7. Somewhere between choreographies and scenarios, you can have the UI designers sketch screen designs. These can be linked to the corresponding choreography or scenario.

  8. The QA team can use all this to develop test suites, generally driven by the UseCases. They can walk through the choreographies and scenarios, looking for special things to exercise.

  9. The QA team and stakeholders can begin walkthroughs, making sure that the app is solving the right problem. Sometimes applications that work precisely as designed fail because they were designed to solve the wrong problem.

  10. The test scripts and tests themselves, along with expected results, notes, and all sorts of useful material, can be developed in or closely integrated with Wiki. I've found that this helps preserve organizational memory and learnings.

  11. Finally, as parts start to be finished, everybody can try them out and exercise them along the way. Developers can exercise their code with the test scripts. Developers and testers can repeat tests. Etc.

  12. When everybody says it's finished, there is a reasonable (not perfect, but reasonable) way of supporting that assertion.

As I say, I've done this on about four projects. I'm sure there's lots of cool stuff to do here, someday maybe I'll even get to work more on it. The trouble, as always, is that there is always more code to write...I never get to follow up on these cool things. Wiki is awesome!

-- TomStambaugh


I have a short history of GUI automation and a set of useful links in a Web Watch article I wrote for Software Testing and Quality Engineering Magazine. http://www.stqemagazine.com/webinfo_detail.asp?id=102 [BrokenLink, 06 Apr 2004] The page is automatically generated from who-knows-what, so it's pretty ugly. Sorry about that.

-- BrianMarick

Actually, it is not all that bad. We had been trying for nearly two years to find value in the screen-scraping style of GUITesting, and we found that it never paid off.

The result was that the only UnitTests we had were on the non-GUI portions of the code. All the business logic resides there, so that was not too bad, but the GUIs became more and more bloated. When we got into PerformanceTesting, we found that some of the GUIs were horrendously slow with large amounts of data. Some we could modify reasonably easily, but one particularly egregious GUI was so bad that nobody knew how it worked.

Finally, we followed a suggestion by MartinFowler to write a test for the GUI by hand, using the same framework and techniques used in our other UnitTests. Once we had written a reasonable library of GUI manipulation methods (which took just a few days) implementing the actual test went very quickly, and we were able to refactor the GUI, cutting its code size down by nearly two-thirds (writing the tests and refactoring took three weeks), followed by three days to speed up the code so that operations which used to take 6 minutes now take about 2 seconds.

My advice - just wade in and start writing those tests. -- RussellGold

Can you provide example methods from your library of GuiManipulationMethods? Are they like the AWT Robot calls above? Do you eventually wind up rendering the GUI to a bitmap and comparing it to an expected bitmap?

Here is an example test, which adds a subtree of business nodes to a structure represented by a JTree, saves it, deletes the top of the subtree, and finally confirms that all of the subnodes below are removed as well:

 public void testNodeTreeDelete() throws Exception {
     clickOn( _tree, getEntityPath( _nodeRoot ) );
     DefaultMutableTreeNode rootNode =
         (DefaultMutableTreeNode) _tree.getLeadSelectionPath().getLastPathComponent();
     int numChildren = rootNode.getChildCount();

     selectPopupMenu( _tree, getEntityPath( _nodeRoot ), new String[] { "New Node" } );
     replaceText( EnterpriseModelerFrame.BUSINESS_NODE_NAME, "NodeT1" );
     selectComboBoxValue( EnterpriseModelerFrame.BUSINESS_NODE_STATUS, "Active" );
     Object nodeT1 = getSelectedObject();

     selectTreePopupMenu( nodeT1, "Copy" );
     replaceText( EnterpriseModelerFrame.BUSINESS_NODE_NAME, "NodeT1A" );
     selectComboBoxValue( EnterpriseModelerFrame.BUSINESS_NODE_STATUS, "Active" );
     Object nodeT1A = getSelectedObject();
     dragAndDrop( nodeT1A, nodeT1 );

     selectPopupMenu( _tree, getEntityPath( nodeT1 ), new String[] { "Delete Family" },
                      new ConfirmationDialogHandler() );

     if (rootNode.getChildCount() != numChildren)
         reportFailure( "Wrong number of children on root node." );
 }

This approach relies on the underlying entities, which are tested elsewhere, and on the Swing classes themselves, which are presumed to be reliable. We expect our Frames to implement an interface which returns a control by name. We absolutely do not use bit-maps, which would be very fragile. These tests can actually be written, like all good UnitTests, before the code which implements them. In some cases, we can check the color or size by fetching the appropriate renderer and asking for its color, size, position, etc.
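The control-by-name contract might look roughly like this (a sketch, not the actual code from that project):

 import java.awt.Component;
 import java.awt.Container;

 // Sketch of the contract: a Frame hands back controls by their
 // stable names, e.g. via a recursive walk over setName() values.
 interface NamedControls {
     Component getControl(String name);
 }

 class ControlFinder {
     static Component find(Container root, String name) {
         for (Component child : root.getComponents()) {
             if (name.equals(child.getName())) return child;
             if (child instanceof Container) {
                 Component found = find((Container) child, name);
                 if (found != null) return found;
             }
         }
         return null;  // caller decides whether a missing control is a failure
     }
 }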

Some of the library methods used include:

 selectTreePopupMenu - pops up a menu on the specified tree node and selects the named menu item
 clickOn - does a mouse-down on a GUI element, or a path in a JTree
 dragAndDrop - drags one JTree node onto another
 replaceText - selects the text in a control and types the specified text in its place
 selectMenu - selects a menu item from the menu bar
 selectComboBoxValue - selects a named value from a combo box

We have others, not shown here, and we develop new ones as needed. -- RussellGold


Assorted GUI testing resources:

There are good tools available for doing RegressionTesting on GUI applications. I've had some good experiences with SQA Robot and Microsoft Test. They have scripting languages, and can test field values or do bitmap comparisons, as you wish.

SQA Robot, by SQA Software - bought by Rational. See http://www.rational.com/products/sqa_load/prodinfo/robot.jtmpl (BrokenLink as of 2003-05-07. Apparently SQA Robot is now called RationalRobot. See http://www.rational.com/products/robot/index.jsp) and http://www.rational.com/products/visual_test/index.jtmpl (See also the Saint Louis SQA user group at http://www.midwest-st-lab.com/Usersgroup.html)

For Java users, there are several projects which extend JUnit to test Java GUIs.

 http://sourceforge.net/projects/abbot
 http://jemmy.netbeans.org
 http://sourceforge.net/projects/jfcunit
 http://sourceforge.net/projects/pounder

JFCUnit, Jemmy, and Abbot provide an extended library of robot-like functions similar to those described above (Button.click, Tree.selectNode, etc.). Abbot and Pounder also provide recording/playback and scripted tests which are useful for acceptance testing.

There is also a list for discussing these types of issues at http://groups.yahoo.com/group/java-gui-testing.

Avignon (http://www.nolacom.com/avignon/index.asp) helps you develop your own language for customer tests and allows you to develop your own level of GUI testing.

See http://resolute.teradyne.com/prods/sst/product_center/testexec.html for a good list of GUI test tools (BrokenLink).

See also MarathonMan.


An idea...

GUI development is difficult to test ... therefore, DoTheSimplestThingThatCouldPossiblyWork (and Test) and avoid writing complicated widgets. I don't mean don't use the latest UI controls; just remove complication internally and move it to other classes.

You can minimize complication in widgets by inverting the relationship between widgets and the underlying mechanics. Instead of having the UI drive the application, have the application drive the UI. That is, consider the UI a tool for the model/engine to communicate to the user, not for the user to communicate to the model/engine. ModelViewController helps immensely.

Take this concept right into the Controller. Break it into separate TightGroupsOfClasses: one that interacts with the operating system (receiving events, for example), and another that contains a FiniteStateMachine that reacts to your OS wrapper class.
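A small Java illustration of that split (all names invented): the state machine knows nothing about the OS, so a test can drive it by feeding abstract events directly, with no window and no message pump.

 // Sketch: an OS wrapper would translate raw events into calls on
 // this class; the class itself is plain logic and trivially testable.
 class DragStateMachine {
     enum State { IDLE, PRESSED, DRAGGING }
     private State state = State.IDLE;

     void onPress()   { if (state == State.IDLE) state = State.PRESSED; }
     void onMove()    { if (state == State.PRESSED) state = State.DRAGGING; }
     void onRelease() { state = State.IDLE; }

     State state() { return state; }
 }

A UnitTest just calls onPress() then onMove() and asserts that state() is DRAGGING.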

MFC and Win32 MakesMeCry. AppWizard doesn't help because it likes putting the entire model inside the UI. The Doc/View architecture really isn't. Why do CDocument subclasses receive Windows events?!

Therefore, just do it by hand, or if you can't, gut the classes that AppWizard generates. Put everything into separate classes that MFC calls.

You'll get lightweight UI controls you can easily test, even without UnitTests if you can't figure out how to write them for every possible user event. And you can easily test the non-OS driven classes using UnitTests. -- SunirShah

[This is useless advice because you're basically saying don't test the GUI, test the logic. Which is fine, but I was assuming that people were already separating this code out (Anyone who's professionally writing GUIs should be fired if they don't understand and use MVC. Of course, I don't think anyone sane should be using MFC, either...). The question at hand is unit testing GUIs per se, and I really don't see a correct way to do this. How do you test that a tooltip is shown at the correct time and in the correct place? That it has correct text? How about things like flicker and graphic corruption from resizing, especially with custom-built widgets? I don't think there's any way to solve these sort of problems.]


I think the article "Integrated, Effective Test Design and Automation" by Edward Kit (Software Development, February 1999) provides an XP-like approach to GUI testing. The idea is to use the scripting abilities of the testing tool. Kit argues that Capture / Playback creates scripts that are non-structured, non-commented and virtually non-maintainable. Thus, engineers should create test scripts by hand.

If the scripting language is easy to use, it should be easy to add tests "on the fly". -- AlbertBrandl


Perhaps some of the current discussion could be ReFactored out into the GuiTestingPatterns page.


These all sound like discussions of how to test an existing GUI. How do you CodeUnitTestFirst for GUIs? -- EddieDeyo

To CodeUnitTestFirst for GUIs, read ThenDontCallMainLoop.
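The gist of that page, as a hedged Swing sketch: construct the GUI but never show it or start the event loop, and let the test poke components directly. Here LoginPanel and its fields are hypothetical; in true test-first style, the test is written before the panel exists.

 import junit.framework.TestCase;

 public class LoginPanelTest extends TestCase {
     public void testLoginButtonDisabledUntilNameEntered() {
         LoginPanel panel = new LoginPanel();   // hypothetical panel under test
         assertFalse(panel.loginButton.isEnabled());
         panel.nameField.setText("ward");       // DocumentListeners fire synchronously
         assertTrue(panel.loginButton.isEnabled());
     }
 }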


Here is an idea for an effective GUI testing framework:

Why not use the power of AOP (AspectOrientedProgramming) for GUI tests?

It would look like this (this is pseudo-AspectJ; I don't know the language yet. I have written GUIs in Delphi, but never in Java):

 aspect Test.testAddOrder {

     int step = 0;  // count steps during the test

     after(): FormOrder.onOpen() {
         if (step == 4) btnDetails.click();
         if (step == 8) btnClose.click();
     }

     after(): FormOrderDetails.onOpen() {
         if (step == 5) {
             editOrderNumber.text = "ED1723-74";  // an edit box
             btnSearch.click();
             AssertTrue(TableOrder.get("OrderNo").equals("ED1723-74"));  // suppose some aspect exposes the DB easily
             editIRR.text = "0.88";
             btnUpdate.click();
             AssertTrue(MessageBox.isActive(Message.ERROR));  // some framework aid to identify the error dialog
             Dialog.btnOk.click();  // close the dialog
             editIRR.text = "...";  // and so on
         }
     }
 }

Through AspectJ you can check that all buttons named "btnAddRecordXXX" indeed add a record to a table named XXX.

ideas:

 the good: ???
 the bad: ???
If I get some positive feedback that this is an innovative and useful idea, I'll develop it as my Senior Project. I need your feedback. Thanks! -- Lior Bar-On


[RefactorMe: I'm curious how this can be applied to HTML UIs. I want to introduce ExtremeProgramming into a web project. Has anyone done this for an existing project? a new one? -- AndrewKrisThompson]

See WebTesting.


I haven't had much success in using TestDrivenDevelopment when GuiTestingGameApps. Perhaps this is one of those instances when this method doesn't work.

E-search for TomPlunket; he seems to be doing the polygon mesh layers test-first. You could ask him this question on news:comp.object, for example. -- PhlIp


See the extensions that I'm building into FIT for GUI testing - JohnGoodsen. Remember FIT is at http://fit.c2.com.


I found a good link with GUITesting references (payware, freeware, Java-only, etc.)

http://www.testingfaqs.org/t-gui.html


The only good way of testing GUIs I know requires the GUI to be written in two parts. The front part is visual only: it contains the interface and produces a stream of text commands which get sent to the back end, which processes them and returns text responses. Each individual half is easy to test.

To test the front end, push the buttons in the right order and check that the right text file is generated. It's manual, but it's quick, and doesn't have to be done often. And the actual checking can just about be done with diff.

To test the back end, feed in a text file, and check what happens. The text file can be generated with the front end, by hand, or a combination of both. This can be completely automated.

The problem is that your GUI developers have to have enough discipline to keep everything that can break in the back end. I've seen it done, if only when the text version came first and the front end was a bolt-on. Otherwise, it's always too tempting to put the logic into the front end. Regards, --BenAveling
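A bare-bones Java sketch of that split (the command vocabulary is invented): the front end only ever emits command strings, so everything that can break sits behind a text interface that can be exercised with files and diff.

 // The GUI layer calls execute() and displays the response; tests
 // bypass the GUI entirely and feed command lines from a file.
 interface BackEnd {
     String execute(String command);  // e.g. "add-order ED1723 qty=3" -> "ok ED1723"
 }

 class OrderBackEnd implements BackEnd {
     public String execute(String command) {
         String[] words = command.split(" ");
         if (words.length > 1 && words[0].equals("add-order")) {
             // ... the real order logic lives here, behind the text interface ...
             return "ok " + words[1];
         }
         return "error unknown-command";
     }
 }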


See TestFirstUserInterfaces UserInterface


CategoryTesting CategoryUserInterface

