yigh
Jan 3, 2021
I've used a couple different solutions to try to automate the mundane tasks in my day-to-day.

I've found macro recorders and coordinate-based programs to be very brittle.

I've also had difficulty writing programs that understand the UI and its component hierarchy, the way browser-automation tools like Selenium can parse HTML and latch onto various page properties.

I checked out the Windows-side COM APIs with Window Spy / Spy++, and a LOT of programs are fundamentally just one wrapper over the actual content of the program (e.g. a container around a Chrome browser window).

I am on the cusp of thinking the best (least brittle) solution is screen-capture based: take a screenshot, interpret it in the program, determine a click coordinate, then proceed with the rest of the program.
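That capture/interpret/click loop can be sketched as one small function. This is a minimal sketch, not an endorsement of the approach: the `capture`, `locate`, and `click` callables are hypothetical stand-ins for whatever screenshot and input libraries you actually use (e.g. mss or PyAutoGUI in practice), injected so the step itself stays testable.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]

def screenshot_click_step(
    capture: Callable[[], bytes],                # grabs the screen as raw image data
    locate: Callable[[bytes], Optional[Point]],  # finds the target in the image, or None
    click: Callable[[Point], None],              # performs the click at a coordinate
) -> bool:
    """One iteration of the loop: capture, interpret, click if the target was found."""
    image = capture()
    point = locate(image)
    if point is None:
        return False  # target not on screen; the caller decides whether to retry
    click(point)
    return True
```

The return value lets the surrounding program distinguish "clicked" from "target not visible yet," which is where most of the brittleness the replies below describe actually lives.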

Anyone know of other good approaches?



Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.
Generally, analog automation is limited - the big problem with macro recorders and screenshot-based utilities is "what if the resolution changes, or the user is on a different OS with different button sizes?" Screen-capture methods generally suck and are brittle - you can do things like restrict the capture to only the region of interest, but it's a pain.

The quickest and dirtiest way of automating Windows UI is to get out a program like AutoHotkey or AutoIt. Native UIs will generally have something like a control ID or a class name for things like buttons (and you may be able to see those in Spy++), so rather than using coordinates, you can write things like "wait for a window with a certain class name to exist, then click the button with a certain control ID" in the script itself. If you locate controls by class and position the cursor using that object recognition, it will be much more durable than screen-capture based methods. Similarly, verifying results with screenshots is usually a bad idea and should only be done if there is no other option.
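The "wait for window to exist" half of that script is just a polling loop with a timeout. Here is a minimal sketch of that pattern in Python; the `find` callable is a hypothetical stand-in for whatever actually looks up the window or control by class name (an AutoHotkey `WinWait`, a ctypes `FindWindowW` call, etc.), injected so the waiting logic itself is library-agnostic.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def wait_for(find: Callable[[], Optional[T]],
             timeout: float = 10.0,
             interval: float = 0.25) -> T:
    """Poll `find` until it returns a handle, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        handle = find()
        if handle is not None:
            return handle  # window/control exists; hand the handle to the caller
        if time.monotonic() >= deadline:
            raise TimeoutError("window/control never appeared")
        time.sleep(interval)  # back off so the loop doesn't spin the CPU
```

A timeout that raises, rather than a loop that spins forever, is what keeps a hung automation run from silently stalling the whole job.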

Under the hood, many native application windows support MSAA (https://en.wikipedia.org/wiki/Microsoft_Active_Accessibility) or its successor, UI Automation (https://en.wikipedia.org/wiki/Microsoft_UI_Automation). These APIs are intended to support things like screen-reading software. For whatever app you're trying to automate, the decision tree is something like:

a) Does the app have a COM or CLI interface I can use without messing with UI automation at all?
b) Does the app support MSAA / UIA and expose class names for its objects?
c) Failing that, are coordinates good enough, or do I want some other kind of hack?

The vast majority of apps fall under a) or b).
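That decision tree is a fallback chain: probe each strategy in preference order and take the first one that works. A minimal sketch, where each probe is a hypothetical callable you'd write for the app in question (the strategy names here are just labels, not real APIs):

```python
from typing import Callable, Optional, Sequence, Tuple

def pick_strategy(
    strategies: Sequence[Tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Return the name of the first automation strategy whose probe succeeds."""
    for name, available in strategies:
        if available():
            return name
    return None  # nothing worked; time for a hack

# Example, with the probes stubbed out:
choice = pick_strategy([
    ("com_or_cli",  lambda: False),  # a) scriptable COM/CLI interface?
    ("uia_msaa",    lambda: True),   # b) MSAA/UIA with class names?
    ("coordinates", lambda: True),   # c) last resort
])
```

Ordering the probes from most to least robust is the whole point: coordinates only get used when nothing structural is available.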

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
There are layers to UI testing. There will be plenty of tests where you just want the UI to trigger your actions instead of testing the actions in isolation, and that's fine. In those cases, seek out the accessibility API for the GUI framework and use that to perform the actions. "Accessibility" here means the augmentations you make to your user experience so the application is usable by people with disabilities. Treat your tests like use cases from a blind, deaf quadriplegic - coincidentally, the computer running the tests is a blind, deaf quadriplegic. As a bonus, your application can now be used not only by the potential handful of blind/deaf users, but also by the larger group that leans on those APIs when possible because their vision is particularly poor.

GUI automation tools like AutoIt and Microsoft's UI Automation all have quirks, and they're all extremely annoying. You want to avoid raw screen-coordinate commands ("click at 100, 350") and instead invoke specific GUI components. Even then, an unexpected dialog can tank the whole run, or a command just hangs, or doesn't register. Treat these like integration tests, with a separate test bench that kills anything that hangs. Expect false failures. Somebody can argue those failures aren't really false, but I've dealt with plenty of off-the-shelf applications that just glitch like that 1 in 1,000 times. You can tolerate that 1/1000 occurrence, but you probably want your testing tool to warn and retry, and only error after X retries fail the same way. The test machine should also be pretty clean so system popups and other junk don't get in the way. You can argue that this isn't realistic testing, but this machine is focused exclusively on the UI, and you shore up the rest elsewhere.
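The warn-and-retry-then-error policy described above can be sketched as a small wrapper. This is an illustrative sketch, not any tool's real API: `action` stands in for whatever flaky UI call you're wrapping, and the logger name is made up.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("ui_bench")  # hypothetical test-bench logger

def retry_ui_action(action: Callable[[], T], retries: int = 3) -> T:
    """Run a flaky UI action, warning on each failure and erroring out only
    after `retries` attempts have failed."""
    last_exc: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as exc:  # UI tools raise all sorts of things
            last_exc = exc
            log.warning("attempt %d/%d failed: %s", attempt, retries, exc)
    raise RuntimeError(f"action failed after {retries} attempts") from last_exc
```

The warnings preserve a record of the 1/1000 glitches even on runs that ultimately pass, which is what lets you tell a flaky environment apart from a genuinely broken control.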

At some point, you're talking about testing as if a person were literally there with fingers, and that's a whole other can of worms. You really only need to worry about that when you're making particularly custom controls. Even then, if they plumb into the GUI framework properly, the automation frameworks that invoke controls should just take care of it. And if every test that uses your control vomits down the report, you know the control died.
