Selenium + AT-SPI = GUI Testing

At KDE we have multiple levels of quality assurance, ranging from various degrees of humans testing features to fully automated testing. Indeed, automated testing is incredibly important for the continued quality of our software. A big cornerstone of our testing strategy is so-called unit tests: they test a specific piece of our software for its behavior in isolation. But for many aspects of our software we need a much higher-level view. Testing pieces of Plasma’s application launcher in isolation is all well and good, but that won’t tell us if the entire UI can be easily navigated using the keyboard. For this type of test we require a different testing approach altogether. A couple of months ago I set out to create a testing framework for this use case, and I’m glad to say that it has matured enough to be used for writing tests. I’d like to walk you through the technical building blocks and a simple example.

Let us start off by looking at the architecture at large. So… there’s Selenium, which is an incredibly popular, albeit web-oriented, testing framework. Its main advantages for us are its popularity and that it sports a server-client split. This means we can leverage the existing client tooling available for Selenium; we only need to grow a server. The server component, called a WebDriver, implements the actual interaction with UI elements and is generic enough to also apply to desktop applications. Others thought so as well: there already exists Appium, which extends Selenium with more app-specific features and behaviors. Something for us to build upon. The clients meanwhile are completely separate and talk to the WebDriver over a well-defined JSON REST protocol, so we can reuse them without having to write anything ourselves. They are available in a multitude of programming languages, and who knows, maybe we’ll eventually get one for writing Selenium tests in QML 😉
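To make the client/server split a bit more tangible, here is a minimal sketch of the kind of request a client sends to a WebDriver when it wants a new session. It only builds the request and prints it, nothing is sent. The payload shape follows the W3C WebDriver "New Session" command; that the AT-SPI driver accepts exactly this shape, and the `appium:app` capability name, are assumptions on my part, and `new_session_request` is a made-up helper name.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the HTTP request a client
# would use to ask a WebDriver server for a new session.
# ASSUMPTION: the server accepts W3C-style capabilities and an
# 'appium:app' capability naming the application to launch.
def new_session_request(server_url, app)
  uri = URI.join(server_url, '/session')
  payload = { capabilities: { alwaysMatch: { 'appium:app' => app } } }

  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(payload)
  request
end

request = new_session_request('http://127.0.0.1:4723', 'org.kde.kinfocenter.desktop')
puts "#{request.method} #{request.path}"
puts request.body
```

In practice you never write this by hand. Client libraries such as appium_lib produce and send these requests for you, which is exactly why reusing them is so attractive.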

That of course doesn’t explain how GUI testing can work with this on Linux. Enter: AT-SPI. AT-SPI is an accessibility API and pretty much the standard accessibility system for use on Linux. Obviously its primary use is assistive technologies, like the screen reader Orca, but to do its job it essentially offers a toolkit-independent way of introspecting and interacting with GUI applications. This then gives us a way to implement a WebDriver without caring about the toolkit or app specifics. As long as the app supports AT-SPI, which all Qt apps do implicitly, we can test it.

Since all the client tooling is independent of the server, all we needed to get GUI testing going was a WebDriver that talks to AT-SPI.

That is what I set out to write and I’m happy to announce that we now have an AT-SPI based WebDriver, and the first tests are popping into existence already. There is also lovely documentation to hold onto.

So, without further ado, let us write a simple test. Since the documentation already writes one in Python, I’ll use Ruby this time around so we have examples in different languages. A simple candidate is KInfoCenter. We can test its search functionality with a couple of lines of code.

First we need to install selenium-webdriver-at-spi: clone it, build it with CMake, and install it. You’ll also need to install the relevant client libraries. For Ruby that’s simply running gem install appium_lib.

Then we can start with writing our test. We will need some boilerplate setup logic. This is more or less the same for every test. For more details on the driver setup you may also check the wiki page.

  def setup
    @appium_driver = Appium::Driver.new(
      {
        'caps' => { app: 'org.kde.kinfocenter.desktop' },
        'appium_lib' => {
          server_url: 'http://127.0.0.1:4723',
          wait_timeout: 10,
          wait_interval: 0.5
        }
      }, true
    )
    @driver = @appium_driver.start_driver
  end
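Since this boilerplate is more or less the same for every test, one option is to pull the options hash into a small helper so that only the desktop file name changes between tests. This is just a sketch: `appium_options` is a hypothetical name, not something appium_lib provides, and the values are simply the ones used in the setup method.

```ruby
# Hypothetical helper (not part of appium_lib): builds the options hash
# that Appium::Driver.new receives in the setup method.
def appium_options(desktop_file, server_url: 'http://127.0.0.1:4723')
  {
    'caps' => { app: desktop_file },
    'appium_lib' => {
      server_url: server_url,
      wait_timeout: 10,
      wait_interval: 0.5
    }
  }
end

options = appium_options('org.kde.kinfocenter.desktop')
puts options['caps'][:app]
puts options['appium_lib'][:server_url]
```

The setup method could then call Appium::Driver.new(appium_options('org.kde.kinfocenter.desktop'), true) and stay identical across test files.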

The driver will take care of starting the correct application and make sure that it is actually running correctly. Next we’ll write the actual test. Let’s test the search. The first order of business is using a tool called Accerciser to inspect the AT-SPI presentation of the application. For more information on how to use this tool please refer to the wiki. Using Accerciser I’ve located the search field and learned that it is called ‘Search’. So, let’s locate it, activate it, and search for the CPU module:

  def test_search
    search = driver.find_element(:name, 'Search')
    search.click
    search.send_keys('cpu')

Next let us find the CPU list item and activate it:

    cpu = driver.find_element(:class_name, '[list item | CPU]')
    assert(cpu.displayed?)
    cpu.click

And finally let’s assert that the page was actually activated:

    cpu_tab = driver.find_element(:class_name, '[page tab | CPU]')
    assert(cpu_tab.displayed?)
  end
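Both class name locators in this test share the same shape: the role of the element, a pipe, and its name, wrapped in square brackets. Assuming that pattern holds in general (it is inferred only from the two locators used here, so do check the wiki), a tiny helper can spare you from hand-typing the brackets. `role_locator` is a hypothetical name.

```ruby
# Hypothetical helper: formats a locator string in the "[role | name]" shape.
# ASSUMPTION: the shape is inferred from the two locators used in this test.
def role_locator(role, name)
  "[#{role} | #{name}]"
end

puts role_locator('list item', 'CPU')
puts role_locator('page tab', 'CPU')
```

In the test this would read driver.find_element(:class_name, role_locator('page tab', 'CPU')).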

To run the complete test we can use the run wrapper: selenium-webdriver-at-spi-run ./kinfocentertest.rb (mind that it needs to be +x). If all has gone well we should get a successful test.

Finished in 1.345276s, 0.7433 runs/s, 1.4867 assertions/s.

1 runs, 2 assertions, 0 failures, 0 errors, 0 skips
I, [2022-12-14T13:13:53.508516 #154338]  INFO -- : tests done
I, [2022-12-14T13:13:53.508583 #154338]  INFO -- : run.rb exiting true

This should get you started with writing a test for your application! I’ll gladly help and review your forthcoming tests.
For more detailed documentation check out the writing-tests wiki page as well as the appium command reference.

Of course the work is not done. selenium-webdriver-at-spi is very much still a work in progress and I’d be glad for others to help add features as they become needed. The GitLab project is the place for that. ❤

The complete code of the example above:

#!/usr/bin/env ruby
# frozen_string_literal: true

# SPDX-License-Identifier: GPL-2.0-only OR GPL-3.0-only OR LicenseRef-KDE-Accepted-GPL
# SPDX-FileCopyrightText: 2022 Harald Sitter <sitter@kde.org>

require 'appium_lib'
require 'minitest/autorun'

class TestKInfoCenter < Minitest::Test
  attr_reader :driver

  def setup
    @appium_driver = Appium::Driver.new(
      {
        'caps' => { app: 'org.kde.kinfocenter.desktop' },
        'appium_lib' => {
          server_url: 'http://127.0.0.1:4723',
          wait_timeout: 10,
          wait_interval: 0.5
        }
      }, true
    )
    @driver = @appium_driver.start_driver
  end

  def teardown
    driver.quit
  end

  def test_search
    search = driver.find_element(:name, 'Search')
    search.click
    search.send_keys('cpu')

    cpu = driver.find_element(:class_name, '[list item | CPU]')
    assert(cpu.displayed?)
    cpu.click

    cpu_tab = driver.find_element(:class_name, '[page tab | CPU]')
    assert(cpu_tab.displayed?)
  end
end

3 thoughts on “Selenium + AT-SPI = GUI Testing”

  1. Hi. I would love to try this but I’m failing so far. Could you please provide some more info on the general setup, like how you use appium?

    I’ve tried to get this running a few times and failed.

    I have NodeJS LTS and Appium v1.22.3 installed. I start it with this command:
    appium -pa / --platform-name fake

    I tried to run both the python and ruby examples (Kubuntu 22.10 with and without backports). Specifying the desktop name and pid give the following error:

    selenium.common.exceptions.WebDriverException: Message: An unknown server-side error occurred while processing the command. Original error: ENOENT: no such file or directory, open ‘org.kde.kcalc.desktop’

    Specifying the path of the bin to be tested like this
    desired_caps["app"] = "/usr/bin/kcalc"
    results in XML parsing errors.

    Any tips would be appreciated.

    Regards,
    John

  2. There are just a few more steps to take: introduce a gherkin-alike syntax for test scenarios, so that instead of writing code like in your examples, test scenario developers could write something like the following:

    Run application A
    Wait for the root window of application A to appear
    Click on button B
    Make sure C has appeared
    ….

    I’m saying gherkin-alike, because gherkin itself is very rigid and doesn’t support any kind of control over test execution flow. It’s either all or nothing.

    Once this is done, it will drastically lower the barrier which is now preventing a lot of ppl without software engineering skills from starting to contribute to opensource projects. Like by providing additional test scenarios.

    You can think few steps ahead and imagine some KDE built-in tools, helping users to record their actions to that scenario format, which they can later share back to developers if they hit some bug. Instead of filling the tickets on how to reproduce a specific bug/issue.
