Selenium + AT-SPI = GUI Testing

At KDE we have multiple levels of quality assurance, ranging from various degrees of humans testing features to fully automated testing. Indeed, automated testing is incredibly important for the continued quality of our software. A big cornerstone of our testing strategy is so-called unit tests: they test a specific piece of our software for its behavior in isolation. But for many aspects of our software we need a much higher-level view. Testing pieces of Plasma’s application launcher in isolation is all well and good, but that won’t tell us if the entire UI can be easily navigated using the keyboard. For this type of test we require a different testing approach altogether. A couple of months ago I set out to create a testing framework for this use case and I’m glad to say that it has matured enough to be used for writing tests. I’d like to walk you through the technical building blocks and a simple example.

Let us start off by looking at the architecture at large. So… there’s Selenium, which is an incredibly popular, albeit web-oriented, testing framework. Its main advantages for us are its popularity and that it sports a server-client split. This means we can leverage the existing client tooling available for Selenium; we only need to grow a server. The server component, called a WebDriver, implements the actual interaction with UI elements and is generic enough to also apply to desktop applications. Others thought so as well: there already exists Appium, which extends Selenium with more app-specific features and behaviors. Something for us to build upon. The clients meanwhile are completely separate and talk to the WebDriver over a well-defined JSON REST protocol, meaning we can reuse them without having to write anything ourselves. They are available in a multitude of programming languages, and who knows, maybe we’ll eventually get one for writing Selenium tests in QML 😉
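To make the client/server split a bit more tangible, here is a minimal sketch of what a client does under the hood when it looks up an element. It only builds the HTTP request for the W3C WebDriver "Find Element" command (a POST to /session/{session id}/element with a "using" and a "value" field) and does not send anything. The helper name find_element_request and the session ID are made up for illustration, and the "name" strategy is assumed to mirror the :name locator used in the test further down.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the request a client
# library would issue for the WebDriver "Find Element" command.
def find_element_request(server_url, session_id, using, value)
  uri = URI.join(server_url, "/session/#{session_id}/element")
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  # The body names the location strategy and the selector to look for.
  request.body = JSON.generate(using: using, value: value)
  request
end

# 'example-session' is a placeholder; a real ID comes from creating a session.
request = find_element_request('http://127.0.0.1:4723', 'example-session', 'name', 'Search')
puts "#{request.method} #{request.path}"
puts request.body
```

In practice you never write this by hand; the client libraries wrap it in calls such as find_element.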

That of course doesn’t explain how GUI testing can work with this on Linux. Enter: AT-SPI. AT-SPI is an accessibility API and pretty much the standard accessibility system for use on Linux. Obviously its primary use is assistive technologies, like the screen reader Orca, but to do its job it essentially offers a toolkit-independent way of introspecting and interacting with GUI applications. This then gives us a way to implement a WebDriver without caring about the toolkit or app specifics. As long as the app supports AT-SPI, which all Qt apps do implicitly, we can test it.

Since all the client tooling is independent of the server, all we needed to get GUI testing going was a WebDriver that talks to AT-SPI.

That is what I set out to write and I’m happy to announce that we now have an AT-SPI based WebDriver, and the first tests are popping into existence already. There is also lovely documentation to hold onto.

So, without further ado, let us write a simple test. Since the documentation already writes one in Python I’ll use Ruby this time around so we have some examples of different languages. A simple candidate is KInfoCenter. We can test its search functionality with a couple of lines of code.

First we need to install selenium-webdriver-at-spi: clone it, build it with cmake, and install it with cmake. You’ll also need to install the relevant client libraries. For Ruby that’s simply running gem install appium_lib.

Then we can start with writing our test. We will need some boilerplate setup logic. This is more or less the same for every test. For more details on the driver setup you may also check the wiki page.

  def setup
    @appium_driver = Appium::Driver.new(
      {
        'caps' => { app: 'org.kde.kinfocenter.desktop' },
        'appium_lib' => {
          server_url: 'http://127.0.0.1:4723',
          wait_timeout: 10,
          wait_interval: 0.5
        }
      }, true
    )
    @driver = @appium_driver.start_driver
  end
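Since this boilerplate barely changes between tests, one option is to move the options hash into a small helper. The following is only a sketch: appium_options is a made-up name, not part of appium_lib, and the APPIUM_SERVER_URL environment variable is an assumption of mine for overriding the server address.

```ruby
# Hypothetical helper (not part of appium_lib): returns the options hash
# that Appium::Driver.new expects, for a given desktop file ID.
# APPIUM_SERVER_URL is an assumed, optional override for the server address.
def appium_options(app_id, server_url: ENV.fetch('APPIUM_SERVER_URL', 'http://127.0.0.1:4723'))
  {
    'caps' => { app: app_id },
    'appium_lib' => {
      server_url: server_url,
      wait_timeout: 10,
      wait_interval: 0.5
    }
  }
end

options = appium_options('org.kde.kinfocenter.desktop')
puts options['caps'][:app]
```

The setup method would then shrink to Appium::Driver.new(appium_options('org.kde.kinfocenter.desktop'), true) followed by start_driver.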

The driver will take care of starting the correct application and makes sure that it is actually running correctly. Next we’ll write the actual test. Let’s test the search. The first order of business is using a tool called Accerciser to inspect the AT-SPI presentation of the application. For more information on how to use this tool please refer to the wiki. Using Accerciser I’ve located the search field and learned that it is called ‘Search’. So, let’s locate it, activate it, and search for the CPU module:

  def test_search
    search = driver.find_element(:name, 'Search')
    search.click
    search.send_keys('cpu')

Next let us find the CPU list item and activate it:

    cpu = driver.find_element(:class_name, '[list item | CPU]')
    assert(cpu.displayed?)
    cpu.click

And finally let’s assert that the page was actually activated:

    cpu_tab = driver.find_element(:class_name, '[page tab | CPU]')
    assert(cpu_tab.displayed?)
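The :class_name locators above follow a "[role | name]" pattern. If you end up writing many of them, a tiny helper keeps the string format in one place. This is just a sketch; role_locator is a name I made up and not something the driver or appium_lib provides.

```ruby
# Hypothetical helper: formats the "[role | name]" string that the
# examples above pass to find_element(:class_name, ...).
def role_locator(role, name)
  "[#{role} | #{name}]"
end

puts role_locator('list item', 'CPU') # prints "[list item | CPU]"
puts role_locator('page tab', 'CPU')  # prints "[page tab | CPU]"
```

With it, the lookup could read driver.find_element(:class_name, role_locator('list item', 'CPU')).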

To run the complete test we can use the run wrapper: selenium-webdriver-at-spi-run ./kinfocentertest.rb (mind that the test file needs to be executable, i.e. chmod +x). If all has gone well we should get a successful test.

Finished in 1.345276s, 0.7433 runs/s, 1.4867 assertions/s.

1 runs, 2 assertions, 0 failures, 0 errors, 0 skips
I, [2022-12-14T13:13:53.508516 #154338]  INFO -- : tests done
I, [2022-12-14T13:13:53.508583 #154338]  INFO -- : run.rb exiting true

This should get you started with writing a test for your application! I’ll gladly help and review your forthcoming tests.
For more detailed documentation check out the writing-tests wiki page as well as the appium command reference.

Of course the work is not done. selenium-webdriver-at-spi is very much still a work in progress and I’d be glad for others to help add features as they become needed. The GitLab project is the place for that. ❤

The complete code of the example above:

#!/usr/bin/env ruby
# frozen_string_literal: true

# SPDX-License-Identifier: GPL-2.0-only OR GPL-3.0-only OR LicenseRef-KDE-Accepted-GPL
# SPDX-FileCopyrightText: 2022 Harald Sitter <sitter@kde.org>

require 'appium_lib'
require 'minitest/autorun'

class TestKInfoCenter < Minitest::Test
  attr_reader :driver

  def setup
    @appium_driver = Appium::Driver.new(
      {
        'caps' => { app: 'org.kde.kinfocenter.desktop' },
        'appium_lib' => {
          server_url: 'http://127.0.0.1:4723',
          wait_timeout: 10,
          wait_interval: 0.5
        }
      }, true
    )
    @driver = @appium_driver.start_driver
  end

  def teardown
    driver.quit
  end

  def test_search
    search = driver.find_element(:name, 'Search')
    search.click
    search.send_keys('cpu')

    cpu = driver.find_element(:class_name, '[list item | CPU]')
    assert(cpu.displayed?)
    cpu.click

    cpu_tab = driver.find_element(:class_name, '[page tab | CPU]')
    assert(cpu_tab.displayed?)
  end
end

Plasma Analyzer

It’s a Plasma widget that visualizes what’s going on on your system, music-wise that is. I started this project years ago but only recently found the motivation to get it to a somewhat acceptable state. It’s pretty amazing to have bars flying across the screen to Daft Punk’s `Touch`.

https://store.kde.org/p/1953779

KDE Crash Tracking System 💣

KDE is now evaluating Sentry, a crash tracking system.

Who can get access? Everyone with a KDE developer account.

But what is it?

Since forever we have used Bugzilla to manage crash reports, but this comes with numerous challenges that haven’t seen any improvement in at least 10 years:

  • Finding duplicate crashes is hard and in our case involves a human finding them
  • When debug symbols are missing we need to ask the user to recreate the problem, which is not always possible
  • Users need to worry about debug symbols (this is in part improved by the rise of debuginfod – yay!)
  • We have no easily consumed graphs on how prevalent a specific crash is, and by extension we have a hard time judging the importance
  • The user needs to actually write a report for us to learn of the crash (spoiler: most crashes never get this far)

All in all it’s a fairly dissatisfactory situation we are in currently. Enter Sentry.

Sentry is a purpose-built crash tracking system. It receives crash reports via API ingestion points and traces missing frames with the help of debuginfod, can detect duplicates automatically and thus show us particularly aggressive crashes, and much more. Best yet, it supports many different programming languages which allows us to not only improve the quality of our software but also our infrastructure services.
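To give a rough idea of why automatic duplicate detection is so valuable, here is a toy sketch that groups crashes by a fingerprint of their topmost stack frames. To be clear, this is not how Sentry actually groups events; the helper names and the sample frames are invented purely for illustration.

```ruby
require 'digest'

# Toy fingerprint: hash the function names of the topmost frames.
# This is an illustrative assumption, not Sentry's grouping logic.
def crash_fingerprint(frames, depth: 3)
  Digest::SHA256.hexdigest(frames.first(depth).join("\n"))
end

# Group reports (each an array of frame names, innermost first) so that
# reports sharing a fingerprint end up in the same bucket.
def group_crashes(reports, depth: 3)
  reports.group_by { |frames| crash_fingerprint(frames, depth: depth) }
end

# Made-up sample data: two identical crashes and one unrelated crash.
reports = [
  %w[QString::length Foo::bar main],
  %w[QString::length Foo::bar main],
  %w[QObject::connect Baz::init main]
]

group_crashes(reports).each_value do |group|
  puts "#{group.size}x #{group.first.first}"
end
```

Counting the size of each group is what lets a tracker show which crashes are hit most often, something we cannot easily get out of Bugzilla.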

The current evaluation instance is already amazing and has helped fix numerous problems. The current setup is not even using all features yet, and we have held back the rollout a bit: only git builds currently submit data. If all goes well and we find it to be amazing, I hope we’ll eventually be able to roll this out to production releases.

Let’s look at a crash I’ve fixed recently.

Here’s what Sentry received from the user:

Not terribly useful. So with the power of debuginfod it turned it into this:

I then applied some brain power to create a fix and consequently the crash has disappeared, as we can see in this neat graphic here:

Here’s a complete crash information page from a recent infrastructure problem in our bugzilla bot:

Also check out my Akademy talk: