DrKonqi ❀ coredumpd

Get some popcorn and strap in for a long one! I shall delight you with some insights into crash handling and all that unicorn sparkle material.


Since Plasma 5.24, DrKonqi, Plasma’s infamous crash reporter, has been able to route crashes through coredumpd, and it is amazing, albeit a bit unused. I’m telling you about it now because it has matured a bit and is even more amazing, albeit still unused. I hope that will change.

To explain what any of this does I have to explain some basics first, so we are on the same page…

Most applications made by KDE rely on KCrash, a KDE framework that implements crash handling, to, well, handle crashes. How this works depends a bit on the operating system, but one way or another, when an application encounters a fault it first stops to think for a moment about the meaning of life and whatever else. We call that “catching the crash”. During that time frame we can apply further diagnostics that later help figure out what went wrong. On POSIX systems specifically, we generate a backtrace and send it off to our Bugzilla for handling by a developer. That is in essence the job of DrKonqi.

Currently DrKonqi operates in a mode generally dubbed “just-in-time debugging”. When a crash occurs, KCrash immediately starts DrKonqi, DrKonqi attaches GDB to the still-running process, GDB creates a backtrace, and then DrKonqi sends the trace along with metadata to Bugzilla.

Just-in-time debugging is often useful on developer machines because you can easily switch to interactive debugging and also have a more complete picture of the environmental system state. For user systems it is a bit awkward though. You may not have time to deal with the report right now, you may have no internet connection, or the crash may even be impossible to trace because of technical complications that arise during just-in-time debugging from how POSIX signals work (threads continue running :O), and so on.

In short: just-in-time really shouldn’t be the default.

Enter coredumpd.

Coredumpd is part of systemd and acts as a kernel core handler. Ah, that’s a mouthful again. Let’s backtrace (pun intended)… Earlier, when I was talking about KCrash, I only told part of the story. When a fault occurs it doesn’t necessarily mean that the application has to crash; it could also neatly exit. It is only when the application takes no further action to alleviate the problem that the Linux kernel will jump in and do some rudimentary crash handling, forcefully. Very rudimentary indeed: it simply takes the memory state of the process and dumps it into a file. This is then aptly called a core dump. It’s kind of like a snapshot of the state of the process when the fault occurred and allows for debugging after the fact. Now things get interesting, don’t they? 🙂

So… KCrash can simply do nothing and let the Linux kernel do the work, and the Linux kernel can also be lazy and delegate the work to a so-called core handler, an application that handles the core dumping. Well, here we are. That core handler can be coredumpd, making it the effective crash handler.
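To make the “core handler” idea a bit more tangible: on Linux the kernel reads its core dump destination from `/proc/sys/kernel/core_pattern`, and a leading `|` means the dump is piped to a program instead of being written to a file. The following is a minimal sketch for checking which handler is configured. The function names are made up for this example, and the systemd-coredump path in the comment is only a typical value that may differ on your distribution.

```python
def core_handler(pattern):
    """Return the handler program if core dumps are piped to one, else None.

    A core_pattern starting with "|" tells the kernel to pipe the dump to
    the program that follows; anything else is a file name template.
    """
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return None
    parts = pattern[1:].split()
    return parts[0] if parts else None


def current_core_handler(path="/proc/sys/kernel/core_pattern"):
    """Read the live setting (Linux only) and report the handler, if any."""
    with open(path, encoding="utf-8") as handle:
        return core_handler(handle.read())


# Typical value when systemd-coredump is the handler (may differ per distro):
# "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h"
```

If `current_core_handler()` returns a path ending in `systemd-coredump`, the kernel is already delegating core dumps to coredumpd.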

What’s the point you ask? — We get to be lazy!

Also, core dumping has one huge advantage that is also its disadvantage (depending on how you look at it): when a core dumps, the process is no longer running. When backtracing a core dump you are looking at a snapshot of the past, not a still-running process. That means you can deal with crashes now or in 5 minutes or in 10 hours. So long as the core dump is available on disk you can trace the cause of the crash. This is further improved by coredumpd also storing a whole lot of metadata in journald. All put together it allows us to run DrKonqi after-the-fact, instead of just-in-time. Amazing! I’m sure you will agree.

For the user everything looks the same, but under the hood we’ve gotten rid of various race conditions and gotten crash persistence across reboots for free!

Among other things, this gives us the ability to look at past crashes; a GUI for that will be included in Plasma 5.25. Future plans also include the ability to file bug reports long after the fact.
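Since the crash metadata lives in the journal, listing past crashes does not strictly need a GUI. Here is a small sketch that summarizes coredump entries from `journalctl -o json` output. It is not how DrKonqi does it, the function name is invented for this example, and the sample entry in the usage comment is made up. It relies on the `COREDUMP_*` journal fields that systemd-coredump records, and on `journalctl` emitting one JSON object per line with field values as strings.

```python
import json


def summarize_coredumps(journal_json_lines):
    """Turn `journalctl -o json` lines into a list of crash summaries.

    Entries without a COREDUMP_PID field are not coredump records and are
    skipped. journalctl emits field values as strings, hence the int() calls.
    """
    crashes = []
    for line in journal_json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "COREDUMP_PID" not in entry:
            continue
        crashes.append({
            "exe": entry.get("COREDUMP_EXE", "?"),
            "pid": int(entry["COREDUMP_PID"]),
            "signal": int(entry.get("COREDUMP_SIGNAL", 0)),
        })
    return crashes


# Possible input (assumption: this message ID selects coredump entries):
#   journalctl -o json MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1
# Made-up sample entry:
#   {"COREDUMP_EXE": "/usr/bin/dolphin", "COREDUMP_PID": "4242", "COREDUMP_SIGNAL": "11"}
```

For everyday use `coredumpctl list` gives you much the same overview on the command line.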

Inner Workings

The way this works behind the scenes is somewhat complicated but should be easy enough to follow:

  • The application produces a fault
  • KCrash writes KCrash-specific metadata into a file on disk and doesn’t exit
  • The kernel issues a core dump via coredumpd
  • The systemd unit coredump@ starts
  • At the same time drkonqi-coredump-processor@ starts
  • The processor@ waits for coredump@ to finish its task of dumping the core
  • The processor@ starts drkonqi-coredump-launcher@ in user scope
  • launcher@ starts DrKonqi with the same arguments as though it had been started just-in-time
  • DrKonqi assembles all the data to produce a crash report
  • The user is greeted by a crash notification just like with just-in-time debugging
  • The entire crash reporting procedure is the same

Use It!

If you are using KDE neon unstable edition you have already been using coredumpd-based crash reporting for months! You haven’t even noticed, have you? 😉

If not, here’s your chance to join the after-the-fact club of cool kids.

KCRASH_DUMP_ONLY=1

Put the line above into your `/etc/environment` and make sure your distribution has enabled the relevant systemd units.

Zabbix IRC Notifications

Some months ago I rolled out the terrifyingly fancy monitoring platform Zabbix to monitor all Blue Systems servers conveniently. Ever since then I wanted IRC notifications but there didn’t seem to be anything compelling available, so I got quickly annoyed and moved on.

Eventually our very own Bhushan Shah poked me enough to figure out IRC notifications.

So, now we have zabbix-irc-pusher. It is an incredibly simple script that connects to IRC and sends messages to a channel. It does so without actually daemonizing, which some might argue makes the script simpler. It does however mean that the script will make numerous join/quit messages appear in the relevant IRC channels, so it is advisable to enable outside messages for that channel so the bot doesn’t actually need to join the channel.

Setting the notifications up is a bit meh though, so here’s how. This is talking about Zabbix 2.x, but all of this should largely be the same for the recently released Zabbix 3.x.

First things first. Zabbix has built-in script support that is meant to be a simple notification solution where a specific notification script is simply called with 3 arguments corresponding to an e-mail’s To, Subject and Body fields. These notification scripts need to be placed into a directory your zabbix-server uses for alert scripts. You can check the AlertScriptsPath variable in zabbix_server.conf to find or change the directory in question. By default it will be something like /usr/lib/zabbix/alertscripts/ so we are going to roll with this for now. The script in question needs to be in that directory and made executable. Once the script is working and in the correct directory, all the rest of the configuration happens in Zabbix itself.
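To give an idea of what such a script can look like, here is a rough sketch of an alert script that takes the three arguments and pushes them to IRC without joining the channel. This is not the actual zabbix-irc-pusher. The host, port, nick and all function names are placeholders invented for this example, and it assumes the “To” argument carries the channel name and that the channel accepts outside messages.

```python
#!/usr/bin/env python3
import socket
import sys

# Hypothetical defaults; adjust to your IRC network.
IRC_HOST = "irc.example.org"
IRC_PORT = 6667
NICK = "zabbix-bot"


def registration_lines(nick):
    """IRC commands that register the connection with the server."""
    return [f"NICK {nick}", f"USER {nick} 0 * :{nick}"]


def message_lines(channel, text):
    """One PRIVMSG per non-empty line of text, followed by QUIT."""
    lines = [f"PRIVMSG {channel} :{part}"
             for part in text.splitlines() if part.strip()]
    return lines + ["QUIT :done"]


def send(channel, text, host=IRC_HOST, port=IRC_PORT, nick=NICK):
    """Connect, wait for the welcome reply (001), send the text and quit."""
    with socket.create_connection((host, port), timeout=30) as sock:
        def put(line):
            sock.sendall((line + "\r\n").encode("utf-8"))

        for line in registration_lines(nick):
            put(line)
        reader = sock.makefile("r", encoding="utf-8", errors="replace")
        for raw in reader:
            parts = raw.split()
            if parts and parts[0] == "PING":
                put("PONG " + " ".join(parts[1:]))
            elif len(parts) > 1 and parts[1] == "001":
                break  # registration is complete
        for line in message_lines(channel, text):
            put(line)
        for _ in reader:
            pass  # drain until the server closes the connection


if __name__ == "__main__" and len(sys.argv) == 4:
    # Zabbix passes: To (here: the channel), Subject, Body
    to, subject, body = sys.argv[1:4]
    send(to, subject + "\n" + body)
```

Because the script connects, sends and quits on every notification, it never needs to stay running, which is the trade-off described above.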

In Administration→Media types create a new media type, set its type to Script and enter the file name of the script.

Next you need to use the script as the notification method for a specific user. Notifications will not be issued if your script is not actually used for notifications on any user!

Go to Administration→Users, pick any enabled user and go to the Media tab. Add a new media, select your IRC notification media, set an IRC channel to send notifications to and pick the notifications that should be sent. Don’t forget to actually update the user once you have added the media.

At this point we have the notification method set up, but not the content. To do that we’ll have to configure an action. In Configuration→Actions create a new action and define content.

We use the following:

Name: Report problems to IRC
Default subject: {TRIGGER.STATUS}: {TRIGGER.NAME} - http://m.neon.kde.org/zabbix/tr_events.php?triggerid={TRIGGER.ID}&eventid={EVENT.ID}
Default message:
Recovery subject: {TRIGGER.STATUS}: {TRIGGER.NAME} - http://m.neon.kde.org/zabbix/tr_events.php?triggerid={TRIGGER.ID}&eventid={EVENT.ID}
Recovery message:

You can define a bunch of conditions in which to notify.

Last but not least, you need to associate the action with the notification method we set up. In the operations tab add a new operation and associate with the user for which you set up the notification method. Don’t forget to actually hit add for the operation and also for the action to save both.

Once you are done you should have working IRC notifications. To check, simply cause an event (e.g. take an agent offline) and look at the event info under Monitoring→Events. Events fitting the action conditions should now have a message actions entry with information about the message delivery, and the notification should have arrived on IRC. That’s it!

Naturally, all this applies to any script based notification, so whether your script forwards the information to IRC, Telegram or perhaps even an issue tracking system doesn’t really matter as far as the Zabbix side is concerned.

Unfortunately, debugging script notifications is a bit of a tricky topic, so to make sure you don’t forget anything here’s a short list of things to do:

  1. Make sure Zabbix-Server has an alert scripts path set up
  2. Put script in alert scripts path
  3. Make script executable (chmod +x)
  4. In Zabbix add a media with type script and the relevant script’s file name
  5. Add a notification method to an enabled Zabbix user
  6. Add an action and associate it with the Zabbix user
  7. Check that new events have a message actions entry for the new action
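The first three items on that list are easy to check mechanically. Below is a small sketch of such a check. Both function names are invented for this example, and the fallback directory is simply the example path from this post, not necessarily your server’s real default.

```python
import os


def alert_scripts_path(conf_text, default="/usr/lib/zabbix/alertscripts"):
    """Find AlertScriptsPath in the text of zabbix_server.conf.

    Commented-out lines are ignored because their key starts with "#".
    The fallback is the example directory from this post.
    """
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "AlertScriptsPath":
            return value.strip()
    return default


def script_problems(directory, name):
    """Return a list of problems that would keep Zabbix from running the script."""
    path = os.path.join(directory, name)
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    if not os.access(path, os.X_OK):
        return [f"{path} is not executable (chmod +x)"]
    return []
```

Feed `alert_scripts_path()` the contents of your zabbix_server.conf and pass the result, together with the script’s file name, to `script_problems()`. An empty list means steps 1 to 3 look fine and the remaining items need checking in the Zabbix UI.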

 

Building a Jenkins Security Realm

Last week I spent a good while writing a new security realm for KDE’s Jenkins setups. The result of my tireless Java brewing is that the Jenkins installation of KDE neon now uses KDE’s Phabricator setup to authenticate users and manage permissions via OAuth.

We should hopefully see this roll out to the KDE CI Jenkins as well in the near future.

Since the documentation seems a bit scarce I am going to throw together some thoughts on how to implement OAuth security realms. For a primer on general plugin development I suggest having a look at the Jenkins Plugin Tutorial.

Jenkins security is split into two parts: the SecurityRealm, which controls authentication of users, and the AuthorizationStrategy, which controls the permissions of those users. These two are the plugin extension points for the respective functionality in Jenkins’ security.

The important thing to remember is that you can implement one without the other. For example the KDE OAuth plugin only implements a SecurityRealm as we currently have no need for our own AuthorizationStrategy. The Role Strategy plugin on the other hand implements only an AuthorizationStrategy.

To successfully implement a SecurityRealm you will need your realm class which is going to extend SecurityRealm and implement a UserDetailsService (this will actually only be used internally to, among other things, log in a user for API transactions). The SecurityRealm will use an AuthenticationToken to actually manage a session and a UserDetails instance to represent a user entity.

You can find some boilerplate code outlining a primitive realm we could use for OAuth2 in this git repository. Upon a login request it would go through a call sequence similar to this one:

  • getLoginUrl (redirects to commenceLogin)
  • doCommenceLogin (redirects to request URI on oauth host)
  • doFinishLogin (gets redirected to by oauth host once authorized; requests access token)
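The data that moves around in that sequence is plain OAuth2 authorization-code material, which is easier to see outside of Jenkins. The following is a language-agnostic sketch written in Python rather than plugin code. The endpoint, client ID and redirect URI are placeholders, and the two function names merely mirror the steps above.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint and client settings, for illustration only.
AUTHORIZE_URL = "https://oauth.example.org/authorize"
CLIENT_ID = "my-client-id"
REDIRECT_URI = "https://jenkins.example.org/securityRealm/finishLogin"


def commence_login(state=None):
    """Build the URL on the OAuth host that the browser is redirected to."""
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


def finish_login(callback_url, expected_state):
    """Validate the callback and build the form body of the token request."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch, possible CSRF")
    if "code" not in params:
        raise ValueError("no authorization code in callback")
    return {
        "grant_type": "authorization_code",
        "code": params["code"][0],
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
    }
```

In the realm itself the result of the token request is what ends up wrapped in the session token; the `state` round trip is there so that a callback cannot be forged by a third party.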

After doFinishLogin the user should be authenticated and logged in. As you will probably notice there is talk of MyAuthToken and MyUser. Sample code for those is also available in the git repository.

Neither of them is terribly complicated. For the most part they are simply plain old data objects representing a session and a user. It is probably worth mentioning that a GrantedAuthority is approximately equal to the concept of a group membership, so much so that if you add more GrantedAuthorityImpls Jenkins will handle them as groups listed on the user profile and for use in AuthorizationStrategies.


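// Wrap the OAuth access token in our token class (MyAuthToken is from the sample code).
MyAuthToken auth = new MyAuthToken(accessToken);
// Make the token the active authentication for this session.
SecurityContextHolder.getContext().setAuthentication(auth);
// Notify Jenkins' security listeners that the user has logged in.
SecurityListener.fireAuthenticated(auth.getUser());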

And that’s all you need for your SecurityRealm. For the most part your realm will simply create a token “somehow” and then set it as active on the SecurityContextHolder. Once that is done you have an authenticated session at hand.

For some more inspiration hop on over to my actual plugin’s git repository.