Yesterday evening at 20:45 (UTC+1) I decided to do a good deed and help Mamarok help a Kubuntu user get his system fixed.
The symptoms sounded quite familiar (apparently there have been quite a few cases of this): login doesn't work, and you get thrown back to KDM.
At approximately 21:00 I realized that simply helping to help would not suffice in this particular case.
At 22:36 I gained SSH access to the affected machine.
At 00:43 I found the source of the issue.
At 01:17 I had all the data for a reasonably usable bug report.
I certainly hope you are now wondering what problem I spent around four hours on. Well, an X crash, though you probably guessed that much already from the title of this post 😉
More interesting is how we got there. (I collected the most important data in the bug report on Launchpad.)
Apparently some recent update triggered the issue. Naturally I inspected dpkg.log, but since there had been one rather large update recently, that did not give me much of a clue. xsession-errors was also rather unspectacular, well, other than the X connection problems (I suspect those were caused by X going down).
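When a big update lands, dpkg.log is hard to eyeball. A minimal Python sketch for narrowing it down to the upgrades after some cutoff date follows; it assumes the usual dpkg.log line layout (`date time upgrade package old-version new-version`), and `recent_upgrades` is a hypothetical helper, not part of dpkg.

```python
from datetime import datetime


def recent_upgrades(log_lines, since):
    """Return (timestamp, package, old_version, new_version) for every
    'upgrade' entry at or after `since`.

    Assumed line layout (typical dpkg.log on Debian/Ubuntu):
    'YYYY-MM-DD HH:MM:SS upgrade <package> <old-version> <new-version>'
    """
    upgrades = []
    for line in log_lines:
        fields = line.split()
        # Skip 'status', 'configure', 'startup' etc. and malformed lines.
        if len(fields) != 6 or fields[2] != "upgrade":
            continue
        stamp = datetime.strptime(fields[0] + " " + fields[1], "%Y-%m-%d %H:%M:%S")
        if stamp >= since:
            upgrades.append((stamp, fields[3], fields[4], fields[5]))
    return upgrades
```

Passing an open `/var/log/dpkg.log` file and a `datetime` cutoff just before the problems started yields the candidate packages; with one huge update, the list stays long, which is exactly why it was not much of a clue here.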
So while I was digging around for a sensible way to debug this properly, the help-seeker made a very weird discovery: apparently, changing the locale from he_IL to en_US and creating a new user made login work for the new user (?). That seemed rather unrelated, but the fact that it worked for a new user led me to suspect a problem with some config in $HOME, so we spent quite some time looking for possible causes, though we didn't find anything. By then I had SSH and VNC access, so I poked a bit more at the $HOME stuff, switched back to he_IL and, guess what, login for that new user did not work. (I did not try with en_US, so I can't even tell whether it really worked before, but the user claimed it worked twice (?).) So while the issue got weirder and weirder, I at least discovered that logging in to an xterm session worked without any problem. Hah!
The natural assumption at this point: if an xterm session works but KDE does not, something KDE-related must be at fault. So from the xterm session I invoked the usual suspects (kwin, plasma-desktop…), but nothing went wrong; indeed the system seemed to be working just fine, and quite fast too. After some poking and prodding I finally arrived at the ultimate answer, and it is not 42! If KDE appears to be causing the problem, but all KDE apps start just fine on their own, the problem must be in startkde. That of course leads to another question: how do you debug startkde?
Since the issue appeared immediately (i.e. once you hit Enter in KDM, X would crash: no splash, no nothing), the only option I could think of was adding sleeps to startkde. Obviously something within startkde triggered the crash, so I just worked my way down the code: add sleep 60 -> try login -> it sleeps -> move the sleep 60 a few lines down -> try login… until I went past one particular line…
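Each round of that procedure boils down to patching a copy of startkde with a `sleep 60` after a chosen line. A minimal Python sketch of that step, assuming you work on a copy of the script and only pick top-level lines (the helper does not parse shell, so a sleep dropped inside a heredoc would just become text); `insert_sleep` is a hypothetical name:

```python
def insert_sleep(script_text, after_line, seconds=60):
    """Return a copy of a shell script with 'sleep <seconds>' inserted
    after the 1-based line `after_line`.

    Hypothetical helper: it treats the script as plain lines and does not
    understand shell syntax, so choose a top-level line.
    """
    lines = script_text.splitlines()
    # Line 1 is the shebang; inserting before it would break the script.
    if not 1 <= after_line <= len(lines):
        raise ValueError("after_line must be between 1 and %d" % len(lines))
    lines.insert(after_line, "sleep %d" % seconds)
    return "\n".join(lines) + "\n"
```

If the login hangs for the full sleep before X dies, the culprit is below the inserted line; if X dies at once, it is at or above it. Walking the sleep down a few lines at a time works; halving the remaining range instead (a bisection) would need fewer login attempts on a long script.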
xsetroot -cursor_name left_ptr
…a.k.a. the daemon command of startkde 😉
So I went back to the xterm session and ran that command. But nothing happened. Nothing at all. After some more prodding I gave up and accepted that the binary alone is not responsible. So I commented out that line, removed the sleep, and, surprise surprise, KDE login worked!
Conclusion: xsetroot alone is not dangerous, but in combination with something else in startkde it triggers the X crash.
Finally it was time to get a backtrace, which is actually pretty easy: just follow the Ubuntu X Backtracing guide. Well, almost. The first try did not produce a proper backtrace from gdb; apparently one needs to run X with -dumbSched, which is pretty easy to achieve with KDM:
just add the option to ServerArgsLocal= and restart KDM.
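Editing kdmrc by hand is the straightforward way to do this. For illustration, a small Python sketch that appends `-dumbSched` to every `ServerArgsLocal=` line of a kdmrc and leaves it alone if the option is already there. The helper name is hypothetical; the file location (often `/etc/kde4/kdm/kdmrc` on Kubuntu releases of that time) and the `[X-:*-Core]` section used in the example are assumptions, so check your own system. It also does not add a `ServerArgsLocal=` line if none exists.

```python
def add_server_arg(kdmrc_text, arg="-dumbSched"):
    """Append `arg` to every 'ServerArgsLocal=' line of a kdmrc text.

    Hypothetical helper: lines that already contain `arg` are left as-is,
    and no new ServerArgsLocal= line is created if the key is missing.
    """
    out = []
    for line in kdmrc_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ServerArgsLocal" and arg not in value.split():
            # Keep existing arguments and add the new one at the end.
            line = key + "=" + (value.strip() + " " + arg).strip()
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, an assumed `[X-:*-Core]` section containing `ServerArgsLocal=-br -nolisten tcp` would become `ServerArgsLocal=-br -nolisten tcp -dumbSched`; write the result back and restart KDM.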
On that topic: REINSTALLING IS NO FIX – GO FIND APACHELOGGER
Also, I think Kubuntu needs a new slogan “Kubuntu – The only OS with Developers coming home to you to fix your system” 😉