Keeping your system up to date is recommended, but there is no guarantee that an update will always go smoothly. So I think you should have a contingency plan for when an update goes wrong.
A little background on myself: I am a software developer and use Arch Linux (😍) as my daily driver. I update my system every Saturday so that I have time to fix things if something goes wrong. I have been using Arch Linux for the past two years and never had any major issues until today.
It was a normal day. I woke up and ran
pacman -Syu then rebooted. To my great surprise the system was stuck at boot. It just froze, with no login prompt and no errors. I was both happy and sad. I was happy that the problem hit on a Saturday (best decision of my life to update every Saturday). At the same time I was quite sad and frustrated, as I had other things to do and was planning to watch a movie 😂.
Until this day I had never had this type of issue, so I never had a contingency plan. In this story I am going to talk about how I figured out the issue and how I solved it.
What is the issue?
I started thinking about all the things that might go wrong after an upgrade, and walked through a typical boot process step by step.
- The OEM logo showed up (UEFI is OK)
- There are some logs about systemd and a message “Welcome to Arch Linux” (Initrd is OK)
- Then, after starting some services, the system just froze (maybe a service or kernel issue)
I suspected the kernel rather than a service, because the system didn’t respond to keyboard events. So what do you do in such a situation, and how do you reboot your system without long-pressing the power button? Glad you asked. I recommend that everyone enable
sysrq; more information can be found here. Basically,
sysrq triggers kernel-level shortcuts, which means you can pass commands directly to the kernel even when the system is deadlocked or stuck.
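A minimal sketch of turning SysRq on permanently, assuming a system that reads sysctl drop-in files from /etc/sysctl.d. The helper name enable_sysrq and the file name 99-sysrq.conf are my own choices for illustration; the value 1 enables all SysRq functions.

```shell
#!/usr/bin/env bash
# enable_sysrq [DIR]: write a sysctl drop-in that enables all SysRq functions.
# DIR defaults to /etc/sysctl.d; the file name 99-sysrq.conf is an arbitrary choice.
enable_sysrq() {
    local conf_dir="${1:-/etc/sysctl.d}"
    printf 'kernel.sysrq = 1\n' > "$conf_dir/99-sysrq.conf"
}
```

Run enable_sysrq as root, then sysctl --system to apply the setting without rebooting. Once it is active, holding Alt+SysRq and pressing b reboots immediately without syncing disks; pressing r, e, i, s, u, b in sequence is the gentler variant, since it syncs and remounts the filesystems read-only first.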
As a software developer I know how valuable logs are. Since I had concluded that the issue was probably in the kernel, it was time to get more information about what the kernel was doing. This can be achieved by passing parameters to the kernel. How you pass parameters to the kernel depends on the boot loader you use. I personally use systemd-boot as it is simple and easy to use. At its boot menu you can press
e to edit the kernel parameters, once you have enabled the
editor option described here. If you use another boot loader such as GRUB, you can look here to find more information on how to pass kernel parameters. So I passed
ignore_loglevel to the kernel and OMG those logs 😵. Here are the logs when the kernel was stuck.
Now the fun part starts. We know that the system is stuck at
fb0: switching to inteldrmfb from EFI VGA. After some googling I found that the issue is related to the
i915 module, which is the Intel DRM driver. I also passed
nomodeset to the kernel to disable
KMS. With that the system booted to a console, but
xorg failed to start (no GUI).
So let’s come to a solution. One option is to downgrade the system. If you have many packages installed, downgrading each one by hand is a very difficult task and might leave you with inconsistent package versions. Until now I didn’t know that you can downgrade the entire system to a specific date (😍). So let’s get down to it. First disable all the mirrors in your
/etc/pacman.d/mirrorlist; you can just add
# before a mirror to disable it. Then add a new mirror:
Server = https://archive.archlinux.org/repos/2020/06/27/$repo/os/$arch
As you might have guessed, the date is part of the mirror URL. You can now run
pacman -Syyuu to force a downgrade of the system to that date. You can learn more about it here. And there you go, all your packages are now downgraded to the versions from that date.
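The mirrorlist edit above can be sketched as a small shell function. This is only an illustration under a few assumptions: the function name pin_mirrorlist is hypothetical, the date is passed in YYYY/MM/DD form, and sed -i is the GNU variant that ships with Arch.

```shell
#!/usr/bin/env bash
# pin_mirrorlist FILE DATE: point a pacman mirrorlist at an Arch Linux Archive
# snapshot. DATE uses the YYYY/MM/DD form, e.g. 2020/06/27.
pin_mirrorlist() {
    local file="$1" date="$2"
    # Keep a backup so the original mirrors can be restored later.
    cp -- "$file" "$file.bak"
    # Comment out every active Server line.
    sed -i 's/^[[:space:]]*Server[[:space:]]*=/#&/' "$file"
    # Append the snapshot mirror; single quotes keep $repo and $arch literal.
    printf 'Server = https://archive.archlinux.org/repos/%s/$repo/os/$arch\n' "$date" >> "$file"
}
```

As root, pin_mirrorlist /etc/pacman.d/mirrorlist 2020/06/27 followed by pacman -Syyuu would perform the downgrade described above. Copying the .bak file back restores the regular mirrors once the bug is fixed.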
This is just a temporary fix, and it prevents you from getting the latest packages. You can try updating individual packages, but this is not recommended by the Arch community, because partial upgrades are unsupported.
Until this day I didn’t know you can have an alternative kernel (😍). It feels great when you learn new things. This is the approach I recommend, as it lets you have both the latest packages and multiple kernels. Go ahead and install both the LTS (Long Term Support) kernel and the latest kernel from the Arch repositories. Now create a new entry in the
bootloader configuration that points to the LTS kernel. Consult your
bootloader documentation for how to add a new kernel entry. As I already mentioned, I am using systemd-boot; here is a config for the LTS kernel.
title Arch Linux LTS
linux /vmlinuz-linux-lts
initrd /initramfs-linux-lts.img
options root=PARTUUID=24a435b4-f6c6-e842-8dac-63edb7343de1 rw rootfstype=ext4
Now when you boot your system you should see a new entry (“Arch Linux LTS” in my case). Whenever the latest kernel gives you trouble, you can boot into the LTS kernel and continue your work. I am using the LTS kernel for now, until the
i915 bug is fixed.
That’s it for this story. If you have any issues or suggestions, let me know in the comments below.
So the issue is the i915 module, and it is specifically caused by the way Arch builds its kernel. Here is the link to my post in the Arch Linux forum about the issue. Long story short, just pass
intel_iommu=on as a parameter to the kernel and everything is good to go.
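To make a parameter like this permanent with systemd-boot, it has to end up on the options line of the boot entry. Below is a rough sketch of doing that from the shell; the function name add_kernel_param is made up for this example, and the entry path in the usage note is an assumption that depends on where your ESP is mounted.

```shell
#!/usr/bin/env bash
# add_kernel_param ENTRY PARAM: append PARAM to the "options" line of a
# systemd-boot entry file, unless that line already contains it.
# A file without an options line is left unchanged.
add_kernel_param() {
    local entry="$1" param="$2" tmp
    # Nothing to do if the parameter is already on the options line.
    if grep '^options[[:space:]]' "$entry" | grep -qwF -- "$param"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Trim trailing whitespace from the options line, then append the parameter.
    awk -v p="$param" '
        /^options[ \t]/ { sub(/[ \t]+$/, ""); $0 = $0 " " p }
        { print }
    ' "$entry" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Overwrite in place so the entry keeps its ownership and permissions.
    cat "$tmp" > "$entry"
    rm -f "$tmp"
}
```

For example, add_kernel_param /boot/loader/entries/arch.conf intel_iommu=on, run as root, would add the parameter to an entry at that (assumed) path. Running it a second time changes nothing, so it is safe to repeat.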