PVH

From BitFolk

Some notes about PVH mode virtualisation, which BitFolk began switching to in November 2020.

What?

PVH is a type of virtualisation available in Xen since v4.10. Xen offers three types:

  • Paravirtualised mode (PV). The guest kernel is heavily modified so that it works with all of its hardware virtualised. This was the first type of virtualisation Xen offered, many years ago, so that it could work even on CPUs with no hardware virtualisation features.
Prior to November 2020 this was the mode that all BitFolk VMs ran under.
  • Hardware virtualisation mode (HVM). Unmodified guest operating systems can be run, taking advantage of hardware virtualisation extensions in the host CPU and possibly emulating all other required hardware, typically using qemu. Paravirtualised device drivers can be used to improve performance of IO devices.
  • Paravirtual+Hardware mode (PVH). Guests use CPU hardware support for memory operations and privileged CPU instructions and are able to use native instructions for most other things, requiring only slight modification of the guest OS. Paravirtual IO drivers are used for performance.
As of November 2020 BitFolk changed its default mode for new guests to PVH and would like existing customers to switch their guests to PVH mode as soon as is practical.

Which Virtualisation Mode Is In Use?

You can tell which mode your guest is currently running under like so:

$ cat /sys/hypervisor/guest_type
PVH

If it says "PV" instead, your guest is running in PV mode. If there is no such file, your kernel is too old to support PVH anyway, so you must be running in PV mode.
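The same check can be wrapped in a small shell function that also covers the missing-file case. This is a minimal sketch: the function name guest_mode and its optional path argument (handy for testing) are hypothetical, not something BitFolk provides.

```shell
#!/bin/sh
# Report the Xen virtualisation mode of the running guest.
# If the sysfs file is absent, the kernel is too old for PVH,
# so the guest must be running in PV mode.
guest_mode() {
    f="${1:-/sys/hypervisor/guest_type}"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "PV"
    fi
}

guest_mode
```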

Why PVH?

  • PVH performs better than PV these days.
  • PVH is more secure than PV. Mitigations against the various CPU side channel attacks of the last few years work better in PVH or HVM than in PV mode.
  • PVH will soon be the only way to support 32-bit guests. Although it is hard to see why anyone should be running 32-bit guests in the year 2020, there is a significant legacy installation at BitFolk (some 40% of customer VMs).
The next stable release of Xen will drop support for 32-bit PV guests. Anyone who hasn't upgraded to 64-bit will need to run under either PVH or HVM, and BitFolk does not intend to support HVM.
The upstream Linux kernel removed support for 32-bit PV guests as of v5.9, so you can't actually run a mainline stable kernel as a 32-bit PV guest any more.

Why Not HVM?

Although it would be nice to be able to support unmodified guests (allowing, for example, FreeBSD, Haiku or Plan 9 guests), BitFolk does not intend to support HVM in the near future.

There is significant extra complexity in running HVM. There have been a number of security vulnerabilities which only affect Xen running HVM guests. It also involves running a qemu process for each guest; there are more lines of code in qemu than in all of the rest of Xen. It is unclear what extra security burden that would involve.

It is possible that certain servers could be dedicated to only running HVM guests so that security issues would be somewhat partitioned and maintenance reboots for security issues would only affect these customers.

This is something that BitFolk will look into after PVH has been deployed.

Switching Between PV And PVH Mode

New accounts will be set to use PVH mode where it is known to work, but existing accounts will need to be switched by the customer.

To enable switching, BitFolk has added the virtmode command to the Xen Shell as of version 1.48bitfolk60. This command simply toggles between PV and PVH mode; the change will take effect from the next boot of the VPS. Should there be problems the customer can use the virtmode command again to switch back.

Don't forget that the Xen Shell can remain running after you disconnect, so you could be connecting to an older version. You can see the version by using the help command; if it's too old, exit from every window until you disconnect, and then connect again to start the latest version.

When using the install command of the Xen Shell to boot an operating system installer, the Xen Shell will refuse to proceed if you are in PVH mode and are trying to install something that is known not to work.

If in PV mode and issuing a command to install an operating system that is known to support PVH, the Xen Shell makes the suggestion to switch to PVH but will not force you to do so.

The rest of this article describes the situation at the time of the transition, testing that was done, etc.

How It Works As Of November 2020

Since some time in 2017, BitFolk has booted guests using pvgrub2. This is the Grub bootloader compiled as a Xen PV kernel instead of a bootloader image.

The idea is that this Grub searches the customer's block devices for a grub.cfg file, in the context of the customer's guest, and then if it finds one it obeys that configuration file so as to boot how the customer's operating system intended.

The pvgrub2 bootloader needs to be the same architecture as the customer's eventual chosen kernel (32-bit/i386/i686 versus 64-bit/amd64/x86_64). So, there is the arch command in the Xen Shell for customers to toggle between the two different bootloaders. This is a source of some confusion from time to time.

All being well the customer has a Grub2 experience they are familiar with, they select a working kernel and initramfs and boot completes without issue.

What Needs To Change

In theory, Xen since v4.10 has fully supported PVH mode guests and the Linux kernel since v4.11 has fully supported being run as a PVH mode guest. In reality the situation is a lot more complicated.

From the Xen guest configuration point of view, it simply needs lines like…

type   = "pv"
kernel = "/opt/grub/lib/grub-x86_64-xen.bin"

…changing to…

type   = "pvh"
kernel = "/opt/grub/lib/pvhgrub.bin"

That tells Xen to boot an instance of Grub2 that is compiled as a PVH guest (henceforth referred to as pvhgrub), which should then work the same as in the pvgrub2 case. Unfortunately this may result in a guest not booting correctly.
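On the host side, that change amounts to editing two lines of the guest's xl configuration file. The following is a minimal sketch using sed, assuming the file contains exactly the type and kernel lines shown; the function name pv_to_pvh is hypothetical, and it only rewrites the 64-bit pvgrub2 kernel line given in the text. For customers, the Xen Shell's virtmode command remains the supported way to switch.

```shell
#!/bin/sh
# Rewrite a PV guest config to boot pvhgrub in PVH mode (sketch).
# Matches only the exact type/kernel lines from the example; the
# '"pv"' pattern does not match '"pvh"', so re-running is harmless.
pv_to_pvh() {
    sed -i \
        -e 's|^type *= *"pv"|type   = "pvh"|' \
        -e 's|^kernel *= *"/opt/grub/lib/grub-x86_64-xen.bin"|kernel = "/opt/grub/lib/pvhgrub.bin"|' \
        "$1"
}
```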

The problem is that the hypervisor, the bootloader (pvhgrub), and the guest kernel (Linux) must all do things correctly.

Although PVH mode has theoretically been supported since Linux kernel 4.11, this has only worked for direct kernel booting, i.e. where the guest kernel exists as a file in the host server and is loaded directly by the hypervisor. That is not how BitFolk does things and to do things this way would be a backwards step, so much newer guest kernels than 4.11 are going to be required.

As a consequence there is going to be a significant period of time where legacy guests — so old they won't ever be receiving packaged kernel upgrades — will need to continue running under PV mode.

This also means that there will sadly come a time when:

  1. Xen does not support 32-bit PV mode any more;
  2. There are still 32-bit customers at BitFolk; and
  3. Those guest kernels will be too old to work in PVH mode.

At that time, the only option for those customers will be to find some way to get a newer kernel installed on their old VM. It is usually reasonably easy to boot an old userland with a newer kernel.

Possibly there can be some consultancy offered to achieve this, but obviously BitFolk strongly suggests that no customer allow themselves to end up in this position in the first place.

The good news is that, when you do have a new enough hypervisor and guest kernel, after making the configuration change everything continues to work the same without needing anything changing in the guest. The guest VM just works better. Despite the avalanche of text in this document, it looks like a reasonably recent Linux distribution release should work in PVH mode without its owner even noticing anything has changed.

What About Installation?

The way that BitFolk customers currently install their guests is to download the netboot installer image that their Linux distribution provides and then boot it in PV mode. The install takes place and then the customer boots into their real operating system.

Most Linux distributions provide a kernel and initramfs that are intended to be booted over the network, start some installer process and then complete the installation over the network downloading packages from the distribution's mirrors as necessary.

On Debian and Ubuntu this software is the debian-installer, optionally using preseeding. On CentOS and Red Hat-derived distributions this is Anaconda, optionally using kickstart.

As these variant kernels may behave differently from the packaged kernels of the distribution, we'll need to test this as well. It wouldn't be the end of the world if we had to just do the install phase in PV mode, though it would mean that all new installs would have to be 64-bit.

What About the Rescue VM?

As the Rescue VM is a Linux distribution, kernel and ramdisk controlled by BitFolk, it can be ensured that it works in PVH mode. In fact this can be one of the first things changed.

Experimentation

In order to find out the scope of the problems we may be dealing with we'll have to do some experimentation.

Debian

Release | Architecture | Kernel version | Installer works? | pvhgrub boot works? | Notes
9.x (stretch/oldstable) | amd64 | 4.9.0-14 | No | No | debian-installer kernel doesn't have PVH support. Can install in PV mode and the distribution kernel then boots, but fails to find any block devices.
9.x (stretch/oldstable) | amd64 | 4.19.0-0 (stretch-backports) | No | Yes | debian-installer kernel doesn't have PVH support. Needs backports kernel.
10.x (buster/stable) | amd64 | 4.19.152-1 | Yes | Yes | Uneventful
10.x (buster/stable) | i386 | 4.19.152-1 | Yes | Yes | Uneventful
11.x (bullseye/testing) | amd64 | 5.9.0-1 | Yes | Yes | Uneventful

9.x (stretch/oldstable) failed to find block devices

Installer kernel doesn't appear to support PVH. Can do install in PV mode and then boot in PVH mode, but distribution kernel fails to find any block devices:

Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
Begin: Running /scripts/local-block ... done.
done.
Gave up waiting for suspend/resume device
done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... done.
done.
Gave up waiting for root file system device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/xvda1 does not exist.  Dropping to a shell!


BusyBox v1.22.1 (Debian 1:1.22.0-19+b3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

9.x (stretch/oldstable) with backported kernel

The 4.9.x kernel that comes with stretch is too old to support PVH, but there is a 4.19.x kernel in stretch-backports that will work.

From an existing machine just follow the normal instructions for installing a package from backports.

For a new install the installer will run in PV mode. At the very first question select "Go back", then from the main menu select "Change debconf priority" and set it to "low". The installer will then ask many more questions, enabling you to add the backports repository during installation.

At the end of the install process don't let it reboot. Instead select "Execute a shell" and then do the following to install the kernel from backports:

~ # mount -t proc proc /target/proc
~ # mount --rbind /dev /target/dev
~ # mount --rbind /sys /target/sys
~ # chroot /target /bin/bash
root@debtest1:/# apt -t stretch-backports install linux-image-amd64
root@debtest1:/# exit
~ # exit

You can now let the install shut down, and then when you boot again it will use a 4.19.x kernel that supports PVH mode.

Ubuntu

Release | Architecture | Kernel version | Installer works? | pvhgrub boot works? | Notes
18.04.x (Bionic Beaver) | amd64 | 4.15.0-112 | Yes | No | Kernel hangs just over 1 second into boot.
20.04.x (Focal Fossa) | amd64 | 5.4.0-52 | Yes | Yes | BitFolk's decompressed kernel hack needs to be disabled.

18.04.x (Bionic Beaver) boot failure

Installation proceeds well, and initial boot looks like it is working, but then very shortly into boot things grind to a halt:

.
.
.
[    1.272097] AppArmor: AppArmor sha1 policy hashing enabled
[    1.272107] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    1.272121] ima: Allocated hash algorithm: sha1
[    1.272142] evm: HMAC attrs: 0x1
[    1.272164]   Magic number: 1:252:3141
[    1.272198] hctosys: unable to open rtc device (rtc0)
[    1.272235] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    1.272246] EDD information not available.
[    1.273345] Freeing unused kernel image memory: 2436K

Possibly there is an HWE kernel packaged that would work. It's likely that anything 4.19.x or later will work.
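Going by the results in these tables, a rough pre-flight check is whether the running kernel's base version is at least 4.19. This sketch is a heuristic, not a guarantee: the 4.19 threshold comes from the testing here, the function names are hypothetical, and it relies on version sorting (sort -V) as provided by GNU coreutils.

```shell
#!/bin/sh
# True if version $1 >= version $2, using version-aware sorting.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Heuristic: kernels 4.19+ booted under pvhgrub in testing; 4.9 and 4.15 did not.
likely_pvh_bootable() {
    kver="${1:-$(uname -r)}"
    base="${kver%%-*}"   # drop suffixes such as "-112-generic"
    version_ge "$base" "4.19"
}

if likely_pvh_bootable; then
    echo "kernel $(uname -r) is probably new enough for PVH"
else
    echo "kernel $(uname -r) is probably too old for PVH"
fi
```

Version alone is not sufficient: a kernel built without the CONFIG_XEN_PVH option (such as NixOS's default kernel) won't boot in PVH mode regardless. On distributions that ship /boot/config-* files, `grep CONFIG_XEN_PVH /boot/config-$(uname -r)` shows whether it was enabled.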

20.04.x (Focal Fossa) decompressed kernel hack needs to be disabled

20.04 uses LZ4-compressed kernel images. These are not supported by pvgrub2, so at the moment BitFolk's installer adds a script hack to decompress the kernel image at kernel install time.

It appears that pvhgrub, on the other hand, doesn't support decompressed kernels!

Not using BitFolk's decompressed kernel hack (so booting a normal LZ4-compressed kernel image) works fine with pvhgrub. BitFolk's installer has been modified to detect when it is running in PVH mode and not install the hack, so new installs will be okay. Existing customers need to make sure they have a normal compressed kernel, which leaves unfortunate scope for confusion.
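To tell which kind of kernel image an existing install has, looking at a few bytes of the file is enough. This sketch assumes BitFolk's hack leaves a raw ELF vmlinux at the /boot/vmlinuz-* path, while a normal image is an x86 bzImage, which carries the "HdrS" boot-protocol signature at offset 0x202. The function name kernel_image_kind is hypothetical, and reading /boot/vmlinuz-* may need root on Ubuntu.

```shell
#!/bin/sh
# Classify a kernel image by its magic bytes.
kernel_image_kind() {
    # First four bytes as hex: an ELF file starts 7f 45 4c 46 ("\177ELF").
    elf=$(od -An -tx1 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    # x86 boot protocol: a bzImage has "HdrS" (48 64 72 53) at offset 0x202.
    hdr=$(od -An -tx1 -j 514 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$elf" = "7f454c46" ]; then
        echo "decompressed"   # assumed to be left behind by the hack
    elif [ "$hdr" = "48647253" ]; then
        echo "bzImage"        # normal compressed kernel image
    else
        echo "unknown"
    fi
}

for img in /boot/vmlinuz-*; do
    [ -e "$img" ] || continue
    printf '%s: %s\n' "$img" "$(kernel_image_kind "$img")"
done
```

Any image reported as "decompressed" is one that pvhgrub is expected to fail on; reinstalling the distribution's kernel package without the hack in place should restore a normal bzImage.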

CentOS

CentOS before version 8 comes with a kernel that is too old to support PVH mode, and Red Hat are not going to backport the support.

CentOS 8's kernel has had all Xen support intentionally disabled by Red Hat, so their installer cannot be booted. People using CentOS under PV mode right now have to install it from the Rescue VM and then install the kernel-ml package from ELRepo. The kernel-ml kernel also works under PVH mode.

Fedora

Although Fedora kernels support both PV and PVH mode, as of kernel-core-5.9.8-100 Fedora switched to zstd for kernel compression, and the PV boot loader does not support this.

The easiest way forward is to switch to PVH mode which does work with these kernels (and likely works with any kernel after 4.19).

Switching to PVH mode is done from the Xen Shell, as described earlier:

xen shell> shutdown
xen shell> virtmode pvh
xen shell> boot

In addition, to see the console output you will need to add console=hvc0 to the kernel command line in /etc/default/grub, with a line like:

GRUB_CMDLINE_LINUX="crashkernel=auto rd.auto rhgb consoleblank=0 audit=0 console=hvc0"

Then:

# grub2-mkconfig -o /boot/grub2/grub.cfg
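The /etc/default/grub edit can also be scripted so that it is safe to run more than once. A minimal sketch, assuming a sed that supports -i (as GNU sed does) and a GRUB_CMDLINE_LINUX value in double quotes; add_hvc0_console is a hypothetical name. Run it as root, then regenerate grub.cfg with grub2-mkconfig as shown.

```shell
#!/bin/sh
# Append console=hvc0 to GRUB_CMDLINE_LINUX, doing nothing if already present.
add_hvc0_console() {
    f="${1:-/etc/default/grub}"
    if grep -q '^GRUB_CMDLINE_LINUX=.*console=hvc0' "$f"; then
        return 0
    fi
    # Insert " console=hvc0" just before the closing double quote.
    sed -i 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 console=hvc0"/' "$f"
}
```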

If for some reason you did not want to switch to PVH mode you would need to find some way to decompress your kernel.

NixOS

The standard/default kernel doesn't have PVH support enabled, but the kernel meant for Xen dom0s can be used instead. Add the following to your NixOS configuration:

boot = {
  kernelPackages = pkgs.linuxPackages_xen_dom0;
  kernelParams = [ "console=hvc0" ];
};

Deadlines

The hard deadlines are all about 32-bit guest support at the moment.

Conceivably there could end up being security issues that make PV guests infeasible, but at the time of writing the reduced performance and increased security risk fall entirely on the customer, so whether to switch can be left to customer choice.

Xen 32-bit PV support

The current stable version of Xen is 4.14, and this will be the last version to support 32-bit PV guests. It has security support until 2023-07-24. After this date the only way to run a 32-bit guest will be in PVH mode.

Linux 32-bit PV guest support

Already removed as of 5.9.0. If you don't want to switch to PVH mode you'll need to run a kernel older than 5.9.

This is not likely to be an issue, since most customers wishing to keep running 32-bit guests are on ancient, out-of-support Linux distributions that get no kernel upgrades anyway. You would only be hit by this if you wanted to do a new 32-bit install of what is currently Debian testing, for example, or if you installed a new custom mainline kernel on your old install.