BlogIT » Migrate Paravirtualized Xen to KVM under RHEL
Update July 11, 2009: Re-registering VMs at RHN uses an extra entitlement with RHEL5.4 beta
Update July 15, 2009: Swap usage, clock and disk cache of the virtual machine
Update July 16, 2009: Replace virsh create with virsh define & start to create a managed domain and not a transient one
Update September 2, 2009: Re-registering with RHN works
Update September 2, 2009: RHEL5.4 has been released. Added a note about services on the physical host
Update September 6, 2009: Updating TimeKeeping and Hugepages
Update April 1, 2010: Hugepages configuration for RHEL 5.5
Update July 3, 2010: Make Hugepages mountpoint persistent
RedHat Enterprise Linux version 5.4 is out. It heralds the arrival of KVM as RedHat's official hypervisor. RedHat will be supporting Xen for the rest of the RHEL5 life cycle, so for the moment, there is no need to migrate to KVM.
However, migrating to KVM has some advantages. For one, KVM looks simpler from the outside. For another, it works with a normal kernel, which means that all drivers that work with a normal kernel work as well. This covers not only display drivers but also CPU frequency scaling (dynamically adapting the speed of the CPU). That is not only very "green", it also makes a difference to your wallet or your company's.
RedHat put a lot of work into making Xen easier to manage in RHEL5.0-5.3. As a result, Xen uses a single disk image from which it can boot, and the format of this image is the same as for KVM. So one would expect migrating from one hypervisor to the other to be easy, and it is. This post describes a step-by-step scenario for doing it.
The starting situation is a RHEL5.3 physical host with RHEL5.3 paravirtualized guests. The guests have two network interfaces: one bridged to the physical network interface, and one bridged to a dummy network interface for an internal host network. Note that virtio requires at least RHEL5.3.
Note:
I had some trouble with SELinux in the RHEL 5.4 beta, related to the SELinux attributes on /var/lib/libvirt. I do not use this directory to store the images, but use raw LVM volumes instead. To get my system running again, I simply disabled SELinux.
Configure the virtio drivers
Inside the guest, open /etc/modprobe.conf in an editor. In our case /etc/modprobe.conf contains the following lines:
alias eth0 xennet
alias eth1 xennet
alias scsi_hostadapter xenblk
change it to
alias eth0 virtio_net
alias eth1 virtio_net
alias scsi_hostadapter virtio_blk
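If you have several guests to convert, the same edit can be scripted. The sketch below is one way to do it with sed; the function name convert_modprobe_conf and the backup name modprobe.conf.xen are my own, hypothetical choices and not part of any RHEL tool. It only rewrites alias lines that point at xennet or xenblk and leaves every other line alone.

```shell
# Hypothetical helper: switch the Xen PV driver aliases in modprobe.conf to virtio.
# $1: path to modprobe.conf (normally /etc/modprobe.conf)
convert_modprobe_conf() {
    local conf=$1
    # Keep the original (Xen) version next to it, just in case.
    cp -p "$conf" "$conf.xen" || return 1
    # alias ethN xennet                -> alias ethN virtio_net
    # alias scsi_hostadapter[N] xenblk -> alias scsi_hostadapter[N] virtio_blk
    sed -i \
        -e 's/^\(alias[[:space:]]\{1,\}eth[0-9]\{1,\}[[:space:]]\{1,\}\)xennet[[:space:]]*$/\1virtio_net/' \
        -e 's/^\(alias[[:space:]]\{1,\}scsi_hostadapter[0-9]*[[:space:]]\{1,\}\)xenblk[[:space:]]*$/\1virtio_blk/' \
        "$conf"
}

# Example (as root, inside the guest):
# convert_modprobe_conf /etc/modprobe.conf
```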
Now add the virtio drivers to the initial ramdisk of the kernel (modify this line to match the latest kernel version):
mkinitrd -f --with=virtio_blk --with=virtio_pci --builtin=xenblk /boot/initrd-2.6.18-128.1.16.el5.img 2.6.18-128.1.16.el5
The --builtin option is only necessary when the guest is currently running a Xen kernel in paravirtualized mode.
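The only parts of that command that change are the kernel version and whether --builtin=xenblk is needed, which depends on the kernel the guest is running right now. A minimal sketch that assembles the command and prints it, so you can check it before running it; the function name is hypothetical, and it assumes the initrd lives in /boot as on a standard RHEL install.

```shell
# Hypothetical helper: print the mkinitrd command for a given non-Xen kernel.
# $1: kernel version to build the initrd for, e.g. 2.6.18-128.1.16.el5
# $2: release of the kernel running now (defaults to `uname -r`)
virtio_initrd_cmd() {
    local target=$1
    local running=${2:-$(uname -r)}
    local builtin=""
    # A paravirtualized guest runs a kernel whose release ends in "xen";
    # only in that case is --builtin=xenblk added.
    case "$running" in
        *xen) builtin=" --builtin=xenblk" ;;
    esac
    echo "mkinitrd -f --with=virtio_blk --with=virtio_pci$builtin /boot/initrd-$target.img $target"
}

# Example: inspect the command first, then run it as root.
# virtio_initrd_cmd 2.6.18-128.1.16.el5
# eval "$(virtio_initrd_cmd 2.6.18-128.1.16.el5)"
```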
Internal clock
The internal clock of KVM is less stable than the clock under Xen. Heavy loads have been known to cause clock drift. There are two workarounds:
Now shut down the virtual system (shutdown -h now)
Updating the host
The physical host needs some updating as well. Before you start, make sure all virtual systems are stopped (xm list) and that you are logged on as root. If RHEL5.4 has already been released, yum will update the system to this version automatically. If not, the system needs to be subscribed to the RHEL5.4 beta channel, which you can do on Red Hat Network if the system is registered with RHN. Also make sure the system has access to the Virtual Platform beta channel. Besides the updates, some new packages need to be installed, and all Xen virtualization services must be disabled at boot time until the configuration work is done.
yum clean all #for safety
yum update
yum install kernel kvm kvm-tools kmod-kvm kvm-qemu-img bridge-utils
chkconfig --level 2345 xend off
chkconfig --level 2345 xendomains off
chkconfig --level 2345 rhn-virtualization-host off
Edit /boot/grub/menu.lst and set the default boot kernel to the newest non-xen kernel (see example grub config)
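GRUB's default= setting is a zero-based index into the title entries, and on RHEL the newest kernels are normally listed first, so the first entry that boots a plain vmlinuz image instead of xen.gz is usually the newest non-Xen kernel. A sketch that finds that index, assuming the usual RHEL5 layout where Xen entries load xen.gz on their kernel line; the function name is hypothetical.

```shell
# Hypothetical helper: print the zero-based index of the first GRUB entry
# whose kernel line does not boot the Xen hypervisor.
# $1: GRUB config file, e.g. /boot/grub/menu.lst
# Exits with status 1 if every entry is a Xen entry.
first_nonxen_entry() {
    awk '
        # every "title" line starts a new boot entry
        /^[ \t]*title[ \t]/ { entry++; next }
        # Xen entries load xen.gz here; plain kernels load vmlinuz-<version>
        /^[ \t]*kernel[ \t]/ && entry > 0 && $2 !~ /xen/ {
            print entry - 1
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Example: print the number to put after default=
# first_nonxen_entry /boot/grub/menu.lst
```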
Network configuration
By default, only one network is created, which is connected to the outside world via NAT. There are three options: leave it as is (but check that its IP range does not conflict with anything on the local network), change the IP range, or convert it into a host-only network. I kept the NAT network with an adapted IP range and created a new network for host-only networking. Be sure to give the new network its own uuid. The format of the uuid must not change; just change some of the hex digits (0-9, a-f) in the uuid string.
/etc/libvirt/qemu/networks/default.xml
<network>
  <name>default</name>
  <uuid>cc06c2a2-0766-45ee-baaa-896e04c7a3be</uuid>
  <forward mode="nat"/>
  <bridge name="virbr0" stp="on" delay="0"/>
  <ip address="a.b.c.d" netmask="255.255.255.0">
    <dhcp>
      <range start="a.b.c.e" end="a.b.c.f"/>
    </dhcp>
  </ip>
</network>
/etc/libvirt/qemu/networks/hostonly.xml
<network>
  <name>hostonly</name>
  <uuid>04255669-803e-d8f6-352a-086fa45ae09d</uuid>
  <bridge name="virbr1" stp="on" delay="0"/>
  <ip address="a.b.g.h" netmask="255.255.255.0">
    <dhcp>
      <range start="a.b.g.i" end="a.b.g.j"/>
    </dhcp>
  </ip>
</network>
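Instead of editing hex digits by hand, you can let the kernel generate a fresh random UUID. The sketch below (hypothetical function name) reads one from /proc/sys/kernel/random/uuid, falls back to uuidgen, and writes it into the uuid element of a copied network file.

```shell
# Hypothetical helper: give a libvirt network definition a new random UUID.
# $1: network XML file, e.g. /etc/libvirt/qemu/networks/hostonly.xml
# Prints the UUID it wrote.
replace_network_uuid() {
    local file=$1 uuid
    if [ -r /proc/sys/kernel/random/uuid ]; then
        uuid=$(cat /proc/sys/kernel/random/uuid)
    else
        uuid=$(uuidgen) || return 1
    fi
    # Replace whatever UUID sits between the <uuid> tags (same 8-4-4-4-12 format).
    sed -i "s|<uuid>[0-9a-fA-F-]*</uuid>|<uuid>$uuid</uuid>|" "$file" || return 1
    echo "$uuid"
}

# Example:
# cd /etc/libvirt/qemu/networks
# cp default.xml hostonly.xml && replace_network_uuid hostonly.xml
```

After that, name, bridge and ip still have to be edited by hand. Once libvirtd is running, virsh net-define on the new file followed by virsh net-autostart hostonly should register the network and have libvirt start it at boot.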
Note:
If you run services on the physical host that bind to the interface of the host-only network, you need to watch the boot order. Most services are started before libvirtd, and the virtual bridges only exist once libvirtd has started, so any service started before libvirtd cannot bind to a virbrX interface. named (BIND), for instance, binds to specific interfaces. If the guests use the host-only network to reach a nameserver on the physical host, you need to restart named after the physical host has booted, or the guests cannot reach the nameserver.
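One way to automate that restart is to wait for the bridge from /etc/rc.d/rc.local, which runs at the end of the boot sequence. A sketch, assuming the host-only bridge is virbr1 as in the network definition above and that named runs as the standard named service; the function name, the 60-second limit and the optional third argument (only there to make the function testable) are my own choices.

```shell
# Hypothetical helper: wait until a network interface exists.
# $1: interface name, e.g. virbr1
# $2: maximum number of seconds to wait (default 30)
# $3: directory listing the interfaces (default /sys/class/net)
# Returns 0 as soon as the interface shows up, 1 on timeout.
wait_for_interface() {
    local iface=$1 timeout=${2:-30} netdir=${3:-/sys/class/net} waited=0
    while [ ! -e "$netdir/$iface" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example, with the function defined in /etc/rc.d/rc.local:
# wait_for_interface virbr1 60 && service named restart
```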
The bridged network is a bit more complex. Use the configuration file of eth0 as a basis:
cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-br0
Then move the IP settings from eth0 to the bridge. In ifcfg-eth0, remove the BOOTPROTO, BROADCAST, IPADDR, NETMASK and NETWORK lines and add BRIDGE=br0. In ifcfg-br0, change DEVICE to br0, remove the HWADDR line and add TYPE=Bridge. The result should look like this:
/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=ab:cd:ef:gh:ij:kl
BRIDGE=br0
ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
BOOTPROTO=static
BROADCAST=a.b.c.255
IPADDR=a.b.c.d
NETMASK=255.255.255.0
NETWORK=a.b.c.0
ONBOOT=yes
TYPE=Bridge
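The copy-and-edit step can also be scripted. The sketch below (hypothetical function name) assumes ifcfg-eth0 holds simple KEY=value lines like the ones above. It writes ifcfg-br0 with the IP settings, DEVICE=br0 and TYPE=Bridge but without HWADDR, and then cuts ifcfg-eth0 down to DEVICE, HWADDR and ONBOOT plus BRIDGE=br0. Keep a backup copy of the original file somewhere else before trying it.

```shell
# Hypothetical helper: turn a static ifcfg file into a NIC + bridge pair.
# $1: network-scripts directory (normally /etc/sysconfig/network-scripts)
# $2: physical interface, e.g. eth0
# $3: bridge name, e.g. br0
make_bridge_config() {
    local dir=$1 nic=$2 br=$3
    local src="$dir/ifcfg-$nic" tmp status
    [ -f "$src" ] || return 1

    # Bridge: inherit the IP settings, but not the NIC's name, MAC address,
    # or any existing BRIDGE/TYPE lines.
    {
        grep -Ev '^(DEVICE|HWADDR|BRIDGE|TYPE)=' "$src"
        echo "DEVICE=$br"
        echo "TYPE=Bridge"
    } > "$dir/ifcfg-$br" || return 1

    # NIC: keep only its name, MAC address and ONBOOT, and attach it to the bridge.
    tmp=$(mktemp) || return 1
    {
        grep -E '^(DEVICE|HWADDR|ONBOOT)=' "$src"
        echo "BRIDGE=$br"
    } > "$tmp" && cat "$tmp" > "$src"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root):
# make_bridge_config /etc/sysconfig/network-scripts eth0 br0
```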
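Finally, make sure that traffic crossing the bridges is not run through the host's ip6tables, iptables and arptables rules:
echo net.bridge.bridge-nf-call-ip6tables = 0 >> /etc/sysctl.conf
echo net.bridge.bridge-nf-call-iptables = 0 >> /etc/sysctl.conf
echo net.bridge.bridge-nf-call-arptables = 0 >> /etc/sysctl.conf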
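Appending with echo adds a duplicate line every time the command is run again. A slightly more careful sketch (hypothetical function name) updates the key if it is already present in the file and appends it otherwise.

```shell
# Hypothetical helper: set "key = value" in a sysctl.conf-style file exactly once.
# $1: file (normally /etc/sysctl.conf), $2: key, $3: value
set_sysctl() {
    local conf=$1 key=$2 value=$3 key_re
    # Escape the dots in the key so they match literally in the regexes below.
    key_re=$(printf '%s\n' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$key_re[[:space:]]*=" "$conf"; then
        # Key already present: rewrite its value in place.
        sed -i "s|^[[:space:]]*$key_re[[:space:]]*=.*|$key = $value|" "$conf"
    else
        echo "$key = $value" >> "$conf"
    fi
}

# Example:
# for t in ip6tables iptables arptables; do
#     set_sysctl /etc/sysctl.conf net.bridge.bridge-nf-call-$t 0
# done
```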
Swap usage and caching
If your physical machine only runs virtual machines and memory is not oversubscribed (all VMs together use no more than 80-90% of total memory), you might want to limit swap usage. Since the kernel sees each VM as a process, the usual rules for processes apply. One of those rules is that pages that have not been referenced for a while are paged out to swap, to free up memory for other processes or for the cache, which speeds up whatever is actively being used. For a VM this is unwanted behavior. On a dedicated host nothing else runs, and I don't want the host to cache my VMs' data, since that already happens inside the VMs. Double caching makes performance inconsistent, not to mention the effects when the host crashes.
There are two ways to put a stop to paging and swapping. The first is not to create any swap space at all. The second is to set the kernel's swappiness parameter to a low value. I've set it to 0.
echo vm.swappiness = 0 >> /etc/sysctl.conf
Converting the virtual machine configuration file
There are two ways of converting to KVM. The easiest is to use virt-manager to create a new virtual machine with exactly the same details as the old one, but to point it at a different (as small as possible) virtual disk, so that no existing data gets overwritten. Then stop the machine (there is no need to actually install anything) and edit its configuration file in /etc/libvirt/qemu by hand so that it points at the right disk image. This method requires you to reboot first; otherwise the configuration tools won't see the networks we just created.
The other method is to convert the virtual machine definition by hand. Below is an old Xen definition file (/etc/xen/test1):
name = "test1"
uuid = "4a07fde8-f244-2a6d-9603-85ff2179a9bb"
maxmem = 512
memory = 512
vcpus = 2
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "tap:aio:/var/lib/xen/images/test1.img,xvda,w" ]
vif = [ "mac=00:16:3e:1a:d0:96,bridge=xenbr0", "mac=00:16:3e:1a:d0:97,bridge=xenbr1" ]
The equivalent KVM definition, stored for example as /etc/libvirt/qemu/test1.xml, looks like this. Note that libvirt specifies memory in KiB (512 MB = 524288 KiB), that the disk is now presented to the guest as a virtio device (vda), and that the Xen bridges xenbr0 and xenbr1 are replaced by the bridge br0 and the hostonly network:
<domain type="kvm">
  <name>test1</name>
  <uuid>48156322-4e0c-b658-b80a-1bf3b608b49d</uuid>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch="x86_64" machine="pc">hvm</type>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset="utc"/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" cache="none"/>
      <source file="/var/lib/xen/images/test1.img"/>
      <target dev="vda" bus="virtio"/>
    </disk>
    <interface type="bridge">
      <mac address="00:16:3e:1a:d0:96"/>
      <source bridge="br0"/>
      <model type="virtio"/>
    </interface>
    <interface type="network">
      <mac address="00:16:3e:1a:d0:97"/>
      <source network="hostonly"/>
      <model type="virtio"/>
    </interface>
    <serial type="pty">
      <source path="/dev/pts/2"/>
      <target port="0"/>
    </serial>
    <console type="pty">
      <source path="/dev/pts/2"/>
      <target port="0"/>
    </console>
    <input type="mouse" bus="ps2"/>
    <graphics type="vnc" port="-1" autoport="yes" keymap="en-us"/>
  </devices>
</domain>
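When converting several guests by hand, it helps to pull the values that change format out of the Xen file first. The sketch below (hypothetical function name) prints the name, the memory converted from Xen's MB to the KiB that libvirt's memory element expects, the number of vcpus, the first disk path without its tap:aio: or phy: prefix, and the MAC addresses. It assumes one setting per line, formatted like /etc/xen/test1 above.

```shell
# Hypothetical helper: print the values needed for a libvirt domain XML.
# $1: Xen domain configuration file, e.g. /etc/xen/test1
xen_to_kvm_values() {
    local cfg=$1
    # name = "test1"  ->  name=test1
    sed -n 's/^name *= *"\(.*\)"$/name=\1/p' "$cfg"
    # Xen's memory is in MB; libvirt's <memory> element expects KiB
    awk -F' *= *' '$1 == "memory" { printf "memory_kib=%d\n", $2 * 1024 }' "$cfg"
    sed -n 's/^vcpus *= *\([0-9]*\).*$/vcpus=\1/p' "$cfg"
    # first disk entry, minus everything up to its last ":" (tap:aio:, file:, phy:)
    sed -n 's/^disk *= *\[ *"\([^,"]*\),.*$/\1/p' "$cfg" | sed -e 's/^.*://' -e 's/^/disk=/'
    # every mac= value on the vif line, in order
    grep '^vif' "$cfg" | grep -o 'mac=[0-9a-fA-F:]*'
}

# Example:
# xen_to_kvm_values /etc/xen/test1
```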
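If the guest's disk is a raw LVM volume rather than an image file, use a disk section of type block instead:
<disk device="disk" type="block">
  <driver cache="none"/>
  <source dev="/dev/vgvm/lvmyvolume"/>
  <target dev="vda" bus="virtio"/>
</disk>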
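To restrict a guest's virtual CPUs to specific physical CPUs, add a cpuset attribute with a comma-separated list of physical CPU numbers to the vcpu element:
<vcpu cpuset="cpu1,cpu2,cpu3">number of virtual cpus</vcpu>
For example, this gives the guest four virtual CPUs that run only on physical CPUs 0 and 1:
<vcpu cpuset="0,1">4</vcpu>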
Starting the virtual machines
You can now start the virtual machines with the virsh command. Open a console directly after starting the domain to monitor the boot progress. You might also want the machine to start automatically whenever the host boots.
virsh define /etc/libvirt/qemu/[mymachine.xml]
virsh list --all
virsh start [mymachine name]
virsh console [mymachine name]
virsh autostart [mymachine name]
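The define, start and autostart steps can be wrapped in a small function that reads the domain name from the XML file, so you don't have to look it up first. A sketch; the function names are hypothetical, and it assumes the first name element in the file is the domain name, which holds for files like the one above.

```shell
# Hypothetical helper: print the domain name from a libvirt domain XML file.
domain_name() {
    # The first <name>...</name> element is the domain's own name.
    sed -n 's/.*<name>\([^<]*\)<\/name>.*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: define a domain, start it and mark it for autostart.
# $1: domain XML file, e.g. /etc/libvirt/qemu/test1.xml
define_and_start() {
    local xml=$1 name
    name=$(domain_name "$xml")
    [ -n "$name" ] || return 1
    virsh define "$xml" &&
        virsh start "$name" &&
        virsh autostart "$name"
}

# Example (as root), then watch the boot with: virsh console test1
# define_and_start /etc/libvirt/qemu/test1.xml
```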
Improving Performance with Hugepages
Note:
There could be some unwanted interaction with SELinux here. If you run into problems, either don't use Hugepages or turn SELinux off
KVM uses 4kB memory pages by default, just like any other process. One of the main differences between an average process and a KVM virtual machine process is the amount of memory allocated to it: virtual machines normally use hundreds of megabytes or even gigabytes of memory. That means a lot of overhead, because the page tables covering all those 4kB pages are large and have to be reloaded each time the CPU switches between virtual machines.
RHEL 5.4 and Hugepages
Linux also supports Hugepages: special memory pages of 1, 2 or 4MB, which shorten the list of memory pages dramatically and improve performance by up to 10%. Sadly, support for Hugepages hasn't been implemented in libvirt. There is work on it in Fedora 12, but I don't expect to see those developments in RHEL5. There is a way around this, however. Let's start by reserving the Hugepages. The Hugepagesize line near the end of /proc/meminfo shows the Hugepage size of the system.
Now calculate the number of Hugepages needed for the virtual machines, and add at least 6 extra pages for each virtual machine. If you do not reserve enough pages, your virtual machine won't start: KVM uses some additional pages when starting up a VM, so without those 6 extra pages the last VM will not start. Add the total number of Hugepages to your kernel configuration:
echo vm.nr_hugepages = XXXX >> /etc/sysctl.conf
mkdir /hugepages
echo hugetlbfs /hugepages hugetlbfs defaults 0 0 >> /etc/fstab
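The Hugepage size and the page count for vm.nr_hugepages can be worked out with a few lines of shell. This sketch (hypothetical function names) reads Hugepagesize from /proc/meminfo and, for the memory of each VM in MB, rounds up to whole Hugepages and adds the 6 extra pages per VM mentioned above.

```shell
# Hypothetical helper: print the Hugepage size in kB (e.g. 2048).
# $1: meminfo-style file, defaults to /proc/meminfo
hugepage_size_kb() {
    awk '$1 == "Hugepagesize:" { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: print the number of Hugepages to reserve.
# $1: Hugepage size in kB; remaining arguments: memory of each VM in MB.
hugepages_needed() {
    local hp_kb=$1 total=0 mem_mb mem_kb
    shift
    for mem_mb in "$@"; do
        mem_kb=$((mem_mb * 1024))
        # round up to whole pages, then add the 6 start-up pages for this VM
        total=$((total + (mem_kb + hp_kb - 1) / hp_kb + 6))
    done
    echo "$total"
}

# Example, for two VMs with 512 MB and 1024 MB of memory:
# hugepages_needed "$(hugepage_size_kb)" 512 1024
```

With 2 MB Hugepages that example works out to 256 + 6 plus 512 + 6, i.e. 780 pages.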
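Libvirt cannot pass the -mem-path option to qemu-kvm by itself, so the workaround is a small wrapper script around the KVM binary that adds -mem-path /hugepages to every invocation:
#!/bin/bash
# Run the real KVM binary with all the arguments libvirt passed in,
# backing guest memory with the hugetlbfs mounted on /hugepages.
exec /usr/libexec/qemu-kvm2 -mem-path /hugepages "$@"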
RHEL 5.5 and Hugepages
RHEL 5.5 has native support for Hugepages. First make sure that the libhugetlbfs package is installed. Then run huge_page_setup_helper.py and answer its questions.
[root@aurora ~]# rpm -qa | grep huge
libhugetlbfs-1.3-7.el5
libhugetlbfs-1.3-7.el5
[root@aurora ~]# huge_page_setup_helper.py
Current configuration:
* Total System Memory......: 7909 MB
* Shared Mem Max Mapping...: 7100 MB
* System Huge Page Size....: 2 MB
* Number of Huge Pages.....: 3550
* Total size of Huge Pages.: 7100 MB
* Remaining System Memory..: 809 MB
* Huge Page User Group.....: root (0)
How much memory would you like to allocate for huge pages? (input in MB, unless postfixed with GB):
mkdir /dev/hugepages
echo hugetlbfs /dev/hugepages hugetlbfs defaults 0 0 >> /etc/fstab
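To register the mount point safely, even if the step is repeated, here is a sketch of an idempotent version (hypothetical function name). It creates the mount point and only adds the fstab line if an identical one is not there yet; it works the same for /hugepages on RHEL 5.4.

```shell
# Hypothetical helper: create a hugetlbfs mount point and register it in fstab once.
# $1: mount point, e.g. /dev/hugepages (or /hugepages on RHEL 5.4)
# $2: fstab file, defaults to /etc/fstab
add_hugetlbfs_mount() {
    local mnt=$1 fstab=${2:-/etc/fstab}
    local entry="hugetlbfs $mnt hugetlbfs defaults 0 0"
    mkdir -p "$mnt" || return 1
    # Only append the line if an identical one is not present yet.
    grep -qxF "$entry" "$fstab" || echo "$entry" >> "$fstab"
}

# Example (as root), mounting it right away:
# add_hugetlbfs_mount /dev/hugepages && mount /dev/hugepages
```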
Download the patch here and place it in /usr/local/bin
Download the init script here and place it in /etc/init.d
Then do:
chkconfig --add libvirt_hugepages
chkconfig libvirt_hugepages on
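Finally, add the following element inside the domain element of every virtual machine that should use Hugepages:
<memoryBacking><hugepages/></memoryBacking>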
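If you have many domains, the element can be added with sed. A sketch (hypothetical function name, relying on GNU sed's 0,/regex/ address) that inserts it directly in front of the vcpu element and skips files that already have it. libvirtd only reads these files when it starts, so run virsh define on the file again (or restart libvirtd) afterwards.

```shell
# Hypothetical helper: add <memoryBacking><hugepages/></memoryBacking> to a domain XML.
# $1: domain XML file, e.g. /etc/libvirt/qemu/test1.xml
add_memory_backing() {
    local xml=$1
    # Nothing to do if the file already has a memoryBacking element.
    grep -q '<memoryBacking>' "$xml" && return 0
    grep -q '<vcpu' "$xml" || return 1
    # Insert the element in front of the first <vcpu> element only.
    sed -i '0,/<vcpu/s|<vcpu|<memoryBacking><hugepages/></memoryBacking><vcpu|' "$xml"
}

# Example:
# add_memory_backing /etc/libvirt/qemu/test1.xml && virsh define /etc/libvirt/qemu/test1.xml
```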
Now reboot the system, and the virtual machines should start with Hugepages-backed memory. You can verify this by looking at the qemu-kvm command in the process list: it should now contain a -mem-path parameter. If the Hugepages mount point is added after the system has booted, restart libvirtd, or libvirt won't see the Hugepages.
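To check all running guests at once instead of reading the ps output, you can look at each qemu-kvm process's command line in /proc, where the arguments are separated by NUL bytes. A sketch; the function name is hypothetical.

```shell
# Hypothetical helper: succeed if a NUL-separated command line contains -mem-path.
# $1: a cmdline file, normally /proc/<pid>/cmdline
cmdline_has_mem_path() {
    tr '\0' '\n' < "$1" | grep -qx -- '-mem-path'
}

# Example: report every running qemu-kvm process
# for pid in $(pgrep qemu-kvm); do
#     if cmdline_has_mem_path "/proc/$pid/cmdline"; then
#         echo "$pid: uses Hugepages"
#     else
#         echo "$pid: no -mem-path"
#     fi
# done
```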