This article was published originally at https://kubevirt.io/2020/Live-migration.html
This blog post explains KubeVirt’s ability to perform live migration of virtual machines.
Live migration is a process during which a running Virtual Machine Instance (VMI) moves to another compute node while the guest workload continues to run and remains accessible.
The concept of live migration is already well known among virtualization platforms. It enables administrators to keep user workloads running while servers are taken down for maintenance, for reasons such as:
- Hardware maintenance (physical, firmware upgrades, etc.)
- Power management, by moving workloads to a lower number of hypervisors during off-peak hours
KubeVirt also includes support for virtual machine migration within Kubernetes when enabled.
Keep reading to learn how!
Enabling Live Migration
To enable live migration, we need to enable its feature gate by adding `LiveMigration` to the `feature-gates` key of the `kubevirt-config` ConfigMap.
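As a minimal sketch of what this could look like, assuming KubeVirt is installed in the `kubevirt` namespace and reads its settings from a ConfigMap named `kubevirt-config` (the namespace and label are assumptions about a typical installation):

```yaml
# Sketch: ConfigMap enabling the LiveMigration feature gate.
# The namespace and label are assumed; adjust them to your installation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-config
  namespace: kubevirt
  labels:
    kubevirt.io: ""
data:
  feature-gates: "LiveMigration"
```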
If `kubevirt-config` already exists, it can be edited to append `LiveMigration` to the existing configuration.
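For instance, if another feature gate were already enabled, the `data` section might end up looking like the fragment below. `DataVolumes` is used here purely as an illustrative pre-existing entry, not as a requirement:

```yaml
# Fragment of kubevirt-config after appending LiveMigration.
# "DataVolumes" stands in for whatever gates were already present.
data:
  feature-gates: "DataVolumes,LiveMigration"
```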
Configuring Live Migration
If we want to alter the defaults for live migration, we can further edit the `kubevirt-config` ConfigMap.
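A sketch of such a configuration follows, assuming the settings are placed under a `migrations` key in `kubevirt-config`. The values shown are the defaults described in this section, and the namespace and label are assumptions about a typical installation:

```yaml
# Sketch: kubevirt-config with explicit live migration settings.
# All values below are the defaults; change only what you need.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-config
  namespace: kubevirt
  labels:
    kubevirt.io: ""
data:
  feature-gates: "LiveMigration"
  migrations: |-
    parallelMigrationsPerCluster: 5
    parallelOutboundMigrationsPerNode: 2
    bandwidthPerMigration: 64Mi
    completionTimeoutPerGiB: 800
    progressTimeout: 150
```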
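The parameters are explained in the table below (check the documentation for more details):

| Parameter | Default value | Description |
|---|---|---|
| `parallelMigrationsPerCluster` | 5 | How many migrations might happen at the same time |
| `parallelOutboundMigrationsPerNode` | 2 | How many outbound migrations are allowed for a particular node |
| `bandwidthPerMigration` | 64Mi | Bandwidth limit for each migration, in MiB/s, so that other systems are not affected |
| `completionTimeoutPerGiB` | 800 | Time (in seconds) allowed per GiB of data before the migration is aborted |
| `progressTimeout` | 150 | Time (in seconds) to wait for the live migration to make progress in transferring data |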
Performing the Live Migration
- Virtual machines using a PVC must have the `RWX` (ReadWriteMany) access mode to be live-migrated
- Additionally, live migration is not allowed when the pod network uses a bridge interface binding
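To illustrate the first requirement, here is a hedged sketch of a claim requesting the RWX (`ReadWriteMany`) access mode. The claim name, size, and storage class are hypothetical, and the storage backend must actually support shared access:

```yaml
# Sketch: PVC requesting shared (RWX) access for a VM disk.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-disk                      # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany                  # RWX
  resources:
    requests:
      storage: 10Gi                  # hypothetical size
  storageClassName: shared-storage   # hypothetical storage class
```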
Live migration is initiated by posting a `VirtualMachineInstanceMigration` object to the cluster, indicating the name of the VMI to migrate.
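A minimal sketch of such an object is shown below. `migration-job` and `vmi-fedora` are hypothetical names, and the `apiVersion` is an assumption that depends on the KubeVirt release in use:

```yaml
# Sketch: request a live migration of the VMI named vmi-fedora.
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-job    # hypothetical object name
spec:
  vmiName: vmi-fedora    # hypothetical VMI to migrate
```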
This will trigger the process for the VM.
By the time a VM has started, a calculation has already been performed to determine whether it is live-migratable. This information is stored in `VMI.status.conditions`. Currently, most of the calculation is based on the `Access Mode` of the VMI volumes, but it can be based on multiple parameters. For example:
```yaml
Status:
  Conditions:
    Status: True
    Type: LiveMigratable
  Migration Method: BlockMigration
```
If the VM is live-migratable, the request will be submitted successfully. The status change will be reported under `VMI.status`. Once the live migration finishes, a status of `Completed` or `Failed` will be indicated.
info “Watch out!”
The `Migration Method` field can contain:
- `BlockMigration`: meaning that the disk data is being copied from source to destination
- `LiveMigration`: meaning that only the memory is copied from source to destination
VMs whose block devices are located on shared storage backends that provide PVCs with ReadWriteMany access, like the ones provided by Rook, have the option to live-migrate only their memory contents instead of also having to migrate the block devices.
Cancelling a Live Migration
If we want to abort the live migration ‘Kubernetes-style’, we just delete the object we created to trigger it.
In this case, the migration state in the VMI status will report some additional information.
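An abridged illustration of that status follows, limited to the fields discussed in this section. The layout mimics what a `kubectl describe` of the VMI might print; the exact layout and any further fields are assumptions:

```yaml
# Illustrative excerpt only: migration state after a cancelled migration.
Migration State:
  Abort Requested: true
  Abort Status: Succeeded
  Completed: true
  Failed: true
```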
Note the additional fields: `Abort Requested` indicates that an abort was requested, and `Abort Status` reports whether it `Succeeded`. When it did, the original migration fields report `Completed` (because there is no longer a running migration) and `Failed` set to true.
What can go wrong?
Live migration is a complex process that requires transferring data from a ‘VM’ on one node to a ‘VM’ on another. The activity of the VM being live-migrated must be compatible with the network configuration and throughput, so that the data can be transferred faster than it changes on the original VM. This is usually referred to as converging.
Some values can be adjusted to help the migration succeed (check the table for the settings that can be tuned), but each comes with a trade-off:
- Increasing the number of VMs that can migrate at once will reduce the bandwidth available to each migration.
- Increasing the bandwidth could affect applications running on the nodes involved (origin and target).
- Storage migration (check the `Info` note in the Performing the Live Migration section on the differences) might also consume bandwidth and resources.
Sometimes a node needs to be put into maintenance while it still hosts workloads, either containers or, in KubeVirt’s case, VMs.
It is possible to use selectors to move all the virtual machines to another node with `kubectl drain <nodename>`. For example, evicting all KubeVirt VMs from a node can be done via `kubectl drain <nodename> --delete-local-data --ignore-daemonsets=true --force --pod-selector=kubevirt.io=virt-launcher`.
warning “Reenabling node after eviction”
Once the node has been tainted for eviction, we can use `kubectl uncordon <nodename>` to make it schedulable again.
According to the documentation, `--delete-local-data`, `--ignore-daemonsets` and `--force` are required because:
- Pods using `emptyDir` can be deleted because the data is ephemeral.
- Nodes running VMIs also run `virt-handler` as a DaemonSet, so it’s safe to proceed.
- VMIs are not owned by a `ReplicaSet` or `DaemonSet`, so kubectl can’t guarantee that they are restarted. KubeVirt has its own controllers for managing VMIs, so kubectl doesn’t need to worry about it.
If we omit the `--pod-selector`, we’ll force eviction of all Pods and VMs from the node.
important “Live Migration eviction”
To have VMIs use `LiveMigration` for eviction, we have to add a specific field to the spec in the VMI YAML, so that the VMIs are live-migrated when the `kubevirt.io/drain:NoSchedule` taint is added to a node.
```yaml
spec:
  evictionStrategy: LiveMigrate
```
From that point, when `kubectl taint nodes <foo> kubevirt.io/drain=draining:NoSchedule` is executed, the migrations will start.
As a summary of the above:
- `LiveMigration` needs to be enabled on KubeVirt as a feature gate.
- `LiveMigration` will add status to the VMI object indicating whether it is a candidate and, if so, which mode to use (Block or Live)
- Based on the storage backend and other conditions, it will enable