A dedicated unikernel for microservices

Thursday, March 31, 2022

Notes about a hypervisor-agnostic implementation of VirtIO

This post presents some very informal notes about the requirements for a virtio backend that would be independent of the hypervisor and the OS. Some of these requirements are:

* The library shall define an API that the hypervisor has to provide.

* The library shall be flexible enough to cover different use cases. For example, the library shall be able to work on a type-II hypervisor but also on a type-I hypervisor.
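The first requirement can be pictured as a small table of operations that every hypervisor port would have to fill in. The following C sketch is purely illustrative: the structure, function names, and signatures are my own invention, not from any real project.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface the hypervisor has to provide so the virtio
 * backend library can stay hypervisor-agnostic. */
struct hyp_ops {
    /* Map a guest-physical range into the backend's address space. */
    void *(*map_guest)(uint64_t gpa, size_t len);
    /* Register a callback to trap accesses to an MMIO region. */
    int   (*register_mmio)(uint64_t base, size_t len,
                           void (*handler)(uint64_t off, uint32_t val,
                                           int is_write));
    /* Inject an interrupt into the frontend VM (notification path). */
    int   (*notify_guest)(uint32_t irq);
};

/* The library can refuse to run on a hypervisor that leaves any of the
 * required operations unimplemented. */
static int hyp_ops_valid(const struct hyp_ops *ops)
{
    return ops->map_guest && ops->register_mmio && ops->notify_guest;
}

/* Dummy port, standing in for a real hypervisor implementation. */
static void *dummy_map(uint64_t gpa, size_t len)
{ (void)gpa; (void)len; return NULL; }
static int dummy_mmio(uint64_t base, size_t len,
                      void (*h)(uint64_t, uint32_t, int))
{ (void)base; (void)len; (void)h; return 0; }
static int dummy_irq(uint32_t irq)
{ (void)irq; return 0; }

static const struct hyp_ops dummy_hyp = { dummy_map, dummy_mmio, dummy_irq };
```

Keeping all hypervisor-specific behavior behind one such table is what lets the same backend code serve the type-I and type-II cases below.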

The use-cases are: 

1. The backend running as a user-space process, as with KVM and QEMU,

2. The backend running as a user-space process in Dom0, as with Xen,

3. The backend running as a VM, e.g., with Jailhouse.

I found that other works have already highlighted the need for a virtio backend that is independent of the VMM and the hypervisor, e.g., the Stratos project. In such a backend, the hypervisor would be abstracted away by a common interface, i.e., a driver. This raises requirements on the hypervisor and on the interface that the hypervisor needs to expose to be able to plug in such a backend. Also, the way virtio-devices are implemented may vary: for example, in a user process, a thread, or a VM.

If we deal with a hypervisor that does not provide such an interface, we still need some mechanism for the backend and the frontend to communicate. This may require some sort of synchronization mechanism, perhaps by extending the mmio layout.

I have found three cases: 

1. Type-II hypervisor (KVM), in which the VMM runs as a user-space application and the backend runs as part of the VMM. The backend only needs to register some callbacks to trap accesses to I/O regions.

2. Type-I like Xen (HVM): this is also the case, although the VMM runs in a different VM.

3. Type-I without such a VM-exit mechanism, where the backend can't see all the frontend memory; here the requirements fall more on the hypervisor. One possible solution is to share a region of memory between the frontend and the backend, and let the frontend allocate and manipulate the memory used for io-buffers.
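The shared-region idea in case 3 can be sketched with a trivial bump allocator: the frontend owns a fixed region that is also visible to the backend and carves io-buffers out of it. Names and sizes here are made up for the example; a real frontend would also need to free and reuse buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* A fixed region shared between frontend and backend, sized when the
 * guest is created. The frontend allocates io-buffers from it, so the
 * backend never needs visibility into the rest of guest memory. */
struct shm_region {
    uint8_t *base;   /* start of the shared region */
    size_t   size;   /* total size, fixed at guest creation */
    size_t   next;   /* offset of the next free byte */
};

/* Allocate an io-buffer from the shared region; NULL when exhausted. */
static void *shm_alloc(struct shm_region *r, size_t len)
{
    /* keep 8-byte alignment, convenient for descriptors */
    size_t off = (r->next + 7) & ~(size_t)7;
    if (off + len > r->size)
        return NULL;
    r->next = off + len;
    return r->base + off;
}
```

Because every buffer lives at a known offset inside the region, descriptors can carry region-relative offsets instead of guest-physical addresses, which is exactly what a backend without full memory visibility needs.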

It could be interesting to run the backend as a VM. For that use-case, we need some way for these VMs to communicate. One possible way would be to share a region of memory between the VMs and, in addition, implement some sort of doorbell (ring-bell) mechanism between them.
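A minimal doorbell over shared memory could look like the sketch below: the sender bumps a counter and then (in a real system) asks the hypervisor to raise an inter-VM irq; the receiver compares counters to detect new rings. The structure and names are illustrative assumptions, not any existing protocol.

```c
#include <stdint.h>

struct doorbell {
    uint32_t ring_count; /* lives in shared memory, written by sender */
    uint32_t seen_count; /* private to the receiver */
};

/* Sender side: bump the counter; the caller would then invoke the
 * hypervisor's inter-VM irq primitive to wake the peer VM. */
static uint32_t doorbell_ring(struct doorbell *db)
{
    return __atomic_add_fetch(&db->ring_count, 1, __ATOMIC_RELEASE);
}

/* Receiver side, typically called from the irq handler: returns 1 when
 * new rings arrived since the last check, coalescing multiple rings. */
static int doorbell_pending(struct doorbell *db)
{
    uint32_t now = __atomic_load_n(&db->ring_count, __ATOMIC_ACQUIRE);
    if (now == db->seen_count)
        return 0;
    db->seen_count = now;
    return 1;
}
```

The counter makes the irq edge-triggered-safe: even if several rings collapse into one interrupt, the receiver still observes that the count moved.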

In my experiments, I added some extra bits to the mmio layout to allow the device and the driver to be set blocked or resumed. These bits are used only for device-status initialization; notifications to the vrings are then done using inter-VM irqs.
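As a rough illustration of those extra bits, one could extend the virtio-mmio layout with a small status word like the following. The bit positions, register name, and helpers are invented for this sketch; they are not part of the virtio spec and only approximate what the PoC does.

```c
#include <stdint.h>

/* Hypothetical extra synchronization bits appended to the virtio-mmio
 * layout, letting frontend and backend mark each other blocked or
 * resumed during device-status initialization. */
#define VIRTIO_EXT_DEVICE_BLOCKED  (1u << 0)
#define VIRTIO_EXT_DRIVER_BLOCKED  (1u << 1)

struct virtio_mmio_ext {
    uint32_t ext_status; /* extra bits, polled during initialization */
};

static void ext_block_device(struct virtio_mmio_ext *m)
{
    m->ext_status |= VIRTIO_EXT_DEVICE_BLOCKED;
}

static void ext_resume_device(struct virtio_mmio_ext *m)
{
    m->ext_status &= ~VIRTIO_EXT_DEVICE_BLOCKED;
}

static int ext_device_blocked(const struct virtio_mmio_ext *m)
{
    return (m->ext_status & VIRTIO_EXT_DEVICE_BLOCKED) != 0;
}
```

Once initialization completes, these bits go quiet and the regular vring notifications take over via inter-VM irqs.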

In my PoC, each frontend gets a region of memory in which the virtio layout and the io-buffers are mapped. This region is defined statically when the guest is created, and the memory for io-buffers is also allocated at that time.

If both the BE and the FE are implemented as VMs, VM exits play no role, since relying on them would require going through the hypervisor. Also, it would be nice to be able to implement the BE as a Linux driver.