Toro kernel

A dedicated unikernel for microservices

Wednesday, March 31, 2021

Habemus built-in GDBStub!

In this post, we present the built-in gdbstub that has been developed in the last three months. The gdbstub enables a user to connect to a Toro microVM instance for debugging. 

One way to debug an operating system or a unikernel is to use the QEMU built-in gdbstub. This does not require any instrumentation in the guest and works well when only QEMU is used. However, the QEMU built-in gdbstub does not work well when using KVM and the microvm machine (https://forum.osdev.org/viewtopic.php?f=13&t=39998). For example, software breakpoints and hardware breakpoints are not correctly caught. Another way to debug a guest is to include a gdbstub in the kernel or unikernel. This is the approach that Linux follows and the approach that Toro microVM follows too.

When a kernel includes a gdbstub, it is possible to debug it independently of the Virtual Machine Monitor (VMM), e.g., QEMU or Firecracker. This only requires a way to connect the gdb client with the gdbstub, e.g., a serial port or any other device. In our case, the communication between the gdb client and the gdbstub is based on the virtio-console device. QEMU is used to forward the serial port to a TCP port to which the gdb client connects.

The gdbstub implements the GDB remote protocol to talk with a gdb client (see https://www.embecosm.com/appnotes/ean4/embecosm-howto-rsp-server-ean4-issue-2.html#sec_exchange_target_remote). This allows a gdb client to set breakpoints and execute step-by-step. However, the current implementation does not support the pause command, which remains future work.
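As a rough illustration of the framing used by the GDB remote protocol, the sketch below builds and checks packets of the form $payload#checksum, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits. It is written in Python for brevity and is not taken from the Toro sources: the function names are hypothetical, and escaping and run-length encoding are left out.

```python
def checksum(payload: bytes) -> int:
    """Sum of all payload bytes, modulo 256."""
    return sum(payload) % 256


def frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two lowercase hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % checksum(payload)


def unframe(packet: bytes) -> bytes:
    """Return the payload of a framed packet, or raise ValueError."""
    if len(packet) < 4 or packet[:1] != b"$" or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    # The two trailing characters carry the checksum in hexadecimal.
    if int(packet[-2:].decode("ascii"), 16) != checksum(payload):
        raise ValueError("bad checksum")
    return payload
```

For instance, frame(b"S05") yields b"$S05#b8", which is the kind of stop reply a stub sends to report a trap (signal 5) to the client.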

The gdbstub interfaces between the gdb client and the underlying debugging hardware. The x86 architecture enables programmers to set breakpoints and execute step-by-step. In the following, we explain how the gdbstub uses these features.

To set a breakpoint, the gdbstub simply replaces the first byte of an instruction in memory with the hexadecimal value 0xcc, which is the opcode of the INT3 instruction. When the processor executes it, it triggers interrupt number 3. The gdbstub handles this interrupt by informing the gdb client that the program has hit a breakpoint.

The gdb client reads the value of the Instruction Pointer (RIP) to figure out which breakpoint it corresponds to, and then informs the end user. At this point, the user can continue the execution with the “continue” command or execute the next instruction by using the “step” command.

In the interrupt handler, the value of the RIP register saved on the stack is RIP_BREAKPOINT + 1, i.e., the address just after the 0xcc byte. This prevents the same breakpoint from being triggered again when returning from the interrupt.
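The mechanism can be modelled in a few lines. The following sketch is a toy model, not Toro's code: a bytearray stands in for guest memory, the Breakpoints class and its method names are hypothetical, and keeping a table of saved bytes inside the stub is an assumption made only for the illustration. It shows the 0xcc patching and how the saved RIP maps back to the breakpoint address.

```python
INT3 = 0xCC  # one-byte opcode that raises interrupt number 3


class Breakpoints:
    """Toy software-breakpoint table over a bytearray standing in for memory."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # breakpoint address -> original first byte

    def insert(self, addr: int) -> None:
        # Remember the original byte only once, then patch in INT3.
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = INT3

    def remove(self, addr: int) -> None:
        # Restore the original first byte of the instruction.
        self.memory[addr] = self.saved.pop(addr)

    def hit_address(self, saved_rip: int):
        # After INT3 the saved RIP points one byte past the breakpoint.
        addr = saved_rip - 1
        return addr if addr in self.saved else None
```

With a breakpoint inserted at address 1, a saved RIP of 2 resolves back to address 1, and removing the breakpoint restores the original byte.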

The “step” command puts the processor in single-step mode. This mode is controlled by the Trap Flag (TF) in the RFLAGS register. When the flag is set, the processor triggers interrupt number 1 right after each instruction has executed. During the execution of the interrupt handler the flag is cleared, and it is only set again when returning from the interrupt. In step mode, execution proceeds one machine instruction at a time. Thus, to execute one line of source code, several steps may be needed until the next Pascal line is reached. Note that the current implementation does not use the x86 debug registers to set breakpoints.
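On x86-64 the Trap Flag is bit 8 of RFLAGS. A minimal sketch of how a handler could toggle it in the RFLAGS image saved on the interrupt stack is shown below; the helper names are hypothetical and the code only manipulates an integer, it does not touch real registers.

```python
TRAP_FLAG = 1 << 8  # bit 8 of RFLAGS enables single-step mode


def enable_single_step(saved_rflags: int) -> int:
    """Return the saved RFLAGS image with TF set ("step")."""
    return saved_rflags | TRAP_FLAG


def disable_single_step(saved_rflags: int) -> int:
    """Return the saved RFLAGS image with TF cleared ("continue")."""
    return saved_rflags & ~TRAP_FLAG


def is_single_stepping(saved_rflags: int) -> bool:
    """True when returning with this image would trap after one instruction."""
    return bool(saved_rflags & TRAP_FLAG)
```

Because the flag lives in the saved image, it takes effect only when the handler returns and the processor reloads RFLAGS from the stack.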

To allow a smooth debugging experience, the communication between the gdb client and the gdbstub must be fast. In this path, two elements are crucial: the serial driver in the guest and the forwarding of the data to a TCP port. The microvm machine only enables one legacy serial port. When using the legacy serial port, data is transferred one byte at a time. QEMU packages each byte into its own TCP packet and sends it to the client. This results in poor communication performance. Instead, we use the virtio-console device to connect the gdb client and the gdbstub. This has two benefits. First, the gdb client can send chunks of data instead of single bytes. Second, QEMU handles the TCP communication more efficiently, for example, by packaging more data into each packet. Overall, the use of virtio-console improves performance. To avoid nested interrupts in the gdbstub interrupt handler, the current driver relies on a polling mechanism.

The resulting software architecture is illustrated in the following figure:


The user builds the application on the development host, which is a Windows 10 machine with Free Pascal and Lazarus. We use Lazarus as the front-end for the gdb client, which lets users debug applications through a graphical interface. The binary of the application is transferred to the deployment host, on which the VM that runs the application is created. This host runs Debian 10 and the application is deployed as a KVM/QEMU microvm guest. The binary of the application is stored in a Ceph cluster that is shared among other hosts. When the VM is instantiated, QEMU waits until the gdb client connects to the port tcp:1234. When this happens, the guest continues to execute (watch the whole flow at https://youtu.be/AdygWtGQFPU).

In this post, we presented the built-in gdbstub to debug applications in Toro microVM. When included in an application, this module communicates with a gdb client, thus allowing users to set breakpoints, execute step-by-step and inspect the value of variables.


Bibliography:

- https://sourceware.org/gdb/onlinedocs/gdb/Notification-Packets.html#Notification-Packets

- https://eli.thegreenplace.net/2011/01/27/how-debuggers-work-part-2-breakpoints

- https://github.com/mborgerson/gdbstub


Monday, November 02, 2020

ToroKernel becomes ToroMicroVM

Hello! I just merged the work to support microvm in Toro. This work started at the beginning of May'20 (end of lock-down in France) and was mostly finished a couple of days ago (still in lock-down). It removes the support for legacy devices, e.g., the 8259 PIC and the CMOS, and changes the bootloader, among other things. The use of microvm together with other technologies like virtiofs and vsocket has simplified the code of the kernel. For example, the TCP/IP stack has been removed and only VSocket is supported. Also, the buffer-cache has been removed from the VFS. The VFS thus becomes just a wrapper for the virtiofs driver. The drivers for emulated devices have also been removed, e.g., e1000, ne2000. I have also removed drivers like virtio-net and virtio-block, which are currently not used. This results in a minimalist unikernel that focuses on virtiofs and vsocket, i.e., ToroMicroVM. I refactored the VirtIO driver by putting the code that is independent of the devices into VirtIO.pas. This makes it easier to add new virtio drivers and eliminates code duplication. Currently, the kernel has ~13KLoC, which is about 6KLoC fewer than ToroKernel (~19KLoC). Users can still use the old ToroKernel by just checking out the tag ToroKernel. In the future, I may simply fork the project and rename it ToroMicroVM.

Matias


Sunday, August 16, 2020

Debugging by using QEMU trace-events

Hello folks! In this post I am going to talk a bit about QEMU trace-events. I found this mechanism during the development of a virtio driver. Roughly speaking, trace-events are generated by different components of QEMU like the virtio queues, the ioapic, the lapics, etc. To enable them, you have to configure the QEMU build with the following option:

--enable-trace-backends=simple

Then, you have to add the following option to the QEMU command line:

-trace events=./events

The file named events contains the list of events that we are interested in observing. For example, this is mine:

apic_*
ioapic_*
virtio_*

In my case, I am interested in checking whether irqs are correctly acknowledged. To see this, I collect all the events related to the apic, the ioapic and virtio. To write the logs to a file, I have to open the QEMU monitor and first do 'trace-events on' and then 'trace-events flush'. I am not sure why this is not done automatically. You end up with a file named 'trace-PID', in which PID is the PID of the QEMU process. To read this file, you just have to run the following python script:

python3 ~/qemulast/scripts/simpletrace.py ~/qemulast/build/trace-events-all trace-30572   

You will get something like:

virtio_mmio_read 131.447 pid=2451 offset=0x60
virtio_mmio_write_offset 141.046 pid=2451 offset=0x64 value=0x1
virtio_mmio_setting_irq 8.345 pid=2451 level=0x0
ioapic_set_irq 4.359 pid=2451 vector=0xc level=0x0
ioapic_eoi_broadcast 29.005 pid=2451 vector=0x2c
ioapic_clear_remote_irr 1.683 pid=2451 n=0xc vector=0x2c

In this example, we can see that when an IRQ is captured, the handler reads the virtio-mmio interrupt status register (offset 0x60) and writes the value it read to the interrupt acknowledge register (offset 0x64) to ack the irq. Then, the virtio device lowers the irq level to 0x0. The handler finishes by sending the EOI to the LAPIC. You can find more information about trace-events at:

https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/devel/tracing.txt;hb=HEAD

http://blog.vmsplice.net/2011/02/observability-using-qemu-tracing.html

http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
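Since every line of the decoded trace shown earlier has the same shape (event name, a number, then key=value pairs), it is easy to post-process. The sketch below is a hypothetical helper, not part of QEMU: it assumes that the second column is a time delta and that every value is an integer, as in the sample output, and it parses the lines and counts events by name.

```python
from collections import Counter


def parse_trace_line(line: str):
    """Split 'name delta key=value ...' into (name, delta, fields)."""
    name, delta, *pairs = line.split()
    fields = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        fields[key] = int(value, 0)  # base 0 accepts both 2451 and 0x2c
    return name, float(delta), fields


def count_events(text: str) -> Counter:
    """Count how many times each event name appears in a decoded trace."""
    return Counter(
        parse_trace_line(line)[0]
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    )
```

Counting events this way gives a quick check that, for example, every ioapic_set_irq is eventually followed by an ioapic_eoi_broadcast.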

Saturday, June 27, 2020

Status of the port of Toro to microvm

Since May I have been working on porting Toro to the new microvm machine, which is a simplified QEMU machine with a reduced device model and an improved boot time, among other very interesting features (see https://github.com/qemu/qemu/blob/master/docs/microvm.rst). For Toro, I am interested in removing all the support for legacy hardware and in having virtio-vsocket and virtio-fs working on this kind of machine. I split the work into the following items:
1. Compile Toro as a PVH kernel and support PVH configuration during booting
  - Issue #390
  - Issue #391
2. Add support for multicore by identifying cores on the MP table
  - Issue #392
3. Add support for LAPIC and IOAPIC
  - Issue #395
4. Use KVM clock to get current time
  - Issue #366
5. Add mmio transport layer for virtio-vsocket
  - Issue #403
6. Add mmio transport layer for virtio-fs
  - Issue #404
Work items 1 to 4 are already implemented. These tasks removed support for legacy hardware like the 8259 and the CMOS. IRQs are now handled by the LAPIC and the IOAPIC. Items 5 and 6 mainly add support for the virtio-mmio transport layer to these drivers. Detecting mmio devices is simpler than detecting PCI devices: the information about virtio devices is passed in the kernel command line, so the driver only has to parse the kernel command line to get the base address and the irq. The driver for virtio-vsocket has already been ported. I am currently working on porting the driver for virtio-fs. I hope this work will be finished in about a month. Stay tuned!
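To give an idea of what this parsing involves, here is a small sketch in Python. It assumes the devices are described with the Linux-style parameter virtio_mmio.device=<size>@<baseaddr>:<irq>; whether Toro receives exactly this syntax is an assumption, and the function names and the sample command line are hypothetical.

```python
from typing import List, NamedTuple

PREFIX = "virtio_mmio.device="
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


class VirtioMmioDevice(NamedTuple):
    size: int  # length of the MMIO region in bytes
    base: int  # physical base address of the region
    irq: int   # interrupt line used by the device


def parse_size(text: str) -> int:
    """Parse sizes such as '512', '0x200' or '4K'."""
    if text[-1:].upper() in SUFFIXES:
        return int(text[:-1], 0) * SUFFIXES[text[-1].upper()]
    return int(text, 0)


def parse_virtio_mmio(cmdline: str) -> List[VirtioMmioDevice]:
    """Extract every virtio-mmio device described in a kernel command line."""
    devices = []
    for token in cmdline.split():
        if not token.startswith(PREFIX):
            continue
        size_text, rest = token[len(PREFIX):].split("@", 1)
        fields = rest.split(":")  # base, irq and an optional id we ignore
        devices.append(
            VirtioMmioDevice(parse_size(size_text), int(fields[0], 0), int(fields[1], 0))
        )
    return devices
```

Each device found this way provides what a driver needs to start probing: the base address of its registers and the irq to register a handler for.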

Matias E. Vara Larsen