A dedicated unikernel for microservices

Monday, December 31, 2018

Toro in 2018 was great!

Hello folks! The year is ending and I want to summarise what happened in Toro during 2018. Let me begin with the events. We had the opportunity to present Toro at FOSDEM'18 and Open Source Summit Europe 2018. Both conferences were very interesting and we got a ton of feedback. Regarding publications, I had the pleasure of writing an article for the Blaise Pascal Magazine. I hope to continue doing this in 2019. This year was particularly interesting in terms of new features. Here I list a few of them: 
- Virtio network driver
- FAT driver to enable support for Qemu's vfat interface
- Support for the try..except statement
- Support for the -O2 flag when the kernel is compiled
- Optimisation of CPU usage during the idle loop
- Optimisation of the booting time and the size of the generated image. This is going to be presented at FOSDEM'19.
But what will 2019 bring for Toro? Toro is perfect for microservices and during 2019 we will show that. Toro is going to support both blocking and non-blocking sockets: the former for microservices that do IO and the latter for microservices that do not need to block to answer requests. Toro is also going to support more Virtio drivers, e.g., block devices and serial devices. Following that work, I am investigating porting Toro to solutions that propose the use of microVMs, like Firecracker or NEMU. Roughly speaking, these solutions propose a reduced device model in which most devices are not emulated and only VirtIO devices are supported. They have several benefits, like a small footprint and fast booting, which makes them perfect for hosting microservices.

Have a nice 2019!


Tuesday, December 18, 2018

Toro will be at FOSDEM'19!

Hello folks! I will have the pleasure of presenting Toro at FOSDEM'19. This will be my third presentation. This time I am going to talk about how Toro is optimised to speed up the booting time. This is particularly interesting in the context of microservices. When a VM is used to host a microservice, it is powered on/off on demand. This allows cloud providers to save resources like CPU and memory. However, it requires that the VM be up and running very quickly. In this talk, I discuss three approaches that aim to speed up the initialisation of VMs: NEMU, Qboot, and Firecracker (see abstract here). During the talk, I apply these solutions to Toro and discuss their benefits and drawbacks.

Sunday, August 26, 2018

Toro supports the try..except statement and user exceptions!

Hello folks! I just merged to master the commits that support the try..except statement. This allows user applications to handle exceptions. To do this, I had to switch to the Linux RTL, which involved a lot of changes. I updated the wiki in case you want to try it. On Windows, it is necessary to get a Free Pascal cross-compiler from Windows to Linux, which is well explained in the wiki. I hope you enjoy it!

Matias Vara   

Thursday, August 16, 2018

Toro will be present in OSSEU18!

Hello folks! I am very happy to announce that Toro will be in OSSEU'18! For further information check http://sched.co/FxYD. I hope to see you all there!

Cheers, Matias.

Sunday, June 24, 2018

Reducing CPU usage on Toro guests, "The numbers"

Hello folks! I ran some experiments around the latest improvement in Toro regarding reducing energy consumption. I want to thank my very close friend Cesar Bernardini for the experiments. In the tests, we compare an Ubuntu guest with a Toro guest on Qemu. We set up a 2-core machine with 256 MB per core. To bench each kernel, we generate N HTTP requests, stop, and repeat this every interval X. We measure the CPU usage of the Qemu process by using top. We get the following graphs:

Toro without any improvement:
In this graph, you can see that Qemu's process is at 100% all the time. 

Toro with improvements:

With the improvements, Qemu's process is at 100%  only when traffic is received.  

Ubuntu guest:

When Ubuntu is serving traffic, Qemu's process uses between ~40% and ~60%; then, when there is no traffic, CPU usage drops to around ~0%..15%.

In the next experiments, we increase the number of messages.

Toro guest:
When the number of messages is increased, the Toro guest's footprint does not change.

Ubuntu guest:
In the case of an Ubuntu guest, the CPU usage of the Qemu process reaches 100% during traffic. This means that Ubuntu correctly scales CPU usage on demand. 
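For reference, the per-process measurement done with top can be sketched with ps as follows; the snippet samples the shell's own PID so it is self-contained, whereas in the experiment the PID would be that of the Qemu process:

```shell
# Sample the CPU usage of a process a few times, one reading per second.
# Here we sample the shell itself ($$); in the experiment the PID would be
# the Qemu process's, e.g. $(pidof qemu-system-x86_64).
PID=$$
for i in 1 2 3; do
    ps -o %cpu= -p "$PID"
    sleep 1
done
```

Plotting these samples over time gives graphs like the ones above.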

Take Away Lessons:

- In production, the CPU usage of guests is important because the VCPUs are a shared resource
- The approach in Toro has reduced CPU usage by half; however, an overall power management solution must also scale the CPU, i.e., put the processor in a P-state
- The approaches may depend on the hypervisor and its ability to emulate/virtualise the instructions related to power consumption, e.g., monitor/mwait

Thursday, May 24, 2018

Booting Toro in 123ms on QEMU-Lite

Hello folks! I have spent some time porting Toro to QEMU-Lite. This work is still very experimental and can be found in the branch feature-bootforqemu-lite. If you want to know more about QEMU-Lite, check this great presentation. Roughly speaking, QEMU-Lite is an improved version of QEMU dedicated to booting a Linux kernel guest. QEMU-Lite improves the booting time by removing unnecessary steps in the booting process. For example, it removes the BIOS and the need for a bootloader. When QEMU jumps to the kernel code, the microprocessor is already in 64-bit long mode with paging enabled. To make Toro work on QEMU-Lite, I had to remove the whole bootloader and replace it with a simpler one that supports the Multiboot standard. So far I am only able to boot the application ToroHello.pas, which takes only 123 ms to boot. Future work will be to support multiprocessing, so stay tuned!
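For reference, a QEMU-Lite invocation could look like the sketch below; the binary name, the pc-lite machine type, and the remaining flags are assumptions, so adjust them to your build. The command is printed rather than executed:

```shell
# Hypothetical QEMU-Lite invocation (binary name and flags are assumptions).
# -kernel points directly at the guest binary, since there is no BIOS or
# bootloader stage before the CPU jumps to kernel code.
CMD="qemu-lite-system-x86_64 -machine pc-lite -m 256 -kernel ToroHello -nographic"
echo "$CMD"
```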

Cheers, Matias.

Friday, April 20, 2018

Easing the sharing of files between host and guest by using the Qemu Fat feature

Hello folks! I have just committed the first version of a FAT driver. This driver, together with the vfat interface of Qemu, eases the sharing of files between the guest and the host. This new feature relies on a mechanism of Qemu that presents a FAT partition to the guest from a directory on the host. It is enabled by passing "-drive file=fat:rw:ToroFiles", where ToroFiles is the path of a directory on the host machine. By doing so, Qemu presents to the guest a new block device containing a FAT partition with the whole file structure of the ToroFiles directory. Depending on some flags, the partition can be either FAT32 or FAT16. From Qemu's source code, it seems FAT32 is not tested enough, so I decided to support FAT16 only. The main benefit of this mechanism is to ease the sharing of files between guest and host. The main drawback is that you should not modify the directory while the guest is running because Qemu may get confused. To know more about this feature in Qemu, you can visit https://en.wikibooks.org/wiki/QEMU/Devices/Storage. The commit that adds this feature can be found here: https://github.com/MatiasVara/torokernel/commit/2de6631d10202f20db7cef61469ed9e795ed6954. For the moment, the driver allows only read operations. I expect to have write operations soon.
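As a sketch, launching a guest with this feature could look like the following; only the -drive file=fat option comes from the text above, while the remaining flags and the image name are illustrative. The command is printed rather than executed:

```shell
# Boot a Toro image and expose the host directory ToroFiles to the guest
# as a FAT block device. Only the fat: drive option comes from the post;
# the memory size and image name are illustrative.
CMD="qemu-system-x86_64 -m 256 -drive format=raw,file=ToroFiles.img -drive file=fat:rw:ToroFiles"
echo "$CMD"
```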


Saturday, February 10, 2018

Docker image to compile Toro on Linux, Part II

In the first part of this post (here), I explained how to use a Docker image to compile Toro. I worked a bit on this procedure and modified CloudIt.sh to make it use the container. To compile Toro by using CloudIt.sh, you first need to install Docker and then follow these steps:
1. Pull the Docker image from Docker Hub:
docker pull torokernel/ubuntu-for-toro
2. Clone the torokernel git repo
3. Go to torokernel/examples and run:
./CloudIt.sh ToroHello
If everything goes well, you will get ToroHello.img in torokernel/examples. In addition, if you have KVM installed, you will get an instance of a Toro guest running ToroHello.
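The steps above can be collected into a small script; the snippet only writes the script rather than running it, since Docker is required, and the clone URL is the one given elsewhere in this blog:

```shell
# Collect the three steps into a script (requires docker and git to actually run).
cat > build-torohello.sh <<'EOF'
#!/bin/sh
docker pull torokernel/ubuntu-for-toro
git clone https://github.com/MatiasVara/torokernel.git
cd torokernel/examples
./CloudIt.sh ToroHello
EOF
chmod +x build-torohello.sh
```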


Monday, February 05, 2018

Docker image to compile Toro

Hello folks! I just created a Docker image to compile the Toro kernel examples. You can find the image at https://hub.docker.com/r/torokernel/ubuntu-for-toro/. To try it, follow these steps:

1. Install docker. You can find a good tutorial in https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce-1

2. Once installed, in a command line run:

 docker pull torokernel/ubuntu-for-toro

3. Clone the ToroKernel repository, which provides the code to be compiled:

git clone https://github.com/MatiasVara/torokernel.git

and then change the current directory to ./torokernel

4. In a command line, run:

sudo docker run -it -v $(pwd):/home/torokernel torokernel/ubuntu-for-toro bash 

This command opens a bash shell in which the current directory, i.e., the torokernel directory, is mounted at /home/torokernel. So now we can just go to /home/torokernel/examples and run:

wine c:/lazarus/lazbuild.exe ToroHello.lpi

This will compile and build ToroHello.img. When we are done in the container, we can exit.


Thursday, January 25, 2018

Toro supports Virtio network drivers!

Hello folks! For the last three weeks, I have been working on adding support for virtio network drivers in Toro (see VirtIONet.pas). In a virtualisation environment, virtio drivers have many benefits:
- they perform better than e1000 or other emulated network cards.
- they abstract away the hardware of the host, thus enabling the drivers to work on different hardware.
- they are a standard way to talk to network cards which is supported by many hypervisors like KVM, QEMU, or VirtualBox.
The way that virtio network cards work is quite simple. They are based on the notion of a virtqueue. In the case of networking, network cards mainly have two queues: the reception queue and the transmission queue. Roughly speaking, each queue has two rings of buffers: the available buffers and the used buffers. To provide a buffer to the device, the driver puts buffers in the available ring; the device then consumes them and puts them in the used ring. For example, in the case of the reception queue, the driver feeds the device by putting buffers in the available ring. Then, when a packet arrives, the device takes buffers from the available ring, writes the content, and puts them in the used ring. You can find plenty of bibliography on the internet. However, I would recommend this post, which also includes the C code of the drivers. I think testing is the hardest part. I found different behaviours depending on where you are testing, e.g., KVM or QEMU. For example, in KVM, if you don't set model=virtio-net, the driver just does not work. To test, I basically have my own version of QEMU which prints all the logs straight to stdout. Also, Wireshark helps to trace the traffic and find duplicate packets or other kinds of misbehaviour. The good part: when you are done with one virtio network driver, you can easily use it as a template because all virtio drivers are very similar. I had no time yet to compare with e1000 but I am expecting good numbers :)
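As a reference, launching a guest with a virtio NIC on Qemu can look like the sketch below; only the virtio-net device itself comes from the text above, while the remaining flags and the image name are illustrative. The command is printed rather than executed:

```shell
# Boot a guest with a virtio network card; without a virtio model the
# driver will not bind. Only the virtio-net device is taken from the post;
# the memory size, image name, and netdev backend are illustrative.
CMD="qemu-system-x86_64 -m 256 -drive format=raw,file=ToroHello.img \
 -netdev user,id=n0 -device virtio-net-pci,netdev=n0"
echo "$CMD"
```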

Cheers, Matias. 

Friday, January 12, 2018

Toro compiles with -O2 option

Hello everyone, I spent the last days trying to compile the Toro kernel with the -O2 option. This option tells the compiler to optimise execution by using the CPU registers. This means that the compiler keeps data in registers instead of in memory. This is a huge improvement in performance since access to registers is far faster than access to memory. I give further details about this issue at https://github.com/MatiasVara/torokernel/issues/135. The problem I faced was that some assembler functions were violating the Windows x64 ABI. In other words, these functions were not restoring the values of registers that the compiler uses. In addition, the IRQ handlers were not doing so either. After fixing all these issues, I am able to compile with -O2 and the result is already in master. A simple comparison of TorowithFilesystem.pas shows a speedup of ~12%! Compilation with -O3 is also possible but I did not run any benchmarks.
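For illustration, this is how the flag would be passed in a direct Free Pascal invocation; the target flags and the source file name here are assumptions, since the real build sets -O2 through the Lazarus project options. The command is printed rather than executed:

```shell
# Illustrative fpc invocation with -O2 (target flags and file name are
# assumptions; the real build sets the flag in the Lazarus project options).
CMD="fpc -O2 -Twin64 -Px86_64 TorowithFilesystem.pas"
echo "$CMD"
```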