A dedicated unikernel for microservices

Thursday, December 30, 2010

x86 rings protection on Toro

In the x86 architecture there are four ring levels. Ring 0 is the most privileged level and ring 3 is the least. In an OS, the kernel runs in ring 0 and user applications run in ring 3.

Ring 0 descriptors are used by the kernel and ring 3 descriptors are used by the user. The GDT supports up to 8192 descriptors, but the OS uses just four: two for the kernel's text and data, and two for the user's text and data. With these descriptors the kernel can access all memory, for example 4 GB in 32-bit mode.

When the OS uses privilege levels, the processor has to check that every operation is valid, and these checks add latency. In a multitasking OS protection is essential because it protects the kernel code and data.

But what about a dedicated multithreaded application? It runs alone on the system and was written carefully, so do we need protection in this case? If we assume that we don't, we can simplify the OS considerably.
For example, if we want to implement syscalls, we don't need traps. If the kernel and the user application are in the same ring, we can simply use the “call” instruction to invoke the kernel's function. Currently, OSes use interrupts to support syscalls, but they are expensive. Don't forget that each syscall jumps from ring 3 to ring 0 and back.

On the other hand, when a user application is running and an interrupt occurs, the processor has to jump from ring 3 to ring 0, which is expensive. In general, a kernel procedure handles the interrupt. If the user application runs at the same level as the kernel, we don't spend time on those transitions.

In TORO, the kernel and the user application both run in ring 0. Because the kernel and the application are compiled together, syscalls are easy to implement: they just use the “call” instruction.

Matias E. Vara

Friday, November 26, 2010


Since 2006, the year in which TORO's objectives were modified, the TORO OS project corresponding to versions 1.xx has been discontinued. Those versions were very successful in terms of functionality. The most stable version was 1.1.3. Sometimes I look at the source of those versions and feel very sorry to have abandoned them; however, it was impossible for me to continue with both projects simultaneously. So I decided to pay a small tribute to those versions: below I show how to test TORO 1.1.3 with BOCHS. I've included some screenshots so you can see the beauty of a shell written in PASCAL. Enjoy it!

These simulations require x86 Bochs; remember that version 1.1.3 is 32-bit only. Here are the contents of the torobch.bxrc file.

megs: 256

romimage: file=BIOS-bochs-latest, address=0xf0000

floppya: 1_44=toro-1.1.3.img, status=inserted

boot: floppy

You need to download the toro-1.1.3 image from the link:


If everything is all right, the first window you see when you run BOCHS will be:

This is the GRUB bootloader; select TORO-1.1.3 there and press Enter.

It will start loading the OS and then the shell:

We are ready to enter commands in TORO. The first command we will look at is ls, which, as you already know, lists the current directory.

Now we use cd to go to the directory that contains TORO's source.

Next we run echo printk.pas, which displays the file's contents on screen.

You can see all the shell commands in the /BIN directory; these are:

Running reboot shuts the system down, and we can then turn off the virtual machine.

I hope you enjoyed it; experiment by running commands yourself. You can also write the image to a 3½-inch floppy and try it on a real machine.

Note: versions 1.x.x have no relation to versions 0.xx; they are different things.

Matias E. Vara


Saturday, November 20, 2010

Toro Builder uploaded!

I uploaded the new interface for Toro development. You can easily compile and debug the kernel using ECLIPSE and QEMU. To download it go here, but first read the new WIKI for more information. I am working hard to write useful information.
If you want to be an EDITOR of the WIKI, contact me at torokernel@gmail.com. Regards.

Matias E. Vara

Wednesday, November 10, 2010

Context Switching

In this article I will briefly describe “Context Switching” and, in particular, how TORO has implemented it since version 0.01. I won't discuss how TORO OS implements it in versions 1.xx; I'll just say that those versions use hardware “Context Switching”. In future articles I will show the implementation of these ideas in Pascal. If you have doubts, see the references! Enjoy it!

As we know, in the kernel the scheduler is in charge of dispatching threads. Apart from implementing the scheduling algorithm, it performs the "Context Switching" procedure. When it selects a new thread, the scheduler fills the processor registers with the values they had just before the thread invoked the kernel (NOTE: this comment relates to TORO's scheduling algorithm, which uses cooperative threads).

Figure 1. Procedure for loading a new process.

In the x86-64 architecture, some of the general-purpose registers are RAX, RBX, RCX, etc. In addition to these registers, some system registers must also be updated, such as CR3, which stores information about the page directory of the thread that will be loaded. This procedure is named "Context Switching" and is a critical operation: because it runs continuously, it must be very fast.

The “Context Switching” procedure can be implemented in software or in hardware.

When it is implemented in hardware, it uses the mechanisms that a particular architecture provides for “Context Switching”. For example, the x86 architecture uses structures named “task descriptors”, which live in the GDT (Global Descriptor Table); when a new task must be loaded, the "call" instruction is simply issued on the task descriptor (known in the literature as the TSS) [1].

On the other hand, in the software implementation the “Context Switching” is done "by hand": a routine written by the programmer is in charge of saving the register values.

At first glance, hardware “Context Switching” seems to be the best option because the programmer isn't involved and it is done "automatically". However, hardware “Context Switching” usually saves all register values, even when not all registers are in use. That's why the hardware implementation may not be the best option.

For this reason, “Context Switching” in TORO is implemented in software: programming techniques are used instead of the “Context Switching” mechanism offered by a particular hardware. When the scheduler selects a new thread, it loads the processor's registers with the values corresponding to the new thread, which then starts running. The thread resumes execution right after the point where it called the SysThreadSwitch procedure.

Since “Context Switching” always happens after a call to the SysThreadSwitch function, the scheduler assumes that at that moment the processor's registers are not being used by the user application. It therefore only has to save the stack state of the thread being removed. In the x86-64 implementation, this is achieved by saving in the TThread structure the value of the RSP register, which holds the thread's position within its stack.
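The cooperative model described above can be sketched in a few lines of Python (an analogy, not TORO's Pascal code). In this sketch, which rests on assumptions of mine, each thread is a generator, a `yield` plays the role of a SysThreadSwitch call, and the suspended generator stands in for the saved RSP value: the scheduler keeps nothing else about the thread. The names worker and run are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative thread: each `yield` stands in for a SysThreadSwitch call."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand the CPU back to the scheduler


def run(threads):
    """Round-robin scheduler that only keeps each thread's suspended frame."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)      # resume right after the thread's last yield
        except StopIteration:
            continue          # the thread finished, so drop it
        ready.append(thread)  # otherwise put it back at the end of the queue


log = []
run([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

Because a thread is only ever switched out at a point it chose itself, nothing beyond its resume point needs saving, which is the property the scheduler relies on.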

The “Context Switching” implemented in TORO is faster than the hardware one and than the one implemented in a general-purpose OS. The choice of “Context Switching” method is directly related to the cooperative thread model.

Therefore, using software “Context Switching” adds portability and speed [2].

[1]. Intel. IA-32 Intel® Architecture Software Developer’s Manual, Vol. 3. 2004.

[2]. Osdev Wiki, Context Switching, http://wiki.osdev.org/Context_Switching.

Matias E. Vara


Monday, November 08, 2010

Changes on SVN

I'll be working on the SVN directories, so the repository may be offline for a few hours. You should run an update on SVN.

Matias E. Vara

Thursday, October 28, 2010

BugCon Presentation

Here is the link to Toro's presentation at BugCon 2010. Enjoy!

PS: The presentation is in English. Thanks Eugene!

Matias E. Vara

Sunday, September 19, 2010

Threads migration without Lock in Toro

In a multicore environment, the programmer needs to create local and remote threads. In TORO, creating a remote thread is easy: you just call BeginThread() with the appropriate CPU identifier. On that basis, there are two important procedures in TORO:

- Thread Emigrating: when threads are created on a remote processor.
- Thread Inmigrating: when the destination processor enqueues in its scheduler the threads coming from other processors.

This is the only point in the kernel that needs synchronization between cores. The mechanism is called "Exchange Slot" and it works without any atomic operation. Here it is used to send and receive threads, but it works with any kind of data.

For every processor in TORO there is a structure called TSchedulerExchangeSlot:

TSchedulerExchangeSlot = record
  DispatcherArray: array[0..MAX_CPU-1] of PThread;
  EmigrateArray: array[0..MAX_CPU-1] of PThread;
end;


Here MAX_CPU is the number of processors and PThread is a pointer to a TThread structure. From the structure declaration we can see that every processor has two arrays (DispatcherArray and EmigrateArray), and every entry in each array is a pointer to a thread queue.

The procedure for sending threads to a remote processor has three stages:
1- The user calls BeginThread() to create a new thread; if the CPUID parameter differs from the local CPU, the kernel enqueues the thread in DispatcherArray[CPUID].
2- During scheduling (triggered by the SysThreadSwitch syscall), the Emigrating() procedure moves all threads from DispatcherArray[] to EmigrateArray[] (only if the EmigrateArray[] entry is nil).
3- During scheduling on the remote CPU, the Inmigrating() procedure looks for a non-nil entry in EmigrateArray[LocalCPUID] in every processor's TSchedulerExchangeSlot structure. If an entry is not nil, it imports all of its threads into the local scheduler and sets EmigrateArray[LocalCPUID] to nil.
The local processor is the only one that reads and writes DispatcherArray[]. Both the local and the remote processor read and write EmigrateArray[], but access is synchronized through the nil pointer.
The “Exchange Slot” doesn't need the "LOCK" instruction.
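The three stages can be modelled with a small single-threaded Python simulation (a sketch of the hand-off protocol only, not TORO's Pascal code and not a demonstration of real multicore memory ordering). The assumptions here are mine: Python lists stand in for thread queues, None stands in for nil, MAX_CPU is set to 4, and the names ExchangeSlot, slots, run_queues, begin_thread, emigrating and inmigrating are hypothetical stand-ins for the structures and procedures named above.

```python
MAX_CPU = 4  # hypothetical number of cores for this sketch


class ExchangeSlot:
    """Per-CPU slot mirroring TSchedulerExchangeSlot; None plays the role of nil."""

    def __init__(self):
        self.dispatcher = [None] * MAX_CPU  # touched only by the owning CPU
        self.emigrate = [None] * MAX_CPU    # handed over to the destination CPU


slots = [ExchangeSlot() for _ in range(MAX_CPU)]
run_queues = [[] for _ in range(MAX_CPU)]  # each CPU's local scheduler queue


def begin_thread(local_cpu, target_cpu, thread):
    """Stage 1: queue a remote thread in the creator's dispatcher array."""
    if target_cpu == local_cpu:
        run_queues[local_cpu].append(thread)
        return
    slot = slots[local_cpu]
    if slot.dispatcher[target_cpu] is None:
        slot.dispatcher[target_cpu] = []
    slot.dispatcher[target_cpu].append(thread)


def emigrating(local_cpu):
    """Stage 2: publish pending threads, but only into entries that are free."""
    slot = slots[local_cpu]
    for dst in range(MAX_CPU):
        if slot.dispatcher[dst] is not None and slot.emigrate[dst] is None:
            queue = slot.dispatcher[dst]
            slot.dispatcher[dst] = None
            slot.emigrate[dst] = queue  # publishing is the last write


def inmigrating(local_cpu):
    """Stage 3: import threads addressed to this CPU and free the entries."""
    for src in range(MAX_CPU):
        queue = slots[src].emigrate[local_cpu]
        if queue is not None:
            run_queues[local_cpu].extend(queue)
            slots[src].emigrate[local_cpu] = None  # tells the sender it may reuse it
```

Each EmigrateArray entry has exactly one writer at a time: the sender writes it only while it is nil, and the receiver writes it only while it is not nil, which is why the nil pointer is enough to hand the entry back and forth.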

The Inmigrating and Emigrating procedures are called from the scheduler. The scheduler performs a few system tasks; for example, the picture shows the scheduler's flow diagram. First it calls Inmigrating(), then it calls Emigrating(), and finally a new thread is scheduled.

Matias E. Vara

Thursday, July 22, 2010

Toro in BugCon 2010!

I'll be at the BugCon 2010 meeting in Mexico DF. I'll talk about the problems of modern systems in multicore environments and how Toro fixes them. I'll also compile and test Toro step by step using Eclipse and QEMU. You can see the other speakers here.
Matias E. Vara

Monday, April 12, 2010

Testing TORO using QEMU and ECLIPSE on Win64

I uploaded a video.

It shows an environment for compiling and testing Toro quickly with QEMU and Eclipse. I'll upload a good tutorial in a few weeks. Meanwhile, I have been preparing a release with all the tools needed to compile and emulate TORO.

Matias E. Vara

Monday, April 05, 2010


Hi! I was looking for a good tool to compile and test Toro quickly, and I found one: I am testing Eclipse + QEMU on 64-bit Ubuntu, and I can run Toro step by step, among other things.
I am using a plug-in (Pascaline) to support the Pascal language in Eclipse.
Here are a few screenshots, but at the moment it doesn't work 100%.

Qemu's session.


Thursday, January 07, 2010

Lazarus + GDB + QEMU

I am working hard to build a platform to compile, run and test TORO quickly. To that end I am making a few modifications to Lazarus' source. Lazarus is an excellent IDE for the Free Pascal Compiler.
The BUILD program was integrated into Lazarus, so when you press "RUN" Lazarus does the following:
- builds toro.exe
- builds ToroImage.img
- runs QEMU
- debugs TORO using Lazarus (GDB client ---> QEMU GDB server).
Lazarus can be compiled for Linux or Windows, so it is easy to test TORO on both platforms.
Matias E. Vara