A dedicated kernel for multi-threading applications.

Saturday, January 15, 2011

Memory organization in a Multicore system

This paper is part of my final project, "Parallel Algorithm with TORO kernel", for the Electronic Engineering degree at Universidad Nacional de La Plata. Over the next months I will publish more papers about the project. Enjoy!

Currently, "Uniform Memory Access" (UMA) is the most common way to access memory (see SMP). In this kind of architecture, every processor can read every byte of memory, and the processors are independent of each other. A shared bus is used: the processors compete for it, but only one can read or write at a time. In these environments, only one processor can access a given byte at a given time. For programmers, memory access is transparent.


In 1995 Intel released the Pentium Pro, a processor designed for SMP, and its memory bus was called the Front Side Bus (FSB).

It is a bidirectional bus that is very simple and very cheap and that, in theory, scales well.

Intel's next step was to split the FSB into two independent buses, but cache coherency became a bottleneck.

In 2007 a dedicated bus per processor was implemented.

This kind of architecture is used by Intel's Atom, Celeron, Pentium and Core 2 processors.

In a system with many cores, the traffic through the FSB is heavy. The FSB doesn't scale, and it has a limit of 16 processors per bus. So the FSB is a wall for the new multicore technology.
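To make the scaling problem concrete, here is a toy model of a shared bus, written in Python. It is only a sketch under simplifying assumptions: one memory transfer is granted per bus cycle, a round-robin arbiter picks the next processor, and every processor always has a request ready. The function name and the numbers are hypothetical and do not describe any real FSB.

```python
def simulate_shared_bus(num_cpus, requests_per_cpu):
    """Toy round-robin arbiter: exactly one transfer is granted per bus cycle.

    Returns the total number of bus cycles and, for each CPU, how many
    cycles it spent stalled while another CPU owned the bus.
    """
    pending = [requests_per_cpu] * num_cpus
    stalled = [0] * num_cpus
    cycles = 0
    turn = 0
    while any(pending):
        # Skip CPUs that have nothing left to transfer.
        while pending[turn] == 0:
            turn = (turn + 1) % num_cpus
        pending[turn] -= 1  # the bus owner completes one transfer
        # Every other CPU that still has work waits during this cycle.
        for cpu in range(num_cpus):
            if cpu != turn and pending[cpu] > 0:
                stalled[cpu] += 1
        turn = (turn + 1) % num_cpus
        cycles += 1
    return cycles, stalled


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        cycles, stalled = simulate_shared_bus(n, requests_per_cpu=4)
        print(f"{n} CPUs: {cycles} bus cycles, stalled cycles per CPU = {stalled}")
```

In this model the total time grows linearly with the number of processors, because the bus serializes every access; adding cores adds waiting, not bandwidth.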

A CPU may execute instructions quickly, but that speed is wasted if it cannot fetch and decode them just as quickly. In the best case, we lose one extra cycle reading from memory.

Since 2001 the FSB has been replaced by point-to-point interconnects such as HyperTransport or Intel QuickPath Interconnect. That changed the memory model to Non-Uniform Memory Access (NUMA).

Matias E. Vara www.torokernel.org


Thursday, December 30, 2010

x86 rings protection on Toro

The x86 architecture has 4 ring levels. Ring 0 is the most privileged level and ring 3 is the least. In an OS, the kernel runs in ring 0 and user applications run in ring 3.

Ring 0 descriptors are used by the kernel and ring 3 descriptors are used by user code. The GDT supports up to 8192 descriptors, but the OS uses just 4: two for the kernel's text and data, and two for the user's text and data. With these descriptors the kernel can access all memory, for example 4 GB in 32-bit mode.
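As an illustration of those four descriptors, the following Python sketch packs flat 4 GB segments into the 64-bit descriptor format used by the x86 GDT. The helper names and the order of the entries are assumptions made for this example (only the null entry at index 0 is required by the architecture); this is not TORO's actual GDT.

```python
MAX_GDT_ENTRIES = 1 << 13   # a selector has a 13-bit index, hence 8192 entries
FLAT_LIMIT = 0xFFFFF        # 20-bit limit, counted in 4 KiB pages
FLAGS_4K_32BIT = 0xC        # G=1 (4 KiB granularity), D/B=1 (32-bit segment)


def access_byte(dpl, executable):
    """Access byte for a present code or data segment at privilege level dpl."""
    present = 0x80                             # P bit
    non_system = 0x10                          # S bit: code/data segment
    seg_type = 0x0A if executable else 0x02    # execute/read or read/write
    return present | (dpl << 5) | non_system | seg_type


def make_descriptor(base, limit, access, flags):
    """Pack base, limit, access byte and flags into a 64-bit descriptor."""
    desc = limit & 0xFFFF                      # limit bits 0..15
    desc |= (base & 0xFFFFFF) << 16            # base bits 0..23
    desc |= (access & 0xFF) << 40              # access byte
    desc |= ((limit >> 16) & 0xF) << 48        # limit bits 16..19
    desc |= (flags & 0xF) << 52                # G, D/B, L, AVL
    desc |= ((base >> 24) & 0xFF) << 56        # base bits 24..31
    return desc


def make_selector(index, rpl):
    """Segment selector for a GDT entry (TI bit left at 0)."""
    return (index << 3) | rpl


# Hypothetical layout: null entry, then kernel text/data, then user text/data.
gdt = [
    0,
    make_descriptor(0, FLAT_LIMIT, access_byte(0, True), FLAGS_4K_32BIT),
    make_descriptor(0, FLAT_LIMIT, access_byte(0, False), FLAGS_4K_32BIT),
    make_descriptor(0, FLAT_LIMIT, access_byte(3, True), FLAGS_4K_32BIT),
    make_descriptor(0, FLAT_LIMIT, access_byte(3, False), FLAGS_4K_32BIT),
]

if __name__ == "__main__":
    for index, entry in enumerate(gdt):
        print(f"entry {index}: {entry:016X}")
    print("bytes covered by a flat segment:", (FLAT_LIMIT + 1) * 4096)
```

The only difference between the kernel and user entries is the DPL field in the access byte; all four segments start at 0 and cover the whole 32-bit address space.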

When the OS uses privilege levels, the processor has to check whether every operation is valid, and these mechanisms add latency. In a multitasking OS protection is essential, because it protects the kernel code and data.

But what happens with a dedicated multithreaded application? It runs alone in the system and it was written carefully, so do we need protection in this case? Assuming we do not, we can simplify the OS a lot.
For example, if we want to implement syscalls, we don't need traps. If the kernel and the user application are in the same ring, we can just use the “call” instruction to invoke the kernel's functions. Currently, OSes use interrupts to support syscalls, but they are too expensive. Don't forget that we are jumping from ring 3 to ring 0 and back.

On the other hand, when a user application is running and an interrupt occurs, the processor has to jump from ring 3 to ring 0, which is too expensive. In the general case, a kernel procedure handles the interrupt. If the user application runs at the same level as the kernel, we don't spend time on that latency.

On TORO, the kernel and the user application run in ring 0. And because the kernel and the app are compiled together, syscalls are easy to implement: they just use the “call” instruction.
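The difference between the two approaches can be sketched with a toy model in Python. This is not TORO code: the class, the `write` service and the syscall number are hypothetical, and the ring switch is only represented by a counter. It simply contrasts a syscall that goes through a trap and a dispatch table with one that is a plain function call.

```python
class ToyKernel:
    """Hypothetical kernel with a single service, used only for illustration."""

    SYS_WRITE = 0  # made-up syscall number

    def __init__(self):
        self.console = []
        self.ring_switches = 0
        self.syscall_table = {self.SYS_WRITE: self.write}

    def write(self, text):
        """Kernel service: append text to the console, return its length."""
        self.console.append(text)
        return len(text)

    def trap(self, number, *args):
        """Trap-style syscall: enter ring 0, dispatch by number, return to ring 3."""
        self.ring_switches += 1          # ring 3 -> ring 0
        try:
            return self.syscall_table[number](*args)
        finally:
            self.ring_switches += 1      # ring 0 -> ring 3


# Protected OS: the application reaches the kernel through a trap.
protected = ToyKernel()
protected.trap(ToyKernel.SYS_WRITE, "hello")

# Dedicated kernel: application and kernel share a ring, so it is a direct call.
dedicated = ToyKernel()
dedicated.write("hello")

if __name__ == "__main__":
    print("trap-based syscall, ring switches:", protected.ring_switches)
    print("direct call, ring switches:", dedicated.ring_switches)
```

Both paths do the same work in the kernel; the direct call just skips the two privilege transitions and the table lookup.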

Matias E. Vara

Friday, November 26, 2010

Old TORO

Since 2006, the year in which TORO's objectives were modified, the TORO OS project corresponding to versions 1.xx has been discontinued. These versions achieved great success in terms of functionality. The most stable version was 1.1.3. Sometimes I look at the source of those versions and feel very sorry to have abandoned them; however, it was impossible for me to continue with both projects simultaneously. So I decided to pay a small tribute to those versions: below I show how to test TORO 1.1.3 with BOCHS. I've included some screenshots so you can observe the beauty of a shell in PASCAL. Enjoy it!

These simulations require Bochs for x86; remember that version 1.1.3 is 32-bit only. Here are the contents of the torobch.bxrc file:

megs: 256
romimage: file=BIOS-bochs-latest, address=0xf0000
floppya: 1_44=toro-1.1.3.img, status=inserted
boot: floppy

You need to download the toro-1.1.3 image from this link:

http://sourceforge.net/projects/toro/files/images/toro-1.1.3/toro-1.1.3.img/download

If everything is all right, the first window you will see when you run BOCHS is the GRUB bootloader. There, select TORO-1.1.3 and press Enter.

It will start loading the OS and then the shell.

We are ready to enter commands on TORO. The first command we will look at is ls, which, as you already know, lists the current directory.


Now we go to the directory that contains TORO's source using cd.

Next we run echo printk.pas, which displays the file's contents on screen.


You can see all the shell commands in the /BIN directory.

Running reboot shuts the system down, and we can then turn off the virtual machine.

I hope you enjoyed it; try running commands and experimenting on your own. You can also write the image to a 3½-inch floppy and try it on a real machine.

Attention: versions 1.x.x have no relation to versions 0.xx; they are different things.

Matias E. Vara

www.torokernel.org

Saturday, November 20, 2010

Toro Builder uploaded!

I uploaded the new interface for Toro development. You can easily compile and debug the kernel using ECLIPSE and QEMU. To download it, go here, but first read the new WIKI for more information. I am working hard to write useful information.
If someone wants to be an EDITOR of the WIKI, contact me at torokernel@gmail.com. Regards.

Matias E. Vara
www.torokernel.org