Friday, June 24, 2011
Toro bootloader
How can I start it? The bootloader is a project in itself; if you want to write a hobby OS, you do not have to start with the bootloader. First, it will take you a lot of time; second, it is very hard to debug, so you will quickly become discouraged and will not finish. I think the important and interesting things happen inside the kernel. Still, there are a few crazy people who want to write one. For them, I have just started writing some documentation about Toro's bootloader in the wiki. I hope you find it interesting and appreciate the effort (yes, I don't like writing documentation, but I know it is very important ;) ).
Matias E. Vara
www.torokernel.org
Sunday, April 17, 2011
Memory Protection in a multicore environment
This post is taken from Matias Vara's final paper, “Paralelizacion de Algoritmos Numericos con TORO Kernel”, written to obtain the degree in Electronic Engineering from the Universidad de La Plata. These theoretical documents help to explain the kernel design.
Introduction
When a kernel is designed for a multicore system, shared memory must be protected from concurrent write accesses. Memory protection increases the kernel's code complexity and decreases the operating system's performance. If two or more processors access the same data at the same time, mutual exclusion must be enforced to protect the shared data.
In a uniprocessor multitasking system the scheduler switches tasks frequently, so the only risk is that the scheduler takes a task off the CPU while it is modifying shared data. Protection in this case is easy: disable the scheduler while the task is in a critical section and enable it again afterwards.
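The uniprocessor approach can be sketched in C as follows. This is only a simulation under stated assumptions: `scheduler_enabled`, `timer_tick` and `add_to_total` are hypothetical names invented for illustration, not Toro's API, and the timer tick is called by hand inside the critical section to stand in for an interrupt arriving at the worst moment.

```c
#include <stdbool.h>

/* Hypothetical single-CPU scheduler state (illustrative names only). */
bool scheduler_enabled = true;
int  current_task = 0;      /* index of the task that owns the CPU */
int  shared_total = 0;      /* data shared between the two tasks   */

/* Simulated timer tick: switches to the other task only when allowed. */
void timer_tick(void) {
    if (scheduler_enabled)
        current_task = (current_task + 1) % 2;
}

/* Read-modify-write protected by switching the scheduler off and on.
   Returns the task that owns the CPU when the section ends. */
int add_to_total(int amount) {
    scheduler_enabled = false;     /* enter critical section            */
    int tmp = shared_total;        /* read                              */
    timer_tick();                  /* simulated tick: no switch happens */
    shared_total = tmp + amount;   /* write                             */
    scheduler_enabled = true;      /* leave critical section            */
    return current_task;
}
```

Calling `add_to_total(5)` leaves the same task on the CPU for the whole update; a tick that arrives after the section ends switches tasks as usual.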
In a multiprocessor system that solution does not work. When tasks run in parallel, two or more of them may execute the same line at the same time; hence the scheduler state does not matter.
Resource protection
To protect resources in a multiprocessing system we need to define atomic operations. These are implemented as a single assembler instruction, although that instruction takes several clock cycles to execute.
Atomic operations
On every processor, a single read or write (of an aligned machine word) is atomic. This means that while the operation is executing, no other processor is using that memory location.
For certain kinds of operations the processor locks the memory bus; for this purpose it provides the LOCK# signal, which is used for critical memory operations. While this signal is asserted, requests from other processors are blocked.
Access to the memory bus is non-deterministic: whichever processor requests it first gets it. All processors compete for the bus, so in a system with many processors it becomes a bottleneck.
But why do we need atomic operations? Suppose we have to increment a counter; the Pascal source is:
counter := counter +1;
If this line is executed at the same time on several processors and is not atomic, the result will be incorrect.
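To see why, it helps to split the increment into its load, add and store steps. The C sketch below replays one bad interleaving deterministically, without real threads; `reg_a` and `reg_b` are hypothetical stand-ins for the registers of two processors, and the counter is assumed to start at 0.

```c
/* Two CPUs run "counter := counter + 1", each as load / add / store.
   Both load before either stores, so one update is lost. */
int lost_update_demo(void) {
    int counter = 0;
    int reg_a = counter;   /* CPU A loads 0                     */
    int reg_b = counter;   /* CPU B loads 0 before A stores     */
    reg_a = reg_a + 1;     /* A computes 1                      */
    reg_b = reg_b + 1;     /* B computes 1                      */
    counter = reg_a;       /* A stores 1                        */
    counter = reg_b;       /* B stores 1, overwriting A's store */
    return counter;
}

/* The same two increments when each one completes before the next starts. */
int serialized_demo(void) {
    int counter = 0;
    int reg_a = counter; reg_a = reg_a + 1; counter = reg_a;  /* CPU A */
    int reg_b = counter; reg_b = reg_b + 1; counter = reg_b;  /* CPU B */
    return counter;
}
```

`lost_update_demo()` returns 1 even though two increments ran, while `serialized_demo()` returns 2.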
For example, with two processors each incrementing a counter that starts at 0, the correct value is 2. With atomic operations the processors access the variable one at a time and the result is correct. The synchronization time grows with the number of processors. The common atomic operations are "TEST and SET" and "COMPARE and SWAP".
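As a rough illustration, these operations can be written with the standard C11 `<stdatomic.h>` header. This is a sketch, not how Toro implements them: the names `shared_counter`, `spin_flag` and the helper functions are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int shared_counter = 0;

/* Atomic increment: the read-modify-write is one indivisible operation.
   Returns the new value. */
int atomic_increment(void) {
    return atomic_fetch_add(&shared_counter, 1) + 1;
}

/* The same increment built on COMPARE and SWAP: retry until no other
   processor changed the counter between our read and our write. */
int cas_increment(void) {
    int seen = atomic_load(&shared_counter);
    while (!atomic_compare_exchange_weak(&shared_counter, &seen, seen + 1)) {
        /* on failure 'seen' was refreshed with the current value */
    }
    return seen + 1;
}

/* TEST and SET spinlock: the flag is set and its previous value is
   returned in a single atomic step. */
atomic_flag spin_flag = ATOMIC_FLAG_INIT;

void spin_lock(void)    { while (atomic_flag_test_and_set(&spin_flag)) { /* busy-wait */ } }
bool spin_trylock(void) { return !atomic_flag_test_and_set(&spin_flag); }
void spin_unlock(void)  { atomic_flag_clear(&spin_flag); }
```

Every processor that calls `atomic_increment` or `cas_increment` contributes exactly one increment, regardless of timing. The spinlock shows the cost side of the design: a processor that fails the test-and-set keeps retrying, and each retry competes for the memory bus.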
Impact of atomic operations
In a system with a few processors, atomic operations are not a big deal and they are a fast solution to the shared memory problem; however, as the number of processors increases they become a bottleneck.
Consider a computer with 8 cores running at 1.45 GHz [1]: while the average instruction time is 0.24 ns, an atomic increment takes 42.09 ns. The time wasted on locking becomes critical.
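To put those figures in proportion, the ratios can be worked out directly from the numbers quoted from [1]; the symbols are introduced here only for the calculation.

```latex
\[
\frac{t_{\text{atomic}}}{t_{\text{instr}}} = \frac{42.09\ \text{ns}}{0.24\ \text{ns}} \approx 175,
\qquad
t_{\text{cycle}} = \frac{1}{1.45\ \text{GHz}} \approx 0.69\ \text{ns},
\qquad
\frac{t_{\text{atomic}}}{t_{\text{cycle}}} \approx \frac{42.09}{0.69} \approx 61
\]
```

In other words, one atomic increment costs about as much as 175 ordinary instructions, or roughly 61 clock cycles.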
[1] Paul McKenney: RCU vs. Locking Performance on Different Types of CPUs.
http://www.rdrop.com/users/paulmck/RCU/LCA2004.02.13a.pdf, 2005
Labels: atomic operations, locks, protection, SMP
Tuesday, April 05, 2011
Memory organization in a multicore system: Conclusion.
From the programmer's point of view, access to local and remote memory is transparent. A NUMA organization could be implemented on an SMP system without any problem. However, the OS must allocate memory efficiently to take advantage of these technologies.
In SMP, memory management is easy to implement, while in NUMA it is not. The system has to allocate memory according to the CPU where the process is running. Every CPU has its own memory bank. System performance is poor if there are more remote accesses than local ones.
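A local-first allocation policy of this kind might look like the C sketch below. It is an illustration under assumed conditions, not Toro's allocator: the four-CPU topology, the `bank_free` table and `numa_pick_bank` are all hypothetical.

```c
#include <stddef.h>

#define NUM_CPUS 4   /* assumed topology: one memory bank per CPU */

/* Hypothetical bookkeeping: free bytes left in each CPU's bank. */
size_t bank_free[NUM_CPUS] = { 4096, 4096, 4096, 4096 };

/* Choose the bank that serves a request from a task running on 'cpu'.
   The local bank is preferred; a remote bank is used only when the
   local one cannot satisfy the request. Returns the bank index, or -1
   when no bank has enough free memory. */
int numa_pick_bank(int cpu, size_t bytes) {
    if (bank_free[cpu] >= bytes) {
        bank_free[cpu] -= bytes;
        return cpu;                        /* local access */
    }
    for (int i = 1; i < NUM_CPUS; i++) {
        int bank = (cpu + i) % NUM_CPUS;   /* remote access, slower */
        if (bank_free[bank] >= bytes) {
            bank_free[bank] -= bytes;
            return bank;
        }
    }
    return -1;
}
```

A request from CPU 2 is served from bank 2 while that bank has room, and spills to a neighbouring bank only afterwards, which keeps remote accesses the exception rather than the rule.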
Windows has supported NUMA since the 2003 version and Linux since 2.6.X. Both provide system calls to exploit NUMA.
TORO kernel is optimized for NUMA technologies, keeping modern processors in mind. The only way to support NUMA is by using dedicated buses. In high-performance environments these improvements must not be forgotten.
Matias E. Vara
www.torokernel.org
Sunday, March 13, 2011
e1000 driver for TORO
I have just started implementing a driver for the e1000 family (Intel Gigabit or compatible network cards). I am using the Minix 3 source as a reference and QEMU as the emulator (it has supported e1000 emulation since version 0.12.0). The detection procedure is complete, as you can see in the picture, and it has been uploaded to SVN.

Regards!
Matias E. Vara
www.torokernel.org