Monday, April 30, 2012
Toro's article in the Microelectronics Congress
Matias E. Vara
www.torokernel.org
Sunday, April 17, 2011
Memory Protection in a multicore environment
This post is part of the final paper by Matias Vara, "Paralelizacion de Algoritmos Numericos con TORO Kernel" (Parallelization of Numerical Algorithms with the TORO Kernel), written to obtain the degree in Electronic Engineering from Universidad de La Plata. These theoretical documents help to understand the kernel design.
Introduction
When a kernel is designed for a multicore system, shared memory must be protected from concurrent write accesses. This protection increases the complexity of the kernel code and decreases the operating system's performance. If two or more processors access the same data at the same time, mutual exclusion must be enforced to protect the shared data in a multicore system.
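As an illustration of the cost involved, here is a minimal spin-lock sketch in Free Pascal. It is not Toro's code: LockFlag, SharedCounter, AcquireLock and ReleaseLock are names made up for the example. InterlockedCompareExchange and InterlockedExchange are provided by the FPC run-time library.

program SpinLockSketch;
// Minimal mutual-exclusion sketch (illustrative only, not Toro's code).
var
  LockFlag: LongInt;      // 0 = free, 1 = taken
  SharedCounter: LongInt; // data shared between processors

procedure AcquireLock;
begin
  // Spin until we atomically flip LockFlag from 0 to 1.
  while InterlockedCompareExchange(LockFlag, 1, 0) <> 0 do
    ; // busy-wait: this is where cycles are lost under contention
end;

procedure ReleaseLock;
begin
  InterlockedExchange(LockFlag, 0);
end;

begin
  LockFlag := 0;
  SharedCounter := 0;
  AcquireLock;
  Inc(SharedCounter); // only one processor at a time executes this
  ReleaseLock;
end.

Every processor that wants the data must first win the atomic exchange; under contention the others burn cycles in the busy-wait loop, which is exactly the overhead the kernel design tries to avoid.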
Saturday, January 15, 2011
Memory organization in a Multicore system
Currently, "Uniform Memory Access" (UMA) is the common way to access memory (see SMP). In this kind of architecture every processor can read every byte of memory, and the processors are independent. A shared bus is used and the processors compete for it, but only one can read or write at a time: only one processor can access a given byte at a given moment. For the programmer, memory access is transparent.
In 1995 Intel released the Pentium Pro, its first SMP processor; its memory bus was called the Front Side Bus (FSB).
Intel's next step was to partition the FSB into two independent buses, but cache coherency became a bottleneck.
In 2007 a bus per processor was implemented.
This kind of architecture is used by Intel's Atom, Celeron, Pentium and Core 2 processors.
In a system with many cores, the traffic through the FSB is heavy. The FSB does not scale and it has a limit of 16 processors per bus, so the FSB is a wall for the new multicore technology.
We can have a CPU that executes instructions quickly, but we waste time if we cannot fetch and decode them just as quickly. In the best case, we lose at least one extra cycle reading from memory.
Since 2001 the FSB has been progressively replaced with point-to-point interconnects such as HyperTransport or Intel QuickPath Interconnect. That changed the memory model to Non-Uniform Memory Access (NUMA).
Matias E. Vara
www.torokernel.org
Sunday, September 19, 2010
Thread migration without locks in Toro
- Thread Emigrating: when threads are created to run on a remote processor.
- Thread Inmigrating: when the receiving processor enqueues in its scheduler the threads coming from other processors.
This is the only point in the kernel that needs synchronization between the cores. The mechanism is called the "Exchange Slot" and it works without any atomic operation. Here it is used to send and receive threads, but it works with any kind of data.
For every processor in TORO there is a structure called TSchedulerExchangeSlot:
TSchedulerExchangeSlot = record
  DispatcherArray: array[0..MAX_CPU-1] of PThread;
  EmigrateArray: array[0..MAX_CPU-1] of PThread;
end;
Where MAX_CPU is the number of processors and PThread is a pointer to a TThread structure. From the structure declaration we can see that every processor has two arrays.
The procedure to send threads to a remote processor has three stages:
1. The user calls BeginThread() to create a new thread; if the CPUID parameter is different from the local CPU, the kernel enqueues the thread in DispatcherArray[CPUID].
2. During scheduling (caused by the SysThreadSwitch syscall), the procedure Emigrating() moves all threads from DispatcherArray[] to EmigrateArray[] (only if EmigrateArray[] is nil).
3. During scheduling on the remote CPU, the procedure Inmigrating() looks for a non-nil entry in EmigrateArray[LocalCPUID] in every processor's TSchedulerExchangeSlot structure. If the entry is not nil, it imports all the threads into the local scheduler and sets EmigrateArray[LocalCPUID] back to nil.
The local processor only writes and reads DispatcherArray[], while both the local and the remote processor write and read EmigrateArray[]; their accesses are synchronized through the nil pointer.
The "Exchange Slot" does not need the "LOCK" instruction, as the sketch below shows.
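To make the protocol concrete, here is a minimal sketch of the three stages in Free Pascal. It is a simplification, not Toro's actual code: each slot carries a single thread instead of a list of threads, and EnqueueLocal and DispatchToRemote are illustrative names rather than real kernel routines.

program ExchangeSlotSketch;
// Simplified sketch of the "Exchange Slot" protocol. No LOCK-prefixed
// instruction is needed: DispatcherArray[] is touched only by its owner
// CPU, and each EmigrateArray[] entry has a single writer at any time.

const
  MAX_CPU = 4; // number of processors (illustrative value)

type
  PThread = ^TThread;
  TThread = record
    Next: PThread; // link used by the scheduler queue
  end;

  TSchedulerExchangeSlot = record
    DispatcherArray: array[0..MAX_CPU-1] of PThread;
    EmigrateArray: array[0..MAX_CPU-1] of PThread;
  end;

var
  Slots: array[0..MAX_CPU-1] of TSchedulerExchangeSlot;

// Illustrative stand-in for enqueuing a thread in the local scheduler.
procedure EnqueueLocal(Thread: PThread);
begin
  // ... add Thread to the local ready queue ...
end;

// Stage 1: called from BeginThread() when CPUID is not the local CPU.
procedure DispatchToRemote(LocalCpu, TargetCpu: LongInt; Thread: PThread);
begin
  Slots[LocalCpu].DispatcherArray[TargetCpu] := Thread;
end;

// Stage 2: run by the local CPU during scheduling. A pending thread is
// published only when the remote CPU has emptied the slot (nil).
procedure Emigrating(LocalCpu: LongInt);
var
  Cpu: LongInt;
begin
  for Cpu := 0 to MAX_CPU-1 do
    if (Slots[LocalCpu].DispatcherArray[Cpu] <> nil) and
       (Slots[LocalCpu].EmigrateArray[Cpu] = nil) then
    begin
      Slots[LocalCpu].EmigrateArray[Cpu] := Slots[LocalCpu].DispatcherArray[Cpu];
      Slots[LocalCpu].DispatcherArray[Cpu] := nil;
    end;
end;

// Stage 3: run by every CPU during its own scheduling. It scans every
// other CPU's slot for threads addressed to it, imports them and
// releases the slot by writing nil.
procedure Inmigrating(LocalCpu: LongInt);
var
  Cpu: LongInt;
  Thread: PThread;
begin
  for Cpu := 0 to MAX_CPU-1 do
  begin
    Thread := Slots[Cpu].EmigrateArray[LocalCpu];
    if Thread <> nil then
    begin
      EnqueueLocal(Thread);
      Slots[Cpu].EmigrateArray[LocalCpu] := nil;
    end;
  end;
end;

begin
end.

The trick is that each EmigrateArray[] entry behaves as a single-producer, single-consumer slot: the owner CPU writes a non-nil value only after reading nil, and the consumer CPU writes nil only after reading a non-nil value, so an atomic read-modify-write instruction is never needed; ordinary aligned pointer loads and stores are enough.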
Matias E. Vara
www.torokernel.org