Showing posts with label atomic operations. Show all posts
Monday, April 30, 2012
Toro's article in Microelectronic Congress
Here is the link to the article that I presented at the Microelectronic Congress 2011, Faculty of Engineering, University of La Plata, Argentina. The only catch is that it is in Spanish. It describes the TORO project and shows benchmarks that I made for my graduation project. Enjoy!
Matias E. Vara
www.torokernel.org
Labels:
atomic operations,
kernel,
linux,
locks,
multiprocessing,
NUMA,
SMP,
threading cooperative,
windows,
x86,
x86-64
Sunday, April 17, 2011
Memory Protection in a multicore environment
This post is part of Matias Vara's final paper, “Paralelizacion de Algoritmos Numericos con TORO Kernel”, written to obtain the degree in Electronic Engineering from Universidad de La Plata. These theoretical documents help to understand the kernel design.
Introduction
When a kernel is designed for a multicore system, shared memory must be protected against concurrent write accesses. This protection increases the complexity of the kernel code and decreases the performance of the operating system. If two or more processors access the same data at the same time, mutual exclusion must be enforced to protect that data.
In a single-processor multitasking system the scheduler switches tasks frequently, so the only risk is that the scheduler takes the CPU away from a task while it is modifying shared data. Protection in this case is easy: disable the scheduler while the task is in a critical section and enable it again afterwards.
In a multiprocessor system that solution does not work. When tasks run in parallel, two or more of them may execute the same line at the same time; hence, the state of the scheduler does not matter.
Resources protection
To protect resources in a multiprocessing system we need to define atomic operations. Each one is implemented as a single assembler instruction, although it takes several clock cycles.
Atomic operations
In every processor, simple read and write operations on aligned, word-sized data are atomic. This means that while the operation is executing, no other processor is using that memory location.
For certain kinds of operations the processor locks the memory bus. For this purpose it provides the LOCK# signal, which is used for critical memory operations. While this signal is asserted, requests from other processors are blocked.
Access to the memory bus is non-deterministic: whichever processor arrives first gets the bus. All the processors compete for the bus, so in a system with many processors it becomes a bottleneck.
But why do we need atomic operations? Suppose that we have to increment a counter. The Pascal source is:
counter := counter + 1;
If this line is executed at the same time on several processors, the result will be incorrect unless the operation is atomic.
For example, if two processors each increment a counter that starts at 0, the correct value is 2, but without atomicity both may read 0 and both write back 1. With atomic operations the processors access the variable one at a time and the result is correct. The synchronization time increases with the number of processors. The most common atomic operations are "TEST and SET" and "COMPARE and SWAP".
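The following is a minimal sketch of the counter example, assuming a hosted Free Pascal program rather than code running inside TORO. It uses the Free Pascal RTL routines BeginThread, WaitForThreadTerminate, InterlockedIncrement and InterlockedCompareExchange; the names Worker, CasIncrement, AtomicCount, CasCount, NUM_THREADS and ITERATIONS are made up for this illustration. One counter is updated with an atomic increment and the other with a "COMPARE and SWAP" retry loop.

```pascal
program AtomicCounter;
{$mode objfpc}
uses
  {$ifdef unix}cthreads,{$endif}
  SysUtils;

const
  NUM_THREADS = 4;       // illustrative values, not taken from the article
  ITERATIONS  = 100000;

var
  AtomicCount: LongInt = 0;  // updated with InterlockedIncrement
  CasCount: LongInt = 0;     // updated with a compare-and-swap loop

// Increment Target using a compare-and-swap retry loop.
procedure CasIncrement(var Target: LongInt);
var
  Old: LongInt;
begin
  repeat
    Old := Target;
    // The new value is stored only if Target still equals Old;
    // otherwise another thread got in first and we retry.
  until InterlockedCompareExchange(Target, Old + 1, Old) = Old;
end;

function Worker(Param: Pointer): PtrInt;
var
  I: LongInt;
begin
  for I := 1 to ITERATIONS do
  begin
    InterlockedIncrement(AtomicCount);
    CasIncrement(CasCount);
  end;
  Result := 0;
end;

var
  Threads: array[0..NUM_THREADS - 1] of TThreadID;
  T: LongInt;
begin
  for T := 0 to NUM_THREADS - 1 do
    Threads[T] := BeginThread(@Worker, nil);
  for T := 0 to NUM_THREADS - 1 do
    WaitForThreadTerminate(Threads[T], 0);  // 0 = no timeout
  WriteLn('expected: ', NUM_THREADS * ITERATIONS);
  WriteLn('atomic:   ', AtomicCount);
  WriteLn('cas:      ', CasCount);
end.
```

Both counters should end at NUM_THREADS * ITERATIONS. Replacing the two calls in Worker with a plain counter := counter + 1 would typically lose updates on a multicore machine.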
Impact of atomic operations
In a system with a few processors, atomic operations are not a big problem and they are a fast solution to the shared memory problem. However, as the number of processors increases they become a bottleneck.
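As an illustration of how atomic operations are commonly used to solve the shared memory problem, here is a small sketch of a spinlock built on "TEST and SET". It assumes a hosted Free Pascal program and uses the RTL routines InterlockedExchange, ThreadSwitch, BeginThread and WaitForThreadTerminate. SpinLock, SpinUnlock, LockFlag and the constants are hypothetical names, and this is not how TORO itself synchronizes.

```pascal
program SpinLockDemo;
{$mode objfpc}
uses
  {$ifdef unix}cthreads,{$endif}
  SysUtils;

const
  NUM_THREADS = 4;      // illustrative values
  ITERATIONS  = 50000;

var
  LockFlag: LongInt = 0;  // hypothetical lock word: 0 = free, 1 = taken
  Counter: LongInt = 0;   // shared data protected by LockFlag

procedure SpinLock(var Flag: LongInt);
begin
  // Test-and-set: atomically write 1 and inspect the previous value.
  // A previous value of 1 means another thread holds the lock, so retry.
  while InterlockedExchange(Flag, 1) <> 0 do
    ThreadSwitch;  // yield on a hosted OS; a kernel would usually just spin
end;

procedure SpinUnlock(var Flag: LongInt);
begin
  // Atomically store 0 to release the lock.
  InterlockedExchange(Flag, 0);
end;

function Worker(Param: Pointer): PtrInt;
var
  I: LongInt;
begin
  for I := 1 to ITERATIONS do
  begin
    SpinLock(LockFlag);
    Counter := Counter + 1;  // non-atomic line, now inside a critical section
    SpinUnlock(LockFlag);
  end;
  Result := 0;
end;

var
  Threads: array[0..NUM_THREADS - 1] of TThreadID;
  T: LongInt;
begin
  for T := 0 to NUM_THREADS - 1 do
    Threads[T] := BeginThread(@Worker, nil);
  for T := 0 to NUM_THREADS - 1 do
    WaitForThreadTerminate(Threads[T], 0);  // 0 = no timeout
  WriteLn('counter: ', Counter, ' expected: ', NUM_THREADS * ITERATIONS);
end.
```

Every failed attempt in SpinLock is itself an atomic memory operation, so the more processors contend for the same lock word, the more bus traffic they generate. That is the bottleneck this section describes.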
Consider a computer with 8 cores running at 1.45 GHz [1]: while the average instruction time is 0.24 ns, an atomic increment takes 42.09 ns, roughly 175 times longer. The time spent on locking becomes critical.
[1] Paul E. McKenney: RCU vs. Locking Performance on Different Types of CPUs.
http://www.rdrop.com/users/paulmck/RCU/LCA2004.02.13a.pdf, 2005
Labels:
atomic operations,
locks,
protection,
SMP
Sunday, September 19, 2010
Threads migration without Lock in Toro
In a multicore environment, the programmer needs to create both local and remote threads. In TORO, creating a remote thread is easy: you just call BeginThread() with the appropriate CPU identifier. On that basis, there are two important procedures in TORO:
- Thread Emigrating: when threads are created on a remote processor.
- Thread Inmigrating: when the receiving processor enqueues in its scheduler the threads that come from other processors.
This is the only point in the kernel that needs synchronization between cores. The mechanism is called "Exchange Slot" and it works without any atomic operation. Here it is used to send and receive threads, but it works with any kind of data.
For every processor in TORO there is a structure called TSchedulerExchangeSlot:
TSchedulerExchangeSlot = record
  DispatcherArray: array[0..MAX_CPU-1] of PThread;
  EmigrateArray: array[0..MAX_CPU-1] of PThread;
end;
Here MAX_CPU is the number of processors and PThread is a pointer to a TThread structure. From the declaration we can see that every processor has two arrays (DispatcherArray and EmigrateArray), and every entry in each array is a pointer to a queue of threads.
The procedure to send threads to a remote processor has three stages:
1- The user calls BeginThread() to create a new thread. If the CPUID parameter differs from the local CPU, the kernel enqueues the thread in DispatcherArray[CPUID].
2- During scheduling (triggered by the SysThreadSwitch syscall), the procedure Emigrating() moves all threads from DispatcherArray[] to EmigrateArray[] (only if the EmigrateArray[] entry is nil).
3- During scheduling on the remote CPU, the procedure Inmigrating() looks for a non-nil entry in EmigrateArray[LocalCPUID] in the TSchedulerExchangeSlot structure of every processor. If the entry is not nil, it imports all the threads into the local scheduler and sets EmigrateArray[LocalCPUID] back to nil.
Only the local processor reads and writes DispatcherArray[]. Both the local and the remote processor read and write EmigrateArray[], but access is synchronized through the nil pointer.
The “Exchange Slot” does not need any "LOCK" instruction.
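The three stages above can be sketched as a small, single-threaded simulation of two CPUs. This is only an illustration under stated assumptions, not TORO's actual source: the Next field of TThread, the RunQueue array, the Slots array and the procedure QueueRemoteThread are hypothetical, and the real Emigrating() and Inmigrating() may differ. On real hardware the scheme also relies on pointer-sized stores being atomic and on the queue being fully built before the pointer is published.

```pascal
program ExchangeSlotSketch;
{$mode objfpc}

const
  MAX_CPU = 2;

type
  PThread = ^TThread;
  TThread = record
    ID: LongInt;
    Next: PThread;  // hypothetical link used to chain threads into a queue
  end;

  TSchedulerExchangeSlot = record
    DispatcherArray: array[0..MAX_CPU-1] of PThread;
    EmigrateArray: array[0..MAX_CPU-1] of PThread;
  end;

var
  Slots: array[0..MAX_CPU-1] of TSchedulerExchangeSlot;  // one slot per CPU
  RunQueue: array[0..MAX_CPU-1] of PThread;  // hypothetical scheduler queues

// Stage 1: the creating CPU queues a new thread for a remote CPU.
// Only the local CPU ever touches DispatcherArray.
procedure QueueRemoteThread(LocalCPU, RemoteCPU: LongInt; T: PThread);
begin
  T^.Next := Slots[LocalCPU].DispatcherArray[RemoteCPU];
  Slots[LocalCPU].DispatcherArray[RemoteCPU] := T;
end;

// Stage 2: the local scheduler publishes pending threads,
// but only into entries that the remote CPU has already emptied.
procedure Emigrating(LocalCPU: LongInt);
var
  C: LongInt;
  Head: PThread;
begin
  for C := 0 to MAX_CPU - 1 do
    with Slots[LocalCPU] do
      if (DispatcherArray[C] <> nil) and (EmigrateArray[C] = nil) then
      begin
        Head := DispatcherArray[C];
        DispatcherArray[C] := nil;
        EmigrateArray[C] := Head;  // publishing the pointer is the last step
      end;
end;

// Stage 3: the remote scheduler imports threads addressed to it
// and sets the entry back to nil, returning it to the sender.
procedure Inmigrating(LocalCPU: LongInt);
var
  C: LongInt;
  T, NextT: PThread;
begin
  for C := 0 to MAX_CPU - 1 do
  begin
    T := Slots[C].EmigrateArray[LocalCPU];
    if T = nil then
      Continue;
    while T <> nil do
    begin
      NextT := T^.Next;
      T^.Next := RunQueue[LocalCPU];
      RunQueue[LocalCPU] := T;
      T := NextT;
    end;
    Slots[C].EmigrateArray[LocalCPU] := nil;
  end;
end;

var
  A, B: TThread;
  P: PThread;
  Count: LongInt;
begin
  FillChar(Slots, SizeOf(Slots), 0);
  FillChar(RunQueue, SizeOf(RunQueue), 0);
  A.ID := 1; A.Next := nil;
  B.ID := 2; B.Next := nil;
  QueueRemoteThread(0, 1, @A);  // CPU 0 creates two threads for CPU 1
  QueueRemoteThread(0, 1, @B);
  Emigrating(0);                // scheduler pass on CPU 0
  Inmigrating(1);               // scheduler pass on CPU 1
  Count := 0;
  P := RunQueue[1];
  while P <> nil do
  begin
    Inc(Count);
    P := P^.Next;
  end;
  WriteLn('threads imported by CPU 1: ', Count);
end.
```

The nil value acts as an ownership flag. While an EmigrateArray entry is nil only the sender writes it, and while it is not nil only the receiver writes it, so the two processors never write the same entry at the same time.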
The Inmigrating() and Emigrating() procedures are called from the scheduler. The scheduler performs a few system tasks, and the picture shows its flow diagram. First it calls Inmigrating(), then it calls Emigrating(), and finally a new thread is scheduled.
Matias E. Vara
www.torokernel.org
Labels:
atomic operations,
freepascal,
kernel,
locks,
multiprocessing,
pascal,
threads,
toro