A dedicated kernel for multi-threading applications.


Monday, December 12, 2011

Fixed an important bug in the emigrate procedure

This is just a brief post about a recent change in the way Toro migrates threads.
Previously, when a thread running on core #0 wanted to create a new thread on core #1, the function ThreadCreate allocated the TThread structure, the TLS and the stack, and then migrated the whole TThread structure to core #1.
The main problem with this mechanism was that all the memory blocks were allocated on the parent core. This is a serious violation of the NUMA model: TThread, the TLS and the stack are not local memory for the core that runs the thread.
Thus, I rewrote the way threads are migrated. When a thread wants to create a new one remotely, Toro still invokes ThreadCreate, BUT it is executed on the remote core. Instead of migrating the TThread structure, Toro now migrates a set of arguments to be passed to ThreadCreate. When ThreadCreate finishes, the parent thread retrieves the TThreadID value, or nil if it fails.
As we can see, while a local thread is created immediately when ThreadCreate is invoked, a remote thread incurs two steps of latency: one to migrate the parameters and another to retrieve the result.
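
The two latency steps can be illustrated with a minimal Free Pascal sketch. It is not Toro's actual code: TCreateRequest, PostRemoteCreate, ServeRemoteCreate, CollectResult and LocalThreadCreate (a stand-in for ThreadCreate) are hypothetical names, and both cores are simulated one after the other in a single program. The point it tries to show is that only the arguments and the result cross between cores, while the structure itself is allocated by the core that will run the thread.

```pascal
program RemoteCreateSketch;
{$mode objfpc}

type
  TSketchFunc = function(Arg: Pointer): PtrInt;

  PSketchThread = ^TSketchThread;
  TSketchThread = record
    OwnerCPU: Integer;      // core that allocated (and owns) this structure
    Func: TSketchFunc;
    Arg: Pointer;
  end;

  // Hypothetical message: the arguments travel out, the result travels back.
  TCreateRequest = record
    Pending: Boolean;       // set by the parent, cleared by the remote core
    Done: Boolean;          // set by the remote core once Created is valid
    Func: TSketchFunc;
    Arg: Pointer;
    Created: PSketchThread; // stays nil if the creation failed
  end;

var
  Request: TCreateRequest;  // assumption: a single request slot toward core 1

function Worker(Arg: Pointer): PtrInt;
begin
  Result := 0;
end;

// Stand-in for ThreadCreate: it runs on the core that will own the thread,
// so a real kernel would take this memory from that core's local pool.
function LocalThreadCreate(CPU: Integer; Func: TSketchFunc; Arg: Pointer): PSketchThread;
begin
  New(Result);
  Result^.OwnerCPU := CPU;
  Result^.Func := Func;
  Result^.Arg := Arg;
end;

// Step 1, parent core: migrate only the arguments.
procedure PostRemoteCreate(Func: TSketchFunc; Arg: Pointer);
begin
  Request.Func := Func;
  Request.Arg := Arg;
  Request.Created := nil;
  Request.Done := False;
  Request.Pending := True;
end;

// Remote core: create the thread locally and publish the result.
procedure ServeRemoteCreate(LocalCPU: Integer);
begin
  if Request.Pending then
  begin
    Request.Created := LocalThreadCreate(LocalCPU, Request.Func, Request.Arg);
    Request.Pending := False;
    Request.Done := True;
  end;
end;

// Step 2, parent core: retrieve the result (nil if nothing was created).
function CollectResult: PSketchThread;
begin
  if Request.Done then
    Result := Request.Created
  else
    Result := nil;
end;

var
  T: PSketchThread;
  OwnerCPU: Integer;
begin
  PostRemoteCreate(@Worker, nil);  // parent on core 0 sends the arguments
  ServeRemoteCreate(1);            // core 1 performs the creation
  T := CollectResult;              // parent reads the result back
  OwnerCPU := -1;
  if T <> nil then
  begin
    OwnerCPU := T^.OwnerCPU;
    Dispose(T);
  end;
  WriteLn('Structure allocated on core ', OwnerCPU);
end.
```

In a real kernel the parent would have to wait or be rescheduled between the two steps; the sketch simply runs them back to back.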


Matias E. Vara
www.torokernel.org 
      

Sunday, September 19, 2010

Thread migration without locks in Toro

In a multicore environment, the programmer needs to create local and remote threads. In TORO, creating a remote thread is easy: you just call BeginThread() with the appropriate CPU identifier. On that basis, there are two important procedures in TORO:

- Thread Emigrating: when threads are created on a remote processor.
- Thread Inmigrating: when the destination processor enqueues in its scheduler the threads that are coming from other processors.

This is the only point in the kernel that needs synchronization between the cores. The mechanism is called "Exchange Slot" and it works without any atomic operation. Here it is used to send and receive threads, but it works with any kind of data.

For every processor in TORO there is a structure called TSchedulerExchangeSlot:

TSchedulerExchangeSlot = record
  DispatcherArray: array[0..MAX_CPU-1] of PThread;
  EmigrateArray: array[0..MAX_CPU-1] of PThread;
end;


Where MAX_CPU is the number of processors and PThread is a pointer to a TThread structure. From the declaration we can see that every processor has two arrays (DispatcherArray and EmigrateArray), and every entry in each array is a pointer to a queue of threads.

The procedure to send threads to a remote processor has three stages:
1- The user calls BeginThread() to create a new thread; if the CPUID parameter differs from the local CPU, the kernel enqueues the thread in DispatcherArray[CPUID].
2- During scheduling (caused by the SysThreadSwitch syscall), the procedure Emigrating() moves all threads from DispatcherArray[] to EmigrateArray[] (only if EmigrateArray[] is nil).
3- During scheduling on the remote CPU, the procedure Inmigrating() looks for a non-nil entry in EmigrateArray[LocalCPUID] in the TSchedulerExchangeSlot structure of every processor. If an entry is not nil, it imports all the threads into the local scheduler and sets EmigrateArray[LocalCPUID] to nil.
Only the local processor writes and reads DispatcherArray[]. Both the local and the remote processor write and read EmigrateArray[], but the access is synchronized using the nil pointer: the owner only fills an entry that is nil, and the remote processor only clears an entry that is not nil.
The "Exchange Slot" doesn't need the "LOCK" instruction prefix.
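
The three stages can be sketched in Free Pascal as follows. This is only an illustration under assumptions, not the kernel's source: the Next and ID fields of TThread, the ReadyQueue array, EnqueueRemote and CountQueue are hypothetical, and the two CPUs are simulated sequentially in one program, so the sketch shows the hand-off protocol through the nil pointer and says nothing about memory ordering on real hardware.

```pascal
program ExchangeSlotSketch;
{$mode objfpc}

const
  MAX_CPU = 2;  // assumption: two cores are enough for the demo

type
  PThread = ^TThread;
  TThread = record
    ID: Integer;    // hypothetical field
    Next: PThread;  // hypothetical link that chains threads into a queue
  end;

  TSchedulerExchangeSlot = record
    DispatcherArray: array[0..MAX_CPU-1] of PThread;
    EmigrateArray: array[0..MAX_CPU-1] of PThread;
  end;

var
  Slots: array[0..MAX_CPU-1] of TSchedulerExchangeSlot;  // one slot per CPU
  ReadyQueue: array[0..MAX_CPU-1] of PThread;            // hypothetical run queues

// Stage 1: the creating CPU parks the new thread in its own DispatcherArray.
procedure EnqueueRemote(LocalCPU, TargetCPU: Integer; T: PThread);
begin
  T^.Next := Slots[LocalCPU].DispatcherArray[TargetCPU];
  Slots[LocalCPU].DispatcherArray[TargetCPU] := T;
end;

// Stage 2: publish pending queues, but only into entries that are nil.
procedure Emigrating(LocalCPU: Integer);
var
  I: Integer;
begin
  for I := 0 to MAX_CPU - 1 do
    if (Slots[LocalCPU].DispatcherArray[I] <> nil) and
       (Slots[LocalCPU].EmigrateArray[I] = nil) then
    begin
      Slots[LocalCPU].EmigrateArray[I] := Slots[LocalCPU].DispatcherArray[I];
      Slots[LocalCPU].DispatcherArray[I] := nil;
    end;
end;

// Stage 3: the receiving CPU imports every non-nil entry addressed to it.
procedure Inmigrating(LocalCPU: Integer);
var
  I: Integer;
  Q, Last: PThread;
begin
  for I := 0 to MAX_CPU - 1 do
  begin
    Q := Slots[I].EmigrateArray[LocalCPU];
    if Q <> nil then
    begin
      Last := Q;
      while Last^.Next <> nil do
        Last := Last^.Next;
      Last^.Next := ReadyQueue[LocalCPU];
      ReadyQueue[LocalCPU] := Q;
      Slots[I].EmigrateArray[LocalCPU] := nil;  // hands the entry back to CPU I
    end;
  end;
end;

function CountQueue(Q: PThread): Integer;
begin
  Result := 0;
  while Q <> nil do
  begin
    Inc(Result);
    Q := Q^.Next;
  end;
end;

var
  A, B: PThread;
  Imported: Integer;
begin
  FillChar(Slots, SizeOf(Slots), 0);
  FillChar(ReadyQueue, SizeOf(ReadyQueue), 0);
  New(A); A^.ID := 1; A^.Next := nil;
  New(B); B^.ID := 2; B^.Next := nil;
  EnqueueRemote(0, 1, A);  // CPU 0 creates two threads for CPU 1
  EnqueueRemote(0, 1, B);
  Emigrating(0);           // scheduler pass on CPU 0
  Inmigrating(1);          // scheduler pass on CPU 1
  Imported := CountQueue(ReadyQueue[1]);
  WriteLn('Threads imported by CPU 1: ', Imported);
  ReadyQueue[1] := nil;
  Dispose(A);
  Dispose(B);
end.
```

Each EmigrateArray entry has a single writer at any moment: CPU 0 writes it only while it is nil, and CPU 1 writes it only while it is not nil, which is the property the text above relies on.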


The Inmigrating and Emigrating procedures are called from the scheduler. The scheduler performs a few system tasks; for example, in the picture, we can see the scheduler's flow diagram. There, it first calls Inmigrating(), then it calls Emigrating(), and finally a new thread is scheduled.


Matias E. Vara
www.torokernel.org