A dedicated kernel for multi-threading applications.

Wednesday, November 10, 2010

Context Switching

In this article I will give a brief description of “Context Switching” and, especially, how TORO has implemented it since version 0.01. I won’t cover how TORO implements it in versions 1.xx; I’ll just say that those versions use “Context Switching” by hardware. In future articles I will show the implementation of these ideas in Pascal. If you have doubts, see the references! Enjoy it!

As we know, in the kernel, the scheduler is in charge of distributing threads. Apart from implementing the scheduling algorithm, it performs the "Context Switching" procedure. When it selects a new thread, the scheduler fills the processor registers with the values they had just before that thread invoked the kernel (note that this is related to TORO’s scheduling algorithm, which uses cooperative threads).


Figure 1. Procedure for loading a new process.


In the x86-64 architecture, some of the general-purpose registers are RAX, RBX, RCX, etc. In addition to these, some system registers must also be updated, such as CR3, which stores information about the page directory of the thread to be loaded. This procedure is named "Context Switching" and is a critical operation: it runs continuously, so it must be very fast.

The “Context Switching” procedure can be implemented by software and by hardware.

When it’s implemented by hardware, it uses the mechanisms that a particular architecture provides for context switching. For example, the x86 architecture offers structures named “task descriptors”, which live in the GDT (Global Descriptor Table); when a new task must be loaded, it is enough to execute a "call" instruction on the task descriptor (known in the literature as the TSS) [1].

On the other hand, in the software implementation the “Context Switching” is done "by hand": a routine written by the programmer is in charge of saving the values of the registers.

At first glance, “Context Switching” by hardware seems to be the best option because the programmer isn’t involved and it is done "automatically". However, the hardware mechanism saves all the register values, even though not all of the registers are necessarily in use. That’s why the hardware implementation may not be the best option.

For this reason, the “Context Switching” in TORO is implemented by software: programming techniques are used to avoid the context-switching mechanism offered by a particular piece of hardware. When the scheduler selects a new thread, it loads into the processor's registers the values corresponding to the new thread, and then starts running it. The thread resumes its execution right after the point where the SysThreadSwitch procedure was called.

Since the “Context Switching” is always done after invoking the SysThreadSwitch function, the scheduler assumes that at that moment the processor's registers are not being used by the user application. In this way it only needs to save the state of the stack of the thread being removed. In the x86-64 implementation, this is achieved by saving in the TThread structure the value of the RSP register, which keeps the thread's position inside its stack.

The “Context Switching” implemented in TORO is faster than the hardware mechanism and than the one implemented by a general-purpose OS. The choice of context-switching method is directly related to the cooperative thread model.

Therefore the use of software “Context Switching” adds portability and speed [2].


[1]. Intel. IA-32 Intel® Architecture Software Developer’s Manual. Vol3. 2004.

[2]. Osdev Wiki, Context Switching, http://wiki.osdev.org/Context_Switching.



Matias E. Vara

www.torokernel.org

Monday, November 08, 2010

Changes on SVN

I'll be working on the SVN directories, so they may be off-line for a few hours. You should then make an update from SVN.

Regards
Matias E. Vara
www.torokernel.org

Thursday, October 28, 2010

BugCon Presentation

Here is the link to Toro's presentation at BugCon 2010. Enjoy!
Regards.

PS: The presentation is in English. Thanks Eugene!

Matias E. Vara
www.torokernel.org

Sunday, September 19, 2010

Threads migration without Lock in Toro

In a multicore environment, the programmer needs to create local and remote threads. In TORO, creating a remote thread is easy: you just have to use BeginThread() with the appropriate CPU identifier. On that basis, there are two important procedures in TORO:

- Thread Emigrating: when threads are created on a remote processor.
- Thread Inmigrating: when the host processor enqueues in its scheduler the threads coming from other processors.

This is the only point in the kernel that needs synchronization between the cores. The mechanism is called "Exchange Slot" and it works without any atomic operation. Here it is used to send and receive threads, but it works with any kind of data.

For every processor in TORO there is a structure called TSchedulerExchangeSlot:

TSchedulerExchangeSlot = record
  DispatcherArray: array[0..MAX_CPU-1] of PThread;
  EmigrateArray: array[0..MAX_CPU-1] of PThread;
end;


Where MAX_CPU is the number of processors and PThread is a pointer to the TThread structure. From the declaration we can see that every processor has two arrays (DispatcherArray and EmigrateArray), and every entry in an array is a pointer to a thread queue.

The procedure to send threads to a remote processor has three stages:
1- The user calls BeginThread() to create a new thread; if the CPUID parameter differs from the local CPU, the kernel enqueues it in DispatcherArray[CPUID].
2- During scheduling (caused by the SysThreadSwitch syscall), the Emigrating() procedure moves all threads from DispatcherArray[] to EmigrateArray[] (only if EmigrateArray[] is nil).
3- During scheduling on the remote CPU, the Inmigrating() procedure looks for a non-nil entry in EmigrateArray[LocalCPUID] in every processor's TSchedulerExchangeSlot structure. If the entry is not nil, it imports all the threads into the local scheduler and sets EmigrateArray[LocalCPUID] to nil.
The local processor only writes and reads DispatcherArray[]. Both the local and the remote processor write and read EmigrateArray[], but that access is synchronized using the nil pointer as a flag.
The “Exchange Slot” doesn't need any "LOCK" instruction.


The Inmigrating and Emigrating procedures are called from the scheduler, which performs a few system tasks; the picture shows the scheduler's flow diagram. First it calls Inmigrating(), after that it calls Emigrating(), and at the end a new thread is scheduled.


Matias E. Vara
www.torokernel.org