This is just a brief post about a recent change in the way Toro migrates threads.
Previously, when a thread running on core #0 wanted to create a new thread on core #1, the function ThreadCreate allocated the TThread structure, the TLS and the stack on core #0, and then migrated the whole TThread structure to core #1.
The main problem with this mechanism was that all memory blocks were allocated on the parent core. This is a serious violation of the NUMA model: once the thread runs on the remote core, its TThread structure, TLS and stack are no longer local memory.
Thus, I rewrote the way threads are migrated. When a thread wants to create a new one remotely, Toro still invokes ThreadCreate, BUT it is now executed on the remote core. Instead of migrating the TThread structure, Toro migrates the set of arguments to be passed to ThreadCreate. When ThreadCreate finishes, the parent thread retrieves the TThreadID value, or nil if creation failed.
As we can see, while a local thread is created immediately when ThreadCreate is invoked, a remote thread pays two steps of latency: one to migrate the parameters and another to retrieve the result.
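To make the new flow concrete, here is a small Python simulation. It is only a sketch and not Toro's actual code: Toro itself is not written in Python, and the names Core, thread_create and create_remote, the request list and the step counter are all hypothetical, invented for illustration. It shows the parent core sending only the arguments, the remote core doing the allocation itself, and the parent reading back the result in a second step.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical model of a core: not Toro's real data structures.
@dataclass
class Core:
    core_id: int
    local_threads: dict = field(default_factory=dict)  # thread id -> memory allocated on this core
    requests: list = field(default_factory=list)       # pending remote-creation arguments

_next_id = itertools.count(1)

def thread_create(core, entry, stack_size):
    """Runs on `core`, so everything it allocates is local to that core."""
    if stack_size <= 0:
        return None  # stands in for the nil returned on failure
    tid = next(_next_id)
    core.local_threads[tid] = {
        "entry": entry,
        "tls": {},
        "stack": bytearray(stack_size),
        "home": core.core_id,
    }
    return tid

def create_remote(parent, remote, entry, stack_size):
    """Create a thread on `remote` on behalf of `parent`; returns (tid, steps)."""
    steps = 0
    # Step 1: migrate only the arguments, not a thread structure.
    remote.requests.append((entry, stack_size))
    steps += 1
    # The remote core picks up the request and runs thread_create itself.
    args = remote.requests.pop(0)
    tid = thread_create(remote, *args)
    # Step 2: the parent retrieves the thread id (or None on failure).
    steps += 1
    return tid, steps

core0, core1 = Core(0), Core(1)
tid, steps = create_remote(core0, core1, lambda: None, 4096)
print(tid is not None, steps, core1.local_threads[tid]["home"])
```

In this model nothing is ever allocated on the parent core for a remote thread, which is the property the new mechanism is after; the cost is the two extra steps counted by create_remote.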
Matias E. Vara
www.torokernel.org