Monday, December 12, 2011
Fixed an important bug in the emigrate procedure
Thursday, August 25, 2011
Patching GDB 7.3 for QEMU remote kernel debug
On some systems it may be necessary to install the termcap library; simply execute:
$ sudo apt-get install libncurses5-dev
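Once the patched GDB is built, a typical session attaches to QEMU's built-in gdbstub: -s makes QEMU listen for GDB on TCP port 1234 and -S pauses the virtual CPU at startup. The image name torokernel.img below is just a placeholder for whatever image you boot:
$ qemu-system-x86_64 -s -S -hda torokernel.img
$ gdb
(gdb) target remote localhost:1234
(gdb) continue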
Tuesday, August 23, 2011
Toro in Microelectronic Conference (UNLP)
Saturday, July 30, 2011
Toro in Ubuntu 11.04!
Friday, June 24, 2011
Toro bootloader
Sunday, April 17, 2011
Memory Protection in a multicore environment
This post is part of the final paper by Matias Vara, "Paralelizacion de Algoritmos Numericos con TORO Kernel", written to obtain the degree in Electronic Engineering from Universidad de La Plata. These theoretical documents help to understand the kernel design.
Introduction
When a kernel is designed for a multicore system, shared memory must be protected from concurrent write accesses. This protection increases the complexity of the kernel code and decreases operating system performance. If two or more processors access the same data at the same time, mutual exclusion must be enforced to protect the shared data.
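As an illustration of the technique, the simplest mutual exclusion primitive for this scenario is a spin lock. The C sketch below uses C11 atomics; it is only an example of the idea and not the actual locking code in TORO, and names such as spinlock_t and increment_counter are invented for the example.

/* Minimal spin lock built on C11 atomics (illustrative only). */
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Busy-wait until the flag is acquired; other cores keep spinning. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Shared data protected by the lock. */
static spinlock_t counter_lock = { ATOMIC_FLAG_INIT };
static long shared_counter;

void increment_counter(void)
{
    spin_lock(&counter_lock);
    shared_counter++;            /* only one core at a time runs this */
    spin_unlock(&counter_lock);
}

Every core that fails to acquire the lock burns cycles spinning, which is exactly the performance cost mentioned above; approaches such as RCU [1] try to reduce that cost for read-mostly data.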
[1] Paul E. McKenney: RCU vs. Locking Performance on Different Types of CPUs.
http://www.rdrop.com/users/paulmck/RCU/LCA2004.02.13a.pdf, 2005
Tuesday, April 05, 2011
Memory organization in a multicore system: Conclusion.
In the case of SMP, memory administration is easy to implement, while in NUMA it is not. The system has to assign memory depending on the CPU where the process is running, since every CPU has its own memory bank. System performance is poor if there are more remote accesses than local ones.
Windows has supported NUMA since Windows Server 2003 and Linux since the 2.6 series. Both provide system calls to exploit NUMA.
The TORO kernel is optimized for NUMA, keeping modern processors in mind. The only way to support NUMA properly is with dedicated buses, and in high-performance environments these improvements must not be forgotten.
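As an illustration of the system calls mentioned above, the following C sketch uses the Linux libnuma API (link with -lnuma) to allocate a buffer on the node of the CPU the thread is currently running on. It is only an example of NUMA-aware allocation on Linux; it is not how the TORO kernel manages memory.

/* NUMA-aware allocation with libnuma (illustrative only; link with -lnuma). */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    /* Find the node of the CPU this thread runs on and allocate there,
       so that accesses stay local instead of crossing to a remote node. */
    int cpu  = sched_getcpu();
    int node = numa_node_of_cpu(cpu);

    size_t size = 4096;
    void *buf = numa_alloc_onnode(size, node);
    if (buf == NULL)
        return EXIT_FAILURE;

    printf("CPU %d allocated %zu bytes on node %d\n", cpu, size, node);

    numa_free(buf, size);
    return EXIT_SUCCESS;
}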
Matias E. Vara
www.torokernel.org
Sunday, March 13, 2011
e1000 driver for TORO
Regards!
Matias E. Vara
www.torokernel.org
Sunday, March 06, 2011
Memory organization in a multicore system II
Non-uniform memory architectures.
The pioneer of NUMA technology was Sequent Computer Systems, which introduced NUMA in the '90s. The company was later acquired by IBM and the technology was implemented in Power processors.
In addition, IBM made its own NUMA implementation called SE (Shared Everything), which is present in Power6 processors.
The Intel NUMA implementation is called QuickPath Interconnect. It allows memory to be shared between the processors and it is transparent to the operating system. Each processor has a point-to-point controller.
The AMD implementation uses fast links called HyperTransport links. In this implementation each processor has a memory controller and a local memory. The processors are connected to each other through coherent HyperTransport links. Furthermore, each processor has a bi-directional non-coherent bus for I/O devices.
With a point-to-point controller, a processor can access one memory region faster than the others, and there is a significant latency penalty if it tries to access remote memory. In this way we have two kinds of memory: local and remote.
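On Linux, the node layout and the relative cost of remote accesses can be inspected with numactl, which prints each node's CPUs and memory together with a distance matrix between nodes (this is just a quick way to see local versus remote memory on a running system):
$ numactl --hardware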
Matias E. Vara
www.torokernel.org
Saturday, January 15, 2011
Memory organization in a Multicore system
Actually, the "Uniform memory access" is the common way to access the memory (See SMP). In this kind of arquitecture, every processor can read every byte of memory, and the processors are independent. In this case, a shared bus is used and the processors compite but only one can write or read. In this environments just one processor can access to a byte in a gived time. For the programmers the memory access is transparent.
In 1995 Intel released the Pentium Pro, its first processor designed for SMP, and the memory bus was called the Front Side Bus.
It is a bi-directional bus; it is quite simple and cheap, and in theory it scales well.
Intel's next step was to partition the FSB into two independent buses, but cache coherency became a bottleneck.
In 2007 a bus per processor was introduced.
This kind of architecture is used by Intel's Atom, Celeron, Pentium and Core 2 processors.
In a system with many cores the traffic through the FSB is heavy. The FSB does not scale, with a limit of 16 processors per bus, so the FSB is a wall for the new multicore technology.
We can have a CPU that executes instructions quickly, but we waste time if we cannot fetch and decode them just as quickly. In the best case we lose at least one extra cycle reading from memory.
Since 2001 the FSB has been progressively replaced with point-to-point interconnects such as HyperTransport or Intel QuickPath Interconnect. That changed the memory model to non-uniform memory access.
Matias E. Vara
www.torokernel.org