This post continues the article "Memory organization in a multicore system".
Non-Uniform Memory Architectures
In a NUMA system each processor is assigned a memory region that it can access faster than the other processors can, although every processor can still access every memory location. Accesses to remote memory are carried out by passing messages over the interconnect between processors. The programmer sees a single continuous memory space, and the hardware provides that abstraction.
The pioneer of NUMA technology was Sequent Computer Systems, which introduced NUMA in the 1990s. Sequent was later acquired by IBM, and the technology was implemented in Power processors.
IBM also developed its own NUMA implementation, called SE (Shared Everything), which is present in the Power6 processors.
Intel's NUMA implementation is called QuickPath Interconnect (QPI). It allows memory to be shared between the processors and is transparent to the operating system. Each processor has a point-to-point controller.
AMD's implementation uses fast links called HyperTransport links. Each processor has its own memory controller and local memory, and the processors are connected to one another through coherent HyperTransport links. Furthermore, each processor has a bidirectional non-coherent bus for I/O devices.
Using its point-to-point controller, a processor can access its own memory region faster than the others, and it pays a significant latency penalty when it accesses remote memory. There are therefore two kinds of memory: local memory and remote memory.
Matias E. Vara
www.torokernel.org