What is NUMA?

In the past, multiprocessor systems were designed as Symmetric Multi-Processing (SMP) machines, in which all processors shared access to all of the system's memory at the same speed over a single common bus or “interconnect” path. As data-intensive applications demanded faster memory access, the load on the shared bus grew, and its limited bandwidth led to added latency and contention between CPUs. The single bus between the processors and memory became a throughput bottleneck for data access, which eventually limited overall system performance.

A multiprocessing architecture called Non-Uniform Memory Access (NUMA) was introduced to relieve the shared bus by grouping processors into clusters and allowing each processor to use memory locally, thus improving the performance and expandability of the system. NUMA splits the system into clusters, or nodes, with each processor and its memory located together on a node. The memory is thus divided among the processors, each of which gets its own local memory that it can access faster (with lower latency) than memory attached to other processors. A processor can still access memory that belongs to another processor, but with higher latency, because the data must be transferred between nodes.
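The node layout described above can be inspected on a running system. The following is a minimal sketch, assuming a Linux host that exposes its NUMA topology under `/sys/devices/system/node` (the `cpulist` and `distance` files). The function names are illustrative and not part of any library. The distance values are relative hints reported by the firmware, with 10 conventionally meaning local access, so the ratio computed here is only a rough indicator and not a measured latency.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_topology(root="/sys/devices/system/node"):
    """Return {node_id: {"cpus": [...], "distances": [...]}} read from sysfs.

    Returns an empty dict when the directory does not exist
    (for example on a non-Linux host).
    """
    topology = {}
    base = Path(root)
    if not base.is_dir():
        return topology
    for node_dir in base.glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        cpus = parse_cpulist((node_dir / "cpulist").read_text())
        distances = [int(d) for d in (node_dir / "distance").read_text().split()]
        topology[node_id] = {"cpus": cpus, "distances": distances}
    return topology


def relative_cost(topology, src, dst):
    """Ratio of the src->dst distance to the local distance of src.

    Assumes node ids are numbered contiguously from 0, so that the
    distance list of a node can be indexed by the destination node id.
    """
    distances = topology[src]["distances"]
    return distances[dst] / distances[src]
```

On a two-socket system, calling `read_numa_topology()` would typically return two entries, and `relative_cost(topology, 0, 1)` gives a value above 1.0 that reflects the extra cost of reaching memory on the other node.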

NUMA Benefits 

Intel’s NUMA implementation groups Xeon® processors and their memory into separate nodes, and the nodes communicate over Intel® QuickPath Interconnect (QPI) or its successor, Intel® Ultra Path Interconnect (UPI), a point-to-point processor interconnect that increases scalability and available bandwidth. This approach relieves the data access bottleneck of a shared bus.

When used to its potential, the NUMA architecture can substantially improve memory throughput, further increasing scalability and performance. NUMA also provides a single linear address space, which allows faster data movement and less data replication and makes programming easier.

NFVi optimized architecture: NUMA balance for Intel® QAT 

Intel® Xeon® Scalable processors integrate Intel® QuickAssist Technology (Intel® QAT), chipset-based hardware acceleration for data compression and cryptographic workloads that delivers enhanced data protection and efficient performance across server and network infrastructure.

Intel® QAT is well suited for use in Network Function Virtualization (NFV) implementations. To reach its full potential, Intel® QAT requires NUMA awareness and a QAT driver configuration that takes advantage of the system topology. To achieve balanced NUMA connectivity, the placement and movement of data are crucial to overall performance.
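One practical aspect of NUMA awareness is keeping the software that feeds an accelerator on the same node as the device. The sketch below is an illustration under stated assumptions, not the QAT driver's own configuration method: it assumes a Linux host, reads the device's node from the standard sysfs file `/sys/bus/pci/devices/<address>/numa_node`, and pins the calling process to that node's CPUs with `os.sched_setaffinity`. The function names are hypothetical, and the PCI address of the accelerator has to be looked up on the actual system.

```python
import os
from pathlib import Path


def pci_numa_node(address, root="/sys/bus/pci/devices"):
    """Return the NUMA node of a PCI device, or None if it is unknown.

    The kernel reports -1 when the platform gives no node information.
    """
    try:
        node = int((Path(root) / address / "numa_node").read_text().strip())
    except (OSError, ValueError):
        return None
    return node if node >= 0 else None


def node_cpus(node, root="/sys/devices/system/node"):
    """Return the set of CPU ids that belong to a NUMA node."""
    text = (Path(root) / f"node{node}" / "cpulist").read_text().strip()
    cpus = set()
    for part in text.split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_device_node(address):
    """Pin the current process to the CPUs local to the given PCI device.

    Returns the CPU set that was applied, or an empty set if the device
    node is unknown or none of its CPUs are available to this process.
    Linux only: os.sched_setaffinity is not available on other platforms.
    """
    node = pci_numa_node(address)
    if node is None:
        return set()
    target = node_cpus(node) & os.sched_getaffinity(0)
    if target:
        os.sched_setaffinity(0, target)
    return target
```

A worker process would call `pin_to_device_node()` once at startup, passing the PCI address of the accelerator it uses, before allocating its buffers, so that both its threads and its memory tend to stay on the device's node.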

Featured product 

Lanner’s NCA-6520, a high-end network communications appliance powered by the 3rd Gen Intel® Xeon® Scalable Processor (Codenamed Ice Lake SP), has been designed for maximum performance, functionality, and scalability. The platform is optimized for computing power, complicated workloads, and Intel® QuickAssist Technology protection and acceleration.  

The NCA-6520 is extremely flexible and scalable, with 8 NIC module slots for almost any networking interface and I/O configuration, in addition to a redundant power supply and four individual hot-swappable smart fans. For advanced security, the NCA-6520 provides onboard Trusted Platform Module (TPM 2.0) support. The NCA-6520 is an optimized platform for network service providers and carriers.

 


NCA-6520

2U 19" Rackmount Network Appliance Built with Intel® Xeon® Processor Scalable Family (Codenamed Ice Lake SP)

CPU: Intel® Xeon® Processor Scalable Family (Codenamed Ice Lake SP)
Chipset: Intel® C627A
