the Q&A below or download the recording of Lanner’s first
webinar if you missed its live session on August 8th, 2012.
Hosted by Open Systems Media, Jesse delivered a presentation and a Q&A session exploring the performance difference across three generations of the Intel Core Processors. The performance comparison between the 1st (Nehalem) and the 2nd (Sandy Bridge) generations of the Intel Core Processors was the topic of a whitepaper released by Lanner. For the webinar, the scope of this comparison was expanded to include the 3rd generation (Ivy Bridge) of the Intel Core Processor, and the presentation was conducted in an interactive and engaging format.
The presentation was supported by the results of four in-house tests, designed by Lanner to measure and quantify Intel's claim that the latest generation of the Intel Core Processor offers performance gains over all previous processors.
Jesse joined Motorola's ASIC Chip Design Division after earning a Master of Science degree in Electrical Engineering from the University of Michigan (Ann Arbor). For the past 5 years, he has been the Executive Product Planner of the Network Computing Division here at Lanner; he is also the president of San Jiang Electrical Manufacturing Company.
1. What is DPDK?
DPDK is a software package developed by Intel to improve packet forwarding performance on Intel Architecture.
Intel Architecture handles small packets very inefficiently, mainly because in a Linux environment every packet has to go through the network stack inside the Linux kernel.
Some RISC network processors have their own optimized ways of processing packets, such as RMIOS or Cavium’s Simple exec. That’s why Intel developed DPDK: to handle packets while bypassing the network stack.
Intel created a poll mode driver to handle the packets. Instead of waiting for an interrupt, the processor keeps checking the ports for incoming packets.
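To illustrate the polling idea only, here is a minimal Python sketch. It is a toy model, not DPDK code: the class `ToyRxQueue`, the function `poll_loop`, and the parameters `burst_size` and `max_idle_polls` are all hypothetical names invented for this example. A real poll-mode loop would run forever on a dedicated core; this one stops after a few empty sweeps so that it terminates.

```python
from collections import deque


class ToyRxQueue:
    """Hypothetical stand-in for a NIC receive ring (not a DPDK API)."""

    def __init__(self, packets):
        self._ring = deque(packets)

    def rx_burst(self, max_pkts):
        """Return up to max_pkts packets without blocking (possibly none)."""
        burst = []
        while self._ring and len(burst) < max_pkts:
            burst.append(self._ring.popleft())
        return burst


def poll_loop(queues, burst_size=32, max_idle_polls=3):
    """Sweep all queues repeatedly instead of waiting for an interrupt.

    Stops after max_idle_polls consecutive sweeps that found no packets,
    purely so that the example ends.
    """
    handled = []
    idle = 0
    while idle < max_idle_polls:
        got_any = False
        for port, queue in enumerate(queues):
            for pkt in queue.rx_burst(burst_size):
                handled.append((port, pkt))  # "process" the packet
                got_any = True
        idle = 0 if got_any else idle + 1
    return handled


if __name__ == "__main__":
    ports = [ToyRxQueue(["a", "b", "c"]), ToyRxQueue(["x"])]
    print(poll_loop(ports, burst_size=2))
```

The trade-off this sketch shows is that the polling core never sleeps: it spends cycles even when no traffic arrives, in exchange for avoiding the per-packet cost of interrupt handling.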
Intel also created a whole data plane library to process the data, including memory management, buffer management, queue management, and packet flow classification.
DPDK will work on any Intel platform with a multi-core processor, even dual-core Atom CPUs. Control and data processing can be assigned to different cores.
A more detailed discussion will probably have to wait for our next webinar; alternatively, please come to our seminar in Santa Clara in November 2012.
2. What is DDIO?
DDIO is Intel’s I/O technology for improving I/O performance. The full name is Data Direct I/O.
Traditionally, when packets reach the LAN ports and pass through the PCI-E interface, they are moved to and stored in memory, waiting for the CPU to process them. Before they are processed, the CPU has them moved into the cache. These movements create a bottleneck, because moving data in and out of memory is slow compared with the CPU’s execution speed.
With DDIO, the packets are moved directly into the cache instead of going into memory, which eliminates this bottleneck. This is possible because of the larger L3 cache inside Intel’s newer generation processors.
Intel implemented DDIO starting with the higher-end Xeon processor Sandy Bridge-EP, which is equipped with a 20MB L3 cache.
At this time, only three Intel LAN chips support DDIO: 82599 (10G), X540 (10G) and i350 (1Gb). With DDIO, I/O performance is more than doubled.
3. Which of your products have Intel DPDK feature? And which products have the Intel DDIO feature?
Intel DPDK is just a software package that speeds up packet processing. Any processor with at least two cores can take advantage of this feature, so every one of our products based on an Intel processor can support DPDK.
For DDIO, our FW-8895 (Romley platform) and FW-8893 (Crystal Forest platform) can support this feature when fitted with the right modules.
4. On slide 9 there is a DPI block on the architecture slide, is this in SW or silicon? What functionality does it include?
In the past, we have worked or held discussions with companies such as Lionic, LSI and Netlogic to provide a hardware offload solution for deep packet inspection, but most customers still prefer a software solution.
There are now two companies providing software solutions: Qosmos and Broadweb. We are trying to partner with them to fill this gap.
5. You mentioned something about Lanner’s Hybrid structure. Could you explain what it is?
We developed the hybrid architecture with a US company about three years ago. The idea is to integrate the data plane and the control plane inside one unit.
At that time, this US company was using a Cavium Octeon based product as a traffic director in a WAN optimization application. However, they still needed the robust computing power of Intel Architecture, so they asked us to develop an integrated structure that includes both boards.
Inside, everything is modularized, and there are no visible cable connections. There are two motherboards on the back side. The one on the bottom is an X86 board acting as the control plane. The one on the top works as the data plane; it can be a RISC architecture board or another X86 board.
These two boards are connected through a middle plane. In front of the middle plane, on the front side, there are three layers of Ethernet modules, which also connect to the motherboards at the back through the middle plane. They can provide up to 36 gigabit ports or 12 10G ports in total. The idea is very similar to ATCA, but much more flexible.
We can customize the unit according to each customer’s needs. The main advantages of the Hybrid architecture are energy saving and space saving.
6. It looks like your products use other processors like OCTEON and QorIQ for some packet processing? Will these be taken out in the future if Intel can make processors that include more cores? Any guess on when this would happen?
Some of our customers prefer RISC processors like Octeon or QorIQ for three reasons: packet processing, security processing offload, and price. They do not choose RISC processors just for the higher number of cores. However, for better computing power, they still come to Intel Architecture.
Intel’s new packet processing software package, DPDK, and its security coprocessor, Cave Creek, might have a great influence on the market, depending on how they are priced.
7. Where do you feel the performance bottlenecks are for the Intel processors through the three generations? Memory? I/O? Are there specific things Intel has done over the three generations to address any of these issues that you've seen?
From our perspective, we care only about packet processing performance. The bottleneck comes from the Linux kernel.
If all the packets have to go through the network stack in the kernel, we simply cannot reach line speed for small packets on Intel Architecture, and we see only limited improvement from generation to generation.
Of course, a larger cache and a higher clock rate give better performance, but to reach line rate for all packet sizes, we need something like DPDK to bypass the Linux kernel.
8. Can you please advise which two 10G modules you used with the FW-8758 for the comparison test?
For testing 10G throughput, we used the NCM-IXM203A, which uses the Intel® 82599ES 10G chip.
9. On slide 30, the forwarding benchmarks show between 64-128 bytes per packet line rate is reached. Do you happen to know at what packet size it hits line rate?
For the 10G test, we were testing two ports. Because they ran in full-duplex mode, reaching line rate meant the machine could handle 20 Gbps of throughput.
In the 1st generation Core processor test, this happened only with large 1518-byte packets. With the 2nd generation Core processor i5-2400, both the 1280-byte and 1518-byte tests reached line rate.
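To give a sense of what line rate demands at each packet size, the following Python sketch computes the theoretical packet rate of a 10G Ethernet link. It assumes the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap) and that the quoted packet sizes include the frame check sequence. The function names are made up for this example, and the numbers are theoretical limits, not results from the Lanner tests.

```python
# Per-frame overhead on the wire: 7 B preamble + 1 B start-of-frame
# delimiter + 12 B inter-frame gap (standard Ethernet; an assumption here).
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(frame_bytes, link_gbps=10.0):
    """Theoretical packets per second for one direction of one link."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def ns_per_packet(frame_bytes, link_gbps=10.0):
    """Time available to handle each packet at line rate, in nanoseconds."""
    return 1e9 / line_rate_pps(frame_bytes, link_gbps)


if __name__ == "__main__":
    ports = 2  # two 10G ports, as in the test described above
    for size in (64, 128, 1280, 1518):
        pps = line_rate_pps(size)
        print(f"{size:>5} B: {pps / 1e6:6.2f} Mpps per port, "
              f"{ports * pps / 1e6:6.2f} Mpps for {ports} ports, "
              f"{ns_per_packet(size):7.1f} ns per packet")
```

Under these assumptions a 64-byte stream means roughly 14.88 million packets per second per port, against about 0.81 million at 1518 bytes, which is consistent with large packets reaching line rate first.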