Read the Q&A below or download the recording of Lanner’s first
webinar if you missed its live session on August 8th, 2012.
In the webinar, hosted by Open Systems Media,
Jesse delivered a presentation and a Q&A session to
explore the performance difference across three generations of the
Intel Core Processors. The performance comparison between the 1st
(Nehalem) and the 2nd (Sandy Bridge) generations of the Intel Core
Processors was the topic of a whitepaper released by Lanner; for the
webinar the scope of this comparison was expanded to include
the 3rd generation (Ivy Bridge) of the Intel Core Processor, and the
presentation was conducted in an interactive and engaging format.
The presentation was supported by
the results of four in-house tests, designed by Lanner to measure and
quantify Intel's claim that the latest generation of the Intel Core
Processor offers performance gains over all previous processors.
Jesse joined Motorola's ASIC Chip
Design Division for a period of time after earning a Master of Science
degree in Electrical Engineering from the University of Michigan (Ann
Arbor). For the past 5 years, he has served as the Executive
Product Planner of the Network Computing Division here at Lanner; he is
also the president of San Jiang Electrical Manufacturing Company.

Webinar Q&A
1. What is DPDK?
DPDK is a software package developed by Intel to improve packet
forwarding performance on Intel Architecture.
Intel Architecture handles small packets very inefficiently,
mainly because in a Linux environment every packet has to go through
the network stack inside the Linux kernel.
Some RISC network processors have their own optimized ways to process
packets, such as RMIOS or Cavium’s Simple Executive. That is why
Intel developed DPDK: to handle packets while bypassing the network stack.
Intel created a poll mode driver to handle the packets. Instead of waiting
for an interrupt, the processor keeps polling the ports for incoming
packets.
Intel also created a complete data plane library to
process the data, including memory management, buffer management, queue
management, and packet flow classification.
DPDK works on any Intel platform with a multi-core processor, even
dual-core Atom CPUs.
Control and data processing can be implemented on different cores.
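The poll mode idea described above can be illustrated with a small simulation. This is only a sketch in plain Python, not DPDK code: the RxQueue class, the BURST_SIZE value and the max_idle_polls stop condition are all hypothetical names introduced for illustration. A real poll loop would run forever on a dedicated core.

```python
from collections import deque

BURST_SIZE = 32  # hypothetical burst size, chosen for illustration


class RxQueue:
    """Toy stand-in for a NIC receive ring (not a DPDK structure)."""

    def __init__(self):
        self._ring = deque()

    def enqueue(self, packet):
        # In real hardware the NIC would fill the ring, not software.
        self._ring.append(packet)

    def rx_burst(self, max_packets):
        # Return up to max_packets right away; never block or sleep.
        burst = []
        while self._ring and len(burst) < max_packets:
            burst.append(self._ring.popleft())
        return burst


def poll_loop(queues, handle, max_idle_polls):
    """Poll every queue in turn; stop after max_idle_polls empty passes."""
    processed = 0
    idle = 0
    while idle < max_idle_polls:
        got_any = False
        for queue in queues:
            for packet in queue.rx_burst(BURST_SIZE):
                handle(packet)
                processed += 1
                got_any = True
        # Count consecutive passes that found no packets on any port.
        idle = 0 if got_any else idle + 1
    return processed
```

The loop never waits for an interrupt: it keeps asking each port for a burst of packets, which trades some idle CPU time for lower per-packet overhead.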
For a more detailed discussion, you will probably have to wait for
our next webinar, or please come to our seminar in Santa Clara in
November 2012.
2. What is DDIO?
DDIO is Intel’s I/O technology for improving I/O performance.
The full name is Data Direct I/O.
Traditionally, when packets reach the LAN ports and pass through the
PCI-E interface, they are moved to and stored in memory to wait for
the CPU to process them.
Before they are processed, the CPU has them moved into the cache.
These movements create a bottleneck, because moving data in and out of
memory is slow compared with the CPU’s execution speed.
With DDIO, the packets are moved directly into the cache
instead of going into memory, eliminating the
bottleneck. This is possible because of the larger L3 cache inside
Intel’s newer-generation processors.
Intel implemented DDIO starting
with the higher-end Xeon processor Sandy Bridge-EP, which is equipped
with a 20MB L3 cache.
At this time, only three Intel LAN chips
support DDIO: 82599 (10G), X540 (10G) and i350 (1Gb). With DDIO, I/O
performance can more than double.
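The two receive paths described above can be compared with a toy model. It only counts the data movements named in the answer and is not a measurement; the function names and the step labels are hypothetical and chosen for illustration.

```python
def receive_path(ddio_enabled):
    """Return the ordered data movements for one received packet (toy model)."""
    if ddio_enabled:
        # The NIC places the packet straight into the L3 cache.
        return [("NIC", "L3 cache"), ("L3 cache", "CPU core")]
    # The packet is first stored in memory, then pulled into the cache.
    return [("NIC", "DRAM"), ("DRAM", "L3 cache"), ("L3 cache", "CPU core")]


def dram_transfers(path):
    """Count the movements in a path that read from or write to DRAM."""
    return sum(1 for src, dst in path if "DRAM" in (src, dst))


for enabled in (False, True):
    path = receive_path(enabled)
    print(f"DDIO={enabled}: {len(path)} movements, "
          f"{dram_transfers(path)} touching DRAM")
```

In this model the traditional path touches memory twice per packet and the DDIO path not at all, which is the bottleneck the answer refers to.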
3. Which of your products have the Intel DPDK feature? And which
products have the Intel DDIO feature?
Intel DPDK is just a
software package to speed up packet processing. Any processor with
at least two cores can take advantage of it, so every one of our
products based on an Intel processor can support DPDK.
For DDIO, our FW-8895 (Romley platform) and FW-8893 (Crystal Forest
platform) can support this feature when fitted with the right modules.
4. On slide 9 there is a DPI block on the architecture slide, is this
in SW or silicon? What functionality does it include?
We have worked or held discussions with companies
such as Lionic, LSI and Netlogic in the past, trying to provide a
hardware offload solution for deep packet inspection, but most
customers still prefer a software solution.
Now, there are
two companies providing software solutions: Qosmos and Broadweb. We are
trying to partner with them to fill this gap.
5. You mentioned something about
Lanner’s Hybrid structure. Could you explain what it is?
We developed the hybrid architecture with
a US company about three years ago. The idea is to integrate the data
plane and the control plane inside one unit.
At that time, this US
company was using a Cavium Octeon based product as a traffic director in
a WAN optimization application. However, they still needed the
robust computing power of Intel Architecture, so they asked us to
develop an integrated structure that includes both boards.
Inside,
everything is modularized, with no visible cable connections. There
are two motherboards on the back side. The one on the bottom, an x86
board, acts as the control plane. The one on the top works as the
data plane; it can be a RISC architecture board or another x86 board.
The two boards are connected through a middle plane.
In front
of the middle plane, on the front side, there are three layers of
Ethernet modules, which also connect to the motherboards on the back
side through the middle plane. In total they can provide up to 36
gigabit ports or 12 10G ports. The idea is very similar to ATCA, but
much more flexible.
We can customize it according to each customer’s
needs. The main advantages of the Hybrid architecture are energy savings
and space savings.
6. It looks like your products use other processors like OCTEON and
QorIQ for some packet processing? Will these be taken out in the future
if Intel can make processors that include more cores?
Any guess on when this would happen?
Some of our customers
prefer RISC
processors like Octeon or QorIQ for three reasons: packet processing,
security processing offload, and price. They do not choose RISC
processors just for the higher number of cores. However, for better
computing power, they still come to Intel Architecture.
Intel’s new packet
processing software package, DPDK, and its security coprocessor, Cave
Creek, might have a great influence on the market, depending on how
they are priced.
7. Where do you feel the performance bottlenecks are for the Intel
processors through the three generations? Memory? I/O? Are there
specific things Intel has done over the three generations to address
any of these issues that you've seen?
From our perspective, we
care only about packet processing performance. The bottleneck comes
from the Linux kernel.
If all packets have to go through the network stack in the
kernel, we simply cannot see line speed for small packets on Intel
Architecture, and we see only limited improvement from generation to
generation.
Of course, with a larger cache and a higher clock rate
we see better performance, but to reach line rate for all packet
sizes we need something like DPDK to bypass the Linux kernel.
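To see why small packets are the hard case, it helps to work out how little time a core has per packet at line rate. The sketch below uses standard Ethernet framing overhead (20 bytes of preamble, start delimiter and inter-frame gap per frame). The 3.0 GHz clock rate is a hypothetical value chosen for illustration, not a figure from the webinar, and the function names are made up for this example.

```python
# Fixed per-frame overhead on the wire: 7-byte preamble, 1-byte start
# delimiter and 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps, frame_bytes):
    """Maximum frames per second a link can carry at a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def budget_per_packet(link_bps, frame_bytes, cpu_hz):
    """Return (nanoseconds, CPU cycles) available per packet at line rate."""
    pps = packets_per_second(link_bps, frame_bytes)
    return 1e9 / pps, cpu_hz / pps


LINK_BPS = 10e9  # one 10G port
CPU_HZ = 3.0e9   # hypothetical clock rate, for illustration only

for size in (64, 1518):
    ns, cycles = budget_per_packet(LINK_BPS, size, CPU_HZ)
    print(f"{size:>5} B: {ns:8.1f} ns, {cycles:8.1f} cycles per packet")
```

At 64 bytes a single 10G port leaves roughly 67 ns per packet, against more than 1,200 ns at 1518 bytes, so any fixed per-packet cost in the kernel weighs far more heavily on small packets.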
8. Can you please advise which 2 10GB module you
used with the FW-8758 for the comparison test?
For testing the 10G
throughput, we used the NCM-IXM203A, which uses the Intel® 82599ES 10G chip.
9. On slide 30, the forwarding benchmarks show between 64-128 bytes per
packet line rate is reached. Do you happen to know at what packet size it hits line rate?
For the 10G test, we were
testing two ports.
Because they were in full-duplex mode, reaching line rate meant
the machine could handle 20 Gbps of throughput.
With the 1st-generation Core processor, this happened only
with large 1518-byte packets. With the 2nd-generation Core
processor, the i5-2400, the tests at both the 1280-byte and 1518-byte
packet sizes reached line rate.
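The arithmetic behind the 20 Gbps figure, and the packet rate it implies at each frame size, can be sketched as follows. The helper names are hypothetical, and the list of frame sizes is an assumption based on the common RFC 2544 set; the webinar itself only names 64, 128, 1280 and 1518 bytes. The output shows what line rate requires, not what any of the tested processors achieved.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble + start delimiter + inter-frame gap


def line_rate_pps(link_bps, frame_bytes):
    """Frames per second needed to fill one link at a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def forwarding_target(ports, link_bps, frame_bytes):
    """Aggregate (bits/s, packets/s) when every port receives at line rate."""
    total_bps = ports * link_bps
    total_pps = ports * line_rate_pps(link_bps, frame_bytes)
    return total_bps, total_pps


# Assumed frame sizes (RFC 2544 style); only some appear in the webinar.
for size in (64, 128, 256, 512, 1024, 1280, 1518):
    bps, pps = forwarding_target(ports=2, link_bps=10e9, frame_bytes=size)
    print(f"{size:>5} B: {bps / 1e9:.0f} Gbps, {pps / 1e6:6.2f} Mpps")
```

The bit rate target stays at 20 Gbps for every frame size, but the number of packets the machine must forward each second grows sharply as frames get smaller, which is why line rate is reached at the large sizes first.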