Intel® QuickData Technology Extends Flexibility of I/O Acceleration
Overview: Extending the Benefits of I/O Acceleration
While the emergence of multi-Gigabit Ethernet (GbE) allows data centers to adapt to the increasing bandwidth requirements of enterprise IT, the impact of high-traffic volume on server resources creates a new challenge for data center managers. With the introduction of Intel® QuickData Technology, the industry ecosystem can now answer that challenge by extending the speed, scalability, and server reliability of I/O acceleration to a broader range of third-party devices, bringing unprecedented flexibility and improved cost/performance of Intel® architecture–based servers to more end customers than ever before.
Intel QuickData Technology is a component of Intel® I/O Acceleration Technology (Intel® I/OAT), functionality now available in the new Dual-Core Intel® Xeon® processor 5100 series–based platforms that efficiently translates high bandwidth into increased throughput and enhanced quality of service. Intel I/OAT moves data more efficiently through new Dual-Core Intel Xeon processor–based servers for fast, scalable, and reliable networking. Additionally, it provides network acceleration that scales seamlessly across multiple Ethernet ports while providing a safe and flexible choice for IT managers due to its tight integration into popular operating systems. This integration helps avoid the support risks of using new network stacks and preserves existing networking requirements such as teaming and failover.
Intel® QuickData Technology Opens Third-Party Avenue to I/O Acceleration
Intel QuickData Technology is a data acceleration engine that enables third-party networking and server vendors to take advantage of the benefits of I/O acceleration. By extending I/O acceleration to a broad range of third-party device manufacturers, Intel QuickData Technology helps the industry benefit from the increased speed, scalability, and server reliability that only Intel® enterprise platforms can provide. Intel QuickData Technology allows customers to benefit from I/O acceleration regardless of operating system (Windows Server* 2003, SUSE Linux* Enterprise Server* 10, or Red Hat Enterprise Linux* 5) or I/O solution.
The flexibility offered by Intel QuickData Technology will help third-party network adapter vendors support customers as they seek to cost-effectively scale network applications and optimize performance—efficiencies critical to meeting the bandwidth challenges in today’s data centers.
Intel has been working closely with Microsoft, the Linux* community, VMware, and several server vendors to optimize Intel® QuickData Technology for the broadest range of physical and virtualized operating system and software environments. Other companies looking to take advantage of Intel QuickData Technology include Broadcom, Fujitsu-Siemens, IBM, and Mellanox.
I/O Limitations Hinder Network Performance
Business success is becoming increasingly dependent on the rapid transfer, processing, compilation, and storage of data. Although IT managers continue investing in new networking and storage infrastructure to achieve higher performance, network I/O bottlenecks have emerged as the key IT challenge in realizing full value from server investments.
Until recently, the real nature and extent of I/O bottlenecks were not thoroughly understood. Most network issues could be resolved with faster servers, higher-bandwidth network interface cards (NICs), and networking techniques such as network segmentation. Such solutions sufficed until the volume of network traffic began to outpace server capacity to manage that data, due in part to trends such as:
- Increased server port densities and per-port bandwidth requirements due to server consolidation and virtualization
- Increased demand for resource-intensive enterprise audio and video resources
- Increased adoption of network-attached storage versus direct-attach storage to meet enterprise backup and recovery needs
The increased reliance on networked data and networked compute resources results in a need to manage an unprecedented data load. This increased load threatens to outpace server processing capabilities, as demonstrated by the marginal gains from traditional performance enhancements (for example, increasing network bandwidth or adding servers).
“By enabling products from other vendors to use the data acceleration engine present in the Intel® Xeon® processor 5100 and 5300 series–based platform, Intel® QuickData Technology will help the industry benefit from the increased speed, scalability and server reliability that only Intel® enterprise platforms can provide. This echoes Intel’s long-held belief in the proliferation of key technologies designed to grow the industry’s computing and networking capabilities.”
—Kirk Skaugen, Vice President and General Manager of Intel’s Server Products Group
Intel R&D Exposes Three Primary Bottlenecks
To identify the I/O bottlenecks and determine their nature and impact on network performance, Intel research and development teams examined the entire flow of a data packet as it is received, processed, and sent out by the server, looking for where platform-wide bottlenecks affected system throughput. In a packet data flow that had remained largely unchanged for more than a decade, Intel engineers discovered three major contributors to inefficient network I/O: inefficient TCP/IP processing, significant system overhead, and excessive memory accesses (Figure 1).

Figure 1. Server network I/O processing tasks fall into three major overhead categories, each varying as a percentage of total overhead according to TCP/IP packet size.
Technologies such as Intel® PRO Server Adapters and Intel® PRO Network Connections (LAN on motherboard) include advanced features that reduce processor usage, mitigating some overhead issues. These features include interrupt moderation, TCP checksum offload, TCP segmentation, and large send offload.
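To make the checksum offload feature above concrete, the following is a minimal sketch of the 16-bit ones'-complement checksum (in the style of RFC 1071) that a host processor would otherwise compute in software for every segment. The function name `internet_checksum` is chosen for this illustration, and the sketch omits the TCP pseudo-header that a real TCP checksum also covers.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones'-complement checksum of data (RFC 1071 style).

    Illustrative only: a real TCP checksum also covers a pseudo-header
    built from the IP addresses, protocol number, and segment length.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Add each 16-bit big-endian word to the running sum.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry out of the low 16 bits back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF  # ones' complement of the folded sum


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

The loop touches every byte of the payload, which is why moving this work to the adapter reduces processor usage as data rates climb.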
Other approaches exist that claim to address system overhead by offloading even more processing to the NIC. These approaches include the TCP offload engine (TOE) and remote direct memory access (RDMA).
TOE uses a specialized and dedicated processor on the NIC to handle some of the packet protocol processing. It does not fully address the other performance bottlenecks shown in Figure 1. In fact, for small packet sizes and short-duration connections, TOE may be of very limited benefit in addressing the overall I/O performance problem. Additionally, because TOE offloads (that is, copies) the network stack to a fixed-speed microcontroller (the offload engine), performance is limited to the speed of that microcontroller, and there is a risk that key network capabilities, such as adapter teaming and failover, may not function in a TOE environment.
As for RDMA-enabled NICs (RNICs), the RDMA protocol supports direct placement of packet payload data into an application’s memory space. This addresses memory access bottlenecks by reducing data movement overhead for the processor. However, the RDMA protocol operates in addition to existing network protocols such as TCP/IP and adds its own overhead as each data transfer is arranged.
Defining the Worst I/O Bottleneck
The limitations of existing I/O acceleration solutions became even clearer as Intel research and development teams began to quantify each category under different conditions. Figure 2 summarizes these results for various application I/O sizes and their impact by task category on percent of processor utilization.

Figure 2. Processor utilization varies according to I/O size. TCP/IP processing is fairly constant and tends to be a smaller part of processor utilization compared to system overhead.
Notice in Figure 2 that processor usage by TCP/IP processing is nearly constant across application I/O sizes ranging from 2K to 64K. Although TCP/IP processing is a significant bottleneck, it is not the most significant one. Memory accesses account for more processor usage than TCP/IP processing in all cases, and system overhead is the worst I/O bottleneck for application I/O sizes below 8K.
As stated, TOE and RDMA do not address the entire I/O bottleneck issue. What is needed is a system-wide solution that can fit anywhere in the enterprise computing hierarchy without requiring modification of application software and that addresses all three types of network bottlenecks. Intel I/OAT, now available on the new Dual-Core Intel Xeon processor, is exactly that kind of solution.
“We believe the introduction of Intel® QuickData Technology is a very positive step. We are working with Intel to evaluate the possible ways that our customers can benefit from this new direction, and we will be looking at our product offerings to assess where there may be synergistic coexistence.”
—Greg Young, Vice President and General Manager of Broadcom’s High-Speed Controller Line of Business
Intel® I/OAT Offers a System-Wide Solution
Intel I/OAT addresses system overhead, TCP/IP processing, and memory access server I/O bottlenecks by providing fast, reliable network acceleration that scales seamlessly across multiple Ethernet ports, and it is a safe and flexible choice for IT managers due to its tight integration into popular operating systems.
The system-wide network I/O acceleration technologies applied by Intel I/OAT include:
- Parallel processing of TCP and memory functions. Lowers system overhead and improves the efficiency of TCP stack processing by using the processor to execute multiple instructions per clock, pre-fetch TCP/IP header information into cache, and perform other data movement operations in parallel.
- Affinitized data flows. Partitions network stack processing dynamically across multiple physical or logical processors. This allows processor cycles to be allocated to the application for faster execution.
- Asynchronous low-cost copy. Provides enhanced DMA, allowing payload data copies from the NIC buffer in system memory to the application buffer with far fewer processor cycles, with the saved cycles applied to productive application workloads.
- Improved TCP/IP protocol with optimized TCP/IP stack. Implements separate packet data and control paths so that the packet header is processed separately from the packet payload. This and other stack-related enhancements reduce protocol processing cycles.
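The asynchronous low-cost copy item in the list above can be pictured with a small sketch. It is not Intel's implementation: a single worker thread stands in for the copy engine, and the names `copy_payload` and `receive` are hypothetical. The point is only the control flow, in which the caller queues payload copies, carries on with other work, and collects the completions later. Python threads do not provide the hardware parallelism of a real DMA engine.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_payload(nic_buffer: bytes, app_buffer: bytearray) -> int:
    """Stand-in for the copy engine: move one payload into the application buffer."""
    app_buffer[: len(nic_buffer)] = nic_buffer
    return len(nic_buffer)


def receive(packets: list[bytes]) -> tuple[list[bytearray], int]:
    """Queue every payload copy, do other work, then wait for the copies to finish."""
    app_buffers = [bytearray(len(p)) for p in packets]
    # One worker thread plays the role of the copy engine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=1) as copy_engine:
        pending = [
            copy_engine.submit(copy_payload, payload, buffer)
            for payload, buffer in zip(packets, app_buffers)
        ]
        # The caller is free here: header and protocol processing would go in this gap.
        copied_bytes = sum(future.result() for future in pending)  # wait for completion
    return app_buffers, copied_bytes


if __name__ == "__main__":
    buffers, total = receive([b"first payload", b"second payload"])
    print(total, [bytes(b) for b in buffers])
```

The design choice being illustrated is the split between submitting a copy and waiting for it, which is what frees processor cycles for application work.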
Because the Intel I/OAT performance enhancements occur within the server architecture for the new Dual-Core Intel Xeon processor–based servers, the technology is said to be “stateless,” in contrast to stateful offload technologies such as TOE. As a stateless technology, Intel I/OAT retains use of the system processors and protocols as the principal engines for handling network traffic.
Additionally, Intel I/OAT is used throughout the platform to increase processor efficiency by reducing bottlenecks across most application I/O sizes. Because Intel I/OAT is tightly integrated into popular operating systems, it ensures full compatibility with critical network configurations. As a result, Intel I/OAT provides a fast, scalable, and reliable network acceleration solution with significant performance advantages over prior system implementations and technologies.
Intel I/OAT Shows a Clear Performance Advantage
The Intel I/OAT platform-level approach to improving network performance has been verified by extensive testing. Some of these results are summarized in Figure 3.
Figure 3. Network-performance comparisons for platforms with and without Intel® I/OAT. Compared to previous processors, the new Dual-Core Intel® Xeon® processor with Intel I/OAT provides superior performance in terms of both higher throughput and reduced percentage of processor utilization.
The Intel I/OAT performance tests were conducted for both Linux* and Microsoft Windows* operating systems using two Intel Xeon processor–based servers tested across multiple GbE NIC ports (two to eight GbE NIC ports) as represented by the X-axis. One of the servers was an Intel® E7520 chipset–based platform without the benefit of Intel I/OAT. The other server was an Intel® E5000 chipset–based server using the new Dual-Core Intel Xeon processor, with Intel I/OAT enabled. In the test examples of Figure 3, the graphs represent both processor utilization percentages (the lines) and corresponding network throughput performance (the vertical bars). Both systems underwent identical stress tests.
The top graph summarizes the results of an Intel Xeon processor–based server running Linux. Notice that a platform with Intel I/OAT enabled running Linux and using eight GbE ports achieved a processor utilization improvement of over 40 percent versus a platform without Intel I/OAT. Additionally, this same platform achieved almost twice the network throughput of the platform without Intel I/OAT operating under the same conditions.
Similarly, network throughput of the platform nearly doubled for eight GbE ports on Windows Server 2003 (bottom graph in Figure 3). In this test, the Intel E7520 chipset–based platform was incapable of generating processor utilization data beyond four ports, because without the benefit of receive-side scaling, the server directs all network traffic to Processor 0, saturating the processor and limiting the system’s ability to report data. However, the new Dual-Core Intel Xeon processor–based platform with Intel I/OAT balanced the workload across the processors and never reached 70 percent processor utilization, even at eight GbE ports.
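The difference between sending all traffic to Processor 0 and balancing it across processors can be sketched as a flow-to-processor mapping. This is a simplified illustration, not the mechanism used by any particular adapter or operating system: CRC32 is a stand-in hash chosen for the sketch, and `processor_for_flow` and `distribute` are hypothetical names. Each flow is hashed on its addresses and ports, so every packet of a given connection lands on the same processor while different connections spread out.

```python
import zlib
from collections import Counter


def processor_for_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                       num_processors: int) -> int:
    """Map a TCP flow to a processor index; the same flow always maps to the same index."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    # CRC32 is a stand-in hash for this sketch, not what real hardware uses.
    return zlib.crc32(key) % num_processors


def distribute(flows, num_processors: int) -> Counter:
    """Count how many flows land on each processor."""
    return Counter(processor_for_flow(*flow, num_processors) for flow in flows)


if __name__ == "__main__":
    flows = [("10.0.0.%d" % i, 40000 + i, "10.0.1.1", 80) for i in range(1, 33)]
    print("one processor: ", dict(distribute(flows, 1)))
    print("four processors:", dict(distribute(flows, 4)))
```

With a single processor every flow maps to index 0, which mirrors the saturated case described above. With more processors the same flows are divided among them.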
Summary
Crucial to accelerating widespread adoption of I/O acceleration technologies, Intel QuickData Technology makes the data movement engine introduced in Intel’s server and workstation platform chipsets available to third-party network adapter manufacturers. By enabling other networking and server vendors to increase the throughput of server data traffic, it allows end users to provide fast, scalable, and reliable network acceleration for the majority of today’s data center environments. This translates to the following significant areas of benefit for IT data centers:
- Speed. By leveraging platform-level architectural improvements to minimize performance-limiting bottlenecks, Intel I/OAT significantly reduces processor overhead to free resources for more critical compute tasks.
- Scalability. Intel I/OAT scales seamlessly across multiple GbE ports (up to eight GbE ports), and can scale up to 10GbE, while maintaining power and thermal characteristics similar to those of a standard GbE network adapter.
- Reliability. Intel I/OAT is tightly integrated into popular operating systems such as Microsoft Windows Server 2003 and Linux, avoiding the support risks associated with relying on third-party hardware vendors for network stack updates. Intel I/OAT also preserves critical network configurations such as teaming and failover by maintaining control of the network stack processing within the processor—where it belongs. This results in reduced support risks for IT departments.