Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1204.09

  • Volume 12
  • Issue 04
  • Published December 23, 2008

Intel® vPro™ Technology

  Section 5 of 10  

Remote System Repair Using Intel® vPro™ Technology

Remote Heal Solution Challenges

As we continue to make technological advancements, the usability of such advanced features is a key concern. These range from simple ease-of-use to preventing an unauthorized person from gaining access to corporate data.

Usability Challenges

Triggers to Initiate Heal Usage
Currently, the trigger that establishes the Fast Call for Help connection to a remote gateway enabled by Intel® vPro™ technology is initiated in software, usually under the control of the BIOS or the host OS. However, the trigger cannot be initiated when the BIOS or OS is in an unknown state, such as when the OS has become inoperable. For BIOS-based triggers, the user has to be instructed to interact with the BIOS screen.

IT Integration Challenges for Remote Services
Currently, when an IT worker (customer) presses the special key for Fast Call for Help to trigger the remote heal usage, the connection event itself is trapped at a remote IT server, which waits for a corresponding event to be generated to register the client in the enterprise Active Directory. Specifically, if the customer has already triggered this remote access and subsequently talks to a remote console operator over the phone, there is no easy way to associate the remote console used by the service provider's operator with the existing remote connection event. In a phone-based support infrastructure, the equivalent capability is an automated call management tool that routes waiting calls to a servicing agent, or holds them in a queue until service personnel are available. The interface between the remote access server or service and the existing remote console infrastructure is critical for successful customer interaction. Specifically, the following integration aspects need to be covered in the overall service delivery:

  • Direct mapping of a remote access event from a specific client platform to a customer support console where all heal actions associated with the client platform are performed.
  • Ability to trap or forward an existing remote Fast Call for Help connection to the remote console, in case the customer does not make contact over a phone but instead initiates a client connection by using Fast Call for Help.
  • Active Directory integration: swift registration of a remotely connected client event, such as a PET/WS-Event received via Fast Call for Help, with the enterprise Active Directory or name resolution service.
  • Quality-of-service metrics to monitor and improve the service delivered to clients through Fast Call for Help technology.

The resolution of such new challenges is critical for the successful integration of this remote heal usage in the enterprise.

Deployment Issues of a Remote Access Gateway
Deploying a remote system repair (RSR) solution that uses Fast Call for Help requires an Intel vPro-enabled gateway to provide the client authentication and access service. This may be a standalone gateway, or it may be integrated into an existing gateway that resides in the enterprise DMZ for normal IT usage. A standalone deployment potentially adds one more gateway to maintain in the enterprise DMZ.

Performance Issues in Delivering Remote Heal Usage

High-Latency Networks
The IDE-R protocol was designed and implemented to work within the enterprise, that is, on high-bandwidth, low-latency networks. Further, the embedded processor (EP) operates in a constrained memory environment. The firmware therefore limits the number of buffers that are posted to WiFi/Ethernet* Network Interface Cards (NICs) and limits the socket buffers in the Intel® Management Engine (Intel® ME) network firmware to just enough to meet targets in low-latency networks. However, as the latency increases, the available buffers are insufficient to hide the latency and meet these targets, thereby compromising network performance.

In the remote system repair scenario, the platform connects to the Internet over DSL, cable, or fiber-optic networks. These are usually limited in bandwidth and have higher latencies. In this environment, performance drops much faster as network latency increases, especially at typical Internet speeds and latency ranges. This performance drop was observed in the context of both Transmission Control Protocol/Internet Protocol (TCP/IP) and IDE-R protocol stacks.

In addition, all application layer protocols such as IDE-R are multiplexed over the Intel® Active Management Technology (Intel® AMT) Port Forwarding (APF) protocol, used by Fast Call for Help technology in a Transport Layer Security (TLS) session. So, these application-level flow controls are also layered on top of APF’s own flow control. As a result, performance issues are accentuated when the additional protocol layer, with its own data-trunk protocol, is introduced. This limits the efficiency of payload transfer.

The Impact of High Latency Network Performance on RSR Solution

Table 1 demonstrates the impact of a high-latency network on the RSR solution. It shows the performance of the IDE-R application without optimization on an APF connection; the optimized results appear in Table 2.

Table 1: IDE-R application performance without optimization on an APF connection
Bandwidth: 2 Mbps symmetric (250 KB/s); file size: 16 MB

Round-trip time (RTT)           0 ms    40 ms   80 ms
IDE-R copy throughput (KB/s)    185     34      18

For the purposes of measurement and tuning, we measured the runtime performance of IDE-R by remotely mounting a CD image and copying large files from the server to the client enabled with Intel® vPro™ technology. We compared how well the files copied over APF connections versus non-APF connections. The Internet latency and bandwidth were emulated by using a NISTNet* Linux* router sitting between the client system and the remote access gateway.

To provide real-world context: without optimization, at the 18 KB/s measured for 80ms RTT, transferring a copy of a 100MB ISO file would take approximately 95 minutes, with limited success.

Tackling the High Latency Network Issue
In an effort to improve RSR performance, we analyzed the end-to-end IDE-R copy performance layer by layer, following the layered application architecture. We started at the bottom of the stack and moved up, while ensuring that the corresponding peer on the server side was equally capable of delivering the performance. Figure 6 shows the application stack.

We first focused on the TCP/IP stack implementation in the Intel® ME, and then analyzed IDE-R, since IDE-R can simply be transported on TCP/IP. The APF layer (c and d) is a transparent layer that can be added or removed as part of the overall stack, which gave us the flexibility to fine-tune IDE-R independent of APF. Subsequently, we introduced the Fast Call for Help or APF layer over TLS, to understand the impact of TLS and the multiplexing protocol (APF).

Figure 6: Layered approach for performance analysis

The Improved Performance of Intel® vPro™ Technology Architecture

TCP/IP stack in the Intel® Management Engine (Intel® ME)
To ensure the highest performance from the TCP/IP stack implementation, we instrumented it by adding a special FTP module, designed to discard FTP packets once they are received. Using tools such as Wireshark*, we found that TCP was advertising a small receive window (8K) over the wire. We fixed this by increasing the size of the receive window, as well as increasing the socket receive queue to 64K. The TCP/IP stack then performed at the levels of Windows XP*. For a 2Mbps symmetric link, a 32KB buffer, or maximum TCP receive window, is sufficient. With the fixes, we concluded that the TCP/IP stack performance was sufficient for on-the-go usage.

IDE-R on regular networks
IDE-R is a TCP application with its own buffering scheme, so we focused on the sizes of the IDE-R buffers: we knew that at higher latencies, larger buffers, that is, more data in flight, are needed to fill the end-to-end network pipe. Applying the same thinking to IDE-R as to the TCP/IP stack, we increased the redirection buffer sizes to 64K and measured the effect on performance. For 80ms RTT on a 2Mbps symmetric link, with 64K buffers, we measured an IDE-R file copy throughput of 178KB/s, as opposed to 18KB/s under the same network conditions without the optimizations.
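The relationship between latency and buffer size can be illustrated with the standard bandwidth-delay product. The following is a minimal sketch, not the firmware's logic: the function names are hypothetical, 1 KB is assumed to be 1024 bytes, and the window-limited figure is only an idealized upper bound, so it is not expected to reproduce the measured throughput.

```python
KB = 1024
LINK = 250 * KB  # 2 Mbps symmetric link, taken as 250 KB/s as in the tables


def bdp_bytes(link_bytes_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return link_bytes_per_s * rtt_s


def window_limited_throughput(window_bytes: float,
                              link_bytes_per_s: float,
                              rtt_s: float) -> float:
    """Idealized upper bound when at most one window can be in flight per RTT."""
    if rtt_s <= 0:
        return link_bytes_per_s
    return min(link_bytes_per_s, window_bytes / rtt_s)


if __name__ == "__main__":
    for rtt_ms in (40, 80):
        rtt = rtt_ms / 1000
        print(f"RTT {rtt_ms} ms: BDP = {bdp_bytes(LINK, rtt) / KB:.1f} KB")
        for window_kb in (8, 32, 64):
            cap = window_limited_throughput(window_kb * KB, LINK, rtt)
            print(f"  {window_kb} KB window -> at most {cap / KB:.0f} KB/s")
```

Under these assumptions, a window or buffer smaller than the bandwidth-delay product cannot keep the link full, which is consistent with the 8K window becoming a bottleneck as RTT grows while 32KB and 64K do not.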

IDE-R with Fast Call for Help
Since layers (a) and (b) in Figure 6 were fixed by increasing the buffer sizes, we next introduced the Fast Call for Help layer, with APF and TLS encryption, above TCP/IP. Fast Call for Help is simply a multiplexing and encryption layer, and since encryption is done in hardware, we expected no difference in performance for IDE-R with the introduction of APF and TLS.

However, our measurements showed performance as low as the unbuffered levels. Subsequent wire-level debugging of packet traces revealed a small receive window and delayed acknowledgements from the Intel® ME firmware. These issues were resolved by turning on the TCP_NODELAY option and increasing the receive window size appropriately. The new performance measurements are shown in Table 2.

Table 2: Optimized IDE-R application performance
Bandwidth: 2 Mbps symmetric (250 KB/s); file size: 16 MB

Round-trip time (RTT)           0 ms    40 ms   80 ms
IDE-R copy throughput (KB/s)    195     158     124
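For readers unfamiliar with the two socket-level changes described above (TCP_NODELAY and a larger receive window), the sketch below shows how the equivalent options are set through the ordinary BSD socket API on a host OS. This is an illustration only: the Intel® ME firmware interface is not described in this article, and the 64 KB value is an assumption that mirrors the buffer size mentioned in the text.

```python
import socket

RECV_BUFFER_BYTES = 64 * 1024  # assumed value, mirroring the 64K figure in the text

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask for a larger receive buffer before connecting, so the advertised
# TCP receive window can grow accordingly. The OS may adjust the value.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RECV_BUFFER_BYTES)

# TCP_NODELAY disables Nagle's algorithm, so small writes are sent
# immediately instead of waiting for outstanding data to be acknowledged.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the options back to see what the OS actually applied.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("TCP_NODELAY enabled:", bool(nodelay))
print("Receive buffer granted by the OS (bytes):", rcvbuf)

sock.close()
```

The receive buffer is requested before the connection is established because the window scaling that permits windows above 64 KB is negotiated during the TCP handshake.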

APF Protocol and implementation analysis
Even with the changes discussed so far, we observed that on a 250KB/s link at 80ms RTT, FTP throughput was 230KB/s and IDE-R throughput was 178KB/s. In contrast, throughput for IDE-R over Fast Call for Help was only 110-120 KB/s. Further investigation led us to take a closer look at the APF protocol itself.

APF has its own flow control layered on top of TCP's flow control in order to ensure fairness among the multiple TCP connections that share the Fast Call for Help tunnel. APF currently sends data to TCP in 4380-byte chunks; efforts are underway to increase this to 64K to improve throughput.

To ensure that TLS encryption was not limiting the performance, we removed the TLS layer and found little difference in performance. This was in line with our expectation that encryption was offloaded to the hardware engine.

Host Interface (ATA) and end-to-end performance
Even after completing the performance optimizations at all lower levels, the maximum observed throughput at the target RTT of 80ms was still about 120 KB/s. This was just a little more than 50 percent of host-based FTP transfer rates.

As a result, we focused on the topmost layer of the application stack. The host IDE/ATA stack issues a 64KB request for data to the underlying CD drive. In the case of IDE-R, these data are fetched from the remote drive by using IDE-R messages. The sequence is as follows:

  1. The host OS issues a 64KB request to the firmware. These are standard requests issued by operating systems such as Windows XP* and Windows Vista*.
  2. The firmware sends the 64KB request in an IDE-R message to the remote server; the message has to cross the full network latency.
  3. If “t” is the end-to-end RTT, the remote gateway receives the request and starts the 64KB transfer only after “t/2” msec.
  4. The firmware receives the 64KB of data and copies it to the host; however, the first packet of the 64KB arrives “t/2” msec after the transfer starts in Step 3.
  5. The host initiates the next request as in Step 1.

Therefore, on high-latency networks, after one ATA request completes, no data are transported until the server (console) receives the next 64KB request, one one-way delay later. To transport three 64KB chunks in a one-second interval, and so at least meet a 192 KB/s target, we incur this additional RTT overhead three times, a substantial loss of nearly 20 percent.
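The cost of issuing one request at a time can be sketched with a simple stop-and-wait model. This is an illustration under stated assumptions, not a model taken from the measurements: it assumes each 64 KB chunk is transferred at the full link rate, that exactly one RTT of idle time separates consecutive requests, and it ignores protocol overhead, so its output is not expected to match the measured figures. The function names are hypothetical.

```python
def stop_and_wait_throughput(chunk_kb: float, link_kb_per_s: float, rtt_s: float) -> float:
    """Throughput (KB/s) when each chunk is requested only after the previous one completes."""
    transfer_s = chunk_kb / link_kb_per_s   # time the link spends carrying the chunk
    return chunk_kb / (transfer_s + rtt_s)  # one RTT is spent idle per request


def idle_fraction(chunk_kb: float, link_kb_per_s: float, rtt_s: float) -> float:
    """Fraction of each request cycle during which no payload is on the wire."""
    transfer_s = chunk_kb / link_kb_per_s
    return rtt_s / (transfer_s + rtt_s)


if __name__ == "__main__":
    for rtt_ms in (0, 40, 80):
        rtt = rtt_ms / 1000
        tput = stop_and_wait_throughput(64, 250, rtt)
        idle = idle_fraction(64, 250, rtt)
        print(f"RTT {rtt_ms:>2} ms: {tput:6.1f} KB/s, link idle {idle:.0%} of the time")
```

In this idealized model the idle share grows with RTT and shrinks with request size, which is why keeping more than one request outstanding helps on high-latency links.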

By analyzing each layer of the stack and implementing the optimizations in firmware, we improved IDE-R copy throughput substantially, from 18 KB/s to 124 KB/s, in an Internet environment with 80-msec RTT. To provide real-world context for the optimization, transferring a copy of a 100MB ISO file would now take approximately 14 minutes.
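The transfer-time figures quoted in this article follow directly from the measured throughput at 80ms RTT. The short check below assumes 1 MB = 1024 KB; the function name is hypothetical.

```python
def transfer_minutes(file_mb: float, throughput_kb_per_s: float) -> float:
    """Minutes needed to copy a file of file_mb megabytes at a steady throughput."""
    return file_mb * 1024 / throughput_kb_per_s / 60


if __name__ == "__main__":
    # Throughput values at 80 ms RTT, from Table 1 and Table 2.
    for label, kb_per_s in (("without optimization", 18), ("with optimization", 124)):
        print(f"100 MB ISO {label}: {transfer_minutes(100, kb_per_s):.1f} minutes")
```

Both results round to the approximately 95 and 14 minutes stated in the text.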

Future Improvements for RSR
Since the RSR usage is primarily end-to-end image copying, we also determined it would help if we could pre-fetch the next data chunk without waiting for the host OS to issue the request. In order to optimize the request for an Internet-based connection, we want to pre-fetch at least one second’s worth of data, to keep filling the end-to-end pipe. Currently, efforts are underway to target the next-generation platforms with Intel® vPro™ technology to incorporate IDE-R data pre-fetching.
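As a rough sizing of the pre-fetch described above, the sketch below estimates how many 64 KB requests would have to be outstanding to cover one second of data on the 250 KB/s link used in the measurements. It is only arithmetic on the numbers in this article; the function name is hypothetical and the actual pre-fetch design is not described here.

```python
import math


def prefetch_requests(link_kb_per_s: float, seconds_ahead: float, request_kb: float) -> int:
    """Number of fixed-size requests needed to cover seconds_ahead of link capacity."""
    return math.ceil(link_kb_per_s * seconds_ahead / request_kb)


if __name__ == "__main__":
    needed = prefetch_requests(link_kb_per_s=250, seconds_ahead=1.0, request_kb=64)
    print("64 KB requests to pre-fetch for one second of data:", needed)
```

Since the result is rounded up, the pre-fetch always covers at least the requested interval.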
