Understanding Packet Loss in Network Monitoring and Analysis Appliances
By Daniel Joseph Barry
Network monitoring and analysis has grown in importance as the Internet and IP networks have become the de facto standard for a range of digital services. The commercialization of the Internet has only aggravated this need and extended the range of applications to include network testing, security, and optimization.
Common to all these applications is a need to analyze large amounts of data in real time. What distinguishes the task of network analysis from communication is the amount of data to be analyzed. In a typical communication scenario, the endpoints are only interested in the packets that are related to their conversation. The other packets sharing the same connection are simply filtered out. In a network analysis scenario, on the other hand, we are interested in all the packets traversing the point we are monitoring. At 10 Gbps, this can be up to 30 million packets per second that need to be analyzed in real time.
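The 30 million figure is consistent with minimum-size Ethernet frames on both directions of a full-duplex 10 Gbps link. The following is a minimal sketch of that arithmetic, assuming 64-byte frames and the standard 20 bytes of per-frame wire overhead (preamble, start frame delimiter, and inter-frame gap). The function name is illustrative only.

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte start frame
# delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second in one direction at a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


# Assumption: worst case is the minimum Ethernet frame size of 64 bytes.
one_way = packets_per_second(10e9, 64)
both_ways = 2 * one_way  # a monitoring point sees both directions

print(f"{one_way / 1e6:.2f} Mpps per direction")      # 14.88 Mpps
print(f"{both_ways / 1e6:.2f} Mpps both directions")  # 29.76 Mpps
```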
For an analysis to be useful, every packet needs to be analyzed. A single missing packet could be the key to determining what is happening in the network. Waiting for the packet to be re-sent is not an option either, because we are trying to perform analysis in real time. Packet loss is, therefore, unacceptable for analysis applications.
There can be many causes of packet loss, which can relate to how we get access to the data, the kind of technology used to capture packets, the processing platform, and the application software used to analyze the data. Let’s take a look at each of these in turn.
Source #1: Receipt of packet copies
The first source of packet loss can be the method for receiving copies of the packets for passive, off-line analysis. Switches and routers provide Switched Port ANalyzer (SPAN) ports, which are designed to provide a copy of all packets passing through the given switch or router. Network monitoring and analysis appliances can thus receive the data they need from the SPAN port directly. In most cases, this works well, but there is the potential for packet loss if the switch or router becomes overloaded. In such cases, the switch or router will prioritize its main task of switching and routing and down-prioritize SPAN port tasks. This will result in packets not being delivered for analysis or, in other words, packet loss.
It is for this reason that many prefer to use test access points (TAPs), which are simpler devices installed on the connection itself. A TAP simply copies each packet received to the TAP outputs. The advantage of TAPs is that they can guarantee that a copy of each packet received is available. On a typical TAP, two outputs are provided per connection: one for upstream traffic and one for downstream traffic. Therefore, two analysis ports are required to capture and merge this data.
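Merging the two TAP outputs usually means interleaving them by capture timestamp. The sketch below shows one way to do that in software, assuming each port already delivers its packets in timestamp order. The Packet type and merge_tap_outputs function are hypothetical names for illustration, not part of any capture product.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    timestamp_ns: int  # capture time stamped by the analysis port
    direction: str     # "up" or "down"
    payload: bytes


def merge_tap_outputs(
    upstream: Iterable[Packet], downstream: Iterable[Packet]
) -> Iterator[Packet]:
    """Interleave two time-ordered streams into one time-ordered stream."""
    # heapq.merge is lazy, so it works on live streams without buffering
    # everything; it requires each input to be sorted already.
    return heapq.merge(upstream, downstream, key=lambda p: p.timestamp_ns)


up = [Packet(1, "up", b"SYN"), Packet(4, "up", b"ACK")]
down = [Packet(2, "down", b"SYN-ACK"), Packet(5, "down", b"DATA")]
merged = list(merge_tap_outputs(up, down))
print([p.timestamp_ns for p in merged])  # [1, 2, 4, 5]
```

The result is only as good as the timestamps: if the two ports are stamped by unsynchronized clocks, the merged order may not reflect the order on the wire.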
Source #2: Packet-capture technology
The second source of packet loss is the packet-capture technology used. Many appliances are based on standard network interfaces, such as those used for communication. However, these are not designed to handle the large amounts of data that need to be captured. As we said, up to 30 million packets per second need to be captured, but standard network interfaces cannot handle more than five million packets per second at the time of writing.
Another way of looking at this is in relation to what packet sizes are supported. Many of the vendors of standard network interfaces will claim full throughput for 512-byte and larger packets. With larger packet sizes, there are proportionally fewer packets per second to handle. Unfortunately, the Internet and IP networks don’t start at 512 bytes, and it is far from a rare occurrence that smaller packet sizes are used.
If we look at typical TCP traffic, we can see two distinct breakpoints when analyzing traffic profiles. The first noticeable breakpoint is at 1500 bytes, corresponding to the maximum transmission unit (MTU) of Ethernet. The next breakpoint is at 576 bytes, the default IP datagram size from which the default maximum segment size (MSS) of the transmission control protocol (TCP), 536 bytes, is derived. Below 576 bytes, there can be a large number of smaller packets corresponding to TCP acknowledgment packets, control segments, etc., which can be as small as 40 bytes.
This knowledge is often used in test methodologies, where reference is made to the Internet mix or IMIX to simulate internet traffic. A typical IMIX model will use a mix of 40-byte, 576-byte, and 1500-byte traffic corresponding to the breakpoints above. It is therefore clear that discounting traffic below 512 bytes is not providing a realistic and complete picture of what is happening in the network.
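To see how much a 512-byte cutoff hides, the sketch below works through one IMIX profile. It assumes the commonly cited "simple IMIX" weighting of 7:4:1 for 40-byte, 576-byte, and 1500-byte packets. That ratio is an assumption for illustration and is not given in this article.

```python
# Assumed "simple IMIX" profile: packet size in bytes -> relative packet count.
IMIX_PROFILE = {40: 7, 576: 4, 1500: 1}

total_packets = sum(IMIX_PROFILE.values())
total_bytes = sum(size * count for size, count in IMIX_PROFILE.items())

# Average packet size across the mix.
average_size = total_bytes / total_packets

# Share of packets (not bytes) that fall below the 512-byte threshold.
small_packets = sum(c for size, c in IMIX_PROFILE.items() if size < 512)
small_packet_share = small_packets / total_packets

print(f"average packet size: {average_size:.1f} bytes")
print(f"packets below 512 bytes: {small_packet_share:.1%}")
```

Under this assumed mix, more than half of the packets are smaller than 512 bytes, even though they carry only a small share of the bytes.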
To guarantee packet capture, use products that are designed specifically for this task. They must ensure that all packet traffic is captured with zero packet loss even at 100 percent load. Otherwise, the analysis is incomplete. An example of this type of product is Napatech intelligent network adapters (full disclosure: I work for Napatech), which are designed specifically for packet-capture applications. These adapters are also designed for use in standard servers, which are the most common platform for appliance design.
Source #3: Servers
The third source of packet loss is the standard servers that are used as hardware platforms for appliances. If these servers are not configured properly, packets can be lost due to processing congestion. As general-purpose processing platforms, standard servers support many applications simultaneously as well as various adapters. Sharing processing, memory, and data bus (PCIe) resources between these various applications can lead to congestion if not configured properly. Because analysis is performed in real time, packets that arrive while the server is congested will be lost unless they are buffered on the network adapter itself.
In addition, modern servers often provide “green” profiles, where power consumption is minimized. This means that very little airflow is provided to the PCIe slots where adapters are installed, so adapters have difficulty dissipating heat and can fail as a result (which of course guarantees packet loss). This needs to be considered in the design of the packet capture adapter.
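As a rough guide to how much congestion an adapter buffer can absorb, the sketch below estimates how long a buffer lasts if the server accepts no data at all while traffic arrives at full line rate. The 4 GiB buffer size is a hypothetical value chosen for illustration, not a specification of any product.

```python
def buffer_hold_time_seconds(buffer_bytes: int, link_bps: float) -> float:
    """Time until the buffer overflows if nothing is drained from it."""
    # Simplification: treats the full link rate as packet data and ignores
    # per-frame wire overhead, so this slightly underestimates the hold time.
    return buffer_bytes * 8 / link_bps


ONBOARD_BUFFER_BYTES = 4 * 1024**3  # hypothetical 4 GiB of adapter memory
hold = buffer_hold_time_seconds(ONBOARD_BUFFER_BYTES, 10e9)
print(f"buffer absorbs a full stall for about {hold:.2f} s")  # about 3.44 s
```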
Source #4: Analysis application software
The fourth source of packet loss is the design of the analysis application software defining the network monitoring and analysis appliance. Many applications are implemented using a single thread, meaning that they can only execute on a single CPU core. This is sufficient for lower bit rates but becomes a source of packet loss at higher bit rates, such as 10 Gbps.
The analysis application just cannot keep up. A best practice in such situations is to use a multi-threaded design that can take advantage of the multiple CPU cores available in standard servers. This in turn requires a packet capture adapter that can distribute to the multiple CPU cores in a way that fits the analysis application.
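One common way to distribute packets so that they "fit" a multi-threaded application is to hash each packet's flow identifiers, so that all packets of a conversation reach the same worker. The sketch below illustrates the idea in software. The function names are hypothetical, and a capture adapter would typically do the equivalent in hardware.

```python
import zlib


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             protocol: int) -> bytes:
    """Build a direction-independent key from the 5-tuple."""
    # Sorting the two endpoints makes A->B and B->A produce the same key,
    # so both directions of a conversation land on the same worker.
    first, second = sorted((f"{src_ip}:{src_port}", f"{dst_ip}:{dst_port}"))
    return f"{protocol}|{first}|{second}".encode()


def worker_for(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               protocol: int, num_workers: int) -> int:
    """Pick the worker (CPU core) index for a packet."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    key = flow_key(src_ip, src_port, dst_ip, dst_port, protocol)
    return zlib.crc32(key) % num_workers


request = worker_for("10.0.0.1", 51000, "192.0.2.7", 443, 6, num_workers=8)
reply = worker_for("192.0.2.7", 443, "10.0.0.1", 51000, 6, num_workers=8)
print(request == reply)  # True: both directions map to the same worker
```

Keeping a flow on one core avoids sharing per-flow state between threads, at the cost of uneven load when a few flows dominate the traffic.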
A Final Word
As can be seen, there are multiple sources of packet loss, but with careful consideration of how the data is provided to the appliance, the packet capture adapter used, the configuration of the standard server hardware platform, and the design of the analysis application software, it is possible to guarantee zero-packet-loss analysis.
Daniel Joseph Barry is VP of marketing at Napatech.