Fast In-kernel Traffic Sketching in eBPF

Sebastiano Miano, Xiaoqi Chen, Ran Ben Basat, Gianni Antichi

Abstract

The extended Berkeley Packet Filter (eBPF) is an infrastructure that allows micro-programs to be dynamically loaded and run directly in the Linux kernel without recompiling it. In this work, we study how to develop high-performance network measurements in eBPF. We take sketches as a case study, given their ability to support a wide range of tasks while providing a low memory footprint and accuracy guarantees. We implement NitroSketch, the state-of-the-art sketch for user-space networking, and show that best practices in user-space networking cannot be directly applied to eBPF because of its different performance characteristics. By applying the lessons learned, we improve its performance by 40% compared to a naive implementation.
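
As an illustration of why sketches offer a low memory footprint with accuracy guarantees, here is a minimal Count-Min sketch in Python. It is a generic textbook structure, not NitroSketch and not eBPF code; the class name, the default sizes, and the use of salted BLAKE2b as the per-row hash are assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator: memory is depth * width counters,
    regardless of how many distinct flows are observed."""

    def __init__(self, depth=4, width=1024):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent-looking hash per row, derived by salting with the row id.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key, count=1):
        # Every row increments exactly one counter for this key.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum over rows
        # never underestimates the true count.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


sketch = CountMinSketch(depth=4, width=64)
for i in range(1000):
    sketch.update(f"flow-{i % 50}")  # 50 flows, 20 packets each
print(sketch.estimate("flow-0"))
```

The estimate for a flow is always at least its true count, and the overestimation shrinks as the width grows. This per-packet update of every row is the cost that sampling-based designs aim to reduce.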

Download from ACM

The January 2023 issue

This January 2023 issue contains five technical papers.

The first technical paper, Fast In-kernel Traffic Sketching in eBPF, by Sebastiano Miano and colleagues, studies how to develop high-performance network measurements in eBPF. The extended Berkeley Packet Filter (eBPF) allows micro-programs to be dynamically loaded and run in the Linux kernel without recompiling it. The authors use sketches as a case study, given their ability to support a wide range of tasks while providing a low memory footprint and accuracy guarantees. The authors apply their approach to a state-of-the-art sketch for user-space networking, show that best practices in user-space networking cannot be directly applied to eBPF, and improve its performance by 40% compared to a naive implementation. The lessons learned in this paper are not only applicable to network measurement algorithms but extend to a wide variety of eBPF-based programs.

The second technical paper, Comparing User Space and In-Kernel Packet Processing for Edge Data Centers, by Federico Parola and colleagues, is motivated by the increased availability of small data centers at the edge of the network. Network operators are moving their network functions into these computing facilities. However, commonly used technologies for data plane processing such as DPDK, based on kernel-bypass primitives, provide high performance but at the cost of rigid resource partitioning. This is unsuitable for edge data centers, in which efficiency demands that both general-purpose applications and data-plane telco workloads be executed on the same (shared) physical machines. In this respect, eBPF/XDP looks like a more appealing solution, thanks to its capability to process packets in the kernel, achieving a higher level of integration with non-data-plane applications, albeit with lower performance than DPDK. This research addresses the premise that in edge data centers, with limited resources, packet processing and protocol stack workloads are likely to be consolidated within the same servers. As a result, kernel-based XDP may be a more attractive option than DPDK-based data plane processing. This motivates the need for a deeper understanding of kernel-based XDP and its various forms to support different workload types.

The third technical paper, P4RROT: Generating P4 Code for the Application Layer, by Csaba Györgyi and colleagues, proposes a new code generation mechanism to streamline application-level offloads expressed in the P4 programming language. The authors present P4RROT, a new library that allows developers to write application-layer logic in Python, which is then converted into P4. The authors discuss the pain points and challenges of automatic code generation and show the applicability of P4RROT in two different contexts: a publish-subscribe sensor data processing system and a real-time data streaming engine, supporting MQTT-SN and MoldUDP traffic.

The fourth technical paper, The Slow Path Needs an Accelerator Too!, by Annus Zulfiqar and colleagues, shows that the slow path is set to become a new key bottleneck in Software-Defined Networks (SDNs). The authors present their vision of a new Domain Specific Accelerator (DSA) for the slow path at the end host that sits between the hardware-offloaded data plane and the logically-centralized control plane. They also discuss open problems and call on the networking community to creatively address this emerging issue.

The fifth technical paper, Who squats IPv4 Addresses?, by Loqman Salamatian and colleagues, analyzes the phenomenon of squatted IP space: IPv4 addresses that operators use although they have not been allocated to them. This is possible because there are large IPv4 blocks that have been allocated to organizations which never announced them in the global routing system. The authors draw on a very large data set of traceroutes and develop a heuristic to identify how squat space is used, by whom, and what the implications are for Internet routing and the operator communities. This paper is a significant contribution for anyone interested in the operation of Internet routing and larger networks.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

Rethinking SIGCOMM’s Conferences: Making Form Follow Function

Scott Shenker

Abstract

In this short essay, I ask whether our current practice of highly selective conferences is helping us achieve SIGCOMM’s research goals. This requires first articulating what those goals are, and then evaluating our practices in relation to those goals. To no one’s surprise, this essay contends that there is a significant mismatch between what I believe SIGCOMM’s goals should be and what our current practices achieve. I then propose a radical restructuring of our conferences that would provide better alignment and, as an additional benefit, a stronger sense of community. However, I wrote this essay not to promote the specifics of a particular proposal, but to encourage our community to (i) engage in a thorough reexamination of how we organize SIGCOMM-sponsored conferences and (ii) seriously entertain the possibility of radical changes in our practices.

Download from ACM

Topology and Geometry of the Third-Party Domains Ecosystem: Measurement and Applications

Costas Iordanou, Fragkiskos Papadopoulos

Abstract

Over the years, web content has evolved from simple text and static images hosted on a single server to complex, interactive, multimedia-rich content hosted on different servers. As a result, a modern website fetches content during loading not only from its owner’s domain but also from a range of third-party domains providing additional functionalities and services. Here, we infer the network of the third-party domains by observing the domains’ interactions within users’ browsers from all over the globe. We find that this network possesses structural properties commonly found in complex networks, such as a power-law degree distribution, strong clustering, and the small-world property. These properties imply that a hyperbolic geometry underlies the ecosystem’s topology. We use statistical inference methods to find the domains’ coordinates in this geometry, which abstract how popular and similar the domains are. The hyperbolic map we obtain is meaningful, revealing the large-scale organization of the ecosystem. Furthermore, we show that it possesses predictive power, providing us with the likelihood that third-party domains are co-hosted, belong to the same legal entity, or will merge under the same entity in the future through company acquisition. We also find that complementarity, rather than similarity, is the dominant force driving future domain mergers. These results provide a new perspective on understanding the ecosystem’s organization and performing related inferences and predictions.

Download from ACM

The October 2022 issue

This October 2022 issue contains two technical papers and one editorial note.

The first technical paper, LGC-ShQ: Datacenter Congestion Control with Queueless Load-based ECN Marking, by Kristjon Ciko and colleagues, provides a thorough performance evaluation of LGC-ShQ, a novel congestion control (CC) mechanism for datacenters. LGC-ShQ’s performance is compared (over Linux) against HULL, the closest solution in the state of the art.

The second technical paper, Topology and Geometry of the Third-Party Domains Ecosystem: Measurement and Applications, by Costas Iordanou and colleagues, studies the network of the third-party domains by observing the domains’ interactions within users’ browsers from all over the globe. The authors then discuss the structural properties of the corresponding network. The results provide a new perspective on understanding the ecosystem’s organization.

We have one editorial note. In Rethinking SIGCOMM’s Conferences: Making Form Follow Function, Scott Shenker asks whether our current practice of highly selective conferences is helping us achieve SIGCOMM’s research goals. This essay contends that there is a significant mismatch between what SIGCOMM’s goals should be and what our current practices achieve, and proposes a radical restructuring of our conferences that would provide better alignment and, as an additional benefit, a stronger sense of community.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

LGC-ShQ: Datacenter Congestion Control with Queueless Load-based ECN Marking

Kristjon Ciko, Peyman Teymoori, Michael Welzl

Abstract

We present LGC-ShQ, a new ECN-based congestion control mechanism for datacenters. LGC-ShQ relies on ECN feedback from a Shadow Queue, and it uses this feedback not only to decrease the rate but also to increase it in relation to the signal. Real-life tests in a Linux testbed show that LGC-ShQ keeps the real queue at low levels while achieving good link utilization and fairness.
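
To make the idea of marking from a shadow queue concrete, the following Python sketch models a generic virtual queue: a byte counter that grows with each arriving packet and drains at a fraction of the link rate, with ECN marks issued when the counter exceeds a threshold. This is an illustration under stated assumptions, not the authors' algorithm; the class name, the 0.95 drain fraction, and the 3000-byte threshold are hypothetical values chosen for the example.

```python
class ShadowQueueMarker:
    """Virtual (shadow) queue that decides ECN marks without holding packets."""

    def __init__(self, link_rate_bps, drain_fraction=0.95, threshold_bytes=3000):
        # The shadow queue drains slightly slower than the real link (assumed
        # fraction), so it signals congestion before a real queue builds up.
        self.drain_rate = link_rate_bps * drain_fraction / 8  # bytes per second
        self.threshold = threshold_bytes
        self.backlog = 0.0
        self.last_time = 0.0

    def on_packet(self, now, size_bytes):
        """Account for one arriving packet; return True if it should be ECN-marked."""
        elapsed = now - self.last_time
        self.last_time = now
        # Drain the virtual backlog for the time that has passed, then add the packet.
        self.backlog = max(0.0, self.backlog - elapsed * self.drain_rate)
        self.backlog += size_bytes
        return self.backlog > self.threshold


marker = ShadowQueueMarker(link_rate_bps=8_000_000)
# A burst of back-to-back 1000-byte packets, 1 microsecond apart.
marks = [marker.on_packet(i * 1e-6, 1000) for i in range(10)]
print(marks)
```

Because the counter is only bookkeeping, the real queue can stay short while senders still receive a load-dependent marking signal.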

Download from ACM

AppClassNet: A commercial-grade dataset for application identification research

Wang Chao, Alessandro Finamore, Lixuan Yang, Kevin Fauvel, Dario Rossi

Abstract

The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with the practical availability of data and computing power. Therefore, it is not surprising that the lack of high-quality data is often recognized as one of the major factors limiting AI research in several domains, and the networking domain is no exception. Large companies have access to large data assets that would constitute interesting benchmarks for algorithmic research in the broader scientific community. However, such datasets are private assets that are generally very difficult to share due to privacy or business sensitivity concerns.

Following numerous requests we received from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and classes, and reaches scales similar to the popular ImageNet dataset commonly used in computer vision literature.

To avoid leaking user- and business-sensitive information, we suitably anonymized the dataset, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset as well as the steps we took to avoid leakage of sensitive information while retaining relevance as a benchmark. We hope that AppClassNet can be instrumental in helping other researchers address more complex commercial-grade problems in the broad field of traffic classification and management.

Download from ACM

The multiple roles that IPv6 addresses can play in today’s Internet

Maxime Piraux, Tom Barbette, Nicolas Rybowski, Louis Navarre, Thomas Alfroy, Cristel Pelsser, François Michel, Olivier Bonaventure

Abstract

The Internet uses IP addresses to identify and locate the network interfaces of connected devices. IPv4 was introduced more than 40 years ago and specifies 32-bit addresses. As the Internet grew, the pool of available IPv4 addresses was exhausted more than ten years ago. The IETF designed IPv6 with a much larger addressing space consisting of 128-bit addresses, pushing the exhaustion problem much further into the future.

In this paper, we argue that this large addressing space allows reconsidering how IP addresses are used and enables improving, simplifying and scaling the Internet. By revisiting the IPv6 addressing paradigm, we demonstrate that it opens up several research opportunities that can be investigated today. Hosts can benefit from several IPv6 addresses to improve their privacy, defeat network scanning, improve the use of several mobile access networks and their mobility as well as to increase the performance of multicore servers. Network operators can solve the multihoming problem more efficiently and without putting a burden on the BGP RIB, implement Function Chaining with Segment Routing, differentiate routing inside and outside a domain given particular network metrics and offer more fine-grained multicast services.
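
The scale behind this argument can be checked with a few lines of Python using the standard `ipaddress` module. The prefix `2001:db8::/64` is the reserved documentation range and is used here only as an example; the choice of four addresses per host is likewise an assumption for illustration.

```python
import ipaddress

IPV4_BITS = 32
IPV6_BITS = 128

ipv4_total = 2 ** IPV4_BITS  # every possible IPv4 address
ipv6_total = 2 ** IPV6_BITS  # every possible IPv6 address

# A single /64 subnet (documentation prefix, example only).
subnet = ipaddress.ip_network("2001:db8::/64")

# One subnet alone holds 2**32 times the entire IPv4 address space.
print(ipv4_total)
print(subnet.num_addresses // ipv4_total)

# A host can therefore be given several addresses from the same subnet,
# for instance one per application, per core, or per access network.
per_host_addresses = [subnet[i] for i in range(1, 5)]
for address in per_host_addresses:
    print(address)
```

With this much room inside every subnet, assigning multiple addresses to one host costs nothing in terms of address scarcity, which is what makes the use cases listed above practical.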

Download from ACM

The July 2022 issue

This July 2022 issue contains one technical paper and two editorial notes.

The technical paper, The Packet Number Space Debate in Multipath QUIC, by Quentin De Coninck, deals with how QUIC packets should be numbered over multiple paths. This work compares the use of a single (shared) packet number space with the use of multiple packet number spaces for multipath QUIC. The main outcome of the evaluation is that using multiple packet number spaces has the advantage that packet losses can be detected while maintaining significantly less state at the receiver. It also allows using fewer signalling frames, at the cost of a more profound modification of the QUIC protocol.

We have two editorial notes. The first one, The multiple roles that IPv6 addresses can play in today’s Internet, by Maxime Piraux and his colleagues, argues that the large IPv6 addressing space allows reconsidering how IP addresses are used and enables improving, simplifying and scaling the Internet. The second, AppClassNet: A commercial-grade dataset for application identification research, by Wang Chao and his colleagues, releases a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

The Packet Number Space Debate in Multipath QUIC

Quentin De Coninck

Abstract

With a standardization process that attracted much interest, QUIC can be seen as the next general-purpose transport protocol. Still, it does not yet provide true multipath support, missing some use cases that Multipath TCP addresses. To fill that gap, the IETF recently adopted a Multipath proposal merging several proposed designs. While the proposal focuses on its core components, one major design issue remains: the number of packet number spaces that should be used. This paper provides experimental results with two different Multipath QUIC implementations based on NS3 simulations to understand the impact of using one packet number space per path or a single packet number space for the whole connection. Our results show that using one packet number space per path makes Multipath QUIC more resilient to the receiver’s heuristics to acknowledge packets and detect duplicates.
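
The difference between the two numbering schemes can be illustrated with a small Python example that counts the acknowledgment ranges a receiver has to track. It is a toy scenario, not taken from the paper's experiments: the assumption is that ten packets are sent round-robin over two paths and that only the packets on the faster path have arrived so far. The helper `ack_ranges` is a hypothetical name.

```python
def ack_ranges(received):
    """Collapse a set of received packet numbers into contiguous (first, last) ranges."""
    ranges = []
    for packet_number in sorted(received):
        if ranges and packet_number == ranges[-1][1] + 1:
            ranges[-1][1] = packet_number  # extend the current range
        else:
            ranges.append([packet_number, packet_number])  # start a new range
    return [tuple(r) for r in ranges]


# Five packets have arrived on path 0; path 1 is slower and has delivered nothing yet.
arrived_on_path0 = range(5)

# Single packet number space: path 0 carried the even global numbers 0, 2, 4, 6, 8.
single_space = {2 * i for i in arrived_on_path0}

# One packet number space per path: path 0 uses its own numbers 0, 1, 2, 3, 4.
per_path_space = {0: set(arrived_on_path0), 1: set()}

print(ack_ranges(single_space))       # five separate one-packet ranges
print(ack_ranges(per_path_space[0]))  # one contiguous range
```

With a single space, differences in path delay look like gaps, so the receiver tracks many ranges and cannot tell reordering from loss by the gaps alone. With per-path spaces, each path's numbers stay contiguous unless a packet on that path is actually missing.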

Download from ACM