Author Archives: Steve Uhlig

Validating the Sharing Behavior and Latency Characteristics of the L4S Architecture

Dejene Boru Oljira, Karl-Johan Grinnemo, Anna Brunstrom, Javid Taheri

Abstract

The strict low-latency requirements of applications such as virtual reality and online gaming cannot be satisfied by the current Internet. This is due to classic TCP congestion controls such as Reno and Cubic, which induce high queuing delays when used for capacity-seeking traffic, which in turn results in unpredictable latency. The Low Latency, Low Loss, Scalable throughput (L4S) architecture addresses this problem by combining scalable congestion controls such as DCTCP and TCP Prague with early congestion signalling from the network. It defines a Dual Queue Coupled (DQC) AQM that isolates low-latency traffic from the queuing delay of classic traffic while ensuring the safe co-existence of scalable and classic flows on the global Internet. In this paper, we benchmark the DualPI2 scheduler, a reference implementation of the DQC AQM, to validate some of the experimental results reported in previous works that demonstrate the co-existence of scalable and classic congestion controls and the architecture's low-latency service. Our results validate the co-existence of scalable and classic flows using the DualPI2 Single queue (SingleQ) AQM, and the queue latency isolation of scalable flows using the DualPI2 Dual queue (DualQ) AQM. However, the rate or window fairness between DCTCP without fair-queuing (FQ) pacing and TCP Cubic using the DualPI2 DualQ AQM deviates from the original results. We attribute the difference between our results and the original results to the sensitivity of the L4S architecture to traffic bursts and the burst sending pattern of the Linux kernel.
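As a rough, hedged illustration of how such sharing behavior can be quantified, the sketch below computes Jain's fairness index and a rate ratio from per-flow throughput averages for one scalable and one classic flow on a shared bottleneck. The throughput numbers are hypothetical, and the metric choice is an assumption; the paper's own evaluation may summarize rate or window fairness differently.

```python
# Hedged sketch: summarizing how fairly a scalable flow (e.g., DCTCP) and a
# classic flow (e.g., TCP Cubic) share a bottleneck. Throughput values are
# hypothetical placeholders, not measurements from the paper.

def jains_index(rates):
    """Jain's fairness index over per-flow rates: 1.0 means perfectly equal."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(r * r for r in rates))

def rate_ratio(scalable_rate, classic_rate):
    """Scalable-to-classic throughput ratio; values near 1.0 suggest co-existence."""
    return scalable_rate / classic_rate

if __name__ == "__main__":
    dctcp_mbps, cubic_mbps = 52.0, 48.0  # hypothetical per-flow averages (Mbit/s)
    print(f"Jain's index: {jains_index([dctcp_mbps, cubic_mbps]):.3f}")
    print(f"DCTCP/Cubic rate ratio: {rate_ratio(dctcp_mbps, cubic_mbps):.2f}")
```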

Download the full article

The web is still small after more than a decade

Nguyen Phong Hoang, Arian Akhavan Niaki, Michalis Polychronakis, Phillipa Gill

Abstract

Understanding web co-location is essential for various reasons. For instance, it can help one to assess the collateral damage that denial-of-service attacks or IP-based blocking can cause to the availability of co-located web sites. However, it has been more than a decade since the first study was conducted in 2007. The Internet infrastructure has changed drastically since then, necessitating a renewed study to comprehend the nature of web co-location.

In this paper, we conduct an empirical study to revisit web co-location using datasets collected from active DNS measurements. Our results show that the web is still small and centralized to a handful of hosting providers. More specifically, we find that more than 60% of web sites are co-located with at least ten other web sites, a group comprising mostly less popular web sites. In contrast, 17.5% of web sites, which are mostly popular ones, are served from their own servers.
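As a minimal sketch of the grouping step behind such a co-location analysis, the snippet below resolves a placeholder list of domains and counts how many of them share each IP address. The domain list is hypothetical; the study itself relies on large-scale active DNS measurement datasets rather than ad hoc lookups.

```python
# Minimal sketch of estimating web co-location from DNS resolutions.
# The domain list is a placeholder; real studies use large active DNS datasets.
import socket
from collections import defaultdict

domains = ["example.com", "example.org", "example.net"]  # hypothetical sample

sites_per_ip = defaultdict(set)
for name in domains:
    try:
        for ip in socket.gethostbyname_ex(name)[2]:  # all A records returned
            sites_per_ip[ip].add(name)
    except socket.gaierror:
        continue  # skip unresolvable domains

for ip, sites in sorted(sites_per_ip.items()):
    print(f"{ip}: co-hosts {len(sites)} of the probed sites")
```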

Although a high degree of web co-location could make co-hosted sites vulnerable to DoS attacks, our findings show an increasing trend to co-host many web sites and serve them from well-provisioned content delivery networks (CDNs) of major providers that offer advanced DoS protection. Regardless of the high degree of web co-location, our analyses of popular block lists indicate that IP-based blocking does not cause severe collateral damage as previously thought.

Download the full article

Path persistence in the cloud: A study of the effects of inter-region traffic engineering in a large cloud provider’s network

Waleed Reda, Kirill Bogdanov, Alexandros Milolidakis, Hamid Ghasemirahni, Marco Chiesa, Gerald Q. Maguire, Dejan Kostić

Abstract

A commonly held belief is that traffic engineering and routing changes are infrequent. However, based on our measurements over a number of years of traffic between data centers in the network of one of the largest cloud providers, we found that it is common for flows to change paths at ten-second intervals or even faster. These frequent path and, consequently, latency variations can negatively impact the performance of cloud applications, specifically latency-sensitive and geo-distributed applications.

Our recent measurements and analysis focused on observing path changes and latency variations between different Amazon AWS regions. To this end, we devised a path change detector that we validated using both ad hoc experiments and feedback from cloud networking experts. The results provide three main insights: (1) Traffic Engineering (TE) frequently moves (TCP and UDP) flows among network paths of different latency, (2) flows experience unfair performance, where a subset of flows between two machines can suffer large latency penalties (up to 32% at the 95th percentile) or an excessive number of latency changes, and (3) tenants may have incentives to selfishly move traffic to low-latency classes (to boost the performance of their applications). We showcase this third insight with an example using rsync synchronization.
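To make the idea of a path change detector concrete, here is a small, hedged sketch that compares consecutive traceroute-style hop sequences for the same source/destination pair and flags a change whenever the forwarding path differs. The traces are hypothetical, and the authors' actual detector also has to cope with unresponsive hops, load balancing, and latency-class changes.

```python
# Hedged sketch of a path-change detector: flag a change whenever the ordered
# hop sequence between two endpoints differs from the previous measurement.
# The traces below are hypothetical placeholders.

def path_changed(prev_hops, curr_hops):
    """Return True when the ordered hop sequence differs between measurements."""
    return prev_hops != curr_hops

traces = [  # (timestamp in seconds, hop IPs observed by a traceroute-like probe)
    (0,  ["10.0.0.1", "10.0.1.1", "10.0.2.1"]),
    (10, ["10.0.0.1", "10.0.1.1", "10.0.2.1"]),
    (20, ["10.0.0.1", "10.0.9.1", "10.0.2.1"]),  # one hop moved: path changed
]

for (t0, prev), (t1, curr) in zip(traces, traces[1:]):
    if path_changed(prev, curr):
        print(f"path change detected between t={t0}s and t={t1}s")
```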

To the best of our knowledge, this is the first paper to reveal the high frequency of TE activity within a large cloud provider’s network. Based on these observations, we expect our paper to spur discussions and future research on how cloud providers and their tenants can ultimately reconcile their independent and possibly conflicting objectives. Our data is publicly available for reproducibility and further analysis at http://goo.gl/25BKte.

Download the full article

RIPE IPmap Active Geolocation: Mechanism and Performance Evaluation

Ben Du, Massimo Candela, Bradley Huffaker, Alex C. Snoeren, kc claffy

Abstract

RIPE IPmap is a multi-engine geolocation platform operated by the RIPE NCC. One of its engines, single-radius, uses active geolocation to infer the geographic coordinates of target IP addresses. In this paper, we first introduce the methodology of IPmap’s single-radius engine, then evaluate its accuracy, coverage, and consistency, and compare its results with commercial geolocation databases. We found that 80.3% of single-radius results have city-level accuracy for our ground truth dataset, and 87.0% have city-level consistency when geolocating different interfaces on the same routers. The single-radius engine provided geolocation inferences for 78.5% of the 26,559 core infrastructure IP addresses in our coverage evaluation dataset. The main contributions of this paper are to introduce and evaluate the IPmap single-radius engine.
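As a rough sketch of the single-radius idea, the snippet below selects the vantage point with the lowest RTT to a target and bounds the target's possible location by the propagation speed of light in fiber. The probe names and RTTs are hypothetical, and the exact probe-selection and filtering rules that IPmap applies are not reproduced here.

```python
# Hedged sketch of single-radius geolocation: the target must lie within a
# disk around the probe with the smallest RTT, whose radius is bounded by the
# signal propagation speed in fiber. Probe data below is hypothetical.

SPEED_IN_FIBER_KM_PER_MS = 200.0  # roughly 2/3 of the speed of light in vacuum

probes = [  # (probe city, minimum RTT to the target in ms) -- placeholders
    ("Amsterdam", 1.8),
    ("Frankfurt", 7.4),
    ("New York", 82.0),
]

city, min_rtt = min(probes, key=lambda p: p[1])
radius_km = (min_rtt / 2.0) * SPEED_IN_FIBER_KM_PER_MS  # one-way distance bound
print(f"Target is within ~{radius_km:.0f} km of {city}; "
      "with a small enough radius it can be geolocated to that city.")
```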

Download the full article

The April 2020 Issue

SIGCOMM Computer Communication Review (CCR) is produced by a group of members of our community who spend time preparing the newsletter that you read every quarter. Olivier Bonaventure served as editor during the last four years and his term is now over. It is my pleasure to now serve the community as the editor of CCR. As Olivier and other editors did in the past, we’ll probably adjust the newsletter to the evolving needs of the community. A first change is the introduction of a new Education series led by Matthew Caesar, our new SIGCOMM Education Director. This series will be part of every issue of CCR and will contain different types of contributions: not only technical papers as in the current issue, but also position papers (that promote discussion through a defensible opinion on a topic), studies (describing research questions, methods, and results), experience reports (that describe an approach with a reflection on why it did or did not work), and approach reports (that describe a technical approach with enough detail for adoption by others).

This April 2020 issue contains five technical papers, the first paper of our new education series, as well as three editorial notes.

The first technical paper, RIPE IPmap Active Geolocation: Mechanism and Performance Evaluation, by Ben Du and his colleagues, introduces the research community to the IPmap single-radius engine and evaluates its effectiveness against commercial geolocation databases.

It is often believed that traffic engineering changes are rather infrequent. In the second paper, Path Persistence in the Cloud: A Study of the Effects of Inter-Region Traffic Engineering in a Large Cloud Provider’s Network, Waleed Reda and his colleagues reveal the high frequency of traffic engineering activity within a large cloud provider’s network.

In the third paper, The Web is Still Small After More Than a Decade, Nguyen Phong Hoang and his colleagues revisit some of the decade-old studies on web presence and co-location.

The fourth paper, a repeatable paper that originated in the IMC reproducibility track, An Artifact Evaluation of NDP, by Noa Zilberman, provides an analysis of NDP (New Datacentre Protocol). NDP was first presented at ACM SIGCOMM 2017 (best paper award) and proposes a novel data centre transport architecture. In this paper, the author builds on the artifact provided by the original authors of NDP, showing how it is possible to carry out research and build new results on previous work done by fellow researchers.

The Low Latency, Low Loss, Scalable throughput (L4S) architecture aims to deliver predictable low latency by combining scalable congestion controls such as DCTCP and TCP Prague with early congestion signalling from the network. In our fifth technical paper, Validating the Sharing Behavior and Latency Characteristics of the L4S Architecture, Dejene Boru Oljira and his colleagues validate some of the experimental results reported in previous works that demonstrate the co-existence of scalable and classic congestion controls and the architecture’s low-latency service.

The sixth paper, also our very first paper in the new education series, An Open Platform to Teach How the Internet Practically Works, by Thomas Holterbach and his colleagues, describes a software infrastructure that can be used to teach how the Internet works. The platform presented by the authors aims to be a much smaller, yet representative, copy of the Internet. The paper’s description and evaluation focus on the technical aspects of the design, but as a teaching tool, it may be more helpful to say more about pedagogical issues.

Then, we have three very different editorial notes. The first, Workshop on Internet Economics (WIE 2019) report, by kc claffy and David Clark, reports on the 2019 interdisciplinary Workshop on Internet Economics (WIE). The second, strongly related to the fourth technical paper, deals with reproducibility. In Thoughts about Artifact Badging, Noa Zilberman and Andrew Moore illustrate that the current badging scheme may not identify limitations of architecture, implementation, or evaluation. Our last editorial note is a comment on a past editorial, “Datacenter Congestion Control: Identifying what is essential and making it practical” by Aisha Mushtaq, et al., from our July 2019 issue. This comment, authored by James Roberts, disputes that shortest remaining processing time (SRPT) is the crucial factor in achieving good flow completion time (FCT) performance in datacenter networks.

Steve Uhlig — CCR Editor

The January 2020 Issue

This January 2020 issue starts the fiftieth volume of Computer Communication Review. This marks an important milestone for our newsletter. This issue contains four technical papers and three editorial notes. In C-Share: Optical Circuits Sharing for Software-Defined Data-Centers, Shay Vargaftik and his colleagues tackle the challenge of designing data-center networks that combine optical circuit switches with traditional packet switches.

In our second paper, A Survey on the Current Internet Interconnection Practices, Pedro Marcos and his colleagues provide the results of a survey answered by about one hundred network operators. This provides very interesting data on why and how networks interconnect. In A first look at the Latin American IXPs, Esteban Carisimo and his colleagues analyze how Internet eXchange Points have been deployed in Latin America during the last decade.

Our fourth technical paper, Internet Backbones in Space, explores different aspects of using satellite networks to create Internet backbones. Giacomo Giuliari and his colleagues analyse four approaches to organise routing between the ground segment of satellite networks (SNs) and traditional terrestrial ISP networks.

Then, we have three very different editorial notes. In Network architecture in the age of programmability, Anirudh Sivaraman and his colleagues look at where programmable functions should be placed inside networks and the impact that programmability can have on the network architecture. In The State of Network Neutrality Regulation, Volker Stocker, Georgios Smaragdakis and William Lehr analyse network neutrality from the US and European viewpoints by considering the technical and legal implications of this debate. In Gigabit Broadband Measurement Workshop Report, William Lehr, Steven Bauer and David Clark report on a recent workshop that discussed the challenges of correctly measuring access networks as they reach bandwidths of 1 Gbps and more.

Finally, I’m happy to welcome Matthew Caesar as the new SIGCOMM Education Director. He is currently preparing different initiatives, including an education section in CCR.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

C-Share: Optical Circuits Sharing for Software-Defined Data-Centers

S. Vargaftik, C. Caba, L. Schour, Y. Ben-Itzhak

Abstract

Integrating optical circuit switches in data-centers is an ongoing research challenge. In recent years, state-of-the-art solutions have introduced hybrid packet/circuit architectures for different optical circuit switch technologies, control techniques, and traffic re-routing methods. These solutions are based on separate packet and circuit planes that cannot utilize an optical circuit for flows that do not arrive from, or are not delivered to, switches directly connected to the circuit’s end-points. Moreover, current SDN-based elephant flow re-routing methods require a forwarding rule for each flow, which raises scalability issues.

In this paper, we present C-Share, a scalable SDN-based circuit sharing solution for data center networks. C-Share inherently enables elephant flows to share optical circuits by exploiting a flat top-of-rack tier network topology. C-Share is based on a scalable and decoupled SDN-based elephant flow re-routing method comprising elephant flow detection, tagging, and identification, which leverages a prevalent network sampling method (e.g., sFlow). C-Share requires only a single OpenFlow rule for each optical circuit, and therefore significantly reduces the required OpenFlow rule entry footprint and rule setup rate. It also mitigates the OpenFlow outbound latency for subsequent elephant flows. We implement a proof-of-concept system for C-Share based on Mininet, and test the scalability of C-Share using an event-driven simulation. Our results show a consistent increase in the mice/elephant flow separation in the network, which, in turn, improves both network throughput and flow completion time.
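As one hedged illustration of the sampling-based detection step mentioned above, the sketch below scales sampled packet sizes by the sampling rate to estimate per-flow volume and flags flows above a threshold as elephant candidates. The sampling rate, threshold, and packet records are all hypothetical, and C-Share's actual tagging and identification pipeline is not reproduced here.

```python
# Hedged sketch of sampling-based elephant-flow detection (sFlow-style):
# estimate per-flow volume from sampled packets and flag large flows.
# Sampling rate, threshold, and packet records are hypothetical placeholders.
from collections import Counter

SAMPLING_RATE = 1000          # assume 1-in-1000 packet sampling
ELEPHANT_BYTES = 1_000_000    # hypothetical elephant threshold (1 MB)

sampled_bytes = Counter()     # (src, dst) -> bytes seen in samples
for src, dst, size in [       # placeholder sampled packet records
    ("10.0.1.2", "10.0.5.3", 1500),
    ("10.0.1.2", "10.0.5.3", 1500),
    ("10.0.2.9", "10.0.7.1", 64),
]:
    sampled_bytes[(src, dst)] += size

for flow, byte_count in sampled_bytes.items():
    estimated = byte_count * SAMPLING_RATE  # scale up by the sampling rate
    if estimated >= ELEPHANT_BYTES:
        print(f"elephant candidate {flow}: ~{estimated} estimated bytes")
```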

Download the full article

A survey on the current internet interconnection practices

Pedro Marcos, Marco Chiesa, Christoph Dietzel, Marco Canini, Marinho Barcellos

Abstract

The Internet topology has significantly changed in the past years. Today, it is richly connected and flattened. Such a change has been driven mostly by the fast growth of peering infrastructures and the expansion of Content Delivery Networks as alternatives to reduce interconnection costs and improve traffic delivery performance. While the topology evolution is perceptible, it is unclear whether or not the interconnection process has evolved or if it continues to be an ad-hoc and lengthy process. To shed light on the current practices of the Internet interconnection ecosystem and how these could impact the Internet, we surveyed more than 100 network operators and peering coordinators. We divide our results into two parts: (i) the current interconnection practices, including the steps of the process, the reasons to establish new interconnection agreements or to renegotiate existing ones, and the parameters discussed by network operators; and (ii) the existing limitations and how the interconnection ecosystem can evolve in the future. We show that despite the changes in the topology, interconnecting continues to be a cumbersome process that usually takes days, weeks, or even months to complete, which is in stark contrast with the desire of most operators to reduce the interconnection setup time. We also identify that, despite being primary candidates to evolve the interconnection process, emerging on-demand connectivity companies only fill part of the existing gap between the current interconnection practices and the network operators’ desires.

Download the full article

Internet backbones in space

Giacomo Giuliari, Tobias Klenze, Markus Legner, David Basin, Adrian Perrig and Ankit Singla

Abstract

Several “NewSpace” companies have launched the first of thousands of planned satellites for providing global broadband Internet service. The resulting low-Earth-orbit (LEO) constellations will not only bridge the digital divide by providing service to remote areas, but they also promise much lower latency than terrestrial fiber for long-distance routes. We show that unlocking this potential is non-trivial: such constellations provide inherently variable connectivity, which today’s Internet is ill-suited to accommodate. We therefore study cost–performance trade-offs in the design space for Internet routing that incorporates satellite connectivity, examining four solutions ranging from naïvely using BGP to an ideal, clean-slate design. We find that the optimal solution is provided by a path-aware networking architecture in which end-hosts obtain information and control over network paths. However, a pragmatic and more deployable approach inspired by the design of content distribution networks can also achieve stable and close-to-optimal performance.

Download the full article