Tag Archives: technical

Towards retina-quality VR video streaming: 15ms could save you 80% of your bandwidth

Luke Hsiao, Brooke Krajancich, Philip Levis, Gordon Wetzstein, Keith Winstein

Abstract

Virtual reality systems today cannot yet stream immersive, retina-quality virtual reality video over a network. One of the greatest challenges to this goal is the sheer data rates required to transmit retina-quality video frames at high resolutions and frame rates. Recent work has leveraged the decay of visual acuity in human perception in novel gaze-contingent video compression techniques. In this paper, we show that reducing the motion-to-photon latency of a system itself is a key method for improving the compression ratio of gaze-contingent compression. Our key finding is that a client and streaming server system with sub-15ms latency can achieve 5x better compression than traditional techniques while also using simpler software algorithms than previous work.
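
The 80% figure in the title and the 5x figure in the abstract agree under one reading, which is an assumption here: "5x better compression" means the stream needs one fifth of the baseline bitrate at comparable perceived quality. Writing $B_0$ for the baseline bitrate and $B$ for the low-latency bitrate (both symbols introduced only for this illustration):

```latex
\[
  B = \frac{B_0}{5}
  \qquad\Longrightarrow\qquad
  \text{bandwidth saved} = 1 - \frac{B}{B_0} = 1 - \frac{1}{5} = 0.8 = 80\%.
\]
```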

Download from ACM

Zeph & Iris map the internet: A resilient reinforcement learning approach to distributed IP route tracing

Matthieu Gouel, Kevin Vermeulen, Maxime Mouchet, Justin P. Rohrer, Olivier Fourmaux, Timur Friedman

Abstract

We describe a new system for distributed tracing at the IP level of the routes that packets take through the IPv4 internet. Our Zeph algorithm coordinates route tracing efforts across agents at multiple vantage points, assigning to each agent a number of /24 destination prefixes in proportion to its probing budget, with the prefixes chosen according to a reinforcement learning heuristic that aims to maximize the number of multipath links discovered. Zeph runs on top of Iris, our fault-tolerant system for orchestrating internet measurements across distributed agents of heterogeneous probing capacities. Iris is built around third-party free and open-source software and modern containerization technology, thereby presenting a new model for assembling a resilient and maintainable internet measurement architecture. We show that carefully choosing which destinations to probe from which vantage point matters for topology discovery, and that a system can learn from previous measurements which assignment will maximize overall discovery. After 10 cycles of probing, Zeph is capable of discovering 2.4M nodes and 10M links in a cycle of 6 hours, when deployed on 5 Iris agents. This is at least 2 times more nodes and 5 times more links than other production systems for the same number of prefixes probed.
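
The budget-proportional part of the assignment can be illustrated with a minimal sketch. This is not the Zeph algorithm: the reinforcement learning heuristic that picks which prefixes go to which agent is not modelled, and the function name, agent names, and budgets below are hypothetical. The sketch only shows how a list of /24 prefixes might be split so that each agent's share tracks its probing budget, using largest-remainder rounding.

```python
from typing import Dict, List


def allocate_prefixes(prefixes: List[str], budgets: Dict[str, int]) -> Dict[str, List[str]]:
    """Split `prefixes` across agents in proportion to their probing budgets.

    Hypothetical helper for illustration only; the prefix *selection*
    heuristic described in the abstract is not modelled here.
    """
    total = sum(budgets.values())
    if total <= 0:
        raise ValueError("total budget must be positive")
    n = len(prefixes)

    # Ideal fractional share per agent, rounded down to a whole count.
    shares = {agent: n * budget / total for agent, budget in budgets.items()}
    counts = {agent: int(share) for agent, share in shares.items()}

    # Give the leftover prefixes to the agents with the largest remainders.
    leftover = n - sum(counts.values())
    by_remainder = sorted(budgets, key=lambda a: shares[a] - counts[a], reverse=True)
    for agent in by_remainder[:leftover]:
        counts[agent] += 1

    # Hand out consecutive slices of the prefix list.
    assignment: Dict[str, List[str]] = {}
    start = 0
    for agent in budgets:
        assignment[agent] = prefixes[start:start + counts[agent]]
        start += counts[agent]
    return assignment


# Toy input: ten made-up /24 prefixes and three made-up agents.
prefixes = [f"10.0.{i}.0/24" for i in range(10)]
assignment = allocate_prefixes(prefixes, {"agent-a": 100, "agent-b": 300, "agent-c": 100})
for agent, assigned in assignment.items():
    print(agent, len(assigned))
```

In this toy run the agent with three times the budget receives three times as many prefixes as each of the others.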

Download from ACM

An educational toolkit for teaching cloud computing

Cosimo Anglano, Massimo Canonico, Marco Guazzone

Abstract

In an educational context, experimenting with a real cloud computing platform is very important to let students understand the core concepts, methodologies and technologies of cloud computing. However, the heterogeneity of cloud providers' APIs complicates experimentation by forcing students to focus on the use of different APIs and by hindering the joint use of different platforms. In this paper, we present EasyCloud, a toolkit enabling the easy and effective use of different cloud platforms. In particular, we describe its features, architecture, scalability, and use in our cloud computing courses, as well as the pedagogical insights we have gained over the years.

Download from ACM

Machine learning-based analysis of COVID-19 pandemic impact on US research networks

Mariam Kiran, Scott Campbell, Fatema Bannat Wala, Nick Buraglio, Inder Monga

Abstract

This study explores how fallout from the changing public health policy around COVID-19 has changed how researchers access and process their science experiments. Using a combination of techniques from statistical analysis and machine learning, we conduct a retrospective analysis of historical network data for a period around the stay-at-home orders that took place in March 2020. Our analysis takes data from the entire ESnet infrastructure to explore DOE high-performance computing (HPC) resources at OLCF, ALCF, and NERSC, as well as User sites such as PNNL and JLAB. We look at detecting and quantifying changes in site activity using a combination of t-Distributed Stochastic Neighbor Embedding (t-SNE) and decision tree analysis. Our findings bring insights into the working patterns and impact on data volume movements, particularly during late-night hours and weekends.

Download from ACM

REDACT: refraction networking from the data center

Arjun Devraj, Liang Wang, Jennifer Rexford

Abstract

Refraction networking is a promising censorship circumvention technique in which a participating router along the path to an innocuous destination deflects traffic to a covert site that is otherwise blocked by the censor. However, refraction networking faces major practical challenges due to performance issues and various attacks (e.g., routing-around-the-decoy and fingerprinting). Given that many sites are now hosted in the cloud, data centers offer an advantageous setting to implement refraction networking due to the physical proximity and similarity of hosted sites. We propose REDACT, a novel class of refraction networking solutions where the decoy router is a border router of a multi-tenant data center and the decoy and covert sites are tenants within the same data center. We highlight one specific example REDACT protocol, which leverages TLS session resumption to address the performance and implementation challenges in prior refraction networking protocols. REDACT also offers scope for other designs with different realistic use cases and assumptions.

Download from ACM

When latency matters: measurements and lessons learned

Marco Iorio, Fulvio Risso, Claudio Casetti

Abstract

Several emerging classes of interactive applications demand extremely low latency to reach their full potential, with edge computing generally regarded as a key enabler thanks to reduced delays. This paper presents the outcome of a large-scale end-to-end measurement campaign focusing on task-offloading scenarios, showing that moving the computation closer to the end users may not, on its own, be enough. Indeed, the complexity of modern networks, both at the access and in the core, the behavior of the protocols at different levels of the stack, and the orchestration platforms used in data centers hide a set of pitfalls that can cancel out the benefits of low propagation delays. In short, we highlight that ensuring good QoS for latency-sensitive applications is a multi-dimensional problem that requires a great deal of customization and cooperation to get the best from the underlying network.

Download from ACM

P4Pi: P4 on Raspberry Pi for networking education

Sándor Laki, Radostin Stoyanov, Dávid Kis, Robert Soulé, Péter Vörös, Noa Zilberman

Abstract

High-level network programming languages, like P4, enable students to gain hands-on experience with the structure of a switch or router. Students can implement the packet processing pipeline themselves, without prior knowledge of circuit design. However, when choosing a P4 programmable target for use in the classroom, instructors face a lack of options. On the one hand, software solutions, such as the behavioral model (BMv2) switch, are overly simplified and offer low performance. On the other hand, existing hardware solutions are closed source and expensive.

In this paper, we present P4Pi, a new, low-cost, open-source hardware platform intended for networking education. P4Pi allows students to design and deploy P4-based network devices using the Raspberry Pi board, which costs less than many academic textbooks. We describe the high-level design of the P4Pi platform, offer some suggestions for how P4Pi could be used in the classroom, and present some additional use cases for applications and functionality that could be developed using P4Pi.

Download from ACM

The graph neural networking challenge: a worldwide competition for education in AI/ML for networks

José Suárez-Varela, Miquel Ferriol-Galmés, Albert López, Paul Almasan, Guillermo Bernárdez, David Pujol-Perich, Krzysztof Rusek, Loïck Bonniot, Christoph Neumann, François Schnitzler, François Taïani, Martin Happ, Christian Maier, Jia Lei Du, Matthias Herlich, Peter Dorfinger, Nick Vincent Hainke, Stefan Venz, Johannes Wegener, Henrike Wissing, Bo Wu, Shihan Xiao, Pere Barlet-Ros, Albert Cabellos-Aparicio

Abstract

During the last decade, Machine Learning (ML) has increasingly become a hot topic in the field of Computer Networks and is expected to be gradually adopted for a plethora of control, monitoring and management tasks in real-world deployments. This creates a need for new generations of students, researchers and practitioners with a solid background in ML applied to networks. During 2020, the International Telecommunication Union (ITU) organized the “ITU AI/ML in 5G challenge”, an open global competition that introduced a broad audience to some of the main current challenges in ML for networks. This large-scale initiative gathered 23 different challenges proposed by network operators, equipment manufacturers and academia, and attracted a total of 1300+ participants from 60+ countries. This paper narrates our experience organizing one of the proposed challenges: the “Graph Neural Networking Challenge 2020”. We describe the problem presented to participants, the tools and resources provided, some organizational aspects and participation statistics, an outline of the top-3 awarded solutions, and a summary of the lessons learned throughout this journey. As a result, this challenge leaves a curated set of educational resources openly available to anyone interested in the topic.

Download from ACM

NemFi: record-and-replay to emulate WiFi

Abhishek Kumar Mishra, Sara Ayoubi, Giulio Grassi, Renata Teixeira

Abstract

This paper presents NemFi: a trace-driven WiFi emulator. NemFi is a record-and-replay emulator that captures traces representing real WiFi conditions, and later replays these traces to reproduce the same conditions. In this paper, we demonstrate that the state-of-the-art emulator that was developed for cellular links cannot emulate WiFi conditions. We identify the three key differences that must be addressed to enable accurate WiFi record-and-replay: WiFi packet losses, medium-access control, and frame aggregation. We then extend the existing cellular network emulator to support WiFi record-and-replay. We evaluate the performance of NemFi via repeated experimentation across different WiFi conditions and for three different types of applications: speed-test, file download, and video streaming. Our experimental results demonstrate that average application performance over NemFi and real WiFi links is similar (with less than 3 percent difference).

Download from ACM

Surviving switch failures in cloud datacenters

Rachee Singh, Muqeet Mukhtar, Ashay Krishna, Aniruddha Parkhi, Jitendra Padhye, David Maltz

Abstract

Switch failures can hamper access to client services, cause link congestion and blackhole network traffic. In this study, we examine the nature of switch failures in the datacenters of a large commercial cloud provider through the lens of survival theory. We study a cohort of over 180,000 switches with a variety of hardware and software configurations and find that datacenter switches have a 98% likelihood of functioning uninterrupted for over 3 months since deployment in production. However, there is significant heterogeneity in switch survival rates with respect to their hardware and software: the switches of one vendor are twice as likely to fail compared to the others. We attribute the majority of switch failures to hardware impairments and unplanned power losses. We find that the in-house switch operating system, SONiC, boosts the survival likelihood of switches in datacenters by 1% by eliminating switch failures caused by software bugs in vendor switch OSes.
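
For readers unfamiliar with survival analysis, a figure such as "98% likelihood of functioning uninterrupted for over 3 months" is a point on a survival curve. The abstract does not say which estimator the authors used; the sketch below assumes the standard Kaplan-Meier estimator and runs it on made-up toy data, so the function name and every number in it are illustrative only.

```python
from typing import List, Tuple


def kaplan_meier(observations: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return [(time, S(time))] at each distinct failure time.

    Each observation is (duration_in_days, failed). failed=False means the
    switch was still working when observation ended (right-censored).
    """
    failure_times = sorted({t for t, failed in observations if failed})
    survival = 1.0
    curve = []
    for t in failure_times:
        # Switches still under observation just before time t.
        at_risk = sum(1 for d, _ in observations if d >= t)
        # Switches that failed exactly at time t.
        failures = sum(1 for d, failed in observations if failed and d == t)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort of five hypothetical switches (not data from the paper).
toy = [(30, True), (45, False), (60, True), (90, False), (90, False)]
curve = kaplan_meier(toy)
for day, s in curve:
    print(f"day {day}: estimated survival {s:.3f}")
```

Censored switches (those still running when observation stops) count toward the at-risk set until they drop out, which is what lets the estimator use a cohort whose members were deployed at different times.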

Download from ACM