Author Archives: Steve Uhlig

TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs

Girma M. Yilma, Jose A. Ayala-Romero, Andres Garcia-Saavedra, Xavier Costa-Perez

Abstract

Large Language Models (LLMs) have immense potential to transform the telecommunications industry. They could help professionals understand complex standards, generate code, and accelerate development. However, traditional LLMs struggle with the precision and source verification essential for telecom work. To address this, specialized LLM-based solutions tailored to telecommunication standards are needed. This Editorial Note showcases how Retrieval-Augmented Generation (RAG) can offer a way to create precise, factual answers. In particular, we show how to build a Telecommunication Standards Assistant that provides accurate, detailed, and verifiable responses. We show a usage example of this framework using 3GPP Release 16 and Release 18 specification documents. We believe that the application of RAG can bring significant value to the telecommunications field.
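To make the retrieval step concrete, here is a minimal sketch of how a RAG pipeline might select supporting passages and attach their sources to a prompt. It is not the authors' implementation: the passages are hypothetical placeholders (not quotations from any 3GPP document), the bag-of-words cosine scoring stands in for a real embedding model, and the names `PASSAGES`, `retrieve`, and `build_prompt` are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages; NOT quotations from any 3GPP document.
PASSAGES = [
    {"source": "spec-A, clause 1",
     "text": "The handover procedure moves a connected device between cells."},
    {"source": "spec-B, clause 2",
     "text": "Network slicing partitions one physical network into logical networks."},
    {"source": "spec-C, clause 3",
     "text": "Paging notifies an idle device about incoming traffic."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return up to k passages with non-zero similarity to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(p["text"]))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(question, retrieved):
    """Assemble a prompt that asks the model to answer only from cited context."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in retrieved)
    return (
        "Answer using only the context below and cite the bracketed sources.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


hits = retrieve("How does network slicing work?", PASSAGES)
prompt = build_prompt("How does network slicing work?", hits)
```

Carrying the source label alongside each passage is what lets the final answer be checked against the underlying document, which is the verifiability property the abstract emphasizes.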

Download from ACM

Towards Immersive Cloud-Based IoT Education

Fan Gabriella Xue, Matthew Caesar

Abstract

An increasing number of students are becoming interested in learning about the Internet of Things (IoT) space. However, today, we lack scalable and efficient ways to bring hands-on IoT learning to many due to hardware accessibility, system complexity, and deployment environment constraints. This paper presents ThingVisor, an IoT learning platform that enables hands-on IoT development in an immersive virtual space. Specifically, it allows users to design, test, and deploy IoT devices virtually in a simulated IoT world with static and dynamic software verification as a complementary tool to IoT education. ThingVisor consists of (1) a Device Design Stack to configure virtual IoT devices, (2) an Immersive Runtime Stack to interact with devices and environment, and (3) a Device Emulator, which is a runtime environment used to execute virtual devices to get their behaviors. Our experiments confirm the learning effectiveness and user satisfaction of our platform. Additionally, we have demonstrated the scalability and usability of the system through load testing and application of the System Usability Scale. Our results indicate that students can achieve up to a 32% improvement in their scores after engaging with ThingVisor for two weeks, irrespective of their prior experience.

Download from ACM

A Survey on Packet Filtering

Nik Sultana, Hyunsuk Bang, Elena Yulaeva, Ricky K. P. Mok, Kc Claffy, Richard Mortier

Abstract

Packet filtering has remained a key network monitoring primitive over decades, even as networking has continuously evolved. In this article we present the results of a survey we ran to collect data from the networking community, including researchers and practitioners, about how packet filtering is used. In doing so, we identify pain points related to packet filtering, and unmet needs of survey participants. Based on analysis of this survey data, we propose future research and development goals that would support the networking community.

Download from ACM

The July 2024 issue

This July 2024 issue contains one technical paper, one educational paper, and one editorial note.

The technical paper, A Survey on Packet Filtering, by Nik Sultana and colleagues, was originally submitted as an editorial. Because CCR does not usually consider survey papers, it went through a thorough reviewing process. Given its value to the community, we felt that it deserved to be accepted as a technical paper rather than an editorial. Its topic, packet filtering, is important to the community. The authors present the results of a survey they ran to collect data from the networking community, including researchers and practitioners, about how packet filtering is used. They identify pain points related to packet filtering and unmet needs of survey participants. Based on analysis of this survey data, they propose future research and development goals that would support the networking community.

The second paper, an educational contribution, Towards Immersive Cloud-Based IoT Education, by Fan Gabriella Xue and Matthew Caesar, presents ThingVisor, an IoT learning platform that enables hands-on IoT development in an immersive virtual space. Specifically, it allows users to design, test, and deploy IoT devices virtually in a simulated IoT world with static and dynamic software verification as a complementary tool to IoT education. The experiments confirm the learning effectiveness and user satisfaction of the platform, as well as the scalability and usability of the system.

Finally, the editorial note, TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs, by Girma M. Yilma and colleagues, addresses the timely topic of Large Language Models (LLMs) and discusses their potential to transform the telecommunications industry.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

Towards Re-Architecting Today’s Internet for Survivability: NSF Workshop Report

Fabián E. Bustamante, John Doyle, Walter Willinger, Marwan Fayed, David L. Alderson, Steven Low, Stefan Savage, Henning Schulzrinne

Abstract

On November 28–29, 2023, Northwestern University hosted a workshop titled “Towards Re-architecting Today’s Internet for Survivability” in Evanston, Illinois, US. The goal of the workshop was to bring together a group of national and international experts to sketch and start implementing a transformative research agenda for solving one of our community’s most challenging yet important tasks: the re-architecting of tomorrow’s Internet for “survivability”, ensuring that the network is able to fulfill its mission even in the presence of large-scale catastrophic events. This report provides a necessarily brief overview of two full days of active discussions.

Download from ACM

On Sample Selection for Continual Learning: A Video Streaming Case Study

Alexander Dietmüller, Romain Jacob, Laurent Vanbever

Abstract

Machine learning (ML) is a powerful tool to model the complexity of communication networks. As networks evolve, we cannot simply train once and deploy. Retraining models, known as continual learning, is necessary. Yet, to date, there is no established methodology to answer the key questions: With which samples should we retrain? When should we retrain?
We address these questions with the sample selection system Memento, which maintains a training set with the “most useful” samples to maximize sample space coverage. Memento particularly benefits rare patterns—the notoriously long “tail” in networking—and allows assessing rationally when retraining may help, i.e., when the coverage changes.
We deployed Memento on Puffer, the live-TV streaming project, and achieved a 14% reduction of stall time, 3.5× the improvement of random sample selection. Memento is model-agnostic and can be applied beyond video streaming.
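The idea of keeping a training set that favors coverage over recency can be illustrated with a small sketch. This is not Memento's algorithm: it swaps in a much simpler stand-in, a fixed-capacity buffer over one-dimensional samples that, when full, evicts the oldest sample from the most crowded histogram bin. The class name `CoverageBuffer` and its parameters are hypothetical.

```python
from collections import defaultdict


class CoverageBuffer:
    """Fixed-capacity sample store that evicts from the densest bin first.

    Simplified stand-in for coverage-aware sample selection; an assumption
    for illustration, not the method described in the paper.
    """

    def __init__(self, capacity, bin_width):
        self.capacity = capacity
        self.bin_width = bin_width
        self.bins = defaultdict(list)  # bin index -> samples, oldest first
        self.size = 0

    def _bin(self, value):
        # Map a sample to the index of the histogram bin containing it.
        return int(value // self.bin_width)

    def add(self, value):
        self.bins[self._bin(value)].append(value)
        self.size += 1
        if self.size > self.capacity:
            # Evict the oldest sample of the most crowded bin, so sparse
            # regions of the sample space (rare patterns) are retained.
            crowded = max(self.bins, key=lambda b: len(self.bins[b]))
            self.bins[crowded].pop(0)
            self.size -= 1
            if not self.bins[crowded]:
                del self.bins[crowded]

    def samples(self):
        # Return stored samples ordered by bin index.
        return [v for b in sorted(self.bins) for v in self.bins[b]]


buffer = CoverageBuffer(capacity=4, bin_width=1.0)
for sample in [9.7, 0.1, 0.2, 0.3, 0.4, 0.5]:
    buffer.add(sample)
kept = buffer.samples()
```

In this toy run the rare sample 9.7 arrives first and would be the first one dropped by a first-in-first-out buffer, yet it survives here because evictions come from the crowded bin of common samples.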

Download from ACM

The April 2024 issue

This April 2024 issue contains two technical papers and one editorial note.

The first technical paper, This Is a Local Domain: On Amassing Country-Code Top-Level Domains from Public Data, by Raffaele Sommese and colleagues, presents a measurement study that investigates ccTLD coverage using public data sources. Domain lists such as Alexa and Tranco are crucial tools for performing representative Web censuses. However, these lists often overlook domains under country-code top-level domains (ccTLDs), resulting in biased census outcomes. The authors demonstrate that data from Certificate Transparency (CT) logs and Common Crawl provide robust ccTLD coverage and can serve as reliable proxies for Web censuses. The authors also plan to release ccTLD domain names to the community as part of an expansion of the daily OpenINTEL measurement.

The second technical paper, On Sample Selection for Continual Learning: A Video Streaming Case Study, by Alexander Dietmüller and colleagues, proposes Memento, a prototype implementation aiming at improving sample selection, especially in the tail of the traffic distribution. An already sizable and increasing body of work focuses on using ML models to capture the inherent complexity of communication networks. And yet, as network traffic evolves, ML approaches need to tackle the issue of concept drift, which implies that the model will eventually underperform. This means that ML models need retraining. When to retrain and on what data remain difficult problems. This paper focuses precisely on those open issues.

Finally, the editorial note, Towards Re-Architecting Today’s Internet for Survivability: NSF Workshop Report, by Fabián E. Bustamante and colleagues, reports on a workshop that took place on November 28–29, 2023, at Northwestern University. The goal of the workshop was to bring together a group of national and international experts to sketch and start implementing a transformative research agenda for solving one of our community’s most challenging yet important tasks: the re-architecting of tomorrow’s Internet for “survivability”, ensuring that the network is able to fulfill its mission even in the presence of large-scale catastrophic events.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

This Is a Local Domain: On Amassing Country-Code Top-Level Domains from Public Data

Raffaele Sommese, Roland van Rijswijk-Deij, Mattijs Jonker

Abstract

Domain lists are a key ingredient for representative censuses of the Web. Unfortunately, such censuses typically lack a view on domains under country-code top-level domains (ccTLDs). This introduces unwanted bias: many countries have a rich local Web that remains hidden if their ccTLDs are not considered. The reason ccTLDs are rarely considered is that gaining access – if possible at all – is often laborious. To tackle this, we ask: what can we learn about ccTLDs from public sources? We extract domain names under ccTLDs from 6 years of public data from Certificate Transparency logs and Common Crawl. We compare this against ground truth for 19 ccTLDs for which we have the full DNS zone. We find that public data covers 43%-80% of these ccTLDs, and that coverage grows over time. By also comparing port scan data we then show that these public sources reveal a significant part of the Web presence under a ccTLD. We conclude that in the absence of full access to ccTLDs, domain names learned from public sources can be a good proxy when performing Web censuses.
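The coverage figure used in this comparison can be read as the fraction of names in the ground-truth zone that also appear in the public sources. The sketch below computes that ratio under simple assumptions: names are compared after lowercasing and stripping a trailing dot, and the domain names shown are hypothetical placeholders under the reserved `.example` TLD rather than real ccTLD data. The function names `normalize` and `coverage` are illustrative.

```python
def normalize(name):
    """Canonicalize a domain name: trim, lowercase, drop the trailing dot."""
    return name.strip().lower().rstrip(".")


def coverage(public_names, zone_names):
    """Fraction of zone-file names that also appear in public data."""
    zone = {normalize(n) for n in zone_names}
    if not zone:
        raise ValueError("ground-truth zone is empty")
    public = {normalize(n) for n in public_names}
    return len(public & zone) / len(zone)


# Hypothetical placeholder names, not real measurement data.
zone_file = ["a.example", "b.example", "c.example", "d.example"]
public_data = ["A.example.", "c.example", "x.example"]

ratio = coverage(public_data, zone_file)
```

Names seen only in public data (here `x.example`) do not raise the ratio, since coverage is measured against the zone file alone.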

Download from ACM

iip: An Integratable TCP/IP Stack

Kenichi Yasukata

Abstract

This paper presents iip, an integratable TCP/IP stack, which aims to become a handy option for developers and researchers who wish to have a high-performance TCP/IP stack implementation for their projects. The problem that motivated us to develop iip is that existing performance-optimized TCP/IP stacks often incur tremendous integration complexity, while existing portability-aware TCP/IP stacks have significant performance limitations. In this paper, we overhaul the responsibility boundary between a TCP/IP stack implementation and the code provided by developers, and introduce an API that enables iip to allow for easy integration and good performance simultaneously. We then report performance numbers of iip along with insights on performance-critical factors.

Download from ACM

Planter: Rapid Prototyping of In-Network Machine Learning Inference

Changgang Zheng, Mingyuan Zang, Xinpeng Hong, Liam Perreault, Riyad Bensoussane, Shay Vargaftik, Yaniv Ben-Itzhak, Noa Zilberman

Abstract

In-network machine learning inference provides high throughput and low latency. It is ideally located within the network, power efficient, and improves applications’ performance. Despite its advantages, the bar to in-network machine learning research is high, requiring significant expertise in programmable data planes, in addition to knowledge of machine learning and the application area. Existing solutions are mostly one-time efforts, hard to reproduce, change, or port across platforms. In this paper, we present Planter: a modular and efficient open-source framework for rapid prototyping of in-network machine learning models across a range of platforms and pipeline architectures. By identifying general mapping methodologies for machine learning algorithms, Planter introduces new machine learning mappings and improves existing ones. It provides users with several example use cases and supports different datasets, and was already extended by users to new fields and applications. Our evaluation shows that Planter improves machine learning performance compared with previous model-tailored works, while significantly reducing resource consumption and co-existing with network functionality. Planter-supported algorithms run at line rate on unmodified commodity hardware, providing billions of inference decisions per second.

Download from ACM

Computer Communication Review

The ACM SIGCOMM newsletter
