Author Archives: Steve Uhlig

Towards Passive Analysis of Anycast in Global Routing: Unintended Impact of Remote Peering

Rui Bian, Shuai Hao, Haining Wang, Amogh Dhamdhere, Alberto Dainotti, Chase Cotton

Abstract

Anycast has been widely adopted by today’s Internet services, including DNS, CDN, and DDoS protection, in which the same IP address is announced from distributed locations and clients are directed to the topologically-nearest service replica. Prior research has focused on various aspects of anycast, either on its usage in particular services such as DNS or on characterizing its adoption through Internet-wide active probing. In this paper, we first explore an alternative approach to characterize anycast based on previously collected global BGP routing information. Leveraging state-of-the-art active measurement results as near-ground-truth, our passive method, without requiring any Internet-wide probes, achieves 90% accuracy in detecting anycast prefixes. More importantly, our approach uncovers anycast prefixes that have been missed by prior datasets based on active measurements. While investigating the root causes of inaccuracy, we reveal that anycast routing has been entangled with the increased adoption of remote peering, a type of layer-2 interconnection where an IP network may peer at an IXP remotely without being physically present at the IXP. The invisibility of remote peering from layer-3 breaks the assumption of shortest AS paths in BGP and causes an unintended impact on anycast performance. We identify such cases from BGP routing information and observe that at least 19.2% of anycast prefixes have been potentially impacted by remote peering.
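To make the passive idea concrete, the sketch below shows one simple heuristic in the same spirit: a prefix observed from many BGP vantage points is flagged as anycast-like when its routes converge on many distinct upstream (penultimate-hop) ASes. This is an illustrative approximation, not the authors' actual classifier; the input layout and the threshold are assumptions.

```python
# Illustrative heuristic (assumed, not the paper's exact method): count the
# distinct penultimate-hop ASes seen for each prefix across BGP vantage points
# and flag prefixes whose upstream diversity exceeds a threshold.
from collections import defaultdict

def upstreams_per_prefix(rib_entries):
    """rib_entries: iterable of (prefix, as_path) where as_path is a list of ASNs."""
    upstreams = defaultdict(set)
    for prefix, as_path in rib_entries:
        if len(as_path) >= 2:
            # The penultimate AS on the path is the origin's upstream as seen here.
            upstreams[prefix].add(as_path[-2])
    return upstreams

def likely_anycast(rib_entries, min_upstreams=5):
    """Return prefixes whose number of distinct upstreams reaches the threshold."""
    return {p for p, ups in upstreams_per_prefix(rib_entries).items()
            if len(ups) >= min_upstreams}

# Toy routes from three vantage points for one prefix:
rib = [
    ("192.0.2.0/24", [3356, 174, 64500]),
    ("192.0.2.0/24", [1299, 2914, 64500]),
    ("192.0.2.0/24", [6939, 6453, 64500]),
]
print(likely_anycast(rib, min_upstreams=3))  # {'192.0.2.0/24'}
```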

Download the full article

Privacy Trading in the Surveillance Capitalism Age: Viewpoints on ‘Privacy-Preserving’ Societal Value Creation

Ranjan Pal and Jon Crowcroft

Abstract

In the modern era of mobile apps (part of the era of surveillance capitalism, a term famously coined by Shoshana Zuboff), huge quantities of data about individuals and their activities offer a wave of opportunities for economic and societal value creation. On the flip side, such opportunities also open up channels for privacy breaches of an individual’s personal information. Data holders (e.g., apps) may hence take commercial advantage of the individuals’ inability to fully anticipate the potential uses of their private information, with detrimental effects for social welfare. As steps to improve social welfare, we comment on the existence and design of efficient consumer-data releasing ecosystems aimed at achieving a maximum social welfare state amongst competing data holders. In view of (a) the behavioral assumption that humans are ‘compromising’ beings, (b) privacy not being a well-boundaried good, and (c) the practical inevitability of inappropriate data leakage by data holders upstream in the supply chain, we showcase the idea of a regulated and radical privacy trading mechanism that preserves the heterogeneous privacy preservation constraints (at an aggregate consumer, i.e., app, level) up to certain compromise levels, while at the same time satisfying the commercial requirements of agencies (e.g., advertising organizations) that collect and trade client data for the purpose of behavioral advertising. More specifically, our idea merges supply function economics, introduced by Klemperer and Meyer, with differential privacy, which, together with their powerful theoretical properties, leads to a stable and efficient, i.e., maximum social welfare, state in an algorithmically scalable manner. As part of future research, we also discuss interesting additional techno-economic challenges related to realizing effective privacy trading ecosystems.
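The differential-privacy ingredient named above can be illustrated with the textbook Laplace mechanism, which releases an aggregate only after adding noise calibrated to the query's sensitivity and a privacy budget epsilon. This generic sketch is not the trading mechanism proposed in the editorial; the parameter values are placeholders.

```python
# Textbook Laplace mechanism (generic illustration, not the editorial's mechanism).
import random

def laplace_noise(scale: float) -> float:
    """Laplace(0, scale) noise, drawn as the difference of two i.i.d. exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(true_count: int, sensitivity: float = 1.0, epsilon: float = 0.5) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    return true_count + laplace_noise(sensitivity / epsilon)

# Example: a noisy release of an aggregate (e.g., how many users opted in).
print(private_count(1_000))
```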

Download the full article

Datacenter Congestion Control: Identifying what is essential and making it practical

Aisha Mushtaq, Radhika Mittal, James McCauley, Mohammad Alizadeh, Sylvia Ratnasamy, Scott Shenker

Abstract

Recent years have seen a slew of papers on datacenter congestion control mechanisms. In this editorial, we ask whether the bulk of this research is needed for the common case where congestion control involves hosts responding to simple congestion signals from the network and the performance goal is reducing some average measure of Flow Completion Time. We raise this question because we find that, out of all the possible variations one could make in congestion control algorithms, the most essential feature is the switch scheduling algorithm. More specifically, we find that congestion control mechanisms that use Shortest-Remaining-Processing-Time (SRPT) achieve superior performance as long as the rate-setting algorithm at the host is reasonable. We further find that while SRPT’s performance is quite robust to host behaviors, the performance of schemes that use scheduling algorithms like FIFO or Fair Queuing depends far more crucially on the rate-setting algorithm, and is typically worse than what can be achieved with SRPT. Given these findings, we then ask whether it is practical to realize SRPT in switches without requiring custom hardware. We observe that approximate and deployable SRPT (ADS) designs exist, which leverage the small number of priority queues supported in almost all commodity switches, and require only software changes in the host and the switches. Our evaluations with one very simple ADS design show that it can achieve performance close to true SRPT and is significantly better than FIFO. Thus, the answer to our basic question – whether the bulk of recent research on datacenter congestion control algorithms is needed for the common case – is no.
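As a rough illustration of how an approximate and deployable SRPT (ADS) design can exploit the handful of strict-priority queues in commodity switches, the sketch below maps a flow's remaining size onto a queue index so that flows with less remaining work are served first. The four-queue setup and the thresholds are illustrative assumptions, not the configuration evaluated in the editorial.

```python
# Assumed, illustrative ADS-style mapping: fewer remaining bytes -> higher priority.
THRESHOLDS_BYTES = [10_000, 100_000, 1_000_000]  # boundaries for 4 priority queues

def priority_queue(remaining_bytes: int) -> int:
    """Return the queue index (0 = highest priority) for a flow's remaining size."""
    for queue, limit in enumerate(THRESHOLDS_BYTES):
        if remaining_bytes <= limit:
            return queue
    return len(THRESHOLDS_BYTES)  # the largest flows use the lowest-priority queue

# The host would tag each packet (e.g., via DSCP) with this index as the flow
# drains, so a flow's priority rises as its remaining size shrinks.
print(priority_queue(4_000))      # 0
print(priority_queue(250_000))    # 2
print(priority_queue(5_000_000))  # 3
```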

Download the full article

The 11th Workshop on Active Internet Measurements (AIMS-11) Workshop Report

kc Claffy, David Clark

Abstract

On 16-17 April 2019, CAIDA hosted its eleventh Workshop on Active Internet Measurements (AIMS-11). This workshop series provides a forum for stakeholders in Internet active measurement projects to communicate their interests and concerns, and explore cooperative approaches to maximizing the collective benefit of deployed infrastructure and gathered data. An overarching theme this year was scaling the storage, indexing, annotation, and usage of Internet measurements. We discussed tradeoffs in the use of commercial cloud services to make measurement results more accessible and informative to researchers in various disciplines. Other agenda topics included status updates on recent measurement infrastructures and community feedback; measurement of poorly configured infrastructure; and recent successes and approaches to evolving challenges in geolocation, topology, route hijacking, and performance measurement. We review highlights of the discussions of the talks. This report does not cover each topic discussed; for more details, see the workshop presentations linked from the workshop web page: http://www.caida.org/workshops/aims/1904/.

Download the full article

The April 2019 Issue

This April 2019 issue contains two technical papers and four editorial notes. In “On the Complexity of Non-Segregated Routing in Reconfigurable Data Center Architectures”, Klaus-Tycho Foerster and his colleagues analyse whether a data center network could dynamically reconfigure its topology to better meet the traffic demand. They explore this problem as a mathematical optimisation problem and seek exact algorithms. The second technical paper, “Precise Detection of Content Reuse in the Web”, addresses the problem of detecting whether the same content is available on different websites. Calvin Ardi and John Heidemann propose a new methodology that enables researchers to discover and detect the reutilisation of content on web servers. They provide both a large dataset and software to analyse it.

The four editorial notes cover very different topics. kc Claffy and Dave Clark summarise in “Workshop on Internet Economics (WIE2018) Final Report” a recent workshop on Internet economics. In “Democratizing the Network Edge”, Larry Peterson and eight colleagues encourage the community to participate in the innovation that they predict at the intersection between the cloud and the access networks, which many refer to as the edge. They propose a plan for action to have a real impact on this emerging domain. In “A Broadcast-Only Communication Model Based on Replicated Append-Only Logs”, Christian Tschudin looks at the interplay between the append-only log data structure and broadcast communication techniques. He argues that some network architectures could leverage this interplay.

As a reader of CCR, you already know the benefits of releasing the artifacts associated with scientific papers. This encourages their replicability, and ACM has defined a badging system to recognise the papers that provide such artifacts. Several ACM SIGs have associated artifact evaluation committees with their flagship conferences and encourage their members to release their paper artifacts. Last year, two evaluations of paper artifacts were organised within SIGCOMM. The first one focussed on the papers that were accepted at the Conext’18 conference. Twelve of the papers presented at Conext’18 received ACM reproducibility badges. The second artifacts evaluation was open to papers accepted by CCR and other SIGCOMM conferences. Twenty-eight of these papers received ACM reproducibility badges. These evaluations and some lessons learned are discussed in “Evaluating the artifacts of SIGCOMM papers” by Damien Saucez, Luigi Iannone and myself. We hope that evaluating the artifacts will become a habit for all SIGCOMM conferences.

I hope that you will enjoy reading this new issue and welcome comments and suggestions on CCR Online (https://ccronline.sigcomm.org) or by email at ccr-editor at sigcomm.org.

Workshop on Internet Economics (WIE2018) Final Report

kc Claffy, David Clark

Abstract

On 12-13 December 2018, CAIDA hosted the 9th interdisciplinary Workshop on Internet Economics (WIE) at UC San Diego’s Supercomputer Center. This workshop series provides a forum for researchers, Internet facilities and service providers, technologists, economists, theorists, policy makers, and other stakeholders to exchange views on current and emerging regulatory and policy debates. To add clarity to a range of vigorous policy debates, and in pursuit of actionable objectives, this year’s meeting used a different approach to structuring the agenda. Each attendee chose a specific policy goal or harm, and structured their presentation to answer three questions: (1) What data is needed to measure progress toward/away from this goal/harm? (2) What methods do you propose to gather such data? (3) Who are the right entities to gather such data, and how should such data be managed and shared? With a specific focus on measurement challenges, the topics we discussed included: analyzing the evolution of the Internet in a layered-platform context to gain new insights; measurement and analysis of the economic impacts of new technologies using old tools; security and trustworthiness; reach (universal service) and reachability; and the sustainability of investment into Internet infrastructure, as well as infrastructure to measure the Internet. All slides are available at https://www.caida.org/workshops/wie/1812/.

Download the full article

Evaluating the artifacts of SIGCOMM papers

Damien Saucez, Luigi Iannone, Olivier Bonaventure

Abstract

A growing fraction of the papers published by CCR and at SIGCOMM-sponsored conferences include artifacts such as software or datasets. Besides CCR, these artifacts were rarely evaluated. During the last months of 2018, we organised two different Artifacts Evaluation Committees to which authors could submit the artifacts of their papers for evaluation. The first one evaluated the papers accepted by Conext’18 shortly after the TPC decision. It assigned ACM reproducibility badges to 12 different papers. The second one evaluated papers accepted by CCR and any SIGCOMM-sponsored conference. 28 papers received ACM reproducibility badges. We report on the results of a short survey among artifacts authors and reviewers and provide some suggestions for future artifacts evaluations.

Download the full article

Democratizing the Network Edge

Larry Peterson, Tom Anderson, Sachin Katti, Nick McKeown, Guru Parulkar, Jennifer Rexford, Mahadev Satyanarayanan, Oguz Sunay, Amin Vahdat

Abstract

With datacenters established as part of the global computing infrastructure, industry is now in the midst of a transition towards the edge. Previous research initiatives laid the groundwork for this transition, but that is no guarantee the emerging edge will continue to be open to researchers. This paper argues that there is a tremendous opportunity to innovate at the edge, but having impact requires understanding the nature of the current industry momentum, and making a concerted effort to align with that momentum. We believe there are three keys to doing this: (1) focus on the intersection of the cloud and access networks, (2) contribute to the relevant open source projects, and (3) address the challenge of operationalizing the results. The paper puts forward a concrete proposal for all three, and discusses the opportunity to influence how the Internet evolves at the edge and enable new and transformative edge applications.

Download the full article

A Broadcast-Only Communication Model Based on Replicated Append-Only Logs

Christian Tschudin

Abstract

This note is about the interplay between a data structure, the append-only log, and a broadcasting communication abstraction that seems to induce it. We identified real-world systems which have started to exploit this configuration and highlight its desirable properties. Networking research should take note of this development and adjust its research agenda accordingly.
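A minimal sketch of the data structure under discussion: a replicated append-only log whose entries are hash-chained, with each replica accepting only broadcast entries that extend its current head. This illustrates the general idea, not the specific systems the note analyses.

```python
# Minimal replicated append-only log (illustrative sketch, assumed design).
import hashlib
import json

class AppendOnlyLog:
    def __init__(self):
        self.entries = []  # each entry: {"prev": hash-of-previous-entry, "data": ...}

    def head(self):
        """Hash of the most recent entry, or None for an empty log."""
        if not self.entries:
            return None
        serialized = json.dumps(self.entries[-1], sort_keys=True).encode()
        return hashlib.sha256(serialized).hexdigest()

    def append(self, data):
        self.entries.append({"prev": self.head(), "data": data})

    def receive(self, entry):
        """Accept a broadcast entry only if it extends our current head."""
        if entry["prev"] == self.head():
            self.entries.append(entry)
            return True
        return False

# Toy broadcast: the writer appends, every replica that hears the entry appends it too.
writer, replica = AppendOnlyLog(), AppendOnlyLog()
writer.append("hello")
replica.receive(writer.entries[-1])
print(replica.entries == writer.entries)  # True
```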

Download the full article

Precise Detection of Content Reuse in the Web

Calvin Ardi, John Heidemann

Abstract

With vast amounts of content online, it is not surprising that unscrupulous entities “borrow” from the web to provide content for advertisements, link farms, and spam. Our insight is that cryptographic hashing and fingerprinting can efficiently identify content reuse for web-size corpora. We develop two related algorithms, one to automatically discover previously unknown duplicate content in the web, and the second to precisely detect copies of discovered or manually identified content. We show that bad neighborhoods, clusters of pages where copied content is frequent, help identify copying in the web. We verify our algorithm and its choices with controlled experiments over three web datasets: Common Crawl (2009/10), GeoCities (1990s–2000s), and a phishing corpus (2014). We show that our use of cryptographic hashing is much more precise than alternatives such as locality-sensitive hashing, avoiding the thousands of false positives that would otherwise occur. We apply our approach in three systems: discovering and detecting duplicated content in the web, searching explicitly for copies of Wikipedia in the web, and detecting phishing sites in a web browser. We show that general copying in the web is often benign (for example, templates), but 6–11% are commercial or possibly commercial. Most copies of Wikipedia (86%) are commercialized (link farming or advertisements). For phishing, we focus on PayPal, detecting 59% of PayPal-phish even without taking on intentional cloaking.
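As a toy illustration of chunk-level duplicate detection with cryptographic hashes (far simpler than the paper's discovery and detection algorithms), the sketch below hashes each paragraph of every page and reports chunks whose hash appears on more than one page. Paragraph-level chunking and the example URLs are assumptions made for illustration.

```python
# Illustrative chunk-hashing sketch: report content chunks shared across pages.
import hashlib
from collections import defaultdict

def chunk_hashes(text):
    """Yield a SHA-256 hash for each non-empty paragraph (blank-line-separated chunk)."""
    for chunk in text.split("\n\n"):
        chunk = chunk.strip()
        if chunk:
            yield hashlib.sha256(chunk.encode("utf-8")).hexdigest()

def find_reuse(pages):
    """pages: dict of url -> page text. Returns hashes seen on more than one page."""
    seen = defaultdict(set)
    for url, text in pages.items():
        for h in chunk_hashes(text):
            seen[h].add(url)
    return {h: urls for h, urls in seen.items() if len(urls) > 1}

pages = {
    "http://a.example/page": "Unique intro.\n\nShared boilerplate paragraph.",
    "http://b.example/copy": "Different intro.\n\nShared boilerplate paragraph.",
}
print(find_reuse(pages))  # the shared paragraph's hash maps to both URLs
```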

Download the full article