Tag Archives: scientific

Securing Linux with a Faster and Scalable IPtables

Sebastiano Miano, Matteo Bertrone, Fulvio Risso, Mauricio Vásquez Bernal,
Yunsong Lu, Jianwen Pi

Abstract

The sheer increase in network speed and the massive deployment of containerized applications on Linux servers have raised awareness that iptables, the current de-facto firewall in Linux, may not be able to cope with current requirements, particularly in terms of scalability in the number of rules. This paper presents an eBPF-based firewall, bpf-iptables, which emulates the iptables filtering semantics while guaranteeing higher throughput. We compare our implementation against the current version of iptables and other Linux firewalls, showing how it achieves a notable performance boost, particularly when a high number of rules is involved. This result is achieved without requiring custom kernels or additional software frameworks (e.g., DPDK) that may not be allowed in some scenarios, such as public data centers.
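
The key scalability idea can be illustrated without eBPF. bpf-iptables is reported to build its matching pipeline around a linear bit-vector search: each packet field indexes a lookup table that yields a bitmask of candidate rules, and ANDing the per-field masks gives the matching rules in one lookup per field rather than one comparison per rule. The following minimal Python sketch (with a hypothetical three-rule set and two fields) shows the mechanic:

    # Minimal sketch of bit-vector packet classification (the matching scheme
    # used to replace iptables' linear rule scan). Rule set and field names
    # here are hypothetical.

    RULES = [  # (src_ip, dst_port, action); None = wildcard
        ("10.0.0.1", 80,   "DROP"),
        ("10.0.0.1", None, "ACCEPT"),
        (None,       22,   "DROP"),
    ]

    def build_tables(rules, field):
        """Map each concrete field value to a bitmask of rules matching it."""
        wildcard = 0
        table = {}
        for i, rule in enumerate(rules):
            value = rule[field]
            if value is None:
                wildcard |= 1 << i
            else:
                table[value] = table.get(value, 0) | 1 << i
        return table, wildcard

    SRC_TABLE, SRC_WILD = build_tables(RULES, 0)
    PORT_TABLE, PORT_WILD = build_tables(RULES, 1)

    def classify(src_ip, dst_port):
        # One lookup per field, then a bitwise AND; the lowest set bit is the
        # first (highest-priority) matching rule, mirroring iptables semantics.
        mask = (SRC_TABLE.get(src_ip, 0) | SRC_WILD) & \
               (PORT_TABLE.get(dst_port, 0) | PORT_WILD)
        if mask == 0:
            return "DEFAULT_POLICY"
        first = (mask & -mask).bit_length() - 1
        return RULES[first][2]

    print(classify("10.0.0.1", 80))   # DROP   (rule 0)
    print(classify("10.0.0.1", 443))  # ACCEPT (rule 1)
    print(classify("1.2.3.4", 22))    # DROP   (rule 2)

With N rules and F fields, classification costs O(F) table lookups plus an AND over N-bit masks, which is why throughput degrades far more gracefully than iptables' O(N) rule traversal as rule sets grow.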

Download the full article

Towards Passive Analysis of Anycast in Global Routing: Unintended Impact of Remote Peering

Rui Bian, Shuai Hao, Haining Wang, Amogh Dhamdhere, Alberto Dainotti, Chase Cotton

Abstract

Anycast has been widely adopted by today’s Internet services, including DNS, CDN, and DDoS protection, in which the same IP address is announced from distributed locations and clients are directed to the topologically nearest service replica. Prior research has focused on various aspects of anycast, either its usage in particular services such as DNS or the characterization of its adoption through Internet-wide active probing. In this paper, we first explore an alternative approach that characterizes anycast based on previously collected global BGP routing information. Leveraging state-of-the-art active measurement results as near-ground-truth, our passive method achieves 90% accuracy in detecting anycast prefixes without requiring any Internet-wide probes. More importantly, our approach uncovers anycast prefixes that have been missed by prior datasets based on active measurements. While investigating the root causes of inaccuracy, we reveal that anycast routing has become entangled with the increased adoption of remote peering, a type of layer-2 interconnection where an IP network peers at an IXP without being physically present there. The invisibility of remote peering at layer 3 breaks the assumption of shortest AS paths in BGP and has an unintended impact on anycast performance. We identify such cases from BGP routing information and observe that at least 19.2% of anycast prefixes have potentially been impacted by remote peering.
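
To make the passive approach concrete, here is a hedged Python sketch of one BGP-derived signal that can flag anycast candidates; the paper combines several such features, and the specific heuristic and threshold below are illustrative only. The intuition: a unicast prefix is usually reached through a small set of upstream ASes, while an anycast prefix announced from many sites tends to show many unrelated upstreams across vantage points.

    # Hedged sketch: one simple BGP-derived feature for flagging anycast
    # candidates. Toy AS paths; real classifiers combine multiple features.

    ROUTES = {  # prefix -> AS paths seen from different route collectors
        "192.0.2.0/24": [            # unicast-looking: one upstream AS
            [3356, 174, 64500],
            [1299, 174, 64500],
            [2914, 174, 64500],
        ],
        "198.51.100.0/24": [         # anycast-looking: many unrelated upstreams
            [3356, 64501],
            [2914, 6939, 64501],
            [1299, 4826, 64501],
            [174, 7922, 64501],
        ],
    }

    def upstream_diversity(paths):
        """Count distinct ASes seen immediately before the origin AS."""
        return len({p[-2] for p in paths if len(p) >= 2})

    THRESHOLD = 3  # illustrative cut-off, not the paper's
    for prefix, paths in ROUTES.items():
        score = upstream_diversity(paths)
        label = "anycast?" if score >= THRESHOLD else "unicast?"
        print(prefix, "upstream diversity =", score, "->", label)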

Download the full article

Precise Detection of Content Reuse in the Web

Calvin Ardi, John Heidemann

Abstract

With the vast amount of content online, it is not surprising that unscrupulous entities “borrow” from the web to provide content for advertisements, link farms, and spam. Our insight is that cryptographic hashing and fingerprinting can efficiently identify content reuse for web-size corpora. We develop two related algorithms: one to automatically discover previously unknown duplicate content in the web, and a second to precisely detect copies of discovered or manually identified content. We show that bad neighborhoods, clusters of pages where copied content is frequent, help identify copying in the web. We verify our algorithm and its choices with controlled experiments over three web datasets: Common Crawl (2009/10), GeoCities (1990s–2000s), and a phishing corpus (2014). We show that our use of cryptographic hashing is much more precise than alternatives such as locality-sensitive hashing, avoiding the thousands of false positives that would otherwise occur. We apply our approach in three systems: discovering and detecting duplicated content in the web, searching explicitly for copies of Wikipedia in the web, and detecting phishing sites in a web browser. We show that general copying in the web is often benign (for example, templates), but 6–11% is commercial or possibly commercial. Most copies of Wikipedia (86%) are commercialized (link farming or advertisements). For phishing, we focus on PayPal, detecting 59% of PayPal phish even without taking on intentional cloaking.
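
The detection core is simple enough to sketch. Below is a minimal Python toy (hypothetical URLs and text) of the hash-and-index idea: split each page into paragraph-level chunks, hash each chunk with a cryptographic hash, and flag any digest that appears on more than one page. Because cryptographic hashes make accidental collisions negligible, every shared digest is an exact copy, which is the precision advantage over locality-sensitive hashing noted above.

    # Minimal sketch of hash-based duplicate detection over a toy "web corpus".
    # The paper's algorithms operate at web scale; this shows only the core idea.

    import hashlib
    from collections import defaultdict

    PAGES = {   # hypothetical URL -> page text
        "http://a.example/post":  "Alpha paragraph.\n\nShared boilerplate text.",
        "http://b.example/copy":  "Shared boilerplate text.\n\nBeta paragraph.",
        "http://c.example/fresh": "Entirely original writing.",
    }

    def chunks(text):
        """Split a page into paragraph-level chunks for fingerprinting."""
        return [c.strip() for c in text.split("\n\n") if c.strip()]

    index = defaultdict(set)  # chunk hash -> pages containing that chunk
    for url, text in PAGES.items():
        for chunk in chunks(text):
            digest = hashlib.sha256(chunk.encode()).hexdigest()
            index[digest].add(url)

    # Any digest seen on more than one page is exactly reused content.
    for digest, urls in index.items():
        if len(urls) > 1:
            print("reused chunk", digest[:12], "on", sorted(urls))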

Download the full article

On the Complexity of Non-Segregated Routing in Reconfigurable Data Center Architectures

Klaus-Tycho Foerster, Maciej Pacut, Stefan Schmid

Abstract

By enhancing the traditional static network (e.g., based on electric switches) with a dynamic topology (e.g., based on reconfigurable optical switches), emerging reconfigurable data centers introduce unprecedented flexibilities in how networks can be optimized toward the workload they serve. However, such hybrid data centers are currently limited by a restrictive routing policy enforcing artificial segregation: each network flow can only use either the static or the flexible topology, but not a combination of the two.

This paper explores the algorithmic problem of supporting more general routing policies, which are not limited by segregation. While the potential benefits of non-segregated routing have been demonstrated in recent work, the underlying algorithmic complexity is not well understood. We present a range of novel results on the algorithmic complexity of non-segregated routing. In particular, we show that in certain specific scenarios, optimal data center topologies with non-segregated routing policies can be computed in polynomial time. In many variants of the problem, however, introducing more flexible routing comes at the price of complexity: we prove several important variants to be NP-hard.
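
A small example makes the segregation restriction tangible. In the sketch below (a made-up four-node topology using networkx), a flow from a to d takes 3 hops under segregated routing (the static network alone; the dynamic link alone cannot reach it), but 2 hops when static and reconfigurable links may be combined. This is the kind of gain non-segregated routing unlocks, and what the paper's algorithms must reason about.

    # Toy illustration of segregated vs. non-segregated routing in a hybrid
    # topology (static + reconfigurable links). Topology and weights made up.

    import networkx as nx

    static = nx.Graph()
    static.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1), ("c", "d", 1)])

    dynamic = nx.Graph()                      # one configured optical link
    dynamic.add_weighted_edges_from([("b", "d", 1)])

    hybrid = nx.compose(static, dynamic)      # non-segregated: use both at once

    seg_static = nx.shortest_path_length(static, "a", "d", weight="weight")
    seg_dynamic = float("inf")                # a->d unreachable on dynamic alone
    non_seg = nx.shortest_path_length(hybrid, "a", "d", weight="weight")

    print("segregated best :", min(seg_static, seg_dynamic))  # 3 (a-b-c-d)
    print("non-segregated  :", non_seg)                       # 2 (a-b-d)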

Download the full article

On Max-min Fair Allocation for Multi-source Transmission

Geng Li, Yichen Qian, Y. Richard Yang

Abstract

Max-min fairness is widely used in network traffic engineering to allocate available resources among different traffic transfers. Recently, as data replication techniques have matured, an increasing number of systems employ multi-source transmission to maximize network utilization. However, existing TE approaches fail to handle multi-source transfers, because the optimization becomes a joint problem of bandwidth allocation and flow assignment among different sources. In this paper, we present a novel allocation approach for multi-source transfers to achieve global max-min fairness. The joint bandwidth allocation and flow assignment optimization problem poses a major challenge due to its nonlinearity and multiple objectives. We cope with this by deriving a novel transformation into a simple, equivalent canonical linear program that achieves global optimality efficiently. We conduct data-driven simulations, showing that our approach is more max-min fair than other single-source and multi-source allocation approaches, while outperforming them with substantial gains in network throughput and transfer completion time.
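
For intuition, the first step of such an optimization, maximizing the rate of the worst-off transfer while letting a multi-source transfer split its flow, is itself a small LP. The Python sketch below (toy topology and capacities, scipy.optimize.linprog) shows that joint structure; full global max-min fairness requires more than this single maximin step, which is what the paper's canonical-LP transformation addresses.

    # Hedged sketch: the maximin step of max-min fair allocation with a
    # multi-source transfer, as one LP. Topology and capacities are made up.

    from scipy.optimize import linprog

    # Variables: [x1a, x1b, x2, t]
    #   transfer 1 pulls from two sources (x1a + x1b is its total rate),
    #   transfer 2 has one source (x2); t = minimum rate to maximize.
    c = [0, 0, 0, -1]                 # linprog minimizes, so maximize t via -t

    A_ub = [
        [-1, -1,  0, 1],              # t <= x1a + x1b
        [ 0,  0, -1, 1],              # t <= x2
        [ 1,  0,  1, 0],              # shared link:          x1a + x2 <= 10
        [ 0,  1,  0, 0],              # second source's link: x1b      <= 4
    ]
    b_ub = [0, 0, 10, 4]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub)   # variables are >= 0 by default
    x1a, x1b, x2, t = res.x
    print(f"min rate t = {t:.1f}; transfer 1 = {x1a + x1b:.1f} "
          f"(split {x1a:.1f}/{x1b:.1f}), transfer 2 = {x2:.1f}")

Here the optimum (t = 7) is only reachable because transfer 1 shifts part of its demand to the second source's link; fixing a single source per transfer would cap the minimum rate at 5, which is why bandwidth allocation and flow assignment must be solved jointly.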

Download the full article

On Collaborative Predictive Blacklisting

Luca Melis, Apostolos Pyrgelis, Emiliano De Cristofaro

Abstract

Collaborative predictive blacklisting (CPB) allows organizations to forecast future attack sources based on logs and alerts contributed by multiple parties. Unfortunately, however, research on CPB has focused only on increasing the number of predicted attacks and has not considered the impact on false positives and false negatives. Moreover, sharing alerts is often hindered by confidentiality, trust, and liability issues, which motivates the need for privacy-preserving approaches to the problem. In this paper, we present a measurement study of state-of-the-art CPB techniques, aiming to shed light on the actual impact of collaboration. To this end, we reproduce and measure two systems: a non-privacy-friendly one that uses a trusted coordinating party with access to all alerts [12] and a peer-to-peer one using privacy-preserving data sharing [8]. We show that, while collaboration boosts the number of predicted attacks, it also yields high false positives, ultimately leading to poor accuracy. This motivates us to present a hybrid approach, using a semi-trusted central entity, that aims to increase the utility of collaboration while limiting information disclosure and false positives. This yields a better trade-off between true and false positive rates while addressing privacy concerns.
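
To fix ideas, here is a deliberately naive Python sketch of collaboration and the metrics at stake (data and policy are made up; real CPB systems weight contributions, e.g., by inter-organization similarity). Pooling everyone's alerts raises hits but also blacklists sources that never attack a given organization, which is exactly the false-positive effect measured above.

    # Toy sketch of collaborative predictive blacklisting and its evaluation.
    # Baseline policy: predict that any source alerted on recently, by any
    # contributor, will attack again; then score per-organization outcomes.

    YESTERDAY = {          # organization -> attacking source IPs observed
        "orgA": {"198.18.0.1", "198.18.0.2"},
        "orgB": {"198.18.0.2", "198.18.0.3"},
    }
    TODAY_ACTUAL = {
        "orgA": {"198.18.0.2"},
        "orgB": {"198.18.0.3", "198.18.0.9"},
    }

    # Collaboration: pool everyone's alerts into one candidate blacklist.
    prediction = set().union(*YESTERDAY.values())

    for org, actual in TODAY_ACTUAL.items():
        tp = len(prediction & actual)
        fp = len(prediction - actual)      # blacklisted but benign today
        fn = len(actual - prediction)      # missed attackers
        print(f"{org}: hits={tp} false_positives={fp} misses={fn}")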

Download the full article

Bootstrapping Privacy Services in Today’s Internet

Taeho Lee, Christos Pappas, Adrian Perrig

Abstract

Internet users today have few solutions to cover a large space of diverse privacy requirements. We introduce the concept of privacy domains, which provide flexibility in expressing users’ privacy requirements. Then, we propose three privacy services that construct meaningful privacy domains and can be offered by ISPs. Furthermore, we illustrate that these services introduce little overhead for communication sessions and that they come with a low deployment barrier for ISPs.

Download the full article

Learning IP Network Representations

Mingda Li, Cristian Lumezanu, Bo Zong, Haifeng Chen

Abstract

We present DIP, a deep-learning-based framework to learn structural properties of the Internet, such as node clustering or the distance between nodes. Existing embedding-based approaches use linear algorithms on a single source of data, such as latency or hop-count information, to approximate the position of a node in the Internet. In contrast, DIP computes low-dimensional representations of nodes that preserve structural properties and non-linear relationships across multiple, heterogeneous sources of structural information, such as IP, routing, and distance information. Using a large real-world data set, we show that DIP learns representations that preserve the real-world clustering of the associated nodes and predicts the distance between them more than 30% better than a mean-based approach. Furthermore, DIP accurately imputes hop-count distance to unknown hosts (i.e., hosts not used in training) given only their IP addresses and routable prefixes. Our framework is extensible to new data sources and applicable to a wide range of problems in network monitoring and security.
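
The core objective, embedding nodes so that low-dimensional distances track a structural signal such as hop count, can be sketched in a few lines. The numpy toy below is a hedged stand-in for DIP's deep models: it fits 2-D Euclidean embeddings to a small hypothetical hop-count matrix by gradient descent on the stress (squared distance error).

    # Hedged sketch of distance-preserving node embeddings. DIP learns
    # non-linear representations over heterogeneous inputs; this toy only
    # fits Euclidean embeddings to one hop-count matrix.

    import numpy as np

    rng = np.random.default_rng(0)
    HOPS = np.array([                 # hypothetical pairwise hop counts
        [0, 1, 2, 3],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [3, 2, 1, 0],
    ], dtype=float)

    n, dim, lr = len(HOPS), 2, 0.05
    emb = rng.normal(size=(n, dim))   # one low-dimensional vector per node

    for step in range(2000):
        diff = emb[:, None, :] - emb[None, :, :]     # pairwise differences
        dist = np.sqrt((diff ** 2).sum(-1) + 1e-9)   # pairwise distances
        err = dist - HOPS                            # stress vs. hop counts
        grad = (err / dist)[..., None] * diff        # gradient of the stress
        emb -= lr * grad.sum(axis=1)

    # Near 0: embedded distances now track the hop counts.
    print("final stress:", float((err ** 2).sum()))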

Download the full article

Refining Network Intents for Self-Driving Networks

Arthur Selle Jacobs, Ricardo José Pfitscher, Ronaldo Alves Ferreira, Lisandro Zambenedetti Granville

Abstract

Recent advances in artificial intelligence (AI) offer an opportunity for the adoption of self-driving networks. However, network operators and home-network users still do not have the right tools to exploit these advances, since they have to rely on low-level languages to specify network policies. Intent-based networking (IBN) allows operators to specify high-level policies that dictate how the network should behave without worrying about how they are translated into configuration commands for network devices. However, existing research proposals for IBN fail to exploit the knowledge and feedback of the network operator to validate or improve the translation of intents. In this paper, we introduce a novel intent-refinement process that uses machine learning and operator feedback to translate the operator’s utterances into network configurations. Our refinement process uses a sequence-to-sequence learning model to extract intents from natural language and uses the operator’s feedback to improve learning. The key insight of our process is an intermediate representation that resembles natural language, making it suitable for collecting feedback from the operator yet structured enough to enable precise translations. Our prototype interacts with a network operator using natural language and translates the operator’s input into the intermediate representation before translating it into SDN rules. Our experimental results show that our process achieves a squared correlation coefficient (R²) of 0.99 on a dataset with 5000 entries, and that operator feedback significantly improves the accuracy of our model.
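
The intermediate representation's role is easiest to see at the last hop of the pipeline. The Python sketch below uses a hypothetical grammar inspired by, but not identical to, the paper's representation: it is readable enough for an operator to confirm or correct, yet structured enough to translate deterministically into an SDN-style match/action rule.

    # Hedged sketch: translating a structured, human-readable intermediate
    # representation into an SDN-style rule. Grammar and rule format are
    # hypothetical illustrations, not the paper's exact definitions.

    import re

    INTENT = "from endpoint('guest_wifi') to service('netflix') set bandwidth('max', '10mbps')"

    PATTERN = re.compile(
        r"from endpoint\('(?P<src>\w+)'\) "
        r"to service\('(?P<dst>\w+)'\) "
        r"set bandwidth\('(?P<kind>\w+)', '(?P<rate>\w+)'\)"
    )

    def to_sdn_rule(intermediate):
        m = PATTERN.fullmatch(intermediate)
        if m is None:
            # Unparsed utterances are where operator feedback comes in.
            raise ValueError("unparsed intent; ask operator for feedback")
        return {
            "match":  {"src_group": m["src"], "dst_service": m["dst"]},
            "action": {f"{m['kind']}_rate": m["rate"]},
        }

    print(to_sdn_rule(INTENT))
    # {'match': {'src_group': 'guest_wifi', 'dst_service': 'netflix'},
    #  'action': {'max_rate': '10mbps'}}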

Download the full article

Making Content Caching Policies ‘Smart’ using the DeepCache Framework

Arvind Narayanan, Saurabh Verma, Eman Ramadan, Pariya Babaie, Zhi-Li Zhang

Abstract

In this paper, we present DeepCache, a novel framework for content caching that can significantly boost cache performance. Our framework is based on powerful deep recurrent neural network models and comprises two main components: (i) an Object Characteristics Predictor, which builds on a deep LSTM encoder-decoder model to predict the future characteristics of an object (such as its popularity); to the best of our knowledge, we are the first to propose an LSTM encoder-decoder model for content caching; and (ii) a caching policy component, which uses the predicted object characteristics to make smart caching decisions. In thorough experiments, we show that applying the DeepCache framework to existing cache policies, such as LRU and k-LRU, significantly boosts the number of cache hits.
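
The caching-policy side of the framework can be sketched independently of the predictor. In the hedged Python toy below, popularity predictions are hard-coded (standing in for the LSTM encoder-decoder's output) and injected as synthetic "fake" requests, so that a plain LRU cache already holds soon-to-be-popular objects when the real requests arrive; on the toy trace this turns one miss into a hit.

    # Hedged sketch of prediction-driven cache warming: inject fake requests
    # for predicted-popular objects ahead of an unmodified LRU policy.

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity, self.store = capacity, OrderedDict()

        def request(self, obj):
            hit = obj in self.store
            if hit:
                self.store.move_to_end(obj)
            else:
                self.store[obj] = True
                if len(self.store) > self.capacity:
                    self.store.popitem(last=False)   # evict least recent
            return hit

    predicted_popular = ["B"]        # stand-in for the LSTM's predictions
    real_requests = ["B", "A", "B", "B"]

    for warm in (False, True):
        cache = LRUCache(capacity=2)
        if warm:                     # DeepCache-style fake requests
            for obj in predicted_popular:
                cache.request(obj)
        hits = sum(cache.request(obj) for obj in real_requests)
        print("warmed" if warm else "plain ", "LRU hits:", hits)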

Download the full article