Matthieu Gouel, Kevin Vermeulen, Maxime Mouchet, Justin P. Rohrer, Olivier Fourmaux, Timur Friedman
We describe a new system for distributed tracing at the IP level of the routes that packets take through the IPv4 internet. Our Zeph algorithm coordinates route tracing efforts across agents at multiple vantage points, assigning to each agent a number of /24 destination prefixes in proportion to its probing budget and chosen according to a reinforcement learning heuristic that aims to maximize the number of multipath links discovered. Zeph runs on top of Iris, our fault-tolerant system for orchestrating internet measurements across distributed agents of heterogeneous probing capacities. Iris is built around third party free open source software and modern containerization technology, thereby presenting a new model for assembling a resilient and maintainable internet measurement architecture. We show that carefully choosing the destinations to probe from which vantage point matters to optimize topology discovery and that a system can learn which assignment will maximize the overall discovery based on previous measurements. After 10 cycles of probing, Zeph is capable of discovering 2.4M nodes and 10M links in a cycle of 6 hours, when deployed on 5 Iris agents. This is at least 2 times more nodes and 5 times more links than other production systems for the same number of prefixes probed.