Network telemetry is about understanding what is happening in the current network. It serves as the basis for making a variety of management decisions for improving the performance, availability, security, and efficiency of networks. However, it is challenging to build real-time and fine-grained network telemetry systems because of the need to support a variety of measurement queries, handle a large amount of traffic for large networks, while staying within the resource constraints at hosts and switches. Today, most operators take a bottom-up approach by passively collecting data from individual devices and infer the network-wide information they need. They are often limited by the monitoring tools device vendors provide and find it hard to extract useful information. In this paper, we argue for a top-down approach: We should provide a high-level declarative abstraction for operators to specify measurement queries, programmable measurement primitives at switches and hosts, and a runtime that translates the high-level queries into low-level API calls. We discuss a few recent works taking this top-down approach and call for more research in this direction.
We have published a paper with the same vision a few years back. Dynamic Network Probes: A Stepping Stone to Omni Network Visibility (https://arxiv.org/abs/1612.02862v1). Many challenges remain. Hope to see it comes into reality through solid research that is applicable to various target devices.