[Community Feedback] A Longitudinal Study of Utilization at Internet Interconnection Points

This paper has been submitted to CCR. This is a draft version of the paper that has not been peer-reviewed. Comments on the paper or the supplementary material are encouraged through the comment facility at the bottom of this page.
N. Feamster
Abstract

The increase in high-volume traffic flows due to applications such as video streaming draws new attention to utilization at the interconnections between the Internet’s independently operated networks. This paper surveys the findings from nearly two years of Internet utilization data provided by seven participating ISPs—Bright House Networks, Comcast, Cox, Mediacom, Midco, Suddenlink, and Time Warner Cable—whose access networks represent about 50% of all U.S. broadband subscribers. The dataset spans 18 months and includes about 97% of the paid peering, settlement-free peering, and ISP-paid transit links of each of the participating ISPs. Analysis of the data—which comprises more than 1,000 link groups, representing the diverse and substitutable available routes—suggests that many interconnects have significant spare capacity, that this spare capacity exists both across ISPs in each region and in aggregate for any individual ISP, and that the aggregate utilization across interconnects is roughly 50% during peak periods.

Draft article

Supplementary material

2 comments

  1. This paper illustrates the challenges and risks of working with data that is covered in its raw form by an NDA. The data provided by the ISPs are a fantastic source of information, but the providers have required that it be aggregated in ways that obscure the most interesting conclusions, as the paper makes clear.

    The way this data is aggregated here has to be evaluated in the context of Washington policy-making, because different industry sectors are (or in past disputes have been) making claims about the character of interconnection that reflect how they want their industry sector to be seen. Data analysis of this sort is a policy and strategic activity, not just a technical exercise. The particular dispute of relevance here is between the large ISPs and the large content providers (think Netflix). Netflix was arguing that it had inadequate options to deliver all its content into the large access ISPs, which created enough market power for the access ISPs that they could extract payments from the content providers for direct interconnection. The ISPs were arguing that there were enough uncongested links into their networks that Netflix had many options to deliver traffic, and the ISPs had no significant market power that might attract regulatory attention.

    In this context, it is useful for an ISP to point to a graph such as Figure 2 to support the claim that there is lots of spare capacity, at least somewhere. But this figure is across all links of all the ISPs in the study, which connect those ISPs to all their interconnecting parties, both traditional peers (e.g., Comcast to AT&T) and the links that represent direct interconnection to content providers (so-called paid peering). The required aggregation prevents the paper from discussing which ISPs participated in the study, how they vary in size, and so on. So a lot of heterogeneity has been rolled up in these aggregates. When the paper looks at a given region (metro area), multiple ISPs are being combined, so it is not possible to tell if there is variation among the ISPs.

    Here is an example of the sort of issue that the required aggregation masks. It is not hard to estimate (although the actual data is hard to get) that the aggregate capacity of interconnection links from access ISPs to content providers is much larger than the aggregate capacity from access providers to their traditional peers, and certainly to their transit providers. If Netflix is looking for delivery options other than direct interconnection for its content, it is not going to be able to negotiate to use the capacity of its content competitors (think YouTube). It may not be able to negotiate to use the capacity of the traditional peers, because such a deal might upset the existing peering agreements. The only practical opportunity for Netflix to find capacity might be via the transit provider(s) of the access ISP, where the existing aggregate capacity would most likely be totally inadequate to carry the traffic. But all these nuances are masked by the required aggregation.

    One way to evaluate this data (in the form in which the paper can present it) is to consider what the most interesting conclusions are to draw. I find the most interesting figure to be 5b, which displays the peak utilization of all links in the study, weighted by capacity. First, note that the peak number plotted in this (and other figures) is the 95% utilization. One might ask why 95%, instead of actual peak. It is traditional to clip outliers, but if I understand the way the data is processed, taking the 95% utilization means that 5% of the time (or about 1.2 hours a day) can exceed this value. In this context, is the fact that 5% of the links are running with utilization between 95% and 100% an indication that there are few actual cases of congestion, or more than one might expect? (I am actually surprised that there are so many underutilized links. Why is that so? It might have to do with the fact that capacity comes in large granularity: if 10G is not enough, you have to buy 20G, and so on.)

    I think it would be particularly interesting to know if those 5% represent content providers or traditional peers. Who knows?

    Here is an example of a general conclusion the paper draws that really needs to be dissected or unpacked. It paints a generally positive picture of the ecosystem that may be true but could be easily spun as advocacy. The paper says:

    “Our analysis suggests that capacity continues to be provisioned to meet growing demand and certain interconnection points have spare capacity even though specific links may be experiencing high utilization.”

    Figure 4 does suggest that capacity is being added, but it is impossible to get much detail. The big content providers (e.g., Netflix and YouTube) deliver about half of the total content flowing into a typical US access ISP, so their decisions about adding capacity may mask whatever else is happening. It might well be that there is no capacity being added (and no demand being added) to traditional peering and transit links. So what conclusions are actually justified?

    An earlier version of this paper was presented at TPRC (Telecommunications Policy Research Conference) a year ago; that venue makes clear that one target of this paper is the policy and regulatory community. At that conference, my co-authors and I presented another paper that critiqued a number of measurement and analysis methods, including this one, and made the point that in many cases the methods seem crafted to tell the preferred message for different industry sectors. The authors of this paper note that TPRC does not publish archival proceedings. However, all the papers are available on SSRN. Since this paper does not cite ours, here is the URL so that others can see our analysis.

    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2756868

    At the risk of seeming snarky, I would note that the paper puts a slightly optimistic spin on the implications of the aggregation. For example, in 3.3, the paper notes that the “..aggregation makes it difficult to drill down…” I think the correct word is “impossible”.

    Authors of papers (especially papers with relevance to policy and industry contention) should have a disclosure about the sources of their funding.

    1. Before responding pointwise, I would like to make it clear that this commenter has sent similar comments in private, and these comments are also directly addressed in the paper. I would refer readers to the paper itself for many of these answers.

      I’ll respond pointwise below.

      “This paper illustrates the challenges and risks of working with data that is covered in its raw form by an NDA. The data provided by the ISPs are a fantastic source of information, but the providers have required that it be aggregated in ways that obscure the most interesting conclusions, as the paper makes clear.”

      * No, this is not exactly the case. For any given ISP, data about utilization can be seen across aggregates of three metro areas. While the data sharing agreement makes it clear that any data must be aggregated into groups of three cities or three ISPs, that aggregation does not obscure the paper’s most interesting conclusions.

      “The way this data is aggregated here has to be evaluated in the context of Washington policy-making, because different industry sectors are (or in past disputes have been) making claims about the character of interconnection that reflect how they want their industry sector to be seen. Data analysis of this sort is a policy and strategic activity, not just a technical exercise.”

      * The paper explains the practical limitations of data sharing agreements quite clearly. This is a bit of “Internet 101”, and is also detailed in the paper, but let me explain again how interconnection agreements between content providers and ISPs work. As the paper explains, the terms of private interconnection agreements are tightly held industry secrets. The information that is private includes (1) the existence of an interconnect in any given city; (2) the capacity of those interconnects in any given city; and (3) the amount of traffic being routed over an interconnect in any given city.

      * As you probably know (you’ve been privy to some of the same discussions), there are specific ISPs that are exploring releasing fine-grained information about their interconnects. We may be quite a way from that still, but your comment is disingenuous: I am aware that you know that the limitation here is due to the legalities surrounding conventional private peering arrangements, and the way those agreements are drawn up. Your comment suggests that there’s an intentional attempt to hide data, but there is not. As the paper clearly explains, I can see the data at individual-link granularity, but the paper itself is not at liberty to publish it, and there are legal reasons, rooted in those contracts, that prevent that. Happy to explain more if that is still unclear, or if you have specific questions about it.

      * In fact, Google and others have explicitly claimed that they can drive utilization on their public-facing CDN towards 100% without affecting user experience. See the Espresso paper from SIGCOMM 2017. A high-utilization link between a content provider and an access ISP is not controversial.

      “The particular dispute of relevance here is between the large ISPs and the large content providers (think Netflix). Netflix was arguing that it had inadequate options to deliver all its content into the large access ISPs, which created enough market power for the access ISPs that they could extract payments from the content providers for direct interconnection. The ISPs were arguing that there were enough uncongested links into their networks that Netflix had many options to deliver traffic, and the ISPs had no significant market power that might attract regulatory attention.”

      * The paper doesn’t have anything to say about regulatory attention. As for the arguments about “market power”: the claim that the access ISPs have no significant market power is clearly false, and I made the same point about market power in a previous blog post (https://freedom-to-tinker.com/2015/03/25/why-your-netflix-traffic-is-slow-and-why-the-open-internet-order-wont-necessarily-make-it-faster/), so I agree with you about that.

      * Your statement “there were enough uncongested links into their networks” is what the aggregates can show. Even that kind of data is far more information than we previously had. If you have follow-up questions, I am happy to answer them. David, I know that you also signed the same NDA for the data, so if you’re interested in drilling into the non-public aggregates, you have the ability to look into the data yourself and make a more informed comment, instead of implying that you can’t see it.

      “In this context, it is useful for an ISP to point to a graph such as Figure 2 to support the claim that there is lots of spare capacity, at least somewhere. But this figure is across all links of all the ISPs in the study, which connect those ISPs to all their interconnecting parties, both traditional peers (e.g., Comcast to AT&T) and the links that represent direct interconnection to content providers (so-called paid peering).”

      * The paper explains why such an aggregate can still be useful information. If there is spare capacity at an interconnect in a metro region, then a content provider could deliver video traffic in that region by buying more capacity.

      “The required aggregation prevents the paper from discussing which ISPs participated in the study, how they vary in size, and so on.”

      * The abstract lists all seven ISPs. I refer you to the abstract.

      “So a lot of heterogeneity has been rolled up in these aggregates. When the paper looks at a given region (metro area), multiple ISPs are being combined, so it is not possible to tell if there is variation among the ISPs.”

      * This is also not the case. The plots show a distribution, so you can see for a given metro that some ISPs have higher utilization than others. The website http://interconnection.citp.princeton.edu/ shows the data for all of the metros and ISPs; I didn’t include them in the paper for space reasons, but they are all there.

      “Here is an example of the sort of issue that the required aggregation masks. It is not hard to estimate (although the actual data is hard to get) that the aggregate capacity of interconnection links from access ISPs to content providers is much larger than the aggregate capacity from access providers to their traditional peers, and certainly to their transit providers. If Netflix is looking for delivery options other than direct interconnection for its content, it is not going to be able to negotiate to use the capacity of its content competitors (think YouTube).”

      * This is irrelevant to the paper, since we are talking about capacity between content providers and ISPs, not between content providers.

      “It may not be able to negotiate to use the capacity of the traditional peers, because such a deal might upset the existing peering agreements. The only practical opportunity for Netflix to find capacity might be via the transit provider(s) of the access ISP, where the existing aggregate capacity would most likely be totally inadequate to carry the traffic. But all these nuances are masked by the required aggregation.”

      * Not exactly. The presence of capacity at the interconnects suggests that the capacity is not being artificially constrained. It may be constrained by the content provider’s willingness to pay for the additional capacity (or for the ISP to jointly upgrade the capacity as part of the peering agreement), but that is a question of economics, not a technical question. The economics of peering is not something that this paper addresses, but as I mention, I have blogged about that (such as in the link I included above).

      “First, note that the peak number plotted in this (and other figures) is the 95% utilization. One might ask why 95%, instead of actual peak. It is traditional to clip outliers…”

      * Not only is it customary for statistical reasons, but it is also customary industry-wide to show this statistic. Billing is typically done on the 95th percentile. A sketch of the computation follows.
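      For concreteness, here is a minimal sketch of how a 95th-percentile “peak” utilization statistic of this kind is typically computed. The link capacity, sampling interval, and traffic values below are hypothetical; they are not drawn from the paper’s dataset.

      ```python
      # A minimal sketch of a 95th-percentile "peak" utilization statistic,
      # using made-up 5-minute samples for a single hypothetical link.
      import random

      CAPACITY_BPS = 10e9            # hypothetical 10 Gbps link
      SAMPLES_PER_DAY = 24 * 12      # one utilization sample every 5 minutes
      DAYS = 30

      # Invented traffic samples (bits per second) for one link over a month.
      samples = [random.uniform(0, CAPACITY_BPS) for _ in range(DAYS * SAMPLES_PER_DAY)]

      def percentile_95(values):
          """Return the value that 95% of the samples fall at or below."""
          ordered = sorted(values)
          index = int(0.95 * len(ordered)) - 1   # discard the top 5% of samples
          return ordered[max(index, 0)]

      peak_utilization = percentile_95(samples) / CAPACITY_BPS
      print(f"95th-percentile utilization: {peak_utilization:.1%}")

      # The discarded top 5% of samples corresponds to 0.05 * 24 hours,
      # i.e., about 1.2 hours per day during which utilization may exceed this value.
      ```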

      “but if I understand the way the data is processed, taking the 95% utilization means that 5% of the time (or about 1.2 hours a day) can exceed this value. In this context, is the fact that 5% of the links are running with utilization between 95% and 100% an indication that there are few actual cases of congestion, or more than one might expect? I think it would be particularly interesting to know if those 5% represent content providers or traditional peers. Who knows?”

      * Not sure I understand your comment, but all of the interconnects that the paper discusses are between ISPs and content providers. It is well-known that some link utilizations approach 100%, as outlined in the Espresso paper, so there is no controversy there.

      “Here is an example of a general conclusion the paper draws that really needs to be dissected or unpacked. It paints a generally positive picture of the ecosystem that may be true but could be easily spun as advocacy. The paper says: “Our analysis suggests that capacity continues to be provisioned to meet growing demand and certain interconnection points have spare capacity even though specific links may be experiencing high utilization.”

      * This statement is clearly supported by the data. If you (or anyone else) should choose to use it for “advocacy” then that is up to you. I am happy to opine about that, which I have done a little bit here: https://freedom-to-tinker.com/2015/04/02/where-is-internet-congestion-occurring/

      “Figure 4 does suggest that capacity is being added, but it is impossible to get much detail. The big content providers (e.g., Netflix and YouTube) deliver about half of the total content flowing into a typical US access ISP, so their decisions about adding capacity may mask whatever else is happening. It might well be that there is no capacity being added (and no demand being added) to traditional peering and transit links. So what conclusions are actually justified?”

      * The data itself is about peering links specifically. I am not sure I understand this question. The data clearly shows that aggregate capacity continues to be added. I refer you to http://interconnection.citp.princeton.edu, where we are publishing ongoing data.

      “An earlier version of this paper was presented at TPRC (Telecommunications Policy Research Conference) a year ago; that venue makes clear that one target of this paper is the policy and regulatory community. At that conference, my co-authors and I presented another paper that critiqued a number of measurement and analysis methods, including this one, and made the point that in many cases the methods seem crafted to tell the preferred message for different industry sectors. The authors of this paper note that TPRC does not publish archival proceedings. However, all the papers are available on SSRN. Since this paper does not cite ours, here is the URL so that others can see our analysis.
      https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2756868

      * SSRN papers are drafts.

      * Your paper is certainly advocacy, not technical analysis, as it contains no data whatsoever. Your paper makes a number of basic errors, which I pointed out to you privately and which remain uncorrected. I’ll list those here in case you choose to edit your paper.

      1. Some of the things your SSRN paper says, such as its claims about the required level of aggregation, are not correct. The aggregation level that the data sharing agreements require is three.

      2. The discussion on LAGs (link aggregation groups) needs a lot of work. The assertion that LAGs aren’t load-balanced in a metro is not cited, so it seems odd and speculative to say that they “might not be” without a proper citation or explanation. The paper explains in detail why these links are generally load-balanced; a sketch of a common per-flow hashing approach appears after this list. Also, only one ISP in the dataset has two LAGs in a single metro region, and that is a bit of an aberration. I discussed this with Steve already at length, I believe.

      3. I think discussing the limits of capacity that exist for a single peer is valid—and you are absolutely right to do so—but the tone of the text could be more balanced; to say that this reveals nothing is disingenuous.
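      To illustrate why the member links of a LAG tend to carry similar load, here is a minimal sketch of per-flow hashing, a common mechanism for spreading flows across member links. It is offered only as an illustration of the general technique; it is not drawn from the paper or from any participating ISP’s configuration.

      ```python
      # A minimal sketch of per-flow hashing across the member links of a LAG.
      # With many concurrent flows, hashing each flow's 5-tuple onto a member link
      # spreads traffic roughly evenly, which is why LAG members tend to be
      # load-balanced. Illustrative only; not any particular ISP's configuration.
      import hashlib
      import random
      from collections import Counter

      NUM_MEMBERS = 4   # hypothetical 4-link LAG

      def member_for_flow(src_ip, dst_ip, src_port, dst_port, proto):
          """Choose a member link by hashing the flow's 5-tuple."""
          key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
          digest = hashlib.sha256(key).digest()
          return int.from_bytes(digest[:4], "big") % NUM_MEMBERS

      # Simulate 100,000 made-up flows and count how many land on each member link.
      flows = [
          (f"10.{random.randint(0, 255)}.{random.randint(0, 255)}.{random.randint(1, 254)}",
           "203.0.113.1", random.randint(1024, 65535), 443, "tcp")
          for _ in range(100_000)
      ]
      print(Counter(member_for_flow(*f) for f in flows))  # each member carries ~25% of flows
      ```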

      * I will point out a few other facts:

      – David, you have access to the same dataset that I do, under the same data sharing agreement. If you have doubts, you actually have the data and could follow up yourself. I sent you multiple versions of this paper when I was first working on it, in late 2015 and early 2016. You demurred on any opportunity to provide private comment, yet somehow felt the need to write a public rebuttal. Speaking as a professional in this community, I was deeply disappointed by your approach; I found it highly unprofessional and uncollegial.

      “At the risk of seeming snarky, I would note that the paper puts a slightly optimistic spin on the implications of the aggregation. For example, in 3.3, the paper notes that the “..aggregation makes it difficult to drill down…” I think the correct word is “impossible”.”

      * I explained this to you explicitly in a meeting we had on October 1, 2017, according to my notes. In case you’ve forgotten about that discussion, I’ll include the explanation here.

      * Unfortunately, snark has gotten in the way of more careful thinking. Read the paper more carefully, and think more carefully. The data sharing agreement allows publication of any aggregate of N ISPs or metro areas, where N is at least three. It would technically be within bounds to publish an aggregate of four and an aggregate of three.

      * So technically, yes, it is difficult (though not impossible), and the difficulty actually stems from respecting the spirit of the data sharing agreement, not from my (or anybody else’s, for that matter) ability to see it. A sketch of what a publishable aggregate looks like follows.
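      To make the constraint concrete, here is a minimal sketch of the kind of aggregate the agreement permits. The ISP names, capacities, and utilization figures are invented for illustration, and the capacity-weighted average is only one plausible way of forming such an aggregate.

      ```python
      # A minimal sketch of publishing only aggregates that cover at least three ISPs
      # (or metro areas), per the data sharing agreement described above. The records
      # below are invented; they are not drawn from the actual dataset.
      MIN_GROUP_SIZE = 3  # the aggregation level the agreement requires

      # (isp, metro, capacity_gbps, peak_utilization) -- hypothetical example records
      links = [
          ("ISP-A", "Metro-1", 100, 0.62),
          ("ISP-B", "Metro-1", 200, 0.45),
          ("ISP-C", "Metro-1", 100, 0.71),
          ("ISP-D", "Metro-1", 300, 0.38),
      ]

      def publishable_aggregate(records):
          """Capacity-weighted utilization for a group covering >= MIN_GROUP_SIZE ISPs."""
          isps = {isp for isp, _, _, _ in records}
          if len(isps) < MIN_GROUP_SIZE:
              raise ValueError("aggregate covers too few ISPs to publish")
          total_capacity = sum(cap for _, _, cap, _ in records)
          weighted_sum = sum(cap * util for _, _, cap, util in records)
          return weighted_sum / total_capacity

      print(f"Metro-1 aggregate utilization: {publishable_aggregate(links):.1%}")
      ```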

      “Authors of papers (especially papers with relevance to policy and industry contention) should have a disclosure about the sources of their funding.”

      * It is not customary for papers under review to have an acknowledgments section, but there is a clear disclosure on the project website, which is cited. The SSRN paper, which you have clearly read, acknowledges both past commenters (who do not include you, despite my numerous attempts to send you early working drafts) and support from CableLabs. Quoting the website, which has a clear acknowledgment of funding:

      * “Q: Who pays for this?
      Project participants deploy the measurement tool at their own expense. CITP is working with participating ISPs and other project participants to establish and maintain a fair and equitable funding mechanism for CITP.”

      * CableLabs provides funding for operations time to maintain the ongoing data analysis as a service to the community. Quoting from Wikipedia: “Cable Television Laboratories, Inc. is a not-for-profit innovation and research and development lab founded in 1988 by American cable operators.”
