IP anycast is a popular routing approach in which geographically distributed servers all serve the same content and share the same IP address. It’s used by large, popular, latency-sensitive services, including the Domain Name System (DNS), Content Delivery Networks (CDNs), and the cloud, and its popularity is only growing. For example, Microsoft has more than doubled the size of its deployment since 2015, and Facebook recently switched from DNS-based redirection to anycast!
This isn’t to say that anycast is perfect. Prior studies such as this one and this one, which examined anycast root DNS traces, showed that many queries travel to anycast sites much farther away than the closest one, unnecessarily inflating latency through additional propagation delay. One of those studies goes further, suggesting that deployments with more sites worsen inflation, since more sites mean more chances for a suboptimal routing decision. The study leaves the reader feeling that anycast is inherently flawed and may need ‘fixing’. So, why are Facebook and Microsoft embracing anycast if it hurts performance?
To get the story straight, we at Columbia University, in collaboration with the University of Southern California’s Information Sciences Institute and Microsoft, re-evaluated anycast’s performance in the context of the applications that use it. Our key insight was that, although anycast performance in the root DNS (the setting prior work investigated) is indeed poor, this poor performance doesn’t matter to end users, because DNS records are cached close to them.
But does caching of root DNS records really make that much of a difference so that root DNS latency hardly matters for end users? And, in cases where latency does matter for end users, can anycast performance be made better? The answer to both questions, as we’ll explain in this post, is yes.
End users execute about one query a day to the root DNS on average
The root DNS is a global DNS service structured as a group of 13 anycast deployments run by different organizations that all serve the same content. A query can be sent to the deployment of the querier’s choice.
Primarily, the root DNS servers serve records for the Top Level Domains (TLDs), nearly all of which can be cached by intermediate resolvers for two days before they need to be refreshed. Since intermediate DNS resolvers (and therefore their caches) are often shared among large populations of end users, in theory very few queries per user need to reach the root DNS.
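To make the caching argument concrete, here is a toy simulation of a single shared resolver cache. It is a minimal sketch under stated assumptions, not the study's methodology: every TLD record is assumed to carry a fixed two-day TTL, user lookups are assumed to be spread evenly in time, and the names `SharedResolverCache` and `simulate` are hypothetical. Only cache misses are forwarded to the root DNS, so root load depends on the number of TLDs and the TTL rather than on how many users share the cache.

```python
from dataclasses import dataclass, field

# Assumption: every TLD record is cacheable for two days.
TTL_SECONDS = 2 * 24 * 3600


@dataclass
class SharedResolverCache:
    """Hypothetical resolver cache shared by many end users."""
    ttl: float = TTL_SECONDS
    expiry: dict = field(default_factory=dict)  # TLD -> time its cached record expires
    root_queries: int = 0

    def lookup(self, tld: str, now: float) -> None:
        """Resolve a TLD at time `now`, querying the root only on a cache miss."""
        if self.expiry.get(tld, -1.0) <= now:
            self.root_queries += 1               # miss or expired: go to the root DNS
            self.expiry[tld] = now + self.ttl    # cache the record for one TTL


def simulate(num_users: int, lookups_per_user_per_day: int, days: int, tlds: list) -> float:
    """Return root DNS queries per user per day for one shared cache.

    Assumption: lookups cycle through `tlds` and are spaced evenly in time.
    """
    cache = SharedResolverCache()
    total_lookups = num_users * lookups_per_user_per_day * days
    interval = days * 24 * 3600 / total_lookups
    for i in range(total_lookups):
        cache.lookup(tlds[i % len(tlds)], now=i * interval)
    return cache.root_queries / (num_users * days)
```

With three TLDs over ten days, each TLD is fetched from the root about once every two days no matter how many users share the cache, so doubling the user population roughly halves the per-user root query rate:

```python
small = simulate(num_users=1000, lookups_per_user_per_day=10, days=10, tlds=["com", "net", "org"])
large = simulate(num_users=2000, lookups_per_user_per_day=10, days=10, tlds=["com", "net", "org"])
```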
Relying on intuition about this caching process is helpful, but we wanted to put our theory to the test with real data. Figure 1 shows the number of queries users make to the root DNS each day. We calculated these user-query counts using root DNS traces, adding up all the queries generated by each Autonomous System (AS) and then dividing (amortizing) the query counts by the number of end users in that AS according to APNIC population estimates. This calculation approximates how a cache works, assuming the query load is divided among all end users who share a cache.
Note: Caches are actually shared per recursive resolver, and resolvers may not be in the same AS as users. Our paper also amortizes queries over per-resolver user counts using a different dataset (not APNIC, which is per AS) and arrives at similar conclusions.
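The amortization described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names, the input dictionaries keyed by AS number, and the choice to include only ASes that appear in both datasets are assumptions. The same functions work unchanged if the keys are recursive resolvers instead of ASes, as in the per-resolver variant mentioned in the note.

```python
def queries_per_user_per_day(daily_queries_by_as: dict, users_by_as: dict) -> dict:
    """Divide each AS's daily root DNS queries evenly among its estimated users.

    Assumption: only ASes present in both datasets with a positive user
    estimate are included; other queries cannot be attributed to users.
    """
    return {
        asn: daily_queries_by_as[asn] / users
        for asn, users in users_by_as.items()
        if asn in daily_queries_by_as and users > 0
    }


def population_weighted_mean(per_user: dict, users_by_as: dict) -> float:
    """Average root queries per user per day, weighting each AS by its users."""
    total_users = sum(users_by_as[asn] for asn in per_user)
    return sum(per_user[asn] * users_by_as[asn] for asn in per_user) / total_users
```

A usage example with made-up numbers (the ASNs come from the documentation range and the counts are hypothetical, not data from the study):

```python
queries = {64500: 2_000_000, 64501: 50_000, 64502: 10_000}  # hypothetical daily root queries
users = {64500: 1_000_000, 64501: 100_000}                   # hypothetical user estimates
per_user = queries_per_user_per_day(queries, users)          # {64500: 2.0, 64501: 0.5}
mean = population_weighted_mean(per_user, users)
```

AS 64502 is dropped because it has no user estimate, mirroring the fact that queries can only be amortized over users we can count.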
Figure 1 shows that end users execute about one query a day to the root DNS on average, so root DNS queries could be inflated around the world and no one would notice the difference!
Context is key when measuring anycast
The key takeaway from our analysis is that previous studies assessed anycast outside the context of the system using it, which left an unfair impression of anycast.
Following this analysis of the root DNS, we compared the root DNS to Microsoft’s anycast CDN which, unlike the root DNS, hosts latency-sensitive services. Our study suggests that anycast performs much better in that setting, possibly because Microsoft invests in optimizing user experience.
So, network managers need not worry that their end users suffer from anycast, and researchers need not think that anycast needs fixing. Services with strict performance requirements can configure their anycast prefix advertisements to ensure user traffic takes low-latency paths. And services such as the root DNS, without those strict performance requirements, probably don’t need to do anything about inflation, and that’s okay!
Please refer to our paper for more details and results of our study.
Thomas Koch is a PhD student at Columbia University working with Ethan Katz-Bassett in the Systems and Networking Lab. Thomas’ interests include Internet measurement and content delivery with a focus on network optimization.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.