In April 2020, APNIC announced the initial release of its Registration Data Access Protocol (RDAP) service to the cloud, using the Google Cloud Platform (GCP) in the Sydney region. Today, we’d like to announce the expansion of this service to a multi-regional cloud deployment with the addition of new Google Kubernetes Engine (GKE) clusters hosting RDAP in Singapore and Northern Virginia.
This should significantly improve round-trip times (RTT) for clients worldwide, and further improve the availability and resilience of RDAP by protecting it from regional outages. Further nodes may be deployed in 2021, depending on cost/benefit and design considerations.
We previously discussed the design of the RDAP architecture in the cloud, which uses rdap-ingressd to redirect or proxy an RDAP query to the appropriate RDAP service.
We’ve extended this design to leverage the Dynamic Traffic Steering capability of Cloudflare’s load balancer, which directs each query to the closest RDAP cluster, with fallback routing to the original Sydney cluster if required.
Configured via Terraform, a Cloudflare load balancer origin pool is created for each GCP Kubernetes cluster’s load balancer.
Each origin pool has an HTTP monitor, which uses a valid RDAP query to check that its regional RDAP cluster is responding appropriately, and also to compile a profile of the RTT from each of Cloudflare’s points of presence (PoPs).
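As a rough illustration of what such a monitor checks, the Python sketch below decides whether a response looks like a healthy RDAP answer. It is not Cloudflare’s or APNIC’s actual check: the function name `rdap_monitor_ok` and the choice to look for the top-level `rdapConformance` array (which RDAP responses carry) are our assumptions for the example.

```python
import json


def rdap_monitor_ok(status_code: int, body: str, expected_status: int = 200) -> bool:
    """Return True if an HTTP response looks like a valid RDAP answer.

    Hypothetical helper for illustration; the real monitor's expected
    response is configured in Cloudflare and is not described here.
    """
    if status_code != expected_status:
        return False
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body (for example an HTML error page) is unhealthy.
        return False
    # Assumption: treat a top-level "rdapConformance" list as the marker
    # of a well-formed RDAP response.
    return isinstance(doc, dict) and isinstance(doc.get("rdapConformance"), list)
```

A cluster that returns an error status, or a page that is not RDAP JSON, would be reported as unhealthy by a check of this kind.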
The origin pools are added to a Cloudflare load balancer, configured with Dynamic Traffic Steering, which means Cloudflare will automatically route the request to the origin pool with the smallest RTT from the perspective of the requester.
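The steering decision can be pictured with a small Python sketch. This is only a simplified model of the behaviour described above, not Cloudflare’s implementation; the function `choose_pool`, the pool names, and the RTT figures in the usage note are made up for the example.

```python
from typing import Dict


def choose_pool(
    rtt_ms: Dict[str, float],
    healthy: Dict[str, bool],
    fallback: str = "sydney",
) -> str:
    """Pick the healthy origin pool with the lowest profiled RTT.

    rtt_ms:   profiled RTT from the requester's PoP to each pool
    healthy:  latest monitor result for each pool
    fallback: pool to use when no healthy pool has an RTT profile
              (assumed here to be the original Sydney cluster)
    """
    candidates = {
        pool: rtt for pool, rtt in rtt_ms.items() if healthy.get(pool, False)
    }
    if not candidates:
        return fallback
    # min() over the dict keys, ordered by each pool's RTT.
    return min(candidates, key=candidates.get)
```

For instance, with illustrative RTTs of 180 ms to Sydney, 40 ms to Singapore and 220 ms to Virginia, a requester would be sent to Singapore while it is healthy, and to Sydney if Singapore’s monitor fails.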
How we decided which regions to use
Our choice of which of GCP’s 24 regions would host our second and third RDAP clusters was based on two major factors: the RTT for an RDAP query, and the difference in resource cost of the GCP compute elements.
The candidate regions we investigated initially were Tokyo, Singapore, Northern Virginia and London (along with our existing Sydney region).
Determining the RTT from multiple source locations around the world to each of the target locations required us to set up test GKE clusters hosting a live RDAP service in each of the four new target regions.
We were able to rapidly deploy (and ultimately tear down) these test targets using Terraform and a clone of the Git repository of FluxCD-controlled HelmReleases.
Configuring Cloudflare load balancer origin pools and HTTP monitors (with a valid RDAP query and expected response) allowed us to use the Cloudflare RTT profile mechanism to gather RTT statistics from ~180 source locations around the world to the five target test clusters.
After comparing the RTT results and the estimated GCP resource costs for equivalent GKE clusters in each of the regions, we selected Singapore and Northern Virginia as the regions that provided the best improvement to the RDAP service for the majority of our Members and global Internet users. Future deployments will enhance resiliency, and will also be chosen to balance the benefit to the APNIC region and the global Internet.
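To give a feel for this kind of comparison, here is a minimal Python sketch that ranks pairs of candidate regions by the mean best-case RTT across source locations, with cost as a tie-breaker. It is one possible way to frame the trade-off, not the exact method we used; the function names, the scoring rule and the shape of the input data are assumptions for the example.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def mean_best_rtt(rtt: Dict[str, Dict[str, float]], regions: Sequence[str]) -> float:
    """Mean, over source locations, of the lowest RTT to any region in `regions`."""
    return mean(min(per_source[r] for r in regions) for per_source in rtt.values())


def rank_region_pairs(
    rtt: Dict[str, Dict[str, float]],   # rtt[source][region] in milliseconds
    cost: Dict[str, float],             # estimated cost per candidate region
    existing: str,                      # region already deployed
    candidates: Sequence[str],          # regions under evaluation
) -> List[Tuple[Tuple[str, str], float, float]]:
    """Rank every pair of candidates deployed alongside `existing`.

    Returns (pair, mean best RTT, combined cost) tuples, lowest RTT first,
    using cost to break ties. The weighting is an illustrative assumption.
    """
    results = []
    for pair in combinations(candidates, 2):
        regions = (existing,) + pair
        results.append(
            (pair, mean_best_rtt(rtt, regions), sum(cost[r] for r in pair))
        )
    return sorted(results, key=lambda item: (item[1], item[2]))
```

Fed with the per-source RTT profiles and per-region cost estimates, the first entry of the returned list is the pair that most reduces typical RTT when added to the existing cluster.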
We’d love to hear how you find the new service or if you’ve noticed any improvements. In an upcoming article, we will explore the engineering that allowed us to manage scaling to multiple regions.