Part of being a registry is being a phonebook for the Internet. But just as phonebooks have changed, so too are registries evolving. A core aspect of the ‘phonebook’ service that registries provide is known as the ‘whois’ database.
The thing about whois though… it’s pretty old. The documents that define whois are some of the oldest RFCs still in use. RFC 812 dates back to March 1982.
But whois is not the only game in town, at least for some functions. Registries have also increasingly been using Registration Data Access Protocol (RDAP) to provide these phonebook type look-up services. If you read this recent blog post you’ll have seen the reference to RDAP as part of the new model for registry services.
Functionally, RDAP fills the same service delivery role as whois, and it was invented specifically because of weaknesses in the whois model. So it would make sense for RDAP to simply replace whois, right?
So, let’s take a minute to investigate this. Is RDAP set to take over the role of whois?
Well… it’s not quite that simple. Before we can answer that question, we need to think a bit about the shortcomings of the whois model and what gaps need to be filled.
Weaknesses in the whois model
Whois as a protocol is severely under-defined
Whois has, of course, been updated since its first RFC back in 1982, but the subsequent RFCs didn’t actually add much to the protocol itself — it uses TCP on port 43, and has no other definition. More complex uses of whois are defined separately.
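To show just how little the protocol itself defines, here is a minimal sketch of a whois client in Python. It assumes only what is described above: open a TCP connection to port 43, send one line of query text, and read until the server closes the connection. The function names are illustrative, and decoding as UTF-8 is an assumption, since the protocol does not specify a character set.

```python
import socket


def format_query(query: str) -> bytes:
    """Encode a whois query: the query text followed by CRLF."""
    return query.strip().encode("utf-8") + b"\r\n"


def whois_query(host: str, query: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one query over TCP and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(format_query(query))
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: the response is complete
                break
            chunks.append(data)
    # No character set is defined by the protocol, so decode defensively.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query(host, "192.0.2.0")` against a whois server of your choice returns free-form text. Everything beyond that, including which flags the server accepts and how the answer is laid out, is up to each operator.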
That’s part of the problem. It’s poorly defined, and lacks a single binding definition. Arguably, the key definitions no longer even apply, except in the specific case of the Internet Routing Registry (IRR) function that uses Routing Policy Specification Language (RPSL).
But that takes us from the protocol definition to the semantics of how it is actually being used.
Whois doesn’t have a single information model
Each whois provider is free to implement a key-to-value lookup method (with options, flags, switches) as they see fit, which means (since whois is shared between DNS registries and number registries) there is a problem for clients — what kinds of whois queries and responses do you support? How do you know when to apply them?
You don’t know, except from external knowledge. Only IRR users have high confidence of a specific structure.
Whois has no well-defined multi-language model
Some servers implement a multi-language model, some don’t, and those that do implement it in different ways. This means that a single client cannot reliably say that it wants English-language results only, or that it prefers a different language.
Whois has no well-defined authentication model
It is not easy to specify public or private access levels, or to set user limits on data. Different whois services support different methods of doing this, including having none at all! In some cases, access is based purely on the IP address you query from (for instance, when rate limits are applied). Whois predates the California Consumer Privacy Act and the EU’s General Data Protection Regulation (GDPR). There is a huge privacy issue bubbling along around the personal data exposed in whois records.
For number whois, there is the RPSL model for IRRs
This is a structured information model (a language) to specify routing policy and is machine parsable. It is now being used both for the routing registry and delegation registry function. The model includes alternative forms that are confusing, and open to interpretation. This is arguably not a weakness, so perhaps needs to be set to one side. The weakness, if any, is the standard for RPSL doesn’t reflect modern coding models and requires quite a lot of work on the client side to process, when in fact most whois servers in the world don’t use it. Really, it is meant only for the routing configuration management problem and possibly shouldn’t be applied to a delegation record-keeping whois service.
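To give a feel for the client-side work involved, here is a rough sketch of a parser for a single RPSL-style object. It assumes the common `attribute: value` layout, with continuation lines that begin with a space, a tab, or `+`. Treating lines that start with `#` or `%` as comments is an assumption based on typical server output, and the sample object is invented using documentation address space and a documentation AS number.

```python
def parse_rpsl_object(text: str) -> list[tuple[str, str]]:
    """Parse one RPSL-style object into an ordered list of (attribute, value)."""
    attributes: list[tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "%")):
            continue  # blank lines and (assumed) comment lines
        if line[0] in " \t+":
            # Continuation line: append to the previous attribute's value.
            if attributes:
                name, value = attributes[-1]
                attributes[-1] = (name, (value + " " + line[1:].strip()).strip())
            continue
        name, sep, value = line.partition(":")
        if sep:
            attributes.append((name.strip().lower(), value.strip()))
    return attributes


# Invented sample object, for illustration only.
SAMPLE_OBJECT = """\
route:      192.0.2.0/24
descr:      Example route object
            for documentation only
origin:     AS64496
mnt-by:     MAINT-EXAMPLE
source:     EXAMPLE
"""

parsed = parse_rpsl_object(SAMPLE_OBJECT)
```

The result is a list of pairs rather than a dictionary because attributes can repeat within an object. Note that this only recovers the attributes; interpreting the policy expressions inside them is a much larger job.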
Whois is harder to distribute in the cloud
Because cloud service providers tune and optimize their services for HTTP/HTTPS, and whois runs on port 43 instead, it is logistically quite hard to distribute whois in the cloud. Typically, you either have to use a generalized IP forwarding model, which is ‘unusual’ and attracts higher traffic costs, or you have to consider running your own anycast or other cloud distribution model. This is not uncommon: APNIC runs its own whois service in at least three locations, hosted with a data centre provider, and uses a separate ‘pick best local’ service selector to send clients (by geolocation) to different whois servers.
There isn’t a ‘directory’ function
There is no single mechanism to find which whois server can provide answers for your query, or a consistent referral mechanism to be told where to go, if you ask the ‘wrong’ server. You have to code this into the client directly, or use external knowledge. In some cases, there is limited support for ‘referral whois’ but it’s not globally available or consistently applied.
So, from this list, it’s easy to see that there are quite a few problems with whois, and they’ve been around for a long time. What’s the upside behind RDAP?
Strengths in the RDAP model
RDAP as a protocol is JSON data served over HTTP/HTTPS
Using HTTP/HTTPS means there is a clean model to provide additional functionality such as data compression, proxying, and cache lifetimes, and to identify client capability. We don’t have to ‘invent’ things that HTTP and HTTPS have already solved, so it’s far simpler to design systems, because the modern web provides so much for us.
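As a minimal sketch of what that looks like from the client side, the following Python uses only the standard library to build an RDAP lookup URL and fetch the JSON response. The path layout (`/ip/...`, `/autnum/...`, `/domain/...`) and the `application/rdap+json` media type come from the RDAP specifications; the base URL is a placeholder, and the function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import quote


def rdap_url(base_url: str, object_type: str, key: str) -> str:
    """Build an RDAP lookup URL such as <base>/ip/192.0.2.0/24."""
    # Keep "/" and ":" unescaped so CIDR prefixes and IPv6 addresses stay readable.
    return f"{base_url.rstrip('/')}/{object_type}/{quote(key, safe='/:')}"


def rdap_lookup(base_url: str, object_type: str, key: str, timeout: float = 10.0) -> dict:
    """Fetch one RDAP object and decode the JSON body into a dict."""
    request = urllib.request.Request(
        rdap_url(base_url, object_type, key),
        headers={"Accept": "application/rdap+json"},
    )
    # urlopen follows HTTP redirects by default.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Compression, caching, and proxying are all handled by the HTTP layer, so none of them appear in the client code at all.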
There are well understood authentication models to apply to HTTP/HTTPS
An advantage of using the web is that it has an authentication model. There is none in whois, but on the web we have several to choose from. They are implemented in different ways, but all are available as code, so it is possible to define access control limits.
There is work in progress, in the IETF REGEXT working group, to define the use of OAuth. This is likely to allow the privacy problem inherent in whois to be addressed: public data will conform to regulations like the GDPR, and recognized users who can show credentials can be shown the higher-privilege data they need.
RDAP is explicitly defined to be UTF-8
Because RDAP is UTF-8 throughout, any RDAP service can be multilingual, including the use of alternate-language responses. This means things like HTTP language preference hints can be used to decide how to send a response: send only the requested language, send the requested language first, or send both forms. It means that two people sharing a link to an RDAP object can each see the version that best suits their local language. For things like physical addresses, this can be really useful because some economies in the Asia Pacific use forms of addressing that don’t map well onto the Western “number/street name/locality” model.
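To illustrate how a language preference hint could be used, here is a simplified sketch of server-side selection based on the HTTP `Accept-Language` header. It is not taken from any particular RDAP implementation; the function names are hypothetical and the matching rules are deliberately minimal (exact tag first, then the primary language subtag).

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    preferences = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0  # a tag without an explicit q-value has weight 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        preferences.append((tag, q))
    # sorted() is stable, so equal-q tags keep the order the client sent.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: list[str], default: str) -> str:
    """Pick the best available language for the client, else the default."""
    offered = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in offered:
            return offered[tag]
        primary = tag.split("-")[0]  # fall back from "ja-JP" to "ja"
        if primary in offered:
            return offered[primary]
    return default
```

A server holding both English and Japanese forms of a record could call `choose_language(header, ["en", "ja"], "en")` and then decide whether to return only that form, return it first, or return both.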
RDAP has a single consistent information model
‘Profiles’ can be defined and registered with IANA, so there is both a base model (JSON) and a profile registration method that allows clients to understand what kind of data to expect, and how to parse and process it. This has already been used by the RIR community to define a profile specific to Number RDAP, and APNIC is working through the NRO’s Engineering Coordination Group to adopt the profile and converge data models.
The information model is structured for everyone
Unlike whois, all responses in RDAP are easily machine parsable. The model is less open to interpretation than whois, and the profile registration mechanism helps define contexts. This results in a single client that can handle most responses from any RDAP service, in either domain name or Internet number contexts. User-friendly models are simple to design with JSON, and machine parsing can now use modern language data models like the dictionary or hash table to identify fields. The barrier to entry when writing code for RDAP is significantly lower than it is for whois.
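As a small illustration of that lower barrier, the sketch below pulls a few fields out of an RDAP `ip network` object using nothing but dictionary lookups. The field names (`objectClassName`, `startAddress`, `endAddress`, `entities`, `roles`) follow the RDAP JSON specification; the sample response itself is invented, with made-up handles and documentation address space.

```python
import json

# Invented sample response, trimmed to a few fields for illustration.
SAMPLE_RESPONSE = """
{
  "objectClassName": "ip network",
  "handle": "192.0.2.0 - 192.0.2.255",
  "startAddress": "192.0.2.0",
  "endAddress": "192.0.2.255",
  "ipVersion": "v4",
  "name": "EXAMPLE-NET",
  "country": "AU",
  "entities": [
    {"objectClassName": "entity", "handle": "EX1-AP", "roles": ["administrative"]},
    {"objectClassName": "entity", "handle": "EX2-AP", "roles": ["technical", "abuse"]}
  ]
}
"""


def summarise_network(response: dict) -> dict:
    """Collect the address range and the contact handles grouped by role."""
    contacts: dict[str, list[str]] = {}
    for entity in response.get("entities", []):
        for role in entity.get("roles", []):
            contacts.setdefault(role, []).append(entity.get("handle", ""))
    return {
        "class": response.get("objectClassName"),
        "range": (response.get("startAddress"), response.get("endAddress")),
        "name": response.get("name"),
        "country": response.get("country"),
        "contacts": contacts,
    }


summary = summarise_network(json.loads(SAMPLE_RESPONSE))
```

There is no line-by-line text parsing and no guessing at layout: the same few lookups should work against any server that follows the specification.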
HTTP/HTTPS is easy to distribute in the cloud
HTTPS is the primary model for cloud service distribution; it is what cloud platforms are designed to deliver. Caching works extremely well in CDN/cloud deployments, so service delivery is more efficient for everyone (less load on the server, faster responses to clients). APNIC has been able to demonstrate a reduction in Round Trip Time (RTT) to European clients from 300 ms to between 5 and 10 ms by virtue of efficient HTTP cache models from its cloud provider.
There is a directory function
This ‘where do I go?’ process, called the ‘bootstrap’, uses an IANA-managed directory to identify where to go in the worldwide RDAP community. Your query can be directed to the best RDAP server, in many cases directly from the bootstrap list, but if you go to a ‘parent’ RDAP node that has to send you on, the web redirect feature works cleanly. There is a consistent referral mechanism built into HTTP/HTTPS that permits servers to redirect queries, and clients to be written to handle redirection.
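The lookup step can be sketched in a few lines of Python. This assumes the bootstrap file layout used for IP address space, where each entry in `services` pairs a list of prefixes with a list of server base URLs, and it picks the most specific prefix that covers the address. The sample data and server names are invented; a real client would load the IANA-published file instead.

```python
import ipaddress

# Invented bootstrap data in the assumed layout: [[prefixes], [base URLs]].
SAMPLE_BOOTSTRAP = {
    "version": "1.0",
    "services": [
        [["192.0.2.0/24", "198.51.100.0/24"], ["https://rdap.one.example/"]],
        [["203.0.113.0/24"], ["https://rdap.two.example/"]],
    ],
}


def find_rdap_servers(bootstrap: dict, address: str) -> list[str]:
    """Return base URLs for the most specific bootstrap prefix covering address."""
    target = ipaddress.ip_address(address)
    best_length = -1
    best_urls: list[str] = []
    for prefixes, urls in bootstrap["services"]:
        for prefix in prefixes:
            network = ipaddress.ip_network(prefix)
            if network.version != target.version:
                continue  # never compare IPv4 addresses with IPv6 prefixes
            if target in network and network.prefixlen > best_length:
                best_length = network.prefixlen
                best_urls = list(urls)
    return best_urls
```

An empty result means the bootstrap data has no entry for that address. If the chosen server is not the authoritative one, the HTTP redirect it returns takes the client the rest of the way.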
This reads like a pretty compelling story. So, why don’t we all just stop doing whois and move over to RDAP?
The challenges with RDAP
RDAP can’t do IRR
The IRR function is vital for secure configuration of BGP filter lists, and helps guard the integrity of worldwide routing. To do this, it has a complex language (noted above) that defines policy rules used to build filter lists for an ISP’s BGP configurations. This language (RPSL) was not integrated into the JSON for RDAP because it is not a good fit; it’s a programming language with an if-then-else structure, not just a data representation method. It is possible that parts, or even all, of the IRR function could be lifted into RDAP, but at this time it is out of scope.
Therefore, purely for the routing registry function, whois should continue to be used in RPSL form.
It should be noted that many aspects of the IRR function are now under review for inclusion in the RPKI signed data model. This is an important improvement in the trust model because RPKI is a framework of provable assertions about Internet number resources; while IRR is a vital configuration tool, the distinction between the two systems stands — only RPKI consists of ‘provable’ statements by the delegates of the resources. Arguably, the ‘right’ way to fix the IRR problem is not to put IRR data in RPSL into RDAP (JSON) but to use RPKI to secure the BGP filter generation process.
It’s not yet fully deployed, except in the RIRs
RDAP is not yet sufficiently deployed in the domain name space to completely replace the public listing services whois provides. However, in the RIR and Internet numbers space, it is fully deployed by all RIRs, and by several NIRs in the APNIC region. If we are discussing the primary function of the RIR system, RDAP is completely fine (noting the need for whois for the IRR function).
In many cases, the data we serve in RDAP is taken from whois databases
Because of how APNIC deploys services, with a strong dependency on whois data as a format agreed with the NIRs, it is still at the stage of RDAP deployment where the data is manufactured from whois data. This is a transitional step and APNIC could (for instance) continue to use whois data as an exchange format ‘behind the scenes’ and move public service delivery to RDAP, if the community wanted that.
However, at this time, they are presented as complementary services and APNIC is happy to continue to provide both.
If RDAP did replace whois, what could we do?
Most of the benefits of RDAP are described above. Basically, APNIC would still be providing the public information service it exists to provide, but would be doing it better overall:
- It could do OAuth and improve privacy; by default, APNIC could arrange that no personal information is included. If you’re a logged-in user, what you see would depend on your authorization level.
- It could serve data much more effectively, with shorter RTTs, along with better overall caching and responsiveness to clients. APNIC would scale better to demand in the community.
- There is future potential to include parts of the IRR function, but it needs to be acknowledged that RPSL and IRR will be a persisting service for the BGP-speaking community.
Is this what deprecated whois looks like?
So RDAP can’t replace whois entirely. But in the future, noting the routing configuration issues, maybe it sort-of could? Is this what ‘deprecated whois’ looks like?
Perhaps, but here is a better way of thinking about it; if we work to make RDAP able to be the natural whois replacement for as many whois services as we can — and be privacy respecting — maybe the deprecated whois question solves itself.