RDAP: Search using POSIX Regular Expressions

The Registration Data Access Protocol (RDAP) is the HTTP-based successor to the port 43 whois protocol. It has several advantages over the traditional whois service:

It has well-defined internationalization semantics, so servers can be written to transparently support multiple languages, and client preferences can be handled correctly.
It supports ‘proper’ redirects, so a server that doesn’t know an answer, but does know where an answer can be found, can communicate that to a client.
Wide support for HTTP allows operators to make use of existing services, like caching proxies, content distribution networks, and so on.

The protocol’s initial specification can be found in RFCs 7480, 7481, 7482, 7483 and 7484.

Part of that specification was a basic search facility: clients are able to search for records that match a given string, and that string may include one or more wildcard characters. For example, using ARIN’s NicInfo tool, it’s possible to query for each contact whose name begins with a certain string:

$ nicinfo -t esbyname -b http://rdap.apnic.net/ "Nguyen*" \
    | grep "Common Name" | sed 's/\s*//'

Common Name:  Nguyen Che Phuong
Common Name:  Nguyen Gia Hoa
Common Name:  Nguyen Hai Giang
...

However, this type of wildcard is the only feature provided by the current search.

The need for a more full-featured search facility was flagged when the initial specifications were written, and now a group from Verisign have drafted an improvement: RDAP Search Using POSIX Regular Expressions. This new specification defines a mechanism for using POSIX Extended Regular Expressions (EREs) to query registry data. For example:

$ curl -s -G "http://testrdap.apnic.net/entities?searchtype=regex" \
    --data-urlencode "fn=^N[a-g]u[^z].*(n|m)" \
    | jq '.entitySearchResults[].vcardArray[1]
          | .[] | select(.[0] == "fn") | .[3]'

"Nguyen Che Phuong"
"Nguyen Gia Hoa"
"Nguyen Hai Giang"
 ...

POSIX EREs support the following features, among others:

Standard quantifiers (`*`, `+`, `{n}`);
Character classes, for example `[a-c]`, `[^1-3]`;
Alternatives, for example `(John|Jack)`; and
Anchors (`^` and `$`).
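
As a rough local illustration of these features, the sketch below runs the same pattern used in the query above through `grep -E`, which implements POSIX EREs. The list of names and the `match_names` helper are made up for this example; nothing here talks to an RDAP server, and a real server may apply the expression differently (for example, with respect to case sensitivity).

```shell
#!/bin/sh
# Illustrative names only; these are not taken from any registry.
names='Nguyen Che Phuong
Nguyen Gia Hoa
Nguyen Hai Giang
Ngo Van An
Nauzad Khan
Tran Nguyen'

# match_names PATTERN: print the sample names that match a POSIX ERE.
# (Hypothetical helper for this sketch, not part of any RDAP tool.)
match_names() {
    printf '%s\n' "$names" | grep -E -- "$1"
}

# ^       anchor: the match must start at the beginning of the name
# [a-g]   character class: one character from a to g
# [^z]    negated class: any one character except z
# .*      quantifier: zero or more of any character
# (n|m)   alternative: either n or m
match_names '^N[a-g]u[^z].*(n|m)'
```

With this sample list, only the three "Nguyen ..." entries are printed. "Ngo Van An" fails because its third character is not `u`, "Nauzad Khan" is rejected by `[^z]`, and "Tran Nguyen" is rejected by the `^` anchor.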

This particular type of regular expression was selected for this specification because it is widely supported: many different database systems implement it, as do indexing libraries like Lucene.

APNIC staff are participating in this effort by working on proof-of-concept implementations of the specification and investigating potential deployment scenarios.

If you would like to help with testing or similar, please let us know.


