In April this year, the volume of Domain Name System (DNS) traffic hit a record high of five trillion DNS transactions.
The bulk of these transactions were in the clear. This situation enables entities, such as Internet Service Providers (ISPs), Autonomous Systems (ASes), or state-level agencies, to perform user tracking, mass surveillance, and censorship.
The risk of pervasive surveillance and its consequences has prompted Internet governance organizations, industry actors, and standardization bodies to foster privacy protections. In particular, these bodies have standardized two protocols: DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH).
Both DoT and DoH encrypt the communication between the client and the recursive resolver to prevent the inspection of domain names by network eavesdroppers. However, even when communications are encrypted, analysis of traffic features such as volume and timing can reveal information about their content. As a consequence, standardization bodies are also considering protection mechanisms to limit the inference of private information from traffic metadata of the encrypted DNS communication.
While existing evaluations of DoH implementations have mainly focused on performance, we (researchers at the École polytechnique fédérale de Lausanne, KU Leuven, and the IMDEA Networks Institute) investigated whether encrypting DNS traffic can protect users from traffic-analysis-based monitoring and censorship, and whether existing protection mechanisms against traffic analysis are effective.
Our research was scenario-based: we evaluated an adversary placed between the client and the resolver whose aim was to identify the web pages visited by the user by examining the user’s DoH traffic. The reasoning behind the attack is that the set of DNS resolutions for the visited page’s first-party domain, together with the subsequent resolutions for the resources embedded in the page (which we call a DoH trace), could act as a fingerprint for identifying the page. Since neither DoT nor DoH hides the sizes of the DNS messages involved, the adversary can extract these fingerprints from the encrypted traffic.
Key points:
- It is possible for adversaries to identify around 70% of visits by a user to black-listed web pages by analysing DoH traffic.
- Censorship is feasible even in the presence of DNS encryption.
- CDNs, including Cloudflare, have enabled padding on their DoH resolvers by default as a result of our findings, and we recommend that all resolvers do the same. However, current EDNS0-based padding measures are not sufficient.
- DNS-over-Tor has the potential to be an effective countermeasure against DoH traffic analysis.
DNS-based fingerprinting
DNS traffic is chattier than web traffic: unlike typical web traffic, DNS requests and responses are short and do not vary much in size. For this reason, features traditionally used to classify web traffic are not effective for DNS.
In our study, we proposed a novel set of features specifically designed for encrypted DNS traffic. The features are based on the n-gram representation of DoH traffic traces, and our analysis showed that such features are effective in fingerprinting web pages from DoH traffic.
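To make the idea of n-gram features concrete, here is a minimal sketch of how a DoH trace could be turned into n-gram counts. It assumes a trace is a sequence of TLS record sizes, signed by direction (positive for client to resolver, negative for resolver to client); that encoding, the function name `ngram_features`, and the sample sizes are illustrative assumptions, not the exact feature set from the paper.

```python
from collections import Counter
from typing import Iterable, Sequence


def ngram_features(trace: Sequence[int], n_values: Iterable[int] = (1, 2)) -> Counter:
    """Count the n-grams found in a trace of signed TLS record sizes.

    Assumption: positive sizes are client-to-resolver records and negative
    sizes are resolver-to-client records.
    """
    counts: Counter = Counter()
    for n in n_values:
        # Slide a window of length n over the trace and count each tuple.
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    return counts


# Hypothetical trace: record sizes in bytes, signed by direction.
trace = [-64, 88, 33, -33, 88, -60]
features = ngram_features(trace)
```

Each distinct tuple becomes one dimension of the feature vector handed to a classifier, so two pages that trigger different sequences of resolutions end up with different counts even though every record is encrypted.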
Monitoring
We evaluated the effectiveness of such features in two different scenarios.
In the first scenario, we considered an adversary who knows the entire set of possible pages the user might visit — this is unrealistic in practice, albeit useful to bound the performance of the attack — and whose goal is to determine which page in the set was visited. Our results showed that ~90% of the time, the adversary could correctly identify a page out of the 1,500 pages in the set.
In the second scenario, the adversary does not know the pages the user might visit. Instead, the adversary is interested in knowing whether or not the visited page is on a black-list of monitored pages. Our evaluation showed that for 5,000 pages, of which 50 were monitored, the adversary was able to identify a monitored page ~70% of the time.
Such a high success rate for the adversary indicates that it is possible to identify web pages from DoH traffic. In addition, this attack is more scalable than existing attacks on HTTPS traffic: since it is applied to DoH traffic only, it requires 124 times less data than attacks that also use HTTPS traffic (0.6 GB for DoH as compared to 73 GB for HTTPS, on a dataset of 700 webpages, with 60 samples each).
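For the second scenario, the adversary's output can be viewed as a yes/no decision per visit: is this page on the black-list or not? The sketch below shows one common way to score such a detector, using precision and recall. The function name and the sample labels are hypothetical, and this post does not say which metric the ~70% figure corresponds to, so treat this only as an illustration of how such an evaluation might be set up.

```python
from typing import Sequence, Tuple


def precision_recall(is_monitored: Sequence[bool],
                     flagged: Sequence[bool]) -> Tuple[float, float]:
    """Score a binary 'is this visit to a monitored page?' detector."""
    pairs = list(zip(is_monitored, flagged))
    tp = sum(1 for truth, guess in pairs if truth and guess)      # correctly flagged
    fp = sum(1 for truth, guess in pairs if not truth and guess)  # wrongly flagged
    fn = sum(1 for truth, guess in pairs if truth and not guess)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and detector output for ten visits.
is_monitored = [True, True, True, False, False, False, False, False, False, False]
flagged = [True, True, False, True, False, False, False, False, False, False]
precision, recall = precision_recall(is_monitored, flagged)
```

Because monitored pages are a small minority of all visits in this setting (50 of 5,000 pages), even a low false-positive rate on unmonitored pages can pull precision down, which is why both numbers matter.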
Censorship
We then considered a censor that aims not only to identify the pages being accessed but to do it as fast as possible, in order to prevent users from downloading the content.
The best-case scenario for a censor is that DoH traces of the blacklisted pages have a unique starting pattern. For this reason, we analysed the uniqueness of DoH traces when only a few packets at the start of the communication have been observed, for a set of 1,500 pages.
Our results showed that the 4th TLS record in a trace usually corresponds to the first DoH query. In addition, by the 15th TLS record (approximately 15% of the average trace length), it is possible to distinguish, with high probability, the page that generated the trace. Thus, a censor could follow one of two strategies:
(1) Block on the first query, by analysing the 4th record, or
(2) Make a high-confidence guess, by waiting at least until the 15th record.
The former strategy blocks the user from obtaining even the first-party domain but could result in higher collateral damage, since other websites with the same TLS record size might be blocked. The latter would cause fewer false positives and thus lower collateral damage, but it would allow partial access to the content. Thus, in the presence of DNS encryption, the censor has to find a compromise between precision and collateral damage. We show that for blacklists of unpopular pages, precise censorship with low collateral damage is possible.
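The trade-off between blocking early and blocking precisely can be illustrated by measuring how many pages have a trace prefix that no other page shares. The following is a minimal sketch under the assumption that each page is represented by a single trace of signed TLS record sizes; the function name, page names, and sizes are made up for illustration and are not taken from our dataset.

```python
from collections import Counter
from typing import Dict, Sequence


def unique_prefix_fraction(traces: Dict[str, Sequence[int]], k: int) -> float:
    """Fraction of pages whose first k record sizes match no other page."""
    prefixes = {page: tuple(trace[:k]) for page, trace in traces.items()}
    counts = Counter(prefixes.values())
    unique = sum(1 for prefix in prefixes.values() if counts[prefix] == 1)
    return unique / len(prefixes)


# Hypothetical traces: the first three records are identical across pages,
# and the pages only start to differ from the 4th record onwards.
traces = {
    "a.example": [300, -250, 120, 88, -140, 95],
    "b.example": [300, -250, 120, 88, -160, 70],
    "c.example": [300, -250, 120, 91, -140, 95],
}

for k in (3, 4, 5):
    print(k, unique_prefix_fraction(traces, k))
```

In this toy data, blocking on the 4th record would single out only one of the three pages, while waiting for the 5th record separates all of them. A censor acting at a prefix length where several pages still collide would block the colliding pages together, which is the collateral damage described above.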
We also investigated the robustness of our attacks — how the attack is impacted when the victim’s setting is different from that of the adversary. We evaluated the impact of the following variables — location, time, and infrastructure (resolver, platform, DoH client) — by training the classifier on one value of a variable, and testing it on another.
Attack effectiveness decreased for all the variables we tested (Figure 1). However, we also found that the adversary could mitigate the effect of these variables by inferring them from the victim’s encrypted traffic.
Standardized padding does not work
We examined a few countermeasures against our attack.
RFC 7830 specifies EDNS(0) padding, a mechanism to pad the size of the DNS messages. The recommended padding policy is for clients to pad DNS requests to the nearest multiple of 128 bytes, and for resolvers to pad DNS responses to the nearest multiple of 468 bytes (RFC 8467). By using our classifier on padded data, we observed that the recommended policy reduces the success rate (measured by F1-score) from 90% to 43%, but it does not completely deter the attack.
Interestingly, client-side countermeasures, such as altering the pattern of requests by using an ad-blocker, help as much as padding. We also experimented with Cloudflare’s DNS-over-Tor and discovered that Tor is an effective countermeasure against DoH traffic analysis.
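The block-based padding policy described above can be sketched in a few lines. The example assumes that padding can only add bytes, so each message length is rounded up to the next multiple of the block size, and it ignores the overhead of the EDNS(0) option itself. The function name and the sample message sizes are hypothetical.

```python
CLIENT_BLOCK = 128    # block size recommended for queries
RESOLVER_BLOCK = 468  # block size recommended for responses


def padded_length(length: int, block: int) -> int:
    """Round a message length up to the next multiple of the block size."""
    if length <= 0 or block <= 0:
        raise ValueError("length and block must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-length // block) * block


# Hypothetical unpadded message sizes in bytes.
query_sizes = [41, 57, 120, 130]
response_sizes = [90, 400, 470]

padded_queries = [padded_length(s, CLIENT_BLOCK) for s in query_sizes]
padded_responses = [padded_length(s, RESOLVER_BLOCK) for s in response_sizes]

print(padded_queries)    # [128, 128, 128, 256]
print(padded_responses)  # [468, 468, 936]
```

Padding collapses many distinct sizes into a few buckets, but the number of messages, their order, and their timing are left unchanged, which is consistent with the attack still succeeding part of the time on padded traffic.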
Looking ahead
Upon responsible disclosure of our findings, Cloudflare implemented padding of responses on their DoH resolver. We recommend that all resolvers enable the padding option by default.
Also, a promising countermeasure could be to mimic the transformations that Tor applies to traffic while stripping out the anonymization logic, which is orthogonal to our problem. This could lead to an effective yet lightweight countermeasure against DoH traffic analysis.
This work was presented at an IETF Privacy Enhancements and Assessments Research Group Meeting, and we are currently investigating different countermeasures in order to contribute to the next RFC for traffic analysis protection of encrypted DNS. More details of the study can be found in our paper pre-print.
Contributors: Sandra Siby, Marc Juarez, Claudia Diaz, Narseo Vallina-Rodriguez, Carmela Troncoso
Sandra Siby is a PhD student in Computer Science at the École polytechnique fédérale de Lausanne. Marc Juarez is a postdoctoral researcher at the University of Southern California.
Nice work, congratulations.
Another possible counter-measure would be to have a local caching resolver on the machine or the LAN, forwarding to a DoH resolver when the data is not in the cache. The effects of the cache would probably break the most obvious patterns of DNS requests.