The Domain Name System (DNS) provides critical naming services to Internet users, making it a prime target for those attempting to monitor or censor users.
Several defence mechanisms that encrypt DNS communications have been proposed to protect users from such attacks. Two of them, DNS over TLS (DoT) and DNS over HTTPS (DoH), have been standardized by the IETF and implemented in multiple tools available for wide deployment.
The use of these security protocols raises new questions about DNS privacy, including ‘how much privacy can encrypting DNS truly afford?’
The specifications for DoT (RFC 7858) and DoH (RFC 8484) note the possibility that traffic analysis techniques may be exploited to undermine the privacy provided by these protocols. While such techniques have been successfully demonstrated in multiple scenarios, it is not clear whether these classic approaches will work in attacks against encrypted DNS.
This is the question that my colleagues from the University of Delaware, the University of California and Virginia Tech and I sought to answer in a paper we recently presented at ACM CoNEXT 2019.
How we emulated attacks on encrypted traffic
Our threat model is similar to that used in several classic website fingerprinting studies.
We considered an adversary who aims to determine if a victim has visited a website, or one of a set of websites, of interest to the attacker. Figures 1a and 1b illustrate the steps of the attack.
The adversary begins by visiting the websites and capturing the DoT traffic generated by loading target webpages. From this traffic, the adversary extracts a set of statistical features and uses these to train classifiers. Finally, the adversary uses these classifiers to tag DoT traffic collected between the victim and its recursive resolver.
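The feature-extraction step can be sketched roughly as follows. This is a minimal illustration, not the feature set from the paper: the trace format (a list of timestamp and signed-size pairs, with positive sizes for client-to-resolver messages), the function name `extract_features`, and the handful of features computed are all assumptions made for the example.

```python
from statistics import mean, pstdev


def extract_features(trace):
    """Summarize one DoT trace as a small dictionary of statistics.

    `trace` is a list of (timestamp_seconds, signed_size) tuples.
    Assumed convention: positive size = client to resolver,
    negative size = resolver to client.
    """
    out_sizes = [s for _, s in trace if s > 0]
    in_sizes = [-s for _, s in trace if s < 0]
    times = [t for t, _ in trace]
    # Inter-arrival gaps between consecutive messages.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "n_out": len(out_sizes),
        "n_in": len(in_sizes),
        "bytes_out": sum(out_sizes),
        "bytes_in": sum(in_sizes),
        "mean_in_size": mean(in_sizes) if in_sizes else 0.0,
        "std_in_size": pstdev(in_sizes) if in_sizes else 0.0,
        "duration": times[-1] - times[0] if times else 0.0,
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


# Hypothetical trace: two queries and two responses.
trace = [(0.00, 60), (0.03, -120), (0.05, 72), (0.09, -200)]
features = extract_features(trace)
```

A vector of such values per trace, labelled with the page that produced it, is the kind of input a classifier would be trained on.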
To emulate this scenario, we generated a set of DoT traces associated with 198 websites (one page per site), to train and evaluate classifiers. Of these, 98 were ‘target sites’ in the sensitive categories of dating, health insurance, and gambling, and the remaining 100 were popular sites, taken from the Alexa top sites list.
The specification for DoT recommends padding messages to defend against traffic analysis attacks, and many implementations of DoT support the use of padding. To evaluate the effectiveness of this defence, we collected one set of traces in which queries and responses were unpadded, and one set with padding. From these traces, we extracted a set of 126 statistical features related to the size, timing, and direction of DoT messages.
Padding messages is important but not foolproof
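To see why padding blunts size-based features, consider a rough sketch of block-length padding. The block sizes used here (128 octets for queries, 468 for responses) follow the recommendation in RFC 8467; whether the padded traces in our experiments used exactly these values is an assumption of this example, and the message sizes are invented. The helper `padded_length` is hypothetical and ignores the details of the EDNS(0) Padding option itself.

```python
def padded_length(length, block):
    """Round `length` up to the next multiple of `block`."""
    return -(-length // block) * block  # ceiling division, then scale


# Invented message sizes in octets, for illustration only.
query_sizes = [29, 41, 57]
response_sizes = [45, 130, 310, 480]

padded_queries = [padded_length(n, 128) for n in query_sizes]
padded_responses = [padded_length(n, 468) for n in response_sizes]

print(padded_queries)    # [128, 128, 128]
print(padded_responses)  # [468, 468, 468, 936]
```

After padding, most messages collapse onto a few sizes, so individual message lengths carry far less information. The number, direction, and timing of messages are unchanged, which is one reason padding alone may not remove every distinguishing signal.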
In our first experiment, we evaluated the attacker’s ability to determine if the victim has visited any one of a group of websites in a particular category.
Using the dataset without padding, the Random Forest classifier constructed for this task is able to distinguish between DoT traffic associated with websites in a target category and that generated by visiting popular pages, with a false positive rate (FPR) of 0.23% to 0.54% and a false negative rate (FNR) of 3.9% to 6.4%. Once padding is applied, the FPR increases to between 3.4% and 4.3% and the FNR increases to between 21.3% and 30.7%.
These results highlight the importance of using padding when developing or using systems implementing DoT.
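For readers less familiar with these metrics, the sketch below shows how FPR and FNR are derived from a classifier's confusion counts. The function name `error_rates` and the counts are made up for illustration; they are not taken from our experiments.

```python
def error_rates(tp, fp, tn, fn):
    """Return (FPR, FNR) from confusion-matrix counts.

    FPR = FP / (FP + TN): share of non-target traces wrongly flagged.
    FNR = FN / (FN + TP): share of target traces the classifier missed.
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical counts: 1,000 target traces and 1,000 non-target traces.
fpr, fnr = error_rates(tp=950, fp=4, tn=996, fn=50)
print(f"FPR = {fpr:.2%}, FNR = {fnr:.2%}")  # FPR = 0.40%, FNR = 5.00%
```

From the attacker's point of view, a low FPR means few innocent visits are mislabelled as visits to a target site, while a low FNR means few visits to target sites go undetected.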
Similar results hold for our second experiment in which we tested the adversary’s ability to determine if a victim has visited a specific website of interest.
For this experiment, we used AdaBoost classifiers. When padding is not applied, the median FNR and FPR across all pages are 2.5% and 0.0%, respectively. Once padding is applied, the attack becomes less accurate, with a median FNR and FPR of 26% and 0.05%, respectively.
These results again highlight the importance of padding in preventing loss of privacy via traffic analysis, but suggest that more defences are still needed, given that even with padding, some pages could be identified with an FNR or FPR of 0.
More details on these experiments and others we conducted are described in the paper we presented at ACM CoNEXT 2019.
Contributors: Chase Cotton, Zhou Li, and Haining Wang.
Rebekah Houser is a Ph.D. candidate in the Electrical and Computer Engineering Department at the University of Delaware.