Under the hood of DANE mismanagement in SMTP

By Hyeonmin Lee on 9 Sep 2022

Category: Tech matters

DNS-based Authentication of Named Entities (DANE) was introduced in 2012 to let one party authenticate its communication peer (for example, the server in a TLS connection) without relying on Certificate Authorities (CAs).

Because DANE can mitigate security challenges in SMTP such as STARTTLS downgrade attacks, it has been increasingly deployed by many email providers, including Microsoft 365. However, previous research has revealed that there are prevalent misconfigurations on DANE SMTP servers that hinder the proliferation of DANE.

In this post, I’ll share the findings of recent work my colleagues and I at Seoul National University, Virginia Tech, SIDN Labs, the University of Twente, and NLnet Labs presented at USENIX Security 2022. This research investigates the underlying reason for such mismanagement, to help improve the correct deployment of DANE.

Key points:
  • More than 30% of TLSA records could not be validated because of an incorrect DNSSEC chain, and 3.6% of TLSA records did not match the corresponding certificates.
  • Self-management of SMTP servers is more error-prone than outsourcing.
  • TLSA record validation mainly fails because of invalid DNSSEC chains or mismatches between TLSA records and their certificates; the mismatches arise because many SMTP servers fail to roll over their keys correctly.
  • More than 87% of SMTP servers rolled over incorrectly at least once.

How has DANE been deployed?

DANE is mostly deployed with SMTP. A domain publishes TLSA records for the hosts named in its MX records, which explicitly signals to email senders that those servers support TLS. This simple step lets DANE mitigate many of SMTP's security problems.

Read: Better mail security with DANE for SMTP

Due to this powerful but simple mechanism, its deployment rate has been continuously increasing. In February 2021 for instance, 42% of .se and 1% of .com, .net, and .org domains with MX records had deployed TLSA records.

Unfortunately, we also found widespread misconfigurations: more than 30% of TLSA records could not be validated because of an incorrect DNSSEC chain, and 3.6% of TLSA records did not match the corresponding certificates.

Why do domains fail to manage DANE correctly?

An email sender can authenticate the receiver's public key when it connects, by comparing the key with the TLSA record that the receiving domain has published. Therefore, to support DANE correctly, domain owners must put effort into managing their SMTP and DNS servers coherently.
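
As a rough illustration of that comparison, the sketch below checks certificate data against the selector and matching type fields of a TLSA record (RFC 6698). The function name and the sample bytes are placeholders invented for this example. A real validator would also extract the SubjectPublicKeyInfo from the certificate and verify the DNSSEC signatures on the record, both of which are left out here.

```python
import hashlib

def tlsa_matches(selector, matching_type, assoc_data, cert_der, spki_der):
    """Return True if a TLSA record's association data matches the certificate.

    selector:      0 = full certificate, 1 = SubjectPublicKeyInfo
    matching_type: 0 = exact bytes, 1 = SHA-256, 2 = SHA-512
    cert_der and spki_der are DER bytes supplied by the caller
    (SPKI extraction is out of scope for this sketch).
    """
    if selector == 0:
        selected = cert_der
    elif selector == 1:
        selected = spki_der
    else:
        return False
    if matching_type == 0:
        presented = selected
    elif matching_type == 1:
        presented = hashlib.sha256(selected).digest()
    elif matching_type == 2:
        presented = hashlib.sha512(selected).digest()
    else:
        return False
    return presented == assoc_data

# Placeholder bytes standing in for real DER data.
cert = b"example-certificate-der"
spki = b"example-public-key-der"
record_data = hashlib.sha256(spki).digest()  # what a "3 1 1" record would carry

print(tlsa_matches(1, 1, record_data, cert, spki))        # True
print(tlsa_matches(1, 1, record_data, cert, b"new-key"))  # False
```

The second call shows what happens when the server starts using a new key while DNS still carries the old digest: the comparison fails even though both the record and the certificate are individually well-formed.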

However, this may not always be straightforward when the domain name owner outsources either or both servers. For example, domain owners who run their own mail servers but use an external authoritative DNS service need to contact their DNS provider to upload the TLSA record.

Therefore, our research focused on who manages the DNS and SMTP servers for domains that support DANE.

Managing entities and management quality

Figure 1 — DANE management categories.

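DANE management cases can be classified into three categories (Figure 1): self-managed SMTP with self-managed DNS (SSDS), self-managed SMTP with outsourced DNS (SSDO), and outsourced SMTP (SO). Using this classification, we analyzed the quality of DANE management.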

Figure 2 — The percentage of incorrect TLSA records (top) and their served domains (bottom) for each category (July 2019 – Feb 2020).

Figure 2 shows that the incorrect DANE deployment rate of self-managed SMTP servers (SSDS and SSDO) is much higher than that of outsourced SMTP servers (SO). In the SO group, only 16 (2.7%) TLSA records were invalid until September 2020. A spike in incorrect TLSA records in the SO group from 18 September 2020 onwards was due to two email hosting providers (Syix and Antagonist).

The percentage of the domains that publish invalid TLSA records (Figure 2, bottom) is comparable to that of the invalid TLSA records.

These results highlight that self-management of SMTP servers is more error-prone.

Where does such mismanagement happen?

TLSA record validation fails mostly for two reasons: an invalid DNSSEC chain or a mismatch between TLSA records and their certificates.

Invalid DNSSEC chain

Figure 3 shows that DNSSEC is the dominant reason for DANE management failures in both self-managed categories. Over the entire measurement period, nearly 90% (SSDO) and 95% (SSDS) of the invalid TLSA records failed because of DNSSEC issues. With further investigation, we found that about 99% of these DNSSEC issues were due to missing DS records in the parent zone.
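
To make the missing-DS case concrete, here is a deliberately simplified model of how a validator walks the delegation chain. It is only a sketch: the function name, the tuple layout, and the zone names are invented for illustration, and real validation also checks signatures and authenticated denial of existence.

```python
def chain_status(zones):
    """Classify a delegation chain, ordered from the TLD down to the zone
    that holds the TLSA record.

    Each entry is (zone_name, signed, ds_in_parent).
    """
    for name, signed, ds_in_parent in zones:
        if not ds_in_parent:
            # No DS in the parent: the delegation is unsigned ("insecure"),
            # even if the child zone publishes DNSKEY and RRSIG records.
            return ("insecure", name)
        if not signed:
            # A DS exists but the zone is not signed: validation fails.
            return ("bogus", name)
    return ("secure", None)

# The zone is signed internally, but its parent carries no DS record.
print(chain_status([("example", True, True),
                    ("customer.example", True, False)]))
# ('insecure', 'customer.example')
```

Because DANE only trusts TLSA records obtained through a secure chain, a record in such a zone cannot be used for authentication, however carefully the zone itself is signed.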

Figure 3 — The percentage of TLSA validation failures for the self-managed SMTP categories. Since there are only 12 invalid and unique TLSA records for the SO case, it is omitted.

A mismatch between TLSA records and their certificates

Figure 3 also shows that, on average, 16% (SSDS) and 23% (SSDO) of the invalid TLSA records failed because of mismatches. Our first hypothesis was that the parameters of these TLSA records had been set incorrectly by mistake. We tested this hypothesis by changing each of the parameters to see if any combination of parameters made the validation successful. It didn't.

Next, we tested whether currently mismatched TLSA records could be correctly matched with any certificate previously used by the SMTP server. As shown in Figure 4, the share of mismatched TLSA records that matched an earlier certificate increased to 70% (SSDS) and 73% (SSDO) over time. In other words, the majority of mismatches are due to TLSA records that have not been updated over time. This implies that many SMTP servers failed to roll over their keys correctly.
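
The two checks described above can be sketched roughly as follows. This is not the code used in the study: the function names, the tuple layouts, and the placeholder bytes are assumptions made for the example, and only the selector and matching type are swept.

```python
import hashlib
from itertools import product

# Matching types from RFC 6698: 0 = exact, 1 = SHA-256, 2 = SHA-512.
HASHES = {
    0: lambda b: b,
    1: lambda b: hashlib.sha256(b).digest(),
    2: lambda b: hashlib.sha512(b).digest(),
}

def matching_params(assoc_data, cert_der, spki_der):
    """Return every (selector, matching_type) pair under which the data matches."""
    selected = {0: cert_der, 1: spki_der}
    return [(s, m) for s, m in product(selected, HASHES)
            if HASHES[m](selected[s]) == assoc_data]

def diagnose(record, current, history):
    """record = (selector, matching_type, assoc_data);
    current and each history entry = (cert_der, spki_der)."""
    selector, mtype, data = record
    hits = matching_params(data, *current)
    if (selector, mtype) in hits:
        return "valid"
    if hits:
        return "wrong parameters"
    if any(matching_params(data, *old) for old in history):
        return "stale: matches an earlier certificate"
    return "unknown"

old = (b"old-cert-der", b"old-key-der")   # placeholder bytes, not real DER
new = (b"new-cert-der", b"new-key-der")
record = (1, 1, hashlib.sha256(old[1]).digest())

print(diagnose(record, new, [old]))  # stale: matches an earlier certificate
```

A "wrong parameters" result would correspond to the first hypothesis, and a "stale" result to the missed rollover that turned out to explain most mismatches.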

How do SMTP servers perform rollover?

To account for caching in DNS resolvers, SMTP servers have to publish their new TLSA records in advance, at least two time-to-live (TTL) periods before they switch to the new certificate (RFC 7671).
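
The timing rule is simple arithmetic, sketched below. The helper names are invented for this example, and the TTL is assumed to be that of the TLSA record set.

```python
from datetime import datetime, timedelta

def earliest_safe_switch(new_tlsa_published, ttl_seconds):
    """Earliest time to start serving the new certificate: two TTLs after
    the new TLSA record first appeared in DNS."""
    return new_tlsa_published + timedelta(seconds=2 * ttl_seconds)

def is_late_introduction(new_tlsa_published, cert_switched, ttl_seconds):
    """True if the certificate changed before resolver caches could be
    expected to hold the new TLSA record."""
    return cert_switched < earliest_safe_switch(new_tlsa_published, ttl_seconds)

published = datetime(2020, 10, 1, 12, 0)
ttl = 3600  # one hour

print(earliest_safe_switch(published, ttl))  # 2020-10-01 14:00:00
print(is_late_introduction(published, datetime(2020, 10, 1, 13, 0), ttl))  # True
```

In the second call the certificate is switched only one TTL after publication, so a sender holding a cached copy of the old TLSA records could still fail to validate the new certificate.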

Category | Domains (total) | Domains (wrong rollover) | SMTP servers (total) | SMTP servers (wrong rollover) | Incorrect rollover: early retirement of old TLSA | Incorrect rollover: late introduction of new TLSA | Incorrect rollover: no introduction of new TLSA
SO | 54,052 | 34,056 (63.0%) | 277 | 255 (92.1%) | 1 (0.4%) | 216 (84.7%) | 58 (22.8%)
SSDO | 278 | 242 (87.1%) | 275 | 240 (87.3%) | 9 (3.9%) | 173 (72.1%) | 87 (36.1%)
SSDS | 585 | 546 (93.3%) | 594 | 544 (91.6%) | 55 (10.1%) | 450 (82.7%) | 179 (32.9%)

Table 1 — The percentages of SMTP servers that have ever rolled over incorrectly with the reasons for each category and the impacted numbers of domains. For incorrect rollovers, the percentages of individual cases are also shown. Note that an SMTP server may have incorrectly rolled over multiple times for different reasons during our measurement period, which makes the sum of incorrect rollover reasons >100%.

Unfortunately, we found that more than 87% of SMTP servers rolled over incorrectly at least once during our measurement period (Table 1).

99% of DANE SMTP servers use DANE-TA or DANE-EE for their TLSA records and don’t need to roll over their keys. However, we have observed that more than 63% of SMTP servers conduct rollovers.

Why do they roll over?

We found that 86.7% of certificates in our dataset were issued by two popular CAs — Let’s Encrypt and Sectigo. DANE is designed to break the reliance on CAs but many SMTP servers still rely on them.

Moreover, we found unexpected outcomes from this reliance. Certificates issued by those two CAs often have very short lifetimes. Let's Encrypt certificates, for example, are reissued automatically every three months, which forces SMTP servers to roll over.

Due to this problem, we found that the percentage of misconfigured SMTP servers soared when Let’s Encrypt rolled over its intermediate certificates in 2020.

Figure 4 — The percentage of SMTP servers that use Let’s Encrypt certificates and have rolled over incorrectly when they used TLSA records with DANE-TA usage.

8.7% of SMTP servers use DANE-TA, allowing domains to publish TA or CA certificates as TLSA records.

We noticed a dramatic increase in the ratio of incorrect rollovers from October 2020, as shown in Figure 4. The cause became clear when we monitored the percentage of SMTP servers using certificates issued by Let's Encrypt (Figure 4, bottom): in October 2020, Let's Encrypt introduced a new intermediate certificate (R3) and withdrew the former intermediate certificate (X3). We found that many SMTP servers using Let's Encrypt intermediate certificates as TLSA records failed to update their TLSA records properly right after the introduction of R3.
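
The effect of the X3 to R3 switch on DANE-TA records can be illustrated with a small sketch. It only compares digests, assuming TLSA records with selector 0 and matching type 1. Real DANE-TA validation also verifies the chain up to the matched certificate, and the byte strings below are placeholders rather than the real intermediate certificates.

```python
import hashlib

def dane_ta_ok(chain_ders, tlsa_digests):
    """True if any certificate in the presented chain has a SHA-256 digest
    published in the TLSA record set (usage 2, selector 0, matching type 1)."""
    return any(hashlib.sha256(der).digest() in tlsa_digests for der in chain_ders)

x3 = b"placeholder-for-X3-der"
r3 = b"placeholder-for-R3-der"
leaf_old, leaf_new = b"leaf-issued-by-X3", b"leaf-issued-by-R3"

only_x3 = {hashlib.sha256(x3).digest()}
both = only_x3 | {hashlib.sha256(r3).digest()}

print(dane_ta_ok([leaf_old, x3], only_x3))  # True
print(dane_ta_ok([leaf_new, r3], only_x3))  # False
print(dane_ta_ok([leaf_new, r3], both))     # True
```

A domain that pins only the old intermediate breaks as soon as its renewed certificate chains to the new one. Publishing records for both intermediates ahead of the change keeps validation working on either side of the switch.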

This shows another side effect caused by reliance on CAs.

Read our paper to learn more.

Hyeonmin Lee is a postdoctoral researcher at the Department of Computer Science and Engineering at Seoul National University, South Korea.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.

One Comment

  1. Viktor Dukhovni

    The study’s headline numbers are misleading and hyped. The 30% of TLSA records that “fail to valid” are simply ordinary unsigned delegations, that should simply have been excluded from consideration. Lots of DNS operators routinely sign all domains internally, whether or not they’re able to “consummate” the deal with parent-side DS records.

    Other numbers are similarly misleading. See: http://dnssec-stats.ant.isi.edu/~viktor/usenix-security-dane-response.html
