Reflections on certificates, Part 1

By Enno Rey on 14 Apr 2023

I’ve written a couple of posts on (X.509v3) certificates in the past, starting with this one in 2001. In the two decades since then, several developments have taken place (to name a few: OCSP, ACME, Let’s Encrypt certificates, and the general role of automation).

On the other hand, the fundamental mechanisms of certificates have stayed the same. In this post, I argue that understanding the inherent (but often hidden) complexity, the trust relationships and the tradeoffs of certificates used in a given environment can lead to better decision making and to more efficient operations.

The basic scheme (for the purposes of this post) usually involves a set of parties:

1. A server (in the sense of an entity receiving a connection request, including network devices).
2. A client (an entity that initiates a connection).
3. A user who uses the client. We can safely assume this is a human, so motivations and desires come into play (which can influence trust decisions).
4. An operator who is in charge of (1), (2), or both. Here again, we assume humans, so they have objectives (in particular ‘make the users happy by providing a service that is available, and that they can use with their present skill set’).
5. Certificate Authorities (CAs) who issue certificates to be used on (1), (2), or both. Evidently, this involves (potentially complicated) relationships with the operators.
6. Developers.
7. Infosec people.

Let’s start with some high-level concepts (yep, regular readers remember my love for those).

Complexity

Working with certificates frequently induces a high level of complexity (definition of the term here), for several reasons:

Multiple standards bodies have contributed to specifying what we have today, one of them (ITU) being notorious for complex outcomes. The main IETF document, that is RFC 5280, has 151 pages.
Using certificates often involves other, not necessarily simple, things like ASN.1 or DER.
Most importantly, all types of extensions can be employed for nearly unlimited creative uses 😉. See this part of the table of contents of RFC 5280.

Figure 1: Extract from RFC 5280.
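To get a feeling for what DER looks like under the hood, here is a minimal sketch of a tag-length-value walker. It is an illustration under simplifying assumptions (single-byte tags only, no validation of the input) and not a real ASN.1 parser; the function names `read_tlv` and `walk` and the hand-built sample bytes are made up for this example.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Read one DER tag-length-value element starting at offset.

    Returns (tag, value_bytes, next_offset). Only single-byte tags are handled.
    """
    tag = data[offset]
    first_len = data[offset + 1]
    offset += 2
    if first_len < 0x80:
        # Short form: the length fits into this single byte.
        length = first_len
    else:
        # Long form: the low 7 bits say how many length bytes follow.
        num_len_bytes = first_len & 0x7F
        length = int.from_bytes(data[offset:offset + num_len_bytes], "big")
        offset += num_len_bytes
    value = data[offset:offset + length]
    return tag, value, offset + length


def walk(data: bytes, depth: int = 0, out=None):
    """Collect (depth, tag, length) for every element, descending into constructed types."""
    if out is None:
        out = []
    offset = 0
    while offset < len(data):
        tag, value, offset = read_tlv(data, offset)
        out.append((depth, tag, len(value)))
        if tag & 0x20:  # constructed bit set: the value holds nested elements
            walk(value, depth + 1, out)
    return out


# Hand-built sample, not taken from a real certificate:
# SEQUENCE { INTEGER 2, SEQUENCE { OID 2.5.29.17, BOOLEAN TRUE } }
sample = bytes.fromhex("300d02010230080603551d110101ff")

for depth, tag, length in walk(sample):
    print("  " * depth + f"tag=0x{tag:02x} length={length}")
```

Even this toy structure nests two levels deep; a real certificate with a handful of extensions goes considerably deeper.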

Unfortunately, one of the objectives of the ‘traditional’ certificate use case (that is, securely buying stuff on the Internet) was to hide this complexity from the users. At the same time, certificates are capabilities (see below): they get deployed once and, seemingly, don’t have to be ‘operationally taken care of’, at least for a while. This causes them and their complexity to be underestimated (they stay ‘invisible until something breaks’) in fast-moving environments.

Realizing that certificates are complex beasts, and especially so when employed for certain use cases, might be the first step to getting better at handling them 😉.

Trust

By their very core value proposition, trust (some definition of the term here) plays a huge role when certificates come into play. They’re meant precisely to contribute to trust between communication partners (by assuring the identity of one or more of them). In the classic use case, this works as follows:

I can trust that this website I’m visiting belongs to the organization holding the domain name I typed into my browser because I see that little lock in the URL.
Behind the scenes, this trust is established because another party (the CA) assured the binding of some cryptographic material to some identity information, based on some more or less rigorous checks. I might not know this other party but my browser does, and the CA’s mere existence in my browser’s (or OS’s) certificate store expresses this trust.

Alas, matters involving trust can be way more complex in today’s world. Imagine you operate an application that runs on several systems and at some point connects to a system operated by a third party (called $ORG in the following), for example, for querying a database. As smart and security-conscious people are involved, certificates are used everywhere, including on that one external system. When asked what kind of trust the certificate over there (in the following: $CERT) provides, one might be tempted to respond ‘well, that one enables us to trust we’re connecting to the right system (and infosec told us to ubiquitously use certificates anyway)’.

However, in reality:

You now inherently trust that $ORG too has done a reasonable job when getting an appropriate certificate for the purpose.
You trust the respective CA to have done a proper job vetting $ORG (and to have issued an appropriate certificate for the purpose).
You now inherently trust that $ORG knows or monitors the expiry date of $CERT (and, evidently/subsequently, that related alerting capabilities are in place).
You inherently trust that some sufficiently qualified personnel will be available at the latest on the day when $CERT expires.
Overall you inherently trust $ORG’s operational maturity to properly handle certificates 😉.

Looking closer you may also find out that $CERT is a wildcard certificate covering the full domain of $ORG, so the initial assumption of trust (‘make sure we connect to the right system’) might be… debatable.
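Why a wildcard certificate weakens the ‘right system’ assumption can be sketched in a few lines. The matcher below is a deliberately simplified illustration (a `*` is only accepted as the complete leftmost label); real TLS libraries apply more rules. The function name `matches` and the `example.com` hostnames are placeholders.

```python
def matches(pattern: str, hostname: str) -> bool:
    """Simplified name check: '*' is accepted only as the complete leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard stands in for exactly one label; the rest must be identical.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


# One wildcard certificate is accepted for every host directly below the domain.
for host in ("db.example.com", "wiki.example.com", "old-test-box.example.com"):
    print(host, matches("*.example.com", host))
```

In other words, the check tells you that you are talking to some system of $ORG holding that key, not necessarily to the database you had in mind.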

In short, understanding the (hidden) trust relationships in an environment can generally be beneficial for prioritizing operational resources. This brings me directly to the next point.

Tradeoffs

The world of certificates is full of tradeoffs (as, of course, these are all settings with many different parties and their — differing — objectives). Here they are usually clustered around two main themes:

Performing certificate validation at all 😉. This may sound strange at first glance (I mean, using certificates only makes sense once you validate them, right?), but many of us know situations of the ‘oops, that expired certificate over there breaks our service delivery right now. What about temporarily [by some definition of temporary 😂] disabling certificate validation for the TLS connections between those systems to quickly fix the issue?’ type. You may also look at the Wi-Fi authentication use case below.
How to determine if a certificate is (still) valid. This can be time-based, based on checks of the revocation status, or both. Such checks (and the concept of certificate lifetimes/validity periods as a whole) are related to a specific property of certificates (them being capabilities, see next section), and these checks can induce significant operational complexity (for example, see the post I referenced at the beginning of this one). I will cover certificate revocation and checking in a later part of this series.
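To illustrate how small the ‘temporary’ change from the first point typically is, here is a sketch using Python’s standard `ssl` module. The two helper names are made up for this example; the settings themselves (`check_hostname`, `verify_mode`) are regular `ssl.SSLContext` attributes.

```python
import ssl


def strict_context() -> ssl.SSLContext:
    # The default client context verifies the certificate chain and the hostname.
    return ssl.create_default_context()


def no_validation_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    # Order matters: hostname checking has to be switched off first,
    # otherwise setting CERT_NONE raises a ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


for name, ctx in (("strict", strict_context()), ("disabled", no_validation_context())):
    print(name, ctx.verify_mode, ctx.check_hostname)
```

Two lines are enough to keep the encryption but drop every assurance about who is on the other end, which is probably why this ‘fix’ is so tempting during an outage.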

Finding the right balance between objectives of different parties (going with the right tradeoffs) can greatly help to efficiently steer operational resources in all directions. For example, increasing certificate lifetimes between systems that are all part of the same — your — operational domain can be a good idea when certificate expiry is a frequent cause of issues. An even better solution is to increase the level of automation for renewal 😉.
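As a starting point for such automation, the following sketch computes the remaining lifetime of a certificate from its `notAfter` value. It assumes the date string has the format that Python’s `ssl.SSLSocket.getpeercert()` returns; the function names, the sample date and the 30-day threshold are arbitrary choices for this illustration.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now and a notAfter string such as 'Jun 30 12:00:00 2030 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    return (expires - now.timestamp()) / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    # Flag the certificate once it is inside the renewal window (or already expired).
    return days_until_expiry(not_after, now) <= threshold_days


now = datetime(2030, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(days_until_expiry("Jun 30 12:00:00 2030 GMT", now))
print(needs_renewal("Jun 30 12:00:00 2030 GMT", now))
```

A check like this only helps if somebody actually receives and acts on the result, which brings us back to the trust and operations questions above.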

You may then spend some intellectual cycles on understanding/questioning the tradeoffs in your environment. As stated above, some of the tradeoffs are commonly related to the most important, yet at times, least understood point of my little theory discourse here.

Certificates are capabilities

Imagine there’s a subject (a user/process) that wants to access an object, for example, a resource (network, file, and so on). The enforcement mechanism controlling the subject’s access to the object can then look at an attribute of the object itself (we could call it something like an ‘access control list’). This attribute/list is then checked every time the subject shows up and asks for access, and it’s usually maintained by the object’s owner. Or it can look for an entitlement (not to be confused with, but similar to these) which at an earlier point of time was granted to the subject and which generally allows some access. Such a thing is sometimes called a capability, and certificates can be perfect examples of capabilities (technically, the private key corresponding to a certificate’s public key constitutes the actual capability, but let’s keep it simple).
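The difference between the two models can be sketched as follows. This is a toy illustration only: the capability is an HMAC token created with a shared secret, which is a simplification compared to the public-key signatures used with certificates, and all names (`SECRET`, `acl`, `grant`, and so on) are invented for this example.

```python
import hashlib
import hmac

SECRET = b"issuer-signing-key"  # hypothetical issuer key, for illustration only

# Object-side model: the object carries a list that is consulted on every access.
acl = {"report.pdf": {"alice"}}


def acl_allows(subject: str, obj: str) -> bool:
    return subject in acl.get(obj, set())


# Subject-side model: a token is granted once; later checks only verify the token.
def grant(obj: str) -> str:
    return hmac.new(SECRET, obj.encode(), hashlib.sha256).hexdigest()


def capability_allows(token: str, obj: str) -> bool:
    return hmac.compare_digest(token, grant(obj))


token = grant("report.pdf")          # issued once, say to alice
acl["report.pdf"].discard("alice")   # later, the owner removes alice from the list

print(acl_allows("alice", "report.pdf"))       # the list change applies immediately
print(capability_allows(token, "report.pdf"))  # the token works for whoever presents it
```

Note that the capability check never asks who presents the token, and that nothing the object’s owner does afterwards affects it.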

I’m using the above terms a bit loosely here, and there’s a lot of theoretical discussion in OS security circles on these. In any case, capabilities have two main challenges:

Delegation: How can you make sure that one subject does not transfer the capability to another subject after it has been granted?
Revocation: If circumstances change (for example, a system/key material is compromised or a user leaves an organization), how can you make sure that the once-granted entitlement can no longer be used?

Both are well-known in certificate circles, and various architectural or technical approaches exist for dealing with them, including:

Setting a flag (‘non-exportable’) for private keys and hoping that the OS environment properly enforces it.
Storing the private key(s) in some extra-secure place. That’s the main reason why smart cards once gained a lot of popularity in some industry sectors (namely heavily regulated ones like banks), and why hardware security modules (HSMs) exist.
Implementing an additional layer where, at the very moment of a certificate’s use, some extra check of the ‘ok, it is still within its validity period, but has it been revoked?’ type happens. Voilà — the birth of certificate revocation checking! Welcome to a whole new space of complexity, trust relationships, and tradeoffs (which I’ll discuss in detail in the next post).
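The additional layer from the last point can be sketched like this. It is a simplified model, not how any particular TLS stack does it: the `Cert` class and the in-memory set of revoked serial numbers are stand-ins for real certificates and for whatever revocation source is consulted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Cert:
    serial: int
    not_before: datetime
    not_after: datetime


def is_acceptable(cert: Cert, now: datetime, revoked_serials: set) -> bool:
    # Step 1: the time-based check that needs nothing but the certificate itself.
    within_validity = cert.not_before <= now <= cert.not_after
    # Step 2: the extra check that needs fresh information from somewhere else.
    return within_validity and cert.serial not in revoked_serials


cert = Cert(
    serial=4711,
    not_before=datetime(2030, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2031, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2030, 6, 1, tzinfo=timezone.utc)

print(is_acceptable(cert, now, revoked_serials=set()))
print(is_acceptable(cert, now, revoked_serials={4711}))
```

The second argument is where the new dependencies hide: somebody has to maintain that set, distribute it, and decide what happens when it cannot be reached.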

It should be noted that revocation checks significantly change the trust relationships (‘Ok, I see the certificate that you present to me. It was meant to create trust between you and me, but I’m not convinced. Let me reach out to somebody else to verify.’) and that they kind of move the needle towards an object-based security model that many people intuitively prefer as it gives them the notion of being in control (this is also better aligned with many compliance frameworks 😉).

Certificate use cases

Let’s now discuss some certificate use cases from the above perspectives. In the following I will look at five of them (the first two in this post, the others in the next):

E-commerce web server offering HTTPS.
Authentication in enterprise Wi-Fi networks.
Client/user authentication (for example, for VPN access).
Client/user authorization (as in ‘enrich a certificate with additional information which is then parsed to take security decisions like controlling access to a specific resource’).
mTLS.

E-commerce web server with HTTPS

This is probably the most classic use case, and it’s the one that paved the way for the widespread use of certificates. When e-commerce became a thing, there were two challenges to be solved from a user’s (buyer’s) perspective:

How do I know I’m connected to the right server (assuming that this one only uses my credit card data for the goods I want to purchase)?
How can I be sure that my payment data is not compromised when using the Internet for its transfer?

Both could be addressed by deploying a certificate on the web server(s) and enabling HTTPS.

To note:

From a trust perspective, this is a reasonably easy one. The user has a certain desire (for example, to buy something, or to watch specific content) that generally highly influences trust decisions (otherwise Ponzi schemes wouldn’t work). The CAs were trusted as there were only a few of them, and their trustworthiness was rarely questioned or verified by the people requesting certificates (in the early days the latter was sometimes even a part of a company’s marketing team; they usually have a more optimistic approach to life than those ever-sceptical infosec folk, anyway).
From a company’s security objective perspective, it was also an easy one — none of the assets to be protected (user’s credit card data) were really of relevance to the protection needs of the web server owners. This only changed when the Payment Card Industry Data Security Standard (PCI DSS) emerged.
From an operations perspective, it wasn’t particularly difficult either. Certificates had comparably long lifetimes (usually two years), there were only a few of them, and while renewal was known to be somewhat inconvenient it was at least less cumbersome than the initial request.

Authentication protocols used in enterprise Wi-Fi networks

Pretty much all Extensible Authentication Protocol (EAP) methods used in enterprise Wi-Fi networks employ certificates. Some use them only on the side of infrastructure elements (for example, PEAP), others (EAP-TLS) also on clients. The latter, especially, brings high operational complexity (see for example, this old setup guide, which my fine buddy Chris Werny authored many years ago). With that come heavily differing objectives of the involved parties and quite interesting failure scenarios.

Let’s analyse some of the involved parties:

Operators of the RADIUS servers. They might not be super-familiar with certificates, hence installing those may not be a daily task for them, so they’d be happy with generally longer certificate lifetimes.
‘Enterprise desktop team’ — they will strive for auto-enrollment and renewal, and again they will want to keep things simple (‘why do they bother us with this certificate stuff, our life is already difficult’). This group/task could be outsourced (=> $CONTRACTOR1).
The users, who just want Wi-Fi to work, (legitimately) don’t care about the underlying technologies, and they will happily click away any certificate-related warnings ‘as long as the corporate Wi-Fi works’.
The infosec people want to prevent the users from doing the latter, and they’d be happy if the lifetimes of involved certificates were shorter. It’s a bonus if they come up with the idea of implementing some additional scheme where ‘Wi-Fi (security) profiles’ are mapped to certain parts of the certificate (did I already mention that certificates have various types of fields that can be ~~overloaded~~ populated with all types of information?).
The operators of the whole Wi-Fi infrastructure want to keep the users happy. There is some chance here that operations of (some parts of) the network infrastructure might be outsourced/provided by contractors ($CONTRACTOR2).
The CA issuing the involved certificates might be in-house or not; a common scenario is another contracted service ($CONTRACTOR3). It’s a bonus if the wireless infrastructure uses intermediary certificates from another CA ($CONTRACTOR4).

Let’s imagine that at some point either something breaks or one of the certificates, in particular on the infrastructure level (RADIUS server, AP, wireless controllers), expires. There’s a high chance that the renewal requires human labour and skills and, evidently, requires touching availability-critical network infrastructure. Maybe the certificates in question are not monitored. Overall, there’s some probability that certificate expiry leads to something breaking.

How well do you think $CONTRACTORS (1 through 4) will interact in such a case? Exactly 😉.

It should be noted that most of the above parties do not have a deep familiarity with certificates in their daily life. The fact that certificates are mostly invisible until something breaks doesn’t help either (=> incentives?).

I can also tell you from practical experience (from my days as a network consultant in a US Fortune 10 company 15 years ago) that all of the above parties (except the infosec folk) will happily and immediately sacrifice all certificate-related security properties once, say, 50K users might not be able to use the corporate Wi-Fi anymore (due to expiring intermediate certificates from a vendor with whom $CONTRACTOR2 had ended their contractual relationship). In that situation, the following suggestions might appear on the table:

Can’t we just disable certificate validation as a whole, on certain $INFRASTRUCTURE_ELEMENTS?
What about publishing guidance in which we tell users to ignore certificate warnings?
Is there any chance of configuring some grace period, say four to eight weeks, during which we still accept the expired certificates? $VENDOR already promised us a custom image that somehow avoids the issue (don’t ask…).

If only a group of experts had reflected on the certificate deployment in that environment, its operational complexities, its inherent trust relationships, and the tradeoffs between the different parties and their incentives earlier.

If this post makes you think about these aspects in your own world, I’m a happy man.

Enno Rey is a long-term IPv6 enthusiast with extensive practical experience in the space.

This post was originally published on the Internet Protocol Blog.

