Most APNIC Blog readers will be familiar with the Domain Name System (DNS) and Domain Name System Security Extensions (DNSSEC). This post focuses on a key part of DNSSEC infrastructure: the Root Key Signing Key (KSK) ceremonies. These ceremonies exist to give the Internet community transparency around the creation, use, and storage of the Root KSK. Transparency is essential to establishing trust in the KSK; asking the Internet to just blindly trust something wouldn't work, and rightly so!
Who runs the Internet?
Answering this question completely would take a very long time. The Internet began as a US military project maintained by a handful of people; today it reaches a solid majority of the developed world and a fair amount of the developing world. Its history is long and complicated.
In short, operation of the DNS root (among other functions) is overseen by the Internet Assigned Numbers Authority (IANA), a function of ICANN, the Internet Corporation for Assigned Names and Numbers. Until 1 October 2016, ICANN performed the IANA functions under contract to, and with oversight from, the US Department of Commerce. On that date the contract expired, and ICANN, a non-profit organization, has since been governed by an international multistakeholder community.
Another party involved in DNS root management is Verisign, a public US-based company. Aside from operating the A and J root name servers, Verisign is the Root Zone Maintainer (RZM), meaning they’re responsible for maintaining the root zone file at the direction of IANA.
The components of a ceremony
Ceremonies are held every three months and alternate between two redundant Key Management Facilities (KMFs) in the US: one in Culpeper, Virginia, and one in El Segundo, California. Each facility has a secure ceremony room that is subject to dual occupancy, meaning at least two authorized people must be present. Inside this room is a metal cage, the 'safe room', also subject to dual occupancy, which contains two safes.
Safe 1, the equipment safe, contains Hardware Security Modules (HSMs), each sealed in its own tamper-evident bag (TEB). An HSM is a specialized device that stores private keys securely and allows a key to be used without the key itself ever leaving the HSM. HSMs are generally tamper-responsive: if tampering is detected, usually through a combination of accelerometers, tamper switches and other mechanisms, they delete the key and stop working.
The safe often holds several HSMs at various stages of being commissioned or decommissioned, as well as two operational HSMs, which are used in alternate ceremonies.
Safe 1 also contains (again in TEBs):
- Two laptops that are used alternately. These are near-standard laptops, but they are operated as air-gapped machines with no permanent storage: they have no Wi-Fi, Bluetooth, battery, hard drive or SSD, and are never plugged into a network.
- A DVD and USB flash drive. The DVD contains the operating system for the laptop, a minimal Debian build with some extra utilities for the ceremony. The flash drive, known as the HSMFD, serves as a record and backup of the signatures and logs created during the ceremony. Multiple copies are made during the ceremony for distribution.
- A smartcard containing an encrypted backup of the KSK.
Safe 2, the credential safe, holds safe deposit boxes containing more smartcards, also in TEBs, that are required to activate the HSMs. Each safe deposit box needs two (physical) keys to open: one is retained by IANA, and the other remains in the custody of a Crypto Officer (CO).
COs are Trusted Community Representatives (TCRs), meaning they aren’t affiliated with IANA, ICANN or Verisign. Anyone can apply to be a CO; however, a certain level of knowledge is required. There are seven COs for each KMF, although only three are required to operate an HSM.
There is another group of TCRs called Recovery Key Shareholders (RKSHs). There are seven RKSHs in total, and they are custodians of smartcards containing parts of the storage key, the encryption key under which the KSK is stored. The RKSHs are only needed in disaster recovery situations, and five RKSHs are required to recover the storage key.
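The post doesn't describe how the HSM actually divides keys across smartcards, so what follows is only an illustration of the general idea behind a k-of-n quorum such as the five-of-seven RKSH rule. It uses Shamir's secret sharing, a standard threshold scheme that may or may not resemble what the HSM vendor implements; the function names and the choice of prime modulus are hypothetical.

```python
import secrets

# Illustrative only: a Mersenne prime (2^127 - 1) used as the field modulus.
# Any secret must be smaller than this value.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the integers mod PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `cards = split_secret(storage_key, threshold=5, shares=7)` hands out seven cards, and any five of them passed to `recover_secret` return `storage_key`. Under this scheme, four shares reveal nothing about the secret, which mirrors why a small group of shareholders acting alone can't recover the key.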
As transparency and auditing are such a core part of the ceremonies, the participants follow a script to ensure policies are followed correctly and any deviations are recorded. The whole ceremony is also filmed from multiple angles and, since 2018, live streamed on YouTube. The full records from every ceremony since the start of DNSSEC signing in 2010 are on IANA's website.
What happens at a Root KSK ceremony?
As well as the COs, there are a few other people involved in the ceremony, mostly IANA and ICANN staff. The leader of the ceremony is the Ceremony Administrator (CA), who reads out the script and conducts the main tasks of the ceremony. The timestamp of every step completed is recorded by the Internal Witness (IW), as are deviations from the script, called exceptions. There is also a Safe Security Controller (SSC) for each safe, responsible for opening and closing the safe and recording what is taken and put back.
Also present are representatives from the RZM (Verisign), external auditors, staff, and external witnesses. Anyone can apply to attend a ceremony (in non-COVID times) as an external witness. The final key role is the system administrator, who is responsible for the filming and live streaming of the ceremony, as well as the physical access system for the ceremony and safe rooms, and other support tasks.
Verisign generates the root Zone Signing Keys (ZSKs), and to be useful, each ZSK needs to be signed by the root KSK. This is the main objective of the ceremonies: to generate signatures over the ZSKs using the KSK. There are, however, many other tasks, including HSM introduction and destruction, KSK generation, safe maintenance and TCR rotation.
To start off, I'll go through the simplest kind of ceremony, where signature generation is the only task. I'm going to abbreviate things substantially, as the scripts usually run to 30 pages or more. If you want to follow along in the actual script, I'd suggest the one from Ceremony 36, as it's the simplest recent ceremony at the time of writing. I'll get to the complexities of more recent ceremonies later.
- After starting the cameras and welcoming the participants, the CA, IW, SSC for safe 2, and COs enter the safe room. The SSC opens the safe, then each CO in turn (along with the CA using the common key) opens their safe deposit box to retrieve the needed smartcard. Each safe deposit box also contains a second smartcard used for more advanced tasks; if it isn't needed, it stays in the safe. During this process, the TEBs of both cards are checked for integrity and their serial numbers are matched against those recorded at the last ceremony. The safe is then locked and everyone exits the safe room.
- After a short delay enforced by the physical access system, the CA and IW go back into the safe room with the SSC for safe 1. After safe 1 is opened, the CA retrieves the HSM, laptop, and OS DVD required for the ceremony. Again, the TEBs of both the removed and remaining items are verified, the safe is locked, and everyone exits the safe room.
- The CA now sets up the equipment on a table at the front of the ceremony room. This involves plugging in and booting the laptop, which is connected only to its power cable, a USB printer, and an external monitor so the screen can be easily seen in the room. The hash of the OS DVD is then verified and the clock is set. The clock has to be set because the laptop has no battery and is never connected to the Internet, so it can't keep time itself. After verifying the contents of the HSMFD (which has been sealed with the OS DVD since the last ceremony) and starting audit logging, it's time to set up the HSM.
- The HSM is connected to the laptop by both a serial cable, for logging purposes, and an Ethernet cable, over which the laptop and HSM communicate. At this stage, the HSM will not do anything as it's inactive. To activate it, three COs come to the table and present their smartcards to the CA, who inserts them into the HSM one at a time. The HSM is now ready for signing.
- The CA now plugs another flash drive, the KSRFD (Key Signing Request flash drive), into the laptop; it contains the public part of the ZSK from the RZM. The CA runs a command on the laptop that displays the hash of the ZSK. A representative from the RZM in the room reads the hash from their own documentation, and all participants verify that it matches the hash displayed by the laptop. If the hashes match and there are no objections, the CA types 'y' on the laptop to confirm the signing. The laptop communicates with the HSM, and the generated signatures are saved to the KSRFD to be given back to the RZM. Before it is handed back, however, the contents of the KSRFD are copied to the HSMFD as a backup. The signing also generates a log, which is printed and distributed to everyone in the room.
- Now that the signatures are generated, the HSM can be deactivated, which again requires three CO cards. The HSM is sealed in a new TEB, whose serial number is recorded so it can be verified at the next ceremony. The audit logging on the laptop is then stopped and the logs are saved to the HSMFD, of which five copies are made for various audit processes. The audit logs are also printed. The laptop is shut down, and the HSMFD and OS DVD are sealed together in a new TEB. The laptop and CO smartcards are also sealed in new TEBs.
- The final stage is returning the equipment to the safes: first the HSM, laptop, and OS DVD into safe 1, and then the CO smartcards into safe 2, essentially the reverse of the first two steps. All participants must then sign the IW's script to attest that it is an accurate record of the ceremony. The cameras are then stopped, and various materials are collected for auditing purposes, including logs from the physical access system.
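Two of the checks in the walkthrough, matching TEB serial numbers against the previous ceremony's record and confirming that the KSR hash matches the value the RZM supplies independently, can be sketched in code. In the real ceremony these comparisons are made by people reading values aloud, not by a program; the data layout, the function names, and the use of SHA-256 here are assumptions for illustration only.

```python
import hashlib


def check_teb_serials(recorded: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Hypothetical helper: list exceptions where a bag's serial doesn't match the last record."""
    problems = []
    for item, serial in recorded.items():
        seen = observed.get(item)
        if seen is None:
            problems.append(f"{item}: not found")
        elif seen != serial:
            problems.append(f"{item}: expected TEB {serial}, found {seen}")
    # Anything present now that wasn't recorded last time is also an exception.
    for item in sorted(observed.keys() - recorded.keys()):
        problems.append(f"{item}: not in previous record")
    return problems


def sha256_hex(data: bytes) -> str:
    """Assumed hash: hex SHA-256 of the request file's contents."""
    return hashlib.sha256(data).hexdigest()


def confirm_signing(ksr_bytes: bytes, hash_read_aloud: str) -> bool:
    """Approve signing only if the local hash equals the RZM's independently supplied value."""
    local = sha256_hex(ksr_bytes)
    # Normalize the transcribed value: drop spaces between groups, ignore case.
    expected = "".join(hash_read_aloud.split()).lower()
    return local == expected
```

A usage sketch, assuming the request file has been read from the drive with something like `Path("ksr.xml").read_bytes()`: call `confirm_signing(data, value_from_rzm)` and proceed only on `True`, and treat any non-empty list from `check_teb_serials` as an exception for the IW to record.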
Other ceremony tasks
Some of the simplest secondary tasks are the replacement of RKSHs and COs. For an RKSH replacement, the outgoing RKSH attends a ceremony with their smartcards, and while an HSM is out of the safe for use in the ceremony, their smartcards are verified to still work, then are repackaged and given to the incoming RKSH.
For a CO replacement, the outgoing and incoming CO go into the safe room along with the CA, IW and SSC2. The outgoing CO opens their safe deposit box, removes the two TEBs and gives them to the incoming CO, who verifies their integrity, then places them into a new safe deposit box and collects its key.
If an outgoing CO isn’t available for the transition, then a locksmith will attend to drill out the lock on the safe deposit box. This highlights another priority of the ceremonies — reliability. There are transparent and auditable processes to recover from pretty much any problem. In this case, as the cards are still in their TEBs inside the box, the audit trail remains intact.
Ceremony 40 is a good example of recovering from problems. As well as the standard signature generation, HSM3 (West) was scheduled to be decommissioned, a set of replacement safe deposit box locks was to be prepared, and the locks on both safes were to be replaced.
The safe deposit box ceremony went smoothly; however, the next day, the SSC couldn't open safe 1 to have its lock changed. The backup SSC tried, but also couldn't open it. It was determined that the lock had failed: it was accepting the code but wasn't physically unlocking. The lock on safe 2 worked fine and was successfully replaced, but with safe 1 unable to be opened, the ceremony couldn't go ahead, so a locksmith attended to drill out the lock. It took almost 20 hours to open the safe, which is a testament to the strength of the lock! Once the safe was finally repaired and the lock replaced, the HSMs were tested to ensure their tamper mechanisms hadn't tripped.
The ceremony could then take place late on Saturday evening. Because of the massive delay caused by the lock malfunction, the destruction of HSM3 was postponed to a future ceremony.
The policies for DNSSEC root management include HSM rotation schedules so that HSMs are replaced before they fail with age. When a new HSM is to be introduced, multiple temporary smartcards are generated using an existing HSM to transfer the storage and access keys to the new HSM. After the new HSM is set up, the KSK can be imported from the KSK backup smartcard in safe 1. After this, the temporary smartcards are erased and then shredded.
When an HSM is due to be retired, the KSK is deleted from it, then its tamper mechanism is manually triggered to render it inoperable. It’s then disassembled, and the sensitive components are sealed in a TEB to be sent to a third party for shredding.
When a new KSK is generated (which has only happened in 2010 and 2017), it’s generated inside one HSM and transferred to the other HSMs in both KMFs, again through smartcards (in TEBs when couriering between the two KMFs). An HSM can hold multiple KSKs at once and during the transition to a new KSK, both will be used to sign the ZSK (and each other). After the transition, the old KSK is deleted, and its backup smartcards wiped and shredded.
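Because an HSM can hold more than one KSK and several KSKs may sign at once, each signature has to identify the key that produced it. Per RFC 4034, an RRSIG record names its signing key by algorithm, signer name and a 16-bit key tag computed from the key's DNSKEY RDATA (the 2010 and 2017 root KSKs are commonly referred to by their tags, 19036 and 20326). Below is a minimal Python version of the checksum given in RFC 4034 Appendix B; the function name is hypothetical, and the RDATA in the usage example is a made-up toy value, not a real root key.

```python
def key_tag(rdata: bytes) -> int:
    """Compute the RFC 4034 Appendix B key tag of a DNSKEY RDATA in wire format.

    RDATA layout: flags (2 bytes) | protocol (1) | algorithm (1) | public key.
    Not valid for the obsolete algorithm 1 (RSA/MD5), which uses a different rule.
    """
    acc = 0
    for i, byte in enumerate(rdata):
        # Even-indexed bytes are the high byte of a 16-bit word, odd-indexed the low byte.
        acc += byte if i & 1 else byte << 8
    # Fold the carry above 16 bits back in, then truncate to 16 bits.
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF
```

For example, a toy RDATA with flags 257 (a KSK), protocol 3, algorithm 8 and a fake three-byte key, `bytes([0x01, 0x01, 0x03, 0x08, 0x01, 0x02, 0x03])`, gives a key tag of 2059. Key tags are only a hint: different keys can collide, so validators still check the signature against every candidate key with a matching tag.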
The two most recent ceremonies at the time of writing, 41 and 42, were conducted in April 2020 and February 2021, respectively. As the COVID-19 pandemic made travel and indoor gatherings unsafe, a few major changes were implemented:
- The West KMF was used because using the East KMF would have required IANA staff to fly from coast to coast.
- All participants who could participated remotely via video call. This included the COs, who couriered their safe deposit box keys to IANA in TEBs before the ceremony. Each CO remotely granted the IANA staff permission to use their key, then watched as it was sealed in a new TEB to be sent back to them after the ceremony.
- Instead of the usual three months' worth of signatures, each ceremony generated nine months' worth. IANA retained the additional signatures until the time they would normally have been generated.
- Non-essential tasks scheduled for the ceremonies were postponed.
About the ‘Seven keys to the Internet’ analogy, and accountability
I've previously blogged about how the DNSSEC process is dramatized when presented to the public. I'd like to counter some of the more common arguments, as that dramatization undermines the goal of building trust in the DNSSEC processes.
The root KSK ceremonies are only for the DNS, and only for DNSSEC at that. Even if DNSSEC eventually reaches 100% adoption (which, as I've mentioned before, is a long way off), DNSSEC keys do not, and never will, directly protect communications such as HTTPS.
But stepping back from that, we need to look at how likely the various scenarios are. Firstly, a failure of the ceremonies, and with them of DNSSEC at the root, would essentially require simultaneous disasters of extreme proportions at both KMFs, such that all copies of the KSK were destroyed. In that situation, DNS resolver operators worldwide might have to disable DNSSEC validation temporarily if a new KSK couldn't be set up before the existing signatures expired.
Another common concern is various parties going rogue. In the case of COs, all they retain outside of a ceremony is the key to their safe deposit box, which gives them hardly any more access than a random person off the street, as the key is only useful inside the safe room with the safe open. Getting to that point without detection would be almost impossible. Even then, all the materials are inside TEBs, which would need to be broken to gain any access to the KSK.
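The point about signatures expiring can be made concrete. A validator accepts an RRSIG only between its inception and expiration times (RFC 4034, section 3.1.5), so, roughly speaking, the latest expiration among the root signatures already generated sets how long validation would keep working without another ceremony. The sketch below parses an expiration field in either of its two presentation formats and computes the time remaining; the function names and example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(value: str) -> datetime:
    """Parse an RRSIG inception/expiration field (RFC 4034, section 3.2).

    Accepts YYYYMMDDHHmmSS in UTC, or plain seconds since the Unix epoch.
    For simplicity this ignores the 32-bit serial number arithmetic
    that RFC 4034 specifies for comparing these values.
    """
    if len(value) == 14 and value.isdigit():
        return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def validation_runway(latest_expiration: str, now: datetime) -> timedelta:
    """Time left until the latest-expiring signature stops validating.

    A negative result means every signature has already expired.
    """
    return parse_rrsig_time(latest_expiration) - now
```

For instance, with `now = datetime(2021, 2, 10, tzinfo=timezone.utc)`, `validation_runway("20210301000000", now)` is 19 days.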
So, in a sense, there is a group of seven people with ‘keys to the Internet’, but on their own, without ICANN (and the tens-of-thousands-strong Internet community backing them), they’re powerless, so bringing it up as some sort of mind-blowing fact is disingenuous at best.
In the case of the RKSHs, if five or more were to go rogue, they could only recover the HSM storage key. Without access to safe 1, where the encrypted KSK backup smartcard is kept (and without breaking more TEBs), that key does not give them the KSK.
Another thing people worry about is ICANN going rogue. This is difficult to address briefly, but the truth of the matter is that ICANN, and the Internet in general, are governed, maintained and improved by a massive global community made up of many different subgroups. While representation could arguably be better, change is slowly coming, and it's the responsibility of everyone to ensure that the Internet community is as representative of the world as possible.
The last party to talk about is the US government. While it no longer has any special 'veto' over the Internet, many Internet companies, including ICANN and Verisign, are US-based and therefore subject to US law. The US has used this power against Verisign, which also operates the .com top-level domain, to take down .com websites that violate US law. Attempting to use this power more broadly against ICANN would be an almost unimaginable step for the US government and would undoubtedly face very vocal resistance from the Internet community.
Some are pushing for the physical aspects of the Internet to be more globalized, in addition to the multistakeholder communities that already exist. I'm broadly in support of that; however, Internet governance has been a complicated topic since its inception, and big changes take time.
Cameron Steel is a tech enthusiast with interests in networking and security, particularly in advancing the adoption of IPv6, DNSSEC and HTTPS.
Adapted from the original post at Cam’s blog.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.