Every technical endeavour seems to grow its own special language. You can’t talk to steam-engine enthusiasts without Walschaerts being mentioned. Who knew ‘four foot, eight and a half inches‘ would become universal? Maybe it’s a universal rule that the more work you do, the more specialized the language becomes. Digital cryptography, and specifically Public Key Infrastructure (PKI) is no exception.
If you work with Internet numbers, you will know that among Resource PKI (RPKI) specialists, this is one of the issues being discussed — how to make a PKI work for Internet number distribution.
Let’s look at a key concept in the PKI world: ‘M of N’. What does this mean?
M of N is about sharing the risk and agreeing to use a key
First thing’s first — the N represents the number of parts, and the M represents the minimum number of those parts you need to make this process work.
The numbers might be three out of four. Or five out of 50. You can look at it as a fraction or a percentage — whatever floats your boat.
M of N is a pretty simple concept. You take something that has significant value — in this case, the private key in a public-private key pair, and you make it necessary for a group of people to be involved in using it, rather than just one.
It’s like the old treasure map stories; you tear the key into pieces, giving them to a number of people. If you split up the key, you can stop it being used unless all the split parts are presented, thereby ensuring no one holder of a single key fragment can steal the treasure.
There’s a twist, though; you can make it so you can access the full map/key with just some of the parts/participants. This number is the ‘M’ in M of N.
While you don’t technically need to have the ‘part holders’ involved, and someone could hand out the Personal Identification Number (PIN) to their key, those involved in key signing do sometimes operate from the assumption that people will not share their keys. Thus, the key holder and the key itself are sometimes indistinguishable in the sense they both must be present for this to work.
According to Thales (who sell hardware security modules), M of N is also called ‘multi-person control’ or ‘quorum-based authentication’. So the two key elements inherent in the names here are more than one (person, as in a key-holder) and ‘quorum’.
Did you catch all that? The key points here are that you have a set of keys, and you need to meet the agreed-upon quorum of people holding keys.
What are the fundamentals behind the mechanism? It’s the old computer science trick — checksums and error correction.
You can’t use a key unless you have all the parts
Let’s imagine you have a private key. Its a string of letters, 64 characters long. Here’s an example of the kinds of private key I’m talking about:
Now, imagine you mistype this, and miss just ONE character: you miss the 9 at the beginning and type:
Will it still work? In your own personal experience, missing one character in things like your login password, or the pass phrase you use to secure your bitcoin wallet is ‘no, it won’t work’. And it doesn’t matter if you miss that one character at the front, or the middle, or the back; you need all of the key to unlock the secret. That is the common experience.
So here’s an important point: Keys normally have to be complete to be used.
But what if I told you, this isn’t actually your ‘private key’ but is a ‘secret’, which itself is used to unlock the data that is your private key? In fact, in almost all practical public-private key cryptosystems you are going to use, this is how things work. You aren’t using your private key directly, you are given a secure password to unlock it, and software uses the unlocked key in the complex cryptographical mathematics that encodes the message.
(This gets even worse. What it actually does, is do really expensive, time-consuming mathematics to encode another secret key, and uses that secret key to do the thing that needs to be done securely, and then sends the encoded data with the secret key wrapped up in the complex, expensive mathematics. The public-private part is so horrendously expensive, it’s not actually used to directly protect the secret data; it’s used to protect the specific, randomly generated key that the data was secured with).
In this model the ‘golden rule’ here remains true: you can’t unlock the real private key if you don’t have all of this 64-letter secure password.
Now, you can break this password up into eight chunks of eight letters:
And then you tell eight people “here is your chunk of the password” but you also tell them “keep it private” and find a way to get them to present it, keeping it private, at the right time.
You have just made an ‘M of N’ model, where M and N are the same number: eight. Eight out of eight. 100%. All eight people have to be present to use this key. As long as they don’t share their secrets, they all have to be there to use it.
What about missing parts?
Here, I will delve a little bit into the Redundant Array of Inexpensive Disks (RAID) model, which basically allows us to potentially use the key if only, say, seven out of the eight people were present. RAID is a system where, if you have Network Attached Storage (NAS) at home with more than one physical disk, you can lose one disk but not lose the data.
In RAID, this is done using checksums and error-correcting codes. They’re the fundamental mechanism in computer science (in mathematics really) to do this kind of work — represent things in a way that allows you to detect something is missing and correct for it.
You may notice that if you get NAS with RAID, it says you have three 2TB disks inside, but for some reason you only get to use 4TB. Three 2TB disks should add up to 6TB, right?
It turns out there is a cost to being able to have one of your disks go missing. The same principle applies to M of N. Same maths, same idea, different application.
Let’s take a look at what that cost actually involves.
To do an M of N with any one person missing, you have to recalculate the eight values to be shared, super-encoding into the value an ‘error correcting code’ over every other element so that any one element can be missing as long as all seven are still present.
(For those familiar with RAID: This would be like moving from RAID-0 stripes to RAID-2 parity (not RAID-1 mirror), and specifying the parity to be ‘replace one missing drive’).
I didn’t calculate this, but let’s assume this has been done, and our eight people now have to hold (for instance) a 10-character special key, but inside those 10 characters, you encode enough information to reconstruct the original 64-character word, as long as seven of them are present. (It probably isn’t 10 characters. I am using this for illustrative purposes. The point is, from seven of the 10 character strings, or however long it has to be, you can reconstruct all eight of the original eight letter strings).
Any seven of the above can be used to reconstruct all of the eight below:
The thing to note in this example is that the top string is longer than the bottom. You’ve added a bit extra so that the remaining pieces could be assembled to form the whole. Each piece has to have added ‘leftovers’ that are there to pick up the slack when one piece is missing. This is that added cost that was just discussed.
And in like sense, you can move up the RAID hierarchy, allowing any two, any three, any four of the eight original fragment-holders to be missing, and be able to reconstruct all of the required key. Each time, you need to give each M of N participant more information to be able to recover more missing elements of the key.
What is the right M, in M of N? It depends
The big question out there, is how many parts do you want to select, as a minimum ‘quorum’ to decide to proceed? What is the right number of parts (N) and minimum present (M) to use? Let’s discuss some choices.
What if you just want to be sure it will always work?
If you just want to be sure you can do something, give everyone a copy of the full key. How many you give it to depends on how much you care who does it, and the risks of that information leaking. The more people you tell, the more likely it is misused. It’s like sharing the key to the bike shed. After the first 10 shares, somebody has passed a copy off to their cousin and all bets are off. But, if two or three people need to be able to do a thing, just share the key, and share some rules about using it.
What if you want a minimum process check?
If you demand at least two or more people from the list of people to be present, you are starting to bring some checks on the process to the table. If you believe a single person could be breaking the rules, or doesn’t understand the risks involved and should have a degree of supervision, you can make sure a minimum of at least two people are present when things are done. You could choose to segment things so that ‘one person from operations, and one from finance’ are required. This is a low bar and doesn’t do much beyond the bare minimum. Two people might collude on something nefarious, or more likely, one will just say to the other “oh, here’s my key, do what you need to do” and from there, things will only get sloppier.
There is also the option to safeguard against sudden absences if you still go with a requirement of two people, but it’s from a specific larger group. If it’s two people out of three, then only one person can be missing. If it’s two out of four, you can have a few missing. This can help if you’re worried about staff absences.
Higher M of N numbers become burdensome, but allow for more stringent process checks
If you keep following this logic, you can easily see how you could require any six from 10 people, or any 10 from 20. The natural question that arises is “what are you trying to do?”. In almost all cases, there is some higher functional goal at hand; a requirement for visible engagement from the community. There may be a requirement for participation from specific sectors of a shared activity, to ensure compliance with a process. There may be a requirement for reporting back to several interested parties with an element of distrust amongst them.
What do people really do in practice?
As an example, the ICANN DNS root, invokes ‘three of seven’ as its M of N formula. Seven cards have been given out to the Trusted Community Representatives (TCRs) and a minimum of three of them are required to unlock a critical device used in the ICANN DNS root key re-signing ceremony. The recent COVID-19 situation limiting travel caused some initial consternation, since it risks the question of what happens when a basic quorum cannot be fulfilled.
ICANN worked out a way around things, and met their over-arching community goal for transparency and inclusion, but the problem is inherent in any M of N calculation; if you make it too few, you risk some collusive behaviour that subverts the intent. If you make it too many, you risk some catastrophe causing you to slip below a minimum number to recreate your complete key.
Isn’t the maths horrific?
Well, yes and no. This is actually a well understood problem; it’s not that hard to work out what to do. The real problem is, you can’t decide to do this ‘post hoc’ and just break up the keys. To do this securely, you need it ‘baked in’ to the model of key management you are using. In the case of modern PKI, this almost always means using a Hardware Security Module (HSM) that integrates an M of N model into its core logic, and has been tested in the Federal Information Processing Standards (FIPS) framework for its underlying security.
Thales (for example) offers up to 16 N units, and any number from 1 to 16 of M elements can be defined. If you define 1, then M of N is not applied. You can elect to make copies, but that’s outside the model of M of N they apply. If you define any M=N, then it’s ‘all keys present’, and if M<N, it’s the quorum M you define for the given N number of keys. They document this on the web.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.