Tatu Ylonen may not be the best-known name in protocol development these days, but chances are that anyone doing systems work, whether copying data between hosts or managing a Unix host, is using his software. He wrote the ubiquitous ‘SSH’ program back in 1995.
In a frequently reposted article that continues to draw a huge number of readers, Tatu explains on the SSH Academy website how SSH came to be assigned Internet port 22. Really, it’s a process story about how, in those days, getting a number assigned in the protocol registry maintained by the Internet Assigned Numbers Authority (IANA) was as simple as writing to Jon Postel and Joyce Reynolds. But the number ‘22’ is one of those rare and wonderful things. Why 22? Well, that’s the second part of the story.
What are port numbers anyway?
When a connection is made over the Internet Protocol (IP) to another machine (or even to the local machine), it is defined not only by the two Internet addresses but also by a specific transport protocol one level above IP (UDP or TCP, for instance), and, within that transport protocol, by a ‘port’ number that identifies a service. This allows the receiving host to understand what the sender wants to do because, by common convention, the port signals which protocol is being used for that service on top of the transport protocol. But it is just a convention: an organizing principle adopted to make things easier to understand and manage.
Users see IP ports all the time and don’t think about it
When a web URL looks like https://my.host.example.net:443/, the :443 part gives the port number (443 is the default for https, which is why it is usually left out). Even a home Internet firewall is likely to have Access Control List (ACL) rules about port 53 (DNS), port 123 (NTP), and ports 80 and 443 (HTTP and secure HTTP), amongst others. These values are assigned to make it easier to know what’s going on. It’s basic organization.
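A client works out which port to use from the URL itself. The Python sketch below uses the standard library's `urllib.parse.urlsplit`; the `DEFAULT_PORTS` table and the `effective_port` helper are illustrative assumptions that cover only two schemes.

```python
from urllib.parse import urlsplit

# Illustrative default ports for two schemes (not a complete list);
# used when a URL leaves out the ":port" part.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL (hypothetical helper)."""
    parts = urlsplit(url)
    # .port is None when the URL has no explicit ":port"
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("https://my.host.example.net:443/"))  # 443 (explicit)
print(effective_port("https://my.host.example.net/"))      # 443 (implied by https)
print(effective_port("http://my.host.example.net:8080/"))  # 8080
```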
It’s organizing principles all the way down
Protocols form layers like a sponge cake. At the bottom is the physical layer: the medium that messages are sent over. It could be radio in many different forms: Wi-Fi, cellular, or satellite in geosynchronous (GEO) or low Earth (LEO) orbit. It could be old-fashioned Ethernet on thick cables, modern unshielded twisted-pair wiring, a fibre-optic connection, a modem connection through the telephone network, or even messages strapped to the legs of carrier pigeons. It really doesn’t matter. The physical layer simply provides a mechanism to send data out.
Above that is the ‘link’ layer, which, for that physical medium, defines what a bit, a byte, and a sequence of bytes (often called a frame) look like. It sometimes has addresses of its own, too, which define how devices on that link talk to each other. That link is the local network. It may span vast distances, but in this context it’s ‘local’ because the local user can know and talk to everything on that link without anyone else’s assistance.
The first true ‘Internet’ layer is the next one up, the network layer. It uses all the different kinds of links that lie between the local device and other Internet-connected devices to send IP packets. On the local link, the packets travel inside that link’s frames; when they cross into an intermediate network, they are carried in that network’s framing; and at the receiver’s side they sit inside whatever link layer the receiver uses. So the local user can be on Wi-Fi while the other end is on a fibre-optic link, and the framing on each link takes care of the differences locally, so that the IP packet at the network layer stays a consistent message.
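The re-framing described above can be sketched in Python. This is a simplified illustration, assuming Ethernet II framing on both links (in reality the second link could be Wi-Fi or something else entirely), made-up locally administered MAC addresses, and a placeholder packet; the trailing frame check sequence is omitted, and the helper names are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800           # Ethernet II EtherType meaning "payload is an IPv4 packet"
HEADER = struct.Struct("!6s6sH")  # destination MAC, source MAC, EtherType (14 bytes)

def mac(text: str) -> bytes:
    """Turn 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))

def frame(dst: str, src: str, ip_packet: bytes) -> bytes:
    """Wrap an IP packet in an Ethernet II header (trailing checksum omitted)."""
    return HEADER.pack(mac(dst), mac(src), ETHERTYPE_IPV4) + ip_packet

def unframe(data: bytes) -> bytes:
    """Strip the Ethernet II header and return the IP packet inside."""
    _dst, _src, ethertype = HEADER.unpack(data[:HEADER.size])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    return data[HEADER.size:]

# Hypothetical addresses and a placeholder packet, for illustration only
packet = b"placeholder IP packet"
hop1 = frame("02:00:00:00:00:02", "02:00:00:00:00:01", packet)         # sender -> router
hop2 = frame("02:00:00:00:00:03", "02:00:00:00:00:04", unframe(hop1))  # router -> next hop

print(hop1[:12] != hop2[:12])   # True: different link-layer addresses on each link
print(unframe(hop2) == packet)  # True: the IP packet inside is unchanged
```

A real router would also decrement the packet's TTL and update its header checksum before re-framing it; the sketch leaves the packet untouched to keep the focus on the framing.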
All well and good, but there may be many other IP packets flying around. How can they be told apart? This is done by considering the identifying information about both the local end and the remote end together, in a form called a ‘tuple’ (a term from mathematics and databases for an ordered set of values), such as the ‘five-tuple’ of source address, source port, destination address, destination port, and transport protocol.
That combination should be unique to a conversation between the local host and another endpoint on the Internet. Network Address Translators (NATs) use it to map many private addresses onto a smaller set of globally visible external addresses, using the port number creatively to identify which host behind the NAT’s shared address a packet belongs to. Firewalls use it to manage data flow and to prioritize traffic, and, crucially, the receiving host uses it to hand incoming data to the right service, which then communicates using the right protocol for the purpose.
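To make the five-tuple concrete, here is a minimal Python sketch of a port-rewriting NAT. `FiveTuple` and `ToyNat` are hypothetical names, the addresses come from the documentation ranges, and the port-allocation scheme (counting up from 40000) is an assumption chosen for readability; real NATs also deal with timeouts, port exhaustion, and per-protocol details.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class FiveTuple:
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: str  # e.g. "tcp" or "udp"

class ToyNat:
    """Hypothetical NAT that shares one public address by rewriting source ports."""

    def __init__(self, public_addr: str, first_port: int = 40000):
        self.public_addr = public_addr
        self._next_port = count(first_port)  # assumed allocation scheme: just count up
        self._out = {}   # private five-tuple -> public five-tuple
        self._back = {}  # (public port, protocol) -> private five-tuple

    def outbound(self, flow: FiveTuple) -> FiveTuple:
        """Rewrite an outgoing flow's source to the public address and a fresh port."""
        if flow not in self._out:
            port = next(self._next_port)
            public = FiveTuple(self.public_addr, port, flow.dst_addr, flow.dst_port, flow.protocol)
            self._out[flow] = public
            self._back[(port, flow.protocol)] = flow
        return self._out[flow]

    def inbound(self, public_port: int, protocol: str) -> FiveTuple:
        """Find which private flow a reply to this public port belongs to."""
        return self._back[(public_port, protocol)]

nat = ToyNat("203.0.113.7")
a = FiveTuple("192.168.1.10", 51000, "198.51.100.5", 443, "tcp")
b = FiveTuple("192.168.1.11", 51000, "198.51.100.5", 443, "tcp")  # same source port, other host
pa, pb = nat.outbound(a), nat.outbound(b)
print(pa.src_port, pb.src_port)              # 40000 40001
print(nat.inbound(pb.src_port, "tcp") == b)  # True
```

The two private hosts happen to use the same source port, yet the rewritten public ports keep their five-tuples distinct, which is exactly what lets replies find their way back.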
Breaking down the five-tuple
The IP packet carries the source and destination IP addresses, but it also says what the next level up is: the transport protocol being used. That’s usually User Datagram Protocol (UDP) or Transmission Control Protocol (TCP), the two predominant transport protocols used above IP (there are others), and each is identified by a protocol number assigned by IANA. TCP is protocol number 6 (UDP is 17). So inside the packet there’s the sender’s IP address as source, the receiver’s IP address as destination, and protocol number 6, to say that all the following data is TCP data.
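To see where the protocol number lives, the following Python sketch builds a minimal 20-byte IPv4 header by hand and reads it back with the standard `struct` module. The protocol number sits at byte offset 9. The addresses are from documentation ranges and the checksum is left at zero, so this is an illustration rather than a packet you could actually send; `parse_ipv4_header` is a hypothetical helper.

```python
import socket
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"         # the 20-byte fixed IPv4 header, network byte order
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}  # two IANA-assigned protocol numbers

def parse_ipv4_header(data: bytes) -> dict:
    """Pull the version, TTL, protocol number and addresses out of an IPv4 header."""
    (ver_ihl, _tos, _total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack(IPV4_HEADER, data[:20])
    return {
        "version": ver_ihl >> 4,  # high nibble of the first byte
        "ttl": ttl,
        "protocol": proto,
        "protocol_name": IP_PROTOCOLS.get(proto, "other"),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hand-built header: version 4, 5-word header, TTL 64, protocol 6 (TCP), checksum 0
header = struct.pack(IPV4_HEADER,
                     0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"),
                     socket.inet_aton("198.51.100.5"))
print(parse_ipv4_header(header))
# {'version': 4, 'ttl': 64, 'protocol': 6, 'protocol_name': 'TCP', 'src': '192.0.2.1', 'dst': '198.51.100.5'}
```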
But there can be many kinds of TCP data, all sent to the other host at the same time, so the receiver needs a way to say ‘this TCP data belongs to this conversation’ and to work out which of the many processes running on it should get the data.
That’s where the port number comes in. Combined with the rest of the values in the five-tuple, the port number creates a unique set of numbers between sender and receiver, and it also allows the receiver to say ‘I will listen on this port’. In effect, by listening on a given port, the receiver implicitly tells the world that it offers a specific service there.
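The ‘I will listen on this port’ idea maps directly onto the sockets API. This Python sketch runs both ends on the loopback address. It asks the operating system for any free port (port 0) rather than a registered one, because on many Unix systems binding to low-numbered ports such as 22 needs administrator privileges; the variable names are illustrative.

```python
import socket

# Server side: "I will listen on this port". Port 0 asks the OS for any free port;
# a real service would bind to its registered port instead (e.g. 22 for SSH).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
listen_port = server.getsockname()[1]

# Client side: connect to that port; the OS picks an ephemeral source port for the client.
client = socket.create_connection(("127.0.0.1", listen_port))
conn, peer = server.accept()

client_addr = client.getsockname()  # (source address, source port)
server_addr = client.getpeername()  # (destination address, destination port)
five_tuple = (*client_addr, *server_addr, "tcp")
print(five_tuple)
print(peer == client_addr)  # True: the server sees the client's address and port

for s in (conn, client, server):
    s.close()
```

The listening port stays fixed while each connecting client brings its own source port, so many clients can talk to the same service at once without their five-tuples colliding.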
How many ports? Lots of ports!
In principle, these port assignments could be entirely arbitrary. They must lie within a 16-bit range of values (65,536 different ports, numbered 0 to 65,535). However, before the advent of some newer DNS features, such as SRV records that can advertise the port a service uses, the sender couldn’t know in advance on which port a given receiver listened for a specific service. So, if a host ran email, login services and web (for instance) on ports 1021, 2232, and 8292 (for example), then unless the receiver had told the sender about those port numbers explicitly, it was impossible to know which protocol to send to which port.
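Because the port field is 16 bits wide, there are 2^16 = 65,536 possible values. This Python sketch (the `encode_port` helper is hypothetical) packs a port into the two-byte, big-endian field that TCP and UDP headers use, and rejects anything that doesn't fit.

```python
import struct

PORT_BITS = 16
NUM_PORTS = 2 ** PORT_BITS  # 65,536 possible values: 0 through 65,535

def encode_port(port: int) -> bytes:
    """Encode a port as the 2-byte, big-endian field used in TCP and UDP headers."""
    if not 0 <= port < NUM_PORTS:
        raise ValueError(f"port {port} does not fit in {PORT_BITS} bits")
    return struct.pack("!H", port)

print(NUM_PORTS)                 # 65536
print(encode_port(22).hex())     # 0016
print(encode_port(65535).hex())  # ffff
```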
Managing port assignments in a registry
The problem could have been solved in lots of different ways, but the one Internet designers settled on was to run a registry: a place that records a specific value, its meaning, who controls it, and why it was delegated. In the DNS world, the registry manages sub-delegation of DNS names; in the Regional Internet Registry (RIR) system, the registry manages unique number assignment and allocation for addressing.
The protocol number registry function was provided by IANA, which at the time was embodied in Jon and Joyce. Those who ran services agreed to use the port numbers that Jon and Joyce assigned. That stopped confusion.
So email (SMTP) was assigned port 25, and the Network Time Protocol (NTP) was assigned port 123 (a happy choice, as many economies use the phone number ‘123’ for their talking clock).
Two significant protocols of the time were Telnet (the remote login service) and the File Transfer Protocol (FTP), which had been assigned port numbers 23 and 21 respectively. Jon and Joyce were leaving some gaps in the number space in case they turned out to be useful, so they didn’t assign these two port numbers next to each other, but they did put them close together, because they started ‘low’ in the sequence of port assignments when the registry began. Telnet and FTP were among the first protocols defined.
To cut a long story short, since the SSH protocol was being designed to replace both Telnet and FTP with secure alternatives, it made sense to give the number 22 to this new protocol. Sitting between the two values it was trying to succeed, it was a fitting choice, even though by then there were hundreds of other application protocols using bigger numbers. If you were going to fill in the gaps in the port assignment registry, this was a nice one to give SSH.
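The assignments in this story can be pictured as a lookup table. The Python sketch below is only a tiny, hand-picked excerpt keyed by port, using IANA’s service names for the ports mentioned in this post; the real registry has thousands of entries and also records transport protocol, contact and reference details. `neighbours` is a hypothetical helper that shows what sits either side of 22.

```python
# Port -> IANA service name for the ports mentioned in this post.
# A tiny illustrative excerpt, not the real registry.
WELL_KNOWN = {
    21: "ftp",
    22: "ssh",
    23: "telnet",
    25: "smtp",
    53: "domain",
    80: "http",
    123: "ntp",
    443: "https",
}

# Reverse index: service name -> port, which is how a client uses the registry
PORT_FOR_SERVICE = {name: port for port, name in WELL_KNOWN.items()}

def neighbours(port: int, registry: dict[int, str], width: int = 1) -> list[tuple[int, str]]:
    """List registered entries within `width` of `port` (hypothetical helper)."""
    return [(p, registry[p]) for p in range(port - width, port + width + 1) if p in registry]

print(PORT_FOR_SERVICE["ssh"])     # 22
print(neighbours(22, WELL_KNOWN))  # [(21, 'ftp'), (22, 'ssh'), (23, 'telnet')]
```

On many Unix systems a similar table ships as /etc/services, and Python’s `socket.getservbyname('ssh', 'tcp')` looks entries up in the system’s services database.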
Tatu’s write-up explores in more detail both the amazingly simple process of emailing to get this assignment (it’s nothing like that simple anymore: the modern standards process can literally take a decade to justify an assignment) and the more modern consideration that there may be reasons to run the SSH protocol on a different port, which is doable if the related issues are understood.
Ports are still part of the protocol stack, but they are much less romantic now that they are bound up in the IETF standards process, where each assignment is slow and demanding to justify. An old Internet adage about names probably applies to Internet port numbers too: ‘The good ones are short. All the short ones are taken’.
22 is a pretty good short port assignment for SSH. It’s well deserved.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.