Basic routing concepts, part 4: NATs aren’t evil

By on 24 Sep 2021

Category: Tech matters


As with the first, second and third posts in this series, this post is aimed at those in the early stages of their networking careers.

Routers are fundamental to how the Internet works. Have you ever wondered how this box in the corner is able to provide Internet to your computer, phone, tablet and smart TV, all running at once inside your house, while you interact with your ISP as one household rather than as a collection of devices?  

Welcome to Network Address Translation (NAT). A NAT is a device for making several things look like one thing by forwarding Internet Protocol (IP) packets. The ‘translation’ part is about translating the source and destination addresses inside the IP packets as they pass through the router. 

A NAT can take an address, reuse it, and not cause confusion even though it is being applied to two different things in two (or many more) different places.

Regional Internet Registries (RIRs) don’t encourage that, but for a number of different special blocks of addresses (for example, the RFC 1918 ‘private’ addresses) this is entirely normal. You might be on 192.168.1.1 inside your home, and somebody else you’re talking to on Skype might be on 192.168.1.1 inside their home, but you both get to talk to each other over the public Internet.

Every packet your computer has to send out to the world can be sent through a ‘translation’ box, to change its sender identity from the one you want to use inside the house, to the one you want to use outside the house on the public Internet.

Today’s post will discuss why this is useful, but also why NATs are an unfairly maligned technology.

NATs provide more convenience and more IPv4 address space

NATs are a bit like a border checkpoint on the edge of your house. They don't provide security, but they do direct traffic to the specific devices inside your home. They also let you move to a new ISP, and a new outside IP address, without changing the addresses of the devices inside.

However, most households have more than one device requiring an IP address. A LOT more.

And there’s the problem. Your ISP doesn’t have enough Internet IPv4 addresses to give every customer they have 16, 32, or even 2 or 3 addresses, without charging more for them. It would be quite a lot more.

The entire RIR model of address distribution is based on the problem that there aren’t enough IPv4 addresses for the entire world and almost all of them have been consumed in a rationing model worldwide.

Read more: How bad is IPv4 address exhaustion?

Of course, in IPv6 there are more than enough addresses. But most of the Internet still functions on IPv4, and the problem remains. How do many devices in a single household all work on the Internet when an ISP may provide only one IPv4 address?

Figure 1 — A diagram of devices in four houses using Wi-Fi routers with NATs.

In Figure 1, each coloured square represents a household with a NAT acting as the ‘doorway’ into the house. Each blue number is an IP address, and each letter is a different port. A crucial thing to note is that each active conversation (each flow of packets) uses its own port. Think of it this way: every connection your browser opens needs a new port, but the computer needs only the one IP address.

The desktop computer with IP address one (let’s call it IP1) is playing an online game with IP7 in another household (both are using Wi-Fi so they’re not particularly serious gamers). IP1 is also sending messages to computer IP8 in a different house. Both these sets of communication are bi-directional.

It’s also important to note that from the perspective of the ISP, the only visible IP addresses are three, four, five and six. Whether it’s a NAT, phone, PC or anything else, an IP address is just an IP address. The ISP doesn’t care how many devices are in each house; it just deals with the one IP address at the NAT gateway.

So, the gamer at desktop IP1 uses port A to send packets via the router with the NAT. The NAT is IP3, which is the address visible to the world. The NAT sets up port E, which links with port I on the NAT at IP4. That NAT then forwards the packets to port M on IP7. It's a two-way street: the packets flow back and forth along the path 1A, 3E, 4I and 7M, and back via 7M, 4I, 3E and 1A.

Because the computer is doing more than one line of packet communication at a time, it needs multiple ports. That computer is also sending messages to another computer, in a different house, at the same time. That line of communication goes through 1B, 3F, 5J, and 8N and back again along the same route.

The same goes for the phone in the green house, which is messaging two different people who both live in the same yellow house, sharing a NAT. The first discussion is going via 2C, 3G, 6K, and 9O (and back again) while the other discussion goes from 2D, 3H, 6L, and 10P.

Notice how quickly the number of ports increased with just a few examples. In reality, hundreds of ports are needed, for the many different tasks required to send packets to various different places. 

So how is all of this organized?

A NAT is a kind of table

The first post in this series discussed how routing is basically just dealing with lists. Well, a NAT is also just a list. It's a slightly different kind of list, based on something we call a 5-tuple. A ‘tuple’ is a mathematical concept (common in relational databases and programming languages) that describes an ordered sequence of things. A 5-tuple has five things in it:

  • Source IP address.
  • Source port.
  • Destination IP address.
  • Destination port.
  • Protocol above IP layer (TCP, UDP, SCTP…).

A 5-tuple like this defines everything needed, at a specific point in time, to send and receive packets between two things. It is also exactly the information carried in the packet's own IP and transport headers (the addresses, the ports and the protocol); it's about something ‘in’ the packet, so it's perfect for working on a list of packet information that has to be changed.

Each of these 5-tuples is a row in the NAT table. Having an entry in the NAT table means packets can cross the ‘border’ of the household in both directions, whether they are heading out into the great big wide Internet or coming back inside. For an outgoing packet, the NAT table gives the header values to use ‘outside’; for an incoming packet arriving from the outside, it finds the right ‘inside’ host and port among the devices in the house. In effect, the NAT takes the 5-tuple and rewrites the address and port fields of each packet so the conversation works across the boundary.
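To make the idea concrete, here is a 5-tuple sketched in Python as an immutable record. The field names and the second source port (16385) are illustrative assumptions, not taken from any real NAT implementation. The point is that two flows differing in just one of the five values are two different rows.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """The five values that identify one flow of packets."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # protocol above the IP layer, e.g. "TCP" or "UDP"


# Two connections from the same tablet to the same web server differ only in
# their source port, so they are two distinct 5-tuples (two rows in a NAT table).
first = FiveTuple("192.168.1.1", 16384, "203.133.248.1", 443, "TCP")
second = FiveTuple("192.168.1.1", 16385, "203.133.248.1", 443, "TCP")

print(first == second)        # False
print(len({first, second}))   # 2: tuples are hashable, so they work as table keys
```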

Here’s an example of one of these 5-tuple tables. For this scenario, we’re imagining it’s a house with only one IP address on the public Internet.

The one address known to my ISP (the ‘border’ NAT IP address we saw examples of in Figure 1) is going to be 203.0.113.129. When packets come back, they come back to that destination IP address, and the NAT box has to find which inside host to send them to, using a port.

The NAT box has to decide how to use port numbers to provide this service. Typically, it says “I'm going to use some number (let's say 1,000) of ports to assign a unique ‘index’ to each IP conversation across my boundary” and then it runs this as a list, which is handed out for each active IP session. So essentially, it says “I can do 1,000 conversations at once; here are the tags I have for each conversation, and I'll just reuse them once I've finished each conversation.”

In this scenario, if you want to do more than 1,000 things at once in TCP, and you already have 1,000 connections running… you’re out of luck. 
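The ‘list of tags’ described above can be sketched as a simple pool. This is a minimal illustration, assuming a hypothetical range of 1,000 outside ports starting at 1021; the class and method names are made up for this example. Asking for a 1,001st port while all 1,000 are busy fails, and a released port goes back into the pool for reuse.

```python
from collections import deque


class PortPoolExhausted(Exception):
    """Raised when every outside port in the pool is already in use."""


class PortPool:
    """Hands out outside ports from a fixed-size pool and takes them back for reuse."""

    def __init__(self, first_port: int, size: int) -> None:
        self._free = deque(range(first_port, first_port + size))
        self._in_use: set[int] = set()

    def allocate(self) -> int:
        if not self._free:
            raise PortPoolExhausted("all outside ports are busy")
        port = self._free.popleft()
        self._in_use.add(port)
        return port

    def release(self, port: int) -> None:
        self._in_use.remove(port)   # KeyError if the port was never handed out
        self._free.append(port)     # the tag can now label another conversation


if __name__ == "__main__":
    pool = PortPool(first_port=1021, size=1000)   # hypothetical range 1021..2020
    ports = [pool.allocate() for _ in range(1000)]
    try:
        pool.allocate()                           # conversation number 1,001
    except PortPoolExhausted as err:
        print("out of luck:", err)
    pool.release(ports[0])                        # the first conversation finishes
    print(pool.allocate())                        # 1021 is handed out again
```

Handing ports out in order and appending released ones to the back is just one possible policy; a real NAT may choose ports differently.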

Device | What we're trying to do | Inside source IP | Inside source port | Outside destination IP | Outside destination port | Protocol | NAT IP address (as seen from outside) | NAT port (as seen from outside) | State
My tablet | HTTPS access to a website | 192.168.1.1 | 16384 | 203.133.248.1 | 443 | TCP | 203.0.113.129 | 1021 | Live
My Kindle | Look up a DNS name | 192.168.1.2 | 8001 | 1.1.1.1 | 53 | UDP | 203.0.113.129 | 1022 | Live

Table 1 — Example NAT table.

Table 1 shows a typical NAT table. Each row has been assigned, on an as-needed basis, to get a flow of IP packets out of the household and into the world. The two rows are temporary entries. The first will live for the life of my HTTPS session with this website, and the second for the life of my DNS request to Cloudflare (1.1.1.1) for a name-to-address lookup.

Each entry can be looked at from two sides. From the inside, it asks “what do I do to get off this home network? How do I change my source address and source port?” From the outside, it asks “I've just arrived at your NAT. Which inside host is lined up with this destination port and IP?”

New entries can be added, old entries can be removed. Let’s imagine after we do the DNS lookup on my Kindle, I update a note for a book I’m reading:

Device | What we're trying to do | Inside source IP | Inside source port | Outside destination IP | Outside destination port | Protocol | NAT IP address (as seen from outside) | NAT port (as seen from outside) | State
My tablet | HTTPS access to a website | 192.168.1.1 | 16384 | 203.133.248.1 | 443 | TCP | 203.0.113.129 | 1021 | Live
My Kindle | Look up a DNS name | 192.168.1.2 | 8001 | 1.1.1.1 | 53 | UDP | 203.0.113.129 | 1022 | Dead
My Kindle | Update a note at Amazon | 192.168.1.2 | 2021 | 198.51.100.21 | 2022 | TCP | 203.0.113.129 | 1023 | Live

Table 2 — Example NAT table with a row added.

A new outside port (1023) has been assigned to manage this Kindle note update, so a new row was added. Because the DNS query has finished and no more packets are flowing through that row, the second row has been marked as ‘dead’, and after a while it can be recycled. Its outside port number, 1,022, can then be reused for another NAT binding.

Because the 5-tuple includes the source address, on the ‘inside’ of the house the NAT can tell my tablet and the Kindle apart. Each has a unique source address inside the house, so each gets a different row in this table; the tuples differ by source address.

When things cross the NAT boundary, you ‘map’ the 5-tuple so that it emerges with your single ISP-assigned IP address as the source, with as much of the rest left unchanged as possible. You can't alter the destination address, destination port or protocol at all, so apart from the source address, the only field you have to work with is the sender's port number. That's how you can ‘index’ back to the specific inside-the-house conversation you're managing. When you look in the table from the outside, the port that was the ‘source’ port going out is the ‘destination’ port coming back in.
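Marking rows ‘dead’ and recycling them later can be sketched as a periodic sweep. This is a hypothetical model, not how any particular router works: a row with no packets for `idle_timeout` seconds is marked Dead, and after a further `grace` period it is deleted and its outside port is returned for reuse. Both parameters and their values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NatEntry:
    outside_port: int
    last_packet_at: float      # seconds on the NAT's own clock
    state: str = "Live"


def sweep(table: dict[int, NatEntry], now: float,
          idle_timeout: float, grace: float) -> list[int]:
    """Mark idle rows Dead; delete Dead rows after a grace period and return their ports."""
    recycled = []
    for port, entry in list(table.items()):   # copy, because rows may be deleted
        idle = now - entry.last_packet_at
        if entry.state == "Live" and idle >= idle_timeout:
            entry.state = "Dead"               # no packets for a while
        if entry.state == "Dead" and idle >= idle_timeout + grace:
            del table[port]                    # drop the row...
            recycled.append(port)              # ...so its outside port can be reused
    return recycled


if __name__ == "__main__":
    table = {p: NatEntry(p, t) for p, t in [(1021, 45.0), (1022, 10.0), (1023, 48.0)]}
    print(sweep(table, now=50.0, idle_timeout=30.0, grace=60.0))   # [] (1022 is now Dead)
    table[1021].last_packet_at = table[1023].last_packet_at = 99.0  # still busy
    print(sweep(table, now=100.0, idle_timeout=30.0, grace=60.0))  # [1022]
```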
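The two directions of the lookup can be sketched as a toy NAT with one public address. This is a minimal illustration under simplifying assumptions: ports are handed out in order from a hypothetical starting point of 1021, and an incoming packet is matched only on its protocol and destination port (a real NAT may also check where the reply came from). The class and method names are invented for this example.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class Nat:
    """Toy NAT with one public address; outside ports are handed out in order."""

    def __init__(self, public_ip: str, first_port: int) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound: dict[FiveTuple, int] = {}              # inside 5-tuple -> outside port
        self.inbound: dict[tuple[str, int], FiveTuple] = {}   # (protocol, outside port) -> inside 5-tuple

    def translate_out(self, pkt: FiveTuple) -> FiveTuple:
        port = self.outbound.get(pkt)
        if port is None:                       # first packet of a conversation: add a row
            port = self.next_port
            self.next_port += 1
            self.outbound[pkt] = port
            self.inbound[(pkt.protocol, port)] = pkt
        # Only the source address and port change; destination and protocol stay as they are.
        return pkt._replace(src_ip=self.public_ip, src_port=port)

    def translate_in(self, pkt: FiveTuple) -> Optional[FiveTuple]:
        inside = self.inbound.get((pkt.protocol, pkt.dst_port))
        if pkt.dst_ip != self.public_ip or inside is None:
            return None                        # no row for this port: nowhere to deliver it
        # The reply's destination port is the port we used as the source port going out.
        return pkt._replace(dst_ip=inside.src_ip, dst_port=inside.src_port)


if __name__ == "__main__":
    nat = Nat("203.0.113.129", first_port=1021)
    out = nat.translate_out(FiveTuple("192.168.1.1", 16384, "203.133.248.1", 443, "TCP"))
    print(out)                       # source becomes 203.0.113.129:1021
    reply = FiveTuple("203.133.248.1", 443, "203.0.113.129", out.src_port, "TCP")
    print(nat.translate_in(reply))   # destination becomes 192.168.1.1:16384
```

Keeping two dictionaries mirrors the two ‘sides’ of each entry: one answers the inside question, the other answers the outside one.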

A NAT table is not infinite — there has to be a limited number of choices

Alas, there's no magic in a NAT. Everything here is bounded by size, time and complexity issues. Memory adds cost to a device, and so one of the ways home router vendors save money is to limit the memory footprint. Every 5-tuple row burns memory. A web browser is capable of having hundreds of tabs open, and each tab can represent 20 or 30 distinct connections. If an online computer game, a phone call and some web browsing elsewhere in our home network are all going on, the ‘inside’ NAT table gets pretty big. You may have just dozens of devices, but you could have hundreds of open connections.
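A back-of-the-envelope calculation shows how a fixed memory budget caps the table. Every number below (64 KiB set aside for the table, 64 bytes per row, 50 tabs with 25 connections each, 30 other flows) is a hypothetical assumption chosen for illustration, not a measurement of any real home router.

```python
# Every number here is a hypothetical assumption chosen for illustration,
# not a measurement of any real home router.
TABLE_MEMORY_BYTES = 64 * 1024   # memory set aside for the NAT table
BYTES_PER_ENTRY = 64             # one 5-tuple row plus bookkeeping


def max_entries(memory_bytes: int, bytes_per_entry: int) -> int:
    """How many rows fit in the memory budget."""
    return memory_bytes // bytes_per_entry


def flows_needed(tabs: int, connections_per_tab: int, other_flows: int) -> int:
    """Rows needed for browser tabs plus everything else in the house."""
    return tabs * connections_per_tab + other_flows


capacity = max_entries(TABLE_MEMORY_BYTES, BYTES_PER_ENTRY)
demand = flows_needed(tabs=50, connections_per_tab=25, other_flows=30)
print(capacity, demand, demand > capacity)   # 1024 1280 True
```

Under these assumptions the household asks for more rows than the table can hold, so some new connections would have to wait or fail.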

There is also a limit on the outside. Every inside 5-tuple has to go out with one external address and one port, and because a port number is a 16-bit field, each external address has at most 65,535 usable ports per protocol (port 0 is reserved). The set of outside addresses you might be talking to is enormous too, but, as the first post in this series on prefixes discusses, that becomes manageable. There are only so many ports you can use on the outside, but thanks to the magic of prefixes, it can be done.

The other hard limit is time. A larger NAT table can cost more CPU time to find an entry and to manage the table. Junk entries that no longer apply can also linger, clogging up the NAT.
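Why a larger table can cost more CPU time depends on how entries are found. The sketch below contrasts a naive row-by-row scan with a hash index (a Python `dict`); the row contents are made up for illustration, and this is not a claim about how any specific router stores its table. The scan's work grows with the number of rows, while the hash lookup stays roughly constant.

```python
def build_rows(n: int) -> list[tuple[int, str]]:
    """Hypothetical NAT rows as (outside port, inside host) pairs."""
    return [(1024 + i, f"192.168.1.{i % 250 + 2}") for i in range(n)]


def linear_lookup(rows, outside_port):
    """Scan row by row; return (inside host or None, number of rows examined)."""
    for examined, (port, host) in enumerate(rows, start=1):
        if port == outside_port:
            return host, examined
    return None, len(rows)


rows = build_rows(10_000)
index = dict(rows)                      # hash index: outside port -> inside host

print(linear_lookup(rows, 1024))        # ('192.168.1.2', 1)
print(linear_lookup(rows, 11023))       # ('192.168.1.251', 10000)
print(index[11023])                     # 192.168.1.251, found with one hash lookup
```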

When a NAT isn't enough, a CGN does the job

In this post we've focused on the example of a household, but NATs are also used by Internet Service Providers (ISPs) that may need to serve millions of customers over IPv4. This is where a Carrier Grade NAT (CGN) comes into play. A CGN works in the same way as a regular NAT, but at a much bigger scale: it is a large device (in memory and computing power) with high-speed links, which can handle thousands of internal customers and send their traffic out through a far smaller set of globally routable Internet addresses. This has several attractive properties for an ISP.

A typical CGN will be able to manage tens of thousands of customers behind one /24 of IPv4 addresses.

The NAT table discussed above mapped each 5-tuple to one outside IP address. Now imagine the same can be done with the 256 addresses in a /24. That immediately offers roughly 256 times the scale. On top of that, because entries are kept in the table only temporarily, their space is freed up for reuse and a multiplier effect kicks in. Rather than serving 256 times as many customers, we're likely into the tens of thousands.
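The arithmetic can be sketched for bindings that are open at the same moment. The assumptions are labelled in the code: the CGN avoids ports 0 to 1023, and each customer needs a hypothetical 500 concurrent bindings. This counts only simultaneous bindings; the reuse of entries over time described above is what stretches the numbers further.

```python
ONE_SLASH_24 = 256               # addresses in a /24
USABLE_PORTS = 65536 - 1024      # assumption: the CGN avoids ports 0-1023
PORTS_PER_CUSTOMER = 500         # hypothetical concurrent bindings per customer


def customers_supported(addresses: int, ports_per_address: int,
                        ports_per_customer: int) -> int:
    """Customers that can be active at once if each needs a fixed number of bindings."""
    return addresses * ports_per_address // ports_per_customer


print(customers_supported(1, USABLE_PORTS, PORTS_PER_CUSTOMER))             # 129
print(customers_supported(ONE_SLASH_24, USABLE_PORTS, PORTS_PER_CUSTOMER))  # 33030
```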

This also avoids the cost of acquiring and routing new IPv4 addresses.  If you want more clients, reduce the NAT binding lifetime. If you want fewer problems, give each NAT table entry a longer time to exist without a packet flowing before you remove and reuse that slot.
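The trade-off between binding lifetime and capacity can be estimated with Little's law: the average number of bindings held equals the rate at which new flows arrive multiplied by how long each binding is held. This is a simplified model, assuming every binding lasts for the flow itself plus one idle timeout afterwards; the traffic figures are hypothetical.

```python
def ports_in_use(new_flows_per_second: float, flow_seconds: float,
                 idle_timeout_seconds: float) -> float:
    """Little's law: bindings held on average = arrival rate x time each binding is held.

    Assumes every binding lives for the flow itself plus one idle timeout afterwards.
    """
    return new_flows_per_second * (flow_seconds + idle_timeout_seconds)


# Hypothetical CGN load: 1,000 new flows per second, each active for 30 seconds.
print(ports_in_use(1000, 30, 120))   # 150000
print(ports_in_use(1000, 30, 30))    # 60000
```

Cutting the idle timeout from 120 to 30 seconds frees more than half the ports in this model, which is the ‘more clients’ side of the trade-off; the price is that quiet but still-wanted conversations lose their binding sooner.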

A CGN can stack in front of a home NAT just fine

It might be a bit of a mind bender, but you can use a CGN and have a home NAT and things work just fine. That IPv4 address information can be translated back and forth, over and over.
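Stacking can be sketched by running a packet through two toy NATs in a row and sending the reply back through both in reverse order. The addresses and starting ports are hypothetical; 100.64.0.10 is picked from 100.64.0.0/10, the shared address space reserved for use between a carrier NAT and customer equipment (RFC 6598). For brevity the protocol is left out of this model.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: tuple[str, int]   # (IP address, port)
    dst: tuple[str, int]


class SimpleNat:
    """Toy NAT: rewrites the source on the way out and the destination on the way back."""

    def __init__(self, public_ip: str, first_port: int) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.port_for = {}    # (inside source, destination) -> outside port
        self.inside_for = {}  # outside port -> inside source

    def out(self, pkt: Packet) -> Packet:
        key = (pkt.src, pkt.dst)
        if key not in self.port_for:
            self.port_for[key] = self.next_port
            self.inside_for[self.next_port] = pkt.src
            self.next_port += 1
        return Packet((self.public_ip, self.port_for[key]), pkt.dst)

    def back(self, pkt: Packet) -> Packet:
        return Packet(pkt.src, self.inside_for[pkt.dst[1]])


home = SimpleNat("100.64.0.10", first_port=20000)    # hypothetical: home router's address inside the ISP
cgn = SimpleNat("203.0.113.129", first_port=40000)   # hypothetical: the CGN's public address

outgoing = Packet(("192.168.1.2", 8001), ("198.51.100.21", 443))
on_internet = cgn.out(home.out(outgoing))
print(on_internet)                  # the server sees 203.0.113.129:40000
reply = Packet(on_internet.dst, on_internet.src)
print(home.back(cgn.back(reply)))   # delivered to 192.168.1.2:8001
```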

There’s only one question that matters here, and it’s how ‘agile’ your protocol is, regarding what it thinks your address is. Typically, to just read the web or fetch some mail, it doesn’t matter what your IP address is because you’re the client. The server is the ‘static’ component that should be a fixed well-known IP address bound to a name in the DNS (or a collection of IP addresses).

The point being, they can be found reliably and don’t just change into something else. Going the other way, your ISP doesn’t typically have to give you a ‘static’ address that isn’t changed by a NAT or CGN (we call the fixed, untranslated addresses ‘static’ and the changing or translated ones ‘dynamic’). The ISPs usually word the contract to say they won’t give you your own static address unless you pay more money.

So, the question of static or dynamic address binding inside the ISP to scale things is about pricing — if you want to have a static client address, be prepared to pay more for it.

If you don’t care, and your protocol works through a NAT, your protocol generally now works through two, or even four NATs if you can use the right methods to inform everyone what real addresses are being used.

So, systems like the Skype Internet phone, or game systems where two machines in the network start talking directly to each other (not always via a third point) have to find out what the real, globally reachable external address is, and there are ways to leverage a single fixed point to ‘learn’ your address.
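The ‘single fixed point’ idea can be sketched with a UDP reflector that simply tells each client which source address and port it saw. This is not STUN (Session Traversal Utilities for NAT), the standard protocol real systems use for this; the function names and the plain-text reply format are hypothetical. Run on loopback there is no NAT, so both addresses match; behind one or more NATs, the reflector would instead report the outermost NAT's outside address and port.

```python
import socket
import threading


def serve_one(server: socket.socket) -> None:
    """Reflector: reply to one datagram with the source address and port it arrived from."""
    _request, observed = server.recvfrom(1024)
    server.sendto(f"{observed[0]}:{observed[1]}".encode(), observed)


def learn_address(server_addr: tuple[str, int]) -> tuple[tuple[str, int], tuple[str, int]]:
    """Return (address this host thinks it has, address the reflector saw)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(2.0)
        client.bind(("127.0.0.1", 0))
        local = client.getsockname()
        client.sendto(b"who am I?", server_addr)
        reply, _ = client.recvfrom(1024)
    host, port = reply.decode().rsplit(":", 1)
    return local, (host, int(port))


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.settimeout(2.0)
        server.bind(("127.0.0.1", 0))
        worker = threading.Thread(target=serve_one, args=(server,))
        worker.start()
        local, seen = learn_address(server.getsockname())
        worker.join()
    print("local:", local, "seen by reflector:", seen)
```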

Of course, an aggressive NAT binding lifetime can be tricky, because by the time you have learned your external IP address and port, that binding may already have expired, but the protocols generally take care of that. So in the modern network I can be at home behind my home NAT, behind a carrier NAT, talking to another carrier NAT and another home NAT at the other end, and things work just fine.

CGN logging is a bit of a nightmare

One of the downsides for an ISP is that if you have an aggressively short-lived NAT binding, and lots of customers, you have a huge logging nightmare.

Say one of your users does something that isn't acceptable and you get a complaint. How do you find out which user it was? If NATs weren't involved, you could just look up the IP address. With a CGN, you also need the port and exactly when it happened, because within seconds someone else might be using that IP address and port. This means the ISP has to keep much larger logs, with a record for every binding. They're so big, in fact, that this limits how long they can be retained. Recording all of that is an enormous cost.

Customers may say that the limits on how much a CGN can track are a good thing, but I'm not as sure it's a benefit overall.
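A sketch of why the logs must include time (and port) is below. The record layout, customer identifiers and timestamps are hypothetical; real CGN logging formats vary. The same public address and port belong to two different customers a second apart, so only the timestamp tells them apart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BindingRecord:
    customer: str      # hypothetical customer identifier
    public_ip: str
    public_port: int
    start: float       # when the binding was created (seconds)
    end: float         # when it was removed (seconds)


def who_was_it(log: list[BindingRecord], public_ip: str, public_port: int,
               when: float) -> Optional[str]:
    """Find the customer holding public_ip:public_port at time `when`."""
    for rec in log:
        if (rec.public_ip == public_ip and rec.public_port == public_port
                and rec.start <= when < rec.end):
            return rec.customer
    return None


log = [
    BindingRecord("customer-A", "203.0.113.129", 40000, start=100.0, end=130.0),
    # The same address and port, handed to someone else seconds later:
    BindingRecord("customer-B", "203.0.113.129", 40000, start=131.0, end=190.0),
]
print(who_was_it(log, "203.0.113.129", 40000, when=120.0))   # customer-A
print(who_was_it(log, "203.0.113.129", 40000, when=150.0))   # customer-B
```

Every short-lived binding adds a record like these, which is why aggressive binding lifetimes and many customers make the logs grow so quickly.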

IPv6 dual-stack reduces pressure on a CGN 

Are CGNs a bad thing for IPv6 uptake? Not at all! If you deploy a CGN, but also deploy dual-stack systems so all your ‘inside’ customers have both IPv6 and IPv4, then the more your customers use IPv6, the less traffic you have to send through the CGN. There's no tension between the protocols; in fact, CGNs work really well at scale for dual-stack IPv6-enabled ISPs. They're very common in the mobile segment.

NATs are not firewalls; you need proper protection

A common misconception is that a NAT is actively part of your ‘protection’ against incoming traffic. It is true that a NAT can aid in masking which specific hosts are inside and who is using the NAT, but it's emphatically false to believe a NAT replaces having a firewall that checks packet integrity and defends you against unexpected IP risks. A NAT is not a replacement for a proper defensive boundary. Don't depend on a NAT to provide this function for you.

Are NATs evil? No; NATs are a fact of life

During the earlier stages of the IPv6 deployment story it was very common to hear the mantra ‘NATs are evil’ at standards meetings. This was because of a couple of different things.

Firstly, there was a belief they impeded the move to IPv6, and so were tactically prolonging the lifetime of IPv4. This is arguably true, but the goal here isn’t to ‘kill off IPv4’; it is actually ‘make the Internet work for everyone’. NATs have unquestionably allowed more people to use the Internet. Secondly, there was a belief they eroded the end-to-end principle — the ability of any IP address to strike up a conversation with any other IP address without barriers.

Read more: Opinion: In defence of NATs

Well, it is true that with a NAT you need to do work to discover your external NAT binding address, and that intrudes into the IP protocol connection establishment for higher protocol layers. And that with a NAT it’s harder (but not impossible) to set up a rendezvous address on your public face, to receive inbound calls. Nowadays, this is mostly a non-problem, and the barrier to entry to be ‘NAT agile’ is about choosing a software library and a model of NAT detection, and coping.

The economic reality is that NATs and CGNs exist and they aren’t going away any time soon.

To learn more on routing, check out the APNIC Academy’s range of free courses and webinars.


The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
