Imagine using your phone at work to connect to a virtual meeting via your office Wi-Fi network. The video conference is going overtime but you need to get to a client meeting across town. The facilitator says there are only 15 minutes left so you decide to watch the rest while you walk to your car to save time.
As your phone goes out of range of the Wi-Fi network, it automatically connects to and starts using a cellular network (4G or 5G) instead. In this process, all the connections you have with the video server close and reload, which leads to interruptions in the video or the video freezing if the connections can’t be made again.
This is referred to as the ‘Parking Lot Problem’ and is one of the inefficiencies of the Transmission Control Protocol (TCP) that its successor, QUIC, overcomes with the new ‘connection migration’ feature.
As I detailed in my first post in this series, the QUIC protocol was developed to overcome several inefficiencies of TCP and to provide a future-proofed design, including privacy and security improvements. In this post, I’ll go into more detail, focusing on the everyday problem I illustrated above and discussing the future of seamless migration between networks.
The problem of TCP in today’s Internet
As I mentioned in my previous post, TCP is not the best protocol for transferring your data reliably across today’s Internet.
This is because of the simplicity of TCP connections — they flow between two IP addresses and if one of those IP addresses changes, for example, when moving networks, they can no longer be used, as shown in Figure 1. (Note, in practice, TCP also uses so-called ‘ports’, and not all network changes incur IP address changes, but the concepts are the same.)
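To make this concrete, here is a minimal sketch of how a server might key TCP connection state on the address pair. The table layout, names, and addresses are illustrative assumptions, not taken from any real TCP stack.

```python
from typing import Dict, NamedTuple


class FourTuple(NamedTuple):
    """Addresses and ports that together identify one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Hypothetical server-side table: one entry of state per 4-tuple.
tcp_connections: Dict[FourTuple, dict] = {}

# The phone opens a connection while on the office Wi-Fi.
wifi = FourTuple("192.0.2.10", 50000, "198.51.100.5", 443)
tcp_connections[wifi] = {"tls": "negotiated", "bytes_received": 1200}

# After the switch to cellular, the phone has a different source IP.
cellular = FourTuple("203.0.113.77", 50000, "198.51.100.5", 443)

# The old state exists, but packets from the new address cannot find it.
found_on_wifi = wifi in tcp_connections
found_on_cellular = cellular in tcp_connections
```

Because the lookup key itself contains the client's IP address, the state stored under the Wi-Fi address is unreachable from the cellular address, and a new connection has to be set up.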
While this process isn’t the end of the world (the new connection will work just fine), it is somewhat inefficient. After all, it’s just the IP address that changes; everything else about the TCP connection and its state (for example, Transport Layer Security (TLS) protocol encryption parameters) could conceptually stay the same. We’re paying a lot of unnecessary additional overhead, for example, waiting for the new TCP and TLS handshakes to occur.
To help solve this problem, QUIC no longer relies (purely) on IP addresses to define connections. Instead, it assigns a number to each connection (a so-called Connection ID or CID). As such, even if you change networks and IP addresses, as long as you keep using the same CID, the ‘old’ connection remains usable. The server doesn’t care that the IP address has changed; it can rely on the CID to know it’s really you in that new network. This means both the client and server can keep the existing connection state, and there is no connection setup overhead, as shown in Figure 2.
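The CID-based lookup described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the class names, the string CIDs, and the `on_packet` method are hypothetical, and a real QUIC stack does considerably more (for instance, it validates the new network path before sending much data to it).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP address, port)


@dataclass
class QuicConnection:
    peer_address: Address  # where the client was last seen
    tls_state: str         # stands in for keys and other session state


class QuicServer:
    """Hypothetical server that finds connections by CID, not by address."""

    def __init__(self) -> None:
        self.connections: Dict[str, QuicConnection] = {}

    def accept(self, cid: str, address: Address) -> QuicConnection:
        conn = QuicConnection(peer_address=address, tls_state="handshake complete")
        self.connections[cid] = conn
        return conn

    def on_packet(self, cid: str, address: Address) -> Optional[QuicConnection]:
        conn = self.connections.get(cid)
        if conn is None:
            return None  # unknown CID: no existing connection to resume
        if conn.peer_address != address:
            # The client moved networks. A real implementation would first
            # validate the new path; this sketch just records the new address.
            conn.peer_address = address
        return conn


server = QuicServer()
original = server.accept("cid-1", ("192.0.2.10", 50000))        # on Wi-Fi
after_move = server.on_packet("cid-1", ("203.0.113.77", 41000))  # on cellular
```

Here `after_move` is the very same connection object as `original`, so the existing state is reused and no new handshake is needed; only the recorded peer address has changed.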
As such, QUIC Connection Migration solves a performance inefficiency prevalent in TCP. However, if it worked as shown in Figure 2, it would be fundamentally insecure and terrible for end-user privacy!
Improving end-user privacy
The CID does not only describe individual connections but is (implicitly) tied to individual devices and, therefore, users! If you have a nefarious party with a view of multiple networks (for example, both the cabled network the Wi-Fi connects to and the 4G cellular network), they could use the CIDs to track individual users across different networks! While this is probably outside the purview of individual hackers, larger organizations, such as big companies and especially nation-states, could use this ability.
Luckily, QUIC’s designers were acutely aware of this problem, called ‘linkability’. QUIC solves this by changing the CID every time users change networks. This means attackers cannot directly attribute the connection to an individual device because both the IP address and the CID change. Problem solved!
However, didn’t we need a non-changing CID in the first place to deal with the fact that the IP address changes? How can that possibly work if the CID also changes?
The trick is to assign not just one but multiple CIDs to the same underlying connection. In Figure 3, the blue connection isn’t just described by the green box, but ALSO by the purple circle and the red triangle. Crucially, only the client and the server know about these additional CIDs; any attacker just observing the network traffic will not because the new CIDs are exchanged securely once an encrypted QUIC connection has been established. As such, clients can safely use the next CID in the list; only the server will know that it’s actually targeting the same underlying connection.
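The idea of several CIDs pointing at one connection can be sketched as below. This is a toy model, not the actual QUIC mechanism: the class and method names are hypothetical, CIDs are random hex strings, and the secure exchange of new CIDs is reduced to a plain function call.

```python
import secrets
from typing import Dict, List, Optional


class Connection:
    """Stands in for all the state of one established QUIC connection."""

    def __init__(self, label: str) -> None:
        self.label = label


class Server:
    def __init__(self) -> None:
        self.cid_to_connection: Dict[str, Connection] = {}

    def issue_cids(self, conn: Connection, count: int) -> List[str]:
        # In the scheme described above, this list would travel inside the
        # encrypted connection, so an on-path observer never sees it.
        cids = []
        for _ in range(count):
            cid = secrets.token_hex(8)  # random, so CIDs look unrelated
            self.cid_to_connection[cid] = conn
            cids.append(cid)
        return cids

    def lookup(self, cid: str) -> Optional[Connection]:
        return self.cid_to_connection.get(cid)


class Client:
    def __init__(self, cids: List[str]) -> None:
        self.unused = list(cids)
        self.active = self.unused.pop(0)

    def migrate(self) -> str:
        # On a network change, switch to the next CID in the list.
        self.active = self.unused.pop(0)
        return self.active


server = Server()
video_call = Connection("video call")
client = Client(server.issue_cids(video_call, 3))

cid_on_wifi = client.active
cid_on_cellular = client.migrate()
```

An observer sees two different CIDs (`cid_on_wifi` and `cid_on_cellular`) coming from two different IP addresses, while `server.lookup` resolves both to the same `video_call` connection.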
(Note: In practice, this mechanism is even more complicated due to the need to support load balancers and other use cases. Please refer to this blog post for more information).
While this mechanism works quite well, it does have some downsides. The CID is one of the few parts that have to remain unencrypted in the QUIC packet header metadata. This is partly because the server needs to be able to look up the correct connection based on the CID before it can use that connection’s TLS decryption keys to unlock the packet. As discussed in the previous post in this series, any unencrypted metadata can (and probably will) be read (and probably misused/abused) by middleboxes (and attackers), again somewhat compromising QUIC’s future-proofed design.
Using a separate encryption scheme just for the CIDs (so they can be encrypted separately from the packet) can solve this issue. However, that increases the difficulty of implementing other services, such as firewalls and load balancers. This problem is quite difficult to properly solve in practice, and work is still underway at the IETF to define what that should look like.
Why can’t I use Wi-Fi and 4G at the same time?
This is an excellent question, and a feature called Multipath is currently being worked on to address it. The concept has been under development for TCP for over a decade but has proven difficult to deploy worldwide.
With QUIC, building on the connection migration features described above, this approach becomes much more feasible. Companies such as Apple, Google, and Alibaba are actively experimenting with it to improve overall network robustness and increase performance by combining the bandwidth of both networks for a single conceptual connection.
In my next post, I’ll discuss the protocols’ performance in the wild and why comparing QUIC and HTTP/3 to TCP and HTTP/2 is difficult.
Robin Marx is a Web Protocol and Performance Expert at Akamai.
This post was originally published on the Internet Society’s Pulse Blog.
Congratulations, Robin, on your QUIC-related posts. I’ve been looking for a long time for friendly information about QUIC, how it contributes to a better user experience, and its problems, like the ones related to load balancers and CIDs. Your posts are a great entry point for QUIC newbies.
It is interesting to read your thoughts about QUIC possibilities for server migration: continuing the same CID with a different server within a cluster of a service. This feature has potential for cache cluster management. It allows shelling out the load balancer and gaining performance.