TCPLS is an intertwined TLS 1.3/TCP design that we developed with colleagues from UCLouvain and ULiege.
If you have never heard about TCPLS, the first thing that might come to your mind is “What does the acronym mean?” Much like QUIC, TCPLS is not an acronym. It simply conveys the idea that bringing TCP and TLS closer brings benefits to network applications. Some of the features of TCPLS can be found in QUIC, yet the two protocols have significant differences.
In our last post, we wrote about the initial research steps we took in developing TCPLS. In this post, I’d like to present the similarities and differences to QUIC, backed by some experimental results.
Why QUIC is amazing
Both QUIC and TCPLS are modern transport protocols — they offer an interface to modern transport services to the application layer.
QUIC was designed at Google and has evolved with the involvement of many actors through the IETF standardization process. QUIC is great. Everything is implemented in userspace, which means that QUIC’s lifecycle can follow the lifecycle of the application that embeds it. This gives application designers more control, assuming the QUIC stack is distributed together with the application. Moreover, QUIC uses cryptography to protect not only the application payload but also its own header and control information. This last bit is a truly awesome thing about QUIC.
Wait, why would it be useful to make protocol information confidential? Nobody cares, right? This is not confidential information, it is just boring fields!
You know what? You’re right, nobody cares about confidentiality for protocol specifications, except in some privacy-preserving models that care about uniformity (and not confidentiality!), but that is not the point I’d like to make here. What we care about as engineers and practitioners is that the protocol works on the Internet. And we want the protocol to be extensible, which means that it should also work in the distant future for new and potentially unexpected needs that require extended features. To achieve this, QUIC, like TCPLS, takes advantage of another property of cryptographic algorithms.
Modern cryptographic algorithms are said to be probabilistic. Basically, they’re randomized algorithms producing an output that looks random to anyone who does not hold the secret. And if you apply the algorithm twice to the same input, you should get two different, random-looking outputs that only make sense to the secret holder(s).
And that’s exactly the property we require for header information: it should look random and be unpredictable, so that nobody on the path can derive assumptions about the protocol’s inner workings. Such assumptions can become wrong with time and contribute to Internet ossification, breaking the ability to extend or change how protocols work over the Internet.
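To make the "same input, different output" property concrete, here is a toy sketch in Python. It is not the construction used by TLS 1.3, QUIC or TCPLS (real stacks use an AEAD such as AES-GCM with a unique per-record nonce); it only illustrates the idea with the standard library. The header string and the function names are made up for the example.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from (key, nonce) using HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized encryption: a fresh random nonce is drawn on every call.

    Illustration only: there is no integrity protection, do not use this for real traffic.
    """
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Recover the plaintext; only possible for whoever holds `key`."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(32)
header = b"stream=4;offset=0;fin=0"  # hypothetical control fields

c1 = toy_encrypt(key, header)
c2 = toy_encrypt(key, header)

print(c1 != c2)                        # the same header encrypts to different bytes
print(toy_decrypt(key, c1) == header)  # the key holder still recovers it
```

A middlebox that sees `c1` and `c2` has no stable bit pattern to match on, which is exactly what keeps it from building assumptions about the header layout.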
QUIC solves the extensibility issue that TCP has. And you know what? Since TLS 1.3, we can do the same for TCP!
TCPLS is even more amazing
TCPLS solves the same issue as QUIC, is in userspace, provides new and modern transport features that directly work on the Internet, and is designed to keep working over time thanks to the randomized protocol information.
But unlike QUIC, which works above UDP, TCPLS takes advantage of decades of performance optimizations made to TCP and also has a few other tricks up its sleeve.
What about the famous Head of Line (HoL) blocking avoidance that QUIC provides with independent transport streams? TCPLS has that too!
TCPLS provides a stream abstraction to the application layer. Those streams can be steered over different TCP connections belonging to the same TCPLS session, or multiplexed over the same TCP connection. That’s up to the application to choose.
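The trade-off between the two choices can be sketched in a few lines of Python. This is a deliberately simplified model, not the TCPLS API: the stream and connection identifiers and the `stalled_streams` helper are hypothetical. It only captures the fact that a lost TCP segment stalls every stream that shares that TCP connection.

```python
def stalled_streams(stream_to_conn: dict, conn_with_loss: int) -> set:
    """Return the streams that must wait while `conn_with_loss` recovers a lost segment."""
    return {stream for stream, conn in stream_to_conn.items() if conn == conn_with_loss}


# Three streams multiplexed over a single TCP connection (connection 0).
multiplexed = {1: 0, 2: 0, 3: 0}

# The same three streams steered over three TCP connections of one session.
steered = {1: 0, 2: 1, 3: 2}

print(stalled_streams(multiplexed, 0))  # {1, 2, 3}: every stream is blocked
print(stalled_streams(steered, 0))      # {1}: only the affected stream waits
```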
The TCPLS design offers an extremely lightweight trick to join multiple TCP connections together and give each of them an independent cryptographic context derived from the same key. These separate crypto contexts are a requirement for HoL blocking avoidance because the different connections need to encrypt and decrypt independently.
Assuming we use TCP Fast Open (TFO), TCPLS also has a 1-RTT initial handshake, as QUIC does. It provides multipath capabilities similar to MPTCP straight out of the box, as well as connection migration and handover after an outage (we call that feature failover). Whether to use bandwidth aggregation or a regular single-path connection is also up to the application. Some applications could benefit from multiple IP paths, others do not care, and that’s fine when options are on the table!
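One generic way to obtain independent contexts from a single shared secret is to run a key derivation function with a different label per connection. The Python sketch below uses HKDF-Expand (RFC 5869) with SHA-256 for this. It is an illustration of the general idea only: the label format, the `connection_context` helper and the connection numbering are assumptions made for the example, not the derivation that TCPLS actually specifies.

```python
import hashlib
import hmac


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand from RFC 5869 with SHA-256 (valid for length <= 255 * 32)."""
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def connection_context(session_secret: bytes, conn_id: int) -> dict:
    """Derive a key and IV for one TCP connection (hypothetical label format)."""
    label = b"example conn " + conn_id.to_bytes(4, "big")
    return {
        "key": hkdf_expand(session_secret, label + b" key", 32),
        "iv": hkdf_expand(session_secret, label + b" iv", 12),
    }


session_secret = bytes(range(32))  # placeholder for the secret agreed in the handshake

ctx0 = connection_context(session_secret, 0)
ctx1 = connection_context(session_secret, 1)

print(ctx0["key"] != ctx1["key"])  # each connection encrypts under its own key
```

Because both endpoints hold the same session secret, each side can compute the context for a new connection locally, and records on one connection can be decrypted without waiting for data on another.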
Let’s compare the pair
Recently, we conducted a few benchmark experiments to compare TCPLS with a few QUIC implementations.
The testbed consisted of three servers, each equipped with an Intel Xeon CPU E5-2630 2.40 GHz. Two of these machines were used as a client and a server, and the third one was used as a middlebox on the path. Each machine was equipped with an Intel XL710 2×40 Gbps NIC. For all experiments, with all implementations, we used a single thread on each machine to run the client and server.
So, what could explain TCPLS reaching at least twice the raw throughput of QUIC? There are a few technical reasons:
- QUIC uses UDP. Because of this, a QUIC packet encrypts at most 1,472 bytes of payload at once if it is to fit in a standard 1,500-byte MTU. In practice, however, even that can break over the Internet, and Google’s QUIC deployment found 1,350 bytes to be a good compromise that reduces the likelihood of a broken connection. This is low compared to the maximum of 16,384 bytes that a TCPLS record can carry, which means that, using the same crypto algorithm over QUIC and TCPLS, encrypting and decrypting a stream of data goes faster on TCPLS.
- The UDP stack is much less optimized than the TCP stack. The most prominent example is the sendmsg/recvmsg interface, which handles only one UDP packet per call. These are system calls, and calling them too often incurs a serious performance penalty because of the context switch between the kernel and userspace. On Linux, this context switch has recently become more costly because of the defence implemented against the Meltdown exploit. This hurts QUIC more than TCPLS because QUIC requires more context switches than TCPLS to carry the same quantity of data.
- Continuing on the previous issue, QUIC has all of its transport logic in userspace, whereas TCPLS still has some of its transport logic in the kernel. The reliability of the stream is still ensured by the kernel’s TCP implementation, so no context switch is required to ensure reliability. QUIC has to context switch for data reliability, which implies a penalty. For TCPLS, some TCPLS-level acks still need to be processed in the failover case. However, acknowledgments that protect connection reliability are needed far less frequently than acknowledgments for data reliability.
- TCPLS has better support for transport offload in existing network interface cards (NICs). TCPLS uses TCP segmentation offload (TSO), which is implemented in the NIC. QUIC uses Generic Segmentation Offload (GSO), which is a software technique applied before packets reach the NIC.
- TCPLS’s design supports a contiguous zero-copy receiver. QUIC does not. At best, QUIC can support a fragmented zero-copy strategy, which has reduced performance and is more difficult to manipulate. However, the tricks we developed within TCPLS to enable a contiguous zero-copy receiver can be ported to the QUIC design.
A sixth reason, which we could not evaluate, is that the Internet is more TCP friendly. Firewalls and other middleboxes are accustomed to TCP but disregard UDP packets in many similar usage scenarios, potentially leading to connection reachability issues and performance downgrades. The impact of this on QUIC is yet to be determined and understood.
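The first two reasons in the list above lend themselves to a back-of-the-envelope calculation. The Python sketch below counts how many encryption units are needed to move 1 GiB of application data with 1,350-byte QUIC payloads versus 16,384-byte records. It makes simplifying assumptions that are ours, not measurements: IPv4 without options, no header or authentication-tag overhead, one sendmsg call per QUIC packet, and one write per TCPLS record (a single TCP write can in fact carry more).

```python
ETHERNET_MTU = 1500
IPV4_HEADER = 20   # assumes IPv4 without options
UDP_HEADER = 8
MAX_UDP_PAYLOAD = ETHERNET_MTU - IPV4_HEADER - UDP_HEADER

QUIC_PAYLOAD = 1350   # compromise value quoted in the text
TLS_RECORD = 16384    # maximum record payload quoted in the text


def units_needed(total_bytes: int, unit_size: int) -> int:
    """Number of packets or records needed to carry total_bytes (ceiling division)."""
    return -(-total_bytes // unit_size)


total = 1 << 30  # 1 GiB of application data

quic_packets = units_needed(total, QUIC_PAYLOAD)
tcpls_records = units_needed(total, TLS_RECORD)

print(MAX_UDP_PAYLOAD)                         # 1472
print(quic_packets)                            # 795365
print(tcpls_records)                           # 65536
print(round(quic_packets / tcpls_records, 1))  # 12.1
```

Under these assumptions, QUIC performs roughly twelve times as many encryption operations and system calls as TCPLS for the same amount of data. The real gap depends on the implementation, so this should be read as an order of magnitude rather than a prediction.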
All in all, we see why TCPLS outperforms QUIC. We are impressed by the raw performance some of the QUIC stacks have achieved, especially MsQuic and Quicly, given the handicap they start with.
It is possible that, with time, the gap between QUIC and TCPLS narrows. However, we’re unconvinced, for the reasons described above, that QUIC could ever catch up with TCPLS without losing part of its current identity in the process (for example, its userspace flexibility).
Oh, and guess what? TCPLS can be built on top of any existing TLS 1.3 library. Everything can be efficient, effective and quick to develop!
Learn more about TCPLS from our research paper with my colleagues Emery Assogba, Maxime Piraux, Korian Edeline, Benoit Donnet and Olivier Bonaventure. We are also making ongoing efforts to propose a design to the IETF. We welcome feedback to help us make this a reality!
Contributors: Maxime Piraux, Korian Edeline, Benoit Donnet.
Florentin Rochet is an Assistant Professor in Computer Security at the University of Namur and a researcher in privacy and security mainly applied to computer networks.