Efficient multipath transport with QUIC in large-scale video services

8 Dec 2021

Category: Tech matters

Contributor: Yanmei Liu

Streaming video has become commonplace in our daily lives. Unfortunately, when we engineers are entrusted with delivering smooth video streaming to our users, we face numerous challenges from ‘last-mile’ wireless connections. Poor wireless connectivity, whether Wi-Fi or cellular, causes a variety of problems, ranging from slow video start and high rebuffering rates to complete connection loss.

One way to fundamentally address the last-mile problem is to employ multipath transport. Most smart devices are equipped with multiple radio interfaces, including Wi-Fi and cellular. Therefore, if we have a lightweight multipath transport protocol that our apps can easily use, we can aggregate independent links for more bandwidth and make video delivery more robust.

The topic of multipath transport is not new. We already have a well-known multipath protocol, MPTCP, defined in RFC 6824. In the real world, however, deployment of multipath transport over the public Internet has traditionally been slow because of MPTCP’s deployment difficulties and performance issues.

Multipath QUIC, as an end-to-end network solution, promises to change the landscape. Our research and engineering team at Alibaba developed XLINK, a multipath extension for QUIC, with additional advanced scheduling features to support video. Our large-scale A/B tests in Taobao video show significant improvements in tail latency and video rebuffering rate.

XLINK offers several key advantages over MPTCP

Figure 1 shows the network stack. At the top, we have application protocols such as HTTP/3, RTMP, and RPC. They use QUIC as the transport layer. One layer below, we implement the multipath extension of QUIC, which includes path management, scheduling, loss detection, and recovery. All these components are integrated with the app, so kernel changes are not necessary.

Infographic showing XLINK in network stack.
Figure 1 — Network Stack showing XLINK.

The user-space nature of QUIC makes deployment much easier than with MPTCP, because it can be done much like an app release. Upgrading and testing are also much easier. These advantages help accelerate innovation. For example, we can quickly run A/B tests by tuning some parameters. On the client side, upgrades can be daily. On the server side, we can update the configuration hourly. XLINK also inherits QUIC’s end-to-end encryption and its opaqueness to middleboxes, which allows XLINK’s packets to traverse the Internet much more easily than MPTCP’s.

Infographic showing network design and role of multipath scheduling.
Figure 2 — Because multipath QUIC is integrated with the app, this allows us to design highly customized algorithms that cater to the needs of different apps where the most important part is multipath scheduling.

Scheduling is critical to multipath transport

We found that applying existing multipath scheduling solutions over QUIC to our large-scale video services was far from straightforward. To be worth deploying, multipath should perform no worse than a single path. However, our deployment showed that the default minimum round-trip time (RTT) scheduler could be 28% slower at the 99th percentile than single-path delivery over the better path.
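For readers unfamiliar with the baseline, here is a minimal sketch of what a minimum-RTT scheduler generally does: each packet goes to the lowest-RTT path that still has congestion window space. The `Path` fields, the function names, and the example numbers are illustrative assumptions, not XLINK's implementation.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical per-path state; field names are assumptions for illustration."""
    name: str
    srtt_ms: float       # smoothed RTT estimate
    cwnd_bytes: int      # congestion window
    inflight_bytes: int  # bytes sent but not yet acknowledged

    def has_room(self, size: int) -> bool:
        return self.inflight_bytes + size <= self.cwnd_bytes


def pick_min_rtt_path(paths, packet_size):
    """Return the lowest-RTT path with window space, or None if all are full."""
    candidates = [p for p in paths if p.has_room(packet_size)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.srtt_ms)


def schedule(paths, packet_sizes):
    """Assign packets to paths in order; stop when every path is window-limited."""
    assignment = []
    for size in packet_sizes:
        path = pick_min_rtt_path(paths, size)
        if path is None:
            break
        path.inflight_bytes += size
        assignment.append(path.name)
    return assignment


paths = [Path("wifi", 20.0, 3000, 0), Path("lte", 60.0, 6000, 0)]
print(schedule(paths, [1200] * 5))  # ['wifi', 'wifi', 'lte', 'lte', 'lte']
```

Once the faster path's window fills, the remaining packets spill onto the slower path. This is exactly the situation in which earlier data can end up waiting on a slow or degrading link.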

Efficiently using aggregated wireless resources turns out to be much harder than expected. One of the major hurdles is the multipath head-of-line (MP-HoL) blocking issue caused by fast-varying and heterogeneous paths. The blocking happens when packets sent earlier on the slow path arrive later than packets sent on the fast path, causing out-of-order arrival. Out-of-order packets cannot be delivered to the application, so data from the fast path must wait. Significant heterogeneity across Wi-Fi, LTE, and 5G, as well as frequent handoffs of mobile terminals in our context, further aggravates this issue.
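The blocking effect described above can be illustrated with a small sketch. It assumes a single ordered byte stream, in which a packet can be handed to the application only once every earlier packet has arrived. The function names and arrival times are made up for illustration.

```python
def in_order_delivery_times(arrivals_ms):
    """Given arrival times indexed by sequence number, return when each packet
    can be delivered in order: the latest arrival among it and all earlier packets."""
    delivered = []
    latest = 0.0
    for t in arrivals_ms:
        latest = max(latest, t)
        delivered.append(latest)
    return delivered


def blocking_delay(arrivals_ms):
    """Extra time each packet waits in the receive buffer because of reordering."""
    delivered = in_order_delivery_times(arrivals_ms)
    return [d - a for d, a in zip(delivered, arrivals_ms)]


# Packet 0 was sent on the slow path and arrives at 100 ms; packets 1 to 3
# were sent on the fast path and arrive at 20, 22, and 24 ms.
arrivals = [100.0, 20.0, 22.0, 24.0]
print(in_order_delivery_times(arrivals))  # [100.0, 100.0, 100.0, 100.0]
print(blocking_delay(arrivals))           # [0.0, 80.0, 78.0, 76.0]
```

In this example, the fast path's packets sit in the buffer for up to 80 ms, so the application sees the latency of the slow path even though most of the data travelled over the fast one.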

One approach to overcoming MP-HoL blocking is to send duplicate packets. However, redundant packets also raise the traffic cost, especially for video, where traffic volume is very high (for example, 1 GB could cost about 10 cents).

In terms of scheduling, we have two operational challenges at the same time:

  • Optimal user-perceived quality of experience (QoE).
  • Minimized cost overhead.

If you ask who knows the performance best, the answer is the video player, because video players know if a video rebuffers. On the other hand, the cost is determined by our scheduling algorithm. To optimize performance and cost at the same time, we need the player to work with the scheduler remotely. In other words, there is a knowledge asymmetry between the client and server and closing this knowledge gap is key.

Quality of experience

In XLINK, we introduced QoE-driven multipath scheduling based on the QoE-Control-Signal framework.

When acknowledging (ACKing) received packets, the XLINK client requests the video player’s QoE information, such as cached frames and frame rate, which allows the server to compute the play time left in the client’s video playback buffer. XLINK then attaches this QoE information to the QoE control signal and sends it to the server along with the ACK. After receiving the ACK, the server passes the QoE control signal to its multipath scheduler. The scheduler uses the QoE feedback to control the aggressiveness of packet reinjection.
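The feedback loop above can be sketched roughly as follows. The play time left is the number of cached frames divided by the frame rate, and the server uses it to decide whether to reinject. The two-threshold on/off rule, the threshold values, and all class and field names here are assumptions made for illustration, not XLINK's actual control logic.

```python
from dataclasses import dataclass


@dataclass
class QoeSignal:
    """Hypothetical QoE control signal carried alongside an ACK."""
    cached_frames: int
    framerate_fps: float


def play_time_left_s(sig: QoeSignal) -> float:
    """Seconds of video left in the client's playback buffer."""
    if sig.framerate_fps <= 0:
        return 0.0
    return sig.cached_frames / sig.framerate_fps


class ReinjectionController:
    """Turn reinjection on when the buffer runs low, off once it is healthy.

    low_s and high_s are illustrative thresholds, not values from XLINK.
    """

    def __init__(self, low_s: float = 1.0, high_s: float = 3.0):
        self.low_s = low_s
        self.high_s = high_s
        self.enabled = False

    def on_ack(self, sig: QoeSignal) -> bool:
        remaining = play_time_left_s(sig)
        if remaining < self.low_s:
            self.enabled = True    # buffer nearly empty: reinject aggressively
        elif remaining > self.high_s:
            self.enabled = False   # buffer healthy: avoid redundant traffic
        return self.enabled


ctrl = ReinjectionController()
signals = [QoeSignal(90, 30), QoeSignal(15, 30), QoeSignal(60, 30), QoeSignal(120, 30)]
print([ctrl.on_ack(s) for s in signals])  # [False, True, True, False]
```

Using two thresholds rather than one keeps the decision from flapping when the buffer hovers near a single cut-off. A real scheduler could equally scale reinjection gradually instead of switching it on and off.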

Figure 3a shows the dynamics of the video buffer level when one of the paths is fading while the other is stable. Figure 3b shows the case without reinjection: data is blocked by the sudden failure of a path, the video buffer runs out of data, and the video stalls. Figure 3c shows reinjection without QoE feedback. Reinjection helps, but it also triggers many unnecessary reinjections, which keep increasing the traffic overhead. Figure 3d shows reinjection with QoE feedback, which reinjects only when necessary. Reinjection is turned on when Path 1 fails and stops after Path 1 recovers, which allows us to balance cost and performance.

Four line graphs showing the dynamics of video buffer level as per injection state.
Figure 3 — Dynamics of video buffer level as per injection state.

Multipath QUIC improves performance by more than 19%

To verify the effectiveness of XLINK, we performed a large-scale A/B test in Taobao mobile. We formed two comparison groups running in parallel, with 100k participants who upgraded to Taobao mobile test versions. The dataset consisted of more than three million video plays, collected over a two-week period.

As shown in Figure 4, multipath QUIC consistently outperformed single path. We observed an improvement of more than 2.3% in median request completion time, more than 9.4% at the 95th percentile, and more than 19% at the 99th percentile. For more information, please read our SIGCOMM’21 paper.

Bar graph showing data from a large scale two week test verifying effectiveness of XLINK per day.
Figure 4 — Data from a large-scale two-week test verifying effectiveness of XLINK.

Right now, we are in the process of integrating XLINK into several of Alibaba’s flagship products, including Taobao, Dingtalk, and AliCloud edge networks. On the standardization side, we have started collaborating with researchers and engineers from Private Octopus Inc, UCLouvain, and Ericsson on a newly merged QUIC draft whose scope is more focused on the core multipath functionality.

Yunfei Ma is a staff engineer and research manager at Alibaba.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
