rfc9840.original | rfc9840.txt | |||
---|---|---|---|---|
Network Working Group M. Bagnulo | Internet Research Task Force (IRTF) M. Bagnulo | |||
Internet-Draft A. Garcia-Martinez | Request for Comments: 9840 A. Garcia-Martinez | |||
Intended status: Experimental Universidad Carlos III de Madrid | Category: Experimental Universidad Carlos III de Madrid | |||
Expires: 6 August 2025 G. Montenegro | ISSN: 2070-1721 G. Montenegro | |||
P. Balasubramanian | P. Balasubramanian | |||
Confluent | Confluent | |||
2 February 2025 | August 2025 | |||
rLEDBAT: receiver-driven Low Extra Delay Background Transport for TCP | rLEDBAT: Receiver-Driven Low Extra Delay Background Transport for TCP | |||
draft-irtf-iccrg-rledbat-10 | ||||
Abstract | Abstract | |||
This document specifies rLEDBAT, a set of mechanisms that enable the | This document specifies receiver-driven Low Extra Delay Background | |||
execution of a less-than-best-effort congestion control algorithm for | Transport (rLEDBAT) -- a set of mechanisms that enable the execution | |||
TCP at the receiver end. This document is a product of the Internet | of a less-than-best-effort congestion control algorithm for TCP at | |||
the receiver end. This document is a product of the Internet | ||||
Congestion Control Research Group (ICCRG) of the Internet Research | Congestion Control Research Group (ICCRG) of the Internet Research | |||
Task Force (IRTF). | Task Force (IRTF). | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for examination, experimental implementation, and | |||
evaluation. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document defines an Experimental Protocol for the Internet | |||
and may be updated, replaced, or obsoleted by other documents at any | community. This document is a product of the Internet Research Task | |||
time. It is inappropriate to use Internet-Drafts as reference | Force (IRTF). The IRTF publishes the results of Internet-related | |||
material or to cite them other than as "work in progress." | research and development activities. These results might not be | |||
suitable for deployment. This RFC represents the consensus of the | ||||
Internet Congestion Control Research Group of the Internet Research | ||||
Task Force (IRTF). Documents approved for publication by the IRSG | ||||
are not candidates for any level of Internet Standard; see Section 2 | ||||
of RFC 7841. | ||||
This Internet-Draft will expire on 6 August 2025. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9840. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. | |||
described in Section 4.e of the Trust Legal Provisions and are | ||||
provided without warranty as described in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. Motivations for rLEDBAT | 2. Conventions and Terminology | |||
3. rLEDBAT mechanisms | 3. Motivations for rLEDBAT | |||
3.1. Controlling the receive window | 4. rLEDBAT Mechanisms | |||
3.1.1. Avoiding window shrinking | 4.1. Controlling the Receive Window | |||
3.1.2. Setting the Window Scale Option | 4.1.1. Avoiding Window Shrinking | |||
3.2. Measuring delays | 4.1.2. Setting the Window Scale Option | |||
3.2.1. Measuring RTT to estimate the queueing delay | 4.2. Measuring Delays | |||
3.2.2. Measuring one way delay to estimate the queueing | 4.2.1. Measuring RTT to Estimate the Queuing Delay | |||
delay | 4.2.2. Measuring One-Way Delay to Estimate the Queuing Delay | |||
3.3. Detecting packet losses and retransmissions | 4.3. Detecting Packet Losses and Retransmissions | |||
4. Experiment Considerations | 5. Experiment Considerations | |||
4.1. Status of the experiment at the time of this writing. | 5.1. Status of the Experiment at the Time of This Writing | |||
5. Security Considerations | 6. Security Considerations | |||
6. IANA Considerations | 7. IANA Considerations | |||
7. Acknowledgements | 8. References | |||
8. Informative References | 8.1. Normative References | |||
Appendix A. Terminology | 8.2. Informative References | |||
Appendix B. rLEDBAT pseudo-code | Appendix A. rLEDBAT Pseudocode | |||
Authors' Addresses | Acknowledgments | |||
Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
LEDBAT (Low Extra Delay Background Transport) [RFC6817] is a | LEDBAT (Low Extra Delay Background Transport) [RFC6817] is a | |||
congestion-control algorithm used for less-than-best-effort (LBE) | congestion control algorithm used for less-than-best-effort (LBE) | |||
traffic. | traffic. | |||
When LEDBAT traffic shares a bottleneck with other traffic using | When LEDBAT traffic shares a bottleneck with other traffic using | |||
standard congestion control algorithms (for example, TCP traffic | standard congestion control algorithms (for example, TCP traffic | |||
using Cubic[RFC9438], hereafter referred as standard-TCP for short), | using CUBIC [RFC9438], hereafter referred to as "standard-TCP" for | |||
it reduces its sending rate earlier and more aggressively than | short), it reduces its sending rate earlier and more aggressively | |||
standard-TCP congestion control, allowing other non-background | than standard-TCP congestion control, allowing other non-background | |||
traffic to use more of the available capacity. In the absence of | traffic to use more of the available capacity. In the absence of | |||
competing traffic, LEDBAT aims to make an efficient use of the | competing traffic, LEDBAT aims to make efficient use of the available | |||
available capacity, while keeping the queuing delay within predefined | capacity, while keeping the queuing delay within predefined bounds. | |||
bounds. | ||||
LEDBAT reacts both to packet loss and to variations in delay. With | LEDBAT reacts to both packet loss and variations in delay. With | |||
respect to packet loss, LEDBAT reacts with a multiplicative decrease, | respect to packet loss, LEDBAT reacts with a multiplicative decrease, | |||
similar to most TCP congestion controllers. Regarding delay, LEDBAT | similar to most TCP congestion controllers. Regarding delay, LEDBAT | |||
aims for a target queueing delay. When the measured current queueing | aims for a target queuing delay. When the measured current queuing | |||
delay is below the target, LEDBAT increases the sending rate and when | delay is below the target, LEDBAT increases the sending rate, and | |||
the delay is above the target, it reduces the sending rate. LEDBAT | when the delay is above the target, it reduces the sending rate. | |||
estimates the queuing delay by subtracting the estimated base one-way | LEDBAT estimates the queuing delay by subtracting the estimated base | |||
delay (i.e. the one-way delay in the absence of queues) from the | one-way delay (i.e., the one-way delay in the absence of queues) from | |||
measured current one-way delay. | the measured current one-way delay. | |||
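The delay-based reaction described in this paragraph can be illustrated with a minimal sketch. It follows RFC 6817, where the queuing delay is the current delay minus the base delay. The function names, the clamping at zero, and the numeric values are assumptions made for illustration only; the sketch shows only the direction of the adjustment, not the gain computation of any particular algorithm.

```python
def estimate_queuing_delay(current_delay_ms: float, base_delay_ms: float) -> float:
    """Queuing delay: what the current sample adds on top of the base delay."""
    # Clamping at zero is an assumption of this sketch, to tolerate noisy
    # samples that fall below the estimated base delay.
    return max(0.0, current_delay_ms - base_delay_ms)


def rate_adjustment(queuing_delay_ms: float, target_ms: float) -> int:
    """Return +1 to increase, -1 to decrease, or 0 to hold the sending rate."""
    if queuing_delay_ms < target_ms:
        return 1   # below the target: increase the sending rate
    if queuing_delay_ms > target_ms:
        return -1  # above the target: reduce the sending rate
    return 0


if __name__ == "__main__":
    base = 40.0    # hypothetical base one-way delay, in milliseconds
    target = 60.0  # hypothetical target queuing delay, in milliseconds
    for current in (45.0, 100.0, 130.0):
        q = estimate_queuing_delay(current, base)
        print(f"current={current} queuing={q} adjust={rate_adjustment(q, target)}")
```

The reaction to packet loss (a multiplicative decrease) is deliberately left out of this sketch.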
The LEDBAT specification [RFC6817] defines the LEDBAT congestion- | The LEDBAT specification [RFC6817] defines the LEDBAT congestion | |||
control algorithm, implemented in the sender to control its sending | control algorithm, implemented in the sender to control its sending | |||
rate. LEDBAT is specified in a protocol and layer agnostic manner. | rate. LEDBAT is specified in a protocol-agnostic and layer-agnostic | |||
manner. | ||||
LEDBAT++ [I-D.irtf-iccrg-ledbat-plus-plus] is also an LBE congestion | LEDBAT++ [LEDBAT++] is also an LBE congestion control algorithm that | |||
control algorithm which is inspired by LEDBAT while addressing | is inspired by LEDBAT while addressing several problems identified | |||
several problems identified with the original LEDBAT specification. | with the original LEDBAT specification. In particular, the | |||
In particular the differences between LEDBAT and LEDBAT++ include: i) | differences between LEDBAT and LEDBAT++ include the following: | |||
LEDBAT++ uses the round-trip-time (RTT) (as opposed to the one way | ||||
delay used in LEDBAT) to estimate the queuing delay; ii) LEDBAT++ | ||||
uses an Additive Increase/Multiplicative Decrease algorithm to | ||||
achieve inter-LEDBAT++ fairness and avoid the late-comer advantage | ||||
observed in LEDBAT; iii) LEDBAT++ performs periodic slowdowns to | ||||
improve the measurement of the base delay; iv) LEDBAT++ is defined | ||||
for TCP. | ||||
In this specification, we describe rLEDBAT, a set of mechanisms that | i) LEDBAT++ uses the round-trip time (RTT) (as opposed to the one- | |||
enable the execution of an LBE delay-based congestion control | way delay used in LEDBAT) to estimate the queuing delay. | |||
algorithm such as LEDBAT or LEDBAT++ at the receiver end of a TCP | ||||
connection. | ii) LEDBAT++ uses an additive increase/multiplicative decrease | |||
algorithm to achieve inter-LEDBAT++ fairness and avoid the | ||||
latecomer advantage observed in LEDBAT. | ||||
iii) LEDBAT++ performs periodic slowdowns to improve the measurement | ||||
of the base delay. | ||||
iv) LEDBAT++ is defined for TCP. | ||||
In this specification, we describe receiver-driven Low Extra Delay | ||||
Background Transport (rLEDBAT) -- a set of mechanisms that enable the | ||||
execution of an LBE delay-based congestion control algorithm such as | ||||
LEDBAT or LEDBAT++ at the receiver end of a TCP connection. | ||||
The consensus of the Internet Congestion Control Research Group | The consensus of the Internet Congestion Control Research Group | |||
(ICCRG) is to publish this document to encourage further | (ICCRG) is to publish this document to encourage further | |||
experimentation and review of rLEDBAT. This document is not an IETF | experimentation and review of rLEDBAT. This document is not an IETF | |||
product and is not a standard. The status of this document is | product and is not an Internet Standards Track specification. The | |||
experimental. In section 4 titled Experiment Considerations, we | status of this document is Experimental. In Section 5 ("Experiment | |||
describe the purpose of the experiment and its current status. | Considerations"), we describe the purpose of the experiment and its | |||
current status. | ||||
2. Motivations for rLEDBAT | 2. Conventions and Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | ||||
"OPTIONAL" in this document are to be interpreted as described in | ||||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | ||||
capitals, as shown here. | ||||
We use the following abbreviations throughout the text and include | ||||
them here for the reader's convenience: | ||||
RCV.WND: The value included in the Receive Window field of the TCP | ||||
header (whose computation is modified by this specification). | ||||
SND.WND: The TCP sender's window. | ||||
cwnd: The congestion window as computed by the congestion control | ||||
algorithm running at the TCP sender. | ||||
RLWND: The window value calculated by the rLEDBAT algorithm. | ||||
fcwnd: The value that a standard RFC793bis TCP receiver calculates | ||||
to set in the receive window for flow control purposes. | ||||
RCV.HGH: The highest sequence number corresponding to a received | ||||
byte of data at one point in time. | ||||
TSV.HGH: The Timestamp Value (TSval) [RFC7323] corresponding to the | ||||
segment in which RCV.HGH was carried at that point in time. | ||||
SEG.SEQ: The sequence number of the last received segment. | ||||
TSV.SEQ: The TSval value of the last received segment. | ||||
3. Motivations for rLEDBAT | ||||
rLEDBAT enables new use cases and new deployment models, fostering | rLEDBAT enables new use cases and new deployment models, fostering | |||
the use of LBE traffic. The following scenarios are enabled by | the use of LBE traffic. The following scenarios are enabled by | |||
rLEDBAT: | rLEDBAT: | |||
Content Delivery Networks and more sophisticated file distribution | Content Delivery Networks (CDNs) and more sophisticated file | |||
scenarios: Consider the case where the source of a file to be | distribution scenarios: | |||
distributed (e.g., a software developer that wishes to distribute | Consider the case where the source of a file to be distributed | |||
a software update) would prefer to use LBE and it enables LEDBAT/ | (e.g., a software developer that wishes to distribute a software | |||
LEDBAT++ in the servers containing the source file. However, | update) would prefer to use LBE and enables LEDBAT/LEDBAT++ in the | |||
because the file is being distributed through a CDN that does not | servers containing the source file. However, because the file is | |||
implement LBE congestion control, the result is that the file | being distributed through a CDN that does not implement LBE | |||
transfers originated from CDN surrogates will not be using LBE. | congestion control, the result is that the file transfers | |||
originated from CDN surrogates will not be using LBE. | ||||
Interestingly enough, in the case of the software update, the | Interestingly enough, in the case of the software update, the | |||
developer may also control the software performing the download in | developer may also control the software performing the download in | |||
the client, the receiver of the file, but because current LEDBAT/ | the client (the receiver of the file), but because current LEDBAT/ | |||
LEDBAT++ are sender-based algorithms, controlling the client is | LEDBAT++ are sender-based algorithms, controlling the client is | |||
not enough to enable LBE congestion control in the communication. | not enough to enable LBE congestion control in the | |||
rLEDBAT would enable the use of LBE traffic class for file | communication. rLEDBAT would enable the use of an LBE traffic | |||
distribution in this setup. | class for file distribution in this setup. | |||
Interference from proxies and other middleboxes: Proxies and other | Interference from proxies and other middleboxes: | |||
middleboxes are commonplace in the Internet. For instance, in the | Proxies and other middleboxes are commonplace in the Internet. | |||
case of mobile networks, proxies are frequently used. In the case | For instance, in the case of mobile networks, proxies are | |||
of enterprise networks, it is common to deploy corporate proxies | frequently used. In the case of enterprise networks, it is common | |||
for filtering and firewalling. In the case of satellite links, | to deploy corporate proxies for filtering and firewalling. In the | |||
Performance Enhancement Proxies (PEPs) are deployed to mitigate | case of satellite links, Performance Enhancing Proxies (PEPs) are | |||
the effect of the long delay in TCP connection. These proxies | deployed to mitigate the effect of long delays in a TCP | |||
terminate the TCP connection on both ends and prevent the use of | connection. These proxies terminate the TCP connection on both | |||
LBE congestion control in the segment between the proxy and the | ends and prevent the use of LBE congestion control in the segment | |||
sink of the content, the client. By enabling rLEDBAT, clients | between the proxy and the sink of the content, the client. By | |||
would be able to enable LBE traffic between them and the proxy. | enabling rLEDBAT, clients can then enable LBE traffic between them | |||
and the proxy. | ||||
Receiver-defined preferences. It is frequent that the bottleneck | Receiver-defined preferences: | |||
of the communication is the access link. This is particularly | Frequently, the access link is the communication bottleneck. This | |||
true in the case of mobile devices. It is then especially | is particularly true in the case of mobile devices. It is then | |||
relevant for mobile devices to properly manage the capacity of the | especially relevant for mobile devices to properly manage the | |||
access link. With current technologies, it is possible for the | capacity of the access link. With current technologies, it is | |||
mobile device to use different congestion control algorithms | possible for the mobile device to use different congestion control | |||
expressing different preferences for the traffic. For instance, a | algorithms expressing different preferences for the traffic. For | |||
device can choose to use standard-TCP for some traffic and to use | instance, a device can choose to use standard-TCP for some traffic | |||
LEDBAT/LEDBAT++ for other traffic. However, this would only | and use LEDBAT/LEDBAT++ for other traffic. However, this would | |||
affect the outgoing traffic since both standard-TCP and LEDBAT/ | only affect the outgoing traffic, since both standard-TCP and | |||
LEDBAT++ are sender-driven. The mobile device has no means to | LEDBAT/LEDBAT++ are driven by the sender. The mobile device has | |||
manage the traffic in the down-link, which is in most cases, the | no means to manage the traffic in the downlink, which is, in most | |||
communication bottleneck for a typical eye-ball end-user. rLEDBAT | cases, the communication bottleneck for a typical "eyeball" end | |||
enables the mobile device to selectively use LBE traffic class for | user. rLEDBAT enables the mobile device to selectively use an LBE | |||
some of the incoming traffic. For instance, by using rLEDBAT, a | traffic class for some of the incoming traffic. For instance, by | |||
user can use regular standard-TCP/UDP for video stream (e.g., | using rLEDBAT, a user can use regular standard-TCP/UDP for a video | |||
Youtube) and use rLEDBAT for other background file download. | stream (e.g., YouTube) and use rLEDBAT for other background file | |||
downloads. | ||||
3. rLEDBAT mechanisms | 4. rLEDBAT Mechanisms | |||
rLEDBAT provides the mechanisms to implement an LBE congestion | rLEDBAT provides the mechanisms to implement an LBE congestion | |||
control algorithm at the receiver-end of a TCP connection. The | control algorithm at the receiver end of a TCP connection. The | |||
rLEDBAT receiver controls the sender's rate through the Receive | rLEDBAT receiver controls the sender's rate through the Receive | |||
Window announced by the receiver in the TCP header. | Window announced by the receiver in the TCP header. | |||
rLEDBAT assumes that the sender is a standard TCP sender. rLEDBAT | rLEDBAT assumes that the sender is a standard TCP sender. rLEDBAT | |||
does not require any rLEDBAT-specific modifications to the TCP | does not require any rLEDBAT-specific modifications to the TCP | |||
sender. The envisioned deployment model for rLEDBAT is that the | sender. The envisioned deployment model for rLEDBAT is that the | |||
clients implement rLEDBAT and this enables rLEDBAT in communications | clients implement rLEDBAT and this enables rLEDBAT in communications | |||
with existent standard TCP senders. In particular, the sender MUST | with existing standard TCP senders. In particular, the sender MUST | |||
implement [RFC9293] and it also MUST implement the Time Stamp Option | implement [RFC9293] and also MUST implement the TCP Timestamps (TS) | |||
as defined in [RFC7323]. Also, the sender should implement some of | option as defined in [RFC7323]. Also, the sender should implement | |||
the standard congestion control mechanisms, such as Cubic [RFC9438] | some of the standard congestion control mechanisms, such as CUBIC | |||
or New Reno [RFC5681]. | [RFC9438] or NewReno [RFC5681]. | |||
rLEDBAT does not define a new congestion control algorithm. The LBE | rLEDBAT does not define a new congestion control algorithm. The LBE | |||
congestion control algorithm executed in the rLEDBAT receiver is | congestion control algorithm executed in the rLEDBAT receiver is | |||
defined in other documents. The rLEDBAT receiver MUST use an LBE | defined in other documents. The rLEDBAT receiver MUST use an LBE | |||
congestion control algorithm. Because rLEDBAT assumes a standard TCP | congestion control algorithm. Because rLEDBAT assumes a standard TCP | |||
sender, the sender will be using a "best effort" congestion control | sender, the sender will be using a "best effort" congestion control | |||
algorithm (such as Cubic or New Reno). Since rLEDBAT uses the | algorithm (such as CUBIC or NewReno). Since rLEDBAT uses the Receive | |||
Receive Window to control the sender's rate and the sender calculates | Window to control the sender's rate and the sender calculates the | |||
the sender's window as the minimum of the Receive window and the | sender's window as the minimum of the Receive window and the | |||
congestion window, rLEDBAT will only be effective as long as the | congestion window, rLEDBAT will only be effective as long as the | |||
congestion control algorithm executed in the receiver yields a | congestion control algorithm executed in the receiver yields a | |||
smaller window than the one calculated by the sender. This is | smaller window than the one calculated by the sender. This is | |||
normally the case when the receiver is using an LBE congestion | normally the case when the receiver is using an LBE congestion | |||
control algorithm. The rLEDBAT receiver SHOULD use the LEDBAT | control algorithm. The rLEDBAT receiver SHOULD use the LEDBAT | |||
congestion control algorithm [RFC6817] or the LEDBAT++ congestion | congestion control algorithm [RFC6817] or the LEDBAT++ congestion | |||
control algorithm [I-D.irtf-iccrg-ledbat-plus-plus]. The rLEDBAT receiver MAY | control algorithm [LEDBAT++]. The rLEDBAT receiver MAY use other LBE | |||
use other LBE congestion control algorithms defined elsewhere. | congestion control algorithms defined elsewhere. Irrespective of | |||
Irrespective of which congestion control algorithm is executed in the | which congestion control algorithm is executed in the receiver, an | |||
receiver, an rLEDBAT connection will never be more aggressive than | rLEDBAT connection will never be more aggressive than standard-TCP, | |||
standard-TCP since it is always bounded by the congestion control | since it is always bounded by the congestion control algorithm | |||
algorithm executed at the sender. | executed at the sender. | |||
rLEDBAT is essentially composed of three types of mechanisms, namely, | rLEDBAT is essentially composed of three types of mechanisms, namely | |||
those that provide the means to measure the packet delay (either the | those that provide the means to measure the packet delay (either the | |||
round trip time or the one way delay, depending on the selected | RTT or the one-way delay, depending on the selected algorithm), | |||
algorithm), mechanisms to detect packet loss and the means to | mechanisms to detect packet loss, and the means to manipulate the | |||
manipulate the Receive Window to control the sender's rate. The | Receive Window to control the sender's rate. The first two provide | |||
former provide input to the LBE congestion control algorithm while | input to the LBE congestion control algorithm, while the third uses | |||
the latter uses the congestion window computed by the LBE congestion | the congestion window computed by the LBE congestion control | |||
control algorithm to manipulate the Receive window, as depicted in | algorithm to manipulate the Receive window, as depicted in Figure 1. | |||
the figure. | ||||
+------------------------------------------+ | +------------------------------------------+ | |||
| TCP receiver | | | TCP Receiver | | |||
| +-----------------+ | | | +-----------------+ | | |||
| | +------------+ | | | | | +------------+ | | | |||
| +---------------------| RTT | | | | | +---------------------| RTT | | | | |||
| | | | Estimation | | | | | | | | Estimation | | | | |||
| | | +------------+ | | | | | | +------------+ | | | |||
| | | | | | | | | | | | |||
| | | +------------+ | | | | | | +------------+ | | | |||
| | +--------------| Loss, RTX | | | | | | +--------------| Loss, RTX | | | | |||
| | | | | Detection | | | | | | | | | Detection | | | | |||
| | | | +------------+ | | | | | | | +------------+ | | | |||
| v v | | | | | v v | | | | |||
| +----------------+ | | | | | +----------------+ | | | | |||
| | LBE Congestion | | rLEDBAT | | | | | LBE Congestion | | rLEDBAT | | | |||
| | Control | | | | | | | Control | | | | | |||
| +----------------+ | | | | | +----------------+ | | | | |||
| | | +------------+ | | | | | | +------------+ | | | |||
| | | | RCV-WND | | | | | | | | RCV.WND | | | | |||
| +---------------->| Control | | | | | +---------------->| Control | | | | |||
| | +------------+ | | | | | +------------+ | | | |||
| +-----------------+ | | | +-----------------+ | | |||
+------------------------------------------+ | +------------------------------------------+ | |||
Figure 1: The rLEDBAT architecture. | Figure 1: The rLEDBAT Architecture | |||
We describe each of the rLEDBAT components next. | We next describe each of the rLEDBAT components. | |||
3.1. Controlling the receive window | 4.1. Controlling the Receive Window | |||
rLEDBAT uses the Receive Window (RCV.WND) of TCP to enable the | rLEDBAT uses the TCP Receive Window (RCV.WND) to enable the receiver | |||
receiver to control the sender's rate. [RFC9293] defines that the | to control the sender's rate. [RFC9293] specifies that the RCV.WND | |||
RCV.WND is used to announce the available receive buffer to the | is used to announce the available receive buffer to the sender for | |||
sender for flow control purposes. In order to avoid confusion, we | flow control purposes. In order to avoid confusion, we will call | |||
will call fcwnd the value that a standard RFC793bis TCP receiver | fcwnd the value that a standard RFC793bis TCP receiver calculates to | |||
calculates to set in the receive window for flow control purposes. | set in the receive window for flow control purposes. We call RLWND | |||
We call RLWND the window value calculated by rLEDBAT algorithm and we | the window value calculated by the rLEDBAT algorithm, and we call | |||
call RCV.WND the value actually included in the Receive Window field | RCV.WND the value actually included in the Receive Window field of | |||
of the TCP header. For a RFC793bis receiver, RCV.WND == fcwnd. | the TCP header. For an RFC793bis receiver, RCV.WND == fcwnd. | |||
In the case of rLEDBAT receiver, the rLEDBAT receiver MUST NOT set | In the case of the rLEDBAT receiver, this receiver MUST NOT set the | |||
the RCV.WND to a value larger than fcwnd and it SHOULD set the | RCV.WND to a value larger than fcwnd and SHOULD set the RCV.WND to | |||
RCV.WND to the minimum of RLWND and fcwnd, honoring both. | the minimum of RLWND and fcwnd, honoring both. | |||
When using rLEDBAT, two congestion controllers are in action in the | When using rLEDBAT, two congestion controllers are in action in the | |||
flow of data from the sender to the receiver, namely, the congestion | flow of data from the sender to the receiver, namely the TCP | |||
control algorithm of TCP in the sender side and the LBE congestion | congestion control algorithm on the sender side and the LBE | |||
control algorithm executed in the receiver and conveyed to the sender | congestion control algorithm executed in the receiver and conveyed to | |||
through the RCV.WND. In the normal TCP operation, the sender uses | the sender through the RCV.WND. In the normal TCP operation, the | |||
the minimum of the congestion window cwnd and the receiver window | sender uses the minimum of the cwnd and the RCV.WND to calculate the | |||
RCV.WND to calculate the sender's window SND.WND. This is also true | SND.WND. This is also true for rLEDBAT, as the sender is a regular | |||
for rLEDBAT, as the sender is a regular TCP sender. This guarantees | TCP sender. This guarantees that the rLEDBAT flow will never | |||
that the rLEDBAT flow will never transmit more aggressively than a | transmit more aggressively than a standard-TCP flow, as the sender's | |||
standard-TCP flow, as the sender's congestion window limits the | congestion window limits the sending rate. Moreover, because an LBE | |||
sending rate. Moreover, because a LBE congestion control algorithm | congestion control algorithm such as LEDBAT/LEDBAT++ is designed to | |||
such as LEDBAT/LEDBAT++ is designed to react earlier and more | react earlier and more aggressively to congestion than regular TCP | |||
aggressively to congestion than regular TCP congestion control, the | congestion control, the RLWND contained in the TCP RCV.WND field will | |||
RLWND contained in the RCV.WND field of TCP will be in general | generally be smaller than the congestion window calculated by the TCP | |||
smaller than the congestion window calculated by the TCP sender, | sender, implying that the rLEDBAT congestion control algorithm will | |||
implying that the rLEDBAT congestion control algorithm will be | be effectively controlling the sender's window. One exception to | |||
effectively controlling the sender's window. One exception to this | this scenario is that at the beginning of the connection, when there | |||
is at the beginning of the connection, when there is no information | is no information to set RLWND, RLWND is set to its maximum value, so | |||
to set RLWND, then, RLWND is set to its maximum value, so that the | that the sending rate of the sender is governed by the flow control | |||
sending rate of the sender is governed by the flow control algorithm | algorithm of the receiver and the TCP slow start mechanism of the | |||
of the receiver and the TCP slow start mechanism of the sender. | sender. | |||
In summary, the sender's window is: SND.WND = min(cwnd, RLWND, fcwnd) | In summary, the sender's window is SND.WND = min(cwnd, RLWND, fcwnd) | |||
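The relationship between the three windows can be sketched as follows. This is an illustrative sketch, not the pseudocode of the specification: the function names are hypothetical, and the value used for the maximum RLWND (the largest window expressible with a 16-bit window field and a window scale shift of 14) is an assumption chosen for the example.

```python
# Assumed stand-in for "RLWND is set to its maximum value" at connection start.
MAX_RLWND = 65535 << 14


def advertised_window(rlwnd: int, fcwnd: int) -> int:
    """RCV.WND announced by an rLEDBAT receiver: never above fcwnd, honoring both."""
    return min(rlwnd, fcwnd)


def sender_window(cwnd: int, rcv_wnd: int) -> int:
    """SND.WND computed by a regular TCP sender from cwnd and the announced RCV.WND."""
    return min(cwnd, rcv_wnd)


if __name__ == "__main__":
    # Steady state: the LBE algorithm at the receiver yields the smallest window,
    # so rLEDBAT effectively controls the sender.
    rcv_wnd = advertised_window(rlwnd=20_000, fcwnd=65_535)
    print(sender_window(cwnd=50_000, rcv_wnd=rcv_wnd))

    # Start of the connection: RLWND is at its maximum, so the sender is limited
    # by its own congestion window (slow start) and by receiver flow control.
    rcv_wnd = advertised_window(rlwnd=MAX_RLWND, fcwnd=65_535)
    print(sender_window(cwnd=14_600, rcv_wnd=rcv_wnd))
```

In both cases the result equals min(cwnd, RLWND, fcwnd), so the flow is never more aggressive than the sender's own congestion control allows.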
3.1.1. Avoiding window shrinking | 4.1.1. Avoiding Window Shrinking | |||
The LEDBAT/LEDBAT++ algorithm executed in a rLEDBAT receiver | The LEDBAT/LEDBAT++ algorithm executed in a rLEDBAT receiver | |||
increases or decreases RLWND according to congestion signals | increases or decreases RLWND according to congestion signals | |||
(variations on the estimated queueing delay and packet loss). If | (variations in the estimated queuing delay and packet loss). If | |||
RLWND is decreased and directly announced in RCV.WND, this could lead | RLWND is decreased and directly announced in RCV.WND, this could lead | |||
to an announced window that is smaller than what is currently in use. | to an announced window that is smaller than what is currently in use. | |||
This so called 'shrinking the window' is discouraged as per | This so-called "shrinking the window" is discouraged as per | |||
[RFC9293], as it may cause unnecessary packet loss and performance | [RFC9293], as it may cause unnecessary packet loss and performance | |||
penalty. To be consistent with [RFC9293], the rLEDBAT receiver | penalties. To be consistent with [RFC9293], the rLEDBAT receiver | |||
SHOULD NOT shrink the receive window. | SHOULD NOT shrink the receive window. | |||
In order to avoid window shrinking, the receiver MUST only reduce | In order to avoid window shrinking, the receiver MUST only reduce | |||
RCV.WND by at most the number of bytes of each received data packet. This | RCV.WND by at most the number of bytes of each received data packet. This | |||
may fall short of honoring the newly calculated value of the RLWND | may fall short of honoring the newly calculated value of the RLWND | |||
immediately. However, the receiver SHOULD progressively reduce the | immediately. However, the receiver SHOULD progressively reduce the | |||
advertised RCV.WND, always honoring that the reduction is less or | advertised RCV.WND, always honoring that the reduction is less than | |||
equal than the received bytes, until the target window determined by | or equal to the received bytes, until the target window determined by | |||
the rLEDBAT algorithm is reached. This implies that it may take up | the rLEDBAT algorithm is reached. This implies that it may take up | |||
to one RTT for the rLEDBAT receiver to drain enough in-flight bytes | to one RTT for the rLEDBAT receiver to drain enough in-flight bytes | |||
to completely close its receive window without shrinking it. This is | to completely close its receive window without shrinking it. This is | |||
sufficient to honor the window output from the LEDBAT/LEDBAT++ | sufficient to honor the window output from the LEDBAT/LEDBAT++ | |||
algorithms since they allow at most one | algorithms, since they allow at most one | |||
multiplicative decrease per RTT. | multiplicative decrease per RTT. | |||
3.1.2. Setting the Window Scale Option | 4.1.2. Setting the Window Scale Option | |||
The Window Scale (WS) option [RFC7323] is a means to increase the | The Window Scale (WS) option [RFC7323] is a means to increase the | |||
maximum window size permitted by the Receive Window. The WS option | maximum window size permitted by the Receive Window. The WS option | |||
defines a scale factor which restricts the granularity of the receive | defines a scale factor that restricts the granularity of the receive | |||
window that can be announced. This means that the rLEDBAT client | window that can be announced. This means that the rLEDBAT client | |||
will have to accumulate the increases resulting from multiple | will have to accumulate the increases resulting from multiple | |||
received packets, and only convey a change in the window when the | received packets and only convey a change in the window when the | |||
accumulated sum of increases is equal or higher than one increase | accumulated sum of increases is equal to or higher than one increase | |||
step as imposed by the scaling factor according to the WS option in | step as imposed by the scaling factor according to the WS option in | |||
place for the TCP connection. | place for the TCP connection. | |||
Changes in the receive window that are smaller than 1 MSS are | Changes in the receive window that are smaller than 1 MSS (Maximum | |||
unlikely to have any immediate impact on the sender's rate, as usual | Segment Size) are unlikely to have any immediate impact on the | |||
TCP's segmentation practice results in sending full segments (i.e., | sender's rate. As usual, TCP's segmentation practice results in | |||
segments of size equal to the MSS). Current WS option specification | sending full segments (i.e., segments of size equal to the MSS). | |||
[RFC7323] defines that allowed values for the WS option are between 0 | [RFC7323], which defines the WS option, specifies that allowed values | |||
and 14. Assuming a MSS around 1500 bytes, WS option values between 0 | for the WS option are between 0 and 14. Assuming an MSS of around | |||
and 11 result in the receive window being expressed in units that are | 1500 bytes, WS option values between 0 and 11 result in the receive | |||
about 1 MSS or smaller. So, WS option values between 0 and 11 have | window being expressed in units that are about 1 MSS or smaller. So, | |||
no impact in rLEDBAT (unless packets smaller than the MSS are being | WS option values between 0 and 11 have no impact in rLEDBAT (unless | |||
exchanged). | packets smaller than the MSS are being exchanged). | |||
WS option values higher than 11 can affect the dynamics of rLEDBAT, | WS option values higher than 11 can affect the dynamics of rLEDBAT, | |||
since control may become too coarse (e.g., with WS of 14, a change in | since control may become too coarse (e.g., with a WS option value of | |||
one unit of the receive window implies a change of 10 MSS in the | 14, a change in one unit of the receive window implies a change of 10 | |||
effective window). | MSS in the effective window). | |||
For the above reasons, the rLEDBAT client SHOULD set WS option values | For the above reasons, the rLEDBAT client SHOULD set WS option values | |||
lower than 12. Additional experimentation is required to explore the | lower than 12. Additional experimentation is required to explore the | |||
impact of larger WS values on rLEDBAT dynamics. | impact of larger WS values on rLEDBAT dynamics. | |||
Note that the recommendation for rLEDBAT to set the WS option value | Note that the recommendation for rLEDBAT to set the WS option values | |||
to lower values does not precludes the communication with servers | to lower values does not preclude communication with servers that set | |||
that set the WS option values to larger values, since the WS option | the WS option values to larger values, since WS option values are set | |||
value is set independently for each direction of the TCP connection. | independently for each direction of the TCP connection. | |||
3.2. Measuring delays | 4.2. Measuring Delays | |||
Both LEDBAT and LEDBAT++ measure base and current delays to estimate | Both LEDBAT and LEDBAT++ measure base and current delays to estimate | |||
the queueing delay. LEDBAT uses the one way delay while LEDBAT++ | the queuing delay. LEDBAT uses the one-way delay, while LEDBAT++ | |||
uses the round trip time. In the next sections we describe how | uses the RTT. In the next sections, we describe how rLEDBAT | |||
rLEDBAT mechanisms enable the receiver to measure the one way delay | mechanisms enable the receiver to measure the one-way delay or the | |||
or the round trip time, whatever is needed depending on the | RTT -- whichever is needed, depending on the congestion control | |||
congestion control algorithm used. | algorithm used. | |||
3.2.1. Measuring RTT to estimate the queueing delay | 4.2.1. Measuring RTT to Estimate the Queuing Delay | |||
LEDBAT++ uses the round trip time (RTT) to estimate the queueing | LEDBAT++ uses the RTT to estimate the queuing delay. In order to | |||
delay. In order to estimate the queueing delay using RTT, the | estimate the queuing delay using RTT, the rLEDBAT receiver estimates | |||
rLEDBAT receiver estimates the base RTT (i.e., the constant | the base RTT (i.e., the constant components of RTT) and also measures | |||
components of RTT) and also measures the current RTT. By subtracting | the current RTT. By subtracting these two values, we obtain the | |||
these two values, we obtain the queuing delay to be used by the | queuing delay to be used by the rLEDBAT controller. | |||
rLEDBAT controller. | ||||
LEDBAT++ discovers the base RTT (RTTb) by taking the minimum value of | LEDBAT++ discovers the base RTT (RTTb) by taking the minimum value of | |||
the measured RTTs over a period of time. The current RTT (RTTc) is | the measured RTTs over a period of time. The current RTT (RTTc) is | |||
estimated using a number of recent samples and applying a filter, | estimated using a number of recent samples and applying a filter, | |||
such as the minimum (or the mean) of the last k samples. Using RTT | such as the minimum (or the mean) of the last k samples. Using RTT | |||
to estimate the queueing delay has a number of shortcomings and | to estimate the queuing delay has a number of shortcomings and | |||
difficulties that we discuss next. | difficulties, as discussed below. | |||
The queuing delay measured using RTT includes also the queueing delay | The queuing delay measured using RTT also includes the queuing delay | |||
experienced by the return packets in the direction from the rLEDBAT | experienced by the return packets in the direction from the rLEDBAT | |||
receiver to the sender. This is a fundamental limitation of this | receiver to the sender. This is a fundamental limitation of this | |||
approach. The impact of this error is that the rLEDBAT controller | approach. The impact of this error is that the rLEDBAT controller | |||
will also react to congestion in the reverse path direction which | will also react to congestion in the reverse path direction, | |||
results in an even more conservative mechanism. | resulting in an even more conservative mechanism. | |||
In order to measure RTT, the rLEDBAT client MUST enable the Time | In order to measure RTT, the rLEDBAT client MUST enable the TS option | |||
Stamp (TS) option [RFC7323]. By matching the TSVal value carried in | [RFC7323]. By matching the TSval value carried in outgoing packets | |||
outgoing packets with the TSecr value observed in incoming packets, | with the Timestamp Echo Reply (TSecr) value [RFC7323] observed in | |||
it is possible to measure RTT. This allows the rLEDBAT receiver to | incoming packets, it is possible to measure RTT. This allows the | |||
measure RTT even if it is acting as a pure receiver. In a pure | rLEDBAT receiver to measure RTT even if it is acting as a pure | |||
receiver there is no data flowing from the rLEDBAT receiver to the | receiver. In a pure receiver, there is no data flowing from the | |||
sender, making impossible to match data packets with acknowledgements | rLEDBAT receiver to the sender, making it impossible to match data | |||
packets to measure RTT, as it is usually done in TCP for other | packets with Acknowledgment packets to measure RTT, as it is usually | |||
purposes. | done in TCP for other purposes. | |||
Depending on the frequency of the local clock used to generate the | Depending on the frequency of the local clock used to generate the | |||
values included in the TS option, several packets may carry the same | values included in the TS option, several packets may carry the same | |||
TSVal value. If that happens, the rLEDBAT receiver will be unable to | TSval value. If that happens, the rLEDBAT receiver will be unable to | |||
match the different outgoing packets carrying the same TSVal value | match the different outgoing packets carrying the same TSval value | |||
with the different incoming packets carrying also the same TSecr | with the different incoming packets also carrying the same TSecr | |||
value. However, it is not necessary for rLEDBAT to use all packets | value. However, it is not necessary for rLEDBAT to use all packets | |||
to estimate RTT and sampling a subset of in-flight packets per RTT is | to estimate RTT, and sampling a subset of in-flight packets per RTT | |||
enough to properly assess the queueing delay. RTT MUST then be | is enough to properly assess the queuing delay. RTT MUST then be | |||
calculated as the time between when the first packet with a given TSVal was | calculated as the time between when the first packet with a given TSval was | |||
sent and when the first packet was received with the same value | sent and when the first packet was received with the same value | |||
contained in the TSecr. Other packets with repeated TS values SHOULD | contained in the TSecr. Other packets with repeated TS values SHOULD | |||
NOT be used for RTT calculation. | NOT be used for RTT calculations. | |||
Several issues must be addressed in order to avoid an artificial | Several issues must be addressed in order to avoid an artificial | |||
increase of the observed RTT. Different issues emerge depending | increase in the observed RTT. Different issues emerge, depending on | |||
whether the rLEDBAT capable host is sending data packets or pure ACKs | whether the rLEDBAT-capable host is sending data packets or pure ACKs | |||
to measure RTT. We next consider the issues separately. | to measure RTT. We next consider these issues separately. | |||
3.2.1.1. Measuring RTT sending pure ACKs | 4.2.1.1. Measuring RTT When Sending Pure ACKs | |||
In this scenario, the rLEDBAT node (node A) sends a pure ACK to the | In this scenario, the rLEDBAT node (node A) sends a pure ACK to the | |||
other endpoint of the TCP connection (node B), including the TS | other endpoint of the TCP connection (node B), including the TS | |||
option. Upon the reception of the TS Option, host B will copy the | option. Upon the reception of the TS option, host B will copy the | |||
value of the TSVal into the TSecr field of the TS option and include | value of the TSval into the TSecr field of the TS option and include | |||
that option into the next data packet towards host A. However, there | that option in the next data packet towards host A. However, there | |||
are two reasons why B may not send a packet immediately back to A, | are two reasons why B may not send a packet immediately back to A, | |||
artificially increasing the measured RTT. The first reason is when B | artificially increasing the measured RTT. The first reason is when B | |||
has no data to send. The second is when B has no available window to | has no data to send. The second is when B has no available window to | |||
put more packets in-flight. We describe next how each of these cases | put more packets in flight. We next describe how each of these cases | |||
is addressed. | is addressed. | |||
The case where the host B has no data to send when it receives the | The case where host B has no data to send when it receives the pure | |||
pure Acknowledgement is expected to be rare in the rLEDBAT use cases. | Acknowledgment is expected to be rare in the rLEDBAT use | |||
rLEDBAT will be used mostly for background file transfers so the | cases. rLEDBAT will be used mostly for background file transfers, so | |||
expected common case is that the sender will have data to send | the expected common case is that the sender will have data to send | |||
throughout the lifetime of the communication. However, if, for | throughout the lifetime of the communication. However, if, for | |||
example, the file is structured in blocks of data, it may be the case | example, the file is structured in blocks of data, it may be the case | |||
that the sender seldomly will have to wait until the next block is | that the sender will seldom have to wait until the next block is | |||
available to proceed with the data transfer. To address this | available to proceed with the data transfer. To address this | |||
situation, the filter used by the congestion control algorithm | situation, the filter used by the congestion control algorithm | |||
executed in the receiver SHOULD discard outliers (e.g. a min filter | executed in the receiver SHOULD discard outliers (e.g., a MIN filter | |||
would achieve this) when measuring RTT using pure ACK packets. | [RFC6817] would achieve this) when measuring RTT using pure ACK | |||
packets. | ||||
In the second case, the limitation of the sender's window can come either from the TCP | In the second case, the limitation of the sender's window can come from either the TCP | |||
congestion window in host B or from the announced receive window from | congestion window in host B or the announced receive window from the | |||
the rLEDBAT in host A. Normally, the receive window will be the one | rLEDBAT in host A. Normally, the receive window will be the one to | |||
to limit the sender's transmission rate, since the LBE congestion | limit the sender's transmission rate, since the LBE congestion | |||
control algorithm used by the rLEDBAT node is designed to be more | control algorithm used by the rLEDBAT node is designed to be more | |||
restrictive on the sender's rate than standard-TCP. If the limiting | restrictive on the sender's rate than standard-TCP. If the limiting | |||
factor is the congestion window in the sender, it is less relevant if | factor is the congestion window in the sender, it is less relevant if | |||
rLEDBAT further reduces the receive window due to a bloated RTT | rLEDBAT further reduces the receive window due to a bloated RTT | |||
measurement, since the rLEDBAT node is not actively controlling the | measurement, since the rLEDBAT node is not actively controlling the | |||
sender's rate. Nevertheless, the proposed approach to discard larger | sender's rate. Nevertheless, the proposed approach to discard larger | |||
samples would also address this issue. | samples would also address this issue. | |||
To address the case in which the limiting factor is the receive | To address the case in which the limiting factor is the receive | |||
window announced by rLEDBAT, the congestion control algorithm at the | window announced by rLEDBAT, the congestion control algorithm at the | |||
receiver SHOULD discard RTT measurements during the window reduction | receiver SHOULD discard RTT measurements during the window reduction | |||
phase that are triggered by pure ACK packets. The rLEDBAT receiver | phase that are triggered by pure ACK packets. The rLEDBAT receiver | |||
is aware whether a given TSVal value was sent in a pure ACK packet | is aware of whether a given TSval value was sent in a pure ACK packet | |||
where the window was reduced, and if so, it can discard the | where the window was reduced, and if so, it can discard the | |||
corresponding RTT measurement. | corresponding RTT measurement. | |||
3.2.1.2. Measuring RTT when sending data packets | 4.2.1.2. Measuring RTT When Sending Data Packets | |||
In the case that the rLEDBAT node is sending data packets and | In the case that the rLEDBAT node is sending data packets and | |||
matching them with pure ACKs to measure RTT, a factor that can | matching them with pure ACKs to measure RTT, a factor that can | |||
artificially increase the RTT measured is the presence of delayed | artificially increase the RTT measured is the presence of delayed | |||
Acknowledgements. According to the TS option generation rules | Acknowledgments. According to the TS option generation rules | |||
[RFC7323], the value included in the TSecr for a delayed ACK is the | [RFC7323], the value included in the TSecr for a delayed ACK is the | |||
one in the TSVal field of the earliest unacknowledged segment. This | one in the TSval field of the earliest unacknowledged segment. This | |||
may artificially increase the measured RTT. | may artificially increase the measured RTT. | |||
If both endpoints of the connection are sending data packets, | If both endpoints of the connection are sending data packets, | |||
Acknowledgments are piggybacked into the data packets and they are | Acknowledgments are piggybacked onto the data packets and they are | |||
not delayed. Delayed ACKs only increase RTT measurements in the case | not delayed. Delayed ACKs only increase RTT measurements in the case | |||
that the sender has no data to send. Since the expected use case for | that the sender has no data to send. Since the expected use case for | |||
rLEDBAT is that the sender will be sending background traffic to the | rLEDBAT is that the sender will be sending background traffic to the | |||
rLEDBAT receiver, the cases where delayed ACKs increase the measured | rLEDBAT receiver, the cases where delayed ACKs increase the measured | |||
RTT are expected to be rare. | RTT are expected to be rare. | |||
Nevertheless, measurements based on data packets from the rLEDBAT | Nevertheless, measurements based on data packets from the rLEDBAT | |||
node matching pure ACKs from the other end will result in an | node matching pure ACKs from the other end will result in an | |||
increased RTT sample. The additional increase in the measured RTT | increased RTT sample. The additional increase in the measured RTT | |||
will be up to 500 ms. The reason for this is that delayed ACKs are | will be up to 500 ms. This is because delayed ACKs are generated | |||
generated every second data packet received and not delayed more than | every second data packet received and not delayed more than 500 ms | |||
500 ms according to [RFC9293]. The rLEDBAT receiver MAY discard RTT | according to [RFC9293]. The rLEDBAT receiver MAY discard RTT | |||
measurements done using data packets from the rLEBDAT receiver and | measurements done using data packets from the rLEDBAT receiver and | |||
matching pure ACKs, especially if it has recent measurements done | matching pure ACKs, especially if it has recent measurements done | |||
using other packet combinations. Also, applying a filter that | using other packet combinations. Applying a filter (e.g., a MIN | |||
discards outliers would also address this issue (e.g. a min filter). | filter) that discards outliers would also address this issue. | |||
3.2.2. Measuring one way delay to estimate the queueing delay | 4.2.2. Measuring One-Way Delay to Estimate the Queuing Delay | |||
The LEDBAT algorithm uses the one-way delay of packets as input. A | The LEDBAT algorithm uses the one-way delay of packets as input. A | |||
TCP receiver can measure the delay of incoming packets directly (as | TCP receiver can measure the delay of incoming packets directly (as | |||
opposed to the sender-based LEDBAT, where the receiver measures the | opposed to the sender-based LEDBAT, where the receiver measures the | |||
one-way delay and needs to convey it to the sender). | one-way delay and needs to convey it to the sender). | |||
In the case of TCP, the receiver can use the TimeStamp option to | In the case of TCP, the receiver can use the TS option to measure the | |||
measure the one way delay by subtracting the timestamp contained in | one-way delay by subtracting the timestamp contained in the incoming | |||
the incoming packet from the local time at which the packet has | packet from the local time at which the packet has arrived. As noted | |||
arrived. As noted in [RFC6817] the clock offset between the clock of | in [RFC6817], the clock offset between the sender's clock and the | |||
the sender and the clock in the receiver does not affect the LEDBAT | receiver's clock does not affect the LEDBAT operation, since LEDBAT | |||
operation, since LEDBAT uses the difference between the base one way | uses the difference between the base one-way delay and the current | |||
delay and the current one way delay to estimate the queuing delay, | one-way delay to estimate the queuing delay, effectively "canceling | |||
effectively canceling the clock offset error in the queueing delay | out" the clock offset error in the queuing delay estimation. There | |||
estimation. There are however two other issues that the rLEDBAT | are, however, two other issues that the rLEDBAT receiver needs to | |||
receiver needs to take into account in order to properly estimate the | take into account in order to properly estimate the one-way delay, | |||
one way delay, namely, the units in which the received timestamps are | namely the units in which the received timestamps are expressed and | |||
expressed and the clock skew. We address them next. | the clock skew. These issues are addressed below. | |||
In order to measure the one way delay using TCP timestamps, the | In order to measure the one-way delay using TCP timestamps, the | |||
rLEDBAT receiver, first, needs to discover the units of values in the | rLEDBAT receiver first needs to discover the units of values in the | |||
TS option and, second, needs to account for the skew between the two | TS option and then needs to account for the skew between the two | |||
endpoint clocks. Note that a mismatch of 100 ppm (parts per million) | endpoint clocks. Note that a mismatch of 100 ppm (parts per million) | |||
in the estimation of the sender's clock rate accounts for 6 ms of | in the estimation of the sender's clock rate accounts for 6 ms of | |||
variation per minute in the measured delay. This just one order of | variation per minute in the measured delay. This is just one order | |||
magnitude below the target delay set by rLEDBAT (or potentially more | of magnitude below the target delay set by rLEDBAT (or potentially | |||
if the target is set to lower values, which is possible). Typical | more if the target is set to lower values, which is possible). | |||
skew for untrained clocks is reported to be around 100-200 ppm | Typical skew for untrained clocks is reported to be around 100-200 | |||
[RFC6817]. | ppm [RFC6817]. | |||
In order to learn both the TS units and the clock skew, the rLEDBAT | In order to learn both the TS units and the clock skew, the rLEDBAT | |||
receiver measures how much local time has elapsed between two packets | receiver measures how much local time has elapsed between two packets | |||
with different TS values issued by the sender. By comparing the | with different TS values issued by the sender. By comparing the | |||
local time difference and the TS value difference, the receiver can | local time difference and the TS value difference, the receiver can | |||
assess the TS units and relative clock skews. In order for this to | assess the TS units and relative clock skews. In order for this to | |||
be accurate, the packets carrying the different TS values should | be accurate, the packets carrying the different TS values should | |||
experience equal (or at least similar delay) when traveling from the | experience equal (or at least similar) delay when traveling from the | |||
sender to the receiver, as any difference in the experienced delays | sender to the receiver, as any difference in the experienced delays | |||
would introduce error in the unit/skew estimation. One possible | would introduce an error in the unit/skew estimation. One possible | |||
approach is to select packets that experienced the minimum delay | approach is to select packets that experienced minimal delay (i.e., | |||
(i.e. close to zero queueing delay) to make the estimations. | queuing delay close to zero) to make the estimations. | |||
An additional difficulty regarding the estimation of the TS units and | An additional difficulty regarding the estimation of the TS units and | |||
clock skew in the context of (r)LEDBAT is that the LEDBAT congestion | clock skew in the context of (r)LEDBAT is that the LEDBAT congestion | |||
controller actions directly affect the (queueing) delay experienced | controller actions directly affect the (queuing) delay experienced by | |||
by packets. In particular, if there is an error in the estimation of | packets. In particular, if there is an error in the estimation of | |||
the TS units/skew, the LEDBAT controller will attempt to compensate | the TS units/skew, the LEDBAT controller will attempt to compensate | |||
it by reducing/increasing the load. The result is that the LEDBAT | for it by reducing/increasing the load. The result is that the | |||
operation interferes with the TS units/clock skew measurements. | LEDBAT operation interferes with the TS units/clock skew | |||
Because of this, measurements are more accurate when there is no | measurements. Because of this, measurements are more accurate when | |||
traffic in the connection (in addition to the packets used for the | there is no traffic in the connection (in addition to the packets | |||
measurements). The problem is that the receiver is unaware if the | used for the measurements). The problem is that the receiver is | |||
sender is injecting traffic at any point in time, and so, it is | unaware if the sender is injecting traffic at any point in time, and | |||
unable to use these quiet intervals to perform measurements. The | so, it is unable to use these quiet intervals to perform | |||
receiver can however, force periodic slowdowns, reducing the | measurements. The receiver can, however, force periodic slowdowns, | |||
announced receive window to a few packets and perform the | reducing the announced receive window to a few packets and perform | |||
measurements then. | the measurements then. | |||
It is possible for the rLEDBAT receiver to perform multiple | It is possible for the rLEDBAT receiver to perform multiple | |||
measurements to assess both the TS units and the relative clock skew | measurements to assess both the TS units and the relative clock skew | |||
during the lifetime of the connection, in order to obtain more | during the lifetime of the connection, in order to obtain more | |||
accurate results. Clock skew measurements are more accurate if the | accurate results. Clock skew measurements are more accurate if the | |||
time period used to discover the skew is larger, as the impact of the | time period used to discover the skew is larger, as the impact of the | |||
skew becomes more apparent. It is a reasonable approach for the | skew becomes more apparent. It is a reasonable approach for the | |||
rLEDBAT receiver to perform an early discovery of the TS units (and | rLEDBAT receiver to perform an early discovery of the TS units (and | |||
the clock skew) using the first few packets of the TCP connection and | the clock skew) using the first few packets of the TCP connection and | |||
then improve the accuracy of the TS units/clock skew estimation using | then improve the accuracy of the TS units/clock skew estimation using | |||
periodic measurements later in the lifetime of the connection. | periodic measurements later in the lifetime of the connection. | |||
3.3. Detecting packet losses and retransmissions | 4.3. Detecting Packet Losses and Retransmissions | |||
The rLEDBAT receiver is capable of detecting retransmitted packets in | The rLEDBAT receiver is capable of detecting retransmitted packets as | |||
the following way. We call RCV.HGH the highest sequence number | follows. We call RCV.HGH the highest sequence number corresponding | |||
corresponding to a received byte of data (not assuming that all bytes | to a received byte of data (not assuming that all bytes with smaller | |||
with smaller sequence numbers have been received already, there may | sequence numbers have been received already, there may be holes), and | |||
be holes) and we call TSV.HGH the TSVal value corresponding to the | we call TSV.HGH the TSval value corresponding to the segment in which | |||
segment in which that byte was carried. SEG.SEQ stands for the | that byte was carried. SEG.SEQ stands for the sequence number of a | |||
sequence number of a newly received segment and we call TSV.SEQ the | newly received segment, and we call TSV.SEQ the TSval value of the | |||
TSVal value of the newly received segment. | newly received segment. | |||
If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH then the newly received | If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH, then the newly received | |||
segment is a retransmission. This is so because the newly received | segment is a retransmission. This is so because the newly received | |||
segment was generated later than another already received segment | segment was generated later than another already-received segment | |||
which contained data with a larger sequence number. This means that | that contained data with a larger sequence number. This means that | |||
this segment was lost and was retransmitted. | this segment was lost and was retransmitted. | |||
The proposed mechanism to detect retransmissions at the receiver | The proposed mechanism to detect retransmissions at the receiver | |||
fails when there are window tail drops. If all packets in the tail | fails when there are window tail drops. If all packets in the tail | |||
of the window are lost, the receiver will not be able to detect a | of the window are lost, the receiver will not be able to detect a | |||
mismatch between the sequence numbers of the packets and the order of | mismatch between the sequence numbers of the packets and the order of | |||
the timestamps. In this case, rLEDBAT will not react to losses but | the timestamps. In this case, rLEDBAT will not react to losses but | |||
the TCP congestion controller at the sender will, most likely | the TCP congestion controller at the sender will, most likely | |||
reducing its window to 1MSS and take over the control of the sending | reducing its window to 1 MSS and take over the control of the sending | |||
rate, until slow start ramps up and catches the current value of the | rate, until slow start ramps up and catches the current value of the | |||
rLEDBAT window. | rLEDBAT window. | |||
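The retransmission test described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `RetransmissionDetector`, the method `on_segment`, and the `seg_len` parameter are hypothetical, and the sketch assumes plain integer comparisons (32-bit wraparound of sequence numbers and timestamps is ignored).

```python
from dataclasses import dataclass


@dataclass
class RetransmissionDetector:
    """Tracks RCV.HGH / TSV.HGH and flags likely retransmissions."""

    rcv_hgh: int = -1  # highest sequence number of a received byte (RCV.HGH)
    tsv_hgh: int = -1  # TSval of the segment that carried that byte (TSV.HGH)

    def on_segment(self, seg_seq: int, seg_len: int, tsv_seq: int) -> bool:
        # Rule from the text: SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH.
        is_retransmission = seg_seq < self.rcv_hgh and tsv_seq > self.tsv_hgh

        # Update the high-water mark if this segment carries newer data.
        last_byte = seg_seq + seg_len - 1
        if last_byte > self.rcv_hgh:
            self.rcv_hgh = last_byte
            self.tsv_hgh = tsv_seq
        return is_retransmission
```

With this sketch, a segment that fills a hole and carries a later timestamp is flagged, a merely reordered segment (earlier timestamp) is not, and a retransmission after a tail drop goes undetected, which matches the limitation noted above.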
4. Experiment Considerations | 5. Experiment Considerations | |||
The status of this document is Experimental. The general purpose of | The status of this document is Experimental. The general purpose of | |||
the proposed experiment is to gain more experience running rLEDBAT | the proposed experiment is to gain more experience running rLEDBAT | |||
over different network paths to see if the proposed rLEDBAT | over different network paths to see if the proposed rLEDBAT | |||
parameters perform well in different situations. Specifically, we | parameters perform well in different situations. Specifically, we | |||
would like to learn about the following aspects of the rLEDBAT | would like to learn about the following aspects of the rLEDBAT | |||
mechanism: | mechanism: | |||
- Interaction between the sender and the receiver Congestion | * Interaction between the sender's and receiver's congestion control | |||
control algorithms. rLEDBAT posits that because the rLEDBAT | algorithms. rLEDBAT posits that because the rLEDBAT receiver is | |||
receiver is using a less-than-best-effort congestion control | using a less-than-best-effort congestion control algorithm, the | |||
algorithm, the receiver congestion control algorithm will expose a | receiver's congestion control algorithm will expose a smaller | |||
smaller congestion window (conveyed though the Receive Window) | congestion window (conveyed through the Receive Window) than the | |||
than the one resulting from the congestion control algorithm | one resulting from the congestion control algorithm executed at | |||
executed at the sender. One of the purposes of the experiment is | the sender. One of the purposes of the experiment is to learn how | |||
learn how these two interact and if the assumption that the | these two algorithms interact and if the assumption that the | |||
receiver side is always controlling the sender's rate (and making | receiver side is always controlling the sender's rate (and making | |||
rLEDBAT effective) holds. The experiment should include the | rLEDBAT effective) holds. The experiment should include the | |||
different congestion control algorithms that are currently widely | different congestion control algorithms that are currently widely | |||
used in the Internet, including Cubic, BBR and LEDBAT(++). | used in the Internet, including CUBIC, Bottleneck Bandwidth and | |||
Round-trip propagation time (BBR), and LEDBAT(++). | ||||
- Interaction between rLEDBAT and Active Queue Management | * Interaction between rLEDBAT and Active Queue Management techniques | |||
techniques such as Codel, PIE and L4S. | such as Controlled Delay (CoDel); Proportional Integral controller | |||
Enhanced (PIE); and Low Latency, Low Loss, and Scalable Throughput | ||||
(L4S). | ||||
- How the rLEDBAT should resume after a period during which there | * How the rLEDBAT should resume after a period during which there | |||
was no incoming traffic and the information about the rLEDBAT | was no incoming traffic and the information about the rLEDBAT | |||
state information is potentially dated. | state information is potentially dated. | |||
4.1. Status of the experiment at the time of this writing. | 5.1. Status of the Experiment at the Time of This Writing | |||
Currently there are the following implementations of rLEDBAT that can | Currently, the following implementations of rLEDBAT can be used for | |||
be used for experimentation: | experimentation: | |||
- Windows 11. rLEDBAT is available in Microsoft's Windows 11 22H2 | * Windows 11. rLEDBAT is available in Microsoft's Windows 11 22H2 | |||
since October 2023 [Windows11]. | since October 2023 [Windows11]. | |||
- Windows Server 2022. rLEDBAT is available in Microsoft's Windows | * Windows Server 2022. rLEDBAT is available in Microsoft's Windows | |||
Server 2022 since September 2022 [WindowsServer]. | Server 2022 since September 2022 [WindowsServer]. | |||
- Apple. rLEDBAT is available in MacOS and iOS since 2021 [Apple]. | * Apple. rLEDBAT is available in macOS and iOS since 2021 [Apple]. | |||
- Linux implementation, open source, available since 2022 at | * Linux implementation, open source, available since 2022 at | |||
https://github.com/net-research/rledbat_module. | <https://github.com/net-research/rledbat_module>. | |||
- ns3 implementation, open source, available since 2020 at | * ns3 implementation, open source, available since 2020 at | |||
https://github.com/manas11/implementation-of-rLEDBAT-in-ns-3. | <https://github.com/manas11/implementation-of-rLEDBAT-in-ns-3>. | |||
In addition, rLEDBAT has been deployed by Microsoft in wide scale in | In addition, rLEDBAT has been deployed by Microsoft at wide scale in | |||
the following services: | the following services: | |||
- BITS (Background Intelligent Transfer Service) | * BITS (Background Intelligent Transfer Service) | |||
- DO (Delivery Optimization) service | * DO (Delivery Optimization) service | |||
- Windows update # using DO | * Windows update # using DO | |||
- Windows Store # using DO | * Windows Store # using DO | |||
- OneDrive | * OneDrive | |||
- Windows Error Reporting # wermgr.exe; werfault.exe | * Windows Error Reporting # wermgr.exe; werfault.exe | |||
- System Center Configuration Manager (SCCM) | * System Center Configuration Manager (SCCM) | |||
- Windows Media Player | * Windows Media Player | |||
- Microsoft Office | * Microsoft Office | |||
- Xbox (download games) # using DO | * Xbox (download games) # using DO | |||
Some initial experiments involving rLEDBAT have been reported in | Some initial experiments involving rLEDBAT have been reported in | |||
[COMNET3]. Experiments involving the interaction of LEDBAT++ and BBR | [COMNET3]. Experiments involving the interaction between LEDBAT++ | |||
are presented in [COMNET2]. An experimental evaluation of the | and BBR are presented in [COMNET2]. An experimental evaluation of | |||
LEDBAT++ algorithm is presented in [COMNET1]. As LEDBAT++ is one of | the LEDBAT++ algorithm is presented in [COMNET1]. As LEDBAT++ is one | |||
the less-than-best-effort congestion control algorithms that rLEDBAT | of the less-than-best-effort congestion control algorithms that | |||
relies on, the results regarding LEDBAT++ interaction with other | rLEDBAT relies on, the results regarding how LEDBAT++ interacts with | |||
congestion control algorithms are relevant for the understanding of | other congestion control algorithms are relevant for the | |||
rLEDBAT as well. | understanding of rLEDBAT as well. | |||
5. Security Considerations | 6. Security Considerations | |||
Overall, we believe that rLEDBAT does not introduce any new | Overall, we believe that rLEDBAT does not introduce any new | |||
vulnerabilities to existing TCP endpoints, as it relies on existing | vulnerabilities to existing TCP endpoints, as it relies on existing | |||
TCP knobs, notably the Receive Window and timestamps. | TCP knobs, notably the Receive Window and timestamps. | |||
Specifically, rLEDBAT uses RCV.WND to modulate the rate of the | Specifically, rLEDBAT uses RCV.WND to modulate the rate of the | |||
sender. An attacker wishing to starve a flow can simply reduce the | sender. An attacker wishing to starve a flow can simply reduce the | |||
RCV.WND, irrespective of whether rLEDBAT is being used or not. | RCV.WND, irrespective of whether rLEDBAT is being used or not. | |||
We can further ask ourselves whether the attacker can use the rLEDBAT | We can further ask ourselves whether the attacker can use the rLEDBAT | |||
mechanisms in place to force the rLEDBAT receiver to reduce the RCV | mechanisms in place to force the rLEDBAT receiver to reduce the | |||
WND. There are two ways an attacker can do that. One would be to | RCV.WND. There are two ways an attacker can do this: | |||
introduce an artificial delay to the packets either by actually | ||||
delaying the packets or modifying the Timestamps. This would cause | ||||
the rLEDBAT receiver to believe that a queue is building up and | ||||
reduce the RCV.WND. Note that an attacker to do that must be on | ||||
path, so if that is the case, it is probably more direct to simply | ||||
reduce the RCV.WND. | ||||
The other option would be for the attacker to make the rLEDBAT | * One would be to introduce an artificial delay to the packets by | |||
receiver believe that a loss has occurred. To do that, it basically | either actually delaying the packets or modifying the timestamps. | |||
needs to retransmit an old packet (to be precise, it needs to | This would cause the rLEDBAT receiver to believe that a queue is | |||
transmit a packet with the right sequence number and the right port | building up and reduce the RCV.WND. Note that to do so, an | |||
and IP numbers). This means that the attacker can achieve a | attacker must be on path, so if that is the case, it is probably | |||
reduction of incoming traffic to the rLEDBAT receiver not only by | more direct to simply reduce the RCV.WND. | |||
modifying the RCV.WND field of the packets originated from the | ||||
rLEDBAT host, but also by injecting packets with the proper sequence | ||||
number in the other direction. This may slightly expand the attack | ||||
surface. | ||||
6. IANA Considerations | * The other option would be for the attacker to make the rLEDBAT | |||
receiver believe that a loss has occurred. To do this, it | ||||
basically needs to retransmit an old packet (to be precise, it | ||||
needs to transmit a packet with the correct sequence number and | ||||
the correct port and IP numbers). This means that the attacker | ||||
can achieve a reduction of incoming traffic to the rLEDBAT | ||||
receiver not only by modifying the RCV.WND field of the packets | ||||
originated from the rLEDBAT host but also by injecting packets | ||||
with the proper sequence number in the other direction. This may | ||||
slightly expand the attack surface. | ||||
No actions are required from IANA. | 7. IANA Considerations | |||
7. Acknowledgements | This document has no IANA actions. | |||
This work was supported by the EU through the StandICT projects RXQ, | 8. References | |||
CCI and CEL6, the NGI Pointer RIM project and the H2020 5G-RANGE | ||||
project and by the Spanish Ministry of Economy and Competitiveness | ||||
through the 5G-City project (TEC2016-76795-C6-3-R). | ||||
We would like to thank ICCRG chairs Reese Enghardt and Vidhi Goel for | 8.1. Normative References | |||
their support on this work. We would also like to thank Daniel Havey | ||||
for his help. We would like to thank Colin Perkins, Mirja | ||||
Kuehlewind, and Vidhi Goel for their reviews and comments on earlier | ||||
versions of this document. | ||||
8. Informative References | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | ||||
DOI 10.17487/RFC2119, March 1997, | ||||
<https://www.rfc-editor.org/info/rfc2119>. | ||||
[Apple] Stuart, S.C. and V.G. Vidhi, "Reduce network delays for | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
your app", WWDC21 https://developer.apple.com/videos/play/ | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
wwdc2021/10239/, 2021. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[COMNET1] Bagnulo, M.B. and A.G. Garcia-Martinez, "An experimental | 8.2. Informative References | |||
evaluation of LEDBAT++", Computer Networks Volume 212, | ||||
2022. | ||||
[COMNET2] Bagnulo, M.B. and A.G. Garcia-Martinez, "When less is | [Apple] Cheshire, S. and V. Goel, "Reduce network delays for your | |||
more: BBR versus LEDBAT++", Computer Networks Volume 219, | app", Apple Worldwide Developers Conference (WWDC2021), | |||
2022. | Video, 2021, | |||
<https://developer.apple.com/videos/play/wwdc2021/10239/>. | ||||
[COMNET3] Bagnulo, M.B., Garcia-Martinez, A.G., Mandalari, A.M., | [COMNET1] Bagnulo, M. and A. García-Martínez, "An experimental | |||
Balasubramanian, P.B,., Havey, D.H., and G.M. Montenegro, | evaluation of LEDBAT++", Computer Networks, vol. 212, | |||
DOI 10.1016/j.comnet.2022.109036, July 2022, | ||||
<https://doi.org/10.1016/j.comnet.2022.109036>. | ||||
[COMNET2] Bagnulo, M. and A. García-Martínez, "When less is more: | ||||
BBR versus LEDBAT++", Computer Networks, vol. 219, | ||||
DOI 10.1016/j.comnet.2022.109460, December 2022, | ||||
<https://doi.org/10.1016/j.comnet.2022.109460>. | ||||
[COMNET3] Bagnulo, M., García-Martínez, A., Mandalari, A.M., | ||||
Balasubramanian, P., Havey, D., and G. Montenegro, | ||||
"Design, implementation and validation of a receiver- | "Design, implementation and validation of a receiver- | |||
driven less-than-best-effort transport", Computer | driven less-than-best-effort transport", Computer | |||
Networks Volume 233, 2022. | Networks, vol. 233, DOI 10.1016/j.comnet.2023.109841, | |||
September 2023, | ||||
<https://doi.org/10.1016/j.comnet.2023.109841>. | ||||
[I-D.irtf-iccrg-ledbat-plus-plus] | [LEDBAT++] Balasubramanian, P., Ertugay, O., Havey, D., and M. | |||
Balasubramanian, P., Ertugay, O., and D. Havey, "LEDBAT++: | Bagnulo, "LEDBAT++: Congestion Control for Background | |||
Congestion Control for Background Traffic", Work in | Traffic", Work in Progress, Internet-Draft, draft-irtf- | |||
Progress, Internet-Draft, draft-irtf-iccrg-ledbat-plus- | iccrg-ledbat-plus-plus-02, 13 February 2025, | |||
plus-01, 25 August 2020, | ||||
<https://datatracker.ietf.org/doc/html/draft-irtf-iccrg- | <https://datatracker.ietf.org/doc/html/draft-irtf-iccrg- | |||
ledbat-plus-plus-01>. | ledbat-plus-plus-02>. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
<https://www.rfc-editor.org/info/rfc5681>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
[RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, | [RFC6817] Shalunov, S., Hazel, G., Iyengar, J., and M. Kuehlewind, | |||
"Low Extra Delay Background Transport (LEDBAT)", RFC 6817, | "Low Extra Delay Background Transport (LEDBAT)", RFC 6817, | |||
DOI 10.17487/RFC6817, December 2012, | DOI 10.17487/RFC6817, December 2012, | |||
<https://www.rfc-editor.org/info/rfc6817>. | <https://www.rfc-editor.org/info/rfc6817>. | |||
skipping to change at page 17, line 37 ¶ | skipping to change at line 823 ¶ | |||
[RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
<https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | |||
"CUBIC for Fast and Long-Distance Networks", RFC 9438, | "CUBIC for Fast and Long-Distance Networks", RFC 9438, | |||
DOI 10.17487/RFC9438, August 2023, | DOI 10.17487/RFC9438, August 2023, | |||
<https://www.rfc-editor.org/info/rfc9438>. | <https://www.rfc-editor.org/info/rfc9438>. | |||
[Windows11] | [Windows11] | |||
Forsmann, C.F., "What's new in Delivery Optimization", | Microsoft, "What's new in Delivery Optimization", | |||
Microsoft Documentation https://learn.microsoft.com/en- | Microsoft Windows Documentation, October 2024, | |||
us/windows/deployment/do/whats-new-do, 2023. | <https://learn.microsoft.com/en-us/windows/deployment/do/ | |||
whats-new-do>. | ||||
[WindowsServer] | [WindowsServer] | |||
Havey, D.H., "LEDBAT Background Data Transfer for | Havey, D., "LEDBAT Background Data Transfer for Windows", | |||
Windows", Microsoft Blog | Microsoft Networking Blog, September 2022, | |||
https://techcommunity.microsoft.com/t5/networking- | <https://techcommunity.microsoft.com/t5/networking-blog/ | |||
blog/ledbat-background-data-transfer-for-windows/ba- | ledbat-background-data-transfer-for-windows/ba-p/3639278>. | |||
p/3639278, 2022. | ||||
Appendix A. Terminology | ||||
We use the following abreviations thoughout the text. We include a | ||||
short list for the reader's convenence: | ||||
RCV.WND: the value included in the Receive Window field of the TCP | ||||
header (which computation is modified by this specification) | ||||
SND.WND: The TCP sender's window | ||||
cwnd: the consgestion window as computed by the congestion control | ||||
algorithm running at the TCP sender. | ||||
RLWND: the window value calculated by rLEDBAT algorithm | ||||
fcwnd: the value that a standard RFC793bis TCP receiver calculates | ||||
to set in the receive window for flow control purposes. | ||||
RCV.HGH: the highest sequence number corresponding to a received | ||||
byte of data at one point in time | ||||
TSV.HGH: TSV.HGH the TSVal value corresponding to the segment in | ||||
which RCV.HGH was carried at that point in time | ||||
SEG.SEQ: the sequence number of the last received segment | ||||
TSV.SEQ: the TSVal value of the last received segment | ||||
Appendix B. rLEDBAT pseudo-code | Appendix A. rLEDBAT Pseudocode | |||
We next describe how to integrate the proposed rLEDBAT mechanisms and | In this section, we describe how to integrate the proposed rLEDBAT | |||
an LBE delay-based congestion control algorithm such as LEDBAT or | mechanisms and an LBE delay-based congestion control algorithm such | |||
LEDBAT++. We describe the integrated algorithm as two procedures, one | as LEDBAT or LEDBAT++. We describe the integrated algorithm as two | |||
that is executed when a packet is received by a rLEDBAT-enabled | procedures: one that is executed when a packet is received by a | |||
endpoint (Figure 2) and another that is executed when the rLEDBAT- | rLEDBAT-enabled endpoint (Figure 2) and another that is executed when | |||
enabled endpoint sends a packet (Figure 3). At the beginning, RLWND | the rLEDBAT-enabled endpoint sends a packet (Figure 3). At the | |||
is set to its maximum value, so that the sending rate of the sender | beginning, RLWND is set to its maximum value, so that the sending | |||
is governed by the flow control algorithm of the receiver and the TCP | rate of the sender is governed by the flow control algorithm of the | |||
slow start mechanism of the sender, and the ackedBytes variable is | receiver and the TCP slow start mechanism of the sender, and the | |||
set to 0. | ackedBytes variable is set to 0. | |||
We assume that the LBE congestion control algorithm defines a | We assume that the LBE congestion control algorithm defines a | |||
WindowIncrease() function and a WindowDecrease() function. For | WindowIncrease() function and a WindowDecrease() function. For | |||
example, in the case of LEDBAT++, the WindowIncrease() function is an | example, in the case of LEDBAT++, the WindowIncrease() function is an | |||
additive increase, while the WindowDecrease() function is a | additive increase, while the WindowDecrease() function is a | |||
multiplicative decrease. In the case of the WindowIncrease(), we | multiplicative decrease. In the case of the WindowIncrease() | |||
assume that it takes as input the current window size and the number | function, we assume that it takes as input the current window size | |||
of bytes that were acknowledged since the last window update | and the number of bytes that were acknowledged since the last window | |||
(ackedBytes) and returns as output the updated window size. In the | update (ackedBytes) and returns as output the updated window size. | |||
case of WindowDecrease(), it takes as input the current window size | In the case of the WindowDecrease() function, it takes as input the | |||
and returns the updated window size. | current window size and returns the updated window size. | |||
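To make the two function signatures concrete, the following is a generic additive-increase/multiplicative-decrease sketch. It is only an illustration of the interface described above: the constants `MSS`, `gain`, `factor`, and `min_window` are hypothetical and are not the values defined by LEDBAT or LEDBAT++.

```python
MSS = 1460  # assumed maximum segment size in bytes (hypothetical value)


def window_increase(window: float, acked_bytes: int, gain: float = 1.0) -> float:
    """Additive increase: grows by about gain * MSS per window of acked data."""
    return window + gain * MSS * acked_bytes / window


def window_decrease(window: float, factor: float = 0.5,
                    min_window: float = 2 * MSS) -> float:
    """Multiplicative decrease, never going below a small floor."""
    return max(window * factor, min_window)
```

For example, `window_increase(14600, 14600)` grows a 10-segment window by one segment once a full window has been acknowledged, while `window_decrease(14600)` halves it.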
The data structures used in the algorithms are as follows. The | The data structures used in the algorithms are as follows. The | |||
sentList is a list that contains the TSval and the local send time of | sentList is a list that contains the TSval and the local send time of | |||
each packet sent by the rLEDBAT-enabled endpoint. The TSecr field of | each packet sent by the rLEDBAT-enabled endpoint. The TSecr field of | |||
the packets received by the rLEDBAT-enabled endpoint are matched with | the packets received by the rLEDBAT-enabled endpoint is matched with | |||
the sentList to compute the RTT. | the sentList to compute the RTT. | |||
The RTT values computed for each received packet are stored in the | The RTT values computed for each received packet are stored in the | |||
RTTlist, which contains also the received TSecr (to avoid using | RTTlist, which also contains the received TSecr (to avoid using | |||
multiple packets with the same TSecr for RTT calculations, only the | multiple packets with the same TSecr for RTT calculations, only the | |||
first packet received for a given TSecr is used to compute the RTT). | first packet received for a given TSecr is used to compute the RTT). | |||
It also contains the local time at which the packet was received, to | It also contains the local time at which the packet was received, to | |||
allow selecting the RTTs measured in a given period (e.g., in the | allow selecting the RTTs measured in a given period (e.g., in the | |||
last 10 minutes). RTTlist is initialized with all its values to its | last 10 minutes). RTTlist is initialized with all its values to its | |||
maximum. | maximum. | |||
procedure receivePacket() | procedure receivePacket() | |||
//Looks for first sent packet with same TSval as TSecr, and, | //Looks for first sent packet with same TSval as TSecr, and | |||
//returns time difference | //returns time difference | |||
receivedRTT = computeRTT(sentList, receivedTSecr, receivedTime) | receivedRTT = computeRTT(sentList, receivedTSecr, receivedTime) | |||
//Inserts minimum value for a given receivedTSecr | //Inserts minimum value for a given receivedTSecr | |||
//note that many received packets may contain same receivedTSecr | //Note that many received packets may contain same receivedTSecr | |||
insertRTT (RTTlist, receivedRTT, receivedTSecr, receivedTime) | insertRTT (RTTlist, receivedRTT, receivedTSecr, receivedTime) | |||
filteredRTT = minLastKMeasures(RTTlist, K=4) | filteredRTT = minLastKMeasures(RTTlist, K=4) | |||
baseRTT = minLastNSeconds(RTTlist, N=180) | baseRTT = minLastNSeconds(RTTlist, N=180) | |||
qd = filteredRTT - baseRTT | qd = filteredRTT - baseRTT | |||
//ackedBytes is the number of bytes that can be used to reduce | //ackedBytes is the number of bytes that can be used to reduce | |||
//the Receive Window - without shrinking it - if necessary | //the Receive Window - without shrinking it - if necessary | |||
ackedBytes = ackedBytes + receiveBytes | ackedBytes = ackedBytes + receiveBytes | |||
if retransmittedPacketDetected then | if retransmittedPacketDetected then | |||
RLWND = DecreaseWindow(RLWND) // Only once per RTT | RLWND = DecreaseWindow(RLWND) //Only once per RTT | |||
end if | end if | |||
if qd < T then | if qd < T then | |||
RLWND = IncreaseWindow(RLWND, ackedBytes) | RLWND = IncreaseWindow(RLWND, ackedBytes) | |||
else | else | |||
RLWND = DecreaseWindow(RLWND) | RLWND = DecreaseWindow(RLWND) | |||
end if | end if | |||
end procedure | end procedure | |||
Figure 2: Procedure executed when a packet is received | Figure 2: Procedure Executed When a Packet Is Received | |||
procedure SENDPACKET | procedure SENDPACKET | |||
if (RLWND > RLWNDPrevious) or (RLWND - RLWNDPrevious < ackedBytes) | if (RLWND > RLWNDPrevious) or (RLWND - RLWNDPrevious < ackedBytes) | |||
then | then | |||
RLWNDPrevious = RLWND | RLWNDPrevious = RLWND | |||
else | else | |||
RLWNDPrevious = RLWND - ackedBytes | RLWNDPrevious = RLWND - ackedBytes | |||
end if | end if | |||
ackedBytes = 0 | ackedBytes = 0 | |||
RLWNDPrevious = RLWND | RLWNDPrevious = RLWND | |||
//Compute the RWND to include in the packet | //Compute the RWND to include in the packet | |||
RLWND = min(RLWND, fcwnd) | RLWND = min(RLWND, fcwnd) | |||
end procedure | end procedure | |||
Figure 3: Procedure executed when a packet is sent | Figure 3: Procedure Executed When a Packet Is Sent | |||
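One possible reading of the send-side step is sketched below: the advertised window may drop by at most the number of bytes received since the last advertisement (so the right edge of the window is not moved backward), and the result is capped by the flow-control window. The function name `advertised_window` and this interpretation are assumptions made for illustration; they are not a transcription of Figure 3.

```python
def advertised_window(rlwnd: int, rlwnd_previous: int,
                      acked_bytes: int, fcwnd: int) -> int:
    """Return the window to advertise without shrinking it.

    rlwnd          -- window computed by the rLEDBAT controller
    rlwnd_previous -- window advertised in the previous packet
    acked_bytes    -- bytes received since the previous advertisement
    fcwnd          -- window allowed by ordinary flow control
    """
    if rlwnd >= rlwnd_previous:
        allowed = rlwnd  # increases can be applied immediately
    else:
        # Reduce by no more than the bytes received since the last update.
        allowed = max(rlwnd, rlwnd_previous - acked_bytes)
    return min(allowed, fcwnd)
```

Under this reading, a large decrease requested by the controller is spread over several outgoing packets, each one reclaiming only the window space freed by newly received data.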
Acknowledgments | ||||
This work was supported by the EU through the StandICT projects RXQ, | ||||
CCI, and CEL6; the NGI Pointer RIM project; and the H2020 5G-RANGE | ||||
project; and by the Spanish Ministry of Economy and Competitiveness | ||||
through the 5G-City project (TEC2016-76795-C6-3-R). | ||||
We would like to thank ICCRG chairs Reese Enghardt and Vidhi Goel for | ||||
their support on this work. We would also like to thank Daniel Havey | ||||
for his help. We would like to thank Colin Perkins, Mirja Kühlewind, | ||||
and Vidhi Goel for their reviews and comments on earlier draft | ||||
versions of this document. | ||||
Authors' Addresses | Authors' Addresses | |||
Marcelo Bagnulo | Marcelo Bagnulo | |||
Universidad Carlos III de Madrid | Universidad Carlos III de Madrid | |||
Email: marcelo@it.uc3m.es | Email: marcelo@it.uc3m.es | |||
Alberto Garcia-Martinez | Alberto Garcia-Martinez | |||
Universidad Carlos III de Madrid | Universidad Carlos III de Madrid | |||
Email: alberto@it.uc3m.es | Email: alberto@it.uc3m.es | |||
End of changes. 144 change blocks. | ||||
480 lines changed or deleted | 518 lines changed or added | |||