| rfc9768v1.txt | rfc9768.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force (IETF) B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
| Request for Comments: 9768 Independent | Request for Comments: 9768 Independent | |||
| Updates: 3168 M. Kühlewind | Updates: 3168 M. Kühlewind | |||
| Category: Standards Track Ericsson | Category: Standards Track Ericsson | |||
| ISSN: 2070-1721 R. Scheffenegger | ISSN: 2070-1721 R. Scheffenegger | |||
| NetApp | NetApp | |||
| August 2025 | October 2025 | |||
| More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | |||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is a mechanism by which | Explicit Congestion Notification (ECN) is a mechanism by which | |||
| network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. ECN was originally specified for TCP in such a way that only | sender. ECN was originally specified for TCP in such a way that only | |||
| one feedback signal can be transmitted per Round-Trip Time (RTT). | one feedback signal can be transmitted per Round-Trip Time (RTT). | |||
| Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center | More recently defined TCP mechanisms like Congestion Exposure | |||
| TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S) | (ConEx), Data Center TCP (DCTCP), or Low Latency, Low Loss, and | |||
| need more Accurate ECN (AccECN) feedback information whenever more | Scalable Throughput (L4S) need more Accurate ECN (AccECN) feedback | |||
| than one marking is received in one RTT. This document updates the | information whenever more than one marking is received in one RTT. | |||
| original ECN specification defined in RFC 3168 by specifying a scheme | This document updates the original ECN specification defined in RFC | |||
| that provides more than one feedback signal per RTT in the TCP | 3168 by specifying a scheme that provides more than one feedback | |||
| header. Given TCP header space is scarce, it allocates a reserved | signal per RTT in the TCP header. Given TCP header space is scarce, | |||
| header bit previously assigned to the ECN-nonce. It also overloads | it allocates a reserved header bit previously assigned to the ECN- | |||
| the two existing ECN flags in the TCP header. The resulting extra | nonce. It also overloads the two existing ECN flags in the TCP | |||
| space is additionally exploited to feed back the IP-ECN field | header. The resulting extra space is additionally exploited to feed | |||
| received during the TCP connection establishment. Supplementary | back the IP ECN field received during the TCP connection | |||
| feedback information can optionally be provided in two new TCP option | establishment. Supplementary feedback information can optionally be | |||
| alternatives, which are never used on the TCP SYN. The document also | provided in two new TCP Option alternatives, which are never used on | |||
| specifies the treatment of this updated TCP wire protocol by | the TCP SYN. The document also specifies the treatment of this | |||
| middleboxes. | updated TCP wire protocol by middleboxes. | |||
| Status of This Memo | Status of This Memo | |||
| This is an Internet Standards Track document. | This is an Internet Standards Track document. | |||
| This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
| (IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
| received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
| Internet Engineering Steering Group (IESG). Further information on | Internet Engineering Steering Group (IESG). Further information on | |||
| Internet Standards is available in Section 2 of RFC 7841. | Internet Standards is available in Section 2 of RFC 7841. | |||
| skipping to change at line 158 | skipping to change at line 158 | |||
| Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | |||
| which network nodes can mark IP packets instead of dropping them to | which network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. In RFC 3168, ECN was specified for TCP in such a way that | sender. In RFC 3168, ECN was specified for TCP in such a way that | |||
| only one feedback signal could be transmitted per Round-Trip Time | only one feedback signal could be transmitted per Round-Trip Time | |||
| (RTT). This is sufficient for congestion control schemes like Reno | (RTT). This is sufficient for congestion control schemes like Reno | |||
| [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | |||
| congestion window by a fixed factor if congestion occurs within an | congestion window by a fixed factor if congestion occurs within an | |||
| RTT independent of the number of received congestion markings. | RTT independent of the number of received congestion markings. More | |||
| Recently, proposed mechanisms like Congestion Exposure (ConEx | recently defined mechanisms like Congestion Exposure (ConEx | |||
| [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | |||
| than one marking is received in one RTT, which is information that | than one marking is received in one RTT, which is information that | |||
| cannot be provided by the feedback scheme as specified in [RFC3168]. | cannot be provided by the feedback scheme as specified in [RFC3168]. | |||
| This document specifies an update to the ECN feedback scheme of RFC | This document specifies an update to the ECN feedback scheme of RFC | |||
| 3168 that provides more accurate information and could be used by | 3168 that provides more accurate information and could be used by | |||
| these and potentially other future TCP extensions, while still also | these and potentially other future TCP extensions, while still also | |||
| supporting the pre-existing TCP congestion controllers that use just | supporting the pre-existing TCP congestion controllers that use just | |||
| one feedback signal per round. Congestion control is the term the | one feedback signal per round. Congestion control is the term the | |||
| IETF uses to describe data rate management. It is the algorithm that | IETF uses to describe data rate management. It is the algorithm that | |||
| a sender uses to optimize its sending rate so that it transmits data | a sender uses to optimize its sending rate so that it transmits data | |||
| as fast as the network can carry it, but no faster. A fuller | as fast as the network can carry it, but no faster. A fuller | |||
| description of the motivation for this specification is given in the | description of the motivation for this specification is given in the | |||
| associated requirements document [RFC7560]. | associated requirements document [RFC7560]. | |||
| This document specifies a Standards Track scheme for ECN feedback in | This document specifies a Standards Track scheme for ECN feedback in | |||
| the TCP header to provide more than one feedback signal per RTT. It | the TCP header to provide more than one feedback signal per RTT. It | |||
| is called the more "Accurate ECN" feedback scheme, or AccECN for | is called the more "Accurate ECN feedback" scheme, or AccECN for | |||
| short. This document updates RFC 3168 with respect to negotiation | short. This document updates RFC 3168 with respect to negotiation | |||
| and use of the feedback scheme for TCP. All aspects of RFC 3168 | and use of the feedback scheme for TCP. All aspects of RFC 3168 | |||
| other than the TCP feedback scheme and its negotiation remain | other than the TCP feedback scheme and its negotiation remain | |||
| unchanged by this specification. In particular, the definition of | unchanged by this specification. In particular, the definition of | |||
| ECN at the IP layer is unaffected. Section 4 details the aspects of | ECN at the IP layer is unaffected. Section 4 details the aspects of | |||
| RFC 3168 that are updated by this document. | RFC 3168 that are updated by this document. | |||
| This document uses the term "Classic ECN feedback" when it needs to | This document uses the term "Classic ECN feedback" when it needs to | |||
| distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | |||
| AccECN TCP feedback scheme. AccECN is intended to offer a complete | AccECN TCP feedback scheme. AccECN is intended to offer a complete | |||
| skipping to change at line 224 | skipping to change at line 224 | |||
| CUBIC, AccECN can be used to respond to the extent of congestion | CUBIC, AccECN can be used to respond to the extent of congestion | |||
| notification over a round trip, as for example DCTCP does in | notification over a round trip, as for example DCTCP does in | |||
| controlled environments [RFC8257]. For congestion response, this | controlled environments [RFC8257]. For congestion response, this | |||
| specification refers to the original ECN specification adopted in | specification refers to the original ECN specification adopted in | |||
| 2001 [RFC3168], as updated by the more relaxed rules introduced in | 2001 [RFC3168], as updated by the more relaxed rules introduced in | |||
| 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | |||
| Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | |||
| Alternative Backoff with ECN (ABE) [RFC8511]. | Alternative Backoff with ECN (ABE) [RFC8511]. | |||
| Section 5.2 explains how AccECN is compatible with current commonly | Section 5.2 explains how AccECN is compatible with current commonly | |||
| used TCP options, and a number of current experimental modifications | used TCP Options, and a number of current experimental modifications | |||
| to TCP, as well as SYN cookies. | to TCP, as well as SYN cookies. | |||
| 1.1. Document Roadmap | 1.1. Document Roadmap | |||
| The following introductory section outlines the goals of AccECN | The following introductory section outlines the goals of AccECN | |||
| (Section 1.2). Then, terminology is defined (Section 1.3) and a | (Section 1.2). Then, terminology is defined (Section 1.3) and a | |||
| recap of existing prerequisite technology is given (Section 1.4). | recap of existing prerequisite technology is given (Section 1.4). | |||
| Section 2 gives an informative overview of the AccECN protocol. Then | Section 2 gives an informative overview of the AccECN protocol. Then | |||
| Section 3 gives the normative protocol specification, and Section 3.3 | Section 3 gives the normative protocol specification, and Section 3.3 | |||
| skipping to change at line 269 | skipping to change at line 269 | |||
| The requirements document recognizes that a protocol as ubiquitous as | The requirements document recognizes that a protocol as ubiquitous as | |||
| TCP needs to be able to serve as-yet-unspecified requirements. | TCP needs to be able to serve as-yet-unspecified requirements. | |||
| Therefore, an AccECN receiver acts as a generic (mechanistic) | Therefore, an AccECN receiver acts as a generic (mechanistic) | |||
| reflector of congestion information with the aim that new sender | reflector of congestion information with the aim that new sender | |||
| behaviours can be deployed unilaterally (see Section 2.5) in the | behaviours can be deployed unilaterally (see Section 2.5) in the | |||
| future. | future. | |||
| 1.3. Terminology | 1.3. Terminology | |||
| AccECN: The more Accurate ECN feedback scheme is called AccECN for | Accurate ECN feedback: The more Accurate ECN feedback scheme is | |||
| short. | called AccECN for short. | |||
| Classic ECN: The ECN protocol specified in [RFC3168]. | Classic ECN: The ECN protocol specified in [RFC3168]. | |||
| Classic ECN feedback: The feedback aspect of the ECN protocol | Classic ECN feedback: The feedback aspect of the ECN protocol | |||
| specified in [RFC3168], including generation, encoding, | specified in [RFC3168], including generation, encoding, | |||
| transmission and decoding of feedback, but not the Data Sender's | transmission and decoding of feedback, but not the Data Sender's | |||
| subsequent response to that feedback. | subsequent response to that feedback. | |||
| ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | |||
| skipping to change at line 317 | skipping to change at line 317 | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
| BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 1.4. Recap of Existing ECN Feedback in IP/TCP | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
| Explicit Congestion Notification (ECN) [RFC3168] can be split into | Explicit Congestion Notification (ECN) [RFC3168] can be split into | |||
| two parts conceptionally. In the forward direction, alongside the | two parts conceptionally. In the forward direction, alongside the | |||
| data stream, it uses a 2-bit field in the IP header. This is | data stream, it uses a 2-bit field in the IP header. This is | |||
| referred to as IP-ECN later on. This signal carried in the IP (Layer | referred to as IP ECN later on. This signal carried in the IP (Layer | |||
| 3) header is exposed to network devices and may be modified when such | 3) header is exposed to network devices and may be modified when such | |||
| a device starts to experience congestion (see Table 1). The second | a device starts to experience congestion (see Table 1). The second | |||
| part is the feedback mechanism, by which the original data sender is | part is the feedback mechanism, by which the original data sender is | |||
| notified of the current congestion state of the intermediate path. | notified of the current congestion state of the intermediate path. | |||
| That returned signal is carried in a protocol-specific manner, and is | That returned signal is carried in a protocol-specific manner, and is | |||
| not to be modified by intermediate network devices. While ECN is in | not to be modified by intermediate network devices. While ECN is in | |||
| active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | |||
| [RFC6679], and Remote Direct Memory Access over Converged Ethernet | [RFC6679], and Remote Direct Memory Access over Converged Ethernet | |||
| [RoCEv2], this document only concerns itself with the specific | [RoCEv2], this document only concerns itself with the specific | |||
| implementation for the TCP protocol. | implementation for the TCP protocol. | |||
| skipping to change at line 343 | skipping to change at line 343 | |||
| 0b00, the packet is considered to have been sent by a Not ECN-capable | 0b00, the packet is considered to have been sent by a Not ECN-capable | |||
| Transport (Not-ECT). When a network node experiences congestion, it | Transport (Not-ECT). When a network node experiences congestion, it | |||
| will occasionally either drop or mark a packet, with the choice | will occasionally either drop or mark a packet, with the choice | |||
| depending on the packet's ECN codepoint. If the codepoint is Not- | depending on the packet's ECN codepoint. If the codepoint is Not- | |||
| ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | |||
| the node can mark the packet by setting the ECN codepoint to 0b11, | the node can mark the packet by setting the ECN codepoint to 0b11, | |||
| which is termed 'Congestion Experienced' (CE), or loosely a | which is termed 'Congestion Experienced' (CE), or loosely a | |||
| 'congestion mark'. Table 1 summarises these codepoints. | 'congestion mark'. Table 1 summarises these codepoints. | |||
| +==================+================+===========================+ | +==================+================+===========================+ | |||
| | IP-ECN codepoint | Codepoint name | Description | | | IP ECN Codepoint | Codepoint Name | Description | | |||
| +==================+================+===========================+ | +==================+================+===========================+ | |||
| | 0b00 | Not-ECT | Not ECN-Capable Transport | | | 0b00 | Not-ECT | Not ECN-Capable Transport | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b01 | ECT(1) | ECN-Capable Transport (1) | | | 0b01 | ECT(1) | ECN-Capable Transport (1) | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b10 | ECT(0) | ECN-Capable Transport (0) | | | 0b10 | ECT(0) | ECN-Capable Transport (0) | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b11 | CE | Congestion Experienced | | | 0b11 | CE | Congestion Experienced | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| skipping to change at line 404 | skipping to change at line 404 | |||
| Like the general TCP approach, the Data Receiver of each TCP half- | Like the general TCP approach, the Data Receiver of each TCP half- | |||
| connection sends AccECN feedback to the Data Sender on TCP | connection sends AccECN feedback to the Data Sender on TCP | |||
| acknowledgements, reusing data packets of the other half-connection | acknowledgements, reusing data packets of the other half-connection | |||
| whenever possible. | whenever possible. | |||
| The AccECN protocol has had to be designed in two parts: | The AccECN protocol has had to be designed in two parts: | |||
| * an essential feedback part that reuses the TCP-ECN header bits for | * an essential feedback part that reuses the TCP-ECN header bits for | |||
| the Data Receiver to feed back the number of packets arriving with | the Data Receiver to feed back the number of packets arriving with | |||
| CE in the IP-ECN field. This provides more accuracy than Classic | CE in the IP ECN field. This provides more accuracy than Classic | |||
| ECN feedback, but limited resilience against ACK loss; | ECN feedback, but limited resilience against ACK loss. | |||
| * a supplementary feedback part using one of two new alternative | * a supplementary feedback part using one of two new alternative | |||
| AccECN TCP options that provide additional feedback on the number | AccECN TCP Options that provide additional feedback on the number | |||
| of payload bytes that arrive marked with each of the three ECN | of payload bytes that arrive marked with each of the three ECN | |||
| codepoints in the IP-ECN field (not just CE marks). See the BCP | codepoints in the IP ECN field (not just CE marks). See the BCP | |||
| on Byte and Packet Congestion Notification [RFC7141] for the | on Byte and Packet Congestion Notification [RFC7141] for the | |||
| rationale determining that conveying congested payload bytes | rationale determining that conveying congested payload bytes | |||
| should be preferred over just providing feedback about congested | should be preferred over just providing feedback about congested | |||
| packets. This also provides greater resilience against ACK loss | packets. This also provides greater resilience against ACK loss | |||
| than the essential feedback, but it is currently more likely to | than the essential feedback, but it is currently more likely to | |||
| suffer from middlebox interference. | suffer from middlebox interference. | |||
| The two part design was necessary, given limitations on the space | The two part design was necessary, given limitations on the space | |||
| available for TCP options and given the possibility that certain | available for TCP Options and given the possibility that certain | |||
| incorrectly designed middleboxes might prevent TCP from using any new | incorrectly designed middleboxes might prevent TCP from using any new | |||
| options. | options. | |||
| The essential feedback part overloads the previous definition of the | The essential feedback part overloads the previous definition of the | |||
| three flags in the TCP header that had been assigned for use by | three flags in the TCP header that had been assigned for use by | |||
| Classic ECN. This design choice deliberately allows AccECN peers to | Classic ECN. This design choice deliberately allows AccECN peers to | |||
| replace the Classic ECN feedback protocol, rather than leaving | replace the Classic ECN feedback protocol, rather than leaving | |||
| Classic ECN feedback intact and adding more accurate feedback | Classic ECN feedback intact and adding more accurate feedback | |||
| separately because: | separately because: | |||
| * this efficiently reuses scarce TCP header space, given TCP option | * this efficiently reuses scarce TCP header space, given TCP Option | |||
| space is approaching saturation; | space is approaching saturation; | |||
| * a single upgrade path for the TCP protocol is preferable to a fork | * a single upgrade path for the TCP protocol is preferable to a fork | |||
| in the design that modifies the TCP header to convey all ECN | in the design that modifies the TCP header to convey all ECN | |||
| feedback; | feedback; | |||
| * otherwise, Classic and Accurate ECN feedback could give | * otherwise, Classic and Accurate ECN feedback could give | |||
| conflicting feedback about the same segment, which could open up | conflicting feedback about the same segment, which could open up | |||
| new security concerns and make implementations unnecessarily | new security concerns and make implementations unnecessarily | |||
| complex; | complex; | |||
| skipping to change at line 470 | skipping to change at line 470 | |||
| An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | |||
| option space is limited. The TCP Server sends an AccECN Option on | option space is limited. The TCP Server sends an AccECN Option on | |||
| the SYN/ACK, and the TCP Client sends one on the first ACK to test | the SYN/ACK, and the TCP Client sends one on the first ACK to test | |||
| whether the network path forwards these options correctly. | whether the network path forwards these options correctly. | |||
| 2.2. Feedback Mechanism | 2.2. Feedback Mechanism | |||
| A Data Receiver maintains four counters initialized at the start of | A Data Receiver maintains four counters initialized at the start of | |||
| the half-connection. Three count the number of arriving payload | the half-connection. Three count the number of arriving payload | |||
| bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | bytes marked CE, ECT(1), and ECT(0) in the IP ECN field. These byte | |||
| counters reflect only the TCP payload length, excluding the TCP | counters reflect only the TCP payload length, excluding the TCP | |||
| header and TCP options. The fourth counter counts the number of | header and TCP Options. The fourth counter counts the number of | |||
| packets arriving marked with a CE codepoint (including control | packets arriving marked with a CE codepoint (including control | |||
| packets without payload if they are CE-marked). | packets without payload if they are CE-marked). | |||
| The Data Sender maintains four equivalent counters for the half | The Data Sender maintains four equivalent counters for the half- | |||
| connection, and the AccECN protocol is designed to ensure they will | connection, and the AccECN protocol is designed to ensure they will | |||
| match the values in the Data Receiver's counters, albeit after a | match the values in the Data Receiver's counters, albeit after a | |||
| little delay. | little delay. | |||
| Each ACK carries the three least significant bits (LSBs) of the | Each ACK carries the three least significant bits (LSBs) of the | |||
| packet-based CE counter using the ECN bits in the TCP header, now | packet-based CE counter using the ECN bits in the TCP header, now | |||
| renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | |||
| some or all of the byte counters can be optionally carried in an | some or all of the byte counters can be optionally carried in an | |||
| AccECN Option. For efficient use of limited option space, two | AccECN Option. For efficient use of limited option space, two | |||
| alternative forms of the AccECN Option are specified with the fields | alternative forms of the AccECN Option are specified with the fields | |||
| in the opposite order to each other. | in the opposite order to each other. | |||
| 2.3. Delayed ACKs and Resilience Against ACK Loss | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
| With both the ACE and the AccECN Option mechanisms, the Data Receiver | With both the ACE and the AccECN Option mechanisms, the Data Receiver | |||
| continually repeats the current LSBs of each of its respective | continually repeats the current LSBs of each of its respective | |||
| counters. There is no need to acknowledge these continually repeated | counters. There is no need to acknowledge these continually repeated | |||
| counters, so the Congestion Window Reduced (CWR) mechanism of | counters, so the CWR mechanism of [RFC3168] is no longer used. Even | |||
| [RFC3168] is no longer used. Even if some ACKs are lost, the Data | if some ACKs are lost, the Data Sender ought to be able to infer how | |||
| Sender ought to be able to infer how much to increment its own | much to increment its own counters, even if the protocol field has | |||
| counters, even if the protocol field has wrapped. | wrapped. | |||
| The 3-bit ACE field can wrap fairly frequently. Therefore, even if | The 3-bit ACE field can wrap fairly frequently. Therefore, even if | |||
| it appears to have incremented by one (say), the field might have | it appears to have incremented by one (say), the field might have | |||
| actually cycled completely and then incremented by one. The Data | actually cycled completely and then incremented by one. The Data | |||
| Receiver is not allowed to delay sending an ACK to such an extent | Receiver is not allowed to delay sending an ACK to such an extent | |||
| that the ACE field would cycle. However, ACKs received at the Data | that the ACE field would cycle. However, ACKs received at the Data | |||
| Sender could still cycle because a whole sequence of ACKs carrying | Sender could still cycle because a whole sequence of ACKs carrying | |||
| intervening values of the field might all be lost or delayed in | intervening values of the field might all be lost or delayed in | |||
| transit. | transit. | |||
| skipping to change at line 557 | skipping to change at line 557 | |||
| other than the L4S experiment [RFC9330], such as a lower severity or | other than the L4S experiment [RFC9330], such as a lower severity or | |||
| a more instant congestion signal than CE. | a more instant congestion signal than CE. | |||
| Feedback in bytes is provided to protect against the receiver or a | Feedback in bytes is provided to protect against the receiver or a | |||
| middlebox using attacks similar to 'ACK-Division' to artificially | middlebox using attacks similar to 'ACK-Division' to artificially | |||
| inflate the congestion window, which is why [RFC5681] now recommends | inflate the congestion window, which is why [RFC5681] now recommends | |||
| that TCP counts acknowledge bytes not packets. | that TCP counts acknowledge bytes not packets. | |||
| 2.5. Generic (Mechanistic) Reflector | 2.5. Generic (Mechanistic) Reflector | |||
| The ACE field provides feedback about CE markings in the IP-ECN field | The ACE field provides feedback about CE markings in the IP ECN field | |||
| of both data and control packets. According to [RFC3168], the Data | of both data and control packets. According to [RFC3168], the Data | |||
| Sender is meant to set the IP-ECN field of control packets to Not- | Sender is meant to set the IP ECN field of control packets to Not- | |||
| ECT. However, mechanisms in certain private networks (e.g., data | ECT. However, mechanisms in certain private networks (e.g., data | |||
| centres) set control packets to be ECN-capable because they are | centres) set control packets to be ECN-capable because they are | |||
| precisely the packets that performance depends on most. | precisely the packets that performance depends on most. | |||
| For this reason, AccECN is designed to be a generic reflector of | For this reason, AccECN is designed to be a generic reflector of | |||
| whatever ECN markings it sees, whether or not they are compliant with | whatever ECN markings it sees, whether or not they are compliant with | |||
| a current standard. Then as standards evolve, Data Senders can | a current standard. Then as standards evolve, Data Senders can | |||
| upgrade unilaterally without any need for receivers to upgrade too. | upgrade unilaterally without any need for receivers to upgrade too. | |||
| It is also useful to be able to rely on generic reflection behaviour | It is also useful to be able to rely on generic reflection behaviour | |||
| when senders need to test for unexpected interference with markings | when senders need to test for unexpected interference with markings | |||
| (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | |||
| document and paragraph 2 of Section 20.2 of [RFC3168]). | document and paragraph 2 of Section 20.2 of [RFC3168]). | |||
| The initial SYN and SYN/ACK are the most critical control packets, so | The initial SYN and SYN/ACK are the most critical control packets, so | |||
| AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | AccECN feeds back their IP ECN fields. Although RFC 3168 prohibits | |||
| ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | |||
| the SYN and SYN/ACK supports future scenarios in which SYNs might be | the SYN and SYN/ACK supports future scenarios in which SYNs might be | |||
| ECN-enabled (without prejudging whether they ought to be). For | ECN-enabled (without prejudging whether they ought to be). For | |||
| instance, [RFC8311] updates this aspect of RFC 3168 to allow | instance, [RFC8311] updates this aspect of RFC 3168 to allow | |||
| experimentation with ECN-capable TCP control packets. | experimentation with ECN-capable TCP control packets. | |||
| Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | |||
| Not-ECT in compliance with RFC 3168, feedback on the state of the IP- | Not-ECT in compliance with RFC 3168, feedback on the state of the IP | |||
| ECN field when it arrives at the receiver could still be useful, | ECN field when it arrives at the receiver could still be useful, | |||
| because middleboxes have been known to overwrite the IP-ECN field as | because middleboxes have been known to overwrite the IP ECN field as | |||
| if it is still part of the old Type of Service (ToS) field | if it is still part of the old Type of Service (ToS) field | |||
| [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | |||
| ECT, but receives feedback that the IP-ECN field on the SYN arrived | ECT, but receives feedback that the IP ECN field on the SYN arrived | |||
| with a different codepoint, it can detect such middlebox | with a different codepoint, it can detect such middlebox | |||
| interference. Previously, neither end knew what IP-ECN field the | interference. Previously, neither end knew what IP ECN field the | |||
| other sent. So, if a TCP Server received ECT or CE on a SYN, it | other sent. So, if a TCP Server received ECT or CE on a SYN, it | |||
| could not know whether it was invalid because only the TCP Client | could not know whether it was invalid because only the TCP Client | |||
| knew whether it originally marked the SYN as Not-ECT (or ECT). | knew whether it originally marked the SYN as Not-ECT (or ECT). | |||
| Therefore, prior to AccECN, the Server's only safe course of action | Therefore, prior to AccECN, the Server's only safe course of action | |||
| in this example was to disable ECN for the connection. Instead, the | in this example was to disable ECN for the connection. Instead, the | |||
| AccECN protocol allows the Server and Client to feed back the ECN | AccECN protocol allows the Server and Client to feed back the ECN | |||
| field received on the SYN and SYN/ACK to their peer, which now has | field received on the SYN and SYN/ACK to their peer, which now has | |||
| all the information to decide whether the connection has to fall back | all the information to decide whether the connection has to fall back | |||
| from supporting ECN (or not). | from supporting ECN (or not). | |||
| skipping to change at line 627 | skipping to change at line 627 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 2: The New Definition of the TCP Header Flags During the | Figure 2: The New Definition of the TCP Header Flags During the | |||
| TCP Three-Way Handshake | TCP Three-Way Handshake | |||
| During the TCP three-way handshake at the start of a connection, to | During the TCP three-way handshake at the start of a connection, to | |||
| request more Accurate ECN feedback the TCP Client (host A) MUST set | request more Accurate ECN feedback the TCP Client (host A) MUST set | |||
| the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | |||
| If a TCP Server (host B) that is AccECN-enabled receives a SYN with | If a TCP Server (host B) that is AccECN-enabled receives a SYN with | |||
| the above three flags set, it MUST set both its half connections into | the above three flags set, it MUST set both its half-connections into | |||
| AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | |||
| SYN/ACK to the combination in the top block of Table 2 that feeds | SYN/ACK to the combination in the top block of Table 2 that feeds | |||
| back the IP-ECN field that arrived on the SYN. This applies whether | back the IP ECN field that arrived on the SYN. This applies whether | |||
| or not the Server itself supports setting the IP-ECN field on a SYN | or not the Server itself supports setting the IP ECN field on a SYN | |||
| or SYN/ACK (see Section 2.5 for rationale). | or SYN/ACK (see Section 2.5 for rationale). | |||
| When the TCP Server returns any of the four combinations in the top | When the TCP Server returns any of the four combinations in the top | |||
| block of Table 2, it confirms that it supports AccECN. The TCP | block of Table 2, it confirms that it supports AccECN. The TCP | |||
| Server MUST NOT set one of these four combinations of flags on the | Server MUST NOT set one of these four combinations of flags on the | |||
| SYN/ACK unless the preceding SYN requested support for AccECN as | SYN/ACK unless the preceding SYN requested support for AccECN as | |||
| above. | above. | |||
| Once a TCP Client (A) has sent the above SYN to declare that it | Once a TCP Client (A) has sent the above SYN to declare that it | |||
| supports AccECN, and once it has received the above SYN/ACK segment | supports AccECN, and once it has received the above SYN/ACK segment | |||
| that confirms that the TCP Server supports AccECN, the TCP Client | that confirms that the TCP Server supports AccECN, the TCP Client | |||
| MUST set both its half connections into AccECN mode. The TCP Client | MUST set both its half-connections into AccECN mode. The TCP Client | |||
| MUST NOT enter AccECN mode (or any feedback mode) before it has | MUST NOT enter AccECN mode (or any feedback mode) before it has | |||
| received the first SYN/ACK. | received the first SYN/ACK. | |||
| Once in AccECN mode, a TCP Client or Server has the rights and | Once in AccECN mode, a TCP Client or Server has the rights and | |||
| obligations to participate in the ECN protocol defined in | obligations to participate in the ECN protocol defined in | |||
| Section 3.1.5. | Section 3.1.5. | |||
| The procedures for retransmission of SYNs or SYN/ACKs are given in | The procedures for retransmission of SYNs or SYN/ACKs are given in | |||
| Section 3.1.4. | Section 3.1.4. | |||
| It is RECOMMENDED that the AccECN protocol be implemented alongside | It is RECOMMENDED that the AccECN protocol be implemented alongside | |||
| Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | |||
| with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | |||
| MUST also be implemented. | MUST also be implemented. | |||
| 3.1.2. Backward Compatibility | 3.1.2. Backward Compatibility | |||
| The three flags set to 1 to indicate AccECN support on the SYN | The three flags set to 1 to indicate AccECN support on the SYN have | |||
| have been carefully chosen to enable natural fall-back to prior | been carefully chosen to enable natural fall-back to prior stages in | |||
| stages in the evolution of ECN. Table 2 tabulates all the | the evolution of ECN. Table 2 tabulates all the negotiation | |||
| negotiation possibilities for ECN-related capabilities that involve | possibilities for ECN-related capabilities that involve at least one | |||
| at least one AccECN-capable host. The entries in the first two | AccECN-capable host. The entries in the first two columns have been | |||
| columns have been abbreviated, as follows: | abbreviated, as follows: | |||
| AccECN: Supports more Accurate ECN feedback (the present | AccECN: Supports more Accurate ECN feedback (the present | |||
| specification) | specification). | |||
| Nonce: Supports ECN-nonce feedback [RFC3540] | Nonce: Supports ECN-nonce feedback [RFC3540]. | |||
| ECN: Supports 'Classic' ECN feedback [RFC3168] | ECN: Supports 'Classic' ECN feedback [RFC3168]. | |||
| No ECN: Not ECN-capable. Implicit congestion notification using | No ECN: Not ECN-capable. Implicit congestion notification using | |||
| packet drop. | packet drop. | |||
| +========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | |||
| | | | A->B | B->A | of Host A | | | | | A->B | B->A | of Host A | | |||
| | | | AE CWR ECE | AE CWR ECE | | | | | | AE CWR ECE | AE CWR ECE | | | |||
| +========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | |||
| skipping to change at line 716 | skipping to change at line 716 | |||
| row. | row. | |||
| 1. The top block shows the case already described in Section 3.1 | 1. The top block shows the case already described in Section 3.1 | |||
| where both endpoints support AccECN and how the TCP Server (B) | where both endpoints support AccECN and how the TCP Server (B) | |||
| indicates congestion feedback. | indicates congestion feedback. | |||
| 2. The second block shows the cases where the TCP Client (A) | 2. The second block shows the cases where the TCP Client (A) | |||
| supports AccECN but the TCP Server (B) supports some earlier | supports AccECN but the TCP Server (B) supports some earlier | |||
| variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | |||
| as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | |||
| shown, it MUST set both its half connections into the feedback | shown, it MUST set both its half-connections into the feedback | |||
| mode shown in the rightmost column. If the TCP Client has set | mode shown in the rightmost column. If the TCP Client has set | |||
| itself into Classic ECN feedback mode, it MUST comply with | itself into Classic ECN feedback mode, it MUST comply with | |||
| [RFC3168]. | [RFC3168]. | |||
| An AccECN implementation has no need to recognize or support the | An AccECN implementation has no need to recognize or support the | |||
| Server response labelled 'Nonce' or ECN-nonce feedback more | Server response labelled 'Nonce' or ECN-nonce feedback more | |||
| generally [RFC3540], as RFC 3540 has been reclassified as | generally [RFC3540], as RFC 3540 has been reclassified as | |||
| Historic [RFC8311]. AccECN is compatible with alternative ECN | Historic [RFC8311]. AccECN is compatible with alternative ECN | |||
| feedback integrity approaches to the nonce (see Section 5.3). | feedback integrity approaches to the nonce (see Section 5.3). | |||
| The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | |||
| skipping to change at line 738 | skipping to change at line 738 | |||
| SYN/ACK follows the procedure for forward compatibility given in | SYN/ACK follows the procedure for forward compatibility given in | |||
| Section 3.1.3. | Section 3.1.3. | |||
| 3. The third block shows the cases where the TCP Server (B) supports | 3. The third block shows the cases where the TCP Server (B) supports | |||
| AccECN but the TCP Client (A) supports some earlier variant of | AccECN but the TCP Client (A) supports some earlier variant of | |||
| TCP feedback, as indicated in its SYN. | TCP feedback, as indicated in its SYN. | |||
| When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
| (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | |||
| * set both its half connections into the Classic ECN feedback | * set both its half-connections into the Classic ECN feedback | |||
| mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | |||
| shown. Then it MUST comply with [RFC3168]. | shown. Then it MUST comply with [RFC3168]. | |||
| * set both its half-connections into Not ECN mode and return a | * set both its half-connections into Not ECN mode and return a | |||
| SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | |||
| disabled. This latter case is unlikely to be desirable, but | disabled. This latter case is unlikely to be desirable, but | |||
| it is allowed as a possibility, e.g., for minimal TCP | it is allowed as a possibility, e.g., for minimal TCP | |||
| implementations. | implementations. | |||
| When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
| (AE,CWR,ECE) = (0,0,0), it MUST set both its half connections | (AE,CWR,ECE) = (0,0,0), it MUST set both its half-connections | |||
| into the Not ECN feedback mode, return a SYN/ACK with | into the Not ECN feedback mode, return a SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | |||
| 4. The fourth block displays a combination labelled 'Broken'. Some | 4. The fourth block displays a combination labelled 'Broken'. Some | |||
| older TCP Server implementations incorrectly set the TCP-ECN | older TCP Server implementations incorrectly set the TCP-ECN | |||
| flags in the SYN/ACK by reflecting those in the SYN. Such broken | flags in the SYN/ACK by reflecting those in the SYN. Such broken | |||
| TCP Servers (B) cannot support ECN; so as soon as an AccECN- | TCP Servers (B) cannot support ECN; so as soon as an AccECN- | |||
| capable TCP Client (A) receives such a broken SYN/ACK, it MUST | capable TCP Client (A) receives such a broken SYN/ACK, it MUST | |||
| fall back to Not ECN mode for both its half connections and | fall back to Not ECN mode for both its half-connections and | |||
| continue with ECN disabled. | continue with ECN disabled. | |||
| The following additional rules do not fit the structure of the table, | The following additional rules do not fit the structure of the table, | |||
| but they complement it: | but they complement it: | |||
| Simultaneous Open: An originating AccECN Host (A), having sent a SYN | Simultaneous Open: An originating AccECN Host (A), having sent a SYN | |||
| with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | |||
| B. Host A MUST then enter the same feedback mode as it would have | B. Host A MUST then enter the same feedback mode as it would have | |||
| entered had it been a responding host and received the same SYN. | entered had it been a responding host and received the same SYN. | |||
| Then host A MUST send the same SYN/ACK as it would have sent had | Then host A MUST send the same SYN/ACK as it would have sent had | |||
| skipping to change at line 793 | skipping to change at line 793 | |||
| such a combination, the Server MUST negotiate the use of AccECN as if | such a combination, the Server MUST negotiate the use of AccECN as if | |||
| the three flags had been set to (1,1,1). However, an AccECN Client | the three flags had been set to (1,1,1). However, an AccECN Client | |||
| implementation MUST NOT send a SYN with any combination other than | implementation MUST NOT send a SYN with any combination other than | |||
| the three listed. | the three listed. | |||
| If a TCP Client sent a SYN requesting AccECN feedback with | If a TCP Client sent a SYN requesting AccECN feedback with | |||
| (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | |||
| reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | |||
| logic specific to such a combination, the Client MUST enable AccECN | logic specific to such a combination, the Client MUST enable AccECN | |||
| mode as if the SYN/ACK confirmed that the Server supported AccECN and | mode as if the SYN/ACK confirmed that the Server supported AccECN and | |||
| as if it fed back that the IP-ECN field on the SYN had arrived | as if it fed back that the IP ECN field on the SYN had arrived | |||
| unchanged. However, an AccECN Server implementation MUST NOT send a | unchanged. However, an AccECN Server implementation MUST NOT send a | |||
| SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | |||
| | For the avoidance of doubt, the behaviour described in the | | For the avoidance of doubt, the behaviour described in the | |||
| | present specification applies whether or not the three | | present specification applies whether or not the three | |||
| | remaining reserved TCP header flags are zero. | | remaining reserved TCP header flags are zero. | |||
| All of these requirements ensure that future uses of all the Reserved | All of these requirements ensure that future uses of all the Reserved | |||
| combinations on a SYN or SYN/ACK can rely on consistent behaviour | combinations on a SYN or SYN/ACK (see Table 2) can rely on consistent | |||
| from the installed base of AccECN implementations. See Appendix B.3 | behaviour from the installed base of AccECN implementations. See | |||
| for related discussion. | Appendix B.3 for related discussion. | |||
| 3.1.4. Multiple SYNs or SYN/ACKs | 3.1.4. Multiple SYNs or SYN/ACKs | |||
| 3.1.4.1. Retransmitted SYNs | 3.1.4.1. Retransmitted SYNs | |||
| If the sender of an AccECN SYN (the TCP Client) times out before | If the sender of an AccECN SYN (the TCP Client) times out before | |||
| receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | |||
| AccECN at least one more time by continuing to set all three TCP ECN | AccECN at least one more time by continuing to set all three TCP ECN | |||
| flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | |||
| the usual retransmission timeouts). If this first retransmission | the usual retransmission timeouts). If this first retransmission | |||
| skipping to change at line 830 | skipping to change at line 830 | |||
| Retrying once before fall-back adds delay in the case where a | Retrying once before fall-back adds delay in the case where a | |||
| middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | |||
| measurements [Mandalari18] imply that a drop is less likely to be due | measurements [Mandalari18] imply that a drop is less likely to be due | |||
| to middlebox interference than other intermittent causes of loss, | to middlebox interference than other intermittent causes of loss, | |||
| e.g., congestion, wireless transmission loss, etc. | e.g., congestion, wireless transmission loss, etc. | |||
| Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
| be more effective (e.g., attempting to negotiate AccECN on the SYN | be more effective (e.g., attempting to negotiate AccECN on the SYN | |||
| only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
| congestion). | congestion)). | |||
| Further, it might make sense to also remove any other new or | Further, it might make sense to also remove any other new or | |||
| experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
| be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
| specification of the other option(s) and any attempt to coordinate | specification of the other option(s) and any attempt to coordinate | |||
| fall-back between different modules of the stack. For instance, even | fall-back between different modules of the stack. For instance, if | |||
| if taking part in an [RFC8311] experiment that allows ECT on a SYN, | taking part in an [RFC8311] experiment that allows ECT on a SYN, it | |||
| it would be advisable to try it without. | would be advisable to have a fall-back strategy that tries use of | |||
| AccECN without setting ECT on SYN. | ||||
| Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
| failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
| attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
| attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
| specifically due to AccECN. The cache needs to be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
| so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
| problem has been resolved. | problem has been resolved. | |||
| All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
| in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
| negotiating different types of feedback have been sent within the | negotiating different types of feedback have been sent within the | |||
| same connection, including the possibility that they arrive out of | same connection, including the possibility that they arrive out of | |||
| order. As examples, the following non-normative bullets call out | order. As examples, the following non-normative bullets call out | |||
| those rules from Section 3.1.5 that apply to the above fall-back | those rules from Section 3.1.5 that apply to the above fall-back | |||
| strategies: | strategies: | |||
| * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | |||
| with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | |||
| from the Server in response to one, the other, or both, and | from the Server in response to one, the other, or both, and | |||
| possibly reordered; | possibly reordered. | |||
| * Such a TCP Client enters the feedback mode appropriate to the | * Such a TCP Client enters the feedback mode appropriate to the | |||
| first SYN/ACK it receives according to Table 2, and it does not | first SYN/ACK it receives according to Table 2, and it does not | |||
| switch to a different mode, whatever other SYN/ACKs it might | switch to a different mode, whatever other SYN/ACKs it might | |||
| receive or send; | receive or send. | |||
| * If a TCP Client has entered AccECN mode but then subsequently | * If a TCP Client has entered AccECN mode but then subsequently | |||
| sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | |||
| is still allowed to set ECT on packets for the rest of the | is still allowed to set ECT on packets for the rest of the | |||
| connection. Note that this rule is different from that of a | connection. Note that this rule is different from that of a | |||
| Server in an equivalent position (Section 3.1.5 explains). | Server in an equivalent position (Section 3.1.5 explains). | |||
| * Having entered AccECN mode, in general a TCP Client commits to | * Having entered AccECN mode, in general a TCP Client commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Sections 3.2.2.3 and 3.2.2.4); | see Sections 3.2.2.3 and 3.2.2.4). | |||
| * Having entered AccECN mode, a TCP Client commits to using AccECN | * Having entered AccECN mode, a TCP Client commits to using AccECN | |||
| to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
| 3.1.4.2. Retransmitted SYN/ACKs | 3.1.4.2. Retransmitted SYN/ACKs | |||
| A TCP Server might send multiple SYN/ACKs indicating different | A TCP Server might send multiple SYN/ACKs indicating different | |||
| feedback modes. For instance, when falling back to sending a SYN/ACK | feedback modes. For instance, when falling back to sending a SYN/ACK | |||
| with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed | with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed | |||
| out (Section 3.2.3.2.2); or to acknowledge different retransmissions | out (Section 3.2.3.2.2); or to acknowledge different retransmissions | |||
| of the SYN (Section 3.1.4.1). | of the SYN (Section 3.1.4.1). | |||
| skipping to change at line 900 | skipping to change at line 901 | |||
| All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
| in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
| negotiating different types of feedback are sent within the same | negotiating different types of feedback are sent within the same | |||
| connection, including the possibility that they arrive out of order. | connection, including the possibility that they arrive out of order. | |||
| As examples, the following non-normative bullets call out those rules | As examples, the following non-normative bullets call out those rules | |||
| from Section 3.1.5 that apply to the above fall-back strategies: | from Section 3.1.5 that apply to the above fall-back strategies: | |||
| * An AccECN-capable TCP Server enters the feedback mode appropriate | * An AccECN-capable TCP Server enters the feedback mode appropriate | |||
| to the first SYN it receives using Table 2, and it does not switch | to the first SYN it receives using Table 2, and it does not switch | |||
| to a different mode, whatever other SYNs it might receive and | to a different mode, whatever other SYNs it might receive and | |||
| whatever SYN/ACKs it might send; | whatever SYN/ACKs it might send. | |||
| * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | |||
| (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | |||
| ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
| * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | |||
| uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | uses the TCP-ECN flags in each SYN/ACK to feed back the IP ECN | |||
| field on the latest SYN to have arrived; | field on the latest SYN to have arrived. | |||
| * If a TCP Server enters AccECN mode and then subsequently sends a | * If a TCP Server enters AccECN mode and then subsequently sends a | |||
| SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | |||
| prohibited from setting ECT on any packet for the rest of the | prohibited from setting ECT on any packet for the rest of the | |||
| connection; | connection. | |||
| * Having entered AccECN mode, in general a TCP Server commits to | * Having entered AccECN mode, in general a TCP Server commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Sections 3.2.2.3, 3.2.2.4); | see Sections 3.2.2.3, 3.2.2.4). | |||
| * Having entered AccECN mode, a TCP Server commits to using AccECN | * Having entered AccECN mode, a TCP Server commits to using AccECN | |||
| to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
| 3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
| Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
| mode, whether as a Client or as a Server. | mode, whether as a Client or as a Server. | |||
| An implementation that supports AccECN has the rights and obligations | An implementation that supports AccECN has the rights and obligations | |||
| concerning the use of ECN defined below, which update those in | concerning the use of ECN defined below, which update those in | |||
| Section 6.1.1 of [RFC3168]. This section uses the following | Section 6.1.1 of [RFC3168]. This section uses the following | |||
| definitions: | definitions: | |||
| 'During the handshake': The connection states prior to | 'During the handshake': The connection states prior to | |||
| synchronization; | synchronization. | |||
| 'Valid SYN': A SYN that has the same port numbers and the same ISN | 'Valid SYN': A SYN that has the same port numbers and the same ISN | |||
| as the SYN that first caused the Server to open the connection. | as the SYN that first caused the Server to open the connection. | |||
| An 'Acceptable' packet is defined in Section 1.3. | An 'Acceptable' packet is defined in Section 1.3. | |||
| Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | |||
| * Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
| - MUST NOT switch into a different feedback mode than the one it | - MUST NOT switch into a different feedback mode than the one it | |||
| first entered according to Table 2, no matter whether it | first entered according to Table 2, no matter whether it | |||
| subsequently receives valid SYNs or Acceptable SYN/ACKs of | subsequently receives valid SYNs or Acceptable SYN/ACKs of | |||
| different types. | different types; | |||
| - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | |||
| received after the implementation reaches the Established | received after the implementation reaches the ESTABLISHED | |||
| state, in line with the general TCP approach [RFC9293]; | state, in line with the general TCP approach [RFC9293]; | |||
| Reason: Reaching established state implies that at least one | Reason: Reaching ESTABLISHED state implies that at least one | |||
| SYN and one SYN/ACK have successfully been delivered. And all | SYN and one SYN/ACK have successfully been delivered. And all | |||
| the rules for handshake fall-back are designed to work based on | the rules for handshake fall-back are designed to work based on | |||
| those packets that successfully traverse the path, whatever | those packets that successfully traverse the path, whatever | |||
| other handshake packets are lost or delayed. | other handshake packets are lost or delayed. | |||
| - MUST NOT send a 'Classic' ECN-setup SYN [RFC3168] with | - MUST NOT send a 'Classic' ECN-setup SYN [RFC3168] with | |||
| (AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) | (AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) | |||
| requesting AccECN feedback within the same connection; | requesting AccECN feedback within the same connection; | |||
| - MUST NOT send a 'Classic' ECN-setup SYN/ACK [RFC3168] with | - MUST NOT send a 'Classic' ECN-setup SYN/ACK [RFC3168] with | |||
| (AE,CWR,ECE) = (0,0,1) and a SYN/ACK agreeing to use AccECN | (AE,CWR,ECE) = (0,0,1) and a SYN/ACK agreeing to use AccECN | |||
| feedback within the same connection; | feedback within the same connection; | |||
| - MUST reset the connection with a RST packet, if it receives a | - MUST reset the connection with a RST packet, if it receives a | |||
| 'Classic' ECN-setup SYN with (AE,CWR,ECE) = (0,1,1) and a SYN | 'Classic' ECN-setup SYN with (AE,CWR,ECE) = (0,1,1) and a SYN | |||
| requesting AccECN feedback during the same handshake; | requesting AccECN feedback during the same handshake; | |||
| - MUST reset the connection with a RST packet, if it receives | - MUST reset the connection with a RST packet, if it receives | |||
| 'Classic' ECN-setup SYN/ACK with (AE,CWR,ECE) = (0,0,1) and a | 'Classic' ECN-setup SYN/ACK with (AE,CWR,ECE) = (0,0,1) and a | |||
| SYN/ACK agreeing to use AccECN feedback during the same | SYN/ACK agreeing to use AccECN feedback during the same | |||
| handshake; | handshake. | |||
| The last four rules are necessary because, if one peer were to | The last four rules are necessary because, if one peer were to | |||
| negotiate the feedback mode in two different types of handshake, | negotiate the feedback mode in two different types of handshake, | |||
| it would not be possible for the other peer to know for certain | it would not be possible for the other peer to know for certain | |||
| which handshake packet(s) the other end had eventually received or | which handshake packet(s) the other end had eventually received or | |||
| in which order it received them. So, in the absence of these | in which order it received them. So, in the absence of these | |||
| rules, the two peers could end up using different ECN feedback | rules, the two peers could end up using different ECN feedback | |||
| modes without knowing it. | modes without knowing it. | |||
| * A host in AccECN mode that is feeding back the IP-ECN field on a | * A host in AccECN mode that is feeding back the IP ECN field on a | |||
| SYN or SYN/ACK: | SYN or SYN/ACK: | |||
| - MUST feed back the IP-ECN field on the latest valid SYN or | - MUST feed back the IP ECN field on the latest valid SYN or | |||
| Acceptable SYN/ACK to arrive. | Acceptable SYN/ACK to arrive. | |||
| * A TCP Server already in AccECN mode: | * A TCP Server already in AccECN mode: | |||
| - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
| (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | |||
| combination of TCP-ECN flags to feed back the IP-ECN field of | combination of TCP-ECN flags to feed back the IP ECN field of | |||
| this latest SYN); | this latest SYN); | |||
| - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
| (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
| Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | |||
| Server that is already in AccECN mode, it implies that the TCP | Server that is already in AccECN mode, it implies that the TCP | |||
| Client had probably not received the previous AccECN SYN/ACK | Client had probably not received the previous AccECN SYN/ACK | |||
| emitted by the TCP Server. Therefore, the first bullet recommends | emitted by the TCP Server. Therefore, the first bullet recommends | |||
| attempting at least one more AccECN SYN/ACK. Nonetheless, the | attempting at least one more AccECN SYN/ACK. Nonetheless, the | |||
| second bullet recognizes that the Server might eventually need to | second bullet recognizes that the Server might eventually need to | |||
| fall back to a non-ECN SYN/ACK. In either case, the TCP Server | fall back to a non-ECN SYN/ACK. In either case, the TCP Server | |||
| remains in AccECN feedback mode (according to the earlier | remains in AccECN feedback mode (according to the earlier | |||
| requirement not to switch modes). | requirement not to switch modes). | |||
| * An AccECN-capable TCP Server already in Not ECN mode: | * An AccECN-capable TCP Server already in Not ECN mode: | |||
| - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | |||
| negotiate Classic ECN or AccECN feedback mode; | negotiate Classic ECN or AccECN feedback mode. | |||
| Rationale: There would be no point in the Server offering any | Rationale: There would be no point in the Server offering any | |||
| type of ECN feedback, because the Client will not be using ECN. | type of ECN feedback, because the Client will not be using ECN. | |||
| However, there is no interoperability reason to make this rule | However, there is no interoperability reason to make this rule | |||
| mandatory. | mandatory. | |||
| If for any reason a host is not willing to provide ECN feedback on a | If for any reason a host is not willing to provide ECN feedback on a | |||
| particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | |||
| in all SYN and/or SYN/ACK packets that it sends. | in all SYN and/or SYN/ACK packets that it sends. | |||
| skipping to change at line 1040 | skipping to change at line 1041 | |||
| - MUST NOT set ECT if it is in Not ECN feedback mode. | - MUST NOT set ECT if it is in Not ECN feedback mode. | |||
| A Data Sender in AccECN mode: | A Data Sender in AccECN mode: | |||
| - SHOULD set an ECT codepoint in the IP header of packets to | - SHOULD set an ECT codepoint in the IP header of packets to | |||
| indicate to the network that the transport is capable and | indicate to the network that the transport is capable and | |||
| willing to participate in ECN for this packet; | willing to participate in ECN for this packet; | |||
| - MAY not set ECT on any packet (for instance if it has reason to | - MAY not set ECT on any packet (for instance if it has reason to | |||
| believe such a packet would be blocked); | believe such a packet would be blocked). | |||
| A TCP Server in AccECN mode: | A TCP Server in AccECN mode: | |||
| - MUST NOT set ECT on any packet for the rest of the connection, | - MUST NOT set ECT on any packet for the rest of the connection, | |||
| if it has received or sent at least one valid SYN or Acceptable | if it has received or sent at least one valid SYN or Acceptable | |||
| SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | |||
| This rule solely applies to a Server because, when a Server | This rule solely applies to a Server because, when a Server | |||
| enters AccECN mode, it doesn't know for sure whether the Client | enters AccECN mode, it doesn't know for sure whether the Client | |||
| will end up in AccECN mode. But when a Client enters AccECN | will end up in AccECN mode. But when a Client enters AccECN | |||
| skipping to change at line 1066 | skipping to change at line 1067 | |||
| * A host in AccECN mode: | * A host in AccECN mode: | |||
| - is obliged to respond appropriately to AccECN feedback that | - is obliged to respond appropriately to AccECN feedback that | |||
| indicates there were ECN marks on packets it had previously | indicates there were ECN marks on packets it had previously | |||
| sent, where 'appropriately' is defined in Section 6.1 of | sent, where 'appropriately' is defined in Section 6.1 of | |||
| [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | |||
| - is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
| feedback, even when it is solely sending non-ECN-capable | feedback, even when it is solely sending non-ECN-capable | |||
| packets (for rationale, some examples and some exceptions see | packets (for rationale, some examples and some exceptions see | |||
| Sections 3.2.2.3 and 3.2.2.4). | Sections 3.2.2.3 and 3.2.2.4); | |||
| - is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
| feedback, even if it has sent or received a SYN or SYN/ACK | feedback, even if it has sent or received a SYN or SYN/ACK | |||
| packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | |||
| - MUST NOT set CWR to indicate that it has received and responded | - MUST NOT set CWR to indicate that it has received and responded | |||
| to indications of congestion. | to indications of congestion. | |||
| For the avoidance of doubt, this is unlike an RFC 3168 data | For the avoidance of doubt, this is unlike an RFC 3168 data | |||
| sender and this does not preclude the Data Sender from setting | sender and this does not preclude the Data Sender from setting | |||
| the bits of the ACE counter field, which includes an overloaded | the bits of the ACE counter field, which includes an overloaded | |||
| use of the same bit. | use of the same bit. | |||
| Receiving ECT: | Receiving ECT: | |||
| * A host in AccECN mode: | * A host in AccECN mode: | |||
| - MUST feed back the information in the IP-ECN field of incoming | - MUST feed back the information in the IP ECN field of incoming | |||
| packets using Accurate ECN feedback, as specified in | packets using Accurate ECN feedback, as specified in | |||
| Section 3.2. | Section 3.2. | |||
| For the avoidance of doubt, this requirement stands even if the | For the avoidance of doubt, this requirement stands even if the | |||
| AccECN host has also sent or received a SYN or SYN/ACK with | AccECN host has also sent or received a SYN or SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies | (AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies | |||
| some form of packet mangling might be present. Even if the | some form of packet mangling might be present. Even if the | |||
| remote peer is not setting ECT, it could still be set | remote peer is not setting ECT, it could still be set | |||
| erroneously by packet mangling at the IP layer (see | erroneously by packet mangling at the IP layer (see | |||
| Section 3.2.2.3). In such cases, the Data Sender is best | Section 3.2.2.3). In such cases, the Data Sender is best | |||
| placed to decide whether ECN markings are valid, but it can | placed to decide whether ECN markings are valid, but it can | |||
| only do that if the Data Receiver mechanistically feeds back | only do that if the Data Receiver mechanistically feeds back | |||
| any ECN markings. This approach will not lead to TCP Options | any ECN markings. This approach will not lead to TCP Options | |||
| being generated unnecessarily if the recommended simple scheme | being generated unnecessarily if the recommended simple scheme | |||
| in Section 3.2.3.3 is used, because no byte counters will | in Section 3.2.3.3 is used, because no byte counters will | |||
| change if no packets are set to ECT. | change if no packets are set to ECT. | |||
| - MUST NOT use reception of packets with ECT set in the IP-ECN | - MUST NOT use reception of packets with ECT set in the IP ECN | |||
| field as an implicit signal that the peer is ECN-capable. | field as an implicit signal that the peer is ECN-capable. | |||
| Reason: ECT at the IP layer does not explicitly confirm the | Reason: ECT at the IP layer does not explicitly confirm the | |||
| peer has the correct ECN feedback logic, because the packets | peer has the correct ECN feedback logic, because the packets | |||
| could have been mangled at the IP layer. | could have been mangled at the IP layer. | |||
| 3.2. AccECN Feedback | 3.2. AccECN Feedback | |||
| Each Data Receiver of each half connection maintains four counters, | Each Data Receiver of each half-connection maintains four counters, | |||
| r.cep, r.ceb, r.e0b, and r.e1b: | r.cep, r.ceb, r.e0b, and r.e1b: | |||
| * The Data Receiver MUST increment the CE packet counter (r.cep), | * The Data Receiver MUST increment the CE packet counter (r.cep), | |||
| for every Acceptable packet that it receives with the CE code | for every Acceptable packet that it receives with the CE code | |||
| point in the IP-ECN field, including CE-marked control packets and | point in the IP ECN field, including CE-marked control packets and | |||
| retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | |||
| * A Data Receiver that supports sending of AccECN TCP Options MUST | * A Data Receiver that supports sending of AccECN TCP Options MUST | |||
| increment the r.ceb, r.e0b, or r.e1b byte counters by the number | increment the r.ceb, r.e0b, or r.e1b byte counters by the number | |||
| of TCP payload octets in Acceptable packets marked with the CE, | of TCP payload octets in Acceptable packets marked with the CE, | |||
| ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | ECT(0), and ECT(1) codepoint in their IP ECN field, including any | |||
| payload octets on control packets and retransmissions, but not | payload octets on control packets and retransmissions, but not | |||
| including any payload octets on SYN packets (SYN=1; ACK=0). | including any payload octets on SYN packets (SYN=1; ACK=0). | |||
| Each Data Sender of each half connection maintains four counters, | Each Data Sender of each half-connection maintains four counters, | |||
| s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | |||
| counters at the Data Receiver. | counters at the Data Receiver. | |||
| A Data Receiver feeds back the CE packet counter using the Accurate | A Data Receiver feeds back the CE packet counter using the Accurate | |||
| ECN (ACE) field, as explained in Section 3.2.2. And it optionally | ECN (ACE) field, as explained in Section 3.2.2. And it optionally | |||
| feeds back all the byte counters using the AccECN TCP Option, as | feeds back all the byte counters using the AccECN TCP Option, as | |||
| specified in Section 3.2.3. | specified in Section 3.2.3. | |||
| Whenever a Data Receiver feeds back the value of any counter, it MUST | Whenever a Data Receiver feeds back the value of any counter, it MUST | |||
| report the most recent value, no matter whether it is in a pure ACK, | report the most recent value, no matter whether it is in a pure ACK, | |||
| or an ACK piggybacked on a packet used by the other half-connection, | or an ACK piggybacked on a packet used by the other half-connection, | |||
| whether new payload data or a retransmission. Therefore, the | whether new payload data or a retransmission. Therefore, the | |||
| feedback piggybacked on a retransmitted packet is unlikely to be the | feedback piggybacked on a retransmitted packet is unlikely to be the | |||
| same as the feedback on the original packet. | same as the feedback on the original packet. | |||
| 3.2.1. Initialization of Feedback Counters | 3.2.1. Initialization of Feedback Counters | |||
| When a host first enters AccECN mode, in its role as a Data Receiver, | When a host first enters AccECN mode, in its role as a Data Receiver, | |||
| it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and | it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and | |||
| r.ceb = 0, | r.ceb = 0. | |||
| Non-zero initial values are used to support a stateless handshake | Non-zero initial values are used to support a stateless handshake | |||
| (see Section 5.1) and to be distinct from cases where the fields are | (see Section 5.1) and to be distinct from cases where the fields are | |||
| incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4). | incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4). | |||
| When a host enters AccECN mode, in its role as a Data Sender, it | When a host enters AccECN mode, in its role as a Data Sender, it | |||
| initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = | initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = | |||
| 0. | 0. | |||
| 3.2.2. The ACE Field | 3.2.2. The ACE Field | |||
| skipping to change at line 1200 | skipping to change at line 1201 | |||
| Both parts of each of these conditions are equally important. For | Both parts of each of these conditions are equally important. For | |||
| instance, even if AccECN negotiation has been successful, the ACE | instance, even if AccECN negotiation has been successful, the ACE | |||
| field is not defined on any segments with SYN=1 (e.g., a | field is not defined on any segments with SYN=1 (e.g., a | |||
| retransmission of an unacknowledged SYN/ACK, or when both ends send | retransmission of an unacknowledged SYN/ACK, or when both ends send | |||
| SYN/ACKs after AccECN support has been successfully negotiated during | SYN/ACKs after AccECN support has been successfully negotiated during | |||
| a simultaneous open). | a simultaneous open). | |||
| 3.2.2.1. ACE Field on the ACK of the SYN/ACK | 3.2.2.1. ACE Field on the ACK of the SYN/ACK | |||
| A TCP Client (A) in AccECN mode MUST feed back which of the 4 | A TCP Client (A) in AccECN mode MUST feed back which of the 4 | |||
| possible values of the IP-ECN field was on the SYN/ACK by writing it | possible values of the IP ECN field was on the SYN/ACK by writing it | |||
| into the ACE field of a pure ACK with no SACK blocks using the binary | into the ACE field of a pure ACK with no SACK blocks using the binary | |||
| encoding in Table 3 (which is the same as that used on the SYN/ACK in | encoding in Table 3 (which is the same as that used on the SYN/ACK in | |||
| Table 2). This shall be called the handshake encoding of the ACE | Table 2). This shall be called the "handshake encoding" of the ACE | |||
| field, and it is the only exception to the rule that the ACE field | field, and it is the only exception to the rule that the ACE field | |||
| carries the 3 least significant bits of the r.cep counter on packets | carries the 3 least significant bits of the r.cep counter on packets | |||
| with SYN=0. | with SYN=0. | |||
| Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | |||
| satisfies the above conditions anyway (SYN=0, no data, no SACK | satisfies the above conditions anyway (SYN=0, no data, no SACK | |||
| blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | |||
| with a packet that does not satisfy these conditions (e.g., it has | with a packet that does not satisfy these conditions (e.g., it has | |||
| data to include on the ACK), it SHOULD first send a pure ACK that | data to include on the ACK), it SHOULD first send a pure ACK that | |||
| does satisfy these conditions (see Section 5.2), so that it can feed | does satisfy these conditions (see Section 5.2), so that it can feed | |||
| back which of the four values of the IP-ECN field arrived on the SYN/ | back which of the four values of the IP ECN field arrived on the SYN/ | |||
| ACK. A valid exception to this "SHOULD" would be where the | ACK. A valid exception to this "SHOULD" would be where the | |||
| implementation will only be used in an environment where mangling of | implementation will only be used in an environment where mangling of | |||
| the ECN field is unlikely. | the ECN field is unlikely. | |||
| The TCP Client MUST also use the handshake encoding for the pure ACK | The TCP Client MUST also use the handshake encoding for the pure ACK | |||
| of any retransmitted SYN/ACK that confirms that the TCP Server | of any retransmitted SYN/ACK that confirms that the TCP Server | |||
| supports AccECN. If the final ACK of the handshake does not arrive | supports AccECN. If the final ACK of the handshake does not arrive | |||
| before its retransmission timer expires, the TCP Server is follow the | before its retransmission timer expires, the procedure that the TCP | |||
| procedure given in Section 3.1.4.2. | Server will follow is given in Section 3.1.4.2. | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | | IP ECN Codepoint | ACE on Pure | r.cep of TCP Client | | |||
| | on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | | on SYN/ACK | ACK of SYN/ACK | in AccECN Mode | | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | Not-ECT | 0b010 | 5 | | | Not-ECT | 0b010 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(1) | 0b011 | 5 | | | ECT(1) | 0b011 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(0) | 0b100 | 5 | | | ECT(0) | 0b100 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | CE | 0b110 | 6 | | | CE | 0b110 | 6 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| Table 3: The Encoding of the ACE Field in the ACK of | Table 3: The Encoding of the ACE Field in the ACK of | |||
| the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field | the SYN-ACK to Reflect the SYN-ACK's IP ECN Field | |||
| When an AccECN Server in SYN-RCVD state receives a pure ACK with | When an AccECN Server in SYN-RCVD state receives a pure ACK with | |||
| SYN=0 and no SACK blocks, instead of treating the ACE field as a | SYN=0 and no SACK blocks, it MUST infer the meaning of each possible | |||
| counter, it MUST infer the meaning of each possible value of the ACE | value of the ACE field from Table 4 instead of treating the ACE field | |||
| field from Table 4, which also shows the value that an AccECN Server | as a counter. As a result, an AccECN Server MUST set s.cep to the | |||
| MUST set s.cep to as a result. | respective value, also shown in Table 4. | |||
| Given this encoding of the ACE field on the ACK of a SYN/ACK is | Given this encoding of the ACE field on the ACK of a SYN/ACK is | |||
| exceptional, an AccECN Server using large receive offload (LRO) might | exceptional, an AccECN Server using large receive offload (LRO) might | |||
| prefer to disable LRO until such an ACK has transitioned it out of | prefer to disable LRO until the ACK of the SYN/ACK was sent and it | |||
| SYN-RCVD state. | has transitioned out of SYN-RCVD state. | |||
| +============+==========================+=====================+ | +============+==========================+=====================+ | |||
| | ACE on ACK | IP-ECN codepoint on SYN/ | s.cep of TCP Server | | | ACE on ACK | IP ECN Codepoint on SYN/ | s.cep of TCP Server | | |||
| | of SYN/ACK | ACK inferred by Server | in AccECN mode | | | of SYN/ACK | ACK Inferred by Server | in AccECN Mode | | |||
| +============+==========================+=====================+ | +============+==========================+=====================+ | |||
| | 0b000 | {Notes 1, 3} | Disable s.cep | | | 0b000 | {Notes 1, 3} | Disable s.cep | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b001 | {Notes 2, 3} | 5 | | | 0b001 | {Notes 2, 3} | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b010 | Not-ECT | 5 | | | 0b010 | Not-ECT | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b011 | ECT(1) | 5 | | | 0b011 | ECT(1) | 5 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b100 | ECT(0) | 5 | | | 0b100 | ECT(0) | 5 | | |||
| skipping to change at line 1291 | skipping to change at line 1292 | |||
| AccECN feedback. Nonetheless, as a Data Receiver, it MUST | AccECN feedback. Nonetheless, as a Data Receiver, it MUST | |||
| NOT disable AccECN feedback. | NOT disable AccECN feedback. | |||
| Any of the circumstances below could cause a value of zero | Any of the circumstances below could cause a value of zero | |||
| but, whatever the cause, the actions above would be the | but, whatever the cause, the actions above would be the | |||
| appropriate response: | appropriate response: | |||
| * The TCP Client has somehow entered Not ECN feedback mode | * The TCP Client has somehow entered Not ECN feedback mode | |||
| (most likely if the Server received a SYN or sent a SYN/ | (most likely if the Server received a SYN or sent a SYN/ | |||
| ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | |||
| mode, but possible even if it didn't); | mode, but possible even if it didn't). | |||
| * The TCP Client genuinely might be in AccECN mode, but its | * The TCP Client genuinely might be in AccECN mode, but its | |||
| count of received CE marks might have caused the ACE | count of received CE marks might have caused the ACE | |||
| field to wrap to zero. This is highly unlikely, but not | field to wrap to zero. This is highly unlikely, but not | |||
| impossible because the Server might have already sent | impossible because the Server might have already sent | |||
| multiple packets while still in SYN-RCVD state, e.g., | multiple packets while still in SYN-RCVD state, e.g., | |||
| using TFO (see Section 5.2), and some might have been CE- | using TFO (see Section 5.2), and some might have been CE- | |||
| marked. Then ACE on the first ACK seen by the Server | marked. Then ACE on the first ACK seen by the Server | |||
| might be zero, due to previous ACKs experiencing an | might be zero, due to previous ACKs experiencing an | |||
| unfortunate pattern of loss or delay. | unfortunate pattern of loss or delay. | |||
| skipping to change at line 1354 | skipping to change at line 1355 | |||
| * It then follows the safety procedures in Section 3.2.2.5.2 to | * It then follows the safety procedures in Section 3.2.2.5.2 to | |||
| calculate or estimate how many packets the ACK could have | calculate or estimate how many packets the ACK could have | |||
| acknowledged under the prevailing conditions to determine whether | acknowledged under the prevailing conditions to determine whether | |||
| the ACE field might have wrapped more than once. | the ACE field might have wrapped more than once. | |||
| The encode/decode procedures during the three-way handshake are | The encode/decode procedures during the three-way handshake are | |||
| exceptions to the general rules given so far, so they are spelled out | exceptions to the general rules given so far, so they are spelled out | |||
| step by step below for clarity: | step by step below for clarity: | |||
| * If a TCP Server in AccECN mode receives a CE mark in the IP-ECN | * If a TCP Server in AccECN mode receives a CE mark in the IP ECN | |||
| field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it | field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it | |||
| remains at its initial value of 5). | remains at its initial value of 5). | |||
| Reason: It would be redundant for the Server to include CE-marked | Reason: It would be redundant for the Server to include CE-marked | |||
| SYNs in its r.cep counter, because it already reliably delivers | SYNs in its r.cep counter, because it already reliably delivers | |||
| feedback of any CE marking using the encoding in the top block of | feedback of any CE marking using the encoding in the top block of | |||
| Table 2 in the SYN/ACK. This also ensures that, when the Server | Table 2 in the SYN/ACK. This also ensures that, when the Server | |||
| starts using the ACE field, it has not unnecessarily consumed more | starts using the ACE field, it has not unnecessarily consumed more | |||
| than one initial value, given they can be used to negotiate | than one initial value, given they can be used to negotiate | |||
| variants of the AccECN protocol (see Appendix B.3). | variants of the AccECN protocol (see Appendix B.3). | |||
| * If a TCP Client in AccECN mode receives CE feedback in the TCP | * If a TCP Client in AccECN mode receives CE feedback in the TCP | |||
| flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | |||
| initial value of 5) so that it stays in step with r.cep on the | initial value of 5) so that it stays in step with r.cep on the | |||
| Server. Nonetheless, the TCP Client still triggers the congestion | Server. Nonetheless, the TCP Client still triggers the congestion | |||
| control actions necessary to respond to the CE feedback. | control actions necessary to respond to the CE feedback. | |||
| * If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | * If a TCP Client in AccECN mode receives a CE mark in the IP ECN | |||
| field of a SYN/ACK, it MUST increment r.cep, but no more than once | field of a SYN/ACK, it MUST increment r.cep, but no more than once | |||
| no matter how many CE-marked SYN/ACKs it receives (i.e., | no matter how many CE-marked SYN/ACKs it receives (i.e., | |||
| incremented from 5 to 6, but no further). | incremented from 5 to 6, but no further). | |||
| Reason: Incrementing r.cep ensures the Client will eventually | Reason: Incrementing r.cep ensures the Client will eventually | |||
| deliver any CE marking to the Server reliably when it starts using | deliver any CE marking to the Server reliably when it starts using | |||
| the ACE field. Even though the Client also feeds back any CE | the ACE field. Even though the Client also feeds back any CE | |||
| marking on the ACK of the SYN/ACK using the encoding in Table 3, | marking on the ACK of the SYN/ACK using the encoding in Table 3, | |||
| this ACK is not delivered reliably, so it can be considered as a | this ACK is not delivered reliably, so it can be considered as a | |||
| timely notification that is redundant but unreliable. The Client | timely notification that is redundant but unreliable. The Client | |||
| skipping to change at line 1417 ¶ | skipping to change at line 1418 ¶ | |||
| ACK of the SYN/ACK) that is delayed for longer than the Server's | ACK of the SYN/ACK) that is delayed for longer than the Server's | |||
| retransmission timeout; or packet duplication by the network. And | retransmission timeout; or packet duplication by the network. And | |||
| the impact of any error in the feedback on such ACKs will only be | the impact of any error in the feedback on such ACKs will only be | |||
| temporary. | temporary. | |||
| 3.2.2.3. Testing for Mangling of the IP/ECN Field | 3.2.2.3. Testing for Mangling of the IP/ECN Field | |||
| * TCP Client side: | * TCP Client side: | |||
| The value of the TCP-ECN flags on the SYN/ACK indicates the value | The value of the TCP-ECN flags on the SYN/ACK indicates the value | |||
| of the IP-ECN field when the SYN arrived at the Server. The TCP | of the IP ECN field when the SYN arrived at the Server. The TCP | |||
| Client can compare this with how it originally set the IP-ECN | Client can compare this with how it originally set the IP ECN | |||
| field on the SYN. If this comparison implies an invalid | field on the SYN. If this comparison implies an invalid | |||
| transition (defined below) of the IP-ECN field, for the remainder | transition (defined below) of the IP ECN field, for the remainder | |||
| of the half-connection the Client is advised to send non-ECN- | of the half-connection the Client is advised to send non-ECN- | |||
| capable packets, but it still ought to respond to any feedback of | capable packets, but it still ought to respond to any feedback of | |||
| CE markings (explained below). However, the TCP Client MUST | CE markings (explained below). However, the TCP Client MUST | |||
| remain in the AccECN feedback mode and it MUST continue to feed | remain in the AccECN feedback mode and it MUST continue to feed | |||
| back any ECN markings on arriving packets (in its role as Data | back any ECN markings on arriving packets (in its role as Data | |||
| Receiver). | Receiver). | |||
| * TCP Server side: | * TCP Server side: | |||
| The value of the ACE field on the last ACK of the three-way | The value of the ACE field on the last ACK of the three-way | |||
| handshake indicates the value of the IP-ECN field when the SYN/ACK | handshake indicates the value of the IP ECN field when the SYN/ACK | |||
| arrived at the TCP Client. The Server can compare this with how | arrived at the TCP Client. The Server can compare this with how | |||
| it originally set the IP-ECN field on the SYN/ACK. If this | it originally set the IP ECN field on the SYN/ACK. If this | |||
| comparison implies an invalid transition of the IP-ECN field, for | comparison implies an invalid transition of the IP ECN field, for | |||
| the remainder of the half-connection the Server is advised to send | the remainder of the half-connection the Server is advised to send | |||
| non-ECN-capable packets, but it still ought to respond to any | non-ECN-capable packets, but it still ought to respond to any | |||
| feedback of CE markings (explained below). However, the Server | feedback of CE markings (explained below). However, the Server | |||
| MUST remain in the AccECN feedback mode and it MUST continue to | MUST remain in the AccECN feedback mode and it MUST continue to | |||
| feed back any ECN markings on arriving packets (in its role as | feed back any ECN markings on arriving packets (in its role as | |||
| Data Receiver). | Data Receiver). | |||
| If a Data Sender in AccECN mode starts sending non-ECN-capable | If a Data Sender in AccECN mode starts sending non-ECN-capable | |||
| packets because it has detected mangling, it is still advised to | packets because it has detected mangling, it is still advised to | |||
| respond to CE feedback. Reason: Any CE marking arriving at the Data | respond to CE feedback. Reason: Any CE marking arriving at the Data | |||
| Receiver could be due to something early in the path mangling the | Receiver could be due to something early in the path mangling the | |||
| non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | non-ECN-capable IP ECN field into an ECN-capable codepoint and then, | |||
| later in the path, a network bottleneck might be applying CE markings | later in the path, a network bottleneck might be applying CE markings | |||
| to indicate genuine congestion. This argument applies whether the | to indicate genuine congestion. This argument applies whether the | |||
| handshake packet originally sent by the TCP Client or Server was non- | handshake packet originally sent by the TCP Client or Server was non- | |||
| ECN-capable or ECN-capable because, in either case, an unsafe | ECN-capable or ECN-capable because, in either case, an unsafe | |||
| transition could imply that non-ECN-capable packets later in the | transition could imply that non-ECN-capable packets later in the | |||
| connection might get mangled. | connection might get mangled. | |||
| Once a Data Sender has entered AccECN mode it is advised to check | Once a Data Sender has entered AccECN mode it is advised to check | |||
| whether it is receiving continuous feedback of CE. Specifying | whether it is receiving continuous feedback of CE. Specifying | |||
| exactly how to do this is beyond the scope of the present | exactly how to do this is beyond the scope of the present | |||
| skipping to change at line 1483 ¶ | skipping to change at line 1484 ¶ | |||
| As always, once a host has entered AccECN mode, it follows the | As always, once a host has entered AccECN mode, it follows the | |||
| general mandatory requirements (Section 3.1.5) to remain in the same | general mandatory requirements (Section 3.1.5) to remain in the same | |||
| feedback mode and to continue feeding back any ECN markings on | feedback mode and to continue feeding back any ECN markings on | |||
| arriving packets using AccECN feedback. This follows the general | arriving packets using AccECN feedback. This follows the general | |||
| approach where an AccECN Data Receiver mechanistically reflects | approach where an AccECN Data Receiver mechanistically reflects | |||
| whatever it receives (Section 2.5). | whatever it receives (Section 2.5). | |||
| The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
| count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
| ACK does not arrive, the Server is advised to continue to send ECN- | ACK does not arrive, the Server is advised to continue to send ECN- | |||
| capable packets without having tested for mangling of the IP-ECN | capable packets without having tested for mangling of the IP ECN | |||
| field on the SYN/ACK. | field on the SYN/ACK. | |||
| All the fall-back behaviours in this section are necessary in case | All the fall-back behaviours in this section are necessary in case | |||
| mangling of the IP-ECN field is asymmetric, which is currently common | mangling of the IP ECN field is asymmetric, which is currently common | |||
| over some mobile networks [Mandalari18]. In this case, one end might | over some mobile networks [Mandalari18]. In this case, one end might | |||
| see no unsafe transition and continue sending ECN-capable packets, | see no unsafe transition and continue sending ECN-capable packets, | |||
| while the other end sees an unsafe transition and stops sending ECN- | while the other end sees an unsafe transition and stops sending ECN- | |||
| capable packets. | capable packets. | |||
| Invalid transitions of the IP-ECN field are defined in Section 18 of | Invalid transitions of the IP ECN field are defined in Section 18 of | |||
| the Classic ECN specification [RFC3168] and repeated here for | the Classic ECN specification [RFC3168] and repeated here for | |||
| convenience: | convenience: | |||
| * the Not-ECT codepoint changes; | * the Not-ECT codepoint changes; | |||
| * either ECT codepoint transitions to Not-ECT; | * either ECT codepoint transitions to Not-ECT; | |||
| * the CE codepoint changes. | * the CE codepoint changes. | |||
| RFC 3168 says that a router that changes ECT to Not-ECT is invalid | RFC 3168 says that a router that changes ECT to Not-ECT is invalid | |||
| skipping to change at line 1539 ¶ | skipping to change at line 1540 ¶ | |||
| Reason: the symptoms imply any or all of the following: | Reason: the symptoms imply any or all of the following: | |||
| * the remote peer has somehow entered Not ECN feedback mode; | * the remote peer has somehow entered Not ECN feedback mode; | |||
| * a broken remote TCP implementation; | * a broken remote TCP implementation; | |||
| * potential mangling of the ECN fields in the TCP headers (although | * potential mangling of the ECN fields in the TCP headers (although | |||
| unlikely given they clearly survived during the handshake). | unlikely given they clearly survived during the handshake). | |||
| This advice is not stated normatively (in capitals), because the best | This advice is not stated normatively (in capitals), because the best | |||
| strategy might depend on experience of the most likely scenarios, | strategy might depend on the likelihood to experience these | |||
| which can only be known at the time of deployment. | scenarios, which can only be known at the time of deployment. | |||
| Note that a host in AccECN mode MUST continue to provide Accurate ECN | Note that a host in AccECN mode MUST continue to provide Accurate ECN | |||
| feedback to its peer, even if it is no longer sending ECT itself over | feedback to its peer, even if it is no longer sending ECT itself over | |||
| the other half connection. | the other half-connection. | |||
| If reordering occurs, the first feedback packet that arrives will not | If reordering occurs, the first feedback packet that arrives will not | |||
| necessarily be the same as the first packet in sequence order. The | necessarily be the same as the first packet in sequence order. The | |||
| test has been specified loosely like this to simplify implementation, | test has been specified loosely like this to simplify implementation, | |||
| and because it would not have been any more precise to have specified | and because it would not have been any more precise to have specified | |||
| the first packet in sequence order, which would not necessarily be | the first packet in sequence order, which would not necessarily be | |||
| the first ACE counter that the Data Receiver fed back anyway, given | the first ACE counter that the Data Receiver fed back anyway, given | |||
| it might have been a retransmission. | it might have been a retransmission. | |||
| The possibility of reordering means that there is a small chance that | The possibility of reordering means that there is a small chance that | |||
| the ACE field on the first packet to arrive is genuinely zero | the ACE field on the first packet to arrive is genuinely zero | |||
| (without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
| unnecessarily disable ECN for a half connection. Therefore, in | unnecessarily disable ECN for a half-connection. Therefore, in | |||
| environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
| zeroed, implementations MAY skip this test. | zeroed, implementations MAY skip this test. | |||
| Note that the Data Sender MUST NOT test whether the arriving counter | Note that the Data Sender MUST NOT test whether the arriving counter | |||
| in the initial ACE field has been initialized to a specific valid | in the initial ACE field has been initialized to a specific valid | |||
| value -- the above check solely tests whether the ACE fields have | value -- the above check solely tests whether the ACE fields have | |||
| been incorrectly zeroed. This allows hosts to use different initial | been incorrectly zeroed. This allows hosts to use different initial | |||
| values as an additional signalling channel in the future. | values as an additional signalling channel in the future. | |||
| 3.2.2.5. Safety Against Ambiguity of the ACE Field | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
| skipping to change at line 1585 ¶ | skipping to change at line 1586 ¶ | |||
| The following rules define when the receiver of a packet in AccECN | The following rules define when the receiver of a packet in AccECN | |||
| mode emits an ACK: | mode emits an ACK: | |||
| Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | |||
| whenever a data packet marked CE arrives after the previous packet | whenever a data packet marked CE arrives after the previous packet | |||
| was not CE. | was not CE. | |||
| Even though this rule is stated as a "SHOULD", it is important for | Even though this rule is stated as a "SHOULD", it is important for | |||
| a transition to trigger an ACK if at all possible. The only valid | a transition to trigger an ACK if at all possible. The only valid | |||
| exception to this rule is given below these bullets. | exception to this rule is due to Large Receive Offload (LRO) or | |||
| Generic Receive Offload (GRO) as further described below. | ||||
| For the avoidance of doubt, this rule is deliberately worded to | For the avoidance of doubt, this rule is deliberately worded to | |||
| apply solely when _data_ packets arrive, but the comparison with | apply solely when _data_ packets arrive, but the comparison with | |||
| the previous packet includes any packet, not just data packets. | the previous packet includes any packet, not just data packets. | |||
| Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | |||
| an ACK if 'n' CE marks have arrived since the previous ACK. If | an ACK if 'n' CE marks have arrived since the previous ACK. If | |||
| there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | |||
| there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | |||
| and MUST be no less than 3. In either case, 'n' MUST be no | and MUST be no less than 3. In either case, 'n' MUST be no | |||
| skipping to change at line 1723 ¶ | skipping to change at line 1725 ¶ | |||
| Figure 4 shows two option field orders: order 0 and order 1. They | Figure 4 shows two option field orders: order 0 and order 1. They | |||
| both consist of three 24-bit fields. Order 0 provides the 24 least | both consist of three 24-bit fields. Order 0 provides the 24 least | |||
| significant bits of the r.e0b, r.ceb, and r.e1b counters, | significant bits of the r.e0b, r.ceb, and r.e1b counters, | |||
| respectively. Order 1 provides the same fields, but in the opposite | respectively. Order 1 provides the same fields, but in the opposite | |||
| order. On each packet, the Data Receiver can use whichever order is | order. On each packet, the Data Receiver can use whichever order is | |||
| more efficient. In either case, the bytes within the fields are in | more efficient. In either case, the bytes within the fields are in | |||
| network byte order (big-endian). | network byte order (big-endian). | |||
| The choice to use three-byte (24-bit) fields in the options was | The choice to use three-byte (24-bit) fields in the options was | |||
| made to strike a balance between TCP option space usage, and the | made to strike a balance between TCP Option space usage, and the | |||
| required fidelity of the counters to accommodate typical scenarios | required fidelity of the counters to accommodate typical scenarios | |||
| such as hardware TCP Segmentation Offloading (TSO), and periods | such as hardware TCP Segmentation Offloading (TSO), and periods | |||
| during which no option may be transmitted (e.g., SACK loss recovery). | during which no option may be transmitted (e.g., SACK loss recovery). | |||
| Providing only 2 bytes (16 bits) for these counters could easily roll | Providing only 2 bytes (16 bits) for these counters could easily roll | |||
| over within a single TSO transmission or large/generic receive | over within a single TSO transmission or large/generic receive | |||
| offload (LRO/GRO) event. Having two distinct orderings further | offload (LRO/GRO) event. Having two distinct orderings further | |||
| allows the transmission of the most pertinent changes in an | allows the transmission of the most pertinent changes in an | |||
| abbreviated option (see below). | abbreviated option (see below). | |||
| When a Data Receiver sends an AccECN Option, it MUST set the Kind | When a Data Receiver sends an AccECN Option, it MUST set the Kind | |||
| skipping to change at line 1862 ¶ | skipping to change at line 1864 ¶ | |||
| AccECN Options. To expedite connection setup in deployment scenarios | AccECN Options. To expedite connection setup in deployment scenarios | |||
| where AccECN path traversal might be problematic, the TCP Server | where AccECN path traversal might be problematic, the TCP Server | |||
| SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | |||
| retransmission times out, to expedite connection setup, the TCP | retransmission times out, to expedite connection setup, the TCP | |||
| Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | |||
| no AccECN Option, but it remains in AccECN feedback mode (per | no AccECN Option, but it remains in AccECN feedback mode (per | |||
| Section 3.1.5). | Section 3.1.5). | |||
| | Note that a retransmitted AccECN SYN/ACK will not necessarily | | Note that a retransmitted AccECN SYN/ACK will not necessarily | |||
| | have the same TCP-ECN flags as the original SYN/ACK, because it | | have the same TCP-ECN flags as the original SYN/ACK, because it | |||
| | feeds back the IP-ECN field of the latest SYN to have arrived | | feeds back the IP ECN field of the latest SYN to have arrived | |||
| | (by the rule in Section 3.1.5). | | (by the rule in Section 3.1.5). | |||
| The above fall-back approach limits any interference by middleboxes | The above fall-back approach limits any interference by middleboxes | |||
| that might drop packets with unknown options, even though it is more | that might drop packets with unknown options, even though it is more | |||
| likely that SYN/ACK loss is due to congestion. The TCP Server MAY | likely that SYN/ACK loss is due to congestion. The TCP Server MAY | |||
| try to send another packet with an AccECN Option at a later point | try to send another packet with an AccECN Option at a later point | |||
| during the connection but it ought to monitor if that packet got lost | during the connection but it ought to monitor if that packet got lost | |||
| as well, in which case it SHOULD disable the sending of AccECN | as well, in which case it SHOULD disable the sending of AccECN | |||
| Options for this half-connection. | Options for this half-connection. | |||
| skipping to change at line 1922 ¶ | skipping to change at line 1924 ¶ | |||
| packets carried an AccECN Option and disable the sending of AccECN | packets carried an AccECN Option and disable the sending of AccECN | |||
| Options if the loss probability of those packets is significantly | Options if the loss probability of those packets is significantly | |||
| higher than that of all other data packets in the same connection. | higher than that of all other data packets in the same connection. | |||
| 3.2.3.2.3. Testing for Absence of the AccECN Option | 3.2.3.2.3. Testing for Absence of the AccECN Option | |||
| If the TCP Client has successfully negotiated AccECN but does not | If the TCP Client has successfully negotiated AccECN but does not | |||
| receive an AccECN Option on the SYN/ACK (e.g., because it has been | receive an AccECN Option on the SYN/ACK (e.g., because it has been | |||
| stripped by a middlebox or not sent by the Server), the Client | stripped by a middlebox or not sent by the Server), the Client | |||
| switches into a mode that assumes that the AccECN Option is not | switches into a mode that assumes that the AccECN Option is not | |||
| available for this half connection. | available for this half-connection. | |||
| Similarly, if the TCP Server has successfully negotiated AccECN but | Similarly, if the TCP Server has successfully negotiated AccECN but | |||
| does not receive an AccECN Option on the first segment that | does not receive an AccECN Option on the first segment that | |||
| acknowledges sequence space at least covering the ISN, it switches | acknowledges sequence space at least covering the ISN, it switches | |||
| into a mode that assumes that the AccECN Option is not available for | into a mode that assumes that the AccECN Option is not available for | |||
| this half connection. | this half-connection. | |||
| While a host is in this mode that assumes incoming AccECN Options are | While a host is in this mode that assumes incoming AccECN Options are | |||
| not available, it MUST adopt the conservative interpretation of the | not available, it MUST adopt the conservative interpretation of the | |||
| ACE field discussed in Section 3.2.2.5. However, it cannot make any | ACE field discussed in Section 3.2.2.5. However, it cannot make any | |||
| assumption about support of outgoing AccECN Options on the other half | assumption about support of outgoing AccECN Options on the other | |||
| connection, so it SHOULD continue to send AccECN Options itself | half-connection, so it SHOULD continue to send AccECN Options itself | |||
| (unless it has established that sending AccECN Options is causing | (unless it has established that sending AccECN Options is causing | |||
| packets to be blocked as in Section 3.2.3.2.2). | packets to be blocked as in Section 3.2.3.2.2). | |||
| If a host is in the mode that assumes incoming AccECN Options are not | If a host is in the mode that assumes incoming AccECN Options are not | |||
| available, but it receives an AccECN Option at any later point during | available, but it receives an AccECN Option at any later point during | |||
| the connection, this clearly indicates that AccECN Options are no | the connection, this clearly indicates that AccECN Options are no | |||
| longer blocked on the respective path, and the AccECN endpoint MAY | longer blocked on the respective path, and the AccECN endpoint MAY | |||
| switch out of the mode that assumes AccECN Options are not available | switch out of the mode that assumes AccECN Options are not available | |||
| for this half connection. | for this half-connection. | |||
| 3.2.3.2.4. Test for Zeroing of the AccECN Option | 3.2.3.2.4. Test for Zeroing of the AccECN Option | |||
| For a related test for invalid initialization of the ACE field, see | For a related test for invalid initialization of the ACE field, see | |||
| Section 3.2.2.4. | Section 3.2.2.4. | |||
| Section 3.2.1 required the Data Receiver to initialize the r.e0b and | Section 3.2.1 required the Data Receiver to initialize the r.e0b and | |||
| r.e1b counters to a non-zero value. Therefore, in either direction | r.e1b counters to a non-zero value. Therefore, in either direction | |||
| the initial value of the EE0B field or EE1B field in an AccECN Option | the initial value of the EE0B field or EE1B field in an AccECN Option | |||
| (if one exists) ought to be non-zero. If AccECN has been negotiated: | (if one exists) ought to be non-zero. If AccECN has been negotiated: | |||
| * the TCP Server MAY check that the initial value of the EE0B field | * the TCP Server MAY check that the initial value of the EE0B field | |||
| or the EE1B field is non-zero in the first segment that | or the EE1B field is non-zero in the first segment that | |||
| acknowledges sequence space that at least covers the ISN plus 1. | acknowledges sequence space that at least covers the ISN plus 1. | |||
| If it runs a test and either initial value is zero, the Server | If it runs a test and either initial value is zero, the Server | |||
| will switch into a mode that ignores AccECN Options for this half | will switch into a mode that ignores AccECN Options for this half- | |||
| connection. | connection. | |||
| * the TCP Client MAY check that the initial value of the EE0B field | * the TCP Client MAY check that the initial value of the EE0B field | |||
| or the EE1B field is non-zero on the SYN/ACK. If it runs a test | or the EE1B field is non-zero on the SYN/ACK. If it runs a test | |||
| and either initial value is zero, the Client will switch into a | and either initial value is zero, the Client will switch into a | |||
| mode that ignores AccECN Options for this half connection. | mode that ignores AccECN Options for this half-connection. | |||
| While a host is in the mode that ignores AccECN Options, it MUST | While a host is in the mode that ignores AccECN Options, it MUST | |||
| adopt the conservative interpretation of the ACE field discussed in | adopt the conservative interpretation of the ACE field discussed in | |||
| Section 3.2.2.5. | Section 3.2.2.5. | |||
| Note that the Data Sender MUST NOT test whether the arriving byte | Note that the Data Sender MUST NOT test whether the arriving byte | |||
| counters in an initial AccECN Option have been initialized to | counters in an initial AccECN Option have been initialized to | |||
| specific valid values -- the above checks solely test whether these | specific valid values -- the above checks solely test whether these | |||
| fields have been incorrectly zeroed. This allows hosts to use | fields have been incorrectly zeroed. This allows hosts to use | |||
| different initial values as an additional signalling channel in the | different initial values as an additional signalling channel in the | |||
| skipping to change at line 2006 ¶ | skipping to change at line 2008 ¶ | |||
| could also occur if a middlebox mangled an AccECN Option but not the | could also occur if a middlebox mangled an AccECN Option but not the | |||
| ACE field. However, the Data Sender has to assume that the integrity | ACE field. However, the Data Sender has to assume that the integrity | |||
| of AccECN Options is sound, based on the above test of the well-known | of AccECN Options is sound, based on the above test of the well-known | |||
| initial values and optionally other integrity tests (Section 5.3). | initial values and optionally other integrity tests (Section 5.3). | |||
| If either endpoint detects that the s.ceb counter has increased but | If either endpoint detects that the s.ceb counter has increased but | |||
| the s.cep has not (and by testing ACK coverage it is certain how much | the s.cep has not (and by testing ACK coverage it is certain how much | |||
| the ACE field has wrapped), and if there is no explanation other than | the ACE field has wrapped), and if there is no explanation other than | |||
| an invalid protocol transition due to some form of feedback mangling, | an invalid protocol transition due to some form of feedback mangling, | |||
| the Data Sender MUST disable sending ECN-capable packets for the | the Data Sender MUST disable sending ECN-capable packets for the | |||
| remainder of the half-connection by setting the IP-ECN field in all | remainder of the half-connection by setting the IP ECN field in all | |||
| subsequent packets to Not-ECT. | subsequent packets to Not-ECT. | |||
| 3.2.3.3. Usage of the AccECN TCP Option | 3.2.3.3. Usage of the AccECN TCP Option | |||
| If a Data Receiver in AccECN mode intends to use AccECN TCP Options | If a Data Receiver in AccECN mode intends to use AccECN TCP Options | |||
| to provide feedback, the rules below determine when to include an | to provide feedback, the rules below determine when to include an | |||
| AccECN TCP Option, and which fields to include, given other options | AccECN TCP Option, and which fields to include, given other options | |||
| might be competing for limited option space: | might be competing for limited option space: | |||
| Importance of Congestion Control: AccECN is for congestion control, | Importance of Congestion Control: AccECN is for congestion control, | |||
| which implementations SHOULD generally prioritize over other TCP | which implementations SHOULD generally prioritize over other TCP | |||
| options when there is insufficient space for all the options in | Options when there is insufficient space for all the options in | |||
| use. | use. | |||
| If SACK has been negotiated [RFC2018], and the smallest | If SACK has been negotiated [RFC2018], and the smallest | |||
| recommended AccECN Option would leave insufficient space for two | recommended AccECN Option would leave insufficient space for two | |||
| SACK blocks on a particular ACK, the Data Receiver MUST give | SACK blocks on a particular ACK, the Data Receiver MUST give | |||
| precedence to the SACK option (total 18 octets), because loss | precedence to the SACK option (total 18 octets), because loss | |||
| feedback is more critical. | feedback is more critical. | |||
| Recommended Simple Scheme: The Data Receiver SHOULD include an | Recommended Simple Scheme: The Data Receiver SHOULD include an | |||
| AccECN TCP Option on every scheduled ACK if any byte counter has | AccECN TCP Option on every scheduled ACK if any byte counter has | |||
| skipping to change at line 2040 ¶ | skipping to change at line 2042 ¶ | |||
| include a field for every byte counter that has changed at some | include a field for every byte counter that has changed at some | |||
| time during the connection (see examples later). | time during the connection (see examples later). | |||
| A scheduled ACK means an ACK that the Data Receiver would send by | A scheduled ACK means an ACK that the Data Receiver would send by | |||
| its regular delayed ACK rules. Recall that Section 1.3 defines an | its regular delayed ACK rules. Recall that Section 1.3 defines an | |||
| 'ACK' as either with data payload or without. But the above rule | 'ACK' as either with data payload or without. But the above rule | |||
| is worded so that, in the common case when most of the data is | is worded so that, in the common case when most of the data is | |||
| from a Server to a Client, the Server only includes an AccECN TCP | from a Server to a Client, the Server only includes an AccECN TCP | |||
| Option while it is acknowledging data from the Client. | Option while it is acknowledging data from the Client. | |||
| When available TCP option space is limited on particular packets, the | When available TCP Option space is limited on particular packets, the | |||
| recommended scheme will need to include compromises. To guide the | recommended scheme will need to include compromises. To guide the | |||
| implementer, the rules below are ranked in order of importance, but | implementer, the rules below are ranked in order of importance, but | |||
| the final decision has to be implementation-dependent, because | the final decision has to be implementation-dependent, because | |||
| tradeoffs will alter as new TCP options are defined and new use-cases | tradeoffs will alter as new TCP Options are defined and new use-cases | |||
| arise. | arise. | |||
| Necessary Option Length: When TCP option space is limited, an AccECN | Necessary Option Length: When TCP Option space is limited, an AccECN | |||
| TCP option MAY be truncated to omit one or two fields from the end | TCP Option MAY be truncated to omit one or two fields from the end | |||
| of the option, as indicated by the permitted variants listed in | of the option, as indicated by the permitted variants listed in | |||
| Table 5, provided that the counter(s) that have changed since the | Table 5, provided that the counter(s) that have changed since the | |||
| previous AccECN TCP option are not omitted. | previous AccECN TCP Option are not omitted. | |||
| If there is insufficient space to include an AccECN TCP option | If there is insufficient space to include an AccECN TCP Option | |||
| containing the counter(s) that have changed since the previous | containing the counter(s) that have changed since the previous | |||
| AccECN TCP option, then the entire AccECN TCP option MUST be | AccECN TCP Option, then the entire AccECN TCP Option MUST be | |||
| omitted (see Section 3.2.3); | omitted (see Section 3.2.3); | |||
| Change-Triggered AccECN TCP Options: If an arriving packet | Change-Triggered AccECN TCP Options: If an arriving packet | |||
| increments a different byte counter to that incremented by the | increments a different byte counter to that incremented by the | |||
| previous packet, the Data Receiver SHOULD feed it back in an | previous packet, the Data Receiver SHOULD feed it back in an | |||
| AccECN Option on the next scheduled ACK. | AccECN Option on the next scheduled ACK. | |||
| For the avoidance of doubt, this rule does not concern the arrival | For the avoidance of doubt, this rule does not concern the arrival | |||
| of control packets with no payload, because they cannot alter any | of control packets with no payload, because they cannot alter any | |||
| byte counters. | byte counters. | |||
| skipping to change at line 2078 ¶ | skipping to change at line 2080 ¶ | |||
| increment the same byte counter: | increment the same byte counter: | |||
| * the Data Receiver SHOULD include a counter that has continued | * the Data Receiver SHOULD include a counter that has continued | |||
| to increment on the next scheduled ACK following a change- | to increment on the next scheduled ACK following a change- | |||
| triggered AccECN TCP Option; | triggered AccECN TCP Option; | |||
| * while the same counter continues to increment, it SHOULD | * while the same counter continues to increment, it SHOULD | |||
| include the counter every n ACKs as consistently as possible, | include the counter every n ACKs as consistently as possible, | |||
| where n can be chosen by the implementer; | where n can be chosen by the implementer; | |||
| * It SHOULD always include an AccECN Option if the r.ceb counter | * it SHOULD always include an AccECN Option if the r.ceb counter | |||
| is incrementing and it MAY include an AccECN Option if r.e0b | is incrementing and it MAY include an AccECN Option if r.e0b | |||
| or r.e1b is incrementing | or r.e1b is incrementing; | |||
| * It SHOULD include each counter at least once for every 2^22 | * it SHOULD include each counter at least once for every 2^22 | |||
| bytes incremented to prevent overflow during continual | bytes incremented to prevent overflow during continual | |||
| repetition. | repetition. | |||
| The above rules complement those in Section 3.2.2.5, which determine | The above rules complement those in Section 3.2.2.5, which determine | |||
| when to generate an ACK irrespective of whether an AccECN TCP Option | when to generate an ACK irrespective of whether an AccECN TCP Option | |||
| is to be included. | is to be included. | |||
| The recommended scheme is intended as a simple way to ensure that all | The recommended scheme is intended as a simple way to ensure that all | |||
| the relevant byte counters will be carried on any ACK that reaches | the relevant byte counters will be carried on any ACK that reaches | |||
| the Data Sender, no matter how many pure ACKs are filtered or | the Data Sender, no matter how many pure ACKs are filtered or | |||
| coalesced along the network path, and without consuming the space | coalesced along the network path, and without consuming the space | |||
| available for payload data with counter field(s) that have never | available for payload data with counter field(s) that have never | |||
| changed. | changed. | |||
| As an example of the recommended scheme, if ECT(0) is the only | As an example of the recommended scheme, if ECT(0) is the only | |||
| codepoint that has ever arrived in the IP-ECN field, the Data | codepoint that has ever arrived in the IP ECN field, the Data | |||
| Receiver will feed back an AccECN0 TCP Option with only the EE0B | Receiver will feed back an AccECN0 TCP Option with only the EE0B | |||
| field on every packet that acknowledges new data. However, as soon | field on every packet that acknowledges new data. However, as soon | |||
| as even one CE-marked packet arrives, on every packet that | as even one CE-marked packet arrives, on every packet that | |||
| acknowledges new data it will start to include an option with two | acknowledges new data it will start to include an option with two | |||
| fields, EE0B and ECEB. As a second example, if the first packet to | fields, EE0B and ECEB. As a second example, if the first packet to | |||
| arrive happens to be CE marked, the Data Receiver will have to | arrive happens to be CE marked, the Data Receiver will have to | |||
| arbitrarily choose whether to precede the ECEB field with an EE0B | arbitrarily choose whether to precede the ECEB field with an EE0B | |||
| field or an EE1B field. If it chooses, say, EE0B but it turns out | field or an EE1B field. If it chooses, say, EE0B but it turns out | |||
| never to receive ECT(0), it can start sending EE1B and ECEB instead | never to receive ECT(0), it can start sending EE1B and ECEB instead | |||
| -- it does not have to include the EE0B field if the r.e0b counter | -- it does not have to include the EE0B field if the r.e0b counter | |||
| skipping to change at line 2170 ¶ | skipping to change at line 2172 ¶ | |||
| A TCP normalizer is likely to block or alter an AccECN TCP Option if | A TCP normalizer is likely to block or alter an AccECN TCP Option if | |||
| the length value or the initial values of its byte-counter fields do | the length value or the initial values of its byte-counter fields do | |||
| not match one of those specified in Sections 3.2.3 or 3.2.1. | not match one of those specified in Sections 3.2.3 or 3.2.1. | |||
| However, to comply with the present AccECN specification, a middlebox | However, to comply with the present AccECN specification, a middlebox | |||
| MUST NOT change the ACE field; or those fields of an AccECN Option | MUST NOT change the ACE field; or those fields of an AccECN Option | |||
| that are currently specified in Section 3.2.3; or any AccECN field | that are currently specified in Section 3.2.3; or any AccECN field | |||
| covered by integrity protection (e.g., [RFC5925]). | covered by integrity protection (e.g., [RFC5925]). | |||
| 3.3.3. Requirements for TCP ACK Filtering | 3.3.3. Requirements for TCP ACK Filtering | |||
| Section 5.2.1 of [RFC3449] gives best current practice on filtering | Section 5.2.1 of RFC 3449 [BCP69] gives best current practice on | |||
| (aka thinning or coalescing) of pure TCP ACKs. It advises that | filtering (aka thinning or coalescing) of pure TCP ACKs. It advises | |||
| filtering ACKs carrying ECN feedback ought to preserve the correct | that filtering ACKs carrying ECN feedback ought to preserve the | |||
| operation of ECN feedback. As the present specification updates the | correct operation of ECN feedback. As the present specification | |||
| operation of ECN feedback, this section discusses how an ACK filter | updates the operation of ECN feedback, this section discusses how an | |||
| might preserve correct operation of AccECN feedback as well. | ACK filter might preserve correct operation of AccECN feedback as | |||
| well. | ||||
| The problem divides into two parts: determining if an ACK is part of | The problem divides into two parts: determining if an ACK is part of | |||
| a connection that is using AccECN and then preserving the correct | a connection that is using AccECN and then preserving the correct | |||
| operation of AccECN feedback: | operation of AccECN feedback: | |||
| * To determine whether a pure TCP ACK is part of an AccECN | * To determine whether a pure TCP ACK is part of an AccECN | |||
| connection without resorting to connection tracking and per-flow | connection without resorting to connection tracking and per-flow | |||
| state, a useful heuristic would be to check for a non-zero ECN | state, a useful heuristic would be to check for a non-zero ECN | |||
| field at the IP layer (because the ECN++ experiment only allows | field at the IP layer (because the ECN++ experiment only allows | |||
| TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
| [ECN++]). This heuristic is simple and stateless. However, it | [ECN++]). This heuristic is simple and stateless. However, it | |||
| might omit some AccECN ACKs, because AccECN can be used without | might omit some AccECN ACKs because AccECN can be used without | |||
| ECN++ and even if it is, ECN++ does not have to make pure ACKs | ECN++. Even if ECN++ is used, pure ACKs do not necessarily have | |||
| ECN-capable -- only deployment experience will tell. Also, TCP | to be marked as ECN-capable -- only deployment experience will | |||
| ACKs might be ECN-capable owing to some scheme other than AccECN, | tell. Also, TCP ACKs might be ECN-capable owing to some scheme | |||
| e.g., [RFC5690] or some future standards action. Again, only | other than AccECN, e.g., [RFC5690] or some future standards | |||
| deployment experience will tell. | action. Again, only deployment experience will tell. | |||
| * The main concern with preserving correct AccECN operation involves | * The main concern with preserving correct AccECN operation involves | |||
| leaving enough ACKs for the Data Sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
| 3-bit ACE field has wrapped. In the worst case, in feedback about | 3-bit ACE field has wrapped. In the worst case, in feedback about | |||
| a run of received packets that were all ECN-marked, the ACE field | a run of received packets that were all ECN-marked, the ACE field | |||
| will wrap every 8 acknowledged packets. ACE field wrap might be | will wrap every 8 acknowledged packets. ACE field wrap might be | |||
| of less concern if packets also carry AccECN TCP Options. | of less concern if packets also carry AccECN TCP Options. | |||
| However, note that logic to read an AccECN TCP Option is optional | However, note that logic to read an AccECN TCP Option is optional | |||
| to implement (albeit recommended -- see Section 3.2.3). So one | to implement (albeit recommended -- see Section 3.2.3). So one | |||
| end writing an AccECN TCP Option into a packet does not | end writing an AccECN TCP Option into a packet does not | |||
| skipping to change at line 2240 ¶ | skipping to change at line 2243 ¶ | |||
| direction. Therefore, currently available TSO hardware with | direction. Therefore, currently available TSO hardware with | |||
| [RFC3168] support may need some minor driver changes, to adjust the | [RFC3168] support may need some minor driver changes, to adjust the | |||
| bitmask for the first, middle, and last segments processed with TSO. | bitmask for the first, middle, and last segments processed with TSO. | |||
| Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | |||
| on the same offloading engine, the host software may need to work | on the same offloading engine, the host software may need to work | |||
| around incompatibilities (e.g., when only global configurable TSO TCP | around incompatibilities (e.g., when only global configurable TSO TCP | |||
| Flag bitmasks are available), otherwise this would cause some issues. | Flag bitmasks are available), otherwise this would cause some issues. | |||
| One way around this could be to only negotiate for Accurate ECN, but | One way around this could be to only negotiate for Accurate ECN, but | |||
| not offer a fall back to [RFC3168] ECN. Another way could be to | not offer a fall back to Classic ECN [RFC3168]. Another way could be | |||
| allow TSO only as long as the CWR flag in the TCP header is not set | to allow TSO only as long as the CWR flag in the TCP header is not | |||
| -- at the cost of more processing overhead while the ACE field has | set -- at the cost of more processing overhead while the ACE field | |||
| this bit set. | has this bit set. | |||
| For LRO in the receive direction, a different issue may get exposed | For LRO in the receive direction, a different issue may get exposed | |||
| with [RFC3168] ECN supporting hardware. | with Classic ECN [RFC3168] supporting hardware. | |||
| The ACE field changes with every received CE marking, so today's | The ACE field changes with every received CE marking, so today's | |||
| receive offloading could lead to many interrupts in high congestion | receive offloading could lead to many interrupts in high congestion | |||
| situations. Although that would be useful (because congestion | situations. Although that would be useful (because congestion | |||
| information is received sooner), it could also significantly increase | information is received sooner), it could also significantly increase | |||
| processor load, particularly in scenarios such as DCTCP or L4S where | processor load, particularly in scenarios such as DCTCP or L4S where | |||
| the marking rate is generally higher. | the marking rate is generally higher. | |||
| Current offload hardware ejects a segment from the coalescing process | Current offload hardware ejects a segment from the coalescing process | |||
| whenever the TCP ECN flags change. In data centres, it has been | whenever the TCP ECN flags change. In data centres, it has been | |||
| skipping to change at line 2304 ¶ | skipping to change at line 2307 ¶ | |||
| of the present specification. | of the present specification. | |||
| * In Section 6.1.2 of [RFC3168], all mentions of a congestion | * In Section 6.1.2 of [RFC3168], all mentions of a congestion | |||
| response to an ECN-Echo (ECE) ACK packet are updated by | response to an ECN-Echo (ECE) ACK packet are updated by | |||
| Section 3.2 of the present specification to mean an increment to | Section 3.2 of the present specification to mean an increment to | |||
| the sender's count of CE-marked packets, s.cep. And the | the sender's count of CE-marked packets, s.cep. And the | |||
| requirements to set the CWR flag no longer apply, as specified in | requirements to set the CWR flag no longer apply, as specified in | |||
| Section 3.1.5 of the present specification. Otherwise, the | Section 3.1.5 of the present specification. Otherwise, the | |||
| remaining requirements in Section 6.1.2 of [RFC3168] still stand. | remaining requirements in Section 6.1.2 of [RFC3168] still stand. | |||
| It will be noted that [RFC8311] already updates, or potentially | It will be noted that [RFC8311] already updates a number of the | |||
| updates, a number of the requirements in Section 6.1.2 of | requirements in Section 6.1.2 of [RFC3168]. Section 6.1.2 of RFC | |||
| [RFC3168]. Section 6.1.2 of RFC 3168 extended standard TCP | 3168 extended standard TCP congestion control [RFC5681] to cover | |||
| congestion control [RFC5681] to cover ECN marking as well as | ECN marking as well as packet drop. Whereas, [RFC8311] enables | |||
| packet drop. Whereas, [RFC8311] enables experimentation with | experimentation with alternative responses to ECN marking, if | |||
| alternative responses to ECN marking, if specified for instance by | specified for instance by an Experimental RFC produced by the IETF | |||
| an Experimental RFC produced by the IETF Stream. [RFC8311] also | Stream. [RFC8311] also strengthened the statement that "ECT(0) | |||
| strengthened the statement that "ECT(0) SHOULD be used" to a | SHOULD be used" to a "MUST" (see [RFC8311] for the details). | |||
| "MUST" (see [RFC8311] for the details). | ||||
| * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | |||
| of the present specification, with the exception of the last | of the present specification, with the exception of the last | |||
| paragraph (about congestion response to drop and ECN in the same | paragraph (about congestion response to drop and ECN in the same | |||
| round trip), which still stands. Incidentally, this last | round trip), which still stands. Incidentally, this last | |||
| paragraph is in the wrong section, because it relates to "TCP | paragraph is in the wrong section, because it relates to "TCP | |||
| Sender" behaviour. | Sender" behaviour. | |||
| * The following text within Section 6.1.5 of [RFC3168]: | * The following text within Section 6.1.5 of [RFC3168]: | |||
| skipping to change at line 2384 ¶ | skipping to change at line 2386 ¶ | |||
| with the value 0b000 or 0b001, these values indicate that the TCP | with the value 0b000 or 0b001, these values indicate that the TCP | |||
| Client did not request support for AccECN; therefore, the Server does | Client did not request support for AccECN; therefore, the Server does | |||
| not enter AccECN mode for this connection. Further, 0b001 on the ACK | not enter AccECN mode for this connection. Further, 0b001 on the ACK | |||
| implies that the Server sent an ECN-capable SYN/ACK, which was marked | implies that the Server sent an ECN-capable SYN/ACK, which was marked | |||
| CE in the network, and the non-AccECN TCP Client fed this back by | CE in the network, and the non-AccECN TCP Client fed this back by | |||
| setting ECE on the ACK of the SYN/ACK. | setting ECE on the ACK of the SYN/ACK. | |||
| 5.2. Compatibility with TCP Experiments and Common TCP Options | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
| AccECN is compatible (at least on paper) with the most commonly used | AccECN is compatible (at least on paper) with the most commonly used | |||
| TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It | TCP Options: MSS, timestamp, window scaling, SACK, and TCP-AO. It is | |||
| is also compatible with Multipath TCP (MPTCP [RFC8684]) and the | also compatible with Multipath TCP (MPTCP [RFC8684]) and the | |||
| experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | experimental TCP Option TCP Fast Open (TFO [RFC7413]). AccECN is | |||
| friendly to all these protocols, because space for TCP options is | friendly to all these protocols, because space for TCP Options is | |||
| particularly scarce on the SYN, where AccECN consumes zero additional | particularly scarce on the SYN, where AccECN consumes zero additional | |||
| header space. | header space. | |||
| When option space is under pressure from other options, | Because option space is limited, Section 3.2.3.3 provides guidance on | |||
| Section 3.2.3.3 provides guidance on how important it is to send an | how important it is to send an AccECN Option relative to other | |||
| AccECN Option relative to other options, and which fields are more | options and specifies which fields are more important to include. | |||
| important to include. | ||||
| Implementers of TFO need to take careful note of the recommendation | Implementers of TFO need to take careful note of the recommendation | |||
| in Section 3.2.2.1. That section recommends that, if the TCP Client | in Section 3.2.2.1. That section recommends that, if the TCP Client | |||
| has successfully negotiated AccECN, when acknowledging the SYN/ACK, | has successfully negotiated AccECN, when acknowledging the SYN/ACK, | |||
| even if it has data to send, it sends a pure ACK immediately before | even if it has data to send, it sends a pure ACK immediately before | |||
| the data. Then it can reflect the IP-ECN field of the SYN/ACK on | the data. Then it can reflect the IP ECN field of the SYN/ACK on | |||
| this pure ACK, which allows the Server to detect ECN mangling. Note | this pure ACK, which allows the Server to detect ECN mangling. Note | |||
| that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | |||
| is not included in any of the byte counters held locally for each ECN | is not included in any of the byte counters held locally for each ECN | |||
| marking, nor in the AccECN Option on the wire. | marking, nor in the AccECN Option on the wire. | |||
| AccECN feedback is compatible with the ECN++ experiment [ECN++], | AccECN feedback is compatible with the ECN++ experiment [ECN++], | |||
| which allows TCP control packets and retransmissions to be ECN- | which allows TCP control packets and retransmissions to be ECN- | |||
| capable ([RFC3168] was updated by [RFC8311] to permit such | capable ([RFC3168] was updated by [RFC8311] to permit such | |||
| experiments). AccECN is likely to inherently support any experiment | experiments). AccECN is likely to inherently support any experiment | |||
| with ECN-capable packets, because it feeds back the contents of the | with ECN-capable packets, because it feeds back the contents of the | |||
| skipping to change at line 2424 ¶ | skipping to change at line 2425 ¶ | |||
| an earlier experimental protocol with narrower scope than ECN++ and a | an earlier experimental protocol with narrower scope than ECN++ and a | |||
| 5-way handshake. | 5-way handshake. | |||
| 5.3. Compatibility with Feedback Integrity Mechanisms | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
| Three alternative mechanisms are available to assure the integrity of | Three alternative mechanisms are available to assure the integrity of | |||
| ECN and/or loss signals. AccECN is compatible with any of these | ECN and/or loss signals. AccECN is compatible with any of these | |||
| approaches: | approaches: | |||
| * The Data Sender can test the integrity of the receiver's ECN (or | * The Data Sender can test the integrity of the receiver's ECN (or | |||
| loss) feedback by occasionally setting the IP-ECN field to a value | loss) feedback by occasionally setting the IP ECN field to a value | |||
| normally only set by the network (and/or deliberately leaving a | normally only set by the network (and/or deliberately leaving a | |||
| sequence number gap). Then it can test whether the Data | sequence number gap). Then it can test whether the Data | |||
| Receiver's feedback faithfully reports what it expects (similar to | Receiver's feedback faithfully reports what it expects (similar to | |||
| paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | |||
| [RFC3540], this approach does not waste the ECT(1) codepoint in | [RFC3540], this approach does not waste the ECT(1) codepoint in | |||
| the IP header, it does not require standardization, and it does | the IP header, it does not require standardization, and it does | |||
| not rely on misbehaving receivers volunteering to reveal feedback | not rely on misbehaving receivers volunteering to reveal feedback | |||
| information that allows them to be detected. However, setting the | information that allows them to be detected. However, setting the | |||
| CE mark by the sender might conceal actual congestion feedback | CE mark by the sender might conceal actual congestion feedback | |||
| from the network and therefore ought to only be done sparingly. | from the network and therefore ought to only be done sparingly. | |||
| skipping to change at line 2455 ¶ | skipping to change at line 2456 ¶ | |||
| ConEx is an experimental change to the Data Sender that would be | ConEx is an experimental change to the Data Sender that would be | |||
| most useful when combined with AccECN. Without AccECN, the ConEx | most useful when combined with AccECN. Without AccECN, the ConEx | |||
| behaviour of a Data Sender would have to be more conservative than | behaviour of a Data Sender would have to be more conservative than | |||
| would be necessary if it had the accurate feedback of AccECN. | would be necessary if it had the accurate feedback of AccECN. | |||
| * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | |||
| can be used to detect any tampering with AccECN feedback between | can be used to detect any tampering with AccECN feedback between | |||
| the Data Receiver and the Data Sender (whether malicious or | the Data Receiver and the Data Sender (whether malicious or | |||
| accidental). The AccECN fields are immutable end to end, so they | accidental). The AccECN fields are immutable end to end, so they | |||
| are amenable to TCP-AO protection, which covers TCP options by | are amenable to TCP-AO protection, which covers TCP Options by | |||
| default. However, TCP-AO is often too brittle to use on many end- | default. However, TCP-AO is often too brittle to use on many end- | |||
| to-end paths, where middleboxes can make verification fail in | to-end paths, where middleboxes can make verification fail in | |||
| their attempts to improve performance or security, e.g., Network | their attempts to improve performance or security, e.g., Network | |||
| Address Translation (NAT) and Network Address Port Translation | Address Translation (NAT) and Network Address Port Translation | |||
| (NAPT), resegmentation, or shifting the sequence space. | (NAPT), resegmentation, or shifting the sequence space. | |||
| 6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
| This section is informative, not normative. It describes how well | This section is informative, not normative. It describes how well | |||
| the protocol satisfies the agreed requirements for a more Accurate | the protocol satisfies the agreed requirements for a more Accurate | |||
| skipping to change at line 2477 ¶ | skipping to change at line 2478 ¶ | |||
| Accuracy: From each ACK, the Data Sender can infer the number of new | Accuracy: From each ACK, the Data Sender can infer the number of new | |||
| CE-marked segments since the previous ACK. This provides better | CE-marked segments since the previous ACK. This provides better | |||
| accuracy on CE feedback than Classic ECN. In addition, if an | accuracy on CE feedback than Classic ECN. In addition, if an | |||
| AccECN Option is present (not blocked by the network path), the | AccECN Option is present (not blocked by the network path), the | |||
| number of bytes marked with CE, ECT(1), and ECT(0) are provided. | number of bytes marked with CE, ECT(1), and ECT(0) are provided. | |||
| Overhead: The AccECN scheme is divided into two parts. The | Overhead: The AccECN scheme is divided into two parts. The | |||
| essential feedback part reuses the three flags already assigned to | essential feedback part reuses the three flags already assigned to | |||
| ECN in the TCP header. The supplementary feedback part adds an | ECN in the TCP header. The supplementary feedback part adds an | |||
| additional TCP option consuming up to 11 bytes. However, no TCP | additional TCP Option consuming up to 11 bytes. However, no TCP | |||
| option space is consumed in the SYN. | Option space is consumed in the SYN. | |||
| Ordering: The order in which marks arrive at the Data Receiver is | Ordering: The order in which marks arrive at the Data Receiver is | |||
| preserved in AccECN feedback, because the Data Receiver is | preserved in AccECN feedback, because the Data Receiver is | |||
| expected to send an ACK immediately whenever a different mark | expected to send an ACK immediately whenever a different mark | |||
| arrives. | arrives. | |||
| Timeliness: While the same ECN markings are arriving continually at | Timeliness: While the same ECN markings are arriving continually at | |||
| the Data Receiver, it can defer ACKs as TCP does normally, but it | the Data Receiver, it can defer ACKs as TCP does normally, but it | |||
| will immediately send an ACK as soon as a different ECN marking | will immediately send an ACK as soon as a different ECN marking | |||
| arrives. | arrives. | |||
| skipping to change at line 2545 ¶ | skipping to change at line 2546 ¶ | |||
| can assure the integrity of ECN feedback. If AccECN Options are | can assure the integrity of ECN feedback. If AccECN Options are | |||
| stripped, the resolution of the feedback is degraded, but the | stripped, the resolution of the feedback is degraded, but the | |||
| integrity of this degraded feedback can still be assured. | integrity of this degraded feedback can still be assured. | |||
| Backward Compatibility: If only one endpoint supports the AccECN | Backward Compatibility: If only one endpoint supports the AccECN | |||
| scheme, it will fall back to the most advanced ECN feedback scheme | scheme, it will fall back to the most advanced ECN feedback scheme | |||
| supported by the other end. | supported by the other end. | |||
| If AccECN Options are stripped by a middlebox, AccECN still | If AccECN Options are stripped by a middlebox, AccECN still | |||
| provides basic congestion feedback in the ACE field. Further, | provides basic congestion feedback in the ACE field. Further, | |||
| AccECN can be used to detect mangling of the IP-ECN field; | AccECN can be used to detect mangling of the IP ECN field; | |||
| mangling of the TCP ECN flags; blocking of ECT-marked segments; | mangling of the TCP ECN flags; blocking of ECT-marked segments; | |||
| and blocking of segments carrying an AccECN Option. It can detect | and blocking of segments carrying an AccECN Option. It can detect | |||
| these conditions during TCP's three-way handshake so that it can | these conditions during TCP's three-way handshake so that it can | |||
| fall back to operation without ECN and/or operation without AccECN | fall back to operation without ECN and/or operation without AccECN | |||
| Options. | Options. | |||
| Forward Compatibility: The behaviour of endpoints and middleboxes is | Forward Compatibility: The behaviour of endpoints and middleboxes is | |||
| carefully defined for all reserved or currently unused codepoints | carefully defined for all reserved or currently unused codepoints | |||
| in the scheme. Then, the designers of security devices can | in the scheme. Then, the designers of security devices can | |||
| understand which currently unused values might appear in the | understand which currently unused values might appear in the | |||
| skipping to change at line 2581 ¶ | skipping to change at line 2582 ¶ | |||
| +=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| | Bit | Name | Reference | Assignment Notes | | | Bit | Name | Reference | Assignment Notes | | |||
| +=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | |||
| | | ECN) | | Sum) by [RFC3540], which is | | | | ECN) | | Sum) by [RFC3540], which is | | |||
| | | | | now Historic [RFC8311] | | | | | | now Historic [RFC8311] | | |||
| +-----+--------------+-----------+------------------------------+ | +-----+--------------+-----------+------------------------------+ | |||
| Table 6: TCP Header Flag Reassignment | Table 6: TCP Header Flag Reassignment | |||
| This document also defines two new TCP options for AccECN from the | This document also defines two new TCP Options for AccECN from the | |||
| TCP option space. These values are defined as the following in the | TCP Option space. These values are defined as the following in the | |||
| "TCP Option Kind Numbers" registry in the "Transmission Control | "TCP Option Kind Numbers" registry in the "Transmission Control | |||
| Protocol (TCP) Parameters" registry group: | Protocol (TCP) Parameters" registry group: | |||
| +======+========+================================+===========+ | +======+========+================================+===========+ | |||
| | Kind | Length | Meaning | Reference | | | Kind | Length | Meaning | Reference | | |||
| +======+========+================================+===========+ | +======+========+================================+===========+ | |||
| | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | |||
| +------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | |||
| +------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| Table 7: New TCP Option assignments | Table 7: New TCP Option Assignments | |||
| Early experimental implementations of the two AccECN Options used | Early experimental implementations of the two AccECN Options used | |||
| experimental option 254 per [RFC6994] with the 16-bit magic numbers | experimental option 254 per [RFC6994] with the 16-bit magic numbers | |||
| 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | |||
| the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | |||
| ExIDs)" registry. Even earlier experimental implementations used the | ExIDs)" registry. Even earlier experimental implementations used the | |||
| single magic number 0xACCE (16 bits). Uses of these experimental | single magic number 0xACCE (16 bits). Uses of these experimental | |||
| options SHOULD migrate to use the new option kinds (172 and 174). | options SHOULD migrate to use the new option kinds (172 and 174). | |||
| 8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
| If ever the supplementary feedback part of AccECN that is based on | If ever the supplementary feedback part of AccECN that is based on | |||
| one of the new AccECN TCP Options is unusable (due for example to | one of the new AccECN TCP Options is unusable (due for example to | |||
| middlebox interference), the essential feedback part of AccECN's | middlebox interference), the essential feedback part of AccECN's | |||
| congestion feedback offers only limited resilience to long runs of | congestion feedback offers only limited resilience to long runs of | |||
| ACK loss (see Section 3.2.2.5). These problems are unlikely to be | ACK loss (see Section 3.2.2.5). These problems are unlikely to be | |||
| due to malicious intervention (because if an attacker could strip a | due to malicious intervention (because if an attacker could strip a | |||
| TCP option or discard a long run of ACKs, it could wreak other | TCP Option or discard a long run of ACKs, it could wreak other | |||
| arbitrary havoc). However, it would be of concern if AccECN's | arbitrary havoc). However, it would be of concern if AccECN's | |||
| resilience could be indirectly compromised during a flooding attack. | resilience could be indirectly compromised during a flooding attack. | |||
| AccECN is still considered safe though, because if AccECN Options are | AccECN is still considered safe though, because if AccECN Options are | |||
| not present, the AccECN Data Sender is then required to switch to | not present, the AccECN Data Sender is then required to switch to | |||
| more conservative assumptions about wrap of congestion indication | more conservative assumptions about wrap of congestion indication | |||
| counters (see Section 3.2.2.5 and Appendix A.2). | counters (see Section 3.2.2.5 and Appendix A.2). | |||
| Section 5.1 describes how a TCP Server can negotiate AccECN and use | Section 5.1 describes how a TCP Server can negotiate AccECN and use | |||
| the SYN cookie method for mitigating SYN flooding attacks. | the SYN cookie method for mitigating SYN flooding attacks. | |||
| skipping to change at line 2639 | skipping to change at line 2640 | |||
| will be degraded, but the integrity of this degraded information can | will be degraded, but the integrity of this degraded information can | |||
| still be assured. Assuring that Data Senders respond appropriately | still be assured. Assuring that Data Senders respond appropriately | |||
| to ECN feedback is possible, but the scope of the present document is | to ECN feedback is possible, but the scope of the present document is | |||
| confined to the feedback protocol and excludes the response to this | confined to the feedback protocol and excludes the response to this | |||
| feedback. | feedback. | |||
| In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | |||
| TCP AccECN Option length and read as many whole 3-octet fields from | TCP AccECN Option length and read as many whole 3-octet fields from | |||
| it as possible up to a maximum of 3, treating the remainder as | it as possible up to a maximum of 3, treating the remainder as | |||
| padding. This opens up a potential covert channel of up to 29B (40 - | padding. This opens up a potential covert channel of up to 29B (40 - | |||
| (2+3*3)) B. However, it is really an overt channel (not hidden) and | (2+3*3)). However, it is really an overt channel (not hidden) and it | |||
| it is no different than the use of unknown TCP options with unknown | is no different than the use of unknown TCP Options with unknown | |||
| option lengths in general. Therefore, where this is of concern, it | option lengths in general. Therefore, where this is of concern, it | |||
| can already be adequately mitigated by regular TCP normalizer | can already be adequately mitigated by regular TCP normalizer | |||
| technology (see Section 3.3.2). | technology (see Section 3.3.2). | |||
| The AccECN protocol is not believed to introduce any new privacy | The AccECN protocol is not believed to introduce any new privacy | |||
| concerns, because it merely counts and feeds back signals at the | concerns, because it merely counts and feeds back signals at the | |||
| transport layer that had already been visible at the IP layer. A | transport layer that had already been visible at the IP layer. A | |||
| covert channel can be used to compromise privacy. However, as | covert channel can be used to compromise privacy. However, as | |||
| explained above, undefined TCP options in general open up such | explained above, undefined TCP Options in general open up such | |||
| channels, and common techniques are available to close them off. | channels, and common techniques are available to close them off. | |||
| There is a potential concern that a Data Receiver could deliberately | There is a potential concern that a Data Receiver could deliberately | |||
| omit AccECN Options pretending that they had been stripped by a | omit AccECN Options pretending that they had been stripped by a | |||
| middlebox. No known way can yet be contrived for a receiver to take | middlebox. Currently, there is no known way for a receiver to take | |||
| advantage of this behaviour, which seems to always degrade its own | advantage of this behaviour, which seems to always degrade its own | |||
| performance. However, the concern is mentioned here for | performance. However, the concern is mentioned here for | |||
| completeness. | completeness. | |||
| A generic privacy concern of any new protocol is that for a while it | A generic privacy concern of any new protocol is that for a while it | |||
| will be used by a small population of hosts, and thus show up more | will be used by a small population of hosts, and thus those hosts | |||
| easily. However, it is expected that AccECN will become available in | could be more easily identified. However, it is expected that AccECN | |||
| operating systems over time and that it will eventually be turned on | will become available in operating systems over time and that it will | |||
| by default. Thus, an individual identification of a particular user | eventually be turned on by default. Thus, an individual | |||
| is less of a concern than the fingerprinting of specific versions of | identification of a particular user is less of a concern than the | |||
| operating systems. However, the latter can be done using different | fingerprinting of specific versions of operating systems. However, | |||
| means independent of Accurate ECN. | the latter can be done using different means independent of Accurate | |||
| | ECN. | |||
| As Accurate ECN exposes more bits in the TCP header that could be | As Accurate ECN exposes more bits in the TCP header that could be | |||
| tampered with without interfering with the transport excessively, it | tampered with without interfering with the transport excessively, it | |||
| may allow an additional way to identify specific data streams across | may allow an additional way to identify specific data streams across | |||
| a virtual private network (VPN) to an attacker that has access to the | a virtual private network (VPN) to an attacker that has access to the | |||
| datastream before and after the VPN tunnel endpoints. This may be | datastream before and after the VPN tunnel endpoints. This may be | |||
| achieved by injecting or modifying the ACE field in specific patterns | achieved by injecting or modifying the ACE field in specific patterns | |||
| that can be recognized. | that can be recognized. | |||
| Overall, Accurate ECN does not change the risk profile on privacy to | Overall, Accurate ECN does not change the risk profile on privacy to | |||
| skipping to change at line 2722 | skipping to change at line 2724 | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
| STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
| <https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
| 9.2. Informative References | 9.2. Informative References | |||
| | [BCP69] Best Current Practice 69, | |||
| | <https://www.rfc-editor.org/info/bcp69>. | |||
| | At the time of writing, this BCP comprises the following: | |||
| | Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | |||
| | Sooriyabandara, "TCP Performance Implications of Network | |||
| | Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | |||
| | December 2002, <https://www.rfc-editor.org/info/rfc3449>. | |||
| [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | |||
| Congestion Notification (ECN) to TCP Control Packets", | Congestion Notification (ECN) to TCP Control Packets", | |||
| Work in Progress, Internet-Draft, draft-ietf-tcpm- | Work in Progress, Internet-Draft, draft-ietf-tcpm- | |||
| generalized-ecn-17, 21 April 2025, | generalized-ecn-17, 21 April 2025, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | |||
| generalized-ecn-17>. | generalized-ecn-17>. | |||
| [Mandalari18] | [Mandalari18] | |||
| Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | |||
| Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | |||
| over Mobile", IEEE Communications Magazine, March 2018, | over Mobile", IEEE Communications Magazine, March 2018, | |||
| <http://www.it.uc3m.es/amandala/ | <http://www.it.uc3m.es/amandala/ | |||
| ecn++/ecn_commag_2018.html>. | ecn++/ecn_commag_2018.html>. | |||
| [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | | |||
| Sooriyabandara, "TCP Performance Implications of Network | | |||
| Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | | |||
| December 2002, <https://www.rfc-editor.org/info/rfc3449>. | | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
| Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
| RFC 3540, DOI 10.17487/RFC3540, June 2003, | RFC 3540, DOI 10.17487/RFC3540, June 2003, | |||
| <https://www.rfc-editor.org/info/rfc3540>. | <https://www.rfc-editor.org/info/rfc3540>. | |||
| [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common | [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common | |||
| Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | |||
| <https://www.rfc-editor.org/info/rfc4987>. | <https://www.rfc-editor.org/info/rfc4987>. | |||
| [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | |||
| skipping to change at line 2852 | skipping to change at line 2858 | |||
| (L4S) Internet Service: Architecture", RFC 9330, | (L4S) Internet Service: Architecture", RFC 9330, | |||
| DOI 10.17487/RFC9330, January 2023, | DOI 10.17487/RFC9330, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9330>. | <https://www.rfc-editor.org/info/rfc9330>. | |||
| [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | |||
| "CUBIC for Fast and Long-Distance Networks", RFC 9438, | "CUBIC for Fast and Long-Distance Networks", RFC 9438, | |||
| DOI 10.17487/RFC9438, August 2023, | DOI 10.17487/RFC9438, August 2023, | |||
| <https://www.rfc-editor.org/info/rfc9438>. | <https://www.rfc-editor.org/info/rfc9438>. | |||
| [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | |||
| Specification", Volume 1, Release 1.4, 2020, | Specification", | |||
| <https://www.infinibandta.org/ibta-specification/>. | <https://www.infinibandta.org/ibta-specification/>. | |||
| Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
| This appendix is informative, not normative. It gives example | This appendix is informative, not normative. It gives example | |||
| algorithms that would satisfy the normative requirements of the | algorithms that would satisfy the normative requirements of the | |||
| AccECN protocol. However, implementers are free to choose other ways | AccECN protocol. However, implementers are free to choose other ways | |||
| to implement the requirements. | to satisfy the requirements. | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| The example algorithms below show how a Data Receiver in AccECN mode | The example algorithms below show how a Data Receiver in AccECN mode | |||
| could encode its CE byte counter r.ceb into the ECEB field within an | could encode its CE byte counter r.ceb into the ECEB field within an | |||
| AccECN TCP Option, and how a Data Sender in AccECN mode could decode | AccECN TCP Option, and how a Data Sender in AccECN mode could decode | |||
| the ECEB field into its byte counter s.ceb. The other counters for | the ECEB field into its byte counter s.ceb. The other counters for | |||
| bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | |||
| encoded and decoded. | encoded and decoded. | |||
| skipping to change at line 2893 | skipping to change at line 2899 | |||
| where '%' is the remainder operator. | where '%' is the remainder operator. | |||
| On the arrival of an AccECN Option, the Data Sender first makes sure | On the arrival of an AccECN Option, the Data Sender first makes sure | |||
| the ACK has not been superseded in order to avoid winding the s.ceb | the ACK has not been superseded in order to avoid winding the s.ceb | |||
| counter backwards. It uses the TCP acknowledgement number and any | counter backwards. It uses the TCP acknowledgement number and any | |||
| SACK options [RFC2018] to calculate newlyAckedB, the amount of new | SACK options [RFC2018] to calculate newlyAckedB, the amount of new | |||
| data that the ACK acknowledges in bytes (newlyAckedB can be zero but | data that the ACK acknowledges in bytes (newlyAckedB can be zero but | |||
| not negative). If newlyAckedB is zero, either the ACK has been | not negative). If newlyAckedB is zero, either the ACK has been | |||
| superseded or CE-marked packet(s) without data could have arrived. | superseded or CE-marked packet(s) without data could have arrived. | |||
| To break the tie for the latter case, the Data Sender could use time- | To break the tie for the latter case, the Data Sender could use | |||
| stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | timestamps [RFC7323] (if present) to work out newlyAckedT, the amount | |||
| new time that the ACK acknowledges. If the Data Sender determines | of new time that the ACK acknowledges. If the Data Sender determines | |||
| that the ACK has been superseded, it ignores the AccECN Option. | that the ACK has been superseded, it ignores the AccECN Option. | |||
| Otherwise, the Data Sender calculates the minimum non-negative | Otherwise, the Data Sender calculates the minimum non-negative | |||
| difference d.ceb between the ECEB field and its local s.ceb counter, | difference d.ceb between the ECEB field and its local s.ceb counter, | |||
| using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
| if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | |||
| d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | |||
| s.ceb += d.ceb | s.ceb += d.ceb | |||
| } | } | |||
| skipping to change at line 2982 | skipping to change at line 2988 | |||
| of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | |||
| retransmissions will not repair the lost AccECN information, because | retransmissions will not repair the lost AccECN information, because | |||
| AccECN requires retransmissions to carry the latest AccECN counters, | AccECN requires retransmissions to carry the latest AccECN counters, | |||
| not the original ones. | not the original ones. | |||
| The phrase 'under prevailing conditions' allows for implementation- | The phrase 'under prevailing conditions' allows for implementation- | |||
| dependent interpretation. A Data Sender might take account of the | dependent interpretation. A Data Sender might take account of the | |||
| prevailing size of data segments and the prevailing CE marking rate | prevailing size of data segments and the prevailing CE marking rate | |||
| just before the sequence of missing ACKs. However, we shall start | just before the sequence of missing ACKs. However, we shall start | |||
| with the simplest algorithm, which assumes segments are all full- | with the simplest algorithm, which assumes segments are all full- | |||
| sized and ultra-conservatively it assumes that ECN marking was 100% | sized, and ultra-conservatively it assumes that ECN marking was 100% | |||
| on the forward path when ACKs on the reverse path started to all be | on the forward path when ACKs on the reverse path started to all be | |||
| dropped. Specifically, if newlyAckedB is the amount of data that an | dropped. Specifically, if newlyAckedB is the amount of data that an | |||
| ACK acknowledges since the previous ACK, then the Data Sender could | ACK acknowledges since the previous ACK, then the Data Sender could | |||
| assume that this acknowledges newlyAckedPkt full-sized segments, | assume that this acknowledges newlyAckedPkt full-sized segments, | |||
| where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | |||
| ACE field incremented by | ACE field incremented by | |||
| dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | |||
| For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | |||
| size segments than any previous ACK, and that ACE increments by a | size segments than any previous ACK, and that ACE increments by a | |||
| minimum of 2 CE marks (d.cep=2). The above formula works out that it | minimum of 2 CE marks (d.cep=2). The above formula indicates that it | |||
| would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | |||
| 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | |||
| full-sized segments, then it would be necessary to assume that there | full-sized segments, then it would be necessary to assume that there | |||
| could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | |||
| Note that checks would need to be added to the above pseudocode for | Note that checks would need to be added to the above pseudocode for | |||
| (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | |||
| wrongly estimated using an inappropriate packet size. | wrongly estimated using an inappropriate packet size. | |||
| ACKs that acknowledge a large stretch of packets might be common in | ACKs that acknowledge a large stretch of packets might be common in | |||
| skipping to change at line 3024 | skipping to change at line 3030 | |||
| average segment size and prevailing ECN marking. For instance, | average segment size and prevailing ECN marking. For instance, | |||
| newlyAckedPkt in the above formula could be replaced with | newlyAckedPkt in the above formula could be replaced with | |||
| newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | |||
| segment size and p is the prevailing ECN marking probability. | segment size and p is the prevailing ECN marking probability. | |||
| However, ultimately, if TCP's ECN feedback becomes inaccurate, it | However, ultimately, if TCP's ECN feedback becomes inaccurate, it | |||
| still has loss detection to fall back on. Therefore, it would seem | still has loss detection to fall back on. Therefore, it would seem | |||
| safe to implement a simple algorithm, rather than a perfect one. | safe to implement a simple algorithm, rather than a perfect one. | |||
| The simple algorithm for dSafer.cep above requires no monitoring of | The simple algorithm for dSafer.cep above requires no monitoring of | |||
| prevailing conditions and it would still be safe if, for example, | prevailing conditions and it would still be safe if, for example, | |||
| segments were on average at least 5% of full-sized as long as ECN | segments were on average at least 5% of a full-sized packet as long | |||
| marking was 5% or less. Assuming it was used, the Data Sender would | as ECN marking was 5% or less. Assuming it was used, the Data Sender | |||
| increment its packet counter as follows: | would increment its packet counter as follows: | |||
| s.cep += dSafer.cep | s.cep += dSafer.cep | |||
| If missing acknowledgement numbers arrive later (due to reordering), | If missing acknowledgement numbers arrive later (due to reordering), | |||
| Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | |||
| effect of any action it took based on a conservative assumption that | effect of any action it took based on a conservative assumption that | |||
| it later found to be incorrect". To do this, the Data Sender would | it later found to be incorrect". To do this, the Data Sender would | |||
| have to store the values of all the relevant variables whenever it | have to store the values of all the relevant variables whenever it | |||
| made assumptions, so that it could re-evaluate them later. Given | made assumptions, so that it could re-evaluate them later. Given | |||
| this could become complex and it is not required, we do not attempt | this could become complex and it is not required, we do not attempt | |||
| skipping to change at line 3063 | skipping to change at line 3069 | |||
| if (dSafer.cep > d.cep) { | if (dSafer.cep > d.cep) { | |||
| if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | |||
| sSafer = d.ceb/dSafer.cep | sSafer = d.ceb/dSafer.cep | |||
| if (sSafer < MSS/SAFETY_FACTOR) | if (sSafer < MSS/SAFETY_FACTOR) | |||
| dSafer.cep = d.cep % d.cep is a safe enough estimate | dSafer.cep = d.cep % d.cep is a safe enough estimate | |||
| } % else | } % else | |||
| % No need for else; dSafer.cep is already correct, | % No need for else; dSafer.cep is already correct, | |||
| % because d.cep must have been too small | % because d.cep must have been too small | |||
| } | } | |||
| The chart below shows when the above algorithm will consider d.cep | The chart below shows when the above algorithm will replace | |||
| can replace dSafer.cep as a safe enough estimate of the number of CE- | dSafer.cep with d.cep as a safe enough estimate of the number of CE- | |||
| marked packets: | marked packets: | |||
| ^ | ^ | |||
| sSafer| | sSafer| | |||
| | | | | |||
| MSS+ | MSS+ | |||
| | | | | |||
| | dSafer.cep | | dSafer.cep | |||
| | is | | is | |||
| MSS/SAFETY_FACTOR+--------------+ safest | MSS/SAFETY_FACTOR+--------------+ safest | |||
| skipping to change at line 3113 | skipping to change at line 3119 | |||
| than below MSS/2. | than below MSS/2. | |||
| If pure ACKs were allowed to be ECN-capable, missing ACKs would be | If pure ACKs were allowed to be ECN-capable, missing ACKs would be | |||
| far less likely. However, because [RFC3168] currently precludes | far less likely. However, because [RFC3168] currently precludes | |||
| this, the above algorithm assumes that pure ACKs are not ECN-capable. | this, the above algorithm assumes that pure ACKs are not ECN-capable. | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | |||
| If AccECN Options are not available, the Data Sender can only decode | If AccECN Options are not available, the Data Sender can only decode | |||
| a CE marking from the ACE field in packets. Every time an ACK | a CE marking from the ACE field in packets. Every time an ACK | |||
| arrives, to convert this into an estimate of CE-marked bytes, it | arrives, to convert the number of CE markings into an estimate of CE- | |||
| needs an average of the segment size, s_ave. Then it can add or | marked bytes, it needs an average of the segment size, s_ave. Then | |||
| subtract s_ave from the value of d.ceb as the value of d.cep | it can add or subtract s_ave from the value of d.ceb as the value of | |||
| increments or decrements. Some possible ways to calculate s_ave are | d.cep increments or decrements. Some possible ways to calculate | |||
| outlined below. The precise details will depend on why an estimate | s_ave are outlined below. The precise details will depend on why an | |||
| of marked bytes is needed. | estimate of marked bytes is needed. | |||
| The implementation could keep a record of the byte numbers of all the | The implementation could keep a record of the byte numbers of all the | |||
| boundaries between packets in flight (including control packets), and | boundaries between packets in flight (including control packets), and | |||
| recalculate s_ave on every ACK. However, it would be simpler to | recalculate s_ave on every ACK. However, it would be simpler to | |||
| merely maintain a counter packets_in_flight for the number of packets | merely maintain a counter packets_in_flight for the number of packets | |||
| in flight (including control packets), which is reset once per RTT. | in flight (including control packets), which is reset once per RTT. | |||
| Either way, it would estimate s_ave as: | Either way, it would estimate s_ave as: | |||
| s_ave ~= flightsize / packets_in_flight, | s_ave ~= flightsize / packets_in_flight, | |||
| skipping to change at line 3179 | skipping to change at line 3185 | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
| version TCP ECN feedback scheme that both ends support, as justified | version TCP ECN feedback scheme that both ends support, as justified | |||
| below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP ECN capability negotiation | |||
| [RFC3168], in which the Client set the 2 least significant of the | [RFC3168], in which the Client set the 2 least significant of the | |||
| original reserved flags in the TCP header, and fell back to No ECN | original reserved flags in the TCP header, and fell back to No ECN | |||
| support if the Server responded with the 2 flags cleared, which had | support if the Server responded with the 2 flags cleared, which had | |||
| previously been the default. | previously been the default. | |||
| Classic ECN used header flags rather than a TCP option because it was | Classic ECN used header flags rather than a TCP Option because it was | |||
| considered more efficient to use a header flag for 1 bit of feedback | considered more efficient to use a header flag for 1 bit of feedback | |||
| per ACK, and this bit could be overloaded to indicate support for | per ACK, and this bit could be overloaded to indicate support for | |||
| Classic ECN during the handshake. During the development of ECN, 1 | Classic ECN during the handshake. During the development of ECN, 1 | |||
| bit crept up to 2, in order to deliver the feedback reliably and to | bit crept up to 2, in order to deliver the feedback reliably and to | |||
| work round some broken hosts that reflected the reserved flags during | work round some broken hosts that reflected the reserved flags during | |||
| the handshake. | the handshake. | |||
| In order to be backward compatible with RFC 3168, AccECN continues | In order to be backward compatible with RFC 3168, AccECN continues | |||
| this approach, using the 3rd least significant TCP header flag that | this approach, using the 3rd least significant TCP header flag that | |||
| had previously been allocated for the ECN-nonce (now historic). | had previously been allocated for the ECN-nonce (now historic). | |||
| Then, whatever form of Server an AccECN Client encounters, the | Then, whatever form of Server an AccECN Client encounters, the | |||
| connection can fall back to the highest version of feedback protocol | connection can fall back to the highest version of feedback protocol | |||
| that both ends support, as explained in Section 3.1. | that both ends support, as explained in Section 3.1. | |||
| If AccECN capability negotiation had used the more orthodox approach | If AccECN capability negotiation had used the more orthodox approach | |||
| of a TCP option, it would still have had to set the two ECN flags in | of a TCP Option, it would still have had to set the two ECN flags in | |||
| the main TCP header, in order to be able to fall back to Classic ECN | the main TCP header, in order to be able to fall back to Classic ECN | |||
| [RFC3168], or to disable ECN support, without another round of | [RFC3168], or to disable ECN support, without another round of | |||
| negotiation. Then AccECN would also have had to handle all the | negotiation. Then AccECN would also have had to handle all the | |||
| different ways that Servers currently respond to settings of the ECN | different ways that Servers currently respond to settings of the ECN | |||
| flags in the main TCP header, including all of the conflicting cases | flags in the main TCP header, including all of the conflicting cases | |||
| where a Server might have said it supported one approach in the flags | where a Server might have said it supported one approach in the flags | |||
| and another approach in a new TCP option. And AccECN would have had | and another approach in a new TCP Option. And AccECN would have had | |||
| to deal with all of the additional possibilities where a middlebox | to deal with all of the additional possibilities where a middlebox | |||
| might have mangled the ECN flags, or removed TCP options. Thus, | might have mangled the ECN flags, or removed TCP Options. Thus, | |||
| usage of the 3rd reserved TCP header flag simplified the protocol. | usage of the 3rd reserved TCP header flag simplified the protocol. | |||
| The third flag was used in a way that could be distinguished from the | The third flag was used in a way that could be distinguished from the | |||
| ECN-nonce, in case any nonce deployment was encountered. Previous | ECN-nonce, in case any nonce deployment was encountered. Previous | |||
| usage of this flag for the ECN-nonce was integrated into the original | usage of this flag for the ECN-nonce was integrated into the original | |||
| ECN negotiation. This further justified the third flag's use for | ECN negotiation. This further justified the third flag's use for | |||
| AccECN, because a non-ECN usage of this flag would have had to use it | AccECN, because a non-ECN usage of this flag would have had to use it | |||
| as a separate single bit, rather than in combination with the other 2 | as a separate single bit, rather than in combination with the other 2 | |||
| ECN flags. | ECN flags. | |||
| skipping to change at line 3232 ¶ | skipping to change at line 3238 ¶ | |||
| indicate on the SYN/ACK, four already indicated earlier (or broken) | indicate on the SYN/ACK, four already indicated earlier (or broken) | |||
| versions of ECN support, one now being Historic. In the early design | versions of ECN support, one now being Historic. In the early design | |||
| of AccECN, an AccECN Server could use only 2 of the 4 remaining | of AccECN, an AccECN Server could use only 2 of the 4 remaining | |||
| codepoints. They both indicated AccECN support, but one fed back | codepoints. They both indicated AccECN support, but one fed back | |||
| that the SYN had arrived marked as CE. Even though ECN support on a | that the SYN had arrived marked as CE. Even though ECN support on a | |||
| SYN is not yet on the Standards Track, the idea is for either end to | SYN is not yet on the Standards Track, the idea is for either end to | |||
| act as a mechanistic reflector, so that future capabilities can be | act as a mechanistic reflector, so that future capabilities can be | |||
| unilaterally deployed without requiring 2-ended deployment (justified | unilaterally deployed without requiring 2-ended deployment (justified | |||
| in Section 2.5). | in Section 2.5). | |||
| During traversal testing, it was discovered that the IP-ECN field in | During traversal testing, it was discovered that the IP ECN field in | |||
| the SYN was mangled on a non-negligible proportion of paths. | the SYN was mangled on a non-negligible proportion of paths. | |||
| Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | Therefore, it was necessary to allow the SYN/ACK to feed all four IP | |||
| ECN codepoints that the SYN could arrive with back to the Client. | ECN codepoints that the SYN could arrive with back to the Client. | |||
| Without this, the Client could not know whether to disable ECN for | Without this, the Client could not know whether to disable ECN for | |||
| the connection due to mangling of the IP-ECN field (also explained in | the connection due to mangling of the IP ECN field (also explained in | |||
| Section 2.5). This development consumed the remaining two codepoints | Section 2.5). This development consumed the remaining two codepoints | |||
| on the SYN/ACK that had been reserved for future use by AccECN in | on the SYN/ACK that had been reserved for future use by AccECN in | |||
| earlier versions. | earlier draft versions of this document. | |||
| B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
| Despite availability of usable TCP header space being extremely | Despite availability of usable TCP header space being extremely | |||
| scarce, the AccECN protocol has taken all possible steps to ensure | scarce, the AccECN protocol has taken all possible steps to ensure | |||
| that there is space to negotiate possible future variants of the | that there is space to negotiate possible future variants of the | |||
| protocol, either if a variant of AccECN is required, or if a | protocol, either if a variant of AccECN is required, or if a | |||
| completely different ECN feedback approach is needed. | completely different ECN feedback approach is needed. | |||
| Future AccECN variants: When the AccECN capability is negotiated | Future AccECN variants: When the AccECN capability is negotiated | |||
| skipping to change at line 3300 ¶ | skipping to change at line 3306 ¶ | |||
| equivalent to AccECN negotiation with (1,1,1) on the SYN. These | equivalent to AccECN negotiation with (1,1,1) on the SYN. These | |||
| codepoints would not allow fall-back to Classic ECN support for a | codepoints would not allow fall-back to Classic ECN support for a | |||
| Server that did not understand them, but this approach ensures | Server that did not understand them, but this approach ensures | |||
| they are available in the future, perhaps for uses other than ECN | they are available in the future, perhaps for uses other than ECN | |||
| alongside the AccECN scheme. All possible combinations of SYN/ACK | alongside the AccECN scheme. All possible combinations of SYN/ACK | |||
| could be used in response except either (0,0,0) or reflection of | could be used in response except either (0,0,0) or reflection of | |||
| the same values sent on the SYN. | the same values sent on the SYN. | |||
| In order to extend AccECN or ECN in the future, other ways could | In order to extend AccECN or ECN in the future, other ways could | |||
| be resorted to, although their traversal properties are likely to | be resorted to, although their traversal properties are likely to | |||
| be inferior. They include a new TCP option; using the remaining | be inferior. They include a new TCP Option; using the remaining | |||
| reserved flags in the main TCP header (preferably extending the | reserved flags in the main TCP header (preferably extending the | |||
| 3-bit combinations used by AccECN to 4-bit combinations, rather | 3-bit combinations used by AccECN to 4-bit combinations, rather | |||
| than burning one bit for just one state); a non-zero urgent | than burning one bit for just one state); a non-zero urgent | |||
| pointer in combination with the URG flag cleared; or some other | pointer in combination with the URG flag cleared; or some other | |||
| unexpected combination of fields yet to be invented. | unexpected combination of fields yet to be invented. | |||
| Acknowledgements | Acknowledgements | |||
| We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | |||
| Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | |||
| End of changes. 150 change blocks. | ||||
| 242 lines changed or deleted | 248 lines changed or added | |||
| This html diff was produced by rfcdiff 1.48. | ||||
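
The codepoint accounting in Appendices B.2 and B.3 above can be sketched in code. The following is a minimal illustration, not part of the RFC or of this diff. It assumes the flag order (AE, CWR, ECE) and the codepoint values as recalled from Table 2 of RFC 9768, and it should be checked against that table before being relied on. The names `ACCECN_SYN`, `ACCECN_SYNACK`, and `classify_synack` are hypothetical. The sketch shows how an AccECN Client that sent (1,1,1) on the SYN might sort the eight possible SYN/ACK values into the four earlier (or broken) cases and the four AccECN cases that reflect the IP ECN field of the SYN.

```python
# Hypothetical sketch; codepoint values are recalled from RFC 9768, Table 2,
# and are an assumption here, not something stated in the text above.
ACCECN_SYN = (1, 1, 1)  # (AE, CWR, ECE) sent by an AccECN Client on the SYN

# The four AccECN SYN/ACK codepoints, each feeding back the IP ECN
# codepoint that the SYN arrived with at the Server.
ACCECN_SYNACK = {
    (0, 1, 0): "Not-ECT",
    (0, 1, 1): "ECT(1)",
    (1, 0, 0): "ECT(0)",
    (1, 1, 0): "CE",
}


def classify_synack(flags):
    """Classify (AE, CWR, ECE) on a SYN/ACK answering an AccECN SYN.

    Returns (mode, ip_ecn_seen_on_syn); the second item is None unless
    the Server negotiated AccECN.
    """
    if len(flags) != 3 or any(bit not in (0, 1) for bit in flags):
        raise ValueError("expected three flag bits (AE, CWR, ECE)")
    flags = tuple(flags)
    if flags in ACCECN_SYNACK:
        return ("AccECN", ACCECN_SYNACK[flags])
    if flags == (0, 0, 1):
        return ("Classic ECN", None)  # RFC 3168 Server
    if flags == (0, 0, 0):
        return ("Not ECN", None)  # Server without ECN support
    if flags == ACCECN_SYN:
        return ("Not ECN (flags reflected)", None)  # broken Server
    # Only (1, 0, 1) remains: the former ECN-nonce response.
    return ("Reserved (former ECN-nonce)", None)
```

For example, `classify_synack((1, 1, 0))` reports AccECN mode together with the fact that the SYN arrived CE-marked, which is the feedback the Client needs in order to detect mangling of the IP ECN field on the forward path.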