<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
 <!ENTITY nbsp "&#160;">
 <!ENTITY zwsp "&#8203;">
 <!ENTITY nbhy "&#8209;">
 <!ENTITY wj "&#8288;">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="std" docName="draft-ietf-bess-evpn-optimized-ir-12" number="9574" ipr="trust200902" submissionType="IETF"
obsoletes="" updates="" xml:lang="en" tocInclude="true" consensus="true" tocDepth="3" symRefs="true" sortRefs="false" version="3">
<front>
<title abbrev="EVPN Optimized IR">Optimized Ingress Replication Solution for Ethernet VPNs (EVPNs)</title>
<!-- [rfced] We updated the document title as follows. Please let us know any objections. Original: Optimized Ingress Replication Solution for Ethernet VPN (EVPN) Currently: Optimized Ingress Replication Solution for Ethernet VPNs (EVPNs) -->
<seriesInfo name="RFC" value="9574"/>
<author fullname="Jorge Rabadan" initials="J." role="editor" surname="Rabadan"> <organization>Nokia</organization> <address> <postal> <street>777 Middlefield Road</street> <city>Mountain View</city> <region>CA</region> <code>94043</code> <country>United States of America</country> </postal> <email>jorge.rabadan@nokia.com</email> </address> </author>
<author fullname="Senthil Sathappan" initials="S." surname="Sathappan"> <organization>Nokia</organization> <address> <email>senthil.sathappan@nokia.com</email> </address> </author>
<author fullname="Wen Lin" initials="W." surname="Lin"> <organization>Juniper Networks</organization> <address> <email>wlin@juniper.net</email> </address> </author>
<author fullname="Mukul Katiyar" initials="M." surname="Katiyar"> <organization>Versa Networks</organization> <address> <email>mukul@versa-networks.com</email> </address> </author>
<author fullname="Ali Sajassi" initials="A." surname="Sajassi"> <organization>Cisco Systems</organization> <address> <email>sajassi@cisco.com</email> </address> </author>
<date month="April" year="2024"/>
<area>rtg</area>
<workgroup>BESS</workgroup>
<!-- [rfced] Please insert any keywords (beyond those that appear in the title) for use on <https://www.rfc-editor.org/search>.
-->
<abstract>
<t>Network Virtualization Overlay (NVO) networks using Ethernet VPNs (EVPNs) as their control plane may use trees based on ingress replication or Protocol Independent Multicast (PIM) to convey the overlay Broadcast, Unknown Unicast, or Multicast (BUM) traffic. PIM provides an efficient solution that prevents sending multiple copies of the same packet over the same physical link; however, it may not always be deployed in the NVO core network. Ingress replication avoids the dependency on PIM in the NVO network core. While ingress replication provides a simple multicast transport, some NVO networks with demanding multicast applications require a more efficient solution without PIM in the core. This document describes a solution to optimize the efficiency of ingress replication trees.</t>
</abstract>
</front>
<middle>
<section anchor="sect-1" numbered="true" toc="default">
<name>Introduction</name>
<t>Ethernet Virtual Private Networks (EVPNs) may be used as the control plane for a Network Virtualization Overlay (NVO) network <xref target="RFC8365" format="default"/>. Network Virtualization Edge (NVE) and Provider Edge (PE) devices that are part of the same EVPN Broadcast Domain (BD) use Ingress Replication (IR) or PIM-based trees to transport the tenant's Broadcast, Unknown Unicast, or Multicast (BUM) traffic.</t>
<t>In the ingress replication approach, the ingress NVE receiving a BUM frame from the Tenant System (TS) will create as many copies of the frame as remote NVEs/PEs are attached to the BD. Each of those copies will be encapsulated into an IP packet where the outer IP Destination Address (IP DA) identifies the loopback of the egress NVE/PE.
The IP fabric core nodes (also known as spines) will simply route the IP-encapsulated BUM frames based on the outer IP DA. If PIM-based trees are used instead of ingress replication, the NVEs/PEs attached to the same BD will join a PIM-based tree. The ingress NVE receiving a BUM frame will send a single copy of the frame, encapsulated into an IP packet where the outer IP DA is the multicast address that represents the PIM-based tree. The IP fabric core nodes are part of the PIM tree and keep multicast state for the multicast group, so that IP-encapsulated BUM frames can be routed to all the NVEs/PEs that joined the tree.
<!-- [rfced] Section 1: Does "as many copies of the frame as remote NVEs/PEs are attached" mean "as many copies of the frame as the number of remote NVEs/PEs that are attached"? If the suggested text is not correct, please clarify. Original: In the Ingress Replication approach, the ingress NVE receving a BUM frame from the Tenant System will create as many copies of the frame as remote NVEs/PEs are attached to the BD. Suggested ("receving" has been fixed): In the ingress replication approach, the ingress NVE receiving a BUM frame from the Tenant System (TS) will create as many copies of the frame as the number of remote NVEs/PEs that are attached to the BD. -->
</t>
<t>The two approaches are illustrated in <xref target="IR-PIM" format="default"/>. On the left-hand side of the diagram, NVE1 uses ingress replication to send a BUM frame (originated from Tenant System TS1) to the remote nodes attached to the BD, i.e., NVE2, NVE3, and PE1. On the right-hand side, the same example is depicted but using a PIM-based tree, i.e., (S1,G1), instead of ingress replication.
While a single copy of the tunneled BUM frame is generated in the latter approach, all the routers in the fabric need to keep multicast state, e.g., the spine keeps a PIM routing entry for (S1,G1) with an Incoming Interface (IIF) and three Outgoing Interfaces (OIFs).
<!-- [rfced] Section 1: We changed "NV3" to "NVE3" per Figure 1. If this is incorrect, please define "NV". Original: On the left-hand side, NVE1 uses Ingress Replication to send a BUM frame (originated from Tenant System TS1) to the remote nodes attached to the BD, i.e., NVE2, NV3, PE1. Currently: On the left-hand side of the diagram, NVE1 uses ingress replication to send a BUM frame (originated from Tenant System TS1) to the remote nodes attached to the BD, i.e., NVE2, NVE3, and PE1. -->
</t>
<figure anchor="IR-PIM">
<name>Ingress Replication vs. PIM-Based Trees in NVO Networks</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
           To-WAN                          To-WAN
              ^                               ^
              |                               |
           +-----+                         +-----+
+----------| PE1 |-----------+  +----------| PE1 |-----------+
|          +--^--+           |  |          +--^--+           |
|             |  IP Fabric   |  |             |  IP Fabric   |
|             PE             |  |  (S1,G1)    |OIF to-G      |
| +----PE->+-----+ No State  |  |      IIF +-----+ OIF to-G  |
| | +---2->|Spine|------+    |  |   +------>Spine|------+    |
| | | +-3->+-----+      |    |  |   |      +-----+      |    |
| | | |       2         3    |  |   |PIM      |OIF to-G |    |
| | | |IR     |         |    |  |   |tree     |         |    |
|+-----+   +--v--+   +--v--+ |  |+-----+   +--v--+   +--v--+ |
+| NVE1|---| NVE2|---| NVE3|-+  +| NVE1|---| NVE2|---| NVE3|-+
 +--^--+   +-----+   +-----+     +--^--+   +-----+   +-----+
    |         |         |           |         |         |
    |         v         v           |         v         v
   TS1       TS2       TS3         TS1       TS2       TS3
]]></artwork>
</figure>
<!-- [rfced] Figure 1: Should the three instances of "to-G" be "to G1", in which case perhaps we should also change "To-WAN" to "To WAN")? (in other words, do these hyphenated entries mean "(from the) OIF to G" and "To (the) WAN"? Original: To-WAN ... OIF to-G ...
-->
<t>In NVO networks where PIM-based trees cannot be used, ingress replication is the only option. Examples of these situations are NVO networks where the core nodes do not support PIM or the network operator does not want to run PIM in the core.</t>
<t>In some use cases, the amount of replication for BUM traffic is kept under control on the NVEs due to the following fairly common assumptions:</t>
<ol spacing="normal" type="a"><li>Broadcast traffic is greatly reduced due to the proxy Address Resolution Protocol (ARP) and proxy Neighbor Discovery (ND) capabilities supported by EVPNs <xref target="RFC9161" format="default"/> on the NVEs. Some NVEs can even provide Dynamic Host Configuration Protocol (DHCP) server functions for the attached TSs, reducing the broadcast traffic even further.</li>
<li>Unknown unicast traffic is greatly reduced in NVO networks where all the Media Access Control (MAC) and IP addresses from the TSs are learned in the control plane.</li>
<li>Multicast applications are not used.</li>
</ol>
<t>If the above assumptions are true for a given NVO network, then ingress replication provides a simple solution for multi-destination traffic. However, statement c
above is not always true, and multicast applications are required in many use cases.</t>
<t>When the multicast sources are attached to NVEs residing in hypervisors or low-performance-replication Top-of-Rack (ToR) switches, the ingress replication of a large amount of multicast traffic to a significant number of remote NVEs/PEs can seriously degrade the performance of the NVE and impact the application.</t>
<t>This document describes a solution that makes use of two ingress replication optimizations:</t>
<ol spacing="normal" type="1"><li>Assisted-Replication (AR)</li>
<li>Pruned-Flood-Lists (PFLs)</li>
</ol>
<!-- [rfced] Sections 1 and subsequent: We do not see the hyphenated form "Pruned-Flood-Lists" in any published RFCs to date. However, we see "Pruned Flood Lists (PFLs)" in RFC 9469. May we remove the hyphens? Also, RFC 9469 appears to be the only published RFC to date that uses "Flood List" (via case-insensitive search). Perhaps "Pruned Flooding Lists"? -->
<t>Assisted-Replication consists of a set of procedures that allows the ingress NVE/PE to send a single copy of a broadcast or multicast frame received from a TS to the BD without the need for PIM in the underlay. Assisted-Replication defines the roles of AR-REPLICATOR and AR-LEAF routers. The AR-LEAF is the ingress NVE/PE attached to the TS. The AR-LEAF sends a single copy of a broadcast or multicast packet to a selected AR-REPLICATOR that replicates the packet multiple times to remote AR-LEAF or AR-REPLICATOR routers and is therefore "assisting" the ingress AR-LEAF in delivering the broadcast or multicast traffic to the remote NVEs/PEs attached to the same BD.
Assisted-Replication can use a single AR-REPLICATOR or two AR-REPLICATOR routers in the path between the ingress AR-LEAF and the remote destination NVEs/PEs. The procedures that use a single AR-REPLICATOR (the Non-Selective Assisted-Replication solution) are specified in <xref target="sect-5" format="default"/>, whereas <xref target="sect-6" format="default"/> describes how multi-staged replication, i.e., two AR-REPLICATOR routers in the path between the ingress AR-LEAF and destination NVEs/PEs, is accomplished (the Selective Assisted-Replication solution). The procedures for Assisted-Replication do not impact unknown unicast traffic, which follows the same forwarding procedures as known unicast traffic so that packet reordering does not occur.</t>
<t>PFLs provide a method for the ingress NVE/PE to prune or remove certain destination NVEs/PEs from a flood-list, depending on the interest of those NVEs/PEs in receiving BUM traffic. As specified in <xref target="RFC8365" format="default"/>, an NVE/PE builds a flood-list for BUM traffic based on the Next-Hops of the received EVPN Inclusive Multicast Ethernet Tag routes for the BD. While <xref target="RFC8365" format="default"/> states that the flood-list is used for all BUM traffic, this document allows pruning certain Next-Hops from the list. As an example, suppose an ingress NVE creates a flood-list with Next-Hops PE1, PE2, and PE3. If PE2 and PE3 did not signal any interest in receiving Unknown Unicast traffic in their Inclusive Multicast Ethernet Tag routes, when the ingress NVE receives an Unknown Unicast frame from a TS, it will replicate it only to PE1.
That is, PE2 and PE3 are "pruned" from the NVE's flood-list for Unknown Unicast traffic. PFLs can be used with ingress replication or Assisted-Replication and are described in <xref target="sect-7" format="default"/>.
<!-- [rfced] Sections 1 and subsequent: We see that RFC 6514 uses "Next Hop field" (no hyphen) and that the only published RFCs to date that use "Next-Hops" are RFCs 2749 and 5286. May we change instances of "Next-Hop" to "Next Hop" where it is used as a noun (e.g., "will set the Next-Hop to an IP address") and change instances of "Next-Hops" to "next hops" (per draft-ietf-bess-evpn-bum-procedure-updates and most published RFCs)? -->
</t>
<t>Both optimizations (Assisted-Replication and PFLs) may be used together or independently so that the performance and efficiency of the network in transporting multicast traffic can be improved. Both solutions require some extensions to the BGP attributes used in <xref target="RFC7432" format="default"/>; see <xref target="sect-4" format="default"/> for details.</t>
<t>The Assisted-Replication solution described in this document is focused on NVO networks (hence its use of IP tunnels). MPLS transport networks are out of scope for this document. The PFLs solution <bcp14>MAY</bcp14> be used in NVO and MPLS transport networks.</t>
<t><xref target="sect-3" format="default"/> lists the requirements of the combined optimized ingress replication solution, whereas Sections <xref target="sect-5" format="counter"/> and <xref target="sect-6" format="counter"/> describe the Assisted-Replication solution for Non-Selective and Selective procedures, respectively.
<xref target="sect-7" format="default"/> provides the PFLs solution.</t>
</section>
<section anchor="sect-2" numbered="true" toc="default">
<name>Terminology and Conventions</name>
<t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described in BCP&#160;14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/> when, and only when, they appear in all capitals, as shown here.</t>
<t>The following terminology is used throughout this document:</t>
<!-- [rfced] Section 2: We see that the list of terms is mostly alphabetized. Would you like the list to be alphabetized? -->
<dl>
<dt>Assisted Replication forwarding mode:</dt><dd>In the case of an AR-LEAF, this means sending an Attachment Circuit (AC) Broadcast and Multicast (BM) packet to a single AR-REPLICATOR with a tunnel destination IP AR-IP. In the case of an AR-REPLICATOR, this means sending a BM packet to a selected number of, or all of, the overlay tunnels when the packet was previously received from an overlay tunnel.
<!-- [rfced] Sections 2 and 6.1: These two instances of "destination IP AR-IP" read oddly. Should they be "destination AR-IP"? Are some words missing? Original: - Asisted Replication forwarding mode: for an AR-LEAF, it means sending an Attachment Circuit BM packet to a single AR-REPLICATOR with tunnel destination IP AR-IP. ...
The tunnel destination IP AR-IP will be an indication for the remote Selective AR-REPLICATOR that the packet needs further replication to its AR-LEAFs. -->
</dd>
<dt>AR-LEAF:</dt><dd>Assisted Replication - LEAF. Refers to an NVE/PE that sends all the BM traffic to an AR-REPLICATOR that can replicate the traffic further on its behalf. An AR-LEAF is typically an NVE/PE with poor replication performance capabilities.</dd>
<dt>AR-REPLICATOR:</dt><dd>Assisted Replication - REPLICATOR. Refers to an NVE/PE that can replicate broadcast or multicast traffic received on overlay tunnels to other overlay tunnels and local ACs. This document defines the control and data plane procedures that an AR-REPLICATOR needs to follow.</dd>
<dt>AR-IP:</dt><dd>Assisted Replication - IP. Refers to an IP address owned by the AR-REPLICATOR and used to differentiate the incoming traffic that must follow the AR procedures. The AR-IP is also used in the Tunnel Identifier and Next-Hop fields of the Replicator-AR route.</dd>
<dt>AR-VNI:</dt><dd>Assisted Replication - VNI. Refers to a Virtual eXtensible Local Area Network (VXLAN) Network Identifier (VNI) advertised by the AR-REPLICATOR along with the Replicator-AR route.
It is used to identify the incoming packets that must follow the AR procedures ONLY in the Single-IP AR-REPLICATOR case (see <xref target="sect-8" format="default"/>).</dd>
<dt>BM traffic:</dt><dd>Refers to broadcast and multicast frames (excluding unknown unicast frames).</dd>
<dt>BD:</dt><dd>Broadcast Domain, as defined in <xref target="RFC7432" format="default"/>.</dd>
<dt>BD label:</dt><dd>Defined as the MPLS label that identifies the BD and is advertised in Regular-IR or Replicator-AR routes, when the encapsulation is MPLS over GRE (MPLSoGRE) or MPLS over UDP (MPLSoUDP).</dd>
<dt>DF and NDF:</dt><dd>Designated Forwarder and Non-Designated Forwarder. These are roles defined in NVEs/PEs attached to multihomed TSs, as per <xref target="RFC7432" format="default"/> and <xref target="RFC8365" format="default"/>.</dd>
<dt>ES and ESI:</dt><dd>Ethernet Segment and Ethernet Segment Identifier. EVPN multihoming concepts as specified in <xref target="RFC7432" format="default"/>.</dd>
<dt>EVI:</dt><dd>EVPN Instance. A group of Provider Edge (PE) devices participating in the same EVPN service, as specified in <xref target="RFC7432" format="default"/>.</dd>
<dt>GRE:</dt><dd>Generic Routing Encapsulation <xref target="RFC4023" format="default"/>.</dd>
<dt>Ingress Replication forwarding mode:</dt><dd>Refers to the ingress replication behavior explained in <xref target="RFC7432" format="default"/>.
In this mode, an AC BM packet copy is sent to each remote PE/NVE in the BD, and an overlay BM packet is sent only to the ACs and not to other overlay tunnels.</dd>
<dt>IR-IP:</dt><dd>Ingress Replication - IP. Refers to the local IP address of an NVE/PE that is used for the ingress replication signaling and procedures provided in <xref target="RFC7432" format="default"/>. Encapsulated incoming traffic with an outer destination IP address matching the IR-IP will follow the procedures for ingress replication and not the procedures for Assisted-Replication. The IR-IP is also used in the Tunnel Identifier and Next-Hop fields of the Regular-IR route.</dd>
<dt>IR-VNI:</dt><dd>Ingress Replication - VNI. Refers to a VNI advertised along with the Inclusive Multicast Ethernet Tag route for the ingress replication tunnel type.</dd>
<dt>MPLS:</dt><dd>Multiprotocol Label Switching.</dd>
<dt>NVE:</dt><dd>Network Virtualization Edge <xref target="RFC8365" format="default"/>.</dd>
<dt>NVGRE:</dt><dd>Network virtualization using Generic Routing Encapsulation <xref target="RFC7637" format="default"/>.</dd>
<dt>PE:</dt><dd>Provider Edge.</dd>
<dt>PMSI:</dt><dd>P-Multicast Service Interface.
A conceptual interface for a PE to send customer multicast traffic to all or some PEs in the same VPN <xref target="RFC6513" format="default"/>.</dd>
<dt>RD:</dt><dd>Route Distinguisher.</dd>
<dt>Regular-IR route:</dt><dd>An EVPN Inclusive Multicast Ethernet Tag route <xref target="RFC7432" format="default"/> that uses the ingress replication tunnel type.</dd>
<dt>RNVE:</dt><dd>Regular NVE. Refers to an NVE that supports the procedures provided in <xref target="RFC8365" format="default"/> and does not support the procedures provided in this document. However, this document defines procedures to interoperate with RNVEs.</dd>
<dt>Replicator-AR route:</dt><dd>An EVPN Inclusive Multicast Ethernet Tag route that is advertised by an AR-REPLICATOR to signal its capabilities, as described in <xref target="sect-4" format="default"/>.</dd>
<dt>ToR switch:</dt><dd>Top-of-Rack switch.</dd>
<dt>TS and VM:</dt><dd>Tenant System and Virtual Machine. In this document, TSs and VMs are the devices connected to the ACs of the PEs and NVEs.</dd>
<dt>VNI:</dt><dd>VXLAN Network Identifier. Used in VXLAN tunnels.</dd>
<dt>VSID:</dt><dd>Virtual Subnet Identifier.
Used in NVGRE tunnels.</dd>
<dt>VXLAN:</dt><dd>Virtual eXtensible Local Area Network <xref target="RFC7348" format="default"/>.</dd>
</dl>
</section>
<section anchor="sect-3" numbered="true" toc="default">
<name>Solution Requirements</name>
<t>The ingress replication optimization solution specified in this document meets the following requirements:</t>
<ol spacing="normal" type="a"><li>The solution provides an ingress replication optimization for BM traffic without the need for PIM while preserving the packet order for unicast applications, i.e., unknown unicast traffic should follow the same path as known unicast traffic. This optimization is required in low-performance NVEs.</li>
<li>The solution reduces the flooded traffic in NVO networks where some NVEs do not need broadcast/multicast and/or unknown unicast traffic.</li>
<li> <t>The solution is compatible with <xref target="RFC7432" format="default"/> and <xref target="RFC8365" format="default"/> and has no impact on the Customer Edge (CE) procedures for BM traffic. In particular, the solution supports the following EVPN functions:
<!-- [rfced] Section 3: For ease of the reader, we expanded "CE" as "Customer Edge". If this is incorrect, please provide the correct definition. Original: c. The solution is compatible with [RFC7432] and [RFC8365] and has no impact on the CE procedures for BM traffic. Currently: c. The solution is compatible with [RFC7432] and [RFC8365] and has no impact on the Customer Edge (CE) procedures for BM traffic.
-->
</t>
<ul spacing="normal">
<li>All-active multihoming, including the split-horizon and DF functions.</li>
<li>Single-active multihoming, including the DF function.</li>
<li>Handling of multi-destination traffic and processing of BM traffic as per <xref target="RFC7432" format="default"/>.</li>
</ul>
</li>
<li>The solution is backward compatible with existing NVEs using a non-optimized version of ingress replication. A given BD can have NVEs/PEs supporting regular ingress replication and optimized ingress replication.</li>
<li>The solution is independent of the NVO-specific data plane encapsulation and the virtual identifiers being used, e.g., VXLAN VNIs, NVGRE VSIDs, or MPLS labels, as long as the tunnel is IP based.</li>
</ol>
</section>
<section anchor="sect-4" numbered="true" toc="default">
<name>EVPN BGP Attributes for Optimized Ingress Replication</name>
<t>This solution extends the Inclusive Multicast Ethernet Tag routes and attributes described in <xref target="RFC7432" format="default"/> so that an NVE/PE can signal its optimized ingress replication capabilities.
<!-- [rfced] Sections 4 and 7: As this document discusses several solutions, which solution is referred to in these sentences - the ingress replication optimization solution in general or one of the solutions discussed in Sections 5 and 6? Original: This solution extends the [RFC7432] Inclusive Multicast Ethernet Tag routes and attributes so that an NVE/PE can signal its optimized Ingress Replication capabilities. ...
In addition to AR, the second optimization supported by this solution is the ability for the all the BD nodes to signal Pruned-Flood-Lists (PFL). Possibly: The ingress replication optimization solution specified in this document extends the Inclusive Multicast Ethernet Tag routes and attributes described in [RFC7432] so that an NVE/PE can signal its optimized ingress replication capabilities. ... In addition to AR, the second optimization supported by the ingress replication optimization solution specified in this document is the ability of all the BD nodes to signal PFLs. -->
</t>
<t>The Network Layer Reachability Information (NLRI) of the Inclusive Multicast Ethernet Tag route <xref target="RFC7432" format="default"/> is shown in <xref target="imet-route" format="default"/> and is used in this document without any modifications to its format. The PMSI Tunnel Attribute's general format as provided in <xref target="RFC7432" format="default"/> (which takes it from <xref target="RFC6514" format="default"/>) is used in this document; only a new tunnel type and new flags are specified, as shown in <xref target="pta" format="default"/>.</t>
<figure anchor="imet-route">
<name>EVPN Inclusive Multicast Ethernet Tag Route's NLRI</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
+------------------------------------+
|  RD (8 octets)                     |
+------------------------------------+
|  Ethernet Tag ID (4 octets)        |
+------------------------------------+
|  IP Address Length (1 octet)       |
+------------------------------------+
|  Originating Router's IP Address   |
|          (4 or 16 octets)          |
+------------------------------------+
]]></artwork>
</figure>
<!-- [rfced] Figure 2: We changed "Inclusive Multicast Tag" to "Inclusive Multicast Ethernet Tag" per "Inclusive Multicast Ethernet Tag route [RFC7432] is shown in Figure 2" in the paragraph just prior to the figure and "the above Inclusive Multicast Ethernet Tag route (Figure 2)" a few paragraphs later. If this is incorrect, please define this new term, which we don't see anywhere else in this document or in any published RFC except for RFC 9489. Original: Figure 2: EVPN Inclusive Multicast Tag Route's NLRI Currently: Figure 2: EVPN Inclusive Multicast Ethernet Tag Route's NLRI -->
<figure anchor="pta">
<name>PMSI Tunnel Attribute</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
                                        0  1  2  3  4  5  6  7
+---------------------------------+    +--+--+--+--+--+--+--+--+
|  Flags (1 octet)                | -> |x |E |x |  T  |BM|U |L |
+---------------------------------+    +--+--+--+--+--+--+--+--+
|  Tunnel Type (1 octet)          |    T = Assisted-Replication Type
+---------------------------------+    BM = Broadcast and Multicast
|  MPLS Label (3 octets)          |    U = Unknown (unknown unicast)
+---------------------------------+    x = unassigned
|  Tunnel Identifier (variable)   |
+---------------------------------+
]]></artwork>
</figure>
<t>The Flags field in <xref target="pta" format="default"/> is 8 bits long as per <xref target="RFC7902" format="default"/>. The Extension (E) flag was allocated by <xref target="RFC7902" format="default"/>, and the Leaf Information Required (L) flag was allocated by <xref target="RFC6514" format="default"/>.
This document defines the use of 4 bits of this Flags field:</t>
<ul spacing="normal">
<li>Bits 3 and 4, which together form the Assisted-Replication Type (T) field</li>
<li>Bit 5, called the Broadcast and Multicast (BM) flag</li>
<li>Bit 6, called the Unknown (U) flag</li>
</ul>
<t>Bits 5 and 6 are collectively referred to as the Pruned-Flood-Lists (PFLs) flags.</t>
<t>The T field and PFLs flags are defined as follows:</t>
<ul spacing="normal">
<li>
<t>T is the Assisted-Replication Type field (2 bits), which defines the AR role of the advertising router:</t>
<ul spacing="normal">
<li>00 (decimal 0) = RNVE (non-AR support)</li>
<li>01 (decimal 1) = AR-REPLICATOR</li>
<li>10 (decimal 2) = AR-LEAF</li>
<li>11 (decimal 3) = RESERVED</li>
</ul>
</li>
<li>
<t>The PFLs flags define the desired behavior of the advertising router for the different types of traffic:</t>
<ul spacing="normal">
<li>Broadcast and Multicast (BM) flag. BM = 1 means "prune-me" from the BM flooding list. BM = 0 indicates regular behavior.</li>
<li>Unknown (U) flag. U = 1 means "prune-me" from the Unknown flooding list. U = 0 indicates regular behavior.</li>
<!-- [rfced] Sections 4 and 7: These sentences are difficult to follow. Does "prune-me" mean "prune me (from the indicated list)"? If yes, may we update as suggested? Original: o Broadcast and Multicast (BM) flag. BM=1 means "prune-me" from the BM flooding list. BM=0 means regular behavior. o Unknown (U) flag. U=1 means "prune-me" from the Unknown flooding list. U=0 means regular behavior. ...
- BM is the Broadcast and Multicast flag. BM=1 means "prune-me" from the BM flood-list. BM=0 means regular behavior. - U is the Unknown flag. U=1 means "prune-me" from the Unknown flood-list. U=0 means regular behavior. Suggested (assuming that "flood-list" and "flooding list" mean the same thing): - Broadcast and Multicast (BM) flag. BM = 1 means "prune me from the BM flooding list". BM = 0 indicates regular behavior. - Unknown (U) flag. U = 1 means "prune me from the Unknown flooding list". U = 0 indicates regular behavior. ... * BM is the Broadcast and Multicast flag. BM = 1 means "prune me from the BM flooding list". BM = 0 indicates regular behavior. * U is the Unknown flag. U = 1 means "prune me from the Unknown flooding list". U = 0 indicates regular behavior. -->
</ul>
</li>
<li>The L flag (bit 7) is defined in <xref target="RFC6514" format="default"/> and will be used only in the Selective AR solution.</li>
</ul>
<t>Please refer to <xref target="sect-11" format="default"/> for the IANA considerations related to the PMSI Tunnel Attribute flags.</t>
<t>In this document, the above Inclusive Multicast Ethernet Tag route (<xref target="imet-route" format="default"/>) and PMSI Tunnel Attribute (<xref target="pta" format="default"/>) can be used in two different modes for the same BD:</t>
<dl>
<dt>Regular-IR route:</dt><dd>In this route, Originating Router's IP Address, Tunnel Type (0x06), MPLS Label, and Tunnel Identifier <bcp14>MUST</bcp14> be used as described in <xref target="RFC7432" format="default"/> when ingress replication is in use. The NVE/PE that advertises the route will set the Next-Hop to an IP address that we denominate IR-IP in this document.
When advertised by an AR-LEAF node, the Regular-IR route <bcp14>MUST</bcp14> be advertised with the T field set to 10 (AR-LEAF).</dd>
<dt>Replicator-AR route:</dt><dd><t>This route is used by the AR-REPLICATOR to advertise its AR capabilities, with the fields set as follows:</t>
<ul spacing="normal">
<li>
<t>Originating Router's IP Address <bcp14>MUST</bcp14> be set to an IP address of the advertising router that is common to all the EVIs on the PE (usually this is a loopback address of the PE).</t>
<ul spacing="normal">
<li>The Tunnel Identifier and Next-Hop fields <bcp14>SHOULD</bcp14> be set to the same IP address as the Originating Router's IP Address field when the NVE/PE originates the route -- that is, when the NVE/PE is not an ASBR; see <xref target="RFC8365" section="10.2" sectionFormat="of"/>. Irrespective of the values in the Tunnel Identifier and Originating Router's IP Address fields, the ingress NVE/PE will process the received Replicator-AR route and will use the IP address setting in the Next-Hop field to create IP tunnels to the AR-REPLICATOR.
<!-- [rfced] Section 4: The text in this paragraph was difficult to follow from the standpoint of whether some terms were field names or were used generally. We updated as follows. If this is incorrect, please provide clarifying text. Also, please note that per our previous question about "Next-Hop", we suggest removing the hyphen in the field name per more common usage in published RFCs. Original: + The Tunnel Identifier and Next-Hop SHOULD be set to the same IP address as the Originating Router's IP address when the NVE/PE originates the route, that is, when the NVE/PE is not an ASBR as in section 10.2 of [RFC8365].
Irrespective of the values in the Tunnel Identifier and Originating Router's IP Address fields, the ingress NVE/PE will process the received Replicator-AR route and will use the IP Address in the Next-Hop field to create IP tunnels to the AR-REPLICATOR. Currently: - The Tunnel Identifier and Next-Hop fields SHOULD be set to the same IP address as the Originating Router's IP Address field when the NVE/PE originates the route - that is, when the NVE/PE is not an ASBR; see Section 10.2 of [RFC8365]. Irrespective of the values in the Tunnel Identifier and Originating Router's IP Address fields, the ingress NVE/PE will process the received Replicator-AR route and will use the IP address setting in the Next-Hop field to create IP tunnels to the AR-REPLICATOR. -->
</li>
<li>The Next-Hop address is referred to as the AR-IP and <bcp14>MUST</bcp14> be different from the IR-IP for a given PE/NVE, unless the procedures provided in <xref target="sect-8" format="default"/> are followed.</li>
</ul>
</li>
<li>Tunnel Type <bcp14>MUST</bcp14> be set to Assisted-Replication Tunnel. <xref target="sect-11" format="default"/> provides the allocated type value.</li>
<li>T (AR role type) <bcp14>MUST</bcp14> be set to 01 (AR-REPLICATOR).</li>
<li>L (Leaf Information Required) <bcp14>MUST</bcp14> be set to 0 for non-selective AR and <bcp14>MUST</bcp14> be set to 1 for selective AR.</li>
</ul>
</dd>
</dl>
<t>An NVE/PE configured as an AR-REPLICATOR for a BD <bcp14>MUST</bcp14> advertise a Replicator-AR route for the BD and <bcp14>MAY</bcp14> advertise a Regular-IR route. The advertisement of the Replicator-AR route will indicate to the AR-LEAFs which outer IP DA, i.e., which AR-IP, they need to use for IP-encapsulated BM frames that use Assisted Replication forwarding mode.
The AR-REPLICATOR will forward an IP-encapsulated BM frame in Assisted Replication forwarding mode if the outer IP DA matches its AR-IP but will forward in Ingress Replication forwarding mode if the outer IP DA matches its IR-IP.</t>
<t>In addition, this document also uses the Leaf Auto-Discovery (Leaf A-D) route defined in <xref target="RFC9572" format="default"/> in cases where the selective AR mode is used. An AR-LEAF <bcp14>MAY</bcp14> send a Leaf A-D route in response to reception of a Replicator-AR route whose L flag is set. The Leaf A-D route is only used for selective AR, and the fields of such a route are set as follows:</t>
<ul spacing="normal">
<li>Originating Router's IP Address is set to the advertising router's IP address (the same IP address used by the AR-LEAF in Regular-IR routes). The Next-Hop address is set to the IR-IP, which <bcp14>SHOULD</bcp14> be the same IP address as the advertising router's IP address, when the NVE/PE originates the route, i.e., when the NVE/PE is not an ASBR; see <xref target="RFC8365" sectionFormat="of" section="10.2" format="default"/>.</li>
<li>Route Key <xref target="RFC9572" format="default"/> is the "Route Type Specific" NLRI of the Replicator-AR route for which this Leaf A-D route is generated.</li>
<li>The AR-LEAF constructs an IP-address-specific Route Target, analogously to <xref target="RFC9572" format="default"/>, by placing the IP address carried in the Next-Hop field of the received Replicator-AR route in the Global Administrator field of the extended community, with the Local Administrator field of this extended community set to 0, and setting the Extended
Communities attribute of the Leaf A-D route to that extended community. The same IP-address-specific import Route Target is auto-configured by the AR-REPLICATOR that sent the Replicator-AR route, in order to control the acceptance of the Leaf A-D routes.</li>
<li>The Leaf A-D route <bcp14>MUST</bcp14> include the PMSI Tunnel Attribute with Tunnel Type set to Assisted-Replication Tunnel (<xref target="sect-11" format="default"/>), T (AR role type) set to AR-LEAF, and Tunnel Identifier set to the IP address of the advertising AR-LEAF. The PMSI Tunnel Attribute <bcp14>MUST</bcp14> carry a downstream-assigned MPLS label or VNI that is used by the AR-REPLICATOR to send traffic to the AR-LEAF.
<!-- [rfced] Section 4: Per Section 11 and per "Tunnel Type MUST be set to Assisted-Replication Tunnel" used earlier in this section, we changed "set to AR" to "set to Assisted-Replication Tunnel". Please let us know any concerns. Original: o The Leaf Auto-Discovery route MUST include the PMSI Tunnel attribute with the Tunnel Type set to AR (Section 11), T (AR role type) set to AR-LEAF and the Tunnel Identifier set to the IP address of the advertising AR-LEAF. Currently: * The Leaf A-D route MUST include the PMSI Tunnel Attribute with Tunnel Type set to Assisted-Replication Tunnel (Section 11), T (AR role type) set to AR-LEAF, and Tunnel Identifier set to the IP address of the advertising AR-LEAF. -->
</li>
</ul>
<t>Each AR-enabled node understands and processes the T (Assisted-Replication type) field in the PMSI Tunnel Attribute (Flags field) of the routes and <bcp14>MUST</bcp14> signal the corresponding type (AR-REPLICATOR or AR-LEAF type) according to its administrative choice.
An NVE/PE following this specification is not expected to set the Assisted-Replication Type field to decimal 3 (which is a RESERVED value). If a route with the AR type field set to decimal 3 is received by an AR-REPLICATOR or AR-LEAF, the router will process the route as a Regular-IR route advertised by an RNVE.</t>
<t>Each node attached to the BD may understand and process the BM/U flags (PFLs flags). Note that these BM/U flags may be used to optimize the delivery of multi-destination traffic; their use <bcp14>SHOULD</bcp14> be an administrative choice and independent of the AR role. When the PFL capability is enabled, the BM/U flags can be used with the Regular-IR, Replicator-AR, and Leaf A-D routes.</t>
<t>Non-optimized ingress replication NVEs/PEs will be unaware of the new PMSI Tunnel Attribute flag definition as well as the new tunnel type (AR), i.e., non-upgraded NVEs/PEs will ignore the information contained in the Flags field or an unknown tunnel type (type AR in this case) for any Inclusive Multicast Ethernet Tag route.</t>
</section>
<section anchor="sect-5" numbered="true" toc="default">
<name>Non-selective Assisted-Replication (AR) Solution Description</name>
<t><xref target="ure-optimized-ir-scenario" format="default"/> illustrates an example NVO network where the non-selective AR function is enabled. Three different roles are defined for a given BD: AR-REPLICATOR, AR-LEAF, and RNVE. The solution is called "non-selective" because the chosen AR-REPLICATOR for a given flow <bcp14>MUST</bcp14> replicate the BM traffic to all the NVEs/PEs in the BD except for the source NVE/PE. NVO tunnels, i.e., IP tunnels, exist among all the PEs and NVEs in the diagram.
The PEs and NVEs in the diagram have TSs or VMs connected to their ACs.</t>
<figure anchor="ure-optimized-ir-scenario">
<name>Non-selective AR Scenario</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
                       (          )
                     (_    WAN    _)
                  +---(_         _)----+
                  |     (_      _)     |
            PE1   |                PE2 |
           +------+----+          +----+------+
      TS1--+  (BD-1)   |          |  (BD-1)   +--TS2
           |REPLICATOR |          |REPLICATOR |
           +--------+--+          +--+--------+
                    |                |
                 +--+----------------+--+
                 |                      |
                 |                      |
            +----+ VXLAN/NVGRE/MPLSoGRE +----+
            |    |       IP Fabric      |    |
            |    |                      |    |
  NVE1      |    +-----------+----------+    |      NVE3
  Hypervisor|           ToR  | NVE2          |Hypervisor
  +---------+-+        +-----+-----+       +-+---------+
  |  (BD-1)   |        |  (BD-1)   |       |  (BD-1)   |
  |  LEAF     |        |   RNVE    |       |   LEAF    |
  +--+-----+--+        +--+-----+--+       +--+-----+--+
     |     |              |     |             |     |
    VM11  VM12           TS3   TS4           VM31  VM32
]]></artwork>
</figure>
<t>In AR BDs, such as BD-1 in <xref target="ure-optimized-ir-scenario"/>, BM traffic between two NVEs may follow a different path than unicast traffic. This solution recommends the replication of BM traffic through the AR-REPLICATOR node, whereas unknown/known unicast traffic will be delivered directly from the source node to the destination node without being replicated by any intermediate node.</t>
<t>Note that known unicast forwarding is not impacted by this solution, i.e., unknown unicast traffic <bcp14>SHALL</bcp14> follow the same path as known unicast traffic.</t>
<section anchor="sect-5.1" numbered="true" toc="default">
<name>Non-selective AR-REPLICATOR Procedures</name>
<t>An AR-REPLICATOR is defined as an NVE/PE capable of replicating incoming BM traffic received on an overlay tunnel to other overlay tunnels and local ACs.
The AR-REPLICATOR signals its role in the control plane and understands where the other roles (AR-LEAF nodes, RNVEs, and other AR-REPLICATORs) are located. A given AR-enabled BD service may have zero, one, or more AR-REPLICATORs. In our example in <xref target="ure-optimized-ir-scenario" format="default"/>, PE1 and PE2 are defined as AR-REPLICATORs. The following considerations apply to the AR-REPLICATOR role:</t>
<ol spacing="normal" type="a"><li>The AR-REPLICATOR role <bcp14>SHOULD</bcp14> be an administrative choice in any NVE/PE that is part of an AR-enabled BD. This administrative option to enable AR-REPLICATOR capabilities <bcp14>MAY</bcp14> be implemented as a system-level option as opposed to a per-BD option.</li>
<li>An AR-REPLICATOR <bcp14>MUST</bcp14> advertise a Replicator-AR route and <bcp14>MAY</bcp14> advertise a Regular-IR route. The AR-REPLICATOR <bcp14>MUST NOT</bcp14> generate a Regular-IR route if it does not have local ACs. If the Regular-IR route is advertised, the Assisted-Replication Type field of the Regular-IR route <bcp14>MUST</bcp14> be set to 0.</li>
<li>The Replicator-AR and Regular-IR routes are generated according to <xref target="sect-4" format="default"/>.
The AR-IP and IR-IP are different IP addresses owned by the AR-REPLICATOR.</li>
<li>
<t>When a node defined as an AR-REPLICATOR receives a BM packet on an overlay tunnel, it will do a tunnel destination IP address lookup and apply the following procedures:</t>
<ul spacing="normal">
<li>If the destination IP address is the AR-REPLICATOR IR-IP Address, the node will process the packet normally as discussed in <xref target="RFC7432" format="default"/>.</li>
<li>If the destination IP address is the AR-REPLICATOR AR-IP Address, the node <bcp14>MUST</bcp14> replicate the packet to local ACs and overlay tunnels (excluding the overlay tunnel to the source of the packet). When replicating to remote AR-REPLICATORs, the tunnel destination IP address will be an IR-IP. This will indicate to the remote AR-REPLICATOR that it <bcp14>MUST NOT</bcp14> replicate to overlay tunnels. The tunnel source IP address used by the AR-REPLICATOR <bcp14>MUST</bcp14> be its IR-IP when replicating to AR-REPLICATOR or AR-LEAF nodes.</li>
</ul>
</li>
</ol>
<t>An AR-REPLICATOR <bcp14>MUST</bcp14> follow a data path implementation compatible with the following rules:</t>
<ul spacing="normal">
<li>The AR-REPLICATORs will build a flooding list composed of ACs and overlay tunnels to remote nodes in the BD.
Some of those overlay tunnels <bcp14>MAY</bcp14> be flagged as non-BM receivers based on the BM flag received from the remote nodes in the BD.</li>
<li>When an AR-REPLICATOR receives a BM packet on an AC, it will forward the BM packet to its flooding list (including local ACs and remote NVEs/PEs), skipping the non-BM overlay tunnels.</li>
<li>
<t>When an AR-REPLICATOR receives a BM packet on an overlay tunnel, it will check the destination IP address of the underlay IP header and:</t>
<ul spacing="normal">
<li>If the destination IP address matches its IR-IP, the AR-REPLICATOR will skip all the overlay tunnels from the flooding list, i.e., it will only replicate to local ACs. This is the regular ingress replication behavior described in <xref target="RFC7432" format="default"/>.</li>
<li>If the destination IP address matches its AR-IP, the AR-REPLICATOR <bcp14>MUST</bcp14> forward the BM packet to its flooding list (ACs and overlay tunnels), excluding the non-BM overlay tunnels. The AR-REPLICATOR will ensure that the traffic is not sent back to the originating AR-LEAF.</li>
<li>If the encapsulation is MPLSoGRE or MPLSoUDP and the received BD label that the AR-REPLICATOR advertised in the Replicator-AR route is not at the bottom of the stack, the AR-REPLICATOR <bcp14>MUST</bcp14> copy all the labels below the BD label and propagate them when forwarding the packet to the egress overlay tunnels.</li>
</ul>
</li>
<li>
<t>The AR-REPLICATOR/LEAF nodes will build an Unknown unicast flood-list composed of ACs and overlay tunnels to the IR-IP Addresses of the remote nodes in the BD.
Some of those overlay tunnels <bcp14>MAY</bcp14> be flagged as non-U (Unknown unicast) receivers based on the U flag received from the remote nodes in the BD.</t>
<ul spacing="normal">
<li>When an AR-REPLICATOR/LEAF receives an unknown unicast packet on an AC, it will forward the unknown unicast packet to its flood-list, skipping the non-U overlay tunnels.</li>
<li>When an AR-REPLICATOR/LEAF receives an unknown unicast packet on an overlay tunnel, it will forward the unknown unicast packet to its local ACs and never to an overlay tunnel. This is the regular ingress replication behavior described in <xref target="RFC7432" format="default"/>.</li>
</ul>
</li>
</ul>
</section>
<section anchor="sect-5.2" numbered="true" toc="default">
<name>Non-selective AR-LEAF Procedures</name>
<t>An AR-LEAF is defined as an NVE/PE that, given its poor replication performance, sends all the BM traffic to an AR-REPLICATOR that can replicate the traffic further on its behalf. It <bcp14>MAY</bcp14> signal its AR-LEAF capability in the control plane and understands where the other roles are located (AR-REPLICATORs and RNVEs). A given service can have zero, one, or more AR-LEAF nodes. In <xref target="ure-optimized-ir-scenario" format="default"/>, NVE1 and NVE3 (both residing in hypervisors) act as AR-LEAF nodes. The following considerations apply to the AR-LEAF role:</t>
<ol spacing="normal" type="a"><li>The AR-LEAF role <bcp14>SHOULD</bcp14> be an administrative choice in any NVE/PE that is part of an AR-enabled BD.
This administrative option to enable AR-LEAF capabilities <bcp14>MAY</bcp14> be implemented as a system-level option as opposed to a per-BD option.</li>
<li>In this non-selective AR solution, the AR-LEAF <bcp14>MUST</bcp14> advertise a single Regular-IR Inclusive Multicast Route as described in <xref target="RFC7432" format="default"/>. The AR-LEAF <bcp14>SHOULD</bcp14> set the Assisted-Replication Type field to AR-LEAF. Note that although this field does not affect the remote nodes when creating an EVPN destination to the AR-LEAF, this field is useful from the standpoint of ease of operation and troubleshooting of the BD.
<!-- [rfced] Section 5.2: a) We see "Inclusive Multicast Ethernet Tag route" in RFC 7432 but not "inclusive multicast route" or "Inclusive Multicast Route". Will these distinctions be clear to readers, or should "Ethernet Tag" be added? Original: b. In this non-selective AR solution, the AR-LEAF MUST advertise a single Regular-IR inclusive multicast route as in [RFC7432]. ... c. In a BD where there are no AR-REPLICATORs due to the AR-REPLICATORs being down or reconfigured, the AR-LEAF MUST use regular Ingress Replication, based on the remote Regular-IR Inclusive Multicast Routes as described in [RFC7432]. ... The AR-REPLICATOR-activation-timer SHOULD be configurable in seconds, and its value account for the time it takes for the AR-LEAF Regular-IR inclusive multicast route to get to the AR-REPLICATOR and be programmed. b) We had trouble following the meaning of "make any difference for" here. If the suggested text is not correct, please provide clarifying text. Original: Note that although this field does not make any difference for the remote nodes when creating an EVPN destination to the AR-LEAF, this field is useful for an easy operation and troubleshooting of the BD.
Suggested: Note that although this field does not affect the remote nodes when creating an EVPN destination to the AR-LEAF, this field is useful from the standpoint of ease of operation and troubleshooting of the BD. -->
</li>
<li>
<t>In a BD where there are no AR-REPLICATORs due to the AR-REPLICATORs being down or reconfigured, the AR-LEAF <bcp14>MUST</bcp14> use regular ingress replication based on the remote Regular-IR Inclusive Multicast Routes as described in <xref target="RFC7432" format="default"/>. This may happen in the following cases:</t>
<ul spacing="normal">
<li>The AR-LEAF has a list of AR-REPLICATORs for the BD, but it detects that all the AR-REPLICATORs for the BD are down (via Next-Hop tracking in the IGP or some other detection mechanism).</li>
<li>The AR-LEAF receives updates from all the former AR-REPLICATORs containing a non-REPLICATOR AR type in the Inclusive Multicast Ethernet Tag routes.</li>
<li>The AR-LEAF never discovered an AR-REPLICATOR for the BD.</li>
</ul>
</li>
<li>
<t>In a service where there are one or more AR-REPLICATORs (based on the received Replicator-AR routes for the BD), the AR-LEAF can locally select which AR-REPLICATOR it sends the BM traffic to:</t>
<ul spacing="normal">
<li>A single AR-REPLICATOR <bcp14>MAY</bcp14> be selected for all the BM packets received on the AR-LEAF ACs for a given BD.
This selection is a local decision and does not have to match other AR-LEAFs' selections within the same BD.</li>
<li>An AR-LEAF <bcp14>MAY</bcp14> select more than one AR-REPLICATOR and do either per-flow or per-BD load balancing.</li>
<li>In the case of failure of the selected AR-REPLICATOR, another AR-REPLICATOR <bcp14>SHOULD</bcp14> be selected by the AR-LEAF.</li>
<li>When an AR-REPLICATOR is selected for a given flow or BD, the AR-LEAF <bcp14>MUST</bcp14> send all the BM packets targeted to that AR-REPLICATOR using the forwarding information given by the Replicator-AR route for the chosen AR-REPLICATOR, with Tunnel Type = 0x0A (AR tunnel). The underlay destination IP address <bcp14>MUST</bcp14> be the AR-IP advertised by the AR-REPLICATOR in the Replicator-AR route.</li>
<li>An AR-LEAF <bcp14>MAY</bcp14> change the selection of AR-REPLICATOR(s) dynamically due to an administrative or policy configuration change.</li>
<li>AR-LEAF nodes <bcp14>SHALL</bcp14> send service-level BM control plane packets, following the procedures for regular ingress replication. Examples are IGMP, Multicast Listener Discovery (MLD), or PIM packets, and, in general, any link-local scope multicast IPv4 or IPv6 packets. The AR-REPLICATORs <bcp14>MUST NOT</bcp14> replicate these control plane packets to other overlay tunnels, since they will use the regular IR-IP Address.</li>
</ul>
</li>
<li>The use of an AR-REPLICATOR-activation-timer (in seconds, with a default value of 3) on the AR-LEAF nodes is <bcp14>RECOMMENDED</bcp14>. Upon receiving a new Replicator-AR route where the AR-REPLICATOR is selected, the AR-LEAF will run a timer before programming the new AR-REPLICATOR.
In the case of a newly added AR-REPLICATOR or if an AR-REPLICATOR reboots, this timer will give the AR-REPLICATOR some time to program the AR-LEAF nodes before the AR-LEAF sends BM traffic. The AR-REPLICATOR-activation-timer <bcp14>SHOULD</bcp14> be configurable in seconds, and its value needs to account for the time it takes for the AR-LEAF Regular-IR Inclusive Multicast Route to get to the AR-REPLICATOR and be programmed. While the AR-REPLICATOR-activation-timer is running, the AR-LEAF node will use regular ingress replication.
<!-- [rfced] Section 5.2: This sentence does not parse. Should "and its value account for" say "and its value SHOULD account for", "and its value needs to account for", or something else? In other words, to what does "SHOULD" apply in this sentence? Original: The AR-REPLICATOR-activation-timer SHOULD be configurable in seconds, and its value account for the time it takes for the AR-LEAF Regular-IR inclusive multicast route to get to the AR-REPLICATOR and be programmed. -->
</li>
<li>If the AR-LEAF has selected an AR-REPLICATOR, whether or not to change to a new preferred AR-REPLICATOR for the existing BM traffic flows is a matter of local policy.
<!-- [rfced] Section 5.2: As it appears that "a matter of local policy" indicates whether or not to change to a new preferred AR-REPLICATOR (as opposed to always making the change), we updated this sentence accordingly. If this is incorrect, please provide clarifying text. Original: f. If the AR-LEAF has selected an AR-REPLICATOR, it is a matter of local policy to change to a new preferred AR-REPLICATOR for the existing BM traffic flows. Currently: f. If the AR-LEAF has selected an AR-REPLICATOR, whether or not to change to a new preferred AR-REPLICATOR for the existing BM traffic flows is a matter of local policy.
-->
</li>
</ol>
<t>An AR-LEAF <bcp14>MUST</bcp14> follow a data path implementation compatible with the following rules:</t>
<ul spacing="normal">
<li>
<t>The AR-LEAF nodes will build two flood-lists:</t>
<dl>
<dt>Flood-list #1:</dt>
<dd>Composed of ACs and an AR-REPLICATOR-set of overlay tunnels. The AR-REPLICATOR-set is defined as one or more overlay tunnels to the AR-IP Addresses of the remote AR-REPLICATOR(s) in the BD. The selection of more than one AR-REPLICATOR is described in item d. above and is a local AR-LEAF decision.</dd>
<dt>Flood-list #2:</dt>
<dd>Composed of ACs and overlay tunnels to the remote IR-IP Addresses.</dd>
</dl>
</li>
<li>
<t>When an AR-LEAF receives a BM packet on an AC, it will check the AR-REPLICATOR-set:</t>
<ul spacing="normal">
<li>If the AR-REPLICATOR-set is empty, the AR-LEAF <bcp14>MUST</bcp14> send the packet to flood-list #2.</li>
<li>If the AR-REPLICATOR-set is NOT empty, the AR-LEAF <bcp14>MUST</bcp14> send the packet to flood-list #1, where only one of the overlay tunnels of the AR-REPLICATOR-set is used.</li>
</ul>
</li>
<li>When an AR-LEAF receives a BM packet on an overlay tunnel, it will forward the BM packet to its local ACs and never to an overlay tunnel.
This is the regular ingress replication behavior described in <xref target="RFC7432" format="default"/>.</li>
<li>AR-LEAF nodes process Unknown unicast traffic in the same way AR-REPLICATORs do, as described in <xref target="sect-5.1" format="default"/>.</li>
</ul>
</section>
<section anchor="sect-5.3" numbered="true" toc="default">
<name>RNVE Procedures</name>
<t>An RNVE is defined as an NVE/PE without AR-REPLICATOR or AR-LEAF capabilities that does ingress replication as described in <xref target="RFC7432" format="default"/>. The RNVE does not signal any AR role and is unaware of the AR-REPLICATOR/LEAF roles in the BD. The RNVE will ignore the flags in the Regular-IR routes and will ignore the Replicator-AR routes (due to an unknown tunnel type in the PMSI Tunnel Attribute) and the Leaf A-D routes (due to the IP-address-specific Route Target).</t>
<t>This role provides EVPNs with the backward compatibility required in optimized ingress replication BDs.
In <xref target="ure-optimized-ir-scenario" format="default"/>, NVE2 acts as an RNVE.</t> </section> </section> <section anchor="sect-6" numbered="true" toc="default"> <name>Selective Assisted-Replication (AR) Solution Description</name> <t><xref target="selective-ar" format="default"/> is used to describe the selective AR solution.</t> <figure anchor="selective-ar"> <name>Selective AR Scenario</name> <artwork name="" type="" align="left" alt=""><![CDATA[
                     (         )
                    (_   WAN   _)
                +---(_         _)----+
                |      (_   _)       |
           PE1  |               PE2  |
         +------+----+          +----+------+
    TS1--+  (BD-1)   |          |  (BD-1)   +--TS2
         |REPLICATOR |          |REPLICATOR |
         +--------+--+          +--+--------+
                  |                |
               +--+----------------+--+
               |                      |
          +----+ VXLAN/NVGRE/MPLSoGRE +----+
          |    |      IP Fabric       |    |
          |    |                      |    |
NVE1      |    +-----------+----------+    |   NVE3
Hypervisor|          ToR   | NVE2          |Hypervisor
+---------+-+        +-----+-----+       +-+---------+
|  (BD-1)   |        |  (BD-1)   |       |  (BD-1)   |
|LEAF-set-1 |        |LEAF-set-1 |       |LEAF-set-2 |
+--+-----+--+        +--+-----+--+       +--+-----+--+
   |     |              |     |             |     |
  VM11  VM12           TS3   TS4           VM31  VM32
]]></artwork> </figure> <!-- [rfced] Figure 5: We changed "LEAF-set1" to "LEAF-set-1", but is it correct to have two "LEAF-set-1"s in the diagram? In other words, are three distinct LEAF-sets shown here, or two? Original: | LEAF-set1 | |LEAF-set-1 | |LEAF-set-2 | Currently: |LEAF-set-1 | |LEAF-set-1 | |LEAF-set-2 | Possibly (if there should be three LEAF-sets): |LEAF-set-1 | |LEAF-set-2 | |LEAF-set-3 | --> <t>The solution is called "selective" because a given AR-REPLICATOR <bcp14>MUST</bcp14> replicate the BM traffic to only the AR-LEAFs that requested the replication (as opposed to all the AR-LEAF nodes) and <bcp14>MUST</bcp14> replicate the BM traffic to the RNVEs (if there are any). 
The same AR roles as those defined in <xref target="sect-4" format="default"/> are used here; however, the procedures are different. <!-- [rfced] Section 6: We found this sentence confusing, as it seems that the AR roles are defined more clearly in Section 5 than in Section 4. Should Section 5 also be cited here? Original: The same AR roles defined in Section 4 are used here, however the procedures are different. --> </t> <t>The Selective AR procedures create multiple AR-LEAF-sets in the EVPN BD and build single-hop trees among AR-LEAFs of the same set (AR-LEAF->AR-REPLICATOR->AR-LEAF) and two-hop trees among AR-LEAFs of different sets (AR-LEAF->AR-REPLICATOR->AR-REPLICATOR->AR-LEAF). Compared to the Selective solution, the Non-selective AR method assumes that all the AR-LEAFs of the BD are in the same set and always creates two-hop trees among AR-LEAFs. While the Selective solution is more efficient than the Non-selective solution in multi-stage IP fabrics, the trade-off is additional signaling and an additional outer source IP address lookup.</t> <t>The following subsections describe the differences in the procedures for AR-REPLICATORs/LEAFs compared to the non-selective AR solution. There are no changes applicable to RNVEs.</t> <section anchor="sect-6.1" numbered="true" toc="default"> <name>Selective AR-REPLICATOR Procedures</name> <t>In our example in <xref target="selective-ar" format="default"/>, PE1 and PE2 are defined as Selective AR-REPLICATORs. The following considerations apply to the Selective AR-REPLICATOR role:</t> <ol spacing="normal" type="a"><li>The Selective AR-REPLICATOR capability <bcp14>SHOULD</bcp14> be an administrative choice in any NVE/PE that is part of an Assisted-Replication-enabled BD as the AR role itself. 
This administrative option <bcp14>MAY</bcp14> be implemented as a system-level option as opposed to a per-BD option. <!-- [rfced] Section 6.1: This sentence is difficult to follow. May we update the text per item a. in Section 5.1? If not, please clarify "part of an Assisted-Replication-enabled BD as the AR role itself". Original: a. The Selective AR-REPLICATOR capability SHOULD be an administrative choice in any NVE/PE that is part of an Assisted-Replication-enabled BD, as the AR role itself. Suggested: a. The AR-REPLICATOR role SHOULD be an administrative choice in any NVE/PE that is part of an Assisted-Replication-enabled BD. --> </li> <li>Each AR-REPLICATOR will build a list of AR-REPLICATOR, AR-LEAF, and RNVE nodes. In spite of the "Selective" administrative option, an AR-REPLICATOR <bcp14>MUST NOT</bcp14> behave as a Selective AR-REPLICATOR if at least one of the AR-REPLICATORs has the L flag NOT set. If at least one AR-REPLICATOR sends a Replicator-AR route with L = 0 (in the BD context), the rest of the AR-REPLICATORs will fall back to non-selective AR mode.</li> <li> <t>The Selective AR-REPLICATOR <bcp14>MUST</bcp14> follow the procedures described in <xref target="sect-5.1" format="default"/>, except for the following differences:</t> <ul spacing="normal"> <li>The Replicator-AR route <bcp14>MUST</bcp14> include L = 1. This flag is used by the AR-REPLICATORs to advertise their "selective" AR-REPLICATOR capabilities. In addition, the AR-REPLICATOR auto-configures its IP-address-specific import Route Target as described in the third bullet of the procedures for Leaf A-D routes in <xref target="sect-4" format="default"/>. 
<!-- [rfced] Section 6.1: It seems odd to repeat "Replicator-AR route" in this sentence. May we update as suggested, or should the wording be revised? Original: o The Replicator-AR route MUST include L=1 (Leaf Information Required) in the Replicator-AR route. Suggested: * The Replicator-AR route MUST have the L flag set to 1. --> </li> <li>The AR-REPLICATOR will build a "selective" AR-LEAF-set with the list of nodes that requested replication to its own AR-IP. For instance, assuming that NVE1 and NVE2 advertise a Leaf A-D route with PE1's IP-address-specific Route Target and NVE3 advertises a Leaf A-D route with PE2's IP-address-specific Route Target, PE1 will only add NVE1/NVE2 to its selective AR-LEAF-set for BD-1 and exclude NVE3. Likewise, PE2 will only add NVE3 to its selective AR-LEAF-set for BD-1 and exclude NVE1/NVE2.</li> <li> <t>When a node defined and operating as a Selective AR-REPLICATOR receives a packet on an overlay tunnel, it will do a tunnel destination IP lookup, and if the destination IP address is the AR-REPLICATOR AR-IP Address, the node <bcp14>MUST</bcp14> replicate the packet to:</t> <ul spacing="normal"> <li>Local ACs.</li> <li>Overlay tunnels in the Selective AR-LEAF-set, excluding the overlay tunnel to the source AR-LEAF.</li> <li>Overlay tunnels to the RNVEs if the tunnel source IP address is the IR-IP of an AR-LEAF. In any other case, the AR-REPLICATOR <bcp14>MUST NOT</bcp14> replicate the BM traffic to remote RNVEs. In other words, only the first-hop selective AR-REPLICATOR will replicate to all the RNVEs.</li> <li>Overlay tunnels to the remote Selective AR-REPLICATORs if the tunnel source IP address (of the encapsulated packet that arrived on the overlay tunnel) is an IR-IP of its own AR-LEAF-set. 
In any other case, the AR-REPLICATOR <bcp14>MUST NOT</bcp14> replicate the BM traffic to remote AR-REPLICATORs. When doing this replication, the tunnel destination IP address is the AR-IP of the remote Selective AR-REPLICATOR. The AR-IP used as the tunnel destination IP address will indicate to the remote Selective AR-REPLICATOR that the packet needs further replication to its AR-LEAFs.</li> </ul> </li> </ul> </li> </ol> <t>A Selective AR-REPLICATOR data path implementation <bcp14>MUST</bcp14> be compatible with the following rules:</t> <ul spacing="normal"> <li> <t>The Selective AR-REPLICATORs will build two flood-lists:</t> <dl> <dt>Flood-list #1: </dt> <dd><t>Composed of ACs and overlay tunnels to the remote nodes in the BD, always using the IR-IPs as the tunnel destination IP addresses.</t> </dd> <dt>Flood-list #2: </dt> <dd><t>Composed of ACs, a Selective AR-LEAF-set, and a Selective AR-REPLICATOR-set, where:</t> <ul spacing="normal"> <li>The Selective AR-LEAF-set is composed of the overlay tunnels to the AR-LEAFs that advertise a Leaf A-D route for the local AR-REPLICATOR. This set is updated with every Leaf A-D route received/withdrawn from a new AR-LEAF.</li> <li>The Selective AR-REPLICATOR-set is composed of the overlay tunnels to all the AR-REPLICATORs that send a Replicator-AR route with L = 1. 
The AR-IP addresses are used as tunnel destination IP addresses.</li> </ul> </dd> </dl> </li> <li>Some of the overlay tunnels in the flood-lists <bcp14>MAY</bcp14> be flagged as non-BM receivers based on the BM flag received from the remote nodes in the routes.</li> <li>When a Selective AR-REPLICATOR receives a BM packet on an AC, it <bcp14>MUST</bcp14> forward the BM packet to its flood-list #1, skipping the non-BM overlay tunnels.</li> <li> <t>When a Selective AR-REPLICATOR receives a BM packet on an overlay tunnel, it will check the destination and source IPs of the underlay IP header and:</t> <ul spacing="normal"> <li>If the destination IP address matches its AR-IP and the source IP address matches an IP of its own Selective AR-LEAF-set, the AR-REPLICATOR <bcp14>MUST</bcp14> forward the BM packet to its flood-list #2, unless some AR-REPLICATOR within the BD has advertised L = 0. In the latter case, the node reverts to non-selective mode, and flood-list #1 <bcp14>MUST</bcp14> be used. Non-BM overlay tunnels are skipped when sending BM packets.</li> <li>If the destination IP address matches its AR-IP and the source IP address does not match any IP address of its Selective AR-LEAF-set, the AR-REPLICATOR <bcp14>MUST</bcp14> forward the BM packet to flood-list #2, skipping the AR-REPLICATOR-set. Non-BM overlay tunnels are skipped when sending BM packets. <!-- [rfced] Section 6.1: This sentence does not parse. Does the "MUST" in this sentence also apply to "but skipping"? If the suggested text is not correct, please provide clarifying text. Original: o If the destination IP address matches its AR-IP and the source IP address does not match any IP address of its Selective AR-LEAF-set, the AR-REPLICATOR MUST forward the BM packet to flood-list #2 but skipping the AR-REPLICATOR-set. 
Non-BM overlay tunnels are skipped when sending BM packets. Suggested (assuming that "MUST" applies to both forwarding and skipping): - If the destination IP address matches its AR-IP and the source IP address does not match any IP address of its Selective AR-LEAF-set, the AR-REPLICATOR MUST forward the BM packet to flood-list #2, skipping the AR-REPLICATOR-set. Non-BM overlay tunnels are skipped when sending BM packets. --> </li> <li>If the destination IP address matches its IR-IP, the AR-REPLICATOR <bcp14>MUST</bcp14> use flood-list #1 but <bcp14>MUST</bcp14> skip all the overlay tunnels from the flooding list, i.e., it will only replicate to local ACs. This is the regular ingress replication behavior described in <xref target="RFC7432" format="default"/>. Non-BM overlay tunnels are skipped when sending BM packets.</li> </ul> </li> <li>In any case, the AR-REPLICATOR ensures that the traffic is not sent back to the originating source. 
If the encapsulation is MPLSoGRE or MPLSoUDP and the received BD label (the label that the AR-REPLICATOR advertised in the Replicator-AR route) is not at the bottom of the stack, the AR-REPLICATOR <bcp14>MUST</bcp14> copy the rest of the labels when forwarding them to the egress overlay tunnels.</li> </ul> </section> <section anchor="sect-6.2" numbered="true" toc="default"> <name>Selective AR-LEAF Procedures</name> <t>A Selective AR-LEAF chooses a single Selective AR-REPLICATOR per BD and:</t> <ul spacing="normal"> <li>Sends all the BD's BM traffic to that AR-REPLICATOR and</li> <li>Expects to receive all the BM traffic for a given BD from the same AR-REPLICATOR (except for the BM traffic from the RNVEs, which comes directly from the RNVEs).</li> </ul> <t>In the example in <xref target="selective-ar" format="default"/>, we consider NVE1/NVE2/NVE3 as Selective AR-LEAFs. NVE1 selects PE1 as its Selective AR-REPLICATOR. If that is so, NVE1 will send all its BM traffic for BD-1 to PE1. If other AR-LEAFs/REPLICATORs send BM traffic, NVE1 will receive that traffic from PE1. A Selective AR-LEAF and a non-selective AR-LEAF behave differently, as follows:</t> <ol spacing="normal" type="a"><li>The AR-LEAF role <bcp14>SHOULD</bcp14> be an administrative choice in any NVE/PE that is part of an Assisted-Replication-enabled BD. This administrative option to enable AR-LEAF capabilities <bcp14>MAY</bcp14> be implemented as a system-level option as opposed to a per-BD option. <!-- [rfced] Section 6.2: We updated this sentence per item a. in Section 5.2. If this is incorrect, please clarify "role selective capability". 
Original: a. The AR-LEAF role selective capability SHOULD be an administrative choice in any NVE/PE that is part of an Assisted-Replication-enabled BD. Currently: a. The AR-LEAF role SHOULD be an administrative choice in any NVE/PE that is part of an Assisted-Replication-enabled BD. --> </li> <li>The AR-LEAF <bcp14>MAY</bcp14> advertise a Regular-IR route if there are RNVEs in the BD. The Selective AR-LEAF <bcp14>MUST</bcp14> advertise a Leaf A-D route after receiving a Replicator-AR route with L = 1. It is <bcp14>RECOMMENDED</bcp14> that the Selective AR-LEAF wait for a period specified by an AR-LEAF-join-wait-timer (in seconds, with a default value of 3) before sending the Leaf A-D route, so that the AR-LEAF can collect all the Replicator-AR routes for the BD before advertising the Leaf A-D route. If the Replicator-AR route with L = 1 is withdrawn, the corresponding Leaf A-D route is withdrawn too. <!-- [rfced] Section 6.2: As the original text seemed to indicate that the AR-LEAF should wait for a timer, we updated the text to indicate that the AR-LEAF should wait for some interval to pass. If this is incorrect, please provide clarifying text (e.g., are some words missing?). Original: It is RECOMMENDED that the Selective AR-LEAF waits for an AR-LEAF-join-wait-timer (in seconds, default value is 3) before sending the Leaf Auto-Discovery route, so that the AR-LEAF can collect all the Replicator-AR routes for the BD before advertising the Leaf Auto-Discovery route. Currently: It is RECOMMENDED that the Selective AR-LEAF wait for a period specified by an AR-LEAF-join-wait-timer (in seconds, with a default value of 3) before sending the Leaf A-D route, so that the AR-LEAF can collect all the Replicator-AR routes for the BD before advertising the Leaf A-D route. 
--> </li> <li> <t>In a service where there is more than one Selective AR-REPLICATOR, the Selective AR-LEAF <bcp14>MUST</bcp14> locally select a single Selective AR-REPLICATOR for the BD. Once selected:</t> <ul spacing="normal"> <li>The Selective AR-LEAF <bcp14>MUST</bcp14> send a Leaf A-D route, including the route key and IP-address-specific Route Target of the selected AR-REPLICATOR.</li> <li>The Selective AR-LEAF <bcp14>MUST</bcp14> send all the BM packets received on the ACs for a given BD to that AR-REPLICATOR.</li> <li>In the case of failure of the selected AR-REPLICATOR (detected when the Replicator-AR route becomes infeasible as a result of any of the underlying BGP mechanisms), another AR-REPLICATOR will be selected and a new Leaf A-D update will be issued for the new AR-REPLICATOR. This new route will update the selective list in the new Selective AR-REPLICATOR. In the case of failure of the active Selective AR-REPLICATOR, it is <bcp14>RECOMMENDED</bcp14> that the Selective AR-LEAF revert to ingress replication behavior for the duration of an AR-REPLICATOR-activation-timer (in seconds, with a default value of 3) to mitigate the traffic impact. When the timer expires, the Selective AR-LEAF will resume its AR mode with the new Selective AR-REPLICATOR. The AR-REPLICATOR-activation-timer <bcp14>MAY</bcp14> be the same configurable parameter as in <xref target="sect-5.2" format="default"/>. <!-- [rfced] Section 6.2: a) We changed "a timer AR-REPLICATOR-activation-timer" to "an AR-REPLICATOR-activation-timer", but we still find this sentence difficult to follow. Should "behavior for an AR-REPLICATOR-activation-timer" be "behavior after a specified AR-REPLICATOR-activation-timer interval"? If not, please provide clarifying text. 
Original: In case of failure of the active Selective AR-REPLICATOR, it is RECOMMENDED for the Selective AR-LEAF to revert to Ingress Replication behavior for a timer AR-REPLICATOR-activation-timer (in seconds, default value is 3) to mitigate the traffic impact. Currently: In the case of failure of the active Selective AR-REPLICATOR, it is RECOMMENDED that the Selective AR-LEAF revert to ingress replication behavior for an AR-REPLICATOR-activation-timer (in seconds, with a default value of 3) to mitigate the traffic impact. b) Does "the same configurable parameter as in Section 5.2" mean "the same configurable parameter as the parameter discussed in Section 5.2", "the same configurable parameter, as discussed in Section 5.2", or something else? Please clarify. Original: The AR-REPLICATOR-activation-timer MAY be the same configurable parameter as in Section 5.2. --> </li> <li>A Selective AR-LEAF <bcp14>MAY</bcp14> change the selection of AR-REPLICATOR(s) dynamically due to an administrative or policy configuration change.</li> </ul> </li> </ol> <t>All the AR-LEAFs in a BD are expected to be configured as either selective or non-selective. 
A mix of selective and non-selective AR-LEAFs <bcp14>SHOULD NOT</bcp14> coexist in the same BD. If a non-selective AR-LEAF is present, its BM traffic sent to a selective AR-REPLICATOR will not be replicated to other AR-LEAFs that are not in its Selective AR-LEAF-set.</t> <t>A Selective AR-LEAF <bcp14>MUST</bcp14> follow a data path implementation compatible with the following rules:</t> <ul spacing="normal"> <li> <t>The Selective AR-LEAF nodes will build two flood-lists:</t> <dl> <dt>Flood-list #1: </dt> <dd>Composed of ACs and the overlay tunnel to the selected AR-REPLICATOR (using the AR-IP as the tunnel destination IP address). </dd> <dt>Flood-list #2: </dt> <dd>Composed of ACs and overlay tunnels to the remote IR-IP addresses. </dd> </dl> </li> <li>Some of the overlay tunnels in the flood-lists <bcp14>MAY</bcp14> be flagged as non-BM receivers based on the BM flag received from the remote nodes in the routes.</li> <li>When an AR-LEAF receives a BM packet on an AC, it will check to see if an AR-REPLICATOR was selected; if one is found, flood-list #1 <bcp14>MUST</bcp14> be used. Otherwise, flood-list #2 <bcp14>MUST</bcp14> be used. Non-BM overlay tunnels are skipped when sending BM packets.</li> <li>When an AR-LEAF receives a BM packet on an overlay tunnel, it <bcp14>MUST</bcp14> forward the BM packet to its local ACs and never to an overlay tunnel. 
This is the regular ingress replication behavior described in <xref target="RFC7432" format="default"/>.</li> </ul> </section> </section> <section anchor="sect-7" numbered="true" toc="default"> <name>Pruned-Flood-Lists (PFLs)</name> <t>In addition to AR, the second optimization supported by this solution is the ability of all the BD nodes to signal PFLs. As described in <xref target="sect-4" format="default"/>, an EVPN node can signal a given value for the BM and U PFLs flags in the Regular-IR, Replicator-AR, or Leaf A-D routes, where:</t> <ul spacing="normal"> <li>BM is the Broadcast and Multicast flag. BM = 1 means "prune-me" from the BM flood-list. BM = 0 indicates regular behavior.</li> <li>U is the Unknown flag. U = 1 means "prune-me" from the Unknown flood-list. U = 0 indicates regular behavior.</li> </ul> <t>The ability to signal and process these PFLs flags <bcp14>SHOULD</bcp14> be an administrative choice. If a node is configured to process the PFLs flags, upon receiving a non-zero PFLs flag for a route, an NVE/PE will add the corresponding flag to the created overlay tunnel in the flood-list. When replicating a BM packet in the context of a flood-list, the NVE/PE will skip the overlay tunnels marked with the flag BM = 1, since the NVEs/PEs at the end of those tunnels are not expecting BM packets. Similarly, when replicating Unknown unicast packets, the NVE/PE will skip the overlay tunnels marked with U = 1.</t> <t>An NVE/PE not following this document or not configured for this optimization will ignore any of the received PFLs flags. 
An AR-LEAF or RNVE receiving BUM traffic on an overlay tunnel <bcp14>MUST</bcp14> replicate the traffic to its local ACs, regardless of the BM/U flags on the overlay tunnels.</t> <t>This optimization <bcp14>MAY</bcp14> be used along with the Assisted-Replication solution.</t> <section anchor="sect-7.1" numbered="true" toc="default"> <name>A Pruned-Flood-List Example</name> <t>In order to illustrate the use of the PFLs solution, we will assume that BD-1 in <xref target="ure-optimized-ir-scenario" format="default"/> is enabled for optimized ingress replication and: <!-- [rfced] Section 7.1: As it appears that "the solution described in this document" refers to the PFLs solution and not to the Assisted-Replication solution as mentioned in Section 1 ("The Assisted-Replication solution described in this document") or the optimized ingress replication solution in general, we updated this sentence accordingly. If this is incorrect, please provide clarifying text. Original: In order to illustrate the use of the solution described in this document, we will assume that BD-1 in Figure 4 is optimized Ingress Replication enabled and: Currently: In order to illustrate the use of the PFLs solution, we will assume that BD-1 in Figure 4 is optimized ingress replication enabled and: --> </t> <ul spacing="normal"> <li>PE1 and PE2 are administratively configured as AR-REPLICATORs due to their high-performance replication capabilities. PE1 and PE2 will send a Replicator-AR route with BM/U flags = 00.</li> <li> <t>NVE1 and NVE3 are administratively configured as AR-LEAF nodes due to their low-performance software-based replication capabilities. They will advertise a Regular-IR route with type AR-LEAF. 
Assuming that both NVEs advertise all of the attached VMs' MAC and IP addresses in EVPNs as soon as they come up and these NVEs do not have any VMs interested in multicast applications, they will be configured to signal BM/U flags = 11 for BD-1. That is, neither NVE1 nor NVE3 is interested in receiving BM or Unknown unicast traffic, since:</t> <ul spacing="normal"> <li>Their attached VMs (VM11, VM12, VM31, VM32) do not support multicast applications.</li> <li>Their attached VMs will not receive ARP Requests. Proxy ARP <xref target="RFC9161" format="default"/> on the remote NVEs/PEs will reply to ARP Requests locally, and no other broadcast traffic is expected. <!-- [rfced] Section 7.1: As it appears that "will reply ARP Requests" means "will reply to ARP Requests", we updated this sentence accordingly. If this is incorrect, please provide clarifying text. Original: Proxy-ARP [I-D.ietf-bess-evpn-proxy-arp-nd] on the remote NVE/PEs will reply ARP Requests locally, and no other Broadcast is expected. Currently: Proxy ARP [RFC9161] on the remote NVEs/PEs will reply to ARP Requests locally, and no other broadcast traffic is expected. --> </li> <li>Their attached VMs will not receive unknown unicast traffic, since the VMs' MAC and IP addresses are always advertised by EVPNs as long as the VMs are active.</li> </ul> </li> <li>NVE2 is unaware of optimized ingress replication; therefore, it takes on the RNVE role in BD-1.</li> </ul> <t>Based on the above assumptions, the following forwarding behavior will take place:</t> <ol spacing="normal" type="1"><li>Any BM packets sent from VM11 will be sent to VM12 and PE1. 
PE1 will then forward the BM packets on to TS1, the WAN link, PE2, and NVE2 but not to NVE3. PE2 and NVE2 will replicate the BM packets to their local ACs, but NVE3 will be spared from unnecessarily replicating those BM packets to VM31 and VM32.</li> <li>Any BM packets received on PE2 from the WAN will be sent to PE1 and NVE2 but not to NVE1 and NVE3, sparing the two hypervisors from replicating unnecessarily to their local VMs. PE1 and NVE2 will replicate to their local ACs only.</li> <li>Any Unknown unicast packet sent from VM31 will be forwarded by NVE3 to NVE2, PE1, and PE2 but not to NVE1. The solution prevents unnecessary replication to NVE1, since the destination of the unknown traffic cannot be at NVE1.</li> <li>Any Unknown unicast packet sent from TS1 will be forwarded by PE1 to the WAN link, PE2, and NVE2 but not to NVE1 and NVE3, since the target of the unknown traffic cannot be at NVE1 or NVE3.</li> </ol> </section> </section> <section anchor="sect-8" numbered="true" toc="default"> <name>AR Procedures for Single-IP AR-REPLICATORS</name> <t>The procedures explained in Sections <xref target="sect-5" format="counter"/> and <xref target="sect-6" format="counter"/> assume that the AR-REPLICATOR can use two local routable IP addresses to terminate and originate NVO tunnels, i.e., IR-IP and AR-IP addresses. This is usually the case for PE-based AR-REPLICATOR nodes.</t> <t>In some cases, the AR-REPLICATOR node does not support more than one IP address to terminate and originate NVO tunnels, i.e., the IR-IP and AR-IP are the same IP address. 
This may be the case in some software-based or low-end AR-REPLICATOR nodes. If this is the case, the procedures provided in Sections <xref target="sect-5" format="counter"/> and <xref target="sect-6" format="counter"/> <bcp14>MUST</bcp14> be modified in the following way:</t> <ul spacing="normal"> <li>The Replicator-AR routes generated by the AR-REPLICATOR use an AR-IP that will match its IR-IP. In order to differentiate the data plane packets that need to use ingress replication from the packets that must use Assisted Replication forwarding mode, the Replicator-AR route <bcp14>MUST</bcp14> advertise a different VNI/VSID than the one used by the Regular-IR route. For instance, the AR-REPLICATOR will advertise an AR-VNI along with the Replicator-AR route and an IR-VNI along with the Regular-IR route. Since both routes have the same key, different Route Distinguishers are needed in each route.</li> <li>An AR-REPLICATOR will use either Ingress Replication forwarding mode or Assisted Replication forwarding mode for the incoming overlay packets based on an ingress VNI lookup as opposed to the tunnel IP DA lookup. Note that when replicating to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI advertised by the egress node will determine whether Ingress Replication forwarding mode or Assisted Replication forwarding mode is used at the subsequent AR-REPLICATOR.</li> <!-- [rfced] Section 8: As it appears that "Ingress Replication or Assisted Replication forwarding mode" means "Ingress Replication forwarding mode or Assisted Replication forwarding mode" as opposed to "ingress replication or Assisted Replication forwarding mode", we updated this paragraph accordingly. If this is incorrect, please clarify the text. 
Original: - An AR-REPLICATOR will perform Ingress Replication or Assisted Replication forwarding mode for the incoming Overlay packets based on an ingress VNI lookup, as opposed to the tunnel IP DA lookup. Note that, when replicating to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI advertised by the egress node will determine the Ingress Replication or Assisted Replication forwarding mode at the subsequent AR-REPLICATOR. Currently: * An AR-REPLICATOR will perform Ingress Replication forwarding mode or Assisted Replication forwarding mode for the incoming overlay packets based on an ingress VNI lookup as opposed to the tunnel IP DA lookup. Note that when replicating to remote AR-REPLICATOR nodes, the use of the IR-VNI or AR-VNI advertised by the egress node will determine whether Ingress Replication forwarding mode or Assisted Replication forwarding mode is used at the subsequent AR-REPLICATOR. --> </ul> <t>The rest of the procedures will follow those described in Sections <xref target="sect-5" format="counter"/> and <xref target="sect-6" format="counter"/>.</t> </section> <section anchor="sect-9" numbered="true" toc="default"> <name>AR Procedures and EVPN All-Active Multihoming Split-Horizon</name> <t>This section extends the procedures for the cases where two or more AR-LEAF nodes are attached to the same ES and two or more AR-REPLICATOR nodes are attached to the same ES in the BD. 
The mixed case -- where an AR-LEAF node and an AR-REPLICATOR node are attached to the same ES -- would require extended procedures that are out of scope for this document.</t>
<section anchor="sect-9.1" numbered="true" toc="default">
<name>Ethernet Segments on AR-LEAF Nodes</name>
<t>If a VXLAN or NVGRE is used and if the split-horizon is based on the tunnel IP Source Address and "local bias" as described in <xref target="RFC8365" format="default"/>, the split-horizon check will not work if an ES is shared between two AR-LEAF nodes, and the AR-REPLICATOR replaces the tunnel IP Source Address of the packets with its own AR-IP.</t>
<t>In order to be compatible with the IP Source Address split-horizon check, the AR-REPLICATOR <bcp14>MAY</bcp14> keep the original received tunnel IP Source Address when replicating packets to a remote AR-LEAF or RNVE. This will allow AR-LEAF nodes to apply split-horizon check procedures for BM packets before sending them to the local ES. Even if the AR-LEAF's IP Source Address is preserved when replicating to AR-LEAFs or RNVEs, the AR-REPLICATOR <bcp14>MUST</bcp14> always use its IR-IP as the IP Source Address when replicating to other AR-REPLICATORs.</t>
<t>When EVPNs are used for MPLSoGRE or MPLSoUDP, the ESI-label-based split-horizon procedure provided in <xref target="RFC7432" format="default"/> will not work for multihomed ESs defined on AR-LEAF nodes. Local bias is recommended in this case, as it is in the case of a VXLAN or NVGRE as explained above.
The local-bias and tunnel IP Source Address preservation mechanisms provide the required split-horizon behavior in non-selective or selective AR.</t>
<t>Note that if the AR-REPLICATOR implementation keeps the received tunnel IP Source Address, the use of unicast Reverse Path Forwarding (uRPF) checks in the IP fabric based on the tunnel IP Source Address <bcp14>MUST</bcp14> be disabled.</t>
</section>
<section anchor="sect-9.2" numbered="true" toc="default">
<name>Ethernet Segments on AR-REPLICATOR Nodes</name>
<t>AR-REPLICATOR nodes attached to the same all-active ES will follow local-bias procedures <xref target="RFC8365" format="default"/> as follows:</t>
<ol spacing="normal" type="a">
<li>For BUM traffic received on a local AR-REPLICATOR's AC, local-bias procedures as provided in <xref target="RFC8365" format="default"/> <bcp14>MUST</bcp14> be followed.</li>
<li>For BUM traffic received on an AR-REPLICATOR overlay tunnel with AR-IP as the IP DA, local bias <bcp14>MUST</bcp14> also be followed. That is, traffic received with AR-IP as the IP DA will be treated as though it had been received on a local AC that is part of the ES and will be forwarded to all local ESs, irrespective of their DF or NDF state.</li>
<li>BUM traffic received on an AR-REPLICATOR overlay tunnel with IR-IP as the IP DA will follow regular local-bias rules <xref target="RFC8365" format="default"/> and will not be forwarded to local ESs that are shared with the AR-LEAF or AR-REPLICATOR originating the traffic.</li>
<li>In cases where the AR-REPLICATOR supports a single IP address, the IR-IP and the AR-IP are the same IP address, as discussed in <xref target="sect-8" format="default"/>. The received BUM traffic will be treated as specified in item b above if the received VNI is the AR-VNI and as specified in item c if the VNI is the IR-VNI.</li>
</ol>
</section>
</section>
<section anchor="sect-10" numbered="true" toc="default">
<name>Security Considerations</name>
<t>The security considerations in <xref target="RFC7432" format="default"/> and <xref target="RFC8365" format="default"/> apply to this document. The security considerations related to the Leaf A-D route in <xref target="RFC9572" format="default"/> apply too.</t>
<t>In addition, the Assisted-Replication method introduced by this document may introduce some new risks that could affect the successful delivery of BM traffic. Unicast traffic is not affected by Assisted-Replication (although Unknown unicast traffic is affected by the procedures for PFLs).
The forwarding of BM traffic is modified, and BM traffic from the AR-LEAF nodes will be attracted by the existence of AR-REPLICATORs in the BD. An AR-LEAF will forward BM traffic to its selected AR-REPLICATOR; therefore, an attack on the AR-REPLICATOR could impact the delivery of the BM traffic using that node. Also, an attack on the AR-REPLICATOR and any change to the advertised AR type will modify the selections made by the AR-LEAF nodes. If no other AR-REPLICATOR is selected, the AR-LEAF nodes will be forced to use Ingress Replication forwarding mode, which will impact their performance, since the AR-LEAF nodes are usually NVEs/PEs with poor replication performance.
<!-- [rfced] Section 10:

a) As it appears that the AR-LEAF nodes are the entities making selections (per "other AR-LEAFs' selections" in Section 5.2), we updated this sentence accordingly. If this is incorrect, please provide clarifying text.

Original:
 Also, an attack on the AR-REPLICATOR and change of the advertised AR type will modify the selection on the AR-LEAF nodes.

Currently:
 Also, an attack on the AR-REPLICATOR and any change to the advertised AR type will modify the selections made by the AR-LEAF nodes.

b) We could not follow the meaning of "attracted" as used in this sentence. Per <https://www.merriam-webster.com/dictionary/attracted>, it does not appear to be a good fit. Please provide alternative text.

Original:
 The forwarding of Broadcast and Multicast (BM) traffic is modified, and BM traffic from the AR-LEAF nodes will be attracted by the existence of AR-REPLICATORs in the BD. -->
</t>
<t>This document introduces the ability of the AR-REPLICATOR to forward traffic received on an overlay tunnel to another overlay tunnel. The reader may determine that this introduces the risk of BM loops -- that is, an AR-LEAF receiving a BM-encapsulated packet that the AR-LEAF originated in the first place due to one or two AR-REPLICATORs "looping" the BM traffic back to the AR-LEAF. Following the procedures provided in this document will prevent these BM loops, since the AR-REPLICATOR will always forward the BM traffic using the correct tunnel IP DA (or the correct VNI in the case of single-IP AR-REPLICATORs), which instructs the remote nodes regarding how to forward the traffic. This is true for both the Non-Selective and Selective modes defined in this document. However, incorrect implementation of the procedures provided in this document may lead to those unexpected BM loops.</t>
<t>The Selective mode provides a multi-staged replication solution, where proper configuration of all the AR-REPLICATORs will prevent any issues. A mix of mistakenly configured Selective and Non-Selective AR-REPLICATORs in the same BD could theoretically create packet duplication in some AR-LEAFs; however, this document specifies a fallback solution -- falling back to Non-Selective mode in cases where the AR-REPLICATORs advertised an inconsistent AR mode.</t>
<t>This document allows the AR-REPLICATOR to preserve the tunnel IP Source Address of the AR-LEAF (as an option) when forwarding BM packets from an overlay tunnel to another overlay tunnel. Preserving the AR-LEAF IP Source Address makes the local-bias filtering procedures possible for AR-LEAF nodes that are attached to the same ES.
If the AR-REPLICATOR does not preserve the AR-LEAF IP Source Address, AR-LEAF nodes attached to all-active ESs will cause packet duplication on the multihomed CE.</t>
<t>The AR-REPLICATOR nodes are, by design, using more bandwidth than PEs <xref target="RFC7432" format="default"/> or NVEs <xref target="RFC8365" format="default"/> would use. Certain network events or unexpected low performance may exceed the AR-REPLICATOR's local bandwidth and cause service disruption.</t>
<t>Finally, PFLs (<xref target="sect-7" format="default"/>) should be used with care. Intentional or unintentional misconfiguration of the BDs on a given leaf node may result in the leaf not receiving the required BM or Unknown unicast traffic.</t>
</section>
<section anchor="sect-11" numbered="true" toc="default">
<name>IANA Considerations</name>
<t>IANA has allocated the following Border Gateway Protocol (BGP) parameters:</t>
<ul spacing="normal">
<li>Allocation in the "P-Multicast Service Interface Tunnel (PMSI Tunnel) Tunnel Types" registry:</li>
</ul>
<table align="center">
<thead>
<tr> <th>Value</th> <th>Meaning</th> <th>Reference</th> </tr>
</thead>
<tbody>
<tr> <td>0x0A</td> <td>Assisted-Replication Tunnel</td> <td>RFC 9574</td> </tr>
</tbody>
</table>
<ul spacing="normal">
<li>Allocations in the "P-Multicast Service Interface (PMSI) Tunnel Attribute Flags" registry:</li>
</ul>
<table align="center">
<thead>
<tr> <th>Value</th> <th>Name</th> <th>Reference</th> </tr>
</thead>
<tbody>
<tr> <td>3-4</td> <td>Assisted-Replication Type (T)</td> <td>RFC 9574</td> </tr>
<tr> <td>5</td> <td>Broadcast and Multicast (BM)</td> <td>RFC 9574</td> </tr>
<tr> <td>6</td> <td>Unknown (U)</td> <td>RFC 9574</td> </tr>
</tbody>
</table>
</section>
</middle>
<back>
<!-- [rfced] Would you like to list the references in alphanumeric order? -->
<references>
<name>References</name>
<references>
<name>Normative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6514.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7432.xml"/>
<!-- draft-ietf-bess-evpn-bum-procedure-updates (RFC 9572) Fix title if 9572 authors approve the change -->
<reference anchor="RFC9572" target="https://www.rfc-editor.org/info/rfc9572">
<front>
<title>Updates to EVPN Broadcast, Unknown Unicast, or Multicast (BUM) Procedures</title>
<author initials='Z' surname='Zhang' fullname='Z. Zhang'> <organization/> </author>
<author initials='W' surname='Lin' fullname='W. Lin'> <organization/> </author>
<author initials='J' surname='Rabadan' fullname='J. Rabadan'> <organization/> </author>
<author initials='K' surname='Patel' fullname='K. Patel'> <organization/> </author>
<author initials='A' surname='Sajassi' fullname='A. Sajassi'> <organization/> </author>
<date month='April' year='2024'/>
</front>
<seriesInfo name="RFC" value="9572"/>
<seriesInfo name="DOI" value="10.17487/RFC9572"/>
</reference>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7902.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6513.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8365.xml"/>
</references>
<references>
<name>Informative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7348.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4023.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7637.xml"/>
<!-- draft-ietf-bess-evpn-proxy-arp-nd (RFC 9161; published) -->
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9161.xml"/>
</references>
</references>
<section numbered="false" toc="default">
<name>Acknowledgements</name>
<t>The authors would like to thank <contact fullname="Neil Hart"/>, <contact fullname="David Motz"/>, <contact fullname="Dai Truong"/>, <contact fullname="Thomas Morin"/>, <contact fullname="Jeffrey Zhang"/>, <contact fullname="Shankar Murthy"/>, and <contact fullname="Krzysztof Szarkowicz"/> for their valuable feedback and contributions. Also, thanks to <contact fullname="John Scudder"/> for his thorough review, which improved the quality of the document significantly.
</t>
</section>
<section numbered="false" toc="default">
<name>Contributors</name>
<t>In addition to the authors listed on the front page, the following people also contributed to this document and should be considered coauthors:</t>
<contact fullname="Wim Henderickx"> <organization>Nokia</organization> </contact>
<contact fullname="Kiran Nagaraj"> <organization>Nokia</organization> </contact>
<contact fullname="Ravi Shekhar"> <organization>Juniper Networks</organization> </contact>
<contact fullname="Nischal Sheth"> <organization>Juniper Networks</organization> </contact>
<contact fullname="Aldrin Isaac"> <organization>Juniper</organization> </contact>
<contact fullname="Mudassir Tufail"> <organization>Citibank</organization> </contact>
</section>
<!-- [rfced] Please let us know if any changes are needed for the following:

a) The following terms were used inconsistently in this document. We chose to use the latter forms. Please let us know any objections.

 flags field / Flags field (per Cluster 448)
 Ingress Replication / ingress replication (per RFCs 9251 and 9252)
 NVE/PEs / NVEs/PEs
 nvGRE (Figures 4 and 5) / NVGRE
 Originating Router's IP address / Originating Router's IP Addr / Originating Router's IP Address (per RFC 9252)
 PMSI Tunnel attribute / PMSI Tunnel Attribute (per RFC 9252)
 regular-IR (2 instances) / Regular-IR (24 instances) (We see one instance of "regular IR-IP Address"; please confirm that this is correct.)
regular-IR behavior described in [RFC7432] (1 instance) / regular ingress replication behavior described in [RFC7432] (4 instances) Split-horizon / split-horizon (per RFC 9252) Tunnel Type / tunnel type (where used generally) b) The following terms appear to be used inconsistently in this document. Please let us know which form is preferred. AR-IP Address / AR-IP address Assisted-Replication / Assisted Replication (e.g., "Assisted-Replication (AR)", "Assisted Replication (AR)") (We suggest that, after selecting one form, we use the abbreviation "AR" after the initial definition in Section 1.) Assisted-Replication Type field (4 instances) / AR type field (1 instance) (Are these two distinct fields, or do they both mean the same thing? If the same, which form should be used?) core network / network core (Abstract) IP Source Address / source IP address* * (e.g., as used in RFC 9251) IR-IP Address / IR-IP address multi-staged / multi-stage non-selective / Non-selective / Non-Selective (in running text, e.g., 'non-selective AR', 'solution is called "non-selective"', 'Non-Selective solution', 'Non-selective AR-REPLICATORs', 'non-selective mode', 'Non-Selective and Selective modes', 'Non-Selective mode') Suggested (with the exception of 'Non-selective mode', per other mode names in this cluster): non-selective, because we do not see a precedent for 'Non-selective' in any published RFC. However, we see 'Non-Selective Mode' used once in RFC 9469, but this appears to be an outlier. selective AR(...) / Selective AR(...) (e.g., selective AR, selective AR-LEAF-set, Selective AR-LEAF-set, selective AR-REPLICATOR, Selective AR-REPLICATOR) Suggested: selective AR(...) Please note that we also see 'Selective solution' and 'Selective mode'. We suggest 'selective solution', because it does not appear to be a proper term. We suggest 'Selective mode' per other mode names in this cluster. Single-IP AR-REPLICATOR ("in the Single-IP AR-REPLICATOR ..." 
(Section 2)) / single-IP AR-REPLICATOR (Section 10) T (AR role type) (2 instances) / T (Assisted-Replication type) (1 instance) Unknown Unicast / Unknown unicast / unknown unicast (where used generally) c) This is the only document in Cluster 448 (https://www.rfc-editor.org/cluster_info.php?cid=C448) that uses "flood-list" (e.g., "Unknown flooding list", "Unknown flood-list"). We also could not find any instances of "flood-list", "Flood-List", or "Flood-list" in any published RFC to date. May we change these instances to "flooding list" as used elsewhere in this document, and also per draft-ietf-bess-evpn-bum-procedure-updates? If not, how can the distinction between these two terms be clarified for readers? --> <!-- [rfced] Please review the "Inclusive Language" portion of the online Style Guide at <https://www.rfc-editor.org/styleguide/part2/#inclusive_language>, and let us know if any changes are needed. Note that our script did not flag any words in particular, but this should still be reviewed as a best practice. --> </back> </rfc>