Versatile Routing and Services with BGP: Understanding and Implementing BGP in SR-OS (2014)

Chapter 8. Graceful Restart and Error Handling

Graceful Restart is a mechanism applied to many protocols. It relies on the fact that modern routers separate the control plane (RIB) from the data plane (FIB). Given this separation, a router undergoing a control-plane restart can keep its forwarding state intact during the restart, which significantly reduces the impact of the restart. Equally important, however, is that neighbors of the restarting router do not tear down protocol adjacencies and that they retain any routes learned from the restarting router. When the restart has completed, the restarting router also relies on its neighbors to readvertise all prefixes previously advertised to it so that it can repopulate its RIB-IN. This is accomplished using the Graceful Restart mechanisms, which vary on a per-protocol basis, although the capability is generically referred to as Non-Stop Forwarding (NSF).

An alternative method is Non-Stop Routing (NSR), in which redundant Route Processors (CPMs) within a router are fully synchronized so that if the active processor undergoes a control-plane restart, the standby processor can assume the active state immediately, without interruption and transparently to adjacent neighbors. It follows that an NSR-capable router does not rely on its Graceful Restart capable neighbors to readvertise prefixes after a restart, because the synchronized Route Processors ensure that any processor restart is transparent to neighbors. However, implementing Graceful Restart is still useful even for NSR-capable routers. Such a router may not need help from its neighbors during a processor restart, but it must understand Graceful Restart procedures so that it can assist a non-NSR-capable router undergoing a restart. This is essentially the approach taken by SR-OS.

Graceful Restart Mechanism

The Graceful Restart mechanism for BGP is defined in RFC 4724 and extended in draft-ietf-idr-bgp-gr-notification. In order for peers to agree on the use of Graceful Restart, it is negotiated as a capability during an OPEN exchange using a capability value encoded as shown in Figure 8-1.

Figure 8-1 Graceful Restart Capability

image

The most significant bit of the Restart Flags is the Restart State (R) bit, which when set (1) indicates that the BGP speaker has restarted. The second most significant bit is the Notification (N) bit, which indicates the support of Graceful Restart for NOTIFICATION messages discussed later in this chapter.

The Restart Time is the approximate time in seconds it takes for the restarting router to reestablish the BGP session. The AFI and SAFI fields indicate that Graceful Restart is supported for routes of that AFI/SAFI type. The most significant bit in the Flags for Address Family field contains the Forwarding State (F) bit, which when set (1) indicates that the forwarding state for routes of this AFI/SAFI was preserved during the previous BGP restart. Only one instance of the capability should appear in the capability advertisement, but as shown in Figure 8-1, fields are repeated for support of multiple Address Families.
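The field layout described above can be sketched in a few lines of Python. This is only an illustration of the encoding, not SR-OS code; the names GracefulRestartCapability and decode_gr_capability are hypothetical, and the input is assumed to be the capability value only (the bytes that follow the capability code and length).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GracefulRestartCapability:
    restarted: bool      # R bit: the speaker has restarted
    notification: bool   # N bit: Graceful Restart for NOTIFICATION supported
    restart_time: int    # 12-bit Restart Time in seconds
    families: List[Tuple[int, int, bool]]  # (AFI, SAFI, F bit) per family


def decode_gr_capability(value: bytes) -> GracefulRestartCapability:
    """Decode the value field of a Graceful Restart capability."""
    # Two bytes of flags/time, then zero or more 4-byte AFI/SAFI/flags tuples.
    if len(value) < 2 or (len(value) - 2) % 4 != 0:
        raise ValueError("capability value must be 2 + 4*n bytes long")
    first_two = int.from_bytes(value[:2], "big")
    restarted = bool(first_two & 0x8000)     # most significant bit
    notification = bool(first_two & 0x4000)  # second most significant bit
    restart_time = first_two & 0x0FFF        # remaining low 12 bits
    families = []
    for offset in range(2, len(value), 4):
        afi = int.from_bytes(value[offset:offset + 2], "big")
        safi = value[offset + 2]
        forwarding_preserved = bool(value[offset + 3] & 0x80)  # F bit
        families.append((afi, safi, forwarding_preserved))
    return GracefulRestartCapability(restarted, notification,
                                     restart_time, families)


cap = decode_gr_capability(bytes([0x81, 0x2C, 0x00, 0x01, 0x01, 0x80]))
print(cap.restarted, cap.restart_time, cap.families)
```

The sample bytes are the ones that appear in the restart debug output later in this chapter: the R bit is set, the Restart Time is 0x12c (300 seconds), and the single AFI/SAFI tuple is IPv4 unicast with the F bit set.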

The Graceful Restart mechanism also defines an End-of-RIB (EOR) marker, which a BGP speaker sends to a peer to indicate that it has completed its initial routing updates after the session is established. This allows a router that has just restarted to receive all routes from all of its peers before it runs best-path selection. The EOR marker is defined as an UPDATE message with no NLRI and empty withdrawn NLRI for the IPv4 Address Family, or an UPDATE message that contains only the MP_UNREACH_NLRI attribute with no withdrawn routes for any other Address Family.
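As a rough illustration of the EOR definition, the following Python sketch tests whether an UPDATE is an EOR marker. The function name end_of_rib_family is hypothetical, and the input is assumed to be the UPDATE body with the 19-byte BGP message header already removed. MP_UNREACH_NLRI is path attribute type 15.

```python
from typing import Optional, Tuple

MP_UNREACH_NLRI = 15         # path attribute type code
EXTENDED_LENGTH_FLAG = 0x10  # attribute length is two bytes when set


def end_of_rib_family(update_body: bytes) -> Optional[Tuple[int, int]]:
    """Return (AFI, SAFI) if the UPDATE body is an EOR marker, else None."""
    if len(update_body) < 4:
        return None  # too short to hold both length fields
    withdrawn_len = int.from_bytes(update_body[0:2], "big")
    if withdrawn_len != 0:
        return None  # an EOR marker never withdraws IPv4 routes
    attr_len = int.from_bytes(update_body[2:4], "big")
    attrs = update_body[4:4 + attr_len]
    nlri = update_body[4 + attr_len:]
    if len(attrs) != attr_len or nlri:
        return None  # truncated attributes, or NLRI present
    if attr_len == 0:
        return (1, 1)  # IPv4 unicast: nothing withdrawn, no attributes, no NLRI
    # Any other family: the only attribute is MP_UNREACH_NLRI with no routes.
    if len(attrs) < 3:
        return None
    flags, type_code = attrs[0], attrs[1]
    if flags & EXTENDED_LENGTH_FLAG:
        if len(attrs) < 4:
            return None
        value_len = int.from_bytes(attrs[2:4], "big")
        value = attrs[4:]
    else:
        value_len = attrs[2]
        value = attrs[3:]
    # The value must be exactly AFI (2 bytes) + SAFI (1 byte), nothing more.
    if type_code != MP_UNREACH_NLRI or value_len != 3 or len(value) != 3:
        return None
    return (int.from_bytes(value[0:2], "big"), value[2])


print(end_of_rib_family(bytes(4)))  # IPv4 EOR: four zero bytes
```

For IPv4 the whole body is four zero bytes, which corresponds to an UPDATE whose Withdrawn Length and Total Path Attr Length are both 0.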

During the restart, the neighbor of the restarting router marks routes from the restarting router as stale. It also runs an internal timer (a “stale-routes timer”) and if the restarting router has not reestablished the BGP session and readvertised its routes before expiration of this timer, the neighboring router deletes the routes marked as stale.
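The helper-side bookkeeping can be modelled with a small Python class. This is a simplified sketch, not the SR-OS implementation: the class and method names are hypothetical, time is passed in explicitly so the behavior is easy to follow, and the flush of still-stale routes when the EOR marker arrives follows RFC 4724 rather than anything stated above.

```python
from typing import Dict, Optional, Set


class HelperRib:
    """Routes learned from one peer, as seen by a Graceful Restart helper."""

    def __init__(self, stale_routes_time: float) -> None:
        self.stale_routes_time = stale_routes_time
        self.routes: Dict[str, str] = {}       # prefix -> next hop
        self.stale: Set[str] = set()           # prefixes not yet refreshed
        self.deadline: Optional[float] = None  # when stale routes are flushed

    def learn(self, prefix: str, next_hop: str) -> None:
        """Install or refresh a route; a refreshed route is no longer stale."""
        self.routes[prefix] = next_hop
        self.stale.discard(prefix)

    def peer_restarting(self, now: float) -> None:
        """Session lost: keep every route, mark it stale, start the timer."""
        self.stale = set(self.routes)
        self.deadline = now + self.stale_routes_time

    def end_of_rib(self) -> None:
        """EOR received: drop whatever the peer did not readvertise."""
        self._flush_stale()

    def tick(self, now: float) -> None:
        """Timer check: flush stale routes once the deadline has passed."""
        if self.deadline is not None and now >= self.deadline:
            self._flush_stale()

    def _flush_stale(self) -> None:
        for prefix in self.stale:
            del self.routes[prefix]
        self.stale.clear()
        self.deadline = None


rib = HelperRib(stale_routes_time=300)
rib.learn("172.16.0.0/24", "192.168.0.2")
rib.peer_restarting(now=0)
print(rib.routes, rib.stale)  # route is kept for forwarding, but marked stale
```

The point of the model is that a stale route stays in the table and keeps forwarding traffic; only the timer expiry or an EOR marker that arrives without a refresh removes it.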

To aid understanding of the Graceful Restart process, I'll step through the process of a restarting router using the simple topology depicted in Figure 8-2, where router R1 is EBGP peering with tester T2 and learning prefixes 172.16.0.0/24 to 172.16.99.0/24 from that peer. Router R1 is receiving traffic from tester T1 with a destination address of 172.16.0.1, which R1 is forwarding to tester T2 based on the prefix learned from T2. Tester T2 is the restarting router, and I'll show how router R1 deals with that restart when Graceful Restart is enabled.

Figure 8-2 Graceful Restart Test Topology

image

The configuration required to support Graceful Restart at router R1 is shown in Output 8-1 and consists of the command graceful-restart followed by a stale-routes-time in seconds, which defines the maximum time that routes are marked as stale during the restart, after which they are flushed. Output 8-1 also shows a restart-time set to 300 seconds. This is the Restart Time advertised in the Graceful Restart capability, and is configured with an explicit value at group level for no other reason than to assist in correlating the capability exchange decoding below. With the inherent NSR capabilities in SR-OS, there should be no reason to require a restart.

Output 8-1: Graceful Restart Configuration

        bgp

            group "EBGP"

                graceful-restart

                    stale-routes-time 300

                    restart-time 300

                exit

                neighbor 192.168.0.2

                    family ipv4 ipv6

                    peer-as 64510

                    split-horizon

                exit

            exit

First you can look at the OPEN message exchange between router R1 and tester T2 beginning with T2's OPEN message in Debug 8-1. Of interest here is the Graceful Restart capability field, which can be decoded as follows:

0x01: Restart Flags (most significant 4 bits only) not set, indicating that the router has not restarted.

0x01 0x2c: Restart Time (12 bits consisting of second byte and least significant 4 bits of first byte). Four bits of first byte = 0x01 (decimal 256), second byte = 0x2c (decimal 44), indicating a Restart Time of 300 seconds.

0x0 0x1: AFI (2 bytes), IP version 4.

0x1: SAFI (1 byte), Unicast Forwarding.

0x0: Flags for AFI/SAFI, indicating that the Forwarding State bit is clear.

Debug 8-1: Tester T2 OPEN Message

32 2013/03/01 12:09:16.07 BST MINOR: DEBUG #2001 vprn6200 BGP

"BGP: OPEN

Peer 93: 192.168.0.2 - Received BGP OPEN: Version 4

   AS Num 64510: Holdtime 90: BGP_ID 192.168.0.2: Opt Length 18

   Opt Para: Type CAPABILITY: Length = 16: Data:

     Cap_Code MP-BGP: Length 4

       Bytes: 0x0 0x1 0x0 0x1

     Cap_Code ROUTE-REFRESH: Length 0

     Cap_Code GRACEFUL-RESTART: Length 6

       Bytes: 0x1 0x2c 0x0 0x1 0x1 0x0

"

The Graceful Restart capability field sent by router R1 in its OPEN message shown in Debug 8-2 differs because although it contains Restart Flags and Restart Time fields, it does not contain any AFI/SAFI fields or Flags for AFI/SAFI fields. Sending AFI/SAFI fields in the OPEN message when a BGP session is initially established is not strictly required, so they are not present. They must, however, always be present during the Graceful Restart process when a restarting router sends its OPEN message in an attempt to reestablish the BGP session.

0x01: Restart Flags (most significant 4 bits only) not set, indicating that the router has not restarted.

0x01 0x2c: Restart Time (12 bits consisting of second byte and least significant 4 bits of first byte). Four bits of first byte = 0x01 (decimal 256), second byte = 0x2c (decimal 44), indicating a Restart Time of 300 seconds.

After exchange of Keepalive messages, the BGP session between R1 and T2 moves to the Established state and T2 advertises 100 x IPv4 NLRI from 172.16.0.0/24 to 172.16.99.0/24 followed by the EOR marker. Similarly, R1 sends an EOR marker, and as previously described for the IPv4 Address Family this is an UPDATE message with no NLRI and empty withdrawn NLRI as shown in Debug 8-3.

Debug 8-2: Router R1 OPEN Message

30 2013/03/01 12:09:16.07 BST MINOR: DEBUG #2001 BGP

"BGP: OPEN

Peer 93: 192.168.0.2 - Send (Passive) BGP OPEN: Version 4

   AS Num 64496: Holdtime 90: BGP_ID 10.46.46.46: Opt Length 20

   Opt Para: Type CAPABILITY: Length = 18: Data:

     Cap_Code GRACEFUL-RESTART: Length 2

       Bytes: 0x1 0x2c

     Cap_Code MP-BGP: Length 4

       Bytes: 0x0 0x1 0x0 0x1

     Cap_Code ROUTE-REFRESH: Length 0

     Cap_Code 4-OCTET-ASN: Length 4

       Bytes: 0x0 0x0 0xfa 0x1

Debug 8-3: Router R1 EOR Marker

41 2013/03/01 12:09:17.09 BST MINOR: DEBUG #2001 Peer 93: 192.168.0.2

"Peer 93: 192.168.0.2: UPDATE

Peer 93: 192.168.0.2 - Send BGP UPDATE:

    Withdrawn Length = 0

    Total Path Attr Length = 0

Traffic can now pass between tester T1 and tester T2 through router R1. To simulate a restarting router, the TCP session supporting the EBGP peering between R1 and T2 is disabled at tester T2, causing router R1 to register a TCP socket error and move the BGP session from the established state to the idle state. At this point, router R1 enters Graceful Restart helper mode.

Debug 8-4: Router R1 Entering Graceful Restart Helper Mode

1 2013/03/01 16:42:15.03 BST MINOR: DEBUG #2001 BGP

"BGP: STATE

Peer 93: 192.168.0.2 - Change State from ESTABLISHED to IDLE due to TCP SOCKET ERROR

"

2 2013/03/01 16:42:15.04 BST MINOR: DEBUG #2001 vprn6200 BGP

"BGP: RESTART

Peer VR 93: Group EBGP-GROUP: Peer 192.168.0.2: entering helper mode due to reason tcp_error

"

You can verify the Graceful Restart status using the command show router bgp neighbor graceful-restart as shown in Output 8-2. Note that the prefixes 172.16.0.0/24 to 172.16.99.0/24 are still intact in the RIB and FIB, but are internally marked as stale. It follows that the traffic from tester T1 to tester T2 is still being forwarded by router R1.

The TCP session at tester T2 is then reenabled and a new BGP OPEN message is received at router R1 from tester T2 as shown in Debug 8-5. The Graceful-Restart capability code from tester T2 is decoded as follows:

0x81: Restart Flags (most significant 4 bits only) with the Restart State bit set, indicating that the router has restarted.

0x81 0x2c: Restart Time (12 bits consisting of second byte and least significant 4 bits of first byte). Four bits of first byte = 0x01 (decimal 256), second byte = 0x2c (decimal 44), indicating a Restart Time of 300 seconds.

0x0 0x1: AFI (2 bytes), IP version 4.

0x1: SAFI (1 byte), Unicast Forwarding.

0x80: Flags for AFI/SAFI, indicating that the Forwarding State bit is set.

In summary, the restarting router is notifying the receiving router that it has undergone a restart, and that during that restart the forwarding plane was kept intact. As expected, the OPEN message from R1 is essentially the same as the original BGP session establishment.

Output 8-2: Router R1 Graceful Restart Status

*A:R1# show router bgp neighbor 192.168.0.2 graceful-restart

==================================================================

BGP Neighbor 192.168.0.2 Graceful Restart

==================================================================

Graceful Restart locally configured for peer: Enabled

Peer's Graceful Restart feature             : Enabled

NLRI(s) that peer supports restart for      : IPv4-Unicast

NLRI(s) that peer saved forwarding for      : None

NLRI(s) that restart is negotiated for      : None

NLRI(s) of received end-of-rib markers      : None

NLRI(s) of all end-of-rib markers sent      : None

Restart time locally configured for peer    : 120 seconds

Restart time requested by the peer          : 300 seconds

Time stale routes from peer are kept for    : 300 seconds

Graceful restart status on the peer         : Rcvd restart request

Number of Restarts                          : 10

Last Restart at                             : 03/01/2013 16:42:15

===================================================================

Debug 8-5: Tester T2 OPEN Message following Restart

6 2013/03/01 16:43:32.73 BST MINOR: DEBUG #2001 vprn6200 BGP

"BGP: OPEN

Peer 93: 192.168.0.2 - Received BGP OPEN: Version 4

   AS Num 64510: Holdtime 90: BGP_ID 192.168.0.2: Opt Length 18

   Opt Para: Type CAPABILITY: Length = 16: Data:

     Cap_Code MP-BGP: Length 4

       Bytes: 0x0 0x1 0x0 0x1

     Cap_Code ROUTE-REFRESH: Length 0

     Cap_Code GRACEFUL-RESTART: Length 6

       Bytes: 0x81 0x2c 0x0 0x1 0x1 0x80

"

When Keepalive messages have been exchanged, the BGP session moves to the established state and T2 sends a BGP UPDATE message containing 100 x IPv4 NLRI from 172.16.0.0/24 to 172.16.99.0/24. Because these routes have now been readvertised (refreshed) by the restarting router, R1 removes the stale marker on these prefixes. However, router R1 cannot exit the Graceful Restart process until it has received the EOR marker from tester T2. When this has been received, R1 exits the Graceful Restart process.

Debug 8-6: Router R1 exits Graceful Restart helper mode

16 2013/03/01 16:43:52.72 BST MINOR: DEBUG #2001 Peer 93: 192.168.0.2

"Peer 93: 192.168.0.2: UPDATE

Peer 93: 192.168.0.2 - Received BGP UPDATE:

    Withdrawn Length = 0

    Total Path Attr Length = 0

"

18 2013/03/01 16:43:52.72 BST MINOR: DEBUG #2001 BGP

"BGP: RESTART

Peer 93: 192.168.0.2: Received EOR marker for AFI/SAFI ipv4

"

21 2013/03/01 16:43:52.72 BST MINOR: DEBUG #2001 BGP

"BGP: RESTART

Peer 93: 192.168.0.2: exit helper mode due to reason end-of-rib received

"

Finally, you can verify that R1 has exited Graceful Restart helper mode.


SR-OS provides Graceful Restart support for the IPv4 and VPN-IPv4 Address Families.

Output 8-3: Router R1 Graceful Restart Status

*A:R1# show router bgp neighbor 192.168.0.2 graceful-restart

==================================================================

BGP Neighbor 192.168.0.2 Graceful Restart

==================================================================

Graceful Restart locally configured for peer: Enabled

Peer's Graceful Restart feature             : Enabled

NLRI(s) that peer supports restart for      : IPv4-Unicast

NLRI(s) that peer saved forwarding for      : IPv4-Unicast

NLRI(s) that restart is negotiated for      : None

NLRI(s) of received end-of-rib markers      : IPv4-Unicast

NLRI(s) of all end-of-rib markers sent      : None

Restart time locally configured for peer    : 120 seconds

Restart time requested by the peer          : 300 seconds

Time stale routes from peer are kept for    : 300 seconds

Graceful restart status on the peer         : Restart completed

Number of Restarts                          : 10

Last Restart at                             : 03/01/2013 16:42:15

==================================================================

Error Handling

BGP has evolved from being simply an Exterior Gateway Protocol (EGP) to a protocol that operators use not just for inter-AS route exchange, but also for delivery of a number of intra-AS services such as IP-VPNs (RFC 4364), VPLS (RFC 4761), and VPWS (RFC 6624), as well as other value-added services including Multicast and IPv6. These services are typically high-profile and high-revenue, and their consumers have high expectations with regard to availability and reconvergence upon failure. In addition, BGP can be used as a label distribution protocol for building infrastructure LSPs, and the integrity of these LSPs is paramount to the delivery of these services.

The original BGP specification mandated that if any errors are detected in an UPDATE message, the receiving BGP speaker should send a NOTIFICATION message to the peer, which in turn meant that the session was torn down and any routes learned from that peer were removed. This procedure is disruptive, and was based on early assumptions that if a BGP session was taken down, an alternate path through another AS would be available. However, with the increase in services delivered by BGP (most of them intra-AS), its survivability has become critical, and therefore some changes to the original specification were required with regard to error handling. These updates are specified in draft-ietf-grow-ops-reqs-for-bgp-error-handling, which defines the requirements for increased protocol robustness and outlines the proposals for error handling while capturing the work of a number of previous proposals on improved error handling. Within this specification, error handling for UPDATE messages is decomposed into two categories: critical errors and non-critical errors.

Critical errors pertain to UPDATE messages where the NLRI cannot be extracted from the UPDATE because of, for example, message length errors. Because the NLRI cannot be extracted, a NOTIFICATION message must be sent to the advertising router. However, in an attempt to maintain forwarding during the session reset, the Graceful Restart mechanism is extended so that it can be triggered by a NOTIFICATION message, which as outlined previously is negotiated as a Graceful Restart capability by setting the “N” bit of the Restart Flags field.

When a BGP session enters the Graceful Restart process as a result of a critical error, there is a reasonable chance that the same error will occur again when the session is being reestablished. Reestablishing the BGP session and readvertising the RIB-OUT places some demand on CPU resources, so these attempts cannot be allowed to continue unbounded. Therefore, if a critical error/NOTIFICATION message has caused the router to enter the Graceful Restart process and another UPDATE message with a critical error is received as the session is being reestablished, the Graceful Restart process is immediately terminated without any further attempt to preserve forwarding state. To abort the process, a NOTIFICATION message is sent with the Cease Error Code and a Hard Reset Error subcode, indicating that the BGP session must be terminated.

Non-critical errors are errors where the NLRI can be extracted from the UPDATE message. Examples include invalid length errors in path attributes, missing mandatory attributes, and UPDATE messages containing attributes that do not relate to the advertised NLRI. In the event of a non-critical error, a BGP speaker does not send a NOTIFICATION message, but instead should treat the NLRI as though it had been withdrawn and remove it from the RIB-IN. This behavior is referred to as “treat-as-withdraw” and has no impact on the integrity of the BGP session. To increase the likelihood of being able to extract the NLRI, all Address Families (including IPv4 unicast) should use the MP_REACH_NLRI and MP_UNREACH_NLRI attributes, and this attribute should always be the first in an UPDATE message.

The error handling enhancements for non-critical errors are enabled in SR-OS at the BGP, group, and neighbor level within both the base instance and VPRN instances of BGP. The error-handling context allows for the addition of the update-fault-tolerance command, which enables the treat-as-withdraw handling of non-critical errors. Enhancements to critical error handling that allow NOTIFICATION-triggered Graceful Restart are enabled at group and neighbor level using the enable-notification command within the graceful-restart context.

Output 8-4: Error Handling Configuration

        bgp

            group "IBGP"

                graceful-restart

                    enable-notification

                exit

                error-handling

                    update-fault-tolerance

                exit

            exit

Note that if the enable-notification command is enabled (and the peer has indicated its support of Graceful Restart for NOTIFICATION messages), but the update-fault-tolerance command is not, a non-critical error still causes a NOTIFICATION message to be sent, but Graceful Restart procedures are initiated.
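The interaction between the two commands can be summarized as a small decision function. The Python sketch below only restates the rules described in this section. The names Action and update_error_action are hypothetical. The case of a non-critical error arriving while a NOTIFICATION-triggered restart is already in progress is not covered by the text, so the result in that case is an assumption.

```python
from enum import Enum


class Action(Enum):
    TREAT_AS_WITHDRAW = "treat NLRI as withdrawn, keep the session"
    NOTIFICATION_WITH_GR = "send NOTIFICATION, preserve forwarding via Graceful Restart"
    NOTIFICATION_RESET = "send NOTIFICATION, tear down the session"
    HARD_RESET = "send Cease/Hard Reset, abandon Graceful Restart"


def update_error_action(nlri_extractable: bool,
                        update_fault_tolerance: bool,
                        notification_gr_negotiated: bool,
                        already_in_notification_gr: bool = False) -> Action:
    """Pick the reaction to a malformed UPDATE message."""
    critical = not nlri_extractable  # critical: the NLRI cannot be extracted
    if not critical and update_fault_tolerance:
        return Action.TREAT_AS_WITHDRAW
    if critical and already_in_notification_gr:
        # A second critical error while the session is being reestablished.
        return Action.HARD_RESET
    if notification_gr_negotiated:
        # Both peers set the N bit (enable-notification on this side).
        return Action.NOTIFICATION_WITH_GR
    return Action.NOTIFICATION_RESET


# Non-critical error, enable-notification negotiated, no update-fault-tolerance.
print(update_error_action(True, False, True).value)
```

With update-fault-tolerance enabled, a non-critical error never reaches the NOTIFICATION branches at all, which is why the two features are complementary rather than alternatives.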

Referring again to the topology in Figure 8-2 and using the configuration from Output 8-4 applied to router R1, you can verify that the NOTIFICATION-triggered Graceful Restart functions as expected. Remember that this Graceful Restart extension has to be negotiated as a capability during an OPEN exchange by setting the “N” bit of the Graceful Restart capability Restart Flags field. Debug 8-7 shows the OPEN message from R1 where the first four bits of the first byte (0x41) indicate that the “N” bit is set. This is followed in Debug 8-8 by the OPEN message from tester T2 with the same bit set. The fact that the peer has actually signaled its support for the Graceful Restart for NOTIFICATION extension can be confirmed in Output 8-5.

Debug 8-7: OPEN Message from R1 with NOTIFICATION Graceful Restart Enabled

23 2013/07/26 08:55:31.42 BST MINOR: DEBUG #2001 Base BGP

"BGP: OPEN

Peer 1: 192.168.0.2 - Send (Active) BGP OPEN: Version 4

   AS Num 64496: Holdtime 90: BGP_ID 10.46.46.46: Opt Length 24

   Opt Para: Type CAPABILITY: Length = 22: Data:

     Cap_Code GRACEFUL-RESTART: Length 6

       Bytes: 0x41 0x2c 0x0 0x1 0x1 0x0

     Cap_Code MP-BGP: Length 4

       Bytes: 0x0 0x1 0x0 0x1

     Cap_Code ROUTE-REFRESH: Length 0

     Cap_Code 4-OCTET-ASN: Length 4

       Bytes: 0x0 0x0 0x11 0xed

"

Debug 8-8: OPEN message from T2 with NOTIFICATION Graceful Restart Enabled

24 2013/07/26 08:55:31.41 BST MINOR: DEBUG #2001 Base BGP

"BGP: OPEN

Peer 1: 146.135.235.2 - Received BGP OPEN: Version 4

   AS Num 64510: Holdtime 90: BGP_ID 192.168.0.2: Opt Length 16

   Opt Para: Type CAPABILITY: Length = 14: Data:

     Cap_Code MP-BGP: Length 4

       Bytes: 0x0 0x1 0x0 0x1

     Cap_Code GRACEFUL-RESTART: Length 6

       Bytes: 0x41 0x2c 0x0 0x1 0x1 0x0

"

Output 8-5: Graceful Restart For NOTIFICATION Capability

*A:R1# show router bgp neighbor 146.135.235.2 graceful-restart

==========================================================================

BGP Neighbor 146.135.235.2 Graceful Restart

==========================================================================

Graceful Restart locally configured for peer: Enabled

Peer's Graceful Restart feature             : Enabled

NLRI(s) that peer supports restart for      : IPv4-Unicast

NLRI(s) that peer saved forwarding for      : None

NLRI(s) that restart is negotiated for      : IPv4-Unicast

NLRI(s) of received end-of-rib markers      : None

NLRI(s) of all end-of-rib markers sent      : IPv4-Unicast

NLRI(s) peer supports NOTIFICATION GR for   : IPv4-Unicast

Restart time locally configured for peer    : 300 seconds

Restart time requested by the peer          : 90 seconds

Time stale routes from peer are kept for    : 360 seconds

Graceful restart status on the peer         : Not currently being helped

Number of Restarts                          : 3

Last Restart at                             : 07/08/2013 09:06:21

==========================================================================

To check that forwarding state is preserved throughout, traffic is forwarded from tester T1 to tester T2 while a NOTIFICATION message (error code Cease/error sub-code Invalid Network Field) is generated from tester T2 toward R1. Debug 8-9 shows the NOTIFICATION message being received at R1 followed by the router entering Graceful Restart helper mode.

Debug 8-9: Notification Triggered Graceful Restart

246 2013/07/08 11:09:53.67 BST MINOR: DEBUG #2001 Base Peer 1: 192.168.0.2

"Peer 1: 192.168.0.2: NOTIFICATION

Peer 1: 192.168.0.2 - Received BGP NOTIFICATION: Code = 6 (CEASE) Subcode = 10 (Unknown)

  Data Length = 16  Data: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0

"

247 2013/07/08 11:09:53.66 BST MINOR: DEBUG #2001 Base BGP

"BGP: RESTART

Peer VR 1: Group EBGP-IxN2X: Peer 192.168.0.2: entering helper mode due to reason received_notification

"

After entering Graceful Restart helper mode because of the NOTIFICATION being received, router R1 attempts to restore the session by initiating an OPEN exchange (not shown). In this example, the exchange and readvertisement of NLRI is successful, and router R1 exits the helper mode when the EOR marker is received from tester T2. As expected, forwarding state is preserved throughout.

You can qualify the handling of non-critical errors by sending an UPDATE message from tester T2 to router R1 with a missing mandatory attribute; in this case the AS_PATH attribute.

Debug 8-10: UPDATE With Missing AS_PATH

281 2013/07/08 11:37:14.00 BST MINOR: DEBUG #2001 Base Peer 1: 192.168.0.2

"Peer 1: 192.168.0.2: UPDATE

Peer 1: 192.168.0.2 - Received BGP UPDATE:

    Withdrawn Length = 0

    Total Path Attr Length = 11

    Flag: 0x40 Type: 1 Len: 1 Origin: 0

    Flag: 0x40 Type: 3 Len: 4 Nexthop: 192.168.0.2

    NLRI: Length = 4

        172.16.99.0/24

"

Without the updated error handling enhancements, such an UPDATE message would have triggered a NOTIFICATION message to be sent by R1 and the BGP session would be torn down, but with update-fault-tolerance enabled, the integrity of the BGP session is maintained and the UPDATE is simply treated as a withdraw and not held in RIB-IN. Note that if an UPDATE is treated as a withdraw, SR-OS generates an error message containing the affected peer and prefix as shown in Debug 8-11 and additionally increments an “Update Errors” counter in the show router bgp neighbor command as shown in Output 8-6.
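The check that classifies the UPDATE in Debug 8-10 as malformed can be illustrated with a short Python sketch. It walks a block of path attributes and reports which well-known mandatory attributes are absent. The function names are hypothetical, and the attribute bytes are reconstructed from the decoded debug output (ORIGIN of 0 and a NEXT_HOP of 192.168.0.2, 11 bytes in total) rather than captured from the wire.

```python
from typing import List, Set

# Well-known mandatory attributes for an UPDATE that carries IPv4 NLRI.
MANDATORY = {1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP"}
EXTENDED_LENGTH_FLAG = 0x10  # attribute length is two bytes when set


def attribute_types(attrs: bytes) -> List[int]:
    """Return the type codes found in a block of path attributes."""
    types = []
    pos = 0
    while pos < len(attrs):
        if pos + 3 > len(attrs):
            raise ValueError("truncated attribute header")
        flags, type_code = attrs[pos], attrs[pos + 1]
        if flags & EXTENDED_LENGTH_FLAG:
            if pos + 4 > len(attrs):
                raise ValueError("truncated extended length")
            length = int.from_bytes(attrs[pos + 2:pos + 4], "big")
            pos += 4
        else:
            length = attrs[pos + 2]
            pos += 3
        if pos + length > len(attrs):
            raise ValueError("attribute overruns the attribute block")
        types.append(type_code)
        pos += length  # skip the attribute value
    return types


def missing_mandatory(attrs: bytes) -> Set[str]:
    """Names of the mandatory attributes that are not present."""
    present = set(attribute_types(attrs))
    return {name for code, name in MANDATORY.items() if code not in present}


# ORIGIN (type 1, length 1) followed by NEXT_HOP (type 3, length 4).
attrs = bytes([0x40, 1, 1, 0, 0x40, 3, 4, 192, 168, 0, 2])
print(missing_mandatory(attrs))
```

Because the attribute block itself parses cleanly and the NLRI that follows it is intact, the error is non-critical: the prefix can still be identified and treated as withdrawn.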

Debug 8-11: Missing Mandatory Attribute with Treat As Withdraw

1224 2013/10/03 14:53:31.53 BST WARNING: BGP #2029 Base Peer 1: 192.168.0.2

"1: BGP Peer: 192.168.0.2, Route: 172.16.99.0/24 withdrawn because of error in BGP update message."

Output 8-6: Update Errors Counter

*A:R1# show router bgp neighbor 192.168.0.2 | match "Update Errors"

Damp Peer Oscillatio*: Disabled         Update Errors        : 1  

In addition to the error handling enhancements for critical and non-critical errors, SR-OS supports the capability to hold a BGP session in an idle state for exponentially increasing amounts of time to reduce the impact of continually trying to reestablish a BGP session. This exponential back-off logic is activated whenever a BGP session transitions from a non-idle state to an idle state and is enabled using the damp-peer-oscillations command at BGP, group, and neighbor levels (both within the base instance and within VPRN instances of BGP). The damp-peer-oscillations command has optional parameters that control the amount of time the session is held in idle state after a transition from a non-idle state, known as the idle-hold-time. The initial-wait defines the idle-hold-time (in seconds) after the first transition to idle state. Subsequent idle-hold-times start at the second-wait value and increase exponentially until the max-wait is reached. In addition to the parameters used to control the idle-hold-time, the error-interval defines the amount of time (in minutes) that must pass without any transitions before all back-off logic timers are reset.

The damp-peer-oscillations feature behaves completely independently of any error-handling functions described earlier in this section. However, if error-handling is enabled, it may be beneficial to enable damp-peer-oscillations to control, for example, a BGP session that is trying to reestablish with a recurring critical error.

Output 8-7: Damp-Peer-Oscillation Configuration

        bgp

            damp-peer-oscillations idle-hold-time 0 120 1600 error-interval 32

            exit
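To give a feel for how the idle-hold-time grows with these values, the following Python sketch generates a back-off schedule. It is an illustration only: the function name idle_hold_schedule is hypothetical, and the doubling of the hold time after each further transition is an assumption about how the exponential increase is applied, since the exact growth factor is not stated here.

```python
from typing import List


def idle_hold_schedule(initial_wait: int, second_wait: int, max_wait: int,
                       transitions: int) -> List[int]:
    """Idle-hold-time applied after each successive transition to idle."""
    schedule = []
    hold = second_wait
    for n in range(transitions):
        if n == 0:
            # First transition to idle uses the initial-wait value.
            schedule.append(initial_wait)
        else:
            # Later transitions back off from second-wait, capped at max-wait.
            schedule.append(min(hold, max_wait))
            hold *= 2  # assumed growth factor
    return schedule


# Values taken from the damp-peer-oscillations configuration above.
print(idle_hold_schedule(0, 120, 1600, 7))
```

Under that assumption the configured values give hold times of 0, 120, 240, 480, and 960 before the max-wait of 1600 caps every later attempt, and the sequence starts again from the initial-wait once the error-interval passes without a transition.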