Summary
This chapter describes the dual controller redundancy configuration that allows two TD instances to synchronize traffic engineering config with each other.
Controller redundancy overview
Starting from 1.5, it is possible to synchronize parts of the configuration between two instances of TD. SR-TE requires no state synchronization, and config synchronization can be achieved without this feature, e.g. by configuring a network automation tool to push the same config to multiple TD instances.
However, if the operator prefers to configure TD manually using the CLI or GUI, this feature is useful: any SR-TE config change on one controller is automatically replicated to the other.
Starting from 1.5, the following config sections are synchronized:
- traffic-eng affinities
- traffic-eng policies
- traffic-eng peer-groups
- traffic-eng explicit-paths
Starting from 1.8, all config is synchronized, with the exception of:
1. SR-TE distinguisher. The backup TD preserves its configured SR-TE distinguisher during config sync. If the backup TD has the SR-TE distinguisher set to "1" (the default value), it takes the master TD's SR-TE distinguisher and increments it by 1, so that the two TDs have different distinguishers. See [1], [2] for more details.
2. BGP router-id. The backup TD preserves its configured BGP router-id during config sync. If none is configured, the backup TD's BGP process will be inactive.
3. PCEP init-delay. The backup TD preserves its configured PCEP init-delay during config sync. If none is configured, it takes the master's init-delay and adds or subtracts 5 seconds (depending on the value) to keep the two values different, as in the sketch after this list.
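The adjustment rules for the distinguisher (item 1) and init-delay (item 3) can be sketched in Python as follows. This is a minimal illustration of the rules described above, not TD's implementation; all names are hypothetical, and the choice to subtract when the master's value is large enough is an assumption.

# Minimal sketch of the backup-side adjustment rules described above.
# Illustrative only; names are hypothetical and this is not TD code.

DEFAULT_DISTINGUISHER = 1
INIT_DELAY_OFFSET = 5  # seconds

def backup_distinguisher(configured: int, master_value: int) -> int:
    # A configured (non-default) distinguisher is preserved; the default
    # is replaced with the master's distinguisher + 1 so the two differ.
    if configured != DEFAULT_DISTINGUISHER:
        return configured
    return master_value + 1

def backup_init_delay(configured, master_value: int) -> int:
    # A configured init-delay is preserved; otherwise take the master's
    # value and shift it by 5 seconds (direction assumed here:
    # subtract when possible, else add).
    if configured is not None:
        return configured
    if master_value >= INIT_DELAY_OFFSET:
        return master_value - INIT_DELAY_OFFSET
    return master_value + INIT_DELAY_OFFSET

print(backup_distinguisher(1, 100))  # default "1": derives 101 from master's 100
print(backup_init_delay(None, 10))   # no init-delay configured: 5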
New config commands:
management redundancy key <key> role [master|backup] neighbor <ipv4|ipv6>
Typical redundancy designs
BGP-LU and PCEP require a session with each router where traffic engineering policies are pushed. Therefore, each router has a session with each controller and receives two copies of an SR-TE policy. Any SR-TE config changes are synchronized between the master and backup TD instances.
With BGP-SRTE, it is possible to configure sessions between TD and a route reflector, which propagates SR-TE policies to the other routers, so there is no need for a BGP session with each router. The master and backup TD instances are configured with different SR-TE distinguishers (the "router general" section is not syncable, so the controllers can have different config there), and the route reflector receives and propagates two different BGP-SRTE NLRI. Therefore, if one of the controllers fails, each router already has the SR-TE policy from the second controller, so there is no interruption.
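To make the failover argument concrete, here is a hedged Python model of a router keeping one copy of the same SR-TE policy per distinguisher. The distinguisher values and data structures are hypothetical; this is not router or TD code.

# Illustrative model: a router holds one entry per (policy, distinguisher),
# so both controllers' copies of the same policy coexist.

policies = {}  # (endpoint, color, distinguisher) -> segment list

def receive_policy(endpoint, color, distinguisher, segments):
    policies[(endpoint, color, distinguisher)] = segments

def controller_failed(distinguisher):
    # Only the failed controller's copies are withdrawn; the copy with
    # the other distinguisher remains, so forwarding is not interrupted.
    for key in [k for k in policies if k[2] == distinguisher]:
        del policies[key]

receive_policy("10.0.0.1", 10, 100, [16001, 16002])  # from master (distinguisher 100)
receive_policy("10.0.0.1", 10, 101, [16001, 16002])  # from backup (distinguisher 101)
controller_failed(100)                               # master fails
print(policies)                                      # copy with distinguisher 101 remains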
Redundancy config example
Master config:
management redundancy key my_redundancy_key role master neighbor 172.17.0.2
Backup config:
management redundancy key my_redundancy_key role backup neighbor 172.17.0.1
How config sync works
The backup instance initiates a connection on TCP port 2011 to the master instance. Initially config_version is set to 0, and after any redundancy config change it is reset to 0. When config_version is 0 on both ends, the backup TD deletes the syncable config sections (see above) and receives the config from the master. Any change in a syncable config section (on either master or backup) is synchronized to the other peer, incrementing config_version upon each change.
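As a rough illustration (not the actual protocol implementation), the behavior can be sketched in Python as follows; the syncable set shown is the 1.5 section list, and all names are illustrative.

# Hedged sketch of the config_version behavior described above.

SYNCABLE = ("affinities", "policies", "peer-groups", "explicit-paths")

def initial_sync(master, backup):
    # When config_version is 0 on both ends, the backup wipes its
    # syncable sections and takes the master's copy.
    if master["config_version"] == 0 and backup["config_version"] == 0:
        for section in SYNCABLE:
            backup["config"][section] = list(master["config"][section])

def replicate(src, dst, section, value):
    # A change to a syncable section on either peer is pushed to the
    # other peer, and config_version is incremented on each change.
    src["config"][section] = value
    dst["config"][section] = value
    src["config_version"] += 1
    dst["config_version"] += 1

master = {"config_version": 0, "config": {s: [] for s in SYNCABLE}}
backup = {"config_version": 0, "config": {s: [] for s in SYNCABLE}}
initial_sync(master, backup)
replicate(master, backup, "policies", ["POLICY1"])  # config_version becomes 1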
Failure scenarios
- Backup TD fails: when the backup comes up again, it syncs config from the master.
- Master TD fails: when the master comes up again, its config_version is 0 while the backup's config_version is higher, so the master syncs config from the backup.
- Split brain: when the session comes up again, both master and backup have a config_version other than 0. In this case, the backup syncs config from the master, effectively deleting all config changes made on the backup since the split brain occurred. Therefore, it is recommended to make config changes only on the master.
In future versions, config changes on the backup may be locked while the redundancy session is active. For now they are allowed, but not recommended.
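These failure scenarios reduce to a simple decision on the pair of config_version values. A sketch, illustrative only:

# Hedged sketch of the sync direction rules in the failure scenarios
# above; not the actual TD logic.

def sync_source(master_version: int, backup_version: int) -> str:
    # Which peer's config wins when the redundancy session (re)establishes.
    if master_version == 0 and backup_version > 0:
        # Master restarted: it syncs config from the backup.
        return "backup"
    # Fresh pair (both 0) or split brain (both non-zero): the backup
    # syncs from the master, discarding its own divergent changes.
    return "master"

print(sync_source(0, 7))  # "backup": master restarted
print(sync_source(5, 9))  # "master": split brain, backup changes are lost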
Verification and troubleshooting
One TD must be configured as master and the other as backup, and communication on tcp/2011 must be allowed between the two controllers.
The redundancy key must match on both sides (this protects against misconfiguration).
Both TD instances must have the same software version. During a software upgrade it is acceptable to break redundancy: it is not a critical element, it only synchronizes config changes, and there is no state sync between TD instances.
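To check that tcp/2011 is reachable from the backup toward the master, any generic TCP test will do; for example, a small Python check (this is a plain socket test, not a TD command):

# Generic TCP reachability test for the redundancy port (tcp/2011).
import socket

def can_reach(host: str, port: int = 2011, timeout: float = 3.0) -> bool:
    # Plain TCP connect test; run on the backup toward the master.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_reach("172.17.0.1"))  # master IP from the example above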
Verify redundancy status:
TD1#sh redundancy
Redundancy session statistics
Role: backup
Key: my_redundancy_key
Neighbor IP: 172.17.0.1
Redundancy server running: True
Config version: 1
Command server queue: 0
Greenthreads available: 996
Config changes queued: 0
Running sessions count: 1
Running sessions: ['172.17.0.1']
Redundancy neighbor is 172.17.0.1, local IP 172.17.0.2
Redundancy version 15
Last read 0:00:06, last write 0:00:22
Hold time is 120, keepalive interval is 30 seconds
Hold timer is active, time left 0:01:54
Keepalive timer is active, time left 0:00:08
Connect timer is inactive
Idle hold timer is inactive
Session state is Established, up for 0:44:53
Number of transitions to established: 1
Last state was OpenConfirm
                Sent  Rcvd
Opens:             1     1
Updates:           0     1
Closes:            0     0
Keepalives:       90    91
Total messages:   91    93
Debug command:
TD1#debug redundancy ?

