Summary
Traffic Dictator version 1.5 was released on 08.05.2025. This article describes the changes in the new version.
New feature: Controller Redundancy
It is now possible to synchronize parts of the configuration between 2 instances of TD. SR-TE does not require any state synchronization, and config synchronization can also be achieved without this feature, for example by configuring a network automation tool to push the same config to multiple TD instances.
However, if the operator prefers to configure TD manually via the CLI or GUI, this feature is useful: any SR-TE config changes made on one controller are automatically replicated to the other.
As of 1.5, the following config sections are synchronized:
- traffic-eng affinities
- traffic-eng policies
- traffic-eng peer-groups
- traffic-eng explicit-paths
New config commands:
management redundancy key <key> role [master|backup] neighbor <ipv4|ipv6>
Typical redundancy designs
BGP-LU and PCEP require a session with each router to which traffic engineering policies are pushed. Therefore, each router will have a session with each controller and will receive 2 copies of an SR-TE policy. Any SR-TE config changes are synchronized between the master and backup TD instances.
With BGP-SRTE, it is possible to configure sessions between TD and a route reflector, which will propagate SR-TE policies to the other routers, so there is no need for a BGP session with each router. The master and backup TD instances are configured with different SR-TE distinguishers (the "router general" section is not syncable, so different controllers can have different config there), and the route reflector will receive and propagate 2 different BGP-SRTE NLRIs. Therefore, if one of the controllers fails, the routers already have an SR-TE policy from the second controller, so there is no interruption.
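To illustrate why this design survives a controller failure, here is a minimal Python sketch (not TD or router code; the NLRI fields and distinguisher values are simplified assumptions) of how a route reflector keys BGP-SRTE routes: because the SR-TE distinguisher is part of the NLRI, the advertisements from the two controllers coexist as two distinct routes instead of overwriting each other.

# Minimal sketch (not TD code): distinct SR-TE distinguishers let a route
# reflector hold one copy of the policy from each controller at the same time.
from dataclasses import dataclass

@dataclass(frozen=True)
class SrteNlri:
    distinguisher: int   # differs per controller ("router general" is not synced)
    color: int
    endpoint: str

rib = {}  # route reflector's view: NLRI -> originating controller

# Both distinguisher values are made up for illustration.
rib[SrteNlri(16777220, 4, "10.100.20.105")] = "TD-master"
rib[SrteNlri(16777221, 4, "10.100.20.105")] = "TD-backup"

# Both routes coexist, so if one controller fails and its route is withdrawn,
# the headend still has a valid SR-TE policy for this color/endpoint.
print(len(rib))  # 2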
Redundancy config example
Master config:
management redundancy key my_redundancy_key role master neighbor 172.17.0.2
Backup config:
management redundancy key my_redundancy_key role backup neighbor 172.17.0.1
How config sync works
The backup instance initiates a connection to the master instance on TCP port 2011. Initially config_version is set to 0, and after any redundancy config change, config_version is reset to 0. When config_version is 0 on both ends, the backup TD deletes the syncable config sections (see above) and receives the config from the master. Any change in a syncable config section (either on master or backup) is then synchronized to the other peer, incrementing config_version upon each change.
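To make the sequence easier to follow, here is a simplified Python sketch of this reconciliation logic as I understand it (an illustration, not TD source code; the data structures are assumptions):

# Simplified sketch of the config_version reconciliation described above.
# This is illustrative pseudologic, not TD source code.
REDUNDANCY_PORT = 2011   # the backup connects to this port on the master
SYNCABLE_SECTIONS = ("affinities", "policies", "peer-groups", "explicit-paths")

class Controller:
    def __init__(self, role):
        self.role = role              # "master" or "backup"
        self.config_version = 0       # reset to 0 after any redundancy config change
        self.config = {s: {} for s in SYNCABLE_SECTIONS}

def initial_sync(master, backup):
    """When both ends are at config_version 0, the backup deletes its
    syncable sections and takes a full copy from the master."""
    if master.config_version == 0 and backup.config_version == 0:
        backup.config = {s: dict(master.config[s]) for s in SYNCABLE_SECTIONS}

def replicate_change(local, peer, section, name, value):
    """A change in a syncable section on either peer is pushed to the other
    side, and config_version is incremented on each change."""
    for node in (local, peer):
        node.config[section][name] = value
        node.config_version += 1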
Failure scenarios
- Backup TD fails: when the backup comes up again, it will sync config from the master.
- Master TD fails: when the master comes up again, its config_version will be 0 while the backup config_version is higher, so the master will sync config from the backup.
- Split brain: when the session comes up again, both the backup and the master will have a config_version other than 0. In this case, the backup will sync config from the master, effectively deleting all config changes made on the backup since the split brain occurred. Therefore, it is recommended to make config changes only on the master.
In future versions, config changes on the backup may be locked while the redundancy session is active. For now they are allowed, but not recommended.
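All three scenarios reduce to one rule, sketched below (my reading of the behaviour described above, not TD code): a peer that comes up with config_version 0 syncs from the other side, and whenever both sides have a non-zero version, the backup syncs from the master.

# My reading of the scenarios above (an assumption, not TD source code):
# which peer acts as the source of truth when the session (re-)establishes?
def sync_source(master_version, backup_version):
    if master_version == 0 and backup_version > 0:
        return "backup"   # master restarted: it pulls config from the backup
    # Both at 0 (fresh setup or backup restart) or both non-zero (split brain):
    # the backup syncs from the master, losing any local changes made meanwhile.
    return "master"

print(sync_source(0, 0))   # master  (initial full sync)
print(sync_source(7, 0))   # master  (backup restarted)
print(sync_source(0, 7))   # backup  (master restarted)
print(sync_source(5, 9))   # master  (split brain resolved toward master)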
Verification and troubleshooting
For redundancy to work:
- One TD must be configured as master and the other as backup, and communication on tcp/2011 must be allowed between the 2 controllers.
- The redundancy key must match (this protects against misconfiguration).
- Both TD instances must run the same software version. During a software upgrade it is OK to break redundancy: it is not a critical element, it only synchronizes config changes. There is no state sync between TD instances.
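If the session does not establish, a quick sanity check is whether tcp/2011 on the master is reachable from the backup at all, for example with a generic socket test (plain Python, not a TD utility; the IP is the master address from the example config above):

# Generic TCP reachability check for the redundancy port; run on the backup.
import socket

MASTER_IP = "172.17.0.1"   # master address from the example config above
REDUNDANCY_PORT = 2011

try:
    with socket.create_connection((MASTER_IP, REDUNDANCY_PORT), timeout=5):
        print(f"tcp/{REDUNDANCY_PORT} to {MASTER_IP} is reachable")
except OSError as exc:
    print(f"tcp/{REDUNDANCY_PORT} to {MASTER_IP} is NOT reachable: {exc}")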
Verify redundancy status:
TD1#sh redundancy
Redundancy session statistics
Role: backup
Key: my_redundancy_key
Neighbor IP: 172.17.0.1
Redundancy server running: True
Config version: 1
Command server queue: 0
Greenthreads available: 996
Config changes queued: 0
Running sessions count: 1
Running sessions: ['172.17.0.1']
Redundancy neighbor is 172.17.0.1, local IP 172.17.0.2
Redundancy version 15
Last read 0:00:06, last write 0:00:22
Hold time is 120, keepalive interval is 30 seconds
Hold timer is active, time left 0:01:54
Keepalive timer is active, time left 0:00:08
Connect timer is inactive
Idle hold timer is inactive
Session state is Established, up for 0:44:53
Number of transitions to established: 1
Last state was OpenConfirm
Sent Rcvd
Opens: 1 1
Updates: 0 1
Closes: 0 0
Keepalives: 90 91
Total messages: 91 93
Debug command:
TD1#debug redundancy ?
Policy engine improvements
Policy debugging
Thanks to the log-reload Rust crate, it is now possible to enable detailed debugging to troubleshoot policy engine issues.
Debugs:
TD1#debug traffic-eng policy ?
  server   Policy server debug
  engine   Policy engine debug
  name     Debug a specific policy
Debug a specific policy (or all policies):
TD1#debug traffic-eng policy name ?
  <POLICY_NAME|*>   Debug traffic engineering policy calculation
Policy debug example
TD1#debug traffic-eng policy name R1_ISP5_BLUE_ONLY_IPV4
Enabled debugging for Policy R1_ISP5_BLUE_ONLY_IPV4
TD1#clear traffic-eng *
Requested manual reoptimization of all policies
Check debugs:
TD1#show logg | grep R1_ISP5_BLUE_ONLY_IPV4
2025-05-08 09:37:23,327 TD1 WARNING: Policy-server: Enabling debug for Policy R1_ISP5_BLUE_ONLY_IPV4
2025-05-08 09:37:23,486 TD1 WARNING: Policy-engine: Enabled debug for policy R1_ISP5_BLUE_ONLY_IPV4
2025-05-08 09:37:32,530 TD1 DEBUG: Policy-engine: Starting calculating policy R1_ISP5_BLUE_ONLY_IPV4
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: resolving headend
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: resolved headend to 0001.0001.0001.00, protocol isis, topology_id 101
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: resolving SRLB range
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: resolved SRLB base 15000, range 1000
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: checking candidate path
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: calculating dynamic candidate path 100
2025-05-08 09:37:32,633 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: dynamic candidate path 100 - resolving SID structure
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: dynamic candidate path 100 - generating segment lists
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: dynamic candidate path 100 - attaching EPE label
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: dynamic candidate path 100 - found EPE label 24015, local_ip 10.100.20.11, remote_ip 10.100.20.105
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: dynamic candidate path 100 - checking MSD
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: dynamic candidate path 100 - reserving bandwidth
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: successfully calculated candidate path 100
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: generating route_key
2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: Policy R1_ISP5_BLUE_ONLY_IPV4: generated route_key [96][16777220][4][10.100.20.105]
Verify policy by key:
TD1#show pcep ipv4 sr-te [96][16777220][4][10.100.20.105]
PCEP SR-TE routing table information
PCEP routing table entry for [96][16777220][4][10.100.20.105]
Policy name: R1_ISP5_BLUE_ONLY_IPV4
Headend: 1.1.1.1
Endpoint: 10.100.20.105, Color 4
Install peer: 192.168.0.101
Last modified: May 08, 2025 09:37:33
Route acked by PCC, PLSP-ID 2
LSP-ID Oper status
2 Active (2)
Metric type igp, metric 40
Binding SID: 15004
ENLP: "none", Override: True
Segment list: [16010, 24013, 24015]
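Since the route_key printed at the end of the policy debug is the same key accepted by "show pcep ipv4 sr-te", it can be convenient to pull it out of the log programmatically. A minimal sketch based on the log format shown above (the regex is mine, not part of TD):

# Extract the generated route_key from a policy debug log line so it can be
# used with "show pcep ipv4 sr-te <key>". The regex is based on the log
# format shown above and may need adjusting.
import re

log_line = ("2025-05-08 09:37:32,634 TD1 DEBUG: Policy-engine: "
            "Policy R1_ISP5_BLUE_ONLY_IPV4: generated route_key "
            "[96][16777220][4][10.100.20.105]")

match = re.search(r"generated route_key (\[\S+\])", log_line)
if match:
    print(f"show pcep ipv4 sr-te {match.group(1)}")
    # -> show pcep ipv4 sr-te [96][16777220][4][10.100.20.105]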
Limitations
Due to limitations of the kafka Rust crate, it is not yet possible to print debugs from all the functions I would like to. Hopefully this will be solved in a future release.
Path failure reason
Prior to 1.5, if a candidate path failed, TD would sometimes give a reason (e.g. invalid config, or it could not find the headend or endpoint), but often it gave the generic error "Error when resolving segment list":
TD1#show traffic-eng policy R1_ISP5_BLUE_ONLY_IPV4 detail
Detailed traffic-eng policy information:
Traffic engineering policy "R1_ISP5_BLUE_ONLY_IPV4"
Valid config, Reason failed: All candidate paths failed
Headend 1.1.1.1, topology-id 101, Maximum SID depth: 10
Endpoint 10.100.20.105, color 4
Setup priority: 7, Hold priority: 7
Install direct, protocol pcep, peer 192.168.0.101
Policy index: 4, SR-TE distinguisher: 16777220
Binding-SID: 15004
Candidate paths:
Candidate-path preference 100
Path config valid
Metric: igp
Path-option: dynamic
Affinity-set: BLUE_ONLY
Constraint: include-all
List: ['BLUE']
Value: 0x1
Path failed, reason: Error when resolving segment list
Policy statistics:
Last config update: 2025-05-07 17:40:59,251
Last recalculation: 2025-05-07 17:41:44.405
Policy calculation took 0 miliseconds
Now the path failure reason is more verbose; for example:
TD1#show traffic-eng policy R1_ISP5_BLUE_ONLY_IPV4 detail
Detailed traffic-eng policy information:
Traffic engineering policy "R1_ISP5_BLUE_ONLY_IPV4"
Valid config, Reason failed: All candidate paths failed
Headend 1.1.1.1, topology-id 101, Maximum SID depth: 10
Endpoint 10.100.20.105, color 4
Setup priority: 7, Hold priority: 7
Install direct, protocol pcep, peer 192.168.0.101
Policy index: 4, SR-TE distinguisher: 16777220
Binding-SID: 15004
Candidate paths:
Candidate-path preference 100
Path config valid
Metric: igp
Path-option: dynamic
Affinity-set: BLUE_ONLY
Constraint: include-all
List: ['BLUE']
Value: 0x1
Path failed, reason: SPF failed
Policy statistics:
Last config update: 2025-05-07 17:55:55,717
Last recalculation: 2025-05-07 17:56:31.301
Policy calculation took 0 miliseconds
This means CSPF with the given constraints is not possible in the topology.
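In other words, once the links that do not satisfy the affinity constraint are pruned, no path is left from the headend to the endpoint. The sketch below illustrates the idea with a generic constrained SPF over a made-up topology (a minimal illustration of the concept, not TD's actual CSPF implementation):

# Minimal illustration of constrained SPF (not TD's implementation): prune
# links that do not satisfy an include-all affinity constraint, then run
# Dijkstra. If the endpoint is unreachable in the pruned graph, the path
# fails, which corresponds to the "SPF failed" reason above.
import heapq

# Toy topology: (node_a, node_b, igp_metric, affinity_bits). Values are made up.
LINKS = [
    ("R1", "R2", 10, 0x1),  # BLUE
    ("R2", "R5", 10, 0x2),  # RED only - pruned by include-all BLUE (0x1)
    ("R1", "R3", 20, 0x1),
    ("R3", "R5", 20, 0x2),
]

def cspf(links, src, dst, include_all):
    graph = {}
    for a, b, metric, bits in links:
        if (bits & include_all) == include_all:   # keep only compliant links
            graph.setdefault(a, []).append((b, metric))
            graph.setdefault(b, []).append((a, metric))
    dist, queue = {src: 0}, [(0, src)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == dst:
            return d
        for nbr, metric in graph.get(node, []):
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                heapq.heappush(queue, (d + metric, nbr))
    return None  # no compliant path -> "SPF failed"

print(cspf(LINKS, "R1", "R5", include_all=0x1))  # None: SPF failed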
Another example:
TD1#show traffic-eng policy R1_ISP5_BLUE_ONLY_IPV4 detail
Detailed traffic-eng policy information:
Traffic engineering policy "R1_ISP5_BLUE_ONLY_IPV4"
Valid config, Reason failed: All candidate paths failed
Headend 1.1.1.1, topology-id 101, Maximum SID depth: 10
Endpoint 10.100.20.105, color 4
Setup priority: 7, Hold priority: 7
Install direct, protocol pcep, peer 192.168.0.101
Policy index: 4, SR-TE distinguisher: 16777220
Binding-SID: 15004
Candidate paths:
Candidate-path preference 100
Path config valid
Metric: igp
Path-option: dynamic
Affinity-set: BLUE_ONLY
Constraint: include-all
List: ['BLUE']
Value: 0x1
Path failed, reason: Unable to get Prefix SID for node 0010.0010.0010.00
Policy statistics:
Last config update: 2025-05-07 17:55:55,717
Last recalculation: 2025-05-07 17:59:51.060
Policy calculation took 0 miliseconds
This means TD was able to calculate CSPF, but to steer traffic over the path it needs a prefix SID from node 0010.0010.0010.00, and no such prefix SID is available.
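A simplified view of that step (an illustration with made-up data, not TD code): after CSPF succeeds, the segment list is built by looking up a prefix SID for every node it has to reference, and the candidate path fails if any lookup comes up empty.

# Simplified illustration (made-up SID table, not TD code): building a segment
# list requires a prefix SID for each node it has to reference.
PREFIX_SIDS = {
    "0001.0001.0001.00": 16001,
    "0002.0002.0002.00": 16002,
    # no entry for 0010.0010.0010.00
}

def build_segment_list(path_nodes):
    segment_list = []
    for node in path_nodes:
        sid = PREFIX_SIDS.get(node)
        if sid is None:
            raise ValueError(f"Unable to get Prefix SID for node {node}")
        segment_list.append(sid)
    return segment_list

try:
    build_segment_list(["0002.0002.0002.00", "0010.0010.0010.00"])
except ValueError as err:
    print(err)   # Unable to get Prefix SID for node 0010.0010.0010.00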
Bug fixes
1. When the TD container is stopped and started, kafka sometimes fails to start (bug #42). This is caused by a race condition in kafka; TD is now configured to restart kafka on failure.
2. Incorrect display of uptime in the CLI (bug #43). This only happens with uptimes longer than one month and only affects the CLI. The display format has been changed.
Download
You can download the new version of Traffic Dictator from the Downloads page.

