Bugs and limitations

Summary

This chapter describes all known bugs, caveats and limitations of Traffic Dictator.

Bugs

CLI: piped bash commands are not supported after section

Issue #16

Impact: cosmetic

For example, the following command does not work:

knecht#sh run | sec bgp | head

However, chaining multiple bash commands after the pipe works fine:

knecht#sh run | grep "router bgp" -A 20 | head

Config handler – under traffic-eng nodes, deleting neighbor affinity deletes all affinities for this neighbor

Issue #30

Impact: minor

Be careful with deleting affinities that can impact live policies until this is fixed.

FIXED in TD 1.1

Config handler – if command has exclusive=True, and the command fails due to incorrect syntax, the existing setting still gets deleted

Issue #31

Impact: minor

Only the API is affected (the CLI has safeguards). The currently affected commands are under TE Policy configuration:

endpoint ... [color|service-loopback] ...
install [direct|indirect] ...

API returns unsorted BGP table outputs

Issue #25

Impact: cosmetic

The CLI sorts these outputs so that they appear in order. BGP-LS is still unsorted even on the CLI, because the LS NLRI format is very peculiar and therefore difficult to sort.

Also, in show topology outputs, BGP-LU routes for each egress node are unsorted.
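Until this is fixed, API consumers can sort IP prefixes on the client side. The following is a minimal sketch, assuming the prefixes have already been extracted from the API response as "address/length" strings; the function names are hypothetical and not part of TD. It does not attempt to sort BGP-LS NLRI.

```rust
use std::net::IpAddr;

/// Parse "address/length" into a sortable key.
/// Returns None for anything that is not a plain IP prefix.
fn prefix_key(prefix: &str) -> Option<(IpAddr, u8)> {
    let (addr, len) = prefix.split_once('/')?;
    Some((addr.parse().ok()?, len.parse().ok()?))
}

/// Sort prefixes numerically by address, then by prefix length.
/// IPv4 sorts before IPv6; unparseable entries sort first.
fn sort_prefixes(prefixes: &mut [String]) {
    prefixes.sort_by_key(|p| prefix_key(p));
}
```

A plain string sort would place 10.0.0.0/24 before 9.0.0.0/8; sorting on the parsed key avoids that.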

When processing a very large API query, command server might complain that some processes fail

Issue #27

Impact: minor

If an API query takes more than 10 seconds, it exceeds the keepalive interval and the command server prints CRITICAL log messages saying that some processes failed.

If “show threads” shows that all processes are ok, these log messages can be ignored. Even while the large API query is being processed, the BGP and Policy servers keep working, processing messages, recalculating policies and so on. Only the API and CLI will not respond during this time.

Limitations

Multi-topology policies with null endpoint

Issue #28

Class: design decision

If a null endpoint can be resolved in the headend topology, the policy engine will always try to calculate the path in the local topology. If the explicit path points to another topology, the policy will fail.

In order to find null endpoints in a remote topology, the local topology must not have any endpoints with matching constraints.

BGP-LU policies don’t support ECMP on first hop or multiple segment lists

Issue #22

Supporting either ECMP on the first hop or multiple segment lists requires add-path with BGP-LU, which is not currently supported.

BGP links can be only in the middle of explicit path indexes

Issue #15

Class: design decision

A policy cannot begin with a BGP-only headend, and can end with a BGP endpoint only if it is an EPE policy.

However, the last explicit path index must be in an ISIS or OSPF topology. BGP links can only be in the middle of an explicit path, to connect IGP domains.

When CSPF has bandwidth constraint, it requires 100% of requested bandwidth

Issue #5

With ECMP it might be possible to use less than 100% bandwidth on each link because traffic will be shared across multiple paths.

Deferring this until the implementation of Container LSP.
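The difference can be illustrated with a small sketch. This is not the TD implementation: the function names, the bandwidth units and the even-split ECMP model are assumptions made for the example.

```rust
/// Behaviour described above: a link is accepted only if it can carry
/// the full requested bandwidth, even when it is one of several ECMP links.
/// (Hypothetical helper; units are arbitrary but must match.)
fn link_admits(available: u64, requested: u64) -> bool {
    available >= requested
}

/// What an ECMP-aware check could look like, assuming traffic is split
/// evenly across `ecmp_links` parallel links. Not implemented in TD;
/// shown only to illustrate the limitation.
fn link_admits_with_ecmp(available: u64, requested: u64, ecmp_links: u64) -> bool {
    if ecmp_links == 0 {
        return false;
    }
    // Round the per-link share up so the total is never under-provisioned.
    let share = (requested + ecmp_links - 1) / ecmp_links;
    available >= share
}
```

With a request of 10 000 and two ECMP links that each have 6 000 available, link_admits rejects both links, while the even-split check would accept them.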

Bandwidth reservation algorithm reserves 100% of requested bandwidth, even on ECMP links

Issue #6

Similar to Issue #5, this is deferred until the implementation of Container LSP.

IPv6 BGP sessions using link-local addresses are not supported

Issue #1

When a BGP session is configured, shut/no shut or reset, TD attempts to resolve the local IP for the session from the routing table on the host machine, using ip route get. Normally this returns ‘RTA_PREFSRC’ set to the source IP of the interface used to send traffic to the destination.
For IPv6, it returns the global unicast IPv6 address when one is configured. However, if you delete the global unicast IPv6 address but leave a link-local IPv6 address on the interface, it returns the link-local address and the BGP session won’t come up (unless dst is also link-local). The workaround, when deleting a global unicast IPv6 address, is to also delete the link-local IPv6 address, or just do ip -6 addr flush dev ; then shut/no shut the BGP session.

This has been fixed in the BGP code by not allowing the source address to resolve to one starting with “fe80”, so TD keeps trying until a global unicast address is found. The downside is that a BGP session cannot be set up using link-local addresses.
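The source address filter can be sketched as follows. This is an illustration under assumptions, not the TD code: the function names are hypothetical, and the check uses the full link-local range fe80::/10 rather than a literal “fe80” prefix match.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// True if `addr` is an IPv6 link-local unicast address (fe80::/10).
fn is_link_local_v6(addr: &Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xffc0) == 0xfe80
}

/// Hypothetical helper: may this resolved source address be used for a
/// BGP session? Link-local IPv6 sources are rejected, which is why
/// sessions over link-local addresses are not supported.
fn usable_bgp_source(addr: &IpAddr) -> bool {
    match addr {
        IpAddr::V4(_) => true,
        IpAddr::V6(v6) => !is_link_local_v6(v6),
    }
}

/// Pick the first usable source from a list of candidates. Returning None
/// models "keep trying later" in this sketch.
fn pick_source(candidates: &[IpAddr]) -> Option<IpAddr> {
    candidates.iter().copied().find(usable_bgp_source)
}
```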

If multiple egress ASBR advertise the same egress peer IP as BGP-LU, only best path is considered

Issue #24

Class: protocol limitation

This is a limitation of BGP: only the prefix is considered NLRI, and if there are multiple copies of the same NLRI, BGP best path selection kicks in and only the best route reaches the topology handler.

If the network design requires the same egress peer to be reachable via multiple egress ASBRs (e.g. at an IXP), best practice is to use BGP peer SID and not BGP-LU.

First link in BGP-LU policy cannot be unnumbered

Issue #23

Class: protocol limitation

This is because pushing an LU policy requires the first nexthop.

In multi-domain policies with anycast SID, second and following topologies CSPF must be same as SPF

Issue #19

Class: design decision

This applies only to an anycast SID shared by more than one node in the second and following topologies in the path. In other words, no extra SID may be needed to steer traffic across that topology.

ISIS overload bit is not checked for strict explicit path indexes

Issue #13

Class: design decision

When the ISIS overload bit is set, the node is excluded from SPF calculation. Traffic Dictator honours that, and nodes with the overload bit set are ignored during CSPF towards the endpoint or towards a loose explicit path index.

However, when checking strict explicit path indexes, overload bit is not checked.

When calculating CSPF between 2 ISIS L1L2 nodes, L1 circuit is ignored

Issue #12

Class: design decision

If a link between L1L2 nodes is configured as L1L2 (default), BGP-LS will generate 2 separate NLRI: one for L1 and another for L2. Traffic Dictator will ignore the L1 NLRI in this case.

Link metric and attributes on both links will be the same anyway, and configuring circuit as L1-only would be a bad configuration because L2 must be contiguous. So this behaviour should not have any drawbacks.

Same applies to OSPF area 0 (like IS-IS L2) and non-area 0 (like IS-IS L1).

Strict explicit path index is not allowed after anycast loose index

Issue #11

Class: protocol limitation

If a loose explicit path index is an anycast IP, and the following index is strict, this will result in undefined behaviour and is therefore not allowed.

Strict explicit path index is not allowed after exclude index

Issue #10

Class: protocol limitation

An exclude index means we want to exclude certain nodes or links from CSPF calculation to the endpoint or to the next loose explicit path index. A strict index after an exclude index would result in undefined behaviour and is therefore not allowed.
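The two ordering restrictions on strict indexes (after an anycast loose index, Issue #11, and after an exclude index, Issue #10) can be expressed as a simple validation pass. The sketch below uses a hypothetical PathIndex type and function name; it does not reflect the actual TD data model.

```rust
/// Hypothetical, simplified model of an explicit path index.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PathIndex {
    Strict,
    Loose { anycast: bool },
    Exclude,
}

/// Check the ordering rules described above. On failure, returns the
/// 0-based position of the offending strict index.
fn validate_explicit_path(path: &[PathIndex]) -> Result<(), usize> {
    for (i, pair) in path.windows(2).enumerate() {
        let prev_forbids_strict = matches!(
            pair[0],
            PathIndex::Loose { anycast: true } | PathIndex::Exclude
        );
        if prev_forbids_strict && pair[1] == PathIndex::Strict {
            return Err(i + 1);
        }
    }
    Ok(())
}
```

A strict index after a non-anycast loose index, or after another strict index, passes this check.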

Strict explicit path index is not supported with MT-ISIS in multi-domain policies

Issue #9

Class: design decision

If, while calculating a multi-domain SR-TE policy, the calculation is at a border node (a node that participates in multiple IGP domains) and the next explicit path index is strict IPv6, MT-ISIS links will not be considered for the jump to the next topology.

Loose explicit path index is supported in this scenario; also both strict and loose indexes are supported within the same IGP domain.

Mixed-AF explicit paths within the same IGP domain do not support MT-ISIS

Issue #8

Class: protocol limitation

An explicit path can have a mix of IPv4 and IPv6 indexes. It is possible for an SR-TE policy to go across multiple IGP domains, some IPv4-only or IPv6-only, including multi-topology ISIS.

However within the same IGP domain, if the endpoint has non-zero MT-ID, path calculation will consider only links with the same MT-ID (per RFC 5120). Therefore it is not possible to mix IPv4 and IPv6 explicit path indexes within an IGP domain with MT-ISIS.

Null-endpoint policies must include affinity or bandwidth in order to find a suitable egress peer

Issue #7

Class: design decision

When a policy is configured with null endpoint (0.0.0.0 or ::), Traffic Dictator assumes this is an EPE policy. Then it checks affinity and bandwidth constraints of the candidate path against configured egress peers to find the suitable peer. If candidate path has neither affinity nor bandwidth constraint configured, the policy will fail.
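The precondition can be sketched as below. The Constraints struct and function names are hypothetical simplifications; only the rule itself (a null endpoint needs an affinity or bandwidth constraint) comes from the description above.

```rust
use std::net::IpAddr;

/// Hypothetical, simplified view of a candidate path's constraints.
struct Constraints {
    affinity: Option<String>,
    bandwidth: Option<u64>,
}

/// A null endpoint is the unspecified address: 0.0.0.0 or ::.
fn is_null_endpoint(endpoint: &IpAddr) -> bool {
    endpoint.is_unspecified()
}

/// Reject null-endpoint candidate paths that give the policy engine
/// nothing to match egress peers against.
fn check_null_endpoint_policy(endpoint: &IpAddr, c: &Constraints) -> Result<(), String> {
    if is_null_endpoint(endpoint) && c.affinity.is_none() && c.bandwidth.is_none() {
        return Err("null endpoint requires an affinity or bandwidth constraint".to_string());
    }
    Ok(())
}
```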

Explicit path strict does not work with unnumbered links

Issue #4

Class: protocol limitation

BGP-LS NLRI for unnumbered links do not include neighbor IP addresses, therefore it is not possible to calculate strict explicit path indexes through unnumbered links.

Explicit path strict does not work on ISIS broadcast networks

Issue #3

Class: protocol limitation

BGP-LS NLRI for ISIS pseudonode does not include neighbor IP addresses. Therefore, it is not possible to resolve explicit path strict via ISIS broadcast segments. OSPF does not have this limitation because BGP-LS NLRI for OSPF pseudonode includes neighbor IP.

Caveats

Multi-instance IGP design is preferred to multi-level due to how SPF caching works

Issue #26

Class: design decision

In order to resolve the segment list, TD needs to compare the CSPF path with the default SPF path. This comparison can potentially result in a lot of repeated SPF calculations, especially when many policies are recalculated. To optimize this behaviour, TD caches SPF calculation results (distances and prev).

SPF caches are stored in HashMap<i32, HashMap<String, SpfCache>> with outer cache key being topology-id and inner cache key being router-id.

SpfCache struct:

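pub struct SpfCache {
    // Cached SPF distances
    pub distances_cache: HashMap<String, i32>,
    // Cached SPF predecessors ("prev")
    pub prev_cache: HashMap<String, Vec<(String, IgpLink)>>,
    // IGP protocol and level the SPF was calculated for
    pub protocol: String,
    pub level: String,
    // Number of cache hits
    pub hit_count: i32,
}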

So if SPF is calculated for the same router-ID but in a different level, the SPF cache will be rewritten. This results in more SPF calculations, potentially slowing down policy recalculation, especially on a large topology with a lot of policies.

Therefore, when multiple IGP domains are required, best practice is to use multiple IGP instances and not multi-level/multi-area IGP.
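The effect can be illustrated with a simplified model of the cache. In this sketch, CacheEntry is a reduced stand-in for SpfCache, and lookup_or_recompute is a hypothetical function; the exact lookup logic in TD may differ.

```rust
use std::collections::HashMap;

/// Reduced stand-in for SpfCache: only the fields needed for the example.
struct CacheEntry {
    level: String,
    hit_count: i32,
}

/// Outer key: topology-id. Inner key: router-id.
type SpfCaches = HashMap<i32, HashMap<String, CacheEntry>>;

/// Returns true on a cache hit, false when SPF has to be (re)calculated.
/// An entry for the same router-id but a different level is overwritten.
fn lookup_or_recompute(
    caches: &mut SpfCaches,
    topology_id: i32,
    router_id: &str,
    level: &str,
) -> bool {
    let per_topology = caches.entry(topology_id).or_default();
    if let Some(entry) = per_topology.get_mut(router_id) {
        if entry.level == level {
            entry.hit_count += 1;
            return true;
        }
    }
    // Miss or level mismatch: the SPF result would be recalculated here
    // and the previous entry for this router-id replaced.
    per_topology.insert(
        router_id.to_string(),
        CacheEntry { level: level.to_string(), hit_count: 0 },
    );
    false
}
```

Alternating between L1 and L2 lookups for the same router-id in one topology misses every time in this model, while the same router-id in two separate topology IDs (multi-instance) keeps two independent entries.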

SR capability flags are not checked on node NLRI

Issue #17

Class: 3rd party bug

This is because some implementations (e.g. IOS-XR) don’t always set those flags.

Policy name TLV advertisement is disabled due to IOS-XR bug

Issue #14

Class: 3rd party bug

Policy name (https://datatracker.ietf.org/doc/html/draft-ietf-idr-segment-routing-te-policy-26#name-policy-name-sub-tlv) can be advertised in the BGP SR-TE update so that the user sees the configured policy name on a router instead of an auto-generated one.

However, when IOS-XR receives an SR-TE route with a TLV it doesn’t understand and then has to advertise this route to another peer (e.g. a route reflector sends the route to the policy target router), XR doesn’t include the unsupported TLV but keeps the tunnel encap attribute length from the original received update. This results in a malformed packet, so policy name TLV advertisement has been disabled.

CLI grep after pipe doesn’t properly match string with special symbols

Issue #2

E.g. this can happen when trying to filter BGP-LS or BGP-SRTE NLRI.

Workaround: use fgrep or grep -F