# Predicting LAG and ECMP Paths

## The Short Version, Verbosely

Among the big names, Junos, Arista EOS, and Nokia SR OS will let you fire a synthetic probe straight at the ASIC and read the literal egress member the hardware selects. Cisco’s IOS-XE and IOS-XR pretend to do the same but merely copy what CEF computed in control-plane RAM, which means their “exact-route” commands go stale the moment the ASIC team twiddles a salt bit. Cumulus and SONiC give you nothing at all; you end up scraping registers or writing your own Python to poke the SAI. Standards fans can try an RFC 5837 TTL-expired probe, yet vendor support is spotty and you still need in-band traffic to trigger it.

## Why Hardware Accuracy Is Not a Pedantic Detail

When you are chasing a hot link in a 32-member bundle, the difference between “what the RP thinks” and “what the ASIC does” is the difference between sleeping tonight and paging the SE at 02:00. IOS-XE happily reports that a given flow lands on member 1. Reality, captured on the tap, shows member 3 taking the full brunt. Engineers then waste hours looking for phantoms in QoS or policing policies. With a data-plane probe, you ask the forwarding engine directly. The answer is incontrovertible.

## How to Question the Box

The dance is almost embarrassingly simple:

1. Grab the five-tuple that defines your flow. If MPLS or VLAN tags are in play, keep those handy too.
2. Feed the tuple to the vendor incantation that actually hits the ASIC. On Junos that means:

   ```bash
   test forwarding path bundle-hash ingress ae0 \
       src-ip 10.0.1.1 dst-ip 10.0.9.9 protocol tcp \
       src-port 12345 dst-port 443
   ```

   The PFE responds with something like “ae0.3 (xe-1/0/7)”. That is your gospel.
3. If polarization is ugly, tweak the salt. Junos exposes `forwarding-options hash-key`, EOS hands you `port-channel load-balance hash-polynomial`, and SR OS gives `tools traffic-hash`. Add or shuffle L4 ports when entropy is thin.
4. Re-probe until distribution looks sane rather than vindictive.

## A Real-World Junos Episode

A CDN’s NTP spray was crushing ae2 on an MX204 to the tune of a 70/30 skew. Every probe told the same story: the ASIC loved member 0 too much. One knob flip later

```bash
set forwarding-options hash-key family inet l4-src-port
commit
```

and the probes began to distribute evenly across all four members. Counters followed suit, the CRB call was cancelled, and life resumed.

## Cisco XR’s Convenient Fiction

In the lab, an NCS-5501-SE with Jericho2 silicon reported, via `show cef exact-route`, that the flow was bound for bucket 0. A low-level register read said bucket 5. A packet capture concurred with the register. Three TAC cases and as many slipped “Q4” promises later, the lesson is clear: XR’s CLI is bedtime literature, not an affidavit.

## Edge Cases the Marketing Slides Skip

MPLS ECMP often gets ignored by the human-friendly commands altogether. IPv6 flow-label hashing is partly implemented, partly forgotten. INT headers tend to collapse the hash because the ASIC punts on unknown ethertypes. In every case, hardware probing is the only honest test.

## When to Tune, When to Re-Architect

Minor skew in a bundle smaller than sixty-four links is fixable by seed tweaks and salt shuffling. That assumes you control both ends and can burn a maintenance window. If, instead, you are staring at elephant flows or VXLAN pairs with zero entropy, no amount of secret sauce fixes the physics. Break the elephants into calves, randomize source ports, or bite the bullet on a per-packet spray if jitter is tolerable.

## Closing Grumble

RFC 2991 warned everyone twenty-five years ago. Vendors still ship half-finished CEF clones that hallucinate about where packets are going. Until the CLIs grow up, probe twice, deploy once, keep your salts unpredictable, and never believe a router’s tale unless the silicon corroborates it.
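
For readers who want to see why adding the L4 source port to the hash key helps when entropy is thin, here is a toy model in Python. It is only a sketch under loud assumptions: the seeded CRC32 and the modulo step stand in for whatever the silicon really computes, and `Flow`, `pick_member`, and `distribution` are hypothetical names invented for this illustration. It matches no vendor's hash, so it cannot tell you which member a real box will pick. Only the hardware probe can.

```python
import zlib
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical five-tuple container for this illustration."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def pick_member(flow: Flow, fields: tuple[str, ...], members: int, seed: int = 0) -> int:
    """Toy member selection: seeded CRC32 over the chosen fields, modulo bundle size.

    ASSUMPTION: real ASICs use their own polynomials, salts, and field
    ordering. This only shows how the choice of hashed fields drives spread.
    """
    key = "|".join(str(getattr(flow, name)) for name in fields).encode()
    return zlib.crc32(key, seed) % members


def distribution(flows, fields, members, seed=0) -> Counter:
    """Count how many flows land on each member index."""
    return Counter(pick_member(f, fields, members, seed) for f in flows)


# Made-up traffic: one client/server pair, 4000 distinct UDP source ports.
flows = [Flow("10.0.1.1", "10.0.9.9", 17, 20000 + i, 123) for i in range(4000)]

# Key built from L3 fields only: every flow hashes identically.
l3_only = distribution(flows, ("src_ip", "dst_ip", "protocol"), members=4)

# Same key plus the L4 source port: the flows now differ in a hashed field.
with_l4 = distribution(flows, ("src_ip", "dst_ip", "protocol", "src_port"), members=4)

print("L3-only key:   ", dict(l3_only))
print("L3 + src port: ", dict(sorted(with_l4.items())))
```

With the L3-only key, all 4000 flows pile onto a single member because none of the hashed fields ever changes. Once the source port joins the key, the flows reach all four members. How even that spread is on real hardware is exactly the question the probe answers and the model does not.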