Sunday, 25 August 2013

PIM-SM and multiple edge routers - suboptimal forwarding

I had a problem reported by Martyn Johnson in the University Computer Laboratory (CL) regarding multicast (running PIM-SM): he had a host on a subnet (by "subnet", I mean a VLAN or other layer 2 domain, which might carry multiple unicast subnets, of course) which had subscribed to a multicast group; the subnet had redundant routers (using HSRP, I guess, but maybe VRRP) and traffic was being forwarded onto it by the less optimal of those routers:



The multicast source - server "S" (top left - 1.3.0.2 in my example, sending to group 239.0.0.1) - was actually somewhere across the University backbone, coming in from the internet and learnt via MSDP, but the principle is the same: the traffic was coming in via R1 (which was also the RP), over to R2 and then being forwarded on to the member/receiver - host "M" (bottom left):

R1#show ip mroute 239.0.0.1
...

(*, 239.0.0.1), 00:08:34/stopped, RP 1.9.0.1, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:07:44/00:02:38

(1.3.0.2, 239.0.0.1), 00:00:23/00:03:27, flags: T
  Incoming interface: Ethernet1/2, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:00:23/00:03:06

And R2:

R2#show ip mroute 239.0.0.1

(*, 239.0.0.1), 00:08:38/stopped, RP 1.9.0.1, flags: SJC
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:08:38/00:02:22

(1.3.0.2, 239.0.0.1), 00:00:27/00:02:51, flags: JT
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:00:27/00:02:32

The optimal path would be directly from R1 onto the subnet: I would have expected R1 to detect R2 sending traffic out onto the subnet, send a PIM Assert and win it, becoming the forwarder for that source on that subnet.  However, that didn't happen.

The PIM DR

I had to remind myself how all this worked and then, of course, realised I didn't understand it as well as I thought I did.

First up, I got a bit muddled with IGMP but, after a bit of re-reading, I answered one of my long-standing questions: the IGMP Querier has nothing to do with the actual multicast forwarding router (at least from IGMPv2 onwards - in IGMPv1 there is no mechanism to elect a Querier, so the choice is left up to the multicast routing protocol).  IGMPv2 has no way to force the selection of a Querier but defaults to the router with the lowest IP address on the subnet.

Then there's PIM-SM - that elects a DR on a subnet which will, by default, be responsible for forwarding traffic onto that subnet - this defaults to the PIM router with the highest IP address (when all the DR priorities are equal).  The reason these two differ is to distribute the work of supporting multicast on a subnet, but it's important to remember that the PIM DR is the one that actually forwards the traffic for directly-connected members; the IGMP Querier is just responsible for periodically soliciting group membership reports from receivers.  You can force a particular router to be the PIM DR by setting the DR priority (ip pim dr-priority ...).
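
For example, a minimal sketch of forcing R1 to be the DR (assuming R1's interface onto M's subnet is Ethernet1/1, as in my lab topology, and the other routers are left at the default priority of 1 - highest priority wins, with highest IP address as the tie-breaker):

R1(config)#interface Ethernet1/1
R1(config-if)#ip pim dr-priority 10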

You can check both of these with show ip igmp interface ... | inc router - or you can just look at the PIM side with show ip pim interface.

The DR is also responsible for forwarding multicast traffic received from the directly-connected subnet onto the routed backbone network (registering it with the RP), but that can change once the Shortest Path Tree (SPT) is built later.

So, in Martyn's case I guessed (but haven't verified yet) that the IGMP Querier would be R1 and the PIM DR would be R2.

So, when M sends an IGMP Membership Report (colloquially an "IGMP join"), R2 is the one which responds by joining the shared tree for the group and forwarding traffic onto the subnet.  Once the traffic started flowing, R2 would then join the SPT for each source, which it did.
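
(As an aside, the switch from the shared tree to the SPT is governed by the SPT threshold, which on Cisco defaults to switching immediately; a sketch of keeping receivers on the shared tree instead, if you ever wanted to, would be something like the following - I haven't done that here, it's just to note where the behaviour comes from:

R2(config)#ip pim spt-threshold infinity

)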

Here, it's important to remember that R2 will still be the DR serving M's subnet: even if the SPT results in a shorter path to the source, it will still be built (at the receiving end) from the DR:

  • the shared tree starts at the DR for the source's subnet and goes up to the RP, then down to the DRs for the members' subnets
  • the SPT starts at the receiver's subnet's DR and uses the RPF table to connect it back to a PIM router connected to the source's subnet, which may or may not be the DR for that subnet, depending on how the IGP has converged

(When investigating this on the University backbone network, I was confused by the fact that our OSPF is set with maximum-paths 1 and static/connected routes redistributed as E1s, so routes are "sticky" - it can be the case that the active route towards a particular source leads to a different router than the DR on its subnet, meaning the SPT will not come from the same router that the shared tree does.  This is particularly confusing because changes in the order routes get advertised can cause the SPTs to be different, just as with unicast routes.)

I thought a PIM Assert would step in here, resulting in R1 taking over duties for the receiver's subnet, but it didn't.

Trying PIM-DM

Switching things over to PIM-DM on M's subnet and the link between R1 and R2 DID solve the problem: here, the traffic flows out of both R1 and R2 onto the subnet and the routers detect traffic arriving on an interface which doesn't match the Reverse Path Forwarding (RPF) check for S.  We get an Assert and R1 wins:

R1#show ip mroute 239.0.0.1 1.3.0.2
...
(1.3.0.2, 239.0.0.1), 00:00:33/00:02:50, flags: T
  Incoming interface: Ethernet1/2, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Dense, 00:00:09/00:00:00
    Ethernet1/1, Forward/Dense, 00:00:33/00:00:00, A

And R2 has pruned the interface, having lost the Assert:

R2#show ip mroute 239.0.0.1
...
(1.3.0.2, 239.0.0.1), 00:00:32/00:02:30, flags: PJT
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Prune/Dense, 00:00:32/00:02:27

So maybe PIM-DM would work better than PIM-SM here!
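
(For reference, the change involved is just swapping the PIM mode on the relevant interfaces - in my lab topology that's Ethernet1/1 facing M's subnet and Ethernet1/0 on the R1-R2 link, with the same done on R2's corresponding interfaces.  A sketch, not necessarily the exact commands I used:

R1(config)#interface Ethernet1/0
R1(config-if)#ip pim dense-mode
R1(config-if)#interface Ethernet1/1
R1(config-if)#ip pim dense-mode

)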

Introducing a downstream PIM-SM router

The examples in Cisco Press's Developing IP Multicast Networks, Volume I don't cover quite the situation we have here, which I think is why it isn't explained there.  They do, however, cover the following situation (in the PIM-DM section), with R3 added to the client subnet and a member, M2, added downstream of it:


With no active source, once everything has converged (with R3 having joined the shared tree for the group), the multicast forwarding tables for 239.0.0.1 look as follows on R1:

R1#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:05:43/00:02:54, RP 1.9.0.1, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:04:17/00:03:10
    Ethernet1/1, Forward/Sparse, 00:05:43/00:02:56

And R2:

R2#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:11:04/00:02:59, RP 1.9.0.1, flags: SC
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:11:04/00:02:59

And R3 is as you would expect:

R3#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:28:37/00:02:29, RP 1.9.0.1, flags: SC
  Incoming interface: Ethernet1/0, RPF nbr 1.2.0.253
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:28:37/00:02:29

Note that both R1 and R2 have added their link to the subnet with M and R3 (Ethernet1/1 in each of the outputs above) into their outgoing interface lists, but for two different reasons:

  • R2 has added it because M has joined the group (via IGMP) and it is the DR for that subnet, as before, and
  • R1 has added it because R3 has sent a Join (via PIM-SM) for the shared tree towards the RP (which is best reached directly across the shared subnet).

This duplication of forwarding onto the shared subnet is not resolved at this point: only when traffic is actually being forwarded, and is detected by one (or both) of the forwarding routers, is a PIM Assert generated and an election held to determine the winning forwarder for a particular (S, G).

So, when S starts to send traffic to the group, R1 and R2 will both forward it onto the shared subnet and an Assert will be triggered.  R1 will win this for (1.3.0.2, 239.0.0.1) because it has the better unicast route back to the source (a directly-connected one, against R2's route via R1):

R1#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:03:54/00:02:36, RP 1.9.0.1, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:03:51/00:02:36
    Ethernet1/1, Forward/Sparse, 00:03:54/00:02:33

(1.3.0.2, 239.0.0.1), 00:01:24/00:03:21, flags: T
  Incoming interface: Ethernet1/2, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:00:53/00:02:36
    Ethernet1/1, Forward/Sparse, 00:01:24/00:03:03, A

And R2, having lost the Assert, prunes that subnet from its outgoing interface list for (1.3.0.2, 239.0.0.1):

R2#show ip mroute 239.0.0.1
...
(*, 239.0.0.1), 00:05:28/stopped, RP 1.9.0.1, flags: SJC
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list:
    Ethernet1/1, Forward/Sparse, 00:05:28/00:02:25

(1.3.0.2, 239.0.0.1), 00:03:01/00:02:57, flags: PJT
  Incoming interface: Ethernet1/0, RPF nbr 1.1.0.1
  Outgoing interface list: Null

So here, it seems that having a downstream router running PIM-SM on the subnet helps get the traffic flow optimised.
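
(For completeness, the sort of minimal PIM-SM configuration a downstream router like R3 needs is just multicast routing, knowledge of the RP and sparse mode on its interfaces - a sketch using the lab's static RP of 1.9.0.1 and R3's interface names from the output above; in practice the RP might be learnt via Auto-RP or BSR rather than configured statically:

R3(config)#ip multicast-routing
R3(config)#ip pim rp-address 1.9.0.1
R3(config)#interface Ethernet1/0
R3(config-if)#ip pim sparse-mode
R3(config-if)#interface Ethernet1/1
R3(config-if)#ip pim sparse-mode

)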

Hello and why...

I'm starting this blog as an experiment, to jot down little snippets of information I find in my job (or elsewhere) which I'll want to be able to find again later...

I often spend a lot of time investigating and (if I'm lucky) solving problems, but they then don't result in anything being written up - just an explanation in the tea room or a corridor, or maybe an email.  This is an attempt to write some scratch notes about them so I can go back later and remind myself why.

Let's see how it works out.