Monday, 20 April 2015

DHCP, RPF verify, FHRP and ECMP - when protocols collide

I've been gradually enabling ECMP (Equal-Cost MultiPath) routing on parts of the University network, both as a prelude to the new data centre network and to improve performance generally (by making use of the dual downstream interfaces into institutions).

For the most part, this was fairly straightforward and just worked. I'll explain what was needed in each case in a future post (as MPLS VPN works slightly differently and needs some special consideration), but I did hit a nasty problem which broke DHCP relaying in some places, and that's what I'll cover here.

How DHCP relaying works


First, let's review how DHCP relaying works (what you get when you do ip helper-address ... on a Cisco router interface towards clients).  Consider the following network:
When a client on the subnet 192.168.1.0/24 wants to DHCP, the following happens initially:
  1. The client sends out a DHCP DISCOVER message from 0.0.0.0 (as it doesn't know its IP address or even subnet, yet) on UDP port 68 (the DHCP client port) to all-hosts broadcast (255.255.255.255) port 67 (DHCP server).
  2. The DHCP relay agent (which is normally the router) will receive this broadcast and forward it as a unicast packet to the DHCP server listed in the ip helper-address ... interface command.  This will be from the router's interface address (this is the critical bit) - 192.168.1.253, in the above example, port 67 to the DHCP server, port 67.
  3. The DHCP server will receive the DISCOVER and, assuming it has an address (and other information) to give to the client, it will send a DHCP OFFER message back to the router to be relayed on to the client.  This will be the reverse of the packet just received: going from the DHCP server's address, port 67 to the router's interface address, port 67.
  4. The router will receive the reply and unicast it back to the client, sending it from its interface IP address, port 67 to the client's prospective IP address, port 68.
This all works fine and, assuming the client wants to take the address, it will send a DHCP REQUEST (using the same process) and receive a DHCP ACK back from the server so it can begin using the address.
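
The four steps above can be written down as a list of packets, which makes the addresses and ports easier to compare. This is only an illustrative sketch: the server address 10.0.0.10 and the offered address 192.168.1.50 are made up for the example, and `Packet` and `relay_exchange` are hypothetical names, not anything from a real DHCP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    step: str
    src: str
    sport: int
    dst: str
    dport: int


def relay_exchange(relay_ip, server_ip, offered_ip):
    """Return the four packets of a relayed DISCOVER/OFFER exchange."""
    return [
        # 1. Client broadcast: no address yet, client port 68 to server port 67.
        Packet("DISCOVER from client", "0.0.0.0", 68, "255.255.255.255", 67),
        # 2. The relay unicasts it from its own interface address.
        Packet("DISCOVER relayed", relay_ip, 67, server_ip, 67),
        # 3. The server replies to the relay's interface address.
        Packet("OFFER from server", server_ip, 67, relay_ip, 67),
        # 4. The relay passes the offer on to the client.
        Packet("OFFER relayed", relay_ip, 67, offered_ip, 68),
    ]


if __name__ == "__main__":
    # 10.0.0.10 and 192.168.1.50 are assumed values for illustration only.
    for p in relay_exchange("192.168.1.253", "10.0.0.10", "192.168.1.50"):
        print(f"{p.step:22} {p.src}:{p.sport} -> {p.dst}:{p.dport}")
```

The thing to notice is that packet 3 is addressed to whatever packet 2 was sent from: the relay's own interface address on the client subnet.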

How DHCP relaying doesn't work


Now consider what happens when the same DHCP DISCOVER is relayed by the other client subnet router, without ECMP:
Here, something goes wrong at step 3, trying to return the DHCP OFFER to the relaying router:
  1. Because the backbone network (depicted with the cloud symbol) only knows about routing to the client subnet as a whole /24 (not the individual routers' addresses), it routes the packet via 192.168.1.253.
  2. 192.168.1.253 treats 192.168.1.252 as just another host on the client subnet and forwards it out of its interface onto that subnet.
  3. Because 192.168.1.252 has anti-spoofing blocks in place on the client subnet interface, it rejects this packet: the source address is that of the DHCP server, which is not a valid source for a packet arriving from the client subnet.
With this configuration, typical for a network where a FHRP (First Hop Redundancy Protocol) such as HSRP or VRRP is in use, half of the DHCP replies (the ones relayed via one of the two routers) will be lost when they're returned to the relaying routers.
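
The rejection in step 3 can be sketched as a simple source-address test. This is a rough model under stated assumptions, not how the router implements it: the DHCP server address 10.0.0.10 is invented, the function names are hypothetical, and a reply that arrives directly at the relaying router is assumed to come in on a backbone-facing interface where the client-side filter does not apply.

```python
import ipaddress

CLIENT_SUBNET = ipaddress.ip_network("192.168.1.0/24")
DHCP_SERVER = "10.0.0.10"  # assumed address, for illustration only


def passes_antispoof(src):
    """Client-facing inbound filter: the source must be on the client subnet."""
    return ipaddress.ip_address(src) in CLIENT_SUBNET


def reply_accepted(relay_ip, entry_router_ip):
    """Is the server's reply to relay_ip accepted, given which router
    the backbone delivers it to?"""
    if relay_ip == entry_router_ip:
        # Delivered straight to the relaying router from the backbone side.
        return True
    # Otherwise the other router forwards it across the client subnet, so it
    # reaches the relay on its client-facing interface with the server as source.
    return passes_antispoof(DHCP_SERVER)


if __name__ == "__main__":
    # Without ECMP, the backbone sends everything for the /24 via .253.
    for relay in ("192.168.1.252", "192.168.1.253"):
        print(relay, reply_accepted(relay, "192.168.1.253"))
```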

However, this in itself isn't particularly a problem, in that both routers will relay the same packet to the DHCP server, resulting in it receiving two copies of each DISCOVER [but with different relay agent / forwarder addresses], causing two OFFERs to be returned; the clients will not miss out as they'll get one of the two copies.  This is why I've never noticed this problem, even though it's been going on for years: the clients have still got their address and worked.

(Actually, I was sort-of aware of this problem, as it prevented pinging one of the routers' own addresses on a particular interface, if the source of the ping was elsewhere on the network.  However, that's just been a minor inconvenience and not service-affecting; I never realised that would also be affecting DHCP.)

Combining with ECMP


When this situation is combined with ECMP this can get messy: the returned DHCP OFFERs (and ACKs) might be returned to either of the two client subnet routers.  The routers' addresses are often 1 number offset (e.g. 192.168.1.252 vs .253) which will likely mean they each take a different path.

If the path for packets to the .253 relay address happens to go directly to the .253 router, all is fine.  Same with .252.

However, if you're really unlucky (and, of course, we were, in some situations), ECMP will return the .253 packet via the .252 router and the .252 packet via .253.  This results in both replies being rejected and the client getting neither of the responses.
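
To see why only one combination is fatal, here is a small sketch that enumerates every way the two replies could be delivered. It assumes, as described above, that a reply is usable only when it lands directly on the router that relayed the request; the function names are hypothetical, and it says nothing about how likely each combination is on a real network.

```python
from itertools import product

ROUTERS = ("192.168.1.252", "192.168.1.253")


def usable_replies(delivery):
    """delivery maps each relay address to the router its reply arrives at.
    A reply survives only if it arrives at the router that relayed it."""
    return sum(1 for relay, entry in delivery.items() if relay == entry)


def all_outcomes():
    """Map (entry router for .252's reply, entry router for .253's reply)
    to the number of replies that survive."""
    outcomes = {}
    for entries in product(ROUTERS, repeat=len(ROUTERS)):
        outcomes[entries] = usable_replies(dict(zip(ROUTERS, entries)))
    return outcomes


if __name__ == "__main__":
    for entries, count in all_outcomes().items():
        print(f".252 reply via {entries[0]}, .253 reply via {entries[1]}: "
              f"{count} usable")
```

Of the four combinations, two leave the client with one reply (the pre-ECMP situation), one leaves it with both, and the crossed-over case leaves it with none.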

Fixing the problem and creating another


I couldn't find any way to direct the replies back to the correct router (e.g. by advertising the router's interface IP address into OSPF as a /32), so dealing with them being rejected by the anti-spoofing protection seemed the only solution.

As I've written, I've been looking at the ip verify unicast source ... command recently, and it seemed a good opportunity to employ that, rather than modify lots of access control lists.  According to Cisco's documentation, that command has a special feature to handle DHCP:
"Unicast RPF will allow packets with 0.0.0.0 source and 255.255.255.255 destination to pass so that Bootstrap Protocol (BOOTP) and Dynamic Host Configuration Protocol (DHCP) functions work properly."
— from the IOS Security Configuration Guide for IOS 12.2SX
Sounds good, except that doesn't handle the source addresses of relayed DHCP replies.  It would be nice if this included "packets with a source address of the interface ip helper-address and port 67, destined for the router's interface address port 67", but it doesn't.

However, the command has a feature to allow packets matching an access list to be accepted, even if they fail the RPF check.  It's configured as follows:

ip access-list extended 1301
 permit udp host DHCP-SERVER eq 67 192.168.1.0 0.0.0.255 eq 67
!
interface ...
 ip address 192.168.1.253 255.255.255.0
 standby ip 192.168.1.254
 ip verify unicast source reachable-via rx 1301

This will allow the initial DHCP DISCOVER in (as described in the Cisco documentation), regular 192.168.1.0/24 traffic (due to the RPF check) AND traffic from the DHCP server on port 67 to an address on the same subnet port 67 (which is less tedious than putting the interface IP address itself as it can be copied to the other router without modification).  This change can be combined with a simplification of the interface access lists (if used).  So, I implemented a few of these and all looked hunky dory.

The IP Input process - my old nemesis


However, a little while later, we started getting alarms for CPU usage on the Catalyst 6500-E routers.  A show process cpu sorted command showed high load caused by the IP Input process.

This is usually caused by excessive traffic being forwarded ("punted" in Cisco parlance) to the Route Processor (RP) for handling.  We can capture and display these with the following commands:

router# debug netdr capture rx
router# show netdr captured-packets

(Use debug netdr clear-capture to clear the buffer and our old friend undebug all to switch it off.)

The packets being punted all appeared to be regular data - nothing complicated like DHCP which needs special processing, so I started doing some more reading and found a document on Cisco's website explaining how this configuration is handled on a 6500:
"For unicast RPF check without ACL filtering, the PFC3 provides hardware support for the RPF check of traffic from multiple interfaces. 
For unicast RPF check with ACL filtering, the PFC determines whether or not traffic matches the ACL. The PFC sends the traffic denied by the RPF ACL to the route processor (RP) for the unicast RPF check. Packets permitted by the ACL are forwarded in hardware without a unicast RPF check." 
— from the IOS Network Security guide for IOS 12.2SX on Catalyst 6500 with PFC3
So it appears that, when you use an ACL, all the traffic not matching the ACL will get punted to the RP.  Excellent.
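
A toy model of that behaviour, based purely on my reading of the quoted text rather than on any knowledge of the PFC3 internals, shows why the RP was so busy. The DHCP server address 10.0.0.10, the sample traffic and the function names are all assumptions for illustration.

```python
import ipaddress

SUBNET = ipaddress.ip_network("192.168.1.0/24")
DHCP_SERVER = "10.0.0.10"  # assumed address, for illustration only


def acl_permits(proto, src, sport, dst, dport):
    """The RPF exception ACL: server port 67 to the client subnet port 67."""
    return (proto == "udp" and src == DHCP_SERVER and sport == 67
            and ipaddress.ip_address(dst) in SUBNET and dport == 67)


def forwarding_path(proto, src, sport, dst, dport):
    """Permitted by the ACL: forwarded in hardware with no RPF check.
    Denied by the ACL: punted to the RP for the RPF check."""
    if acl_permits(proto, src, sport, dst, dport):
        return "hardware"
    return "punt to RP"


def count_paths(packets):
    counts = {"hardware": 0, "punt to RP": 0}
    for pkt in packets:
        counts[forwarding_path(*pkt)] += 1
    return counts


if __name__ == "__main__":
    # One relayed DHCP reply among 99 ordinary client packets (made-up traffic).
    traffic = [("udp", DHCP_SERVER, 67, "192.168.1.252", 67)]
    traffic += [("tcp", "192.168.1.50", 40000 + i, "10.1.1.1", 443)
                for i in range(99)]
    print(count_paths(traffic))
```

Because the ACL only matches the rare DHCP replies, nearly all of the ordinary traffic falls into the punted case in this model.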

Fixing the problem for good


So, I backtracked on using the ip verify unicast ... command and reverted to using our old inbound access lists to protect against address spoofing.  These now have an extra entry and look as follows:

ip access-list extended in-subnet
 permit ip 192.168.1.0 0.0.0.255 any
 permit udp any eq bootpc host 255.255.255.255 eq bootps
 permit udp host DHCP-SERVER eq bootps 192.168.1.0 0.0.0.255 eq bootps
 deny ip any any

This appears to do the trick and doesn't involve the RP on the router going bananas.  Given this problem, I think I'll abandon using the ip verify unicast ... command!
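
As a sanity check on the logic of that access list, here is a rough sketch of it as a first-match filter. It is not a general ACL parser: the rules are hard-coded from the list above, 10.0.0.10 stands in for DHCP-SERVER, and `in_subnet_acl` is a hypothetical helper name.

```python
import ipaddress

SUBNET = ipaddress.ip_network("192.168.1.0/24")
DHCP_SERVER = ipaddress.ip_address("10.0.0.10")  # stands in for DHCP-SERVER
BROADCAST = ipaddress.ip_address("255.255.255.255")
BOOTPS, BOOTPC = 67, 68


def in_subnet_acl(proto, src, dst, sport=None, dport=None):
    """Return True if the packet is permitted inbound on the client interface."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    # permit ip 192.168.1.0 0.0.0.255 any
    if src in SUBNET:
        return True
    # permit udp any eq bootpc host 255.255.255.255 eq bootps
    if proto == "udp" and sport == BOOTPC and dst == BROADCAST and dport == BOOTPS:
        return True
    # permit udp host DHCP-SERVER eq bootps 192.168.1.0 0.0.0.255 eq bootps
    if (proto == "udp" and src == DHCP_SERVER and sport == BOOTPS
            and dst in SUBNET and dport == BOOTPS):
        return True
    # deny ip any any
    return False


if __name__ == "__main__":
    print(in_subnet_acl("tcp", "192.168.1.50", "10.1.1.1", 40000, 443))
    print(in_subnet_acl("udp", "0.0.0.0", "255.255.255.255", 68, 67))
    print(in_subnet_acl("udp", "10.0.0.10", "192.168.1.252", 67, 67))
    print(in_subnet_acl("tcp", "203.0.113.9", "10.1.1.1", 40000, 443))
```

The third case is the relayed reply that has crossed the client subnet from the other router, which the extra entry now lets through; the last is a spoofed source and is still dropped.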

(Update 2018-02-01: we have since installed Catalyst 6807-XLs with Supervisor 6Ts and came up with a final solution.  I've described that in a separate blog post.)

2 comments:

  1. There's an alternate approach. This is only a problem when all 4 technologies are used together: FHRP; ECMP; DHCP Relay; and uRPF. Remove any one from the equation, and the problem goes away.

    Therefore, what you can do instead is to deploy a DHCP server on a subnet defined only on one of the routers, and install the helper-address only on that router. Then you still have DHCP relay, uRPF, and FHRP - but you no longer have ECMP (at least not for DHCP relay) since all DHCP traffic uses symmetric path routing.

    There is a problem in that DHCP won't be available if *that* router has an issue. However, it's only a problem if the duration of the router issue exceeds a DHCP lease half-life, and even then it only starts as a small problem (that increases linearly until a full lease period). For short leases that's definitely a near-immediate issue; for longer leases, you should be able to recover the router in time under 90% of conceivable scenarios.

    The only other problem is a scalability issue - this approach requires every FHRP-paired router to have a DHCP server / server pair associated to it, so on larger networks (like I imagine at Cambridge, definitely at Boston University where I work) you can't have a single HA pair of DHCP servers for the whole campus. But at least in key locations (on-campus datacenter, for example, perhaps the central WiFi router if you don't use local drop-off), you can do this.
