Tuesday, 17 February 2015

Cisco Nexus data centre network and enabling MPLS

We've just completed a tender for a new data centre network and selected a design based on one of Cisco's standard models, consisting of core Nexus 7010s in two physically diverse data centre locations, with each row of racks served by a pair of Nexus 56128Ps as the EoR switches and various Nexus 2k models as the ToR units.

Despite having plenty of experience with IOS (and, to a lesser extent, IOS XE), we've never had Nexus equipment before and, in particular, have not used NX-OS.  As such, there's a bit of a learning curve to get things up and running.

This post (and probably some following ones) describes what we've encountered in the setup of the Nexus equipment and will likely cover VDCs (Virtual Device Contexts) on the 7010s, vPC (Virtual Port-Channel) on the 7010s and 56128Ps and the FEXs, as well as some NX-OS differences.

MPLS

Despite getting some licence sheets with PAKs (Product Activation Keys) for the MPLS feature set, these appear to have already been installed.  However, the feature isn't activated by default and requires some work to do so.

First, in the admin VDC, the feature set must be installed:

n7k(config)# install feature-set mpls

Then, in the service VDCs (our backbone routing-only one is "cudn", the Cambridge University Data Network), the feature-set must be activated, along with the desired individual features:

n7k-cudn(config)# feature-set mpls
n7k-cudn(config)# feature mpls ldp
n7k-cudn(config)# feature mpls l3vpn

The MPLS features then become available in that VDC.

Thursday, 7 August 2014

IPv4 multicast over an MPLS VPN using mGRE

I've wanted to sort out multicast forwarding over an MPLS VPN for some time — while most institutions at the University of Cambridge don't have multicast-enabled networks, it seems a loose end that might prevent take up of the service.

After some poking about, trying to find free stuff online and failing, it seemed like MPLS and VPN Architectures vol. II from CiscoPress would explain it: in particular Chapter 7 ("Multicast VPN").  The good news is that it does.

However, the book dates from 2003 and I found an update document on Cisco's website about developments in MBGP to handle multicast over MPLS VPNs from about 2008 ("Multicast VPN - IP Multicast Support for MPLS VPNs"); there was a reprint of the book in paperback a month or so ago (late June 2014) and I thought it might have been updated to cover that.  Unfortunately, the book hasn't been updated, although the changes are minimal so it doesn't hurt to start with the book and then read the update.

Note: the solution described here uses mGRE (multipoint GRE); much of this is being replaced by MLDP (Multicast Label Distribution Protocol).  The latter looks like the right solution in the longer term but is fairly recent (dating from 2010) and needs IOS 15 on the Cisco platform, I believe.  I'll look at that later.

Anyway — onto the technical bits...

The key facts

There are several key things to know in advance, which I think help in understanding multicast over an MPLS VPN:
  • Firstly, it doesn't actually involve MPLS: multicast traffic is forwarded completely differently from unicast traffic and doesn't involve any MPLS labelled frames.
  • Instead, multicast traffic inside the VPN is tunnelled, by encapsulating it using GRE, inside multicast traffic in the global space.  This is commonly called 'mGRE' (multipoint GRE).
  • In the global space, one or more MDTs (Multicast Distribution Trees) are constructed to carry this tunnelled traffic between the PE routers in the VPN.  When traffic is received on one of these trees, it is de-encapsulated and forwarded to the VRF (Virtual Routing and Forwarding) instance for the VPN.  Inside the VRF, traffic is forwarded into the MDT, when it needs to reach other routers in the VPN.
  • MDTs essentially put all the participating routers on a single layer 2 segment: when querying the PIM neighbours on the tunnel interface created by an MDT, all of the other routers will be seen.
  • The first MDT is known as the Default MDT — all PE routers in the VPN will join this tree and it will, by default, be used to forward all multicast traffic on the VPN.
  • If there is a lot of multicast traffic and it is not typically required on all the PE routers (because only some have members) then this is inefficient.  To handle this situation, Data MDTs can be configured to set up separate groups in the global space — Cisco let you do this for groups where the bandwidth exceeds a certain threshold and/or with an address matched by an access list.
  • The Data MDTs use a pool of group addresses which is automatically recycled as groups come and go, much like a DHCP address pool.  If the pool is exhausted, group addresses will be reused on a least-used (by mapping of group addresses) basis.
  • The indication that a particular group has been switched from the Default MDT to the Data MDT is done by a PIM message that is forwarded across the Default MDT.
  • Note that the decision to switch traffic to a Data MDT is done by the ingress router (nearest the source), rather than the final hop router (as would be the case with SPT switchover under regular PIM).
  • If the groups for the Default MDTs are in a PIM-SSM range (232.0.0.0/8, by default), the BGP address-family ipv4 mdt can be used to locate other PE routers in the MVPN.  This is not necessary if the groups use PIM-SM with an RP.
  • The VPN must be configured to provide a BSR and RP in the normal manner: these work as normal across the MDTs (so a single BSR and RP can be set up on a PE for the VPN, or even inside the customer network).
  • The interfaces to CE routers work just as normal and also support BSR and RP discovery.
This arrangement is designed to be scalable and limit control plane resources in the provider network: the Data MDT group address pool allows the provider to prevent a large number of customer groups in the VPN from creating a similar number on the P routers, limiting the effect to just the PE routers.

All this business requires a good understanding of how multicast works and requires careful tuning of PIM modes (SM, SSM, Bidir) as well as the MDT group addresses, to get the best performance.

Keeping traffic separate

The main alarm bell I had with all this was about keeping traffic inside the VPN from escaping onto the global network, where it could be intercepted, or false traffic inserted.

This problem is actually straightforward: Cisco provide the ip multicast boundary command which is used to filter multicast traffic on an interface — if the group addresses used by the MDTs are in a range not permitted by the list specified with this, they will not pass across the interface, keeping the traffic safe.

In our case, we filter 239.255.0.0/16 as being 'institution private' so it doesn't ever pass across an interface between an edge subnet or institution and our backbone.  If the MDTs are created in here, they should be safe.

Testing multicast boundaries

To check this all worked, I did some tests with this using GNS3 and Wireshark sniffing the virtual link between a host in the global space, which had joined the Default MDT group.  By switching the filtering on and off, I could watch the effect...
  • Installing a boundary had an immediate effect, stopping the MDT traffic from reaching the host.  The interface connecting the host disappeared from the output interface list of show ip mroute.
  • However, the group still showed up as being joined in the output of show ip igmp group, even though traffic did not pass.  This would remain there until the IGMP reporting timer expired; other groups would refresh, but not this one.
  • When the boundary was removed, traffic would not immediately flow, even if the interface remained in the IGMP list.  However, when the IGMP membership report was retransmitted by the host, it caused the traffic to start flowing to the host again.

Example

(Update 2015-05-24 - I'm about to convert my GNS3 simulation over to MLDP, now I've upgraded to IOS 15, so I thought I'd do some tests using mGRE before dismantling it.)

A quick demo of this all in action and the output of some show commands.

Configuration

The configuration below builds on a working [non-VRF] PIM-SM on the backbone, as well as an MPLS IPv4 VPN.  I'm doing this in IOS 15.

The MDTs are configured in the VRF, identically on each router, in the address-family block - the extra commands are the three mdt lines:

vrf definition eng_vrf
 rd 192.84.5.238:129
 route-target export 192.84.5.0:129
 route-target import 192.84.5.0:129
 !
 address-family ipv4
  mdt default 239.255.32.1
  mdt data 239.255.36.0 0.0.0.31
  mdt data threshold 10
 exit-address-family
!
ip multicast-routing vrf eng_vrf

I've deliberately set the Data MDT switchover threshold high to avoid it being triggered, initially.

Note that IOS 15 replaces the mdt data ... threshold ... option with a separate mdt data threshold command that operates per-VRF.  If you use the old format, you will receive a warning that it's deprecated.
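
As a quick sanity check of the addressing above: the wildcard mask 0.0.0.31 leaves the low five bits free, so the Data MDT pool should hold 32 groups.  A minimal Python sketch (the helper name data_mdt_pool is my own, nothing to do with IOS, and it only handles contiguous wildcard masks) expands the pool and checks that it, and the Default MDT group, sit inside the 239.255.0.0/16 range we filter at our boundaries:

```python
import ipaddress

def data_mdt_pool(base: str, wildcard: str) -> list[ipaddress.IPv4Address]:
    """Expand an IOS-style base address and wildcard mask into group addresses.

    Assumption: the wildcard is contiguous (such as 0.0.0.31); anything else
    is rejected rather than guessed at.
    """
    base_int = int(ipaddress.IPv4Address(base))
    wild_int = int(ipaddress.IPv4Address(wildcard))
    if wild_int & (wild_int + 1):
        raise ValueError("wildcard mask is not contiguous")
    # Clear the wildcard bits to find the first address in the pool
    start = base_int & ~wild_int & 0xFFFFFFFF
    return [ipaddress.IPv4Address(start + i) for i in range(wild_int + 1)]

pool = data_mdt_pool("239.255.36.0", "0.0.0.31")
private = ipaddress.ip_network("239.255.0.0/16")

print(len(pool), pool[0], pool[-1])
print("pool inside boundary range:", all(g in private for g in pool))
print("default MDT inside boundary range:",
      ipaddress.ip_address("239.255.32.1") in private)
```

With only 32 Data MDT groups available, a busy VPN would start reusing group addresses fairly quickly, so the pool size is worth thinking about alongside the threshold.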

Once enabled on the router, a VRF interface is enabled the same as a non-VRF one:

interface Ethernet1/2.210
 description eng-vpn-nms
 encapsulation dot1Q 210
 vrf forwarding eng_vrf
 ip address 129.169.10.253 255.255.255.0
 !
 ip pim bsr-border
 ip pim sparse-mode
 ip igmp version 3

Checking the Default MDT

Once this has been configured on all routers, the Default MDT should be formed across the backbone (using 239.255.32.1, in the above example) and they should all discover each other as if on a single network segment using PIM:

DIST-NMS#show ip pim vrf eng_vrf neighbor 
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
      P - Proxy Capable, S - State Refresh Capable, G - GenID Capable
Neighbor          Interface                Uptime/Expires    Ver   DR
Address                                                            Prio/Mode
192.84.5.236      Tunnel3                  00:00:15/00:01:35 v2    1 / S P G
192.84.5.237      Tunnel3                  00:00:39/00:01:29 v2    1 / S P G

The above router - DIST-NMS - has found the other routers across the Default MDT on the backbone using the Tunnel3 interface; their loopback addresses in 192.84.5.x are shown.  No DR is shown as that is us (our loopback is 192.84.5.238 - the highest address):

DIST-NMS#show ip pim vrf eng_vrf int tu3        

Address          Interface                Ver/   Nbr    Query  DR     DR
                                          Mode   Count  Intvl  Prior
192.84.5.238     Tunnel3                  v2/S   2      30     1      192.84.5.238
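
To make the DR choice concrete: as I understand standard PIM-SM, the router with the highest DR priority wins and ties are broken by the highest IP address.  A small Python sketch of that rule, using the three PE loopbacks from the output above (the function name elect_dr is just for illustration, and it assumes every neighbour advertises a DR priority):

```python
import ipaddress

def elect_dr(routers):
    """Pick the PIM DR from (address, priority) pairs.

    Highest DR priority wins; ties go to the numerically highest IP address.
    """
    winner = max(routers, key=lambda r: (r[1], int(ipaddress.IPv4Address(r[0]))))
    return winner[0]

# (address, DR priority) for the three PEs on the Default MDT tunnel
pes = [("192.84.5.236", 1), ("192.84.5.237", 1), ("192.84.5.238", 1)]
print(elect_dr(pes))
```

All three have the default priority of 1, so the address decides it.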

You can check the MDTs and their tunnel interfaces as follows:

DIST-NMS#show ip pim vrf eng_vrf mdt
  * implies mdt is the default MDT
  MDT Group/Num   Interface   Source                   VRF
* 239.255.32.1    Tunnel3     Loopback0                eng_vrf

You can see the Default MDT group is visible in the global IP multicast forwarding table - note the Z flag, showing the group is used as an mGRE multicast tunnel:

DIST-NMS#show ip mroute 239.255.32.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report, 
       Z - Multicast Tunnel, z - MDT-data group sender, 
       Y - Joined MDT-data group, y - Sending to MDT-data group, 
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute, 
       Q - Received BGP S-A Route, q - Sent BGP S-A Route, 
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.255.32.1), 00:06:39/stopped, RP 192.84.5.240, flags: SJCFZ
  Incoming interface: Ethernet1/0, RPF nbr 192.84.5.33
  Outgoing interface list:
    MVRF eng_vrf, Forward/Sparse, 00:06:39/00:02:20

(192.84.5.236, 239.255.32.1), 00:05:40/00:01:10, flags: JTZ
  Incoming interface: Ethernet1/0, RPF nbr 192.84.5.33
  Outgoing interface list:
    MVRF eng_vrf, Forward/Sparse, 00:05:40/00:00:19

(192.84.5.237, 239.255.32.1), 00:06:04/00:00:59, flags: JTZ
  Incoming interface: Ethernet1/0, RPF nbr 192.84.5.33
  Outgoing interface list:
    MVRF eng_vrf, Forward/Sparse, 00:06:04/00:02:56

(192.84.5.238, 239.255.32.1), 00:06:39/00:02:54, flags: FT
  Incoming interface: Loopback0, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0, Forward/Sparse, 00:06:38/00:02:50

The PIM domain inside the VRF

The multicast forwarding table for the VRF is initially empty:

DIST-NMS# show ip mroute vrf eng_vrf
IP Multicast Routing Table
...

(*, 224.0.1.40), 00:59:21/00:02:14, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/2.210, Forward/Sparse, 00:59:20/00:02:14

In my simulation, the PIM-SM RP is on a router in an institutional (departmental) network, not part of the MPLS backbone.  This is advertised as normal using PIM BSR and detected over the MDT:

DIST-NMS#show ip pim vrf eng_vrf rp mapping 
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 129.169.252.1 (TRUMP.ENG), v2
    Info source: 129.169.252.1 (TRUMP.ENG), via bootstrap, priority 64, holdtime 150
         Uptime: 00:17:01, expires: 00:02:12

Adding group members

I'm using 234.131.111.10 as a test group.  First, I set up two group members on institutional networks using ip igmp join-group ....  Here's one:

interface FastEthernet0/0
 description eng-vpn-medschl
 ip address 129.169.86.86 255.255.255.0
 ip igmp join-group 234.131.111.10

These show up in the multicast forwarding table on the RP, ENG-TRUMP:

ENG-TRUMP#show ip mroute 234.131.111.10
IP Multicast Routing Table
...

(*, 234.131.111.10), 00:24:05/00:03:14, RP 129.169.252.1, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Ethernet1/0.252, Forward/Sparse, 00:21:03/00:03:14
    Ethernet1/2, Forward/Sparse, 00:24:05/00:01:58

On that router, Ethernet1/2 is the link to the backbone router which is inside the VRF (and has the above configured member) and Ethernet1/0.252 has another, local member.

Adding a source

Now I can start a source by setting a ping going on an end host (with IP 129.169.10.10) to the group address and see the responses from the two members:

HOST-ENG-NMS#ping 234.131.111.10 repeat 1000

Type escape sequence to abort.
Sending 1000, 100-byte ICMP Echos to 234.131.111.10, timeout is 2 seconds:

Reply to request 0 from HOST-TRUMP.ENG (129.169.254.1), 444 ms
Reply to request 0 from HOST-MEDSCHL.ENG (129.169.86.86), 572 ms
Reply to request 1 from HOST-MEDSCHL.ENG (129.169.86.86), 208 ms
Reply to request 1 from HOST-TRUMP.ENG (129.169.254.1), 404 ms
...

The multicast forwarding table on the first hop router will show traffic being forwarded over the tunnel:

DIST-NMS#show ip mroute vrf eng_vrf 234.131.111.10
IP Multicast Routing Table
...

(*, 234.131.111.10), 00:00:10/stopped, RP 129.169.252.1, flags: SPF
  Incoming interface: Tunnel3, RPF nbr 192.84.5.236
  Outgoing interface list: Null

(129.169.10.10, 234.131.111.10), 00:00:10/00:03:22, flags: FT
  Incoming interface: Ethernet1/2.210, RPF nbr 0.0.0.0
  Outgoing interface list:
    Tunnel3, Forward/Sparse, 00:00:09/00:03:20

Because there's not a sufficient level of traffic (above the configured 10kbit/s), no Data MDT has been created - we can check on the first hop router:

DIST-NMS#show ip mroute vrf eng_vrf 234.131.111.10 active  
Use "show ip mfib active" to get better response time for a large number of mroutes.

Active IP Multicast Sources - sending >= 4 kbps

DIST-NMS#show ip pim vrf eng_vrf mdt send                  
DIST-NMS#

And on a receiving router:

DIST-HOSP#show ip pim vrf eng_vrf mdt receive detail 
DIST-HOSP#

Increasing the data rate

Now everything's working, let's increase the rate of traffic by making the ping send packets without the 1s delay between them:

HOST-ENG-NMS#ping 234.131.111.10 rep 100000 timeout 0

Type escape sequence to abort.
Sending 100000, 100-byte ICMP Echos to 234.131.111.10, timeout is 0 seconds:
......................................................................
......................................................................
...

And check the traffic data rate:

DIST-NMS#show ip mroute vrf eng_vrf 234.131.111.10 active 
Use "show ip mfib active" to get better response time for a large number of mroutes.

Active IP Multicast Sources - sending >= 4 kbps

Group: 234.131.111.10, (MCAST-131-111-10.CAM)
   Source: 129.169.10.10 (HOST-NMS.ENG)
     Rate: 141 pps/113 kbps(1sec), 113 kbps(last 40 secs), 3 kbps(life avg)

Now the data rate has gone above the configured 10kbps (kbit/s) threshold, a Data MDT should have been created and traffic switched across to it.  This can be confirmed on the first hop router, along with the address of the new group:
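
The reported rate is easy to cross-check by hand.  A rough Python sketch, assuming the router counts the full 100 bytes of each ping packet (the function name is mine):

```python
def stream_rate_kbps(packets_per_second: float, packet_bytes: int) -> float:
    """Convert a packet rate and packet size into kbit/s (1 kbit = 1000 bits)."""
    return packets_per_second * packet_bytes * 8 / 1000

rate = stream_rate_kbps(141, 100)   # 141 pps of 100-byte echoes
threshold_kbps = 10                 # the configured mdt data threshold

print(f"{rate:.1f} kbps, exceeds threshold: {rate > threshold_kbps}")
```

That comes out at 112.8 kbit/s, which agrees with the 113 kbps shown and is well over the 10 kbit/s threshold.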

DIST-NMS#show ip pim vrf eng_vrf mdt send                 

MDT-data send list for VRF: eng_vrf
  (source, group)                     MDT-data group/num   ref_count
  (129.169.10.10, 234.131.111.10)     239.255.36.0         1

On the egress (from the mGRE cloud) router, this can be confirmed:

DIST-HOSP#show ip pim vrf eng_vrf mdt receive detail 
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report, 
       Z - Multicast Tunnel, z - MDT-data group sender, 
       Y - Joined MDT-data group, y - Sending to MDT-data group, 
       G - Received BGP C-Mroute, g - Sent BGP C-Mroute, 
       Q - Received BGP S-A Route, q - Sent BGP S-A Route, 
       V - RD & Vector, v - Vector

Joined MDT-data [group/mdt number : source]  uptime/expires for VRF: eng_vrf
 [239.255.36.0 : 192.84.5.238]  00:04:56/00:00:36
  (129.169.10.10, 234.131.111.10), 00:10:39/00:01:44/00:00:36, OIF count: 1, flags: JTY

Once the source stops and the MDT times out (about 3 minutes), the Data MDT will then be destroyed.

Thursday, 10 July 2014

Traffic on the public Wi-Fi network

Just a quick graph — here's traffic in each of our zones on our public Wi-Fi service since it went live:


The graph comes from our web-based management console, updated daily (and interactively viewable).  I can't take credit for the graph on the web interface as that's done by my colleague Rob Bricheno, but I am responsible for calculating the numbers behind it.

Cambridge public Wi-Fi project

A brief diversion from the technical routing and switching bits and a bit of climbing about on roofs (although not as good as this!)...

We recently embarked on a project to provide Wi-Fi across chunks of the centre of Cambridge, focussing on the route of stage 3 of the Tour de France, which started in Cambridge and finished in London.  This presented a number of problems - getting Wi-Fi to places we didn't have cabling to, coping with interference on the RF bands and also handling the large number of clients.

Mesh link to the Guildhall

We've done Wi-Fi bridges before but we had to get one particularly tricky one going: from the New Museums Site over to the city council-owned Guildhall so we could cover the Market Square.

The uplink end was fixed on top of the Austin Building on the New Museums Site: the picture below shows the view over to the Guildhall.  The Guildhall end of the link is pointed to by the red arrow — if you zoom the image to full size, you'll see the other end: it's not the bigger white blob at the tip of the arrow (that's an unfortunately-placed satellite dish) but the tiny two-pixel dot slightly up and to the right.  That location was restricted by the space the city council would give us on the roof (hence the satellite dish almost in the way!)


An Aruba AP-175P was fixed to the railings on the roof and a pair of antennae were attached: an ANT-2X2-5614, a directional 5GHz link antenna pointing over to the Guildhall to run the link, and an ANT-2X2-D805, a ~180-degree 2.4GHz antenna to cover the car park on the New Museums Site (to avoid the 2.4GHz radio going to waste).  You can see those both here:


... the AP is in the sun shield on the left hand side, the D805 is the panel pointing down and to the right and the 5614 you can just see end-on, just to the left of the D805.

The AP itself you can see here, with its lightning arresters / band pass filters (BPFs) sticking out of the antenna terminals:


The Guildhall end was stuck on a pole and had an AP-175AC mounted on the wall, just below.  Here you can see the AP removed and Alexander Cox (our wireless surveyor and installer in the high-vis jacket) and Giles Scott (from Aruba, who assisted with an RF issue, below) poking about:


The 5614 here points back at the New Museums Site end and the D805 covers the Market Place:


We've since then used the PoE ethernet port on the AP-175AC as a bridge of the management network so, in addition to the link running the AP itself, it extends to another AP-175P on the Guildhall, enhancing coverage of the Market Place with a pair of D805s on (one for 2.4GHz and another for 5GHz).  The D805 on the link AP now points down Petty Cury.

RF interference

So did it work?  Well, not at first: we had problems with the 2.4GHz antenna giving very poor coverage.  Client devices could see the AP with a good signal but couldn't connect, and the AP reported no attempts to connect.  It was as if the client could receive but its signal wasn't making it back to the AP.

After some more investigation, this turned out to be exactly what it was — we were getting a lot of interference from mobile phone signals in the city centre area.  In particular, the top of the Guildhall also has some other antennae on it:


These are Vodafone 3G masts, transmitting at 2.1GHz (a frequency used by UMTS in Europe) causing lots of interference.  The Aruba AP-175 series don't have Band Pass Filters (BPFs) and so are susceptible to interference around the Wi-Fi frequencies.

A colleague had spoken to a third party supplier regarding lightning arresters and whether the Aruba ones were suitable.  The supplier suggested an alternative model and initially these were used in place of the Aruba ones.

However, the Aruba ones also contain BPFs: the LAR-1 filters everything outside 2-5GHz and the LAR-24 outside 2.3-2.5GHz.  The LAR-1 should be used on the 5GHz antennae and the LAR-24 on the 2.4GHz antennae.

Once the lightning arresters with the BPFs were installed, everything sprang into life.  As I write this, the link is showing up at 300Mbit/s in both directions and has been stable for over 7 days.  Considering there's that annoying satellite dish in the way, that's pretty good!

There were several other locations where this occurred, although some were also fine.  However, all of them have since been swapped.

Sunday, 5 January 2014

IPv6 multicast

I mentioned a while back that I'd been looking at IPv6 multicast and enabling it on our backbone network.  We were in the process of moving to a new office building at work and setting up our office network such that we were a regular internal eBGP client on the backbone (same as any other department or college), rather than making it a directly-connected network.  This introduced the challenge of making interdomain IPv6 multicast work.

All the information needed is covered in CiscoPress's Deploying IPv6 Networks (Chapter 6 - "Providing IPv6 Multicast Services"), although it does assume that you're familiar with both IPv6 basics and IPv4 multicast in significant detail.

In practice, enabling IPv6 multicast was almost trivially simple.  Most of the complication comes from the different group address types — the different types control interdomain vs local domain-only and ASM vs SSM.

However, multicast is one area where IPv6 really shines over IPv4: it has a clearly-defined group address structure and inherent scope, and automatically provides holders of global IPv6 address space with a set of group addresses (similar to that in RFC 6034 for IPv4).

A summary of the main differences follows (some of which are Cisco-specific, but likely to be the case on other equipment):
  • When IPv6 multicast routing is enabled on a Cisco router (with global command ipv6 multicast-routing) then all IPv6-enabled interfaces automatically start to route it — there's no need to issue an IPv6-equivalent of the interface command ip pim sparse-mode (although you can disable it on a per-interface basis with no ipv6 mld router / no ipv6 pim, which was important for us!).
  • Scope is an inherent part of IPv6 multicast group addresses (being the fourth nibble) and is clearly specified on a particular interface with (e.g.) ipv6 multicast boundary scope organization-local — there is no need to muck about with boundary access lists containing a set of arbitrary group addresses.
  • Source Specific Multicast (SSM) was designed-in from the start — there is no need to explicitly enable it, nor designate certain address ranges to use it.
  • IPv6 PIM only uses Sparse Mode (SM) and SSM — there is none of the baggage of Dense Mode or Sparse-Dense Mode.
  • PIM BSR and RP discovery protocols were built-in early on — there was no need for an alternative protocol (such as Cisco AutoRP).  The commands for configuring this are similar, but not the same, as for IPv4 but are generally simpler because of the cleaner addressing structure (in our case, we had no need to separate the RPs for local groups from global groups, as we do with IPv4).
  • There is no need for an MSDP for IPv6 — interdomain multicast is provided through either SSM or Embedded RP (where the address of the RP is contained within the group address itself).
  • Multicast Listener Discovery (MLD) takes the place of IGMP.
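
To illustrate the point about scope being part of the address, here's a minimal Python sketch that pulls the flags (third nibble) and scope (fourth nibble) out of a group address.  The scope names are the well-known values from RFC 4291; the function name and the example groups are my own:

```python
import ipaddress

# Well-known scope values from RFC 4291; anything else is reported as-is
SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}

def group_flags_and_scope(group: str) -> tuple[int, str]:
    """Return (flags nibble, scope name) for an IPv6 multicast group."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    second_byte = addr.packed[1]   # the byte straight after the leading ff
    flags = second_byte >> 4       # third nibble of the address
    scope = second_byte & 0x0F     # fourth nibble of the address
    return flags, SCOPES.get(scope, f"other ({scope:#x})")

print(group_flags_and_scope("ff02::1"))      # all-nodes, link-local
print(group_flags_and_scope("ff38::1234"))   # an SSM-style group, organization-local
```

A boundary such as ipv6 multicast boundary scope organization-local is, in effect, just comparing that one nibble.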

Address types, capabilities and Embedded RP


The type of group address chosen will determine how a group address is handled and what it is capable of:
Address type                 Prefixes                      ASM?  SSM?  Interdomain?  Embedded RP?
Global ASM with Embedded RP  ff7M::/12                     Yes   Yes   Yes           Yes
Global SSM                   ff3M::/12                     No    Yes   Yes           No
Local ASM                    ff3M:00KL::/32 and ff0M::/12  Yes   Yes   No            No

ASM is where the main difference lies: because there is no MSDP in IPv6, there is no way for a PIM domain to know about sources of traffic in other domains — IPv6 solves this by using Embedded RP.

As its name suggests, Embedded RP is where the address of the RP for a particular group is contained within the group address itself.  As addresses are still 128 bits long and you need space for the multicast prefix, scope and group, only a limited number of bits are available for the RP itself.

Addresses are in the form ff7M:0GHI:... where the 7 indicates an Embedded RP address; M is the scope (as per normal); G is the final nibble of the RP address and HI gives the number of bits from the remainder of the address (after the first 32) to be used as the leading part of the address of the RP.  The remaining bits (128 - 32 - HI) are available for the group addresses themselves.

For example, the University of Cambridge uses addresses in the format ff7M:a2c:2001:630:210::/76, which gives an RP address of 2001:630:210::a (0x2c = 44 bits after the first 32, and ending with a).  The remaining (128 - 76 =) 52 bits are then split, with the first 12 matching an institution's prefix and the final 40 being available for allocation by the institution.
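
The derivation of the RP address from a group address can be sketched in a few lines of Python.  This follows my reading of the Embedded RP layout in RFC 3956 (flags, scope, a reserved nibble, the final RP nibble, an 8-bit prefix length, then the prefix itself); the function name and the sample group addresses are made up for illustration:

```python
import ipaddress

def embedded_rp(group: str) -> ipaddress.IPv6Address:
    """Extract the RP address from an Embedded RP (ff7M:0GHI:...) group."""
    g = int(ipaddress.IPv6Address(group))
    if (g >> 120) != 0xFF or ((g >> 116) & 0xF) != 0x7:
        raise ValueError(f"{group} is not an Embedded RP group address")
    rp_last_nibble = (g >> 104) & 0xF          # "G": final nibble of the RP
    prefix_len = (g >> 96) & 0xFF              # "HI": bits of prefix to use
    if not 1 <= prefix_len <= 64 or rp_last_nibble == 0:
        raise ValueError(f"{group} has an invalid Embedded RP encoding")
    prefix64 = (g >> 32) & ((1 << 64) - 1)     # the 64 bits after the first 32
    # Keep only the leading prefix_len bits; the rest belong to the group ID
    shift = 64 - prefix_len
    prefix = (prefix64 >> shift) << shift
    return ipaddress.IPv6Address((prefix << 64) | rp_last_nibble)

print(embedded_rp("ff7e:a2c:2001:630:210::1"))
print(embedded_rp("ff7e:a2c:2001:630:21f:ffff::1"))
```

Both groups map to 2001:630:210::a: in the second one, the bits after the first 44 of the prefix are part of the group allocation, so they are masked off before the RP address is built.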

To provide the Embedded RP, a simple command is required on the selected router: ipv6 pim rp-address 2001:630:210::a.  Cisco have a special Anycast RP feature to provide redundancy for this but it isn't available on our Catalyst 6500 series platform.

Lots of information about all this is published on our public website.

Fun and games


So what was the "excitement" I mentioned in an earlier post?  Well, when this was enabled, it caused some of our servers to immediately and repeatedly crash and reboot.

The issue has still not been resolved and IPv6 remains disabled on our central server network.  I won't mention which OS or the exact cause, but it is unlikely most people will run into it (although it affects us significantly, as it's one of the reasons to run things on this platform).  The OS is closed-source, so we are dependent on the manufacturer resolving it.

Monday, 16 September 2013

DHCP Option 82 update - Cisco and HP

Following the work on DHCP with Option 82 and Cisco switches and routers, I thought I'd summarise things and try getting it to work with HP ProCurve switches, too (by "ProCurve" I mean the "traditional" HP switches and not the ex-H3C ones running Comware - we don't have any of the latter).

The agent remote-id and circuit-id fields are left as vendor-specific and, of course, differ between switch platforms.  Detecting the different formats is a little hit-and-miss, but seems to be possible between HP and Cisco, at least.  When I get time later, I may look at Extreme XOS stuff as we use that too (although only in our data centres, where we don't have DHCP in use, except to set up servers initially).

HP DHCP Snooping and Option 82


HP has three modes for the remote-id, set with the dhcp-snooping option 82 remote-id global command: mac (the default) - just 6 bytes of the base MAC address of the switch, subnet-ip - the switch's IP address on the VLAN with the client (I have no idea what happens if there isn't one set), and mgmt-ip - the management IP address (the IP address set on the management VLAN).  The latter looks the most useful.  There is no leading byte to indicate which of these options has been selected.

There seems to be no way to control what format the circuit-id takes: on the switches I've tested it on (a 2610-24-PWR and a 5412), it's two bytes - the first is zero and I assume would increase with slot numbers on a chassis-based switch (although the AP I have on the 5412 is in slot A and it's still 0); the second is the port number.

Cisco and HP Option 82 information compared


The remote-id is as follows:

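Vendor  Format     Layout (byte offsets from 0)
Cisco   (default)  byte 0 = 0, followed by the switch base MAC address
Cisco   Hostname   byte 0 = 1, byte 1 = length, then the hostname or explicit string
HP      mac        switch base MAC address (6 bytes)
HP      subnet-ip  client VLAN IP address?
HP      mgmt-ip    management VLAN IP address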

The logic to parse this field seems best as:
  1. If byte 0 is 1, it's probably a Cisco hostname (it's unlikely to be an IP address "1.q.r.s" or a multicast MAC address), so print the string starting at byte 2
  2. If it's 4 bytes long, assume it's an HP IP address, so print it as an IPv4 address
  3. Else assume it's an HP MAC address so print as a colon-separated hex string (which, if it's not a MAC address, we can still translate)
The circuit-id is one of:

Vendor  Format         Byte 0  Byte 1        Bytes 2-3             Byte 4         Byte 5
Cisco   vlan-mod-port  0       4 (= length)  VLAN ID (big endian)  Module (slot)  Port
Cisco   string         1       Length        Port and VLAN string (from byte 2 onwards)
HP      -              Slot    Port

Depending on what the remote-id was selected as, it seems best to print this as:

  • Cisco - parse the 2-byte VLAN ID and print the port as "module/port" in decimal
  • HP - print the data as hyphen-separated decimals (it could be separated by slashes, but this is just to differentiate)
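
Before getting to the actual configuration, the parsing logic above can be sketched in Python to make the heuristics explicit.  This is only an illustration of the rules as described, not what the DHCP server runs; the function names and the sample byte strings are invented, and the Cisco circuit-id handling assumes the vlan-mod-port format:

```python
import ipaddress

def describe_remote_id(data: bytes) -> tuple[str, str]:
    """Guess the vendor from the remote-id and return (vendor, printable form)."""
    if len(data) >= 2 and data[0] == 1:
        # Probably a Cisco hostname: byte 1 is the length, string from byte 2
        length = data[1]
        return "cisco", data[2:2 + length].decode("ascii", errors="replace")
    if len(data) == 4:
        # Four bytes: assume an HP subnet-ip or mgmt-ip address
        return "hp", str(ipaddress.IPv4Address(data))
    # Otherwise assume an HP MAC address; other data still prints as hex
    return "hp", ":".join(f"{b:02x}" for b in data)

def describe_circuit_id(vendor: str, data: bytes) -> str:
    """Print the circuit-id according to the vendor guessed from the remote-id."""
    if vendor == "cisco" and len(data) >= 6 and data[0] == 0:
        vlan = int.from_bytes(data[2:4], "big")   # 2-byte big endian VLAN ID
        return f"port {data[4]}/{data[5]} VLAN {vlan}"
    # HP (or anything unrecognised): hyphen-separated decimals
    return "port " + "-".join(str(b) for b in data)

vendor, name = describe_remote_id(b"\x01\x04sw01")
print(vendor, name, describe_circuit_id(vendor, bytes([0, 4, 0, 210, 1, 12])))

vendor, name = describe_remote_id(bytes([10, 0, 0, 1]))
print(vendor, name, describe_circuit_id(vendor, bytes([0, 7])))
```

Note that a Cisco switch left on its default remote-id format would fall through to the final case and be printed as a hex string, which matches the "we can still translate" catch-all in step 3.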

Cisco configuration


The following is what I've used on the Cisco switches; the lines with a leading "!" are defaults:

ip dhcp snooping vlan ...
!ip dhcp snooping information option
ip dhcp snooping information option format remote-id hostname
!no ip dhcp snooping information option allow-untrusted
ip dhcp snooping database ...
ip dhcp snooping database write-delay 900
ip dhcp snooping
!
interface <uplink>
 ip dhcp snooping trust

On the Cisco routers, I've issued the following, to allow Option 82 to be set by a downstream switch:

ip dhcp relay information trust-all

HP configuration


The following is what I've used on HP switches; the lines with a leading "!" are defaults:

dhcp-snooping vlan ...
dhcp-snooping option 82 remote-id mgmt-ip
!dhcp-snooping option 82 untrusted-policy drop
!dhcp-snooping option 82
dhcp-snooping database file ...
dhcp-snooping
!
interface <uplink>
 dhcp-snooping trust

Printing the agent details in ISC DHCP


Adapting the code posted before to cope with these variations:

# if the agent (Option 82) details are present, attempt to read information
# from them; this is tricky because different vendors and configurations can
# return information in conflicting formats and it's difficult to work out
# what format the information is in, so we make some assumptions

if exists agent.remote-id {
  if substring(option agent.remote-id, 0, 1) = 1 {
    # the first byte of the remote ID is 1 - that's unlikely to be an IP
    # address and, if it were a MAC address, it would be multicast, so it is
    # probably a hostname from a Cisco switch
    log(
      info,
      concat(
        "agent information ",
        binary-to-ascii(10, 8, ".", leased-address),
        " to ",
        binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
        " on ",
        substring(option agent.remote-id, 2, extract-int(substring(option agent.remote-id, 1, 1), 8)),
        " port ",
        binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 4, 1)),
        "/",
        binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 5, 1)),
        " VLAN ",
        binary-to-ascii(10, 16, "", substring(option agent.circuit-id, 2, 2))));

  } elsif substring(option agent.remote-id, 4, 2) = "" {
    # if the remote ID has no fifth byte (it is at most 4 bytes long), we
    # probably have an IP address from an HP
    log(
      info,
      concat(
        "agent information ",
        binary-to-ascii(10, 8, ".", leased-address),
        " to ",
        binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
        " on ",
        binary-to-ascii(10, 8, ".", option agent.remote-id),
        " port ",
        binary-to-ascii(10, 8, "-", option agent.circuit-id)));

  } else {
    # otherwise, we probably have an HP MAC address, so just print the remote
    # ID as colon-separated hex; this doesn't hurt for anything else, anyway,
    # as we can always translate it
    log(
      info,
      concat(
        "agent information ",
        binary-to-ascii(10, 8, ".", leased-address),
        " to ",
        binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
        " on ",
        binary-to-ascii(16, 8, ":", option agent.remote-id),
        " port ",
        binary-to-ascii(10, 8, "-", option agent.circuit-id)));
  }
}

This gives log output for an HP:

Sep 16 22:56:52 janganmun dhcpd: agent information 172.30.162.130 to 6c:f3:7f:c0:54:cf on 172.30.64.114 port 0-23

... the switch has management IP address 172.30.64.114 and the port is A23 (on a 5412).

Or, for a Cisco:

Sep 16 23:08:23 janganmun dhcpd: agent information 172.30.140.9 to 24:de:c6:c6:51:40 on sw-ucs-rnb-n3 port 2/37 VLAN 3008

... switch sw-ucs-rnb-n3, port Gi2/0/37 on VLAN 3008.
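If these lines are later searched to locate a host, something like the following Python sketch could pull the fields back out of syslog.  It assumes the exact message layout produced by the log() statements above; the regular expression and the function name are my own and would need adjusting if the message text changes.

```python
import re
from typing import Optional

# matches both variants: the Cisco one ends with " VLAN <n>", the HP one does not
LOG_RE = re.compile(
    r"agent information (?P<ip>\S+) to (?P<mac>\S+) on (?P<switch>\S+)"
    r" port (?P<port>\S+)(?: VLAN (?P<vlan>\d+))?$"
)


def parse_agent_log(line: str) -> Optional[dict]:
    """Return the logged fields as a dict, or None if the line doesn't match."""
    match = LOG_RE.search(line.strip())
    return match.groupdict() if match else None
```

For the HP line above, the "vlan" entry comes back as None, since the HP circuit-id carries no VLAN.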

Saturday, 14 September 2013

DHCP Option 82, Cisco switches and routers and the ISC DHCP server

A bit of "fun" with IPv6 multicast this week which I'll come onto in another post...

We're moving into a new building beginning on Monday and our new network makes heavy use of pooled DHCP (with dynamic allocations of private addresses), and an option of fixed private or public addresses for devices needing fixed registrations and/or access from outside the University network.  The dynamic pooled allocations wouldn't need to be registered, enabling odd machines to be brought onto the network more easily, saving the fixed registrations for things that really need them.

Note: I did a follow-up article about this, to cover HP switches as well.  The details of the Cisco data are below, though.

DHCP Option 82, Cisco switches and routers


When using dynamic DHCP to unknown clients, you generally need a way to track the switch ports that hosts attach to so they can be located in the event of misbehaviour.  I think the long term way to do this is something similar to what we've done with our wireless system: using MAC-based authentication to trigger RADIUS accounting messages (or periodic MAC address table scraping).  However, for the short term, I was wondering if I could get DHCP Option 82 to help me out.

DHCP Option 82 is intended for things such as metro Ethernet or cable modems, whereby a switch providing an edge port will insert it into the client's DHCP packets (such as the DHCPREQUEST) to indicate the switch and port where they were received, thus allowing the location of the host to be logged by the DHCP server.  It all looked promising but I'd never got it to work - however, a bit of work today sorted that out.

The problem I had previously was that when Option 82 information is inserted into the DHCP packet, the DHCP server never saw the DHCPREQUEST.  This turned out to be the Cisco router: the default situation for packets received with Option 82 present, but the GIADDR (Gateway IP Address) field blank (0.0.0.0), is that the request will be discarded.  The GIADDR field is set by the relay agent (typically the router) to its address on the interface where the request was received, allowing the server to know which network the client is connected to.

In our situation, the edge switch will set Option 82 and leave GIADDR blank.  The router will relay the packet and fill in GIADDR.  I would imagine this is how most people would use it and it's odd that Cisco's default configuration doesn't do this.

Anyway, to fix this problem, the router interface must be set to trust the Option 82 information, even when GIADDR is blank:

interface Vlan3008
 ip dhcp relay information trusted
 ip helper-address 172.28.208.86
 ip helper-address 172.28.208.87
 ...

... or for all VLANs, you can use the global configuration command:

ip dhcp relay information trust-all

... then you can configure your edge switch with DHCP Snooping (and ARP Inspection, if required).
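To make the router behaviour described above concrete, here is a toy Python model of the decision as I understand it from the symptoms.  It is not Cisco's implementation: the class, function and parameter names are invented for illustration, with interface_trusted standing in for "ip dhcp relay information trusted" and trust_all for "ip dhcp relay information trust-all".

```python
from dataclasses import dataclass


@dataclass
class DhcpPacket:
    giaddr: str          # "0.0.0.0" when no relay agent has filled it in yet
    has_option_82: bool  # True if an edge switch has inserted Option 82


def relay_accepts(packet: DhcpPacket,
                  interface_trusted: bool = False,
                  trust_all: bool = False) -> bool:
    """Return True if the (modelled) router would relay the packet."""
    # Option 82 already present but GIADDR still blank: discarded by default
    suspicious = packet.has_option_82 and packet.giaddr == "0.0.0.0"
    if suspicious and not (interface_trusted or trust_all):
        return False
    return True
```

A packet from a snooping edge switch (Option 82 set, GIADDR blank) is only relayed in this model once one of the two trust settings is enabled, which matches what I saw.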

Which brings me on to part two of this - how to parse the Cisco Option 82 information in the ISC DHCP server and log it.

Cisco formats for Option 82 data


The edge switch command ip dhcp snooping information option format remote-id hostname above sets Option 82 to log the switch hostname rather than MAC address as the remote-id - I think that's more useful.

Cisco switches default to a binary-encoded vlan-mod-port for the circuit-id, although this can be overridden on a per-interface basis with something like:

interface GigabitEthernet1/0/2
 ip dhcp snooping vlan 3008 information option format-type circuit-id string Gi1/0/2:3008

... the second format is much nicer as you can just print it, but you have to manually set it on a per-port and per-VLAN basis, which is very tedious.  I don't like tedious things, so parsing the vlan-mod-port format is desirable.  There is a document called DHCP Option 82 Configurable Circuit ID and Remote ID on Cisco's website, but that just explains the ASCII variants, so I thought I'd document the default (binary) types too.

Remote ID


There are two formats for this, depending on whether it's the default (switch MAC address) or specified hostname:

Byte 0  Byte 1        Bytes 2 onwards
0       6 (= Length)  Switch MAC address
1       Length        Hostname

Circuit ID


This also has two formats, depending on whether the vlan-mod-port option is used, or something custom:

Byte 0  Byte 1        Bytes 2-3             Byte 4         Byte 5
0       4 (= Length)  VLAN ID (big endian)  Module (slot)  Port ID
1       Length        Port and VLAN circuit-id string (bytes 2 onwards)
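As a worked example of these layouts, the following Python sketch builds the bytes a switch would send for a hostname remote ID and a vlan-mod-port circuit ID; this can be useful for producing test data when checking a parser.  The function names are my own, and the layouts are simply those in the tables above.

```python
import struct


def cisco_remote_id_hostname(hostname: str) -> bytes:
    """Type 1, then the length, then the hostname in ASCII."""
    name = hostname.encode("ascii")
    return bytes([1, len(name)]) + name


def cisco_circuit_id_vlan_mod_port(vlan: int, module: int, port: int) -> bytes:
    """Type 0, length 4, VLAN ID as a 16-bit big-endian integer, module, port."""
    return struct.pack("!BBHBB", 0, 4, vlan, module, port)
```

For VLAN 3008 on port 1/2, the circuit ID comes out as the six bytes 00 04 0b c0 01 02, since 3008 is 0x0bc0.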

Parsing Option 82 from Cisco devices in the ISC DHCP server


The following code, placed at the top level of the dhcpd.conf file, will log messages such as dhcpd: agent information 172.30.140.4 to 24:de:c6:c6:51:92 on sw-ucs-rnb-s4 port 1/2 VLAN 3008, if the agent information is present and the remote ID is in ASCII hostname format:

if ((exists agent.remote-id) and
   (substring(option agent.remote-id, 0, 1) = 1)) {

  log(
    info,
    concat(
      "agent information ",
      binary-to-ascii(10, 8, ".", leased-address),
      " to ",
      binary-to-ascii(16, 8, ":", substring(hardware, 1, 6)),
      " on ",
      substring(option agent.remote-id, 2, extract-int(substring(option agent.remote-id, 1, 1), 8)),
      " port ",
      binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 4, 1)),
      "/",
      binary-to-ascii(10, 8, "", substring(option agent.circuit-id, 5, 1)),
      " VLAN ",
      binary-to-ascii(10, 16, "", substring(option agent.circuit-id, 2, 2))));
}

The ISC server's expression syntax is fairly limited and makes coping with alternatives very difficult.  For example, to handle the same thing as above but with the remote ID being a MAC address, a complete, suitably adjusted copy of the statement would be needed alongside the original if both formats are to be handled.