Friday, 27 March 2015

NAT in a VRF on a 7200 (in GNS3)

An interesting aside here — I've been updating my GNS3 model of the University of Cambridge network to include a second connection to Janet (the UK Education and Research network).  We want to run the links and NAT boxes (which are ASA5580-20s) in active-active mode so I've had to add those.

I can't be bothered trying to wrestle with a pair of ASAs in multi-context mode (which you need for active-active) so I looked at using NAT under IOS on a 7200 (the platform we use as the router equivalent of our Catalyst 6500-Es in GNS3).  I don't need stateful switchover, but I do want the basic service to fail over in the simulation.

This turns out to be very simple as the 7200s support NAT in VRFs and we can use HSRP to do the active-active balancing.

Background

[Figure: CUDN border and NAT from GNS3]

Our NAT service works by putting an ASA on a stick, attached to the border routers.  Some PBR (Policy Based Routing) takes traffic that comes from University-wide private addresses (which are RFC1918 addresses we route internally and NAT when they leave) and is destined for the internet (via Janet), and redirects it to the ASA.

This arrangement means that only traffic to be NATed needs to go through the ASAs: IPv4 public IP addresses and IPv6 flow straight through, so we don't have to worry about handling those, which not only reduces load on the ASAs but also means we don't have to work out how to get things like multicast through them.

There are two ASAs operating in a pair, handling roughly half of the private addresses each.  This is done by putting half of the private addresses through one context, normally active on one box, and the other half through a second context, normally active on the other.  If either fails, the other ASA takes over all the load.

The inside of the ASAs is on a /29 link subnet with static routing: the ASAs provide a redundant first hop address for the router to redirect traffic to.  The outside is a /24 block to provide a pool of public IP addresses to NAT behind.  That subnet is effectively a regular client subnet.

NATed traffic coming back in goes to the public range, gets de-NATed by the ASAs and sent back into the network on the inside /29.

The router provides first hop redundant gateways on both the inside and outside networks, although in practice there is only one at present.

There are separate inside /29s and outside /24s for each half of the private addresses to be NATed.

In the GNS3 simulation, we're going to replace the ASAs with IOS routers doing NAT.

Router configuration

The configuration of the routers doing the PBR is identical to the real production ones.

First, we create the interfaces to link to the NAT - this is the configuration for the first router, which is going to handle the outside range 131.111.184.0/24 by default (the other outside range, 131.111.185.0/24, will go through the other router to load balance - see the BGP configuration later):

interface Ethernet1/2.1981
 description nat-1-outside
 encapsulation dot1Q 1981
 ip address 131.111.184.253 255.255.255.0
 no ip proxy-arp
 standby version 2
 standby 81 ip 131.111.184.254
 standby 81 timers 1 3
 standby 81 priority 200
 standby 81 preempt
 standby 81 track 30 decrement 50
!
interface Ethernet1/2.1982
 description nat-1-inside
 encapsulation dot1Q 1982
 ip address 193.60.92.34 255.255.255.248
 no ip proxy-arp
 standby version 2
 standby 82 ip 193.60.92.33
 standby 82 timers 1 3
 standby 82 priority 200
 standby 82 preempt
 standby 82 track 30 decrement 50
!
interface Ethernet1/2.1983
 description nat-2-outside
 encapsulation dot1Q 1983
 ip address 131.111.185.253 255.255.255.0
 no ip proxy-arp
 standby version 2
 standby 83 ip 131.111.185.254
 standby 83 timers 1 3
 standby 83 priority 190
 standby 83 preempt
 standby 83 track 30 decrement 50
!
interface Ethernet1/2.1984
 description nat-2-inside
 encapsulation dot1Q 1984
 ip address 193.60.92.42 255.255.255.248
 no ip proxy-arp
 standby version 2
 standby 84 ip 193.60.92.41
 standby 84 timers 1 3
 standby 84 priority 190
 standby 84 preempt
 standby 84 track 30 decrement 50

Then we create access lists to match the private addresses to be NATed - we use all of 172.16.0.0/12, except for 172.31.0.0/16 (don't ask!):

ip access-list extended nat-1_clients
 deny   ip any 128.232.0.0 0.0.255.255
 deny   ip any 129.169.0.0 0.0.255.255
 deny   ip any 131.111.0.0 0.0.255.255
 deny   ip any 192.18.195.0 0.0.0.255
 deny   ip any 193.60.80.0 0.0.15.255
 deny   ip any 193.63.252.0 0.0.1.255
 permit ip 172.16.0.0 0.7.255.255 any
 deny   ip any any
!
ip access-list extended nat-2_clients
 deny   ip any 128.232.0.0 0.0.255.255
 deny   ip any 129.169.0.0 0.0.255.255
 deny   ip any 131.111.0.0 0.0.255.255
 deny   ip any 192.18.195.0 0.0.0.255
 deny   ip any 193.60.80.0 0.0.15.255
 deny   ip any 193.63.252.0 0.0.1.255
 permit ip 172.24.0.0 0.3.255.255 any
 permit ip 172.28.0.0 0.1.255.255 any
 permit ip 172.30.0.0 0.0.255.255 any
 deny   ip any any
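
As a sanity check on the wildcard masks, here is a small Python sketch (not part of the router configuration; the helper name wildcard_to_network is made up for illustration).  It converts the permit lines from the two access lists into prefixes and checks that together they cover 172.16.0.0/12 with 172.31.0.0/16 left out.  It assumes contiguous wildcard masks, which is all that is used here:

```python
import ipaddress


def wildcard_to_network(base, wildcard):
    """Convert an IOS 'address wildcard' pair into an IPv4Network.

    Only contiguous wildcards (such as 0.7.255.255) are handled.
    """
    wc = int(ipaddress.IPv4Address(wildcard))
    if wc & (wc + 1):
        raise ValueError(f"non-contiguous wildcard: {wildcard}")
    return ipaddress.ip_network(f"{base}/{32 - wc.bit_length()}")


# The permit lines from nat-1_clients and nat-2_clients
nat1 = [wildcard_to_network("172.16.0.0", "0.7.255.255")]
nat2 = [
    wildcard_to_network("172.24.0.0", "0.3.255.255"),
    wildcard_to_network("172.28.0.0", "0.1.255.255"),
    wildcard_to_network("172.30.0.0", "0.0.255.255"),
]

# 172.16.0.0/12 with 172.31.0.0/16 carved out
expected = sorted(
    ipaddress.ip_network("172.16.0.0/12").address_exclude(
        ipaddress.ip_network("172.31.0.0/16")
    )
)

covered = sorted(nat1 + nat2)
nat1_size = sum(n.num_addresses for n in nat1)
nat2_size = sum(n.num_addresses for n in nat2)
print(covered == expected, nat1_size, nat2_size)
```

The two halves are not quite equal (524288 addresses against 458752) because 172.31.0.0/16 is missing from the second one.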

We then create the route-maps to do PBR and redirect traffic across the /29 'inside' links:

route-map nat_redirect permit 110
 match ip address nat-1_clients
 set ip next-hop 193.60.92.38
!
route-map nat_redirect permit 120
 match ip address nat-2_clients
 set ip next-hop 193.60.92.46

... and attach them to the inside interfaces (linking to the core routers):

interface Ethernet1/0
 description CORE-CENT
 ip policy route-map nat_redirect
!
interface Ethernet1/1
 description CORE-MILL
 ip policy route-map nat_redirect
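
The combined effect of the access lists and the route-map can be sketched in Python as follows.  This is only a model of the decision, written for illustration (the function name pbr_next_hop and the data structures are my own, not anything IOS exposes): a packet to a University public range is denied by both access lists and so is routed normally; otherwise the source address picks the NAT context.

```python
import ipaddress

# Destinations that are never NATed (the deny lines in both access lists)
UNIVERSITY_PUBLIC = [ipaddress.ip_network(n) for n in (
    "128.232.0.0/16", "129.169.0.0/16", "131.111.0.0/16",
    "192.18.195.0/24", "193.60.80.0/20", "193.63.252.0/23",
)]

# (source prefixes, next hop) for route-map entries 110 and 120
ROUTE_MAP = [
    ([ipaddress.ip_network("172.16.0.0/13")], "193.60.92.38"),
    ([ipaddress.ip_network(n) for n in
      ("172.24.0.0/14", "172.28.0.0/15", "172.30.0.0/16")], "193.60.92.46"),
]


def pbr_next_hop(src, dst):
    """Return the PBR next hop, or None if normal routing applies."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if any(dst_ip in net for net in UNIVERSITY_PUBLIC):
        return None  # denied by both access lists, so neither entry matches
    for sources, next_hop in ROUTE_MAP:
        if any(src_ip in net for net in sources):
            return next_hop
    return None  # not a NATed source: falls through to normal routing


print(pbr_next_hop("172.17.1.1", "8.8.8.8"))
print(pbr_next_hop("172.29.0.1", "8.8.8.8"))
print(pbr_next_hop("172.17.1.1", "131.111.8.42"))
```

The addresses 8.8.8.8 and 131.111.8.42 are just example destinations, one outside and one inside the University ranges.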

I mentioned we were going to use this router to handle the outside range 131.111.184.0/24 and the other 131.111.185.0/24.  To steer inbound traffic via this router, we want to advertise that prefix to Janet explicitly, so we need to add this to our BGP configuration:

router bgp 64602
 address-family ipv4 unicast
  network 131.111.184.0 mask 255.255.255.0
 exit-address-family
 !
 address-family ipv4 multicast
  network 131.111.184.0 mask 255.255.255.0
 exit-address-family

And also add that range to the outbound prefix list - I've created the prefix-list under a router-specific name, as I think it best that lists with the same name have the same contents across routers; these differ, so I've given each a different name:

ip prefix-list janetc-out_prefixes seq 5 permit 128.232.0.0/16
ip prefix-list janetc-out_prefixes seq 10 permit 129.169.0.0/16
ip prefix-list janetc-out_prefixes seq 15 permit 131.111.0.0/16
ip prefix-list janetc-out_prefixes seq 20 permit 192.18.195.0/24
ip prefix-list janetc-out_prefixes seq 25 permit 192.84.5.0/24
ip prefix-list janetc-out_prefixes seq 30 permit 192.153.213.0/24
ip prefix-list janetc-out_prefixes seq 35 permit 193.60.80.0/20
ip prefix-list janetc-out_prefixes seq 40 permit 193.63.252.0/23
ip prefix-list janetc-out_prefixes seq 45 permit 131.111.184.0/24

Finally, you may have spotted the track directive to HSRP above, in the interface definitions.  This is to cause HSRP to lower its priority, if this router loses direct connectivity to Janet.  The 7200 can't track the metric of a BGP route (unlike a Catalyst 6500), so I've just made it track the interface connecting to Janet:

track 30 interface Ethernet1/3 ip routing

This causes the gateways on the outside and inside interfaces to be handled by the router which has the best connectivity to Janet.
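
To see how the numbers play out, here is a rough Python sketch of the priority arithmetic.  It assumes the other border router is configured as a mirror image (priority 190 for groups 81/82 and 200 for groups 83/84), which is implied above but not shown, and it ignores timers and the IP address tie-break.  The router names and function names are made up for the example:

```python
def effective_priority(base, decrement, tracked_up):
    """HSRP priority after applying a single tracked-object decrement."""
    return base if tracked_up else base - decrement


def active_router(group):
    """Pick the router with the highest effective priority.

    group maps a router name to (base priority, decrement, tracked object up?).
    Preemption is assumed, so the highest priority always wins.
    """
    return max(group, key=lambda name: effective_priority(*group[name]))


# Group 81 (nat-1-outside): this router at 200, the other assumed to be at 190
normal = {"router-1": (200, 50, True), "router-2": (190, 50, True)}
janet_down = {"router-1": (200, 50, False), "router-2": (190, 50, True)}

print(active_router(normal))      # router-1
print(active_router(janet_down))  # router-2
```

With a decrement of 50, losing the Janet interface drops this router from 200 to 150, below the other router's 190, so the gateway moves.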

NAT configuration

The 7200 supports NAT in VRFs - while we don't strictly need to use them, it is a nice way of keeping the routing tables for the two NAT contexts separate (otherwise it wouldn't be clear which outside gateway was going to be used as the default route back to the router to go on to Janet).

First, create the VRFs:

vrf definition nat-1_vrf
 rd 64602:1981
 address-family ipv4
 exit-address-family
!
vrf definition nat-2_vrf
 rd 64602:1983
 address-family ipv4
 exit-address-family

Then some access lists to identify the clients to be NATed:

ip access-list standard nat-1-clients
 permit 172.16.0.0 0.7.255.255
!
ip access-list standard nat-2-clients
 permit 172.24.0.0 0.3.255.255
 permit 172.28.0.0 0.1.255.255
 permit 172.30.0.0 0.0.255.255

Create the two NAT pools and mappings (note the 'vrf' option and 'match-in-vrf' which is essential when using VRFs):

ip nat pool nat-1c-pool 131.111.184.1 131.111.184.1 netmask 255.255.255.0
ip nat inside source list nat-1-clients pool nat-1c-pool vrf nat-1_vrf match-in-vrf overload
!
ip nat pool nat-2c-pool 131.111.185.1 131.111.185.1 netmask 255.255.255.0
ip nat inside source list nat-2-clients pool nat-2c-pool vrf nat-2_vrf match-in-vrf overload

Then configure the outside and inside interfaces, putting them in the appropriate VRFs and configuring NAT.  Note that we need HSRP (with different priorities across both contexts, to balance load) on the inside as that is used as a static route destination by the connecting router.  However, we don't need this on the outside as we're using different outside addresses on each router (131.111.184.2 and 131.111.185.2, respectively):

interface Ethernet1/0.1981
 description nat-1-outside
 vrf forwarding nat-1_vrf
 encapsulation dot1Q 1981
 ip address 131.111.184.250 255.255.255.0
 no ip proxy-arp
 ip nat outside
!
interface Ethernet1/0.1982
 description nat-1-inside
 vrf forwarding nat-1_vrf
 encapsulation dot1Q 1982
 ip address 193.60.92.37 255.255.255.248
 no ip proxy-arp
 ip nat inside
 standby version 2
 standby 162 ip 193.60.92.38
 standby 162 timers 1 3
 standby 162 priority 200
 standby 162 preempt
!
interface Ethernet1/0.1983
 description nat-2-outside
 vrf forwarding nat-2_vrf
 encapsulation dot1Q 1983
 ip address 131.111.185.250 255.255.255.0
 ip nat outside
!
interface Ethernet1/0.1984
 description nat-2-inside
 vrf forwarding nat-2_vrf
 encapsulation dot1Q 1984
 ip address 193.60.92.45 255.255.255.248
 no ip proxy-arp
 ip nat inside
 standby version 2
 standby 164 ip 193.60.92.46
 standby 164 timers 1 3
 standby 164 priority 190
 standby 164 preempt

Finally, a bit of static routing for the inside and outside destinations, across both VRFs:

ip route vrf nat-1_vrf 0.0.0.0 0.0.0.0 Ethernet1/0.1981 131.111.184.254
ip route vrf nat-1_vrf 172.16.0.0 255.248.0.0 Ethernet1/0.1982 193.60.92.33
ip route vrf nat-1_vrf 172.24.0.0 255.252.0.0 Ethernet1/0.1982 193.60.92.33
ip route vrf nat-1_vrf 172.28.0.0 255.254.0.0 Ethernet1/0.1982 193.60.92.33
ip route vrf nat-1_vrf 172.30.0.0 255.255.0.0 Ethernet1/0.1982 193.60.92.33
!
ip route vrf nat-2_vrf 0.0.0.0 0.0.0.0 Ethernet1/0.1983 131.111.185.254
ip route vrf nat-2_vrf 172.16.0.0 255.248.0.0 Ethernet1/0.1984 193.60.92.41
ip route vrf nat-2_vrf 172.24.0.0 255.252.0.0 Ethernet1/0.1984 193.60.92.41
ip route vrf nat-2_vrf 172.28.0.0 255.254.0.0 Ethernet1/0.1984 193.60.92.41
ip route vrf nat-2_vrf 172.30.0.0 255.255.0.0 Ethernet1/0.1984 193.60.92.41

And that's all there is to it!

This doesn't give stateful failover (preserving translations), but my simulation only needs to pass traceroutes and pings, so adding that seems a lot of unnecessary work, especially as it's done completely differently on the ASAs.

Friday, 6 March 2015

Nexus management port not sending IGMP Membership Reports

OK - I've spent a day getting annoyed by this!  I was trying to get two Nexus 56128Ps (running NX-OS 7.0(3)) to talk across their management interfaces with CFS, to synchronise configuration with switch-profiles.

I had the Mgmt0 interfaces connected to a Cisco 2960 as access ports, with no other connections in that VLAN and everything worked fine.

However, when I tried to connect an uplink from the 2960 to our main network (on a test VLAN) synchronisation broke with show switch-profile status reporting that the Peer is unreachable.  Disconnecting the cable fixed the problem again, immediately.

The problem

After a lot of mucking about, this turned out to be an IGMP issue - the Management0 ports on the Nexus switches advertise their presence to each other using multicast messages to a group (239.255.70.83).  However, they weren't sending IGMP Membership Report messages to indicate they themselves want to join the group, preventing the announcements from reaching each other.

When the switch was not connected to the rest of the network, there was no IGMP Querier, so the switch resorted to flooding multicast traffic.  However, when connected to the main network, the IGMP Membership Query messages from the router started reaching the 2960 and it started to limit flooding.

Pulling the uplink cable from the 2960 immediately aged out the Querier and flooding recommenced.  However, if the VLAN was severed in a way not known to the 2960 (e.g. removing the VLAN from the upstream switch), the Querier would take 3 minutes to expire (as expected) before things began to work again.
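
The behaviour described above can be modelled with a few lines of Python.  This is a deliberately simplified picture of an IGMP snooping switch, not the 2960's actual implementation, and the port names are invented: with no Querier seen the switch floods, and once a Querier is present it only forwards to the Querier's port and to ports that have sent Membership Reports.

```python
def egress_ports(ingress, ports, querier_port, member_ports):
    """Ports a snooping switch would send a multicast frame out of.

    querier_port is None when no IGMP Querier has been seen on the VLAN.
    member_ports are the ports that have sent Membership Reports for the group.
    """
    if querier_port is None:
        targets = set(ports)  # no Querier: fall back to flooding
    else:
        targets = set(member_ports) | {querier_port}
    return targets - {ingress}


ports = {"nexus-a", "nexus-b", "uplink"}

# Isolated VLAN: no Querier, so the announcement reaches the other Nexus
isolated = egress_ports("nexus-a", ports, None, set())

# Uplink connected: a Querier appears but neither Nexus has sent a report
connected = egress_ports("nexus-a", ports, "uplink", set())

print(sorted(isolated), sorted(connected))
```

In the second case the announcement only goes up the uplink, so the peer never hears it.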

After some poking about, fiddling around with the configuration of the router, it appears that IGMPv2 is supported by the management interface but IGMPv3 (which is our default) is not.

The fix

This could be fixed in one of three ways:
  • Disabling multicast routing on the VLAN,
  • Changing the IGMP version to 2 (instead of 3), if this has been raised, or
  • Disabling IGMP Snooping on the switches on the management VLAN (e.g. no ip igmp snooping vlan XXX)
I can't find mention of this in the Cisco documentation, nor a way of changing the IGMP version on the Nexus 56128Ps.

Wednesday, 4 March 2015

FEXs on a Nexus 7010

I've been considering how to deal with connecting our Aruba wireless controllers, when we switch over to the Nexus stuff - they're currently linked directly to our existing Catalyst 6509-Es in the data centre, so we need to do something, even in the interim.  One solution that looks sensible is FEXs attached directly to the Nexus 7010s, so I thought I'd give that a quick try, as I've so far done this on the 56128Ps, only.

First difference is that the FEX functionality must be installed before it can be activated as a feature - similar to MPLS.  This must be done in the admin VDC, then it can be activated in the local VDC:

n7k-top# conf t
Enter configuration commands, one per line.  End with CNTL/Z.
n7k-top(config)# install feature-set fex
n7k-top(config)# end
n7k-top# switchto vdc srv
...
n7k-top-srv# conf t
Enter configuration commands, one per line.  End with CNTL/Z.
n7k-top-srv(config)# feature-set fex

From then on, things work pretty much as they do on the 56128Ps.

However, one important difference is that it seems not to be possible to use vPC on a FEX fabric port, unlike a 56128P, at least on NX-OS 6.2:

n7k-top-srv(config)# int e1/24
n7k-top-srv(config-if)# channel-group 102

n7k-top-srv(config)# int po102
n7k-top-srv(config-if)# switchport mode fex-fabric 
n7k-top-srv(config-if)# fex associate 102
n7k-top-srv(config-if)# vpc 102 
ERROR: Operation failed: [FEX interface is not capable of acting as vPC] 

This means it's not possible to dual-attach FEXs to two 7010s.  However, it is possible to create a vPC across two FEXs attached to 7ks which themselves are acting as a vPC pair: any devices wanting redundancy on these will need to attach to two FEXs.

Monday, 23 February 2015

Problems with NX-OS configuration synchronization (switch-profiles)

Now that configuration synchronization (through a switch-profile in configure sync mode) has been set up, the relevant parts of the configuration across both upstream Nexus 5ks should be kept in step (in particular, the ports on the FEXs).

However, during testing, a number of issues were encountered.  The feature looks very useful but needs a lot of care and attention: when these issues arise, it is often necessary to do some fiddling about to recover things.  So far, though, none of the recovery processes appears to result in a service outage.

Running configuration getting out of step

Sometimes, the running configuration can get out of step with the local configuration and synchronised switch-profile configuration - everything looks correct when you look at the files, but you get an error when you try to commit changes.

To detect this situation, compare the relevant portions of show running-config ... and show running-config switch-profile and see if they reconcile.  If they don't, you can compare these with what the internal configuration of the switch says with two internal commands:

n5k-top# show run int e1/3
...
interface Ethernet1/3

n5k-top# show system internal csm info global-db cmd-tbl | sec Ethernet1/3
parent_seq_no= 0,  seq_no= 34,  clone_seq_no= 0, cmd= 'interface Ethernet1/3'
    parent_seq_no= 34,  seq_no= 35,  clone_seq_no= 0, cmd= 'description b2'

n5k-top# show system internal csm info switch-profile cfgd-db cmd-tbl | sec Ethernet1/3
parent_seq_no= 0,  seq_no= 146,  clone_seq_no= 0, cmd= 'interface Ethernet1/3'
    parent_seq_no= 146,  seq_no= 150,  clone_seq_no= 0, cmd= 'channel-group 102'
    parent_seq_no= 146,  seq_no= 147,  clone_seq_no= 0, cmd= 'description b2'
    parent_seq_no= 146,  seq_no= 149,  clone_seq_no= 0, cmd= 'fex associate 102'
    parent_seq_no= 146,  seq_no= 148,  clone_seq_no= 0, cmd= 'switchport mode fex-fabric'

The first shows what is in the running configuration of Ethernet1/3, the second shows the local configuration and the third the one from the switch-profile (here for a block pertaining to Ethernet1/3).  Clearly these don't tally up.
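
If you capture the output of the two internal commands, a short Python sketch like the one below can pull out the commands and list those that appear in both databases, which is one visible symptom in the output above.  The regular expression is based only on the output format shown here, and the function names are my own:

```python
import re

# Matches the trailing  cmd= '...'  field of each cmd-tbl line
CMD_RE = re.compile(r"cmd= '(.*)'")


def commands(cmd_tbl_output):
    """Extract the command strings from captured cmd-tbl output."""
    return [m.group(1) for m in CMD_RE.finditer(cmd_tbl_output)]


def in_both(local_output, profile_output):
    """Commands present in both the local and the switch-profile database."""
    profile_cmds = set(commands(profile_output))
    return [c for c in commands(local_output) if c in profile_cmds]


local_db = """\
parent_seq_no= 0,  seq_no= 34,  clone_seq_no= 0, cmd= 'interface Ethernet1/3'
    parent_seq_no= 34,  seq_no= 35,  clone_seq_no= 0, cmd= 'description b2'
"""

profile_db = """\
parent_seq_no= 0,  seq_no= 146,  clone_seq_no= 0, cmd= 'interface Ethernet1/3'
    parent_seq_no= 146,  seq_no= 150,  clone_seq_no= 0, cmd= 'channel-group 102'
    parent_seq_no= 146,  seq_no= 147,  clone_seq_no= 0, cmd= 'description b2'
    parent_seq_no= 146,  seq_no= 149,  clone_seq_no= 0, cmd= 'fex associate 102'
    parent_seq_no= 146,  seq_no= 148,  clone_seq_no= 0, cmd= 'switchport mode fex-fabric'
"""

duplicated = in_both(local_db, profile_db)
print(duplicated)
```

This only flags candidates to look at; it makes no attempt to decide which copy is the stale one.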

Somehow things have gone awry, but it's easy to fix by resyncing the database which, in this case, caused the local configuration to clear itself (but sometimes it can have other effects):

n5k-top# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-top(config-sync)# resync-database
Re-synchronization of switch-profile db takes a few minutes...
Re-synchronize switch-profile db completed successfully.
n5k-top(config-sync)# end

n5k-top# show system internal csm info global-db cmd-tbl | sec Ethernet1/3

Over the next few seconds, things will sync up and sort themselves out again and you'll have your interface configuration back:

n5k-top# show run int e1/3
...
interface Ethernet1/3
  description b2
  switchport mode fex-fabric
  fex associate 102
  channel-group 102

I found an explanation of this here: http://nexp.com.ua/technologies/nx-os/troubleshooting-nx-os-config-sync/ - that's a good place to start before trying the second (more drastic) method.

Stub configuration stanza removal

It can be impossible to remove a stub configuration stanza (the opening line of a block, starting a subcontext in the configuration file - e.g. interface ... or fex ...) from either configuration, if it has been entered by mistake.

For example, if you add a FEX to the switch profile:

n5k-bottom# conf sync
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
n5k-bottom(config-sync-sp)# fex 103
n5k-bottom(config-sync-sp-fex)# commit
Verification successful...
Proceeding to apply configuration. This might take a while depending on amount of configuration in buffer.
Please avoid other configuration changes during this time.
Commit Successful

If you then mistakenly go to configure the FEX in local (configure terminal) mode, even if you realise your mistake and enter no subcommands, you cannot remove the fex stanza because it would clash with the declaration in the switch profile (failing the "mutual exclusion" check, in the parlance of configuration synchronization):

n5k-bottom# conf t
n5k-bottom(config)# fex 103
n5k-bottom(config-fex)# no fex 103
Error: Command is not mutually exclusive

You also cannot remove the declaration in the switch profile as that would clash with the one in the local configuration!

n5k-bottom(config)# conf sync
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
n5k-bottom(config-sync-sp)# no fex 103
n5k-bottom(config-sync-sp)# commit
Failed: Verify Failed
n5k-bottom(config-sync-sp)# show switch-profile status
...
Local information:
----------------
Status: Verify Failure
Error(s):
Following commands failed mutual-exclusion checks:
no fex 103

You can see these two commands with this handy internal command:

n5k-bottom# show system internal csm info global-db cmd-tbl | sec "fex 103"
parent_seq_no= 0,  seq_no= 238,  clone_seq_no= 0, cmd= 'fex 103'
n5k-bottom# show system internal csm info switch-profile cfgd-db cmd-tbl | sec "fex 103"
parent_seq_no= 0,  seq_no= 237,  clone_seq_no= 0, cmd= 'fex 103'

The only solution to this problem that I'm aware of is to do the following, step by step, on both switches (except for the first command, which will do both at the same time):
  1. Use the no switch-profile ... profile-only all in configure sync mode to remove the profile.  This will break the synchronisation and move all the commands in the profile to the local configuration on both switches.
  2. Remove the unwanted parts in configure terminal mode.
  3. Create a new switch-profile.
  4. Use import running-config to read in the current configuration into the profile.
  5. commit the new profile, which will move the commands from the local configuration into the switch-profile.
  6. Re-establish the synchronisation peer.
This appears to be non-destructive but is a little worrying!

With an example - first break the synchronisation and copy the commands in the switch-profile to the local configuration:

n5k-bottom# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config-sync)# no switch-profile wcdc-b profile-only all

WARNING: Deleting switch-profile will not remove all commands configured under switch-profile. This will only remove the switch profile. Are you sure you want to delete the switch-profile from the system ?
Are you sure? (y/n)  [n] y
Verification successful...
Proceeding to delete switch-profile. This might take a while depending on amount of configuration under a switch-profile.
Please avoid other configuration changes during this time.
Delete Successful

At this point, the commands you don't want will still be there, but only in the local configuration, where they can be removed directly on both switches, without issue:

n5k-bottom# conf t
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config)# no fex 103

Now re-create the switch-profile and import the (now clean) running configuration, then commit it - I've added some show system ... commands to show the commands moving from the local configuration to the switch-profile when the commit is issued:

n5k-bottom# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
n5k-bottom(config-sync-sp)# import running-config
n5k-bottom(config-sync-sp-import)# show system internal csm info global-db cmd-tbl | sec "fex 103"
parent_seq_no= 0,  seq_no= 8,  clone_seq_no= 0, cmd= 'fex 103'
n5k-bottom(config-sync-sp-import)# commit
Verification successful...
Proceeding to apply configuration. This might take a while depending on amount of configuration in buffer.
Please avoid other configuration changes during this time.
Commit Successful
n5k-bottom(config-sync)# show system internal csm info global-db cmd-tbl | sec "fex 103"
n5k-bottom(config-sync)# show system internal csm info switch-profile cfgd-db cmd-tbl | sec "fex 103"
parent_seq_no= 0,  seq_no= 8,  clone_seq_no= 0, cmd= 'fex 103'

Once this has been done on both switches, the synchronisation peering can be re-established and hopefully everything will be OK:

n5k-bottom# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
n5k-bottom(config-sync-sp)# sync-peers dest 192.168.200.2
n5k-bottom(config-sync-sp)# end

Wait a few seconds:

n5k-bottom# show switch-profile status
...
Local information:
----------------
Status: Commit Success
Error(s):

Peer information:
----------------
IP-address: 192.168.200.2
Sync-status: In sync
Status: Commit Success
Error(s):

All rather messy!  What NX-OS needs is some sort of "delete fex ..." command to remove the block from the configuration you're editing, but not try and remove the block from the running-config itself.

Config-revision getting out of step

I'm not sure how I got in this state, but I was testing FEX preprovisioning and what happened when the configured model mismatches the connected one, then fixing this.  This messed things up when I tried to change the model - the command verified OK but failed on the peer when I tried committing, claiming there was a mismatch.

After trying to undo this, I got in a weird situation where the peer had lost the configuration of some port channel interfaces in the switch profile.  On closer inspection, in show switch-profile the peer had a Config-revision listed one less than the switch I committed the change on.

I couldn't find a command to force the two switches to synchronise (including making a change and trying to commit it).  Breaking the peering (by removing the sync-peers destination ... entry and re-adding it on one switch) didn't fix it, with the switch grumbling the commands mismatched:

n5k-bottom# show switch-profile status

switch-profile  : wcdc-b
----------------------------------------------------------

Start-time: 221438 usecs after Mon Feb 12 02:59:18 2001
End-time: 885937 usecs after Mon Feb 12 02:59:19 2001

Profile-Revision: 11
Session-type: Import-Verify
Session-subtype: -
Peer-triggered: No
Profile-status: Verify Failed

Local information:
----------------
Status: Verify Success
Error(s):

Peer information:
----------------
IP-address: 192.168.200.2
Sync-status: Not yet merged
Merge Flags: pending_merge:1 rcv_merge:1 pending_validate:0
Status: Verify Failure
Error(s):
Validation Failed: Config validation failed as found changes on both sides. rcvd_rev: 0, expected_rev: 0
interface Ethernet1/3
        switchport mode fex-fabric
interface Ethernet1/3
        fex associate 102
interface Ethernet1/3
        channel-group 102
interface Ethernet1/4
        switchport mode fex-fabric
interface Ethernet1/4
        fex associate 102
interface Ethernet1/4
        channel-group 102

Indeed, those commands listed were the ones missing on the peer but present on the one I was making the change on.

Fixing this was fairly straightforward but tedious - the missing commands just needed adding back into the switch profile and committing:

n5k-top# conf sync
n5k-top(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
n5k-top(config-sync-sp)# int e1/3
n5k-top(config-sync-sp-if)# switchport mode fex-fabric
n5k-top(config-sync-sp-if)# fex associate 102
n5k-top(config-sync-sp-if)# channel-group 102
n5k-top(config-sync-sp-if)# int e1/4
n5k-top(config-sync-sp-if)# switchport mode fex-fabric
n5k-top(config-sync-sp-if)# fex associate 102
n5k-top(config-sync-sp-if)# channel-group 102
n5k-top(config-sync-sp-if)# verify
Verification Successful
n5k-top(config-sync-sp)# commit
Verification successful...
Proceeding to apply configuration. This might take a while depending on amount of configuration in buffer.
Please avoid other configuration changes during this time.
Commit Successful

A few seconds after this was done, the two switches automatically recovered and synced up:

n5k-top# show switch-profile status
...
Peer information:
----------------
IP-address: 192.168.200.1
Sync-status: In sync
Status: Commit Success
Error(s):

Invented/implicit and non-canonical commands

You can get in a muddle if you rely on NX-OS to insert implicit commands, or use the non-canonical forms of some commands.  What do I mean by this?  Well - consider the following configuration for a vPC link port channel interface:

rk-wcdc-b-b(config)# interface Port-Channel12
rk-wcdc-b-b(config-if)# switchport mode trunk
rk-wcdc-b-b(config-if)# switchport trunk allowed vlan remove 1
rk-wcdc-b-b(config-if)# vpc peer-link

Entering the last of these commands will display the following warning:

Please note that spanning tree port type is changed to "network" port type on vPC peer-link.
This will enable spanning tree Bridge Assurance on vPC peer-link provided the STP Bridge Assurance
(which is enabled by default) is not disabled.

When we look at the configuration, we can see two changes from what we entered (ignoring the speed command, which seems not to matter):

rk-wcdc-b-b# show run int po12
...
interface port-channel12
  description vpc-peer
  switchport mode trunk
  switchport trunk allowed vlan 2-4094
  spanning-tree port type network
  speed 40000
  vpc peer-link

The allowed vlan list just shows what's included (not what we removed), thus storing a different command from what we entered.  Also, as per the warning above, the spanning-tree port type network command has been automatically inserted.

This is all good stuff but, when you try to create a switch-profile and import the running-config and verify it, it fails on those two lines:

rk-wcdc-b-b# conf sync
rk-wcdc-b-b(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
rk-wcdc-b-b(config-sync-sp)# import running-config
rk-wcdc-b-b(config-sync-sp-import)# verify
Failed: Verify Failed
rk-wcdc-b-b(config-sync-sp-import)# show switch-profile status
...
Local information:
----------------
Status: Verify Failure
Error(s):
Following commands are not configured under config-terminal mode and hence cannot be imported:
interface port-channel12
        switchport trunk allowed vlan 2-4094
interface port-channel12
        spanning-tree port type network

This is because these commands don't exactly match what was entered: they are either "invented" (inserted automatically, as per the last line) or stored in a different form (as per the first line).

In this case, the issue can be fixed with the conf sync / resync-database command, but I've seen situations - usually after the switch-profile has been created and the pair of switches synced up - where things get messy and it's necessary to break the synchronisation to fix it.

It is best to try and avoid commands being implicitly inserted and enter them explicitly.

NX-OS configuration synchronisation (switch-profiles)

When using vPC with dual-attached FEXs, a large chunk of the configuration across the two parent Nexus 5k (or 7k) switches must be kept in step.  For example:
  • VLANs must be created and deleted on both switches.
  • Fabric interfaces must have the same configuration with regards channel group assignments, FEX associations, etc.
  • Edge (host) ports must have the same switchport configuration - mode and access/trunk/native VLANs.
This can be done manually but is a bit tedious and error-prone to maintain.  To help, NX-OS has a mechanism whereby parts of the configuration can be synchronised across the switches.

However, this facility is a little confusing and can be problematic to set up - it's one of those things where the Cisco documentation makes sense after you understand the basics!

Key concepts

  • Each switch maintains a local configuration which contains things which are NOT synchronised across the switches.  This is configured in the normal way, using configure terminal and has all the usual commands in it and will obviously hold things which differ between the switches.
  • In addition, there is a separate configuration which is synchronised between switches and holds the common elements.  This is configured in a special mode, entered using configure sync in a block called a switch-profile.
  • The running configuration is the result of merging the two configurations.  When using show running-config (including its options to look at specific parts of the configuration), the merged result is shown.
  • Changes to the switch profile are made on one of the switches, verified and committed as a single transaction (rather like a database) on both switches automatically; if they fail, things should roll back to how they were before they started.
  • Commands can be imported from the running configuration into the switch profile: this will move them out of the local configuration (they still appear in the merged running configuration).
  • Synchronisation can only work across the out-of-band management VRF (which, on a Nexus 5k, limits you to the copper Management0 port on the rear of the unit); it cannot be done in-band.
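
A toy Python model may help fix these ideas.  It is purely illustrative (the class and method names are invented, and the example commands are placeholders): it treats the running configuration as the merge of the local configuration and the switch profile, and takes an import as moving commands out of the local configuration, which is what the show system internal csm output in the troubleshooting notes above shows.  Which commands are actually synchronisable is decided by NX-OS; here they are simply passed in by hand.

```python
class SwitchConfig:
    """Toy model of one switch: a local configuration plus a switch profile."""

    def __init__(self, local=(), profile=()):
        self.local = list(local)
        self.profile = list(profile)

    def running(self):
        # show running-config displays the merge of both configurations
        return self.local + self.profile

    def import_commands(self, wanted):
        """Move the given commands from the local configuration to the profile."""
        moved = [c for c in self.local if c in wanted]
        self.local = [c for c in self.local if c not in wanted]
        self.profile.extend(moved)


sw = SwitchConfig(local=["hostname n5k-bottom", "vlan 100", "fex 102"])
before = sorted(sw.running())

sw.import_commands({"vlan 100", "fex 102"})
after = sorted(sw.running())

print(sw.local, sw.profile, before == after)
```

The merged view is unchanged by the import; only the place each command lives has moved.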

Setting up synchronisation

There are three main steps to this:
  1. Set up management interface communication and enable Cisco Fabric Services (CFS) over it
  2. Set up a switch profile and import the running configuration into it
  3. Establish synchronisation between the peers
You have to take care to do these in the correct order otherwise things can get in a muddle.  Cisco describe this rather briefly on this page.

Set up management interface communication

The first thing to do is allow communication between switches using the management interface and enable Cisco Fabric Services (CFS) over IPv4.  CFS is used to communicate the synchronisation and only operates over the management VRF:

n5k-bottom# conf t
n5k-bottom(config)# int mgmt0
n5k-bottom(config-if)# ip addr 192.168.200.1/24
n5k-bottom(config-if)# exit
n5k-bottom(config)# cfs ipv4 distribute
n5k-bottom(config)# end

Set up the switch profile and import the running configuration

The switch profile is created in the special configure sync mode.

The profile must be given a name, which identifies the configuration to be synchronised and must match across the synchronisation peers.  I recommend using something to identify the area the switches and their FEXs will serve (e.g. the room and rack row).

Once this is done, the running configuration can be imported, verified and committed (the verification stage can be omitted, if required, but it's a good idea to check things first).  This will move the synchronisable elements of the configuration from the running configuration into the switch profile.

n5k-bottom# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1


n5k-bottom(config-sync-sp)# import running-config

n5k-bottom(config-sync-sp-import)# verify

Verification Successful
n5k-bottom(config-sync-sp-import)# commit
Verification successful...
Proceeding to apply configuration. This might take a while depending on amount of configuration in buffer.
Please avoid other configuration changes during this time.
Commit Successful

All this must be done on BOTH switches independently.

Establish synchronisation between the peers

The synchronisation can now be enabled between the peers in the switch profile:

n5k-bottom# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1

n5k-bottom(config-sync-sp)# sync-peers destination 192.168.200.2

Once this is entered on both switches, they should find each other, exchange information and sync up, comparing their imported configurations and ensuring they agree (or, if they differ, that they can be merged without conflict).  This will typically take 10-20 seconds and can be checked with show switch-profile status; the Peer information / Status will change from Peer not reachable to Verify Success to Commit Success and then finally the Sync-status will say In sync, when this is complete:

n5k-bottom# show switch-profile status

switch-profile  : wcdc-b
----------------------------------------------------------

Start-time:   6070 usecs after Sun Feb 11 23:42:16 2001
End-time: 571814 usecs after Sun Feb 11 23:42:17 2001

Profile-Revision: 1
Session-type: Initial-Exchange
Session-subtype: Init-Exchange-All
Peer-triggered: Yes
Profile-status: Sync Success

Local information:
----------------
Status: Commit Success
Error(s):

Peer information:
----------------
IP-address: 192.168.200.1
Sync-status: In sync
Status: Commit Success
Error(s):

Once this has completed, and the two switches are in sync, it's ready for use.
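
If you want to check this from a script rather than by eye, the output is simple enough to parse.  The following is a minimal sketch that assumes the text layout shown above (it may differ between NX-OS releases); the function names are my own and not part of any Cisco tool.

```python
def parse_switch_profile_status(text):
    """Split 'show switch-profile status' output into three sections."""
    result = {"profile": {}, "local": {}, "peer": {}}
    section = "profile"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Local information"):
            section = "local"
        elif line.startswith("Peer information"):
            section = "peer"
        elif ":" in line and not line.startswith("-"):
            # Split on the first colon only, as timestamps contain more.
            key, _, value = line.partition(":")
            result[section][key.strip()] = value.strip()
    return result


def is_in_sync(status):
    """True when both ends have committed and the peer reports in sync."""
    return (
        status["local"].get("Status") == "Commit Success"
        and status["peer"].get("Status") == "Commit Success"
        and status["peer"].get("Sync-status", "").lower() == "in sync"
    )
```

Feeding it the output above gives a peer Sync-status of "In sync", so is_in_sync() returns True.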

Using synchronisation

From this point onwards, any changes to be made across both switches should be made in configure sync mode, in the switch-profile, then committed to take effect, rather than in configure terminal mode.  For example, to set up a new FEX, you might do:

n5k-bottom# conf sync
Enter configuration commands, one per line.  End with CNTL/Z.
n5k-bottom(config-sync)# switch-profile wcdc-b
Switch-Profile started, Profile ID is 1
n5k-bottom(config-sync-sp)# int e1/3-4
n5k-bottom(config-sync-sp-if-range)# desc b2
n5k-bottom(config-sync-sp-if-range)# channel-group 102
n5k-bottom(config-sync-sp-if-range)# exit
n5k-bottom(config-sync-sp)# int po102
n5k-bottom(config-sync-sp-if)# desc b2
n5k-bottom(config-sync-sp-if)# switchport mode fex-fabric
n5k-bottom(config-sync-sp-if)# fex associate 102
n5k-bottom(config-sync-sp-if)# vpc 102
n5k-bottom(config-sync-sp-if)# verify
Verification Successful
n5k-bottom(config-sync-sp)# commit
Verification successful...
Proceeding to apply configuration. This might take a while depending on amount of configuration in buffer.
Please avoid other configuration changes during this time.
Commit Successful

If something is accidentally configured in configure terminal mode, sometimes it can just be removed (with some no ... commands), but it may take some work if mutual exclusion errors are encountered - I'll cover this in a future entry.

Friday, 20 February 2015

FEX testing

Now I've got the FEXs up and running, there are various things I wanted to test and confirm so I can better understand how they work and, more importantly, how to find things that have gone wrong.

Port-VLAN inconsistency across upstream switches

When a FEX is dual-homed to a pair of upstream switches, each of those switches will have a configuration for the ports on the FEX.  By default, these are NOT automatically kept in step (although something called configuration synchronization can do this, which I'll go over later as I've been looking at that too).

If the ports are configured identically on both switches, everything is hunky dory:

n5k-top(config)# int e101/1/3
n5k-top(config-if)# switchport access vlan 808

n5k-bottom(config)# int e101/1/3
n5k-bottom(config-if)# switchport access vlan 808

n5k-top# show int e101/1/3 status

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth101/1/3    --                 connected 808       full    a-1000  --

However, if one of the switches has a different configuration, the port becomes inactive:

n5k-bottom(config)# int e101/1/3
n5k-bottom(config-if)# switchport access vlan 812

n5k-bottom# show int e101/1/3 status

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth101/1/3    --                 inactive  812       full    auto    --

This error can be explained with the "show int ... status err-vlans" command:

n5k-bottom# show int e101/1/3 status err-vlans

--------------------------------------------------------------------------------
Port         Name               Err-Vlans                     Status
--------------------------------------------------------------------------------
Eth101/1/3   --                 812                           Vlan is not
                                                              configured on
                                                              remote vPC
                                                              interface

It doesn't show up in the verbose show int command output, which seems a bit of an omission.

Once corrected, the port immediately springs back into life.
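
To spot this sort of mismatch before it bites, the access VLANs on the two switches can be compared offline.  This is a minimal sketch that assumes you have saved the relevant interface sections of each running configuration as text; the function names are hypothetical and it only understands switchport access vlan lines.

```python
def access_vlans(config_text):
    """Map interface name -> access VLAN from running-config style text."""
    vlans, port = {}, None
    for line in config_text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "interface":
            port = words[1]
        elif port and len(words) == 4 and words[:3] == ["switchport", "access", "vlan"]:
            vlans[port] = int(words[3])
    return vlans


def access_vlan_mismatches(config_a, config_b):
    """Return {port: (vlan_a, vlan_b)} for ports where the VLANs differ."""
    a, b = access_vlans(config_a), access_vlans(config_b)
    return {
        port: (a.get(port), b.get(port))
        for port in sorted(set(a) | set(b))
        if a.get(port) != b.get(port)
    }


top = "interface Ethernet101/1/3\n  switchport access vlan 808\n"
bottom = "interface Ethernet101/1/3\n  switchport access vlan 812\n"
print(access_vlan_mismatches(top, bottom))  # {'Ethernet101/1/3': (808, 812)}
```

A port configured on only one of the switches shows up with None on the other side.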

Wednesday, 18 February 2015

FEXs (Fabric Extenders) and vPCs (Virtual Port Channels)

The new data centre design makes extensive use of Nexus 2k-series FEXs (Fabric EXtenders) as ToR (Top of Rack) units connected back to pairs of Nexus 56128Ps as the EoR (End of Row) units.

These links are all made using Virtual Port Channels, which are cross-chassis aggregated links - these are abbreviated "vPC" in Cisco parlance; some other vendors call this MLAG (Multi-chassis Link AGgregation).

vPC offers benefits over Spanning Tree in that it doesn't block any links and avoids the use of Spanning Tree as a redundancy protocol, leaving it just to prevent loops (which is what it does best).

Design

Our existing data centre network uses Extreme Summit X450s and X460s as ToR units, connected back by two links: one to each upstream switch, with one blocked by Spanning Tree.  The basic equivalent replacement (in the absence of anyone requesting anything more fancy) is a Nexus 2248TP (48x 100/1000 copper switch) with four FEX fabric links: 2 to each EoR 56128P.

This requires two ports on each EoR unit to be aggregated as part of a 4 port vPC group across the two chassis.  Once that link is set up, it will be configured as a FEX fabric link to attach the ToR FEX unit.

Configuring vPC peering

To configure vPC, the first thing to do is turn it on as a feature, along with LACP, which will be needed to set up the peer link:

n5k-bottom(config)# feature vpc
n5k-bottom(config)# feature lacp

vPC requires that the pair of switches involved are linked together with two types of link:
  • the peer link carries the bulk of the traffic (keeping the MAC tables in step and forwarding traffic between the peer switches, in the event of some downstream connectivity failing) - we're using a pair of diversely-routed 40GE links for this
  • the keepalive link is used to exchange status and keepalive messages and just determine if the peer switch is still there and operating; not much traffic is exchanged across this link and it isn't critical, as long as the other link does not fail - we're using a single 1GE link for this

vPC keepalive link

The keepalive link requires an IP address for the two switches to communicate which defaults to the management interface mgmt0 (in the management VRF).  This interface is a copper connection on the rear of the 56128P and we don't typically run copper between racks, so we're going to use a fibre link on the front panel.

As this is a single physical link and there is no reason to extend it elsewhere, we'll configure it as a non-switchport interface with an IP address directly in the default VRF but using IP addresses we don't use on the University network:

n5k-bottom(config)# int e1/48
n5k-bottom(config-if)# desc vpc-keepalive
n5k-bottom(config-if)# no switchport
n5k-bottom(config-if)# ip address 192.168.100.1/30

... and then 192.168.100.2/30 on the peer switch.
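
A /30 is the natural size for a point-to-point link like this, as it has exactly two usable addresses.  A quick check with Python's standard ipaddress module (the variable names are just for illustration):

```python
import ipaddress

# The keepalive link subnet used above.
link = ipaddress.ip_network("192.168.100.0/30")

# hosts() leaves out the network and broadcast addresses.
hosts = [str(h) for h in link.hosts()]
print(hosts)            # ['192.168.100.1', '192.168.100.2']
print(link.is_private)  # True
```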

vPC domain

Once the keepalive link is set up, the vPC peering can be set up by creating a vpc domain.  A switch can participate in only a single vPC domain and the domain ID must match on both peers.

n5k-bottom(config)# vpc domain 1
n5k-bottom(config-vpc-domain)# peer-keepalive destination 192.168.100.2 vrf default source 192.168.100.1
n5k-bottom(config-vpc-domain)# system-priority 8192
n5k-bottom(config-vpc-domain)# role priority 8192

The source IP address must be specified when the default VRF is used.  The two priorities are as follows:
  • the system-priority MUST MATCH on both switches and is used as the LACP system priority for a link; it should be lowered (= higher priority, as per LACP/STP) on the 'upstream' end of a link - I've used 8192
  • the role priority (note there is no hyphen in this option name) SHOULD DIFFER on each switch and should be lowered (= higher priority) on the switch you wish to be the primary/active one in the event of a failure of the peer link - I use 8192 on the primary and 16384 on the secondary
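
The two rules can be written down as a small sanity check.  This is only a sketch: the dictionary layout and function names are my own, the 16384 on n5k-top is the secondary value mentioned above, and I have deliberately left out what happens when the role priorities are equal.

```python
def check_vpc_priorities(local, peer):
    """Return a list of problems with a pair of vPC priority settings."""
    problems = []
    if local["system_priority"] != peer["system_priority"]:
        problems.append("system-priority must match on both switches")
    if local["role_priority"] == peer["role_priority"]:
        problems.append("role priority should differ between the switches")
    return problems


def preferred_primary(switches):
    """Name of the switch with the lowest role priority value."""
    # Lower value = higher priority, as with LACP and STP.
    return min(switches, key=lambda name: switches[name]["role_priority"])


pair = {
    "n5k-bottom": {"system_priority": 8192, "role_priority": 8192},
    "n5k-top": {"system_priority": 8192, "role_priority": 16384},
}
print(check_vpc_priorities(pair["n5k-bottom"], pair["n5k-top"]))  # []
print(preferred_primary(pair))  # n5k-bottom
```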

vPC peer link

The peer link is a special type of switched link that must itself be a port channel: the interfaces to be used for it must first be grouped and then that group assigned as a peer link.  A link cannot be configured as a peer link until the vpc domain has been created, so this step must be done last:

n5k-bottom(config)# int e1/49-50
n5k-bottom(config-if-range)# channel-group 1 mode active

n5k-bottom(config)# int e1/49
n5k-bottom(config-if)# desc vpc-peer/1

n5k-bottom(config)# int e1/50
n5k-bottom(config-if)# desc vpc-peer/2

n5k-bottom(config)# int po1
n5k-bottom(config-if)# desc vpc-peerlink
n5k-bottom(config-if)# switchport mode trunk
n5k-bottom(config-if)# switchport trunk allowed vlan remove 1
n5k-bottom(config-if)# vpc peer-link

The VLANs to be supported across vPC must be present on both switches and added to the peer link - in the event of a link somewhere failing, it is possible traffic will need to pass across the peer link.  Any VLANs which are not to be used across vPC should be removed explicitly - we never use the default VLAN (= 1) so I've removed that; if any uplink VLANs are present, they probably should also be removed, but it's probably best to keep the list generally open to avoid mistakes caused by missing individual VLANs out (which isn't what we normally do with our trunks, as I don't like VLANs going places I didn't explicitly want).

Adjacency formed

Once this is done, the peering should be formed:

n5k-bottom# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 0
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Enabled (timeout = 240 seconds)

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    up     -

We're now ready to set up ports in a vPC.

FEXs

FEXs are controlled entirely from their parent switch (in our case, the EoR 56128Ps) and have no local configuration, including their firmware (which will be upgraded as required).  Their identity and configuration are governed by which ports they are connected to, allowing a ToR unit to be replaced or swapped about and all setup will automatically be applied.

Before FEXs can be used, the feature must be enabled:

n5k-bottom(config)# feature fex

When a FEX is attached to a parent switch, it will be discovered but no action will be taken with it (including updating its firmware) until it is brought online:


n5k-bottom# show fex
  FEX         FEX           FEX              FEX              Fex
Number    Description      State            Model            Serial
------------------------------------------------------------------------
---       --------            Discovered   N2K-C2248TP-E-1GE   FOX1848GCMB

Configuring and associating a FEX via vPC

To configure a group of ports across the switches as a vPC for use by a FEX, they must first be put into a local [static] port channel group; note that LACP is not supported.  The port channel interface can then be set to FEX fabric mode and associated with a FEX, and then assigned a vPC number to match it against the group on the peer switch:

n5k-bottom(config-if)# int e1/1-2
n5k-bottom(config-if-range)# desc b1
n5k-bottom(config-if-range)# channel-group 101

n5k-bottom(config-if)# int e1/1
n5k-bottom(config-if)# desc b1/1

n5k-bottom(config-if)# int e1/2
n5k-bottom(config-if)# desc b1/2

n5k-bottom(config)# int po101
n5k-bottom(config-if)# desc b1
n5k-bottom(config-if)# switchport mode fex-fabric
n5k-bottom(config-if)# fex associate 101
n5k-bottom(config-if)# vpc 101


n5k-bottom(config)# fex 101
n5k-bottom(config-fex)# desc b1

FEXs are numbered 100-199 - for consistency and the avoidance of mistakes, it is common to use the same local port channel and vPC numbers as the FEX is assigned.  The FEX number controls the ethernet interface numbers - for example, FEX 101 with 48 ports will have interfaces on the controlling switch called Ethernet101/1/1-48.
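
The naming scheme is regular enough to generate, which is handy when scripting port descriptions or checks.  A minimal sketch, assuming a single slot numbered 1 as in the example above; the function name is hypothetical.

```python
def fex_interfaces(fex_id, ports=48):
    """List the host interface names for a FEX on its parent switch."""
    if not 100 <= fex_id <= 199:
        raise ValueError("FEX numbers must be in the range 100-199")
    # Format is Ethernet<fex>/<slot>/<port>, with slot 1 assumed here.
    return [f"Ethernet{fex_id}/1/{port}" for port in range(1, ports + 1)]


names = fex_interfaces(101)
print(names[0], names[-1])  # Ethernet101/1/1 Ethernet101/1/48
```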

Once a FEX has been associated, it will download an updated image, if required:

n5k-bottom# show fex
  FEX         FEX           FEX              FEX              Fex
Number    Description      State            Model            Serial
------------------------------------------------------------------------
101        b1             Image Download   N2K-C2248TP-E-1GE   FOX1848GCMB

When that's complete, it will reboot and come online properly, making its interfaces available for configuration:

n5k-top# show fex
  FEX         FEX           FEX              FEX              Fex
Number    Description      State            Model            Serial
------------------------------------------------------------------------
101        b1                     Online   N2K-C2248TP-E-1GE   FOX1848GCMB

n5k-top# show int status fex 101

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth101/1/1    --                 notconnec 1         auto    auto    --
Eth101/1/2    --                 notconnec 1         auto    auto    --
Eth101/1/3    --                 notconnec 1         auto    auto    --
...