Demystifying OSPF on a vPC

A common issue among network engineers managing the Cisco Nexus platform is how to form routing adjacencies with hosts over a Virtual Port Channel (vPC) peer link. Back in the day, when engineers wanted a multi-chassis port-channel to increase redundancy and resiliency, they had to use a Nexus 5500 or 7000 with 2200 Fabric Extenders (FEXs) hanging off of them. The basic idea was to use the 7000s/5500s at the data center distribution layer and the 2200s at the top of rack, then run redundant connections from the FEXs to each core. The problem with this architecture is the support for route update traffic across a vPC peer link. vPC is a Layer-2 virtualization technology, and Cisco hasn’t been able to support routing traffic along these links. That is, until recently.

New code makes this feature easy, but if you haven’t upgraded, or can’t upgrade, past the 7.0 train, you’ll need the information here to reach that goal in your data center. There are some fundamentals that need to be cleared up before jumping into the configuration. Virtual Port Channels have two components in the physical topology (other than the pair of Nexus switches themselves) that are required to create a vPC: first, the vPC peer link; second, the vPC keepalive link. Without these two connections configured on the Nexus core, vPCs do not happen. The vPC peer link is the more important of the two. It “fools” the switch pair into thinking there is a single control plane between them. This link is the transport for Bridge Protocol Data Units (BPDUs), Link Aggregation Control Protocol (LACP) packets, MAC address synchronization between aggregation groups, and IGMP synchronization when snooping is enabled.

If the Nexus pair runs HSRP on its SVIs, for example, the peer link carries the HSRP frames between the switches. Each VLAN ID must be configured on both switches and be allowed across the peer link, assuming VLAN pruning is configured on the dot1Q trunks. The sample output below shows the vPC peer link configuration on two Nexus 9372s. The example topology has a single Cisco ISR4321 router connected to LAB-CS1. While this is simple, it gives you the idea of the problem we’re solving.

LAB-CS1# show run interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  spanning-tree port type network
  vpc peer-link

LAB-CS1# show vpc brief
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1   
Peer status                       : peer adjacency formed ok      
vPC keep-alive status             : peer is alive                 
Configuration consistency status  : success 
Per-vlan consistency status       : success                       
Type-2 consistency status         : success 
vPC role                          : primary                       
Number of vPCs configured         : 0   
Peer Gateway                      : Enabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Disabled
Auto-recovery status              : Enabled, timer is off.(timeout = 240s)
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router    : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id    Port   Status Active vlans    
--    ----   ------ -------------------------------------------------
1     Po1    up     1,42,100-101,192                                                     
LAB-CS1#
LAB-CS2# show run interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  spanning-tree port type network
  vpc peer-link

LAB-CS2# show vpc brief
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1   
Peer status                       : peer adjacency formed ok      
vPC keep-alive status             : peer is alive                 
Configuration consistency status  : success 
Per-vlan consistency status       : success                       
Type-2 consistency status         : success 
vPC role                          : secondary                     
Number of vPCs configured         : 0   
Peer Gateway                      : Enabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Disabled
Auto-recovery status              : Enabled, timer is off.(timeout = 240s)
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router    : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id    Port   Status Active vlans    
--    ----   ------ -------------------------------------------------
1     Po1    up     1,42,100-101,192                                                     
LAB-CS2#

The peer-keepalive is a routed link, or path, used to resolve a dual-active condition. While the vPC switch pair is actively forwarding traffic, only one switch can be the master “ring” to rule the system. If both switches become active, the peer-keepalive is the link used to resolve the problem. Think of it as the Vanilla Ice of the vPC circuits. When configuring this link, it’s best to use a special vPC-management VRF (Virtual Routing and Forwarding) across the dedicated management interface of the Nexus switch. This link does not need to be directly connected between the two switches, but if you attach it to a network switch, be certain to place it on its own VLAN to isolate the management traffic.

CS1# show run vpc 

feature vpc

vpc domain 1
  peer-switch
  role priority 1000
  system-priority 1000
  peer-keepalive destination 1.1.1.2 source 1.1.1.1 vrf vpckeepalive
  peer-gateway
  no graceful consistency-check
  auto-recovery
  ipv6 nd synchronize
  ip arp synchronize

interface port-channel1
  vpc peer-link

CS1#
CS2# show run vpc

feature vpc

vpc domain 1
  peer-switch
  role priority 2000
  system-priority 1000
  peer-keepalive destination 1.1.1.1 source 1.1.1.2 vrf vpckeepalive
  peer-gateway
  no graceful consistency-check
  auto-recovery
  ipv6 nd synchronize
  ip arp synchronize

interface port-channel1
  vpc peer-link

CS2#
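
The running configuration above points the keepalive at a VRF named vpckeepalive but does not show the interface behind it. Here is a minimal sketch of how that side could be built on the first switch. The front-panel port (Ethernet1/48) and the /30 mask are placeholders I am assuming for illustration; only the VRF name and the 1.1.1.x addresses come from the output above.

```
! Assumed interface and mask; VRF name and addresses are from the lab config
vrf context vpckeepalive

interface Ethernet1/48
  description VPC PEER KEEPALIVE
  no switchport
  ! Assign the VRF before the address; changing VRF membership clears Layer-3 config
  vrf member vpckeepalive
  ip address 1.1.1.1/30
  no shutdown
```

The second switch would mirror this with 1.1.1.2. Once both ends are up, show vpc peer-keepalive should report the peer as alive. If you use mgmt0 instead, keep in mind that it lives in the built-in management VRF, so the peer-keepalive statement would reference that VRF rather than a custom one.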

This takes us to the solution to our vPC peer link routing problem. What happens when you have an upstream device needing to exchange route information? What if this device needs to form an adjacency with all routing hosts on the broadcast domain? This is where you need to be careful with your architecture.  

The Scenario

[Figure: vPC/OSPF lab topology]

In this example topology you have a Cisco ISR4321 running OSPF upstream of the spine Nexus 9372 switches. The spine uses an SVI (VLAN192, 192.168.1.0/29) as the egress zone to the ISR4321. If you configure just the basics of the vPC, SVI, and OSPF, the OSPF adjacency traffic is forwarded along the vPC peer link and the routing protocol fails on LAB-CS2. With the active path on LAB-CS1, you’ll see OSPF attempting to form the adjacency on both LAB-CS1 and LAB-CS2, but getting stuck on LAB-CS2. One side will nail up, but the other will flap or remain in EXSTART. Convergence will also be slower during a failure of the upstream device, because the OSPF process has to finish forming the neighbor adjacency, exchange routes, and only then forward traffic. The ISR4321 connection to LAB-CS1 is also known as an “orphan port” because it isn’t multi-homed between chassis. If you are having problems forming route adjacencies in this topology, you will see something like this.

LAB-CS1# show ip ospf neighbors 
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.2          1 FULL/DROTHER     00:03:21 192.168.1.3     Vlan192 
 10.1.1.3          1 FULL/DR          04:31:48 192.168.1.4     Vlan192 
LAB-CS1#
LAB-CS2# show ip ospf neighbors 
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.1          1 FULL/BDR         00:03:13 192.168.1.2     Vlan192 
 10.1.1.3          1 EXSTART/DR       00:00:14 192.168.1.4     Vlan192 
LAB-CS2#
LAB-AR1#show ip ospf neighbor 

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.1.1          1   FULL/BDR        00:00:36    192.168.1.2     GigabitEthernet0/0/0
10.1.1.2          1   EXSTART/DROTHER 00:00:37    192.168.1.3     GigabitEthernet0/0/0
LAB-AR1#
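
For reference, the “basics” on LAB-CS1 that lead to the output above would look roughly like the sketch below. The router ID, SVI address, and area are taken from the neighbor tables and the scenario description. The passive-interface default line is an assumption on my part, inferred from the no ip ospf passive-interface statements that appear on the routed interfaces later in this article, and the description text is made up.

```
feature ospf
feature interface-vlan

vlan 192

router ospf 1
  router-id 10.1.1.1
  ! Assumed: interfaces stay passive unless explicitly enabled
  passive-interface default

interface Vlan192
  description L3 EGRESS TO LAB-AR1
  no shutdown
  ip address 192.168.1.2/29
  no ip ospf passive-interface
  ip router ospf 1 area 0.0.0.0
```

LAB-CS2 would use router ID 10.1.1.2 and 192.168.1.3/29. Nothing in this sketch is wrong on its own; the adjacency problem comes from VLAN 192 riding the vPC peer link, not from the OSPF configuration.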

Cisco did not support this topology prior to NX-OS 7.0(3)I5, so there are limited workarounds for this scenario. If your organization is anything like the majority of enterprises, you’ll find yourself in this kind of configuration. Hopefully you can fix it before it’s a) in production or b) in a network-down state. So what do you do when you need the interface redundancy/resiliency but also need Layer-3 reachability?

Option 1 – Prune Your VLANs

If your upstream devices are peering to an SVI on the switch, your best option would be to

  • Create another ISL trunk between the core switches
  • Prune the L3 Egress SVI from the vPC peer link
  • Allow the VLAN across the new non-vPC ISL trunk

This should make sense if you remember that vPC is a Layer-2 virtualization technology. Pruning the Layer-3 transit VLAN from the vPC peer link and moving it to another trunk gives the upstream device a path on which to form its routing protocol adjacencies. The sample configurations below show the two switches with the vPC peer link. VLAN192 is the egress VLAN that uplinks to the HA pair of Internet-facing firewalls. To address this architectural limitation, add another link to carry the Layer-2 traffic between the spine switches, prune the appropriate VLAN from the peer link, and allow it on the new ISL. That VLAN will then be carried across the new port-channel and OSPF will nail up properly.

LAB-CS1# show running-config interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 1-191,193-4094
  spanning-tree port type network
  vpc peer-link

LAB-CS1# sho running-config interface port-channel 2

interface port-channel2
  description L2 TRANSIT
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 192
  spanning-tree port type network

LAB-CS1#
LAB-CS2# show running-config interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 1-191,193-4094
  spanning-tree port type network
  vpc peer-link

LAB-CS2# show running-config interface port-channel 2

interface port-channel2
  description L2 TRANSIT
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 192
  spanning-tree port type network

LAB-CS2#
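
The outputs above show the end state. The commands to get there on each switch might look like the following sketch. The member ports (Ethernet1/49-50) and the use of LACP are assumptions for illustration; substitute whatever spare links you have between the two chassis.

```
feature lacp

! New non-vPC trunk that carries only the Layer-3 transit VLAN
interface port-channel2
  description L2 TRANSIT
  switchport
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 192
  spanning-tree port type network

! Assumed member ports; configured to match the port-channel
interface Ethernet1/49-50
  switchport
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 192
  channel-group 2 mode active
  no shutdown

! Prune the transit VLAN from the vPC peer link
interface port-channel1
  switchport trunk allowed vlan remove 192
```

Starting from a peer link that allows all VLANs, the remove keyword leaves the allowed list at 1-191,193-4094, which matches the running configuration shown above. The ordering here (build the new trunk, then prune) is only a suggestion.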

After you create the new ISL and prune the VLAN on the spine switches, there is one more step to get OSPF to nail up. A limitation in the Nexus 9000 platform causes some strange results. If you just prune the VLAN from the peer link and add it to the new ISL, you may still see one of the upstream adjacencies stuck in EXSTART. If you don’t, it will only be a matter of time before you start chasing outage-ghosts as routes fail for some mysterious reason. The reason is that the Nexus 9000s share the MAC addresses of the SVIs as the vPC peer link MAC address. The details are well beyond the scope of this article, but if you want to read about them, see Cisco’s NX-OS Interfaces Configuration Guide and Supported Topologies documentation for the Nexus 9000 platform.

To resolve the MAC address issue, you’ll need to add a static MAC address on the SVIs you pruned from the peer link. This basically tells NX-OS not to punt the local-to-switch ARP requests to the CPU so that it answers for the other chassis. Strange, eh? To have the other switch answer the routing protocol traffic, simply add a static MAC to the SVI. In this scenario I use all zeros for the first 16 bits, the VLAN ID for the second 16 bits, and the last 16 bits to identify which switch the MAC is assigned to. This makes the interface’s function easy to identify when troubleshooting in the future.

LAB-CS1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
LAB-CS1(config)# inter vlan 192
LAB-CS1(config-if)# mac-address 0000.0192.0001
LAB-CS1(config-if)# end
LAB-CS1#
LAB-CS2(config)# inter vlan 192
LAB-CS2(config-if)# mac-address 0000.0192.0002
LAB-CS2(config-if)# end
LAB-CS2#
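
The addressing scheme is easier to see when the MAC is split into its three 16-bit groups. Note that the VLAN group is a visual mnemonic: the digits 0192 are read as written, not as the hexadecimal value of 192 (which would be 00c0).

```
0000 . 0192 . 0001
 |      |      |
 |      |      +-- switch number (1 = LAB-CS1, 2 = LAB-CS2)
 |      +--------- VLAN ID, written with its decimal digits
 +---------------- always zero
```

A four-digit VLAN fits the same way; VLAN 1000 on LAB-CS2 would become 0000.1000.0002 under this convention. After the change, show interface vlan 192 should list the new address on each switch.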

At this point the Layer-3 transit VLAN has been removed from the peer link, added to a new dedicated non-vPC transit ISL, and given a statically assigned MAC address on its SVI. Once that happens, you should see OSPF move into full adjacency almost immediately. It’s like Dijkstra is saying, “Where have you been all my life?! Want to go out for a drink or something?”

LAB-CS1# show ip ospf neighbors 
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.2          1 FULL/DROTHER     00:01:00 192.168.1.3     Vlan192 
 10.1.1.3          1 FULL/BDR         01:12:37 192.168.1.4     Vlan192 
LAB-CS1#
LAB-CS2# show ip ospf neighbors 
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.1          1 FULL/DROTHER     00:21:23 192.168.1.2     Vlan192 
 10.1.1.3          1 FULL/DR          00:21:25 192.168.1.4     Vlan192 
LAB-CS2#
LAB-AR1#show ip ospf neighbor 

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.1.1          1   FULL/DROTHER    00:00:34    192.168.1.2     GigabitEthernet0/0/0
10.1.1.2          1   FULL/BDR        00:00:38    192.168.1.3     GigabitEthernet0/0/0
LAB-AR1#

If you need to do this in a production environment, be prepared for some service interruption. Changing the MAC address of the SVI causes the switch to relearn the MAC table for that VLAN, and when you prune a VLAN from one trunk and add it to another, there will be a service outage while you type the commands and STP recalculates the Layer-2 path. Running R-PVST (Rapid Per-VLAN Spanning-Tree) limits the interruption, and you should only drop a couple of frames. There are a lot of moving parts, which is why I suggested this work be completed prior to production! I had to make this change on a couple of production switches a couple of weeks ago. With the likelihood of only a couple of lost frames, I believed this would be fine. Unfortunately, I had an HSRP issue with my upstream device, even though I dropped only one packet and OSPF still formed up in the topology. It took about 15 minutes to address, which was certainly within the timeline of polling alerts! Crow doesn’t taste good, even with a little salt.

Option 2 – Layer-3 ISL Switchlink

But what if you have a WAN connection on one switch and a Metro connection on the other? What if both use /30s as the transit subnet? How do these attached devices exchange routes? The way this looks in the configuration is pretty straightforward. As a simple sidebar, I like to have all of these options when I stand up new Nexus boxes: a vPC peer link, a non-vPC Layer-2 “transit” link, and a Layer-3 “transit” link. When I create a new SVI that requires a routing protocol, my script just adds the VLAN to the proper ISL trunk and sets a static MAC address on the new SVI.

If I do a routed link, I don’t have to worry about any of the other hoops to engineer the traffic flows. I carve out a /30 between the routing devices on each switch, then let the Layer-3 “transit” ISL help exchange and forward routes. If you have different point-to-point links carrying the same subnets known by all the switches between locations, it’s possible that traffic entering switch 1 and needing to reach switch 2 will go through a different office before returning to the destination switch; an extended trombone network, if you will, based on OSPF’s shortest-path calculations. Think of it like this: from a remote office, traffic enters LAB-CS1 destined for another remote office connected to LAB-CS2. Without the Layer-3 ISL between LAB-CS1 and LAB-CS2, this traffic flow has to go through LAB-CS3 and LAB-CS4 to get there. Adding a Layer-3 ISL addresses the problem because the next hop is only one switch away.

LAB-CS1# show running-config interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 1-191,193-4094
  spanning-tree port type network
  vpc peer-link

LAB-CS1# show running-config interface port-channel 2

interface port-channel2
  description L2 TRANSIT
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 192
  spanning-tree port type network

LAB-CS1# show running-config interface Ethernet1/27

interface Ethernet1/27
  description L3 ISL TRANSIT
  no switchport
  ip address 192.168.1.9/30
  no ip ospf passive-interface
  ip router ospf 1 area 0.0.0.0
  no shutdown

LAB-CS1#
LAB-CS2# show running-config interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 1-191,193-4094
  spanning-tree port type network
  vpc peer-link

LAB-CS2# show running-config interface port-channel 2

interface port-channel2
  description L2 TRANSIT
  switchport mode trunk
  switchport trunk native vlan 42
  switchport trunk allowed vlan 192
  spanning-tree port type network

LAB-CS2# show running-config interface Ethernet1/27

interface Ethernet1/27
  description L3 ISL TRANSIT
  no switchport
  ip address 192.168.1.10/30
  no ip ospf passive-interface
  ip router ospf 1 area 0.0.0.0
  no shutdown

LAB-CS2#
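
One optional refinement, which is not part of the lab configuration above: because the /30 only ever has two routers on it, the link could be declared point-to-point so that no DR/BDR election takes place on it. This is a sketch of that idea rather than something the lab was tested with, and it would need to be applied on both switches so the network types match.

```
! Optional and untested in this lab: skip DR/BDR election on the routed ISL
interface Ethernet1/27
  ip ospf network point-to-point
```

If you leave the interface at its default broadcast type, as the lab does, the Eth1/27 neighbors show DR and BDR roles as in the output that follows.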

Once this part of the configuration is complete, OSPF will have a full topology within your enterprise network.

LAB-CS1# sho ip ospf neigh
 OSPF Process ID 1 VRF default
 Total number of neighbors: 3
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.2          1 FULL/DROTHER     00:00:23 192.168.1.3     Vlan192 
 10.1.1.3          1 FULL/DR          00:00:29 192.168.1.4     Vlan192 
 10.1.1.2          1 FULL/DR          00:00:25 192.168.1.10    Eth1/27 
LAB-CS1#
LAB-CS2# sho ip ospf neigh
 OSPF Process ID 1 VRF default
 Total number of neighbors: 3
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.1          1 FULL/BDR         00:00:31 192.168.1.2     Vlan192 
 10.1.1.3          1 FULL/DR          04:33:42 192.168.1.4     Vlan192 
 10.1.1.1          1 FULL/BDR         00:00:33 192.168.1.9     Eth1/27 
LAB-CS2#
LAB-AR1#sho ip ospf neigh

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.1.1          1   FULL/BDR        00:00:36    192.168.1.2     GigabitEthernet0/0/0
10.1.1.2          1   FULL/DROTHER    00:00:39    192.168.1.3     GigabitEthernet0/0/0
LAB-AR1#

Option 3 – The Easy Solution

After all of this, some of you may be saying to yourselves, “There must be an easier way.” In fact, there is. Passing routing protocols across the vPC has been needed for quite some time. Arista Networks’ counterpart to the Virtual Port Channel, known as MLAG (Multi-chassis Link Aggregation), had this functionality almost from day one. My guess is you wouldn’t be reading this article if you had their equipment. If you want the easy solution, you need to install NX-OS version 7.0(3)I5(2) or later. This version of code includes a command, issued within the vPC domain, that enables routing across the peer link. No new cabling. No new port channels. No new pruning. In fact, the design becomes less dependent on the physical connectivity of your hardware. Simply issue the command layer3 peer-router inside the vPC domain on both switches of the vPC pair, and you’re off to the races! Let’s start with the faulty configuration:

LAB-CS1# show running-config interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  spanning-tree port type network
  vpc peer-link

LAB-CS1# show running-config vpc 

vpc domain 1
  peer-switch
  role priority 1000
  system-priority 1000
  peer-keepalive destination 1.1.1.2 source 1.1.1.1 vrf vpckeepalive
  peer-gateway
  no graceful consistency-check
  auto-recovery
  ipv6 nd synchronize
  ip arp synchronize

LAB-CS1# show ip ospf neighbors 
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.2          1 FULL/DROTHER     00:02:15 192.168.1.3     Vlan192 
 10.1.1.3          1 FULL/DR          00:02:25 192.168.1.4     Vlan192 
LAB-CS1#
LAB-CS2# show running-config interface port-channel 1

interface port-channel1
  description VPC PEER LINK
  switchport mode trunk
  switchport trunk native vlan 42
  spanning-tree port type network
  vpc peer-link

LAB-CS2# show running-config vpc

vpc domain 1
  peer-switch
  role priority 2000
  system-priority 1000
  peer-keepalive destination 1.1.1.1 source 1.1.1.2 vrf vpckeepalive
  peer-gateway
  no graceful consistency-check
  auto-recovery
  ipv6 nd synchronize
  ip arp synchronize

LAB-CS2# show ip ospf neighbors 
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.1          1 FULL/BDR         00:04:27 192.168.1.2     Vlan192 
 10.1.1.3          1 EXSTART/DR       00:01:13 192.168.1.4     Vlan192 
LAB-CS2#
LAB-AR1#show ip ospf neighbor 

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.1.1.1          1   FULL/BDR        00:00:37    192.168.1.2     GigabitEthernet0/0/0
10.1.1.2          1   DOWN/DROTHER       -        192.168.1.3     GigabitEthernet0/0/0
LAB-AR1#

After adding the command to the vPC domain, just like the options above, OSPF nails up across the switches.

LAB-CS1# conf t
Enter configuration commands, one per line. End with CNTL/Z.
LAB-CS1(config)# vpc domain 1
LAB-CS1(config-vpc-domain)# layer3 peer-router 
LAB-CS1(config-vpc-domain)# end
LAB-CS1# sho ip ospf neigh
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.2          1 FULL/DROTHER     00:05:07 192.168.1.3     Vlan192 
 10.1.1.3          1 FULL/DR          00:05:13 192.168.1.4     Vlan192 
LAB-CS1#
LAB-CS2# conf t
Enter configuration commands, one per line. End with CNTL/Z.
LAB-CS2(config)# vpc domain 1
LAB-CS2(config-vpc-domain)# layer3 peer-router 
LAB-CS2(config-vpc-domain)# end
LAB-CS2# 
LAB-CS2# sho ip ospf neighbors
 OSPF Process ID 1 VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 10.1.1.1          1 FULL/BDR         00:05:12 192.168.1.2     Vlan192 
 10.1.1.3          1 INIT/DR          00:00:55 192.168.1.4     Vlan192 
LAB-CS2#
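
Put together, the relevant part of the vPC domain on both switches ends up looking like the sketch below. As far as I know, Cisco’s documentation lists peer-gateway as a prerequisite for layer3 peer-router; this lab already had it configured, so treat its inclusion here as a reminder rather than a new step.

```
vpc domain 1
  ! Already present in the lab configuration
  peer-gateway
  ! Available in NX-OS 7.0(3)I5(2) and later
  layer3 peer-router
```

To confirm the feature took effect, check the Operational Layer3 Peer-router field that showed Disabled in the show vpc brief output near the top of this article; it should now read Enabled. Assuming the standard include output filter, a quick way to pull out just that line is:

```
LAB-CS1# show vpc brief | include Layer3
```

The LAB-CS2 capture above still lists 10.1.1.3 in INIT, which suggests it was taken while that adjacency was re-forming. Re-run show ip ospf neighbors after a short wait and confirm both neighbors reach FULL before calling it done.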

Conclusion

Whenever you’re building something new, it’s extremely important to read through the related vendor documentation. Vendors have normally done extensive testing, and there really isn’t a reason to reinvent the wheel, especially if you don’t know what the limitations of a wheel with no hub or spokes may be. Unfortunately, most of us have grown up in the instructions-are-for-suckers mindset, and that really needs to change. Quickly. Networks and applications are too important and complex to integrate a half-cocked theory based on a Sybex CCNA book. Those books give you the broad understanding. You have to look at the vendor documents to get the specifics of the platform limitations you’re supporting. Hope this helped the community of networkers!
