Designing a Network (Part 2)

If you’ve considered the problem I wrote about back in December, no doubt you’ve had some questions about requirements. Actually, if you’re remotely interested in pursuing the CCIE Architecture path, this could be one of the design problems you would need to solve. This case study originated at a company I worked for. There were numerous issues, not the least of which were static routes EVERYWHERE and a single OSPF area 0 for all regions. I came up with the design and headed out to the Cisco Proof Of Concept Lab (CPOC) in Research Triangle Park, NC to test the topology on physical hardware. While I was mocking up the solution at CPOC, I was fortunate enough to meet one of the authors of RFC 7868, Steven Moore. We talked about what I was trying to do, and he said, “You’re actually moving FROM OSPF to EIGRP? Why?”

That made me pause.

Was I doing the right thing? Was there another way to accomplish the goal? What if I was eliminating important technologies because I was using EIGRP? After some significant soul searching, I remembered my criteria and knew EIGRP was the best way to accomplish the goal—to meet both the IT staff needs and the needs of the business. Sometimes, rethinking your options is a good thing. The art is to avoid analysis paralysis. Eventually you have to pull the trigger (read Plan Before You Execute).

The design criteria for this case study are straightforward. This is the excerpt from my project plan/business analysis submitted to management.

  • Design goal is to improve network convergence, improve Mean-Time-To-Repair (MTTR), and limit use of static routes.
  • Additional funding for hardware is not available (e.g. New routers may not be purchased to complete the project)
  • The global routing design will use industry best practices and will be able to support future enhancements (e.g. QoS/CoS; Internet POP redundancy; growth/expansion; redundant circuits; etc.) without significant modification to the architecture.
  • Traffic sourced from directly connected data centers will follow local links to the destination (e.g. Austin, TX to Edinburgh, UK uses direct transatlantic link)
  • Traffic destined to non-directly connected data centers will use the in-country next hop before traversing an east-west leg (e.g. Austin, TX to London, UK will traverse the San Antonio, TX to London, UK transatlantic link).
  • Reply traffic will follow the same path as the source request. Asymmetric routing will be avoided.
  • LAN traffic will follow local high-speed links before failing to low-speed MPLS (e.g. All Metro Area Network connections must be down prior to preferring MPLS networks).
  • MPLS entry points will prefer local destination addresses (e.g. Austin, TX MPLS router will be primary for Austin subnets, London, UK MPLS router will be primary for London subnets.)
    Redundant MPLS entry points will be preferred by order of in-country networks followed by closest out of country, and furthest out of country (e.g. San Antonio, TX is secondary for Austin, TX networks and tertiary for London, UK ingress)
  • Routing protocols supported will be BGP, EIGRP, and OSPF. EIGRP will be used as an IGP for Cisco route-switched networks. BGP is to be used as the WAN protocol. OSPF is used on subnets that do not support EIGRP.
  • OSPF will be contained to choke-points on the network (e.g. firewalls connecting to an SVI).

Your mission: redesign this network topology.
Without the ability to purchase additional hardware, the design will utilize EIGRP so that circuit characteristics and route tagging may be used for both LAN and WAN connectivity. The current network was a flat OSPF backbone area with static routes for transatlantic connectivity. I’d suggest checking out the diagram and the configuration scripts as you read through this discussion.

Core Switching

This network design leverages route-maps and route tagging. The ACME Co. is a medium-sized, point-to-point campus environment with a few dozen MPLS-connected remote offices. I’ve assigned route tag numbers of 10, 20, 30, and 40 for the Austin, San Antonio, Edinburgh, and London campuses, respectively. These values are added to a route update, so a route originating from the Austin data center will be tagged as “10”, and any route coming out of the Edinburgh data center will show a route tag of “30”. These tags may be observed when you check a given route. For example, viewing an Edinburgh route from AUS-CS2 shows the tag value:

For Austin core switch one (AUS-CS1), you’ll see two links with an applied route-map. One link heads east to Edinburgh and one heads south to San Antonio. It is important to remember that the switch processes route-map entries in sequential order and stops at the first match, so the rule of thumb is to go from specific to general when creating the policy.

The route-map recognizes the routing protocol and the tag value. The redistribution looks at the route-map, checks the tag, and writes the same tag value on the outbound advertisement. If you fail to use the outbound route-map, the switch overwrites the inbound tag with its own value and advertises the route with its own tag. If this is a tough concept, or you want to see it in action, the Cisco Virtual Internet Routing Lab (VIRL) topology is included with the configurations.
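To make the first-match behavior and the tag preservation concrete, here is a minimal Python sketch. It is a toy model, not the configuration from the scripts: the `Entry` class, the sequence numbers, and the `AUSTIN_OUT` policy are all made up for illustration, and only the tag values (10, 20, 30, 40) come from the design above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One route-map entry in this toy model (hypothetical structure)."""
    seq: int                  # sequence number; lower is evaluated first
    match_tag: Optional[int]  # None means "match any route"
    set_tag: int              # tag written on the outbound advertisement


def apply_route_map(entries: List[Entry], route_tag: Optional[int]) -> Optional[int]:
    """Return the tag set by the first matching entry, or None if nothing matches."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.match_tag is None or entry.match_tag == route_tag:
            return entry.set_tag  # first match wins; later entries are never checked
    return None  # models the implicit deny at the end of a route-map


# Hypothetical outbound policy on an Austin switch (local tag 10):
# specific matches first, catch-all last.
AUSTIN_OUT = [
    Entry(seq=10, match_tag=20, set_tag=20),     # keep San Antonio's tag
    Entry(seq=20, match_tag=30, set_tag=30),     # keep Edinburgh's tag
    Entry(seq=30, match_tag=40, set_tag=40),     # keep London's tag
    Entry(seq=100, match_tag=None, set_tag=10),  # everything else is Austin
]

# Same entries, but with a catch-all placed ahead of the specific matches.
MISORDERED = [Entry(seq=5, match_tag=None, set_tag=10)] + AUSTIN_OUT

print(apply_route_map(AUSTIN_OUT, 30))    # 30: the Edinburgh tag survives
print(apply_route_map(AUSTIN_OUT, None))  # 10: an untagged local route gets the Austin tag
print(apply_route_map(MISORDERED, 30))    # 10: the general entry matched first and overwrote the tag
```

The last call shows why the order matters: once a general entry sits ahead of the specific ones, every route leaves with the local tag.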

To aid troubleshooting efforts, I tend to be somewhat verbose for interface descriptions, route-map names, and prefix-list names. I use things like EIGRP-METRIC-IN to let me see that a route-map is being applied inbound for a routing protocol and is modifying the default values. If I have a prefix-list that the route-map matches, I use the same title so it’s easy to search for. I also use capital letters for configuration variables, which helps bits of information stand out in the middle of a long configuration. A simple show running-config | include EIGRP-METRIC gives me 100% of the proper values.

These route-maps are applied, where appropriate, to the inbound or outbound interface of the core/distribution layer switches. This keeps all the tagging correct and applies the proper metrics for traffic engineering. For the transatlantic circuits, the EIGRP metrics are modified to prevent asymmetric routing. In some instances, an asymmetric route isn’t a big deal, but performance problems will happen with time-sensitive traffic like voice and video. Also, if stateful firewalls are in place along these paths, data transmission will break. The TCP SYN will be added to the state table of the firewall on the forward path, but the SYN-ACK and later ACKs may arrive at a firewall on a different path. Since the SYN is absent from that firewall’s state table, the packet will be dropped.
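The firewall behavior is easy to see in a toy model. The sketch below is a simplification that assumes a firewall only creates state on a SYN and drops anything that does not belong to a flow it has already seen; the class, the firewall names, and the addresses are hypothetical.

```python
class StatefulFirewall:
    """Toy stateful firewall: creates state on SYN, drops packets for unknown flows."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = set()  # flows recorded as (client, server) pairs

    def forward(self, src: str, dst: str, flags: str) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        if flags == "SYN":
            self.state.add((src, dst))  # new flow: remember who started it
            return True
        # Anything else must belong to a flow this firewall has already seen,
        # in either direction.
        return (src, dst) in self.state or (dst, src) in self.state


client, server = "10.1.1.10", "10.3.1.20"  # hypothetical hosts

fw_north = StatefulFirewall("north-path")
fw_south = StatefulFirewall("south-path")

# Symmetric routing: request and reply cross the same firewall.
print(fw_north.forward(client, server, "SYN"))      # True
print(fw_north.forward(server, client, "SYN-ACK"))  # True

# Asymmetric routing: the reply returns through a firewall that never saw the SYN.
print(fw_south.forward(server, client, "SYN-ACK"))  # False
```

Keeping the reply on the same path as the request means both directions of a flow hit the firewall that holds the state entry.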

In this snippet located on AUS-CS1, the configuration and route-maps applied to the transatlantic DCI to EDI-CS1 accomplish a few critical items:

  • Sets the bandwidth to 1Gbps for monitoring and routing protocol interface calculation.
  • Advertises the interface into EIGRP process 100.
  • Applies the route-map EIGRP-METRIC-OUT to outbound advertisements.
  • Applies the route-map EIGRP-METRIC-IN to inbound advertisements.
  • Allows EIGRP adjacencies to be formed on the interface. Best practice is to enable passive-interface on physical interfaces and SVIs so you control where adjacencies form, then explicitly allow adjacencies only on the links that need them.
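To see why the bandwidth statement matters to the routing protocol, here is a small sketch of the classic (32-bit) EIGRP composite metric with default K values, where only the lowest bandwidth along the path and the cumulative delay count. The delay figures in the example calls are assumed typical interface defaults, not values taken from the configuration scripts, and a platform running wide metrics will produce different numbers.

```python
def eigrp_classic_metric(min_bandwidth_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP composite metric with default K values (K1 = K3 = 1).

    min_bandwidth_kbps: lowest interface bandwidth along the path, in kbps
    total_delay_usec:   sum of interface delays along the path, in microseconds
    """
    scaled_bandwidth = 10_000_000 // min_bandwidth_kbps  # 10^7 / bandwidth, truncated
    scaled_delay = total_delay_usec // 10                # delay in tens of microseconds
    return 256 * (scaled_bandwidth + scaled_delay)


# A 1 Gbps link (bandwidth 1000000 kbps) with an assumed 10 usec delay.
print(eigrp_classic_metric(1_000_000, 10))  # 2816

# A 100 Mbps link with an assumed 100 usec delay, for comparison.
print(eigrp_classic_metric(100_000, 100))   # 28160
```

Because the metric is built from bandwidth and delay, changing either value on one side of a circuit shifts path preference, which is what makes these values usable for traffic engineering.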

I’m a big fan of standardized VLAN IDs, interface numbers and their functions, and hostnames. Scripting is so much easier when there’s a standard. With this design, any new office that may be added to the network has a basic route script. The tagging is handled at the core/distribution rather than one-off functions. If a new point-to-point link connecting a new office in Dallas to Austin is installed, exchanging EIGRP routes is the only requirement for the remote end. The tagging/route metrics happen automatically within the core based on the route-map on the interface.

With the sample metro-E connections taken care of, let’s look at the MPLS side of the equation. In the diagram you’ll notice that the MPLS routers are connected to core switch “2”, while the DCIs are on core switch “1”. With Nexus, there is a design consideration for VPC that you cannot miss. Passing routing protocols across a VPC peer link on the Nexus switches is not supported. Virtual Port Channels are a layer 2 virtualization technology, so they do not support layer 3/multicast across the link (remember the OSI model?). This was the case prior to NX-OS version 7.0.3.I5.2; afterwards, EIGRP was supported across the VPC peer link. I’ve not tested it, but technically, the recent code supports unicast route updates. If you can force a routing protocol to use unicast, you’re in good shape for adding your redundant DCI circuits into VPC. I strongly encourage you to test full functionality prior to using this feature in production environments.

That’s all a side note/hint; just be aware of the issue. If you see route neighbors get stuck in EXSTART or flap on neighbor negotiation, you’re probably sending route updates across the VPC peer link.

To achieve routing protocol reachability across Nexus switches (if your routers exchange across a VPC), you have two choices for architecture. First, you can configure a routed interface between the switch and the directly connected router interface (e.g. a /30 for the uplink). Then exchange dynamic routes across that uplink. This means adjacencies will be formed between the router and directly connected Nexus switch, thus bypassing the VPC peer link.

Your other option is to create a new trunk between your two Nexus cores, assuming you have a VLAN created for routing protocols. Once you create the new dot1q port-channel, prune the route-protocol VLAN from the VPC peer link, and allow it on the new trunk. If you have a router connected to each switch addressed to the same SVI, these devices will form an adjacency.

Each of these options has some drawbacks, so I’d suggest investigating how they should be introduced into your environment.

Distribution Routing

Now we get to MPLS route distribution. The tricky part is getting remote offices to enter the proper ingress points, and data center traffic to egress properly. Remember how painful it is to translate IGP metrics into BGP? The old way to advertise engineered routes from an IGP to BGP was to use a route-map and apply AS-path prepending. This method has slipped into disrepute because of things like the BGP best-path decision tree and unsupported AS-prepending for some protocols, and because it is administratively onerous. We’ve not even talked about automation within ACI-enabled devices.

Thankfully, Cisco managed to include some IOS code to help. The Accumulated Interior Gateway Protocol (AIGP) metric command is used to ease the metric redistribution pain. AIGP carries the accumulated IGP metric in a BGP attribute, so BGP can compare paths by their IGP cost. The command uses a route-map associated with the routing protocol, on the outbound redistribution. If you review the diagram, the MPLS routers have to redistribute EIGRP into BGP (and vice versa) for full network reachability, which makes AIGP a useful tool.

This command tells MPLS connected remote sites to use the ingress point closest to the destination subnet. If the source address is in REMOTE-A, it is not desirable for data to travel through London to connect to a server in the Austin data center. Also, if the Austin circuit goes down for “maintenance”, the best secondary option is to enter through San Antonio to connect to the same host. AIGP easily enables this functionality.
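The ingress selection can be modeled in a few lines. This sketch assumes that lower AIGP wins and that nothing earlier in the BGP best-path decision (such as weight or local preference) has already picked a winner. The cost values are invented purely to reproduce the Austin, San Antonio, Edinburgh, London ordering; they are not taken from the lab.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical accumulated IGP cost from each MPLS ingress router to an Austin subnet.
AIGP_TO_AUSTIN: Dict[str, int] = {
    "Austin": 100,
    "San Antonio": 200,
    "Edinburgh": 300,
    "London": 400,
}


def preference_order(aigp: Dict[str, int], down: FrozenSet[str] = frozenset()) -> List[str]:
    """Ingress points that are still up, best (lowest AIGP) first."""
    return sorted((site for site in aigp if site not in down), key=lambda s: aigp[s])


def best_ingress(aigp: Dict[str, int], down: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Preferred ingress point, or None if every MPLS circuit is down."""
    order = preference_order(aigp, down)
    return order[0] if order else None


print(preference_order(AIGP_TO_AUSTIN))
# ['Austin', 'San Antonio', 'Edinburgh', 'London']

print(best_ingress(AIGP_TO_AUSTIN, frozenset({"Austin"})))
# San Antonio

print(best_ingress(AIGP_TO_AUSTIN, frozenset({"Austin", "San Antonio"})))
# Edinburgh
```

Each circuit failure simply removes a candidate, and the next-lowest cost takes over without anyone touching the remote site.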

Failure Not An Option, But Is Recoverable

Now that all the devices are configured, let’s examine some failure scenarios. Keep in mind the objectives: use the closest entry point and follow the ingress path for egress traffic. Convergence times may be improved with technologies like Bidirectional Forwarding Detection (BFD). In production and testing I’ve found that the data centers/campuses recover very quickly. However, redistributed routes to MPLS offices may take upwards of 90 seconds. Again, BFD helps in these situations.

Let’s begin with the network behavior during an MPLS failure. The output below shows the routes received for the Austin data center at the REMOTE-A. This example illustrates how tagging and AIGP work in conjunction to provide the best ingress/egress decisions.

Routes in REMOTE-A for

Notice the route preference based on the metric value. In order, Austin, San Antonio, Edinburgh, and London. As we shut down the MPLS links in each geographic location, you’ll see the best path change accordingly.

1. Shut down the MPLS link in Austin

2. Check REMOTE-A for the route to and the BGP preference

Notice how the ingress route through Austin has been removed. The preferred routes are San Antonio, Edinburgh, and London.

3. Shut down the San Antonio MPLS circuit

4. Check REMOTE-A for the route to and the BGP preference

At this point we see the ingress routes in the U.S. get flushed from REMOTE-A. The preferred path to the Austin core is now Edinburgh then London. The MPLS originated traffic is then backhauled across the northern and southern DCI links, respectively.

5. Shut down Edinburgh MPLS

6. Check REMOTE-A for the route to and the BGP preference

The likelihood of three of four geographically diverse MPLS circuits dropping simultaneously is pretty slim. Losing the regional north-south links is a different story altogether. In the previous 5 years of my career I’ve had seagoing vessels drop anchors on circuits in Singapore and London causing multi-day outages on those transport paths. How’s that for luck?!

Two years ago, the company I worked for actually had both north-south U.S. links drop, isolating the Austin and San Antonio data centers from each other. Early one morning an Austin city worker came out of a bar, jumped into his garbage truck, put the “arms” upright like a football referee signaling a touchdown, and left for home. As he headed down Pleasant Valley Ave. he clipped the telephone pole which just happened to hold our two “diversely” routed data center connections. Everything went dark!

Wouldn’t it be great if you could use your MPLS connections to talk to your data centers without your involvement outside of opening a ticket with the carrier? Here’s how that scenario shakes out. The tags, EIGRP, and route-maps take over. Be sure to look for the “tag” option whenever you issue the show ip route x.x.x.x command in your lab or production environment.

7. Verify the traffic path between AUS-CS and SATX-CS switch pair

Note: This traces the path between an interface on the CS1 switches to an interface on the CS2 switches

Note: This traces the path between an interface on the CS2 switches to an interface on the CS1 switches

8. Disable the Metro-E circuit on SATX-CS2

9. Verify path change between the CS2 switches. These will pass through Port-Channel2 on both ends.

The final test in this architecture will fail one of the transatlantic DCI circuits. EIGRP will converge quickly and traffic should flow to the active link. The test below will show reachability from the Austin and London CS2 switches.

10. Verify the path from AUS-CS2 to LON-CS2

Note: The reply traffic will follow the same path as the initiated traffic. We have no loops to contend with.

11. Shut down the AUS-CS1 interface pointing to EDI-CS1

12. Verify the path between AUS-CS2 and LON-CS2


This design took many hours of prep-work, testing, documenting the changes, and then documenting the final state of the network. The early design considerations tried to leverage OSPF and BGP rather than convert from OSPF to EIGRP. Unfortunately, OSPF convergence times weren’t fast enough and there were inevitable loops given the circuit topology. This new topology continues to use OSPF on certain VLANs for devices that do not support EIGRP, but the primary routes are all carried on EIGRP and BGP.

While this architecture may not fit your needs, hopefully you can see how the use of tags gives great flexibility for traffic engineering. Each design criterion is met because the defined needs gave us the target for successful network changes. Honestly, that is just as important as the configuration snippets.

Brian Gleason is a full-time Lead Network Engineer for an Austin, TX company and is currently pursuing the Cisco Certified Internetwork Expert, Data Center certification. He also teaches firearms in his spare time after being a husband to his wonderful wife and father to his three awesome kids. Brian was also selected as a delegate to Network Field Day 20 held in San Jose, CA.
