Designing a Network (Part 2)

If you’ve considered the problem I wrote about back in December, no doubt you have some questions about requirements. Actually, if you’re remotely interested in pursuing the CCIE Architecture path, this could be one of the design problems you would need to solve. This case study has its origins at a company I worked for. There were numerous issues, not the least of which were static routes EVERYWHERE and a single OSPF area 0 for all regions. I came up with the design and headed out to the Cisco Proof Of Concept Lab (CPOC) in Research Triangle Park, NC to test the topology on physical hardware. While I was mocking up the solution at CPOC, I was fortunate enough to meet one of the authors of RFC 7868, Steven Moore. We talked about what I was trying to do, and he said, “You’re actually moving FROM OSPF to EIGRP? Why?”

That made me pause.

Was I doing the right thing? Was there another way to accomplish the goal? What if I was eliminating important technologies because I was using EIGRP? After some significant soul searching, I remembered my criteria and knew EIGRP was the best way to accomplish the goal—to meet both the IT staff needs and the needs of the business. Sometimes, rethinking your options is a good thing. The art is to avoid analysis paralysis. Eventually you have to pull the trigger (read Plan Before You Execute).

The design criteria for this case study are straightforward. This is the excerpt from my project plan/business analysis submitted to management.

  • Improve network convergence time.
  • Improve Mean-Time-To-Repair (MTTR).
  • Limit use of static routes.
  • Additional funding for hardware is not available.
  • Reply traffic will follow the same path as the source request. Asymmetric routing will be avoided.
  • LAN traffic will follow local high-speed links before failing over to low-speed MPLS.
  • MPLS entry points will prefer local destination addresses (e.g. Austin, TX MPLS router will be primary for Austin subnets, London, UK MPLS router will be primary for London subnets, etc.)
  • Redundant MPLS entry points will be preferred by order of in-country networks followed by closest out of country, and furthest out of country (e.g. Los Angeles is secondary, Dublin is tertiary, and London is last resort for Austin subnets).

Your mission: redesign this network topology.
Without the ability to purchase additional hardware, the design will utilize EIGRP so that circuit characteristics and route tagging may be used for both LAN and WAN connectivity. The current network was a flat OSPF backbone area with static routes for transatlantic connectivity. I’d suggest checking out the diagram and the configuration scripts as you read through this discussion.

Core Switching

This network design leverages route-maps and route tagging. The ACME Co. is a medium-sized, point-to-point campus environment with a few dozen MPLS-connected remote offices. I’ve assigned route tag numbers of 10, 20, 30, and 40 to the Austin, Los Angeles, Dublin, and London campuses, respectively. These values are added to a route update, so a route originating from the Austin data center will be tagged as “10”, and any route coming out of the Dublin data center will show a route tag of “30”. These tags may be observed when you check a given route. For example, viewing a Dublin route from ATX-CS2 shows a tag value of 30.

On Austin core switch one (ATX-CS1), you’ll see two links with an applied route-map. One link heads east to Dublin and one heads south to Los Angeles. It is important to remember that the switch processes route-map entries in sequential order and stops at the first match, so the rule of thumb is to go from specific to general when creating the policy.

The route-map identifies the routing protocol and sets a tag value. The redistribution looks at the route-map, checks the tag, and writes the same tag value on the outbound advertisement. If you fail to use the outbound route-map, the switch overwrites the inbound tag with its own value and advertises the route with its own tag. If this is a tough concept, or you want to see it in action, the Cisco Virtual Internet Routing Lab (VIRL) topology is included with the configurations.
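The first-match behavior and the tag handling described above can be sketched in a few lines of Python. This is only a model of the logic, not switch configuration: the `Entry` class, the `apply_route_map` helper, and the 10.x prefixes are hypothetical names chosen for illustration. Only the tag values 10 and 30 come from the design.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    """One route-map sequence: a match test plus an optional tag to set."""
    seq: int
    action: str                      # "permit" or "deny"
    match: Callable[[dict], bool]
    set_tag: Optional[int] = None


def apply_route_map(entries, route):
    """Evaluate entries in sequence order and stop at the first match.

    Returns the (possibly re-tagged) route, or None if it is filtered.
    """
    for entry in sorted(entries, key=lambda e: e.seq):
        if not entry.match(route):
            continue
        if entry.action == "deny":
            return None
        updated = dict(route)        # leave the caller's route untouched
        if entry.set_tag is not None:
            updated["tag"] = entry.set_tag
        return updated
    return None                      # nothing matched: implicit deny


# Specific entry first: a route already tagged 30 (Dublin) keeps its tag.
# General entry last: everything else gets the local Austin tag, 10.
specific_first = [
    Entry(10, "permit", lambda r: r.get("tag") == 30, set_tag=30),
    Entry(20, "permit", lambda r: True, set_tag=10),
]

# Same entries in the wrong order: the catch-all matches first.
general_first = [
    Entry(10, "permit", lambda r: True, set_tag=10),
    Entry(20, "permit", lambda r: r.get("tag") == 30, set_tag=30),
]

dublin_route = {"prefix": "10.30.0.0/16", "tag": 30}    # hypothetical prefix
austin_route = {"prefix": "10.10.0.0/16", "tag": None}  # hypothetical prefix

print(apply_route_map(specific_first, dublin_route)["tag"])  # 30
print(apply_route_map(specific_first, austin_route)["tag"])  # 10
print(apply_route_map(general_first, dublin_route)["tag"])   # 10
```

The last line is the failure mode to watch for: with the general entry first, the Dublin route leaves Austin carrying Austin's tag.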

To aid troubleshooting efforts, I tend to be somewhat verbose for interface descriptions, route-map names, and prefix-list names. I use things like EIGRP-METRIC-IN to let me see that a route-map is being applied inbound for a routing protocol and is modifying the default values. If I have a prefix-list that the route-map matches, I use the same title so it’s easy to search for. I also use capital letters for configuration variables, which helps bits of information stand out in the middle of a long configuration. A simple show running-config | include EIGRP-METRIC gives me 100% of the relevant lines.

These route-maps are applied, where appropriate, to the inbound or outbound interface of the core/distribution layer switches. This keeps all the tagging correct and applies the proper metrics for traffic engineering. For the transatlantic circuits, the EIGRP metrics are modified to prevent asymmetric routing. In some instances, an asymmetric route isn’t a big deal, but performance problems will happen with time-sensitive traffic like voice and video. Also, if stateful firewalls are in place along these paths, data transmission will break. The TCP SYN will be added to the state table of the firewall on the outbound path, but the SYN-ACK and later packets may cross a firewall on a different path. Since that firewall never saw the SYN, it has no state entry for the flow and drops the packet.
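To make the firewall problem concrete, here is a toy model of two stateful firewalls sitting on different paths. It is a deliberately simplified sketch: the `StatefulFirewall` class, the host addresses, and the "north"/"south" path names are all hypothetical, and a real firewall tracks far more than a SYN flag.

```python
class StatefulFirewall:
    """Tracks TCP flows it has seen a SYN for; drops everything else."""

    def __init__(self, name):
        self.name = name
        self.flows = set()

    def inspect(self, src, dst, flags):
        """Return True if the packet is forwarded, False if it is dropped."""
        if flags == "SYN":
            self.flows.add((src, dst))
            return True
        # Any other packet must belong to a known flow, in either direction.
        return (src, dst) in self.flows or (dst, src) in self.flows


client = "10.10.1.5:40000"   # hypothetical Austin host
server = "10.30.1.9:443"     # hypothetical Dublin server

fw_north = StatefulFirewall("north-path")
fw_south = StatefulFirewall("south-path")

# The request leaves over the northern path.
syn_ok = fw_north.inspect(client, server, "SYN")

# Symmetric return: the reply crosses the firewall that saw the SYN.
symmetric_reply_ok = fw_north.inspect(server, client, "SYN-ACK")

# Asymmetric return: the reply crosses a firewall with no state for the flow.
asymmetric_reply_ok = fw_south.inspect(server, client, "SYN-ACK")

print(syn_ok, symmetric_reply_ok, asymmetric_reply_ok)  # True True False
```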

In this snippet located on ATX-CS1, the configuration and route-maps applied to the transatlantic DCI to DUB-CS1 accomplish a few critical items:

  • Sets the bandwidth to 1Gbps for monitoring and routing protocol interface calculation.
  • Advertises the interface into EIGRP process 100.
  • Applies the route-map EIGRP-METRIC-OUT to outbound advertisements.
  • Applies the route-map EIGRP-METRIC-IN to inbound advertisements.
  • Allows EIGRP adjacencies to be formed on the interface. Best practice is to make physical interfaces and SVIs passive unless they need a neighbor, so you control where adjacencies form.
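The interface bandwidth matters because it feeds directly into the EIGRP metric. As a rough illustration, the function below computes the classic EIGRP composite metric with default K-values, where only the lowest bandwidth on the path and the cumulative delay count. The helper name and the delay figures are assumptions for the example. The actual design may set different delay values or use wide metrics.

```python
def eigrp_classic_metric(min_bandwidth_kbps, total_delay_usec):
    """Classic EIGRP composite metric with default K-values (K1 = K3 = 1).

    min_bandwidth_kbps: lowest interface bandwidth along the path, in kbps
    total_delay_usec:   sum of interface delays along the path, in microseconds
    """
    bandwidth_term = 10**7 // min_bandwidth_kbps   # inverse bandwidth, scaled
    delay_term = total_delay_usec // 10            # delay in tens of microseconds
    return 256 * (bandwidth_term + delay_term)


# A 1 Gbps link (bandwidth 1000000 kbps) with an assumed 10 usec delay.
one_gig = eigrp_classic_metric(1_000_000, 10)

# A 100 Mbps link with an assumed 100 usec delay, for comparison.
hundred_meg = eigrp_classic_metric(100_000, 100)

print(one_gig, hundred_meg)  # 2816 28160
```

Because the lower metric wins, raising the delay or lowering the bandwidth on one circuit is what steers traffic onto the other.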

I’m a big fan of standardized VLAN IDs, interface numbers and their functions, and hostnames. Scripting is so much easier when there’s a standard. With this design, any new office that may be added to the network has a basic route script. The tagging is handled at the core/distribution rather than one-off functions. If a new point-to-point link connecting a new office in Dallas to Austin is installed, exchanging EIGRP routes is the only requirement for the remote end. The tagging/route metrics happen automatically within the core based on the route-map on the interface.

With the sample metro-E connections taken care of, let’s look at the MPLS side of the equation. In the diagram you’ll notice that the MPLS routers are connected to core switch “2”, while the DCIs are on core switch “1”. With Nexus, there is a design consideration for VPC that you cannot miss. Passing routing protocols across a VPC peer link is not supported on the Nexus switches. Virtual Port Channels are a layer 2 virtualization technology, so they do not support layer 3/multicast across the link (remember the OSI model?). That restriction applied prior to NX-OS version 7.0.3.I5.2; afterwards, EIGRP was supported across the VPC peer link. I’ve not tested it, but technically, the recent code supports unicast route updates. If you can force a routing protocol to use unicast, you’re in good shape for adding your redundant DCI circuits into VPC. I strongly encourage you to test full functionality before using this feature in a production environment.

That’s all a side note/hint; just be aware of the issue. If you see OSPF neighbors get stuck in EXSTART, or neighbors of any routing protocol flap during negotiation, you’re probably sending route updates across the VPC peer link.

To achieve routing protocol reachability across Nexus switches (if your routers exchange across a VPC), you have two choices for architecture. First, you can configure a routed interface between the switch and the directly connected router interface (e.g. a /30 for the uplink). Then exchange dynamic routes across that uplink. This means adjacencies will be formed between the router and directly connected Nexus switch, thus bypassing the VPC peer link.

Your other option is to create a new trunk between your two Nexus cores, assuming you have a VLAN created for routing protocols. Once you create the new dot1q port-channel, prune the route-protocol VLAN from the VPC peer link, and allow it on the new trunk. If you have a router connected to each switch addressed to the same SVI, these devices will form an adjacency.

Each of these options has some drawbacks, so I’d suggest investigating how they should be introduced into your environment.

Distribution Routing

Now we get to MPLS route distribution. The tricky part is getting remote offices to enter the proper ingress points, and data center traffic to egress properly. Remember how painful it is to translate IGP metrics into BGP? The old way to advertise engineered routes from an IGP to BGP was to use a route-map and prepend the AS path. This method has slipped into disrepute because of things like the BGP best-path decision tree, unsupported AS-prepending for some protocols, and the administrative burden. We’ve not even talked about automation within ACI-enabled devices.

Thankfully, Cisco managed to include some IOS code to help. The Accumulated Interior Gateway Protocol (AIGP) metric command is used to ease the metric redistribution pain. AIGP carries the IGP metric into BGP as a path attribute, so the IGP cost no longer has to be translated by hand into an EGP value. The command uses a route-map associated with the routing protocol, on the outbound redistribution. If you review the diagram, the MPLS routers have to redistribute EIGRP into BGP (and vice versa) for full network reachability, which makes AIGP a useful tool.

This command tells MPLS connected remote sites to use the ingress point closest to the destination subnet. If the source address is in REMOTE-A, it is not desirable for data to travel through London to connect to a server in the Austin data center. Also, if the Austin circuit goes down for “maintenance”, the best secondary option is to enter through Los Angeles to connect to the same host. AIGP easily enables this functionality.
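The selection logic a remote site ends up with can be sketched as "lowest AIGP metric wins, among the entry points that are still up". The metric values below are made up purely to reproduce the preference order for Austin subnets, and `ingress_preference` is a hypothetical helper, not a router command.

```python
def ingress_preference(aigp_by_entry_point, failed=()):
    """Return the surviving entry points, best (lowest AIGP metric) first."""
    alive = {
        site: metric
        for site, metric in aigp_by_entry_point.items()
        if site not in failed
    }
    return sorted(alive, key=alive.get)


# Hypothetical AIGP metrics seen at REMOTE-A for an Austin subnet.
austin_subnet_paths = {
    "London": 4000,
    "Austin": 1000,
    "Dublin": 3000,
    "Los Angeles": 2000,
}

# All four MPLS entry points are up.
print(ingress_preference(austin_subnet_paths))
# ['Austin', 'Los Angeles', 'Dublin', 'London']

# The Austin circuit is down for "maintenance".
print(ingress_preference(austin_subnet_paths, failed={"Austin"}))
# ['Los Angeles', 'Dublin', 'London']

# Both U.S. circuits are down.
print(ingress_preference(austin_subnet_paths, failed={"Austin", "Los Angeles"}))
# ['Dublin', 'London']
```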

Failure Not An Option, But Is Recoverable

Now that all the devices are configured, let’s examine some failure scenarios. Keep in mind the objectives: use the closest entry point and follow the ingress path for egress traffic. Convergence times may be improved with technologies like Bidirectional Forwarding Detection (BFD). In production and testing I’ve found that the data centers and campuses recover very quickly. However, redistributed routes to MPLS offices may take upwards of 90 seconds. Again, BFD helps in these situations.

Let’s begin with the network behavior during an MPLS failure. The output below shows the routes received for the Austin data center at REMOTE-A. This example illustrates how tagging and AIGP work in conjunction to provide the best ingress/egress decisions.

Routes in REMOTE-A for

Notice the route preference based on the metric value: in order, Austin, Los Angeles, Dublin, and London. As we shut down the MPLS links in each geographic location, you’ll see the best path change accordingly.

1. Shut down the MPLS link in Austin

2. Check REMOTE-A for the route to and the BGP preference

Notice how the ingress route through Austin has been removed. The preferred routes are Los Angeles, Dublin, and London.

3. Shut down the Los Angeles MPLS circuit

4. Check REMOTE-A for the route to and the BGP preference

At this point we see the ingress routes in the U.S. get flushed from REMOTE-A. The preferred path to the Austin core is now Dublin, then London. The MPLS-originated traffic is then backhauled across the northern and southern DCI links, respectively.

5. Shut down Dublin MPLS

6. Check REMOTE-A for the route to and the BGP preference

The likelihood of three of four geographically diverse MPLS circuits dropping simultaneously is pretty slim. Losing the regional north-south links is a different story altogether. In the previous 5 years of my career I’ve had seagoing vessels drop anchors on circuits in Singapore and London causing multi-day outages on those transport paths. How’s that for luck?!

Two years ago, the company I worked for actually had both north-south U.S. links drop, isolating the Austin and Los Angeles data centers from each other. Early one morning an Austin city worker came out of a bar, jumped into his garbage truck, put the “arms” upright like a football referee signaling a touchdown, and left for home. As he headed down Pleasant Valley Ave. he clipped the telephone pole which just happened to hold our two, “diversely” routed data center connections. Everything went dark!

Wouldn’t it be great if you could use your MPLS connections to talk to your data centers without your involvement outside of opening a ticket with the carrier? Here’s how that scenario shakes out. The tags, EIGRP, and route-maps take over. Be sure to look for the “tag” option whenever you issue the show ip route x.x.x.x command in your lab or production environment.

7. Verify the traffic path between ATX-CS and LAX-CS switch pair

Note: This traces the path from an interface on the CS1 switches to an interface on the CS2 switches

Note: This traces the path from an interface on the CS2 switches to an interface on the CS1 switches

8. Disable the Metro-E circuit on LAX-CS2

9. Verify path change between the CS2 switches. These will pass through Port-Channel2 on both ends.

The final test in this architecture will fail one of the transatlantic DCI circuits. EIGRP will converge quickly and traffic should flow to the active link. The test below will show reachability from the Austin and London CS2 switches.

10. Verify the path from ATX-CS2 to LON-CS2

Note: The reply traffic will follow the same path as the initiated traffic. We have no loops to contend with.
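If you collect the hop lists from both ends, the symmetry requirement is easy to check mechanically: the return path should be the forward path reversed. The helper and the device-level hop lists below are hypothetical and only illustrate the comparison. Real traceroute output lists interface addresses, which you would first map back to device names.

```python
def is_symmetric(forward_hops, return_hops):
    """True when the return path is exactly the forward path reversed."""
    return list(forward_hops) == list(reversed(return_hops))


# Hypothetical device-level paths between the Austin and London CS2 switches.
forward = ["ATX-CS2", "ATX-CS1", "DUB-CS1", "LON-CS2"]
good_return = ["LON-CS2", "DUB-CS1", "ATX-CS1", "ATX-CS2"]
bad_return = ["LON-CS2", "LAX-CS1", "ATX-CS1", "ATX-CS2"]

print(is_symmetric(forward, good_return))  # True
print(is_symmetric(forward, bad_return))   # False
```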

11. Shut down the ATX-CS1 interface pointing to DUB-CS1

12. Verify the path between ATX-CS2 and LON-CS2


This design took many hours of prep work, testing, documenting the changes, and then documenting the final state of the network. The early design considerations tried to leverage OSPF and BGP rather than convert from OSPF to EIGRP. Unfortunately, OSPF convergence times weren’t fast enough and there were inevitable loops given the circuit topology. This new topology continues to use OSPF on certain VLANs for devices that do not support EIGRP, but the primary routes are all carried on EIGRP and BGP.

While this architecture may not fit your needs, hopefully you can see how the use of tags gives great flexibility for traffic engineering. Each design criterion is met because the defined needs gave us the target for successful network changes. Honestly, that is just as important as the configuration snippets.
