The Spine-Leaf Data Center: Why and What?

A New Frontier

Sometimes the gladiator of network engineering removes his helmet to face the threat. What's the threat? Dealing with new, and perhaps unknown, technology. Few ideas have changed data center network design more than the Clos architecture. Charles Clos developed his non-blocking telephone switching design while employed at Bell Labs in the 1950s, and at the turn of this century it turned data center design on its head. (It's pronounced "clough", like "dough", not "closs", like "floss".) The design consists of an ingress stage, a middle stage, and an egress stage. A 3-stage (spine-leaf) deployment in the data center puts any host a single spine hop from any other, giving every pair of hosts the same, predictable path length. The obvious benefit is that you expand your data center infrastructure horizontally, rather than vertically as you would in a legacy design.
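Clos's original result puts a number on "non-blocking": a 3-stage fabric with n inputs per ingress switch is strictly non-blocking when it has at least 2n - 1 middle-stage switches. A minimal sketch of that check, with illustrative function and parameter names of my own choosing:

```python
# Sketch of Clos's non-blocking condition: a 3-stage network with
# n inputs per ingress switch and m middle-stage switches is strictly
# non-blocking when m >= 2n - 1.  Names here are illustrative.
def is_strictly_nonblocking(n_inputs_per_ingress: int, m_middle_switches: int) -> bool:
    """True if a 3-stage Clos network can never block a new connection."""
    return m_middle_switches >= 2 * n_inputs_per_ingress - 1

# 4 inputs per ingress switch needs at least 2*4 - 1 = 7 middle switches.
print(is_strictly_nonblocking(4, 7))  # True
print(is_strictly_nonblocking(4, 6))  # False
```

In spine-leaf terms, the middle stage is your spine: more spines buys you more non-blocking capacity.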

The Design Side

The 3-Stage Clos has a spine layer and a leaf layer; the leaves are your top-of-rack switches connecting your ESX, UCS, and physical hosts within the rack. Every leaf connects to every spine, but the spines do not connect to each other. MC-LAG isn't required and is frankly discouraged in this design. Growing past the number of leaves your spines can support, or adding Data Center Interconnects (DCI), can introduce a 5-Stage (super spine) deployment. The choice between 3- and 5-stage depends on spine leaf-port capacity, link oversubscription, and some other items outside the scope of this article.
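Link oversubscription is worth a quick worked example: it's just the ratio of host-facing bandwidth to spine-facing (uplink) bandwidth on a leaf. A small sketch, with port counts and speeds chosen purely for illustration:

```python
# Sketch: leaf oversubscription ratio = downstream capacity / upstream
# capacity.  1.0 is non-blocking; higher means contention under full load.
# Port counts and speeds below are illustrative assumptions.
def oversubscription_ratio(host_ports: int, host_speed_gbps: float,
                           uplinks: int, uplink_speed_gbps: float) -> float:
    """Ratio of host-facing bandwidth to spine-facing bandwidth on a leaf."""
    return (host_ports * host_speed_gbps) / (uplinks * uplink_speed_gbps)

# 48 x 25G host ports against 6 x 100G uplinks -> 2:1 oversubscription.
print(oversubscription_ratio(48, 25, 6, 100))  # 2.0
```

Many designs accept 2:1 or 3:1 at the leaf; what's tolerable depends on your traffic profile.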

Consider default-gateway placement for end hosts in this topology. In a collapsed spine, Centrally Routed Bridging (CRB), all routing, VLAN, and IRB/SVI configurations are placed on the spine. VLANs are trunked to the appropriate leaves for host access. With Edge Routed Bridging (ERB), IRB/SVIs are configured on the leaf, as close to the end host as possible. Whether you deploy CRB or ERB will directly impact your EVPN/VXLAN design and support. More on that in a moment.
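The difference boils down to where the IRB stanza lives. A Junos-style sketch (addresses, VLAN numbers, and names are illustrative assumptions, not from any real deployment):

```
## CRB -- the IRB/gateway lives on the spine; VLAN 100 is trunked down:
set interfaces irb unit 100 family inet address 10.0.100.1/24
set vlans VLAN100 vlan-id 100 l3-interface irb.100

## ERB -- the same stanza moves to every leaf serving VLAN 100,
## typically with a shared anycast gateway address on each leaf:
set interfaces irb unit 100 family inet address 10.0.100.2/24 virtual-gateway-address 10.0.100.1
set vlans VLAN100 vlan-id 100 l3-interface irb.100
```

In CRB you touch one pair of spines per gateway change; in ERB you touch every leaf that serves the subnet, which is exactly the operational trade-off discussed below.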

For enterprises transitioning their data centers, a collapsed spine is a great option: adding or pruning VLANs on the trunks to your leaves carries less risk while systems are in motion. Few enterprises have data center management processes disciplined enough to land a server in the precise rack that matches the IRB/SVI configuration of an ERB topology. Imagine deploying the rando-subnet-bound server into a rack when your data center is in an ERB configuration. Constantly touching each leaf is error prone. Then, what happens if the same subnet needs to extend to another rack!?

Extend VLANs Over Routed Links? EVPN-VXLAN

That’s where EVPN-VXLAN enters the picture. Deploying EVPN-VXLAN lets you stretch L2 domains across your L3 connections, isolate broadcast domains within data center racks, eliminate the need for spanning-tree’s loop prevention, and connect your regional data centers by MAC address. Using EVPN-VXLAN to extend L2 over geographically dispersed infrastructure is MUCH improved over the days of Radia Perlman’s option! Now you’ll be able to route your MAC addresses via the extensibility of BGP.

Several years ago, I couldn’t really figure out where it fit in my corporate politics or technologies. I’m a long-time network engineer and had no case study that helped me fit the puzzle pieces together. Years ago, Joe Houghes (a friend and colleague) and I talked about rolling out VMware’s NSX between our data centers, and how it would impact the network. It was only VXLAN, after all. I believe EVPN-VXLAN is a better deployment model since it can include all hosts in the DC rather than limiting them to the VM environment.

Historically, think of the days of the dot-com era, pre-bubble burst, when venture capital was flowing like beer at a user-group meetup. Back then, most enterprises wanted to extend all IDFs across their metro-area network, via L2, to their central data centers. Carriers like Verizon and WorldCom were lighting fiber up and down the eastern seaboard to meet that demand. GTE Internetworking/Genuity was advertising the “Black Rocket”, whatever the heck that was, for similar carrier services. It was all about Ethernet connections to the access layer.

The problem we always had was that the technology to handle L2 functionality across miles of fiber simply didn’t exist. Extending the L2 domain expanded the blast radius of a MAC event across your campus network. Even now I can remember the arguments about the prudence of such a design, but with multi-layer switching hitting the market it was a foregone conclusion that companies would ignore the problems and embrace the new transport technology.

Today, we can pass L2 frames over an L3 connection. This is especially important in the spine-leaf data center, and the same approach is now bringing benefits to the IDF and campus fabric.

Ethernet Virtual Private Network (EVPN) is the control plane mechanism. Virtual Extensible Local Area Network (VXLAN) is the data plane mechanism. This is an important distinction network engineers need to keep straight. EVPN uses extensions to BGP to identify where the L2 information should be sourced and destined. It’s the route decision-making of your data center extension. VXLAN encapsulates the MAC address information inside a UDP/IP packet, enabling you to send L2 data over the routed network; that’s what we needed back in the late-90’s and early-00’s. This is where you read terms like “underlay” and “overlay”, but that’s a discussion for the next article.
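The data plane side is concrete enough to sketch: VXLAN (RFC 7348) prepends an 8-byte header carrying a 24-bit VXLAN Network Identifier (VNI) to the original Ethernet frame, and the result rides inside UDP/IP across the routed underlay. A minimal illustration (the inner frame and VNI value here are dummy assumptions):

```python
import struct

# Sketch of VXLAN encapsulation per RFC 7348: an 8-byte header
# (flags, reserved, 24-bit VNI, reserved) prepended to the L2 frame.
# The outer UDP (port 4789) and IP headers are omitted for brevity.
def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    flags = 0x08                                    # "I" bit: VNI is valid
    header = struct.pack("!B3xI", flags, vni << 8)  # 8 bytes total
    return header + inner_frame

packet = vxlan_encap(b"\x00" * 14, vni=10100)       # dummy 14-byte L2 header
print(len(packet))                                  # 22: 8-byte header + frame
print(int.from_bytes(packet[4:7], "big"))           # the VNI field: 10100
```

The VNI plays the role a VLAN ID plays on a trunk, but with 24 bits of space (about 16 million segments) instead of 12.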

Shouting In A Hangar

You may be asking yourself, “When should I deploy this, and why?”. First let’s tackle the why. Juniper Networks wrote this great statement about the technology:

In traditional Layer 2 networks, reachability information is distributed in the data plane through flooding. With EVPN-VXLAN networks, this activity moves to the control plane.

Data Center, EVPN/VXLAN Diagram

Your L2 traffic is now controlled by L3 protocols. Controlling floods has benefits in Ethernet networks, from reduced bandwidth consumption to faster convergence. In a switched network, learning the MAC-to-port assignment is a “flood and learn” function, and flood and learn is slower than a tuned route update. Protocols such as Bidirectional Forwarding Detection (BFD) significantly improve L3 convergence for BGP, OSPF, and IS-IS; in EVPN, L2 MAC table insertion and removal ride on those same fast-converging route updates.
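To see what the control plane saves you, here is flood and learn in a toy switch: unknown destinations go out every port, and the MAC table only fills in as traffic happens to arrive. EVPN avoids this by advertising MAC reachability in BGP before traffic ever needs to be flooded. Class and port names are illustrative:

```python
# Toy flood-and-learn switch: learn source MACs from incoming frames,
# flood frames whose destination MAC is still unknown.
class FloodAndLearnSwitch:
    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}                 # MAC -> port, learned from traffic

    def receive(self, src_mac, dst_mac, in_port):
        """Return the list of egress ports for this frame."""
        self.mac_table[src_mac] = in_port   # learn the source
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]                # known: one port
        return [p for p in self.ports if p != in_port]      # unknown: flood

sw = FloodAndLearnSwitch(ports=[1, 2, 3, 4])
print(sw.receive("aa:aa", "bb:bb", in_port=1))  # unknown dst, flood: [2, 3, 4]
sw.receive("bb:bb", "aa:aa", in_port=2)         # reply teaches bb:bb -> port 2
print(sw.receive("aa:aa", "bb:bb", in_port=1))  # now known: [2]
```

Scale that first flooded frame across DCI links between data centers and the appeal of learning via the control plane instead becomes obvious.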

The benefit is shown in the 5-Stage diagram to the left. With this design, you can have the same VLAN-ID and subnet in each of your data centers. L2 events become a simple routing update rather than flood-and-learn. BUM traffic is managed at the VTEP rather than flooding across your DCI links as it would in a simply stretched L2 data center. There will be some new route types to learn when you deploy EVPN-VXLAN, but they are straightforward. The short answer is to deploy 3- or 5-stage data centers as soon as possible, and EVPN-VXLAN shortly afterwards.

Perhaps this is a good place to stop. In the next article, I’ll go over some common EVPN-VXLAN deployment terms and things to know for your Day 0+ support. Until then, for all the network engineers out there deploying unfamiliar technology, smile back and embrace your destiny!

More to follow.
