One of the most common discussions I have with customers and colleagues these days relates to running active-active data centers using vSphere Metro Storage Cluster (vMSC). Oftentimes, these are also referred to as stretched clusters.
When asked, here are just a few reasons that I’ve encountered for wanting to deploy vMSC:
- Data Center extension – The primary data center is running out of space/cooling/power, and vMSC is used to “extend” the primary data center to a secondary site
- Disaster Avoidance – vMSC provides ability to migrate workloads in anticipation of an outage, whether it’s a scheduled infrastructure maintenance or power disruption, incoming hurricane or slowly spreading wildfire
- Storage Array High-Availability – vMSC within a site (could be data center rooms within different nearby buildings) to provide storage array availability, in case an entire storage array decides to take a break.
- Disaster Recovery – vMSC to enable Business Continuity during an unforeseen disaster. I personally believe this is not for the faint of heart, as there are some very important caveats around this use case, and I highly recommend reading Stretched Clusters and VMware vCenter Site Recovery Manager: Understanding the Options and Goals by Ken Werneburg. Really read this first before going all gung-ho about using vMSC for Disaster Recovery.
Since I used to work at a network vendor and networking always fascinates me, let’s first cover some network considerations when designing a stretched cluster. We’ll discuss the site-to-site network link considerations in this post, and ingress/egress network traffic flows in a future post.
For the site-to-site network links, also called Data Center Interconnects (DCI), there's nothing really groundbreaking at this point. The options I hear often are as simple as dark-fibre, leased xWDM frequencies from ISPs, or very commonly these days, Cisco Overlay Transport Virtualization (OTV). All of these provide a means of extending Layer 2 domains (aka IP subnets) across two or more data center sites.
Why is extending a Layer 2 domain across data center sites needed? First of all, within a vMSC, this allows VMs to reside at either data center site without changing IP addresses. VMs within the subnet are able to communicate directly with their Layer 2 peers, regardless of which site they reside on, with no Layer 3 hops required.
Secondly, it is a vSphere support requirement that the vMotion VMkernel ports of each ESXi host must be Layer 2 adjacent to each other in order for vMotion to occur. This is true whether the hosts are in a vSphere cluster within a single DC site or in a vMSC configuration.
You may note that I did not mention VXLAN, and there is a reason for that. Within the Data Center space, VXLAN provides the ability to abstract away Layer 2 domain boundaries and overlay them over Layer 3 networks. This provides the ability for a VM to be started up anywhere in the data center and still remain "virtually" Layer 2 connected to its peers (it actually uses tunneling). At this time however, placing vMotion VMkernel ports into VXLAN-backed portgroups is not supported. vMotion traffic MUST still be transported over conventional VLANs, and consequently this requires one of the Layer 2 VLAN extension technologies mentioned earlier. Duncan Epping already has a post out on this; there are also some interesting comments on that post.
So what should one think about in terms of networks when building out a DCI for vMSC? Before getting into the technology, it is absolutely critical to have a permanent networking specialist who can be trusted on the vMSC project team. They'll be able to provide valuable advice on what can or cannot be done, as well as caveats and limitations of existing and new networking technologies. The entire premise of vMSC is based on the idea of extended connectivity, so you want to be sure that the infrastructure that enables this connectivity is properly scrutinized.
In terms of technical requirements, be sure to look into the following:
- Network latency
- Network bandwidth
- Network monitoring
- Traffic flow
- Link redundancy
Within a vMSC configuration, there is a very stringent latency requirement between the sites for the vMotion network. If using vSphere Enterprise and below, the maximum acceptable round-trip latency is 5ms. For vSphere Enterprise Plus, the maximum acceptable round-trip latency is 10ms. If these limits are exceeded, vMotion may not function correctly, or the configuration may be flagged as unsupported should you need to log a support case.
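To make the check concrete, here's a minimal sketch that compares measured round-trip latencies against those support limits. The thresholds are the 5ms/10ms figures above; the sample RTT values and function names are my own illustration, not anything VMware ships.

```python
# vMotion RTT support limits (ms), per vSphere edition, as described above
VMOTION_RTT_LIMIT_MS = {
    "enterprise": 5.0,        # vSphere Enterprise and below
    "enterprise_plus": 10.0,  # vSphere Enterprise Plus
}

def rtt_within_limit(samples_ms, edition="enterprise_plus"):
    """Return True if the worst observed RTT stays under the supported limit."""
    limit = VMOTION_RTT_LIMIT_MS[edition]
    return max(samples_ms) <= limit

# Illustrative samples: a metro dark-fibre link often measures well under 1 ms
print(rtt_within_limit([0.4, 0.6, 0.5], "enterprise"))       # True
print(rtt_within_limit([4.8, 6.2, 5.1], "enterprise"))       # False
print(rtt_within_limit([4.8, 6.2, 5.1], "enterprise_plus"))  # True
```

The point of checking the worst sample rather than the average is that a single latency spike during a vMotion is enough to cause trouble.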
Latency can, in a way, also be affected by available bandwidth. With so many things possibly running across the DCI link, be sure that there is sufficient bandwidth to avoid link congestion, which could directly cause increased latency, or worse, packet drops. Consider sufficient bandwidth for the following:
- vMotion traffic, minimum of 622Mbps (that's a small "b", for bits)
- Inter-VM or inter-workload traffic which crosses the DCI (tromboning)
- Ingress/Egress traffic which may need to exit via gateways at the other side (also tromboning)
- Other / Miscellaneous application traffic which also happens to utilize the DCI
- Future growth
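A back-of-the-envelope sizing exercise can sum up those components. The sketch below is illustrative only: the 622Mbps vMotion minimum comes from the requirement above, but all other figures, and the 30% growth headroom, are assumptions you'd replace with your own measurements.

```python
def required_dci_mbps(inter_vm=0.0, ingress_egress=0.0, misc=0.0,
                      vmotion=622.0, growth_factor=1.3):
    """Sum the expected DCI flows (Mbps) and apply a headroom multiplier
    for future growth. 622 Mbps is the vMotion minimum; everything else
    is workload-specific."""
    return (vmotion + inter_vm + ingress_egress + misc) * growth_factor

# Hypothetical environment: measured inter-VM, ingress/egress, and misc flows
demand = required_dci_mbps(inter_vm=400, ingress_egress=250, misc=100)
print(round(demand))  # 1784 -> a pair of 1 Gbps links would already be tight
```

Numbers like these are also a useful sanity check against the redundancy requirement discussed later: the demand must fit on the links that remain after a failure, not just on the full complement.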
I wanted to add a note about traffic flow here, since the required DCI bandwidth is now going to be affected by VM placement. Since vMSC allows VMs to reside at either site, it is possible that what used to be local LAN traffic is now going to be traversing lower-bandwidth DCI links (comparatively lower vs local LAN networks). If the placement of VMs is not done thoughtfully, for example placing the App VM in site 1 but the DB and Web VMs in site 2, it will result in network traffic tromboning across the links unnecessarily. Multiply that by the number of applications within the data center, and the DCI links can be quickly saturated.

As such, it is best to keep VMs which need to communicate with each other frequently on the same side of the vMSC, which in the earlier example means keeping all Web/App/DB VMs together within the same preferred site. This can be easily done using DRS host affinity rules. At the same time, it is also prudent to identify an acceptable worst-case scenario, as in sizing up the DCI bandwidth to accommodate roughly up to x number of VMs migrating or failing over to the non-preferred site, and the subsequent traffic which may be generated by tromboning.
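To illustrate why placement matters, here's a small sketch that estimates tromboning traffic from a flow matrix. The VM names, tiers, and Mbps figures are hypothetical; in practice the flow data would come from something like NetFlow.

```python
# Hypothetical app flows: (vm_a, vm_b, mbps) pairs that talk frequently
flows = [
    ("web01", "app01", 120),
    ("app01", "db01", 200),
    ("web01", "db01", 30),
]

def cross_site_mbps(flows, site_of):
    """Sum the bandwidth of flows whose endpoints sit in different sites;
    this is the traffic that trombones over the DCI."""
    return sum(mbps for a, b, mbps in flows if site_of[a] != site_of[b])

# Careless placement: the app tier is split from its web and db peers
split = {"web01": "site1", "app01": "site2", "db01": "site2"}
# DRS host affinity keeping the whole stack in the preferred site
together = {"web01": "site1", "app01": "site1", "db01": "site1"}

print(cross_site_mbps(flows, split))     # 150 Mbps crossing the DCI
print(cross_site_mbps(flows, together))  # 0 -- all tiers co-located
```

Multiply the first figure by dozens of applications and it becomes clear how quickly a careless placement policy can eat into the DCI budget.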
Of course, since we have all these requirements of the network, it is crucial to always have visibility into the health of the DCI link. At the very minimum, link bandwidth utilization must be monitored using SNMP. If possible, even more granular information should be captured using NetFlow, to understand VM-to-VM or app-to-app traffic patterns for optimization, perhaps by relocating VMs which need to communicate with each other a lot (as in the Web/App/DB scenario earlier) to the same data center site, enabling QoS, or possibly even upgrading the link bandwidth. Of course, be sure to monitor latency as well; as mentioned earlier, this can be affected by congestion on the DCI. It is crucial to keep an eye on it so as not to fall afoul of the 5ms (vSphere Enterprise and below) or 10ms (vSphere Enterprise Plus) vMotion requirements.
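For the SNMP piece, the arithmetic is simple: two samples of an interface octet counter (e.g. IF-MIB's ifHCInOctets) over a known polling interval yield the utilization. How the counters are actually fetched is out of scope here; the sketch below just shows the calculation, with made-up sample values.

```python
def utilization_pct(octets_t0, octets_t1, interval_s, link_mbps):
    """Percent link utilization over one polling interval, from two
    64-bit SNMP octet counter samples (octets -> bits -> fraction of
    the link's capacity over the interval)."""
    bits = (octets_t1 - octets_t0) * 8
    capacity_bits = interval_s * link_mbps * 1_000_000
    return 100.0 * bits / capacity_bits

# Illustrative: a 300-second poll on a 1 Gbps DCI link
u = utilization_pct(10_000_000_000, 40_000_000_000, 300, 1000)
print(round(u, 1))  # 80.0 -> time to look at QoS, placement, or an upgrade
```

Sustained figures in that range are exactly the early warning you want before congestion starts inflating the vMotion RTT past its supported limit.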
Finally, and this is really fundamental, have redundant DCI physical links and network gear (routers/switches) to ensure that there is no single point of failure. So size up the bandwidth you require per link, and make sure that a spare link has at least enough unused capacity to absorb the outage of one link without congesting.
I’ll write up something for ingress/egress network traffic flows and routing in a future post.