Tuesday, October 11, 2011

Long-Distance vMotion: Updates

This is an update to the Long-Distance vMotion series I did earlier this year. If you wish to read it all, start with Long-Distance vMotion: Part 1.

The problem with blogging about technology, techniques, and architectures is that they change. Sometimes that change is rapid; sometimes it plays out slowly across major releases. In a more converged world where multiple components are in play, things can change quite quickly.

Since writing my Long-Distance vMotion (LDVM) series, there have been some changes. Herein lies the dilemma: do I go back and change the old articles, or do I post an update like this new entry? I could add a section to the blog with the latest analysis, called Long-Distance vMotion. Part of me feels I should leave old posts unchanged (except for correcting typos and erroneous information). The other approach would be changing the old articles, preserving the search engine entry points that currently send people into them, so readers wouldn't have to go to another place in the blog for the latest updates. I can post-date new entries; I can't post-date new information. Which is the best approach? Let me know what you think.

So what's new? Really, two things. First, vSphere 5 came out with some new enhancements on the LDVM front that can be taken advantage of. Second, IBM has decided to enhance the SVC to support long-distance vMotion across the metro.

For those of you who know me personally, these LDVM blog entries are a subset of a presentation I give in my day job. The presentation goes further in depth, with pictures and animations of how each part interrelates, to fully explain the topic. It also takes into account other pieces of the network stack needed to make it work. The blog version is stripped down and simplified for easy reading. If you're interested in this topic, you should engage me or any of my qualified peers for the full-blown presentation. To engage us, send me a message on LinkedIn; you can do this from the bottom of the right-hand column or the Biography tab. (I limit spam this way.)

Why is it stripped down? First, it's not easy to take pictures and animations and convey all of their meaning in a blog post. Second, and more importantly, we are competitive! I don't want competitors gaining all the knowledge necessary to pull this off, not that many could. It combines different disciplines, in many of which we are market leaders. Others may be able to sell the pieces if they had the full list, but few would have the engineering staff to make it all work. We can.

vSphere 5

The first change that affects LDVM comes with vSphere 5. It was released a few months ago, and after the initial shock and awe of the licensing settled, people started digging into the details. If you recall from Long-Distance vMotion: Part 2, vSphere 4 had a latency limit of 5 ms round-trip time (RTT). In vSphere 5, with the Enterprise Plus edition (and only that edition), we now get a feature called Metro vMotion. vSphere 5 Enterprise Plus takes us from 5 ms RTT to 10 ms RTT. With good, clean, switch-free links, instead of ~400 km at 5 ms RTT we should get double that distance, around ~800 km, at 10 ms RTT. The latency is what dictates the distance.

Now 800 km might not be available with every LDVM storage solution today, but storage limits tend to update faster than major releases of vSphere. It's a decent distance that may just take us beyond the metro, from ~62.5 mi to ~125 mi. Remember, that's circuit distance, not as the crow flies, and additional switching will add additional latency.
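To put rough numbers on that reasoning, here is a back-of-the-envelope sketch. The ~5 µs/km one-way propagation figure, the per-hop switch latency, and the 20% protocol headroom are my own illustrative assumptions rather than VMware or carrier specifications, but they reproduce the ~400 km and ~800 km figures above.

# Back-of-the-envelope: how far can an RTT budget take a vMotion?
# Assumes ~5 us/km one-way propagation in fiber and reserves 20% of the budget
# for protocol and storage overhead. Illustrative figures, not vendor specs.

US_PER_KM_ONE_WAY = 5.0     # ~5 microseconds per km of fiber, one direction
SWITCH_HOP_US = 50.0        # assumed per-hop switch latency (illustrative)

def max_circuit_km(rtt_budget_ms, switch_hops=0, headroom=0.2):
    """Rough maximum circuit distance for a given round-trip-time budget."""
    budget_us = rtt_budget_ms * 1000.0 * (1.0 - headroom)
    budget_us -= 2 * switch_hops * SWITCH_HOP_US   # hops cost latency in both directions
    return budget_us / (2 * US_PER_KM_ONE_WAY)     # divide by round-trip time per km

print(max_circuit_km(5))                  # vSphere 4 at 5 ms RTT:  ~400 km
print(max_circuit_km(10))                 # Metro vMotion at 10 ms: ~800 km
print(max_circuit_km(10, switch_hops=4))  # extra switching shrinks the reach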

IBM SVC Extended Distance Stretched Clusters

For those of you thinking the IBM v7000 killed IBM's plans for the SVC, it remains the only device to do LDVM, and it just got better. IBM has addressed many of its LDVM limitations with the latest v6.3 code, due out next month. I've been sitting on this for a few weeks now, waiting for it to be announced. Well, today is that day.

I have had two happy local IBM SVC customers looking to defect to another vendor because the SVC couldn't do LDVM. Today's announcement and the pending v6.3 code change that. I had previously stated that they had a campus solution, limited to 10 km or 6.25 mi. IBM will now allow a new type of "extended distance stretched cluster" up to 300 km (100 km will yield better performance). They also take care of the unwieldy amount of dark fiber by now allowing SAN switch ISLs between sites; previously you had to go from the switch directly into the storage at the remote site with long-wave single-mode fiber. With a Cisco MDS solution using VSANs, while keeping best practices for split-brain protection, I can cut my fiber down to 6 links; it was projected to be 12 at one customer.

IBM has made a lot of updates, testing and qualifying solutions to get to 300 km. There were code changes to allow the greater distance, and there was testing and qualification to support using the SAN to carry ISL traffic. You should also be able to use FCIP as long as you stay within the latency limits.
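For a quick sanity check of a proposed design, the same rough model can be turned around: given a circuit distance and a few switch hops, does the design fit inside both the vSphere 5 Metro vMotion RTT limit and the SVC v6.3 distance limit discussed here? The propagation and hop figures below are the same illustrative assumptions as in the earlier sketch.

# Rough feasibility check: does a proposed circuit distance fit inside both the
# vSphere 5 Metro vMotion RTT limit and the SVC v6.3 distance limit?
# Propagation and hop figures are the same illustrative assumptions as above.

US_PER_KM_ONE_WAY = 5.0          # ~5 microseconds per km of fiber, one direction
SWITCH_HOP_US = 50.0             # assumed per-hop switch latency (illustrative)
VSPHERE5_RTT_LIMIT_MS = 10.0     # Metro vMotion (Enterprise Plus only)
SVC_63_MAX_KM = 300.0            # extended distance stretched cluster limit

def ldvm_feasible(circuit_km, switch_hops=0):
    """Return (feasible, estimated RTT in ms) for a given circuit distance."""
    rtt_us = 2 * circuit_km * US_PER_KM_ONE_WAY + 2 * switch_hops * SWITCH_HOP_US
    rtt_ms = rtt_us / 1000.0
    return rtt_ms <= VSPHERE5_RTT_LIMIT_MS and circuit_km <= SVC_63_MAX_KM, rtt_ms

print(ldvm_feasible(300, switch_hops=4))   # (True, 3.4)  -- well inside the 10 ms budget
print(ldvm_feasible(100, switch_hops=2))   # (True, 1.2)  -- the better-performing 100 km case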

IBM has a large install base of SVCs, and many customers will want to take a serious look at retooling their existing solutions. Now that the SVC has enterprise licensing, you can easily test this out or migrate your existing infrastructure by simply adding a new pair of nodes and borrowing some licensing from existing clusters.

They still require the most fiber between sites, but I can live with 6. If they offered a software agent like EMC's VPLEX Witness or NetApp's Microsoft SCOM plugin for split-brain protection, instead of the enhanced quorum volume, I could lower my OpEx further by using only 2 links and be on par with EMC's VPLEX.

Expect to see this area, workload mobility and active-active datacenters, heat up. It lets us move VMs around, and into and out of clouds, transparently and without downtime. Whether you're talking about vSphere vMotion, Hyper-V Live Migration, or PowerVM Live Partition Mobility, these technologies will keep evolving for years to come.

IBM has entered the game.
