VMware Fault Tolerance (FT)

October 21, 2013 By Eric Shanks
olsontwins

I think the Olsen twins have been using FT longer than VMware has.

Awesome! So you’ve got your brand-new, shiny VMware cluster all set up with HA and think, “Man, I’m in great shape now. Downtime is a thing of the past!”

Well, not so fast! VMware High Availability (HA) only means that if a physical host fails, its virtual machines are restarted on another host, which LIMITS your downtime but doesn’t eliminate it. What if your machines are so critical that you can’t afford even that reboot time when a host fails? The answer might be VMware Fault Tolerance (FT).

VMware implements FT by adding a second virtual machine that runs in lockstep with the first. In essence, they’re twins. If something happens to the host that one of the FT-enabled VMs is running on, that VM may stop, but its twin keeps running and handles all of the operations for production use. Pretty awesome stuff.

How it works

For the two machines to run in “vLockstep”, an FT logging network has to be set up. Just as a vMotion network is usually configured on its own VLAN or subnet, the FT logging network should be kept separate from production traffic.

The two VMs are set up and forced onto different hosts. The primary VM records all of the non-deterministic input, such as mouse clicks, network traffic, and disk reads, and sends it to the secondary VM over the FT logging network. The secondary VM then replays that log so that the two VMs stay identical.
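
To make the record-and-replay idea a bit more concrete, here is a toy Python sketch. It is not how VMware implements vLockstep; the `ToyVM` class, the event names, and the in-memory list standing in for the FT logging network are all made up for illustration. The point is only that if the primary logs every non-deterministic input and the secondary applies the same log in the same order, both end up in the same state.

```python
import random


class ToyVM:
    """A tiny deterministic state machine standing in for a VM (hypothetical)."""

    def __init__(self):
        self.state = 0

    def apply(self, event):
        # The next state depends only on the current state and the event,
        # so applying the same events in the same order gives the same state.
        kind, value = event
        self.state = (self.state * 31 + len(kind) + value) % 1_000_003


def run_primary(primary, log, n_events, rng):
    """Generate unpredictable inputs, log each one, then apply it."""
    kinds = ["mouse_click", "network_packet", "disk_read"]
    for _ in range(n_events):
        event = (rng.choice(kinds), rng.randrange(256))
        log.append(event)  # stands in for sending over the FT logging network
        primary.apply(event)


def replay_on_secondary(secondary, log):
    """Replay the logged inputs on the secondary, in order."""
    for event in log:
        secondary.apply(event)


primary, secondary = ToyVM(), ToyVM()
ft_log = []
run_primary(primary, ft_log, 1000, random.Random())
replay_on_secondary(secondary, ft_log)
print(primary.state == secondary.state)  # True
```

The random inputs differ on every run, but the secondary never has to guess them: it only consumes what the primary logged.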

FTdiag1

When the host running the primary VM fails, the secondary VM takes over, and a new secondary VM is then created to keep FT protection current.
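
The failover sequence can be sketched the same way. Again, this is a toy model under my own assumptions, not VMware's code: `FTPair`, `host_failed`, and the host names `esx1` to `esx3` are hypothetical, and "creating a new secondary" is reduced to copying the surviving VM's state onto a spare host.

```python
class ToyVM:
    """Minimal stand-in for a VM: a host name plus some state (hypothetical)."""

    def __init__(self, host, state=0):
        self.host = host
        self.state = state

    def apply(self, value):
        self.state = (self.state * 31 + value) % 1_000_003


class FTPair:
    """A primary/secondary pair kept on different hosts (toy model)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.primary = ToyVM(self.hosts[0])
        self.secondary = ToyVM(self.hosts[1])

    def handle_input(self, value):
        self.primary.apply(value)
        if self.secondary is not None:
            self.secondary.apply(value)  # replayed from the FT log

    def host_failed(self, failed_host):
        self.hosts.remove(failed_host)
        if failed_host not in (self.primary.host, self.secondary.host):
            return  # neither twin was running there
        if self.primary.host == failed_host:
            self.primary = self.secondary  # the secondary takes over
        # Create a new secondary on a surviving host other than the primary's.
        spare = next((h for h in self.hosts if h != self.primary.host), None)
        self.secondary = ToyVM(spare, self.primary.state) if spare else None


pair = FTPair(["esx1", "esx2", "esx3"])
for value in (3, 1, 4):
    pair.handle_input(value)
state_before = pair.primary.state

pair.host_failed("esx1")
print(pair.primary.host, pair.secondary.host)  # esx2 esx3
print(pair.primary.state == state_before)      # True
```

With only two hosts there would be nowhere to put the new secondary, which is why the sketch leaves it as `None` in that case.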

FTdiag2

How to Configure

First, you need to make sure you have your network set up. I won’t go into this in detail, but create your portgroup and make sure you select the “Use this virtual adapter for Fault Tolerance logging” option. Obviously, this will need to be set up on all of your hosts. Distributed switches make this easier.

FTNetwork

Next, right-click your virtual machine and select Fault Tolerance -> Turn on Fault Tolerance.

FT1

You should get a warning message that you can’t use thin provisioning, that the VMs won’t use DRS, and that a memory reservation will be set. Choose Yes to continue.

FT2

FT will start up and create the secondary VM, which you’ll be able to see in the VMs and Templates view.

secondary

Problems

My setup was not without an issue or two. I needed to manually change the disks from thin to eager-zeroed thick, and the monitor mode wasn’t compatible with the hosts in my lab. This was easily resolved using the VMware KB article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2000589

FTError1

Demo time

Here, I’ve opened the console on both the primary and secondary VMs. You can see that a ping is running on both, the task manager is identical on both machines, and the uptimes are the same. Also, notice that the secondary VM is listed as read-only.

sidebyside

Next, I’ve simulated a failure on one of the hosts. You can see that one of the VMs keeps right on humming, while the other one goes blank.

Failure1

Once the host recovers and both VMs are up again, notice that the uptimes look identical again.

Failure2

Restrictions

So FT solves everything… not quite.  FT comes with quite a few restrictions.  The full list is here:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010631

The biggest problem is that virtual SMP is not supported, so an FT-protected VM is limited to a single vCPU. I would guess that in many cases the most critical virtual machine in your organization is possibly also one of the most resource-intensive (think SQL Server).
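
If you want to sanity-check a VM on paper before turning FT on, something like the following rough Python sketch could help. It only covers the two restrictions I ran into in this post (a single vCPU and eager-zeroed thick disks), not the full list in the KB article, and the dictionary layout, field names, and `ft_blockers` function are all my own invention rather than anything from a VMware API.

```python
def ft_blockers(vm):
    """Return the reasons this (hypothetical) VM description would not qualify for FT."""
    reasons = []
    if vm["num_vcpus"] > 1:
        reasons.append(f"{vm['num_vcpus']} vCPUs: virtual SMP is not supported")
    for disk in vm["disks"]:
        if disk["provisioning"] != "eager_zeroed_thick":
            reasons.append(
                f"{disk['name']} is {disk['provisioning']}: "
                "convert it to eager-zeroed thick"
            )
    return reasons


# Made-up inventory entries for illustration only.
sql_vm = {
    "name": "sql01",
    "num_vcpus": 4,
    "disks": [{"name": "Hard disk 1", "provisioning": "thin"}],
}
small_vm = {
    "name": "license01",
    "num_vcpus": 1,
    "disks": [{"name": "Hard disk 1", "provisioning": "eager_zeroed_thick"}],
}

for vm in (sql_vm, small_vm):
    problems = ft_blockers(vm)
    print(vm["name"], "OK" if not problems else problems)
```

An empty list here just means the VM passes these two checks. It still has to meet everything else in the restrictions list.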

If you’ve got a VM with a single vCPU and need very high uptime, try FT and see how it goes!