Finally, the idea of running a Disaster Recovery test is manageable. VMware Site Recovery Manager combined with vSphere has made it possible to test a failover to a warm site without worrying that the DR test itself will cause an outage.
Setting up Site Recovery Manager and performing a site failover sounds like a daunting task, but VMware has made it very simple, assuming you are already familiar with vSphere. If you already have a virtual environment set up at both your production site and a secondary site, SRM is easy to get started with, yet flexible enough to run almost any DR plan you can think of.
SRM needs to be installed on a server at both the production site and the recovery site. It can be installed on your vCenter Server and does not require a separate box if you have the capacity. The installation is pretty straightforward, with the typical “where to install?” question and a prompt for the location of the vCenter Server. The one question you’ll have to decide is whether or not you want vSphere Replication to be installed. If you are not sure, just install it in case you want to use it later. Of course, you can always run a repair installation and choose the option at that time.
Once you are finished installing Site Recovery Manager, you may need to install your Storage Replication Adapter (SRA). The SRA integrates SRM with your storage solution so that SRM knows what data has been replicated and is on disk. If you have decided to use vSphere Replication instead, then installing SRAs is not necessary.
Now that you have your software installed in both sites, we need to configure the production and recovery sites. Once SRM is installed, you can access the module in the vSphere Client under Home –> Solutions and Applications –> Site Recovery.
Under the Sites tab, you’ll see the currently connected site. We need to add the secondary site. If you select the Configure Connection button you’ll be presented with your first wizard.
Enter the address of the secondary site’s vCenter Server.
Enter the administrator credentials used to connect to the secondary site’s vCenter Server.
Notice that during the connection setup, reciprocity is established so there is no need to go through this process at the secondary site. This setup is done automatically for you.
Now you’ll see both sites are set up in SRM.
The last step in site setup is to set a placeholder datastore. This datastore is used to store small virtual machine files even when a failover isn’t occurring. Just select each site and click “Configure Placeholder Datastore” to select where to store the files.
Setting up Array Managers
VMware Site Recovery Manager can’t fail over virtual machines unless their data is replicated to the secondary site. For SRM to know what data is available, you need to use either vSphere Replication, available in SRM 5.0, or an SRA, which presents the replicated volumes to Site Recovery Manager. In the examples for this post, I’ve used the NetApp SRA, and I’m connecting to NetApp Data ONTAP 8.0 Simulators, which are free to download.
You’ll see the Array Managers on the left-hand side, and in the work pane you should see that one or more SRAs are installed.
Click on the Add Array Manager link and another wizard will pop up. Give the Array Manager a name and choose the SRA that you’ve installed.
Enter all of the storage system information for the site.
Click Finish, and the array manager is configured. This must be done on the secondary site as well!
Once the Array Manager is added, an array pair must be configured. This array pair shows the devices that are replicating between the Production Site and the Disaster Site. Click on the SRA that was just configured and go to the Array Pairs tab of the work panel. The local array and a remote array will be listed. If you select the “Enable” action, it should set up the array pair.
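The per-site steps so far are easy to lose track of, so here is a minimal Python sketch of a setup checklist. It does not talk to SRM or vCenter in any way. The SiteSetup class, its field names, and the site names are all hypothetical, and you would fill the values in by hand from what you see in the vSphere Client.

```python
from dataclasses import dataclass


@dataclass
class SiteSetup:
    """Hand-entered setup state for one site (hypothetical model, not an SRM API)."""
    name: str
    srm_installed: bool = False
    placeholder_datastore: bool = False
    array_manager_added: bool = False


def missing_steps(sites, array_pair_enabled):
    """Return the setup steps that are still outstanding, in site order."""
    problems = []
    for site in sites:
        if not site.srm_installed:
            problems.append(f"{site.name}: install SRM")
        if not site.placeholder_datastore:
            problems.append(f"{site.name}: configure placeholder datastore")
        if not site.array_manager_added:
            problems.append(f"{site.name}: add array manager")
    # The array pair is enabled once for the pairing, not once per site.
    if not array_pair_enabled:
        problems.append("enable the array pair")
    return problems


production = SiteSetup("Production", True, True, True)
recovery = SiteSetup("Recovery", True, True, False)  # array manager forgotten
outstanding = missing_steps([production, recovery], array_pair_enabled=False)
for step in outstanding:
    print(step)
```

If you are using vSphere Replication instead of an SRA, the array manager and array pair checks would not apply and could be dropped from the sketch.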
Now that the array managers are configured, a Protection Group must be set up. The Protection Group will include the virtual machines that need to be protected in a disaster.
On the Protection Groups tab, choose “Create New Protection Group”. Another wizard will start up. Select whether the group is being configured from the Primary Site or the Secondary Site, and which type of replication is being used.
If the Array Managers are set up properly, the datastore that is replicated to the secondary site will show up.
Give the Protection Group a name.
Once the Protection Group is set up, the VMs will need to have their protection configured. VM protection allows you to specify additional settings for the VMs should a failover occur. Choose the Protection Group and click on “Configure Protection.”
When the protection is configured, there are options to set what folder to put the VM in, what resource pool to use, and whether any of the devices should be modified during a failover. For instance, if the VM has three hard drives and one of them is not necessary in a disaster situation, that disk doesn’t need to be connected at the disaster site and can be detached.
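To make the per-VM settings concrete, here is a small Python sketch of the three-disk example above. This is only an illustration of the idea and not how SRM stores anything. The VmProtection class, the VM name, the folder, and the resource pool are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VmProtection:
    """Per-VM recovery settings (hypothetical model, not an SRM API)."""
    vm_name: str
    folder: str
    resource_pool: str
    disks: list
    detached_disks: set = field(default_factory=set)

    def disks_at_recovery_site(self):
        """Disks that stay connected after failover, in their original order."""
        return [d for d in self.disks if d not in self.detached_disks]


sql01 = VmProtection(
    vm_name="sql01",
    folder="DR/Databases",
    resource_pool="DR-Tier1",
    disks=["Hard disk 1", "Hard disk 2", "Hard disk 3"],
    detached_disks={"Hard disk 3"},  # a disk that is not needed in a disaster
)
print(sql01.disks_at_recovery_site())
```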
The Recovery Plan setup will likely be the most time-consuming part of setting up SRM. This isn’t because it’s difficult to use, but rather due to the amount of detail that the organization might want during a disaster or a test. The Recovery Plan will probably replace a good section of the old run books that have traditionally been used in the case of a disaster. The Recovery Plan defines the steps that happen during a failover.
When the Recovery Plan is created, the first thing that needs to be decided is what site should host the VMs in a disaster scenario.
Next will be the list of Protection Groups that should be included. In the example, I’ve chosen the Protection Group that was set up earlier, but this could include multiple Protection Groups.
Once the Protection Groups have been chosen, network mapping will occur. There will be options to select what network VMs should be placed in during a real disaster, and also during a test failover. This is very useful, as you can leave the Test Network setting at “Auto” and Site Recovery Manager will create a new “bubble network” that is isolated from the rest of your machines. You don’t have to use the bubble network, but it is a nice way to keep the test failover VMs from interacting with the production VMs. If you need to do some additional routing, check out the article on Bubble Routing in virtual networks. http://wp.me/p2d48c-7H/
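The difference between the real and test network settings can be sketched in a few lines of Python. This is a simplified model of the choice described above and not SRM code. The NetworkMapping class, the network names, and the "bubble-" naming are assumptions made for illustration only.

```python
from dataclasses import dataclass

AUTO = "Auto"


@dataclass
class NetworkMapping:
    """Mapping for one production network (hypothetical model, not an SRM API)."""
    production_network: str
    recovery_network: str
    test_network: str = AUTO


def target_network(mapping, is_test):
    """Pick the network a recovered VM would be attached to."""
    if not is_test:
        return mapping.recovery_network
    if mapping.test_network == AUTO:
        # Stand-in name for the isolated bubble network created for the test.
        return f"bubble-{mapping.production_network}"
    return mapping.test_network


web = NetworkMapping("VLAN10-Web", "DR-VLAN10-Web")
lab = NetworkMapping("VLAN20-App", "DR-VLAN20-App", test_network="DR-Test-Lab")
print(target_network(web, is_test=False))
print(target_network(web, is_test=True))
print(target_network(lab, is_test=True))
```

The point of the sketch is that a real failover always lands on the mapped recovery network, while a test lands either on an isolated network or on one you picked explicitly.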
Lastly, name the Recovery Plan. There may be more than one type of recovery plan needed for the organization and this will allow for multiple recovery plans to be run. They can even be run simultaneously if necessary.
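If you end up with several plans, it can help to see which VMs each one covers and where they overlap. Here is a rough Python sketch of that bookkeeping. The ProtectionGroup and RecoveryPlan classes, the plan names, and the VM names are hypothetical, and nothing here queries SRM.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionGroup:
    """A named set of protected VMs (hypothetical model, not an SRM API)."""
    name: str
    vms: list


@dataclass
class RecoveryPlan:
    """A named plan that fails over one or more protection groups."""
    name: str
    recovery_site: str
    protection_groups: list = field(default_factory=list)

    def vms(self):
        """All VMs covered by this plan, in group order, without duplicates."""
        seen = []
        for group in self.protection_groups:
            for vm in group.vms:
                if vm not in seen:
                    seen.append(vm)
        return seen


def shared_vms(plan_a, plan_b):
    """VMs that appear in both plans, sorted by name."""
    return sorted(set(plan_a.vms()) & set(plan_b.vms()))


databases = ProtectionGroup("Tier1-Databases", ["sql01", "sql02"])
web = ProtectionGroup("Web", ["web01"])
full = RecoveryPlan("Full Site Failover", "Recovery", [databases, web])
db_only = RecoveryPlan("Databases Only", "Recovery", [databases])

print(full.vms())
print(shared_vms(full, db_only))
```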
Hopefully this post can get you started. I expect to write a few more SRM-related posts in the future explaining the details of doing a failover, adding PowerCLI scripts to the Recovery Plans, and using vSphere Replication as opposed to an SRA.
Mike Laverick has some great information about SRM on his blog. I invite you to check it out sometime.