SRM Troubleshooting
January 27, 2015
Unfortunately, not all software is perfect and from time to time I’ve run into issues with SRM as well. This post is a list of items I often see during SRM deployments and some information to troubleshoot issues.
Log File Locations
SRM Logs: C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs
Installation logs: %USERPROFILE%\Application Data\VMware\VMware Site Recovery Manager\Logs
Storage Replication Adapter (SRA) logs: These depend on the SRA vendor, but Program Files\SRANAME is a good place to start.
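If you jump between SRM servers a lot, a small script can save you retyping these paths. The sketch below is only an illustration: `srm_log_locations` is a hypothetical helper, it assumes the default paths listed above, and SRANAME is still a placeholder for your SRA vendor's folder.

```python
import ntpath

# Hypothetical helper: builds the default log paths listed above from a
# mapping of environment variables. ntpath is used so the results keep
# Windows separators even if the script runs on another OS.
def srm_log_locations(env, sra_dir_name="SRANAME"):
    program_data = env.get("ProgramData", r"C:\ProgramData")
    program_files = env.get("ProgramFiles", r"C:\Program Files")
    user_profile = env["USERPROFILE"]
    return {
        "srm": ntpath.join(program_data, "VMware",
                           "VMware vCenter Site Recovery Manager", "Logs"),
        "install": ntpath.join(user_profile, "Application Data", "VMware",
                               "VMware Site Recovery Manager", "Logs"),
        # The SRA folder varies by vendor; sra_dir_name is a placeholder.
        "sra": ntpath.join(program_files, sra_dir_name),
    }
```

On the SRM server itself you could call `srm_log_locations(os.environ)` and open the resulting folders directly.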
Change the default SRM Install Directory
It’s not uncommon for Windows server teams to install applications on a drive other than C:, but SRM might put a hurdle in your way there. SRM uses Perl for part of its installation, and Perl’s default installation needs both a long and a short name for directory listings. To allow SRM to be installed on a drive other than C:, modify the following registry key and restart the SRM server before running the install.
Change the value to 1 and click OK.
Full information about the issue can be found at the following VMware KB Article.
A constant headache I see during SRM installations is getting SRM to create its SQL database. The cause is usually either a connection issue, such as Windows Firewall blocking ports between the SRM server and the SQL server, or, more often, a database account that doesn’t have enough permissions on the database.
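Before digging into permissions, it’s worth ruling out the firewall. This sketch only checks whether a TCP connection from the SRM server to the SQL server succeeds; it says nothing about permissions. It assumes a default SQL Server instance listening on TCP 1433. Named instances usually use a dynamic port (the SQL Server Browser service answers on UDP 1434), so adjust `port` to match. `sql_port_reachable` is a hypothetical helper name.

```python
import socket

def sql_port_reachable(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and name resolution failures.
        return False
```

Run it from the SRM server, for example `sql_port_reachable("sql01.example.local")` with your own SQL server name. If it returns False, look at the firewall before touching database permissions.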
The account used to connect to the SQL Server should have the following permissions in SQL:
Many times, I see that the database permissions are set correctly, but the account is missing the server roles. This will prevent SRM from creating all of the tables in the database. Check this VMware KB article out for more information.
Another database issue can come up when there are multiple schemas in the database.
Here are three rules to follow when using multiple schemas on the SQL server:
- The SRM database schema must have the same name as the database user account.
- The SRM database user account must be the owner of the SRM database schema.
- The SRM database schema must be the default schema for the SRM user account.
This sort of issue happens on occasion, but most databases I see have only one schema, so it usually isn’t a problem.
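To check the three rules quickly, you can pull the values from the `sys.schemas` and `sys.database_principals` catalog views in the SRM database and feed them to something like the sketch below. `SchemaInfo` and `schema_problems` are hypothetical names, and the case-insensitive comparison assumes the database uses a case-insensitive collation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SchemaInfo:
    db_user: str         # database user SRM connects as
    schema_name: str     # schema that holds the SRM tables
    schema_owner: str    # principal that owns that schema
    default_schema: str  # default schema configured for db_user

def _same(a: str, b: str) -> bool:
    # Case-insensitive match, assuming a case-insensitive collation.
    return a.casefold() == b.casefold()

def schema_problems(info: SchemaInfo) -> List[str]:
    """Check the three schema rules and return the ones that are broken."""
    problems = []
    if not _same(info.schema_name, info.db_user):
        problems.append("schema name does not match the database user")
    if not _same(info.schema_owner, info.db_user):
        problems.append("database user does not own the schema")
    if not _same(info.default_schema, info.schema_name):
        problems.append("schema is not the user's default schema")
    return problems
```

An empty list means the schema setup follows all three rules.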
The VMware Site Recovery Manager Service Won’t Start
After a default installation, the SRM service is created with a default logon of “Local System”. This is fine if you are using SQL Server Authentication, but if database authentication is set to “Windows Authentication”, the service needs to run as the domain user account that has access to the database.
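The rule is simple enough to write down as a check. This is a hedged sketch using a hypothetical helper, `check_srm_service_logon`; it assumes the service logon is written as DOMAIN\user and treats Local System, NT AUTHORITY accounts and local (.\user) accounts as non-domain accounts.

```python
from typing import List, Optional

def check_srm_service_logon(db_auth: str, service_logon: str,
                            db_login: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the setup looks consistent.

    db_auth is "sql" (SQL Server Authentication) or "windows"
    (Windows Authentication). service_logon is the account the SRM
    service runs as, e.g. "LocalSystem" or "CORP\\svc_srm".
    """
    if db_auth == "sql":
        # SQL Server Authentication: the default Local System logon is fine.
        return []
    if db_auth != "windows":
        raise ValueError(f"unknown authentication mode: {db_auth!r}")

    problems = []
    domain, sep, _user = service_logon.partition("\\")
    # No backslash, a "." domain (local account) or NT AUTHORITY means the
    # service is not running as a domain user.
    if not sep or domain in ("", ".") or domain.upper() == "NT AUTHORITY":
        problems.append("service is not running as a domain user account")
    # Optionally confirm it is the same login that was granted database access.
    if db_login is not None and service_logon.casefold() != db_login.casefold():
        problems.append("service account is not the Windows login with database access")
    return problems
```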
Change Service Account Passwords
This one might surprise you, but changing some of the account information means you’ll need to re-run the installer and perform a Modify operation on the installation. There is likely another way to do this that involves updating fields in the database or some XML files, as well as updating the service passwords, but the easiest approach is to re-run the installer: go to “Programs and Features” on the server, select the SRM service, and click Modify. Walk through the install process again and change only the pieces that need to be changed. This is covered in the official documentation found here.
A common issue I see during “test” failovers is a failure to mount the snapshotted datastores at the recovery site. Typical error messages include “Failed to Create Snapshots of Replica Devices” or “Timed out (300 seconds) while waiting for SRA to complete ‘discoverDevices’ command”. Often this is related to the SRA, and adjusting the timeout settings can help.
In the site’s Advanced Settings, I typically end up modifying the following:
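When you’re hunting for these errors in the SRM logs, a quick scan can help. This sketch assumes the messages appear in the log with the wording quoted above (the exact text may differ between SRM and SRA versions), accepts straight or curly quotes around the command name, and uses a hypothetical helper, `find_sra_errors`.

```python
import re
from typing import Iterable, List, Tuple

# Assumed log wording, based on the error messages quoted above.
SNAPSHOT_FAILURE = re.compile(r"Failed to Create Snapshots of Replica Devices",
                              re.IGNORECASE)
SRA_TIMEOUT = re.compile(
    r"Timed out \((\d+) seconds\) while waiting for SRA to complete "
    r"['\u2018\u2019](\w+)['\u2018\u2019] command",
    re.IGNORECASE,
)

def find_sra_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, description) pairs for matching log lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SNAPSHOT_FAILURE.search(line):
            hits.append((number, "snapshot creation of replica devices failed"))
        match = SRA_TIMEOUT.search(line)
        if match:
            seconds, command = match.groups()
            hits.append((number, f"SRA command {command} timed out after {seconds}s"))
    return hits
```

For example, pass it an open log file: `find_sra_errors(open(path, encoding="utf-8", errors="replace"))`.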
- storageProvider.hostRescanRepeatCnt to 2
- storageProvider.hostRescanTimeoutSec to something higher than 300. The right value depends on the environment; sorry I can’t be more specific than that.
- storageProvider.waitForAccessibleDatastoreTimeoutSec to something higher than 30, and again it depends.
There are plenty of advanced settings to tweak here, but most of the time the settings I have to change are in the “Storage Provider” section and in SRM 5.8 they give you a nice little summary about each setting to help you out.
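If you manage several sites, it can help to encode these rules of thumb so every site’s values get reviewed the same way. The setting names come from the list above; the thresholds are the starting points mentioned there, not hard limits. `review_storage_provider` is a hypothetical helper that works on values you’ve copied out of the Advanced Settings dialog.

```python
from typing import Dict, List

def review_storage_provider(settings: Dict[str, int]) -> List[str]:
    """Compare Storage Provider advanced settings with the rules of thumb above."""
    notes = []
    repeat = settings.get("storageProvider.hostRescanRepeatCnt")
    if repeat != 2:
        notes.append(f"hostRescanRepeatCnt is {repeat}; I usually set it to 2")
    rescan = settings.get("storageProvider.hostRescanTimeoutSec")
    if rescan is None or rescan <= 300:
        notes.append(f"hostRescanTimeoutSec is {rescan}; consider going above 300")
    wait = settings.get("storageProvider.waitForAccessibleDatastoreTimeoutSec")
    if wait is None or wait <= 30:
        notes.append(f"waitForAccessibleDatastoreTimeoutSec is {wait}; consider going above 30")
    return notes
```

An empty list means the values already follow the guidance; how far above 300 and 30 to go still depends on your environment.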
Un-Replicated Device Errors
Sometimes you’ll notice that some of your virtual machines are in a warning state in the protection group, and the Protection Status says something like “Device Not Found”. This is usually because one or more disks aren’t being replicated to the recovery site. YES, this can include attached CD-ROM devices. It’s an easy fix, though.
Open the VM Protection Settings and go to the device that is throwing the errors. Click the “Detach” button. This tells SRM to detach the device for failover purposes and not worry about it. This can be especially useful for virtual machines that have their own paging disk that you’d like to keep separate from the replicated data to conserve bandwidth. Just be sure to detach that disk so everything runs smoothly.
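To find the devices to detach before SRM complains about them, compare each VM device’s backing datastore with the datastores you’re actually replicating. `VmDevice` and `devices_to_detach` are hypothetical names, and this assumes you already have the device-to-datastore mapping (for example, exported from vCenter).

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class VmDevice:
    label: str      # e.g. "Hard disk 2" or "CD/DVD drive 1"
    datastore: str  # datastore backing the disk file or ISO image

def devices_to_detach(devices: Iterable[VmDevice],
                      replicated_datastores: Iterable[str]) -> List[str]:
    """Labels of devices whose backing datastore is not replicated."""
    replicated = set(replicated_datastores)
    return [d.label for d in devices if d.datastore not in replicated]
```

Anything this returns (a dedicated paging disk, an ISO mounted from a local library) is a candidate for the “Detach” button.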