Man down… when you need to DR your DR

Last week I received the following error on my DR ESXi host:


The host in question is a Dell PowerEdge R710, and this indicated a hardware failure of some sort.

mpx.vmhba32:C0:T0:L0 is an 8GB SanDisk SD card containing the ESXi 5.5 boot partitions.  With this gone, it meant the host would still live, but any changes wouldn’t be saved.  It was also unlikely that the host would boot up again.

Sure enough this turned out to be the case.  Three times out of four the server would hang at the BIOS.  To remedy this I disconnected the SD card reader and ordered a replacement.

Unfortunately this turned out to be a red herring, as the new SD card reader failed to be recognised, as did the internal USB stick.  After further testing, I decided it was the control panel (a daughterboard that the SD card reader, internal USB and console port plug into) that was faulty.  After replacing that, the server booted normally into ESXi.

The total downtime was about ten days, which meant when the remote vCenter and vSphere Replication servers came back up there was a lot of data which needed to be replicated.  This subsequently hammered my web connection, so much so that I couldn’t even bring up the vCenter Web Client to monitor how the replication was going.

To find out I needed to switch to the command line.

On the source host I used the following command to get the list of VMs and their associated VM IDs:

vim-cmd vmsvc/getallvms

This give an output similar to:


With each VM ID number, I then used the following command to get status of each replication:

vim-cmd hbrsvc/vmreplica.getState vmid

That told me exactly how much had replicated and how much was left to do.  With a total of thirteen VMs  to replicate, that was going to take a while!

Replication state

The BusyBox shell on ESXi is quite limited, but a loop such as:

for i in `seq 38 43`; do vim-cmd hbrsvc/vmreplica.getState $i; done | grep DiskID

Gives me a basic overview on how my mail replications are going:

For loop

In the long-run this hasn’t caused too much of an issue, however it has made me think about moving my DR needs to vCloud Air

Compress thin-provisioned VMDKs even further

Here in the lab I’m running out of resources to run all my VMs concurrently on the one host.  Rather than add another physical host at this site (or switch off some VMs), and with long-distance vMotion now available in vSphere 6.0, I decided to place another at my secondary site in the UK.

Before the move to vSphere 6.0 (I’m currently studying for my VCAP so want to stay on 5.5 for the time being), I decided it would be best to migrate my 14-host Exchange 2010 environment over the secondary site, and setup SRM in the process.  However before I could do that I needed to setup vSphere Replication to get the actual VMs over.

With the VPN to the secondary site not being the quickest, it was important to compress my already thin-provisioned VMs down as much as possible.

It’s an old one, but always worth doing in this type of scenario.

On each VM, I downloaded Mark Russinovich’s SDelete and ran it on each of the Windows drives:

sdelete -z

This balloons each VMDK up to its maximum size, but don’t worry about that for now.  When it finished, I powered down each VM and switched to the ESXi host console (using SSH) and ran:

vmkfstools -K /vmfs/volumes/path-to-VM/VM.vmdk

The important thing here to remember is to target the VMs normal VMDK, not the one ending in “-flat.vmdk”.

After a while it finished and freed up a lot of space, ready to be replicated using vSphere Replication 5.8!


Each VM still took two days though 😦