Recently I got myself into a position where I lost a VMFS partition containing some vital lab VMs. This happened during a quick migration, where the words “what’s the worst that can happen?” were used.
The worst turned out to be data loss, and this is what happened.
Please note: this is another example of how, if you do daft things, you shouldn’t be surprised when they come back to bite you. If you do this in a lab, shame on you. If you do it in production, shame on your boss for not firing you…
Despite a successful trial project using EMC ScaleIO, I decided to migrate my data back to VSAN. Whilst ScaleIO worked admirably (certainly for the cost), it couldn’t match VSAN in terms of performance and manageability.
As I just needed to rebuild the software-defined storage and not each host, I powered off and migrated most of my lab VMs to a large SATA drive (not clever). Each host had a 120GB boot SSD installed, so I created a VMFS partition on each using the remaining free space (approx. 112GB) and moved a handful of essential VMs (domain controller, virtual firewall, vCenter) to each one. As this was only a temporary move, what’s the worst that could happen?
With the ScaleIO datastores empty I decommissioned the installation, put each host into maintenance mode and shut it down.
Here be dragons…
After a reboot, the first host would not mount the VMFS partition. No amount of vmkfstools -V would mount the datastore, although partedUtil confirmed that the partition table was fine. It even listed the VMFS partition; the host just refused to mount it.
I checked the VMFS partition using VOMA, specifically:
voma -m vmfs -f check -d /vmfs/devices/disks/naa.600507630080819ba000000000000000:9
Unfortunately all I got back was a list of stale locks. I tried breaking the locks (which worked), but that still left me with a hosed system. Finally I called VMware support, and after a few hours the partition was pronounced dead. Time of death… when I shut the host down.
I’d lost entire partition tables before, so I couldn’t understand how the partition table could be intact while the VMFS partition itself was corrupt… and how that meant there was nothing I could do.
So I got to work.
Recovering the data
Boot into your favourite Linux distro. For quick and easy stuff like this I PXE-boot Ubuntu direct from my LAN, but I use RHEL/CentOS for most lab work. For this demo I’ll assume you’re using the former. The following assumes:
- You’re using Ubuntu
- It’s an x86_64 version
- You have root privileges
Using a terminal, download the vmfs-tools package from a mirror (substitute _i386.deb on 32-bit systems):
If you use apt-get to install a repo version, it’ll be an outdated one that is only good for reading VMFS 3 partitions. And that doesn’t happen often these days!
Install the package using (substitute accordingly):
dpkg -i vmfs-tools_0.2.5-1_amd64.deb
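Given the warning about outdated repo versions, it may be worth checking what actually got installed. The following is a minimal sketch, not part of the original recovery: `version_at_least` is a made-up helper, it relies on GNU sort's `-V` option, and the 0.2.5 threshold is simply taken from the package filename above.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    # The lower of the two versions sorts first; if that is $2 (or they
    # are equal), then $1 is new enough.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use on a Debian/Ubuntu system (not run automatically):
#   installed=$(dpkg-query -W -f='${Version}' vmfs-tools)
#   version_at_least "$installed" 0.2.5 || echo "vmfs-tools looks too old"
```

If the check fails, the chances are apt-get pulled in the older repo build rather than the downloaded package.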
Plug your disk into the computer. Here I am using a device which converts SATA & IDE to USB:
Keep an eye on the syslog and note the device name the disk is given (/dev/sdc in my case):
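If the log scrolls past too quickly, the device name can also be pulled out of the kernel messages after the fact. This is a rough sketch, assuming the kernel logs its usual `[sdX] Attached SCSI disk` line for the drive; `last_attached_disk` is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and print the device
# node of the most recently attached SCSI/USB disk (e.g. /dev/sdc).
# Assumes log lines of the form "... [sdc] Attached SCSI disk".
last_attached_disk() {
    sed -n 's|.*\[\(sd[a-z][a-z]*\)] Attached SCSI.*disk.*|/dev/\1|p' | tail -n 1
}

# Example use (not run automatically):
#   dmesg | last_attached_disk
```

It prints nothing if no matching line is found, in which case it is back to reading the log by eye.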
View the partition table of the disk using:
fdisk -l /dev/sdc
In the following example, we see two “unknown” partitions, however we know the one we want is approximately 112GB and therefore /dev/sdc3:
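Picking the partition by size can also be done without squinting at fdisk output. Here is a minimal sketch, assuming `lsblk` from util-linux is available; `partitions_near_size` is a hypothetical helper, and the tolerance is a guess you would adjust to suit your disk.

```shell
# Hypothetical helper: read "NAME SIZE_IN_BYTES TYPE" lines on stdin (the
# format produced by `lsblk -b -n -l -o NAME,SIZE,TYPE`) and print the
# partitions whose size is within $2 GB of $1 GB (decimal gigabytes).
partitions_near_size() {
    awk -v target="$1" -v tol="$2" '
        NF >= 3 && $3 == "part" {
            gb = $2 / 1e9
            if (gb >= target - tol && gb <= target + tol) print "/dev/" $1
        }'
}

# Example use (not run automatically): partitions of roughly 112GB on sdc
#   lsblk -b -n -l -o NAME,SIZE,TYPE /dev/sdc | partitions_near_size 112 5
```

Filtering on the `part` type keeps the whole-disk entry out of the results, which matters when the partition takes up most of the drive.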
Create a mount point for the files we wish to recover:
mkdir -p /mnt/vmfs
Mount the damaged partition:
vmfs-fuse /dev/sdc3 /mnt/vmfs
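The two steps above can be wrapped with a couple of sanity checks, which is handy if you are repeating this across several disks. This is only a sketch: `mount_vmfs` is a made-up wrapper, and the only vmfs-tools call it makes is the same `vmfs-fuse DEVICE MOUNTPOINT` invocation shown above.

```shell
# Hypothetical wrapper: check the device and the tool before mounting.
mount_vmfs() {
    dev=$1
    mnt=$2
    # Refuse anything that is not a block device (typo protection).
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
    # Make sure vmfs-tools is actually installed.
    command -v vmfs-fuse > /dev/null 2>&1 || { echo "vmfs-fuse not found" >&2; return 1; }
    mkdir -p "$mnt" || return 1
    vmfs-fuse "$dev" "$mnt"
}

# Example use (not run automatically):
#   mount_vmfs /dev/sdc3 /mnt/vmfs
#   ...copy your files...
#   umount /mnt/vmfs
```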
You can now list the contents of the partition:
I was extremely pleased to get that data back, despite being told categorically by VMware Support that it was lost.
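For anyone following along, getting the data back means copying it off the mounted partition to somewhere safe. Below is a minimal sketch of how that might look; `recover_vm_dir` is a hypothetical helper, and the paths in the example are placeholders rather than the ones from my recovery.

```shell
# Hypothetical helper: copy one VM folder from the mounted VMFS partition
# to a recovery location, then compare the two trees.
recover_vm_dir() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -a preserves timestamps and permissions; the trailing /. copies the
    # directory contents, including dotfiles.
    cp -a "$src/." "$dest/" || return 1
    # diff -r exits non-zero if any file differs or is missing.
    diff -r "$src" "$dest" > /dev/null
}

# Example use (not run automatically; both paths are placeholders):
#   recover_vm_dir /mnt/vmfs/my-vm /path/to/recovery/my-vm
```

The comparison pass reads every file a second time, so expect it to take a while on large virtual disks, but on a disk that has already misbehaved once it seems a fair trade.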
Now that I have invested in a Synology DS1515+, I doubt I’ll be making the same mistake again:
By the way, if you’re in the vExpert Slack, don’t even start…. 😉