Building an advanced lab using VMware vRealize Automation – Part 1: Intro

A few months back my boss came to me and asked if I could build a test lab for the company.  As a managed services company with a number of technical staff, it was important to give them a training ground where skills could be learned and mistakes made without consequence.

We have a lot of kit we currently use for testing changes before putting them live, but no segregated area dedicated to this purpose.

My VCAP5-DCA study resources

VCAP-DCA

This is a list of resources I used to pass the VCAP5-DCA (VDCA550) exam.

The first place to start is the exam blueprint.  This really is the go-to resource for everything you can expect to be tested on.

Exam blueprint (v3.3)
http://mylearn.vmware.com/lcms/web/portals/certification/VCAP_Blueprints/VCAP5-DCA-VDCA550-Exam-Blueprint-v3_3.pdf

VMware VCAP5-DCA Official Certification Guide
http://www.amazon.co.uk/books/dp/0789753235

Chris Wahl’s VCAP5-DCA Study Sheet
https://drive.google.com/file/d/0B8JynlqprJW2c1BXSURNUDFsZXM/view?pli=1

Nick Marshall’s vBrownBag VCAP5-DCA Series
https://www.youtube.com/watch?v=c0TJ1rQudTo&list=PLgKUP8MebCghDhhtd1hl3h2cqYqL_G2Rw

VMworld 2014 Breakout Sessions
http://www.vmworld.com/community/sessions/2014/
I especially found the following helpful:
STO2197 – Storage DRS: Deep Dive and Best Practices
NET2745 – vSphere Distributed Switch – Technical Deep Dive
INF2427 – DRS – Advanced Concepts, Best Practices and Future Directions
BCO2701.1 – vSphere HA Best Practices and FT Technology Preview
INF2311.1 – vCenter Server Architecture and Deployment Deep Dive

VMware’s Hands-on-Labs
http://labs.hol.vmware.com/

VMware vSphere 5.1 Clustering Deepdive (Epping/Denneman)
http://www.amazon.co.uk/VMware-vSphere-5-1-Clustering-Deepdive-ebook/dp/B0092PX72C

Josh Andrews’ VCAP5-DCA Practice Environment
http://sostechblog.com/2015/03/05/vcap5-dca-practice-environment-test-track-v-550-lab-on-a-laptop-ii/

Together with the above I used a number of the VMware PDFs, namely:

vsphere-esxi-vcenter-server-55-command-line-interface-concepts-examples-guide.pdf
vsphere-esxi-vcenter-server-551-storage-guide.pdf
vsphere-esxi-vcenter-server-55-availability-guide.pdf
vsp_powercli_55_usg.pdf

Oh, and maybe a few of these too… 🙂

Red Bull

Reset root lockout on VMware vCenter Operations Manager 5.8

A few days ago I had the pleasure of fixing a severely hosed vCenter Operations Manager box.  After resolving all of the issues, the last thing to do was to install the latest update.

The intention was not only to bring the version from 5.8.2 up to 5.8.4, but also to update the underlying SUSE install to SP3.  However, to do this you need root access, and somewhere along the way I’d fat-fingered the password one too many times and locked the root account out.

Whilst I was confident it would unlock after a period of time, no matter how long I left it (admittedly not twenty-four hours) it still refused to unlock.

Being an RHEL fanboi I’m not familiar with SUSE, but luckily I still had console access by way of the Admin account.
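Before resetting anything, pam_tally2 can show whether the account really is locked out (an optional sanity check, run with sudo from the Admin session):

sudo pam_tally2 --user=root

A failure count at or above the configured deny threshold confirms the lockout.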

The unlock was actually trivially simple.  Logged in as Admin, use:

sudo pam_tally2 --user=root --reset

And that was it – I was back in. After that I copied the .pak upgrade file over using SSH and initiated the OS upgrade:

/usr/lib/vmware-vcops/user/conf/upgrade/va_sles11_spx_init.sh /data/VMware-vcops-SP2-1381807.pak

This did the following:

  1. Stopped the vCenter Operations Manager services
  2. Copied the .pak file to the Analytics VM
  3. Updated the Analytics VM
  4. Updated the UI VM

After a while both UI and Analytics VMs came back and the upgrade was complete.

Wednesday Tidbit: PowerCLI script to enable copy & paste

Unlike at work, when I access servers in the lab I use the vSphere Client remote console.  Unfortunately, since vSphere 4.1 copy & paste between the guest and the remote console has been disabled – which I find a pain (even though it’s probably more secure).  It can be enabled per-VM with the following two settings:

isolation.tools.copy.disable="false"
isolation.tools.paste.disable="false"

I wanted to enable it on all my backend VMs, but not my other VMs (DMZ etc).  For this I used the following PowerCLI script:

$esxi = "lab01.mdb-lab.com"
$credential = Get-Credential
$tgtVLAN = 'VLAN70','VLAN80','VLAN120'

Connect-VIServer $esxi -Credential $credential

Get-VM | Where-Object {
   (Get-NetworkAdapter -VM $_ | ForEach-Object { $tgtVLAN -contains $_.NetworkName }) -contains $true
} | ForEach-Object {
   New-AdvancedSetting $_ -Name isolation.tools.copy.disable -Value false -Confirm:$false -Force:$true
   New-AdvancedSetting $_ -Name isolation.tools.paste.disable -Value false -Confirm:$false -Force:$true
}

Disconnect-VIServer $esxi -Confirm:$false
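To verify the change afterwards, the settings can be read back while connected to the host – for example with Get-AdvancedSetting (the wildcard filter should work, but you can always pass the two names explicitly):

Get-VM | Get-AdvancedSetting -Name "isolation.tools.*.disable" | Select-Object Entity, Name, Value

A Value of false against both settings means copy & paste is enabled for that VM.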

I’d like to thank Luc Dekens for helping me with that last bit. If you get a chance, check him out at http://www.lucd.info/.

VCAP5-DCA… passed!

Today I received the news… I passed the VCAP5-DCA exam.  It was tough, but it’s certainly achievable.

Now that it’s behind me, and before I begin the journey to VCAP5-DCD, I thought I would take a few moments to document my journey.

As you’ve no doubt read elsewhere on the web, time is not your ally.  As the blueprint will tell you (section 1.2), the exam “consists of approximately 23 live lab activities” … meaning it is task-based.  You are expected to perform these tasks at a fast pace… so if you don’t know the answer to something you won’t have much time to stumble through the PDFs.

The blueprint is the first and last word on what you can expect from the exam – if it’s in there then it’s fair game.  So whilst I felt I had most of this covered, there were grey areas I needed to brush up on like vCO.

After I read the blueprint, I chose a multitude of resources to help me study.  These can be found here.

On the day of the exam I wasn’t apprehensive.  I decided to treat it more like an afternoon at work, and to try and enjoy it (after all, I do enjoy what I do for a living).  I arrived at the testing centre thirty minutes early, and after all the Pearson formalities were dispensed with I got to work.

My game plan was to look at each question for no more than a couple of seconds, and if it looked like I could complete the task relatively quickly then I would attempt to.  If not, I would move forward to the next and come back to it.

The plan worked.  I got through the exam with all but one of the questions answered… and no amount of time was going to solve that one.

As VMware made me (and everyone else) sign an NDA before sitting the exam, I obviously can’t reveal anything about it.  However, I can say that the following tips helped me enormously:

esxcli syntax

Unless you’re Rainman, there’s not much point in trying to remember esxcli’s myriad of commands.  Use the command list function, and grep for a specific target. For example, syslog:

esxcli esxcli command list | grep syslog

Will give you the following:

esxcli1

This tells you that to set a parameter you need to start with:

esxcli system syslog config set

That example was taken from vCLI, but the vMA operates the same way.  Note that in local mode (when esxcli runs on the host itself) you don’t have to specify --server or credentials.

Session file

Use a session file with esxcli… this will save lots of time:

esxcli --server esxi1.uk.mdb-lab.com --username=root --savesessionfile esxi1 system

Then the next time you connect, use -f to specify the session file:

esxcli --server esxi1.uk.mdb-lab.com -f esxi1 system version get

Will give you:

esxcli2

Advanced Search

Finally, as the DCA is an open-book exam, be familiar with the Advanced Search function.

Open one of the PDFs, then navigate to Edit → Advanced Search.  Select the All PDF Documents in radio button and navigate to the folder containing the VMware documents.

search1

Enter your criteria and click Search:

search2

The PDF reader you have access to on the exam may or may not be Adobe Reader. However, it still has the ability to search through multiple PDFs using the technique above.

Practice practice practice!

All the reading in the world will not help if you can’t execute all the tasks in a real environment very quickly.  Every day I put in at least two hours in the lab, using a combination of the GUI, CLI and PowerCLI.

As I said, the exam is tough but if you know your stuff it is achievable.

Now onto the VCAP-DCD!

Load-balancing Microsoft Exchange with nginx+ – Part 5: Tidying up

In part 4 of this series I configured Microsoft Exchange to work with nginx+.

In this final part of the series I tidy up the loose ends so the solution can be put live.

Other articles in the series:

  1. Installing and configuring keepalived
  2. Installing nginx+
  3. Configuring nginx+ for Microsoft Exchange
  4. Configuring Microsoft Exchange
  5. Tidying up

The first thing to do is synchronise the nginx+ configs between the two VMs.  To do this we will use rsync over SSH.

Create a new user on both VMs to run the rsync copy.  Insert your own password as desired:

useradd -s /bin/bash -p $(echo mysecretpassword | openssl passwd -1 -stdin) sa_copyconf

On HA1, login as the user and create the SSH keys:

mkdir .ssh
chmod 700 .ssh
cd .ssh
ssh-keygen -t rsa -N '' -b 2048

Accept the default file name for the private key. Add the public key to the list of authorized keys:

cat id_rsa.pub > authorized_keys2
chmod 644 authorized_keys2

Copy the public key over to HA2:

cat id_rsa.pub | ssh ha2.mail.mdb-lab.com "mkdir .ssh && chmod 700 .ssh && cat > .ssh/authorized_keys2"

On HA2, log in as sa_copyconf and set the permissions on /home/sa_copyconf/.ssh/authorized_keys2:

chmod 644 /home/sa_copyconf/.ssh/authorized_keys2

Also on HA2, copy across the id_rsa file from HA1 and place it in .ssh:

sftp ha1.mail.mdb-lab.com:.ssh/id_rsa .ssh/id_rsa

On each VM, grant sa_copyconf access to /etc/nginx/:

setfacl -m u:sa_copyconf:rwx /etc/nginx/

Next, install rsync (if it isn’t already):

yum install rsync -y --nogpgcheck

Create the following script on each host (replace the hostname as needed – on HA1, it should reference HA2 and vice-versa):

cat <<EOF> /home/sa_copyconf/copyconf.sh
#!/bin/bash
rsync -avuz -e ssh ha2.mail.mdb-lab.com:/etc/nginx/nginx.conf /etc/nginx
EOF

Make the script executable:

chmod +x /home/sa_copyconf/copyconf.sh
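Before scheduling it, it's worth running the script once by hand as sa_copyconf to confirm the key-based SSH login and the ACL both work (a quick sanity check):

su - sa_copyconf -c /home/sa_copyconf/copyconf.sh

The first run will also prompt you to accept the remote host key, which is better dealt with now than from cron.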

Add a cron job to run the script every five minutes:

crontab -l | { cat; echo "*/5 * * * * /home/sa_copyconf/copyconf.sh"; } | crontab -

To test, delete the config on HA2:

rm -f /etc/nginx/nginx.conf

Wait ten minutes and the config should now reappear on HA2. To check this:

diff /etc/nginx/nginx.conf <(ssh ha1.mail.mdb-lab.com 'cat /etc/nginx/nginx.conf')

Next, restrict VRRP (the protocol keepalived uses) to the IPs of the two hosts. On HA1:

iptables -D INPUT -p 112 -j ACCEPT
iptables -I INPUT -p 112 -s 172.17.80.12 -j ACCEPT
service iptables save

On HA2:

iptables -D INPUT -p 112 -j ACCEPT
iptables -I INPUT -p 112 -s 172.17.80.11 -j ACCEPT
service iptables save

Test this by pausing the VM currently owning the cluster addresses and verifying they have transferred.
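You can also watch the VRRP advertisements on the wire to see which node is currently advertising as master (assuming tcpdump is installed and eth0 is the interface carrying the cluster addresses):

tcpdump -n -i eth0 ip proto 112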

Finally, SELinux needs to be modified so nginx can run.  To demonstrate this, enable SELinux:

setenforce 1

Then restart the nginx service:

service nginx restart

You will get the following error:

nginx: [emerg] bind() to 172.17.80.13:135 failed (13: Permission denied)
nginx: configuration file /etc/nginx/nginx.conf test failed

This is because with SELinux enabled nginx is unable to bind to tcp/25, tcp/135 and tcp/139. To work around this:

grep nginx /var/log/audit/audit.log | audit2allow -m nginx > nginx.te
grep nginx /var/log/audit/audit.log | audit2allow -M nginx
semodule -i nginx.pp

To test, restart the service again:

service nginx restart

nginx should now start without issue.  To make the change permanent, set SELinux back to enforcing in /etc/selinux/config by running the following as root on each VM:

sudo sed -i "/SELINUX=permissive/c\SELINUX=enforcing" /etc/selinux/config
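As a final check, confirm the policy module is loaded and the box is back in enforcing mode:

semodule -l | grep nginx
getenforce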

I would like to thank the technical guys at Nginx for help with the SELinux component.  More information regarding this can be found on their blog at http://nginx.com/blog/nginx-se-linux-changes-upgrading-rhel-6-6/.

Nick Shadrin at Nginx has also put together a comprehensive Exchange configuration guide on their site.  I highly recommend checking it out – http://nginx.com/blog/load-balancing-microsoft-exchange-nginx-plus-r6/.

Now that mainstream support for Microsoft Forefront Threat Management Gateway 2010 has ended (extended support is available until 14 April 2020), there is an opportunity to leverage technologies such as nginx+ to load-balance and publish Microsoft Exchange 2013 externally when the time comes.  If I do, I’ll be sure to document it!

In this article we have provided a method of syncing the configs, tightened security and re-enabled SELinux.

That completes the series on how to configure nginx+ to load-balance Microsoft Exchange.

Wednesday Tidbit: Join vMA to AD and restrict access

As part of studying for the VCAP5-DCA (VDCA550), I’ve started relying more on the CLI and a lot less on the GUI.  IMHO, the best tool for the job is the vSphere Management Assistant (vMA).

For ease of use, I decided to add it to my domain and then lock it down so only certain users could log on.

First, log on to the vMA and add it to the domain:

sudo domainjoin-cli join nl.mdb-lab.com sa_domainjoin@nl.mdb-lab.com

This will prompt you for the vMA super-user password you set during installation, followed by the password for the account you’re using to add the vMA to the domain. The vMA will then require a reboot.

Once restarted, edit /etc/likewise/lsassd.conf and add the AD groups you wish to have access to the vMA:

sudo sed -i "/require-membership-of/c\require-membership-of = NL\\\vMA Access Users" /etc/likewise/lsassd.conf

In this case, I created an AD group called vMA Access Users and used that.
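If you want to confirm the join state at any point, domainjoin-cli can report it back:

sudo domainjoin-cli query

This prints the hostname and the domain the vMA is currently joined to.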

Load-balancing Microsoft Exchange with nginx+ – Part 4: Configuring Microsoft Exchange

In part 3 of this series I configured nginx+ to support Microsoft Exchange.

In this part, I configure Microsoft Exchange 2010/13.

Other articles in the series:

  1. Installing and configuring keepalived
  2. Installing nginx+
  3. Configuring nginx+ for Microsoft Exchange
  4. Configuring Microsoft Exchange
  5. Tidying up

The Exchange environment consists of the following:

  1. 4 sites (2 in Amsterdam, 1 in London, 1 DR (Southport, UK))
  2. 2 Windows 2008 R2 domain controllers (core) (1 in Amsterdam, 1 in London)
  3. 11 Exchange 2010 SP3 RU9 servers
  4. 3 client access servers (2 in Amsterdam, 1 in London)
  5. 3 hub transport servers (2 in Amsterdam, 1 in London)
  6. 5 mailbox servers (3 in Amsterdam, 2 in London)
  7. 2 Forefront Threat Management Gateway 2010 servers (1 in Amsterdam, 1 in London)
  8. 1 Windows 2008 R2 landing pad (for administration)

Background information

The Exchange solution I have designed is based on the concept of production and resource domains.  All user accounts are hosted in the production domains (nl.mdb-lab.com and uk.mdb-lab.com), and all Exchange-related objects reside in the resource domain (mail.mdb-lab.com).  A trust exists between the two forests, and accounts are linked to mailboxes.

Whilst there are many advantages to this design, it does add extra complexity and there are simpler ways to bring Exchange to the organisation.

The first disadvantage is in the choice of name I made for the resource domain.  Ideally I wanted to use a consistent name across the estate for all services – mail.mdb-lab.com.  Unfortunately the DNS stub zone created to support the forest trust won’t allow this – any request for mail.mdb-lab.com will also return the IP addresses of the two domain controllers in the resource domain.  The only way around this is to configure internal hosts to use outlook.mail.mdb-lab.com and use mail.mdb-lab.com for external clients.  In hindsight I wish I had named the domain exchange2010.mdb-lab.com.

Initially the aim is to load-balance Exchange front-end traffic for users in Amsterdam, covering both Outlook Web App and the Outlook client.  Exchange ActiveSync will also benefit from this additional layer of redundancy, as will external users once TMG is used to publish it.

First, create an A record in DNS to point to the load-balanced address:

dnscmd dc1.mail.mdb-lab.com /RecordAdd mail.mdb-lab.com outlook A 172.17.80.13

For inbound SMTP from the internet, mail will come from the Exchange 2010 Edge server in the DMZ. However if you want to take advantage of the load-balanced address for sending email internally then another DNS entry is preferred:

dnscmd dc1.mail.mdb-lab.com /RecordAdd mail.mdb-lab.com smtp A 172.17.80.13
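It's worth checking both records resolve against the DC before moving on, for example:

nslookup outlook.mail.mdb-lab.com dc1.mail.mdb-lab.com
nslookup smtp.mail.mdb-lab.com dc1.mail.mdb-lab.com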

Using the Exchange Management Shell, create a new client access array on your Exchange server:

New-ClientAccessArray -Name "outlook.mail.mdb-lab.com" -fqdn "outlook.mail.mdb-lab.com" -site Amsterdam

Configure the RpcClientAccessServer attribute on the mailbox database:

Set-MailboxDatabase DB1 -RpcClientAccessServer "outlook.mail.mdb-lab.com"
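If there are several databases in the site, the same attribute can be stamped across all of them in one pass (a sketch – adjust the scope so you only touch the databases served by this array):

Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer "outlook.mail.mdb-lab.com"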

You can check this by using:

Get-MailboxDatabase | select name,rpcclientaccessserver | ft -auto

If done correctly that should show:

RpcClientAccess

When the Outlook client communicates with the Client Access Servers it does so by first connecting to the RPC Endpoint Mapper on tcp/135.  After that, it chooses a port from the dynamic RPC port range (6005-59530).  For load balancing to work, we need to restrict this to as few ports as possible.

We do this by setting the ports in the registry for the Exchange RPC and Address Book services.

Create the following registry keys on each CAS in the site using:

reg add HKLM\SYSTEM\CurrentControlSet\services\MSExchangeAB\Parameters /v RpcTcpPort /t REG_SZ /d 60001
reg add HKLM\SYSTEM\CurrentControlSet\services\MSExchangeRPC\ParametersSystem /v "TCP/IP Port" /t REG_DWORD /d 60000

Reboot each CAS and verify the ports are in place using Netstat:

netstat -an -p tcp | find "60000"
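That only checks the RPC Client Access port; to see the Address Book port at the same time, findstr accepts multiple search strings:

netstat -an -p tcp | findstr "60000 60001"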

Finally, configure Outlook and connect to Exchange.  The connection status box should show a connection to the RPC port configured previously:

Outlook connectivity

That’s it for the Exchange configuration.  In part 5 I tidy up a few things before the solution can be put live.

Load-balancing Microsoft Exchange with nginx+ – Part 3: Configuring nginx+ for Microsoft Exchange

In part 2 of this series I installed nginx+ on both HA1 and HA2.

In this part, I configure nginx+ to support Microsoft Exchange 2010/13.

Other articles in the series:

  1. Installing and configuring keepalived
  2. Installing nginx+
  3. Configuring nginx+ for Microsoft Exchange
  4. Configuring Microsoft Exchange
  5. Tidying up

First, find your Exchange front-end SSL certificate and its serial number:

certutil -store my

Export the certificate (along with the private key) so it can be imported onto the nginx+ VMs:

certutil -exportpfx -p "password" -privatekey serialnumber mail.mdb-lab.com.pfx

Copy the PFX file to HA1 and HA2. Check the file came across okay:

openssl pkcs12 -info -in mail.mdb-lab.com.pfx

Extract the certificate, private key and CA chain from the PFX (you will be asked for the password you specified in the preceding step):

openssl pkcs12 -in mail.mdb-lab.com.pfx -nocerts -nodes -out mail.mdb-lab.com.key.enc
openssl pkcs12 -in mail.mdb-lab.com.pfx -clcerts -nokeys -out mail.mdb-lab.com.cer
openssl pkcs12 -in mail.mdb-lab.com.pfx -out cacerts.crt -nodes -nokeys -cacerts

The first command extracts the private key, the second the certificate, and the third the CA certificate(s).  Next, make the private key ready for nginx+:

openssl rsa -in mail.mdb-lab.com.key.enc -out mail.mdb-lab.com.key

Check the private key is correct:

openssl rsa -in mail.mdb-lab.com.key -check
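You can also confirm the key and certificate belong together by comparing their moduli; the two hashes should match:

openssl x509 -noout -modulus -in mail.mdb-lab.com.cer | openssl md5
openssl rsa -noout -modulus -in mail.mdb-lab.com.key | openssl md5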

Clean up the intermediate files and move the certificate, private key and CA certificates to /etc/nginx/ssl/:

rm -f mail.mdb-lab.com.key.enc
rm -f mail.mdb-lab.com.pfx
mkdir -p /etc/nginx/ssl
mv -f mail.mdb-lab.com.* /etc/nginx/ssl/

Edit /etc/nginx/nginx.conf and make sure the following global settings are in place:

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log info;
pid /var/run/nginx.pid;
events {
     worker_connections 1024;
}

Add the following lines to the http block in /etc/nginx/nginx.conf, replacing values for your CAS servers where necessary:

http {
     log_format main '$remote_addr - $remote_user [$time_local] '
          '"$request" $status $body_bytes_sent '
          '"$http_user_agent" "$upstream_addr"';
     #set the log
     access_log /var/log/nginx/access.log main;
     keepalive_timeout 3h;
     proxy_read_timeout 3h;
     tcp_nodelay on;

     upstream exchange {
          zone exchange-general 64k;
          server 172.17.80.21:443; # Replace with the IP address of your CAS
          server 172.17.80.22:443; # Replace with the IP address of your CAS
          sticky learn create=$remote_addr lookup=$remote_addr
                    zone=client_sessions:10m timeout=3h;
     }

     server {
          # redirect to HTTPS
          listen 80;
          location / {
               return 301 https://$host$request_uri;
               }
     }

     server {
          listen 443 ssl;
          client_max_body_size 2G;
          ssl_certificate /etc/nginx/ssl/mail.mdb-lab.com.cer;
          ssl_certificate_key /etc/nginx/ssl/mail.mdb-lab.com.key;
          ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
          status_zone exchange-combined;

          # redirect from main page to /owa/
          location = / {
               return 301 "/owa/";
          }

          location = /favicon.ico {
               empty_gif;
               access_log off;
          }

          location / {
               proxy_pass https://exchange;
               proxy_buffering off;
               proxy_http_version 1.1;
               proxy_request_buffering off;
               proxy_set_header Connection "Keep-Alive";
          }
     }
}

Add the stream block to /etc/nginx/nginx.conf also:

stream {

     upstream exchange-smtp {
          zone exchange-smtp 64k;
          server 172.17.80.31:25; # Replace with the IP address of your Hub Transport
          server 172.17.80.32:25; # Replace with the IP address of your Hub Transport
     }

     upstream exchange-smtp-ssl {
          zone exchange-smtp-ssl 64k;
          server 172.17.80.31:465; # Replace with the IP address of your Hub Transport
          server 172.17.80.32:465; # Replace with the IP address of your Hub Transport
     }

     upstream exchange-smtp-submission {
          zone exchange-smtp-submission 64k;
          server 172.17.80.31:587; # Replace with the IP address of your Hub Transport
          server 172.17.80.32:587; # Replace with the IP address of your Hub Transport
     }

     upstream exchange-imaps {
          zone exchange-imaps 64k;
          server 172.17.80.21:993; # Replace with the IP address of your CAS
          server 172.17.80.22:993; # Replace with the IP address of your CAS
     }

     server {
          listen 25; #SMTP
          status_zone exchange-smtp;
          proxy_pass exchange-smtp;
     }

     server {
          listen 465; #SMTP SSL
          status_zone exchange-smtp-ssl;
          proxy_pass exchange-smtp-ssl;
     }

     server {
          listen 587; #SMTP submission
          status_zone exchange-smtp-submission;
          proxy_pass exchange-smtp-submission;
     }
}
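Note that the exchange-imaps upstream above is defined but not yet referenced.  If you also want nginx+ to carry IMAPS, a matching server entry inside the same stream block would look like this (a sketch following the same pattern as the SMTP listeners):

     server {
          listen 993; #IMAPS
          status_zone exchange-imaps;
          proxy_pass exchange-imaps;
     }

Remember to open tcp/993 in iptables as well if you do.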

Test the configuration before putting it live:

nginx -t

If everything is correct, it will yield the following:

Config okay

Modify iptables to allow traffic through the host firewall:

for i in {25,80,135,139,443,465,587,60000,60001}; do iptables -I INPUT -p tcp --dport $i -m state --state NEW,ESTABLISHED -j ACCEPT; done

Save the new iptables rulebase:

service iptables save

To get nginx+ running we need to temporarily switch SELinux to permissive mode:

setenforce 0

Edit /etc/selinux/config:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Change SELINUX=enforcing to SELINUX=permissive

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Start the service:

service nginx start

Make sure the config is the same on both HA1 and HA2. In part 5 I’ll configure rsync to ensure the configs are kept in sync.

That’s it for configuring nginx+.  In part 4 I’ll configure Exchange to support our nginx+ configuration.

Man down… when you need to DR your DR

Last week I received the following error on my DR ESXi host:

Configuration-Issue-Lost-connection-to-the-device

The host in question is a Dell PowerEdge R710, and this indicated a hardware failure of some sort.

mpx.vmhba32:C0:T0:L0 is an 8GB SanDisk SD card containing the ESXi 5.5 boot partitions.  With this gone the host would still live, but any changes wouldn’t be saved.  It was also unlikely that the host would boot up again.

Sure enough this turned out to be the case.  Three times out of four the server would hang at the BIOS.  To remedy this I disconnected the SD card reader and ordered a replacement.

Unfortunately this turned out to be a red herring, as the new SD card reader failed to be recognised, as did the internal USB stick.  After further testing, I decided it was the control panel (a daughterboard that the SD card reader, internal USB and console port plug into) that was faulty.  After replacing that, the server booted normally into ESXi.

The total downtime was about ten days, which meant when the remote vCenter and vSphere Replication servers came back up there was a lot of data which needed to be replicated.  This subsequently hammered my web connection, so much so that I couldn’t even bring up the vCenter Web Client to monitor how the replication was going.

To find out I needed to switch to the command line.

On the source host I used the following command to get the list of VMs and their associated VM IDs:

vim-cmd vmsvc/getallvms

This gives an output similar to:

VM IDs

With each VM ID number, I then used the following command to get the status of each replication:

vim-cmd hbrsvc/vmreplica.getState vmid

That told me exactly how much had replicated and how much was left to do.  With a total of thirteen VMs to replicate, that was going to take a while!

Replication state

The BusyBox shell on ESXi is quite limited, but a loop such as:

for i in `seq 38 43`; do vim-cmd hbrsvc/vmreplica.getState $i; done | grep DiskID

Gives me a basic overview of how my mail replications are going:

For loop
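If you don't want to hard-code the ID range, the same idea can be driven straight from getallvms.  This is a rough sketch that assumes the standard column layout (the first field of each VM row is the numeric ID):

for id in $(vim-cmd vmsvc/getallvms | awk 'NR>1 && $1 ~ /^[0-9]+$/ {print $1}'); do
   echo "=== VM $id ==="
   vim-cmd hbrsvc/vmreplica.getState $id 2>/dev/null | grep DiskID
done

VMs that aren't configured for replication simply return nothing.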

In the long run this hasn’t caused too much of an issue; however, it has made me think about moving my DR needs to vCloud Air.