Upping my homelab game

Recently I’ve been toying with the idea of a homelab refresh. My current Dell and Cisco hardware has performed well over the years, and is still very powerful. However, new technologies such as All-Flash VSAN and Azure Stack require enterprise-level hardware: 10GbE networking and vast amounts of memory.

It was time for a change.

Current lab

At present I use two Dell PowerEdge T710s stuffed to the gills with RAM and gigabit network cards (ten of them!) across two sites. I also use vSphere Replication together with SRM to provide multi-site redundancy.

Three of the NICs pipe into Cisco 2600 routers to emulate a WAN environment. I also use two Cisco 2950 switches and an ASA 5510 to keep my networking skills fresh.

As homelabs go this is still pretty impressive; however, it is not without its problems. The PowerEdges are very power-hungry… every time they start up, the lights dim. Each has only one PSU fitted, but 1100W is a lot for any server.

[Image: Watch… ended]

Each server has 144GB of RAM, but I still found this limiting when running full Exchange, View and vRealize Automation solutions concurrently. And that is before counting the supporting management infrastructure such as domain controllers, RSA Authentication Manager and so on.

Networking was also a constraint. Whilst most of the VMs don’t require huge amounts of bandwidth (and I don’t need to vMotion eight machines at once), certain solutions I wished to evaluate do. VSAN and Atlantis USX are prime examples.

Finally, noise was a factor. Enterprise-level hardware is not only expensive but noisy. Whilst the “secondary datacentre” (at my parents’ house) is located in an attic space, the primary currently sits in the dining room – not the official datacentre it was designed for. As someone said, it had a low wife acceptance factor.

The King is dead, long live the King

Please note: I cannot take credit for this design. The genesis for this refresh can be traced back to Frank Denneman who first put me on to using this hardware.

I began the search for new hardware, which had to meet the following criteria:

  • Powerful – could take 128GB of RAM
  • Future proof – 10GbE was a must
  • Low noise – it had to have a high WAF
  • Low energy requirements – I don’t want the National Grid to have to fire up Dinorwig Power Station every time I power it on

In the end I came across the SuperMicro 5018D-FN4T. These are 19-inch, 1U rackable servers consisting of the following:

  • Intel Xeon D 8-core processor
  • 8GB RAM, expandable to 128GB
  • 2 x 10GbE NICs
  • 2 x 1GbE NICs
  • 1 x PCIe 3.0
  • 1 x M.2 slot
  • Remote management
  • 200W low noise PSU
  • No storage as standard (which suited me fine)

These servers seemed ideal, and on that basis I bought three of them.

However the base model fell way short of what I needed. With a budget in mind I headed over to Amazon and added the following to each box:

As the current Cisco switches lack gigabit, never mind ten gigabit, I also decided to add a Netgear XS708E. Six of its ports would provide connectivity for the three hosts (two 10GbE NICs each), leaving two for trunking into my existing switches.

Conscious that I wanted to run software-defined storage as opposed to a traditional fibre-channel SAN, I decided to buy some decent disk controllers for the SSDs. Having previous experience with Dell, I purchased three PERC H700 cards off eBay and fitted one in each server’s PCIe slot.

To house the servers, two Cisco switches, two routers, a firewall and a PDU, I bought a 12U rack and racked it all up.

[Image: Damn she’s lovely!]

The three servers are powered directly from my APC UPS, whilst the remaining hardware is plugged into the PDU. To get the power cables into the rack we had to drill a 38mm hole in the back, which unfortunately chewed up one of my father’s drill bits in the process.

Another item for the bill of materials….

Software

Being a massive fan of VMware vSphere, I decided to take advantage of their generous software allowance for vExperts and installed vSphere 6.0 U2 on each of the 128GB SSDs.

With sequential reads up to a maximum of 2,500MB/s, and writes up to 1,500MB/s, it made sense to use the Samsung 950 Pro as the caching tier for VSAN. The remaining 1TB SSD would be the capacity disk.
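
For reference, once VSAN is enabled on the cluster, the same cache/capacity pairing can also be claimed from the ESXi shell rather than the Web Client. A minimal sketch, with placeholder device identifiers (list your own with esxcli storage core device list):

# Pair the 950 Pro (cache) with the 1TB SSD (capacity) into a disk group
# Both device IDs below are placeholders; substitute your own
esxcli vsan storage add --ssd t10.NVMe____Samsung_SSD_950_PRO_512GB --disks naa.600605b000000001
# Confirm the disk group membership
esxcli vsan storage list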

With that plan I began the installation.

Hardware issue

The first issue I encountered was that the Dell PERC H700 cards refused to boot into the configuration utility to set up the virtual drives. No amount of pressing <CTRL><R> or changing motherboard settings did the trick – they simply wouldn’t play.

I tried:

  • Upgrading/downgrading the BIOS
  • Every possible BIOS combination potentially affecting the PCIe slot
  • Flashing the PERC cards with different vendor firmware/SBR/SPD

Finally, suspecting a dodgy batch of PERC cards, I sourced another locally. That failed to work too.

[Image: Open-heart disk controller issues]

I reached out to SuperMicro support and they did their best to help. They were really great… even writing custom BIOSes for me to try. However in the end I had to admit defeat – for some reason the PERC cards did not want to play with the SuperMicro servers.

Reluctantly I purchased three more disk controllers, this time IBM M1015 cards. Whilst they lack a memory cache and battery backup, they’re still pretty solid cards and perform well.

vSphere

vSphere 6.0 U2 installed without issue, but did not include the Intel 10GbE drivers. After a quick hunt on Google I found the necessary VIB and installed it.
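
For anyone following along, installing a driver VIB from the ESXi shell looks roughly like this; the bundle filename is illustrative, so use whichever ixgbe package matches your build (copied to the host first):

# Install the driver from an offline bundle, then reboot so the 10GbE NICs appear
esxcli software vib install -d /tmp/ixgbe-offline-bundle.zip
reboot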

As each SSD is configured as a virtual drive on the disk controller, vSphere needs to be told that each disk is a flash device. In previous versions of vSphere this was time-consuming, but in 6.0 U2 it can be done painlessly in the Web Client.
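
If you prefer the shell, one way to achieve the same thing is with a SATP claim rule; a rough sketch, where the naa. identifier is a placeholder for your own virtual drive:

# Tag the virtual drive as a flash device and reclaim it
esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device naa.6d4ae520a1b2c3d4 --option "enable_ssd"
esxcli storage core claiming reclaim -d naa.6d4ae520a1b2c3d4
# Verify the device now reports "Is SSD: true"
esxcli storage core device list -d naa.6d4ae520a1b2c3d4 | grep -i ssd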

I was initially sceptical at how easy it was supposed to be, but I was pleasantly surprised to find that configuring VSAN really is just a matter of three clicks. Unfortunately for me, I’d accidentally configured one of the NVMe cards as a datastore at one point, so VSAN refused to use it. To discover why, I dropped into the shell and ran:

vdq -q

This showed the problem straight away: the device still had partitions left over from the old datastore. I removed them using partedUtil and then proceeded to configure VSAN.
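
For completeness, the sequence looks something like this (the device path and partition number are illustrative):

# An unusable disk is reported by vdq -q with a reason such as "Has partitions"
vdq -q
# Inspect, then remove, the stale partition left over from the old datastore
partedUtil getptbl /vmfs/devices/disks/naa.6d4ae520a1b2c3d4
partedUtil delete /vmfs/devices/disks/naa.6d4ae520a1b2c3d4 1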

Items outstanding

All my VMs have been migrated, but the move to vSphere 6.0 has left a couple of issues outstanding.

First order of business is to change my vCenter background with William Lam’s help. The second is to secure my PSCs and vCenter with genuine SSL certificates from my multi-tiered in-house Certificate Authority.

The result?

To quote the film, these servers “move like s**t through a goose”. They are really fast and have plenty of room for future expansion. The move to VSAN has been worthwhile, giving both more space and a hefty increase in performance.

A special thanks

I’d like to thank my Dad for his help with the project. Not only did he spend time racking it all with me, but he painstakingly measured, cut, crimped and tested each network cable to perfection. He was an absolute star and if it wasn’t for him it would have been a complete mess.

[Image: What a professional cable-job looks like]

I’d also like to thank him for continuing to allow me to store one of the Dell PowerEdge T710s in his attic.

And yes Dad, that is the reason your electricity bill went up £75 a month. And yes, I will most certainly cover the cost of that 😉

22 thoughts on “Upping my homelab game”

  1. Hello and thanks for the brilliant post!

    I am also about to purchase 3 of the Supermicro Superservers but I am considering the non-rack mountable version:

    https://tinkertry.com/superservers

    I’ll be equipping each server with 128GB of RAM but the one area I am *really* battling with is the storage side of things. These 3 servers will be used in a vSphere 6 cluster and will only use VSAN for the datastore storage.

    On my current server that uses a Supermicro X10SL7-F (with LSI 2308 in IT mode) I setup a single node VSAN server as a test and so far the performance has been awful. I am using the Samsung Pro 256GB 850 M2 PCIe SSD NVMe drive in the cache tier and am using 4 SATA SSD Samsung Pro 840/850 drives in the capacity tier. At best I am getting (if I am lucky) 80MB/s but often I am getting sub 30MB/s speeds. I have seen one or two file copies hit 300MB/s BRIEFLY but overall the speeds are…ummm…. sad!

    So I hope you don’t mind me asking (since I am just starting to get into VMware) but how or what should I use for the storage in the new Supermicro Superservers that I buy? Do I need a RAID card with cache/battery to maximise performance with VSAN? Do I need an HBA? Do I just use the onboard SATA/M2 connectors? And what SSD drives should I purchase for VSAN? I also keep hearing about queue length which I am finding confusing!

    I’m just a bit confused at this point at how bad the speeds are with VSAN on my current (single) server and would definitely like to avoid that with the new servers considering how much this setup will cost!

    Hope you don’t mind the long post.

    Great blog. I’m subscribing! Cheers.


    • Hi Sean,

      First off I’d stay away from using the M2 drives for caching (or for anything really). Mine ran like a dog when I tried, and then I started receiving loads of errors. When I rebuilt them all I thought I’d use them as the boot device for ESXi… but they couldn’t even handle that! ESXi installed fine, but at boot it’d just crash.

      The Samsung 850 drives are really good, but the only issue is they don’t have power-loss protection, hence they’re not on the HCL. So if you used them for both caching and capacity and the host lost power, the lookup table that knows where everything is stored would most likely be corrupted.

      So I would get a LSI-based HBA (also in IT mode), and buy some Intel P3700 drives off eBay for caching and capacity. I went with brand new Samsung PM363 and SM363 drives for mine as I was lucky in that my budget could stretch.

      With regards to the queue depth, check out https://vwilmo.wordpress.com/2014/06/21/home-lab-4-1-supplemental-checking-m1015-queue-depthflashing-it-mode/.

      Don’t forget there are other options (which I may end up trying myself if I ditch VSAN) such as EMC ScaleIO for the software-defined storage (free), and Pernix FVP for acceleration (also free).

      Good luck!

      -Mark


  2. Hi Mark, thanks for the reply!

    I have the budget (only just!) but would you say the following combination would give me good read/write speeds in VSAN for a 3 node cluster:

    Cache tier: Samsung PM863 240GB Enterprise Class SATA SSD
    Capacity tier: Two Samsung PM863 480GB Enterprise Class SATA SSD

    (the above is for each server) (I assume you meant PM863 and that PM363 was a typo?)

    Should I even consider any datacentre type PCIe SSD NVMe drives?

    I couldn’t find the IBM M1015 SAS controller on the VSAN guide PDF:

    (vi_vsan_guide.pdf)

    I just want to make sure this all works! Sorry for being pedantic!

    Does the controller need onboard cache/battery for read/write performance to be ok?

    I really appreciate your help!

    I spent the weekend moving my VMs onto the vsanDatastore where I am using the PCIe SSD for cache and am using 4 SATA SSDs for capacity and performance is rubbish. Sooo slooooooooow. I was expecting 300+MB/s and am barely getting 30MB/s…even with the stripe width set to 4 in the VSAN policy!

    I’d be interested to hear what performance you get with your DC SSD drives and HBA!! 😉


    • Yes sorry… I meant the “PM863” 🙂

      Based on those specs, that would make an awesome VSAN. You’re approximately at the 10% caching/capacity rule (if not slightly over), so that will work nicely.

      The IBM M1015 is really an LSI 9211-8i, which is definitely on the HCL. For info on cross-flashing it, check out: http://www.servethehome.com/ibm-serveraid-m1015-part-4/.

      I’m not sure whether the lack of onboard cache/BBU will be a problem, but I’m sure it could only have helped. Unfortunately the 5018D-FN4T’s rackmount form factor rules out full-height/full-length cards and a four-disk setup, so even if I could have got the Dell PERC H700s to work I’d have been stuffed.

      If I were you I’d be getting those VMs off that vsanDatastore and then rebuilding it with your new kit… whilst you can!

      I plan on running a stressful HCIbench later this week… so keep your eyes open for the results!

      -Mark


  3. Thanks for the reply!

    After we last “spoke” last night I did a bit more research into my current hardware. I really do want to understand VSAN and my current hardware before spending thousands on a new VSAN cluster. The onboard LSI-2308 controller on my Supermicro X10SL7-F has a queue depth of 600, and the M2 PCIe adapter card I am using for the PCIe SSD has a queue depth of 1024. I confirmed this by using esxtop on my server and the numbers line up. Taking this into account, why would the performance of the PCIe/SATA SSD drives be so dreadful? When testing I only run 2 VMs and there is zero load on the “cluster” (OK, it’s a single-server cluster!). I just find it so odd.

    I guess one of the things I am trying to understand with VSAN is: Do you need a RAID card with cache/BBU to get good write speeds? Or is this best left alone for the cache tier?

    This is the second time I have installed ESXi and ended up giving up with it as I just couldn’t get decent storage speeds, but I am determined this time to do it 😉

    I just don’t want to buy another controller card and more drives and be disappointed! I really look forward to your benchmarks.

    The non-rack-mounted Supermicro Superservers have a single PCIe slot (low profile I think) but I’m not sure if they are full length. This is annoying as I’d *love* to use one of those Intel PCIe SSD drives in it!!

    There’s a long weekend coming up so I may move my VMs onto a temporary datastore and then start trying to get some decent storage performance. I honestly don’t know what I am doing wrong yet. Appreciate your help and comments.

    Cheers!


  4. Argh, I’ve had a really bad week with the performance of my all-flash VSAN! I have since deleted the VSAN datastore as the performance was so poor.

    Do you think using an Intel DC 3700 400GB (with a U2 to M2 adapter) for the cache tier and a Samsung SM863 for the capacity tier (with no RAID card) would yield good performance on the Supermicro Superserver? I think both of these drives are on the VSAN HCL.

    I have yet to achieve good disk IO with ESXi… ;-(


      To be honest, I’m not having a much better time of it. I can’t go into specifics (for now), but I’m thinking of dropping VSAN.

      The Intel DC 3700 is a PCIe card, right? Sorry to be thick… but where does the U2/M2 adapter come in? Either way, the SM863 for the capacity tier sounds good. But my concern with using the SATA channels is whether there will be enough bandwidth to the disks.

      -Mark


  5. Hi Mark

    I’ve actually got a thread going in the VSAN forum but I’m still stumped and baffled. I keep hearing how amazing VSAN performance is and, I know I am running consumer-grade hardware currently, but come on, 20MB/s?? A 5yr old laptop is faster. I’m interested to read more on your thoughts for dropping VSAN! And your hardware is far more modern than mine.

    The Intel DC P3700 comes in two form factors: a PCIe card and a 2.5″ variety. The Supermicro Superserver has only two options for installing a PCIe SSD: one is to use the PCIe slot itself, and the other is to use the onboard M2 slot. If you buy the 2.5″ Intel DC P3700 then you need a U2 to M2 adapter with the correct cable.

    I agree re the SATA disks: Can they give good read performance on the capacity tier?

    Argh, I feel quite let down by VSAN. Look forward to hearing more of your thoughts on this and what you do for the storage side of things! I know I want to purchase a Supermicro Superserver later this year but am still lost on what to do for the storage side of things. I really want to get 200-300MB/s read/write speeds for what I want to learn/test/lab.


  6. This guy seems to be getting great performance with VSAN:

    https://communities.vmware.com/thread/514014?start=0&tstart=0

    Interestingly, he has the same motherboard as me and (I assume) is using the onboard LSI-2308 controller (with the same firmware version I am running).

    The big difference is the disks he’s using: Intel 750 for cache (on the HCL) and SAS 1TB HDDs (also on the HCL).

    Maybe there’s hope for me if I just purchase the correct enterprise SATA/SAS/PCIe disks!


    • It is odd that you’re not getting the requisite performance though.

      As happy as he is with his system, he is using spinning rust for his capacity drives. If your budget can stretch, all-flash is definitely the way forward…

      -Mark


  7. I agree 100%, all flash is the way to go (especially for dedupe!!). He is running 4 nodes whereas I am running a single node, so I’m sure that helps loads with the performance figures he has quoted.

    I’m guessing that I need to replace all the drives in my current server with ones that are on the HCL.

    I’m itching to place on order for the Intel DC P3700 400GB drive for the cache tier and then look at the Samsung SM863 480GB Enterprise Class SATA SSDs for capacity.

    I’m still unsure if my LSI-2308 controller (with SAS ports) will be sufficient for VSAN and I’m still confused as to whether Enterprise SSD SATA drives will work ok performance wise with VSAN?

    This sure is pushing my knowledge of storage 😉


      Well I can definitely vouch for the SM863s. The P3700 is also a decent drive, so you should do okay.

      I’m wondering now if I should have gone down the P3700 route. I would have saved myself three LSI9211 cards and three 480GB PM863 drives. Although that’s irrelevant now I’ve moved away from VSAN 😉

      -Mark


  8. What controller did you connect your SM863s to?

    All these enterprise drives are expensive and add up quickly! That’s why I am trying to get this right first time round. I thought I could try a single SM863 for capacity and then (shock horror) try my Samsung Pro 850 PCIe NVMe for cache again. If performance still sucks then the PCIe NVMe Samsung drive has to go and I’d have to look at the P3700 (which has great endurance!)!

    Have you decided yet what you are going to replace VSAN with?


  9. Pingback: My home datacenter. - Blah, Cloud.

  10. Pingback: Getting rid of the homelab | virtualhobbit

  11. Pingback: Moving Production to Amazon Web Services | virtualhobbit
