Small Clusters of Big Servers Cost Less

As I was writing an upcoming blog post on sizing HCI clusters, I started thinking about optimizing not just the number of nodes in the cluster but also the size of each server in the cluster. Is the premium Intel charges for Xeon Platinum processors enough to offset the savings on vSphere and other software licensed by the processor socket? I decided to run the numbers and see.

The process was almost enough to make me wish for the old days when a server was basically a server. When I installed a DL380 G3, the difference between the minimum 2.4GHz and the maxed-out 3.2GHz processor was minimal. Today the same server model can have anywhere from six to 28 cores per socket.

The first step was to price out a server. Since I planned to use these servers as HCI nodes, and I’m comfortable playing what-if on the Dell website, my victim was the 2U workhorse Dell PowerEdge R740. I configured a server the way I’d like to buy it, including 256GB of DRAM, the RAID1 M.2 SSD boot option, and the like, which totaled $11,500.

I then built a spreadsheet using processors with 8-28 cores at 2.1GHz and the premium Dell charges for each over the minimum Xeon 3104. I picked processors with the same clock rate to eliminate one of the 5,000 possible variables.

Processor       3104      4110      4116      6130       8160       8176
Cores           6         8         12        16         24         28
Price Premium   (base)    $504      $1,462    $3,176     $9,140     $17,180
$/core          $38.50    $60.75    $80.42    $113.88    $200.17    $315.14

In terms of raw cost per unit of compute power, it looks like there’s a sweet spot around 16 cores, with the cost per core rising rapidly for the high-end Xeon Platinum 8000-series processors.

Once we start looking at the cost of a vSphere cluster, the bigger processors start looking more attractive. I decided, kind of arbitrarily, to see how much it would cost to build a cluster of hosts to provide 1200 cores of computing power and roughly 16GB of DRAM per core. Again, I pulled the 16GB/core number out of thin air, and some may argue that much DRAM is excessive, but reducing the hardware cost just makes bigger hosts even more attractive.

Processor                           3104        4110       4116       6130       8160       8176
Hosts/1200 cores                    100         75         50         38         25         21
DRAM/Node (GB)                      256         256        384        512        768        768
Node Compute Hardware (Dell DRAM)   $11,500     $12,004    $15,052    $18,100    $23,692    $23,692
DRAM/Core (GB)                      21.33       16.00      16.00      16.00      16.00      13.71
Cluster DRAM (GB)                   25,600      19,200     19,200     19,456     19,200     16,128
Cluster Hardware Total              $1,150,000  $900,300   $752,600   $687,800   $592,300   $497,532
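To make the sizing logic above easy to check, or rerun with your own numbers, here’s a minimal Python sketch of it. It assumes dual-socket nodes and rounds to the nearest whole host; the prices and DRAM figures come straight from the table, and the variable names are mine.

```python
# Sizing sketch: dual-socket nodes, targeting ~1200 cores at
# roughly 16GB of DRAM per core. Figures come from the table above.
TARGET_CORES = 1200

# processor: (cores per socket, DRAM per node in GB, per-node hardware price)
nodes = {
    "3104": (6, 256, 11_500),
    "4110": (8, 256, 12_004),
    "4116": (12, 384, 15_052),
    "6130": (16, 512, 18_100),
    "8160": (24, 768, 23_692),
    "8176": (28, 768, 23_692),
}

for cpu, (cores, dram_gb, node_price) in nodes.items():
    hosts = round(TARGET_CORES / (cores * 2))   # dual-socket hosts, rounded
    cluster_hw = hosts * node_price             # day-one compute hardware only
    dram_per_core = dram_gb / (cores * 2)
    print(f"{cpu}: {hosts} hosts, {dram_per_core:.2f} GB/core, hardware ${cluster_hw:,}")
```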

I then added in the cost of a vSphere Enterprise Plus license with 3 years of production support ($8,308, the discounted price from the Dell configurator) for each node and discovered that those 16-core servers that looked like a sweet spot would still cost almost 1.5 times as much as roughly equivalent horsepower from a smaller number of bigger servers.

Processor                    3104        4110        4116        6130        8160      8176
Cluster vSphere              $830,800    $623,100    $415,400    $315,704    $207,700  $174,468
Cluster vSphere + Hardware   $1,980,800  $1,523,400  $1,168,000  $1,003,504  $800,000  $672,000
Price vs. 28-core            2.95        2.27        1.74        1.49        1.19      1.00
Hardware vs. 28-core         2.31        1.81        1.51        1.38        1.19      1.00
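The licensing math is just as mechanical. A small sketch, treating the $8,308 vSphere Enterprise Plus figure as a flat per-node cost (that’s how the Dell configurator quoted it) and normalizing each cluster against the 28-core configuration:

```python
# Add per-node vSphere licensing to the hardware totals and normalize
# against the 28-core cluster. $8,308 is the discounted per-node
# (dual-socket, 3-year support) price quoted in the post.
VSPHERE_PER_NODE = 8_308

# processor: (hosts, per-node hardware price) from the sizing table above
clusters = {
    "3104": (100, 11_500),
    "4110": (75, 12_004),
    "4116": (50, 15_052),
    "6130": (38, 18_100),
    "8160": (25, 23_692),
    "8176": (21, 23_692),
}

totals = {
    cpu: hosts * (node_price + VSPHERE_PER_NODE)
    for cpu, (hosts, node_price) in clusters.items()
}

baseline = totals["8176"]  # the 28-core, 21-node cluster
for cpu, total in totals.items():
    print(f"{cpu}: ${total:,} ({total / baseline:.2f}x the 28-core cluster)")
```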

 

Add vSAN

Since SSDs make up so much of an HCI cluster’s cost, I thought going to an HCI solution might shift the balance a bit toward the smaller servers. I added 1.6TB Dell NVMe mixed-use SSDs for the performance/cache layer and 3.84TB read-oriented SATA SSDs for the capacity layer to each server configuration, with one NVMe SSD and 2-4 SATA SSDs per disk group, to keep the total capacity layer around 550TB. The low-end node has 1 NVMe SSD and 2 SATA SSDs, while the high-end node has 2 NVMe SSDs and 7 SATA SSDs.

Once we add in the cost of vSAN Enterprise Edition ($3,995/socket) and 3 years of support, the relative costs of large vs. small servers remain about the same.

Processor                         3104        4110        4116        6130        8160        8176
vSAN Enterprise                   $1,923,600  $1,442,700  $961,800    $730,968    $480,900    $403,956
SSDs                              $868,100    $651,075    $597,850    $578,854    $597,850    $570,990
Capacity Layer Raw Storage (TB)   768         576         576         583.68      576         564.48
Disk Groups / NVMe SSDs           1           1           1           1           2           2
SATA SSDs                         2           2           3           4           6           7
Total vSAN Cluster                $4,772,500  $3,617,175  $2,727,650  $2,313,326  $1,878,750  $1,646,946
$/TB                              $6,214      $6,280      $4,736      $3,963      $3,262      $2,918
Price vs. 28-core                 2.90        2.20        1.66        1.40        1.14        1.00
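The raw capacity and $/TB rows fall out of the same arithmetic. A quick sketch, assuming each capacity SSD contributes its full 3.84TB raw (no erasure coding or other efficiencies) and dividing the total cluster cost by raw terabytes:

```python
# Raw capacity layer and $/TB for each vSAN cluster configuration.
# Capacity = hosts * SATA SSDs per node * 3.84TB raw;
# $/TB = total cluster cost / raw TB.
SATA_SSD_TB = 3.84

# processor: (hosts, SATA SSDs per node, total vSAN cluster cost) from the table above
vsan_clusters = {
    "3104": (100, 2, 4_772_500),
    "4110": (75, 2, 3_617_175),
    "4116": (50, 3, 2_727_650),
    "6130": (38, 4, 2_313_326),
    "8160": (25, 6, 1_878_750),
    "8176": (21, 7, 1_646_946),
}

for cpu, (hosts, sata_per_node, cluster_cost) in vsan_clusters.items():
    raw_tb = hosts * sata_per_node * SATA_SSD_TB
    print(f"{cpu}: {raw_tb:,.2f}TB raw, ${cluster_cost / raw_tb:,.0f}/TB")
```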

While I always knew per-socket pricing was encouraging bigger and bigger servers, I was surprised to see that even using 12- or 16-core servers could cost half again as much as maxing out my server configs.

What about AMD?

AMD’s Epyc processors promise the power of a dual-socket Xeon from a single-socket server by cramming up to 32 cores, and just as importantly 128 PCIe lanes, into a single processor. Using a single socket cuts the cost of vSphere and vSAN in half, but when I ran the Dell configurator for a PowerEdge R7415 with a 32-core AMD Epyc 7601 and 512GB of memory (the same as the dual 16-core Xeon server), the total cost was only nominally lower than the R740 with two 16-core Xeons.

Second Order Costs

My calculations are limited to the day-one acquisition costs of a cluster and to the discount Dell offers anyone for a quantity-one server purchase (about 30%). I didn’t include:

  • Network costs (every server needs two 10Gbps ports or more)
  • Rack and data center space
  • Power, cooling
  • Administration costs
  • HCI storage efficiencies like erasure coding

All of these are basically per-server costs, so including them would push the cost equation even further in the direction of bigger servers in smaller clusters.

Discounts

All my numbers are the quantity-one prices on the Dell site. Organizations that get bigger discounts on software, from colleges to megacorps, may find smaller servers a bit more affordable than the spreadsheet shows, though.

Cost isn’t the only consideration

While the math may say that maxed-out servers are cheaper, that doesn’t mean you should buy three or four 28-core servers for your next refresh. A 28-core server with a terabyte of memory would host over 100 virtual machines, creating demand on the 25Gbps network connections, especially when you have to evacuate that host, and a terabyte’s worth of memory, to perform maintenance.

Of course, the biggest reason not to put all your VMs in 2-4 hosts is the impact of a host failure on the remaining members of the cluster. Three or four hosts supporting the reboot of 30 or 40 VMs each will be overloaded somewhere, which slows apps and delays the reboots, which annoys users, which makes them call the help desk, which makes my phone ring, and you know I hate when my phone rings.
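To put rough numbers on that: when one of N equally loaded hosts fails, its VMs land on the N-1 survivors, so each survivor picks up roughly N/(N-1) times its normal load. A back-of-the-envelope sketch (my simplification, not an HA admission-control calculation):

```python
# Back-of-the-envelope failure impact: with N equally loaded hosts,
# losing one spreads its load over the N-1 survivors, so each survivor's
# load grows by a factor of N / (N - 1).
for hosts in (3, 4, 8, 10, 21):
    surge = hosts / (hosts - 1)
    print(f"{hosts} hosts: survivors run at {surge:.2f}x their normal load "
          f"(+{(surge - 1) * 100:.0f}%)")
```

With three or four hosts the survivors pick up 33-50% more load; at 8-10 hosts and beyond the hit drops to roughly 10% or less, which is the practical argument for keeping a reasonable minimum node count.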

What’s the optimal vSphere host size? Like everything else, it depends, but I would start thinking hard about bigger servers, as opposed to bigger clusters, as long as I’d still have a minimum of 8-10 nodes in the cluster.

All my calculations are in a Google Sheet. Feel free to plug in your numbers and see how your cluster costs change with big, and little, servers.


Like this series on HCI? Want more? I’m presenting a six-hour deep-dive webinar/class as part of my friend Ivan’s ipSpace.net site.

The first 2-hour session was live December 11 and is now available on demand. Part 2 goes live January 22nd. Sign up here.


Dell, VMware and Intel have all been clients of DeepStorage, LLC on projects unrelated to this blog post.