During a recent conversation with Mike Laverick on Twitter, I was trying to justify a speed comparison I had performed on a Nimble CS220 SAN, when he correctly pointed out that I was being unfair and not comparing apples to apples. So I tried a fairer comparison against a second SAN. It quickly became clear that 140 characters is not enough to discuss this properly, hence this post.
The Hosts are comparable: both are HP DL380s with 6-core CPUs of around 3GHz.
Site A has a Nimble SAN on 1Gb iSCSI.
Site B has an 18-month-old non-SSD hybrid SAN, but it is connected directly via 8Gb Fibre Channel, with 10Gb NICs.
The “very simple” Test
I copied a 40GB guest (not thin provisioned) at each site and compared the results.
The copy completed faster, at roughly 20% higher throughput, on the 1Gb iSCSI SAN.
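To frame the bottleneck question, it helps to work out how long the copy would take if the storage link were the only constraint. This is a back-of-the-envelope sketch, not part of the original test. It assumes nominal line rates, a single path per site, and that the copied data actually crosses the link as one 40GB transfer; it ignores encoding, protocol and iSCSI/FC overhead, so real figures will be worse. The names `LINKS_GBPS` and `min_copy_seconds` are purely illustrative.

```python
# Back-of-the-envelope sketch: minimum copy time if the link were the ONLY bottleneck.
# Assumption: nominal line rates in gigabits per second, ignoring encoding,
# protocol, iSCSI/FC overhead and multipathing.
LINKS_GBPS = {
    "1Gb iSCSI": 1.0,
    "8Gb Fibre Channel": 8.0,
    "10Gb NIC": 10.0,
}

GUEST_SIZE_GB = 40  # size of the copied guest, in gigabytes


def min_copy_seconds(size_gb: float, link_gbps: float) -> float:
    """Lower bound on transfer time for size_gb gigabytes over a link_gbps link."""
    size_gigabits = size_gb * 8  # 1 byte = 8 bits
    return size_gigabits / link_gbps


if __name__ == "__main__":
    for name, gbps in LINKS_GBPS.items():
        secs = min_copy_seconds(GUEST_SIZE_GB, gbps)
        print(f"{name:>18}: at least {secs:6.1f} s ({secs / 60:.1f} min)")
```

On these assumptions a 40GB copy needs at least 320 seconds over a single 1Gb path, but could in principle finish in about 40 seconds over 8Gb FC. So if the Fibre Channel copy still came in around 20% slower, the link itself is unlikely to be what held it back, and the bottleneck lies elsewhere (the array, the Hosts, or the copy process itself).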
That result raised an interesting question: where is the bottleneck now? Have we come so far with SSD and RAM caching that storage is no longer the slowest element?
At what point does the Host CPU become the limit, and isn't that what we are ultimately aiming for?
After all, VMware Hosts are supposed to be the ‘commodity’. In our arena a replacement CPU would cost around £1k, a replacement Host £5k to £10k (maybe), a nice pair of 10Gb-capable switches will set you back £30k, and a SAN, which in my opinion should be the longest-lived component, rocks in at anywhere from £50k upwards. Again, in my SMB (Small to Medium Business, SME as was) experience, we are in a market where we typically store less than 20TB of data.
This is surely good news: we have reached a point where upgrading the cheapest component will give the most noticeable performance increase, up to a point. Obviously the tables will turn somewhere, and CPU/Host bus speed will outstrip the connectivity, although VMware v5.5 does bring true 16Gb FC support and 40Gb NIC support.
My knowledge of high-end SANs and blade servers is extremely limited, so in those environments systems may scale completely differently. We are a typical SMB, not a huge corporate with money coming out of our ears for long-term testing and hundreds of Hosts. We have two clusters, each with only three Hosts and its own dedicated vFabric, managed by one vCenter. We keep as up to date as possible by paying for maintenance on our software (currently running VMware 5.0, heading for v5.5 in Q4 2013), but we have tight budgets and a business to maintain.
I do not know whether throughput correlates with CPU cores, nor whether dual-socket Hosts would perform any better. Ours all have a single 6-core CPU, with as much RAM as the board will take (well, half the capacity, as you need both sockets filled to access every RAM slot). Maybe someone can comment on this missing information.
This is not a scientific test, and such a simple test will have many flaws; it has simply raised an interesting question in my head, hence the title of this article. Please feel free to comment and point out any obvious mistakes I have missed. I am not a commercial “tester”, nor do I have access to loads of test equipment, LANs, SANs or Hosts. These observations were made on my company's equipment, out of office hours.
I look forward to your feedback (I think)…