I have received a few questions about my previous post on NetApp VMware boot storm results and want to answer them here. I have also had a chance to look through the performance data gathered during the tests and have a few interesting data points to share. I now also have a pair of second-generation Performance Accelerator Modules (PAM 2) in hand and will be publishing updated VMware boot storm results with the larger-capacity cards.
What type of disk were the virtual machines stored on?
- The virtual machines were stored on a SATA RAID-DP aggregate.
What was the rate of data reduction through deduplication?
- The VMDK files were all fully provisioned at the time of creation. Each operating system type was placed on a different NFS datastore. This resulted in 50 virtual machines on each of 4 shares. Deduplication reduced the physical footprint of the data by 97%.
A few interesting stats gathered during the testing. These numbers are not exact, due to the somewhat imprecise nature of starting and stopping statit in synchronization with the start and end of each test.
- The CPU utilization moved inversely with the boot time. The shorter the boot time, the higher the CPU utilization. This is not surprising, as during the faster boots the CPUs were not waiting around for disk drives to respond. Because more data was served from cache, the CPUs could stay more utilized.
- The total NFS operations required for each test was 2.8 million.
- The total GB read by the VMware physical servers from the NetApp was roughly 49GB.
- The total GB read from disk trended down between cold and warm cache boots. This is what I expected, and I would have been somewhat concerned if it were not true.
- The total GB read from disk trended down with the addition of each PAM. Again, I would have been somewhat concerned if this were not the case.
- The total GB read from disk took a significant drop when the data was deduplicated. This helps to prove out the theory that NetApp is no longer going to disk for every read of a different logical block that points to the same physical block.
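To put the totals above in per-guest terms, here is a rough back-of-the-envelope sketch. It only divides the approximate figures quoted in this post (2.8 million NFS operations, 49GB read, 200 guests, 4 physical servers); the variable and function names are made up for illustration, and the averages assume the load was spread evenly, which the statit data does not confirm.

```python
# Figures quoted in the post; all of them are approximate.
TOTAL_NFS_OPS = 2_800_000  # NFS operations required for each test
TOTAL_GB_READ = 49         # GB read by the VMware servers from the NetApp
GUESTS = 200               # virtual machines booted in each test
HOSTS = 4                  # physical VMware servers


def per_unit(total, count):
    """Return the simple average of `total` spread evenly over `count`."""
    return total / count


ops_per_guest = per_unit(TOTAL_NFS_OPS, GUESTS)
gb_per_guest = per_unit(TOTAL_GB_READ, GUESTS)
gb_per_host = per_unit(TOTAL_GB_READ, HOSTS)

print(f"NFS operations per guest: {ops_per_guest:,.0f}")
print(f"GB read per guest:        {gb_per_guest:.3f}")
print(f"GB read per host:         {gb_per_host:.2f}")
```

These are averages only. The four operating systems almost certainly differ in how much they read at boot, so individual guests will sit above or below these numbers.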
How much disk load was eliminated by the combination of dedup and PAM?
- The cold boots with no dedup and no PAM read about 67GB of data from disk. The cold boot with dedup and no PAM dropped that down to around 16GB. Adding 2 PAM (or 32GB of extended dedup-aware cache) dropped the amount of data read from disk to less than 4GB.
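The disk-read figures above can be turned into percentage reductions with a few lines of arithmetic. This is a minimal sketch using the approximate numbers from this post; the names are illustrative, and because the last figure is quoted as "less than 4GB", 4 is used as an upper bound, so the real reduction is at least what is computed here.

```python
# Approximate GB read from disk during cold boots, as quoted in the post.
BASELINE_GB = 67   # no dedup, no PAM
DEDUP_GB = 16      # dedup, no PAM
DEDUP_PAM_GB = 4   # dedup plus 2 PAM; "less than 4GB", so an upper bound


def reduction_pct(before, after):
    """Percentage drop in disk reads going from `before` to `after`."""
    return 100.0 * (before - after) / before


dedup_only = reduction_pct(BASELINE_GB, DEDUP_GB)
dedup_and_pam = reduction_pct(BASELINE_GB, DEDUP_PAM_GB)
pam_on_top_of_dedup = reduction_pct(DEDUP_GB, DEDUP_PAM_GB)

print(f"Dedup alone:            {dedup_only:.1f}% fewer GB read from disk")
print(f"Dedup plus 2 PAM:       at least {dedup_and_pam:.1f}% fewer")
print(f"PAM on top of dedup:    at least {pam_on_top_of_dedup:.1f}% fewer")
```

In other words, dedup alone removed roughly three quarters of the disk reads, and dedup combined with the two PAM cards removed at least 94% of them.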
There is still time to register for the VMware vs. Hyper-V Hands-on Workshop we are holding on Wednesday, November 18th, at the Hilton Boston/Woburn hotel in Woburn, MA.
The workshop will begin at 8:30 am and includes lunch. John Laferriere will open with a quick overview of Corporate Technologies. I will then present a talk based on our VMware vSphere 4 vs. Hyper-V R2 white paper, after which Sean Daly and Joe Gries will do hands-on demonstrations of the two technologies. This will be followed by Q&A and lunch.
We are encouraging attendees to ask us about specific use cases and solution requirements to optimize the value of the workshop. For more details and to register please see the invitation.
UPDATE: I have posted an update to this article here: More boot storm details
Measuring the benefit of cache deduplication with a real world workload can be very difficult unless you try it in production. I have written about the theory in the past and I did a lab test here with highly duplicate synthetic data. The results were revealing about how the NetApp deduplication technology impacts both read cache and disk. Based on our findings, we decided to run another test. This time the plan was to test NetApp deduplication with a VMware guest boot storm. We also added the NetApp Performance Accelerator Module (PAM) to the testing.
The test infrastructure consists of 4 dual socket Intel Nehalem servers with 48GB of RAM each. Each server is connected to a 10GbE switch. A FAS3170 is connected to the same 10GbE switch. There are 200 virtual machines: 50 Microsoft Windows 2003, 50 Microsoft Vista, 50 Microsoft Windows 2008, and 50 Linux. Each operating system type is installed in a separate NetApp FlexVol for a total of 4 volumes. This was not done to maximize the deduplication results. Instead we did it to allow the VMware systems to use 4 different NFS datastores. Each physical server mounts all 4 NFS datastores and the guests were split evenly across the 4 physical servers.
The test consisted of booting all 200 guests simultaneously. This test was run multiple times with the FAS3170 cache warm and cold, with deduplication and without, and with PAM and without. The boot timing results are summarized in a table. Each result is the amount of time between starting the boot and the 200th system acquiring an IP address. Read more…
There are many, many choices available when it comes to virtualization technologies. Even within server virtualization, there are many options. Once the choices have been narrowed, it is still a chore to wade through the options and limitations to determine the best fit for a given datacenter environment.
Some frequent decision points include:
- Is your environment large enough to bother virtualizing?
- If you are running VMware, should you consider Microsoft Windows Server 2008 Hyper-V R2?
- Can Hyper-V run other guest operating systems?
- What should a Windows-only shop do?
To help ease the effort, we’ve created a decision flow chart involving the two contenders on the short list at most sites – VMware vSphere 4 and Microsoft Hyper-V R2. This chart starts from your current infrastructure and leads you through the important decisions, and to the conclusions you are likely to reach.
The chart is based on much more detailed information provided in our vSphere vs. Hyper-V whitepaper available for download in this blog posting as well as the associated talk available here.
Hopefully this chart will help you make your server virtualization decisions. Please get in touch if you would like to review the whitepaper or have us evaluate the virtualization options for your datacenter. (Please click on the image for a full-size view.)

A client invited us to give a presentation at their internal IT conference based on the virtualization whitepaper that we published last month. The whitepaper is available in this post. Registration is not required, but if you register, we will let you know when the next whitepaper comes out. We have a simple privacy policy and we will not fill your inbox with junk.
The talk went over well, including a lively discussion of the pros and cons of both approaches and how they would fit into the client’s infrastructure.
We are making a .pdf of the talk available today, containing much of the content of the talk. You can download the talk here:
Virtualization Presentation - VMware vSphere vs. Microsoft Hyper-V (659.7 KiB)
Also, if you are in the Northeastern U.S. and are interested in hearing this talk first hand, please get in touch and perhaps we could present the talk at a lunch-and-learn event at your company.
We’re pleased to make available our first whitepaper. This one is a technical analysis of vSphere 4 vs. Hyper-V R2. If you have any comments, please post them here.
The Executive Summary should give you guidance as to whether this whitepaper will be of use to you:
The battle to be your virtualization vendor is in full swing, and it has important ramifications for the vendors involved, and for your data center. The goal of this whitepaper is to analyze the technical aspects of the two major choices: VMware vSphere 4 and Microsoft Hyper-V R2 (as part of Windows Server 2008 R2). This paper considers server virtualization alone, not desktop virtualization or “presentation virtualization”. Certainly presentation virtualization will be an important aspect of the virtualization gamut, but with the entry of Microsoft into the server virtualization market, and the still-unrealized huge potential for server virtualization, this is a topic of great interest to many datacenters.
This whitepaper covers the following topics:
› A summary of virtualization technologies and terms.
› The reasons to consider virtualizing.
› The features of virtualization and the effect it has on application implementation, and datacenter facility implementation and management.
› The impact that future server technology will have in driving virtualization, based on the need of datacenters to achieve optimal resource use and optimal application performance.
› Decision criteria to use in determining when and how to virtualize a datacenter.
› A description and comparison of the features and pricing of vSphere and Hyper-V.
› An analysis of the current state of virtualization and best practices to consider when deploying virtualized infrastructure.
› Our prognosis of the future of virtualization, the expected next feature sets of virtualization, and the future of data center management and application deployment.
› Advice on how to determine which of the virtualization offerings to consider and how to test that chosen path.
› Reference pointers and suggestions for further reading.
The whitepaper is free and available for download in .pdf format. Registration is not required, but if you register, we will let you know when the next whitepaper comes out. We have a simple privacy policy and we will not fill your inbox with junk.
Virtualization Whitepaper - VMware vSphere vs. Microsoft Hyper-V (1.1 MiB)
In a previous post I discussed the topic of deduplication for capacity optimization. Removing redundant data blocks on disk is the first, and most obvious, phase of deduplication in the marketplace. It helps to drive down the most obvious cost – the cost per GB of disk capacity. This market has grown quickly over the last few years. Both startups and established storage vendors have products that compete in the space. They are most commonly marketed as virtual tape library (VTL) or disk-to-disk backup solutions.
Does that mean that deduplication is a point solution for highly sequential workloads? No. There is another somewhat less obvious benefit of deduplication.
What storage administrator does not ask for more cache in the storage array? If I can afford 8GB, I want 16GB. If the system supports 16GB, I want 32GB. Whether it is for financial or technical reasons, cache is always limited. What about deduplicating the data in cache? When the workload is streaming sequential backup data from disk, this may not be very helpful. However, in a primary storage system with a more varied workload, this becomes very interesting.
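To make the idea concrete, here is a toy model of why deduplicating data in cache helps a varied primary workload. It is only a sketch under loud assumptions: it is not how any vendor's cache is implemented, the `LRUCache` class and `boot_storm_misses` function are hypothetical names invented for this example, and it idealizes the workload by assuming every guest reads exactly the same blocks. A plain cache keys entries by logical block (one per guest), while a dedup-aware cache keys them by the shared physical block.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts misses (reads that would go to disk)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            return
        # Cache miss: count it, insert the block, evict the oldest if full.
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def boot_storm_misses(guests, blocks_per_guest, capacity, dedup_aware):
    """Simulate identical guests reading the same blocks in lockstep."""
    cache = LRUCache(capacity)
    for block in range(blocks_per_guest):
        for guest in range(guests):
            # Dedup-aware: every guest's copy maps to one physical block.
            # Plain: each guest's logical block is cached separately.
            key = block if dedup_aware else (guest, block)
            cache.read(key)
    return cache.misses


plain = boot_storm_misses(50, 100, capacity=200, dedup_aware=False)
shared = boot_storm_misses(50, 100, capacity=200, dedup_aware=True)
print(f"Misses with a plain cache:       {plain}")
print(f"Misses with a dedup-aware cache: {shared}")
```

In this idealized case the plain cache misses on every read, because no logical block is ever read twice, while the dedup-aware cache misses only once per distinct physical block. Real guests are never perfectly identical, so the benefit in practice will be smaller than the toy suggests.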
Read more…