Archive

Posts Tagged ‘NAS’

VMware boot storm on NetApp – Part 2

December 28th, 2009 Jesse St. Laurent 2 comments

I have received a few questions relating to my previous post about NetApp VMware bootstorm results and want to answer them here.  I have also had a chance to look through the performance data gathered during the tests and have a few interesting data points to share. I also wanted to mention that I now have a pair of second generation Performance Accelerator Modules (PAM 2) in hand and will be publishing updated VMware boot storm results with the larger capacity cards.

What type of disk were the virtual machines stored on?

  • The virtual machines were stored on a SATA RAID-DP aggregate.

What was the rate of data reduction through deduplication?

  • The VMDK files were all fully provisioned at the time of creation. Each operating system type was placed on a different NFS datastore. This resulted in 50 virtual machines on each of 4 shares. The deduplication reduced the physical footprint of the data by 97%

A few interesting stats gathered during the testing. These numbers are not exact and due to the somewhat imprecise nature of starting and stopping statit in synchronization with the start and end of each test.

  • The CPU utilization moved inversely with the boot time. The shorter the boot time, the higher the CPU utilization. This is not surprising as during the faster boots, the CPUs were not waiting around for disk drives to respond. More data was served from cache the the CPU could stay more utilized.
  • The total NFS operations required for each test was 2.8 million.
  • The total GB read by the VMware physical servers from the NetApp was roughly 49GB.
  • The total GB read from disk trended down between cold and warm cache boots. This is what I expected and would be somewhat concerned if it was not true.
  • The total GB read from disk trended down with the addition of each PAM. Again, I would be somewhat concerned if this was not the case.
  • The total GB read from disk took a significant drop when the data was deduplicated. This helps to prove out the theory that NetApp is no longer going to disk for every read of a different logical block that points to the same physical block.

How much disk load was eliminated by the combination of dedup and PAM?

  • The cold boots with no dedup and no PAM read about 67GB of data from disk. The cold boot with dedup and no PAM dropped that down to around 16GB. Adding 2 PAM (or 32GB of extended dedup aware cache) dropped the amount of data read from disk to less that 4GB.

VMware boot storm on NetApp

November 1st, 2009 Jesse St. Laurent Comments off

UPDATE: I have posted an update to this article here: More boot storm details

Measuring the benefit of cache deduplication with a real world workload can be very difficult unless you try it in production. I have written about the theory in the past and I did a lab test here with highly duplicate synthetic data. The results were revealing about how the NetApp deduplication technology impacts both read cache and disk. Based on our findings, we decided to run another test. This time the plan was to test NetApp deduplication with a VMware guest boot storm. We also added the NetApp Performance Accelerator Module (PAM) to the testing.

The test infrastructure consists of 4 dual socket Intel Nehalem servers with 48GB of RAM each. Each server is connected to a 10GbE switch. A FAS3170 is connected to the same 10GbE switch. There are 200 virtual machines: 50 Microsoft Windows 2003, 50 Microsoft Vista, 50 Microsoft Windows 2008, and 50 linux. Each operating system type is installed in a separate NetApp FlexVol for a total of 4 volumes. This was not done to maximize the deduplication results. Instead we did it to allow the VMware systems to use 4 different NFS datastores. Each physical server mounts all 4 NFS datastores and the guests were split evenly across the 4 physical servers.

The test consisted of booting all 200 guests simultaneously. This test was run multiple times with the FAS 3170 cache warm and cold, with deduplication and without, and with PAM and without. Here is a table summarizing the boot timing results. This is the amount of time between starting the boot and the 200th system acquiring an IP address. Here are the results: Read more…

Categories: Storage, Systems Tags: , , , ,

Deduplication – The NetApp Approach

July 20th, 2009 Jesse St. Laurent 5 comments

After writing a couple of articles (here and here) about deduplication and how I think it should be implemented, I figured I would try it on a NetApp system I have in the lab. The goal of the testing here is to compare storage performance of a data set before and after deduplication. Sometimes capacity is the only factor, but sometimes performance matters. The test is random 4KB reads against a 100GB file. The 100GB file represents significantly more data than the test system can fit into its’ 16GB read cache. I am using 4KB because that is the natural block size for NetApp.

To maximize the observability of the results in this deduplication test, the 100GB file is completely full of duplicate data. For those who are interested, the data was created by doing a dd from /dev/zero. It does not get any more redundant than that. I am not suggesting this is representative of a real world deduplication scenario. It is simply the easiest way to observe the effect deduplication has on other aspects of the system.

This is the output from sysstat -x during the first test. The data is being transferred over NFS and the client system has caching disabled, so all reads are going to the storage device. (The command output below is truncated to the right, but the important data is all there.)

Random 4KB reads from a 100GB file – pre-deduplication:

 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 19%  6572     0     0    6579  1423 27901  23104     11     0     0     7   16%   0%  -  100%      0     7     0     0     0     0
 19%  6542     0     0    6549  1367 27812  23265    726     0     0     7   17%   5%  T  100%      0     7     0     0     0     0
 19%  6550     0     0    6559  1305 27839  23146     11     0     0     7   15%   0%  -  100%      0     9     0     0     0     0
 19%  6569     0     0    6576  1362 27856  23247    442     0     0     7   16%   4%  T  100%      0     7     0     0     0     0
 19%  6484     0     0    6491  1357 27527  22870      6     0     0     7   16%   0%  -  100%      0     7     0     0     0     0
 19%  6500     0     0    6509  1300 27635  23102    442     0     0     7   17%   9%  T  100%      0     9     0     0     0     0

The system is delivering an average of 6536 NFS operations per second. The cache hit rate hovers around 16-17%. As you can see, the working set does not fit in primary cache. This makes sense. The 3170 has 16GB of primary cache and we are randomly reading from a 100GB file. Ideally, we would like to get a 16% cache hit rate (16GB cache / 100GB working set) and we are very close. The disks are running at 100% utilization and are clearly the bottleneck in this scenario. The spindles are delivering as many operations as the are capable of. So what happens if we deduplication this data?

Read more…

Sun 7000 Online Capacity Calculator

July 2nd, 2009 Jesse St. Laurent Comments off

Sun has published a usable capacity calculator available for the Sun 7000. It was originally written by Adam Leventhal and the latest update from Ryan Matthews is available here. The calculator connects to a 7000 series appliance (or simulator) to calculate the usable capacity. Unfortunately, not everyone has easy access to a system. This is an online version of the calculator so you do not need to have a system locally. It is nothing fancy, but it should get the job done.

The online calculator is here.

Categories: Storage Tags: ,

Deduplication – Sometimes it’s about performance

June 11th, 2009 Jesse St. Laurent Comments off

In a previous post I discussed the topic of deduplication for capacity optimization. Removing redundant data blocks on disk is the first, and most obvious, phase of deduplication in the marketplace. It helps to drive down the most obvious cost – the cost per GB of disk capacity. This market has grown quickly over the last few years. Both startups and established storage vendors have products that compete in the space. They are most commonly marketed as virtual tape library (VTL) or disk-to-disk backup solutions.

Does that mean that deduplication is a point solution for highly sequential workloads? No. There is another somewhat less obvious benefit of deduplication.

What storage administrator does not ask for more cache in the storage array? If I can afford 8GB, I want 16GB. If the system supports 16GB, I want 32GB. Whether it is for financial or technical reasons, cache is always limited. What about deduplicating the data in cache? When the workload is streaming sequential backup data from disk, this may not be very helpful. However, in a primary storage system with a more varied workload, this becomes very interesting.

Read more…

Sun 7000 FAQ

April 21st, 2009 Jesse St. Laurent Comments off

Over the course of the next several months, we will be posting a number of FAQs here. The Sun 7000 is the first one to go live. There is a link to the new FAQ in the menu bar on the blog front page. If any of the data is inaccurate, please email. If there is something missing, send the question along and we will take a look at it.

Categories: Storage Tags: , , , , ,

Do I need more cache in my NetApp?

February 27th, 2009 Jesse St. Laurent 3 comments

How many times have you wondered whether you could improve the performance of your storage array by adding additional cache?

Will more cache improve the performance of my storage array? This is what the vendors so often tell us, but they have no objective information to explain why it is going to help. Depending on the workload, increasing the cache may have little or no effect on performance.

There are two ways to know whether your environment will benefit from additional cache. The first is to understand every nuance of your application. Most storage managers I speak with classify this as impractical at best and impossible at worst. Even if you have an application with a very well understood workload, most storage devices are not hosting a single application. Instead, they are the hosting many different applications. It is even more complex to understand how this combined workload will be effected by adding cache.

The second way to measure cache benefit is to put the cache in and see what happens. This is the most common approach I see in the field. When performance becomes unacceptable, the options of adding additional disk and/or cache are weighed and a purchase is made. (I will save the topic of adding spindles to increase performance for a future post.) Both of these options force a purchase to be made with no guarantee it will solve the problem.

NetApp has introduced a tool to provide a 3rd option: Predictive Cache Statistics. It provides the objective data needed to rationalize a hardware purchase. Predictive Cache Statistics (PCS) is available in systems running 7.3+ and having at least 2GB of memory. When it is enabled, PCS reports what the cache hit ratio would be if the system had 2x (ec0), 4x (ec1), and 8x (ec2) the current cache footprint. (ec0, ec1, and ec2 are the names of the extended caches when the stats are presented by the NetApp system.)

Now, let’s drill down into exactly how predictive cache statistics work…

Read more…

Categories: Storage Tags: , , ,

WAN optimization for array replication

January 27th, 2009 Jesse St. Laurent 1 comment

As the need for disaster recovery continues to move downmarket from the enterprise to medium and small businesses, the number of IT shops replicating their data to an offsite location is increasing. Array based replication was once a feature reserved for the big budgets of the Fortune 1000. Today, array based replication is a feature that is available on most midrange storage devices (and even some of the entry level products).

This increase in replication deployments has created a new challenge for IT. The most common replication solutions move the data over the IP network. That data puts a significant load on the IP network infrastructure. The LAN infrastructure is almost always up to the task, but the WAN is often not able to handle this new burden. While the prices of network infrastructure have come down over the years, big pipes are still an expensive monthly outlay. So, how do we get that data offsite without driving up those WAN costs? WAN optimization technology provides a potential solution.

Not every workload or protocol can benefit from today’s WAN optimization technology, but replication is one that usually gets a big boost. I gathered some data from a client who is using NetApp SnapMirror to replicate to a remote datacenter and deployed  WAN optimization to prevent a major WAN upgrade.

Read more…

Sun Storage 7000 Analytics Overview

December 17th, 2008 Peter Galvin Comments off

With the release of the Sun Storage 7000 line of storage appliances, Sun has included a new “Analytics” toolkit. These analytics are based on DTrace (http://en.wikipedia.org/wiki/DTrace), but essentially hide the DTrace complexity in a cloak of Ajax-based browser graphics. Through the GUI, a storage administrator can determine which clients are causing which files on the server to be “hot”, or resource use-intensive. Also the administrator can see the latency of each request to the blocks of that file, or how many request of each protocol are being processed, or how many cache hits a file had. In this blog I’ll explore the basics of Analytics.

The analytics component of the Sun Storage 7000 line can provide useful information to a storage administrator who is trying to manage and monitor the appliance and the files and blocks stored there. Just like DTrace, the analytics run in real time, and allow quick progression from hypothesis through data gathering to new hypothesis, data and conclusions. Unlike DTrace, the analytics component has a very complete and useful graphical interface and visualization engine.

Read more…

Categories: Storage Tags: , , ,