Do I need more cache in my NetApp?

February 27th, 2009

How many times have you wondered whether you could improve the performance of your storage array by adding additional cache?

Will more cache improve the performance of my storage array? This is what the vendors so often tell us, but they have no objective information to explain why it is going to help. Depending on the workload, increasing the cache may have little or no effect on performance.

There are two ways to know whether your environment will benefit from additional cache. The first is to understand every nuance of your application. Most storage managers I speak with classify this as impractical at best and impossible at worst. Even if you have an application with a very well understood workload, most storage devices are not hosting a single application. Instead, they are hosting many different applications. It is even more complex to understand how this combined workload will be affected by adding cache.

The second way to measure cache benefit is to put the cache in and see what happens. This is the most common approach I see in the field. When performance becomes unacceptable, the options of adding additional disk and/or cache are weighed and a purchase is made. (I will save the topic of adding spindles to increase performance for a future post.) Both of these options force a purchase to be made with no guarantee it will solve the problem.

NetApp has introduced a tool to provide a third option: Predictive Cache Statistics. It provides the objective data needed to rationalize a hardware purchase. Predictive Cache Statistics (PCS) is available in systems running Data ONTAP 7.3 or later and having at least 2GB of memory. When it is enabled, PCS reports what the cache hit ratio would be if the system had 2x (ec0), 4x (ec1), and 8x (ec2) the current cache footprint. (ec0, ec1, and ec2 are the names of the extended caches when the stats are presented by the NetApp system.)

Now, let’s drill down into exactly how predictive cache statistics work…

In most conditions there is no significant impact on system performance. I monitored the change in latency on my test system with PCS enabled and disabled and there was no measurable difference. The storage controller was running at about 25% CPU utilization at the time with a 40% cache hit rate. NetApp warns in their docs that performance can be affected when the storage controller is at 80% CPU utilization or higher. That is understandable given the amount of information the array has to track in order to provide the cache statistics. This simply means some thought needs to be put into when PCS is enabled and how long it runs in production.
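The 80% guidance can be turned into a quick pre-check before enabling PCS. The following is a minimal sketch, not a NetApp tool: it assumes you have pasted a few `sysstat -x` data lines (the sample comes from Example 2 later in this post) into a string, and the helper names `cpu_samples` and `pcs_looks_safe` are hypothetical.

```python
# Sample sysstat -x data lines, copied from Example 2 in this post.
SYSSTAT = """\
 27% 11607     0     0   11607  2173 49352  27850      6     0     0     3   41%   0%  -   99%      0     0     0     0     0     0
 27% 11642     0     0   11642  2180 49518  28097    279     0     0     3   41%  21%  T   99%      0     0     0     0     0     0
 26% 11413     0     0   11413  2138 48511  27773     11     0     0     3   41%   0%  -   99%      0     0     0     0     0     0
"""


def cpu_samples(text):
    """Extract the CPU column (first field, e.g. '27%') from sysstat data lines."""
    samples = []
    for line in text.splitlines():
        parts = line.split()
        # Header lines start with words such as 'CPU' or 'in', so they are skipped.
        if parts and parts[0].endswith("%") and parts[0][:-1].isdigit():
            samples.append(int(parts[0][:-1]))
    return samples


def pcs_looks_safe(samples, limit=80):
    """True if every CPU sample is below the limit (80% per the NetApp warning)."""
    return bool(samples) and max(samples) < limit


if __name__ == "__main__":
    samples = cpu_samples(SYSSTAT)
    print(samples, pcs_looks_safe(samples))
```

A handful of samples only describes the moment they were taken, so it is worth collecting them across the busiest part of the day before deciding.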

Here are the steps required to gather the information:

1) Enable Predictive Cache Statistics (PCS)

options flexscale.enable pcs

2) It is important to allow the workload to run until the virtual caches have time to warm up. In a system with a large amount of cache, this can take hours or even days. Monitor array performance while the storage workload runs. If latency increases to unacceptable levels, you can disable PCS:

options flexscale.enable off

3) The NetApp perfstat tool can be used to capture and analyze the data that is gathered. I prefer instant gratification, so for this example, I will use the real-time stats command.

stats show -p flexscale-pcs

The way the results are reported can be a little confusing the first time you look at them. The ec0, ec1, and ec2 ‘virtual caches’ are sized relative to the base cache in the system being tested, so that the cumulative virtual cache is 2x, 4x, and 8x the base. If the test system has 16GB of primary cache, ec0 will represent 32GB of ‘virtual cache’ (2x 16GB). ec1 brings the ‘virtual cache’ to a total of 4x base cache, or an additional 32GB beyond ec0. ec2 brings the total to 8x base cache, or an additional 64GB beyond ec0 + ec1. The statistics on each line represent the values for that specific cache segment. Hopefully that explanation clears up more confusion than it introduces.
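The sizing arithmetic is easy to check with a few lines of Python. This is a sketch for illustration only: the helper names are hypothetical, and the 4KB block size is an assumption (it is what makes the Blocks column in the PCS output below line up with 32GB and 64GB).

```python
BLOCK_SIZE = 4096  # bytes; assumption: PCS counts cache in 4KB blocks


def pcs_segments(base_gb):
    """Size of each virtual segment so the running total is 2x, 4x, 8x the base cache."""
    sizes = {}
    previous_total = 0
    for name, multiple in (("ec0", 2), ("ec1", 4), ("ec2", 8)):
        total = multiple * base_gb
        sizes[name] = total - previous_total  # only the part added by this segment
        previous_total = total
    return sizes


def blocks_to_gb(blocks, block_size=BLOCK_SIZE):
    """Convert a block count from the PCS 'Blocks' column to GB (2**30 bytes)."""
    return blocks * block_size / 2**30


if __name__ == "__main__":
    print(pcs_segments(16))        # segment sizes for a 16GB base cache
    print(blocks_to_gb(8388608))   # ec0 and ec1 block count from the output below
    print(blocks_to_gb(16777216))  # ec2 block count from the output below
```

For a 16GB base this gives 32GB, 32GB, and 64GB for ec0, ec1, and ec2, which matches the block counts of 8388608, 8388608, and 16777216 under the 4KB assumption.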

Here are a couple of examples. This testing was completed on a NetApp FAS3170. The 3170 platform has 16GB of cache standard. So, in these examples, ec0 is 32GB, ec1 is 32GB, and ec2 is 64GB.

Example 1: 8GB working set, 4KB IO, and 100% random reads

fas3170-a> sysstat -x 5
 CPU   NFS  CIFS  HTTP   Total    Net  kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in    out   read  write  read write   age   hit time  ty util                 in   out    in   out
 39% 39137     0     0   39137  7102 165539    206    370     0     0   >60  100%   3%  T    2%      0     0     0     0     0     0
 39% 39882     0     0   39882  7236 168677    136      6     0     0   >60  100%   0%  -    1%      0     0     0     0     0     0
 39% 39098     0     0   39098  7094 165338    186    285     0     0   >60  100%   3%  T    2%      0     0     0     0     0     0

fas3170-a> stats show -p flexscale-pcs
Instance    Blocks Usage   Hit  Miss Hit Evict Invalidate Insert
                       %    /s    /s   %    /s         /s     /s
     ec0   8388608     0     0     0   0     0          0      0
     ec1   8388608     0     0     0   0     0          0      0
     ec2  16777216     0     0     0   0     0          0      0
---
     ec0   8388608     0     0     0   0     0          0      0
     ec1   8388608     0     0     0   0     0          0      0
     ec2  16777216     0     0     0   0     0          0      0
---
     ec0   8388608     0     0     0   0     0          0      0
     ec1   8388608     0     0     0   0     0          0      0
     ec2  16777216     0     0     0   0     0          0      0

The sysstat shows a cache hit rate of 100%. This is exactly what we would expect for an 8GB dataset on a system with 16GB of cache. The stats command shows that PCS is currently reporting no activity. Again, this is exactly what we should expect with a working set that fits completely in main cache.

Example 2: 30GB working set, 4KB IO, and 100% random reads

fas3170-a> sysstat -x 5
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 27% 11607     0     0   11607  2173 49352  27850      6     0     0     3   41%   0%  -   99%      0     0     0     0     0     0
 27% 11642     0     0   11642  2180 49518  28097    279     0     0     3   41%  21%  T   99%      0     0     0     0     0     0
 26% 11413     0     0   11413  2138 48511  27773     11     0     0     3   41%   0%  -   99%      0     0     0     0     0     0

fas3170-a> stats show -p flexscale-pcs
Instance    Blocks Usage   Hit  Miss Hit Evict Invalidate Insert
                       %    /s    /s   %    /s         /s     /s
     ec0   8388608     1    38  8560   0     0          0  14811
     ec1   8388608     0     0  8560   0     0          0      0
     ec2  16777216     0     0  8560   0     0          0      0
---
     ec0   8388608     1    65  6985   0     0          0      0
     ec1   8388608     0     0  6985   0     0          0      0
     ec2  16777216     0     0  6985   0     0          0      0
---
     ec0   8388608     1   100  6922   1     0          0  11899
     ec1   8388608     0     0  6922   0     0          0      0
     ec2  16777216     0     0  6922   0     0          0      0

This data was gathered after the 30GB workload had been running for a few minutes, but just after I enabled predictive cache statistics. The PCS data shows that there are very few hits, but there are a significant number of inserts. This is what we should expect when PCS is first enabled. The sysstat output shows a cache hit rate of 41%.

fas3170-a> sysstat -x 5
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 27% 11238     0     0   11238  2105 47784  27862    286     0     0     4   40%  18%  T   99%      0     0     0     0     0     0
 26% 11371     0     0   11371  2130 48349  27934     11     0     0     4   40%   0%  -   99%      0     0     0     0     0     0
 27% 11184     0     0   11184  2096 47554  27938    275     0     0     4   40%  33%  T   99%      0     0     0     0     0     0

fas3170-a> stats show -p flexscale-pcs
Instance    Blocks Usage   Hit  Miss Hit Evict Invalidate Insert
                       %    /s    /s   %    /s         /s     /s
     ec0   8388608    87  6536   456  93   933          0    934
     ec1   8388608     6   453     3  99     0        934    933
     ec2  16777216     0     0     3   0     0          0      0
---
     ec0   8388608    87  6512   435  93     0          0      0
     ec1   8388608     6   435     0 100     0          0      0
     ec2  16777216     0     0     0   0     0          0      0
---
     ec0   8388608    87  6472   450  93   963          0    964
     ec1   8388608     6   445     5  98     0        964    963
     ec2  16777216     0     0     5   0     0          0      0

Now that the ec0 virtual cache has warmed up, the potential value of additional cache becomes more apparent. The ec0 hit rate has gone up to 93% and it is servicing over 6,500 operations per second. These cache hits are virtual, so today those ‘hits’ still result in disk reads. With 32GB of additional cache, more than 6,500 disk reads per second would be avoided and latency would be dramatically reduced. Clearly, the additional cache will provide a major performance boost, but unfortunately, it is impossible to determine exactly how it will affect overall system performance. The current bottleneck, reads from disk, would be alleviated, but that simply means we will find the next one.
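If you collect many samples, it helps to parse the output rather than read it by eye. The sketch below is an illustration, not part of any NetApp tooling: the field names and the helpers `parse_pcs` and `cumulative_hits` are hypothetical, and the sample is the first warmed-up interval shown above. Adding the hits of successive segments assumes that each segment only sees the misses of the one before it, which is what these numbers suggest (ec1 hits plus misses equal ec0 misses) but which I have not confirmed in the documentation.

```python
SAMPLE = """\
Instance    Blocks Usage   Hit  Miss Hit Evict Invalidate Insert
                       %    /s    /s   %    /s         /s     /s
     ec0   8388608    87  6536   456  93   933          0    934
     ec1   8388608     6   453     3  99     0        934    933
     ec2  16777216     0     0     3   0     0          0      0
"""

FIELDS = ("blocks", "usage_pct", "hit_per_s", "miss_per_s",
          "hit_pct", "evict_per_s", "invalidate_per_s", "insert_per_s")


def parse_pcs(text):
    """Parse one interval of 'stats show -p flexscale-pcs' into {segment: {field: int}}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only data rows: a segment name (ec0, ec1, ...) plus eight numeric columns.
        if len(parts) == len(FIELDS) + 1 and parts[0].startswith("ec"):
            rows[parts[0]] = dict(zip(FIELDS, map(int, parts[1:])))
    return rows


def cumulative_hits(rows):
    """Virtual hits per second with ec0 only, ec0+ec1, and ec0+ec1+ec2."""
    totals = {}
    running = 0
    for name in ("ec0", "ec1", "ec2"):
        running += rows[name]["hit_per_s"]
        totals[name] = running
    return totals


if __name__ == "__main__":
    rows = parse_pcs(SAMPLE)
    print(cumulative_hits(rows))
```

On this sample, almost all of the benefit comes from ec0 (6,536 of 6,989 virtual hits per second), which is what you would expect when a 30GB working set meets 32GB of extra cache.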

Additional cache can be added to most NetApp systems in the form of a Performance Accelerator Module (PAM). The PAM is a PCI Express card with 16GB of DRAM on it. It plugs directly into one of the PCI Express slots in the filer. I suspect there is a slight increase in latency when accessing data in the PAM compared with the main system cache, although this increase is likely so small that it will not be noticed on the client side, as it is a very small portion of the total transaction time from the client perspective. Unfortunately, I do not have first-hand performance data that I can share, as I have not been able to get access to a PAM for complete lab testing.

It is important to note that a system with 16GB of primary cache and 32GB of PAM cache is not the same as a system with 48GB of primary cache. The PAM cache is populated as items are evicted from primary cache. If there is a hit in the PAM, that block is copied back into primary cache. This type of cache is commonly referred to as a victim cache or an L2 cache. If the goal is to serve a working set without ever going to disk, then that working set needs to fit into extended cache, not the primary cache plus extended cache.
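A toy simulation makes the victim-cache point easier to see. This is only a sketch under simplifying assumptions: both caches use LRU eviction, reads are uniformly random, a hit in the victim cache copies the block to primary without removing it from the victim cache, and sizes are in arbitrary block units. It is not a model of NetApp's actual implementation, and the function name `simulate` is hypothetical.

```python
import random
from collections import OrderedDict


def simulate(working_set, primary_size, victim_size, accesses=10_000, seed=0):
    """Count disk reads for random reads over `working_set` blocks."""
    rng = random.Random(seed)
    primary = OrderedDict()  # LRU order, oldest entry first
    victim = OrderedDict()   # filled only by evictions from primary
    disk_reads = 0
    for _ in range(accesses):
        block = rng.randrange(working_set)
        if block in primary:
            primary.move_to_end(block)  # primary hit: refresh its LRU position
            continue
        if block not in victim:
            disk_reads += 1             # missed both caches
        # Either way the block lands in primary; on a victim hit this is a copy,
        # so the block also stays in the victim cache.
        primary[block] = True
        if len(primary) > primary_size:
            evicted, _ = primary.popitem(last=False)
            victim[evicted] = True
            victim.move_to_end(evicted)  # re-inserted blocks become the newest entry
            if len(victim) > victim_size:
                victim.popitem(last=False)  # dropped from the victim cache entirely
    return disk_reads


if __name__ == "__main__":
    print("working set fits in victim cache:", simulate(8, 4, 8))
    print("working set equals primary + victim:", simulate(12, 4, 8))
```

In this model, a working set that fits in the victim cache reads each block from disk at most once. When the working set is as large as primary plus victim combined, blocks held in both caches waste space, so some blocks fall out of cache and reads keep going back to disk.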

Predictive cache statistics are a great feature. They give us the power to answer a question we could only guess at in the past. However, like most end users, I always want more. There are a couple of things that I would love to see in the future. First, the PAM cards are 16GB in size. It would be great if the extended cache segments reported by PCS could be in 16GB increments. That would make it even easier to determine the value of each card I add. It would also remove all the confusion around how big ec0, ec1, and ec2 are. Second, the ability to reset the PCS counters back to zero would be helpful. When testing different workloads, this would allow the stats to be associated with each individual workload.

It is worth noting that this was not a performance test and the data above should be treated as such. Nothing was done to either the client or the filer to optimize NFS performance. In an attempt to prevent these numbers from being used to judge system performance, I am intentionally omitting the details of how the disk was configured.

  1. Mike Lugassy
    March 4th, 2009 at 11:59 | #1

    Great post!..very useful to help customers if they can benefit from PAM!

  2. Jeff DiNisco
    March 18th, 2009 at 08:31 | #2

    Good article. It would be great to see a before and after. Please post your results when you do get your hands on a PAM card.

  3. April 1st, 2009 at 11:13 | #3

    We were just in the midst of deciding whether to go for another shelf or a PAM module. My hunch was that we needed another shelf and the data supported my hunch. Thank you for the article! It was like gold to me.
