Archive

Posts Tagged ‘SAN’

The “Problem” with NAS

December 23rd, 2010 Comments off

Introduction

Computer storage has evolved from Direct Attached Storage (DAS) to Storage Area Networks (SAN). Along the way, Sun in 1984 invented NFS, and Network Attached Storage (NAS) was born. Since then other NAS protocols have been added, most notably the Windows-based Server Message Block (SMB), aka CIFS. But throughout the history of storage, NAS has been regarded as poorly performing and unreliable compared to SAN and DAS. Certainly Auspex’s creation and NetApp’s advancement of NAS “appliances” helped move NAS from being a science project to a mainstream production solution, but in my opinion NAS is still under-appreciated and under-deployed. Perhaps in light of the new generation of NAS appliances, that should change.

At a more philosophical level, it’s worth asking “what is SAN” and “what is NAS.” Fundamentally, they are storage arrays that make disk space available via varying protocols over varying interconnect media. For the most part, both technologies are available with Fibre Channel (FC), SATA, and SAS disks. Both have disks of varying speeds, capacities, and performance. Traditionally, SANs have been FC connected and NAS appliances connected via Ethernet, but many current products provide both interconnects: block transactions occur via FC or iSCSI and file transactions over Ethernet. A proof point of this merger of NAS and SAN is the FCoE protocol, which encapsulates Fibre Channel frames for transport over Ethernet networks. Perhaps the most straightforward definition is that “SAN” is block-based storage and “NAS” is file storage, and that a given datacenter should choose which to use for any given application or function. After those decisions are made, it is easier to determine the best products to implement the resulting storage architecture. Now let’s consider the problem with NAS as well as the solutions it can provide.

The “Problem”

Over the years I’ve seen many, many computing infrastructures. Back in the “old days” (say, the 1980s), we had servers and SANs for production, and NAS was pushed to the side. It was typically used for home directories and the storage of utility programs, if at all. In those cases, NAS storage was mounted to all servers as well as all workstations.

That helped NAS gain a reputation for unreliability—probably because any failure caused everyone to notice it, and failures were difficult to recover from (with hard mounts never timing out, for example, taking down all computing until the NAS server could be fixed). Also, many situations called for “cross mounts,” where servers would mount each other’s directories via NFS. If one server then failed, all servers would eventually end up hanging until the failed one recovered. NFS also had quirks like “stale file handles” that left a bad taste in the mouth.

So failures of NFS servers were quite painful to the computing infrastructure. Why did NAS servers fail as often as they did? Well, they were non-clustered, while their SAN brethren typically had more redundant components and automatic recovery from problems. Originally, a “NAS server” was just a general-purpose Sun server running NFS, whereas a SAN array was originally, and usually still is, a purpose-built storage array. Also, NAS servers were and still are network-connected. Back in the day, there was typically one network connection to each workstation (and frequently between servers as well). That one link was used for NAS and non-NAS network traffic. Even if there was a separate network carved out for storage communication between the servers and NAS, it was rarely redundant. Multiple use and single points of failure meant NAS was more prone to failure than SAN. Thus the lingering impression that SAN is more reliable than NAS.
Read more…

Deduplication – Sometimes it’s about performance

June 11th, 2009 Comments off

In a previous post I discussed the topic of deduplication for capacity optimization. Removing redundant data blocks on disk is the first, and most obvious, phase of deduplication in the marketplace. It helps to drive down the most obvious cost – the cost per GB of disk capacity. This market has grown quickly over the last few years. Both startups and established storage vendors have products that compete in the space. They are most commonly marketed as virtual tape library (VTL) or disk-to-disk backup solutions.

Does that mean that deduplication is a point solution for highly sequential workloads? No. There is another somewhat less obvious benefit of deduplication.

What storage administrator does not ask for more cache in the storage array? If I can afford 8GB, I want 16GB. If the system supports 16GB, I want 32GB. Whether it is for financial or technical reasons, cache is always limited. What about deduplicating the data in cache? When the workload is streaming sequential backup data from disk, this may not be very helpful. However, in a primary storage system with a more varied workload, this becomes very interesting.
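To make the idea concrete, here is a minimal Python sketch of a cache that keeps only one copy of each unique block. It assumes fixed-size blocks fingerprinted with SHA-256 and a simple LRU eviction policy. The class name DedupCache, its methods, and the "VM boot block" scenario are hypothetical illustrations, not a description of any vendor's implementation.

```python
import hashlib
from collections import OrderedDict


class DedupCache:
    """LRU cache that stores each unique block payload only once.

    `capacity` counts unique payloads held in memory, not logical addresses.
    Simplification: the address-to-fingerprint map is never trimmed.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.addr_to_fp = {}         # logical block address -> fingerprint
        self.blocks = OrderedDict()  # fingerprint -> payload, in LRU order

    def get(self, addr):
        fp = self.addr_to_fp.get(addr)
        if fp is None or fp not in self.blocks:
            return None              # miss: the caller would read from disk
        self.blocks.move_to_end(fp)  # mark payload as most recently used
        return self.blocks[fp]

    def put(self, addr, payload):
        fp = hashlib.sha256(payload).hexdigest()
        self.addr_to_fp[addr] = fp
        if fp in self.blocks:
            self.blocks.move_to_end(fp)          # already cached: no new memory
        else:
            self.blocks[fp] = payload
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


# Hypothetical workload: 100 virtual machines share an identical boot block.
cache = DedupCache(capacity=2)
boot_block = b"\x00" * 4096
for vm in range(100):
    cache.put(("vm%d" % vm, 0), boot_block)
cache.put(("vm0", 1), b"\x01" * 4096)  # one block unique to vm0

cached_payloads = len(cache.blocks)
boot_hits = sum(cache.get(("vm%d" % vm, 0)) is not None for vm in range(100))
```

In this toy workload, 101 logical blocks are served from two cached payloads, which is why deduplicated cache is more interesting for varied primary workloads with shared content than for a sequential backup stream that reads each block once.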

Read more…

Do I need more cache in my NetApp?

February 27th, 2009 3 comments

How many times have you wondered whether you could improve the performance of your storage array by adding additional cache?

Will more cache improve the performance of my storage array? Vendors often tell us it will, but they rarely have objective information to explain why it is going to help. Depending on the workload, increasing the cache may have little or no effect on performance.

There are two ways to know whether your environment will benefit from additional cache. The first is to understand every nuance of your application. Most storage managers I speak with classify this as impractical at best and impossible at worst. Even if you have an application with a very well understood workload, most storage devices are not hosting a single application. Instead, they are hosting many different applications. It is even more complex to understand how this combined workload will be affected by adding cache.

The second way to measure cache benefit is to put the cache in and see what happens. This is the most common approach I see in the field. When performance becomes unacceptable, the options of adding disk and/or cache are weighed and a purchase is made. (I will save the topic of adding spindles to increase performance for a future post.) Both of these options force a purchase to be made with no guarantee it will solve the problem.

NetApp has introduced a tool to provide a third option: Predictive Cache Statistics. It provides the objective data needed to justify a hardware purchase. Predictive Cache Statistics (PCS) is available in systems running Data ONTAP 7.3 or later and having at least 2GB of memory. When it is enabled, PCS reports what the cache hit ratio would be if the system had 2x (ec0), 4x (ec1), and 8x (ec2) the current cache footprint. (ec0, ec1, and ec2 are the names of the extended caches when the stats are presented by the NetApp system.)
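The general idea of predicting hit ratios for larger caches can be sketched by replaying a block-access trace through simulated caches of several sizes. The Python below is only an illustration under stated assumptions: it uses a plain LRU policy and a made-up trace, and the function names lru_hit_ratio and predict_cache_benefit are hypothetical. It is not how NetApp implements PCS.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, cache_blocks):
    """Replay a block-access trace through an LRU cache of `cache_blocks` entries."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return hits / len(trace) if trace else 0.0


def predict_cache_benefit(trace, current_blocks, multipliers=(1, 2, 4, 8)):
    """Return {multiplier: simulated hit ratio} for each hypothetical cache size."""
    return {m: lru_hit_ratio(trace, current_blocks * m) for m in multipliers}


# Made-up workload: a working set of 4 blocks read in a loop 10 times,
# against a current cache that holds only 2 blocks.
trace = [0, 1, 2, 3] * 10
prediction = predict_cache_benefit(trace, current_blocks=2)
# prediction == {1: 0.0, 2: 0.9, 4: 0.9, 8: 0.9}
```

In this toy trace, doubling the cache takes the hit ratio from 0% to 90% because the whole working set now fits, while going to 4x or 8x adds nothing. That is the kind of objective data that shows whether, and how much, extra cache is worth buying for a given workload.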

Now, let’s drill down into exactly how predictive cache statistics work…

Read more…

Categories: Storage

Sun Storage 7000 Analytics Overview

December 17th, 2008 Comments off

With the release of the Sun Storage 7000 line of storage appliances, Sun has included a new “Analytics” toolkit. These analytics are based on DTrace (http://en.wikipedia.org/wiki/DTrace), but essentially hide the DTrace complexity in a cloak of Ajax-based browser graphics. Through the GUI, a storage administrator can determine which clients are causing which files on the server to be “hot”, or resource use-intensive. The administrator can also see the latency of each request to the blocks of that file, how many requests of each protocol are being processed, or how many cache hits a file had. In this post I’ll explore the basics of Analytics.

The analytics component of the Sun Storage 7000 line can provide useful information to a storage administrator who is trying to manage and monitor the appliance and the files and blocks stored there. Just like DTrace, the analytics run in real time, and allow quick progression from hypothesis through data gathering to new hypothesis, data and conclusions. Unlike DTrace, the analytics component has a very complete and useful graphical interface and visualization engine.

Read more…

Categories: Storage