
Author Archive

Deduplication – Sometimes it’s about performance

June 11th, 2009

In a previous post I discussed deduplication for capacity optimization. Removing redundant data blocks on disk is the first, and most obvious, phase of deduplication in the marketplace. It helps drive down the most visible cost: the cost per GB of disk capacity. This market has grown quickly over the last few years, and both startups and established storage vendors have products that compete in the space. They are most commonly marketed as virtual tape library (VTL) or disk-to-disk backup solutions.

Does that mean that deduplication is a point solution for highly sequential workloads? No. There is another, somewhat less obvious, benefit of deduplication.

What storage administrator does not ask for more cache in the storage array? If I can afford 8GB, I want 16GB. If the system supports 16GB, I want 32GB. Whether it is for financial or technical reasons, cache is always limited. What about deduplicating the data in cache? When the workload is streaming sequential backup data from disk, this may not be very helpful. However, in a primary storage system with a more varied workload, this becomes very interesting.
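To make the idea concrete, here is a minimal sketch of a deduplicated read cache. It is a design I made up for illustration, not any vendor's implementation: each logical block address points at a content fingerprint, and the cache keeps one copy per fingerprint with LRU eviction. `DedupReadCache` and its methods are hypothetical names.

```python
import hashlib
from collections import OrderedDict


class DedupReadCache:
    """Hypothetical read cache that stores each unique block only once.

    Logical block addresses (LBAs) map to a content fingerprint, and the
    fingerprint maps to the cached bytes. Eviction is LRU over fingerprints,
    so a block shared by many LBAs occupies a single cache slot.
    """

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.lba_to_fp = {}            # logical address -> fingerprint
        self.blocks = OrderedDict()    # fingerprint -> block bytes, LRU order

    @staticmethod
    def fingerprint(data):
        return hashlib.sha256(data).hexdigest()

    def insert(self, lba, data):
        fp = self.fingerprint(data)
        self.lba_to_fp[lba] = fp
        if fp in self.blocks:
            self.blocks.move_to_end(fp)   # already cached: only refresh LRU
            return
        self.blocks[fp] = data
        if len(self.blocks) > self.capacity:
            evicted, _ = self.blocks.popitem(last=False)
            # Drop every LBA mapping that pointed at the evicted block
            # (a linear scan, acceptable for a sketch).
            self.lba_to_fp = {k: v for k, v in self.lba_to_fp.items() if v != evicted}

    def read(self, lba):
        fp = self.lba_to_fp.get(lba)
        if fp is None or fp not in self.blocks:
            return None   # miss: the caller would fetch the block from disk
        self.blocks.move_to_end(fp)
        return self.blocks[fp]
```

For example, if 100 virtual guests all read the same 4KB operating system block, `insert` is called 100 times but `len(cache.blocks)` stays at 1, leaving the rest of the cache for other data.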


Why Oracle is NOT going to sell off Sun’s hardware business

Why is there such a buzz among analysts, the press, and the blogging community that Oracle is going to sell off the Sun hardware business? It makes no sense to me. I shared my thoughts on the acquisition in a previous post, but I am going to elaborate a bit here. Not only do I believe Oracle will continue selling Sun hardware, I think it is the primary reason they bought Sun.

Why would Oracle spend $7.4B to buy Sun? Is it for Solaris? I don’t think so. Solaris is open source and Sun would have welcomed Oracle’s help in tuning the operating system for Oracle’s software applications. Is it for Java? That is a little more plausible, but there was no need for Oracle to control Java. As far as I know, Sun was not doing anything to make it difficult for Oracle to use Java. Oracle is buying Sun for the hardware business. The hardware (and support) business is what generates the revenue at Sun.

I would like to share a few relevant quotes. The first comes from Larry Ellison in a recent interview. He did his best to shut down the rumor mill churning on what will happen to Sun’s hardware business.

Interviewer – “Are you going to exit the hardware business?”


Categories: Storage, Systems

Oracle to buy Sun for $7.4B – How will it affect the industry?

April 24th, 2009

A few weeks ago, it looked like IBM was going to make a deal to purchase Sun. That fell through when the Sun board could not come to an agreement. On April 20th, with very little rumor in the marketplace, Oracle announced they were buying Sun for $7.4B in cash. What does this mean for the new company?

  • To quote a recent Oracle publication, “Oracle plans to engineer and deliver an integrated system – applications to disk – where all the pieces fit and work together, so customers do not have to do it themselves.” Sun is already shipping InfiniBand switches and blades with InfiniBand on the motherboard. They have also said that IB is on the roadmap for the Sun 7000; Andy Bechtolsheim mentioned it at the Sun product announcement on April 14th. What about an integrated Oracle appliance running on Nehalem blades, Solaris x64, and Sun 7000 storage, connected by InfiniBand switches? It should not be a major technology leap to put it all together. What would this mean for the Oracle/HP appliance?
  • Under the current Oracle pricing model, Sun SPARC processors are at a pricing disadvantage relative to IBM Power processors. Oracle has never been afraid to use pricing to move the market in their direction. Watch for them to use their pricing model to encourage customers to buy Sun servers.
  • Solaris x64 has been intentionally neglected by Oracle. Oracle delivers patches on Solaris SPARC and Linux immediately, but they have historically waited up to 6 months to release the same patch for Solaris x64. In the past, this has helped Oracle push their Linux agenda in the marketplace. Given the ease of porting the Oracle patches to Solaris x64, there is no logical technical reason for this lag. Watch for Solaris x64 to become a first-class citizen in the Oracle OS support matrix now that growth of Solaris x64 means growth for Oracle.
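The pricing lever in the second point comes from Oracle's per-processor licensing, where the license count is the number of cores multiplied by a per-architecture core factor, with fractions rounded up. The sketch below shows that arithmetic; the factors in the example are placeholders for illustration, not entries from Oracle's actual core factor table, and `oracle_processor_licenses` is a hypothetical helper.

```python
import math
from fractions import Fraction


def oracle_processor_licenses(cores, core_factor):
    """Hypothetical helper: licenses = cores x core factor, rounded up.

    Fraction(str(...)) avoids float artifacts such as 10 * 0.3 evaluating
    to 3.0000000000000004 and then rounding up to 4.
    """
    return math.ceil(cores * Fraction(str(core_factor)))


# Two hypothetical 16-core servers with placeholder core factors.
for name, factor in [("server A", 0.75), ("server B", 0.5)]:
    print(name, oracle_processor_licenses(16, factor))
```

A change of one core factor moves every license on that platform, which is why the pricing table is such an effective way to steer hardware choices.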


Categories: Storage, Systems

Deduplication – It’s not just about capacity

April 10th, 2009

There is no debating that deduplication is one of the hottest topics in IT. The question is whether the hype has started to outgrow the technology. Today, two primary use cases drive deduplication in the marketplace: backup to disk, and virtual guest operating systems (VMware, Hyper-V, and Xen guests). (I will talk a bit about the disk-to-disk scenario in this article and the virtual guest topic in the next one.) Both are logical markets for deduplication because they share a common challenge: they create a tremendous amount of redundant data on the disk array. The goal in both cases is to pack more data onto each disk drive and reduce the cost per GB. This is the first and most obvious use case for deduplication.
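Here is a minimal sketch of block-level deduplication, assuming fixed-size 4KB blocks and SHA-256 fingerprints (many backup products use variable-size chunking instead, which this does not attempt). The function names are hypothetical. It shows why repeated full backups deduplicate so well: identical blocks are stored once, and a "recipe" of fingerprints rebuilds the original stream.

```python
import hashlib


def dedupe_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, recipe): store maps fingerprint -> block bytes, and
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)   # only the first copy is kept
        recipe.append(fp)
    return store, recipe


def dedupe_ratio(data, block_size=4096):
    """Logical bytes divided by physically stored bytes."""
    store, _ = dedupe_blocks(data, block_size)
    stored = sum(len(b) for b in store.values())
    return len(data) / stored if stored else 1.0
```

If a 1MB data set with no internal duplication is backed up in full seven nights in a row, `dedupe_ratio` reports 7.0: seven copies' worth of logical data, one copy on disk.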

Disk drive capacity is growing exponentially, but disk performance is improving at a much slower rate. In many cases, when I help customers size for their workload, performance, not capacity, drives the spindle count. It is easy to meet the capacity requirement with large drives; the problem is whether those drives will meet the performance requirement. It is no longer sufficient to size a storage device based solely on capacity, and this must be taken into account whenever a storage array is sized.
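The sizing trade-off can be sketched as simple arithmetic: compute the spindles needed for capacity and for IOPS separately, and the larger number wins. This deliberately ignores RAID overhead, write penalties, and cache, and the per-drive IOPS figures in the usage note are rough assumptions for illustration, not vendor specifications. `spindles_needed` is a hypothetical helper.

```python
import math


def spindles_needed(capacity_tb, iops, drive_tb, drive_iops):
    """Return (spindles, limiting_factor) for a RAID-agnostic first pass."""
    by_capacity = math.ceil(capacity_tb / drive_tb)
    by_performance = math.ceil(iops / drive_iops)
    if by_performance > by_capacity:
        return by_performance, "performance"
    return by_capacity, "capacity"
```

With a hypothetical 10TB, 5,000 IOPS workload on 1TB drives assumed to deliver about 80 IOPS each, `spindles_needed(10, 5000, 1.0, 80)` returns `(63, "performance")`: ten drives would hold the data, but it takes 63 to service the I/O.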


Speaking at Cloud Computing Expo in NY

March 30th, 2009

I will be participating in a panel discussion at the Cloud Computing Expo in New York on Wednesday (4/2). The topic is “How and Why is a Flexible IT Infrastructure the Key To the Future?” Please stop by and say hello if you are at the show.

Categories: Events

Presentation: Demystifying Deduplication

March 17th, 2009

Last week I gave a presentation at the TechForum Roundtable in New York. Thank you to Priscilla Tate for running a great event.


Backup to disk is the number one application for deduplication today. My presentation covered the most common approaches vendors use to leverage deduplication in a backup environment: client-side, backup-server-based, inline, and post-process deduplication.
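The inline vs. post-process distinction is easiest to see side by side. In this hypothetical sketch (class names are mine, not any product's), the inline target fingerprints every block before it lands, so duplicates never consume disk, while the post-process target writes raw blocks at full speed to a landing area and deduplicates them later in a batch pass.

```python
import hashlib


class InlineDedupTarget:
    """Hypothetical inline target: fingerprint on ingest, store unique blocks only."""

    def __init__(self):
        self.disk = {}   # fingerprint -> block (stands in for the disk pool)

    def write(self, block):
        fp = hashlib.sha256(block).hexdigest()
        self.disk.setdefault(fp, block)   # hashing sits in the write path
        return fp


class PostProcessDedupTarget:
    """Hypothetical post-process target: land raw blocks, dedupe in a later pass."""

    def __init__(self):
        self.landing = []   # raw blocks written without fingerprinting
        self.disk = {}      # deduplicated pool after the batch pass

    def write(self, block):
        self.landing.append(block)   # fast ingest, but needs staging capacity

    def run_dedup_pass(self):
        for block in self.landing:
            self.disk.setdefault(hashlib.sha256(block).hexdigest(), block)
        self.landing.clear()
```

The trade-off shows up in the code: inline needs no staging capacity but pays the hashing cost during the backup window, while post-process keeps ingest fast at the cost of extra landing capacity and a separate processing window.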

  Demystifying Deduplication (7.9 MiB)

This topic was originally developed as part of a CTI Strategy Services consulting engagement. The customer credited us with saving them 6 months of effort.

Do I need more cache in my NetApp?

February 27th, 2009

How many times have you wondered whether you could improve the performance of your storage array by adding additional cache?

Will more cache improve the performance of my storage array? Vendors so often tell us it will, but they offer no objective information to explain why. Depending on the workload, increasing the cache may have little or no effect on performance.

There are two ways to know whether your environment will benefit from additional cache. The first is to understand every nuance of your application. Most storage managers I speak with classify this as impractical at best and impossible at worst. Even if you have an application with a very well-understood workload, most storage devices do not host a single application; they host many different applications. It is even more complex to understand how this combined workload will be affected by adding cache.

The second way to measure cache benefit is to put the cache in and see what happens. This is the most common approach I see in the field. When performance becomes unacceptable, the options of adding additional disk and/or cache are weighed and a purchase is made. (I will save the topic of adding spindles to increase performance for a future post.) Both of these options force a purchase to be made with no guarantee it will solve the problem.

NetApp has introduced a tool that provides a third option: Predictive Cache Statistics (PCS). It provides the objective data needed to justify a hardware purchase. PCS is available on systems running Data ONTAP 7.3 or later with at least 2GB of memory. When it is enabled, PCS reports what the cache hit ratio would be if the system had 2x (ec0), 4x (ec1), and 8x (ec2) the current cache footprint. (ec0, ec1, and ec2 are the names the NetApp system gives these extended caches when it presents the statistics.)
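NetApp's internal method is not shown here, so treat the following as an illustration of the general idea rather than how PCS works. The classic technique for predicting hit ratios at several cache sizes from a single pass over a block trace is Mattson's LRU stack-distance analysis: a reference hits in an LRU cache of size C exactly when fewer than C distinct blocks were touched since its previous reference. The sketch assumes pure LRU replacement and a hypothetical trace of block IDs; a real array's replacement policy will differ.

```python
def lru_hit_ratios(trace, cache_sizes):
    """Predicted LRU hit ratio at each cache size, from one pass over trace.

    Mattson's stack-distance method. This version is O(N * M) for N
    references and M distinct blocks, which is fine for a sketch.
    """
    stack = []                                   # most recently used at the end
    hits = {size: 0 for size in cache_sizes}
    for block in trace:
        if block in stack:
            # Stack distance = distinct blocks referenced since the last use.
            distance = len(stack) - 1 - stack.index(block)
            stack.remove(block)
            for size in cache_sizes:
                if distance < size:
                    hits[size] += 1
        stack.append(block)
    return {size: hits[size] / len(trace) for size in cache_sizes}
```

Passing `[n, 2 * n, 4 * n, 8 * n]` for a current cache of `n` blocks mirrors the 1x/2x/4x/8x view PCS gives you. A cyclic trace such as `[1, 2, 3, 1, 2, 3]` shows why cache purchases can disappoint: a 2-block cache gets no hits at all, while a 3-block cache hits on every repeat.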

Now, let’s drill down into exactly how predictive cache statistics work…


Categories: Storage

WAN optimization for array replication

January 27th, 2009

As the need for disaster recovery continues to move downmarket from the enterprise to medium and small businesses, the number of IT shops replicating their data to an offsite location is increasing. Array-based replication was once a feature reserved for the big budgets of the Fortune 1000. Today, array-based replication is available on most midrange storage devices (and even some entry-level products).

This increase in replication deployments has created a new challenge for IT. The most common replication solutions move the data over the IP network. That data puts a significant load on the IP network infrastructure. The LAN infrastructure is almost always up to the task, but the WAN is often not able to handle this new burden. While the prices of network infrastructure have come down over the years, big pipes are still an expensive monthly outlay. So, how do we get that data offsite without driving up those WAN costs? WAN optimization technology provides a potential solution.
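A quick way to see why the WAN becomes the bottleneck is to estimate the average bandwidth needed to ship a day's changed data within a replication window. This is a back-of-the-envelope sketch: the change rate, window, and reduction ratio below are hypothetical inputs, and it ignores protocol overhead, latency, and burstiness. `required_wan_mbps` is a hypothetical helper.

```python
def required_wan_mbps(changed_gb_per_day, window_hours, reduction_ratio=1.0):
    """Average WAN bandwidth (megabits/s) to replicate a day's changes in a window.

    reduction_ratio is how many times smaller WAN optimization makes the
    replication stream (1.0 means no optimization).
    """
    bits = changed_gb_per_day * 8 * 1000**3   # decimal GB -> bits
    seconds = window_hours * 3600
    return bits / reduction_ratio / seconds / 1_000_000
```

For a hypothetical 100GB of daily change replicated in an 8-hour window, `required_wan_mbps(100, 8)` is about 27.8 Mbps. If WAN optimization shrinks the stream 4x, `required_wan_mbps(100, 8, reduction_ratio=4)` drops to about 6.9 Mbps, which can be the difference between a WAN upgrade and none.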

Not every workload or protocol can benefit from today’s WAN optimization technology, but replication usually gets a big boost. I gathered some data from a client who uses NetApp SnapMirror to replicate to a remote datacenter and deployed WAN optimization to avoid a major WAN upgrade.
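Much of that boost comes from redundancy elimination: the appliances on each end of the link remember chunks they have already exchanged, so repeated data crosses the WAN as a short reference instead of the full bytes. The sketch below is a generic illustration of that technique, not any specific WAN optimization product; it assumes fixed 4KB chunks and SHA-256 references, and `encode`/`decode` are hypothetical names.

```python
import hashlib

CHUNK = 4096  # hypothetical fixed chunk size; real products often chunk by content


def encode(data, sent_fps):
    """Sender side: replace chunks the receiver already has with references."""
    out = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).digest()
        if fp in sent_fps:
            out.append(("ref", fp))      # 32 bytes on the wire instead of the chunk
        else:
            sent_fps.add(fp)
            out.append(("raw", chunk))
    return out


def decode(stream, chunk_store):
    """Receiver side: rebuild the data, remembering every raw chunk it sees."""
    parts = []
    for kind, payload in stream:
        if kind == "raw":
            chunk_store[hashlib.sha256(payload).digest()] = payload
            parts.append(payload)
        else:
            parts.append(chunk_store[payload])
    return b"".join(parts)
```

Replication traffic is full of this kind of repetition (blocks rewritten with the same contents, similar data across volumes), which is why it tends to respond so well.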


Benchmarking and ‘real FC’

January 5th, 2009

Sometimes I think the only people who read technology blogs are people who write other technology blogs. I have no way to know whether this is true, but it is an interesting question to ponder. Do IT end users actually read technology blogs? If they are reading, they do not seem to comment very often. Far more often, comments come from other bloggers or competing vendors.

That said, I am going to talk about an issue that some storage bloggers seem to be caught up in at the moment: ‘emulated FC’ vs. ‘real FC.’ Let me start off by sharing a few recent posts from other blogs:

Chuck Hollis at EMC writes about the EMC/Dell relationship and takes the opportunity to compare EMC to NetApp, in this case the EMC NX4 to the NetApp FAS2020. The comment in the post that certainly aggravated NetApp is that EMC does “real deal FC that isn’t emulated.” The obvious implications are that EMC FC is not emulated, NetApp FC is emulated, and FC emulation is bad. (This is not a new debate between EMC and NetApp. Look back through the blogs at both companies and you will find plenty of back and forth on the topic.)

Kostadis Russos at NetApp has a post explaining why he, not surprisingly, completely disagrees with Chuck.

Stephen Foskett, a storage consultant, posts what I think is an excellent overview of the issues. He cuts through the marketing spin and asks the right questions. His coverage is so complete that I almost decided not to write about the topic at all. I will try not to retrace everything he covered, but I will hit a few of his high-level points in case you have not had a chance to read his post (I highly recommend it, though; it is very good). In summary:

  • All enterprise storage arrays “emulate” Fibre Channel drives to one extent or another
  • NetApp is emulating Fibre Channel drives
  • All modern storage arrays emulate SCSI drives
  • Using the wrong tool for the job will always lead to trouble
  • Which is more important to you: integration, performance, or features?

So, why am I writing about it? Because a few days later Chuck posted a very good blog entry about benchmarking that, to me, contradicts the importance he gave to ‘real FC’ on 12/9. I have never met Chuck or Stephen, but judging from their postings, they both seem very technically adept.

Without trying to put words in his mouth (text on his blog?), the overall theme of Chuck’s post is to make sure you use meaningful tests if you want meaningful results from a storage product benchmark. He is absolutely correct. I could not agree more. How many times have we seen benchmarks performed that were completely irrelevant to the workload the array would see in production?
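One practical way to keep a benchmark meaningful is to measure the production workload first and configure the test to match it. The sketch below assumes you can capture a host-side I/O trace; `IORecord` and `workload_profile` are hypothetical, and the sequential check is deliberately simplistic (it treats the trace as a single stream).

```python
from dataclasses import dataclass


@dataclass
class IORecord:
    """One entry from a hypothetical host I/O trace."""
    offset: int      # byte offset on the LUN
    size: int        # bytes transferred
    is_read: bool


def workload_profile(trace):
    """Summarize a trace into the parameters a benchmark should reproduce."""
    reads = sum(1 for r in trace if r.is_read)
    # Count an I/O as sequential if it starts where the previous one ended.
    sequential = sum(
        1 for prev, cur in zip(trace, trace[1:])
        if cur.offset == prev.offset + prev.size
    )
    return {
        "read_pct": 100 * reads / len(trace),
        "avg_io_kb": sum(r.size for r in trace) / len(trace) / 1024,
        "sequential_pct": 100 * sequential / max(len(trace) - 1, 1),
    }
```

If production turns out to be 75% reads of 8KB, mostly random, a benchmark built on large sequential writes tells you very little about how the array will behave.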


HSM without the headaches

December 1st, 2008

Hierarchical Storage Management (HSM), Information Lifecycle Management (ILM), and Data Lifecycle Management (DLM): everyone wants to manage their data intelligently to reduce their spending on storage infrastructure. The storage vendors and the trade rags would like to convince us that there are magic tools to solve this challenge. The truth is there is no magic tool to manage unstructured data. (I am not talking here about the archiving tools that integrate with applications; I am only talking about unstructured data.) I have tried many tools over the years and they are simply not cost effective. Don’t panic, though. In most cases, the solution is far simpler and far less expensive than HSM.

File services are a huge consumer of storage capacity. For the purposes of this conversation, let’s define file services as NFS or CIFS storage, whether provided by integrated appliances or by servers leveraging back-end storage devices. In most environments I visit, the file serving infrastructure uses tier 1 disk drives (Fibre Channel, SCSI, or SAS). These drives are populated with data that is mostly idle, and the storage managers want to get that idle data onto a less expensive disk tier. The most common request is to transparently move the idle data to a SATA-based device.
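Before choosing any approach, it helps to know how much of the file data is actually idle. This minimal sketch walks a directory tree from a host that mounts the share and totals the bytes not accessed in a given number of days, using each file's last access time. It assumes access times are trustworthy, which they are not if the file system is mounted `noatime` or if a backup or antivirus scan reads every file and refreshes the timestamps. `idle_bytes` is a hypothetical helper.

```python
import os
import time


def idle_bytes(root, idle_days=180):
    """Return (idle, total) bytes under root, judged by last access time.

    Relies on st_atime, so results are only as good as the file system's
    access-time tracking.
    """
    cutoff = time.time() - idle_days * 86400
    idle = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue   # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_atime < cutoff:
                idle += st.st_size
    return idle, total
```

Running this against a share before buying anything gives a rough idea of how much data a cheaper tier would actually absorb.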

Let’s walk through the scenarios for an environment with 20TB of unstructured data.


Categories: Storage