Deduplication – It’s not just about capacity
There is no debating that deduplication is one of the hottest topics in IT. The question is whether the hype has started to outgrow the technology. Today, two primary use cases are driving deduplication in the marketplace: backup to disk and virtual guest operating systems (VMware, Hyper-V, and Xen guests). (I will cover the disk-to-disk backup scenario in this article and the virtual guest topic in the next one.) Both are natural markets for deduplication because they share a common challenge: each creates a tremendous amount of redundant data on the disk array. The goal in both cases is to pack more data onto each disk drive and reduce the cost per GB. This is the first and most obvious benefit of deduplication.
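To make the capacity argument concrete, here is a minimal sketch of block-level deduplication in Python (3.9 or later). It assumes fixed-size 4 KiB blocks identified by SHA-256 hashes; commercial products often use variable-size chunking and their own fingerprinting schemes, so the block size, the hash choice, and the helper names `deduplicate`, `rehydrate`, and `simulate_nightly_backups` are illustrative assumptions. The workload (seven nightly full backups of the same 1 MiB image, each night with one changed block) is hypothetical as well.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed fixed block size (4 KiB); real products vary


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns (store, recipe): store maps a SHA-256 digest to its block, and
    recipe is the ordered list of digests needed to rebuild the original data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy consumes capacity
        recipe.append(digest)
    return store, recipe


def rehydrate(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the logical data stream from the unique-block store."""
    return b"".join(store[digest] for digest in recipe)


def simulate_nightly_backups(nights: int = 7, blocks: int = 256) -> bytes:
    """Hypothetical workload: full backups of one image, one block changed per night."""
    base = random.Random(42).randbytes(blocks * BLOCK_SIZE)
    images = []
    for night in range(nights):
        image = bytearray(base)
        if night > 0:
            # Overwrite block number `night` with new content for this backup.
            start = night * BLOCK_SIZE
            image[start:start + BLOCK_SIZE] = bytes([night]) * BLOCK_SIZE
        images.append(bytes(image))
    return b"".join(images)


stream = simulate_nightly_backups()
store, recipe = deduplicate(stream)
stored_bytes = sum(len(block) for block in store.values())
print(f"logical {len(stream)} B, stored {stored_bytes} B, "
      f"ratio {len(stream) / stored_bytes:.2f}:1")
```

The seven full backups reference 1,792 logical blocks, but only 262 of them are unique (the 256 original blocks plus one new block for each of the six changed nights). That gap is exactly the redundancy a disk-based backup target can reclaim.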
Disk drive capacity is growing exponentially, but disk performance is improving at a much slower rate. In many cases, when we help customers size for their workload, performance, not capacity, drives the spindle count. It is easy to meet capacity needs with large drives, but will those drives meet the performance requirement? That is the problem. It is no longer sufficient to size a storage device based solely on capacity requirements; performance is a general constraint that must be taken into account whenever sizing a storage array.
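The sizing rule can be expressed as taking the larger of two spindle counts: one driven by capacity and one driven by performance. The sketch below assumes performance is measured in random IOPS and uses hypothetical figures (4 TB and 8 TB drives at roughly 150 IOPS each, and a 20 TB, 10,000 IOPS workload). The `DriveSpec` and `spindles_required` names are illustrative, and RAID overhead and write penalties are deliberately ignored, although in practice they would raise both counts.

```python
import math
from dataclasses import dataclass


@dataclass
class DriveSpec:
    capacity_tb: float  # usable capacity per drive, in TB
    iops: float         # sustained random IOPS per drive (assumed metric)


def spindles_required(capacity_tb: float, iops: float, drive: DriveSpec):
    """Return (by_capacity, by_performance, required) spindle counts.

    RAID overhead and write penalties are ignored in this sketch.
    """
    by_capacity = math.ceil(capacity_tb / drive.capacity_tb)
    by_performance = math.ceil(iops / drive.iops)
    return by_capacity, by_performance, max(by_capacity, by_performance)


# Hypothetical workload and drive figures, for illustration only.
for drive in (DriveSpec(capacity_tb=4.0, iops=150), DriveSpec(capacity_tb=8.0, iops=150)):
    cap, perf, need = spindles_required(capacity_tb=20.0, iops=10_000, drive=drive)
    print(f"{drive.capacity_tb:.0f} TB drives: capacity needs {cap}, "
          f"performance needs {perf}, buy {need}")
```

With these assumed numbers, capacity alone calls for 5 drives (or 3 of the larger drives), but the IOPS target requires 67 either way. Doubling drive capacity does nothing for the performance requirement, so performance dictates the spindle count.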
