Building a ZFS Deduplication System
The news that Sun has integrated an in-line deduplication feature into ZFS has created quite a buzz in storage circles, and our clients have been asking how to gain access to it. This blog post describes the steps needed to build an OpenSolaris server, integrate the deduplication feature, and enable it.
For details about the ZFS deduplication feature, what it does, and how it does it, have a look at Jeff Bonwick’s blog post on the topic. He was the lead engineer on the project so you can take his word on it.
Deduplication was integrated into OpenSolaris build 128. That takes a little explanation. Solaris is Sun’s current commercial operating system. OpenSolaris has two flavors: the semiannual supported release, and the frequently updated developer release. The current supported release is called 2009.06 and is available for download here. Also at that location is the latest “SXCE” build. That distribution is more like Solaris 10: a big ol’ DVD including all the bits of all the packages. OpenSolaris is the acknowledged future of Solaris. It includes a new package manager (more like Linux) and a live-CD image that can be booted for exploration and then installed as the core release. More packages can be added to that core via the package manager.
For this example I started by downloading the 2009.06 OpenSolaris distribution. I then clicked on the desktop “install” icon to install OpenSolaris to my hard drive (in this case inside of VMware Fusion on Mac OS X, but it can be installed anywhere good OSes can live). Once that step finished, I rebooted the system into 2009.06. The good news is that 2009.06 is a valid release to run for production use. You can pay for support on it, and important security fixes and patches are made available to those with a support contract. The bad news is that deduplication is not available in that release. Instead, we need to point OpenSolaris at a package repository that contains the latest OpenSolaris developer release. Note that the developer release is not supported, and performing these next steps on OpenSolaris 2009.06 makes your system unsupported by Sun. But until an official OpenSolaris distribution ships that includes the deduplication code, this is the only way to get ZFS deduplication.
pbg@opensolaris:~$ pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
Refreshing catalog
Refreshing catalog 1/1 opensolaris.org
Caching catalogs ...
Now we tell OpenSolaris to update itself, creating a new boot environment in which the current packages are replaced by any newer packages:
pbg@opensolaris:~$ pfexec pkg image-update
Refreshing catalog
Refreshing catalog 1/1 opensolaris.org
Creating Plan . . .
DOWNLOAD      PKGS      FILES    XFER (MB)
entire       0/690    0/21250    0.0/449.4
SUNW1394     1/690    1/21250    0.0/449.4
. . .
A few hundred megabytes of downloads later, OpenSolaris adds a new GRUB boot entry (on x86) for the updated version and makes it the default boot environment.
A clone of opensolaris-1 exists and has been updated and activated.
On the next boot the Boot Environment opensolaris-2 will be mounted on '/'.
Reboot when ready to switch to this updated BE.
---------------------------------------------------------------------------
NOTE: Please review release notes posted at:
http://opensolaris.org/os/project/indiana/resources/relnotes/200906/x86/
---------------------------------------------------------------------------
A reboot to that new environment brings up the latest OpenSolaris developer distribution, in this case build 129:
pbg@opensolaris:~$ cat /etc/release
OpenSolaris Development snv_129 X86
Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 04 December 2009
Finally, ZFS deduplication is available in this system.
pbg@opensolaris:~$ zfs get dedup rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  dedup     on     default
Let’s try using it:
pbg@opensolaris:~$ pfexec zfs set dedup=on rpool/export/home/pbg
cannot set property for 'rpool/export/home/pbg': pool and or dataset must be upgraded to set this property or value
Hmm, the on-disk ZFS format is still the one from the 2009.06 release. We need to upgrade it to gain access to the deduplication feature.
pbg@opensolaris:~$ zpool upgrade
This system is currently running ZFS pool version 22.
The following pools are out of date, and can be upgraded. After being upgraded, these pools will no longer be accessible by older software versions.
VER  POOL
---  ------------
14   rpool
Use 'zpool upgrade -v' for a list of available versions and their associated features.
pbg@opensolaris:~$ zpool upgrade -v
This system is currently running ZFS pool version 22.
The following versions are supported:
VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties
For more information on a particular version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/N
Where 'N' is the version number.
pbg@opensolaris:~$ pfexec zpool upgrade -a
This system is currently running ZFS pool version 22.
Successfully upgraded 'rpool'
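On a system with several pools, the “VER POOL” table printed by zpool upgrade can be filtered to find the pools that are still below version 21, the version listed for deduplication. This is only a minimal sketch: it assumes the output layout shown in the transcript above, and the function name pools_below_version is made up for illustration.

```shell
#!/bin/sh
# Sketch only: reads text in the layout of the "zpool upgrade" transcript
# on stdin and prints the names of pools whose version is below a minimum.
# pools_below_version is a hypothetical helper, not part of ZFS.
pools_below_version() {
    min=${1:-21}    # 21 is the version the listing labels "Deduplication"
    # Table rows have exactly two fields: a numeric version and a pool name.
    awk -v min="$min" \
        'NF == 2 && $1 ~ /^[0-9]+$/ && $1 + 0 < min + 0 { print $2 }'
}
```

It would be used as “zpool upgrade | pools_below_version 21”. An empty result suggests every listed pool is already new enough.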
Now we are ready to start using deduplication.
pbg@opensolaris:~$ zfs get dedup rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  dedup     off    default
pbg@opensolaris:~$ pfexec zfs set dedup=on rpool
pbg@opensolaris:~$ zfs get dedup rpool
NAME   PROPERTY  VALUE  SOURCE
rpool  dedup     on     local
pbg@opensolaris:~$ zpool list rpool
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  19.9G  10.7G  9.19G  53%  1.00x  ONLINE  -
Ben Rockwood provides a nice blog entry discussing its use, so I refer you to that site rather than repeating the information.
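To get a rough feel for how much a given file might shrink under block-level deduplication, one can hash fixed-size blocks and compare the total block count with the number of distinct hashes. The sketch below is only an approximation done outside ZFS. It assumes 128 KiB blocks (the usual default ZFS recordsize), SHA-256 digests, and GNU-style split and sha256sum (on OpenSolaris a different digest command may be needed). The function name estimate_dedup is hypothetical.

```shell
#!/bin/sh
# Sketch only: estimate a dedup ratio for one file by hashing fixed-size
# blocks. This approximates the idea; it does not query ZFS.
estimate_dedup() {
    file=$1
    bs=${2:-131072}                      # block size in bytes (assumed 128 KiB)
    tmpdir=$(mktemp -d) || return 1
    # Cut the file into fixed-size pieces named blk.aaaaaa, blk.aaaaab, ...
    split -a 6 -b "$bs" "$file" "$tmpdir/blk."
    total=$(find "$tmpdir" -type f | wc -l | tr -d ' ')
    if [ "$total" -eq 0 ]; then
        rm -rf "$tmpdir"
        echo "no data blocks found" >&2
        return 1
    fi
    # Identical blocks have identical digests, so count the distinct ones.
    unique=$(find "$tmpdir" -type f -exec sha256sum {} + |
        awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')
    rm -rf "$tmpdir"
    awk -v t="$total" -v u="$unique" \
        'BEGIN { printf "blocks=%d unique=%d ratio=%.2fx\n", t, u, t / u }'
}
```

Calling “estimate_dedup somefile” prints the block count, the number of distinct blocks, and their ratio. Real results on ZFS depend on recordsize, compression, and how the data is laid out, so treat the number as a hint rather than a prediction of the DEDUP column.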
According to this “PSARC” (architecture plan), deduplication also applies to replication: in essence, a deduplicated stream is used when replicating data. Let’s take a look:
pbg@opensolaris:~$ pfexec zfs snapshot rpool/export/home/pbg@friday
pbg@opensolaris:~$ pfexec zfs send -D rpool/export/home/pbg@friday > /var/tmp/pbg-friday-dedupe
pbg@opensolaris:~$
Unfortunately, the current “zfs send -D” functionality is only a subset of what is really needed. With -D, a given block is sent only once within a single “send” (and is thus deduplicated). However, if additional duplicate blocks are written, executing the same “zfs send -D” again would send the same set of blocks again, because ZFS has no knowledge of whether a block already exists at the destination of the send. If it had such knowledge, “zfs send” would transmit a given block only once to a given target. In that case ZFS could become an even better replacement for backup tape: a ZFS system in production replicating to a ZFS system at a DR site, sending only blocks that the DR site has not seen before. Hopefully such functionality is in the ZFS development pipeline.
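The difference between per-stream deduplication and the target-aware behavior wished for above can be shown with a toy model. This is not how ZFS implements anything: block IDs on stdin stand in for block checksums, and both function names, as well as the state file that plays the role of the target’s memory, are hypothetical.

```shell
#!/bin/sh
# Toy model only: one block ID per input line, no real ZFS stream involved.

# Per-stream dedup, as described for "zfs send -D": each distinct block is
# emitted once within one stream, and nothing is remembered afterwards.
send_dedup() {
    awk '!seen[$0]++'
}

# Hypothetical target-aware send: blocks already recorded in a state file
# for the target are skipped, and newly sent blocks are recorded in it.
send_target_aware() {
    state=$1
    touch "$state"
    new=$(awk -v state="$state" '
        BEGIN { while ((getline line < state) > 0) seen[line] = 1 }
        !seen[$0]++')
    # Emit the new blocks and append them to the target state.
    if [ -n "$new" ]; then
        printf '%s\n' "$new" | tee -a "$state"
    fi
    return 0
}
```

Feeding the same block list through send_dedup twice produces the same output both times, which mirrors the limitation described above. With send_target_aware, a second run against the same state file emits only blocks the target has not yet recorded.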
As it stands, ZFS deduplication is a powerful new feature. Once integrated into production-ready operating system releases and appliances, it could provide a breakthrough in low-cost data reduction and management. We plan to track that progress here, so stay tuned.

Peter,
Thank you for this post. I was planning to experiment with deduplication over the Christmas break. You have helped me to prepare. Do you have experience with build 129? Is it stable enough for experimentation by sys admins?
Regards,
Joe
Hi Joe,
Yes, build 129 seems to work fine. No issues so far in playing with deduplication.
–Peter
Cool stuff! Thanks!