Archive

Posts Tagged ‘Sun’

The “Problem” with NAS

December 23rd, 2010 Comments off

Introduction

Computer storage has evolved from Direct Attached Storage (DAS) to Storage Area Networks (SAN). Along the way, Sun in 1984 invented NFS, and Network Attached Storage (NAS) was born. Since then other NAS protocols have been added, most notably the Windows-based Server Message Block (SMB), aka CIFS. But throughout the history of storage, NAS has been regarded as poorly performing and unreliable compared to SAN and DAS. Certainly Auspex’s creation and NetApp’s advancement of NAS “appliances” helped move NAS from being a science project to a mainstream production solution, but in my opinion NAS is still under-appreciated and under-deployed. Perhaps in light of the new generation of NAS appliances, that should change.

At a more philosophical level, it’s worth asking “what is SAN” and “what is NAS.” Fundamentally, they are storage arrays that make disk space available via varying protocols over varying interconnect media. For the most part, both technologies are available with Fibre Channel (FC), SATA, and SAS disks. Both have disks of varying speeds, capacities, and performance. Traditionally, SANs have been FC connected and NAS appliances connected via Ethernet, but many current products provide both interconnects: block transactions occur via FC or iSCSI, and file transactions over Ethernet. A proof point of this merger of NAS and SAN is the FCoE protocol, which places Fibre Channel frames over Ethernet networks. Perhaps the most straightforward definition is that “SAN” is block-based storage and “NAS” is file storage, and that a given datacenter should choose which to use for any given application or function. After those decisions are made, it is easier to determine the best products to implement the resulting storage architecture. Now let’s consider the problem with NAS as well as the solutions it can provide.

The “Problem”

Over the years I’ve seen many, many computing infrastructures. Back in the “old days” (say, the 1980s), we had servers and SANs for production, and NAS was pushed to the side. It was typically used for home directories and the storage of utility programs, if at all. In those cases, NAS storage was mounted to all servers as well as all workstations.

That helped NAS gain a reputation for unreliability—probably because any failure caused everyone to notice it, and failures were difficult to recover from (with hard mounts never timing out, for example, taking down all computing until the NAS server could be fixed). Also, many situations called for “cross mounts,” where servers would mount each other’s directories via NFS. If one server then failed, all servers would eventually end up hanging until the failed one recovered. NFS also had quirks like “stale file handles” that left a bad taste in the mouth.

So failures of NFS servers were quite painful to the computing infrastructure. Why did NAS servers fail as often as they did? Well, they were non-clustered, while their SAN brethren typically had more redundant components and automatic recovery from problems. Originally, a “NAS server” was just a general-purpose Sun server running NFS. A SAN array was originally, and usually still is, purpose-built for storage. Also, NAS servers were and still are network-connected. Back in the day, there was typically one network connection to each workstation (and frequently between servers as well). That one link was used for NAS and non-NAS network traffic. Even if there was a separate network carved out for storage communication between the servers and NAS, it was rarely redundant. Shared links and single points of failure meant NAS was more prone to failure than SAN. Thus the lingering impression that SAN is more reliable than NAS.
Read more…

Oracle/Sun F20 Flash Card – How fast is it?

April 15th, 2010 Comments off

I received several questions about the performance of the Oracle/Sun F20 flash card I used in my previous post about block alignment, so I put together a quick overview of the card’s performance capabilities. The following results are from testing the card in a dual-socket 2.93GHz Nehalem (X5570) system running Solaris x64. This is similar to the server platform Oracle uses in the Exadata V2 platform.

The F20 card is a SAS controller with 4 x 24GB flash modules attached to it. You can find more info on the flash modules on Adam Leventhal’s blog and the official Oracle product page has the F20 details.

All of my tests used 100% random 4KB blocks. I focused on random operations, because in most cases it is not cost effective to use SSD for sequential operations. These tests were run with a variety of different thread counts to give an idea of how the card scales with multiple threads. The first test compared the performance of a single 24GB flash module to the performance of all 4 modules. Read more…
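The shape of that workload (random, block-aligned 4KB reads issued from a varying number of threads) can be sketched in a few lines of Python. This is only an illustration of the access pattern, not the tool used for the results in this post; the function names are hypothetical, and because the reads go through the operating system's page cache, the numbers it produces say little about the underlying device.

```python
import os
import random
import tempfile
import threading
import time

BLOCK_SIZE = 4096  # 4KB, the block size used in the tests described above


def make_test_file(size_bytes):
    """Create a temporary file of random data to read from (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def random_read_worker(path, file_size, num_ops, counts, index):
    """Issue num_ops random, block-aligned 4KB reads and record how many completed."""
    blocks = file_size // BLOCK_SIZE
    rng = random.Random(index)  # per-thread generator, seeded for repeatability
    fd = os.open(path, os.O_RDONLY)
    try:
        done = 0
        for _ in range(num_ops):
            offset = rng.randrange(blocks) * BLOCK_SIZE
            data = os.pread(fd, BLOCK_SIZE, offset)  # Unix-only call
            if len(data) == BLOCK_SIZE:
                done += 1
        counts[index] = done
    finally:
        os.close(fd)


def run_random_read_test(path, threads, ops_per_thread):
    """Run one test at a given thread count; return (completed_reads, seconds)."""
    file_size = os.path.getsize(path)
    counts = [0] * threads
    workers = [
        threading.Thread(
            target=random_read_worker,
            args=(path, file_size, ops_per_thread, counts, i),
        )
        for i in range(threads)
    ]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    return sum(counts), elapsed
```

Calling `run_random_read_test` in a loop over thread counts such as 1, 2, 4, 8 and dividing completed reads by elapsed seconds gives an operations-per-second figure for each count, which is the kind of scaling curve the tests were after.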

Column – OpenSolaris Crossbow

February 17th, 2010 Comments off

Project Crossbow is an innovative, and I think important, new contribution to the OpenSolaris project. Crossbow makes network virtualization and resource management first-class citizens in OpenSolaris. It follows in the footsteps of ZFS by having a simple and easy-to-understand interface, while providing great flexibility and power to the administrator. Crossbow can only be found in OpenSolaris, and is not available in Solaris 10. My February column for ;login: Magazine describes and explores Project Crossbow in detail. You can download it here, but as always I encourage you to become a member of USENIX, thereby gaining access to all of the content of ;login: (along with many other great benefits).

  2010-02-galvin.pdf (678.9 KiB)

You Are Invited to the New England OpenSolaris Users Group (NEOSUG) Ninth Meeting

January 22nd, 2010 Comments off

Topic: DTrace Deep Dive and a short talk on LDOM Domains and ZFS

When:
Burlington MA Sun Campus – Feb 2, 2010 6:00 PM to 9:00 PM
Boston MA – Boston University – Feb 3, 2010 6:00 PM to 9:00 PM
(Note: The same content will be presented twice – once in Burlington and once in Boston. Pick whichever location and date is most convenient.)

Where:
Feb 2 – Sun Microsystems Burlington Campus; 1 Network Drive, Burlington, MA
Feb 3 – Boston University, Electrical and Computer Engineering Department Photonics Center Building – Room PHO 339 (3rd floor), 8 Saint Mary’s Street Boston, MA 02215
BU Parking: Street parking available on St. Mary’s Street and Bay State Road. Metered parking spots do not require a fee after 6pm.

RSVP: To Linda Wendlandt: lwendlandt@cptech.com

Registration Required! – so we can plan food and drink

Join Jim Mauro and Shannon Sylvia for a DTrace how-to and a look at how to use LDOMs with ZFS.

AGENDA:

6:00-6:20: Registration, Pizza and Beverages

6:20-6:30: Introductions: Peter Galvin, CTO, Corporate Technologies

6:30-8:30: Solaris Dynamic Tracing – DTrace – Jim Mauro, Principal Engineer, Sun Microsystems

8:30-9:00: LDOM Domains and ZFS: An example of creating a ZFS bootable root LDOM domain using jumpstart – Shannon Sylvia, Sysadmin, Northeastern University

9:00 Q&A and Discussion

Also we’ll be giving out official NEOSUG T-Shirts and other trinkets, and copies of the OpenSolaris CD and instruction manual.

For more information see the NEOSUG discussion forum.

Categories: Events

Building a ZFS Deduplication System

December 24th, 2009 3 comments

The news of Sun integrating an in-line deduplication feature into ZFS has created quite a buzz in storage circles. And our clients have been asking us about how to gain access to this new feature. This blog post describes the steps needed to build an OpenSolaris server, integrate the deduplication feature, and enable it.

For details about the ZFS deduplication feature, what it does, and how it does it, have a look at Jeff Bonwick’s blog post on the topic. He was the lead engineer on the project so you can take his word on it.
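To give a feel for the general idea of in-line, block-level deduplication before reading that post, here is a toy sketch in Python. It is a model of the concept only, not how ZFS implements it: the fixed 4KB block size, the in-memory dictionary, and every function name below are assumptions made for illustration.

```python
import hashlib


def dedup_store(data, block_size=4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Returns (store, refs): store maps a checksum to the block contents,
    and refs is the ordered list of checksums needed to rebuild the data.
    """
    store = {}
    refs = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # a repeated block is not stored again
        refs.append(digest)
    return store, refs


def rebuild(store, refs):
    """Reassemble the original data from the block references."""
    return b"".join(store[d] for d in refs)


def dedup_ratio(store, refs):
    """Logical bytes referenced divided by physical bytes actually stored."""
    logical = sum(len(store[d]) for d in refs)
    physical = sum(len(b) for b in store.values())
    return logical / physical if physical else 1.0
```

Feeding it three identical 4KB blocks followed by one different block stores only two blocks and yields a ratio of 2.0, which is the kind of saving deduplication is after when many copies of the same data land on one pool.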

Deduplication was integrated into OpenSolaris build 128. That takes a little explanation. Solaris is Sun’s current commercial operating system. OpenSolaris has two flavors – the semiannual supportable release, and the frequently updated developer release. The current supportable release is called 2009.06 and is available for download here. Also at that location is the “SXCE” latest build. That distribution is more like Solaris 10 – a big ol’ DVD including all the bits of all the packages. OpenSolaris is the acknowledged future of Solaris, including a new package manager (more like Linux) and a live-CD image that can be booted for exploration, and installed as the core release. To that core more packages can be added via the package manager.
Read more…

Categories: Storage, Systems

Column – Immutable Service Containers in OpenSolaris

December 21st, 2009 Comments off

The OpenSolaris security team has added an interesting proof-of-concept feature. Immutable Service Containers are designed to make building, configuring, and recreating pre-secured containers easier. The net result, if incorporated into OpenSolaris and eventually a future version of Solaris, should be a set of security best practices managed via a feature-rich framework. Between now and then, there is quite a bit of work for the team to do. My December 2009 column in ;login: Magazine discusses the design goals and current state of Immutable Service Containers. Members of USENIX can read it online, while others can download it here:

  2009-12-galvin-login-column.pdf (269.0 KiB)

Categories: Systems

Warning – A Sun kernel patch can break IP Multipathing

November 6th, 2009 Comments off

One of our clients has hit a bug, and we want to post a quick alert before other sites implement the change that causes this problem.

The problem is only of concern to sites running Sun Solaris and using the IP Multipathing (IPMP) facility – using multiple Ethernet connections bundled together for availability and performance.

Here are the details of the problem:

There is an issue with IPMP failures (probe-based detection only) due to a kernel patch (141444-09 {SPARC} and 141445-09 {x86}) found in the latest Solaris 10 Recommended Patch Cluster (released 10/21/09).

See Patch Cluster ReadMe for additional details on patch contents.

The included kernel patch causes failures in IPMP groups that use probe-based failure detection, which is what we frequently use when deploying best-practices standalone systems as well as SunCluster-based systems. The problem can be confirmed by snooping the FAILED interface for outgoing ICMP probe packets that should exist but don’t, due to the bug caused by the kernel patch. Instead, the active interface that hasn’t failed will be sending and receiving ICMP probe packets using both of the IPMP group’s configured test IP addresses.

The details of the problem are in this bug document:
http://sunsolve.sun.com/search/document.do?assetkey=1-66-271519-1

Sun recommends that the patch cluster (and the specific patch) remain in place rather than be backed out, because of the security fixes it includes.

Customers using probe-based IPMP groups who value stability (and probe-based IPMP failure detection) over the security fixes would do best to avoid this Patch Cluster. Customers needing the security protection, due to either operation within a hostile environment or compliance requirements, will need to convert their probe-based IPMP groups to link-based IPMP groups prior to applying the new Patch Cluster. This will reduce the effectiveness of the IPMP failure detection, but will allow the IPMP groups to remain functional until Sun addresses the issue.

We will continue to monitor this issue until resolution is announced, and will post updated information here. Thanks to Corporate Technologies’ solution architect Ed Hamilton for detecting this problem and writing up the details.

Categories: Systems

ZFS Capacity Usage – Optimizing Compression and Record Size Settings

October 2nd, 2009 Comments off

I have migrated some data to ZFS filesystems recently and the capacity consumed has surprised me a couple of times. In general, it has appeared that the data uses more capacity when stored on the ZFS filesystem. This prompted me to do a little investigating. Is ZFS using more capacity? Is it simply a reporting anomaly? Where is that space going? Does ZFS record size have a major impact? Does enabling compression have a significant impact?

In part, the extra space use is a result of ZFS reporting space utilization differently than other filesystems. When a ZFS filesystem is formatted, almost no capacity is used. A df command will show nearly the entire raw capacity. Many other filesystems take a portion of the raw capacity off the top and reserve it for metadata. This reserve will not show up in df. As data is added to the ZFS filesystem, blocks are allocated for both data and metadata. Both the data and metadata blocks will show up as used capacity. In many other filesystems, at least some of the metadata blocks will be taken from the reserve and only the data blocks will show as consumed capacity. For example, in Solaris, the du command will return the capacity used by the data blocks in a file. In ZFS, that du command returns the total space consumed by the file including metadata and compression. So the question at hand is, when storing a given set of files, does ZFS use more total space than other file systems? That one is difficult to test, given all the variables. But we can test various ZFS configuration options to determine the best settings for minimizing block use.
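One simple way to see the gap between what a file contains and what the filesystem charges for it is to compare each file's apparent size with its allocated blocks. The Python sketch below does that for a directory tree; it is a rough illustration rather than the method used for the measurements in this post, the function names are hypothetical, and it counts hard-linked files once per link.

```python
import os


def space_report(path):
    """Return (apparent_bytes, allocated_bytes) for one file.

    st_size is the file's logical length; st_blocks counts the
    512-byte units the filesystem reports as allocated to it.
    """
    st = os.lstat(path)
    return st.st_size, st.st_blocks * 512


def tree_report(root):
    """Sum apparent and allocated bytes for every regular file under root."""
    total_apparent = 0
    total_allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so targets are not double counted
            apparent, allocated = space_report(path)
            total_apparent += apparent
            total_allocated += allocated
    return total_apparent, total_allocated
```

Running `tree_report` against the same set of files stored on differently configured filesystems (for example, with and without compression) shows the same apparent total but different allocated totals, which is the difference the settings discussed here are meant to shrink.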

Read more…

Column – T Servers – Why – and Why Not

September 22nd, 2009 Comments off

In May, we published a blog entry about Sun’s “T” servers (also known as CMT or CoolThreads). Those servers are terrific for some applications but are sometimes an ill fit for others. That blog posting was expanded into a full column for ;login: Magazine, which is available to USENIX members. Thanks to USENIX, we are allowed to republish the column to make it available to non-members. You can download the full column here:

  August 2009 Usenix ;login: column (150.9 KiB)

Categories: Systems

Oracle & Sun – What to do with the hardware business

August 27th, 2009 Comments off

The questions are going to continue here until Oracle officially owns Sun and perhaps beyond. Will Oracle sell the Sun hardware business? As I have said in the past, I do not think they will. I could certainly be wrong and many industry analysts think I am. Here are a few new data points to think about:

  1. The rumor mill continues to churn and CNNMoney.com is suggesting that HP may want to purchase the Sun hardware business. HP has the cash, but does the investment make sense? Would Oracle sell Solaris as well? HP would be in a tough position if they bought the hardware but Oracle still owned Solaris. Interestingly, in the article, CNNMoney identifies HP’s Mark Hurd as the unnamed “Party B” in the Sun regulatory filings.
  2. Oracle ran this front page ad in the Wall Street Journal today promoting Oracle DB on Sun SPARC. Is this just Oracle bluffing? Perhaps.
  3. If Oracle wants to be in the appliance space, I believe they need to sell general purpose servers. Without the volume that comes from selling general purpose servers, the cost of the appliance platform goes through the roof. Oracle would also have a difficult time getting specialized hardware without paying a premium for a small production run of servers.
  4. Oracle and Larry Ellison want to own the IT budget. The “save money on hardware and spend it on Oracle software” go-to-market strategy was nothing short of brilliant. Keeping the Sun hardware business is Ellison’s opportunity to compete head to head with IBM. Oracle would have all the applications and the hardware to run them on. That would be quite a legacy for Ellison.

The US Department of Justice has approved the acquisition. Now, the European Union needs to make a decision before we will get any more answers.

Categories: General, Systems