Jesse St. Laurent’s New Blog
Jesse St. Laurent will no longer be contributing posts here. If you’d like to follow his activities, have a look at http://jessestlaurent.com.
Computer storage has evolved from Direct Attached Storage (DAS) to Storage Area Networks (SAN). Along the way, Sun invented NFS in 1984, and Network Attached Storage (NAS) was born. Since then, other NAS protocols have been added, most notably the Windows-based Server Message Block (SMB), also known as CIFS. But throughout the history of storage, NAS has been regarded as poorly performing and unreliable compared to SAN and DAS. Certainly Auspex’s creation and NetApp’s advancement of NAS “appliances” helped move NAS from being a science project to a mainstream production solution, but in my opinion NAS is still under-appreciated and under-deployed. In light of the new generation of NAS appliances, perhaps that should change.
At a more philosophical level, it’s worth asking “what is SAN” and “what is NAS.” Fundamentally, both are storage arrays that make disk space available via varying protocols over varying interconnect media. For the most part, both technologies are available with Fibre Channel (FC), SATA, and SAS disks, and both offer disks of varying speeds, capacities, and performance. Traditionally, SANs have been FC-connected and NAS appliances Ethernet-connected, but many current products provide both interconnects: block transactions occur via FC or iSCSI, and file transactions over Ethernet. A proof point of this merger of NAS and SAN is Fibre Channel over Ethernet (FCoE), which carries Fibre Channel frames over Ethernet networks. Perhaps the most straightforward definition is that “SAN” is block-based storage and “NAS” is file-based storage. A given datacenter should then choose which to use for each application or function. Once those decisions are made, it is easier to determine the best products to implement the resulting storage architecture. Now let’s consider the problem with NAS as well as the solutions it can provide.
Over the years I’ve seen many, many computing infrastructures. Back in the “old days” (say, the 1980s), we had servers and SANs for production, and NAS was pushed to the side. It was typically used for home directories and the storage of utility programs, if at all. In those cases, NAS storage was mounted on all servers as well as all workstations.
That helped NAS gain a reputation for unreliability, probably because any failure was noticed by everyone, and failures were difficult to recover from. With hard mounts, for example, NFS clients retry indefinitely instead of timing out, so a single NAS server failure could take down all computing until the server was fixed. Also, many sites used “cross mounts,” in which servers mounted each other’s directories via NFS. If one server then failed, all the other servers would eventually hang until the failed one recovered. NFS also had quirks such as “stale file handles” that left a bad taste in the mouth.
So failures of NFS servers were quite painful to the computing infrastructure. Why did NAS servers fail as often as they did? Well, they were non-clustered, while their SAN brethren typically had more redundant components and automatic recovery from problems. Originally, a “NAS server” was just a general-purpose Sun server running NFS, whereas a SAN was originally, and usually still is, a purpose-built storage array. NAS servers also were, and still are, network-connected. Back in the day, there was typically one network connection to each workstation (and frequently between servers as well). That one link carried both NAS and non-NAS traffic. Even if a separate network was carved out for storage communication between the servers and the NAS, it was rarely redundant. Shared links and single points of failure made NAS more prone to failure than SAN, hence the lingering impression that SAN is more reliable than NAS.
The major storage manufacturers are all chasing the cloud storage market. The private cloud storage market makes a lot of sense to me. Clients adopting private cloud methodologies have additional, often more advanced, storage requirements. This will frequently require a storage rearchitecture and may dictate changing storage platforms to meet the new requirements. The public cloud storage market outlook is much less clear to me.
If public cloud services are as successful as the analysts, media, and vendors are suggesting they will be, then cloud providers will become massive storage buyers at a scale that dwarfs today’s corporate consumers. Whether the public cloud storage is part of an overall architecture that includes compute and capacity or is a pure storage solution, the issue is the same. This is not about 1 or 2 PB. The large cloud providers could easily be orders of magnitude larger than that.
Huge storage consumers are exactly what the storage manufacturers are looking for, right? Let me suggest something that may sound counterintuitive: enormous success for the cloud providers will be terrible news for today’s mainstream storage manufacturers.
We have been working on a comparison of VMware datastores running on NFS, iSCSI, and FC. (Stay tuned. We will publish those results shortly.) Along the way, we were reminded of the performance boost that jumbo frames can provide. These tests used the same server-side ‘boot storm’ test harness we have used before (details can be found at the end of this post). The question is, “How much faster will ESX be with jumbo frames enabled?”
Let’s jump right to the answer…
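For a rough sense of why jumbo frames help, here is a back-of-the-envelope sketch (not the harness used in these tests) comparing a standard 1500-byte MTU with a 9000-byte jumbo MTU. It assumes untagged Ethernet, IPv4, and TCP with no header options; real traffic with TCP timestamps or VLAN tags will differ slightly. The function names are illustrative only.

```python
import math

# Per-frame bytes on the wire that carry no IP packet data (untagged Ethernet):
# 7 preamble + 1 start-of-frame delimiter + 14 MAC header + 4 FCS + 12 inter-frame gap
ETHERNET_WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12  # 38 bytes
# IPv4 header + TCP header, assuming no IP or TCP options
IP_TCP_HEADERS = 20 + 20  # 40 bytes


def tcp_payload_per_frame(mtu):
    """Application bytes carried by one full-sized frame at this MTU."""
    return mtu - IP_TCP_HEADERS


def wire_efficiency(mtu):
    """Fraction of bytes on the wire that are application payload."""
    return tcp_payload_per_frame(mtu) / (mtu + ETHERNET_WIRE_OVERHEAD)


def frames_needed(transfer_bytes, mtu):
    """Number of full-sized frames needed to move transfer_bytes of payload."""
    return math.ceil(transfer_bytes / tcp_payload_per_frame(mtu))


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: wire efficiency {wire_efficiency(mtu):.2%}, "
              f"frames per MiB {frames_needed(1 << 20, mtu)}")
```

Under these assumptions, moving 1 MiB takes 719 frames at MTU 1500 but only 118 at MTU 9000. The bandwidth gain from lower header overhead is only a few percent; the larger effect is roughly six times fewer frames to process, which generally means less per-packet work for the host CPU and NIC.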
I received several questions about the performance of the Oracle/Sun F20 flash card I used in my previous post about block alignment, so I put together a quick overview of the card’s performance capabilities. The following results are from testing the card in a dual-socket 2.93 GHz Nehalem (X5570) system running Solaris x64. This is similar to the server platform Oracle uses in the Exadata V2.
The F20 card is a SAS controller with 4 × 24 GB flash modules (96 GB in total) attached to it. You can find more information on the flash modules on Adam Leventhal’s blog, and the official Oracle product page has the F20 details.
All of my tests used 100% random 4 KB blocks. I focused on random operations because in most cases it is not cost-effective to use SSDs for sequential operations. I ran these tests with a variety of thread counts to give an idea of how the card scales with multiple threads. The first test compared the performance of a single 24 GB flash module to that of all 4 modules.
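The test pattern described above (100% random, 4 KB-aligned reads at several thread counts) can be sketched in Python as follows. This is a hypothetical illustration, not the tool used for these results: it assumes a POSIX system (for `os.pread`) and that `path` is a regular file. Because it goes through the operating system’s page cache, its numbers will mostly reflect memory rather than the flash device. A real device test would bypass the cache, and a dedicated I/O benchmarking tool is the better choice.

```python
import os
import random
import threading
import time

BLOCK_SIZE = 4096  # 4 KB, the block size used in the tests described above


def random_offsets(file_size, count, block_size=BLOCK_SIZE, seed=None):
    """Return `count` random byte offsets, each aligned to `block_size`."""
    rng = random.Random(seed)
    blocks = file_size // block_size
    return [rng.randrange(blocks) * block_size for _ in range(count)]


def random_read_test(path, threads, ops_per_thread, block_size=BLOCK_SIZE):
    """Issue 100% random, block-aligned reads from several threads.

    Returns (completed_ops, elapsed_seconds, iops).
    """
    file_size = os.path.getsize(path)
    if file_size < block_size:
        raise ValueError("target must hold at least one block")
    # Pre-generate offsets so random-number generation is not timed.
    plans = [random_offsets(file_size, ops_per_thread, block_size, seed=i)
             for i in range(threads)]
    completed = [0] * threads
    fd = os.open(path, os.O_RDONLY)
    try:
        def worker(idx):
            for off in plans[idx]:
                # pread takes an explicit offset and does not move a shared
                # file position, so all threads can share one descriptor.
                if len(os.pread(fd, block_size, off)) == block_size:
                    completed[idx] += 1

        workers = [threading.Thread(target=worker, args=(i,))
                   for i in range(threads)]
        start = time.perf_counter()
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    total = sum(completed)
    return total, elapsed, total / elapsed
```

To see how throughput scales, call `random_read_test(path, n, 10_000)` for n in 1, 2, 4, 8, and 16 and compare the reported IOPS. Each thread keeps one read outstanding, so the thread count is effectively the queue depth presented to the device.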
Block alignment is an important and often overlooked topic in storage. A couple of months back I read a blog entry by Robin Harris about the importance of block alignment with the new 4 KB drives. I was curious to test the theory on one of the new 4 KB drives, but I did not have one on hand. That got me thinking about Solid State Disk (SSD) devices: if filesystem misalignment hurts traditional spinning-disk performance, how would it affect SSD performance? In short, it is ugly.
Here is a chart showing the difference between aligned and misaligned random read operations to a Sun F20 card (I guess it is officially an Oracle F20 card now).
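The arithmetic behind misalignment is easy to sketch. The example below assumes 512-byte logical sectors and a 4 KB physical page. It uses sector 63 (the well-known legacy MS-DOS partitioning default) and sector 2048 (a 1 MiB boundary) as illustrative partition starts. When the partition starts off a 4 KB boundary, every 4 KB filesystem block inside it straddles two physical pages, so each logical I/O turns into two device operations. For writes, this can also force a read-modify-write.

```python
SECTOR_SIZE = 512  # logical sector size assumed by legacy partitioning tools
PAGE_SIZE = 4096   # assumed physical sector / flash page size


def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Number of physical pages spanned by an I/O of `length` bytes at byte `offset`."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def partition_is_aligned(start_sector, sector_size=SECTOR_SIZE, page_size=PAGE_SIZE):
    """True if a partition starting at `start_sector` begins on a page boundary."""
    return (start_sector * sector_size) % page_size == 0


if __name__ == "__main__":
    for start_sector in (63, 2048):
        partition_start = start_sector * SECTOR_SIZE
        # Filesystem block n sits at partition_start + n * 4096, so every block
        # inherits the partition's misalignment; block 0 is shown here.
        print(f"start sector {start_sector}: "
              f"aligned={partition_is_aligned(start_sector)}, "
              f"pages per 4 KB block={pages_touched(partition_start, 4096)}")
```

With a start at sector 63, each 4 KB block touches two pages. With a start at sector 2048, each block touches exactly one.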
I spoke at TechForum in New York earlier this week, and here is a copy of my presentation for anyone who is interested. The official title is “Rethinking Storage Strategies: How Virtualization is Transforming Storage.” At a high level, I spoke about current trends in storage and how they interact with server virtualization. I do not think the slides will have the same impact without the running commentary, so feel free to comment here or drop me a line if you have any questions.
Storage Trends and Server Virtualization (199.0 KiB)
When Oracle announced the Exadata V2 database appliance late last year, it created quite a stir. The performance numbers for the box are extremely high, and the feature set and capacity are quite large.
Last week we had an executive briefing for folks interested in Exadata V2. My colleagues Kurt Rosenfeld and John Laferrier presented information on business intelligence and the Exadata, as well as the business case and use cases for considering buying one. Joe LaFlamme from Oracle presented some reference customer examples.
I presented the Exadata V2 technical overview, traveling through the architecture details, migration strategies, and component details. Along the way there were a few points I made that seemed a bit surprising to the audience, and that led to a lively discussion. I summarize those points here, as they do not seem to be well known within the industry.
Project Crossbow is an innovative, and I think important, new contribution to the OpenSolaris project. Crossbow makes network virtualization and resource management first-class citizens in OpenSolaris. It follows in the footsteps of ZFS by having a simple, easy-to-understand interface while providing great flexibility and power to the administrator. Crossbow is available only in OpenSolaris, not in Solaris 10. My February column for ;login: Magazine describes and explores Project Crossbow in detail. You can download it here, but as always I encourage you to become a member of Usenix, thereby gaining access to all of the content of ;login: (along with many other great benefits).
2010-02-galvin.pdf (678.9 KiB)
Topic: DTrace Deep Dive and a short talk on LDOM Domains and ZFS
When:
Burlington, MA – Sun Campus – Feb 2, 2010, 6:00 PM to 9:00 PM
Boston, MA – Boston University – Feb 3, 2010, 6:00 PM to 9:00 PM
(Note: The same content will be presented twice, once in Burlington and once in Boston. Pick whichever location and date is most convenient.)
Where:
Feb 2 – Sun Microsystems Burlington Campus; 1 Network Drive, Burlington, MA
Feb 3 – Boston University, Electrical and Computer Engineering Department Photonics Center Building – Room PHO 339 (3rd floor), 8 Saint Mary’s Street Boston, MA 02215
BU Parking: Street parking is available on St. Mary’s Street and Bay State Road. Metered parking spots do not require a fee after 6 PM.
RSVP: To Linda Wendlandt: lwendlandt@cptech.com
Registration Required! – so we can plan food and drink
Join Jim Mauro and Shannon Sylvia for a DTrace how-to and a look at how to use LDOMs with ZFS.
AGENDA:
6:00-6:20: Registration, Pizza and Beverages
6:20-6:30: Introductions: Peter Galvin, CTO, Corporate Technologies
6:30-8:30: Solaris Dynamic Tracing – DTrace – Jim Mauro, Principal Engineer, Sun Microsystems
8:30-9:00: LDOM Domains and ZFS: An example of creating a ZFS bootable root LDOM domain using jumpstart – Shannon Sylvia, Sysadmin, Northeastern University
9:00: Q&A and Discussion
We’ll also be giving out official NEOSUG T-shirts and other trinkets, as well as copies of the OpenSolaris CD and instruction manual.
For more information see the NEOSUG discussion forum.