Solaris System Analysis FAQ
Airplane pilots must execute a pre-flight checklist before taking off. This list ensures that no steps are missed as the pilot prepares for flight. Over time, these checklists have been standardized and edited by many pilots and aircraft designers to the point that they are complete, logical, useful, and indispensable. System administrators, for the most part, lack such consensus documents. Some sysadmins have no useful checklists and do all their work ad hoc. Others have their own lists or work with groups that follow documented methodologies. Frequently these lists have limited scope or assume site-specific details.
This FAQ gathers some best practices for solving problems on Solaris systems. Frequently the problems involve performance, but they could also involve reliability or functionality. Many of these FAQ entries also pertain to computers in general, and to Unix more specifically, so they might be helpful for systems other than those running Solaris.
Each of the FAQ questions can be read independently for a given type of activity, but read from start to finish they also form a complete sequence of steps, from first encountering a problem to resolving it.
This FAQ was originally based on two of my columns at ;login: Magazine. They are available for download: System Analysis 101 and System Analysis 102.
Please consider helping your fellow Solaris system administrators by contributing your questions, answers, or additions to the questions and answers already posted here. To contribute, fill out the form at the bottom of this page.
Preparing for Problems
[faq summary SSA-PFP]
Determine Status
Sometimes the system has a large, easy-to-find problem. In those cases it would be a shame to spend a lot of time chasing down complex paths. Rather, the first step is to check for obvious problems with the “usual suspect” commands. The goal of this phase is to narrow the problem area to a specific aspect of the system. Solaris System Analysis 101 ended with a list of areas to explore.
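As a rough illustration of this first pass, here is a minimal Python sketch that runs a handful of status commands in order and collects their output. The command list is my own assumption of typical "usual suspects" on a Solaris system, not the list from the column, and the names `USUAL_SUSPECTS`, `run_command`, and `run_checklist` are hypothetical. Adjust the list to your site.

```python
import shutil
import subprocess

# Assumed "usual suspect" commands (not taken from the original column).
# Each entry is an argv list; the intervals and counts are illustrative.
USUAL_SUSPECTS = [
    ["uptime"],
    ["vmstat", "5", "3"],
    ["mpstat", "5", "3"],
    ["iostat", "-xn", "5", "3"],
    ["prstat", "-n", "10", "1", "1"],
    ["df", "-k"],
]


def run_command(argv, timeout=60):
    """Run one command and return a (status, output) tuple."""
    if shutil.which(argv[0]) is None:
        # The command is not installed or not on PATH; record and move on.
        return ("missing", "")
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    except OSError as err:
        return ("failed", str(err))
    status = "ok" if done.returncode == 0 else "failed"
    return (status, done.stdout + done.stderr)


def run_checklist(commands=USUAL_SUSPECTS, runner=run_command):
    """Run every command in order; return (command line, status, output) tuples."""
    results = []
    for argv in commands:
        status, output = runner(argv)
        results.append((" ".join(argv), status, output))
    return results
```

Calling `run_checklist()` and printing each tuple gives a single snapshot to read through before digging deeper. The `runner` parameter exists only so the loop can be exercised without touching a live system.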
[faq summary SSA-DS]
Problem Solutions
User-level problems are relatively easy. If a process is using too much CPU or memory and you have the source code, it becomes a program development and debugging problem. If the application is already well written, then perhaps the only solution is to add resources to the system so the application can meet your performance needs. For home-grown code, be sure to use the latest version of your compiler. Also note that Sun’s Sun Studio development environment is now available for free (without support), generates great code, and has good debugging tools built in, including the DTrace-based D-Light tool and “performance analyzer” functionality (http://developers.sun.com/sunstudio/). With Solaris, at least, each release usually brings performance improvements, so if you are running an older version, consider the (difficult) step of upgrading. In addition, Java code is a major component of many applications, and Java can be difficult to analyze and tune for performance. Try to use the latest JVM, especially because Java 1.5 adds DTrace support and Java 1.6 automatically optimizes garbage collection.
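To make the "process using too much CPU or memory" case concrete, the following sketch ranks processes from the text output of `ps -eo pid,pcpu,pmem,comm`. The function name `top_processes` is hypothetical, and the parsing assumes the four-column layout that this `ps` invocation normally produces. Treat it as a starting point rather than a finished tool.

```python
def top_processes(ps_output, key="pcpu", limit=5):
    """Parse `ps -eo pid,pcpu,pmem,comm` text and return the heaviest processes.

    key is "pcpu" or "pmem"; the result is a list of dicts, largest first.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)  # the command may itself contain spaces
        if len(parts) < 4:
            continue
        pid, pcpu, pmem, comm = parts
        try:
            rows.append(
                {"pid": int(pid), "pcpu": float(pcpu),
                 "pmem": float(pmem), "comm": comm}
            )
        except ValueError:
            # Skip any line whose numeric columns do not parse.
            continue
    rows.sort(key=lambda row: row[key], reverse=True)
    return rows[:limit]
```

In practice the input would come from running the `ps` command above (for example with `subprocess.run`) and passing its standard output to `top_processes`, once sorted by `pcpu` and once by `pmem`.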
If the problem is at the system level, then more time (and commands) may be needed to track down the problem. The good news is that Solaris 10 has many more tools than previous Solaris releases (and other operating systems in general) to find and fix these problems.
[faq summary SSA-PS]
DTrace the Problem
Once the range of the problem has been narrowed, specific analysis can be done on the problem area to ferret out the source of the problem. DTrace is a fabulous tool for this analysis. The DTraceToolkit provides over 200 prewritten (but unsupported) tools for getting detailed information about the operation of many areas of the system. Get familiar with the tools so they are in your arsenal when needed. The scripts are well documented and demonstrated online (http://www.brendangregg.com/dtrace.html), so I won’t repeat that information here.
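One small way to get familiar with the toolkit is to search an installed copy by keyword for the area you have narrowed the problem to. The sketch below does that in Python. The default directory `/opt/DTT` is an assumption about where the toolkit was installed, and `find_toolkit_scripts` is a hypothetical helper, not part of the DTraceToolkit.

```python
import os


def find_toolkit_scripts(keyword, toolkit_dir="/opt/DTT"):
    """Return sorted paths under toolkit_dir whose file name contains keyword.

    toolkit_dir defaults to an assumed install location; pass your own path.
    The match is case-insensitive and covers scripts and documentation alike.
    """
    needle = keyword.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(toolkit_dir):
        for name in filenames:
            if needle in name.lower():
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, `find_toolkit_scripts("io")` would list every file with "io" in its name, which is a quick way to see which disk-related tools and their accompanying notes are on hand. A missing directory simply yields an empty list.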
[faq summary SSA-DTP]
Other Resources
[faq summary SSA-OR]
Preparing for Problems
[faq list SSA-PFP]
Determine Status
[faq list SSA-DS]
Problem Solutions
[faq list SSA-PS]
DTrace the Problem
[faq list SSA-DTP]
Other Resources
[faq list SSA-OR]
