Solaris System Analysis FAQ
Airplane pilots must execute a pre-flight checklist before taking off. This list ensures that no steps are missed as the pilot prepares for flight. Over time, these checklists have been standardized and edited by many pilots and aircraft designers to the point that they are complete, logical, useful, and indispensable. System administrators are lacking such consensus documents, for the most part. Rather, some sysadmins have no useful checklists, doing all work ad hoc. Others have their own lists or work with groups that have documented methodologies that they follow. Frequently these lists have limited scope or assume site-specific details.
This FAQ gathers some best practices for solving problems on Solaris systems. Frequently the problems are performance, but they could also be reliability or functionality. Many of these FAQ entries also pertain to computers in general, and to Unix more specifically, so they might be helpful for systems other than those running Solaris.
Each of the FAQ questions can be read independently for a given type of activity, but they also flow from start to finish to give a complete set of steps to start from a problem and resolve it.
This FAQ was originally based on two of my columns in ;login: Magazine, both available for download: System Analysis 101 and System Analysis 102.
Please consider helping your fellow Solaris system administrators by contributing your questions, answers, or additions to the questions and answers already posted here. To do so, fill out the form at the bottom of this page.
Preparing for Problems
Determine Status
Sometimes the system has a large, easy-to-find problem. In those cases it would be a shame to spend a lot of time chasing down complex paths. Rather, the first step is to check for obvious problems with the “usual suspect” commands. The goal of this phase is to narrow the problem area to a specific aspect of the system. Solaris System Analysis 101 ended with a list of areas to explore.
Problem Solutions
User-level problems are relatively easy. If a process is using too much CPU or memory and you have the source code, it is now a program development and debugging problem. If the application is well written, then perhaps the only solution is adding resources to the system to allow the application to match your performance needs. For home-grown code, be sure to use the latest version of a given compiler. Also note that Sun’s SunStudio development environment is now available for free (without support), generates great code, and has good debugging tools built in, including the DTrace-based D-light tool and “performance analyzer” functionality (http://developers.sun.com/sunstudio/). Also, at least with Solaris, each release usually brings performance improvements. If you are running an older version of Solaris, consider the (difficult) step of upgrading. In addition, Java code is a major component in many applications, and Java can be difficult to performance-analyze and tune. Try to use the latest JVM, especially because Java 1.5 adds DTrace support and Java 1.6 automatically optimizes garbage collection.
If the problem is at the system level, then more time (and commands) may be needed to track down the problem. The good news is that Solaris 10 has many more tools than previous Solaris releases (and other operating systems in general) to find and fix these problems.
DTrace the Problem
Once the range of the problem has been narrowed, specific analysis can be done on the problem area to ferret out the source of the problem. DTrace is a fabulous tool for this analysis. The DTraceToolkit provides over 200 prewritten (but unsupported) tools for getting detailed information about the operation of many areas of the system. Get familiar with the tools so they are in your arsenal when needed. The scripts are well documented and demonstrated online (http://www.brendangregg.com/dtrace.html), so I won’t repeat that information here.
Other Resources
Preparing for Problems
The time to learn how to use tools, to understand your facility architecture and performance, and to learn administration and debugging techniques is not in the midst of a production problem. Rather, these things need to be part of your DNA, ready for when they are needed. To prepare for the inevitable problem, consider doing the following:
Capture the problem definition as succinctly as possible. Doing so helps keep focus on the problem and helps communicate the problem as needed. It also helps avoid the “death spiral,” in which other (potential) problems, or red herrings, are found while exploring the first one. Areas to capture include:
The timing of this activity is variable, and it generally occurs throughout the other parts of problem solving. If more than one person is working on the problem, this phase can be delegated because it can be time-consuming.
Identifying the resources you have available is an important step in solving a problem. If you are planning ahead, try to get these set up so that you can use them when there is a problem.
For each component in the problem environment (certainly computers, but this could also extend to storage, networking, and security components), do the following.
Capture the state and configuration details with the "best" tools available.
Some things to capture if not already recorded by the above actions:
Note that support organizations are likely to push for changes even in supported configurations before escalating a problem. For example, a very common scenario is that technical support will encourage or try to require installation of the latest patches, upgrading to the latest firmware, or even upgrading operating system or application versions. This eliminates variables for them, but it's also how they try to get you off the phone. Such work can take hours or days and frequently is not necessary - the problem frequently still exists after the work. Push back on support, depending on the level of effort and time their recommendations would take, and your level of support.
Ask support if they have any evidence that the problem will be solved by the changes recommended. If they have no proof, try to force them to continue working on the problem without first making their suggested changes. Note that politeness, firmness, and mention of how much you pay them for support are much more effective than poor behavior. If satisfaction is not received, then (politely) try to get to the next level of support management. Also, whoever sold you the facility has a vested interest in having you as a happy customer, so have them help increase the priority of the problem within the support organizations.
Of course, this is the meat of the project and the most difficult part. But with the preparation from the other questions in the FAQ done, variables have been eliminated and lots of information is now readily available to help diagnose the problem. This phase can vary immensely depending on the kind of problem being worked on, the scale of the problem, and all of the details determined above.
Some first steps for Solaris 10 systems are listed here, but other lists for other operating systems and devices should be compiled for your site (or found to have already been published).
Some general areas to consider, especially on Solaris:
How do I prepare to be ready to solve problems on my Solaris system?
How do I capture the problem details?
Where can I go for help?
What resources should I have available?
How do I capture the state of the problem and its environment?
How do I track down / solve the problem?
Generally, compare the results attained during the problem against the same information from when the system was healthy (see "How do I prepare to be ready to solve problems on my Solaris system").
- iostat -x 10 - large response times.
- mpstat 10 - how many threads are in which states?
- vmstat 10 - thread counts, scan rate indicating memory shortage.
- vmstat -p 10 - system-wide memory operations.
- prstat - resource hogs.
- prstat -Lmp - detailed state information about a specific process.
- pmap -x - explore the memory map of a problem process.
- DTraceToolkit and DTrace scripts - look at specific suspect aspects of the system.
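Comparing a problem against a healthy baseline only works if the baseline was captured in the first place. The following is a minimal sketch of one way to snapshot the commands listed above into a directory; the function names (capture, snapshot) and the sample count of 3 are assumptions for illustration, not part of any Solaris tool.

```shell
#!/bin/sh
# capture DIR LABEL COMMAND [ARGS...]
# Save the output of one command as DIR/LABEL.out.
# A command that is not installed is noted and skipped, not treated as fatal.
capture() {
    dir=$1
    label=$2
    shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$dir/$label.out" 2>&1
    else
        echo "skipped: $1 not found" >"$dir/$label.out"
    fi
}

# snapshot DIR
# Take one snapshot with the commands from the list above.
# Interval commands get an explicit count (10-second interval, 3 samples)
# so that they terminate instead of running forever.
snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    capture "$dir" iostat   iostat -x 10 3
    capture "$dir" mpstat   mpstat 10 3
    capture "$dir" vmstat   vmstat 10 3
    capture "$dir" vmstat-p vmstat -p 10 3
    capture "$dir" prstat   prstat 1 1
}
```

A possible use is to run "snapshot /var/tmp/healthy" while the system is behaving and "snapshot /var/tmp/problem" during an incident, then compare the two directories side by side. Both paths are hypothetical.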
Determine Status
Scan through log files such as /var/adm/messages and the output of the "dmesg" command. Don’t ignore anything odd: It could be the canary indicating the problem.
Run "svcs -a" to check for services that have failed or are disabled.
Check for full disks or changed mount information via "df -kh" and "mount".
Run "ifconfig -a" and look for any errors; run "kstat" and read through the section of output for a given network interface (such as e1000g0) to check network parameters such as duplex and speed. If on Solaris 10 U5 or better, use "dladm" to check the same.
Read through /etc/system and look for settings copied from other systems or left behind during an upgrade. /etc/system should never be copied or left intact between operating system or application upgrades; such events should cause an audit of the file for entries to remove or update. Check the Solaris Tunable Parameters Reference Manual (http://docs.sun.com/app/docs/doc/817-0404). This document is updated for every Solaris release. Watch out for system setting recommendations from vendor documents. Check /etc/project for any resource management settings that could be affecting system or application performance.
Check the load average of the system. "uptime" shows the 1-, 5-, and 15-minute average number of threads running and wanting to run on the system. If those numbers are significantly (two times or more) higher than the number of cores in the system, users will report “slowness.”
Check "iostat -x 10" and check the svc_t column for large service times (in milliseconds). Anything above 30 ms can be of concern. Also note that dividing kilobytes written per second (kw/s) by writes per second (w/s) produces the average write size during that period, which can help when analyzing I/O issues. Is the write size the same as your database block size? If not, there can be a database performance impact. The same applies to the read values (r/s and kr/s).
Check "mpstat 10": How was processor time spent? Per CPU (each row being a CPU’s status), what percentage of time was spent in user-land (running user code) (usr), how much in the kernel (sys), and how much idle (idl)? Most time should be usr, and any more than a few percent in the kernel can indicate a problem.
Check "vmstat 10" and look at the "sr" column, the scan rate, to see whether the system is short of memory. The larger this number, the more the system is hunting for memory. Anything above 0 is considered a memory shortage. Memory is orders of magnitude faster than disk, so any use of disk as virtual memory can cause a system slowdown.
Check "vmstat -p 10". This shows system-wide memory operations. This is the place to check whether the system is short on memory and to determine which system aspect is using the memory [executable process pages, anonymous (heap, stack, or malloc) uses, or file system I/O].
"swap -s" shows the virtual memory status of the system. "available" shows the amount of virtual memory unallocated. "swap -l" will show all swap devices and how much space is used and free on each one. If the system is very low on virtual memory, more can be added by adding more swap devices via the "swap -a" command. Note that having more than one swap area per disk can greatly decrease performance.
Check "prstat". If the problem is simply processes using up CPUs, then "prstat" can show which processes those are. What is more difficult is figuring out what the process is doing and whether it should be doing it.
Find the process-id (pid) of the process in question via "prstat" or "ps -elf". Check "prstat -Lmp
Use "pmap -x
Use DTraceToolkit and DTrace scripts to look at specific suspect aspects of the system (http://opensolaris.org/os/community/dtrace/dtracetoolkit/).
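The iostat arithmetic described above (kilobytes per second divided by operations per second, plus the 30 ms service-time threshold) can be sketched as a small filter. This is an illustrative sketch only: the function name io_sizes is hypothetical, and it assumes the usual Solaris "iostat -x" column order (device, r/s, w/s, kr/s, kw/s, wait, actv, svc_t, %w, %b).

```shell
#!/bin/sh
# io_sizes: read "iostat -x" output on stdin and print, per device line,
# the average read size, the average write size, and whether svc_t
# exceeds 30 ms.
io_sizes() {
    awk '
        # Skip the two header lines printed before each sample.
        $1 == "extended" || $1 == "device" { next }
        NF >= 8 {
            reads = $2 + 0; writes = $3 + 0
            kr = $4 + 0; kw = $5 + 0; svc_t = $8 + 0
            # Average size per operation; guard against idle devices.
            avg_r = (reads > 0) ? kr / reads : 0
            avg_w = (writes > 0) ? kw / writes : 0
            note = (svc_t > 30) ? "SLOW" : "ok"
            printf "%s read=%.1fKB write=%.1fKB svc_t=%.1fms %s\n", $1, avg_r, avg_w, svc_t, note
        }'
}
```

One way to use it is "iostat -x 10 3 | io_sizes". Averages that differ from the database block size are the ones worth a closer look.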
Are there failing components of the system?
Are any daemons / services failing?
Are any disks full?
Are there networking issues?
Can /etc/system negatively affect system performance?
Does my system have enough CPUs?
Check "vmstat 10": How many threads wanted to run but had no CPU available to run on, on average (kthr r)? How many are blocked waiting for something (usually I/O) (kthr b)? How many processes have been swapped out (kthr w)? Swapped out means that the system was desperately short of memory and booted entire processes out to disk. That’s bad.
Is my system bottlenecked on disk I/O?
Is my system spending too much time in the kernel and too little time running user processes?
Is my system short on memory?
What is using the memory in my system?
Is my system short on virtual memory?
What processes are using up my CPU?
What is a process spending its time doing?
How much memory is a given process using?
How do I examine all aspects of the system in more detail?
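The vmstat checks in this section (run queue, swapped-out processes, and a scan rate above 0) can be automated roughly as follows. This is a sketch under stated assumptions: the function name vmstat_flags is hypothetical, and columns are located by their header names (Solaris vmstat labels the kthr columns r, b, and w) rather than by fixed position, because the number of disk columns varies between systems.

```shell
#!/bin/sh
# vmstat_flags: read "vmstat" output on stdin and print one summary line per
# sample, flagging a nonzero scan rate (sr) and swapped-out processes (w).
# Note that the first sample vmstat prints is an average since boot.
vmstat_flags() {
    awk '
        $1 == "kthr" { next }            # group header line
        $1 == "r" {                      # column header: remember positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("sr" in col) && NF >= col["sr"] {
            r = $(col["r"]) + 0; b = $(col["b"]) + 0
            w = $(col["w"]) + 0; sr = $(col["sr"]) + 0
            flags = ""
            if (sr > 0) flags = flags " memory-short"
            if (w > 0)  flags = flags " swapped-out"
            if (flags == "") flags = " ok"
            printf "r=%d b=%d w=%d sr=%d%s\n", r, b, w, sr, flags
        }'
}
```

For example, "vmstat 10 6 | vmstat_flags" summarizes a minute of samples. Compare the reported r value against the number of CPUs to judge whether threads are queuing for processor time.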
Problem Solutions
Solaris defaults to time-share scheduling for user processes. If your system is a server that doesn’t run general user tasks, then time-sharing is overkill, with more overhead than other schedulers. If you want all processes on the system to have the same priority (not changing as time-sharing does based on CPU used and I/O requested), then consider changing to the much lower-overhead fixed-priority scheduler, “FX.” Such a change could buy you 5% or more CPU time. To make FX the default class, execute "dispadmin -d FX". That change is persistent across reboots, and new processes will be assigned to that scheduling class. To move current processes from time-sharing to FX, use "priocntl -s -c FX -i class TS".
If the problem involves some processes starving others of resources, consider implementing the fair-share scheduler and resource management. Those can be implemented either for the full system or, more easily, per-zone when zones (a.k.a. containers) are installed on a system. There is a lot to resource management and zones, as has been covered previously in ;login: Magazine. The slides from my tutorial on Solaris 10 administration have all the gory details and are freely available online (http://www.galvin.info/2006-11.s10admin.zip). There are links to this and other resources at my personal blog (http://www.galvin.info).
If there are high-priority processes on the system, consider “pinning” them to a set of CPUs. These processes will stay on those CPUs and not be rescheduled or interrupted. A good time to use this technique is for database servers or just for the log-writing process of a database. The Solaris tools to use here are processor sets and process bindings. Consider using containers to hold the applications, and creating a dynamic resource pool that holds those containers. We'll call that "app-pool". If you assign containers to the non-default pool, and all your apps are in the containers, then the only thing left in the default pool (where all interrupts and I/O take place) is the kernel. The kernel should have one or more CPUs to do its work, maybe 1 CPU on a system of 4 CPUs, and 2 CPUs on a system with 8 or more CPUs. (Check "vmstat" in the default pool to determine how much CPU the kernel and system applications are using to balance the system.) The applications in the non-default pool will not be interrupted by system operation, and those applications then run highly efficiently. For information on how to implement dynamic resource pools, take a look at the containers how-to guide.
Having the same sizes of I/O operations from memory through to the physical disk is one key to good I/O performance. For example, OLTP databases such as Oracle’s frequently perform I/O in 8-kB chunks. If you format your disks to use 8-kB block sizes, I/O will be streamlined. Be sure to take into account the underlying disk structures (i.e., if you have a SAN, understand the I/O geometry within the LUNs that are provided). Note that terminology of disk structures varies, but ZFS calls its I/O chunk the “recordsize.” In this Oracle example, set a ZFS recordsize to 8 kB, and, for good performance, make sure that the underlying storage array has RAID sets that are multiples of 8 kB. Jiri Schindler wrote a very in-depth analysis of matching application and device I/O patterns in his PhD thesis.
In general, I/O is the most likely bottleneck, disk I/O the most likely I/O culprit, and individual disks the most limiting I/O device. Any given disk can perform 100 to 200 I/O operations per second (IOPS). If your system needs to do thousands of IOPS, then you need tens of disks, well tuned, to provide that I/O. RAID 0+1 and 1+0 are better-performing than RAID 5, so match the RAID level with the performance needed.
To determine if your network ports have headroom or are maxed out, download and use the nicstat tool.
Sun has two product categories: The first includes the “X” and “M” servers, which run a few threads very fast. The “T” servers are chip multi-threading (CMT) systems and run lots of threads, but run them rather slowly. An analogy can help sort out the best uses for these systems. Think of the “X” and “M” servers as race cars and the “T” servers as trucks. Each has its uses, so make sure you use the right system for the needed performance. Also, there are several steps that can be taken to determine whether a “T” server is right for your applications and to tune these servers. Sun’s Web site is the best place to start (http://www.sun.com/bigadmin/topics/coolthreads/). As always, benchmarking is the best way to test performance and performance changes, if the benchmarking is accurate and repeatable. Don't test a T server by having it do one part of the job (say, 1 request rather than thousands of requests). Watch out especially for caching effects in benchmark efforts. Caching happens at all levels of computer systems, so, for example, it is safest to reboot the systems involved between each benchmark run. Consider, however, that SAN arrays also have caches, which could invalidate (or at least complicate) benchmark results.
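The disk-count reasoning above (100 to 200 IOPS per disk, thousands of IOPS needed) is simple ceiling division. As a rough sketch, with a hypothetical helper name and deliberately ignoring RAID write penalties and array caches, which are site-specific:

```shell
#!/bin/sh
# disks_needed REQUIRED_IOPS PER_DISK_IOPS
# Print the minimum number of disks whose combined IOPS meets the requirement
# (integer ceiling division).
disks_needed() {
    required=$1
    per_disk=$2
    if [ "$per_disk" -le 0 ]; then
        echo "per-disk IOPS must be positive" >&2
        return 1
    fi
    echo $(( (required + per_disk - 1) / per_disk ))
}
```

For a workload needing 3000 IOPS, "disks_needed 3000 100" gives 30 and "disks_needed 3000 200" gives 15, which is the "tens of disks" range mentioned above.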
Are you running the most appropriate scheduler for each system in your environment?
Does the problem involve some processes starving others of resources?
Are there high-priority processes on the system?
Is your system configured with the best-fit page sizes?
Is your I/O well-balanced and spread across enough devices (e.g., disks and network ports)?
Are you using the best CPU type for the workload?
DTrace the Problem
Beyond the DTraceToolkit, the sky is the limit for delving into system activity details. For example, the D script listed below graphs the time spent in each system call by each process.
Processes starting and exiting immediately can be difficult to spot and can greatly decrease system performance. Find them with the proc:::exec one-liner listed below.
Another previously hidden performance hit is error management. Detect and fix failing system calls before moving forward, as that will change your performance picture. A DTraceToolkit tool, errinfo, displays all system call errors.
To determine the block size of disk I/O, or the level of multi-threading of the applications on the system, execute the corresponding one-liners listed below.
Networking can also be a bottleneck, as even multiple 1-Gb links can be slower than other system aspects. Even with Solaris 10, network bottlenecks can be difficult to spot owing to the lack of a DTrace networking provider. That provider was included in Solaris Nevada build 93, so it is in OpenSolaris and it should appear in a future Solaris release. For details see Sun’s wiki (http://wikis.sun.com/display/DTrace/ip+Provider). In the meantime a good tool is nicstat, also available online (http://www.brendangregg.com/K9Toolkit/nicstat).
How do I graph the time spent in each system call, by each process?
syscall:::entry
/uid != 0/
{
	self->tm = timestamp;
}
syscall:::return
/self->tm/
{
	@[execname, pid, probefunc] = quantize(timestamp - self->tm);
	self->tm = 0;
}
How do I see whenever a new process is created on the system?
/usr/sbin/dtrace -n 'proc:::exec{printf("%s execing %s, uid/zone = %d/%s\n", execname, args[0], uid, zonename)}'
How do I see every error that occurs in all processes on the system?
For I/O, to display files and the I/O being done to them, execute:
/usr/sbin/dtrace -n 'io:::start{@[execname, args[2]->fi_pathname] = count()}'
How do I determine the block size of the disk I/O on the system?
/usr/sbin/dtrace -n 'io:::start{@[execname, args[2]->fi_pathname] = quantize(args[0]->b_bufsize)}'
How do I determine the level of multi-threading of the applications on the system?
/usr/sbin/dtrace -n 'profile:::profile-100hz /pid/{@[pid, execname] = lquantize(cpu, 0, 512, 1);}'
How can I DTrace network details on Solaris?
Other Resources
If this FAQ didn't help you solve your system problem, consider the following great resources:
What other resources are available to help with Solaris system analysis?
The DTrace Toolkit - http://opensolaris.org/os/community/dtrace/dtracetoolkit/
Bigadmin - System Administrator Resources and Community - http://www.sun.com/bigadmin/home/index.jsp
Solaris Internals and Performance FAQ - http://www.solarisinternals.com/
The Solaris Internals / Performance and Tools books
- http://www.amazon.com/Solaris-Internals-TM-OpenSolaris-Architecture/dp/0131482092/ref=pd_sim_b_7
- http://www.amazon.com/Solaris-Performance-Tools-Techniques-OpenSolaris/dp/0131568191/ref=pd_sim_b_2
The OpenSolaris book - http://www.amazon.com/OpenSolaris-Bible-Wiley-Nicholas-Solter/dp/0470385480/ref=pd_sim_b_2
The best place to ask questions of Sun experts is probably - http://www.opensolaris.org/os/discussions/