top
What it's really useful for
top is a simple, command-line tool that enables you to identify:
- Which processes are running
- Which processes are using the most CPU (down to the level of each core)
- Who's eating all the memory
- How much time each core is sitting idle
- How much time each core is sitting idle because it's waiting for IO
top - Basic use, basic interpretation
Some more advanced use is demonstrated through the examples that follow this section.
Invocation
Invoke top at your command line:
top -d 0.5
This invokes top, with an update period of 0.5 seconds (i.e. the display will be refreshed every half second). It will look something like this:
You can get a higher or lower update frequency by altering the numerical value in the command.
This looped anigif shows the activity over several seconds. The sort order is the default, CPU% (i.e. at any given sample, the commands in the table are ordered from the highest CPU% value down). The hungry commands in this example are firefox and byzanz-record (which is what I'm using to record the terminal).
But what does this all mean? If you're going to use top, at some point you should read the man page for it, which you can see with the command
man top
but for the moment, we can make do with understanding just a few of the values being shown. As you can see, every running process is included; to get the most accurate view of the process you're interested in, shut down anything else that's heavy on the CPU, memory or any other system resource.
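If you'd rather keep a copy of what top shows than watch it scroll past, the same update period can be used non-interactively. The following is a minimal sketch, assuming the procps-ng top found on most Linux systems, where -b selects batch mode, -d sets the delay and -n limits the number of updates. The function names top_batch_command and capture_top are made up for this illustration.

```python
import shutil
import subprocess


def top_batch_command(delay_s=0.5, iterations=2):
    """Build an argument list for a non-interactive (batch mode) run of top."""
    # -b: batch mode, -d: delay between updates, -n: number of updates.
    # Assumption: procps-ng top; other implementations may differ.
    return ["top", "-b", "-d", str(delay_s), "-n", str(iterations)]


def capture_top(delay_s=0.5, iterations=2):
    """Run top in batch mode and return its output as text.

    Returns None if no top executable is found on the PATH.
    """
    if shutil.which("top") is None:
        return None
    result = subprocess.run(
        top_batch_command(delay_s, iterations),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling capture_top() should hand back the text of two updates, half a second apart, which you can then save or search at your leisure.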
Interpretation
Each time the values refresh, you're looking at a new measurement covering the update period that has just ended. So in this example, each refresh shows what happened in the previous half-second.
The third line, beginning %Cpu(s), shows what the CPU has been doing during that interval. These values should sum to 100% (give or take rounding). To begin with, the values of particular interest are us, sy, id and wa.
- us - Userspace
The percentage of time that the processes being executed were in "user space". It's not very wrong to think of "user space" as anything outside the kernel. Various things your code can do (for example, reading or writing to a file, or messing about with memory) are done for you by the kernel, via a "system call".
- sy - System
In contrast to the above, the percentage of time that the processes being executed were in "kernel space".
As a very rough first-order rule-of-thumb, if the above two values aren't pretty high, you're leaving a lot of CPU on the table; it's sitting idle, doing nothing, when it could be working for you. All else being equal, this is bad; if you need your program to run faster, and you see a lot of idle CPU time, you need to look for what's causing that idle time.
- id - Idle
The percentage of time in which the CPU had nothing to run and wasn't waiting for IO either.
- wa - Waiting
This is actually a kind of idle time; it's the percentage of time in which the CPU is idle, because it's waiting for IO. Contrast this with the id value, which is the percentage of time in which the CPU is idle for some other reason.
The other values (ni, hi, si and st) are zero in this example because it's a simple one. If you're chasing performance and you see them rising, dig out the top man page.
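To make the %Cpu(s) line a little more concrete, here is a small sketch that pulls the values out of one such line and checks that they add up to roughly 100%. The sample line is made up for illustration, the function name parse_cpu_line is hypothetical, and the pattern assumes the procps-ng layout with a dot as the decimal separator.

```python
import re

# Matches pairs such as "91.6 id" in a %Cpu(s) summary line.
_FIELD = re.compile(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b")


def parse_cpu_line(line):
    """Return a dict such as {"us": 5.9, "sy": 2.0, ...} from a %Cpu(s) line."""
    if not line.startswith("%Cpu"):
        raise ValueError("not a %Cpu(s) summary line")
    # Everything after the first colon holds the "value name" pairs.
    _, _, rest = line.partition(":")
    return {name: float(value) for value, name in _FIELD.findall(rest)}


# Made-up sample line, for illustration only.
sample = "%Cpu(s):  5.9 us,  2.0 sy,  0.0 ni, 91.6 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st"
fields = parse_cpu_line(sample)
total = sum(fields.values())

print(fields["us"], fields["sy"], fields["id"], fields["wa"])
print(round(total, 1))
```

In this invented sample the CPU is nearly all idle (id), with very little of that idleness down to IO (wa).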
So already we can see how this tool can be used to give a top-level, big-handful understanding of what's going on. Let's take a look at some hypothetical situations:
User/system is high - idle/waiting are low
Whatever's going on with your program, it's not waiting around. You're not going to get much performance improvement by fine-tuning IO.
User/system is low - waiting is high
Your program is spending a lot of time waiting for IO to hand over some data (or waiting for something to finish being pumped out to disk). If you can improve IO performance, your program will be able to get on with everything else that much sooner, wasting less time.
Everything is low
Whatever your program is doing involves sitting around waiting for something that isn't IO. Maybe it's sleeping a lot. Maybe it's waiting for data from something that top doesn't count as IO wait. Whatever's going on, fine-tuning little bits of code to make them run faster isn't going to do much for you; your program will just have more time to sit around doing nothing.
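The three situations above can be written down as a rough rule of thumb. This is only a sketch: the function name describe_cpu_sample is hypothetical, and the two thresholds are arbitrary numbers picked for illustration, not anything top itself defines.

```python
def describe_cpu_sample(us, sy, wa, busy_threshold=70.0, wait_threshold=20.0):
    """Map one %Cpu(s) sample onto the three rough situations described above.

    The thresholds are arbitrary illustrative defaults, not values from top.
    """
    busy = us + sy
    if busy >= busy_threshold:
        # User/system is high: the program is not waiting around.
        return "busy: tuning IO is unlikely to help much"
    if wa >= wait_threshold:
        # User/system is low, waiting is high: IO is the hold-up.
        return "waiting on IO: faster IO should help"
    # Everything is low: waiting on something that is not IO.
    return "mostly idle: waiting on something other than IO"


print(describe_cpu_sample(us=60.0, sy=25.0, wa=1.0))
print(describe_cpu_sample(us=5.0, sy=3.0, wa=60.0))
print(describe_cpu_sample(us=3.0, sy=1.0, wa=0.5))
```

Treat the answer as a pointer for where to look next rather than a diagnosis; a single half-second sample can easily mislead.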