top

What it's really useful for

top is a simple command-line tool that shows, in real time, how the processes on your system are using its resources.

Use this tool to confirm which processes are running, and to identify three kinds of problem process: those that are maxing out the CPU (these are the bottleneck processes); those that are eating so much memory that it's affecting their performance (and the performance of other processes that are starved of memory); and those that would love to be doing something for you if only they weren't waiting for IO to complete (these can often be rewritten to make better use of IO, leading to some stunning performance improvements).

top - Basic use, basic interpretation

Some more advanced use is demonstrated through the examples that follow this section.

Invocation

Invoke top at your command line:

top d0.5

This invokes top, with an update period of 0.5 seconds (i.e. the display will be refreshed every half second). It will look something like this:

(Animated screenshot of top running.)

You can make the display update more or less frequently by altering the numerical value in the command. The looped animated GIF shows the activity over several seconds. The sort order is the default, %CPU (i.e. at any given sample, the commands in the table are ordered from highest %CPU value to lowest). The hungry commands in this example are firefox and byzanz-record (which is what I'm using to record the terminal).

But what does this all mean? If you're going to use top, at some point you should read its man page, which you can see with the command

man top

but for the moment, we can make do with understanding just a few of the values being shown. As you can see, every process running is included; to get the most accurate view on the process you're interested in, turn off anything else that's heavy on the CPU, memory or any other system resource.
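If you want to capture what top shows rather than watch it, it can also be run non-interactively. The following is a minimal sketch, assuming the procps-ng version of top found on most Linux systems, whose `-b` (batch mode), `-d` (delay) and `-n` (number of iterations) options are described in its man page. The function names are made up for illustration. Two iterations are requested on the assumption that the first summary may not cover a fresh interval; check this against your own top.

```python
import subprocess

def build_top_command(delay_seconds=0.5, iterations=2):
    """Build an argv list for a non-interactive (batch mode) run of top."""
    return ["top", "-b", "-d", str(delay_seconds), "-n", str(iterations)]

def cpu_summary_lines(top_output):
    """Return every summary line that starts with %Cpu, in the order printed."""
    return [line for line in top_output.splitlines() if line.startswith("%Cpu")]

def capture_top(delay_seconds=0.5, iterations=2):
    """Run top and return its text output. Requires top to be on the PATH."""
    result = subprocess.run(
        build_top_command(delay_seconds, iterations),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `cpu_summary_lines(capture_top())[-1]` should then give the most recent CPU summary line, provided top is installed and prints it in the layout assumed here.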

Interpretation

Each time the display refreshes, you're looking at a new measurement covering the update period. So in this example, each refresh shows what happened in the previous half-second. The third line, beginning %Cpu(s), shows how the CPU spent its time during that interval. These values should sum to 100% (give or take rounding). To begin with, the values of particular interest are us, sy, id and wa.
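The %Cpu(s) line can also be read by a program, which makes the "sums to 100%" property easy to check. This is a rough sketch, assuming the layout printed by recent procps-ng versions of top (a number followed by a two-letter label) and a locale that uses a dot as the decimal separator. The sample line and the function name are invented for illustration.

```python
import re

# Hypothetical sample line, in the layout assumed above.
SAMPLE = "%Cpu(s):  5.9 us,  2.0 sy,  0.0 ni, 91.6 id,  0.4 wa,  0.0 hi,  0.1 si,  0.0 st"

def parse_cpu_line(line):
    """Return a dict such as {"us": 5.9, "sy": 2.0, ...} from a %Cpu(s) line."""
    # Drop the "%Cpu(s)" prefix so only the "number label" pairs remain.
    _, _, values = line.partition(":")
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s*([a-z]{2})\b", values)
    return {label: float(number) for number, label in pairs}

def sums_to_hundred(cpu, tolerance=0.5):
    """True if the parsed values add up to 100% within a rounding tolerance."""
    return abs(sum(cpu.values()) - 100.0) <= tolerance
```

The tolerance is there because top rounds each value to one decimal place, so the printed figures need not add up to exactly 100.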

  1. us - Userspace

    The percentage of time that the CPU spent running code in "user space". As a first approximation, "user space" is anything outside the kernel: your program's own code and the libraries it calls. Various things your code can do (for example, reading or writing to a file, or messing about with memory) are done for you by the kernel, via a "system call".

  2. sy - System

    In contrast to the above, the percentage of time that the CPU spent in "kernel space", running kernel code such as the system calls made on behalf of your processes.

As a very rough first-order rule-of-thumb, if the above two values aren't pretty high, you're leaving a lot of CPU on the table; it's sitting idle, doing nothing, when it could be working for you. All else being equal, this is bad; if you need your program to run faster, and you see a lot of idle CPU time, you need to look for what's causing that idle time.
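To see the user/system split on a small scale, a process can ask how much user and system CPU time it has consumed itself. The sketch below uses Python's `os.times()` for that; the two workload functions are invented for illustration. Because the Python interpreter runs in user space, even the system-call-heavy loop will show a fair amount of user time, so treat the numbers as indicative only.

```python
import os

def cpu_times_for(work):
    """Run work() and return the (user, system) CPU seconds it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system

def pure_computation():
    """Arithmetic only: the time should show up almost entirely as user time."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

def many_system_calls():
    """One write() system call per loop: some of the time is spent in the kernel."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        for _ in range(200_000):
            os.write(fd, b"x")
    finally:
        os.close(fd)

if __name__ == "__main__":
    for work in (pure_computation, many_system_calls):
        user, system = cpu_times_for(work)
        print(f"{work.__name__}: user={user:.2f}s system={system:.2f}s")
```

Running top alongside a longer version of each loop should show the first pushing up us and the second pushing up sy as well.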

  1. id - Idle

    The percentage of time in which the CPU had nothing to run and wasn't waiting for any outstanding IO.

  2. wa - Waiting

    This is actually a kind of idle time; it's the percentage of time in which the CPU is idle, because it's waiting for IO. Contrast this with the id value, which is the percentage of time in which the CPU is idle for some other reason.

The other values on that line (ni, hi, si and st) are zero in this example because it is a simple one. If you're chasing performance and you see the other values rising, dig out the top man page.
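It can help to see where percentages like these come from. On Linux, the kernel keeps cumulative CPU time counters in /proc/stat, and a percentage for an interval is the change in one counter divided by the change in all of them. The following is a sketch of that arithmetic, not top's actual code; it assumes the field order documented in the proc man page (user, nice, system, idle, iowait, irq, softirq, steal), and the function names are invented.

```python
# Labels in the order the counters appear on the "cpu" line of /proc/stat.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")

def read_cpu_counters(path="/proc/stat"):
    """Return the first eight counters from the aggregate "cpu" line (Linux only)."""
    with open(path) as stat:
        for line in stat:
            parts = line.split()
            if parts and parts[0] == "cpu":
                return tuple(int(value) for value in parts[1:9])
    raise RuntimeError("no aggregate cpu line found in " + path)

def cpu_percentages(before, after):
    """Turn two counter samples into percentages for the interval between them."""
    deltas = [later - earlier for earlier, later in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("no time elapsed between samples")
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}
```

Taking one sample with `read_cpu_counters()`, sleeping for half a second, taking another and passing both to `cpu_percentages` should give figures comparable to the %Cpu(s) line for the same interval.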

So already we can see how this tool can be used to give a top-level, big-handful understanding of what's going on. Let's take a look at some hypothetical situations:

User/system is high - idle/waiting are low

Whatever's going on with your program, it's not waiting around. You're not going to get much performance improvement by fine-tuning IO.

User/system is low - waiting is high

Your program is spending a lot of time waiting for IO to hand over some data (or waiting for something to finish being pumped out to disk). If you can improve the performance on IO, your program will be able to get on with everything else that much sooner, wasting less time.

Everything is low

Whatever your program is doing involves sitting around waiting for something that isn't IO. Maybe it's sleeping a lot. Maybe it's waiting for data from something that doesn't show up in top's wa figure (a network connection, for example). Whatever's going on, fine-tuning little bits of code to make them run faster isn't going to do much for you; your program will just have more time to sit around doing nothing.
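The three situations above can be written down as a small decision rule. This is only a sketch: the thresholds (70% busy, 20% waiting) and the labels are arbitrary assumptions chosen for illustration, not values that top or its documentation define.

```python
def diagnose(us, sy, wa, busy_threshold=70.0, wait_threshold=20.0):
    """Map us/sy/wa percentages onto one of the three situations in the text.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if us + sy >= busy_threshold:
        # User/system is high: the CPU is the bottleneck.
        return "cpu-bound"
    if wa >= wait_threshold:
        # User/system is low and waiting is high: IO is the bottleneck.
        return "io-bound"
    # Everything is low: waiting on something other than IO.
    return "idle-or-blocked-elsewhere"
```

Bear in mind that the %Cpu(s) line is a summary across all CPUs by default, so a single-threaded program that saturates one core of an eight-core machine shows up as roughly 12% busy, not 100%. Lower the busy threshold accordingly, or press 1 in top to see each CPU separately.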