Wednesday, January 26, 2011

Top shows load of 2.5: how can I find out which processes are waiting for the CPU?

I have a Linux box hosting a fairly low-traffic site. It's an Amazon EC2 "small" instance running Ubuntu 10.04. When I run "top" on it, I usually see a load of between 2 and 3, which in my experience is pretty high. However, not much really seems to be happening on the machine: the CPU is almost always idle, and by watching the apache2 access.log file, I can see that not many requests are coming through. How can I figure out which processes are waiting for the CPU, so that I can try to understand why the load metric is so high?

  • Using top, you can see which threads are running and which ones are sleeping. If we're talking about a CPU bottleneck, this should at least tell you what is draining your resources.

    [xxx@absynthe proc]$ top -H
    

    This should bring up a screen like this:

    top - 17:54:38 up 37 min,  2 users,  load average: 0.03, 0.06, 0.07
    Tasks: 338 total,   2 running, 336 sleeping,   0 stopped,   0 zombie
    Cpu(s):  4.1%us,  2.3%sy,  0.0%ni, 92.1%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   3852932k total,  1596468k used,  2256464k free,    47108k buffers
    Swap:  5963768k total,        0k used,  5963768k free,   681728k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     1853 root      20   0  196m  38m  15m S  9.7  1.0   1:57.58 Xorg
     2186 xxx       -6   0  564m 8828 7216 S  3.9  0.2   0:43.69 pulseaudio
     2611 xxx       20   0 1095m 235m  27m S  3.9  6.3   2:29.52 firefox
     2179 xxx        9 -11  564m 8828 7216 S  1.9  0.2   0:38.34 pulseaudio
     2671 xxx       20   0 1087m  43m  18m S  1.9  1.2   0:06.06 plugin-containe
     2820 xxx       20   0 1275m  67m  23m S  1.9  1.8   0:13.13 souphttpsrc13:s
     2824 xxx       20   0  315m  13m 9492 S  1.9  0.4   0:02.35 gnome-terminal
     3114 xxx       20   0 15088 1300  820 R  1.9  0.0   0:00.02 top
        1 root      20   0 19236 1440 1152 S  0.0  0.0   0:01.07 init
        2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
        3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
        4 root      20   0     0    0    0 S  0.0  0.0   0:00.17 ksoftirqd/0
        5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
        6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    

    The column named S (the 8th from the left, or the 5th from the right) shows the thread state: S for sleeping threads and R for threads that are running or runnable (waiting in the queue for a CPU). You can choose the sort order by pressing F (capital) and choosing the Process Status field. You can then reverse the sort order by pressing R (capital) so that the running threads are listed first.

    If your problem is not a CPU bottleneck, I'll need some more information to help you. Maybe you could post your top output as I did in the example above.

    Hope this helps.

    EDIT:

    If you want to know whether you're experiencing some kind of I/O bottleneck, run vmstat -s and look at the "IO-wait cpu ticks" counter. If that value goes up a lot when you run the command again a couple of seconds later, you might have an I/O bottleneck. In that case, you may be better off using iotop to see which processes are using the most I/O.

    jfrank : Thank you for the reply. I don't think it's a CPU bottleneck, because it's almost always listed as 100% idle. I can see the list of processes and what state they are in, as you suggest, but isn't it always the case that just one process will be Running and the rest are Sleeping (there is just one CPU)? How will that help me determine which processes are queued up for the CPU, which is my understanding of what the "load" number measures? Actually, the other poster (patate) makes me wonder if processes that are waiting for non-CPU resources, e.g. disk I/O, are included in the load average?
    Khai : I don't know of any way to list the threads that are waiting for resources; however, iotop is the best way to diagnose I/O bottlenecks. I'll edit the answer to add some more insight.
    From Khai
  • The load average is not driven only by CPU usage. On Linux, it also counts processes that are blocked waiting for other resources, most often disk I/O (including network filesystems such as NFS).

    On Amazon EC2 small instances, disk I/O is very bad, especially /dev/sda2.

    You can run iotop (apt-get install iotop on Debian/Ubuntu) to see which processes are consuming I/O.

    jfrank : OK, that is helpful. My limited understanding was that the load measured processes waiting for the CPU, but a little more digging indicates that on Linux (as opposed to other Unix systems) it includes processes waiting for other resources. I installed iotop, but there does not seem to be much disk I/O going on either. Since I am not familiar with iotop or how to interpret its output, I will do some research on it and see what I can figure out.
    jfrank : Actually while searching for information about iotop, I came across this thread. It leads me to believe there may be an issue with Ubuntu 10.04. https://bugs.launchpad.net/ubuntu-on-ec2/+bug/574910
    jfrank : I launched another small instance running Ubuntu 9.10, switched the traffic over to that instance, and now I'm seeing load averages around 0, which is much more in line with what I expected. So, while I did not really learn too much about diagnosing a high load, I did solve my particular issue. And, I did learn a bit more about what load refers to, plus I've added iotop to the list of things that I need to learn more about.
    From patate
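To answer the original question more directly: on Linux, the load average counts threads that are runnable and threads in uninterruptible sleep. Runnable threads show state R, which covers the one on the CPU and any queued for it. Threads in uninterruptible sleep show state D and are usually blocked on disk I/O. The sketch below lists every thread currently in one of those two states, much like scanning the S column of top -H by eye. It assumes a Linux /proc filesystem and Python 3.9+. The helper names are hypothetical and are not part of top or ps.

```python
from pathlib import Path

# Hypothetical helpers: a sketch, not part of top, ps, or any other tool.


def parse_stat(text: str) -> tuple[int, str, str]:
    """Parse a /proc/<pid>/task/<tid>/stat line into (tid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' instead of on whitespace.
    """
    head, _, tail = text.rpartition(")")
    tid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(tid_str), comm, state


def tasks_counted_in_load(proc: str = "/proc") -> list[tuple[int, str, str]]:
    """Return (tid, comm, state) for every thread currently in state R or D."""
    found = []
    for pid_dir in Path(proc).iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            task_dirs = list((pid_dir / "task").iterdir())
        except OSError:
            continue  # the process exited while we were scanning
        for task_dir in task_dirs:
            try:
                text = (task_dir / "stat").read_text()
            except OSError:
                continue  # the thread exited while we were scanning
            tid, comm, state = parse_stat(text)
            # R = running or runnable (queued for a CPU),
            # D = uninterruptible sleep (usually waiting on disk I/O).
            if state in ("R", "D"):
                found.append((tid, comm, state))
    return found


if __name__ == "__main__":
    print("loadavg:", Path("/proc/loadavg").read_text().strip())
    for tid, comm, state in sorted(tasks_counted_in_load()):
        print(f"{state} {tid:>7} {comm}")
```

The load average is a damped average over 1, 5, and 15 minutes, so a single snapshot can miss short-lived tasks. Running the script repeatedly (for example, once a second for a minute) gives a better picture of which threads keep showing up as R or D while the CPU itself looks idle.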
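The vmstat -s check in Khai's EDIT compares the "IO-wait cpu ticks" counter between two runs. The same counter is the fifth number on the aggregate cpu line of /proc/stat. The sketch below samples that line twice and turns the difference into a share of total CPU time, which is roughly what top shows as %wa. It assumes Linux and Python 3.9+, and the helper names are hypothetical.

```python
import time

# Hypothetical helpers: a sketch of the counter `vmstat -s` reports, read from /proc/stat.


def read_cpu_ticks(path: str = "/proc/stat") -> list[int]:
    """Return the aggregate 'cpu' counters from /proc/stat.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice
    (older kernels print fewer fields).
    """
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError(f"no aggregate cpu line in {path}")


def iowait_share(before: list[int], after: list[int]) -> float:
    """Fraction of CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # guest and guest_nice are already included in user and nice,
    # so only the first eight fields make up the total.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


if __name__ == "__main__":
    first = read_cpu_ticks()
    time.sleep(2)  # "a couple of seconds", as suggested in the EDIT above
    second = read_cpu_ticks()
    print(f"iowait over the last 2 s: {iowait_share(first, second):.1%}")
```

A high iowait share while the CPU is otherwise idle points toward the I/O bottleneck described in the thread. On a virtualized EC2 instance, the eighth field (steal time, which top shows as %st) may also be worth checking.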
