Find out the website causing high load on an Apache webserver

If you’re running an Apache webserver with many customer websites, there will come a time (sooner or later) when your server is flooded with page requests, causing high CPU load and memory usage. This is especially likely if PHP or another scripting language runs behind it. Most of the time the cause is a harmful script somewhere on the net. But how do you find out which of the sites is the affected one? Looking at top/ps doesn’t help much if PHP (for example) is running as an Apache module: you will only see a lot of “httpd” processes.

A good tool to get closer to the culprit is apachetop. It takes an access log as argument and shows you all accessed pages, hosts and more:

apachetop -f /var/log/httpd/access_log

Cool, but… what if you’re using Plesk? It stores the access_log of each website in a separate file within the corresponding vhost directory. The default access_log doesn’t help, as long as the problem is not related to webmail, for example.

You can add multiple “-f” arguments to apachetop manually. But what if you have 300+ vhosts? Not really an option. Luckily we’re on Linux and can do something like this:

apachetop $(find /var/www/vhosts/*/statistics/logs/ -name "access_log" -print | sed 's/^/-f /')

This adds all access-logs within our vhosts directory as arguments. Unfortunately, it fails:

Only 50 files are supported at the moment
Segmentation fault

OK, how can we limit the number of files passed to apachetop? Because we’re searching for the source of a lot of requests, we can assume that its logfile already has some size. Most of our customer sites have very low load anyway or are used for mail only. So let us extend the find command a bit:

apachetop $(find /var/www/vhosts/*/statistics/logs/ -type f -size +10k -name "access_log" -print | sed 's/^/-f /')

Now only logs bigger than 10 kilobytes are passed to apachetop. You can adjust the threshold as needed: “c” => bytes, “k” => kilobytes, “M” => megabytes, “G” => gigabytes.
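To check in advance that you stay below apachetop’s 50-file limit, you can count the matching logs first; a quick sketch:

```shell
# Count how many vhost access logs exceed the size threshold,
# so we know we stay below apachetop's 50-file limit.
find /var/www/vhosts/*/statistics/logs/ -type f -size +10k -name "access_log" | wc -l
```

If the number is still too high, just raise the size threshold until it fits.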

Now we see something like this:

last hit: 17:56:25 atop runtime: 0 days, 00:24:25 17:56:35
All: 747 reqs ( 0.5/sec) 14.5M ( 10.2K/sec) 19.9K/req
 2xx: 657 (88.0%) 3xx: 42 ( 5.6%) 4xx: 44 ( 5.9%) 5xx: 4 ( 0.5%)
 R ( 30s): 40 reqs ( 1.3/sec) 464.6K ( 15.5K/sec) 11.6K/req
 2xx: 38 (95.0%) 3xx: 1 ( 2.5%) 4xx: 1 ( 2.5%) 5xx: 0 ( 0.0%)

REQS REQ/S KB KB/S URL
 2 0.09 21.0 1.0*/plugins/system/yoo_effects/yoo_effects.js.php
 1 0.04 0.5 0.0 /index.php
 1 0.05 6.7 0.3 /
 1 0.05 10.9 0.5 /templates/mobile_elegance/jqm/jquery.mobile-1.2.0.min.css
 1 0.05 1.4 0.1 /media/zoo/assets/css/reset.css
 1 0.05 0.5 0.0 /media/zoo/applications/product/templates/default/assets/css/zoo.css
 1 0.05 1.0 0.0 /plugins/system/yoo_effects/lightbox/shadowbox.css
 1 0.05 1.8 0.1 /components/com_rsform/assets/calendar/calendar.css
 1 0.05 0.7 0.0 /components/com_rsform/assets/css/front.css
 1 0.05 5.2 0.2 /components/com_rsform/assets/js/script.js
 1 0.05 0.5 0.0 /components/com_zoo/assets/js/default.js

Missing something? Yes, the domain…
The access_log doesn’t contain the domain of the vhost itself, just the path to the requested file. But that may be enough to find out which site is affected.
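One workaround (a sketch; the URL below is just an example path from the output above): once a suspicious URL stands out in apachetop, grep for it across all per-vhost logs. `grep -l` prints only the matching log files, and the vhost directory in each path reveals the site:

```shell
# Print the per-vhost log files containing the suspicious request path;
# the vhost directory in the printed path reveals the affected site.
grep -l "yoo_effects.js.php" /var/www/vhosts/*/statistics/logs/access_log
```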

If you press the “d” key once, you can switch to the hosts view. Maybe there is one single IP all the requests are coming from. If so, you can simply block this IP for some time.
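Blocking could, for instance, be done with an iptables rule (a sketch; 12.34.56.78 stands in for the offending address, and you need root). Remember to remove the rule again later:

```shell
# Temporarily drop all traffic from the offending IP (example address).
iptables -I INPUT -s 12.34.56.78 -j DROP

# Later, remove the rule again:
iptables -D INPUT -s 12.34.56.78 -j DROP
```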

Or, once you have identified the IP address, you can grep for it within all access logs (not tried with 50+ files, but I think it should work):

grep "12.34.56.78" /var/www/vhosts/*/statistics/logs/access_log
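To see at a glance which vhost is hit hardest by that address, you can also count the matches per log and sort them (again a sketch with a placeholder IP):

```shell
# Count occurrences of the suspect IP per vhost log ("file:count" lines),
# then list the busiest logs first.
grep -c "12.34.56.78" /var/www/vhosts/*/statistics/logs/access_log | sort -t: -k2 -rn | head
```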

2 Responses to Find out the website causing high load on a Apache webserver

  1. Romu says:

    Great job, thanks for sharing.

    I have added 25 domains now to apachetop and it works as expected. However, now I can’t see the domain name that was accessed. How do I find out which of the 25 scanned domains a request went to?

    • Urs says:

      Yes, you’re right. As described in the article, I haven’t found a way to get the domain directly:

      Missing something? Yes, the domain…
      The access_log doesn’t contain the domain of the vhost itself, just the path to the requested file. But that may be enough to find out which site is affected.

      In my case, because the file that caused the high load was unique on the server, I could simply do a 'find /var/www/vhosts/ -name "filename"' to find the affected website. That doesn’t help much if you have a lot of very similar websites, of course.
